When working with computer code, a task that comes up often is parsing text files. There are many reasons to do this: parsing commands written in an interpreted language, reading data from a dataset, reading configuration parameters, applying automated changes to code, and many more. Text files are generally human-readable and separated into lines, so line-by-line parsing is very common, and most programming languages provide a relatively easy way to read a text file one line at a time. The following example shows two ways to do this in Perl:
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file_name = "test.txt";

# Method 1: read the file one line at a time
if (open(my $file, '<', $file_name)) {
    # NOTE: The <> operator pulls the next line from the file
    while (my $line = <$file>) {
        # NOTE: $line still has the newline character
        print $line;
    }
    close $file;
}

# Method 2: read the entire file into a variable. This can be faster than
# reading line by line, but uses more memory.
if (open(my $file, '<', $file_name)) {
    # NOTE: Localizing the '$/' global variable (the input record separator)
    # makes the <> operator pull in the entire file at once.
    my $file_contents = do { local $/; <$file> };
    close $file;
    my @file_lines = split /\R/, $file_contents;
    foreach my $line (@file_lines) {
        # NOTE: This time, $line does not have a newline character
        print "$line\n";
    }
}
```
Perl’s built-in regex features make it a very powerful tool for parsing text files: a Perl script can easily search for lines that match a pattern and perform some operation when a match is found. However, because Perl is an interpreted language, it is generally a poor fit for projects that require heavy computation or real-time operation. Such tasks are better handled by a compiled language like C. The following example shows one way to parse a text file line by line in C:
```c
#include <stdio.h>

#define FILE_NAME "test.txt"

int main(int argc, char *argv[])
{
    FILE *fin;
    // NOTE: Line length is limited; longer lines come back in several pieces
    char line[256];

    /* Open the input file for reading in text mode */
    fin = fopen(FILE_NAME, "r");
    if (fin) {
        /* Read one line at a time until end of file */
        while (fgets(line, sizeof(line), fin)) {
            // NOTE: line keeps the end-of-line character, if one was read
            printf("%s", line);
        }
        fclose(fin);
        fin = NULL;
    }

    return 0;
}
```
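Because of the fixed-size buffer, a line that does not fit in 256 bytes is returned by fgets in several pieces. One way around this, mirroring the second Perl approach, is to read the whole file into memory and then split it into lines. The sketch below assumes the file fits in memory; read_all, next_line, and print_file_lines are hypothetical helper names, and next_line treats "\n", "\r\n", or a lone "\r" as a line ending, roughly like the common cases of Perl's \R.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read everything from an open stream into a
 * NUL-terminated heap buffer. Returns NULL on allocation or read failure;
 * the caller must free() the result. */
char *read_all(FILE *fin)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;

    /* Always leave one byte free for the terminating NUL */
    while ((n = fread(buf + len, 1, cap - len - 1, fin)) > 0) {
        len += n;
        if (len == cap - 1) {
            /* Buffer full: double its size and keep reading */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (ferror(fin)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical helper: return the next line at *cursor with its line ending
 * removed, and advance *cursor past it. The buffer is modified in place.
 * Returns NULL when no text remains. */
char *next_line(char **cursor)
{
    char *start = *cursor;
    if (*start == '\0')
        return NULL;

    char *end = start + strcspn(start, "\r\n");
    if (*end == '\r' && end[1] == '\n') {
        *end = '\0';            /* "\r\n" ending */
        *cursor = end + 2;
    } else if (*end != '\0') {
        *end = '\0';            /* "\n" or lone "\r" ending */
        *cursor = end + 1;
    } else {
        *cursor = end;          /* last line had no line ending */
    }
    return start;
}

/* Example use: print every line of a file, like the Perl slurp version. */
int print_file_lines(const char *path)
{
    FILE *fin = fopen(path, "r");
    if (!fin)
        return -1;
    char *contents = read_all(fin);
    fclose(fin);
    if (!contents)
        return -1;

    char *cursor = contents;
    char *line;
    while ((line = next_line(&cursor)) != NULL)
        printf("%s\n", line);   /* the line ending was stripped, so add one back */
    free(contents);
    return 0;
}
```

As with the Perl version, memory use grows with the file size, so reading line by line with fgets remains the better choice for very large files.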
The C code is a little more involved than the Perl version, but not by much. The C standard library has no regular-expression support (POSIX systems do provide `<regex.h>`), so the actual parsing can be considerably more complex. Even so, when the application itself is written in C, such parsing may be the only way to get the required data into it.
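Without regular expressions, parsing in C is typically done with standard string functions such as strchr and isspace. As a minimal sketch, the hypothetical parse_setting function below handles configuration lines in an assumed "name = value" format, skipping blank lines and lines that start with #. Both the format and the function names (parse_setting, trim) are illustrative assumptions, not part of the original example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: trim leading and trailing whitespace (including a
 * trailing newline) in place; returns a pointer to the trimmed start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Hypothetical parser for lines of the assumed form "name = value".
 * Modifies line in place. On success, points *name and *value into the
 * line and returns 1. Returns 0 for blank lines, comment lines starting
 * with '#', lines with no '=', and lines with an empty name. */
int parse_setting(char *line, char **name, char **value)
{
    char *text = trim(line);
    if (*text == '\0' || *text == '#')
        return 0;

    char *eq = strchr(text, '=');
    if (!eq)
        return 0;

    *eq = '\0';                 /* split the line into name and value */
    *name = trim(text);
    *value = trim(eq + 1);
    return **name != '\0';
}
```

Because trim also strips the trailing newline, each buffer filled by fgets in the earlier loop could be passed directly to parse_setting. Note that name and value point into the line buffer, so they must be copied before the next fgets call overwrites it.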