Parsing Text Files Line by Line

Parsing text files is a task that comes up often in programming. Typical reasons include interpreting commands in an interpreted language, reading records from a dataset, reading configuration parameters, and applying automated changes to source code. Text files are generally human-readable and divided into lines, so line-by-line parsing is very common, and most programming languages provide a relatively easy way to do it. The following example shows two ways to read a file line by line in Perl:

#!/usr/bin/perl
use strict;
use warnings;

my $file_name = "test.txt";

# Read the file in place, one line at a time
if (open my $file, '<', $file_name) {
    # NOTE: The <> operator pulls the next line from a file
    while (my $line = <$file>) {
        # NOTE: $line still has the newline character
        print $line;
    }
    close $file;
}

# Read the entire file into a variable. This can be faster than reading
# in place but uses more memory.
if (open my $file, '<', $file_name) {
    # NOTE: Localizing the '$/' global variable (the input record separator)
    # leaves it undefined, so the <> operator pulls in the entire file.
    my $file_contents = do { local $/; <$file> };
    close $file;
    # Split on any line ending (\R matches \n, \r\n, and others)
    my @file_lines = split /\R/, $file_contents;
    foreach my $line (@file_lines) {
        # NOTE: This time, $line does not have a newline character
        print "$line\n";
    }
}

Perl’s built-in regex features make it a very powerful tool for parsing text files. A Perl script can easily search for lines that match a pattern and perform some operation when a match is found. However, Perl is an interpreted language, so it is often a poor fit for projects that require heavy computation or real-time operation. These tasks are usually better handled by a compiled language like C. The following example shows one way to parse a text file line by line in C:

#include <stdio.h>

#define FILE_NAME "test.txt"

int main(int argc, char *argv[])
{
    FILE *fin;
    /* NOTE: Line length is limited; longer lines are returned in pieces */
    char line[256];

    /* Open the input file for reading in text mode */
    fin = fopen(FILE_NAME, "r");
    if (fin) {
        /* Read one line at a time until end of file */
        while (fgets(line, sizeof(line), fin)) {
            /* NOTE: line contains the end-of-line character */
            printf("%s", line);
        }
        fclose(fin);
        fin = NULL;
    }
    return 0;
}
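The fixed-size buffer in the example above has two practical consequences: each string returned by fgets normally still ends with the newline, and a line longer than the buffer comes back in several pieces, of which only the last ends with a newline. The sketch below shows one possible way to deal with both. The helper names strip_newline and count_lines are hypothetical and are not part of the original example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes a trailing "\n" or "\r\n" from line in place.
 * Returns 1 if a newline was found (the end of a line was reached), or 0 if
 * not, which means the line was longer than the buffer or the file ended
 * without a final newline. */
int strip_newline(char *line)
{
    size_t len = strlen(line);

    if (len == 0 || line[len - 1] != '\n')
        return 0;
    line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
    return 1;
}

/* Hypothetical helper: counts the lines in a stream, treating a long line
 * that fgets() returns in several pieces as a single line. */
long count_lines(FILE *fin)
{
    char chunk[256];
    long count = 0;
    int mid_line = 0;  /* nonzero while inside a line that overflowed chunk */

    while (fgets(chunk, sizeof(chunk), fin)) {
        int complete = strip_newline(chunk);

        if (!mid_line)
            count++;   /* this chunk starts a new line */
        mid_line = !complete;
    }
    return count;
}
```

A caller would open the file with fopen as in the example above and pass the resulting FILE pointer to count_lines. The same mid_line flag could instead be used to report or skip over-long lines, depending on what the application needs.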

The C code is a little more involved than the Perl version, but not by much. The C standard library has no regex support, so the actual parsing can be considerably more complex than in Perl. Even so, parsing in C may be the only way to get the required data into the application.
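To give a sense of what parsing without regex can look like, here is a minimal sketch that splits a configuration line using only standard string functions. It assumes a simple "key = value" format with "#" comment lines, which is an illustrative assumption and not a format described in this article. The helper names trim and parse_key_value are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: trims trailing whitespace in place and returns a
 * pointer to the first non-whitespace character of s. */
char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Hypothetical helper: splits a "key = value" line in place.
 * On success, returns 1 and points *key and *value into line.
 * Returns 0 for blank lines, lines starting with '#', lines without '=',
 * and lines with an empty key. */
int parse_key_value(char *line, char **key, char **value)
{
    char *eq;

    line = trim(line);
    if (*line == '\0' || *line == '#')
        return 0;
    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';              /* cut the line in two at the first '=' */
    *key = trim(line);
    *value = trim(eq + 1);
    return **key != '\0';
}
```

Each line returned by fgets could be passed to parse_key_value directly, since trim also removes the trailing newline. The function modifies the buffer it is given, so the key and value pointers are only valid until the next line is read into that buffer.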

Complete Communications Engineering
