CSV Parsing Intro

CsvHelper is the de facto standard C# library for reading and writing CSV files in a simple and flexible way. It maps rows to typed objects, handles custom delimiters, and works with different text encodings.

It also includes built-in support for handling common data types, such as dates and numbers, and provides the ability to customize how data is read and written.

To read a CSV file with CsvHelper, we first define a type whose properties map to the CSV columns, for example:

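using CsvHelper.Configuration.Attributes;

public readonly record struct CsvRecord {

    // [Name] binds each property to the column with the given header
    [Name("NAME")]
    public string Name { get; init; }

    [Name("FIELD_FLOAT")]
    public float FieldFloat { get; init; }

    // integer column, so the property is an int rather than a float
    [Name("FIELD_INT")]
    public int FieldInt { get; init; }

}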

We use a C# 10 readonly record struct here because we want instances to be immutable. The [Name] attributes specify the column names as they appear in the header of the CSV file.

Next, we can define a function to read the CSV file using CsvReader:

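// Requires: System.Collections.Generic, System.Collections.Immutable,
// System.Globalization, System.IO, CsvHelper, CsvHelper.Configuration
public IReadOnlyCollection<CsvRecord> ParseCsvFile(FileInfo fileInfo) {
    Logger.LogInformation("Reading report file from {Path}", fileInfo.FullName);
    using var stream = new StreamReader(fileInfo.FullName);
    using var csv = new CsvReader(
        stream,
        new CsvConfiguration(CultureInfo.InvariantCulture) {
            // the first row is a header
            HasHeaderRecord = true,
            // returning false skips the offending row instead of rethrowing
            ReadingExceptionOccurred = _ => false
        }
    );
    // GetRecords is lazy, so materialize the list before the reader is disposed
    return csv.GetRecords<CsvRecord>().ToImmutableList();
}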
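
The handler in ParseCsvFile returns false for every exception, so malformed rows disappear without a trace. As a minimal sketch of an alternative, the handler can record each failure before skipping the row. This assumes a recent CsvHelper version in which the delegate receives an argument object exposing an Exception property; the SampleRow class and the sample data are hypothetical stand-ins used only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

// Hypothetical input: the second data row has a non-numeric FIELD_INT.
const string sample =
    "NAME,FIELD_FLOAT,FIELD_INT\n" +
    "alpha,1.5,10\n" +
    "beta,2.25,not-a-number\n" +
    "gamma,3.0,30\n";

var skipped = new List<string>();
var config = new CsvConfiguration(CultureInfo.InvariantCulture) {
    HasHeaderRecord = true,
    // Assumption: args.Exception is available (recent CsvHelper versions).
    // Returning false tells CsvHelper not to rethrow, so the row is skipped.
    ReadingExceptionOccurred = args => {
        skipped.Add(args.Exception.Message);
        return false;
    },
};

using var reader = new StringReader(sample);
using var csv = new CsvReader(reader, config);
// GetRecords is lazy, so materialize while the reader is still open.
var records = csv.GetRecords<SampleRow>().ToList();

Console.WriteLine($"parsed {records.Count} rows, skipped {skipped.Count}");

// Hypothetical stand-in for the CsvRecord type defined earlier.
public sealed class SampleRow {
    [Name("NAME")] public string Name { get; set; } = "";
    [Name("FIELD_FLOAT")] public float FieldFloat { get; set; }
    [Name("FIELD_INT")] public int FieldInt { get; set; }
}
```

Collecting the messages in a list keeps the sketch free of a logging dependency; in ParseCsvFile the same handler could pass them to Logger instead.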

The CsvConfiguration object defines various options, such as whether the CSV file contains a header record, how fields are trimmed, and so on. Some commonly used options are:

  • HasHeaderRecord: boolean that defines whether the CSV file contains a header record.
  • TrimOptions: enum that defines whether to trim whitespace in the fields; can be TrimOptions.None, TrimOptions.Trim or TrimOptions.InsideQuotes.
  • Delimiter: string that specifies the delimiter if it is not a comma.
  • Comment: char that specifies the comment character; comments are not allowed by default, so it should be set together with the AllowComments option.
  • AllowComments: boolean that defines whether comments are allowed.
  • IgnoreBlankLines: boolean that defines whether to ignore empty lines.
  • ReadingExceptionOccurred: delegate that decides what to do when an exception happens while reading; returning false skips the row instead of rethrowing, which can be useful for ignoring malformed lines.
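
Several of the options listed above can be combined in one configuration. The following is a minimal sketch, not a recommended setup: the semicolon delimiter, the '#' comment character and the sample data are assumptions chosen for illustration, and the rows are read field by field so that the example needs no extra type.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical input: semicolon-delimited, with a comment line,
// a blank line, and a field padded with spaces.
const string sample =
    "NAME;FIELD_FLOAT;FIELD_INT\n" +
    "# this line is a comment\n" +
    "\n" +
    "  alpha ;1.5;10\n" +
    "beta;2.25;20\n";

var config = new CsvConfiguration(CultureInfo.InvariantCulture) {
    HasHeaderRecord = true,
    Delimiter = ";",
    TrimOptions = TrimOptions.Trim,   // strip whitespace around fields
    AllowComments = true,             // enable comment handling
    Comment = '#',                    // lines starting with '#' are skipped
    IgnoreBlankLines = true,
};

var rows = new List<(string Name, float FieldFloat, int FieldInt)>();
using (var reader = new StringReader(sample))
using (var csv = new CsvReader(reader, config)) {
    // Read the header row first so that fields can be fetched by name.
    csv.Read();
    csv.ReadHeader();
    while (csv.Read()) {
        rows.Add((
            csv.GetField<string>("NAME") ?? "",
            csv.GetField<float>("FIELD_FLOAT"),
            csv.GetField<int>("FIELD_INT")));
    }
}

foreach (var row in rows) {
    Console.WriteLine($"{row.Name}: {row.FieldFloat}, {row.FieldInt}");
}
```

Using CultureInfo.InvariantCulture keeps number parsing independent of the machine's regional settings, so "1.5" is read the same way everywhere.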

The full list of options is available in the source code on GitHub: CsvConfiguration.cs.

If the CSV file is gzip-compressed, you can replace the StreamReader with the following to read it:

using var reader = File.OpenRead(fileInfo.FullName);
using var zip = new GZipStream(reader, CompressionMode.Decompress, true);
using var stream = new StreamReader(zip);

The function then returns the CSV records as a read-only collection of the structs we defined above.
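
To try the gzip variant end to end, a small round trip can help. The sketch below first writes a gzip-compressed CSV to a temporary file and then reads it back through GZipStream and CsvReader; the file name and the sample data are hypothetical, and only the NAME column is collected to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using CsvHelper;

// Write a small gzip-compressed CSV to a temporary file (hypothetical data).
var path = Path.Combine(Path.GetTempPath(), $"sample-{Guid.NewGuid():N}.csv.gz");
using (var file = File.Create(path))
using (var gzip = new GZipStream(file, CompressionMode.Compress))
using (var writer = new StreamWriter(gzip)) {
    writer.Write("NAME,FIELD_FLOAT,FIELD_INT\nalpha,1.5,10\nbeta,2.25,20\n");
}

// Read it back: file stream -> gzip decompression -> text reader -> CsvReader.
var names = new List<string>();
using (var file = File.OpenRead(path))
using (var gzip = new GZipStream(file, CompressionMode.Decompress))
using (var reader = new StreamReader(gzip))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture)) {
    csv.Read();
    csv.ReadHeader();
    while (csv.Read()) {
        names.Add(csv.GetField<string>("NAME") ?? "");
    }
}

File.Delete(path);
Console.WriteLine(string.Join(", ", names));
```

Note that GZipStream handles the gzip format (.gz); a .zip archive would need ZipArchive from the same namespace instead.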