CSV Parsing Intro
CsvHelper is a de facto standard library for reading and writing CSV files in C#. It handles different delimiters and encodings, converts common data types such as dates and numbers out of the box, and lets you customize how data is read and written.
To read a CSV file with CsvHelper, we first define a type that maps the CSV fields, for example:
    using CsvHelper.Configuration.Attributes;

    public readonly record struct CsvRecord {
        [Name("NAME")]
        public string Name { get; init; }

        [Name("FIELD_FLOAT")]
        public float FieldFloat { get; init; }

        // integer column, so the property type is int rather than float
        [Name("FIELD_INT")]
        public int FieldInt { get; init; }
    }
We use a C# 10 readonly record struct here since we want instances to be immutable. The [Name] attributes specify the field names as they appear in the CSV file.
Next, we can define a function to read the CSV file using CsvReader:
    using System.Collections.Immutable;
    using System.Globalization;
    using CsvHelper;
    using CsvHelper.Configuration;

    public IReadOnlyCollection<CsvRecord> ParseCsvFile(FileInfo fileInfo) {
        Logger.LogInformation($"Reading report file from {fileInfo.FullName}");
        using var stream = new StreamReader(fileInfo.FullName);
        using var csv = new CsvReader(
            stream,
            new CsvConfiguration(CultureInfo.InvariantCulture) {
                // header / no header
                HasHeaderRecord = true,
                // returning false suppresses the exception, so malformed rows are skipped
                ReadingExceptionOccurred = _ => false
            }
        );
        // materialize the records before the reader is disposed
        return csv.GetRecords<CsvRecord>().ToImmutableList();
    }
CsvConfiguration defines various options, such as whether the CSV file contains a header record, how whitespace is trimmed, and so on.
Some commonly used options are:
- HasHeaderRecord: boolean that defines whether the CSV file contains a header record.
- TrimOptions: enum that defines whether to trim whitespace in the fields; can be TrimOptions.None, TrimOptions.Trim and TrimOptions.InsideQuotes.
- Delimiter: string that specifies the delimiter if it is not a comma.
- Comment: character that marks a comment line. Comments are not allowed by default, so this should go together with the AllowComments option.
- AllowComments: boolean that defines whether comments are allowed or not.
- IgnoreBlankLines: boolean that defines whether to ignore empty lines.
- ReadingExceptionOccurred: callback that decides what to do when an exception happens; this can be useful to ignore malformed lines.
The full list of options is available in the source code on GitHub: CsvConfiguration.cs.
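Several of the options listed above can be combined in one configuration object. The following is a minimal sketch, not taken from the original code: the Measurement type, the ConfigSketch class and the semicolon-delimited input format are assumptions made up for illustration, and it presumes a recent CsvHelper version.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

// Hypothetical record type for this sketch (not part of the article's code).
public sealed class Measurement
{
    [Name("NAME")]
    public string Name { get; set; } = "";

    [Name("VALUE")]
    public int Value { get; set; }
}

public static class ConfigSketch
{
    // Parses semicolon-delimited text that may contain '#' comment lines,
    // blank lines and padded fields.
    public static List<Measurement> Parse(TextReader reader)
    {
        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            HasHeaderRecord = true,         // first row holds the column names
            Delimiter = ";",                // fields are separated by semicolons
            TrimOptions = TrimOptions.Trim, // strip whitespace around each field
            AllowComments = true,           // enable comment handling...
            Comment = '#',                  // ...for lines starting with '#'
            IgnoreBlankLines = true,        // skip empty lines
        };
        using var csv = new CsvReader(reader, config);
        // ToList() forces the lazy enumeration while the reader is still open.
        return csv.GetRecords<Measurement>().ToList();
    }
}
```

It can be called with any TextReader, for example `ConfigSketch.Parse(new StringReader(text))` for in-memory data or a StreamReader for a file.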
If the CSV file is gzip-compressed, you can replace the StreamReader with the following to read it:
    // requires: using System.IO.Compression;
    using var reader = File.OpenRead(fileInfo.FullName);
    using var zip = new GZipStream(reader, CompressionMode.Decompress, true);
    using var stream = new StreamReader(zip);
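If both plain and gzip-compressed files need to be supported, the choice of reader can be moved into a small helper. This is only a sketch under assumptions: the CsvInput class and its OpenText method are hypothetical names, and detecting compression by the ".gz" extension is a simplification that may not fit every setup.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class CsvInput
{
    // Hypothetical helper: opens a text reader over a plain or gzip-compressed file,
    // deciding by file extension.
    public static StreamReader OpenText(FileInfo fileInfo)
    {
        Stream stream = File.OpenRead(fileInfo.FullName);
        if (fileInfo.Extension.Equals(".gz", StringComparison.OrdinalIgnoreCase))
        {
            // leaveOpen: false, so disposing the GZipStream also closes the file stream.
            stream = new GZipStream(stream, CompressionMode.Decompress, leaveOpen: false);
        }
        // Disposing the StreamReader disposes the stream chain underneath it.
        return new StreamReader(stream);
    }
}
```

The returned reader can then be passed to CsvReader in place of the StreamReader, for example `using var stream = CsvInput.OpenText(fileInfo);`.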
The ParseCsvFile function then returns the CSV records as a read-only collection of the struct objects defined above.
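Returning false from ReadingExceptionOccurred drops malformed rows silently. Where it matters how many rows were dropped, the callback can record each failure before moving on. The sketch below is an illustration only: the Row type and the LenientReader class are hypothetical, and it assumes a CsvHelper version in which the callback receives an argument object exposing an Exception property.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

// Hypothetical row type for this sketch.
public sealed class Row
{
    [Name("NAME")]
    public string Name { get; set; } = "";

    [Name("FIELD_INT")]
    public int FieldInt { get; set; }
}

public static class LenientReader
{
    // Returns the rows that parsed, plus one message per row that did not.
    public static (List<Row> Rows, List<string> Errors) Read(TextReader reader)
    {
        var errors = new List<string>();
        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            ReadingExceptionOccurred = args =>
            {
                // Keep a note of the failure instead of discarding it.
                errors.Add(args.Exception.Message);
                // false = do not rethrow; reading continues with the next row.
                return false;
            },
        };
        using var csv = new CsvReader(reader, config);
        var rows = csv.GetRecords<Row>().ToList();
        return (rows, errors);
    }
}
```

The caller can then log or report the collected errors, so that skipped lines do not go unnoticed.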