C# CSV Parsing Complete Example
In this example, we are going to read simple price records from a CSV file and store them in the database. The program is supposed to run on schedule by an OS, such as Cron or Task Scheduler. The basic structure of our project is as follows:
-
domain
-
PriceCsvRecord.cs
-
-
options
-
CsvOptions.cs
-
DbOptions.cs
-
-
repositories
-
PriceRepository.cs
-
-
services
-
CsvParsingService.cs
-
PriceService.cs
-
-
CsvJob.cs
-
Program.cs
-
appsettings.json
The structure follows a common Domain-Driven Design (DDD) approach, and we will see the purpose of each file later, but here is the workflow in a nutshell:
Program.cs
is the entry point of the program, that will do the dependency injection and start the CSV parsing job.CsvJob
starts the service and call the CSV processing routine, handling errors if necessary.PriceService
does CSV parsing by callingCsvParsingService
, converts CSV records to the domain objects and stores relevant data to the database viaPriceRepository
.
Let's start with the domain and options definitions. First we look at PriceCsvRecord.cs
:
public readonly record struct PriceCsvRecord {
[Name("NAME")]
public readonly string ContractName { get; init; }
[Name("DATE")]
public readonly DateOnly Date { get; init; }
[Name("PRICE")]
public readonly decimal Price { get; init; }
}
We keep our csv record class simple, with just contract name, date and price fields. Look at the CSV Parsing Intro for more information.
CsvOptions.cs
and DbOptions.cs
are simple classes to hold some config variables:
public class CsvOptions {
// path to read CSVs from
public string CsvPath { get; set; }
// path to move them after reading (read further)
public string ArchivePath { get; set; }
}
public class DbOptions {
public string DB { get; set; }
}
PriceRepository.cs
stores CSV records in the database:
public interface IPriceRepository {
public Task InsertMany(IEnumerable<Price> prices);
}
public class PriceRepository: IPriceRepository {
private readonly MyDbContext _dbContext;
public PriceRepository(MyDbContext dbContext) {
_dbContext = dbContext;
}
public Task InsertMany(IEnumerable<Price> prices) {
await _dbContext.Prices.AddRangeAsync(prices);
await _dbContext.SaveChangesAsync();
}
}
PriceService.cs
is a domain-level service that converts CSV record object to the database objects and
make the insert call to the repository.
Real applications can extend the logic of insert beyond simple call forwarding:
public interface IPriceService {
public Task<IReadOnlyCollection<Price>> ConvertToStlPrices(IReadOnlyCollection<PriceCsvRecord> records);
Task InsertPrices(IReadOnlyCollection<Price> prices);
}
public class PriceService : IPriceService {
private readonly ILogger<PriceService> _logger;
private readonly IPriceRepository _priceRepository;
public PriceService(
ILogger<PriceService> logger,
IPriceRepository priceRepository
) {
_logger = logger;
_priceRepository = priceRepository;
}
public async Task<IReadOnlyCollection<Price>> ConvertToPrices(
IReadOnlyCollection<PriceCsvRecord> records
) {
return records
.Select(
r => new Price(
r.ContractName, r.Date, r.Price
)
)
.ToImmutableList();
}
public async Task InsertPrices(IReadOnlyCollection<Price> prices) {
await _priceRepository.InsertMany(prices);
}
}
Where Price
is an entity class in your database model.
Next we move to CsvParsingService.cs
:
public interface ICsvParsingService {
public Task ProcessCsv();
}
public class CsvParsingService : ICsvParsingService {
private readonly ILogger<CsvParsingService> _logger;
private readonly CsvOptions _options;
private readonly IHolidayService _holidayService;
public CsvParsingService(
ILogger<CsvParsingService> logger,
IOptions<CsvOptions> csvOptions,
IPriceService pricesService
) {
_logger = logger;
_options = csvOptions.Value;
if (!Directory.Exists(_options.CsvPath)) {
throw new PathException($"Input Folder [{_options.CsvPath}] does not exist");
}
_priceService = pricesService;
}
public async Task ProcessCsv() {
Logger.LogDebug($"Checking new files in {Options.FtpTo}");
var reportDir = new DirectoryInfo(Options.FtpTo);
var allFiles = reportDir.GetFiles();
foreach (var fileInfo in allFiles) {
if (FileSystemName.MatchesSimpleExpression("prices.csv", fileInfo.Name)) {
Logger.LogDebug($"Found holidays.out file in {Options.FtpTo}");
var records = ParseCsvFile(fileInfo);
var prices = await _priceService.ConvertToPrices(records);
await _priceService.InsertPrices(prices);
// in a typical production system, you either remove the file or move it to archive
MoveToArchive(fileInfo, Options.ArchivePath);
}
}
}
private IEnumerable<PriceCsvRecord> ParseCsvFile(FileInfo fileInfo) {
Logger.LogInformation($"Reading report file from {fileInfo.FullName}");
using var reader = new StreamReader(fileInfo.FullName);
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
// register class map here if you have one
//csv.Context.RegisterClassMap<PriceCsvRecord>();
return csv.GetRecords<PriceCsvRecord>().ToList();
}
private void MoveToArchive(FileInfo sourceFile, string archivePath) {
var directoryInfo =
Directory.CreateDirectory(Path.Combine(archivePath, "CSVs"));
File.Move(sourceFile.FullName, Path.Combine(directoryInfo.FullName, sourceFile.Name), true);
Logger.LogInformation($"Moved {sourceFile.FullName} to {directoryInfo}");
}
}
public class PathException : Exception {
public PathException(string message) : base(message) { }
}
CsvParsingService
does most of the heavy lifting: checks the directory we read CSV files from exists,
parses CSV file to obtain record objects, calls PriceService
to insert them, and moves or deletes the
original file.
CsvJob
is an auxiliary class for global error handling:
public class CsvJob
{
private readonly ILogger<CsvJob> _logger;
private readonly IServiceProvider _serviceProvider;
public CsvJob(
ILogger<CsvJob> logger,
IServiceProvider serviceProvider
) {
_logger = logger;
_serviceProvider = serviceProvider;
}
public async Task DoWork()
{
Thread.CurrentThread.Name ??= GetType().Name;
_logger.LogInformation($"{GetType().Name} is starting.");
try {
using var scope = _serviceProvider.CreateScope();
var svc = scope.ServiceProvider.GetRequiredService<ICsvParsingService>();
await svc.ProcessCsv();
} catch (Exception ex) {
_logger.LogError(ex, $"{GetType().Name} exception. Runtime Error: " + ex.Message);
}
_logger.LogInformation($"{GetType().Name} is finishing.");
}
}
Finally, we have the Program.cs
entry point, where we do all our configuration scaffolding, dependency
injection
and call CsvJob
to start the process:
[ExcludeFromCodeCoverage]
class Program
{
public static void Main(string[] args) {
RunConsoleApp(args).GetAwaiter().GetResult();
}
private static async Task RunConsoleApp(string[] args) {
var env = new HostingEnvironment {
EnvironmentName = Environment.GetEnvironmentVariable("DOTNET_ENVIRONMENT") ?? "Production",
ApplicationName = AppDomain.CurrentDomain.FriendlyName,
ContentRootPath = AppDomain.CurrentDomain.BaseDirectory,
ContentRootFileProvider = new PhysicalFileProvider(AppDomain.CurrentDomain.BaseDirectory)
};
IConfiguration configuration = new ConfigurationBuilder()
.AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
.AddJsonFile($"appsettings.{env.EnvironmentName}.json", optional: true, reloadOnChange: true)
.AddEnvironmentVariables()
.AddCommandLine(args)
.Build();
var services = new ServiceCollection();
services
.AddScoped<IPriceService, PriceService>()
.AddScoped<IPriceRepository, PriceRepository>()
.AddScoped<ICsvParsingService, CsvParsingService>()
.AddScoped<CsvJob>()
.AddSingleton<IHostEnvironment>(_ => env)
.AddOptions();
var dbOptions = configuration.GetSection(nameof(DbConnectionOptions)).Get<DbConnectionOptions>();
services.AddDbContextFactory<MyDbContext>(
options => options.UseSqlServer(dbOptions.DB, o => o.EnableRetryOnFailure())
.ConfigureWarnings(c => c.Log((RelationalEventId.CommandExecuted, LogLevel.Debug)))
);
services.Configure<CsvOptions>(configuration.GetSection(nameof(CsvOptions)));
services.AddLogging(logging =>
{
logging.ClearProviders();
logging.AddConfiguration(configuration.GetSection("Logging"));
logging.AddNLog();
logging.AddConsole();
});
var serviceProvider = services.BuildServiceProvider(true);
var scope = serviceProvider.CreateScope();
await scope.ServiceProvider.GetRequiredService<CsvJob>().DoWork();
}
}
We won't go into all the details of the file, as it deserves a separate article, but here is what is happening here in a nutshell:
ConfigurationBuilder
creates a configuration object from the appsettings.json
file.
We use it later to read various sections from the file and map them to the options domain objects.
We also allow to override options if another appsettings
files is present for a given environment.
For example, when starting the program locally, you can set DOTNET_ENVIRONMENT=local
.
Then, settings from appsettings.local.json
will override the settings from the base file.
ServiceCollection
creates services collections that allows to specify Dependency Injection (DI)
object
that we want to be available for binding in constructors.
services.AddDbContextFactory
creates a factory binding for the connection to the database using EF
Core.
You should have your context definition MyDbContext
created before.
services.AddLogging
add logging based on the NLog library.
On the last line, we call our job to start the process.