Excel Parsing Intro in C#

Handling Excel files in C# is facilitated by several popular libraries, each offering different features and capabilities. This article will primarily focus on EPPlus, but let's briefly cover some of the main libraries:

  • EPPlus: A powerful and popular library for reading and writing Excel files. It provides features like support for different data types, handling various Excel formulae, generating charts and pivot tables programmatically, and working with .xlsx files using the Office Open XML specification. It does not support the older .xls format.
  • NPOI: Another feature-rich library that provides extensive functionality for manipulating Excel, Word, and PowerPoint files. Unlike EPPlus, it supports both .xls and .xlsx formats. However, it can be more complex to use and may be overkill for simple Excel processing tasks.
  • ClosedXML: This library provides an easier-to-use API than EPPlus and NPOI for basic tasks and also supports .xlsx files based on the Open XML specification. While not as feature-rich as the others, it can be a good choice for simpler tasks that don't require advanced features.
  • Microsoft.Office.Interop.Excel: This is a library provided by Microsoft that enables interaction with Excel by using the same object model that Visual Basic for Applications (VBA) uses. This means that most operations that can be performed in Excel can also be done in C# through this library. However, it requires an installed version of Excel on the machine where the code runs and can be slower and more resource-consuming than other libraries.

Working with EPPlus

EPPlus works with .xlsx files and uses the Office Open XML specification. Similar to CSV parsing, we define a data holder class that represents the structure of our data in the Excel file. For instance:

public class ExcelRecord {

    public string Name { get; set; }

    public float FieldFloat { get; set; }

    public int FieldInt { get; set; }
}

Then, we can define a function to read the Excel file using the ExcelPackage class provided by EPPlus:

public IList<ExcelRecord> ParseExcelFile(string filePath) {
    var excelRecords = new List<ExcelRecord>();
    using var package = new ExcelPackage(new FileInfo(filePath));
    var worksheet = package.Workbook.Worksheets[0]; // gets the first worksheet

    int rowCount = worksheet.Dimension.Rows;

    for (int row = 2; row <= rowCount; row++)
    {
        excelRecords.Add(new ExcelRecord
        {
            Name = worksheet.Cells[row, 1].Value.ToString(),
            FieldFloat = float.Parse(worksheet.Cells[row, 2].Value.ToString()),
            FieldInt = int.Parse(worksheet.Cells[row, 3].Value.ToString())
        });
    }

    return excelRecords;
}
  • The ExcelPackage class provides methods to interact with the Excel file
  • The Workbook property of an ExcelPackage object represents the Excel file
  • Worksheets collection contains all the sheets in the file

We loop through each row in the worksheet starting from row 2 (assuming the first row contains headers), and create an ExcelRecord object for each row.

If Excel file has multiple worksheets, you can access them by their index or name, like so:

var firstWorksheet = package.Workbook.Worksheets[0];
var namedWorksheet = package.Workbook.Worksheets["Sheet1"];

Working with Microsoft.Office.Interop.Excel

Microsoft.Office.Interop.Excel, often referred to as Interop, provides a lower-level way to interact with Excel. It offers full control over Excel's features but can be more complex to use. It requires Excel to be installed on the user's machine.

Here's a simple example of reading data from an Excel worksheet using Interop:

var excelApp = new Microsoft.Office.Interop.Excel.Application();
var workbook = excelApp.Workbooks.Open("C:\\Path\\To\\Your\\File.xlsx");
var worksheet = (Microsoft.Office.Interop.Excel.Worksheet)workbook.Sheets[1]; // Index starts at 1
var range = worksheet.UsedRange; // This represents the used range of the worksheet
for (int row = 1; row <= range.Rows.Count; row++)
{
for (int col = 1; col <= range.Columns.Count; col++)
{
var cellValue = ((Microsoft.Office.Interop.Excel.Range)range.Cells[row, col]).Value2;
Console.WriteLine(cellValue);
}
}

workbook.Close();
excelApp.Quit();

Note that proper cleanup of COM objects is critical when using Interop to prevent memory leaks and lingering Excel processes.

In the next articles, we will delve deeper into the features offered by these libraries, including handling different data types, reading from multiple worksheets, and writing data to Excel files.