Plexdata CSV Parser

Overview

The Plexdata CSV Parser is a library that imports and exports CSV files based on user-defined classes. Both the import and the export behavior can be configured.

The main feature of this library is that users only need to create their own class representing a single line of a CSV file. Each property to be included in the CSV processing is tagged with the appropriate CSV attribute provided by this library. Thereafter, this custom class can be used with either the importer or the exporter to process CSV files.

Finally, with a proper configuration, it is also possible to write CSV output according to the rules of RFC 4180. For more information about RFC 4180 please visit https://www.ietf.org/rfc/rfc4180.txt.

Documentation

The full API documentation is included in the release bundle as a CHM file. The online version of this documentation can be found in the project’s Wiki.

Licensing

The software has been published under the terms of

MIT License

Copyright © 2019 plexdata.de

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Installation

The binary files of the Plexdata CSV Parser are provided as a NuGet package and can be obtained from https://www.nuget.org/packages/Plexdata.CsvParser.NET. That page also explains how to install the NuGet package manually and how to use the Plexdata CSV Parser together with Visual Studio.

Additionally, all releases can be downloaded from GitHub. Please visit the Plexdata CSV Parser project page to find all available versions.

Escaping

The current library version also supports escaping of double-quotes according to the rules of RFC 4180. This means, for example, that a value containing a double-quote inside its string no longer ends at that double-quote. Consider the following CSV content as an example of what is meant.

Name,                Notes
"Marley, Bob",       "Having an ""escaped"" double quote in here."
"Hurley, Elizabeth", "Having another \"escaped\" double quote in here."

As shown in the example above, the first data row contains ""escaped"", which represents double-quote escaping according to the RFC. The second data row instead contains \"escaped\". This is the default character escaping usually applied in strings. No matter which of these variations is used, the CSV Importer is able to process both of them.

In contrast to that, the CSV Exporter always replaces each double-quote inside a string value with two double-quotes. This is exactly the behavior that RFC 4180 describes.
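This replacement rule can be illustrated with a small stand-alone sketch. Note that the Escape helper below is not part of the library; it only demonstrates the RFC 4180 transformation that the exporter applies internally.

```csharp
using System;

public static class EscapingDemo
{
    // Hypothetical helper: doubles each embedded double-quote and encloses
    // the whole value in double quotes, as required by RFC 4180.
    public static String Escape(String value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    public static void Main()
    {
        // Prints: "Having an ""escaped"" double quote in here."
        Console.WriteLine(Escape("Having an \"escaped\" double quote in here."));
    }
}
```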

Importing

Importing data from an external CSV file is pretty easy. The only thing to do is to create a custom entity class with public properties. Next, each of these properties should be tagged with the attribute CsvColumn.

Conversely, each property that is not tagged with the CsvColumn attribute is ignored implicitly. It is also possible to explicitly exclude properties from processing; for this case the library provides the attribute CsvIgnore.

Please keep in mind that all examples in this section are provided in C#. For other languages, the code should look quite similar.

Below is an example of what such a custom entity class could look like.

[CsvDocument]
class Customer
{
    [CsvIgnore]
    public Int32 Id { get; set; }

    [CsvColumn(Offset = 2, Header = "Identifier")]
    public Int32 ExternalId { get; set; }

    [CsvColumn(Offset = 1, Header = "Forename")]
    public String FirstName { get; set; }

    [CsvColumn(Offset = 0, Header = "Surname")]
    public String LastName { get; set; }

    [CsvColumn(Offset = 5, Header = "Active")]
    public Boolean IsActive { get; set; }

    [CsvColumn(Offset = 3, Header = "Date")]
    public DateTime? EntryDate { get; set; }

    [CsvColumn(Offset = 4, Header = "Sales")]
    public Decimal SalesAverage { get; set; }

    [CsvColumn(Offset = 6, Header = "Notes")]
    public String Description { get; set; }
}

With an entity class as shown above, it is possible to import a CSV file like the one below.

Surname,  Forename,    Identifier, Date,       Sales,      Active, Notes
"Marley", "Bob",       1001,       2007-05-03, "1,234.56", nope,   "Have a short note here."
"Monroe", "Marilyn",   1002,       2008-06-05, "1,234.56", nope,   ""
"Snipes", "Wesley",    1003,       2009-07-06, "1,234.56", yeah,   "Have a short note here." 
"Hurley", "Elizabeth", 1004,       2005-08-08, "1,234.56", yeah,   "Have a short note here."

Next, it might be worth considering a configuration. Such a configuration is done using the class CsvSettings. With this class it becomes possible to tell the importer, for example, how to handle decimal conversions. In this context it is important to know that the importer always uses the current UI culture by default. This means that on a German operating system all decimal numbers are split by a comma. Therefore, to force the usage of a dot separator instead, the culture “en-US” should be used.

Here is an example of how the importer could be configured to use “en-US” as culture and to interpret each Boolean value “yeah” as “true” as well as each value “nope” as “false”.

CsvSettings settings = new CsvSettings
{
    Culture = CultureInfo.GetCultureInfo("en-US"),
    Mappings = new CsvMappings
    {
        TrueValues = new List<String> { "yeah" },
        FalseValues = new List<String> { "nope" },
    },
};

Now, the only remaining thing is to perform the import. This is done by calling the CSV Importer with all necessary parameters. The example below shows how to do the file import.

IEnumerable<Customer> result = CsvImporter<Customer>.Load(filename, settings);

foreach (Customer current in result)
{
    Console.WriteLine(current.ToString());
}

Console.ReadKey();

Finally, please note that it is strictly recommended to surround each call to the CSV Importer with an appropriate try…catch block!
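Such a guard could look like the following sketch. Catching the general Exception type here is just an assumption for illustration; the documentation does not specify which exception types the importer actually throws.

```csharp
try
{
    IEnumerable<Customer> result = CsvImporter<Customer>.Load(filename, settings);

    foreach (Customer current in result)
    {
        Console.WriteLine(current.ToString());
    }
}
catch (Exception exception)
{
    // React to any import trouble, e.g. a missing file or a failed conversion.
    Console.Error.WriteLine(exception.Message);
}
```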

Exporting

What was said in section Importing applies to exporting as well. The first thing to do is to create a custom entity class. Such a class could look like the one shown below.

[CsvDocument]
class Customer
{
    [CsvIgnore]
    public Int32 Id { get; set; }

    [CsvColumn(Offset = 2, Header = "Identifier")]
    public Int32 ExternalId { get; set; }

    [CsvColumn(Offset = 1, Header = "Forename")]
    public String FirstName { get; set; }

    [CsvColumn(Offset = 0, Header = "Surname")]
    public String LastName { get; set; }

    [CsvColumn(Offset = 5, Header = "Active")]
    public Boolean IsActive { get; set; }

    [CsvColumn(Offset = 3, Header = "Date")]
    public DateTime? EntryDate { get; set; }

    [CsvColumn(Offset = 4, Header = "Sales")]
    public Decimal SalesAverage { get; set; }

    [CsvColumn(Offset = 6, Header = "Notes")]
    public String Description { get; set; }
}

Next, a configuration should be created to be used together with the CSV Exporter. Here is an example of such a configuration.

CsvSettings settings = new CsvSettings
{
    Culture = CultureInfo.GetCultureInfo("en-US"),
    Textual = true,
    Mappings = new CsvMappings
    {
        TrueValue = "yeah",
        FalseValue = "nope",
    },
};

In contrast to the configuration used for importing, this configuration explicitly enables Textual mode. This mode surrounds each value of type string with double quotes.

Finally, the remaining thing is to perform the export. For this purpose, a list of custom entities like the following is assumed.

List<Customer> customers = new List<Customer>
{
    new Customer {
        LastName     = "Marley",
        FirstName    = "Bob",
        ExternalId   = 1001,
        EntryDate    = new DateTime(2007, 5, 3),
        SalesAverage = 1234.56m,
        IsActive     = false,
        Description  = "Have a short note here." },
    new Customer {
        LastName     = "Monroe",
        FirstName    = "Marilyn",
        ExternalId   = 1002,
        EntryDate    = new DateTime(2008, 6, 5),
        SalesAverage = 1234.56m,
        IsActive     = false,
        Description  = null },
    new Customer {
        LastName     = "Snipes",
        FirstName    = "Wesley",
        ExternalId   = 1003,
        EntryDate    = new DateTime(2009, 7, 6),
        SalesAverage = 1234.56m,
        IsActive     = true,
        Description  = "Have a short note here." },
    new Customer {
        LastName     = "Hurley",
        FirstName    = "Elizabeth",
        ExternalId   = 1004,
        EntryDate    = new DateTime(2005, 8, 8),
        SalesAverage = 1234.56m,
        IsActive     = true,
        Description  = "Have a short note here." },
};

Exporting this data then takes place as shown below.

CsvExporter<Customer>.Save(customers, filename, settings);

Console.ReadKey();

Under the above conditions, the output file would look like this.

Surname,Forename,Identifier,Date,Sales,Active,Notes
"Marley","Bob",1001,2007-05-03T00:00:00,1234.56,nope,"Have a short note here."
"Monroe","Marilyn",1002,2008-06-05T00:00:00,1234.56,nope,
"Snipes","Wesley",1003,2009-07-06T00:00:00,1234.56,yeah,"Have a short note here."
"Hurley","Elizabeth",1004,2005-08-08T00:00:00,1234.56,yeah,"Have a short note here."

As already mentioned, it is strictly recommended to surround each call to the CSV Exporter with an appropriate try…catch block!

Reading

Another feature of the current version is the possibility to read CSV content into a kind of “plain” structure. This plain structure in turn allows access to each cell independently.

With this in mind, the CSV Reader allows processing of CSV content in which a particular line has either more or fewer cells than expected. See the code snippet below to get an impression of what is meant in detail.

// Source file could contain this content:
// Name;               Notes
// "Marley, Bob";      "Jamaican singer-songwriter"
// "Monroe, Marilyn";  "American actress";          "model and singer"
// "Snipes, Wesley";   "American actor";            "director, film producer"; "martial artist"
// "Hurley, Elizabeth" 

CsvSettings settings = new CsvSettings() { Heading = true, Separator = ColumnSeparators.SemicolonSeparator };
CsvContainer container = CsvReader.Read(filename, settings);

String col0row1 = container.GetValue<String>(0, 1) as String; // Marley, Bob
String col0row2 = container.GetValue<String>(0, 2) as String; // Monroe, Marilyn
String col0row3 = container.GetValue<String>(0, 3) as String; // Snipes, Wesley
String col0row4 = container.GetValue<String>(0, 4) as String; // Hurley, Elizabeth

String col1row1 = container.GetValue<String>(1, 1) as String; // Jamaican singer-songwriter
String col1row2 = container.GetValue<String>(1, 2) as String; // American actress
String col1row3 = container.GetValue<String>(1, 3) as String; // American actor
String col1row4 = container.GetValue<String>(1, 4) as String; // null

String col2row1 = container.GetValue<String>(2, 1) as String; // null
String col2row2 = container.GetValue<String>(2, 2) as String; // model and singer
String col2row3 = container.GetValue<String>(2, 3) as String; // director, film producer
String col2row4 = container.GetValue<String>(2, 4) as String; // null

String col3row1 = container.GetValue<String>(3, 1) as String; // null
String col3row2 = container.GetValue<String>(3, 2) as String; // null
String col3row3 = container.GetValue<String>(3, 3) as String; // martial artist
String col3row4 = container.GetValue<String>(3, 4) as String; // null

As shown above, the example CSV file contains just two columns in the header. But line three of the file contains exactly three columns, line four contains four columns, and line five contains just one single column.

Another detail to mention is that the second argument of method GetValue() starts at one. This is because the CSV file contains a header at line zero. Therefore, access to the data cells must start at line one.
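Following this indexing rule, the header cells themselves should be reachable with a line index of zero, as sketched below. This is an assumption derived from that rule, not a separately documented call.

```csharp
// Assumed: with Heading enabled, line index zero addresses the header cells.
String nameHeader = container.GetValue<String>(0, 0) as String;
String notesHeader = container.GetValue<String>(1, 0) as String;
```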

Writing

With the current version it becomes possible to write a matrix of objects, which represents a kind of “plain” CSV content, into a CSV file. This functionality in turn allows writing almost anything into a CSV file, no matter whether the lines contain a flexible number of columns.

The following code snippet shows how to use the CSV Writer to put a flexible content into a CSV file.

List<List<Object>> content = new List<List<Object>>
{
    new List<Object> { "Name", "Notes" },
    new List<Object> { "Marley, Bob", "Jamaican singer-songwriter" },
    new List<Object> { "Monroe, Marilyn", "American actress", "model and singer" },
    new List<Object> { "Snipes, Wesley", "American actor", "director, film producer", "martial artist" },
    new List<Object> { "Hurley, Elizabeth" }
};

CsvSettings settings = new CsvSettings() { 
    Heading = true, Textual = true, 
    Separator = ColumnSeparators.SemicolonSeparator };

CsvWriter.Write(content, filename, settings);

With the above example code in mind, the output file would contain the following content.

Name;Notes
"Marley, Bob";"Jamaican singer-songwriter"
"Monroe, Marilyn";"American actress";"model and singer"
"Snipes, Wesley";"American actor";"director, film producer";"martial artist"
"Hurley, Elizabeth"

Settings

The settings are made using a library class called CsvSettings. This class provides properties that control the handling of data imports and data exports. This section describes which of these properties is responsible for what.

Culture: The culture mainly applies to number conversion. For example, using a German culture treats numbers such as 1.234,56 as valid decimals. In contrast, using an English culture treats a number like 1,234.56 as a valid decimal value. The default value is the current UI culture.

Encoding: The file encoding is relevant for both importing and exporting CSV data. It describes how data have to be handled during loading as well as during saving. For most cases, UTF-8 file encoding is a good choice. The default value is UTF-8.

Exactly: If this mode is enabled, then the column headings as well as the column order must exactly fit the data type definition, but only if Heading mode is enabled as well. Furthermore, exact treatment also applies to data type conversion. For example, an exception is thrown if Exactly mode is enabled and one of the imported data items could not be converted. The Exactly mode is only relevant for a CSV data import, and its default value is false.

Heading: If this mode is enabled, then the header is written into the output; otherwise the header is excluded. The header information is taken from the column attributes. If a column attribute does not contain header information, the property name is taken as header instead. The default value is true.

Mappings: The mapping is used to interpret values during import and export. For example, an imported CSV file may contain yes and/or no for Boolean values. In such a case the mapping tells the importer that yes means true and, vice versa, no means false. The default value is the standard mapping (see below).

Separator: The separator character is used to determine the split positions in each line of a CSV file. Usually a comma is used for this purpose. But because a clear definition is missing, there are many CSV files that use various other characters, for example semicolons, tabulators, colons, and sometimes spaces. This property defines the character used to split the content of a CSV file. The default value is a comma.

Textual: If this mode is enabled, then all textual data types are enclosed in double quotes; this overrides the default behavior, which encloses only those textual data types that contain control characters such as the separator, carriage returns, line feeds, and/or double quotes. The Textual mode is only relevant for a CSV data export, and its default value is false.
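Putting these properties together, a configuration touching all of them could look like the following sketch. The value types shown here (for example System.Text.Encoding for the Encoding property) are assumptions based on the defaults named above, not confirmed signatures.

```csharp
CsvSettings settings = new CsvSettings
{
    Culture = CultureInfo.GetCultureInfo("en-US"),   // dot as decimal separator
    Encoding = Encoding.UTF8,                        // assumed to take a System.Text.Encoding
    Exactly = false,                                 // tolerant import (default)
    Heading = true,                                  // process a header line (default)
    Separator = ColumnSeparators.SemicolonSeparator, // split on semicolons instead of commas
    Textual = false,                                 // quote only values that need it (default)
    Mappings = new CsvMappings
    {
        TrueValues = new List<String> { "yeah" },
        FalseValues = new List<String> { "nope" },
    },
};
```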

The mappings are used to convert CSV values into their programmatic representation. This handling works for both importing and exporting. The library provides a class called CsvMappings for this purpose. The following details show how the value mapping works.

Consider importing a CSV file that contains values of type Boolean, but those values are not set to true and false; they use yes and no instead. The problem then is how to convert yes into true and no into false. The solution: class CsvMappings can do this mapping.

The following example demonstrates how to configure class CsvMappings so that it can handle a mapping of the values yes and no.

CsvMappings Mappings = new CsvMappings
{
    TrueValues = new List<String> { "yes" },
    FalseValues = new List<String> { "no" },
};

On the other hand, the same mapping can be applied when exporting CSV data, but instead of the “plural” properties the “singular” properties are configured. See the following example of how to put yes and no into the output file instead of true and false.

CsvMappings Mappings = new CsvMappings
{
    TrueValue = "yes",
    FalseValue = "no",
};

See the following overview for more details about the properties of class CsvMappings.

TrueValue: The descriptor representing a “True” value, which occurs in an exported CSV file instead of the standard Boolean “True” value. The default “True” descriptor is “true”.

FalseValue: The descriptor representing a “False” value, which occurs in an exported CSV file instead of the standard Boolean “False” value. The default “False” descriptor is “false”.

NullValue: The descriptor representing a “Null” value, which occurs in an exported CSV file instead of an empty string. The default “Null” descriptor is “empty”.

TrueValues: The list of descriptors containing all possible values representing variations of “True” which may occur in imported CSV files. The default list of “True” descriptors contains “true”, “1”, “y” and “yes”.

FalseValues: The list of descriptors containing all possible values representing variations of “False” which may occur in imported CSV files. The default list of “False” descriptors contains “false”, “0”, “n” and “no”.

NullValues: The list of descriptors containing all possible values representing variations of “Null” which may occur in imported CSV files. The default list of “Null” descriptors just contains “&lt;null&gt;”.
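Analogous to the Boolean mappings shown earlier, the “Null” descriptors can be adjusted as well. In the sketch below, the additional descriptor “n/a” is just an example value, not a library default.

```csharp
CsvMappings mappings = new CsvMappings
{
    NullValue = "<null>",                              // written on export instead of an empty string
    NullValues = new List<String> { "<null>", "n/a" }, // recognized as null on import
};
```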

Limitation

There are some limitations when using the Plexdata CSV Parser. This section clarifies them.

The content of a CSV file header may contain the separator character. In such a case it is strictly recommended to ensure that each header part containing the separator is enclosed in double quotes. Here is an example of what is meant.

"Head,er1",Header2,Header3

Such a constellation will not cause any trouble as long as the header is formatted as shown above. But if the header of a CSV file is formatted as shown below, a problem will occur.

"Head,er1","Header2,Header3"

In this case the parser discovers only two columns instead of the wanted three. This ends up in an error, and the parser gives up with an exception!