Tuesday, 1 July 2014

Self Define Data Type for Each Columns from Reading Text Files with Schema.ini using OleDB in C# & VB

Have you been using Oledb to read text files and come across a situation where you want to define each of the columns data types instead of letting it auto define for you. You can actually use a file called Schema.ini and place it in the same folder with the text file that you want to read.

Here's what your schema.ini should contain.

- File name including extension but excludes folder path. The file name is enclosed in a square bracket.
- Whether your file contain header or only data.
- The file format type [csv, tab, etc...]
- The list of columns and its data type.

So let's say in a case where you want all your data types to be in a string format, it will look like the following in your schema.ini.

[Tables.csv]
ColNameHeader=True
Format=CSVDelimited
Col1="Description" Text
Col2=Red Text
Col3=Green Text
Col4=Blue Text

The column defined are in sequences from left to right based on the columns in your text file. So if you did not specify the column sequence correctly, then schema.ini will define the data type wrongly. Also, The column name need to be enclosed in double quotation if it contains a space.

Once finish construct the schema.ini, place it together with the text file that you going to read. Then just read your text file like what you normally did with Oledb. You will notice that the value obtained for each columns are in string.

The following are the code for reading the comma delimited file with header and display it out. It will loop through the comma delimited file content and populate it to the Table class.

[C#]
static void Main(string[] args)
{
    var tables = ListTables(Path.Combine(Environment.CurrentDirectory, "TestFile", "Tables.csv"));

    foreach (var table in tables)
    {
        Console.WriteLine(string.Format("Description: {0}", table.Description));
        Console.WriteLine(string.Format("Red: {0}", table.Red));
        Console.WriteLine(string.Format("Green: {0}", table.Green));
        Console.WriteLine(string.Format("Blue: {0}\n", table.Blue));
    }
    Console.ReadKey();
}


private static List<Table> ListTables(string fileName)
{
    var tables = new List<Table>();
    var fileInfo = new FileInfo(fileName);
    const string SQL_STATEMENT = "SELECT * FROM [{0}]";

    using (var con = new OleDbConnection(string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='text; HDR=YES;'", fileInfo.DirectoryName)))
    {
        con.Open();

        using (var cmd = new OleDbCommand(string.Format(SQL_STATEMENT, fileInfo.Name), con))
        {
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var table = new Table();
                    table.Description = reader.GetValue(reader.GetOrdinal("Description")).ToString();
                    table.Red = reader.GetValue(reader.GetOrdinal("Red")).ToString();
                    table.Green = reader.GetValue(reader.GetOrdinal("Green")).ToString();
                    table.Blue = reader.GetValue(reader.GetOrdinal("Blue")).ToString();

                    tables.Add(table);
                }
            }
        }
    }
    return tables;
}

[VB]
Sub Main()
    Dim tables As List(Of Table) = ListTables(Path.Combine(Environment.CurrentDirectory, "TestFile", "Tables.csv"))

    For Each table As Table In tables
        Console.WriteLine(String.Format("Description: {0}", table.Description))
        Console.WriteLine(String.Format("Red: {0}", table.Red))
        Console.WriteLine(String.Format("Green: {0}", table.Green))
        Console.WriteLine(String.Format("Blue: {0}" + Environment.NewLine, table.Blue))
    Next
    Console.ReadKey()
End Sub


Private Function ListTables(ByVal fileName As String) As List(Of Table)
    Dim tables As List(Of Table) = New List(Of Table)
    Dim fileInfo As FileInfo = New FileInfo(fileName)
    Const SQL_STATEMENT As String = "SELECT * FROM [{0}]"

    Using con As OleDbConnection = New OleDbConnection(String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='text; HDR=YES;'", fileInfo.DirectoryName))
            con.Open()

        Using cmd As OleDbCommand = New OleDbCommand(String.Format(SQL_STATEMENT, fileInfo.Name), con)
            Using reader As OleDbDataReader = cmd.ExecuteReader()
                While reader.Read
                    Dim table As Table = New Table()
                    table.Description = reader.GetValue(reader.GetOrdinal("Description")).ToString()
                    table.Red = reader.GetValue(reader.GetOrdinal("Red")).ToString()
                    table.Green = reader.GetValue(reader.GetOrdinal("Green")).ToString()
                    table.Blue = reader.GetValue(reader.GetOrdinal("Blue")).ToString()

                    tables.Add(table)
                End While
            End Using
        End Using
    End Using
    Return tables
End Function

This example is just to demonstrate how you can define each of the columns' data type. It is advisable to define a correct data type for the column, so that you don't have to do unnecessary data conversion.

If you get hit with the error "The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine.". Please install Microsoft Access Database Engine 2010 x86.

More on how to construct your own Schema.ini can be found on the following link : http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx

Source code can be obtained here: https://onedrive.live.com/redir?resid=E6612168B803803D!355&authkey=!AOcpGnFSW310zJ0&ithint=file%2czip

Github: https://github.com/Jaryllan/Demonstration-Oledb-With-Schema




No comments:

Post a Comment