Jarloo

Menu

C# Bulk Upsert to SQL Server Tutorial

Do you have thousands of rows that you need to insert into a table in SQL Server, but some of those rows are duplicates and you want to update the duplicate rows?  The Update/Insert is commonly called an Upsert, but C# doesn’t have the ability to Upsert nicely.  If you loop and update / insert for each row it will take a very long time and will cause unwanted stress on your database.

Anyone that’s had to deal with inserting large volumes of data into a database from C# has probably came across the SqlBulkCopy class. This class lets you bulk insert data into sql server very fast from C#. The class is great if you want to insert data, but if there is duplicate record it will throw an exception.  You could change your index to have the  “IGNORE_DUP_KEY” flag, which will throw away duplicates without generating an exception, but if you wanted to update those rows this method doesn’t work.

Luckily SQL Server has a handy feature that can bridge this gap and it’s called Merge.  The Merge statement is basically an Upsert function.  It has a statement to execute and a part where you can declare what to do if the row exists, and what to do if it doesn’t exist.  Sounds like exactly what we need, but there is one problem: The Merge statement works by selecting from a table, and our rows are not in a table yet!

The solution to this problem is using SQL Server temp tables.  The application works like so:

  1. Create a temp table that matches your production table.
  2. Bulk insert all your data (including duplicates) to the temp table.
  3. Use the SQL Server Merge statement to Upsert from the temp table to your production table.
  4. Clean up by removing your temp table.

Setup Some Data

Lets create some test data and go through an example of this.
But first lets build a build a business object:

internal class Price
{
	public string Symbol { get; set; }
	public double Value { get; set; }
	public DateTime Timestamp { get; set; }
}

Now lets create some random prices and include a few duplicates:

List<Price> prices = new List<Price>();

Random rnd = new Random();

//Create some unique rows
for (int i = 0; i < 10000; i++)
{
	prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
	prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
}

//Add a few duplicates
for (int i = 0; i < 10; i++)
{
	prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
	prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
}

Step #1:  Create the Temp Table

Here we issue a create table statement and use the # character to let SQL Server know our table is temporary.

//Make a temp table in sql server that matches our production table
string tmpTable = "create table #Prices (symbol nvarchar(10), Price decimal(18,4), Timestamp DateTime)";

//Create a datatable that matches the temp table exactly.
DataTable table = new DataTable();
table.Columns.Add(new DataColumn("Symbol", typeof (string)));
table.Columns.Add(new DataColumn("Price", typeof (double)));
table.Columns.Add(new DataColumn("Timestamp", typeof (DateTime)));

//Add prices in our list to our DataTable
foreach (Price price in prices)
{
	DataRow row = table.NewRow();
	row["Symbol"] = price.Symbol;
	row["Price"] = price.Value;
	row["Timestamp"] = price.Timestamp;
	table.Rows.Add(row);
}

//Connect to DB
string conString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
using (SqlConnection con = new SqlConnection(conString))
{
	con.Open();

	//Execute the command to make a temp table
	SqlCommand cmd = new SqlCommand(tmpTable, con);
	cmd.ExecuteNonQuery();
}

Step #2: Bulk Insert

Here we use the SqlBulkCopy class to insert the data into our temp table.

The order of columns is extremely important. The order of the columns must be the same as the order of columns in your create table statement. The SqlBulkCopy class uses the order to insert not the name, so if the order is not exactly the same you could be inserting data into the wrong columns or get an error.

//BulkCopy the data in the DataTable to the temp table
using (SqlBulkCopy bulk = new SqlBulkCopy(con))
{
	bulk.DestinationTableName = "#Prices";
	bulk.WriteToServer(table);
}

Step #3: Upsert using Merge

Use the SQL Server Merge statement to Upsert our data from the temp table to the production table quickly.

//Now use the merge command to upsert from the temp table to the production table
string mergeSql = "merge into ProductionPrices as Target " +
				  "using #Prices as Source " +
				  "on " +
				  "Target.Symbol=Source.Symbol " +
				  "and Target.Timestamp = Source.Timestamp " +
				  "when matched then " +
				  "update set Target.Price=Source.Price " +
				  "when not matched then " +
				  "insert (Symbol,Price,Timestamp) values (Source.Symbol,Source.Price,Source.Timestamp);";

cmd.CommandText = mergeSql;
cmd.ExecuteNonQuery();

Step #4: Clean up

Let SQL Server recover the resources used by that temp table as soon as your done with it.

//Clean up the temp table
cmd.CommandText = "drop table #Prices";
cmd.ExecuteNonQuery();

Thats all you need to write, to get fast Bulk Upserts from C# into SQL Server.  I hope this helps someone else out there.

Complete Code Listing

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

namespace Jarloo.BulkUpsert
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            List<Price> prices = new List<Price>();

            Random rnd = new Random();

            //Create some unique rows
            for (int i = 0; i < 10000; i++)
            {
                prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
                prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
            }

            //Add a new duplicates
            for (int i = 0; i < 10; i++)
            {
                prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
                prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
            }

            //Make a temp table in sql server that matches our production table
            string tmpTable = "create table #Prices (symbol nvarchar(10), Price decimal(18,4), Timestamp DateTime)";

            //Create a datatable that matches the temp table exactly. (WARNING: order of columns must match the order in the table)
            DataTable table = new DataTable();
            table.Columns.Add(new DataColumn("Symbol", typeof (string)));
            table.Columns.Add(new DataColumn("Price", typeof (double)));
            table.Columns.Add(new DataColumn("Timestamp", typeof (DateTime)));

            //Add prices in our list to our DataTable
            foreach (Price price in prices)
            {
                DataRow row = table.NewRow();
                row["Symbol"] = price.Symbol;
                row["Price"] = price.Value;
                row["Timestamp"] = price.Timestamp;
                table.Rows.Add(row);
            }

            //Connect to DB
            string conString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
            using (SqlConnection con = new SqlConnection(conString))
            {
                con.Open();

                //Execute the command to make a temp table
                SqlCommand cmd = new SqlCommand(tmpTable, con);
                cmd.ExecuteNonQuery();

                //BulkCopy the data in the DataTable to the temp table
                using (SqlBulkCopy bulk = new SqlBulkCopy(con))
                {
                    bulk.DestinationTableName = "#Prices";
                    bulk.WriteToServer(table);
                }

                //Now use the merge command to upsert from the temp table to the production table
                string mergeSql = "merge into ProductionPrices as Target " +
                                  "using #Prices as Source " +
                                  "on " +
                                  "Target.Symbol=Source.Symbol " +
                                  "and Target.Timestamp = Source.Timestamp " +
                                  "when matched then " +
                                  "update set Target.Price=Source.Price " +
                                  "when not matched then " +
                                  "insert (Symbol,Price,Timestamp) values (Source.Symbol,Source.Price,Source.Timestamp);";

                cmd.CommandText = mergeSql;
                cmd.ExecuteNonQuery();

                //Clean up the temp table
                cmd.CommandText = "drop table #Prices";
                cmd.ExecuteNonQuery();
            }
        }
    }

    internal class Price
    {
        public string Symbol { get; set; }
        public double Value { get; set; }
        public DateTime Timestamp { get; set; }
    }
}

Categories:   Code

Comments

Sorry, comments are closed for this item.