C# Code, Tutorials and Full Visual Studio Projects

C# Bulk Upsert to SQL Server Tutorial

Posted by on Nov 18, 2010 in Database, Tutorials | 27 comments


Do you have thousands of rows to insert into a SQL Server table, where some of those rows are duplicates that should be updated instead? The combined update/insert is commonly called an Upsert, but C# has no built-in way to do one efficiently. If you loop through the rows and update or insert each one individually, it will take a very long time and put unwanted stress on your database.

Anyone who has had to insert large volumes of data into a database from C# has probably come across the SqlBulkCopy class. This class lets you bulk insert data into SQL Server very quickly. It's great if you only want to insert data, but if there is a duplicate record it will throw an exception. You could create your index with the "IGNORE_DUP_KEY" option, which silently discards duplicates instead of raising an error, but that doesn't help if you want to update those rows.
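For reference, here is what that index option looks like. This is only an illustration; the index and column names below are made up and are not part of this tutorial's schema:

```sql
-- Hypothetical example: with IGNORE_DUP_KEY on, duplicate keys are
-- silently discarded on insert rather than raising an error.
-- Note this throws away the duplicate -- it does NOT update the existing row.
CREATE UNIQUE INDEX IX_Prices_Symbol_Timestamp
    ON Prices (Symbol, Timestamp)
    WITH (IGNORE_DUP_KEY = ON);
```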

Luckily SQL Server has a handy feature that bridges this gap, and it's called Merge.  The Merge statement is basically an Upsert: you declare what to do when a source row matches a target row, and what to do when it doesn't.  Sounds like exactly what we need, but there is one problem: the Merge statement works by selecting from a table, and our rows are not in a table yet!
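In outline, a Merge statement has this shape (the table and column names here are placeholders, not the tables used later in this tutorial):

```sql
MERGE INTO TargetTable AS Target
USING SourceTable AS Source
    ON Target.Key = Source.Key
WHEN MATCHED THEN
    -- the row already exists: update it
    UPDATE SET Target.Value = Source.Value
WHEN NOT MATCHED THEN
    -- the row doesn't exist yet: insert it
    INSERT (Key, Value) VALUES (Source.Key, Source.Value);
```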

The solution to this problem is using SQL Server temp tables.  The application works like so:

  1. Create a temp table that matches your production table.
  2. Bulk insert all your data (including duplicates) to the temp table.
  3. Use the SQL Server Merge statement to Upsert from the temp table to your production table.
  4. Clean up by removing your temp table.

Set Up Some Data

Let's create some test data and go through an example of this.
But first, let's build a business object:

internal class Price
{
	public string Symbol { get; set; }
	public double Value { get; set; }
	public DateTime Timestamp { get; set; }
}

Now let's create some random prices and include a few duplicates:

List<Price> prices = new List<Price>();

Random rnd = new Random();

//Create some unique rows
for (int i = 0; i < 10000; i++)
{
	prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
	prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
}

//Add some duplicates
for (int i = 0; i < 10; i++)
{
	prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
	prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(),
		Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
}

Step #1:  Create the Temp Table

Here we issue a create table statement and prefix the table name with the # character, which tells SQL Server the table is a local temporary table scoped to the current connection.

//Make a temp table in sql server that matches our production table
string tmpTable = "create table #Prices (Symbol nvarchar(10), Price decimal(18,4), Timestamp DateTime)";

//Create a datatable that matches the temp table exactly.
DataTable table = new DataTable();
table.Columns.Add(new DataColumn("Symbol", typeof (string)));
table.Columns.Add(new DataColumn("Price", typeof (double)));
table.Columns.Add(new DataColumn("Timestamp", typeof (DateTime)));

//Add prices in our list to our DataTable
foreach (Price price in prices)
{
	DataRow row = table.NewRow();
	row["Symbol"] = price.Symbol;
	row["Price"] = price.Value;
	row["Timestamp"] = price.Timestamp;
	table.Rows.Add(row);
}

//Connect to DB
string conString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
using (SqlConnection con = new SqlConnection(conString))
{
	con.Open();

	//Execute the command to make a temp table
	SqlCommand cmd = new SqlCommand(tmpTable, con);
	cmd.ExecuteNonQuery();
}

Note that the remaining steps must all run on this same open connection: a local temp table is only visible to the connection that created it and is dropped automatically when that connection closes. (In the complete listing at the end, all four steps sit inside a single using block.)

Step #2: Bulk Insert

Here we use the SqlBulkCopy class to insert the data into our temp table.

The order of the columns is critical. Unless you supply explicit column mappings, SqlBulkCopy maps columns by ordinal position, not by name, so if the DataTable's column order doesn't match the order in your create table statement you could insert data into the wrong columns or get an error.

//BulkCopy the data in the DataTable to the temp table
using (SqlBulkCopy bulk = new SqlBulkCopy(con))
{
	bulk.DestinationTableName = "#Prices";
	bulk.WriteToServer(table);
}
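If you would rather not depend on column order, SqlBulkCopy can map columns by name instead. A minimal sketch using the same temp table and DataTable as above:

```csharp
//Map DataTable columns to temp table columns by name,
//so the ordinal positions no longer have to line up
using (SqlBulkCopy bulk = new SqlBulkCopy(con))
{
	bulk.DestinationTableName = "#Prices";

	bulk.ColumnMappings.Add("Symbol", "Symbol");
	bulk.ColumnMappings.Add("Price", "Price");
	bulk.ColumnMappings.Add("Timestamp", "Timestamp");

	bulk.WriteToServer(table);
}
```

With mappings in place, reordering the columns in either the DataTable or the create table statement no longer matters.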

Step #3: Upsert using Merge

Use the SQL Server Merge statement to Upsert our data from the temp table to the production table quickly.

//Now use the merge command to upsert from the temp table to the production table
string mergeSql = "merge into ProductionPrices as Target " +
				  "using #Prices as Source " +
				  "on " +
				  "Target.Symbol=Source.Symbol " +
				  "and Target.Timestamp = Source.Timestamp " +
				  "when matched then " +
				  "update set Target.Price=Source.Price " +
				  "when not matched then " +
				  "insert (Symbol,Price,Timestamp) values (Source.Symbol,Source.Price,Source.Timestamp);";

cmd.CommandText = mergeSql;
cmd.ExecuteNonQuery();
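If you want to see what the merge actually did, SQL Server's OUTPUT clause can report the action taken for each row. A sketch of the same statement with OUTPUT added (run it with ExecuteReader instead of ExecuteNonQuery so you can read the rows back):

```sql
MERGE INTO ProductionPrices AS Target
USING #Prices AS Source
    ON Target.Symbol = Source.Symbol
    AND Target.Timestamp = Source.Timestamp
WHEN MATCHED THEN
    UPDATE SET Target.Price = Source.Price
WHEN NOT MATCHED THEN
    INSERT (Symbol, Price, Timestamp)
    VALUES (Source.Symbol, Source.Price, Source.Timestamp)
-- returns 'INSERT' or 'UPDATE' for each affected row
OUTPUT $action;
```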

Step #4: Clean up

Let SQL Server recover the resources used by the temp table as soon as you're done with it. (A local temp table is also dropped automatically when the connection that created it closes, but dropping it explicitly frees those resources sooner.)

//Clean up the temp table
cmd.CommandText = "drop table #Prices";
cmd.ExecuteNonQuery();

That's all you need to write to get fast bulk upserts from C# into SQL Server.  I hope this helps someone else out there.

Complete Code Listing

using System;
using System.Collections.Generic;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

namespace Jarloo.BulkUpsert
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            List<Price> prices = new List<Price>();

            Random rnd = new Random();

            //Create some unique rows
            for (int i = 0; i < 10000; i++)
            {
                prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
                prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
            }

            //Add some duplicates
            for (int i = 0; i < 10; i++)
            {
                prices.Add(new Price {Symbol = "AAPL", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
                prices.Add(new Price {Symbol = "GOOG", Value = rnd.NextDouble(), Timestamp = new DateTime(2010, 1, 1).AddHours(i)});
            }

            //Make a temp table in sql server that matches our production table
            string tmpTable = "create table #Prices (Symbol nvarchar(10), Price decimal(18,4), Timestamp DateTime)";

            //Create a datatable that matches the temp table exactly. (WARNING: order of columns must match the order in the table)
            DataTable table = new DataTable();
            table.Columns.Add(new DataColumn("Symbol", typeof (string)));
            table.Columns.Add(new DataColumn("Price", typeof (double)));
            table.Columns.Add(new DataColumn("Timestamp", typeof (DateTime)));

            //Add prices in our list to our DataTable
            foreach (Price price in prices)
            {
                DataRow row = table.NewRow();
                row["Symbol"] = price.Symbol;
                row["Price"] = price.Value;
                row["Timestamp"] = price.Timestamp;
                table.Rows.Add(row);
            }

            //Connect to DB
            string conString = ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString;
            using (SqlConnection con = new SqlConnection(conString))
            {
                con.Open();

                //Execute the command to make a temp table
                SqlCommand cmd = new SqlCommand(tmpTable, con);
                cmd.ExecuteNonQuery();

                //BulkCopy the data in the DataTable to the temp table
                using (SqlBulkCopy bulk = new SqlBulkCopy(con))
                {
                    bulk.DestinationTableName = "#Prices";
                    bulk.WriteToServer(table);
                }

                //Now use the merge command to upsert from the temp table to the production table
                string mergeSql = "merge into ProductionPrices as Target " +
                                  "using #Prices as Source " +
                                  "on " +
                                  "Target.Symbol=Source.Symbol " +
                                  "and Target.Timestamp = Source.Timestamp " +
                                  "when matched then " +
                                  "update set Target.Price=Source.Price " +
                                  "when not matched then " +
                                  "insert (Symbol,Price,Timestamp) values (Source.Symbol,Source.Price,Source.Timestamp);";

                cmd.CommandText = mergeSql;
                cmd.ExecuteNonQuery();

                //Clean up the temp table
                cmd.CommandText = "drop table #Prices";
                cmd.ExecuteNonQuery();
            }
        }
    }

    internal class Price
    {
        public string Symbol { get; set; }
        public double Value { get; set; }
        public DateTime Timestamp { get; set; }
    }
}

27 Comments


  1. Jake Hogen

    Thanks! I was looking for a way to do this and ended up looping through my data, but this approach is much better.

  2. Gravitas

    Brilliant article!!! Thanks for this.

  3. Gravitas

    Notes: to get this example to compile, you need to add the “System.Configuration” assembly, as described:
    at http://msdn.microsoft.com/en-us/library/system.configuration.configurationmanager.connectionstrings.aspx

  4. Gravitas

    Notes: to get it working, you must first create a target table the same as the temp #Prices table. I used the following command in Microsoft SQL Server Management Studio (my database was named “Test”):

    create table [Test].[dbo].[ProductionPrices] (symbol nvarchar(10), Price decimal(18,4), Timestamp DateTime)

  5. Gravitas

    Performance is excellent: 20,000 rows per second insert on commodity hardware, and 28,000 if the insert doesn’t alter the database (i.e. we repeat the upsert, so no new data is written into onto the hard drive).

  6. kelias

    Good catch on the System.Configuration. I add it all my projects just by habit now so I tend to forget that one.

  7. kelias

    Glad you find the performance good. I needed a way to insert large amounts of trade data into a database and this is what I came up with. Worked well for my needs.

  8. Gravitas

    Yes, the performance is excellent. I’ve tested it by inserted 100 million rows into the database, each insert was 250k rows with overlapping duplicated symbols and dates. The performance remained constant throughout the entire upload, at around 10 to 13 seconds per insert of 250k rows.

    Using this method to insert also seems to be far less disk intensive compared to insertions using SqlBulkInsert (although, SqlBulkInsert will do up to 50,000 rows per second). SqlBulkInsert is probably not a good option as it has no way of automatically updating any existing data in the database (i.e. it won’t handle duplicates).

  9. kelapa

    Hi jurloo…nice article and easy to understand…but i got a question here…how does i can implement for reading csv files???..actually i’ve done your method in my application but i must deals with several error..

    the first one at the connection string..my debugger said “Object reference not set to an instance of an object” occurs at that line..why?? i’ve already add the reference system.Configuration…pls reply me…thank you in advance..

  10. Derek Fowler

    In case you’re interested I effectively wrote a follow up to this post and have also wrapped up this technique as a github project.

    http://blog.dezfowler.com/2011/05/bulk-upsert-to-sql-server-from-net.html

  11. admin

    Clever, I like what you’ve done there!

  12. Franco

    Hi Everybody,

    Please, I need a help.
    Does anyone know how use SqlBulkCopy with IDataReader?.

    Thanks in advance.

  13. Chris DiLalla

    Great tutorial. Exactly what I was looking for. Thank you!

  14. Christopher Harrison

    Great tutorial. I will definitely use this in the near future. My one comment is about your warning that the column order is important with Bulkcopy. It’s really important if you use the Bulkcopy’s ColumnMapping functionality which lets you define which columns map to what.

  15. Christopher Harrison

    I meant to say it’s really NOT important if you use the ColumnMapping functionality.

  16. nina Savanovych

    Thanks! it helped me alot!

  17. Carmen

    Awesome. Thanks

  18. Nader

    Great ! Thank`s ! You save my time

  19. Ryan Leach

    I’ve had problems following this method with auto incrementing ID’s, the original ID’s seem to be ignored and new ones generated in the temporary table, causing the wrong rows to be upserted.

  20. Kelly Elias

    I’ve been using it for over 2 years. I insert over 20 million rows a day without issue. I would bet that the problem is on the SQL Server side. (Sorry can’t help more, but I’ve never ran into this issue. Been working perfectly for me.)

  21. y2kStephen

    Thx Kelly
    great tutorial !
    you save my time and my hair from pulling off!

  22. Ezequiel

    Very helpful, it works great
    Thanks!

  23. Kelly Elias

    Glad you found what your looking for. Took me a bit to test the method, but I haven’t found anything faster and I’m using this in production code all over the place.

  24. Anders

    The indicated best practice for a MERGE would be to add a unique index to the source table (after loading in the data of course).
    Have you tried that? I’d be interested to see if that improves performance at all…

  25. Kelly Elias

    I’ve done it on tables that already have a unique index beforehand, I’ve never tried to merge then add an index after the fact.

  26. Ramu

    Very Useful to me ,but when i run this code then below error is came.
    “The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.”
    please send me suggesstion to above email Id . i am waiting for your suggestions.

  27. Kelly Elias

    The error means that your trying to update the same row in the destination table more than once which is not allowed. Check your source and you will find there are at least two records with the same key as defined by your destination table.
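    For anyone hitting that error, one way to fix it (a sketch, assuming the key is Symbol + Timestamp as in this tutorial) is to dedupe the temp table before running the merge:

    ```sql
    -- Keep only one row per key in the source before the MERGE;
    -- which duplicate survives is arbitrary here
    ;WITH Ranked AS
    (
        SELECT *, ROW_NUMBER() OVER
            (PARTITION BY Symbol, Timestamp ORDER BY (SELECT NULL)) AS rn
        FROM #Prices
    )
    DELETE FROM Ranked WHERE rn > 1;
    ```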
