Mayank Nagar's Blog

MERGE is designed to apply both UPDATE and INSERTs into a target table from a source table. The statement can do both at once, or simply do INSERTs or only UPDATEs. One might even get the impression that INSERT and UPDATE are no longer needed. Why not always use MERGE?

MERGE can also do DELETEs from the target table

Sample Data and Some Basic Examples

To illustrate our case, let’s set up some very simplistic source and target tables, and populate them with some data that we can demonstrate with.

CREATE TABLE #Target

(

ID BIGINT PRIMARY KEY

, Value INT

);

CREATE TABLE #Source

(

ID BIGINT PRIMARY KEY

, Value INT

);

INSERT INTO #Target

VALUES ( 1, 2342 ),

( 2, 345 );

INSERT INTO #Source

VALUES ( 1, 975 ),

( 3, 683 );

When we MERGE into #Target, our matching criteria will be the ID field, so the normal case is to UPDATE like IDs and INSERT any new ones like this:

-- Standard MERGE of all #Source rows into #Target

MERGE #Target t

USING #Source s

ON s.ID = t.ID

WHEN MATCHED

THEN UPDATE

SET Value = s.Value

WHEN NOT MATCHED

THEN INSERT

( ID, Value )

VALUES

( s.ID

, s.Value

);

SELECT *

FROM #Target;

This produces quite predictable results that look like this:

ID Value

1 975

2 345

3 683

Let’s change the values in our #Source table, and then use MERGE to only do an UPDATE.

-- Change the values of our source rows

UPDATE #Source

SET Value = CASE ID

WHEN 1 THEN 555

WHEN 3 THEN 999

END

-- MERGE that only updates

MERGE #Target t

USING #Source s

ON s.ID = t.ID

WHEN MATCHED

THEN UPDATE

SET Value = s.Value;

SELECT *

FROM #Target;

The results now in #Target are:

ID Value

1 555

2 345

3 999

Finally, we know we can also use MERGE to replace INSERT by omitting the MATCHED clause. Let’s INSERT a new row into #Source and do this.

-- Insert a new row into our source

INSERT #Source

VALUES ( 4, 242 );

-- MERGE that only inserts

MERGE #Target t

USING #Source s

ON s.ID = t.ID

WHEN NOT MATCHED

THEN INSERT

( ID, Value )

VALUES

( s.ID

, s.Value

);

SELECT *

FROM #Target;

Unsurprisingly, the results now in #Target are these:

ID Value

1 555

2 345

3 999

4 242

Sorry if you know all this stuff and I’ve bored you, but we needed to get these basics out of the way.

Exploring the Performance of MERGE

To effectively test the performance of our alternatives, we’ll need to set up a test harness with a non-trivial number of rows in our #Source and #Target tables. You can open the Test Harness 1.sql file in the resources section of this article and follow along. The basic set-up data is as follows.

TRUNCATE TABLE #Source;

TRUNCATE TABLE #Target;

WITH Tally ( n )

AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL

) )

FROM sys.all_columns a

CROSS JOIN sys.all_columns b

)

INSERT INTO #Target

SELECT 2 * n

, 1 + ABS(CHECKSUM(NEWID())) % 1000

FROM Tally;

WITH Tally ( n )

AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL

) )

FROM sys.all_columns a

CROSS JOIN sys.all_columns b

)

INSERT INTO #Source

SELECT CASE WHEN n <= 500000 THEN 2 * n - 1

ELSE 2 * n

END

, 1 + ABS(CHECKSUM(NEWID())) % 1000

FROM Tall

We’ve purposely set up our source table so that the INSERTs it will do when merged with the target are interleaved with existing records for the first 500,000 rows.

Using SQL Profiler, we’ll compare two identical query scripts:

-- MERGE

MERGE #Target t

USING #Source s

ON s.ID = t.ID

WHEN MATCHED

THEN UPDATE

SET Value = s.Value

WHEN NOT MATCHED

THEN INSERT

( ID, Value )

VALUES

( s.ID

, s.Value

);

-- TRUNCATE and re-populate Source and Target tables

-- UPDATE/INSERT

BEGIN TRANSACTION T1;

UPDATE t

SET Value = s.Value

FROM #Target t

JOIN #Source s

ON s.ID = t.ID;

INSERT INTO #Target

SELECT s.ID

, s.Value

FROM #Source s

LEFT JOIN #Target t

ON s.ID = t.ID

WHERE t.ID IS NULL;

COMMIT TRANSACTION T1;

Both of these INSERT 500,000 rows and UPDATE 500,000 rows, the latter enclosed in a TRANSACTION. We omit error handling and possible ROLLBACK of the transaction for simplicity. The results that we obtained running the test harness in SQL Profiler (5 runs) are as follows:

Query CPU Reads Writes Duration

MERGE 4492 4513786 2578 4864

INSERT/UPDATE 3588 3072489 5496 3847

MERGE 4820 4514303 2899 5253

INSERT/UPDATE 3572 3072979 4007 4035

MERGE 4462 4513793 2571 4770

INSERT/UPDATE 3635 3072463 5497 3829

MERGE 4524 4513795 2931 4800

INSERT/UPDATE 3588 3072474 5505 3665

MERGE 4648 4513814 2939 4955

INSERT/UPDATE 3479 3072491 5522 3716

These indicate that MERGE took about 28% more CPU and 29% more elapsed time than the equivalent INSERT/UPDATE. Not surprising considering all the complexity that MERGE must handle, but possibly forgivable for the additional safety, convenience and simplicity it represents (no need for a TRANSACTION with error handling).

Using MERGE as a Substitute for INSERT

To test the performance of MERGE as a substitute for INSERT, the same test harness that set up the #Target table can be used, but we’ll change the set up for the #Source table as follows.

WITH Tally (n) AS

(

SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b

)

INSERT INTO #Source

SELECT CASE WHEN n <= 500000 THEN 2*n-1 ELSE 2000000+n END

,1+ABS(CHECKSUM(NEWID()))%1000

FROM Tally;

A quick check of the row counts generated from Test Harness #2.sql confirms that both MERGE and INSERT insert exactly 1,000,000 rows. The results from a Profiler trace using these queries give us this:

Query CPU Reads Writes Duration

MERGE 5054 6024150 2397 5576

INSERT 4992 6248001 7563 5507

MERGE 5226 6024165 3868 5529

INSERT 5383 6248005 7571 6298

MERGE 5257 6023557 3689 5473

INSERT 4851 6247403 7431 5546

MERGE 5273 6023589 2167 5662

INSERT 4914 6247440 7427 5281

MERGE 5179 6024619 1426 5476

INSERT 5039 6248483 6211 5954

In this case, CPU and elapsed time are probably too close to call. CPU usage was about 3% more for the MERGE and elapsed time was about 3% less for the MERGE.

Using MERGE as a Substitute for UPDATE

Once again, to test this we’ll change the test harness only for the #Source table so that it creates the rows with exactly the same IDs as the #Target table.

WITH Tally (n) AS

(

SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b

)

INSERT INTO #Source

SELECT 2*n

,1+ABS(CHECKSUM(NEWID()))%1000

FROM Tally;

After confirming that both MERGE and UPDATE update 1,000,000 rows, our Profile trace from running Test Harness #3.sql gave us these results.

Query CPU Reads Writes Duration

MERGE 1903 7982 2568 2010

UPDATE 1763 7954 2568 1840

MERGE 1904 7986 2576 2303

UPDATE 1809 7955 2560 1974

MERGE 1903 7968 2576 1951

UPDATE 1763 7940 2568 2005

MERGE 1918 7957 2568 2009

UPDATE 1731 7464 2584 1809

MERGE 1903 8005 2560 2023

UPDATE 1732 7977 2584 2063

MERGE was just about 9% more costly in CPU and 7% more in elapsed time and with this UPDATE seems to be more advantageous.

Mayank Nagar's Blog

Wednesday, January 1, 2020

Performance of the SQL MERGE vs. INSERT/UPDATE

About Me

Blog Archive