MERGE
is designed to apply both UPDATE and INSERTs into a target table from a source
table. The statement can do both at once, or simply do INSERTs or only
UPDATEs. One might even get the impression that INSERT and UPDATE are no
longer needed. Why not always use MERGE?
MERGE
can also do DELETEs from the target table
Sample Data and Some Basic Examples
To
illustrate our case, let’s set up some very simplistic source and target
tables, and populate them with some data that we can demonstrate with.
CREATE TABLE #Target
(
ID BIGINT PRIMARY KEY
, Value INT
);
CREATE TABLE #Source
(
ID
BIGINT PRIMARY KEY
, Value INT
);
INSERT INTO #Target
VALUES ( 1, 2342 ),
( 2, 345 );
INSERT INTO #Source
VALUES ( 1, 975 ),
( 3, 683 );
When
we MERGE into #Target, our matching criteria will be the ID field, so the
normal case is to UPDATE like IDs and INSERT any new ones like this:
--
Standard MERGE of all #Source rows into #Target
MERGE #Target t
USING #Source s
ON s.ID = t.ID
WHEN MATCHED
THEN UPDATE
SET Value = s.Value
WHEN NOT MATCHED
THEN INSERT
( ID, Value )
VALUES
( s.ID
, s.Value
);
SELECT *
FROM #Target;
This
produces quite predictable results that look like this:
ID
Value
1
975
2
345
3
683
Let’s
change the values in our #Source table, and then use MERGE to only do an
UPDATE.
--
Change the values of our source rows
UPDATE #Source
SET Value = CASE ID
WHEN 1 THEN 555
WHEN 3 THEN 999
END
--
MERGE that only updates
MERGE #Target t
USING #Source s
ON s.ID = t.ID
WHEN MATCHED
THEN UPDATE
SET Value = s.Value;
SELECT *
FROM #Target;
The
results now in #Target are:
ID
Value
1
555
2
345
3
999
Finally,
we know we can also use MERGE to replace INSERT by omitting the MATCHED
clause. Let’s INSERT a new row into #Source and do this.
--
Insert a new row into our source
INSERT #Source
VALUES ( 4, 242 );
--
MERGE that only inserts
MERGE #Target t
USING #Source s
ON s.ID = t.ID
WHEN NOT MATCHED
THEN INSERT
( ID, Value )
VALUES
( s.ID
, s.Value
);
SELECT *
FROM #Target;
Unsurprisingly,
the results now in #Target are these:
ID Value
1 555
2 345
3 999
4 242
Sorry
if you know all this stuff and I’ve bored you, but we needed to get these
basics out of the way.
Exploring the Performance of MERGE
To
effectively test the performance of our alternatives, we’ll need to set up a
test harness with a non-trivial number of rows in our #Source and #Target
tables. You can open the Test Harness 1.sql file in the
resources section of this article and follow along. The basic set-up data
is as follows.
TRUNCATE TABLE #Source;
TRUNCATE TABLE #Target;
WITH Tally ( n )
AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL
) )
FROM sys.all_columns a
CROSS JOIN sys.all_columns b
)
INSERT INTO #Target
SELECT 2 * n
, 1 + ABS(CHECKSUM(NEWID())) % 1000
FROM Tally;
WITH Tally ( n )
AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL
) )
FROM sys.all_columns a
CROSS JOIN sys.all_columns b
)
INSERT INTO #Source
SELECT CASE WHEN n <= 500000 THEN 2 * n - 1
ELSE 2 * n
END
, 1 + ABS(CHECKSUM(NEWID())) % 1000
FROM Tall
We’ve
purposely set up our source table so that the INSERTs it will do when merged
with the target are interleaved with existing records for the first 500,000
rows.
Using
SQL Profiler, we’ll compare two identical query scripts:
--
MERGE
MERGE #Target t
USING #Source s
ON s.ID = t.ID
WHEN MATCHED
THEN UPDATE
SET Value = s.Value
WHEN NOT MATCHED
THEN INSERT
( ID, Value )
VALUES
( s.ID
, s.Value
);
--
TRUNCATE and re-populate Source and Target tables
--
UPDATE/INSERT
BEGIN TRANSACTION T1;
UPDATE t
SET Value = s.Value
FROM #Target t
JOIN #Source s
ON s.ID = t.ID;
INSERT INTO #Target
SELECT s.ID
, s.Value
FROM #Source s
LEFT JOIN #Target t
ON s.ID = t.ID
WHERE t.ID IS NULL;
COMMIT TRANSACTION T1;
Both
of these INSERT 500,000 rows and UPDATE 500,000 rows, the latter enclosed in a
TRANSACTION. We omit error handling and possible ROLLBACK of the
transaction for simplicity. The results that we obtained running the test
harness in SQL Profiler (5 runs) are as follows:
Query
CPU Reads Writes Duration
MERGE
4492 4513786 2578 4864
INSERT/UPDATE 3588 3072489
5496 3847
MERGE
4820 4514303 2899 5253
INSERT/UPDATE 3572 3072979
4007 4035
MERGE
4462 4513793 2571 4770
INSERT/UPDATE 3635 3072463
5497 3829
MERGE
4524 4513795 2931 4800
INSERT/UPDATE 3588 3072474
5505 3665
MERGE
4648 4513814 2939 4955
INSERT/UPDATE 3479 3072491
5522 3716
These
indicate that MERGE took about 28% more CPU and 29% more elapsed time than the
equivalent INSERT/UPDATE. Not surprising considering all the complexity
that MERGE must handle, but possibly forgivable for the additional safety,
convenience and simplicity it represents (no need for a TRANSACTION with error
handling).
Using MERGE as a Substitute for INSERT
To
test the performance of MERGE as a substitute for INSERT, the same test harness
that set up the #Target table can be used, but we’ll change the set up for the
#Source table as follows.
WITH Tally (n) AS
(
SELECT
TOP
1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
sys.all_columns
a CROSS
JOIN
sys.all_columns
b
)
INSERT INTO #Source
SELECT CASE WHEN n <= 500000 THEN 2*n-1 ELSE 2000000+n END
,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;
A
quick check of the row counts generated from Test Harness #2.sql
confirms that both MERGE and INSERT insert exactly 1,000,000 rows. The
results from a Profiler trace using these queries give us this:
Query
CPU Reads Writes Duration
MERGE
5054 6024150 2397 5576
INSERT
4992 6248001 7563 5507
MERGE 5226
6024165 3868 5529
INSERT
5383 6248005 7571 6298
MERGE
5257 6023557 3689 5473
INSERT
4851 6247403 7431 5546
MERGE
5273 6023589 2167 5662
INSERT
4914 6247440 7427 5281
MERGE
5179 6024619 1426 5476
INSERT
5039 6248483 6211 5954
In
this case, CPU and elapsed time are probably too close to call. CPU usage
was about 3% more for the MERGE and elapsed time was about 3% less for the
MERGE.
Using MERGE as a Substitute for UPDATE
Once
again, to test this we’ll change the test harness only for the #Source table so
that it creates the rows with exactly the same IDs as the #Target table.
WITH Tally (n) AS
(
SELECT
TOP
1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
sys.all_columns
a CROSS
JOIN
sys.all_columns
b
)
INSERT INTO #Source
SELECT 2*n
,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;
After
confirming that both MERGE and UPDATE update 1,000,000 rows, our Profile trace
from running Test Harness #3.sql gave us these results.
Query
CPU Reads Writes Duration
MERGE
1903 7982 2568 2010
UPDATE
1763 7954 2568 1840
MERGE
1904 7986 2576 2303
UPDATE
1809 7955 2560 1974
MERGE
1903 7968 2576 1951
UPDATE
1763 7940 2568 2005
MERGE
1918 7957 2568 2009
UPDATE
1731 7464 2584 1809
MERGE
1903 8005 2560 2023
UPDATE
1732 7977 2584 2063
MERGE was just about 9% more costly in CPU and 7% more in elapsed time and with this UPDATE seems to be more advantageous.
No comments:
Post a Comment