Wednesday, January 1, 2020

Performance of the SQL MERGE vs. INSERT/UPDATE


MERGE is designed to apply both UPDATE and INSERTs into a target table from a source table.  The statement can do both at once, or simply do INSERTs or only UPDATEs.  One might even get the impression that INSERT and UPDATE are no longer needed.  Why not always use MERGE?
MERGE can also do DELETEs from the target table
Sample Data and Some Basic Examples
To illustrate our case, let’s set up some very simplistic source and target tables, and populate them with some data that we can demonstrate with.
CREATE TABLE #Target
    (
      ID BIGINT PRIMARY KEY
    , Value INT
    );

CREATE TABLE #Source
    (
      ID BIGINT PRIMARY KEY
    , Value INT
    );

INSERT INTO #Target
    VALUES  ( 1, 2342 ),
            ( 2, 345 );
INSERT INTO #Source
    VALUES  ( 1, 975 ),
            ( 3, 683 );
When we MERGE into #Target, our matching criteria will be the ID field, so the normal case is to UPDATE like IDs and INSERT any new ones like this:
-- Standard MERGE of all #Source rows into #Target
MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN MATCHED
        THEN     UPDATE
            SET Value = s.Value
    WHEN NOT MATCHED
        THEN     INSERT
                 ( ID, Value )
            VALUES
                ( s.ID
                , s.Value
                );

SELECT *
    FROM #Target;
This produces quite predictable results that look like this:
ID   Value
1    975
2    345
3    683
Let’s change the values in our #Source table, and then use MERGE to only do an UPDATE.
-- Change the values of our source rows
UPDATE #Source
    SET Value = CASE ID
                  WHEN 1 THEN 555
                  WHEN 3 THEN 999
                END

-- MERGE that only updates
MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN MATCHED
        THEN     UPDATE
            SET Value = s.Value;

SELECT *
    FROM #Target;
The results now in #Target are:
ID   Value
1    555
2    345
3    999
Finally, we know we can also use MERGE to replace INSERT by omitting the MATCHED clause.  Let’s INSERT a new row into #Source and do this.
-- Insert a new row into our source
INSERT #Source
    VALUES ( 4, 242 );

-- MERGE that only inserts
MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN NOT MATCHED
        THEN     INSERT
                ( ID, Value )
            VALUES
                ( s.ID
                , s.Value
                );

SELECT *
    FROM #Target;
Unsurprisingly, the results now in #Target are these:
ID Value
1 555
2 345
3 999
4 242
Sorry if you know all this stuff and I’ve bored you, but we needed to get these basics out of the way.
Exploring the Performance of MERGE
To effectively test the performance of our alternatives, we’ll need to set up a test harness with a non-trivial number of rows in our #Source and #Target tables.  You can open the Test Harness 1.sql file in the resources section of this article and follow along.  The basic set-up data is as follows.
TRUNCATE TABLE #Source;
TRUNCATE TABLE #Target;

WITH    Tally ( n )
          AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL
                                                              ) )
                FROM sys.all_columns a
                    CROSS JOIN sys.all_columns b
             )
    INSERT INTO #Target
            SELECT 2 * n
                  , 1 + ABS(CHECKSUM(NEWID())) % 1000
                FROM Tally;

WITH    Tally ( n )
          AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL
                                                              ) )
                FROM sys.all_columns a
                    CROSS JOIN sys.all_columns b
             )
    INSERT INTO #Source
            SELECT CASE WHEN n <= 500000 THEN 2 * n - 1
                        ELSE 2 * n
                   END
                  , 1 + ABS(CHECKSUM(NEWID())) % 1000
                FROM Tall
We’ve purposely set up our source table so that the INSERTs it will do when merged with the target are interleaved with existing records for the first 500,000 rows.
Using SQL Profiler, we’ll compare two identical query scripts:
-- MERGE
MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN MATCHED
        THEN     UPDATE
            SET Value = s.Value
    WHEN NOT MATCHED
        THEN     INSERT
                ( ID, Value )
            VALUES
                ( s.ID
                , s.Value
                );

-- TRUNCATE and re-populate Source and Target tables
-- UPDATE/INSERT
BEGIN TRANSACTION T1;
UPDATE t
    SET Value = s.Value
    FROM #Target t
        JOIN #Source s
        ON s.ID = t.ID;

INSERT INTO #Target
        SELECT s.ID
              , s.Value
            FROM #Source s
                LEFT JOIN #Target t
                ON s.ID = t.ID
            WHERE t.ID IS NULL;

COMMIT TRANSACTION T1;
Both of these INSERT 500,000 rows and UPDATE 500,000 rows, the latter enclosed in a TRANSACTION.  We omit error handling and possible ROLLBACK of the transaction for simplicity.  The results that we obtained running the test harness in SQL Profiler (5 runs) are as follows:
Query           CPU     Reads    Writes  Duration
MERGE           4492    4513786  2578    4864
INSERT/UPDATE   3588    3072489  5496    3847
   
MERGE           4820    4514303  2899    5253
INSERT/UPDATE   3572    3072979  4007    4035
   
MERGE           4462    4513793  2571    4770
INSERT/UPDATE   3635    3072463  5497    3829
   
MERGE           4524    4513795  2931    4800
INSERT/UPDATE   3588    3072474  5505    3665
   
MERGE           4648    4513814  2939    4955
INSERT/UPDATE   3479    3072491  5522    3716
These indicate that MERGE took about 28% more CPU and 29% more elapsed time than the equivalent INSERT/UPDATE.  Not surprising considering all the complexity that MERGE must handle, but possibly forgivable for the additional safety, convenience and simplicity it represents (no need for a TRANSACTION with error handling).
Using MERGE as a Substitute for INSERT
To test the performance of MERGE as a substitute for INSERT, the same test harness that set up the #Target table can be used, but we’ll change the set up for the #Source table as follows.
WITH Tally (n) AS
(
    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Source
SELECT CASE WHEN n <= 500000 THEN 2*n-1 ELSE 2000000+n END
    ,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;
A quick check of the row counts generated from Test Harness #2.sql confirms that both MERGE and INSERT insert exactly 1,000,000 rows.  The results from a Profiler trace using these queries give us this:
Query           CPU     Reads    Writes Duration
MERGE           5054    6024150  2397   5576
INSERT          4992    6248001  7563   5507
   
MERGE           5226    6024165  3868   5529
INSERT          5383    6248005  7571   6298
   
MERGE           5257    6023557  3689   5473
INSERT          4851    6247403  7431   5546
   
MERGE           5273    6023589  2167   5662
INSERT          4914    6247440  7427   5281
   
MERGE           5179    6024619  1426   5476
INSERT          5039    6248483  6211   5954
In this case, CPU and elapsed time are probably too close to call.  CPU usage was about 3% more for the MERGE and elapsed time was about 3% less for the MERGE.
Using MERGE as a Substitute for UPDATE
Once again, to test this we’ll change the test harness only for the #Source table so that it creates the rows with exactly the same IDs as the #Target table.
WITH Tally (n) AS
(
    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Source
SELECT 2*n
    ,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;
After confirming that both MERGE and UPDATE update 1,000,000 rows, our Profile trace from running Test Harness #3.sql gave us these results.
Query           CPU     Reads   Writes  Duration
MERGE           1903    7982    2568    2010
UPDATE          1763    7954    2568    1840
   
MERGE           1904    7986    2576    2303
UPDATE          1809    7955    2560    1974
   
MERGE           1903    7968    2576    1951
UPDATE          1763    7940    2568    2005
   
MERGE           1918    7957    2568    2009
UPDATE          1731    7464    2584    1809
   
MERGE           1903    8005    2560    2023
UPDATE          1732    7977    2584    2063
MERGE was just about 9% more costly in CPU and 7% more in elapsed time and with this UPDATE seems to be more advantageous.

No comments: