Fastest Way To Insert the Data in MS SQL – Part 3 – Multi-value insert vs Bulk Api

java sending data to mssql server

In a previous post, we concluded that the Bulk API insert is the fastest way to insert data into an MS SQL Server database if the batch size is reasonably large. For Jdbc Template inserts, we created a regular insert statement in the form of insert into table (c1, c2, c3) values (v1, v2, v3), where c1, c2, and c3 are column names, and v1, v2, and v3 are values.

For 100,000 Person objects with a batch size in the range of 1,000-10,000, we concluded that Bulk API insert is four to five times faster than JDBC Template. However, with MS SQL Server, JDBC Template has one more potential speedup that it can utilize, possibly competing with Bulk API and proving useful in certain situations.

If we check the MS SQL Server official syntax for the insert statement, we can see that it supports multiple values inserts in one statement. This means that if we want to insert multiple values into the same table, instead of sending multiple insert statements, we can do that in a single insert statement. The insert statement will take the following form: insert into table (c1, c2, c3) values (v1, v2, v3), (v4, v5, v6).... The maximum number of value bindings that can be included in one statement is 2,100, consistent with any other MS SQL Server statement.

How multiple values insert impact performance

Firstly, we will compare this multiple value insert with a single value insert and check if increasing the number of value bindings per insert statement improves performance.

The following code is used to perform the insertion of 1,000,000 persons. It is very similar to the code from Part 2 except that it now generates multi-value insert statements.

 1public void insertPeople(TableDescriptor<Person> tableDescriptor,
 2                         List<Person> people,
 3                         int batchSize,
 4                         int objectsPerInsert) {
 5    if (people.isEmpty()) return;
 6    objectsPerInsert = Math.min(objectsPerInsert, people.size());
 7
 8    List<List<Person>> partitions = Lists.partition(people, objectsPerInsert);
 9    if (partitions.size() == 1 || partitions.get(partitions.size() - 1).size() == objectsPerInsert) {
10        persist(tableDescriptor, batchSize, objectsPerInsert, partitions);
11    } else {
12        List<List<Person>> equalSizePartitions = partitions.subList(0, partitions.size() - 1);
13        persist(tableDescriptor, batchSize, objectsPerInsert, equalSizePartitions);
14        List<List<Person>> lastPartition = Collections.singletonList(partitions.get(partitions.size() - 1));
15        persist(tableDescriptor, 1, lastPartition.size(), lastPartition);
16    }
17}
18
19private void persist(TableDescriptor<Person> tableDescriptor, int batchSize, int objectsPerInsert, List<List<Person>> partitions) {
20    String insertQuery = getInsertQuery(tableDescriptor, objectsPerInsert);
21    jdbcTemplate.batchUpdate(insertQuery,
22            partitions,
23            batchSize,
24            (ps, argument) -> BulkPersistCustomRepositoryImpl.insertBatchData(tableDescriptor, ps, argument));
25}
26
27public <T> void insertBulk(TableDescriptor<T> tableDescriptor,
28                           List<? extends T> data,
29                           int batchSize) {
30    if (data.isEmpty()) return;
31    String insertQuery = getInsertQuery(tableDescriptor, 1);
32    jdbcTemplate.batchUpdate(insertQuery,
33            data,
34            batchSize,
35            (ps, argument) -> {
36                List<ColumnDescriptor<T>> columns = tableDescriptor.getColumns();
37                for (int i = 0; i < columns.size(); i++) {
38                    ColumnDescriptor<T> columnDescriptor = columns.get(i);
39                    ps.setObject(i + 1, columnDescriptor.getObjectFieldValue(argument));
40                }
41            });
42}

I ran multiple tests with different batch sizes and the number of people per insert. Each person has five fields, so the number of binding values per insert is actually the number of people per insert times five.

The results:

Number of people per insertBatch sizeMedian duration (ms)
4001004175
400104275
40010004348
400100004697
10010005335
1001005359
100105585
100100005873
10100006302.5
1010006331
101006512
10108027.5
11000014792.5
1100015222
110017812.5
11025252

Multi-value insert

As we can observe, the number of bound values per insert has a significant impact on performance. It has an even greater performance impact than the batch size, which is, for me at least, very surprising. The more parameters we bind, the better the performance we achieve. Interestingly, the best performance is with a batch size of 100.

Multi-value insert vs Bulk API

After concluding that we have much better performance with multi-value inserts, I will now compare Bulk API inserts with JDBC Template 400-person inserts per statement.

Method nameBatch sizeMedian duration (ms)
Bulk Api100003188.5
Jdbc Template1004175
Jdbc Template104275
Jdbc Template10004348
Bulk Api10004358
Jdbc Template100004697
Bulk Api10014440.5
Bulk Api10101251.5

Multi-value insert vs Bulk Api

Based on the results, we can see that the performance difference is much smaller than before. As the batch size increases, Bulk API performance improves. However, when we increase the batch size for JDBC Template to more than 100, the performance worsens. Therefore, we should be careful when setting the batch size for JDBC Template with multi-value inserts.

Conclusion

Based on the results above, we can conclude that multi-value inserts drastically improve insert performances and that you should definitely consider it if you have performance problems. On the other hand, if you need to insert a huge amount of data, then Bulk API insert is the king.

For updates, you can follow me on Twitter or LinkedIn.

Thank you for your attention.

comments powered by Disqus