Fastest Way To Insert the Data in MS SQL – Part 2 – Hibernate vs Prepared Statement vs Jdbc Template vs Bulk Api

[Image: Java sending data to MS SQL Server]

Introduction

In the previous post, Fastest way to insert the data in MS SQL - Part 1 Hibernate Batching, we compared the performance of Hibernate inserts with batching turned off versus batching turned on. In this post, we will explore other ways to insert data into MS SQL Server and compare their performance with Hibernate.

We will consider four different ways to persist data:

  1. Using Hibernate batching, calling the persist method.
  2. Manually inserting data using a prepared statement with batching.
  3. Using Spring Data Jdbc Template with batching.
  4. Using Microsoft Bulk API.

Hibernate Batching

Hibernate batching is implemented as follows:

@Override
public <T> void persistAndFlushInChunks(List<T> objects, int bulkInsertBatchSize) {
    if (CollectionUtils.isEmpty(objects)) {
        return;
    }

    // set manual flush mode
    entityManager.setFlushMode(FlushModeType.COMMIT);
    entityManager
            .unwrap(Session.class)
            .setJdbcBatchSize(bulkInsertBatchSize);

    for (List<T> chunk : Lists.partition(objects, bulkInsertBatchSize)) {
        persistAndFlushObjects(chunk);
    }
}

private <T> void persistAndFlushObjects(Iterable<T> objects) {
    for (T object : objects) {
        entityManager.persist(object);
    }
    entityManager.flush();
    entityManager.clear();
}

This is the same implementation as in the previous post. I enable manual flush mode to get more control over when data is sent to the MS SQL server. After each flush, I clear the entity manager, since the objects have already been persisted to the database and are no longer needed in memory.
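For context, here is a minimal sketch of how this method might be invoked in the test scenario described later; the Person entity, the generatePeople helper, and the personRepository variable are assumptions for illustration, not code from the project:

// Hypothetical caller; assumes a Person entity and a Spring-managed transaction.
// With a batch size of 1,000, each chunk of 1,000 entities is persisted,
// flushed as one JDBC batch, and then cleared from the persistence context.
List<Person> people = generatePeople(100_000); // assumed test-data helper
personRepository.persistAndFlushInChunks(people, 1_000);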

Manually Inserting Data With Prepared Statement

Since everything here is done manually, there is a fair amount of supporting code. I will therefore only outline the key components and explain how each of them works.

public <T> void insert(List<? extends T> objects, TableDescriptor<T> descriptor, int batchSize) {
    Session session = entityManager.unwrap(Session.class);

    session.doWork(connection ->
            insertInternal(objects, descriptor, connection, batchSize));
}

private <T> void insertInternal(List<? extends T> objects,
                                TableDescriptor<T> descriptor,
                                Connection connection,
                                int batchSize) throws SQLException {

    try (PreparedStatement preparedStatement = prepareInsertStatement(descriptor, connection, 1)) {

        for (int i = 0; i < objects.size(); i++) {
            @NonNull List<ColumnDescriptor<T>> columns = descriptor.getColumns();
            for (int j = 0; j < columns.size(); j++) {
                ColumnDescriptor<T> columnDescriptor = columns.get(j);
                preparedStatement.setObject(j + 1, columnDescriptor.getObjectFieldValue(objects.get(i)));
            }
            preparedStatement.addBatch();
            if ((i + 1) % batchSize == 0) {
                preparedStatement.executeBatch();
                preparedStatement.clearBatch();
            }
        }
        preparedStatement.executeBatch();
        preparedStatement.clearBatch();
    }
}

The method takes three parameters: a list of objects to insert (Person objects in our tests), a TableDescriptor with information about the target database table, and a batch size. The table descriptor contains the column names and a mapping function, columnDescriptor.getObjectFieldValue(object), used to populate the prepared statement.
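The descriptor classes themselves are not shown in the post, so here is a minimal sketch of what they could look like, inferred from how they are used in the snippets; all names and fields below are assumptions rather than the project's actual code.

import java.util.List;
import java.util.function.Function;

// Hypothetical descriptor classes, reconstructed from their usage in this post.
class ColumnDescriptor<T> {
    private final String columnName;
    private final int sqlType;                        // a java.sql.Types constant
    private final Function<T, Object> valueExtractor;

    ColumnDescriptor(String columnName, int sqlType, Function<T, Object> valueExtractor) {
        this.columnName = columnName;
        this.sqlType = sqlType;
        this.valueExtractor = valueExtractor;
    }

    public String getColumnName() { return columnName; }
    public int getSqlType() { return sqlType; }
    public Object getObjectFieldValue(T object) { return valueExtractor.apply(object); }
}

class TableDescriptor<T> {
    private final String tableName;
    private final List<ColumnDescriptor<T>> columns;

    TableDescriptor(String tableName, List<ColumnDescriptor<T>> columns) {
        this.tableName = tableName;
        this.columns = columns;
    }

    public String getTableName() { return tableName; }
    public List<ColumnDescriptor<T>> getColumns() { return columns; }
}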

The insert method uses Hibernate to obtain a database session and connection. The actual insertion is carried out by insertInternal, which prepares a PreparedStatement for the insert, iterates through the list of objects, sets their values on the statement, and adds them to a batch. When the specified batch size is reached, the batch is executed, and the process repeats until all objects are inserted.
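The prepareInsertStatement helper is not listed in the post; a plausible sketch, assuming it only assembles a parameterized insert from the descriptor (the third argument being the number of placeholder groups, 1 in the call above), could look like this:

// Hypothetical helper; builds "insert into <table> (col1, col2, ...) values (?,...),(?,...),..."
// with rowCount groups of placeholders. The code above calls it with rowCount = 1.
private <T> PreparedStatement prepareInsertStatement(TableDescriptor<T> descriptor,
                                                     Connection connection,
                                                     int rowCount) throws SQLException {
    String columnList = descriptor.getColumns().stream()
            .map(ColumnDescriptor::getColumnName)
            .collect(Collectors.joining(", "));
    String placeholderGroup = descriptor.getColumns().stream()
            .map(c -> "?")
            .collect(Collectors.joining(",", "(", ")"));
    String values = IntStream.range(0, rowCount)
            .mapToObj(i -> placeholderGroup)
            .collect(Collectors.joining(","));
    String sql = "insert into " + descriptor.getTableName()
            + " (" + columnList + ") values " + values;
    return connection.prepareStatement(sql);
}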

Using Spring Data Jdbc Template With Batching

public <T> void insertBulk(TableDescriptor<T> tableDescriptor,
                           List<? extends T> data,
                           int batchSize) {
    if (data.isEmpty()) return;
    String insertQuery = getInsertQuery(tableDescriptor, 1);
    jdbcTemplate.batchUpdate(insertQuery,
            data,
            batchSize,
            (ps, argument) -> {
                List<ColumnDescriptor<T>> columns = tableDescriptor.getColumns();
                for (int i = 0; i < columns.size(); i++) {
                    ColumnDescriptor<T> columnDescriptor = columns.get(i);
                    ps.setObject(i + 1, columnDescriptor.getObjectFieldValue(argument));
                }
            });
}

This is the most straightforward approach I used for batch persistence. It relies on the Spring JdbcTemplate class, which offers a convenient interface and hides the batching details inside the batchUpdate method. I supply the query, the data, and the batch size as parameters; the only manual work is populating the prepared statement with the data.
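For completeness, a hypothetical call site (the service and descriptor names are assumptions, reusing the descriptor sketched earlier) would look like this:

// Hypothetical usage: persist 100,000 Person rows in JDBC batches of 1,000.
// JdbcTemplate splits the data into batches of the given size and executes
// each batch on the underlying PreparedStatement.
jdbcInsertService.insertBulk(personDescriptor, people, 1_000);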

Microsoft Bulk Api

public <T> void performBulkInsert(List<? extends T> data,
                                  TableDescriptor<T> descriptor,
                                  int batchSize) {
    Session session = entityManager.unwrap(Session.class);

    session.doWork(connection -> {
        try (SQLServerBulkCopy bulkCopy =
                     new SQLServerBulkCopy(connection.unwrap(SQLServerConnection.class))) {
            SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
            options.setBatchSize(batchSize);
            bulkCopy.setBulkCopyOptions(options);
            bulkCopy.setDestinationTableName(descriptor.getTableName());

            CachedRowSet dataToInsert = createCachedRowSet(data, descriptor);
            // Perform bulk insert
            bulkCopy.writeToServer(dataToInsert);
        }
    });
}

private <T> CachedRowSet createCachedRowSet(List<? extends T> data, TableDescriptor<T> descriptor) throws SQLException {
    RowSetFactory factory = RowSetProvider.newFactory();
    CachedRowSet rowSet = factory.createCachedRowSet();

    rowSet.setMetaData(createMetadata(descriptor));

    for (T rowData : data) {
        rowSet.moveToInsertRow();
        for (int i = 0; i < descriptor.getColumns().size(); i++) {
            ColumnDescriptor<T> column = descriptor.getColumns().get(i);
            rowSet.updateObject(i + 1, column.getObjectFieldValue(rowData), column.getSqlType());
        }
        rowSet.insertRow();
    }

    rowSet.moveToCurrentRow();
    return rowSet;
}

private static <T> RowSetMetaData createMetadata(TableDescriptor<T> descriptor) throws SQLException {
    RowSetMetaData metadata = new RowSetMetaDataImpl();

    // Set the number of columns
    metadata.setColumnCount(descriptor.getColumns().size());
    for (int i = 0; i < descriptor.getColumns().size(); i++) {
        metadata.setColumnName(i + 1, descriptor.getColumns().get(i).getColumnName());
        metadata.setColumnType(i + 1, descriptor.getColumns().get(i).getSqlType());
    }
    return metadata;
}

This insertion method works much like the preceding ones. The key difference is that it uses the MS SQL Server bulk copy API. This API is typically used to import data from a file into a table, but here the source is an in-memory collection rather than a file.
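The tests below use only the batch size option, exactly as in the snippet above. For reference, SQLServerBulkCopyOptions exposes a few more settings that can influence bulk-copy behavior; a brief sketch (these were not used in my measurements):

// Sketch of additional bulk-copy options offered by the driver; only setBatchSize
// was used in the tests in this post.
SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
options.setBatchSize(batchSize);    // rows sent to the server per batch, as in the test code
options.setBulkCopyTimeout(60);     // seconds before the bulk copy times out
options.setTableLock(true);         // hold a table-level lock for the duration of the copy
bulkCopy.setBulkCopyOptions(options);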

Results

I run the tests with four different batch sizes for each method: 10, 100, 1,000, and 10,000. In every run, 100,000 Person objects are persisted to the database.

Each scenario is run 10 times, and I take the median of the results. After each run, the Person table is truncated. The measurements cover only the time it takes the insert operations to complete.

Method Name        | Median Duration (ms) | Batch Size
-------------------|----------------------|-----------
Bulk Api           | 314                  | 10000
Bulk Api           | 432                  | 1000
Bulk Api           | 1504                 | 100
Jdbc Template      | 1508.5               | 10000
Prepared Statement | 1520                 | 10000
Jdbc Template      | 1551.5               | 1000
Prepared Statement | 1559.5               | 1000
Hibernate Persist  | 1732                 | 10000
Prepared Statement | 1770.5               | 100
Hibernate Persist  | 1778                 | 1000
Jdbc Template      | 1783                 | 100
Hibernate Persist  | 2281                 | 100
Prepared Statement | 2583                 | 10
Jdbc Template      | 2618.5               | 10
Hibernate Persist  | 4850.5               | 10
Bulk Api           | 10178.5              | 10

Results Chart

Conclusion

Right from the start, it's apparent that the Bulk API is both the speediest and, paradoxically, the slowest data insertion method. When dealing with small batch sizes, its performance can be agonizingly sluggish. However, as we scale up the batch size, it truly shines and demonstrates itself as the optimal and swiftest means of inserting data into the MS SQL server database.

The Jdbc Template and Prepared Statement methods exhibit nearly identical performance levels, which isn't surprising since they essentially share the same underlying code.

We can also see that Hibernate Persist is the least efficient way to persist data at every batch size except 10 (where the Bulk API is slower still). The gap is relatively small for large batches, adding roughly 15% to the processing time at batch sizes of 10,000 and 1,000, but the overhead grows considerably as the batch size shrinks.

We can employ an additional technique when working with a prepared statement. In our previous tests, we conducted inserts using the following statement:

insert into Person(person_id, user_name, first_name, last_name, years) values (?,?,?,?,?)

With this statement, we insert one person object at a time. However, MS SQL Server supports an alternative syntax:

insert into Person(person_id, user_name, first_name, last_name, years) values (?,?,?,?,?),(?,?,?,?,?)...

This allows us to insert multiple rows of values in a single statement, up to SQL Server's limit of 2,100 parameters per statement. In my upcoming post, I will benchmark this approach and compare it to the fastest Bulk API insert to see how efficient it is.
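As a quick illustration of what that limit means for the Person table used here (5 columns per row), the maximum number of rows per statement works out as follows:

// SQL Server allows at most 2,100 parameters per request; with 5 columns per
// Person row, a single multi-row insert can hold 2100 / 5 = 420 rows.
int parameterLimit = 2_100;
int columnsPerRow = 5;  // person_id, user_name, first_name, last_name, years
int maxRowsPerStatement = parameterLimit / columnsPerRow;  // 420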

For updates, you can follow me on Twitter or LinkedIn.

Thank you for your attention.
