How to Use Case Insensitive String in Hash Map

Case sensitive vs case insensitive hash map

Introduction

By default, Microsoft SQL Server processes strings without considering their case sensitivity. Java, unlike Microsoft SQL Server, is case-sensitive which can result in problems. Specifically, on the project I'm working on, there have been numerous bugs caused by the mismatch in case sensitivity.

Generally, there are many situations where case-insensitive strings are necessary. One such example is with email addresses, which are inherently case-insensitive. Therefore, the issue of case sensitivity is not solely related to the mismatch between MS SQL Server and Java, but rather it is a more widespread concern.

In the following, I will explain how case sensitivity can cause issues in code, and how and why using a case-insensitive hash map can provide a solution.

Example

Let's say:

  1. The groups API provides a service called GroupsService which accepts a list of user email addresses and creates a group with those users.
  2. Each user's email address is unique in the EmailGroups database.
  3. Another API called users is used to create new users with their email addresses and store them in a database.
  4. When a request is made to the GroupsService service, each email address in the list is validated by checking a database using a case-insensitive search. If an email address in the list is not found in the database, an error is thrown and the error message includes the exact email address that wasn't found.
  5. The GroupsService service is designed to handle large requests, so it employs a solution that checks email addresses in chunks, rather than one-by-one against the database.

Implementation

  1. Retrieve emails (along with any other data necessary for processing the request) from the database by filtering them using the email addresses provided in the incoming request.
  2. Store the retrieved results in a Java HashMap.
  3. Iterate through each email address in the incoming request and check if there is a corresponding entry in the HashMap.
  4. If an entry doesn't exist for an email address, an error is thrown to the user.

We can use the following Java code to implement described behaviour:

 1public class GroupsService {
 2    @Value
 3    private static class MemberInfo {
 4        int id;
 5        String email;
 6        String name;
 7    }
 8    public void createGroupWithMembership(@NonNull String groupName, @NonNull List<String> memberEmails) {
 9        Map<String, MemberInfo> stringMemberInfoMap = loadMemberInfoByEmails(memberEmails);
10
11        for (String memberEmail : memberEmails) {
12            if (!stringMemberInfoMap.containsKey(memberEmail)) {
13                throw new IllegalArgumentException("Can't find email: '%s' in the system.");
14            }
15        }
16
17    }
18
19    private Map<String, MemberInfo> loadMemberInfoByEmails(List<String> memberEmails) {
20        List<String> memberEmailsToLookupFor = memberEmails.stream().distinct().sorted().toList();
21        List<MemberInfo> memberInfoList = groupRepository.loadMemberInfoByEmails(memberEmailsToLookupFor);
22        return
23                memberInfoList
24                        .stream()
25                        .collect(Collectors.toMap(MemberInfo::getEmail, Function.identity()));
26    }
27}

The Problem

Since the database is case-insensitive, it will return all emails without considering the case of the input string. However, in Java, we use a HashMap to match incoming emails with emails from the database. The issue with this approach is that HashMap keys are case-sensitive, which means that some emails may not be found.

To illustrate the problem, consider the following example:

 1  INPUT LIST                             HASH MAP
 2  (lower case emails)              (case-sensitive email keys)
 3
 4  +------------------------+       +------------------------+
 5  |  alice@example.com      |       | alice@example.com       |   MemberInfo
 6  +------------------------+       +------------------------+
 7  |  BOB@example.com  (*)   |       | Bob@example.com         |   MemberInfo
 8  +------------------------+       +------------------------+
 9  |  charlie@example.com    |       | charlie@example.com     |   MemberInfo
10  +------------------------+       +------------------------+
11  |  DAVID@example.com  (*) |       | David@example.com       |   MemberInfo
12  +------------------------+       +------------------------+

Sure, here's a clearer rewrite of the text:

In the previous example, Bob and David have different casing in their email addresses. Because the hashCode function generates different hash codes for the same string with different casing, the hash map will not be able to find their email addresses in the database. This can lead to errors for the user, even though we have data about both email addresses in the database.

Solution 1: Use TreeMap

The solution to the problem of case-sensitive hash maps in Java is to use a TreeMap instead of a HashMap, as it can accept a custom comparator. By using a case-insensitive comparator, the TreeMap will be able to match email addresses with different casing, and the correct MemberInfo can be retrieved from the database. Here is an example of how to use a TreeMap with a custom comparator in Java:

1Map<String, String> mapping = new TreeMap(String::caseInsensitiveComparator);

While TreeMap offers good performance, it may not be as fast as HashMap for large datasets. The query time increases with the size of the dataset. However, searching the data using TreeMap is similar to performing a binary search, so the access time doesn't drop linearly. Although TreeMap provides good performance, it can't match the nearly constant O(1) access time of HashMap.

Solution 2: Use HashMap

Unfortunately, hash maps only work based on object hashCode and equals methods, and it doesn't allow for custom functions to be supplied at the constructor or in any other way. This can be limiting, and as a result, many programmers opt for TreeMap.

However, as professional developers, we can create a better implementation for case-insensitive string equals and hash code methods. There are two main approaches:

  1. Convert all strings to lower/upper case. However, this creates a lot of new objects, which doubles memory usage, and we always need to lower/upper case input strings for the hash map. This can create memory churn in high-load scenarios, and it is also required to store strings with the original casing, so we need a way to map from lower/upper cased strings to the original ones. Due to all these challenges, this approach is not recommended.

  2. Extend the String class and override the hashCode and equals methods. Alternatively, we can create a new type, store the string as a final local variable, and implement custom equals and hashCode methods. This approach still requires the creation of new objects, but compared to creating new strings, memory allocation is much lower.

The challenge now is to implement those two methods. For equals, we can use the equalsIgnoreCase method instead of toLowerCase, which avoids the problems mentioned above. For the hashMethod, we can copy the logic from the String class and use the lowercase version of the characters for generating the hashCode. Here is an implementation example:

 1@RequiredArgsConstructor(staticName = "of")
 2public class CiString {
 3    @NonNull
 4    private final String stringVal;
 5    private int hash = 0;
 6
 7    @Override
 8    public boolean equals(Object o) {
 9        if (this == o) return true;
10
11        if (o == null || getClass() != o.getClass()) return false;
12        CiString ciString = (CiString) o;
13        return stringVal.equalsIgnoreCase(ciString.toString());
14    }
15    public int hashCode() {
16        int h = hash;
17        if (h == 0 && stringVal.length() > 0) {
18            for (int i = 0; i < stringVal.length(); i++) {
19                h = 31 * h + Character.toLowerCase(stringVal.charAt(i));
20            }
21            hash = h;
22        }
23        return h;
24    }
25    @Override
26    public String toString() {
27        return stringVal;
28    }
29}

Furthermore, if our intention is to use the CiString class only as keys in a hash map, and we calculate the hash code only once, we can remove the private int hash property from the class. This optimization can conserve memory space since the hash code is only necessary for determining the bucket location in the hash map, and not for any other tasks.

As professional developers, it's important to write tests to ensure the correctness and reliability of our code. Here is an example of a test class for the case-insensitive string hash map implementation:

 1public class CiStringTest {
 2       @Test
 3       public void nvl() {
 4               Assertions.assertNull(CiString.nvl(null));
 5               Assertions.assertNotNull(CiString.nvl(""));
 6       }
 7       @Test
 8       public void testEquals() {
 9               Assertions.assertEquals(CiString.of("String"), CiString.of("sTRING"));
10               Assertions.assertNotEquals(null, CiString.of(""));
11       }
12       @Test
13       public void testHashCode_caseInsensitive() {
14               CiString lowerCase = CiString.of("sTriNg");
15               CiString upperCase = CiString.of("StRING");
16               Assertions.assertEquals(lowerCase.hashCode(), upperCase.hashCode());
17       }
18       @Test
19       public void of_whenNullString_thenException() {
20               NullPointerException nullPointerException = Assertions
21                               .assertThrows(NullPointerException.class, () -> CiString.of(null));
22
23               Assertions.assertEquals("stringVal is marked non-null but is null", nullPointerException.getMessage());
24       }
25       @Test
26       public void toStringTest() {
27               Assertions.assertEquals("test", CiString.of("test").toString());
28       }
29}

Other Solutions: Baeldung

In this article, you can find a description of the problem we are discussing. The article also proposes several solutions to the problem, including the previously mentioned Solution 1. In addition to that, the article suggests using Apache's CaseInsensitiveMap or Spring's LinkedCaseInsensitiveMap as alternative solutions. However, it is important to note that both of these solutions are based on lowercasing the keys, which may not be the desired behavior in certain scenarios.

I'm active on Twitter and LinkedIn, and I'd love it if you could give me a follow. You can find me on Twitter at @mare_milenkovic and on LinkedIn at mare-milenkovic.

comments powered by Disqus