≡ Menu

Using SQL’s “IN” in JDBC

In SQL, the IN operator is used to restrict columns to one of a set of values. Using IN in JDBC, though, is sometimes problematic because of the way different databases handle prepared statements.

With JDBC, prepared statements use ? to serve as markers for values in a SQL statement. Thus, you might see:

PreparedStatement ps=connection.prepareStatement("SELECT * FROM FOO WHERE BAR = ?");

This serves to help prevent SQL injection attacks; assigning a value of "'' or 1==1'" would check that actual value against BAR rather than return all rows.

Exploits of a Mom

The parameter number of each ? is an index, starting from 1, so to set the value against which to compare BAR we might see:

ps.setParameter(1, "BAZ");

The IN operator in SQL allows selection from a set of values. Thus, we might see:

SELECT * FROM FOO WHERE BAR IN ('BAZ', 'QUUX', 'CORGE')

If BAR is one of BAZ, QUUX, or CORGE, then the row matches the query and will be returned.

It would make sense to see a PreparedStatement declared as:

PreparedStatement ps=connection.prepareStatement("SELECT * FROM FOO WHERE BAR IN (?)");

However, this doesn’t work. (It gives you only one element to use for the IN selector.) You have two choices: you can write SQL against your specific database, or you can generate custom SQL for the query.

Let’s look at the most general form (the SQL customization) first, since that’s going to be supported best. We are assuming a simple table, created with:

create table if not exists information (id identity primary key, info integer)

Note that we’re presuming H2 at this stage. In PostgreSQL, an equivalent statement would be create table if not exists information (id serial primary key, info integer). With MySQL… oh, who cares, nobody should use MySQL.

Given an array of data to use for the IN clause of Integer[] data = {3, 4, 6, 11};, we can construct a viable (and general) SQL query like this:

StringJoiner joiner = new StringJoiner(
  ",", 
  "select * from information where info in (", 
  ")");
for (Object ignored : data) {
  joiner.add("?");
}
String query = joiner.toString();
try (PreparedStatement ps = conn.prepareStatement(query)) {
  for (int c = 0; c < data.length; c++) {
    ps.setObject(c + 1, data[c]);
  }
  try (ResultSet rs = ps.executeQuery()) {
    showResults(rs);
  }
}

This code isn’t complicated, although it looks like a lot for what it does. It first creates a SQL statement with a placeholder for every element in the data array, then sets each placeholder to the corresponding value in data, and then runs the query. The SQL has to be regenerated for every case where data has a different length. (We could potentially reuse the statement if data always has the same length.)

You can also generalize this, depending on your database. It requires custom SQL, though, and the code to use the SQL differs by database as well.

H2

For H2, we can use the ARRAY_CONTAINS function. Our SQL statement will look like "select * from information where array_contains(?, info)", and the code to use this statement looks like this:

try (PreparedStatement ps = conn.prepareStatement(query)) {
  ps.setObject(1,data);
  try (ResultSet rs = ps.executeQuery()) {
    showResults(rs);
  }
}

H2 can use setObject() and use that as the input for the ARRAY_CONTAINS function; this way, we have one placeholder and we don’t have to generate custom SQL for every different size of the input array.

PostgreSQL

In PostgreSQL, we can use the ANY function. Our SQL looks like "select * from information where info = ANY(?)". Our code to use the statement:

try (PreparedStatement ps = conn.prepareStatement(query)) {
  Array array=conn.createArrayOf("INTEGER", data);
  ps.setArray(1, array);
  try (ResultSet rs = ps.executeQuery()) {
    showResults(rs);
  }
}

MySQL

Nobody should use MySQL.

This is offered somewhat tongue-in-cheek, for a few reasons: one is that I genuinely dislike MySQL, another is that the SQL technique offered here probably isn’t needed very often in the first place (so doing an exhaustive solution is overkill), and the third is ironic: this site is hosted in WordPress, and uses MySQL as the backend database. Irony ftw, right?

Conclusion

We’ve shown a few possibilities for restricting the results of queries, using a general-purpose restriction (IN, with custom SQL generated for every query, still protected from SQL injection attacks), and custom SQL queries for both H2 and PostgreSQL. These are definitely not the only possibilities; feel free to show how you’d do it, or discuss potential optimizations. Some sample code for these examples can be found at https://github.com/jottinger/jdbc_contains – note that some of the code might require modification for each database, and the project doesn’t describe how to create the PostgreSQL database. (The project was written largely to prove the mechanisms described here, and wasn’t meant to be a one-size-fits-all solution.)

Comments on this entry are closed.