Pattern matching over strings in SQL is a frequent need, much more frequent than some may think. Queries aren’t just for compiling demanding aggregate calculations, advanced joins, and table partitioning. This applies especially to data scientists and data engineers who often have to deal with raw, unstructured data. For these professional pattern matching operations, from the most basic to more advanced, are probably an everyday task.
The demand for efficient SQL pattern matching becomes immediate as the velocity of available data increases, especially within data-driven companies where decision-making is highly dependent on data.
The simplest and most common case in which such an operation is needed is when one simply wants to search in his database for data that match exactly or are similar to a specific pattern. However, this becomes extremely useful when you do not actually know the exact word or phrase you seek. E.g., Those whose name is ‘John’ or contains the string ‘John’, for example, ‘Johnpaul’.
In this post, we will go through all the capabilities that the majority of SQL dialects offer regarding both the aforementioned tasks when using Amazon Redshift, PostgreSQL, Google BigQuery, or SQL Server.
So, let’s get started!
When it comes to pattern matching, the usual options are 3: LIKE operator, SIMILAR TO operator, which is available only on some SQL dialects, and Regular Expressions.