Regex stands for Regular Expressions, and in practice, it is a sequence of characters and symbols that become search patterns for other text. It is a technique developed in theoretical computer science and formal language theory. Generally, these patterns are used as functionalities by algorithms within programming languages, operating systems, ETL tools, Business Intelligence, Databases, Office Automation and many more.
Particularly in Business Intelligence, it is often necessary to perform ETL (Extract Transform and Load) operations, to clean, format and validate the data provided to us. This is when we have the possibility of using Regex to solve problems in a fast, efficient and even elegant way.
Examples of use:
Illustration 1: Example of Regular Expression to validate a date format of the type dd-MM-yyyy
Many times data does not come ready and comfortable to use with a BI visualization tool, a recurrent case is when the output data of a survey software delivers coded answers and questions. Within this context, Regex becomes very useful to search for a particular value (coded answer) within another text dictionary of questions with their possible output values (the alternatives).
Illustration 2: Example of coded survey responses
The solution is to find a Regex pattern that allows to identify and extract a value searched by a key, within a dictionary in JSON format (it will work for any question in the survey!).
In this case, a pattern that works for us is 2:”([^”]+)”, where the number 2 in the example, will subsequently be the variable of the coded answers of each question, the characters in black are literal and that format is searched in the dictionary. The parentheses () in Regex indicate a group, which is a portion of the text that matches the pattern. Finally, the text [^”]+ tells us that it accepts any character 1 or more times that is different from quotes (“).
Illustration 3: Regular Expression Testing
There are sites like https://regex101.com/ that help you write a Regex, understand its syntax and test it online!
The expected result would be something like this:
Power BI does not have a native implementation of Regex, so you need to insert Python or R scripts at the Power Query level, in addition to installing the environment of the chosen language previously.
Illustration 4: Python script in Power Query
Illustration 5: Field calculated in Tableau
Using the leading tools in the Business Intelligence market, such as Power BI and Tableau, we have been able to solve a recurring problem with the help of Regex, in addition to knowing its power and the various uses in which it can help when we face ETL problems, at the beginning of a visualization project.