How to integrate Power BI and Azure Data Factory?

Introduction

Today, some companies generate competitive advantages over others through the smart use of their data. But this is neither quick nor simple to achieve. The data generated in organizations' traditional databases is often stored in a format oriented toward atomicity, consistency, isolation and durability (the ACID properties), which are classic characteristics of relational database engines. However, in order to analyze that data and make decisions based on it, it is necessary to transform it and store it somewhere else that allows it to be queried quickly. Depending on the dimensions involved, this is where the concepts of data warehouses, data marts and data lakes usually appear.

Without going much further, at Brain Food we help companies design and implement technological architectures that allow them to make efficient and intelligent use of their data, so that they can make better decisions, improve their processes, increase their competitiveness and make a difference in their respective markets. However, it is common to run into the problem that users cannot visualize the information graphically, so it is vital to have a good reporting tool such as Microsoft Power BI. This tool is a market leader and offers cloud services that allow reports to be processed and viewed online, workspaces to be created and shared with other users, access to be restricted, among other features. On the other hand, to obtain the quality information that you want to visualize in the reports, it is common to first run ETL (Extract, Transform & Load) processes, which take raw data, process it and store it in the destination defined by the organization's architecture. For that task there are different tools, and one of our favorites is Azure Data Factory, Azure's cloud ETL service for serverless, horizontally scalable data integration and transformation. It offers a code-free user interface for intuitive authoring, plus monitoring and administration from a single dashboard, which notably simplifies the management of this kind of process. This post explains how the integration between Power BI and Azure Data Factory (ADF) works.

Process

Commonly, the processes that load data into a data mart or data warehouse are triggered periodically to keep the base as up to date as possible (subject to business rules). However, that does not necessarily keep the reports updated, since in Power BI there are usually specific dataflows and datasets for each visualization. One solution is that every time an ETL completes successfully in Azure Data Factory, the corresponding dataflow is executed, which in turn refreshes the dataset that the report consumes, preventing the user from viewing an old version.

The first step to achieve this is to enable, in the Power BI Service admin portal, the setting that allows applications registered in Azure Active Directory (hereafter, AAD) to use its APIs, and also to select the security group that may do so, as can be seen in the second red box in the following image (of course, the ADF resource must be registered there as well).

Once the permissions have been configured, an HTTP request must be sent to the AAD authentication service in order to obtain the token that will allow ADF to communicate with Power BI (PBI). This can be done with a Web activity that sends the necessary credentials as parameters in the body, using the POST method, to the corresponding URL, as shown below:

The URL must follow the format: https://login.microsoftonline.com/{tenantId}/oauth2/token
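
For reference, the following is a minimal sketch in Python of the same request that the Web activity performs. The tenant_id, client_id and client_secret values are placeholders for the credentials of the application registered in AAD, and the resource parameter identifies the Power BI API:

import requests

# Placeholders: replace with the values of the application registered in AAD
tenant_id = "<tenant-id>"
client_id = "<client-id>"
client_secret = "<client-secret>"

token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
payload = {
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "resource": "https://analysis.windows.net/powerbi/api",
}

# On success, AAD responds with token_type and access_token
response = requests.post(token_url, data=payload)
response.raise_for_status()
token = response.json()
auth_header = f"{token['token_type']} {token['access_token']}"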

If the request is successful, AAD will respond with the token_type and access_token parameters that will allow secure requests to the PBI API. Then, the next step is to execute the corresponding dataflow update using the following available endpoint:

POST: https://api.powerbi.com/v1.0/myorg/groups/{groupId}/dataflows/{dataflowId}/refreshes

Where groupId and dataflowId are the identifiers of the workspace and the associated dataflow.

In addition, the parameters obtained from AAD must be concatenated in the format ‘{token_type} {access_token}’ and included in the Authorization header of each request. On the other hand, this endpoint requires a parameter (notifyOption) that configures the type of notification to be received once the dataflow has been refreshed. The above can be seen in the following image:
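
As a rough equivalent of that Web activity, a hedged sketch of the refresh request in Python could look like this. Here group_id and dataflow_id are placeholders, auth_header is the ‘{token_type} {access_token}’ value obtained above, and NoNotification is just one of the accepted notifyOption values:

import requests

group_id = "<workspace-id>"      # placeholder: workspace (group) identifier
dataflow_id = "<dataflow-id>"    # placeholder: dataflow identifier
auth_header = "Bearer <access_token>"  # '{token_type} {access_token}' from AAD

refresh_url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
    f"/dataflows/{dataflow_id}/refreshes"
)

# notifyOption configures the notification sent when the refresh finishes
response = requests.post(
    refresh_url,
    headers={"Authorization": auth_header, "Content-Type": "application/json"},
    json={"notifyOption": "NoNotification"},
)
response.raise_for_status()  # raise if the refresh request was not accepted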

Now, to refresh the dataset that the report specifically consumes, you have to wait for the dataflow refresh to finish, which, depending on the size of the model, may take several minutes. To find out when that happens, the following endpoint can be used:

GET: https://api.powerbi.com/v1.0/myorg/groups/{groupId}/dataflows/{dataflowId}/transactions

This returns the status parameter, corresponding to the status of the transaction (in this case, the dataflow refresh). Since the response is likely to be InProgress, that value can be used as the exit condition of a loop that keeps polling until the status is no longer InProgress, as can be visualized in the following pipeline:
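
In code terms, a minimal polling sketch in Python could look like the one below, assuming that the first transaction returned by the endpoint corresponds to the refresh that was just triggered (group_id, dataflow_id and auth_header are the same placeholders as before):

import time
import requests

group_id = "<workspace-id>"
dataflow_id = "<dataflow-id>"
auth_header = "Bearer <access_token>"

transactions_url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
    f"/dataflows/{dataflow_id}/transactions"
)

status = "InProgress"
while status == "InProgress":
    time.sleep(60)  # wait between polls; a dataflow refresh can take several minutes
    response = requests.get(transactions_url, headers={"Authorization": auth_header})
    response.raise_for_status()
    # Assumption: the first transaction in the response is the most recent refresh
    status = response.json()["value"][0]["status"]

print(f"Dataflow refresh finished with status: {status}")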

Once it has been verified that the dataflow refresh has finished, it is finally time to refresh the dataset with the following endpoint:

POST: https://api.powerbi.com/v1.0/myorg/datasets/{datasetId}/refreshes

(Note that this endpoint also requires the notifyOption parameter).
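
Again as a hedged sketch, the equivalent request in Python would be the following, where dataset_id is a placeholder for the identifier of the dataset consumed by the report:

import requests

dataset_id = "<dataset-id>"      # placeholder: dataset consumed by the report
auth_header = "Bearer <access_token>"

dataset_refresh_url = f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/refreshes"

# The dataset refresh endpoint also expects the notifyOption parameter
response = requests.post(
    dataset_refresh_url,
    headers={"Authorization": auth_header, "Content-Type": "application/json"},
    json={"notifyOption": "NoNotification"},
)
response.raise_for_status()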

Finally, an orchestration of this type can look like this:

Final thoughts

With this implementation, a continuous and automatic flow is achieved that takes data from a transactional system, transforms it and makes it available as information in a layer developed specifically for the business, capable of generating concrete, actionable insights. All of this happens without the user having to refresh the dataset, since the API integration leaves the report ready, saving the roughly 10 minutes that each iteration would take, which, in times of high demand, can amount to more than 3 iterations per day, per report.

On the other hand, this exercise proved the feasibility of an integration between Power BI and Azure Data Factory, two tools that are quite versatile and user-friendly, and that allow building analytical layers of information that, tomorrow, each area of an organization could use to meet its own needs.
