January 18, 2021
As a data engineer, my main goal is to create a single and complete source of truth. This has brought me into the cloud and the ELT architecture where we can easily bring all data into a single source (usually a data warehouse) for later usage. However, in recent years we’ve seen a bigger awareness on where the data is stored and in some cases a requirement to hold it and keep it in a specific region. For teams like BI, this trend becomes a challenge. How can we produce analysis with data that can’t be moved away from another region? To solve this we have the following options:
Although we can’t move the data away from the region, we can do local aggregation of the data and then merge it (i.e total revenue). However, for teams with existing infrastructure, this approach quickly becomes unfeasible as global analysis would require a refactor of all processes and the BI tool. Additionally, to don’t remove previous visibility on the user’s data we’d need to have a BI tool instance on each region.
Another option is to move the query engine like Trino (former prestoSQL). By separating the data storage from the processing we could leverage its ability to query multiple databases and add views that automatically query the same data on different regions. We’d only need to do simple adjustments to our reports to read from the slightly adjusted sources. This type of abstraction also has the plus of abstracting away the addition of further regions. However, this kind of approach has some drawbacks:
This new type of challenge can appear even for small data teams as the users, the owners of their data, set these as a requirement from the start. For all data teams that might have the goal of going international, you should start thinking on either adding a SQL query engine (there multiple good options on all cloud providers) or on developing your data models with aggregations throughout it.
I'm Jose Cabeda, a data engineer focused on improving data systems and educating on how to use them. I also do a lot of planning and read as much as I can.