The 20 most popular data engineering tools in the Nordics

March 30, 2022

Richard Wang

And the surprising role of BigQuery

At Validio, we’ve had the pleasure of speaking with 100+ modern data teams globally, many of them located in the Nordics. During our conversations, we’ve covered topics such as the requirements for data pipelines, the rationale for technology choices and the challenges involved in building out data infrastructure. One topic we’ve covered extensively is the technology preferences and tools data teams use in their data stacks. We’ve taken a look at our notes, crunched some data, and now share our findings on the 20 most popular data engineering tools being used in the Nordics.

In addition, we’ll do a deep dive on the adoption of cloud data warehouses in the Nordic region. As noted by Matt Turck in the 2021 Machine Learning, AI and Data Landscape analysis, modern cloud data warehouses have unlocked an entire ecosystem of tools and companies:

“Today, cloud data warehouses (Snowflake, Amazon Redshift and Google BigQuery) and lakehouses (Databricks) provide the ability to store massive amounts of data in a way that’s useful, not completely cost-prohibitive and doesn’t require an army of very technical people to maintain. In other words, after all these years, it is now finally possible to store and process Big Data. That is a big deal, and has proven to be a major unlock for the rest of the data/AI space.” - Matt Turck, Partner at FirstMark Capital

The companies we’ve spoken with are primarily fast-growing scaleups and unicorns; hence a majority of the data teams we’ve spoken to have been able to build greenfield data stacks, without being entrenched in legacy on-prem systems needing migrations or retrofit integrations. In light of that, don’t be surprised if you don’t see tools such as Oracle database or Microsoft SQL Server on the list.

Let’s dig in!

Airflow, dbt and BigQuery (?) topping the list

Perhaps the most interesting thing about modern data tool usage in the Nordics is the prevalent usage of Airflow, dbt and BigQuery—all of which are being used by almost half of the teams we’ve spoken to.

Fig 1. Airflow, dbt and BigQuery are the three most popular tools among Nordic scale-ups and unicorns by a significant margin. Source: Validio interviewing Nordic data teams

Furthermore, there’s one tool close to the bottom of the list that might be considered an odd bird by the global data community, and that further suggests the data was collected in the Nordics: the workflow orchestration tool Luigi. Two weeks ago, Spotify’s engineering team published an article explaining why they’re switching their workflow orchestration tooling away from Luigi (which started as an internal tool at Spotify). This spurred a lot of discussion within the data community. For example, Erik Bernhardsson, one of the main maintainers of Luigi while he was at Spotify, started a Twitter thread discussing why Luigi didn’t reach worldwide adoption like Airflow (which originates from Airbnb). Spotify is, by the way, not switching their data orchestration to Airflow, but to Flyte.

Looking at the list with different tooling categories in mind, Nordic scale-ups and unicorns have agreed on a set of category favorites:

Workflow orchestration: Airflow
Data transformation: dbt
Data warehouse: BigQuery
BI visualization: Looker
Transactional database: Postgres
Stream processing: Kafka

One thing that may come as a surprise to some readers is seeing Redshift down in 10th place, especially considering that the AWS-native data warehouse is often viewed as one of the first major tools responsible for ushering us into the era of cloud-native data infrastructure, or the Modern Data Stack as many call it today.

Fig 2. The evolution of data warehouses. Redshift, which launched publicly in 2012, is considered the pioneer of modern cloud data warehouses.

What’s more, the lead BigQuery has on its main competitors Snowflake and Redshift might also come as a surprise. Comparing our data with that from other parts of the world illustrates what we mean:

Fig 3. In the data reported by Secoda and Immuta, notice how BigQuery is either behind or only slightly ahead of its closest competitor. Note: We’re primarily interested in comparing the relative popularity of the tools. Given the different sources and data collection methodologies used, we wouldn’t read too much into the difference in absolute % between each source. Source: Secoda (1), Immuta (2)

In other words, the data from our discussions with Nordic scale-ups clearly suggests that Nordic companies are adopting BigQuery at a significantly higher rate, relative to Snowflake and Redshift, than companies in other regions and the rest of the world.

Recent job postings support BigQuery’s popularity in the Nordics

Unless you’re a data professional in the Nordics with insight into multiple Nordic scale-ups, and the above statistics conform completely to your expectations, you might at this point question the validity of the data and the rigor of the data collection methodology (or lack thereof).

Luckily, Robert Sahlin (Senior Data Engineer at a Nordic scale-up and data community influencer) has collected some job posting data, which will serve as a second data source to support our findings.

Fig 4. Swedish companies are looking for talent with BigQuery skills at a noticeably higher rate than the US and the rest of the world when compared to Redshift and Snowflake. Notice how similar the share of postings asking for BigQuery skills in Sweden (44%) is to the share of companies using BigQuery in the Nordics (45%) in Fig 1. Granted, this is not an apples-to-apples comparison given the different geographical scope and the fact that not all companies advertise open positions at the same time, but still, isn’t it exciting when different data sets point to the same story? Source: Robert Sahlin LinkedIn post

As Sahlin’s data suggests, Sweden-based companies (which we use here as a proxy for the Nordics) are indeed looking to hire BigQuery talent to a larger extent than companies in the US and the rest of the world.

In the same post, there are some noteworthy comments on whether a few large companies (e.g. Spotify, IKEA and King, all of which are using BigQuery (3)) happened to be filling multiple data roles when the job posting data was collected, potentially inflating the share of BigQuery postings (it turns out they accounted for a little more than 10% of the postings). Given that the data from our surveyed scale-ups and unicorns and Sahlin’s job ad data point towards the same thing, we can draw the conclusion that the Nordics is indeed a BigQuery stronghold (4).

Fluke or structural reasons?

Clickbait headlines aside, is this a fluke or can we find structural reasons and a narrative behind the stats? Again, the comments on Sahlin’s job postings post offer some interesting points:

1. Nordic data talent were schooled at Nordic success stories such as Spotify, iZettle and King, where BigQuery was used

This one is Sahlin’s own hypothesis: data talent first worked at companies where BigQuery was the data warehouse of choice and later took senior positions at other companies, influencing the decision of which data tools to use. Prominent Swedish scale-ups like Spotify, iZettle (now Zettle after being acquired by PayPal) and King are all examples of heavy BigQuery users where data talent may have started their careers (although King’s parent company, Activision Blizzard, has just agreed to be acquired by Microsoft. Hello Azure. Who would have thought you would need to live through a cloud migration when you’re already on cloud…).

2. Google has a strong local sales team

In the Nordic market, the local Google sales team seemingly has a good reputation, illustrated by comments such as:

Note: Leif was working at Tableau at the time the comment was made

As an outsider, one could speculate about what came first: the chicken or the egg? Is Google doubling down on the Nordics with a strong team allowing them to defend and grow their local market share, or has Google managed to gain a strong market position only because they have a strong team?

3. When using Google Cloud Platform (GCP), BigQuery comes out of the box

Which came first, the software engineer or the data engineer? Traditionally, in most companies, it’s the software engineer. If your software engineers are already on GCP, it’s not hard to imagine the account managers at Google upselling their existing accounts with their suite of data tools (including BigQuery) to data engineers, especially given the above anecdotal evidence of Google’s strong local team. Alternatively, data engineers may start to use BigQuery of their own accord, again, simply because their software engineer colleagues are already on GCP.

4. Flat and non-hierarchical companies in the Nordics adopt the community favorite

This one is our own hypothesis and something that struck us when we first saw the stats on the web communities of each of the data warehouses. Before taking a look at the community stats, we wanted to share a comment made about our CEO, Patrik, by one of our colleagues who recently moved to the Nordics (paraphrased): “Patrik doesn’t really interfere with our work, I haven’t had any other boss who has had so little to say about the work we do every day, and he is supposedly the big boss.”

This was naturally not a comment on Patrik’s competencies or leadership abilities, but rather a comment on the egalitarian work culture that Sweden and other Nordic countries are known for. An article published in The Local in 2019 discussed flat hierarchies in Sweden and interviewed a Nordic CEO: “You don’t need to be a group manager or a boss to be in charge [...] everybody can make a decision—as long as it aligns to the company's plan and you take responsibility for it and inform everybody who would be affected by it.”

In other words, Nordic employees are encouraged to participate in and influence company decisions, supposedly to a greater extent than in other parts of the world, something we believe many Nordic citizens and expats in the area would agree with. (If you want to know more about flat hierarchies in the Nordics, we recommend taking a look at the article. A flat hierarchy doesn’t come without its own set of challenges.)

Let’s now change gears and get back to the community stats we mentioned earlier:

Fig 5. Based on community stats, BigQuery is clearly the community favorite with the biggest numbers among data warehouse alternatives. Source: Reddit, Stack Overflow, as of March 22, 2022

BigQuery being the clear favorite has a few implications:

It’s a strong indication of the bottom-up, grassroots support the platform has from data engineers around the world: many data engineers are working with BigQuery and are invested enough in making it work to participate in communities, asking questions and sharing tips and tricks.
It has flywheel characteristics: anyone who wants to get started with a data warehouse can start Googling and comparing the alternatives. Once they find out that BigQuery has a big community with a plethora of resources and information available, they join the BigQuery community and start contributing themselves, increasing the community size. For other potential BigQuery users, simply rinse and repeat.

By now, you’ve probably put two and two together and can see where we’re going with this. If BigQuery is the community favorite with strong bottom-up, grassroots support, and Nordic companies encourage all employees regardless of tenure to participate in decision-making, wouldn’t it make sense that engineers, when it comes to choosing technology, would recommend the alternative they’ve heard so much about on the internet and from friends, and maybe even played around with themselves in their spare time?

For anyone familiar with product-led growth (PLG) and community-led growth, what we’re essentially saying is that the Nordics has a work culture particularly well-suited for a PLG and community-led growth motion.

Final thoughts

At Validio, we are building the next generation data quality validation and monitoring platform. As such, we expect solutions in this category to soon find themselves on these lists. Not only have we heard in our 100+ discussions with modern data teams that data quality management is becoming a top priority, but companies such as GitLab are also publicly announcing data quality strategies and implementing specific OKRs.

What’s clear to us is that the number of tools out there won’t shrink any time soon. Regardless of what problem a new data tool aims to solve, it’s paramount that it integrates and plays nicely with existing tools, whether it’s data warehouses like BigQuery, Redshift and Snowflake, or some ETL/ELT or workflow orchestration tool. This way, data teams can mix and match and pick the tools that best suit their needs. As Bessemer notes, new infrastructure providers need to work seamlessly with a company’s predominant tools if they are to achieve any real adoption.

Lastly, data engineering is not, and never has been, about any particular technology. Data engineering is about designing, building, and maintaining systems and data platforms that incorporate best-of-breed and fit-for-purpose technologies and practices in a cost-effective way. Tools need to provide value, offer the right type of deployment optionality and abstract complexity away from data engineering.

Notes

(1) Similar to Validio, Secoda has interviewed 100+ data engineers. At the time of writing, they are a small team of about 10 people based in Toronto, Canada. We assume that most of the data engineers they interviewed are also based in Canada.
(2) Immuta has segmented their data based on DataOps maturity. The data shown here is for companies with ‘mature DataOps’, as we believe it best reflects our sample of companies.
(3) A quick Google query on “[prominent Nordic company]” + “data engineer” results in data engineering job ads that often list the data stack being used by the company. Doing the exercise for H&M and IKEA, for example, shows that they’re on GCP and using BigQuery.
(4) Open goal for AWS or Snowflake to prove us wrong with official stats, but hey, in that case we got some market insights out of them.