
4 ways bad data is ruining your data engineering career

Monday, Jun 27, 2022
Sara Landfors

Interest in data quality is rising sharply, as the 2021 updates to Matt Turck’s Machine Learning, Analytics and Data landscape illustrate. At the organizational level, the upside of achieving data quality is clear: better decision-making, better operations, better products, and the list goes on. Poor data quality, or “bad data”, also has an impact on a personal level, especially if you work as a data professional. In this article, we list four ways bad data holds you back as a data engineer.

1. You lose organizational trust in data

Bad data can cause your entire organization to lose trust in data as a whole, undermining the efforts of the data team. A company may aspire to become “data-driven,” but the reality can be quite different, especially if the data itself can’t be trusted. Sometimes one or two data quality failures are enough for stakeholders to turn their backs on data altogether.

Admittedly, we don’t know how the incident was handled internally, but this example helps paint a picture. Put yourself in the shoes of a stakeholder at Twitter when it was discovered that the company had overstated the number of daily users on its service for three years straight, overcounting by up to 1.9 million users each quarter. How would you feel about the trustworthiness of the data?

In the end, as a data engineer, you may lose the chance to become a trusted advisor to the organization. More broadly, you might see opportunities dwindle as stakeholders hesitate to sponsor data use cases internally.

2. You have less time to implement robust and scalable systems with real impact for your organization

If you’re constantly firefighting bad data, you have less time for everything else. That “everything else” includes the projects and tasks that have real impact for your organization and your team. It might mean building robust data systems that can gracefully handle unavailable nodes. It might also mean building scalable data systems that can handle higher data throughput as your organization grows. Whatever that work is, it is where you as a data engineer bring value to your organization, and bad data keeps you from doing it. But it’s not only about the value you bring to the organization. It’s also about the value you as an individual get out of your work, as we discuss next.

3. You have less time to work on—and learn about—things you’re passionate about 

Bad data can throw plans and schedules out the window. Many data teams know what it’s like to cancel meetings and drop priorities in order to fix a pipeline bug that affects critical dashboards downstream. This shift of focus to urgent matters can hurt your ability to concentrate on the important but non-urgent things in your career. One of the first casualties is dedicated learning time.

In the short term, reprioritization often makes sense, since commitments to other teams and colleagues usually take precedence over commitments you make to yourself. In the long run, however, the effects can be detrimental. If you make a habit of canceling or deprioritizing personal learning time to firefight bad data, your work will become less interesting. You also risk losing touch with the latest trends and technologies in your field. Ultimately, you risk caring less about your work: what used to be 110% effort on your part might decline to 80%. Over time, your skill set and your contribution to the company risk losing their value.

4. You find it harder to advocate for the value of your role and team

Lastly, bad data can make it harder to justify why a data team exists at all. Advocating to decision-makers for a larger team, more budget for tooling, or maybe even a raise becomes more challenging. A C-level executive might reason that the team should deliver value first, and that its budget can be increased afterward.

This situation is symptomatic of the data engineer’s frequent role as an unsung hero: when things work well, a job well done may earn little recognition. The exception, of course, is when the data team can quantify and prove the value of its work, as Rebecka Storm highlighted in a Heroes of Data article. When things don’t work well, on the other hand, as is the case with bad data, the downside can be substantial in the eyes of the rest of the organization.

It’s safe to say that bad data can become a true villain in the lives of data practitioners, capable of doing real damage to their careers. But that’s not to say the future is bleak: there is a lot that can be done about bad data, and a growing number of products are available to tackle the issue. That will be the focus of another article.

Do you agree with this article? Have you experienced any of these four effects yourself? Let us know!