Hey there! Welcome to the Marketer Of The Month blog!
We recently interviewed Kirill Skaletskiy for our monthly podcast, ‘Marketer of the Month’! We had some amazing, insightful conversations with Kirill, and here’s what we discussed:
1. Data Governance’s Role: Vital for data management, a key aspect of being a data-driven company.
2. Good Data Quality Examples: Varies by use case, driven by data requirements and context.
3. Data Quality Challenges: Responsibility and issue remediation processes pose hurdles.
4. Open-Source vs. Commercial Tools: Pros and cons of both options for data quality management.
5. Overcoming Challenges: Data profiling and collaboration with data owners and consumers.
6. Data Monitoring: Importance of data lineage and continuous data quality measurement.
About our host:
Dr. Saksham Sharda is the Chief Information Officer at Outgrow.co. He specializes in data collection, analysis, filtering, and transfer by means of widgets and applets. Interactive, cultural, and trending widgets designed by him have been featured on TrendHunter, Alibaba, ProductHunt, New York Marketing Association, FactoryBerlin, Digimarcon Silicon Valley, and at The European Affiliate Summit.
About our guest:
Kirill Skaletskiy is a senior data quality analyst at Just Eat Takeaway, leveraging his expertise in collecting, interpreting, analyzing, and visualizing data to empower businesses in achieving their desired outcomes. Proficient in both front-end dashboard design and back-end data processing, he utilizes a range of programming languages, including Python, SQL, DAX, and Excel VBA.
EPISODE 141: The Art of Data Governance: Just Eat’s Data Quality Analyst Kirill Skaletskiy on the Power of Data
Saksham Sharda: Hi, everyone. Welcome to another episode of Outgrow’s Marketer of the Month. I’m your host, Dr. Saksham Sharda, and I’m the creative director at Outgrow.co. And for this month we are going to interview Kirill Skaletskiy, who is a senior data quality analyst at Just Eat Takeaway.
Kirill Skaletskiy: Great to be here. Thank you.
The Rapid Fire Round!
Saksham Sharda: We’re going to start with the rapid-fire round. You can only answer with one word or one sentence to these questions.
Kirill Skaletskiy: One word or one sentence.
Saksham Sharda: Yeah, All right. The first one is, at what age do you want to retire?
Kirill Skaletskiy: 80.
Saksham Sharda: How long does it take you to get ready in the mornings?
Kirill Skaletskiy: Five minutes.
Saksham Sharda: Most embarrassing moment of your life?
Kirill Skaletskiy: I don’t know this one.
Saksham Sharda: You can pass if you don’t want to answer. Favorite color?
Kirill Skaletskiy: Green.
Saksham Sharda: What time of day are you most inspired?
Kirill Skaletskiy: Night.
Saksham Sharda: How many hours of sleep can you survive on?
Kirill Skaletskiy: The minimum is four.
Saksham Sharda: Fill in the blank. An upcoming technology trend is _____.
Kirill Skaletskiy: The upcoming technology trend is based on data governance.
Saksham Sharda: The city in which the best kiss of your life happened?
Kirill Skaletskiy: Moscow.
Saksham Sharda: Pick one. Mark Zuckerberg or Elon Musk.
Kirill Skaletskiy: Elon Musk.
Saksham Sharda: The biggest mistake of your career?
Kirill Skaletskiy: Good question. There were a lot of them. I can’t choose one.
Saksham Sharda: How do you relax?
Kirill Skaletskiy: With my wife.
Saksham Sharda: How many cups of coffee do you drink per day?
Kirill Skaletskiy: One or two.
Saksham Sharda: A habit of yours that you hate?
Kirill Skaletskiy: Overworking.
Saksham Sharda: The most valuable skill you’ve learned in life.
Kirill Skaletskiy: Stakeholder management.
Saksham Sharda: Your favorite Netflix show?
Kirill Skaletskiy: I watch Anime.
Saksham Sharda: Are you an early riser or a night owl?
Kirill Skaletskiy: Night owl.
Saksham Sharda: One-word description of your leadership style?
Kirill Skaletskiy: Energy.
Saksham Sharda: The top priority in your daily schedule?
Kirill Skaletskiy: Meetings with my team.
Saksham Sharda: Memorable career milestone?
Kirill Skaletskiy: Moving to Amsterdam.
Saksham Sharda: Okay and a recent business innovation that caught your attention?
Kirill Skaletskiy: AI and Data Quality.
The Big Questions!
Saksham Sharda: Alright, so that was the end of the rapid fire. You did well. Thank you. And now the long-form questions, which you can answer with as much time and ease as you’d like.
Kirill Skaletskiy: Okay, sure.
Saksham Sharda: So how did you get started in working with data management?
Kirill Skaletskiy: Okay, that is a good question. Over the course of my career I had the chance to work as a BI analyst, then as a data analyst, and then to do a bit of data engineering. Everywhere I worked there was one problem, no matter what the company size: data quality issues, mostly because no one was responsible for data quality. There was no process for it. Who is responsible? What is the issue? What are the remediation processes? I was very upset about this. Then one day I was reading an article about the top data jobs for the future, and data governance was one of them. And I was like, whoa, that’s really what I want to do, especially the data quality part of it. I found one of the companies that had a data governance process in place, which was one of the biggest retail companies in Russia, with more than 22,000 stores. You can imagine how big this company is, and to my liking it was a very fast-paced company with a good data-driven culture. I learned a lot there. And this is how I began my data quality and data governance journey.
Saksham Sharda: So do you always put data governance first and instinct next? What role do you think instinct has to play in today’s society now that data is taking over everything?
Kirill Skaletskiy: So first of all, data quality is a part of data governance, so we can’t separate the two. But from the other side, data governance is now getting more and more hype. Not as much as AI, I would say, but I still think it’s crucial, especially for big companies, to have data management in place. We do people management, we do software management, but a lot of companies are only now trying to establish data management. I think that’s a crucial part, because every company tries to be data-driven, and a huge part of being data-driven is having good data management in place.
Saksham Sharda: What does data quality mean to you? And what are its main goals?
Kirill Skaletskiy: That’s a very good question, but maybe I can ask first: how would you define data quality?
Saksham Sharda: Is that what you meant by a conversational interview?
Kirill Skaletskiy: Yes. It’s always fun to ask this question because people always give different answers.
Saksham Sharda: Well, I would assume data quality is maintaining data that is of value instead of data that is just everywhere. But how do you determine what is of value and what is not unless you have a sufficient amount of everything? Yeah, so I guess that’s how I vaguely think of it.
Kirill Skaletskiy: That’s a very good answer. It’s close to my answer. So basically there are two answers. One is official and one is more personal. The official answer says that data is of high quality if it meets the expectations of data consumers, if it’s fit for purpose, and blah, blah, blah. For me personally, you can imagine data quality is like having a treasure map. If you have proper directions and there are no missing steps, you’ll get to your treasure as fast as you can. And in our world, the treasure for the companies that we work for is always, you know, getting more and more money and more revenue. So if you don’t have any issues with your map, with your path to the treasure, then you’ll get it faster. That’s data quality for me.
Saksham Sharda: So could you provide some real examples of what good data quality looks like in practice?
Kirill Skaletskiy: Once again, I would say that it mostly depends on the use case, because it’s based on the requirements. Let me give you an example. If you have 5% duplicates in a table, is it good or not? We don’t know. Technically, of course, any number of duplicates is bad, but it’s based on the requirements. If we have, I don’t know, hundreds of billions of rows, maybe two duplicate rows is not a big problem, but if we have 100 rows, two duplicate rows is a big problem. So it always depends on the requirements. Once again, let’s go back to our treasure map. Imagine that we have two treasure maps: yours has five steps and specific coordinates, and mine has seven steps and different coordinates. There’s an inconsistency between our maps, but they are for the same treasure. So that’s a good example of a data quality issue. If we compare maps for the same treasure, we see different steps and we don’t know which one is right.
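The point that an acceptable share of duplicates depends on the requirements can be sketched in a few lines of Python. This is a minimal illustration, not anything described in the interview: the `orders` rows, the `order_id` key, and both thresholds are hypothetical.

```python
from collections import Counter


def duplicate_rate(rows, key):
    """Fraction of rows that are extra copies of an earlier row with the same key."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row[col] for col in key) for row in rows)
    extra_copies = sum(n - 1 for n in counts.values())
    return extra_copies / len(rows)


def meets_requirement(rows, key, max_duplicate_rate):
    """The same data can pass or fail depending on the agreed threshold."""
    return duplicate_rate(rows, key) <= max_duplicate_rate


# Hypothetical sample: one extra copy of order 2 among four rows.
orders = [
    {"order_id": 1, "city_id": 5},
    {"order_id": 2, "city_id": 5},
    {"order_id": 2, "city_id": 5},
    {"order_id": 3, "city_id": 7},
]

print(duplicate_rate(orders, ["order_id"]))                             # 0.25
print(meets_requirement(orders, ["order_id"], max_duplicate_rate=0.05))  # False
print(meets_requirement(orders, ["order_id"], max_duplicate_rate=0.30))  # True
```

The check itself never changes; only the threshold agreed with the data consumers decides whether the table counts as good.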
Saksham Sharda: So, speaking of that, when it comes to data quality, where do different people involved in managing data disagree on what it means and how to achieve it?
Kirill Skaletskiy: Based on my experience at least, and from what I see in different companies and in different talks, the main problem is not disagreement on how to manage data quality. There is some, and basically that will be the topic of tomorrow’s talk: how we as a data governance team managed to introduce a unified approach to data quality across the whole of JET. Just to give you a quick spoiler, we have a lot of engineering teams in JET that do data quality differently, and this leads to a lack of comprehensive reporting and of a single source of truth for data quality. As the data governance team, we introduced a single source of truth with different controls in place, with different specifications and processes. But let me keep this for tomorrow’s talk.
Kirill Skaletskiy: But basically, the biggest problem that I see with data quality is finding the person who will be responsible for it. Just one good example: whenever you ask a person if data quality is important, everyone will say yes, of course, data quality is important. But then ask the same person if they want to be responsible for data quality, and most of them will say no. This is the biggest problem, I guess, in my experience and based on what I see in the market: finding the individuals responsible for data quality, as well as establishing issue remediation processes. It’s easy to find that we have duplicates in a table, but how will we resolve the issue? That’s the open question. I think this is the biggest challenge that organizations face when they’re trying to manage data quality.
Saksham Sharda: So there are both open-source and commercial tools for improving data quality. Can you explain the main difference between these two options?
Kirill Skaletskiy: Of course, I can’t speak on behalf of all the open-source tools and paid tools, but I’m always trying to be in touch with different vendors to see what they’re doing. Basically, I would say that it’s more or less the same as in software engineering or any other technology area. When you go open source, you are responsible for it. You pull the repo and then it’s your product: you need to deploy it on your own, you need to monitor it on your own, and if you want any additional features, you are the one who needs to implement them. With paid tools, going with the vendors, everything is already there, which from one side is good, but from the other side there are risks. Vendor lock-in, for example. You may use the tool for many, many years, but then, I don’t know, the price increases 10 times or the tool just goes off the market. Then you are stuck with the tool, and if you want some new features, you will not be able to add them because the tool is not open source. So having both open-source and paid tools is good. It’s just a matter of whether the company can use them, because one important thing to know is that when you pay for tools that are not deployed in your own cloud environment, the data quality tool can have access to your data, which can be critical. For example, a company like Just Eat Takeaway, which is a public company, doesn’t want to share data with anyone; there are privacy requirements from our InfoSec. So open source is my personal choice. But of course, if you can, using some vendor tooling where everything is already in place for you, and you don’t need to spend your time setting these things up, also works.
Saksham Sharda: Could you give us some examples of some of the open-source tools that you’d like to recommend to our audience?
Kirill Skaletskiy: Of course. In terms of data validation, no matter what company I work for, I always try to bring in Great Expectations, which in my personal opinion is one of the best data validation tools. Then, of course, it’s important to mention Soda, which is another open-source tool. Both Great Expectations and Soda have managed versions, so you can pay for a managed solution, but you can also go open source. I think it’s also important to mention something that is not data quality but is part of data governance: a lot of data observability and data catalog tools, for example DataHub from LinkedIn, or OpenMetadata. There are a lot of open-source tools that I love to use.
Saksham Sharda: So what challenges do engineers and others face when trying to figure out what information is important for finding problems in their work processes?
Kirill Skaletskiy: Do you mean problems in the data? I would say that the biggest problem when we try to establish a data quality process is finding the data requirements. Imagine you have a table with hundreds of columns and billions of rows and you want to do data quality on it. You look at the table and you see numbers, you see strings, you see some null values, but you don’t know if there should be a null value there, or if a row should be a duplicate or not, because you may not know the primary key of the table, for example. I think the biggest challenge is understanding these business rules, because you can do technical data quality checks easily. For example, timeliness checks to see if the data was updated when it was expected, duplicate checks if you know the primary key, maybe some basic null checks. If you do data profiling and you see that, for example, a column has 100% non-null values, you can put a data quality check on the null values. But finding the business rules is different. For example, at Just Eat Takeaway, as a food delivery company, we can say that if an order comes from a specific country, it should have a specific ID with a specific value or a specific code. These business rules are very hard to find, and I guess this is the biggest challenge.
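The difference between technical checks and business rules can be illustrated with a short Python sketch. The timeliness and null checks need no domain knowledge, while the rule function encodes something only the business can confirm. The rule used here (orders from the Netherlands must be priced in EUR), the column names, and the sample rows are all hypothetical and are not taken from the interview.

```python
from datetime import datetime, timedelta


def is_timely(last_updated, now, max_age):
    """Technical check: was the table refreshed within the expected window?"""
    return now - last_updated <= max_age


def has_no_nulls(rows, column):
    """Technical check: the column holds a value in every row."""
    return all(row.get(column) is not None for row in rows)


def violations(rows, rule):
    """Business check: return the rows for which the rule does not hold."""
    return [row for row in rows if not rule(row)]


def nl_orders_use_eur(row):
    """Hypothetical business rule supplied by a data owner."""
    return row["country"] != "NL" or row["currency"] == "EUR"


orders = [
    {"order_id": 1, "country": "NL", "currency": "EUR"},
    {"order_id": 2, "country": "NL", "currency": "GBP"},  # breaks the rule
    {"order_id": 3, "country": "UK", "currency": "GBP"},
]

now = datetime(2024, 1, 2, 9, 0)
print(is_timely(datetime(2024, 1, 2, 3, 0), now, timedelta(hours=24)))  # True
print(has_no_nulls(orders, "order_id"))                                # True
print([row["order_id"] for row in violations(orders, nl_orders_use_eur)])  # [2]
```

The first two functions can be written by looking at the table alone. The third is trivial to code but only becomes possible once someone who knows the business has stated the rule.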
Saksham Sharda: And so then how do you recommend overcoming these challenges to improve workflow efficiency?
Kirill Skaletskiy: First of all, data profiling, if possible. If you have billions of rows and hundreds of columns, data profiling may be a bit expensive, but of course you can take samples of the data and profile those. So the first step, I would say, is data profiling. Data profiling will tell you how many null values you have, what the distribution is, and all of this. And then later, find the data owners and work closely with them, which is another very, very big challenge. As I mentioned, everyone says data quality is important. Who wants to be responsible? Almost no one. So it’s about finding the data owners and working closely with the business users, with the data consumers, with people building dashboards on top of the data, and with data engineers, of course. But mostly, I would say, work closely with data consumers, because data consumers are the people who consume this data daily. They know all the possible issues there. They work with the data. They mostly do some data cleaning on their own, and you can find the issues through them.
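Profiling a sample rather than the full table can be sketched as follows. This is a minimal standard-library illustration under assumed names: the `profile` function, its `sample_size` and `seed` parameters, and the example rows are hypothetical, and a real profiler would report far more than these three statistics.

```python
import random
from collections import Counter


def profile(rows, sample_size=1000, seed=0):
    """Profile a random sample: null fraction, distinct count, and top values per column."""
    rng = random.Random(seed)  # fixed seed so repeated runs profile the same sample
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    columns = sorted({column for row in sample for column in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in sample]
        non_null = [value for value in values if value is not None]
        report[column] = {
            "null_fraction": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
            "top_values": Counter(non_null).most_common(3),
        }
    return report


# Hypothetical rows; in practice these would be sampled from the warehouse.
rows = [
    {"city": "Amsterdam", "tip": None},
    {"city": "Amsterdam", "tip": 1.5},
    {"city": "Berlin", "tip": None},
    {"city": "Berlin", "tip": 2.0},
]

for column, stats in profile(rows).items():
    print(column, stats)
```

A column that shows a null fraction of zero in the profile is a natural candidate for a not-null check, while a surprising distribution is a question to take to the data owners and consumers.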
Saksham Sharda: So can you give some examples of adding data quality checks at the beginning of a data workflow to help spot issues?
Kirill Skaletskiy: Yeah, of course. There are a lot of them, but let’s take the simplest one. One of the things that currently has maybe the same hype as AI, in terms of data governance, is data contracts. And one part of a data contract is checking data types, for example. So let’s say we have a consumer and a producer, like the source and the target table, and we do a daily ingestion. We can do data validation on the data types, so we always know that, I don’t know, the name field is always text or a string. It should not be an integer, a float, or any other data type. If this changes, our pipeline will break. So this is a simple example, but of course we can also do an integrity check. Once again, in terms of Just Eat Takeaway, let’s say that we can have an order with a city ID of five, but in the city table we don’t have a city ID of five, which can lead to inconsistent reporting. The joins won’t work there. We can prevent this by doing pre-assertions, like a pre-ingestion data quality check. Also, let’s say that we pay not only for data storage but also for data processing. If there are data quality issues, it may be worth checking first: we don’t want to bring useless data into our data marts, and we don’t want to pay for processing that we will then need to run again. Maybe it’s worth stopping the whole pipeline, fixing the issue at the source, and then re-ingesting. That’s another use case where you can implement data quality at the beginning of your pipeline rather than post-ingestion or post-transformation.
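The three ideas in this answer (type checks from a data contract, an integrity check against a reference table, and stopping the pipeline before paying for processing) can be combined in one small Python sketch. Everything named here is an assumption for illustration: the `CONTRACT` columns, the `DataQualityError` exception, and the sample batch are hypothetical, not a description of any real pipeline.

```python
class DataQualityError(Exception):
    """Raised to stop the pipeline before bad data is loaded and processed."""


# Hypothetical data contract: column name -> expected Python type.
CONTRACT = {"order_id": int, "customer_name": str, "city_id": int}


def type_violations(rows, contract):
    """Return (row index, column) pairs whose value does not match the contract."""
    problems = []
    for index, row in enumerate(rows):
        for column, expected_type in contract.items():
            if not isinstance(row.get(column), expected_type):
                problems.append((index, column))
    return problems


def orphaned_keys(rows, column, reference_values):
    """Return foreign-key values that are missing from the reference table."""
    return sorted({row[column] for row in rows} - set(reference_values))


def pre_ingestion_check(rows, contract, known_city_ids):
    """Run both checks and halt on the first failure, before any load happens."""
    bad_types = type_violations(rows, contract)
    if bad_types:
        raise DataQualityError(f"contract type violations: {bad_types}")
    orphans = orphaned_keys(rows, "city_id", known_city_ids)
    if orphans:
        raise DataQualityError(f"city_id values not in city table: {orphans}")


batch = [
    {"order_id": 1, "customer_name": "Anna", "city_id": 1},
    {"order_id": 2, "customer_name": "Ben", "city_id": 5},  # city 5 does not exist
]

try:
    pre_ingestion_check(batch, CONTRACT, known_city_ids={1, 2, 3})
    print("batch accepted, continue with ingestion")
except DataQualityError as error:
    print(f"pipeline stopped: {error}")
```

Raising an exception rather than logging a warning is the design choice that matters here: the load and the paid processing never start, so the fix can happen at the source before re-ingesting.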
Saksham Sharda: Are there any specific instances where early data quality checks led to the discovery of significant data issues that you know of?
Kirill Skaletskiy: There are a lot of them. That’s why we are hired. I don’t know how much I can share, but the biggest example is duplicates in some of the data marts with orders. Just Eat Takeaway, like many other companies, is a public company, so we share our KPIs publicly because we are traded: how many orders we have, how many customers we have. Having issues with these KPIs can lead to significant challenges for the company.
Saksham Sharda: Let’s pivot a little. So in the world of data problems, what’s the equivalent of tracking metrics across different parts of a service? Like we do application monitoring?
Kirill Skaletskiy: I would say it depends on the scope of the data platform. Within the data platform, having a proper data quality process in place is always based on tooling and processes. And to deploy tooling, you mostly use the same tools that you use for software engineering. You use Kubernetes, you use Grafana for monitoring, and you use Elasticsearch for some data storage, and databases. But in terms of the metrics, what is important, I would say, is data lineage. Data lineage is very important for data quality. Understanding how the data flows can help us identify, if there is an issue in the source, which tables would be impacted in our reporting layer, for example. We can notify all the owners that there is an issue with the source and that therefore all the dashboards in Tableau, for example, may have inaccurate data. So having data lineage is very important, as well as measuring data quality as a continuous process. So not like, we check for duplicates once, there are no duplicates, okay, this table is of high quality. No, we need to do this daily or weekly, or during every ingestion maybe, or at least continuously, and measure the percentage of rows that haven’t met our expectations. Then we can always have a data quality dashboard and understand what the quality of the data was a week ago, a month ago, and whether we improved or not.
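Both halves of this answer lend themselves to a small Python sketch: walking a lineage graph to find what a broken source affects, and recording the percentage of failing rows on every run instead of checking once. The `LINEAGE` graph, the table names, and the sample rows are hypothetical; a real setup would read lineage from a catalog and store the percentages over time for a dashboard.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables or dashboards built from it.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.daily_orders", "mart.customer_kpis"],
    "mart.daily_orders": ["dashboard.orders_overview"],
    "mart.customer_kpis": [],
}


def downstream(lineage, source):
    """Breadth-first walk returning everything affected by an issue in `source`."""
    seen = {source}
    queue = deque([source])
    affected = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


def failing_percentage(rows, check):
    """Share of rows (0-100) that do not meet the expectation in this run."""
    if not rows:
        return 0.0
    failed = sum(1 for row in rows if not check(row))
    return 100.0 * failed / len(rows)


print(downstream(LINEAGE, "raw.orders"))

todays_rows = [{"order_id": 1}, {"order_id": None}, {"order_id": 3}, {"order_id": 4}]
print(failing_percentage(todays_rows, lambda row: row["order_id"] is not None))  # 25.0
```

Storing the output of `failing_percentage` after each ingestion, keyed by date and table, is what makes it possible to say whether quality improved compared with a week or a month ago.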
Saksham Sharda: So the last question for you is of a personal kind. It is, if not data, what would you be doing in your life right now?
Kirill Skaletskiy: Music
Saksham Sharda: Which is also data, I suppose.
Kirill Skaletskiy: Yeah. Everything is mathematics.
Saksham Sharda: Okay.
Kirill Skaletskiy: Music.
Saksham Sharda: So do you make music of your own on the side?
Kirill Skaletskiy: I try but I can’t, but I try. And painting. I also paint, but I love it.
Saksham Sharda: Thanks, everyone for joining us for this month’s episode of Outgrow’s Marketer of the Month. That was Kirill Skaletskiy who is a senior data quality analyst at Just Eat Takeaway.
Kirill Skaletskiy: Pleasure. Thanks for having me.
Saksham Sharda: Check out the website for more details and we’ll see you once again next month with another Marketer of the Month.