
The Future of the Modern Data Stack in 2022 – Atlan


Featuring the 6 big ideas you should know from 2021

As the data world slowed down for the holidays, I got some downtime to step back and think about the last year. And I can’t help but think, wow, what a year it’s been!

Is it just me, or did data go through five years’ worth of change in 2021?

It’s partially COVID time, where a month feels like a day and a year at the same time. You’d blink, and suddenly there would be a new buzzword dominating Data Twitter. It’s also partially the deluge of VC money and crazy startup rounds, which added fuel to the year’s data fire.

With so much hype, it’s hard to know which trends are here to stay and which will disappear just as quickly as they arose.

This blog breaks down the six ideas you should know about the modern data stack going into 2022: the ones that exploded in the data world last year and don’t seem to be going away.

Future of the modern data stack in 2022: Data Mesh

You probably know this term by now, even if you don’t exactly know what it means. The idea of the “data mesh” came from two 2019 blogs by Zhamak Dehghani, Director of Emerging Technologies at Thoughtworks:

  1. How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
  2. Data Mesh Principles and Logical Architecture

Its core idea is that companies can become more data-driven by moving from centralized data warehouses and lakes to a “domain-oriented decentralized data ownership and architecture” driven by self-serve data and “federated computational governance”.

As you can see, the language around the data mesh gets complex fast, which is why there’s no shortage of “what actually is a data mesh?” articles.

The idea of the data mesh has been quietly growing since 2019, until suddenly it was everywhere in 2021. The Thoughtworks Technology Radar moved Data Mesh’s status from “Assess” to “Trial” in just one year. The Data Mesh Learning Community launched, and their Slack group got over 1,500 signups in 45 days. Zalando started doing talks about how it moved to a data mesh.

Soon enough, hot takes were flying back and forth on Twitter, with data leaders arguing over whether the data mesh is revolutionary or ridiculous.

Future of the modern data stack in 2022: Data Mesh

In 2022, I think we’ll see a ton of platforms rebrand and offer their services as the “ultimate data mesh platform”. But the thing is, the data mesh isn’t a platform or a service that you can buy off the shelf. It’s a design concept with some wonderful ideas like distributed ownership, domain-based design, data discoverability, and data product shipping standards, all of which are worth trying to operationalize in your organization.

So here’s my advice: As data leaders, it is important to stick to the first principles at a conceptual level, rather than buy into the hype that you’ll inevitably see in the market soon. I wouldn’t be surprised if some teams (especially smaller ones) can achieve the data mesh architecture through a fully centralized data platform built on Snowflake and dbt, while others will leverage the same principles to consolidate their “data mesh” across complex multi-cloud environments.

Future of the modern data stack in 2022: Metrics Layer

Metrics are critical to assessing and driving a company’s growth, but they’ve been struggling for years. They’re often split across different data tools, with different definitions for the same metric across different teams or dashboards.

In 2021, people finally started talking about how the modern data stack could fix this issue. It’s been called the metrics layer, metrics store, headless BI, and even more names than I can list here.
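
To make the idea concrete, here is a minimal sketch of what "define a metric once, use it everywhere" could look like. It is an illustration only: the `Metric` class, the `compile_metric` helper, and the `orders` table are all hypothetical names, not the API of dbt or any of the tools mentioned in this post.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Metric:
    """One shared definition of a business metric (hypothetical schema)."""
    name: str
    table: str
    expression: str                 # SQL aggregate, e.g. "SUM(amount)"
    filters: Tuple[str, ...] = ()   # SQL predicates that always apply


def compile_metric(metric: Metric, group_by: Optional[str] = None) -> str:
    """Render the single shared definition as SQL for any consumer."""
    select = f"{metric.expression} AS {metric.name}"
    if group_by:
        select = f"{group_by}, {select}"
    sql = f"SELECT {select} FROM {metric.table}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# Assumed example: "revenue" only counts completed orders.
REVENUE = Metric(
    name="revenue",
    table="orders",
    expression="SUM(amount)",
    filters=("status = 'completed'",),
)

# Two different consumers reuse the same definition instead of re-deriving it.
dashboard_sql = compile_metric(REVENUE, group_by="order_month")
finance_sql = compile_metric(REVENUE)
```

Because both queries are generated from `REVENUE`, the dashboard and the finance report cannot silently drift apart on what "revenue" means, which is the problem described above.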

It started in January, when Base Case proposed “Headless Business Intelligence”, a new approach to solving metrics problems. A couple of months later, Benn Stancil from Mode talked about the “missing metrics layer” in today’s data stack.

That’s when things really took off. Four days later, Mona Akmal and Aakash Kambuj from Falkon published articles about making metrics first-class citizens and the “modern metrics stack”.

Two days after that, Airbnb announced that it had been building a home-grown metrics platform called Minerva to solve this issue. Other prominent tech companies soon followed suit, including LinkedIn’s Unified Metrics Platform, Uber’s uMetric, and Spotify’s metrics catalog in their “new experimentation platform”.

Just when we thought this fervor had died down, Drew Banin (CPO and Co-Founder of dbt) opened a PR on dbt-core in October. He hinted that dbt would be incorporating a metrics layer into its product, and even included links to those foundational blogs by Benn and Base Case. The PR blew up and reignited the discussion around building a better metrics layer in the modern data stack.

Meanwhile, a bunch of early-stage startups have launched to compete for this space. Transform is probably the biggest name so far, but Metriql, Lightdash, Supergrain, and Metlo also launched this year. Some bigger names are also pivoting to compete in the metrics layer, such as GoodData’s foray into Headless BI.

Future of the modern data stack in 2022: Metrics Layer

I’m extremely excited about the metrics layer finally becoming a thing. A few months ago, George Fraser from Fivetran had an unpopular opinion that all metrics stores will evolve into BI tools. While I don’t fully agree, I do believe that a metrics layer that isn’t tightly integrated with BI is unlikely to ever become commonplace.

However, existing BI tools aren’t really incentivized to integrate an external metrics layer into their tools… which makes this a chicken and egg problem. Standalone metrics layers will struggle to encourage BI tools to adopt their frameworks, and will be forced to build BI like Looker was forced to many years ago.

This is why I’m really excited about dbt announcing their foray into the metrics layer. dbt already has enough distribution to encourage at least the modern BI tools (e.g. Preset, Mode, Thoughtspot) to integrate deeply into the dbt metrics API, which may create competitive pressure for the larger BI players.

I also think that metrics layers are so deeply intertwined with the transformation process that intuitively this makes sense. My prediction is that we’ll see metrics become a first-class citizen in more transformation tools in 2022.

Future of the modern data stack in 2022: Reverse ETL

For years, ETL (Extract, Transform, Load) was how data teams populated their systems. First, they’d pull data from third-party systems, clean it up, and then load it into their warehouses. This was great because it kept data warehouses clean and orderly, but it also meant that it took forever to get data into warehouses. Sometimes, data teams just wanted to dump raw data into their systems and deal with it later.

That’s why many companies moved from ETL to ELT (Extract, Load, Transform) a couple of years ago. Instead of transforming data first, companies would send raw data into a data lake, then transform it later for a specific use case or problem.
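
The difference between the two orderings is easier to see side by side. The sketch below uses an in-memory SQLite database as a stand-in for a warehouse; the sample rows, table names, and cleaning rules are made up for illustration.

```python
import sqlite3

# Made-up raw records as they might arrive from a third-party system.
RAW_EVENTS = [
    ("  Alice@Example.com ", "42.50"),
    ("bob@example.com", "n/a"),        # unparseable amount
    ("carol@example.com", "10"),
]


def clean(email, amount):
    """Transform step: normalize the email, parse the amount, reject bad rows."""
    try:
        return email.strip().lower(), float(amount)
    except ValueError:
        return None


def run_etl(conn):
    """ETL: transform inside the pipeline, then load only the clean rows."""
    conn.execute("CREATE TABLE payments (email TEXT, amount REAL)")
    cleaned = (clean(email, amount) for email, amount in RAW_EVENTS)
    rows = [row for row in cleaned if row is not None]
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)


def run_elt(conn):
    """ELT: load raw rows untouched, then transform later with SQL."""
    conn.execute("CREATE TABLE raw_payments (email TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_payments VALUES (?, ?)", RAW_EVENTS)
    conn.execute(
        """
        CREATE VIEW payments AS
        SELECT lower(trim(email)) AS email, CAST(amount AS REAL) AS amount
        FROM raw_payments
        WHERE amount GLOB '[0-9]*'
        """
    )


etl_conn = sqlite3.connect(":memory:")
run_etl(etl_conn)
elt_conn = sqlite3.connect(":memory:")
run_elt(elt_conn)

query = "SELECT email, amount FROM payments ORDER BY email"
etl_rows = etl_conn.execute(query).fetchall()
elt_rows = elt_conn.execute(query).fetchall()
raw_count = elt_conn.execute("SELECT COUNT(*) FROM raw_payments").fetchone()[0]
```

Both approaches end up with the same clean `payments` data, but only the ELT version still has the untouched raw rows available for a different transformation later.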

In 2021, we got another major evolution in this idea: reverse ETL. This concept first started getting attention in February, when Astasia Myers (Founding Enterprise Partner at Quiet Capital) wrote an article about the emergence of reverse ETL.

Since then, Hightouch and Census (both of which launched in December 2020) have set off a firestorm as they’ve battled to own the reverse ETL space. Census announced that it raised a $16 million Series A in February and published a series of benchmarking reports targeting Hightouch. Hightouch countered with three raises totaling $54.2 million in less than 12 months.

Hightouch and Census have dominated the reverse ETL discussion this year, but they’re not the only ones in the space. Other notable companies are Grouparoo, HeadsUp, Polytomic, Rudderstack, and Workato (who closed a $200m Series E in November). Seekwell even got acquired by Thoughtspot in March.

Future of the modern data stack in 2022: Reverse ETL

I’m pretty excited about everything that’s solving the “last mile” problem in the modern data stack. We’re now talking more about how to use data in daily operations than how to warehouse it, which is an incredible sign of how mature the fundamental building blocks of the data stack (warehousing, transformation, etc.) have become!

What I’m not so sure about is whether reverse ETL should be its own space or just be combined with a data ingestion tool, given how similar the fundamental capabilities of piping data in and out are. Players like Hevodata have already started offering both ingestion and reverse ETL services in the same product, and I believe that we might see more consolidation (or deeper go-to-market partnerships) in the space soon.
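
A rough sketch of why the two directions look so alike: the same hypothetical `sync` helper below handles both ingestion and reverse ETL, and only the source, destination, and field mapping change. The lists standing in for a CRM and a warehouse, and the `health_score` rule, are assumptions made up for this example.

```python
def sync(read, write, mapping):
    """Copy rows from a source to a destination, renaming fields on the way."""
    count = 0
    for row in read():
        write({dest: row[src] for src, dest in mapping.items()})
        count += 1
    return count


# In-memory stand-ins for real systems.
crm_contacts = [{"Email": "a@example.com", "Plan": "pro"}]
warehouse_table = []
crm_updates = []

# Ingestion: operational tool -> warehouse.
sync(lambda: crm_contacts, warehouse_table.append,
     {"Email": "email", "Plan": "plan"})


def scored_accounts():
    """A modeled warehouse result (made-up scoring rule)."""
    for row in warehouse_table:
        yield {**row, "health_score": 90 if row["plan"] == "pro" else 50}


# Reverse ETL: warehouse -> operational tool, using the very same primitive.
sync(scored_accounts, crm_updates.append,
     {"email": "Email", "health_score": "Health_Score"})
```

Real tools add a lot on top of this (API rate limits, incremental syncs, retries), but the core read, map, and write loop is shared, which is the overlap the paragraph above points at.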

Future of the modern data stack in 2022: Active Metadata & Third-Gen Data Catalogs

In the last couple of years, the debate around data catalogs was, “Are they obsolete?” And it would be easy to think the answer is yes. In a couple of famous articles, Barr Moses argued that data catalogs were dead, and Michael Kaminsky argued that we don’t need data dictionaries.

On the other hand, there’s never been so much buzz about data catalogs and metadata. There are so many data catalogs that Rohan from our team created thedatacatalog.com, a “catalog of catalogs”, which feels both ridiculous and completely necessary. So which is it: are data catalogs dead or stronger than ever?

This year, data catalogs got new life with the creation of two new concepts: third-generation data catalogs and active metadata.

At the beginning of 2021, I wrote an article on modern metadata for the modern data stack. I introduced the idea that we’re entering the third generation of data catalogs, a fundamental shift from the prevalent old-school, on-premise data catalogs. These new data catalogs are built around diverse data assets, “big metadata”, end-to-end data visibility, and embedded collaboration.

This idea got amplified by a huge move Gartner made this year: scrapping its Magic Quadrant for Metadata Management Solutions and replacing it with the Market Guide for Active Metadata. In doing this, they introduced “active metadata” as a new category in the data space.

What’s the difference? Old-school data catalogs collect metadata and bring it into a siloed “passive” tool, aka the traditional data catalog. Active metadata platforms act as two-way platforms: they not only bring metadata together into a single store like a metadata lake, but also leverage “reverse metadata” to make metadata available in daily workflows.
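
As a toy illustration of that two-way flow, the sketch below collects metadata into one store and immediately pushes the merged context back out to subscribed tools. The `MetadataLake` class and its methods are hypothetical and only meant to show the shape of the idea, not how any real platform is built.

```python
class MetadataLake:
    """Toy two-way metadata store (hypothetical, for illustration only)."""

    def __init__(self):
        self._assets = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a tool that wants context pushed into its own workflow."""
        self._subscribers.append(callback)

    def ingest(self, asset, source, attributes):
        """Inbound: merge metadata from a source tool into the single store."""
        record = self._assets.setdefault(asset, {"sources": []})
        record.update(attributes)
        if source not in record["sources"]:
            record["sources"].append(source)
        # Outbound ("reverse metadata"): push the merged context to every tool.
        snapshot = {**record, "sources": list(record["sources"])}
        for push in self._subscribers:
            push(asset, snapshot)


# Stand-in for a BI tool that shows context next to a dashboard.
bi_tool_tooltips = {}


def update_tooltip(asset, context):
    bi_tool_tooltips[asset] = context


lake = MetadataLake()
lake.subscribe(update_tooltip)
lake.ingest("orders", "warehouse", {"row_count": 1200})
lake.ingest("orders", "dbt", {"owner": "finance-data"})
```

A passive catalog would stop after the `ingest` merge and wait for users to come and search it. The push to subscribers is the part that makes the metadata "active".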

Since the first time we wrote about third-generation catalogs, they’ve become part of the discourse around what it means to be a modern data catalog. We even saw the terms pop up in RFPs!

Third-Gen Data Catalog RFP
Snippet of an anonymized RFP

At the same time, VCs have been eager to invest in this new space. Metadata management has grown a ton with raises across the board, e.g. Collibra’s $250m Series G, Alation’s $110m Series D, and our $16m Series A at Atlan. Seed-stage companies like Stemma and Acryl Data also launched to build managed metadata solutions on existing open-source projects.

Future of the modern data stack in 2022: Active Metadata & Third-Gen Data Catalogs

The data world will always be diverse, and that diversity of people and tools will always lead to chaos. I’m probably biased, given that I’ve dedicated my life to building a company in the metadata space. But I truly believe that the key to bringing order to the chaos that is the modern data stack lies in how we can use and leverage metadata to create the modern data experience.

Gartner summarized the future of this category in a single sentence: “The stand-alone metadata management platform will be refocused from augmented data catalogs to a metadata ‘anywhere’ orchestration platform.”

Where data catalogs in the 2.0 generation were passive and siloed, the 3.0 generation is built on the principle that context needs to be available wherever and whenever users need it. Instead of forcing users to go to a separate tool, third-gen catalogs will leverage metadata to improve existing tools like Looker, dbt, and Slack, finally making the dream of an intelligent data management system a reality.

While there’s been a ton of activity and funding in the space in 2021, I’m pretty sure we’ll see the rise of a dominant and truly third-gen data catalog (aka an active metadata platform) in 2022.

Future of the modern data stack in 2022: Data Teams as Product Teams

As the modern data stack goes mainstream and data becomes a bigger part of daily operations, data teams are evolving to keep up. They’re no longer “IT folks”, working separately from the rest of the company. But this raises the question, how should data teams work with the rest of the company? Too often, they get stuck in the “service trap”: never-ending questions and requests for creating stats, rather than generating insights and driving impact through data.

Emilie Schario - Service Trap
Emilie Schario’s iconic image on the reality of working on a data team. (Image from MDSCON 2021.)

In 2021, Emilie Schario from Amplify Partners, Taylor Murphy from Meltano, and Eric Weber from Stitch Fix talked about a way to break data teams out of this trap: rethinking data teams as product teams. They first explained this idea with a blog on Locally Optimistic, followed by great talks at conferences like MDSCON, dbt Coalesce, and Future Data.

A product isn’t measured on how many features it has or how quickly engineers can quash bugs; it’s measured on how well it meets customers’ needs. Similarly, data product teams should be focused on the users (i.e. data consumers throughout the company), rather than questions answered or dashboards built. This allows data teams to focus on experience, adoption, and reusability, rather than ad-hoc questions or requests.

This focus on breaking out of the service trap and reorienting data teams around their users really resonated with the data world this year. More people have started talking about what it means to build “data product teams”, including plenty of hot takes on who to hire and how to set goals.

Future of the modern data stack in 2022: Data Teams as Product Teams

Of all the hyped trends in 2021, this is the one I’m most bullish on. I believe that in the next decade, data teams will emerge as one of the most important teams in the organization fabric, powering the modern, data-driven companies at the forefront of the economy.

However, the reality is that data teams today are stuck in a service trap, and only 27% of their data projects are successful. I believe the key to fixing this lies in the concept of the “data product” mindset, where data teams focus on building reusable, reproducible assets for the rest of the team. This will mean investing in user research, scalability, data product shipping standards, documentation, and more.

Future of the modern data stack in 2022: Data Observability

This idea came out of “data downtime”, which Barr Moses from Monte Carlo first spoke about in 2019 saying, “Data downtime refers to periods of time when your data is partial, erroneous, missing or otherwise inaccurate”. It’s those emails you get the morning after a big project, saying “Hey, the data doesn’t look right…”

Data downtime has been a part of normal life on a data team for years. But now, with many companies relying on data for literally every aspect of their operations, it’s a huge deal when data stops working.

Yet everyone was just reacting to issues as they cropped up, rather than proactively preventing them. This is where data observability, the idea of “monitoring, tracking, and triaging of incidents to prevent downtime”, came in.
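
In practice, a lot of this monitoring starts with simple checks on freshness and volume. The sketch below is a minimal, assumed example of that idea: the `TableStats` fields, the thresholds, and the table names are all invented for illustration and are not taken from any of the vendors in this space.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableStats:
    """Snapshot of a table's health signals (hypothetical fields)."""
    name: str
    last_loaded_at: datetime
    row_count: int
    expected_rows: int


def check_table(stats, now, max_staleness=timedelta(hours=24),
                volume_tolerance=0.5):
    """Return a list of incident messages; an empty list means healthy."""
    incidents = []
    # Freshness: has the table been loaded recently enough?
    if now - stats.last_loaded_at > max_staleness:
        incidents.append(f"{stats.name}: data is stale")
    # Volume: is the row count far from what we normally see?
    if stats.expected_rows > 0:
        drift = abs(stats.row_count - stats.expected_rows) / stats.expected_rows
        if drift > volume_tolerance:
            incidents.append(f"{stats.name}: unexpected row count")
    return incidents


now = datetime(2022, 1, 10, 9, 0)
healthy = TableStats("orders", now - timedelta(hours=2), 980, 1000)
broken = TableStats("payments", now - timedelta(days=3), 120, 1000)

healthy_incidents = check_table(healthy, now)
broken_incidents = check_table(broken, now)
```

Running checks like these on a schedule is what turns the "Hey, the data doesn’t look right" email into an alert the data team sees first.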

I still can’t believe how quickly data observability has gone from being just an idea to a key part of the modern data stack. (Recently, it’s even started being called “data reliability” or “data reliability engineering”.)

The space went from being non-existent to hosting a bunch of companies, with a collective $200m of funding raised in 18 months. This includes Acceldata, Anomalo, Bigeye, Databand, Datafold, Metaplane, Monte Carlo, and Soda. People even started creating lists of new “data observability companies” to help keep track of the space.

Future of the modern data stack in 2022: Data Observability

I believe that in the past two years, data teams have realized that tooling to improve productivity is not a nice-to-have but a must-have. After all, data professionals are some of the most sought-after hires you’ll ever make, so they shouldn’t be wasting their time on troubleshooting pipelines.

So will data observability be a key part of the modern data stack in the future? Absolutely. But will data observability continue to exist as its own category or will it be merged into a broader category (like active metadata or data reliability)? This is what I’m not so sure about.

Ideally, if you have all your metadata in one open platform, you should be able to leverage it for a variety of use cases (like data cataloging, observability, lineage and more). I wrote about that idea last year in my article on the metadata lake.

That being said, today, there’s a ton of innovation that these areas need independently. My sense is that we’ll continue to see fragmentation in 2022 before we see consolidation in the years to come.

Future of the modern data stack in 2022: Final thoughts

It may feel chaotic and crazy at times, but today is a golden age of data.

In the last eighteen months, our data tooling has grown exponentially. We all make a lot of fuss about the modern data stack, and for good reason: it’s so much better than what we had before. The earlier data stack was frankly as broken as broken could get, and this giant leap forward in tooling is exactly what data teams needed.

In my opinion, the next “delta” on the horizon for the data world is the modern data culture stack: the best practices, values, and cultural rituals that will help us diverse humans of data collaborate effectively and up our productivity as we tackle our new data stacks.

However, we can only think about working together better with data once we’ve nailed, well, working with data. We’re on the cusp of getting the modern data stack right, and we can’t wait to see what new developments and trends 2022 will bring!


This article was originally published on Towards Data Science.

Header picture: Mike Kononov on Unsplash


