What Is the Main Problem with the Data-First Concept
Data-First Architecture

I recently had a light bulb moment when I saw a tweet from Evan Todd. It helped bring together some ideas I have had for a while on software architecture.

Data characteristics excluding software functionality should dictate the system architecture.

The shape, size and rate of change of the data are the most important factors when starting to architect a system. The first thing to do is estimate these characteristics in average and extreme cases.

Functional programming encourages this mindset since the data and functions are kept separate. F# has particular strengths in data-oriented programming.

I am going to make the case with an example. I will argue most asset management systems store and use the wrong data. This limits functionality and increases system complexity.

Traditional Approach

Most asset management systems consider positions, profit and returns to be their primary data. You can see this in their overnight batch processes that generate and save positions for the next day.

This produces an enormous amount of duplicate data. Databases are large and grow rapidly. What is being saved is essentially a chosen set of calculation results.

Worse, other processes such as adjustments, lock down and fund aggregation are built on top of this position data.

This architecture comes from not investigating the characteristics of the data first and jumping straight to thinking about system entities and functionality.

Data-First Approach

The primary data for asset management is asset terms, price timeseries and trades. All other position data are just calculations based on these; we can ignore those calculations for now and consider caching them at a later stage.

  • terms data is complex in structure but relatively small and changes infrequently. Event sourcing works well here for audit and a changing schema.
  • timeseries data is simple in structure and can be efficiently compressed down to 10-20% of its original size.
  • trades data is a simple list of asset quantity flows from one entity to another. The data is effectively all numeric and fixed size. A ledger style append only structure works well here.
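As a minimal sketch of such a ledger (the field layout and names are illustrative assumptions, not taken from any particular system), each trade can be packed into a fixed-size binary record and appended to a buffer or file:

```python
import struct

# Illustrative fixed-size trade record: all numeric, packed with no padding,
# well under 128 bytes.
# trade_id | asset_id | from_entity | to_entity | quantity | price | trade_date
TRADE_FORMAT = "<qiiiddq"
TRADE_SIZE = struct.calcsize(TRADE_FORMAT)  # 44 bytes

def append_trade(ledger: bytearray, trade_id: int, asset_id: int,
                 from_entity: int, to_entity: int, quantity: float,
                 price: float, trade_date: int) -> None:
    """Append one record; existing records are never updated in place."""
    ledger += struct.pack(TRADE_FORMAT, trade_id, asset_id, from_entity,
                          to_entity, quantity, price, trade_date)

def read_trade(ledger: bytes, index: int) -> tuple:
    """Random access is a simple offset calculation on fixed-size records."""
    return struct.unpack_from(TRADE_FORMAT, ledger, index * TRADE_SIZE)

ledger = bytearray()
append_trade(ledger, 1, 42, 100, 200, 1500.0, 10.25, 20240102)
print(TRADE_SIZE, read_trade(ledger, 0))
```

Because every record has the same size, the ledger scans and memory-maps well, which is what keeps whole-history calculations cheap.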

We can use the iShares fund range as an extreme example. They have many funds and trade far more often than most asset managers.

Downloading these funds over a period and focusing on the trade data gives us some useful statistics:

  • Total of 280 funds.
  • Ranging from 50 to 5000 positions per fund.
  • An average of 57 trades per day per fund.
  • The average trade values can be stored in less than 128 bytes.
  • A fund for 1 year would be around 1.7 MB.
  • A fund for 10 years would be around 17 MB.
  • 280 funds for 10 years would be around 5 GB.
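These estimates can be sanity-checked with quick arithmetic (assuming roughly 252 trading days per year and the 128-byte upper bound per trade):

```python
trades_per_day = 57    # average from the list above
record_bytes = 128     # upper bound per trade
trading_days = 252     # assumed trading days per year

fund_year = trades_per_day * trading_days * record_bytes
fund_decade = fund_year * 10
all_funds_decade = fund_decade * 280

print(f"one fund, one year:   {fund_year / 2**20:.1f} MB")        # ~1.8 MB
print(f"one fund, ten years:  {fund_decade / 2**20:.1f} MB")      # ~17.5 MB
print(f"280 funds, ten years: {all_funds_decade / 2**30:.1f} GB") # ~4.8 GB
```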

Now we have a good feel for the data we can start to make some decisions about the architecture.

Given the sizes we can decide to load and cache by whole fund history. This simplifies the code, especially in the data access layer, and allows a greater number of profit and return measures to be offered. Most of these calculations are ideally performed as a single pass through the ordered trades stored in a sensible structure. It turns out that with in-memory data this requires negligible processing time and can simply be done as the screen refreshes.
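The single-pass idea can be sketched as follows (a simplified example assuming trades reduce to signed (quantity, price) pairs; real measures would be accumulated in the same one loop):

```python
def position_and_pnl(trades, latest_price):
    """One pass over ordered (quantity, price) trades.

    Buys have positive quantity, sells negative. Returns the running
    position and its profit marked against latest_price.
    """
    position = 0.0
    cost = 0.0  # signed cash paid for the current position
    for quantity, price in trades:
        position += quantity
        cost += quantity * price
    return position, position * latest_price - cost

# Buy 100 @ 10, buy 50 @ 12, sell 30 @ 15; mark at 14.
trades = [(100, 10.0), (50, 12.0), (-30, 15.0)]
print(position_and_pnl(trades, 14.0))  # (120.0, 530.0)
```

At the fund sizes estimated above, even ten years of trades is a loop over roughly 150,000 records, which comfortably finishes within a screen refresh.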

More advanced functionality can be offered, such as looking at a hierarchy of funds and performing calculations at a parent level, with various degrees of filtering and aggregation. As the data is bitemporal we can easily ask questions such as "what did this report look like previously?" or even "what was responsible for a change in a calculation result?". Since the data is append only we can just update for the latest additions and save cloud costs.

Conclusion

By first understanding the data, we can build a system that is simpler, faster, more flexible and cheaper to host.

Software developers cannot always answer questions on the size and characteristics of their system’s data. It has been abstracted away from them. People are often surprised that full fund history can be held in memory and queried.

We are not Google. Our extreme cases will be easier to estimate. Designing for infinite scalability by default leads to complexity and poor performance.

With cloud computing, where architectural costs are obvious, right sizing is essential.

Most of the references I could find come from the games industry. I would be interested to hear about any other examples or counterexamples.

References

FAQ — some questions I’ve been asked

How do you deal with previously reported values and make sure they will be the same in the future?

The data model is bitemporal so we can request any reporting data as at any prior time. Lockdown process design becomes simply storing a timestamp for a reporting period. Reporting can make use of lockdown timestamps to produce a complete view of prior period adjustments with full details. Without a bitemporal data model this often becomes a reconciliation process, leading to further manual steps.
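A hypothetical sketch of that design (field names are illustrative): an as-of query simply filters the append-only ledger by the time each row was recorded, so a stored lockdown timestamp reproduces the originally reported numbers exactly:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trade:
    trade_date: int   # when the trade happened (valid time)
    recorded_at: int  # when the row was appended (transaction time)
    quantity: float

def position_as_of(ledger, period_end, as_of):
    """Position for a period as it looked at time `as_of`: rows recorded
    after `as_of` (late bookings, corrections) are invisible."""
    return sum(t.quantity for t in ledger
               if t.trade_date <= period_end and t.recorded_at <= as_of)

ledger = [
    Trade(trade_date=100, recorded_at=101, quantity=10.0),
    Trade(trade_date=105, recorded_at=106, quantity=5.0),
    Trade(trade_date=104, recorded_at=110, quantity=2.0),  # booked late
]
print(position_as_of(ledger, period_end=105, as_of=108))  # 15.0, as originally reported
print(position_as_of(ledger, period_end=105, as_of=120))  # 17.0, after the late booking
```

Diffing the rows visible at the two timestamps is also what answers "what was responsible for a change in a calculation result?".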

What about reported values changing due to code changes?

Reporting data can be saved when key reports are generated and used in regression testing. Regression testing of all reports using the old and new code can also be automated. This is very good practice for high quality systems and is not very difficult to implement.
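A sketch of that regression check (assuming, for illustration, that reports serialize to flat dictionaries of numbers): outputs saved from the old code become the expected values for the new code:

```python
def regression_diff(saved_report, regenerated_report, tolerance=1e-9):
    """Return the keys whose values moved by more than `tolerance`,
    so a code change that alters reported numbers is caught early."""
    diffs = []
    for key, old_value in saved_report.items():
        new_value = regenerated_report.get(key)
        if new_value is None or abs(new_value - old_value) > tolerance:
            diffs.append(key)
    return diffs

saved = {"nav": 1_000_000.0, "return_1y": 0.083}
regen = {"nav": 1_000_000.0, "return_1y": 0.085}
print(regression_diff(saved, regen))  # ['return_1y']
```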

Building a Data-First Culture for Digital Transformation


Most organizations these days are making efforts to tap into the potential value that comes from digital business transformation. Ask any business leader about their 2020 priorities, and you'll hear about their strategic plans for transforming digitally. Yet while stats reveal that over 70% of organizations either have a digital transformation strategy for their business or are working on one, only 7% have implemented their digital transformation strategy completely.

A significant question that arises here is: why do organizations fail to achieve the desired results despite having a digital transformation strategy? Often because there is a misconception about what digital transformation is. It is frequently different from what is actually being done, and it is a lot more than just ‘being more digital’.

According to Rob Roy, Chief digital officer at Sprint,

Digital transformation is not about digitizing a channel or adopting more digital things. It is about working with each area in the business to help everyone think & act digitally for the things they are responsible for. It is about improving and simplifying customer “moments of truth” and all the supporting processes that build a true omnichannel to deliver world-class experience.

Therefore, digital transformation is about coming up with a business plan that is more thoughtful and based on data and actual use cases. It is about one of the most crucial aspects that organizations often forget, i.e. a DATA CULTURE. Digital transformation is about promoting a data-first mindset.

But why is data so vital for businesses in their digital transformation journey?

Why Data-First Culture?

Your Business Is Not About You; It’s About Those Who You Serve.

For every business to succeed, it is important to serve customers to their utmost satisfaction. To do this, organizations need to understand their customers' behavior so they can produce products and services exactly the way customers need them. Here's where a data-first culture comes into play. Focusing on data that helps you understand user attributes such as age, current carrier, location, and demographics can help organizations better understand their customers' journey across the web.

Netflix is an excellent example of being a successful data-driven organization. Founded in 1997, Netflix exceeded Walt Disney Co. in 2018 as the number one media company in the United States by market value. By digitizing interaction with its 151 million subscribers, Netflix collects data from each of its customers to understand their behavior and watching patterns. Using this information, Netflix is able to recommend TV shows and movies customized as per the subscriber’s interest and choices. By offering the subscribers what they want, Netflix frees its users from searching endlessly through content streams to find out their favorite TV shows and movies. Hence, instituting a data-first culture helps Netflix make its viewers’ job easier, giving them a better and customized viewer experience.

So, having a data-first culture can prove significant for businesses in their digital transformation process by helping them:

  1. Get a deeper understanding of the target audience
  2. Make Better Decisions
  3. Increase Efficiency of Resource Allocation
  4. Early Problem Detection
  5. Empower Employees to Manage themselves with Confidence

How to Build a Data-First Culture?


A data-driven culture can’t be manufactured or purchased; it must be cultivated and developed.

While companies have realized the importance of fostering a data-first culture, cultivating such an environment hasn’t been easy. A stat from the Gartner report reveals, “While 80% of CEOs claim to have operationalized the notion of data as an asset, only 10% say that their company actually treats it that way.” While organizations depend upon data for their day-to-day business operations, they fail to treat it like a strategic asset, in reality. Another study published by NewVantage Partners says, “The main challenges to becoming data-first were people (62.5%) and process (30.0%) and not technology (only 7.5%).”

So, where does the problem lie?

Well, if organizations want to build a data-driven culture, they should first align their business strategy with the collected and organized data and ensure that data quality is maintained throughout. Also, since a data-first culture can only be developed and not manufactured using any technology, organizations need to take a more holistic approach for cultivating a data-first culture.

Hence, organizations must focus on four main pillars of nurturing a data-first environment. These pillars can be considered as a significant framework to determine or analyze potential gaps organizations may have in their overall data strategy.

1. Change the MINDSET

One of the hardest roadblocks in fostering a data-first culture is shifting the collective mindset of your team to embrace data. In an attempt to steer your workforce in a new direction, you should focus on:

  • Leading by example, a powerful method. Your leaders should first believe in using data; then you can expect your team to adopt the same mindset.
  • Generating quick wins to help your team experience the tangible benefits of using data, which will grow the data-first momentum with each win.
  • Experimenting with new ideas, making mistakes, and repeating. With each iteration, you will instill the discipline of relying on data for decision making and innovate more quickly.

2. Reinforce the SKILLSET

To nurture a data-first culture, your team should have specific data-related knowledge and expertise to work with an abundance of data. While hiring new talent with such expertise is one way, it would waste the knowledge and skills of your existing talent. Hence, you should focus on these key areas to boost your employees' data skills:

  • Provide basic training to your employees on how to read and understand the data.
  • Train your team on data storytelling to help them learn how to combine data and communicate key insights in the data to others.
  • Build a strong team of analysts comprising people who are skillful enough to help others become more data-savvy.

3. Enhance the TOOLSET

Nurturing a data-first culture based on a variety of data tools and systems can actually be a hindrance rather than a help. Companies should focus on the following aspects to ensure they have a strong technology foundation for a data-first culture:

  • Establishing a common data language that everyone embraces and adopts.
  • Democratizing data access so that more members of your team can leverage data regularly, freeing up your data scientists and analysts to focus on more strategic tasks.
  • Automating labor-intensive tasks to free up your people so that they can add value to your business in more productive ways.
  • Integrating your analytics tools into your existing processes or systems to make them even more powerful.

4. Strengthen the DATASET

Ensure the data you collect is useful, relevant, and trusted by the people of your organization. To achieve this, you should focus on the following aspects:

  • Define and communicate clearly the core priorities of your organization. Your data and analytics tools should be closely tied with your business strategy to generate significant output.
  • Maintain data governance without overshadowing your team’s ability to create value with the data.
  • Ensure your data privacy is maintained and the data is used securely. It is important to help your team understand the roles they play in securing data and other digital assets.

How Should Businesses Leverage Human Assets?

According to research conducted by Intellyx, using machine data in the context of business operations can be difficult. What makes machine learning important here is using data to get more out of your human assets: you want to reduce redundant routine activities through automation and enable your employees to be more productive and effective. To get the most out of machine learning in your workplace, you need to audit your organizational data, understand business processes and examine how technology can enable digital transformation. Empowering employees with the right customer information also enables them to have better customer interactions.

There is also a need to acknowledge that not all existing human resources are capable of handling big data and modern technology. It is up to you as an employer to empower them with the right skills. Skills training programs for existing employees, as well as hiring new talent to handle data management processes, are both important within an organization. The first step towards digital transformation is to have a robust digital transformation network.

How Should Employees Help Drive Digital Transformation?

It is up to the organization to empower its employees with the machine data needed to bring about a digital transformation. Employees, on the other hand, should focus on improving customer experiences with the data they have in hand. They should focus on using the data in a way that adds value to customers. Most hindrances to digital business transformation occur due to an inability to leverage data. Machine data can offer inputs about customers, including a customer's expectations or behaviors. The company culture needs to match the customer culture in order for the digital transformation framework to succeed.

The Takeaway

Establishing a data-first culture is a time-consuming process that demands patience and constant effort. To unlock its power, companies must understand data accurately, from collecting tons of data to refining it. Challenges are inevitable, with your people being the most difficult aspect. As James Belasco and Ralph Stayer, authors of Flight of the Buffalo, said, “Change is hard because people overestimate the value of what they have and underestimate the value of what they may gain by giving that up.” Focusing on the pillars discussed above can give you a clear direction for establishing a data-first culture. This is exactly what we do at Softobiz to transform your business digitally.

Why We Need to Move From Data-First to a Knowledge-First World


We live in a data-rich world. Very data rich. Indeed, it’s estimated that roughly 2.5 quintillion bytes of data are created every day.

Perhaps because of its ubiquity, there are those who believe the sheer volume of available data means we have all we need to easily and accurately answer any question without delay. If you can’t, they declare, you just need more data.

But if you already have a massive amount of data and you still can’t answer a question… is more data really what you need? At this point, it’s not for lack of data that you haven’t been able to solve your problem. So why would you believe that with more data, your problem is going to be solved?

To use a platitude often attributed to Einstein, “The definition of insanity is doing the same thing over and over again and expecting a different result.” And that’s what a data-first mindset is doing: driving us insane.

Here, I’ll explain why we need to move away from this data-first world, and why we need a paradigm shift away from the myopic focus on data and answering every question with, “We need more!”

To maximize the value of all the data available to us, we need to move to a knowledge-first world, a world where we think about context, people, and relationships first.

Overfilling Your Data Lake with a Data-First Approach

If you’re stockpiling ever more data in an attempt to solve for stubborn use cases, you need to store that data somewhere. And that somewhere is usually a data lake. The more full that lake becomes with endless amounts of data, the murkier your organization’s understanding of what’s in there, and of what it all means; when that happens, your data lake has become a data swamp.

A symptom of this pervasive problem is the widely seen shift from the Extract, Transform, Load (ETL) process for replicating data from source systems to target systems, to ELT: Extract, Load, Transform. Yes, the move to ELT saves time by allowing organizations to load data to destination systems without modeling it beforehand, but it often means the data is still unmodeled, and effectively unusable in the target systems, at the moment it's needed. And that leads to business users with little data literacy scrutinizing raw data and saying, “What the hell is this? There's so much data here… but I don't know what I'm looking at.”

And this is the problem with our data-first world; the disconnect between the data itself and the valuable knowledge that data can provide.

Knowledge-First Can Save Us

In a knowledge-first world, you approach your data with a people-first, relationship-first, and context-first perspective. Instead of firing over mass amounts of confusing, raw data, consider:

  • Who needs to consume the data? (People)
  • Why do they need to consume it? What use case are they trying to solve for? (Context)
  • How is this data related to other data and people? (Relationships)

Then, when you’ve answered those questions, you need to ensure your transformed data can be understood… and understood by business users who may not have the technical chops of your data team. These are the first questions to consider in order to start treating data as a product.

This is where modeling and semantics — and knowledge — take center stage. And this is where it’s crucial for data experts to be business literate, or have a business-literate teammate to translate.

Teams Must be Data and Business Bilingual to Succeed

A data-first world is focused on data literacy, a topic that's been discussed ad nauseam in our industry over the past 20 years. We've hammered on the importance of teaching business users how to analyze datasets to get maximum value from the organization's data, from the executive level on down. But the onus has been on the business users for too long, and a massive amount of value has been lost because of it. To really tap into the value of your data, it has to be a two-way street.

A knowledge-first world is focused on business literacy. And in order for data teams to return maximum value for their organizations, we're gonna have to go to school.

Right now, the disconnect between data teams and business teams means that the first might not understand the business, while the second might not understand the data. Data literacy has become a near-crucial skill for business leaders; going forward, business literacy will become equally as important for data leaders. How does our sales pipeline work? What do we consider a marketing qualified lead? What in the heck is a BDR? To answer questions like these, your team will need to talk to the people in your organization for whom they’re a daily priority.

And once these questions, and many more, can be answered by your business-literate data team, you’ll start gaining the context you need to deliver the data your business users need to drive business success.

More importantly, you’ll be able to deliver not just data, but knowledge, in a knowledge-first world.

Data-first marketing: A strategy to stop wasting 30% of your budget

A focus on the fundamentals ensures the contact and account data generated from our marketing efforts is compliant, marketable, informed, connected and actionable. This commitment makes our people, programs, and results better.

Data, data, data. The fuel for the engine, the grease for the system, the protein for the body. All apt descriptions of the role data plays in smart, effective marketing (and business). Like eating well, sleeping restfully, and getting exercise, it's what increases our performance. We know it; maybe we even preach it. So how come most B2B teams fail to conquer data readiness and governance?

In today's digital-first world, a world where the customer comes first, a world where it's about the buyer and not your internal team structures, data silos or sales process, data quality is the first and foundational building block that your marketing and sales teams need. It's what your teams align around to develop strategies, execute programs, and attack your target markets.

With expectations at an all-time high and growth the charter for B2B teams, now is the time to (re-)commit to getting your data right. “Right” today means focusing on the fundamentals, ensuring the contact and account data generated from our marketing efforts is compliant, marketable, informed, connected and actionable. This commitment makes our people, programs, and results better.

The real cost of bad prospect, customer and account data

When we fail to address data quality first, upfront, before it hits your database, the out-of-pocket, hidden, and professional costs are real. It's expensive, it burns resources, and it delivers crappy experiences when your marketing is off the mark.

Bad data – incomplete, inaccurate, unstandardized – significantly hampers marketing's performance and its ability to deliver against its numbers to the business and its promise to customers and prospects. Like smoking, it's a costly, bad habit that needs to be addressed. Let's dive deeper and break down the numbers.

  • 25-30% of data generated from your demand programs is unmarketable because the data is inaccurate, non-compliant for privacy and/or doesn’t match your Ideal Customer Profile (ICP). Worse, you’re spending budget to generate unmarketable prospect data. Doing the quick math, that $1 million budget you thought you were putting to work is now only $700,000.
  • It costs $100-120 per record to clean data once it's in your systems, compared to only $2-$3 to get it right before it hits your database. At 100,000 new records a year, that is the difference between investing $200,000 in a clean-data-first strategy and spending $700,000 to try to fix it later.
  • SDRs spend on average 27 hours per month (Sales Assembly, January 2020) cleaning up bad data generated from marketing’s programs. That’s $4,000 worth of monthly salary for every SDR you have on the front lines. You can run the math on productivity and costs.
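The first two bullets reduce to quick arithmetic, using the figures quoted above:

```python
# 25-30% of demand-generated data is unmarketable.
budget = 1_000_000
for waste in (0.25, 0.30):
    working = budget * (1 - waste)
    print(f"at {waste:.0%} waste, ${working:,.0f} of the $1M is actually working")

# Cleaning before the database at $2/record, for 100,000 records a year.
records = 100_000
print(f"clean-first cost: ${records * 2:,}")  # $200,000
```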

These are just the “in your face” numbers. The “hidden” costs come in the form of losing sales trust and turning off potential prospects and customers when follow-on outreach misses the mark.

Strategies and tactics to getting your data right, first

As the costs add up quickly, we need scalable strategies to shake up the traditional way we have thought about fixing bad data and ensuring clean, marketable data. One approach is a proactive strategy and the other is an emerging benchmark for B2B organizations. Both are proven strategies and they’re even better when used together.

  • Take care of bad data before it hits your database. Most of us say, ”I have tools or services that will clean it up once it's inside my Marketing Automation and/or CRM databases.” If we're honest, that never really happens. One marketing executive recently shared, “…it's like trying to clean water when it enters the sewage treatment plant. It needs to be done but it's so much harder after the fact.”
  • Make it a top KPI, setting goals, benchmarks, and metrics around data health. As more B2B teams focus on precise targets and account-based strategies, one powerful benchmark is hitting a high percentage of marketable database records and strong coverage of personas in named-account buying groups. For example, “our goal is that 85% of our database is permissioned and matches our target audience,” or “…of our 7,150 named accounts, we have at least 3 opt-in members of each buying group.” This data-first commitment helps everybody perform better.
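Both benchmarks in the second bullet can be computed directly; here is a hypothetical sketch (the contact records and field names are illustrative, not from any real CRM):

```python
def database_health(contacts, named_accounts, min_buying_group=3):
    """Return the marketable share of the database and the fraction of
    named accounts with at least `min_buying_group` marketable contacts."""
    marketable = [c for c in contacts if c["opted_in"] and c["matches_icp"]]
    share = len(marketable) / len(contacts)
    covered = sum(
        1 for account in named_accounts
        if sum(1 for c in marketable if c["account"] == account) >= min_buying_group
    )
    return share, covered / len(named_accounts)

contacts = [
    {"account": "acme", "opted_in": True, "matches_icp": True},
    {"account": "acme", "opted_in": True, "matches_icp": True},
    {"account": "acme", "opted_in": True, "matches_icp": True},
    {"account": "globex", "opted_in": True, "matches_icp": False},
    {"account": "globex", "opted_in": False, "matches_icp": True},
]
share, coverage = database_health(contacts, ["acme", "globex"])
print(f"{share:.0%} marketable, {coverage:.0%} of named accounts covered")
```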

Create a marketable, segmented, compliant database of ideal buying groups and accounts

A data-first strategy supports an essential marketing and sales strategy – a healthy, active and permissioned database of prospects and customers. Deploying and mastering these data-first strategies becomes even more critical if your organization has any of these progressive growth strategies in action or on the drawing board.

  • Deploying an account-based strategy or shifting to ABM – it's a lot of heavy lifting if your contact and account information isn't accurate or synced for programs. No matter how strong the account-based sales and marketing plays you run in the market are, you're dead on arrival without good data. And there's no hope for sales-marketing alignment.
  • Mapping buyer and/or account journeys – segmentation is nearly impossible if your prospect and account data is not actionable, your contacts aren’t permissioned, and/or your account data is not linked across buying groups and across channels.
  • Building databases and audiences as you enter new vertical or geo markets – jump-starting marketing and sales requires opt-in contacts across the buying groups within your named accounts or ICP to create demand and support sales. A marketable database is essential.

Marketing in 2021 is going to be much harder as performance demands from execs increase and prospects and customers interacting, researching, and purchasing remotely becomes the norm. With budgets being developed this quarter, now is an ideal time to nail your strategy and commit to getting your data right, first.

Opinions expressed in this article are those of the guest author and not necessarily MarTech.
