You’re likely tracking internal data on supply chain activity and needs, employee productivity, HR information, efficiency metrics, financial activity, or KPIs for almost any other imaginable scenario. Historically, data storage and retrieval have been accomplished using tables. Using spreadsheets or relational databases, we organize data into lists of identical items, and we use related tables to force data into a tightly integrated system. That worked well for a long time. Tables make it possible to handle a lot of data efficiently, and we’re used to the way they function.
The problem, however, is that if we’re not careful we will start thinking of tables as our only tool for managing data. And as they say, if the only tool you have is a hammer, then every problem becomes a nail. But that’s not how data works in the real world. Data in the real world is messy and not always homogeneous, and the design constraints of a table make it difficult to manage similar (but not identical) items. That leaves us with long lists of items where it’s the differences, not the similarities, which are most important.
The unfortunate reality is that table-based spreadsheets and relational databases have done grave damage to computer science. Traditional development practices rely heavily on related tables to store data in a similar format with similar limitations. In this scenario, we have to force all of our data into that tabular structure, and we use “related” tables to massage the data and make it fit. What we end up with is a bloated, convoluted solution that can’t handle the massive influx of data most organizations are dealing with.
Until we break free of the constraints of thinking in tabular structures, we are destined to create solutions that can’t handle the demand placed on them.
That’s why we need a different approach: one that can organize data in the most relevant way, with an eye toward modern solutions that don’t limit us to yesterday’s technical preconceptions.
The Object/Attribute Model: Rethinking Your Data
In today’s data-hungry world, it’s not uncommon to see data sets with tens or hundreds of millions of objects. As the messiness of the real world impinges on the table-based system, performance suffers and we have difficulty scaling.
If we’re going to free ourselves from the tyranny of the table, then we need a more flexible way to organize data that allows for differences among data objects. Most of the concepts we need to accomplish this are not new to us:
- Data Types (e.g., Number, Date, or String) – Data types define what kind of data can be included in a column or list. For example, spreadsheets let us format cells based on the type of data they contain.
- Data Labels – We’re also used to labeling our data. This would be the equivalent of a column label in a spreadsheet or table.
- Data Values – Data values conform to a specific data type. For example, they can be strings, numbers, Booleans, etc.
With these concepts, developers can create a new data structure that describes “Objects” with “Attributes.” An object’s attributes are a list of data labels (name, age, address, etc.), each paired with a data value that conforms to a specific data type.
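To make the three concepts concrete, here is a minimal Python sketch of an object built from attributes. The class names Attribute and DataObject, the get helper, and the sample values are all hypothetical and chosen only for illustration; this is one possible way to express the idea, not a prescribed implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


# Hypothetical names for illustration only.
@dataclass
class Attribute:
    label: str       # data label, like a column header in a spreadsheet
    data_type: type  # data type the value must conform to
    value: Any       # the data value itself

    def __post_init__(self) -> None:
        # Reject values that do not conform to the declared data type.
        if not isinstance(self.value, self.data_type):
            raise TypeError(
                f"{self.label!r} expects {self.data_type.__name__}, "
                f"got {type(self.value).__name__}"
            )


@dataclass
class DataObject:
    attributes: list[Attribute]

    def get(self, label: str) -> Any:
        # Look up a value by its data label.
        for attr in self.attributes:
            if attr.label == label:
                return attr.value
        raise KeyError(label)


person = DataObject([
    Attribute("name", str, "Ada"),
    Attribute("age", int, 36),
    Attribute("hired", date, date(2020, 1, 15)),
])
```

Each attribute carries its own label and type, so two objects in the same collection are free to carry different attributes.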
Here’s an example. Let’s say I have a list of objects that all have the same attributes. A table makes sense in this scenario. Every object is exactly the same type, so I can just arrange my list with all the attributes lined up. It’s easy to navigate and edit.
So far, so good.
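As a rough illustration of that easy case, the Python sketch below lines up a list of identically shaped objects into columns and rows. The employee records and the to_table helper are made up for this example.

```python
# Hypothetical records: every object has exactly the same attributes.
employees = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Grace", "age": 45, "city": "Arlington"},
    {"name": "Linus", "age": 28, "city": "Helsinki"},
]


def to_table(objects):
    """Arrange identically shaped objects as (columns, rows)."""
    columns = list(objects[0])
    for obj in objects:
        # A table only works cleanly when every object shares the same labels.
        if list(obj) != columns:
            raise ValueError("objects do not share the same attributes")
    rows = [[obj[c] for c in columns] for obj in objects]
    return columns, rows


columns, rows = to_table(employees)
```

Here every row fills every column, which is exactly the situation a table was designed for.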
But the real world is rarely that homogenized. Frequently, how items on a list are different is the most important information, not how they are the same. If we use the Object / Attribute model, we can add a new “list” data type that contains a list of other data items. Now, our data structure rules allow us to model the real world much more closely. We can think of our data as collections of similar objects, and loosely coupling those collections provides us with greater flexibility when introducing new features or managing availability.
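The following Python sketch shows what happens when objects are similar but not identical, and when an attribute holds a list of other items. The product records and the helper functions table_columns and empty_cells are hypothetical; they only count how many cells a single flat table would have to leave blank.

```python
# Hypothetical product records: similar, but not identical.
# "materials" and "download_formats" use a list as their data type.
products = [
    {"name": "Desk", "price": 250.0, "materials": ["oak", "steel"]},
    {"name": "E-book", "price": 12.5, "download_formats": ["epub", "pdf"]},
    {"name": "Lamp", "price": 40.0, "wattage": 9},
]


def table_columns(objects):
    """Return every column one flat table would need to hold all objects."""
    columns = []
    for obj in objects:
        for label in obj:
            if label not in columns:
                columns.append(label)
    return columns


def empty_cells(objects):
    """Count the cells that would be blank if the objects were forced into one table."""
    columns = table_columns(objects)
    return sum(1 for obj in objects for c in columns if c not in obj)


columns = table_columns(products)
blanks = empty_cells(products)
```

With these three records, a single table needs five columns and leaves six of its fifteen cells blank, while each object on its own stores only the attributes it actually has.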
A database that physically organizes data in this manner, rather than as a set of related tables, is often called a NoSQL (Not Only SQL) database. It is non-relational, flexible, and scalable, which makes it a good fit for large, evolving datasets. MongoDB, for example, is a document database that stores data as JSON-like documents (internally in a binary encoding called BSON). Data structure can change over time and fields can vary among documents. This format also makes it easy to analyze data, and it is designed to scale horizontally across many servers.
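To show what documents with varying fields can look like, here is a small sketch that uses only Python's standard json module and no database at all. The sample documents and the with_field helper are hypothetical; the "_id" key simply mirrors MongoDB's convention for a document identifier.

```python
import json

# Hypothetical documents: fields vary from one document to the next.
documents = [
    {"_id": 1, "name": "Ada", "skills": ["python", "sql"]},
    {"_id": 2, "name": "Grace", "office": {"city": "Arlington", "floor": 3}},
]

# Serialize to JSON text and read it back; nested lists and objects survive intact.
text = json.dumps(documents, indent=2)
restored = json.loads(text)


def with_field(docs, field):
    """Return only the documents that actually contain the given field."""
    return [doc for doc in docs if field in doc]


skilled = with_field(restored, "skills")
```

Because each document describes itself, a new field can be added to one record without changing the others or altering a shared schema.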
Data Innovation Depends on Building the Right Team
As you consider the data needs of your company and determine a data management strategy, picking the right team is the most important step you can take to prepare for innovation both now and in the future. This team will need to weigh the needs of the current project, understand the types of data involved in the software as it relates to your specific business context, and choose architectural solutions that don’t default to past assumptions.
At Worthwhile, we help you identify not only the current needs of your company, but also those that will likely arise in the future. Due to the huge volume of data most companies have to manage, we often recommend object storage as one part of the solution to help you break through limitations that may be imposed by going with old default methodologies. Our goal is to design the right solution for every client, and that means we never approach two projects the same way. Instead, we work with you to discover the solution best fitted to your organizational needs, user behavior, and business objectives.
It’s innovation at the speed of competition, designed for the best real-world results.