Truth or Myth? Data must be replicated and isolated for secure downstream analytics

August 18, 2022 | Pyramid Analytics

Last updated: February 22, 2023

Data replication is the process of storing the same data in multiple locations to improve its availability and accessibility. But must data always be replicated and isolated for secure downstream analytics?

Replicating data for analytics and business intelligence preserves the structure of the source data. But at what cost? Replication adds extra steps and complexity, because organizations need resources and procedures to keep every copy consistent with the source.

“Data preprocessing such as cleansing and formatting it for analysis is time-consuming. Some estimates suggest that this can account for 80% of the effort in data analysis projects,” according to Deloitte.

Ventana Research adds, “Analysts spend the bulk of their time on manual tasks such as preparing data for analysis (47%) and checking quality and consistency (45%) in the data rather than doing actual analysis.”

Is data replication really necessary? Is this smart?

Many BI tools force people to replicate data from source locations into a data silo or warehouse before they can analyze it. The idea is to copy data from various enterprise sources, clean it, and store it in a separate, controlled environment that the BI tool can access. On top of the security implemented in the data warehouse, some analysts find themselves building the same content several times, once per data set, just to restrict what each end user can access.

The myth of data replication and security automation has persisted because organizations are repeatedly told that replicating data keeps it consistent, reliable, and up to date. People are conditioned to avoid working directly with source data in order to preserve it. Typical analytics tools reinforce this habit: they are built to pull from a data warehouse instead of straight from the data source, and their vendors claim this improves the speed and efficiency of data analysis.

The truth is, there’s no need to replicate data for decision intelligence.

Data replication is an acceptable strategy for disaster recovery, but when it comes to analytics, it increases risk and causes unneeded chaos. When you replicate data and move it to another location, the copy loses the security controls built into the source. Depending on how the data is shared (for example, extracted into intermediate files and sent by email or other insecure means), replication can also introduce downstream security problems, such as data getting into the wrong hands. Plus, you risk creating copies that conflict with the source data (data silos). Data latency is another concern: the moment data is extracted from the source, it’s no longer up to date.
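To make the latency point concrete, here is a minimal Python sketch. It is an illustration only: the `orders` table, its values, and the `query_source_total` helper are hypothetical, and an in-memory SQLite database stands in for a real source system. It does not represent how any particular BI or decision intelligence product works.

```python
import sqlite3

# Hypothetical source system: an in-memory SQLite table of order totals.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
source.executemany(
    "INSERT INTO orders (id, total) VALUES (?, ?)",
    [(1, 100.0), (2, 250.0)],
)
source.commit()


def query_source_total(conn):
    """Query the source directly, so the answer reflects its current state."""
    return conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]


# Replication step: extract a copy of the rows for downstream analysis.
replica = source.execute("SELECT id, total FROM orders").fetchall()

# The source keeps changing after the extract was taken.
source.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (3, 400.0))
source.execute("UPDATE orders SET total = 120.0 WHERE id = 1")
source.commit()

# The copy still reflects the moment of extraction; the direct query does not.
replica_total = sum(total for _, total in replica)
direct_total = query_source_total(source)

print("Total from replicated copy:", replica_total)
print("Total from direct query:   ", direct_total)
```

The two totals differ as soon as the source changes, which is the conflict between a copy and its source described above. Closing that gap means re-running the extract, and every re-run is another process to schedule, secure, and monitor.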

With the right decision intelligence platform, you can leave the data where it is and avoid the unneeded chaos.

Decision intelligence is what’s next in analytics.

At Pyramid, we’re out to change how you think about analytics and business intelligence. Rather than continuing to pour resources into data replication and patchwork security, your team can instantly connect directly to any data source, then query and blend any amount of data. Check out our Mythbusting Business Intelligence vs. Decision Intelligence guidebook to see how we break down common BI myths into categories related to data, people, and analytics, and how decision intelligence can take your analytics to the next level. Get instant access to the Mythbusting guide here (no registration required).