Practical Aspects of Governed Data Discovery
In my previous blog post on Governed Data Discovery, I covered what governance means in the business analytics world. In this post, I’ll go over some of the key practical aspects of a governed data discovery solution.
Governed Data Discovery was a big theme in Gartner’s 2014 Magic Quadrant (MQ) report. Since then, numerous BI products have laid claim to the governance paradigm. However, few measure up (pardon the pun) once you scratch the surface of their offerings. In contrast, BI Office delivers strongly on the key objectives, making it one of the preeminent governed data discovery solutions in the market today.
Here are some key aspects to look for in a real governed platform.
Self-driven content is a must-have in the analytics space today. But the benefits of centrality should not be compromised in the process. A good platform delivers on both goals: self-service content creation in a centralized, shared paradigm.
||All content is saved centrally and can be accessed and managed by users and admins alike. This promotes content integrity and makes support and upgrades easy to apply and deliver. Content can be shared or kept private.
|Shared Business Logic
||Calculations and user logic are stored centrally and can be shared between users or kept “private”. Sharing reduces logic repetition and logic mismatching. It also allows users to collaborate better.
||The ability to easily move new content from development to test to production is a key aspect of governance.
||In-built versioning of every aspect of content and business logic creation is a must. A standard feature in proper IT development frameworks, the same approach is desperately needed in the data discovery space.
|Content and Data Synchronicity
||Content and its matching data source often get out of sync. A toolset that checks one against the other and synchronizes them is critical, yet it is often overlooked in business analytics, even though data sets change constantly.
||The ability to see the flow of content items between users over time and across different delivery platforms, together with their usage, is a must-have toolset for tracking the life cycle of content in the organization. This applies equally to business logic and data assets, and it promotes better allocation of resources for optimization and management, among other things.
||All content should be automatically tagged to support better categorization and search. It’s also a key part of the governance lineage tracking process described previously.
||If content and logic are centralized, it goes without saying that the same reporting components should be reusable across different analyses, reporting modalities or content formats. This is an obvious extension to centralized content.
|Common Cross-platform Content
||The ability to have the same server-based content accessible from multiple desktop platforms AND mobile platforms has become key in the industry. Maintaining separate, unrelated versions of content for the desktop, the web browser and mobile devices is unscalable and inherently complicated to manage.
|Centralized Formatting & Styles
||Often not rated as a ‘must-have’, consistent formatting and styles are definitely a big ‘nice-to-have’ when trying to push analytics through larger organizations. Some might say that corporate branding standards, especially for content sent externally, are by no means optional.
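The content and data synchronicity check described in the table above can be sketched as a comparison between the fields a content item references and the fields its data source currently exposes. This is only a minimal illustration in Python, not how any particular BI product implements the check; the function name `find_stale_references` and the sample field names are assumptions made for the example.

```python
def find_stale_references(report_fields, source_fields):
    """Return the fields a content item uses that its data source no longer exposes."""
    available = set(source_fields)
    # Keep the order in which the content item references its fields.
    return [field for field in report_fields if field not in available]


# Hypothetical content item and the current schema of its data source.
report_fields = ["Region", "Net Sales", "Gross Margin"]
source_fields = ["Region", "Net Sales", "Units Sold"]

stale = find_stale_references(report_fields, source_fields)
print(stale)  # ['Gross Margin']
```

A centralized platform can run a check like this across every stored content item, which is only practical when the content and the data source definitions live in the same place.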
Things to look out for in content governance
Desktop tools that include a server version (“hybrids”) can claim some of the content governance capabilities mentioned here. However, most tools cannot force users to centrally locate all their content on the server (e.g. Excel with SharePoint). Even if they could, they lack numerous functions mentioned above because the underlying DNA of their product framework is designed around the model of personal BI.
Things like versioning, consistent formatting and shared business logic are generally lost in hybrid solutions.
Business analytics is heavily dependent on the data warehouse stack. As such, it has a strong need for numerous elements of the data governance models that are usually implemented on the entire data stack in an organization. The following list is a subset of the enterprise-wide data governance requirements found in disciplines such as master data management (MDM).
|Secured, managed data access
||Data sources need to be presented to users in a manageable way. This includes ensuring that access to the data is secured (often in the data source itself). But often, it includes providing a more convenient gate-keeper function to expose the right data source (or version of data source) to the end user WITHOUT complexity.
|Centralized Data Modeling
||Data Modeling and mash-up tools should be executed on the server, not in the localized desktop. Apart from delivering a more scalable approach to modeling ever-growing data sets, centralization ensures the resulting model is instantly shareable, securable and refreshable without being confined to the user’s desktop.
||Together with centralized data modeling, data lineage and its impact on content (shown through content lineage) is a big part of classic data governance. Although BI tools have not traditionally provided in-depth lineage capabilities, lineage has become more important now that the data modeling function resides in the data discovery platform.
|Meta tagging and Meta Viewers
||All data elements used in models should be automatically tagged to support better categorization and search. It’s also a key part of the governance lineage tracking process described previously.
|Data usage tracking
||Last, but not least, are strong usage statistics on how data models (and their elements) are being used downstream in content. These statistics improve resource allocation and optimization, and they provide a clear impact analysis.
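The impact analysis that lineage and usage tracking make possible can be sketched as a walk over a dependency graph: given a data asset that is about to change, list everything downstream of it. The Python sketch below is a hypothetical illustration only; the `lineage` mapping, the asset names and the function `downstream_impact` are assumptions made for the example, not part of any product's API.

```python
from collections import deque


def downstream_impact(lineage, changed):
    """Return every asset that directly or indirectly depends on `changed`."""
    impacted = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        # Visit each asset fed by the current one, skipping any already seen.
        for dependent in lineage.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted


# Hypothetical lineage: each key feeds the assets listed as its value.
lineage = {
    "sales_table": ["sales_model"],
    "sales_model": ["margin_calc", "regional_report"],
    "margin_calc": ["exec_dashboard"],
    "regional_report": ["exec_dashboard"],
}

print(downstream_impact(lineage, "sales_table"))
# ['sales_model', 'margin_calc', 'regional_report', 'exec_dashboard']
```

A graph like this can only be complete if models, business logic and content are all registered in one central repository.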
Things to look out for in data governance
Localized desktop tools don’t feed a central repository, so getting accurate usage and lineage stats is often impossible. Even when there is a “server” version, the issues related to the stand-alone deployments cannot be avoided.
Another tremendous problem relates to data modeling. Few desktop tools offer data mash-ups and modeling directly on a server. Instead, users build the models (with all the data extraction and processing) on their desktops. Then they upload the finished model to the “server”. This introduces numerous headaches:
- The data is hard or impossible to update without refreshing via the desktop process.
- The data needs to be downloaded to a desktop first before being re-uploaded again. Space, performance and data security are just some of the obvious problems here.
- There is no guarantee that the model being used on a particular desktop is the same as the server version. Data incongruity now becomes a big problem.
Distributed, desktop data modeling is one of the biggest issues in non-governed data discovery platforms (some might say it’s *THE* issue).
Any data worth analyzing is usually important enough to be secured. Having a strong security model is key to business analytics. The difference between using it properly or not is central to the idea of governed data discovery. Security permeates all aspects of a BI platform. Unfortunately, the proliferation of desktop tools has made the enforcement of security in BI very difficult, if not impossible.
||Enterprise security integration is a must. Stand-alone security that exists only inside the BI platform is inherently weak and unscalable. The ability to plug in and re-use the LDAP or other directory deployed in the enterprise is an obvious feature needed for proper governance.
|Role Based Security
||The solution needs to employ both user-based and role-based security. This permeates everything, including content and data security as well as things like social networking. Role-based security that can use directory security groups is key to a scalable security architecture.
||All content must be securable by user or role. This is where centralization plays a supporting role, because enforcing content security in a distributed desktop application is almost impossible (e.g. Excel).
||All data must be securable by user or role. Often this is a function of the data source itself. Too often, the BI toolset does not credential properly down to the data source – voiding the data layer security.
||Multi-tenancy, or the ability to manage more than one “customer” from the same installation, is a big requirement. This is true not only for external SaaS solutions in the cloud, but also for internal ‘in-company’ clouds where different departments (e.g. finance, marketing etc.) need to have different workspaces but coexist on the same platform.
|Social Networking Security
||The implementation of social networking tools with BI often overlooks the larger problem of securing messaging on the network when it relates to important or confidential business-related data.
||Having the ability to create granular security profiles to control which parts of the application can be used by different users is key. Centralized tools offer this capability, giving fine-grained control of each user’s experience from a centralized console.
||The ability to allocate licenses to new users from a centralized console is critical when the scale of the analytics platform extends beyond a department. The same goes for revoking a license or changing its type or access.
||Any high-volume system needs a synchronization toolset that auto-provisions users based on changes in the underlying security framework. An auto-provisioning tool automatically adds, updates and removes users in the BI platform based on the external security system.
||The platform should support numerous authentication models, especially single sign-on.
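The auto-provisioning behavior described in the table above (adding, updating and removing BI users to match the external security system) can be sketched as a simple comparison of two user lists. The Python below is a hypothetical sketch under stated assumptions: both sides are represented as plain dictionaries mapping a user name to a set of roles, and the function `plan_provisioning` is invented for illustration. A real deployment would read these from the enterprise directory and the BI platform rather than from literals.

```python
def plan_provisioning(directory_users, platform_users):
    """Work out which platform users to add, update or remove.

    Both arguments map a user name to the set of roles that user holds.
    The directory is treated as the source of truth.
    """
    to_add = sorted(u for u in directory_users if u not in platform_users)
    to_remove = sorted(u for u in platform_users if u not in directory_users)
    to_update = sorted(
        u for u in directory_users
        if u in platform_users and directory_users[u] != platform_users[u]
    )
    return {"add": to_add, "update": to_update, "remove": to_remove}


# Hypothetical snapshots of the enterprise directory and the BI platform.
directory_users = {
    "alice": {"finance"},
    "bob": {"marketing", "admins"},
    "dana": {"finance"},
}
platform_users = {
    "alice": {"finance"},
    "bob": {"marketing"},
    "carol": {"finance"},
}

plan = plan_provisioning(directory_users, platform_users)
print(plan)
# {'add': ['dana'], 'update': ['bob'], 'remove': ['carol']}
```

Treating the directory as the single source of truth is the design choice that matters here: the BI platform never holds a user or role that the enterprise security system does not know about.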
Things to look out for in security implementations
Localized desktop tools don’t provide the same level of security that is found in server-driven, centralized platforms: the user can skirt the security apparatus as and when needed. (That doesn’t mean that there is NO security in other approaches.) At the same time, platforms that do have a server component usually implement their own security models, which do not fully piggyback on or utilize the enterprise security framework. Having the entire platform (content, licensing, data and business logic) fully integrated with an enterprise security model is fundamental to achieving governance. With that integration, turning a user on or off is a click away. With a desktop tool or distributed model, it is an installation away, which is far more complicated!
Built-in support for multi-tenancy and securing social networking tools are also overlooked features. Most tools employ simplistic solutions for these things.
A strong, comprehensive administrative toolset is imperative for gluing all the pieces of a centralized business analytics platform together. In effect, you cannot govern a platform without a strong and extensible administrative backend.
||As described elsewhere, a centralized administrative console is a must for managing a large BI deployment. It is therefore the centerpiece of a governed data discovery platform. Without it, and without the ability to manage every user’s experience, security, content and data access, the ability to govern the deployment is all but lost.
|System Logging, Usage Tracking and Audit
||Strong governance calls for system logging, usage tracking and auditing user access to the platform. The tracking provides avenues for security management, performance enhancement and resource allocation.
||Large organizations typically have their own specialized administrative applications for managing their entire IT environment. A strong governance feature set will include APIs that let customers integrate the platform’s management with the broader organization’s framework.
Things to look out for in admin implementations
Desktop tools with “server” versions that cannot provide full administrative control over both deployment modes are fairly useless. If those tools do not aggregate into a centralized control console, how does an organization collect and track usage, monitor querying issues and bottlenecks, or delegate and deny rights? Similarly, generic APIs for managing business analytics need to include the APIs to fully manage users, role security, licensing, data access and content so they can be embedded in a broader organization’s governance strategy and platform. APIs that merely provide control over queries or visualization fall far short when it comes to managing a broad platform and many hundreds of users in the real world.