Big Data Solutions


‘Big Data’ has received intense attention in recent times, primarily due to the confluence of several important trends.

Enterprise data volumes are increasing exponentially. Some estimate that over 50% of available data was created in the last 18 months! The data itself is also becoming more diverse, especially with unstructured data from sources such as social media, email, and call centers. What’s more, data is being consumed and analyzed faster than ever before.

Once the exclusive realm of large, expensive mainframes, Big Data can now be managed on clusters of inexpensive commodity hardware delivered via infrastructure-as-a-service, running proven software such as Apache™ Hadoop™.

However, the key for companies like yours is to develop an enterprise ecosystem that fully leverages data assets in an efficient, scalable and cost-effective manner.

That’s where Data Centric can help.

Big Data, a cornerstone of your data management ecosystem

Backed by almost two decades of success in data management, we can help you incorporate Hadoop into the heart of your data management architecture. This includes analyzing the role of Hadoop across your organization and finding answers to questions like these:

  • What use cases can Hadoop serve?
  • What complex analytics can be facilitated?
  • Where can it serve as a data exploration laboratory?
  • Where can it be used as a low-cost alternative or complement to high-cost hardware, MPP appliances or relational databases?
  • How and where will it interact with existing data management assets?

Efficient and committed teams

Hadoop is a proven, open-source technology, but hardening it and making it a relevant and usable part of an enterprise data infrastructure is often handed over to armies of Java developers and statisticians.

We can replace that army with a focused team of specialists — efficient, committed and armed with the right tool kit to develop the best enterprise-level solution for your organization. Most importantly, these are individuals who bring decades of data management experience to the table, providing the right perspective on Hadoop as part of your data management tool belt.

Multi-purpose data repositories

Data Centric takes a distribution-partner-agnostic approach to building your Hadoop repository. We deliver these solutions as (1) long-term, persistent staging areas for your current analytic structures, (2) data profiling and exploration repositories, or (3) Data as a Service (DaaS).

Our approach allows you to quickly repurpose your data exploration laboratory for new business users. For example, a marketing analytics lab can be expanded to support sales analytics simply by regrouping the logical model. We also support faster time to market for short-term analytical repositories, required for activities such as conducting acquisition due diligence, monitoring feedback on a beta release of a new product, or investigating the effect of a black swan event on your organization.

Assets to optimally leverage your investment in Hadoop

Our Standard Architecture for Hadoop-based Big Data solutions is a hybrid of proven strategies, design principles and pre-built, reusable components. Grounded in years of information management implementation experience and project success, the framework has three primary elements.

Automated Schema Discovery. Using our patented BRMS™ solution, we automatically discover and link relationships across multiple incoming schemas, for both data at rest and data in motion. More rapid integration of structured and unstructured data yields faster, better insights.
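
To make the idea concrete, below is a minimal, hypothetical Java sketch of relationship discovery by normalized column-name matching. It illustrates the concept only; it is not the patented BRMS™ algorithm, and the class and schema names are invented for the example.

    import java.util.*;

    // Hypothetical illustration of schema linkage by column-name matching.
    // This is NOT the BRMS(TM) algorithm; it only sketches the idea of
    // automatically discovering relationships across incoming schemas.
    public class SchemaLinker {

        // Normalize a column name so "Cust_ID", "custId" and "CUSTID" align.
        private static String normalize(String column) {
            return column.toLowerCase().replaceAll("[^a-z0-9]", "");
        }

        // Return candidate links: columns in schemaA matched to columns in schemaB.
        public static Map<String, String> discoverLinks(List<String> schemaA,
                                                        List<String> schemaB) {
            Map<String, String> normalizedB = new HashMap<>();
            for (String col : schemaB) {
                normalizedB.put(normalize(col), col);
            }
            Map<String, String> links = new LinkedHashMap<>();
            for (String col : schemaA) {
                String match = normalizedB.get(normalize(col));
                if (match != null) {
                    links.put(col, match);
                }
            }
            return links;
        }

        public static void main(String[] args) {
            List<String> orders = Arrays.asList("Order_ID", "Cust_ID", "Amount");
            List<String> customers = Arrays.asList("custId", "custName");
            System.out.println(discoverLinks(orders, customers)); // {Cust_ID=custId}
        }
    }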

Partitioning Strategy. Our Automated Schema Discovery, coupled with our methods for pattern-based data distribution, helps you organize incoming data more efficiently, optimizing performance and data access.
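
As a rough illustration of pattern-based distribution, the sketch below shows a custom Hadoop MapReduce Partitioner that routes records by a key prefix so related data lands on the same reducer. The region-prefix key layout is an assumption made for the example, not our actual distribution method.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch of pattern-based data distribution: a custom Partitioner that
    // routes records by a key prefix (here, a region code) so related data
    // is processed together. The "REGION|customerId" key layout is assumed.
    public class RegionPartitioner extends Partitioner<Text, IntWritable> {

        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // Partition on the region prefix, not the full key.
            String region = key.toString().split("\\|", 2)[0];
            return (region.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Registered on a job with: job.setPartitionerClass(RegionPartitioner.class);

Partitioning on a coarse business attribute such as region keeps related records co-located, which is the kind of access-pattern optimization this element of the framework targets.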

Shared Integration Components. Built in Java and MapReduce, these standard components help deliver consistency and shorten integration development lifecycles. Our components include facilities for replay and reload, validation and reconciliation, regression testing, data masking and generic extraction.
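
For a flavor of one such component, here is a hedged sketch of a map-only data masking step in Java and MapReduce. The comma-delimited layout, field position and masking rule are hypothetical examples, not the actual component.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of a shared data masking component: a map-only job that masks a
    // sensitive field in comma-delimited records. Field position and masking
    // rule are assumptions for the example.
    public class MaskingMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        private static final int SSN_FIELD = 2; // assumed position of the sensitive field

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",", -1);
            if (fields.length > SSN_FIELD) {
                // Replace all but the last four characters with asterisks.
                String ssn = fields[SSN_FIELD];
                int keep = Math.min(4, ssn.length());
                StringBuilder masked = new StringBuilder();
                for (int i = 0; i < ssn.length() - keep; i++) {
                    masked.append('*');
                }
                masked.append(ssn.substring(ssn.length() - keep));
                fields[SSN_FIELD] = masked.toString();
            }
            context.write(NullWritable.get(), new Text(String.join(",", fields)));
        }
    }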