2017/12/19

Storing all the historical and current data generated in a company with Hadoop from Hortonworks

Businesses can't afford to miss anything. Companies can now store all the data related to their business on relatively inexpensive hardware that scales as they grow. Even if they do not yet know what that data can be used for, they do not want to regret later not having collected it. The software is free and can scale to data volumes comparable to those of Google or Facebook, so the system is unlikely to become too small.

Client: a real estate company.

Need: To store all the historical and current data generated in the company in an orderly way, in a single system.

Previous situation: The company already had a real estate and accounting management system in place, together with a data mart holding the data that management considered necessary to follow the evolution of the company, hosted on a SQL Server instance with Analysis Services. The data was integrated from various sources with Integration Services. Reports were created with SQL Server Reporting Services and Power BI.

Implementation: A Hadoop cluster (Hortonworks) was created to store all of the company's operational data. All the reports, letters, communications and e-mails that had been scattered throughout the company were stored there. The reports were modified so that they also wrote a copy in .csv format, providing an auditable historical record of the reports generated. Data deemed relevant was collected systematically from the company's DBMS. The logs of this system and of the company's web server were captured with Filebeat and Elasticsearch. Client and property data managed by the sales staff was transferred to Hadoop in a standardized format.
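As an illustration of the auditable .csv copies described above, here is a minimal Python sketch of how each generated report could be written to a date-partitioned, timestamped file that is never overwritten. The directory layout, the function names and the example report are assumptions made for this sketch, not the project's actual code; the sketch writes to a local directory, and shipping the files into the cluster is left out.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path, PurePosixPath


def audit_copy_path(report_name: str, generated_at: datetime) -> PurePosixPath:
    """Build a relative, date-partitioned path for one report copy.

    The reports/<name>/year=/month= layout is a hypothetical convention.
    generated_at is assumed to be in UTC, hence the trailing "Z".
    """
    stamp = generated_at.strftime("%Y%m%dT%H%M%SZ")
    return PurePosixPath(
        "reports",
        report_name,
        f"year={generated_at:%Y}",
        f"month={generated_at:%m}",
        f"{report_name}_{stamp}.csv",
    )


def write_audit_copy(root, report_name, header, rows, generated_at):
    """Write the report rows as CSV under root and return the file path."""
    target = Path(root).joinpath(*audit_copy_path(report_name, generated_at).parts)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" fails if the file already exists, so an earlier copy
    # is never silently replaced.
    with target.open("x", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return target


if __name__ == "__main__":
    # Hypothetical report contents, for demonstration only.
    now = datetime.now(timezone.utc)
    with tempfile.TemporaryDirectory() as tmp:
        path = write_audit_copy(
            tmp, "occupancy", ["property_id", "status"], [["P-001", "rented"]], now
        )
        print(path.relative_to(tmp))
```

Keeping the generation time in both the directory names and the file name means every run leaves its own file, which is what makes the record usable for audits.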

New datasets useful for monitoring the company were subsequently identified, and new multidimensional models were built for company managers with SSAS, along with new Power BI reports. Dremio was installed and configured to simplify and speed up other Power BI queries that run directly against the data stored in the cluster's various file types.

All storage design and information processing was done in accordance with the European General Data Protection Regulation (GDPR), which would come into effect in Spain on May 25, 2018.
