2018/05/16

Developing a Hadoop environment with Cloudera as a data warehouse for an insurance company

Time is money. The business environment is constantly changing, and arriving late to a potential market can mean losing it. You have to find a way to be first, with the necessary quality. And to offer quality you have to know the market and the risks very well. For all of this, having all the relevant data accessible is more necessary than ever.


Customer: Insurance company


Need: Study the market and customers to offer new types of insurance.


Previous situation: The data was dispersed over the internet, in the company's corporate data warehouse, in the logs of the corporate websites and in the departments involved. Some OLAP cubes had been developed in SQL Server Analysis Services to study the possibility of opening new markets, but the information was insufficient: the pricing policy could not be fine-tuned to cover potential risks or to determine what customers would be willing to pay for the risks covered.


Implementation: After ruling out a classic development with the company's own resources because of the time it was estimated to take, a Hadoop environment was created with Cloudera into which all available data was loaded with hardly any transformation. Programs were created to capture data in real time from campaigns and opinion studies run on the web, and the departments involved were given access to query the data in real time. The SSAS OLAP cubes were enriched with the new data and new cubes were created. Virtual OLAP cubes were also created directly on the Cloudera cluster with AtScale, so the data could be queried in place without copying it into yet another system. These virtual cubes could be accessed from Excel, Power BI or Tableau sheets, depending on the preference of each department. All the information was indexed with Cloudera Search in order to run non-standard searches, and certain company employees were trained to search for information with Apache Hue and Apache Impala.
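As an illustration of the kind of ad-hoc access the trained employees had, here is a minimal sketch of an Impala query from Python using the impyla package; the host, table and column names are hypothetical placeholders, not the customer's actual schema.

# Minimal sketch: querying the Cloudera cluster through Impala from Python.
# Assumes the impyla package; host, table and column names are hypothetical.
from impala.dbapi import connect

conn = connect(host='impala-daemon.example.com', port=21050)
cur = conn.cursor()

# Example: count web-campaign responses per product line.
cur.execute("""
    SELECT product_line, COUNT(*) AS responses
    FROM campaign_responses
    WHERE response_date >= '2018-01-01'
    GROUP BY product_line
    ORDER BY responses DESC
""")
for product_line, responses in cur.fetchall():
    print(product_line, responses)

cur.close()
conn.close()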

2017/12/19

Storing all the historical and current data generated in a company with Hadoop from Hortonworks

Businesses can't afford to miss anything. Nowadays companies can use relatively inexpensive hardware, scalable as they grow, to store all the data related to their business. Even if they do not yet know what their data can be used for, companies do not want to regret in the future not having collected it. The software is free and allows you to manage as much data as Google or Facebook do; it will never be too small for the job.

Client: a real estate company.

Need: To store in an orderly way in a single system all the historical and current data generated in the company. 

Previous situation: The company's real estate and accounting management system was in place, along with a data mart on a SQL Server instance with Analysis Services holding the data that management considered necessary to follow the evolution of the company. The data was integrated from various sources with Integration Services. Reports were created with SQL Server Reporting Services and Power BI.

Implementation: A Hadoop cluster (Hortonworks) was created to store all the operational data of the company. In this way, all the reports, letters, communications and e-mails that were scattered throughout the company were brought together. The reporting processes were modified so that each report also wrote a copy in .csv format, giving an auditable historical record of the reports generated. The data deemed appropriate from the company's DBMS was collected systematically. The logs of this system and of the company's web server were captured with Filebeat and Elasticsearch. Client and property data managed by the sales agents was transferred to Hadoop in a standardized format.
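To make the auditable .csv copies concrete, here is a minimal sketch of archiving one generated report to HDFS with the hdfs (hdfscli) Python client; the NameNode URL, user, paths and report fields are assumptions made up for the example, not the client's real configuration.

# Minimal sketch: writing an auditable CSV copy of a generated report to HDFS.
# Uses the hdfs (hdfscli) package over WebHDFS; the NameNode URL, user, paths
# and report fields are hypothetical, not the client's actual configuration.
import csv
import io
from datetime import date

from hdfs import InsecureClient

client = InsecureClient('http://namenode.example.com:50070', user='etl')

def archive_report(report_name, rows, header):
    """Serialize the report rows as CSV and store them under a dated HDFS path."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    path = f'/audit/reports/{date.today():%Y/%m/%d}/{report_name}.csv'
    client.write(path, data=buffer.getvalue(), encoding='utf-8', overwrite=False)

# Example usage with made-up data:
archive_report('monthly_rentals',
               [('Madrid', 120, 95000.0), ('Valencia', 80, 61000.0)],
               header=('office', 'contracts', 'revenue_eur'))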

New datasets useful for monitoring the company were subsequently identified, and new multidimensional models were created for the company's managers with SSAS and Power BI reports. Dremio was installed and configured to facilitate and speed up other Power BI queries run directly on the data stored in the different file formats of the cluster.
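Power BI reached Dremio through its ODBC driver; purely as an illustration, the sketch below issues the same kind of direct query from Python with pyodbc, assuming a DSN named "Dremio" and a hypothetical virtual dataset built on top of the archived reports.

# Minimal sketch: querying a dataset exposed by Dremio over ODBC with pyodbc.
# Assumes a Dremio ODBC driver configured as the DSN "Dremio"; the credentials,
# space and dataset names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect('DSN=Dremio;UID=analyst;PWD=secret', autocommit=True)
cur = conn.cursor()

# Dremio lets SQL run directly over the files stored in the cluster,
# e.g. a virtual dataset defined over the archived CSV reports.
cur.execute("""
    SELECT office, SUM(revenue_eur) AS total_revenue
    FROM audit.reports.monthly_rentals
    GROUP BY office
""")
for office, total_revenue in cur.fetchall():
    print(office, total_revenue)

conn.close()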

All storage design and information processing was done according to the European General Data Protection Regulation (GDPR), which would come into effect in Spain on May 25, 2018.

2017/06/20

Migrating a real estate company's information system to SQL Server

Common databases for the entire company facilitate communication between departments and managers.


Client: a real estate company.


Need: To homogenize the different data handled by the company's departments.

Previous situation: Most of the information was in a specific management program for real estate companies, but there were also several Access databases and many Excel spreadsheets managed by different departments. There was a lot of duplicated and apparently contradictory information.


Implementation: All the data that was scattered throughout the organization was consolidated in SQL Server with Integration Services. Once the data model and the database were created, the company's processes had to be adapted to them. Multidimensional cubes were also created with Analysis Services so that management could keep better control of the company's progress. Finally, reports that could be consulted from computers or mobile phones were generated with Microsoft Power BI Report Server.
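The integration itself was done with Integration Services; purely as an illustration of the consolidation step, here is a hedged Python sketch that loads one departmental spreadsheet into a staging table (pandas plus SQLAlchemy; the file, connection string, sheet and table names are invented for the example).

# Minimal sketch of the consolidation idea: loading a scattered Excel sheet
# into the new central SQL Server database. The real work was done with SSIS;
# the file, connection string, sheet and table names here are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    'mssql+pyodbc://etl_user:secret@sqlserver.example.com/RealEstateDW'
    '?driver=ODBC+Driver+17+for+SQL+Server'
)

# Read one of the departmental spreadsheets and normalize the key column
# so duplicates can be detected against the central property table.
df = pd.read_excel('sales_department_properties.xlsx', sheet_name='Listings')
df['property_ref'] = df['property_ref'].str.strip().str.upper()

# Append into a staging table; deduplication against existing data
# would then run inside SQL Server.
df.to_sql('stg_properties', engine, if_exists='append', index=False)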


Cost: 500 days + licenses

2016/12/14

Optimizing SSAS (SQL Server Analysis Services) OLAP cubes

Business intelligence solutions do not work forever. They degrade over time: the data grows, small changes are introduced that in many cases worsen response times, the company begins to demand more, and many more reports are generated. In short, what used to work performs worse and worse.


Customer: Industrial company


Need: Data from the company's multidimensional cubes must be available at all times, with reasonable response times for users.


Previous Situation: The client applications for generating reports and dashboards worked very slowly and the OLAP server was unavailable many times.


Implementation: A system performance study was carried out by activating and generating traces on the SSAS server to detect where delays and blockages were occurring. The multidimensional model was studied to relate the delays to the traces and to identify time-consuming MDX queries. Once conclusions were drawn, changes that did not entail major disruptions to the service were introduced first, to improve it as much as possible. In a pre-production environment, the changes to the cube design deemed necessary to improve performance (new groupings of dimensions, new partitions, new aggregated measures, ...) were then introduced, and the relevant tests and adjustments were made to meet the needs of the company. The process of updating the cubes was optimized and automated with Integration Services (SSIS).
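As an illustration of the trace analysis step, the sketch below ranks the slowest queries from a Profiler trace exported to CSV; the file name is hypothetical, and it assumes the export keeps the standard EventClass, TextData and Duration columns (in a real export EventClass may appear as a numeric code).

# Minimal sketch: ranking slow MDX queries from an exported SSAS trace.
# Assumes the trace was saved to CSV with EventClass, TextData and Duration
# columns; the file name is a hypothetical placeholder.
import pandas as pd

trace = pd.read_csv('ssas_trace_export.csv')

# Keep only completed query events and aggregate by query text.
queries = trace[trace['EventClass'] == 'Query End']
slowest = (queries.groupby('TextData')['Duration']
                  .agg(['count', 'mean', 'max'])
                  .sort_values('max', ascending=False)
                  .head(20))
print(slowest)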