72476

Author(s): 

Number of authors: 

4

Publication details

Publication type: 

Conference paper

Title: 

Efficient data exchange between typical Data Lake and DWH corporate systems

DOI: 

10.1109/ICECET52533.2021.9698468

Conference name: 

  • 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET)

Source title: 

  • Proceedings of the International Conference on Electrical, Computer and Energy Technologies (ICECET 2021)

City: 

  • Cape Town

Publisher: 

  • IEEE

Year of publication: 

2021

Pages: 

https://ieeexplore.ieee.org/document/9698468

Abstract
In the last five years, many companies around the world have successfully implemented Apache Hadoop as the main Data Lake storage for all data in the organization. At the same time, the adoption of other open-source technologies, such as classical MPP-based systems for analytical workloads, has also been increasing for years. Efficient and fast data integration between Apache Hadoop and other organizational data storage systems is therefore highly important for enterprises, where business users and decision makers need big heterogeneous data to be exchanged between Hadoop and other storage systems with minimal delay. In this paper, we compare different options for loading data from Apache Hadoop, which represents the organization's Data Lake, into the open-source MPP database Greenplum, which serves as a classical data warehouse for analytical workloads, and choose the best one. We also identify potential risks of using the different data loading methods.

Bibliographic reference: 

Panfilov P., Suleykin A.S., Chumakov I., Bobkova A. Efficient data exchange between typical Data Lake and DWH corporate systems / Proceedings of the International Conference on Electrical, Computer and Energy Technologies (ICECET 2021). Cape Town: IEEE, 2021. URL: https://ieeexplore.ieee.org/document/9698468.