Vincent Granville in Computer Science and Engineering, Big Data, Mathematics · Executive Data Scientist, Co-Founder, Managing Partner • Data Science Central · 17/9/2018 · 1 min read

What Is a Data Lake?

Data Lake Defined

Computers transformed how data gets stored and retrieved. Today, little thought is given to how much space data requires. Modern applications have shown how useful it is to manage and organize that data systematically.

Conventional data storage is often compared to a warehouse. Data is prepared before it is stored there, and itemized data is easier to manage in such a preplanned system.

Technology is moving so fast that a data warehouse is not enough. Raw data is in high demand. Technologies like artificial intelligence can consume raw data faster than it can be prepared for a warehouse system.

A data lake is a repository that holds both structured and unstructured data in its raw form. It is considered an end solution for organizing the multi-sourced, diversified data that a business absorbs.

Broad Advantages

Lake systems facilitate the exploration and discovery of raw data, and they are changing how data is collected. A lake ecosystem lets several data structures be handled together. It can also make data easier to enrich and to share across an organization.

The main difference between a warehouse system and a lake system is in the data processing. A lake system can ingest data much more quickly because no initial preparation is needed. Instead, data can be transformed on the fly, as needed.
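
The idea of storing first and transforming later can be illustrated with a small sketch. This is only a toy model, not how any particular data lake product works: the in-memory list `lake`, the functions `ingest` and `read_orders`, and the sample records are all hypothetical names and data chosen for illustration.

```python
import json

# Hypothetical in-memory "lake": raw payloads are kept exactly as received.
lake = []


def ingest(raw_line):
    """Store the payload untouched; nothing is parsed or validated at write time."""
    lake.append(raw_line)


def read_orders(min_total):
    """Apply structure only when reading: parse, filter, and convert on the fly."""
    results = []
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            continue  # non-JSON payloads stay in the lake; this query just skips them
        if not isinstance(record, dict) or record.get("type") != "order":
            continue
        total = float(record.get("total", 0))  # totals may arrive as text or numbers
        if total >= min_total:
            results.append({"id": record["id"], "total": total})
    return results


# Mixed, unprepared input (illustrative data only).
ingest('{"type": "order", "id": 1, "total": "19.99"}')
ingest('{"type": "click", "page": "/home"}')
ingest("free-text support note: customer asked about refunds")
ingest('{"type": "order", "id": 2, "total": 250}')

large_orders = read_orders(min_total=100)
print(len(lake), large_orders)
```

In this sketch every payload is accepted at ingestion, including the free-text note that a warehouse-style loader would typically have to reject or reshape first. The cost moves to read time, where each query decides what structure it needs.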

Practical Uses

There are several practical uses for data lakes. Applications for advanced analytics are especially interesting. Useful techniques include:

  • text and data mining
  • statistical analysis
  • clustering
  • graph analytics

For advanced analytics, data lakes extend the warehouse model and benefit applications that go beyond reporting.

Data lakes also benefit operational reporting and business monitoring by giving these applications access to the newest data. This is possible because data enters the lake in its original, unaltered form, with no preparation step to delay it.

Potential Applications

The data ecosystem in marketing continues to grow in size and complexity. Real-time data is needed more often. Data lakes excel at these applications by ingesting from several channels while handling multiple touchpoints. The Internet of Things also plays a role. What becomes clear is that data sources, applications, and innovation all need a speedy solution. Data lakes provide a centralized, easy-to-use resource.
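
As a rough sketch of the centralized, multi-channel ingestion described above, the toy class below keeps raw payloads grouped by the channel they arrived on. `MiniLake`, its methods, and the channel names (`web`, `email`, `iot`) are hypothetical and exist only for illustration; a real lake would sit on distributed storage rather than an in-memory dictionary.

```python
from collections import defaultdict


class MiniLake:
    """Toy central store: raw payloads grouped by the channel they came from."""

    def __init__(self):
        self._zones = defaultdict(list)

    def ingest(self, channel, payload):
        # The payload is kept as-is; only its source channel is recorded.
        self._zones[channel].append(payload)

    def channels(self):
        return sorted(self._zones)

    def read(self, channel):
        # Return a copy so callers cannot modify the stored raw data.
        return list(self._zones.get(channel, []))

    def count(self):
        return sum(len(items) for items in self._zones.values())


marketing_lake = MiniLake()
marketing_lake.ingest("web", '{"event": "page_view", "page": "/pricing"}')
marketing_lake.ingest("email", "Subject: Re: spring campaign feedback")
marketing_lake.ingest("iot", b"\x01\x17\x2a")  # e.g. an opaque sensor reading
marketing_lake.ingest("web", '{"event": "signup"}')

print(marketing_lake.channels(), marketing_lake.count())
```

Text, JSON, and binary payloads land in the same store without conversion, and each consumer reads only the channels it cares about.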

This blog was originally posted by Vincent Granville on