The world of technology has much in common with the world of fashion. Both, for example, are ruled by trends. Last season, Data Lake was in fashion; now it is passé. This season, Machine Learning is on top. And what will be fashionable next season? It's hard to guess.
It's always a good idea to keep up with trends, but following fashion blindly can be dangerous. We can live with a coat in our closet that we will never wear again. It's much worse if our insurance company invests several million zlotys and several years of work in a solution that turns out to be only a fad. Unfortunately, this scenario is all too common in business organizations. How can we avoid such a mistake? Is it worth implementing Machine Learning?
What are the laws of fashion?
First, we need to understand how fashion works in technology. A new technology attracts more and more publicity until it reaches the peak of its popularity. To stay with the example of Data Lake: at the peak of expectations, it was claimed that data lakes were the solution to the Big Data problem, would replace outdated data warehouses and would almost save the world. It is at the height of such hype that many companies invest in a technology. And then the problems begin.
Quite quickly, it turns out that expectations were too high: Data Lake technology neither solves every problem (no technology does) nor is it mature enough for widespread use. Opinion then swings to the opposite extreme, and data lakes are declared useless and not worth using. But this is not true either.
Eventually, things stabilize, and we begin to understand when a technology is worth using and when it is not. In the case of Data Lake, we came to understand that data lakes do not replace data warehouses, because they are a completely different kind of solution. At the same time, they turned out to have other valuable applications.
Understanding technology
The second step is to understand the strengths and weaknesses of the technology. In the case of Data Lake, there are three key advantages. First, unlike a data warehouse, it supports the processing of unstructured data such as PDF files, videos, images or audio recordings. Second, it is designed to store huge amounts of raw data. Third, it is cheaper to run than classic databases.
On the other hand, Data Lake also has disadvantages. First, accessing the data requires knowledge of a programming language. Second, data in a Data Lake has, by definition, no analytical structure, so you have to impose one yourself when you read it (schema-on-read). Third, Data Lake offers few mechanisms for controlling data quality, such as correctness or consistency.
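To make schema-on-read more concrete, here is a minimal sketch in plain Python. It assumes raw records stored in the lake as JSON lines with no schema enforced when they were written. The record layout, the field names and the `Trip` and `read_with_schema` names are all hypothetical, invented for illustration. The reader decides on the structure at read time and also has to deal with records that do not fit it, which shows why quality control falls on whoever consumes the data.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical raw records as they might land in a data lake: stored as-is,
# with no schema enforced at write time. These sample lines are made up.
RAW_LINES = [
    '{"policy_id": "P-001", "km_driven": "1250", "region": "north"}',
    '{"policy_id": "P-002", "km_driven": 980}',
    '{"policy_id": "P-003", "km_driven": "n/a", "region": "south"}',
    'not even JSON',
]


@dataclass
class Trip:
    """Hypothetical analytical structure imposed by the reader."""
    policy_id: str
    km_driven: float
    region: str


def read_with_schema(lines: Iterable[str]) -> Iterator[Trip]:
    """Apply the schema at read time (schema-on-read).

    Records that cannot be coerced into Trip are skipped: the lake did not
    validate them on write, so the quality check happens here, in the reader.
    """
    for line in lines:
        try:
            rec = json.loads(line)
            yield Trip(
                policy_id=str(rec["policy_id"]),
                km_driven=float(rec["km_driven"]),
                # A missing region gets a default chosen by the reader.
                region=str(rec.get("region", "unknown")),
            )
        except (json.JSONDecodeError, KeyError, ValueError, TypeError):
            # Malformed JSON, missing fields or unconvertible values.
            continue


trips = list(read_with_schema(RAW_LINES))
for t in trips:
    print(t)
```

Of the four raw lines, only the first two survive: the third has a non-numeric mileage and the fourth is not JSON at all. Nothing stopped either of them from being stored, and a different reader could just as well apply a different schema or different rules to the same raw data.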
Select a scenario
In the third step, understanding the advantages and disadvantages of the technology allows us to evaluate possible scenarios for its application. Given its ability to store huge amounts of diverse data at a reasonable cost, good applications of Data Lake include collecting all of an organization's data as a basis for building a data warehouse, collecting images of vehicle damage for processing by Artificial Intelligence algorithms, and collecting the billions of records generated by IoT devices installed in cars to support "pay as you drive" motor insurance policies.
On the other hand, because of the uncertain quality of its data and the lack of an imposed schema, Data Lake will not work as a replacement for a data warehouse, as a source of reporting on an insurance company's financial results, or as a direct source of data for machine learning algorithms that assess insurance risk or the likelihood of customer churn.
Always in fashion
At the beginning, I wrote that the world of technology is similar to the world of fashion, and in both we can apply similar strategies. One option is to throw away all our clothes every year and buy whatever is currently in fashion. The other is to build the core of our closet from classic clothes that will always work in the right conditions. And here I mean not only a classic suit or a "little black dress", but also a rain jacket with a good membrane.
Instead of following fashion and chasing the "latest" and "best" technologies every year, it is better to focus on choosing the solutions that fit our needs. The truth is that technologies do not solve problems just by being the latest. Only well-chosen ones do. They are the ones that will serve us well and will not go out of fashion for a long time.