Tech.mt - Malta Leading Through Innovation

The challenges of the data dumpster.


Eunoia Limited - May 25, 2021

It is widely acknowledged that data is growing exponentially, with no sign of slowing down. The more connected we are, through systems, social media, intelligent offices, paperless offices, augmented devices and a list that never ends, the more data is generated. According to Forbes, data volumes increased by 5,000% between 2010 and 2020, to 59 trillion gigabytes, in just 10 years. Furthermore, IDC published a report stating that the amount of data created over the next three years will exceed the data created over the past 30 years.

Ten to fifteen years ago, DBAs were tasked with making sure that all data was archived, so they embraced the data lake concept. Years later, many of those lakes had become dumpsters: truly voluminous collections of data in every format and structure. Fast forward to today. Following regulations such as the right to be forgotten, executives have turned to their DBAs again to ask whether it is possible to identify a person's record in that dumpster and ensure that it is anonymized in every structure, archive and backup.

Consider a simple example. If a website keeps an individual's data in the data warehouse, that is fine: the record is easy to identify. But if there are also logs in unstructured formats, images, videos, documents, social media content and so on, tying all of that data back to the same person is another matter. To draw a parallel, it would be easier to climb Everest on a winter's day than to identify that record's content. Anyone living in this space knows this and is already scrambling to find a solution, but for anyone else it might come as a shock. The techies who worked so hard to implement big data platforms using the Hadoops of this world now face new and immensely difficult challenges.
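To make the gap between structured and unstructured data concrete, here is a minimal sketch in Python. It assumes a hypothetical customer identified by an email address, whose data sits both in warehouse-style rows and in free-text log lines. The function names, the `[REDACTED]` placeholder and the sample data are all illustrative assumptions, not part of any real platform or of the approach described in this article.

```python
import re

REDACTED = "[REDACTED]"


def anonymize_rows(rows, email):
    """Structured case: the email sits in a known column, so matching is exact."""
    cleaned = []
    for row in rows:
        if row["email"].lower() == email.lower():
            # Build a new dict so the caller's original rows are not mutated.
            row = {**row, "name": REDACTED, "email": REDACTED}
        cleaned.append(row)
    return cleaned


def anonymize_logs(lines, email):
    """Unstructured case: the email can appear anywhere in free text."""
    pattern = re.compile(re.escape(email), re.IGNORECASE)
    return [pattern.sub(REDACTED, line) for line in lines]


# Hypothetical sample data, for illustration only.
warehouse_rows = [
    {"id": 1, "name": "Ann Borg", "email": "ann@example.com"},
    {"id": 2, "name": "Joe Vella", "email": "joe@example.com"},
]
log_lines = [
    "2021-05-01 login ok user=ann@example.com",
    "2021-05-02 support ticket opened by Ann Borg (no email on file)",
]

clean_rows = anonymize_rows(warehouse_rows, "ann@example.com")
clean_logs = anonymize_logs(log_lines, "ann@example.com")

for line in clean_logs:
    print(line)
```

The warehouse row and the first log line are cleaned, but the second log line still names the person, because nothing in the text ties that name to the email address. Multiply that by images, videos and backups, and the scale of the problem becomes clear.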

“….whilst we are at it, add a list of new features and a 50x performance improvement over the traditional data lake and data warehouse.”

Fortunately, the tech has moved on. The data lake and the data warehouse have evolved and now have a sibling: the lakehouse. The objective of the lakehouse is to get the best of both worlds and, whilst we are at it, add a list of new features and a 50x performance improvement over the traditional data lake and data warehouse. This is creating a lot of momentum, because it makes a great deal of sense to move these workloads to the new tech, which also runs in the cloud and is consumed as a service. Databricks, a unified platform that pioneered the lakehouse, makes the binding of schemas across different structures and formats a reality. Moreover, the platform takes a unique approach to collaboration, bringing together all the data roles of an organization: data engineers to build proper pipelines, data analysts to visualize the data story, and data scientists to use the platform's artificial intelligence capabilities to augment the data and predict outcomes.

Eunoia was founded as a data and analytics company. Our journey has been instrumental in making us experts in legacy business intelligence solutions and in growing our expertise in highly engineered cloud big data solutions. Microsoft Azure has given us a great opportunity to increase customer ROI and drastically reduce TCO. Our partnership with Microsoft is proof of our industry capabilities, delivered through our talented and certified team. Moreover, this year we cemented our position in the space by becoming the first Databricks Professional Partner on the island, serving Malta, Cyprus and Greece.

“As part of our mission to evangelize the technologies, we hold regular Analytics in a Day workshops for Developers interested in this space.”

The Databricks Unified Data Analytics Platform helps organizations accelerate innovation by unifying data science with engineering and business. This managed cloud service auto-scales clusters, includes an optimized version of Apache Spark that is up to 50x faster, and uses Delta Lake to bring data reliability and scalability to your existing data lake. It is a best-in-class collaborative platform that unifies data science and data engineering for fast iterations of data prep, model training and production deployment. Databricks customers also benefit from data security, compliance and reduced DevOps costs. All of this means organizations can finally apply AI across their data and bring disruptive innovations to market.
