It was a particularly exciting year in the world of Big Data and Machine Learning. The field is growing faster than ever!

One of the most positive trends in 2019 has been the move to real-time stream processing, as opposed to batch or transactional data analytics. Adoption of this technology can be attributed to two factors: IoT technology has shifted business data towards events, and the large technology companies that have used stream processing to scale have invested heavily in it. Stream processing is proving an effective way to scale distributed and microservice-type applications, as discussed before.
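To make the batch versus stream distinction concrete, here is a minimal sketch in plain Python. It is not tied to any particular streaming framework: the event shape (a timestamp and a sensor id) and both function names are assumptions made up for illustration. The batch function waits for the full data set before answering, while the streaming function emits a result each time a fixed-size (tumbling) window closes.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, sensor id).
Event = Tuple[int, str]


def batch_counts(events: Iterable[Event]) -> Counter:
    """Batch style: consume the whole data set, then count once."""
    return Counter(sensor for _, sensor in events)


def tumbling_window_counts(
    events: Iterable[Event], window_s: int
) -> Iterator[Tuple[int, Counter]]:
    """Stream style: yield (window start, counts) as each window closes.

    Assumes events arrive in timestamp order. Windows with no events
    are still emitted, with an empty Counter.
    """
    window_start = None
    counts: Counter = Counter()
    for ts, sensor in events:
        if window_start is None:
            # Align the first window to a multiple of the window size.
            window_start = ts - (ts % window_s)
        # Close every window that ended before this event arrived.
        while ts >= window_start + window_s:
            yield window_start, counts
            window_start += window_s
            counts = Counter()
        counts[sensor] += 1
    if window_start is not None:
        # Flush the final, still-open window when the input ends.
        yield window_start, counts


events = [(0, "a"), (1, "b"), (5, "a"), (12, "a")]
print(batch_counts(events))
for start, window in tumbling_window_counts(events, window_s=5):
    print(start, dict(window))
```

Real stream processors also deal with out-of-order events, state recovery and scaling across machines, none of which this sketch attempts.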

The top three Apache stream-processing projects on GitHub tell a clear story of continued strong growth in 2019.

Data governance continues to be a hot topic. Although GDPR enforcement began in 2018, 2019 was the year it got its teeth, with all four multi-million dollar fines taking place this year (Google, Marriott, 1&1, and British Airways). We are likely to see more in 2020, and not just in Europe: the California Consumer Privacy Act has a similar structure to GDPR and comes into force on 1 January 2020.

Another trend is for SaaS providers to continue to democratise Machine Learning, making more advanced models available to any programmer who knows how to use an API. These models cover image recognition, voice synthesis, text classification, recommendation engines and more. Although they are still only effective when presented with the exact use case they were designed for, the ability to customise and configure them has been improving rapidly. If you want to solve a more specialist problem, you will still need to recruit a specialist Data Scientist.

Companies continue to invest significantly in this area. Earlier this year SAS announced [1] that they were investing $1bn over 3 years into AI & ML in order to improve their services. There have also been significant acquisitions: Salesforce bought Tableau for a whopping $15.7bn and Google bought Looker for $2.6bn. Not every company is faring well, however. MapR's sale of its business assets to HPE signals the beginning of the end for batch analytics on Hadoop, a technology stack that was a mainstay of early Big Data platforms.

2020 will be an exciting year for Big Data and Machine Learning technology. Watch this space!