
2 posts tagged with "spark"


· 2 min read

Hello folks!

Azure Data Lake is a big data solution which allows organizations to ingest multiple data sets covering structured, unstructured, and semi-structured data into an infinitely scalable data lake enabling storage, processing, and analytics. It enables users to build their own customized analytical platform to fit any analytical requirements in terms of volume, speed, and quality.

Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure

In this context, the book "Cloud Scale Analytics with Azure Data Services" is your guide to learning all the features and capabilities of Azure data services for storing, processing, and analyzing data (structured, unstructured, and semi-structured) of any size. You will explore key techniques for ingesting and storing data and for performing batch, streaming, and interactive analytics.

The book also shows you how to overcome various challenges and complexities relating to productivity and scaling. Next, you will be able to develop and run massive data workloads to perform different actions. Using a cloud-based big data and modern data warehouse analytics setup, you will also be able to build secure, scalable data estates for enterprises. Finally, you will not only learn how to develop a data warehouse but also understand how to create enterprise-grade security and auditing for big data programs.

By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.

WIN YOUR FREE COPY

Enter your details and click Submit to enter our monthly prize draw for a chance to receive one of 10 free giveaway copies.

· 2 min read

Let me start this blog with a little example. Assume Sachin has a leak in a water pipe in his garden. He takes a bucket and some sealing material to fix the problem. After a while, he sees that the leak is much bigger than he thought and that he needs a specialist to bring bigger tools. Meanwhile, he still uses the bucket to drain the water. After a while, he notices that a massive underground stream has opened and he needs to handle millions of liters of water every second.

He doesn't just need new buckets; he needs a completely new approach to the problem, simply because the volume and velocity of the water have grown. To prevent the town from flooding, maybe he needs his government to build a massive dam, which requires enormous civil engineering expertise and an elaborate control system. To make things worse, water is gushing out of nowhere everywhere, and everyone is scared by the variety.

Welcome to Big Data.

Key elements of Big Data:

  1. More than 600 million tweets are posted every day, arriving continuously every second. This is High Volume and Velocity.
  2. Next, we need to understand what each tweet means: where it is from, what kind of person is tweeting, and whether it is trustworthy. This is High Variety.
  3. Then we identify the sentiment: is this person talking negatively or positively about the iPhone? This is High Complexity.
  4. Finally, we need a way to quantify the sentiment and track it in real time. This is High Variability.
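
To make the last two points a bit more concrete, here is a tiny Python sketch of scoring tweets and tracking a rolling average. Everything in it is an assumption for illustration only: the word lists, the `score` function, and the `RollingSentiment` class are hypothetical, and a real system would use a trained sentiment model on a streaming engine rather than a toy lexicon in a single process.

```python
from collections import deque

# Hypothetical toy lexicon (an assumption for this sketch);
# a real pipeline would use a trained sentiment model instead.
POSITIVE = {"love", "great", "good", "amazing"}
NEGATIVE = {"hate", "bad", "terrible", "broken"}


def score(text):
    """Return +1 for each positive word and -1 for each negative word."""
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


class RollingSentiment:
    """Track the average score of the most recent `window` tweets."""

    def __init__(self, window):
        # deque(maxlen=...) drops the oldest score automatically.
        self.scores = deque(maxlen=window)

    def add(self, text):
        """Score one tweet and return the updated rolling average."""
        self.scores.append(score(text))
        return sum(self.scores) / len(self.scores)


if __name__ == "__main__":
    tracker = RollingSentiment(window=3)
    tweets = [
        "I love my iPhone",
        "the iPhone battery is terrible",
        "great camera and good screen",
    ]
    for tweet in tweets:
        print(tweet, "->", tracker.add(tweet))
```

The fixed-size window is what keeps the number "real time": old tweets fall out as new ones arrive, so the average follows the current mood instead of the whole history.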

The traditional architecture of a Big Data solution looks something like the one below: