
8 posts tagged with "bigdata"


· 2 min read

Hello folks!

Azure Data Lake is a big data solution that lets organizations ingest multiple data sets, covering structured, unstructured, and semi-structured data, into a massively scalable data lake for storage, processing, and analytics. It enables users to build their own customized analytical platform to fit any analytical requirements in terms of volume, speed, and quality.

Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure

In this context, the book "Cloud Scale Analytics with Azure Data Services" is your guide to learning all the features and capabilities of Azure data services for storing, processing, and analyzing data (structured, unstructured, and semi-structured) of any size. You will explore key techniques for ingesting and storing data and for performing batch, streaming, and interactive analytics.

The book also shows you how to overcome various challenges and complexities related to productivity and scaling. Next, you will learn to develop and run massive data workloads. Using a cloud-based setup that combines big data, a modern data warehouse, and analytics, you will also be able to build secure, scalable data estates for enterprises. Finally, you will learn not only how to develop a data warehouse but also how to build enterprise-grade security and auditing into big data programs.

By the end of this Azure book, you will have learned how to develop a powerful and efficient analytical platform to meet enterprise needs.

WIN YOUR FREE COPY

Enter your details and click Submit to enter our monthly prize draw for a chance to receive one of the 10 free copies being given away.

· 5 min read

We have a wide variety of options for storing data in Microsoft Azure, and every storage option exists for a specific purpose. In this blog, we will discuss ADLS (Azure Data Lake Storage) and the multi-protocol access for it that Microsoft introduced in 2019.

Introduction to ADLS (Azure Data Lake Storage)

According to Microsoft's definition, ADLS is an enterprise-wide hyper-scale repository for big data analytics workloads that lets you capture data of any size and ingestion speed in one single place for operational and exploratory analytics.

Its main purpose is to enable analytics on the stored data (which can be structured, semi-structured, or unstructured) while providing enterprise-grade capabilities such as scalability, manageability, and reliability.

What is it built on?

ADLS is built on top of Azure Blob Storage. Blob Storage is one of the services offered through Azure Storage accounts. It lets you store any type of data; the data does not need to be of a specific type.

Does the functionality of ADLS sound like the Blob storage?

From the above paragraphs, it looks like ADLS and Blob storage have the same functionality, because both services can be used to store any type of data. But, as I said before, every service exists for its own purpose. Let us explore the differences between ADLS and Blob storage.

Difference between ADLS and Blob storage

Purpose

ADLS is optimized for running analytics on the data stored in it, whereas Blob storage is the usual way of storing file-based information in Azure, including data that is not accessed very often (also called cold storage).

Cost

With both storage options, you pay for the data stored and for I/O operations. ADLS costs slightly more than Blob storage.

Support for Web HDFS interface

ADLS supports a standard WebHDFS interface, so Hadoop can access its files and directories. Blob storage does not support this feature.

I/O performance

ADLS is built for running large-scale analytics systems that require massive read throughput, whatever the query load. Blob storage is typically used to store data that is accessed infrequently.

Encryption at rest

ADLS has supported encryption at rest since it became generally available, and it also encrypts data in transit over public networks. Blob Storage supports encryption at rest as well, through Azure Storage Service Encryption. See more details on the comparison here.

Now, without further delay, let us dig into multi-protocol access for ADLS.

Multi-protocol access for ADLS

This is one of the most significant announcements Microsoft made for ADLS in 2019. Multi-protocol access to the same data lets you use existing object storage capabilities on Data Lake Storage accounts, which are hierarchical-namespace-enabled storage accounts built on top of Blob storage. This allows you to put all your different types of data in the data lake, so that users can make the best use of it as use cases evolve.

Multi-protocol access means that the same data can be reached through both the Azure Blob Storage API and the Azure Data Lake Storage API. The convergence of the two existing services, ADLS Gen1 and Blob storage, paved the way for a new offering called Azure Data Lake Storage Gen2.
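To make "multi-protocol" concrete, here is a minimal Python sketch, assuming an ADLS Gen2 account in the public Azure cloud, that builds the three addresses under which the same file can be reached: the Blob API endpoint, the Data Lake Storage (DFS) endpoint, and the `abfss://` URI used by Hadoop and Spark. The account and container names are hypothetical, and the code only constructs addresses; it does not call the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakePath:
    """One file in an ADLS Gen2 account, addressed through each protocol."""

    account: str    # storage account name
    container: str  # a "container" to the Blob API, a "file system" to the DFS API
    path: str       # path inside the container, e.g. "2019/logs.json"

    def _rel(self) -> str:
        return self.path.lstrip("/")

    def blob_url(self) -> str:
        # Azure Blob Storage REST endpoint (public Azure cloud)
        return f"https://{self.account}.blob.core.windows.net/{self.container}/{self._rel()}"

    def dfs_url(self) -> str:
        # Azure Data Lake Storage Gen2 (DFS) REST endpoint for the same bytes
        return f"https://{self.account}.dfs.core.windows.net/{self.container}/{self._rel()}"

    def abfss_uri(self) -> str:
        # URI form used by Hadoop/Spark through the ABFS driver over TLS
        return f"abfss://{self.container}@{self.account}.dfs.core.windows.net/{self._rel()}"


# Hypothetical account and container names
p = LakePath("contosolake", "raw", "2019/logs.json")
for address in (p.blob_url(), p.dfs_url(), p.abfss_uri()):
    print(address)
```

Sovereign clouds use different domain suffixes, so treat the hard-coded `core.windows.net` as an assumption of this sketch.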

Expanded feature set

With the announcement of multi-protocol access, existing Blob features such as access tiers and lifecycle management policies are now unlocked for ADLS. Furthermore, many Blob storage features and much of its ecosystem support are now available for your data lake storage.

This could be a major shift, because your Blob data can now be used for analytics. Best of all, you don't need to update existing applications to access your data stored in Data Lake Storage. Moreover, you can combine the power of your analytics and object storage applications to use your data most effectively.

While exploring the expanded feature set, one of the best things I found is that ADLS can now be integrated with Azure Event Grid.

Yes, Azure Event Grid has one more publisher on its list. Event Grid can now consume events generated by Azure Data Lake Storage Gen2 and route them to subscribers, with webhooks, Azure Event Hubs, Azure Functions, and Logic Apps as endpoints.
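As a rough illustration of the subscriber side, the Python sketch below handles one Event Grid webhook delivery, assuming the Event Grid event schema: it answers the subscription validation handshake and collects the URLs of newly created files from `Microsoft.Storage.BlobCreated` events. The set of "committed" API names follows Microsoft's guidance for detecting fully written blobs as I understand it (please verify it against the current docs), and the `newFiles` response shape is hypothetical.

```python
from __future__ import annotations

import json
from typing import Any

# REST operations after which a file is fully committed (assumption based on
# Microsoft's BlobCreated guidance; check the current Event Grid docs).
COMMITTED_APIS = {"CopyBlob", "PutBlob", "PutBlockList", "FlushWithClose"}


def handle_event_grid_request(body: str) -> dict[str, Any]:
    """Process one Event Grid delivery: a JSON array in the Event Grid schema."""
    events = json.loads(body)
    new_files = []
    for event in events:
        event_type = event.get("eventType")
        data = event.get("data", {})
        if event_type == "Microsoft.EventGrid.SubscriptionValidationEvent":
            # Webhook handshake: echo the validation code back to Event Grid.
            return {"validationResponse": data["validationCode"]}
        if event_type == "Microsoft.Storage.BlobCreated" and data.get("api") in COMMITTED_APIS:
            new_files.append(data["url"])
    # In the scenario below, these URLs would be handed to the next stage
    # (for example a Databricks job); that hand-off is not shown here.
    return {"newFiles": new_files}
```

Filtering on `FlushWithClose` matters for writes through the Data Lake Storage API, because, as far as I know, a file created that way also raises a `BlobCreated` event for the initial `CreateFile` call, before any data has been flushed.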

Modern Data Warehouse scenario

The image above depicts a use case scenario for ADLS integration with Event Grid. First, a lot of data comes from different sources such as logs, media, files, and business apps. This data ends up in ADLS via Azure Data Factory, and Event Grid, which listens to ADLS, is triggered once the data arrives. The event is then routed via Event Grid and Azure Functions to Azure Databricks. A Databricks job processes the file and writes the output back to Azure Data Lake Storage Gen2. Azure Data Lake Storage Gen2 then pushes a notification to Event Grid, which triggers an Azure Function to copy the data to Azure SQL Data Warehouse. Finally, the data is served via Azure Analysis Services and Power BI.

Wrap-up

In this blog, we have seen an introduction to Azure Data Lake Storage and the differences between ADLS and Blob storage. We then looked at multi-protocol access, one of the new additions to ADLS. Finally, we looked at one feature from the expanded feature set: the integration of ADLS with Azure Event Grid and its use case scenario.

I hope you enjoyed reading this article. Happy Learning!

Image Credits: Microsoft

This article was contributed to my site by Nadeem Ahamed, and you can read more of his articles here.

· 4 min read

You might have noticed that my recent posts were mostly focused on how to get started with and learn Azure in 2020. This post is also meant to help Azure enthusiasts discover and download the free ebooks Microsoft offers for learning Azure in more depth. I have added a reference to this blog in my tool Azure 360 as well. There are around 174 ebooks that can be downloaded from the Microsoft site as of now. Below I have listed the books that are the most recommended, liked, and shared on social media.

eBooks for Architects, Developers and Decision Makers

eBook Name | Download
Azure for Architects | PDF
Designing Distributed Systems | PDF
Cloud Migration Essentials | PDF
Kubernetes: Up and Running | PDF
Learning Azure Cognitive Services | PDF
Effective DevOps—Building a DevOps Culture at Scale | PDF
How to Containerize Your Go Code | PDF
Build and deploy a multi-container application in Azure Container Service | PDF
Build and deploy multi-container application in Azure Service Fabric | PDF
Kubernetes objects on Microsoft Azure | PDF
Azure Serverless Computing Cookbook | PDF
Create your first intelligent bot with Microsoft AI | PDF
Best Practices for Migrating Windows Servers to Azure | PDF
Cloud Database Migration Essentials | PDF
Getting started with Apache Spark on Azure Databricks | PDF
15 Lessons Learned: Migrating SAP to the Cloud | PDF
Learning Node.js Development and deploy on Azure | PDF
Cloud Analytics with Microsoft Azure | PDF
Grow Your ISV Business with SaaS | PDF
Building Intelligent Cloud Applications | PDF
Manage your network more effectively with the Azure Networking Cookbook | PDF
Developer’s Guide to Getting Started with Microsoft Azure Database for MySQL | PDF
Developer's Guide to Getting Started with Cosmos DB | PDF
Quick Start Guide to Azure Sentinel | PDF
The Developer's Guide to Azure | PDF
Kubernetes on Azure | PDF
Professional Azure SQL Database Administration | PDF
Devops with ASP.NET Core and Azure | PDF
Devops for Containerized Apps | PDF
Enterprise Cloud Strategy | PDF
Implementing a Zero Trust approach with Azure Active Directory | PDF
Microsoft Azure Trips and Tips - Data | PDF
Azure AD Application Proxy – Adoption Kit – eBook | PDF
Azure Active Directory B2B Collaboration – Adoption Kit – eBook | PDF
AI for Retail: Learn the scenarios that are driving today's digital consumer | PDF
Building IoT Solutions with Azure: A Developer’s Guide | PDF
The enterprise developer’s guide to building five-star mobile apps | PDF
Modernizing existing .NET apps | PDF
Azure Active Directory Company Branding – Adoption Kit – eBook | PDF
Azure Migration SQL Server to Azure SQL Database Managed Instance a step by step guide | PDF
Azure Active Directory Connect Health – Adoption Kit – eBook | PDF
Azure Active Directory Self-Service Group Management – eBook | PDF
AI in Action—explore three technical case studies in one guide | PDF
Azure Active Directory Identity Protection – eBook | PDF
Azure Multi-Factor Authentication – eBook | PDF
Azure Privileged Identity Management – eBook | PDF
Azure Active Directory Single Sign-On – eBook | PDF
Azure Active Directory Self-Service Password Reset – eBook | PDF
Azure Active Directory User Provisioning – eBook | PDF
Five Principles for Deploying and Managing Linux in The Cloud with Azure | PDF
Optimizing Azure Site Recovery (ASR) WAN Optimizers | DOC
Azure VM – Oracle 12c on Linux – Configuration Steps – eBook | PDF
Azure Strategy and Implementation Guide | PDF
Build your first intelligent app with a guide from O’Reilly | PDF
Guide to migrate schema & data from Oracle to Azure SQL DB | PDF
How to Set up Azure Automation | PDF
Deploy IBM DB2 pureScale on Azure | PDF
Azure AD in Windows 10 cloud subscriptions | PDF
Learn Azure in a Month of Lunches | PDF
Microsoft Azure Essentials Migrating SQL Server Databases | PDF
Migrate your SAP estate to the cloud—securely and reliably | PDF
Microsoft Azure ExpressRoute Guide | PDF
Making the Most of the Cloud Everywhere | PDF
APIs + Microservices | PDF
Hands-On Linux Administration on Azure | PDF
Designing your Hybrid Cloud Strategy: Identity and Access Management | PDF
Overview of Azure Active Directory | DOC
Containerize Your Apps with Docker and Kubernetes | PDF
Solve your big data and AI challenges with an Azure Databricks use case e-book | PDF
Azure Rapid Deployment Guide For Azure Rights Management | PDF
Practical Microsoft Azure IaaS | PDF
IoT in the Real World: Stories from Manufacturing | PDF
Continuous Delivery in Java | PDF
Azure Rethinking Enterprise Storage: A Hybrid Cloud Model | PDF
Azure AD/Office 365 seamless sign-in | DOC
Exam Ref AZ-900 Microsoft Azure Fundamentals (NOT eBook) | PDF
Azure AD & Windows 10: Better Together for Work or School | DOC

If you want to access all the eBooks, research papers, and reports in one place, you can go directly here and get them. I hope these links are helpful, and Azure be the cloud you love!

· 5 min read

Traditional Architecture:

In a traditional application, transactional use cases usually involve persisting data in a few SQL tables or in a NoSQL database. When changes are made to an object, the database is updated to match its new state.

The traditional approach works well if you do not need to know what changes an object has gone through. In modern systems, however, customers often ask for a log of the changes a particular entity has gone through. With the traditional approach, there is no way of knowing what the object contained before the user changed it, or at what point in time its contents changed. We can still solve this the traditional way by storing extra information about each modification, but the solution becomes more complex.

For example, in the traditional approach:

https://gist.github.com/sajeetharan/2d9921571c67f7038ec5a4053882b85f

This creates an entry in the SQL database for each insert.

The current state is saved in a relational database. We load the object, change it, and save it back.
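The load-change-save cycle can be sketched in a few lines of Python. This is a hypothetical in-memory stand-in for a SQL table (not the code from the gist above): saving a product overwrites the previous row, so the earlier price is simply lost.

```python
# Hypothetical in-memory "products" table standing in for a SQL table.
products: dict = {}


def save_product(product_id: int, name: str, price: float) -> None:
    """Insert or update a row in place: the previous values are overwritten."""
    products[product_id] = {"id": product_id, "name": name, "price": price}


save_product(1, "Laptop", 1200.0)
save_product(1, "Laptop", 999.0)  # price change

print(products[1])  # {'id': 1, 'name': 'Laptop', 'price': 999.0}
```

Nothing in the table records that the price was ever 1200.0, which is exactly the limitation described above.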

Event Sourcing Architecture:

In the event sourcing solution, we look at the problem as a sequence of events and save each event exactly as it occurred. An event contains all the details about what actually happened at a particular point in time. Events are historical information, so once an event is saved it should never be modified.

https://gist.github.com/sajeetharan/825ec83fd780b7670146649bf6d4a0ce

All events for a given product are stored, and their data and sequence define the current state of the product. An event is the easiest way to remember what happened at a certain time. Event sourcing also gives you an audit trail for free, along with a full understanding of what the system is doing.
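Here is a minimal Python sketch of the idea, assuming hypothetical `ProductCreated` and `PriceChanged` event types and an in-memory store: events are only ever appended, and the current state is rebuilt by replaying a product's events in sequence order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """An immutable fact about a product (event kinds here are hypothetical)."""
    product_id: int
    sequence: int
    kind: str   # "ProductCreated" or "PriceChanged"
    data: dict


@dataclass
class EventStore:
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        # Events are only appended, never updated or deleted.
        self.events.append(event)

    def history(self, product_id: int) -> List[Event]:
        return sorted((e for e in self.events if e.product_id == product_id),
                      key=lambda e: e.sequence)


def current_state(history: List[Event]) -> dict:
    """Replay events in order to rebuild the state of a product."""
    state: dict = {}
    for event in history:
        if event.kind == "ProductCreated":
            state = {"id": event.product_id, **event.data}
        elif event.kind == "PriceChanged":
            state["price"] = event.data["price"]
    return state


store = EventStore()
store.append(Event(1, 1, "ProductCreated", {"name": "Laptop", "price": 1200.0}))
store.append(Event(1, 2, "PriceChanged", {"price": 999.0}))

print(current_state(store.history(1)))  # {'id': 1, 'name': 'Laptop', 'price': 999.0}
```

Replaying only a prefix of the history (for example `store.history(1)[:1]`) gives the state at an earlier point in time, which is the audit-trail property described above.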

Event Sourcing Architecture with Azure Cosmos DB and Event Hubs

To implement event sourcing in your application, Microsoft Azure provides the following services for building a full-fledged solution, which we will discuss in this blog.

Let's look at the diagram below:


Application 1 stores its data in a traditional database, and your customer needs the history of changes that have been made to the product. The architecture above fulfills this requirement through event sourcing.

The components involved in the architecture are as follows:

Azure Event Hubs

Azure Event Hubs is a managed service that can receive and process millions of events per second. It is intended to handle event-based messaging at huge scale. For example, if your product has devices or applications that publish events, they can send them to an event hub. Event Hubs creates a stream of all these events, which different applications can read in different ways. It provides interfaces such as AMQP and HTTP that make it easy to send messages to it. In Event Hubs we can define consumer groups, which let us read the stream of events; each consumer group is an independent view of the stream, so we can decide on consumer groups based on the number of receivers (applications).
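The consumer group idea can be illustrated with a toy, single-partition model in Python. This is not the Azure SDK: in the real service, events are spread across partitions, retained for a configured period whether or not anyone reads them, and each reader checkpoints its own position. Here `receive` simply advances the group's offset to keep the sketch short.

```python
from collections import defaultdict
from typing import Dict, List


class ToyEventStream:
    """Toy model of one Event Hubs partition: an append-only log that each
    consumer group reads independently. Illustration only, not the Azure SDK."""

    def __init__(self) -> None:
        self.log: List[str] = []
        # Each consumer group tracks its own position (offset) in the log.
        self.offsets: Dict[str, int] = defaultdict(int)

    def send(self, event: str) -> None:
        self.log.append(event)

    def receive(self, consumer_group: str) -> List[str]:
        """Return events this group has not seen yet and advance its offset."""
        start = self.offsets[consumer_group]
        batch = self.log[start:]
        self.offsets[consumer_group] = len(self.log)
        return batch


stream = ToyEventStream()
stream.send("order-created")
stream.send("order-paid")

print(stream.receive("billing"))    # ['order-created', 'order-paid']
print(stream.receive("analytics"))  # ['order-created', 'order-paid'] (independent view)
print(stream.receive("billing"))    # [] (billing has already caught up)
```

Reading from one consumer group never removes events for another, which is why one consumer group per receiving application is the usual rule of thumb.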

Cosmos DB

Azure Cosmos DB is a globally distributed, multi-model database service built for low latency and elastic scalability. It is highly available from anywhere in the world and supports the following data models:

  • Key-value
  • Column-family
  • Document: MongoDB or SQL API
  • Graph

I will not go into detail here, as there are enough blogs to help you get started with Cosmos DB. In the above architecture, every update creates an event, so there will be millions of events, and we need to store them in Cosmos DB along with the state of the object. This approach brings a lot of benefits. Above all, the event store in Cosmos DB becomes your canonical source of truth, describing the updates applied to your domain in an unbiased form.

Implementation:

Application 1:

Whenever a user updates an object in Application 1, a notification message is sent to Event Hubs with an ID (unique for each message) indicating that something has happened in the application. We could use the epoch timestamp in milliseconds (13 digits) as the ID, which is unique as long as no two updates happen within the same millisecond. A sample payload would look like this:

{"MessageId": 1547632386819}

Note: Event Hubs limits the size of a single event (256 KB on the Basic tier), so it is always better to keep messages as small as possible.
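A minimal Python sketch of how Application 1 might build the notification, assuming the payload format shown above. `build_notification` and `MAX_EVENT_BYTES` are hypothetical names, and the actual sending would be done with an Event Hubs client, which is left out here.

```python
import json
import time
from typing import Optional

# Per-event size limit on the Event Hubs Basic tier; check the quota for your tier.
MAX_EVENT_BYTES = 256 * 1024


def build_notification(now_ms: Optional[int] = None) -> bytes:
    """Build the small notification payload that Application 1 sends to Event Hubs.

    MessageId is the Unix epoch time in milliseconds, which has 13 digits for any
    date between 2001 and 2286. Two updates within the same millisecond would get
    the same ID, so a real system may prefer a UUID or a per-entity sequence number.
    """
    message_id = now_ms if now_ms is not None else int(time.time() * 1000)
    payload = json.dumps({"MessageId": message_id}).encode("utf-8")
    # Guard that only matters if more fields are ever added to the payload.
    if len(payload) > MAX_EVENT_BYTES:
        raise ValueError("notification exceeds the Event Hubs event size limit")
    return payload


print(build_notification(1547632386819))  # b'{"MessageId": 1547632386819}'
```

Keeping only the ID in the message and the full object state in the event store is what keeps the payload far below the size limit.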

Once the notification is sent, the state of the object is stored in the event store (Cosmos DB).

Application 2:

Application 2 has an Event Hubs receiver running in the background that subscribes to the event hub and gets the latest message. Once the receiver retrieves the ID, it can query the event store with that ID and get all the changes prior to it, as follows:

https://gist.github.com/sajeetharan/c34965a606c8afff9d02f2a3a17522bf

This creates the corresponding documents in Cosmos DB.
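The receiver's lookup can be sketched as a parameterized Cosmos DB SQL query plus an in-memory equivalent in Python. The query text, the function name, and the sample documents are hypothetical, and I am assuming that "all the changes prior to the id" includes the notified change itself (hence `<=`). In a real container you would also filter on the product, ideally via the partition key, rather than scanning every event.

```python
from typing import Dict, List

# Parameterized Cosmos DB SQL query the receiver could send to the event store
# container; MessageId is the property name from the sample payload.
HISTORY_QUERY = "SELECT * FROM c WHERE c.MessageId <= @messageId ORDER BY c.MessageId ASC"


def history_up_to(event_store: List[Dict], message_id: int) -> List[Dict]:
    """In-memory equivalent of HISTORY_QUERY: every stored change up to and
    including the notified MessageId, oldest first."""
    return sorted(
        (doc for doc in event_store if doc["MessageId"] <= message_id),
        key=lambda doc: doc["MessageId"],
    )


# Hypothetical event documents for one product
store = [
    {"MessageId": 1547632386819, "productId": 1, "price": 1200.0},
    {"MessageId": 1547632390000, "productId": 1, "price": 999.0},
    {"MessageId": 1547632399999, "productId": 1, "price": 949.0},
]

for doc in history_up_to(store, 1547632390000):
    print(doc["MessageId"], doc["price"])
```

Because the IDs are timestamps, ordering by `MessageId` also orders the changes chronologically.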

The above approach ensures that all changes to the product are stored as a sequence of events. Looking at the broader picture, it also ensures that all changes to the application state are stored.

This is a simple architecture for implementing event sourcing in your application. A pattern that works very well together with event sourcing is CQRS (Command Query Responsibility Segregation).
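To give a feel for CQRS, here is a hypothetical Python sketch: the command side validates requests and appends events, while a separate read model (a projection) is built from those events and serves queries. In the architecture above, Event Hubs would carry the events between the two sides; here a simple loop stands in for that.

```python
from typing import Dict, List


class ProductCommands:
    """Write side: validates commands and records them as events (hypothetical names)."""

    def __init__(self, events: List[Dict]) -> None:
        self.events = events

    def change_price(self, product_id: int, price: float) -> None:
        if price <= 0:
            raise ValueError("price must be positive")
        self.events.append({"type": "PriceChanged", "productId": product_id, "price": price})


class ProductPriceView:
    """Read side: a denormalized projection built from events, shaped for queries."""

    def __init__(self) -> None:
        self.prices: Dict[int, float] = {}

    def apply(self, event: Dict) -> None:
        if event["type"] == "PriceChanged":
            self.prices[event["productId"]] = event["price"]

    def price_of(self, product_id: int) -> float:
        return self.prices[product_id]


events: List[Dict] = []
commands = ProductCommands(events)
commands.change_price(1, 1200.0)
commands.change_price(1, 999.0)

view = ProductPriceView()
for e in events:  # stands in for delivery through Event Hubs
    view.apply(e)

print(view.price_of(1))  # 999.0
```

The write side never answers queries and the read side never validates commands, so each can be scaled and stored in whatever way suits its workload.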


Let's look at the detailed implementation with code in upcoming blogs. I hope this helps someone out there implement event sourcing in their application on the Azure platform.