
21 posts tagged with "cosmosdb"


· 4 min read

Recently, I was working with a customer who had their data in an on-prem SQL Server. As they were shifting their solution to the cloud (Azure) with Cosmos DB, the first requirement was to migrate the existing data to Cosmos DB. In this post, I will explain how to do the migration, and also how to do the data transformation, since Cosmos DB stores data as key-value (JSON) documents in a collection whereas MSSQL stores it as rows in tables.

Cosmosdb Data Migration Tool:

The migration tool provided by the Cosmos DB team supports various data sources. To date, it can import data that you currently may have stored in SQL Server, existing JSON files, flat files of comma-separated values (CSV), MongoDB, and Azure Table Storage.

Prerequisites
  1. MSSQL Server; if you don't have it, grab it from here.
  2. Download the Cosmos DB Data Migration Tool.
  3. A Cosmos DB account on Azure.

In this post I will show how to import data from AdventureWorks. If you don't already know, AdventureWorks is a popular sample database for SQL Server. There are more than 40 tables in the database, and we will migrate one of its views to Cosmos DB. Let's pick Sales.vStoreWithAddresses. It has columns such as Name, AddressType, AddressLine1, etc. As we are migrating this data to Cosmos DB, we can make use of nested values by merging the address fields together. Let's get started.

Step 1:

Connect to the AdventureWorks2017 database and open the view Sales.vStoreWithAddresses.

Step 2:

Let's assume the requirement is to migrate only the data whose AddressType is "Shipping". In this step we will also do the data transformation by merging the address fields together. So the query will be like, https://gist.github.com/sajeetharan/985bd411e3e80ebb8134a16c684748ce
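Since the gist is linked rather than inlined, here is a sketch of what such a query can look like (the exact column list is my assumption based on the view); the dot-separated aliases are what let the migration tool fold the address columns into one nested address object:

SELECT Name,
       AddressType,
       AddressLine1      AS [address.addressLine1],
       City              AS [address.city],
       StateProvinceName AS [address.stateProvinceName],
       PostalCode        AS [address.postalCode],
       CountryRegionName AS [address.countryRegionName]
FROM Sales.vStoreWithAddresses
WHERE AddressType = 'Shipping'

Note how postalCode ends up nested under address; that is the field we will use later as the partition key.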

Step 3:

Open the Cosmos DB Data Migration Tool. You can launch it by running dtui.exe inside the folder you downloaded in the prerequisites section.

Step 4:

We need to fill in the source information in the tool. As we know, our data source is SQL Server, so pick SQL as the source.

Step 5:

You need to fill in the connection string for SQL Server, which you can obtain easily by going to Server Explorer and connecting to the SQL Server. Once you enter the connection string, check that it is working by clicking Verify.

Step 6:

The next step is to enter the query to select the source data. We can do this either by pasting the query or by selecting a SQL query file.

Step 7:

We need to set the nesting separator to "." as that is what we used in the query aliases to merge the address parts into one nested object. Once this is done, click Next.

Step 8:

The next step is to fill in the target information; as you know, our target database here is Cosmos DB. I assume you have a Cosmos DB account; you can obtain your connection string from the Azure portal by selecting your Cosmos DB account.

Once you copy and paste the connection string, one extra thing you need to do is append the database name to the connection string (see the format note after this step), as shown in https://gist.github.com/sajeetharan/1e6e21339820b8437b1d9542e8133d51#file-sajeetharan-com_4192016_mssql_cosmosdb. You can verify the connection string by clicking the Verify button. Also give the collection whatever name you prefer.

It is important to define a partition key in order to query the data later. The partition key is used to group multiple documents together within physical partitions. Let's partition by "/address/postalCode", which we're storing as the postal code nested beneath address; for throughput, we'll just go with the default of a thousand request units per second.

One more thing we need to set is the indexing policy. You'll notice a large text box where you can set it; I want to choose the range indexing policy, which you can pick by right-clicking inside the text box and selecting it from the context menu. Just click Next and you will be taken to the summary page, where you can review the migration steps all at once.
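For reference, the target connection string the tool expects is the standard Cosmos DB connection string with the database name appended; the account and key below are placeholders:

AccountEndpoint=https://<your-account>.documents.azure.com:443/;AccountKey=<your-key>;Database=<your-database>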

Step 9:

Once you click Import on the last step, the transfer runs, and you will see a success message with the number of documents created.

Step 10:

You can verify the migration by running a query in the Cosmos DB Data Explorer. That's how you migrate data from SQL Server to Cosmos DB, and it is very easy using the Cosmos DB Data Migration Tool. I will write separate blogs on the other modes of data migration. Hope this blog helps if someone wants to transform and migrate data from MSSQL to Cosmos DB. Cheers!

· 2 min read

I have been working with a couple of applications built with CosmosDB, and one of the things that surprised me was that you cannot clear all documents in a collection from the Azure web portal or using the Storage Explorer. As I struggled with this while running some tests on the application, I decided to write a blog on the solution I used. There are two ways to achieve this:

  • Using a stored procedure
  • Using Cosmosdb SDK

Using Cosmosdb SDK:

I came up with a script in Node; the same can be done with any of the programming languages supported by the SDK, such as C# or Python (a C# sketch is included after the run instructions below).

Let’s go through the steps:

Step 1:

Open VS Code and create a file named cosmosdb_helper.js.

Step 2:

Let’s install the necessary packages.

Install the documentdb JavaScript SDK with the following command,

npm i documentdb

and the package will be added to your project.

Let’s install require to handle the dependencies with the following command,

npm i require

Step 3: 

Let's do some coding. You should be able to understand the following code from the comments added on each line,

https://gist.github.com/sajeetharan/8efe2c9424dfc89d1f58b34627858944

If you have a partitionKey defined on your collection, you need to pass queryOptions containing the partitionKey to selectAll as well as deletDocument, as follows,

https://gist.github.com/sajeetharan/d5302257d3b5e54a33e5601b215decf1

Step 4:

Let’s run the script and see the output,

You can run the helper script as follows,

if you want to list all documents in the collection,

node cosmosdb_helper.js selectAll

which will list the output of all documents in the collection.

If you want to delete all documents within a collection, you can run the script as,

node cosmosdb_helper.js deletAll

which will remove all documents in the collection.
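If you would rather do the same from C#, here is a minimal sketch using the Microsoft.Azure.DocumentDB SDK; the endpoint, key, database/collection names, and the partition key property are all placeholders you would replace with your own:

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class CosmosDbCleaner
{
    static async Task Main()
    {
        // Placeholder endpoint/key - use your own account values.
        var client = new DocumentClient(new Uri("https://<account>.documents.azure.com:443/"), "<key>");
        var collLink = UriFactory.CreateDocumentCollectionUri("<database>", "<collection>");

        // Read every document; cross-partition, because we want documents from all partition keys.
        var docs = client.CreateDocumentQuery<Document>(collLink,
            new FeedOptions { EnableCrossPartitionQuery = true }).AsEnumerable().ToList();

        foreach (var doc in docs)
        {
            // On a partitioned collection each delete must carry the document's partition key value;
            // "partitionKeyProp" is a hypothetical property name.
            await client.DeleteDocumentAsync(doc.SelfLink, new RequestOptions
            {
                PartitionKey = new PartitionKey(doc.GetPropertyValue<string>("partitionKeyProp"))
            });
        }

        Console.WriteLine("Deleted {0} documents", docs.Count);
    }
}

(async Main requires C# 7.1 or later; on a non-partitioned collection you can drop the FeedOptions and RequestOptions.)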

Using a stored procedure:

As mentioned above, the second way is to use the stored procedure written by a Microsoft employee, as mentioned here.

Hope the helper script helps someone out there who needs to delete all documents in a collection. You can get the whole code from Cosmosd_Helper.

· 5 min read

Traditional Architecture:

In a traditional application, transactional use-cases usually involve persisting data in a few SQL tables or in a NoSQL database. When changes are performed on an object, the database is updated to match the new state.

The traditional approach works well if you do not need to know the changes an object has gone through, but in modern systems customers often come up with a requirement to get the log of changes a particular entity has gone through. With the traditional approach, there is no way of knowing what the user had in the object before changing it, or at which point in time the contents changed. We can still solve this in the traditional way by storing extra information about the modifications, but the solution becomes more complex.

For example, in the traditional approach,

https://gist.github.com/sajeetharan/2d9921571c67f7038ec5a4053882b85f

each insert or update simply writes the current row into the SQL database.

The current state is saved in a relational database. We load the object, change it, and save it back.
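As an illustrative sketch (table and column names are made up), the traditional model keeps only the latest state:

-- Update-in-place: the previous price is overwritten, so the history is lost.
UPDATE Product
SET Price = 12.49
WHERE ProductId = 100;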

Event Sourcing Architecture:

In the event sourcing solution, we look at the problem as a sequence of events that occur, and we save each occurrence as it is. An event contains all the details about what actually happened at a particular point in time. Events are historical information, and once saved they should not be modified.

https://gist.github.com/sajeetharan/825ec83fd780b7670146649bf6d4a0ce

All events for a certain product are stored. Their data and sequence define the current state of the product. An event is the easiest way to remember what happened at a certain time. Event sourcing comes with the built-in advantage of an audit trail, and of a full understanding of what the system is doing.

Event Sourcing Architecture with Azure Cosmos DB and Event Hubs

To implement event sourcing in your application, Microsoft Azure provides the services for a full-fledged solution, which we will discuss in this blog.

Application 1 stores its data in a traditional database, and your customer needs the log of changes that have been made to a product. The following architecture easily fulfills that requirement with event sourcing.

The components involved in the architecture are as follows.

Azure EventHub

Azure Event Hubs is a managed service to receive and process millions of events per second. It is intended to handle event-based messaging at huge scale. It could be used in a product where applications or devices publish events and send them to the Event Hub. It creates a stream of all these events, which can be read by different applications in different ways. Event Hubs provides interfaces such as AMQP and HTTP to make it easy to send messages to it. In Event Hubs we can define consumer groups, which let us read the stream of events; we can decide on the consumer groups based on the number of receivers (applications).

CosmosDB

Azure Cosmos DB is a globally-distributed, multi-model database-as-a-service built for low latency and elastic scalability. It supports the following options to store data, and the data is highly available from anywhere in the world:

  • Key-value
  • Column-family
  • Document: MongoDB or SQL
  • Graph

I will not go into detail, as there are enough blogs to get started with CosmosDB. In the above architecture there will be millions of events created, one after each update; hence we store them in Cosmos DB together with the state of the object. This brings a lot of benefits. First, the event store in Cosmos DB becomes your canonical source of truth, describing the updates applied to your domain in an unbiased form.

Implementation:

Application 1:

Whenever a user updates an object in Application 1, a notification message is sent to the Event Hub with an ID (unique for each message) signalling that something has happened in the application. We could make use of an epoch timestamp in milliseconds to make sure it is unique. A sample payload would look like,

{"MessageId": 1547632386819}

Note: As Event Hubs has a maximum message size of 256 KB, it is always better to keep messages as small as possible.

Once the notification is sent, the state of the object is stored in the event store (Cosmos DB).
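As a sketch of what Application 1's publish step might look like in C# with the Microsoft.Azure.EventHubs client (the connection string and hub name are placeholders):

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;

class NotificationPublisher
{
    static async Task Main()
    {
        // Placeholder connection string and hub name - use your own.
        var builder = new EventHubsConnectionStringBuilder("<eventhubs-connection-string>")
        {
            EntityPath = "product-events"
        };
        var client = EventHubClient.CreateFromConnectionString(builder.ToString());

        // Keep the payload tiny: just the id Application 2 uses to look up the event store.
        string payload = "{\"MessageId\": " + DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() + "}";
        await client.SendAsync(new EventData(Encoding.UTF8.GetBytes(payload)));
        await client.CloseAsync();
    }
}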

Application 2:

Application 2 has an Event Hub receiver running in the background which subscribes to the Event Hub and gets the latest message. Once the id is retrieved by the receiver, it can query the event store with that id and get all the changes prior to it, as follows,

https://gist.github.com/sajeetharan/c34965a606c8afff9d02f2a3a17522bf

which will create documents in Cosmos DB along the following lines,
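(The original screenshot is omitted; the stored event documents would look something like this, with all field names purely illustrative.)

{ "id": "1547632386819", "productId": "P100", "event": "ProductCreated", "price": 10.99 }
{ "id": "1547632399205", "productId": "P100", "event": "PriceChanged",   "price": 12.49 }
{ "id": "1547632410118", "productId": "P100", "event": "NameChanged",    "name": "Surface Book" }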

The above approach ensures that all changes to the product are stored as a sequence of events. Looking at the broader picture, it also ensures that all changes to the application state are stored.

This is a simple architecture for implementing event sourcing in your application. A very good pattern to pair event sourcing with is CQRS (Command Query Responsibility Segregation).

Let's look at the detailed implementation with code in upcoming blogs. Hope this helps someone out there implement event sourcing in an application on the Azure platform.

· 4 min read

Time for another blog on Cosmos DB, this one explaining how to stream tweets from Twitter by hashtag and store them in Cosmos DB in real time. You should be able to set up and run this demo within 15 minutes.

Prerequisites:

I have the following in my local environment; hope you already have them 😊, and if not, start setting up.

  • Windows 10 OS
  • Python 2.7
  • Visual Studio Code or PyCharm (Any editor)
  • Azure subscription

Let’s get started.

Step 1: Install Python

Hopefully you have already installed Python on your system; if not, download and install it from here. Once installed, run python --version to confirm it is set up properly.

Step 2: Install Tweepy and PyDocumentDB

Install the following libraries needed.

Tweepy:

Tweepy is a Python package that makes it easy to access the Twitter API. The API class provides access to the entire Twitter RESTful API; each method accepts various parameters and returns responses. Install it with the following command,

pip install tweepy

If you get the error 'pip' is not recognized as an internal or external command, you should add the Scripts folder to your PATH as follows,

C:\>set PATH=%PATH%;C:\Python27\Scripts

Now you should be able to install it without any issue.

Pydocumentdb:

As mentioned above, we will be storing the tweets in Azure Cosmos DB. For that we need the Python package for Cosmos DB, which is pydocumentdb. Install it with the following command,

pip install pydocumentdb

Now we have everything needed. Let's dive into coding.

Step 3: Creating a listener to invoke the Cosmos DB client

Create a listener named CosmosDBListener with the following methods:

__init__ initializes the listener with the client and collection link, so the connection is available.

on_data loads the data retrieved from the stream and writes it to Cosmos DB.

on_error prints any network/key issues to the console.

 
from config import *
import json
from tweepy.streaming import StreamListener


class CosmosDBListener(StreamListener):
    def __init__(self, client, collLink):
        self.client = client
        self.collLink = collLink

    def on_data(self, data):
        try:
            dictData = json.loads(data)
            dictData["id"] = str(dictData["id"])
            self.client.CreateDocument(self.collLink, dictData)
            return True
        except BaseException as e:
            print("Error on data: %s" % str(e))
            return True

    def on_error(self, status):
        print(status)
        return True

Step 4: Stream data from Twitter to CosmosDB

Let's write the real code to connect to Twitter and get tweets for several hashtags. We need to authenticate with tweepy to get the tweets, so pass the consumer key/secret and access token/secret to the API as follows.

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

Set the connection policy for cosmosdb and create a client as follows,

connectionPolicy = documents.ConnectionPolicy()
connectionPolicy.EnableEndpointDiscovery = True  # enable endpoint discovery so PreferredLocations is honored
connectionPolicy.PreferredLocations = preferredLocations

The next step is to read the tweets as follows; we are using the filter method to get tweets related to particular hashtags.

client = document_client.DocumentClient(host, {'masterKey': masterKey}, connectionPolicy)
dbLink = 'dbs/' + databaseId
collLink = dbLink + '/colls/' + collectionId

twitter_stream = Stream(auth, CosmosDBListener(client, collLink))
twitter_stream.filter(track=['#CosmosDB', '#Microsoft', '#MVP', '#BigData', '#DataScience', '#Mongo', '#Graph'], async=True)

Step 5: Creating Configuration File

Create the config file with the following values,

# Enter CosmosDB config details below.
masterKey = ' '
host = ' '

#Enter your database, collection and preferredLocations here.
databaseId = 'tweepyDemo'
collectionId = 'tweets'
preferredLocations = ''

# Enter twitter OAuth keys here.
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

You need a Cosmos DB account on Azure to get the masterKey and host values; if you are stuck, read my previous blog on how to set up a Cosmos DB account.

You also need to register the script as a new application in the Twitter developer portal. After choosing a name and description for your app, you will be provided with a consumer key, consumer secret, access token, and access token secret, which need to be filled into the above config.py to give the app programmatic access to Twitter.

Step 6: Run the script

That’s it, folks. Now open a command prompt and run the following command,

py cosmosdbdriver.py

You should see the tweets arriving in your Cosmos DB collection.

The tweets you need are now in Cosmos DB; use them for further analysis as you wish. Hope it helps someone out there. If you are stuck at any point, look at the complete code from here.

· 8 min read

Azure Cosmos DB (Azure Cosmos DB – Globally Distributed Database Service (formerly DocumentDB) | Microsoft Azure) is a superset of the service once known as "Azure DocumentDB". In short: "Azure Cosmos DB" = "Azure DocumentDB" + new data types + new APIs.

You can try Cosmos DB for free on Azure, or you can set it up in your local environment by following my previous blog. I am becoming a fan of .NET Core with all its features, and it is getting better day by day. In this blog post I wanted to take the initial steps of working with Cosmos DB from a .NET Core client. After reading this blog, you should be able to do the following with Cosmos DB programmatically:

  • Create Database
  • Create Collection
  • Create Documents
  • Query a Document
  • Delete Database

Prerequisites:

I have the following in my local environment; hope you already have them 😊, and if not, start setting up.

  • Windows 10 OS
  • Azure CosmosDB Emulator
  • Visual Studio Code editor with C# plugin
  • .NET Core 2.0

Ok folks, let's get started.

Step 1: Create a .NET Core console application. As in other tutorials, to keep it simple I will create a .NET Core console app to work with Cosmos DB. With .NET Core we now have a CLI, so let's create the new app with the following steps (mentioned in my previous blog):

  1. Open a command prompt or PowerShell (administrator mode)
  2. Navigate to the folder where you want to create the app
  3. Execute the following command
dotnet new console -n CosmosCoreClient -o CosmosCoreClient

Here -n denotes the name of the application, and -o tells the CLI to create a folder with that name and put the application inside it.

Open the newly created project in Visual Studio Code by executing the following command:

code .


I am using a C# 7.1 feature, the async Main method, in my console app. For that we need to change the project file a little: open the CosmosCoreClient.csproj file and add the following XML node to the PropertyGroup node.

<LangVersion>latest</LangVersion>

After changes, your csproj file should look like below:
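(The original post showed a screenshot here; assuming .NET Core 2.0, the csproj should end up along these lines:)

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.0</TargetFramework>
    <LangVersion>latest</LangVersion>
  </PropertyGroup>
</Project>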

Let's move on to the core part of integrating Cosmos DB with the .NET Core application and start building the features.

Step 2: Add CosmosDB Nuget Package

If you have followed the steps above, we have successfully created the application; next we add a reference to the Cosmos DB NuGet package to get the client libraries. The advantage of these packages is that they make it easy to work with Cosmos DB.

  1. Open a command prompt and navigate to root of your project.
  2. Execute the following command
dotnet add package Microsoft.Azure.DocumentDB.Core

You might wonder why the namespace has DocumentDB in it. In fact, DocumentDB is where the whole journey started, and the name sticks around in the Cosmos world too. If you now look at the project file, a new reference for DocumentDB will have been added.

Step 3: Creating Model for CosmosDB

Let's build the database. If you are new to Cosmos DB, you should know that it has a query playground at https://www.documentdb.com/sql/demo. It is a sandboxed environment with a couple of databases, and you can try out the different queries you can write against them. For this post, let's create the database named Course locally.

Since our application deals with courses, we need four models here.

  1. Course
  2. Session
  3. Teacher
  4. Student

Here are the Models of the above 4.

Course.cs

using Microsoft.Azure.Documents;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;

public class Course : Document
{
    [JsonProperty(PropertyName = "CourseId")]
    public Guid CourseId { get; set; }

    [JsonProperty(PropertyName = "Name")]
    public string Name
    {
        get { return GetPropertyValue<string>("Name"); }
        set { SetPropertyValue("Name", value); }
    }

    [JsonProperty(PropertyName = "Sessions")]
    public List<Session> Sessions { get; set; }

    [JsonProperty(PropertyName = "Teacher")]
    public Teacher Teacher { get; set; }

    [JsonProperty(PropertyName = "Students")]
    public List<Student> Students { get; set; }
}

Session.cs

using System;

public class Session
{
    public Guid SessionId { get; set; }

    public string Name { get; set; }

    public int MaterialsCount { get; set; }
}

Teacher.cs

using System;

public class Teacher
{
    public Guid TeacherId { get; set; }

    public string FullName { get; set; }

    public int Age { get; set; }
}

Student.cs

using System;

public class Student
{
    public Guid StudentId { get; set; }

    public string FullName { get; set; }
}

Let's create the client as the next step.

Step 4: Creating the Client

Next, you will need to instantiate the Cosmos DB client before doing anything with the database. In order to connect to the local instance of Cosmos DB, we need to configure two things:

  1. the URL of the Cosmos DB instance
  2. the authentication key needed to authenticate

As stated above, when you start the Cosmos DB local emulator, the DB instance is available at https://localhost:8081. The auth key for the local emulator is a static key, which you can find in this article (https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator#authenticating-requests). This key works only with the local emulator and won't work with your Azure instance; if you are using an Azure instance, you can find the key in the portal. Here is the code snippet to instantiate the client:

static string endpointUri = "https://localhost:8081";
static string authKey = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";
static string dbName = "CourseDB";
static string collectionName = "Courses";

// async Main requires the C# 7.1 LangVersion we enabled above.
static async Task Main(string[] args)
{
    Console.WriteLine("Press any key to run");
    Console.ReadLine();

    await Run();

    Console.ReadLine();
}

private static async Task Run()
{
    DocumentClient documentClient = new DocumentClient(new Uri(endpointUri), authKey);
}

When the method Run is executed, the client is instantiated against the local Cosmos DB emulator.

Step 5: Let's start building the features

The next step is to build the features listed above. Let's add the following methods and call them from Run.

Creating Database:

To create a new database programmatically, we make use of CreateDatabaseAsync() or CreateDatabaseIfNotExistsAsync(). When creating the database we pass the database name. Here is the code snippet:

private static async Task<Database> CreateDatabase(DocumentClient documentClient)
{
    Database database = documentClient.CreateDatabaseQuery()
        .Where(c => c.Id == "courseDatabase")
        .AsEnumerable()
        .FirstOrDefault();

    if (database == null)
    {
        database = await documentClient.CreateDatabaseAsync(new Database()
        {
            Id = "courseDatabase"
        });
    }

    return database;
}

When you refresh the URL of the local Cosmos DB emulator, you should see the database created.

Creating Collection:

Once the database is created, we can then create a collection. We make use of CreateDocumentCollectionAsync() or CreateDocumentCollectionIfNotExistsAsync().

We will need to provide what is known as the database link (basically the URI at which the db can be reached) and the collection name to the create method. Here is the code snippet.

private static async Task<DocumentCollection> CreateDocumentCollection(DocumentClient documentClient, Database database)
{
    DocumentCollection documentCollection = documentClient
        .CreateDocumentCollectionQuery(database.CollectionsLink)
        .Where(c => c.Id == "courseDocumentCollection")
        .AsEnumerable()
        .FirstOrDefault();

    if (documentCollection == null)
    {
        documentCollection = await documentClient.CreateDocumentCollectionAsync(database.SelfLink, new DocumentCollection()
        {
            Id = "courseDocumentCollection"
        });
    }

    return documentCollection;
}

Now you should see that the collection for Course has been created.

Creating Documents: After creating the database and collection, we can now create documents. We make use of CreateDocumentAsync() for this purpose. We need to pass the URI of the collection under which we want to create the document, plus the document data itself. In this example we make use of the Course data model I showed earlier and pass it to the create method. Here is the code snippet:

private static async Task CreateCourse(DocumentClient documentClient, DocumentCollection documentCollection)
{
    Course course = new Course()
    {
        CourseId = Guid.NewGuid(),
        Name = "En",
        Teacher = new Teacher()
        {
            TeacherId = Guid.NewGuid(),
            FullName = "Scott Hanselman",
            Age = 44
        },
        Students = new List<Student>()
        {
            new Student()
            {
                FullName = "Trump",
                StudentId = Guid.NewGuid()
            }
        },
        Sessions = new List<Session>()
        {
            new Session()
            {
                SessionId = Guid.NewGuid(),
                Name = "CosmosDB",
                MaterialsCount = 10
            },
            new Session()
            {
                SessionId = Guid.NewGuid(),
                Name = "Ch1",
                MaterialsCount = 3
            }
        }
    };

    Document document = await documentClient.CreateDocumentAsync(documentCollection.DocumentsLink, course);
}

You should see the document inserted in the local DB emulator.

Querying Document:

Now that we have created a document, we can see how to query it. We make use of the CreateDocumentQuery() method for this purpose. We need to pass the collection link on which to query. We can then build the query as a LINQ expression, and the client library does the rest. This is the best part of the client library: it has the ability to translate your LINQ expression into Cosmos REST URIs, without me having to crack my head constructing those URIs. Here is the code snippet:

private static Course QueryCourse(DocumentClient documentClient, string dbName, string collectionName)
{
    Course selectedCourse = documentClient.CreateDocumentQuery<Course>(
            UriFactory.CreateDocumentCollectionUri(dbName, collectionName))
        .Where(c => c.Name == "En") // match the Name we gave the course when creating it
        .AsEnumerable()
        .FirstOrDefault();

    return selectedCourse;
}

Note that you will need to import System.Linq for the LINQ expression to work.

Deleting Database:

Finally, we can make use of DeleteDatabaseAsync() method to delete the database programmatically. We will need to provide the database link to the delete method. We can use the UriFactory.CreateDatabaseUri() helper method to create the database link. Here is the code snippet:

await documentClient.DeleteDatabaseAsync(UriFactory.CreateDatabaseUri(dbName));

Well, those are the main features that the Azure Cosmos DB client provides. If you are stuck on any of the steps above, you can check out the repository I have added with the samples.

Happy coding! Let's spread Azure Cosmos DB to the world.