8 posts tagged with "bigdata"

· 2 min read

One of Microsoft's valuable additions to data analytics was bringing Python into SQL Server. SQL Server now supports the two primary languages of data science, R and Python. I am a fan of Python, and since it sits near the top of the most popular programming language charts, many people are interested in learning more about it. As many professionals are still unfamiliar with Python, I wanted to write this post about it.

Installing Python in SQL Server

If you have already used R with SQL Server, the process for using Python in SQL Server is very similar. Microsoft renamed R Services to Machine Learning Services, and now allows both R and Python to be installed, as shown on the screen. Microsoft's version of Python uses Anaconda, an open source analytics platform created by Continuum. This is where Python differs from other open source languages: Continuum provides this version of Python because it contains data science components that are not included in the standard Python distribution. Continuum also sells an enterprise version of Anaconda which, of course, has more features than the free version. It is also important to remember which Python environment was installed, as you will need to select the same distribution when running Python code outside of SQL Server.

Configuration Changes for Python

The last step needed to run Python is to enable external scripts and restart the SQL Server service. In a new query window, type the following commands:

    sp_configure 'external scripts enabled', 1
    GO
    RECONFIGURE
    GO

After the SQL Server service restarts, SQL Server can run Python code. Python is easy to learn, even for a novice developer: the code is easy to read, and you can often tell what it does just by looking at it. Let's dig into Python with SQL Server and do wonders with data analytics.
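As a quick smoke test once the service is back up, Python can be run through the `sp_execute_external_script` stored procedure. The sketch below is only an illustration: it uses Python to assemble that T-SQL batch as text, and the helper name `build_external_script_batch` is hypothetical. It does not connect to SQL Server; paste the printed text into a new query window to try it.

```python
def build_external_script_batch(python_source: str) -> str:
    """Wrap Python source in a T-SQL call to sp_execute_external_script.

    Hypothetical helper for illustration; it only builds text and does not
    connect to SQL Server.
    """
    # Single quotes must be doubled inside a T-SQL string literal.
    escaped = python_source.replace("'", "''")
    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        f"    @script = N'{escaped}';\n"
        "GO"
    )


batch = build_external_script_batch("print('Hello from Python in SQL Server')")
print(batch)
```

If external scripts are enabled correctly, running the generated batch should show the greeting in the Messages pane.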

· 2 min read

Let me start this blog with a little example. Assume Sachin has a leak in a water pipe in his garden. He takes a bucket and some sealing material to fix the problem. After a while, he sees that the leak is much bigger than he thought and that he needs a specialist to bring bigger tools. Meanwhile, he still uses the bucket to drain the water. After a while, he notices that a massive underground stream has opened and he needs to handle millions of liters of water every second.

He doesn't just need new buckets, but a completely new approach to the problem, simply because the volume and velocity of the water have grown. To prevent the town from flooding, maybe he needs his government to build a massive dam, which requires enormous civil engineering expertise and an elaborate control system. To make things worse, water is gushing out of nowhere everywhere, and everyone is scared by the variety.

Welcome to Big Data.

Key elements of Big Data:

  1. There are over 600 million tweets every day, flowing in every second: that is High Volume and Velocity.
  2. Next, you need to understand what each tweet means (where it is from, what kind of person is tweeting, whether it is trustworthy or not): that is High Variety.
  3. Then you need to identify the sentiment (is this person talking negatively or positively about the iPhone?): that is High Complexity.
  4. Finally, you need a way to quantify the sentiment and track it in real time: that is High Variability.
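
To make the last two items a little more concrete, here is a toy sketch of quantifying sentiment and tracking it over a stream. It is not how a production system would do it: the word lists, the `score_tweet` and `RollingSentiment` names, and the sample tweets are all made up for illustration, and real sentiment analysis needs far more than keyword matching.

```python
from collections import deque

# Tiny hand-made word lists (assumptions for illustration only).
POSITIVE = {"love", "great", "awesome", "good"}
NEGATIVE = {"hate", "bad", "terrible", "broken"}


def score_tweet(text: str) -> int:
    """Return +1 per positive word and -1 per negative word in the tweet."""
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


class RollingSentiment:
    """Track the average score of the most recent `window` tweets."""

    def __init__(self, window: int) -> None:
        # A bounded deque drops the oldest score automatically.
        self.scores = deque(maxlen=window)

    def add(self, text: str) -> float:
        self.scores.append(score_tweet(text))
        return sum(self.scores) / len(self.scores)


tracker = RollingSentiment(window=3)
stream = [
    "I love my new iPhone",
    "battery is terrible",
    "great camera and good screen",
    "I hate the broken charger",
]
averages = [tracker.add(tweet) for tweet in stream]
print(averages)
```

The rolling window is what stands in for "real time" here: each new tweet updates the average immediately, and old tweets fall out of the window.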

The traditional architecture of a Big Data solution looks something like the one below:

· 3 min read

I was one of the speakers at the Colombo Big Data Meetup held yesterday, where I spoke about Google's BigQuery. I have decided to write a blog post on it so that you can benefit too if you are a Big Data fan.

What is Big Data?

There are many definitions of Big Data, so let me explain what it really means. In the near future, every object on this earth will be generating data, including our own bodies. We are exposed to a huge amount of information every day. In this vast ocean of data there is a complete picture of where we live, where we go and what we say, and it is all being recorded and stored forever. More data allows us to see new, better and different things. Data has recently changed from stationary and static to fluid and dynamic. We rely heavily on data, and it is a major part of any business.

We live in a very exciting world today: a world where technology is advancing at a staggering pace and data is exploding, with tons of it being generated. Ten years ago we measured data in megabytes; today we talk about data in petabytes, and in a few years we may reach the zettabyte era, which takes us to the end of the English alphabet. Does that mean the end of Big Data? No. If you have shared a photo, a post or a tweet on any social media, you are one of the people generating data, and you are doing it very rapidly.

Once you have decided to use BigQuery, there are certain things you should know beforehand to optimize your queries and reduce cost.

Do not use queries that contain SELECT *: they read every column of the table and therefore result in a high cost.
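
The cost of `SELECT *` comes from the fact that BigQuery stores tables by column and bills on-demand queries by the bytes read from the columns a query references. The sketch below is plain arithmetic, not a call to BigQuery: the column names, the column sizes, and the `PRICE_PER_TIB_USD` figure are made-up assumptions, so check current pricing before relying on the numbers.

```python
# Hypothetical per-column storage sizes for one table, in bytes.
COLUMN_BYTES = {
    "user_id": 8 * 10**9,
    "created_at": 8 * 10**9,
    "country": 2 * 10**9,
    "tweet_text": 150 * 10**9,
}

# Assumed on-demand price; a placeholder, not a quoted rate.
PRICE_PER_TIB_USD = 5.0
TIB = 2**40


def bytes_scanned(columns):
    """Sum the sizes of the referenced columns (all columns for SELECT *)."""
    if columns == ["*"]:
        columns = list(COLUMN_BYTES)
    return sum(COLUMN_BYTES[name] for name in columns)


def estimated_cost_usd(columns):
    """Convert scanned bytes into a cost using the assumed price."""
    return bytes_scanned(columns) / TIB * PRICE_PER_TIB_USD


select_star = bytes_scanned(["*"])
two_columns = bytes_scanned(["user_id", "country"])
print(f"SELECT *                scans {select_star / 10**9:.0f} GB")
print(f"SELECT user_id, country scans {two_columns / 10**9:.0f} GB")
print(f"Ratio: {select_star / two_columns:.1f}x")
```

In this made-up table one wide text column dominates the size, so naming only the columns you need cuts the scan dramatically.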

Since BigQuery can store values in nested fields, it is usually better to model related values as repeated fields.

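Split data into multiple tables where possible, so that each query reads only the tables it needs, and design them so that queries do not have to JOIN them, since JOINs are best avoided.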
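
As a rough picture of how repeated fields help avoid JOINs, the sketch below models the same data twice in plain Python: once as two flat tables linked by a key, and once as a single record with a repeated `items` field. The `orders`/`items` schema and the `nest` helper are invented for illustration, and no BigQuery API is involved.

```python
# Normalized: two flat "tables" linked by order_id, so reading needs a join.
orders = [
    {"order_id": 1, "customer": "asha"},
    {"order_id": 2, "customer": "ravi"},
]
items = [
    {"order_id": 1, "sku": "A", "qty": 2},
    {"order_id": 1, "sku": "B", "qty": 1},
    {"order_id": 2, "sku": "A", "qty": 5},
]


def nest(orders, items):
    """Fold child rows into a repeated `items` field on each parent row."""
    by_order = {}
    for item in items:
        child = {"sku": item["sku"], "qty": item["qty"]}
        by_order.setdefault(item["order_id"], []).append(child)
    return [
        {**order, "items": by_order.get(order["order_id"], [])}
        for order in orders
    ]


nested_orders = nest(orders, items)

# Denormalized: each record carries its own items, so no join is needed.
total_qty = {
    order["customer"]: sum(item["qty"] for item in order["items"])
    for order in nested_orders
}
print(nested_orders[0])
print(total_qty)
```

The join happens once, when the data is loaded, instead of on every query.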

BigQuery also supports tools such as ebq, which encrypts the data, and dry run, which reports how many resources a query would consume without actually executing it. Both make the job of many developers and data analysts easier.
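
For the dry run part, a minimal sketch with the `google-cloud-bigquery` Python client might look like the following. It assumes that package is installed and that default credentials are configured, neither of which is covered in this post; the `dry_run_bytes` and `human_readable` helper names are my own.

```python
def dry_run_bytes(sql: str) -> int:
    """Ask BigQuery how many bytes `sql` would process, without running it.

    Assumes the google-cloud-bigquery package is installed and default
    credentials are configured.
    """
    from google.cloud import bigquery  # imported lazily: optional dependency

    client = bigquery.Client()
    # dry_run=True validates and estimates the query instead of executing it.
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


# Formatting demo only; no BigQuery call is made here.
print(human_readable(3 * 1024**3))
```

With credentials in place, passing a query string to `dry_run_bytes` and the result to `human_readable` gives a readable estimate before any cost is incurred.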

I will be writing two separate blog posts in the coming days on how to integrate with BigQuery and how to ingest data into BigQuery.

You can find the slides of the presentation here.