The Art of Data Science
Data science requires that data be gathered and pre-processed into a format that models and algorithms can readily ingest. This preparation is therefore the first step, before data science techniques can be applied to transform the data into valuable insights and visual delights.
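As a minimal sketch of this pre-processing step, the Python below assumes hypothetical raw records with `region` and `revenue` fields. It trims whitespace, normalises capitalisation, removes thousands separators and drops rows that are incomplete or non-numeric. Dropping bad rows is only one possible policy; imputing missing values is a common alternative.

```python
def clean_records(raw_rows):
    """Trim whitespace, drop incomplete rows and convert revenue to float.

    The field names "region" and "revenue" are hypothetical.
    """
    cleaned = []
    for row in raw_rows:
        region = (row.get("region") or "").strip().title()
        revenue_text = (row.get("revenue") or "").strip().replace(",", "")
        if not region or not revenue_text:
            continue  # skip incomplete records rather than guessing values
        try:
            revenue = float(revenue_text)
        except ValueError:
            continue  # skip values that are not numeric, e.g. "n/a"
        cleaned.append({"region": region, "revenue": revenue})
    return cleaned


# Hypothetical messy input, for illustration only.
raw = [
    {"region": "  north ", "revenue": "1,200.50"},
    {"region": "south", "revenue": ""},
    {"region": "east", "revenue": "n/a"},
    {"region": "WEST", "revenue": " 980 "},
]
print(clean_records(raw))
# → [{'region': 'North', 'revenue': 1200.5}, {'region': 'West', 'revenue': 980.0}]
```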
Data Science Evolution
The explosion in the use of information technology through the World Wide Web and social media has provided a variety of data sources that were not readily available earlier. Endless streams of data are now available on demand from social media giants, stock exchanges, government institutions and non-government organizations. This enables organizations to predict and plan their efforts, make their marketing strategies more effective and explain previously unexplainable revenue drops. For example, by correlating local government road works with revenue drops, it is possible to visualise the impact of such road closures on an organization's revenue. Similarly, research publications state, “On average, overall crime increases by 2.2% and violent crime by 5.7% on days with maximum daily temperatures above 85 degrees Fahrenheit (29.4° C) compared to days below that threshold.” Thus seemingly unrelated events can be investigated and correlations established.
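The threshold comparison described in the quoted research can be sketched as the percentage change in a daily measure on flagged days (hot days, or road-works days) relative to unflagged days. The function `mean_change_pct` and every number below are hypothetical and purely illustrative; they are not taken from the study or from any real revenue data.

```python
def mean_change_pct(values, flags):
    """Percentage change in the mean of `values` on flagged days relative to unflagged days."""
    flagged = [v for v, f in zip(values, flags) if f]
    unflagged = [v for v, f in zip(values, flags) if not f]
    if not flagged or not unflagged:
        raise ValueError("need at least one flagged and one unflagged day")
    baseline = sum(unflagged) / len(unflagged)
    return 100.0 * (sum(flagged) / len(flagged) - baseline) / baseline


# Hypothetical daily observations, for illustration only.
max_temps_f = [78, 82, 88, 91, 80, 86]
incidents = [10, 10, 12, 11, 10, 13]
revenue = [500, 520, 400, 410, 510, 390]
road_works = [False, False, True, True, False, True]

hot_days = [t > 85 for t in max_temps_f]  # the 85 °F threshold from the quote
print(round(mean_change_pct(incidents, hot_days), 1))  # → 20.0
print(round(mean_change_pct(revenue, road_works), 1))  # → -21.6
```

A difference in means like this signals an association worth investigating; on its own it does not establish that one event causes the other.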
Data Streams and Data Sets
When billions of people use the internet continuously, their activities, including clicks, searches and navigation, are all potentially valuable information. E-commerce purchases, in-game player activity, information from social networks, stock market trading floors, geolocation services and telemetry from connected devices are all now available as streams. This information is collected and made available to potential users, in some cases free of charge and in others at a cost. Most data streams continuously collect data from thousands of data sources and send small data records simultaneously and at high speed. NASA provides data from space through its data stream, NASA Data. Social networks provide streaming services such as the Twitter Streaming API and the Facebook RealTime Updates API. The Australian Bureau of Meteorology also provides streaming data services.
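To illustrate how small records arriving continuously can be processed as they stream in, here is a minimal sketch built on a simulated click stream. The generator `simulated_click_stream` and the class `SlidingWindowCounter` are hypothetical and are not part of any provider's API; the sketch keeps a running count of events seen in the last second and assumes records arrive in timestamp order.

```python
import itertools
import random
from collections import deque


def simulated_click_stream(seed=42):
    """Hypothetical endless stream of small records: (timestamp_seconds, event_type)."""
    rng = random.Random(seed)
    t = 0.0
    while True:
        t += rng.expovariate(5.0)  # random gaps averaging 0.2 s, i.e. about 5 events per second
        yield t, rng.choice(["click", "search", "purchase"])


class SlidingWindowCounter:
    """Counts how many events arrived within the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.timestamps = deque()

    def add(self, timestamp):
        """Record an event (timestamps assumed in order) and return the current window count."""
        self.timestamps.append(timestamp)
        # Evict events that are `window` seconds or older relative to the newest one.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)


counter = SlidingWindowCounter(window=1.0)
for ts, kind in itertools.islice(simulated_click_stream(), 10):
    print(f"{ts:6.2f}s {kind:<8} events in last second: {counter.add(ts)}")
```

Processing each record as it arrives, and keeping only a bounded window in memory, is what lets a consumer keep up with an endless, high-speed stream.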
Several government and non-government organizations also provide data sets that, owing to the nature of the data, are not in streaming format. For example, a huge cache of data can be accessed from the Australian Bureau of Statistics. A similar wealth of data can also be obtained at the local government level, as well as from international bodies such as the OECD.
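Unlike a stream, a data set like this is usually downloaded once and processed as a batch. A minimal sketch, assuming a hypothetical CSV extract with `region`, `year` and `incidents` columns (real files from statistical agencies have their own layouts), reads the data with Python's `csv` module and totals a column per region.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract of a downloaded data set; real files from statistical
# agencies use their own column names and layouts.
CSV_TEXT = """\
region,year,incidents
Region A,2022,120
Region A,2023,135
Region B,2022,90
Region B,2023,90
"""


def total_by_region(csv_text):
    """Parse CSV text and sum the `incidents` column for each region."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += float(row["incidents"])
    return dict(totals)


print(total_by_region(CSV_TEXT))  # → {'Region A': 255.0, 'Region B': 180.0}
```

For a file on disk, `open(path, newline="")` can be passed to `csv.DictReader` in place of the `io.StringIO` wrapper.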
Python and Scala are languages used extensively in data science, from data preparation to modelling. Tools such as Alteryx and KNIME are in vogue in the data preparation phase of the data science continuum.
The SEMMA process, developed by the SAS Institute, is used as a model in data science. The five steps that make up the SEMMA acronym are Sample, Explore, Modify, Model and Assess.
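The five SEMMA steps (Sample, Explore, Modify, Model, Assess) can be sketched end to end in a few lines of Python. This is an illustrative sketch under stated assumptions, not SAS's implementation: the data set is synthetic and hypothetical, min-max scaling stands in for the Modify step, and a simple one-feature threshold classifier stands in for the Model step.

```python
import random
import statistics

# Hypothetical data set of (monthly_visits, bought) pairs. Customers with 10 or
# more visits are labelled as buyers purely for illustration.
DATA = [(i % 20, 1 if i % 20 >= 10 else 0) for i in range(200)]


def semma(data, sample_size=100, seed=7):
    """Run the five SEMMA steps on (feature, label) pairs; return (summary, accuracy)."""
    rng = random.Random(seed)

    # Sample: draw a random training subset; the remaining records are held out.
    chosen = set(rng.sample(range(len(data)), sample_size))
    train = [data[i] for i in sorted(chosen)]
    holdout = [data[i] for i in range(len(data)) if i not in chosen]

    # Explore: summary statistics of the feature in the training sample.
    xs = [x for x, _ in train]
    summary = {"mean": statistics.mean(xs), "min": min(xs), "max": max(xs)}

    # Modify: min-max scale the feature to [0, 1] using training statistics.
    lo, hi = summary["min"], summary["max"]

    def scale(x):
        return (x - lo) / (hi - lo)

    # Model: threshold halfway between the scaled class means
    # (assumes label 1 has the larger mean, as it does in DATA).
    mean0 = statistics.mean(scale(x) for x, y in train if y == 0)
    mean1 = statistics.mean(scale(x) for x, y in train if y == 1)
    threshold = (mean0 + mean1) / 2

    # Assess: share of held-out records classified correctly.
    correct = sum((scale(x) >= threshold) == (y == 1) for x, y in holdout)
    return summary, correct / len(holdout)


summary, accuracy = semma(DATA)
print(summary, accuracy)
```

Assessing on records held out in the Sample step, rather than on the training sample, gives a fairer estimate of how the model will perform on new data.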
Data science enables data from different domains to be combined in order to extract valuable insights for organizations. This in turn helps organizations deliver timely, real-time solutions that meet customer expectations.
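Combining data from different domains usually means joining records on a shared key such as a date. A minimal sketch, assuming hypothetical daily sales and road-works records, performs an inner join in plain Python; with pandas, `pandas.merge(left, right, on="date")` does the same job on DataFrames.

```python
def join_on_key(left, right, key):
    """Inner join of two lists of dicts on a shared key, like a SQL INNER JOIN.

    Assumes `key` values in `right` are unique; a later duplicate would win.
    """
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            joined.append({**row, **match})  # merge the two records into one
    return joined


# Hypothetical records from two different domains.
sales = [
    {"date": "2024-03-01", "revenue": 500},
    {"date": "2024-03-02", "revenue": 400},
    {"date": "2024-03-03", "revenue": 510},
]
road_works = [
    {"date": "2024-03-02", "road_closed": True},
    {"date": "2024-03-03", "road_closed": False},
]
print(join_on_key(sales, road_works, "date"))
# → [{'date': '2024-03-02', 'revenue': 400, 'road_closed': True}, {'date': '2024-03-03', 'revenue': 510, 'road_closed': False}]
```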
The data scientist uses the stream of clicks, searches, purchases and device telemetry generated by internet users to analyze behaviour and build models that predict a potential customer's next move in real time.
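One simple way to predict a customer's next move from a click stream is a first-order Markov model, which counts how often each action follows another and predicts the most frequent follow-up. The class `NextActionModel` and the session data are hypothetical; production systems typically use far richer models.

```python
from collections import Counter, defaultdict


class NextActionModel:
    """First-order Markov model: predicts the next action from the current one."""

    def __init__(self):
        # transitions[current][next] = how often `next` followed `current`
        self.transitions = defaultdict(Counter)

    def update(self, session):
        """Count every consecutive (current, next) pair in one user session."""
        for current, nxt in zip(session, session[1:]):
            self.transitions[current][nxt] += 1

    def predict(self, current):
        """Return the most frequent follow-up action, or None if `current` is unseen."""
        counts = self.transitions.get(current)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Hypothetical click-stream sessions.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "search", "product", "product"],
    ["search", "product", "cart", "checkout"],
]
model = NextActionModel()
for session in sessions:
    model.update(session)

print(model.predict("product"))   # → cart
print(model.predict("checkout"))  # → None
```

Because `update` only increments counters, it can be called as each new session arrives, keeping predictions current in near real time.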
Model building involves applying statistical functions and algorithms to uncover patterns in the data and make projections or predictions. MATLAB and R are used extensively in model building to identify meaningful patterns in a data set.
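As a minimal example of model building, ordinary least squares fits a straight line y = intercept + slope * x to observed data, and the fitted line can then be used for projections. The function `fit_line` and the data points are hypothetical; R's `lm()` and MATLAB's `fitlm` fit the same kind of model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: advertising spend versus units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [3, 5, 7, 9, 11]
intercept, slope = fit_line(ad_spend, units_sold)
print(intercept, slope)        # → 1.0 2.0
print(intercept + slope * 6)   # projection for a spend of 6 → 13.0
```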
Sophisticated models are thus built and trained using real data. As the volume of data and exposure increases, the model is refined over time and its predictions become more accurate. These models are used to answer questions such as “What is a customer most likely to buy?” and “Which specific feature or function is he or she looking for?” This in turn enables a customised marketing pitch to convince the individual to choose the company's product or service.
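How a model is refined as more data arrives can be sketched with per-customer purchase counts that are updated after every purchase. The class `PurchasePreferenceModel`, the category names and the Laplace smoothing parameter `alpha` are assumptions for illustration: with no history every category is equally likely, and each new purchase shifts the estimate towards what the customer actually buys.

```python
from collections import Counter


class PurchasePreferenceModel:
    """Per-customer purchase probabilities, refined every time a purchase is observed."""

    def __init__(self, categories, alpha=1.0):
        self.categories = list(categories)  # assumed known up front
        self.alpha = alpha  # Laplace smoothing: unseen categories keep a small probability
        self.counts = {}

    def observe(self, customer, category):
        """Update the model with one new purchase."""
        self.counts.setdefault(customer, Counter())[category] += 1

    def probabilities(self, customer):
        """Smoothed probability of each category for this customer."""
        counts = self.counts.get(customer, Counter())
        total = sum(counts.values()) + self.alpha * len(self.categories)
        return {c: (counts[c] + self.alpha) / total for c in self.categories}

    def most_likely(self, customer):
        """Answer "what is this customer most likely to buy?"."""
        probs = self.probabilities(customer)
        return max(probs, key=probs.get)


model = PurchasePreferenceModel(["books", "music", "games"])
print(model.most_likely("c1"))  # no history yet, all equal → books (first category)
for category in ["games", "games", "books", "games"]:
    model.observe("c1", category)
print(model.most_likely("c1"))                        # → games
print(round(model.probabilities("c1")["games"], 3))   # → 0.571
```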
“One picture is worth a thousand words” is an old adage. The task of converting data stories into easily explained images falls to visualization tools such as Power BI and Tableau, which enable the data scientist to convert the stories hidden in the data into visual delights. In addition to modelling customer behaviour, the data scientist uses these tools to present the data visually to its users. These visualizations cover the predictions and the impact of following a specific recommendation that comes out of such data analytics.
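Power BI and Tableau are interactive tools rather than code libraries, so as a code-level stand-in, here is a minimal sketch of the underlying idea: rendering numbers as bars whose lengths are proportional to their values. The function `text_bar_chart` and the figures are hypothetical; in Python a plotting library such as matplotlib would normally be used instead.

```python
def text_bar_chart(data, width=30):
    """Render a {label: value} mapping as horizontal text bars scaled to the largest value."""
    if not data:
        return ""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value; a non-positive maximum gives empty bars.
        length = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * length} {value}")
    return "\n".join(lines)


# Hypothetical revenue by region.
print(text_bar_chart({"North": 1200, "South": 600, "West": 900}, width=20))
```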
Data analysis and statistics have evolved over many years, even prior to computing and data science as we know it. We have restricted our discussion of data science to the context of information technology. In summary, data science is a fast-evolving field with great potential, particularly in predicting and personalising products, services and solutions to win the customer's favour. Organizations have moved on from viewing data science as a nice-to-have to seeing it as a tool for survival and competitive advantage.