5 Big Data Myths Debunked.

By Sanne Steegstra

So you’ve heard about this thing called big data, and how it’s going to rock your world. But beyond all the yea-saying and data praying, what is actually going on, and what should you buckle up for?

 

An edgy perspective on the subject: 5 Big Data myths debunked

 

Myth 1: Big Data is big 

Nope. Big is relative. The ‘Big’ is marketing lingo, an attention grabber, as if we should automatically be afraid of scary, unknown big things. Big actually means ‘more difficult to access and query than we are used to’. Yes, at some point you will probably need new tools. But does that make your data big, or your legacy tools and systems old? Telecom operators have been analyzing CDRs (call detail records) for over fifteen years now, often using hundreds of millions of records. Sounds big? Heck no. It will fit on a USB flash drive that you can buy on Amazon for under $10, so how can you call that big? Most people and companies are not working with Big Data, just data. If you want to talk about big data problems, ask the guys at NASA, who are planning missions where around 24TB of data will be streamed. Every day. From outer space.
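To put a rough number on the CDR example, here is a minimal back-of-the-envelope sketch in Python. Both inputs are assumptions made up for illustration (about 100 bytes per record and 300 million records); they are not figures from an actual operator, so swap in your own.

```python
# Back-of-the-envelope estimate. Both inputs are illustrative assumptions,
# not figures from an actual telecom operator.
BYTES_PER_RECORD = 100        # assumed average size of one call detail record
RECORD_COUNT = 300_000_000    # stand-in for "hundreds of millions of records"


def dataset_size_gb(record_count: int, bytes_per_record: int) -> float:
    """Return the raw dataset size in gigabytes (1 GB = 10**9 bytes)."""
    return record_count * bytes_per_record / 10**9


size_gb = dataset_size_gb(RECORD_COUNT, BYTES_PER_RECORD)
print(f"{size_gb:.0f} GB")  # 30 GB
```

Under these assumptions the whole data set comes to about 30 GB, a tiny fraction of the 24TB that the NASA missions mentioned above would stream in a single day.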

 

 

Myth 2: Big Data is a technology thing
 

Wrong, it is a paradigm shift in business models. And until the new models become common, Big Data will keep its adjective ‘Big’. Don’t be fooled by the traditional software and hardware vendors that are the first to step up and sell you Big Data Solutions. That doesn’t make it an IT thing; don’t confuse the message with the messenger.

Big data is about acting smart, and right now about changing organizations to be smart, so that they gain a competitive advantage by better understanding and serving customer needs. There, I said it: customer needs. And there’s your shift in business models: first comes the customer need, then the product. Successful companies of the future will not look for customers to sell their products to, but look for products that meet their customers’ needs.

It will go from 1. understanding, to 2. meeting, and 3. even creating new customer needs. How is it possible to create new customer needs, you ask? Well, with new technologies come new latent customer needs; it is inevitable. Remember standing bent over a map spread out on the trunk of your car? And how navigation software dramatically changed the driving experience? Now compare the first devices to today’s standards. Would you still be satisfied with a navigation system that can’t even tell you where the nearest gas station is, or doesn’t use live traffic information? Well, in a couple of years you will feel the same way about a solution that cannot even suggest tomorrow’s best departure time for you, giving you an extra 15 minutes of sleep. All this is not about technology; it is about serving customer needs with technology that has been around for a while.

 

Myth 3: There is an enormous shortage in analytical talent and experienced analysts. 

A lot of companies are having a hard time recruiting the right analytically skilled people. The quest for analysts and data scientists is a hard one, with consultancy agencies proclaiming that message in advertorials everywhere. Although there can never be enough analytical talent, and yes, there is a shortage, an important part of the problem lies somewhere else.


Your company is not interesting enough

Nice pic, but in the non-StockPhoto reality you are probably another boring company..


Let’s be honest, your company’s corporate website might show pictures of young and good-looking people, working on an apparently fun business problem while pointing at a computer. However, in the non-StockPhoto reality you are probably a boring company that sells boring products and has boring problems to solve. So how on earth can you compete with interesting start-ups and cool, tech-savvy companies? Well, just create interesting problems. Create multidisciplinary teams where analytical talent is not only (mis)used as support. Allow them to create the interesting problems. Be like Google and let them spend time on their own projects. If managed correctly, you might end up with your next big thing this way.

It is not your fault that you are selling insurance, but the blame is on you if you hire analysts only to do some 1.0 direct marketing. Appealing to engineers and analytical employees with coding skills will be less difficult when you provide the right challenges.

..A Playstation and a foosball table won’t do.


You already hired them, they are working in the wrong department

And I bet it is the IT department. All your STEM (Science, Technology, Engineering, Mathematics) skills are in one place, nicely hidden away on a separate floor or in some sort of ‘in-company quarantine’. So besides a big change in mentality and organization, there are two things you can do. You either teach your marketing staff analytical skills, or you teach your analytical staff marketing skills. I suggest you try the latter. Bring them in on marketing and sales meetings. Again, not just as on-demand support, but in the (co-)driver’s seat. You will be surprised how much creativity you can unleash.

Data analytics is still unnecessarily complex

A data analyst loves analyzing data, not the hardship of accessing it. Are programming skills required to create cool tools, models and applications? Or are they an absolute necessity because otherwise 90% of the time would be spent in meetings with the IT department? Data analysts work on the frontiers of data. That means the data is by definition not structured, seldom relational and hardly ever quickly accessible. The company that provides a plug-and-play sandbox solution for all company data will close an important part of the analytical gap.



 

Myth 4: Big Data is social data 

Social is data’s super sexy showcase. It will often start with social data, not only because it is (still) up for grabs and there is a lot of it, but also because everything with a like button on it appeals to marketers, creative companies and more than the usual suspects (yep IT, BI and CRM, I mean you guys). Since chances are slim that you, dear reader, work for Facebook or Google, social data will mainly be big for them. Data lies at the core of their business model. And although it is all about liking and sharing, they don’t like to share.

So don’t forget to focus on your own ‘big’ data. Got a central cash register system, or a website with a large volume of clicks and views (and it’s up to you whether you label it large or ‘big’)? Or do you have check-ins, sensor data or some other sort of activity-generated data? Even better! That’s your big data! Got none of the above? An internet connection, a web scraper and the blogosphere will do fine as well. Wherever there are actions, transactions or interactions and the capability to store them, there is data. You can make it as big as you like.

 

Myth 5: Big Data is a hype

Stop the definition debate. Who cares. Everybody agrees on the possibilities and disruptive force data can have. The era of data has already begun. 

 


What we are sensing now is merely a breeze; the storm is coming.


Sanne Steegstra

 

3 Books on data: must-read list

No matter if you are a CTO, Marketing Manager or Data Analyst, these are the books you want to add to your reading list. A smart selection with 3 brilliant books on data and analyses. Good and highly entertaining reads, for both the layman and the data pro. 

 

 

 

The Signal and the Noise - Nate Silver


The versatile and multi-talented Nate Silver has written the must-read for everyone who is interested in data, comes across professional predictions at the office (whether about the stock market, the housing market or baseball figures next to the coffee machine) or is just up for a good read.

 

Largely based on Bayes’ theorem, Nate Silver provides a method that helps you understand and bridge “the gap between what you know, what you think you know and what you should know”. Basically, The Signal and the Noise is about thinking probabilistically, especially using conditional probability (the probability that A is true given that B has happened).
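For readers who want to see conditional probability at work, here is a minimal Python sketch of Bayes’ theorem. The function name and all the numbers are made up for illustration and do not come from the book: assume a forecast is right 5% of the time beforehand, and a supporting signal shows up 90% of the time when it is right and 10% of the time when it is wrong.

```python
def posterior(prior: float, p_signal_if_true: float, p_signal_if_false: float) -> float:
    """Bayes' theorem: probability the claim is true, given that the signal was seen."""
    # Probability of seeing the signal AND the claim being true.
    numerator = p_signal_if_true * prior
    # Total probability of seeing the signal at all (true or false claim).
    evidence = numerator + p_signal_if_false * (1 - prior)
    return numerator / evidence


# Illustrative, assumed numbers: 5% prior, 90% hit rate, 10% false alarm rate.
print(f"{posterior(0.05, 0.90, 0.10):.2f}")  # 0.32
```

Even with a signal that looks 90% reliable, the updated probability in this toy example is only about 32%, because the claim was unlikely to begin with. That is the kind of sanity check that helps separate signal from noise.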

 

In a modest way, with appealing examples, Silver provides a good mental instrument to identify false prophets and charlatans, be it in the form of optimistic sales figures in the boardroom or overconfident political pundits on TV. After reading this book you won’t be a statistics or forecasting expert, but you will be able to expose the next person who tries, wittingly or unwittingly, to impress you or lie to you with data and statistical power play.


The Signal and the Noise is available on Amazon or Bol.com

 

 

 

 

Super Crunchers: How Anything Can Be Predicted - Ian Ayres

 

Yale professor Ian Ayres makes a strong case for the use of statistics in everyday (business) life. The common thread throughout Ayres’ story is the question of ‘Experts vs. Equations’.

 

From appealing subjects such as rating wine (and predicting its future value), via applications in baseball and marketing, Super Crunchers moves on to weightier topics such as governmental decision-making and medicine. But while Ayres points out some nice examples of fighting crime and unemployment with data, he should cut physicians some slack when he touches on the subject of fact-based (evidence-based) medicine. Yes, the revolution of super crunching will drastically change medicine, and yes, expertise and intuition will make way for data and predictive models, but Ayres is being a little dramatic when he compares today’s doctors to the House character.

 

 

However, using easy-to-understand (and sometimes witty) examples, Super Crunchers makes the case for fact-based and data-based decision-making. The future of expertise and intuition might lie in basic statistics. You don’t have to be a data expert or professional to understand how basic rules of thumb will help you make better choices every day.


Super Crunchers is available on Amazon or Bol.com

 

 

 

The Information: A History, a Theory, a Flood - James Gleick

 

Let’s be honest, if you don’t already have some enthusiasm for data or information theory, or haven’t caught the buzz of the Big Data hype, you are probably not going to read James Gleick’s 400+ page ‘The Information’.

And that is a shame.

 

James Gleick has written the chronicle of information, showing how information became the fuel of modern society and the modern economy.

With beautiful examples ranging from African drums (a.k.a. tam-tam or talking drums, a sophisticated communication system that actually carries the information in the context), early alphabets, secret codes and Gutenberg’s printing press to DNA and quantum physics, Gleick enlightens us about other ways of creating, storing and sharing information, and therefore about communication in general. Our species' story is one about information as well. We have come a long way since we left the savannahs. Gleick tells this story in a fascinating way. You don’t have to be a data scientist or CTO to be intrigued.

 

The Information puts in perspective what information was, what it is and where it is going. It deals not only with information and your mistaken beliefs about it but, more importantly, with its impact on society and how it is going to change our world in the age of the information explosion.


The Information is available on Amazon or Bol.com

 

                                      * * * * * * * * * * * * * * * * * * 




The Coolest Data Mining Jobs

You have probably come across the headlines warning us of the looming shortage of analytical talent. McKinsey estimates the gap at somewhere between 140,000 and 190,000 people for the next few years, and that is in the U.S. only.

But before you quit your day job and get retrained, what are the coolest data jobs? We have made a Top 5 list of where to send your resume.

 

1. Data Analyst @ NASA

If you were hoping for your own desk in a shuttle, please apply again in 80 years. However, if you want to have a hand in mankind's next giant leap, this is definitely one of the most impactful employers. The mission of NASA's data mining department is described as finding issues before they become incidents.

Data miners find, diagnose, predict, and alleviate problems on all missions, finding the clues to safer flights. And with a cost of $450 million per shuttle mission, failure is simply not an option.

If you want to get a feel for what you'll be working with and get your hands dirty, check out the algorithms and data sets at DASHlink.


You can apply via NASA's career website.


 

 

2. Data Analyst @ New York Yankees

Or any other team for that matter. Ever since Brad Pitt starred in "Moneyball" (2011), people have become increasingly aware of the role statistics and analysis play in sports. With hardcore fans keeping notebooks with game details for every inning even before the computer era, baseball, the national pastime, was bound to become the first sport to seriously incorporate analytics. And when sports get serious, the money gets serious...

You will harness the wealth of data by analyzing and predicting batting performances, evaluating opponents' tendencies, assisting the coach in deciding which player to bring in, calculating the optimal team composition based on individual qualities in different situations and, well, basically whatever you can come up with to improve performance. Professional baseball teams are often more into data mining and analytics, even embracing them, than the big corporates are.


Get in the lineup over here.

 

 

3. Gaming Data Analyst

Do you understand statistics and know what makes gameplay design stand out from the crowd? Are you ready to handle truly big data generated by millions of game sessions to help game developers make better decisions in game design? And most of all, do you love gaming? Make sure to check out the big game developers' websites on a regular basis, because they are hiring!

 

For those of you born before Pac-Man: video games have outgrown the arcade hall and developed into a multi-billion dollar industry that is reinforcing its teams with the best data analysts on the market. Besides the okay money to be made, one of the main perks is the possibility to work for a truly cool company. Don't worry about the right tie, a clean shave and corporate chit-chat at your job interview; worry about amazing them with your hardcore coding skills and multidisciplinary understanding of gaming. The biggest players will generally let you use your own preferred software, but be prepared to get your hands dirty with some hardcore SQL, Hive/Hadoop and Python.


Pick your winning team at EA Games, Riot Games, Rockstar Games or Activision.


 

4. Data Journalist @ The Guardian


In the digital world facts have become data, so it makes sense the ultimate fact-checkers have a whole new discipline of their own: Data Journalism. 

 

With a vast and growing source of... well, sources, data of interest to journalists is increasingly available on the web. With initiatives such as Opengov, even governments, always of special interest to journalists, are opening up and sharing their data. Whether you want to check your members of parliament's expenses and create a scandal, or investigate deeper relations between unemployment and crime, the data is often unstructured and sparsely spread over sources, so some basic programming skills come in handy.


Some nice examples of Data Journalism are: GapMinder, the Texas government salary investigation, the visualization of the Iraq war and the British Class Calculator.


If Data Journalism wants to become mainstream, the next step it has to take is to drop the focus on cool infographics and start telling the story again. The data-based story.

 

 

5. Obama’s campaign data strategist

Okay, despite the fact that it is quite unlikely that Obama will have a third term, we had to sneak this one in. Perhaps the most impactful data analysis job on this list.

Don't take it personally (or politically): the main reason we mention the Democrats' team is that the Republican Party failed to set up a successful data strategy team.

And, I admit, since we are from Europe, we tend to cheer for Obama just a little bit louder.


Over a hundred engineers, developers, data scientists and plain old hackers teamed up in Obama's Chicago headquarters, contributing to the 2012 victory. As Harper Reed (campaign CTO) described it, what they did was make technology a "force multiplier". Reed's task was to create a data mining infrastructure to target voters, both for their money and their votes.

So what would a day on the job look like? Well, to begin with, your teammates would be extremely experienced former Google, Facebook, Twitter and Quora employees, building tools with names like 'the Facebook Blaster' and 'the People Matcher' with... well, whatever kind of tool does the job. With their hacking mentality, no one will force you to use any particular tooling. If you are good with Python, use Python; better with Rails, Rails it is. The challenge is to get the answer, not to find the best way to it. Or as Reed states: "We have to elect the President. We don't need to sell our software to Oracle".


You wonder: with so many brains and so much data street smarts, did this team build Direct Marketing's Supermachine? The core of the machine consists of response models, multichannel with a focus on online, so what makes its purpose differ from the campaign management software the bulk of marketing departments use?

...Or perhaps the Supermachine is the team itself?

I vote for the last option. Being part of the most data-driven presidential campaign is a great resume booster, as we saw in June this year, when Google's Eric Schmidt hired almost the entire team to work for Google.