Categories
Uncategorized

Data science reading list for Tuesday, October 30, 2018: The sexiest job, most in-demand data skills, 4 ways the data scientist has evolved, and the null hypothesis and p-values

Is data science still among the sexiest jobs of the 21st century?

It was in a 2012 Harvard Business Review article that data scientist was declared “the sexiest job of the 21st century”. Is it still true six years later?

I’ll spare you the torment and give you the answer, which (naturally) appears at the end of the article:

The role of data scientists is and will remain a sexy profession for some time, partly due to its relative exclusivity, and the field of data science itself will no doubt remain an exciting space.

You may find the middle of the article a little more useful, as it lists qualities of good data scientists:

A good data scientist should be:

  • Adaptable: Data scientists must be willing to constantly upskill themselves to master advanced machine learning skills such as deep learning. While technical skills are fundamental for data scientists, it’s crucial for them to master communication skills too so they can easily interact with domain experts or business developers. Data scientists will need to develop a better understanding of the overarching business strategy and business challenges in real-world scenarios to create solutions for real problems.
  • Statistics at the heart: Data scientists must have quantitative capabilities to figure out multifaceted trends within a data set that may entail more than one million rows.
  • Detail-oriented: Data often have errors and discrepancies, and data scientists must identify and correct incomplete, incorrect or inaccurate data. It’s critical that data are clean, high-quality and unbiased to ensure the best output upon which to make business decisions.
  • Good programming skills: Programming skills, together with statistics, are critical. For statistical analysis to happen, data scientists need to know programming languages (such as Java, SQL, and Python) to break down the data set in more digestible formats.
  • Business knowledge: While it is important for data scientists to be technically capable, they must also be business savvy and understand the organisation’s business goals and objectives, so they can analyse the data to support business success.

The most in-demand skills for data scientists

Here are the two key graphs from the article:

From the end of the article:

Based on the results of these analyses, here are some general recommendations for current and aspiring data scientists concerned with making themselves widely marketable.

  • Demonstrate you can do data analysis and focus on becoming really skilled at machine learning.
  • Invest in your communication skills. I recommend reading the book Made to Stick to help your ideas have more impact. Also check out the Hemingway Editor app to improve the clarity of your writing.
  • Master a deep learning framework. Being proficient with a deep learning framework is a larger and larger part of being proficient with machine learning. For a comparison of deep learning frameworks in terms of usage, interest, and popularity see my article here.
  • If you are choosing between learning Python and R, choose Python. If you have Python down cold, consider learning R. You’ll definitely be more marketable if you also know R.

Four Ways the Data Scientist Has Evolved in the 21st Century

These four ways are:

1. Data science is more applied than ever. What can be built and fit over a real-life scenario has the dreadful requirement of mattering. Modeling for modeling sake is no longer a thing, and best-fit diagnostics are less important than best-fit for the situation. If a model goes unused, it serves no purpose. We can no longer tolerate or afford the luxury of building models purely for R&D purposes without consideration of utilization.

2. The skill of computer use seems to have taken over the knowledge of applied statistics. Understanding the interior workings of the black box has become less important, unless you are the creator of the black box. Fewer data scientists with truly deep knowledge of statistical methods are kept in the lab creating the black boxes that hopefully get integrated within tools. This is somewhat frustrating for long time data professionals with rigorous statistical background and understanding, but this path may be necessary to truly scale modeling efforts with the volume of data, business questions, and complexities we now must answer.

3. Data scientists are not weird anymore. We’re seen as strategic inputs to the decision-making process, and our craft is becoming much more understood. This trend is evidenced by C-level positions at large companies, vertical alignment and paths for data scientists, and inclusion at the highest levels, as well as the many academic programs and emphasis now available globally. This appreciation and positioning can sometimes make the field appealing for what seasoned data scientists might call the “wrong reasons” such as corporate fame and value. I would argue that we really want professionals in the field with a thirst for the truth – the science should be about empirically answering questions, and powered by truth-seekers at their heart.

4. Data Science is becoming more widely recognized as both art and science. Understanding the importance of the human – machine integration and complementary decision-making skills from each appears to have made its way more squarely into our field of understanding.

Statistical Significance, the Null Hypothesis and P-Values Defined & Explained in One Minute

And finally, some material that’s more than just hand-waving: a quick explanation of what the null hypothesis and p-values are, all done in a minute, courtesy of One Minute Economics:
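If you'd rather see the idea in code than in a video, here's a minimal sketch of a p-value calculation. It is not taken from the One Minute Economics video; the coin-flip scenario, the numbers (60 heads in 100 flips) and the function name are all my own assumptions, chosen only to illustrate the concept. The null hypothesis is "the coin is fair", and the p-value is the probability, under that hypothesis, of seeing a result at least as unlikely as the one observed.

```python
from math import comb


def two_sided_binomial_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the null hypothesis of a fair coin."""
    # Probability of every possible head count if the coin really is fair.
    probabilities = [comb(flips, k) * 0.5 ** flips for k in range(flips + 1)]
    observed = probabilities[heads]
    # Add up every outcome that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))


# Hypothetical experiment: 60 heads in 100 flips.
p_value = two_sided_binomial_p_value(heads=60, flips=100)
print(f"p-value: {p_value:.4f}")
print("Reject the null at the 5% level?", p_value < 0.05)
```

A small p-value means the data would be surprising if the null hypothesis were true; it is not the probability that the null hypothesis is true.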


Data science reading list for Monday, October 29, 2018: The worst data science article, 5 basic stats concepts you need to know, Bayes, democratization, and web scraping

A terrible “data skills” article that you should read, but only as a warning

I remember the hype that surrounded the web in the late 1990s. I also remember the copious amount of well-intentioned misinformation that made the rounds as writers attempted to capitalize on that hype. It’s now data science’s turn, if this bit of “advertorial” in Harvard Business Review — Prioritize Which Data Skills Your Company Needs with This 2×2 Matrix — is any indication.

Written by Chris Littlewood, chief innovation and product officer of filtered.com (I’m not going to help them by linking to their site), a company that purports to use AI to “lift productivity by making learning recommendations”, the article clearly highlights the author’s ignorance and HBR’s willingness to publish any article that has to do with data or data science. To the credit of the readers, a number of them registered with the site simply to be able to post comments pointing out how nonsensical the article was.

Treat this article as an object lesson in technology hype, as well as a sign that data science skills are seen as valuable.

The 5 Basic Statistics Concepts Data Scientists Need to Know

Forget that the article mentioned above said that mathematics and statistics aren’t useful data skills — you can’t do data science without them! You’ll need to understand these 5 concepts (in addition to others):

  1. Statistical features
  2. Probability distributions
  3. Dimensionality reduction
  4. Under- and oversampling
  5. Bayesian statistics

This article in Towards Data Science provides a brief overview.

Data Skeptic: Bayesian Updating

One of the better data science podcasts out there is Kyle Polich’s Data Skeptic, which has been around since 2014 and has over 400 episodes. The podcast features short mini-episodes explaining high level concepts in data science, and longer interview segments with researchers and practitioners.

I’ve just started working my way through this podcast, and have used the example in episode 5, Bayesian Updating, to explain Bayes’ Theorem to people who avoided studying probability and stats. Give it a listen, then check out the rest of the podcast episodes!
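For a feel of what "updating" means, here's a minimal sketch of Bayes' Theorem applied repeatedly. This is not the example from the podcast episode; the scenario, the probabilities and the function name are hypothetical, picked only to show how a belief shifts as evidence accumulates.

```python
def bayes_update(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(hypothesis | evidence) using Bayes' Theorem.

    prior               -- P(hypothesis) before seeing the evidence
    likelihood          -- P(evidence | hypothesis is true)
    false_positive_rate -- P(evidence | hypothesis is false)
    """
    # Total probability of seeing the evidence at all.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: start out 1% sure, then see three independent
# pieces of evidence, each 90% likely if the hypothesis is true and
# 10% likely if it is false.
belief = 0.01
for observation in range(1, 4):
    belief = bayes_update(belief, likelihood=0.9, false_positive_rate=0.1)
    print(f"After observation {observation}: {belief:.3f}")
```

Each pass uses the previous posterior as the new prior, which is the whole trick behind Bayesian updating.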

The Democratization of Data Science

Here’s a Harvard Business Review article on data science that’s actually worth reading:

Intelligent people find new uses for data science every day. Still, despite the explosion of interest in the data collected by just about every sector of American business — from financial companies and health care firms to management consultancies and the government — many organizations continue to relegate data-science knowledge to a small number of employees.

That’s a mistake — and in the long run, it’s unsustainable. Think of it this way: Very few companies expect only professional writers to know how to write. So why ask only professional data scientists to understand and analyze data, at least at a basic level?

Data Science Skills: Web scraping using python

Another article from Towards Data Science:

One of the first tasks that I was given in my job as a Data Scientist involved Web Scraping. This was a completely alien concept to me at the time, gathering data from websites using code, but is one of the most logical and easily accessible sources of data. After a few attempts, web scraping has become second nature to me and one of the many skills that I use almost daily.

In this tutorial I will go through a simple example of how to scrape a website to gather data on the top 100 companies in 2018 from Fast Track. Automating this process with a web scraper avoids manual data gathering, saves time and also allows you to have all the data on the companies in one structured file.
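The tutorial has its own walkthrough, so what follows is only a rough sketch of the general idea and not the article's code. It uses Python's standard library and parses an inline HTML snippet instead of a live page; the table contents, the `SAMPLE_HTML` name and the `TableParser` class are all made up for illustration. Fetching a real page would need an extra download step and a look at the site's terms of use.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded page; the companies and figures are invented.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Company</th><th>Growth</th></tr>
  <tr><td>1</td><td>Example Ltd</td><td>120%</td></tr>
  <tr><td>2</td><td>Sample plc</td><td>95%</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []  # start a new row
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")  # start a new, empty cell

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)

# Write the rows as CSV text; swapping io.StringIO for an open file saves it to disk.
output = io.StringIO()
csv.writer(output).writerows(parser.rows)
print(output.getvalue())
```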


“Accordion Guy” and “Global Nerdy” Stats for 2009

Hand with finger holding up a small stack of beans

“You can’t improve what you don’t measure” is a maxim for many fields. Engineers, businesspeople and athletes may all have their own way of phrasing it, but however it’s put, they repeat it to each other all the time.

The act of measurement becomes murkier when applied to creative endeavours such as blogging. The quantitative stuff – How many people read the blog? Which articles were the big ones? Is the readership trend going up or down? – is pretty easy. A little StatCounter code embedded in the pages of The Adventures of Accordion Guy in the 21st Century and Global Nerdy does the tedious stuff; I just look at the data and interpret it. As for the qualitative stuff, I’ll leave that as an exercise for the individual reader.

Accordion Guy’s Stats for 2009

Once again, The Adventures of Accordion Guy in the 21st Century passed the “2 million pageviews” mark. As of this writing, here’s how the numbers break down:

  • 2,198,906 pageviews – that is, the number of web pages from the Accordion Guy blog that were downloaded. Every time you visit www.joeydevilla.com or one of the individual article pages, or hit the “refresh” button on your browser while reading my blog, it registers as a pageview.
  • 105,599 returning visitors – when you visit Accordion Guy, the StatCounter code embedded on every page attempts to leave a “cookie” – a tiny scrap of data stored by your browser – for anonymized tracking. If the StatCounter code sees that your browser has already stored an Accordion Guy cookie, it means you’ve visited the site before. The cookie data includes the date and time of your last visit, and if it’s been more than an hour since you last visited the Accordion Guy blog, you’re counted as a “returning visitor”.
  • 1,672,393 first-time visitors – the opposite of a returning visitor is a “first-time visitor”. If the embedded StatCounter code can’t see an Accordion Guy cookie stored by your browser, you’re counted as one of these.
  • 1,777,992 unique visitors – this is a calculated value: “unique visitors” is simply the sum of returning and first-time visitors.
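The counting rules above can be sketched in a few lines. To be clear, this is not StatCounter's actual code, just my reading of the rules as described: the function name, the "same-session" label and the handling of gaps shorter than an hour are assumptions on my part.

```python
from datetime import datetime, timedelta
from typing import Optional

# From the description above: more than an hour since the last visit
# makes you a "returning visitor".
RETURN_THRESHOLD = timedelta(hours=1)


def classify_visit(last_visit: Optional[datetime], now: datetime) -> str:
    """Classify a pageview from the cookie's last-visit timestamp (None = no cookie)."""
    if last_visit is None:
        return "first-time"
    if now - last_visit > RETURN_THRESHOLD:
        return "returning"
    # Assumption: a shorter gap only adds a pageview, not a new visitor.
    return "same-session"


now = datetime(2009, 12, 31, 12, 0)
print(classify_visit(None, now))                      # no cookie found
print(classify_visit(now - timedelta(days=3), now))   # cookie is three days old
print(classify_visit(now - timedelta(minutes=5), now))  # cookie is five minutes old

# "Unique visitors" is simply the sum of the other two counts.
returning_visitors = 105_599
first_time_visitors = 1_672_393
unique_visitors = returning_visitors + first_time_visitors
print(f"{unique_visitors:,} unique visitors")
```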

Here’s an incredibly compressed chart showing the day-to-day activity on the Accordion Guy blog:

Day-to-day statistics for the "Accordion Guy" blog

The spikes in the graph represent the most popular articles. The rightmost spike, which also happens to be the tallest, represents the How Fanboys See Operating Systems article from December 16th. That one got featured on Reddit and re-tweeted like crazy.

Here’s how the numbers look for each quarter:

Quarterly statistics for the "Accordion Guy" blog

The trend is up-slightly down-up-slightly down, but still rising overall.

Global Nerdy’s Stats for 2009

Accordion Guy is my “hobby” blog. It’s the forum in which I express myself, tell stories and jokes, share pictures I’ve taken and point to interesting things I’ve found on the ‘net. I write it “just for kicks”, and the moment I stop enjoying writing it, I’ll stop.

Global Nerdy is a different beast. It is my second personal blog devoted to programming, internet technology and the nerd lifestyle, my first being The Happiest Geek on Earth (which Cory Doctorow called me in this Boing Boing article, which points to The Accidental Go-Go Dancer, in which I chronicled my brief stint as an accordion-playing go-go dancer at a downtown Toronto nightclub). Global Nerdy is both:

  • An exercise to make me a better programmer and tech advocate through writing about the field, and doing the necessary legwork and research to support that writing, as well as
  • Self-promotion. Yes, it’s also a mercenary playing-to-win, look-at-me, hire-me, separate-myself-from-the-crowd, I-am-ten-Scobles blog.

I can say with certainty that Global Nerdy has helped me land my last three jobs, including my current one as a Developer Evangelist with Microsoft Canada, a job I landed in the middle of the econopocalypse of 2008 after getting laid off. In spite of all the job market doom and gloom, I was unemployed a mere three weeks.

This year, Global Nerdy crossed the “1 million pageviews” mark for the first time. Here’s how the numbers break down (for an explanation of the terms, see the Accordion Guy review above):

  • 1,608,638 pageviews
  • 60,340 returning visitors
  • 1,263,873 first-time visitors
  • 1,324,213 unique visitors

Here’s the chart showing the day-to-day activity on Global Nerdy:

Day-to-day statistics for the "Global Nerdy" blog 

The spikiest period is in late January, which represents the buzz around the Winning the Gnu article, in which I won Richard Stallman’s auction for a plush version of the Free Software Foundation’s mascot, the gnu.

Here’s how the quarterly numbers break down:

Quarterly statistics for the "Global Nerdy" blog

Eek – a downward trend!

If viewed in isolation, this would be a worrying development. However, there’s another blog that’s been getting the readers that would normally go to Global Nerdy, and I’ve included a screenshot of that blog below:

Screenshot of the "Canadian Developer Connection" blog

Canadian Developer Connection is Microsoft Canada’s developer blog, and it literally pays the rent. As a Developer Evangelist for Microsoft, I’m paid to write it, and my performance – and yes, my bonus – is judged on the number of articles I write for it and the impact those articles have.

Furthermore, I’m trying to be Microsoft Canada’s most prolific, most-read and most influential blogger. After that, I’m aiming for Microsoft worldwide. I think my closest competition is my friend and coworker (and the guy who recommended me for the job), David Crow. Here’s how we stack up, blog-wise, according to Alexa:

Alexa stats for "Accordion Guy", "Global Nerdy" and David Crow's blog

In your face, Drinky Crow!

(I’ll admit, he’s got an edge on me in Twitter followers – I have 4,498, he has 4,719 – and we each have our own spheres of influence. And hey, he’s the man behind DemoCamp – I just help out.)

As a result, I’ve been doing two things:

  • I’ve been writing Global Nerdy articles and cross-posting them to Canadian Developer Connection.
  • I use Twitter to promote those articles, but I link to the Canadian Developer Connection one first, and the Global Nerdy one second.

I still think of Global Nerdy as my primary tech blog; I’m just nice (and pragmatic) enough to share my material with Microsoft. Should the day come when Microsoft and I part ways – I can’t see such a day on the horizon, but the era of the lifelong “company man” has passed – I’ll still have it. There’s also the fact that sometimes, there’s stuff I’ll post here that I won’t post in Canadian Developer Connection, such as when I’m speaking for myself and not on behalf of Microsoft Corporation.

The Blogs Over the Years

Accordion Guy is a long-running blog – not the longest-running by a long shot, but pretty long-lived, having had its start in November 2001. I’ve been measuring it with StatCounter since 2005, and here’s how it’s been doing since then:

Yearly statistics for the "Accordion Guy" blog, 2005-2009

There was a slight dip from the 2008 to 2009 numbers, and the cure is simple: write more, write better.

Global Nerdy is a newer blog – my friend George Scriban and I started it as a career-booster in mid-2006. George no longer writes for Global Nerdy, what with his being very busy with stuff at Microsoft’s main HQ in Redmond, and my job is a little more in-your-face than his. Global Nerdy’s maintained an upward trend, with a big shot in the arm from my joining Microsoft in late 2008:

Yearly statistics for the "Global Nerdy" blog, 2005-2009

Again, the mantra for Global Nerdy in 2010 is simple: write more, write better!

To those of you who read either of my blogs – thanks for the great year, and expect great things in the new decade!

This article also appears in The Adventures of Accordion Guy in the 21st Century.


One Million Pageviews!

This may not really be of interest to anyone but me and StatCounter, but earlier today Global Nerdy hit the one million pageview mark for 2009. I’d like to thank all you readers who keep coming back for more; I promise I’ll make it worth your while!

(And if you happen to run web ads, feel free to drop me a line. I have a readership!)

Here’s a screen capture of my StatCounter page for Global Nerdy:

Screencap: StatCounter stats page showing Global Nerdy's 1 million pages for 2009.


Global Nerdy’s 2008 Stats

History Lesson

Global Nerdy is my third tech blog.

Joey deVilla and accordion, go-go dancing on a bar.

My first was The Happiest Geek on Earth (don’t bother looking; it’s been offline for years now). I started it back in 2002 when my non-tech readers started to doze off after reading tech articles I posted on The Adventures of Accordion Guy in the 21st Century. It took its name from an article on Boing Boing that Cory Doctorow wrote about me when he heard that I’d taken up part-time work as an accordion-playing go-go dancer at a popular downtown Toronto bar.

In September 2003, a couple of months after Tucows took me on as their tech evangelist, I started The Farm (again, no longer in operation), which was pretty much The Happiest Geek on Earth run under the Tucows banner. While I did cover stuff directly related to Tucows, there’s only so much you can blog about Tucows’ core business of domain name registration, hosted email and managed DNS. Luckily for me, they didn’t mind that I blogged about all sorts of things of interest to developers and techies, and I like to think that I helped shift the perception of Tucows being “oh yeah, the shareware company”.

In mid-2006, I was chatting with George Scriban, my old pal from Crazy Go Nuts University. Somehow we got to talking about the tech blogosphere and came to the conclusion that yes, the web needed yet another tech blog. We’d both pitch in: George would cover things from his biz-dev and product-dev point of view, while I’d blog from the developer and goofball angles. We couldn’t think up a name, so I used a little program I’d been working on – The Duke of URL – to access Tucows’ “namespinner” service to come up with available domain names given some keywords. The keyword “nerd” resulted in a lot of junk names, but one stood out: globalnerdy.com.

“Global Nerdy. That doesn’t sound bad,” I said.

“Actually, it sounds pretty good,” replied George.

“Even has a bit of an Engrish feel to it,” I added.

And now you know where the blog’s name came from.

Thanks

George’s made some very valuable contributions to this blog. Those insightful entries about the tech business in the archives in 2006? Those are his. He has a much better grasp of that stuff than I ever will, and he’s an astute observer and a great writer. The reason this blog registers on Techmeme at all? That’s also George’s doing. The Economist are fools for not snapping him up. He hasn’t contributed in a while, but that’s because the demands of both Microsoft (where he’s a senior product manager) and family life (a lovely wife and two handsome sons) have kept him pretty busy. While George’s presence on this blog is missed, he seems to be always present. Since 1987, we’ve somehow managed to end up working at the same place, whether it’s writing articles for the same paper, working at the same pub, joining Cory Doctorow’s startup or, now, working for The Empire.

That means that you Global Nerdy readers are stuck with me. I’d like to start by thanking you, the readers, for your continued readership, comments, support and kind words.

I’d like to extend special thanks to my hosting company, Pressharbor, for doing an excellent job – the blog’s been Dugg, Reddited, Slashdotted, Boing Boinged, Hacker Newsed and Techmemed, and not once has it shown any sign of bogging down or just giving up and 500ing. If you’re looking for some rock-solid WordPress hosting, Pressharbor are the people to see.

And Now, the Numbers

Here’s what StatCounter has to say about Global Nerdy’s readership since the beginning. As with all web stats packages, you have to take these numbers with a grain of salt:

StatCounter chart showing Global Nerdy pageviews for 2006 - 2008 -- 2006: 24,742, 2007: 289,864, 2008: 702,913.

As the graph shows, 2008 was Global Nerdy’s best year, with 702,913 pageloads and 562,022 uniques, which is more than double 2007’s numbers.

Here’s how the 2008 numbers break down by quarter:

StatCounter chart showing Global Nerdy pageviews for 2008 -- Q1: 78,544, Q2: 130,183, Q3: 197,194, Q4: 296,992.

It’s a steady improvement, with Q4’s pageloads nearly four times that of Q1. A fair bit of it comes on the heels of a Stack Overflow podcast in which Jeff “Coding Horror” Atwood said some very nice things about Global Nerdy. (Thanks, Jeff, and remember: I have a Microsoft expense account now! The steak’s on me when next we meet!)
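If you want to check the arithmetic behind those claims, here's a quick sketch using only the figures from the two StatCounter charts above. The variable names are mine; nothing here comes from StatCounter itself.

```python
# Pageload figures as given in the charts above.
quarterly_pageloads = {"Q1": 78_544, "Q2": 130_183, "Q3": 197_194, "Q4": 296_992}
yearly_pageloads = {2006: 24_742, 2007: 289_864, 2008: 702_913}

# The four quarters should add up to the 2008 yearly figure.
total_2008 = sum(quarterly_pageloads.values())

# How many times bigger was Q4 than Q1, and 2008 than 2007?
q4_to_q1 = quarterly_pageloads["Q4"] / quarterly_pageloads["Q1"]
growth_2008 = yearly_pageloads[2008] / yearly_pageloads[2007]

print(f"2008 total from quarters: {total_2008:,}")
print(f"Q4 vs. Q1: {q4_to_q1:.2f}x")
print(f"2008 vs. 2007: {growth_2008:.2f}x")
```

The quarters do sum to the yearly total, the Q4-to-Q1 ratio lands a bit under four, and 2008 comes in at more than double 2007.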

Beyond the Numbers and into 2009

While the numbers are a good indicator of whether I’m writing stuff that readers are finding interesting, I’m really looking to improve my qualitative performance. By that, I mean write better articles and get into some interesting things, which I’ll cover very soon in a “What’s Up in 2009?” article.