Looking to hire a data scientist but don’t know how your organization’s data is collected? You might want to read this.
Of all the disciplines examined in BrainStation’s 2019 Digital Skills Survey, data science may encompass the widest range of applications. But although data science has existed for decades, it has only recently come into full bloom. “As the availability of data has expanded, companies have realized how important data science can be,” says Briana Brownell, Founder and CEO of Pure Strategy, and a Data Scientist for 13 years. “Every company now needs to have a partial focus on technology.” Just this week, for instance, McDonald’s paid an estimated $300 million USD to acquire its own big data firm.
It’s no wonder, then, that competition for Data Scientists is incredibly high. In just two years, demand is expected to grow by 28 percent, equivalent to about 2.7 million new jobs. That’s more openings than new graduates will be able to fill, meaning tech workers in other fields will have to brush up on their skills and transition into data to meet this demand.
In fact, our survey suggests this is already happening. Roughly four out of five data pros began their career doing something else, and 65 percent of all Data Scientists have been working in the field for five years or less. This huge influx of new minds cuts both ways, according to Brownell. On one hand, “there are a lot of new ideas coming in,” she says. “When I look at some of the content coming out of the data science community, I’m surprised how much innovation there is.” The flip side, though, is a tendency to reinvent the wheel.
Recruiting and Reskilling
High demand for Data Scientists is great if you are one (or thinking about becoming one), but for employers, recruitment can be a daunting challenge. Here, reskilling is an obvious solution; it may be more cost-effective to retrain a current employee in data science than to headhunt a new one.
But even if you plan to hire a new data science team, your organization as a whole may need to brush up on its data literacy, Brownell cautions. “Everybody wants to work on something that has an impact on their workplace, that makes people’s lives better,” she says. “If your company culture isn’t such that [your Data Scientists] can make an impact, it’s almost impossible to hire.” Leadership must be able not only to tell potential hires how they will contribute, but also to understand the proposals their data science team eventually puts forward.
Unfortunately, Brownell says, “the uncomfortable majority are the companies that haven’t figured things out.” Our survey backs this up: most respondents (52 percent) described the level of data literacy across their organizations as “basic,” with “intermediate” the next most common response (31 percent). This suggests that some foundational data science training could be useful for a large majority of companies—especially in leadership.
This need for improved data literacy—and communication—is heightened by the way most data science teams are structured: as a discrete team, usually with 10 people or fewer (according to 71 percent of respondents), and often five or fewer (38 percent). These close-knit teams can’t afford to be isolated. “Individuals who work within larger companies are usually within a small data science–specific group, and their clients are internal—other parts of the organization,” Brownell explains, “so it’s a team that has to operate across many different areas of the organization.”
So What is Data Science?
The common perception (that Data Scientists crunch numbers) is not too far off the mark, Brownell says. “There are a lot of datasets that need to have insights revealed from them, and that involves a lot of steps like model building and data cleanup, and even just deciding what data you need.” Ultimately, though, this effort is goal-oriented: “At its core, you need to do something with the data.”
For that matter, data isn’t always numbers. While a majority of respondents (73 percent) indicated they work with numerical data, 61 percent said they also work with text, 44 percent with structured data, 13 percent with images and 12 percent with graphics (and small minorities even work with video and audio—6 percent and 4 percent respectively). These survey results hint at the ways data science is expanding far beyond financial tables, enlisting people for such projects as maximizing customer satisfaction or gleaning valuable insights from the fire hose of social media.
As a result, “there’s enormous variety within the data science field,” Brownell says. “Every industry has its own take on what types of data the Data Scientists work on, the types of outcomes they’re expecting, and how that fits into their company’s leadership structure.”
In every case, though, the goal is to leverage data to help a company make better decisions. “That could be making products better, understanding the market that they want to go into, retaining more customers, understanding their labor force usage, understanding how to make good hires—all kinds of different things.”
The Nuts and Bolts of a Data Science Job
In some areas of tech, becoming a generalist can be your best foot in the door—not so with data science. Employers typically look for skills specialized to their industry. Because data science comes in so many different flavors, our survey probed deeper, examining five main job categories: data analyst, researcher, business analyst, data and analytics manager, and data scientist proper.
Across all these job titles, “data wrangling and cleanup” takes up the bulk of one’s time, but to what end? Most often, the goal is to optimize an existing platform, product or system (45 percent), or to develop new ones (42 percent). Digging deeper, we found that “optimizing existing solutions” tends to fall to Business Analysts and Data Analysts, while “developing new solutions” more often falls to Data Scientists and Researchers.
The techniques Data Scientists use vary across specializations, too. Linear regression was a common tool across all categories, cited by 54 percent of respondents, but there were a few surprises when we looked at the software people are using.
Excel—that workhorse of dataset manipulation—is virtually ubiquitous, cited by 81 percent of all respondents, and the most popular tool in every category except Data Scientists proper (who most frequently rely on Python—and also cited a larger toolkit than other categories). What makes Excel so inescapable, even in 2019?
“The thing that I love about Excel is how it allows you to see the data and get an intuitive feel for it,” Brownell explained. “We also use a lot of Python, and in that case, when you’re doing analytics on a data file, it’s hidden; unless you specifically program part of your code to do some visualization of the raw data that you’re analyzing, you don’t see it. Whereas with Excel, it’s right in front of you. That has a lot of advantages. Sometimes you can spot issues with the data file. I don’t see Excel disappearing from analysis ever.”
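Brownell’s point about Python hiding the data is easy to see in practice: nothing appears on screen unless you ask for it. The short sketch below shows one way to get a quick look at a file, using only Python’s standard library. The sample data and the function names (`load_rows`, `preview`, `count_blanks`) are made up for illustration and do not come from the survey or from Brownell.

```python
import csv
import io

# Hypothetical sample data standing in for a real file.
# Two cells are left blank on purpose.
SAMPLE_CSV = """customer_id,region,monthly_spend
1001,West,42.50
1002,East,
1003,West,38.00
1004,,51.25
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def preview(rows, n=3):
    """Return the header plus the first n rows as aligned text lines."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    shown = rows[:n]
    # Each column is as wide as its longest visible value (or its name).
    widths = {c: max([len(c)] + [len(r[c]) for r in shown]) for c in columns}
    lines = ["  ".join(c.ljust(widths[c]) for c in columns)]
    for r in shown:
        lines.append("  ".join(r[c].ljust(widths[c]) for c in columns))
    return lines


def count_blanks(rows):
    """Count empty cells per column, a quick check for gaps in the file."""
    if not rows:
        return {}
    return {c: sum(1 for r in rows if not r[c].strip()) for c in rows[0]}


rows = load_rows(SAMPLE_CSV)
for line in preview(rows):
    print(line)
print(count_blanks(rows))
```

In a spreadsheet the two blank cells would be visible at a glance. In code, they only surface because the script explicitly prints a preview and counts the gaps.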
That said, there’s still a long list of other programs used in the field—unsurprising given its diversity. SQL (43 percent) and Python (26 percent) lead in popularity, with Tableau (23 percent), R (16 percent), Jupyter Notebooks (14 percent), and a handful of others clocking up significant numbers—not to mention the whopping 32 percent of respondents who cited “other” tools, even given this already long list.
What is the Future of Data Science?
Finally, we asked what trends will shape the digital landscape over the next five to 10 years. Machine learning and AI, both of which have applications within data science, were overwhelmingly the developments respondents expect to have the biggest impact, at 80 percent and 79 percent respectively. This is despite the fact that fewer than a quarter of respondents (23 percent) currently work with AI.
Artificial intelligence can “absolutely” transform data science, confirms Brownell, whose company develops AI products. “That’s really the glory of unsupervised learning methods. We only have so much time to look at these datasets, and especially with large ones, it’s very difficult to do everything. AI tools can help reveal something that maybe you wouldn’t have thought to look for. We’ve definitely had that happen.”
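For readers unfamiliar with unsupervised learning, here is a minimal sketch of one classic method, k-means clustering, which groups data points without being told in advance what the groups are. It is a generic textbook illustration, not the approach Brownell’s company uses. The numbers are hypothetical monthly order counts, and the function name `kmeans_1d` is invented for this example.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters with a basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [
        ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)
    ]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # Centers stopped moving, so the grouping is stable.
        centers = new_centers
    return centers, clusters


# Hypothetical monthly order counts for ten customers.
orders = [2, 3, 4, 3, 21, 19, 22, 48, 52, 50]
centers, clusters = kmeans_1d(orders, k=3)
for center, members in zip(centers, clusters):
    print(round(center, 1), sorted(members))
```

Nobody told the code that there are occasional, regular and heavy buyers. It only received the numbers and the instruction to find three groups, which is the kind of pattern-surfacing Brownell describes.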
Other trends Data Scientists expect to dominate in the near future include the Internet of Things (51 percent), blockchain (50 percent), eCommerce (36 percent), augmented reality and virtual reality (38 percent and 27 percent), and even voice-based experiences (25 percent). These are all significant showings, and all areas where data science can be put to good use.