How to Set Up Your Machine Learning Team for Success
Director of Data Science at Featurespace
“Machine learning” — one of the hottest buzzwords sweeping the C-suite — isn’t new; however, its latest iteration is an extension of a relatively recent trend wherein data- and analytics-driven organizations used machine learning as a competitive advantage.
In a 2006 article for the Harvard Business Review, Thomas Davenport wrote, “At a time when firms in many industries offer similar products and use comparable technologies, business processes are among the last remaining points of differentiation. And analytics competitors wring every last drop of value from those processes.” That was 13 years ago, and since then, machine learning has matured.
The latest trends it is driving — especially within the financial services industry — are:
- Deeper, more complex models
- Application to a broadening set of data types (such as unstructured text, images, video)
- An ever-accelerating prediction velocity, with real-time predictions becoming standard
Around the world, business leaders are under intense pressure to establish machine learning and data science teams to accelerate processes, increase performance and quickly provide value. So, how should you tackle this challenge?
Start with the Basics
Fundamental to the process is fully understanding your existing business processes, the problem you are trying to solve and what data you have to work with. This information allows you to formulate the best approach, which must then be measured against the resources available to you.
Models are fairly simple to set up; the challenging part is executing them in real time and monitoring them in production.
Manage Expectations and Deliverables
Be aware that, along with adding complexity and cost, haphazardly implementing machine learning can also increase risk. For example, if you are seeking to better manage payment flows, start with batch testing.
It is much easier to catch and prevent negative consequences in testing than in a live scenario. And this illustrates only the technical risk; machine learning technology typically touches many areas of the organization, and coordination across all stakeholders can be complex, slowing the project down.
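As a rough illustration of what batch testing can look like, the sketch below replays historical payments through a candidate model offline and summarizes how it would have behaved. Everything here is an assumption made for the example: the `Payment` record, the `should_flag` label, the `score` function (a stand-in for a real trained model) and the threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    amount: float
    should_flag: bool  # known outcome from historical records (hypothetical label)


def score(payment: Payment) -> float:
    """Hypothetical stand-in for a trained model: a risk score in [0, 1]."""
    return min(payment.amount / 10_000.0, 1.0)


def batch_test(history, threshold):
    """Replay historical payments offline and summarize the model's decisions."""
    tp = fp = fn = 0
    for payment in history:
        flagged = score(payment) >= threshold
        if flagged and payment.should_flag:
            tp += 1  # correctly flagged
        elif flagged and not payment.should_flag:
            fp += 1  # flagged unnecessarily
        elif not flagged and payment.should_flag:
            fn += 1  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"flagged": tp + fp, "precision": precision, "recall": recall}


# Illustrative data only, not real payments.
history = [
    Payment(120.0, False),
    Payment(9500.0, True),
    Payment(8700.0, False),
    Payment(40.0, False),
    Payment(7600.0, True),
]
report = batch_test(history, threshold=0.75)
print(report)
```

Because nothing in this loop touches a live payment, a poor threshold or a faulty model shows up as a bad number in a report rather than as a blocked customer.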
Pick your battles carefully, and build a simple, yet reliable machine learning solution that consistently demonstrates business value. From there, it will be easier to get internal buy-in for more ambitious plans that target larger problems.
Prioritize the Infrastructure
Successful machine learning efforts depend critically on large amounts of quality data. According to The Financial Times, the lack of data and/or data quality is one of the biggest factors in data science and machine learning talent dissatisfaction.
It isn’t necessary to design an entire data lake with the fanciest distribution of Hadoop before you start. If hosting data in the cloud is an option, then the cloud infrastructure providers have easy mechanisms to host and query large amounts of data that can be held in compliance with most companies’ information security policies.
Amazon Web Services (Redshift), Google Cloud Platform (BigQuery) and Microsoft Azure (SQL Data Warehouse) all have compelling services that are great starting points for data collection and warehousing. If the cloud is not an option, there are efficient open-source tools, like ClickHouse, that can be deployed on local infrastructure at the cost of slightly higher operational overhead.
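Whichever warehouse you choose, a quick data-quality check is worth running before any modeling starts. The sketch below is one minimal way to do it, assuming the data can be pulled as a list of dictionaries; the field names and sample records are hypothetical.

```python
def missing_rate(records, fields):
    """Fraction of records in which each field is absent, None or an empty string."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for record in records if record.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


# Illustrative records only; field names are assumptions for the example.
records = [
    {"customer_id": "c1", "amount": 10.0, "country": "GB"},
    {"customer_id": "c2", "amount": None, "country": "US"},
    {"customer_id": "c3", "amount": 5.5, "country": ""},
    {"customer_id": "c4", "amount": 7.0},  # country missing entirely
]
rates = missing_rate(records, ["customer_id", "amount", "country"])
print(rates)
```

A report like this gives the team an early, concrete view of which fields are usable and which need attention from the data owners.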
Building the Team
Organizations often focus a great deal of attention on the data scientists because they’re the ones building the machine learning model.
However, keep in mind that there are many other key players needed to ensure success, including:
- Domain experts: To bring an understanding of the data and the business into the development team. Usually this person also has the relationships and political capital necessary to obtain consent to get data and use it for analytic purposes.
- Data architects: To know what data to get from where, what the specifications are, and how to check and ensure data quality.
- Data engineers/developers: To ensure that you get the data reliably and consistently, and to identify and remedy issues once in production.
- Data scientists: To develop the model, while understanding the business challenge and the data.
- Project managers: To coordinate the project across business functions, such as IT.
- Data analysts: To create dashboards, monitor model quality, identify any areas of model weakness and identify/fix any data issues once your model is in production.
- Model governance: To implement a robust governance structure through external validation.
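To make the monitoring role more concrete, here is a minimal sketch of one check a data analyst might run once a model is in production: accuracy broken down by segment, to surface areas of model weakness. The segment names, the sample rows and the 0.8 cutoff are assumptions for the example, and accuracy is only one of many possible quality measures.

```python
from collections import defaultdict


def accuracy_by_segment(rows):
    """rows: iterable of (segment, predicted, actual). Returns {segment: accuracy}."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for segment, predicted, actual in rows:
        counts[segment] += 1
        hits[segment] += int(predicted == actual)
    return {segment: hits[segment] / counts[segment] for segment in counts}


def weak_segments(rows, min_accuracy):
    """Segments whose accuracy falls below the chosen cutoff, sorted by name."""
    scores = accuracy_by_segment(rows)
    return sorted(s for s, acc in scores.items() if acc < min_accuracy)


# Illustrative rows only: (segment, model prediction, actual outcome).
rows = [
    ("card", True, True), ("card", False, False),
    ("card", True, True), ("card", False, False),
    ("wire", True, False), ("wire", False, True),
    ("wire", True, True), ("wire", False, False),
]
weak = weak_segments(rows, min_accuracy=0.8)
print(weak)
```

A single overall accuracy figure would hide the gap between the two segments here, which is why a per-segment view is a common starting point for dashboards.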
If the workload you’ll be placing on your models will be relatively small, hosting your model on a cloud prediction service (like Azure Machine Learning, Amazon SageMaker or Google Cloud’s Machine Learning Engine) is a good choice.
In that case, the software engineering is handled by the provider, such as Amazon Web Services. Conversely, if the model will be processing a large number of events (several million a day), then having machine learning engineers implement the prediction infrastructure in-house would be more beneficial.
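When sizing the workload, it helps to translate a daily volume into a per-second rate. The arithmetic below is a back-of-the-envelope sketch; the 3 million figure is just one reading of "several million a day", and the `peak_factor` multiplier is an assumption to be replaced with a number taken from your own traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day, peak_factor=1.0):
    """Average events per second, optionally scaled by an assumed peak multiplier."""
    return events_per_day / SECONDS_PER_DAY * peak_factor


average_rate = events_per_second(3_000_000)
peak_rate = events_per_second(3_000_000, peak_factor=5.0)
print(f"average: {average_rate:.1f}/s, assumed peak: {peak_rate:.1f}/s")
```

The average looks modest, but real-time systems have to be provisioned for the busiest moments, which is where in-house engineering effort tends to concentrate.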
Maximizing Machine Learning with Real Time
The move to real-time machine learning comes with an important consideration: the impact on risk and cost. To achieve real time, software developers must set up specialized components, data engineers must connect the system to the data flow and additional data scientists will be needed to report on and keep a pulse on the performance of the models in place.
Establishing these core competencies provides you with a team of cross-functional experts who will quickly produce business value, set the stage to attack larger inefficiencies more effectively down the road and deliver a competitive advantage for the entire organization.