Hi Josh, tell us about your journey and how you arrived at Domino Data Lab.
I’ve been in the analytics space for about 18 years and have loved every step along the way. We humans are naturally prone to making decisions based on a limited understanding. Information, when properly communicated, frees us from our biases and ignorance. It is so exciting when that happens. That’s why I love this work.
As to my journey, I earned my undergraduate degree in Mathematics from UC Irvine, worked as an Analytical Consultant for a bit, and then earned a Master’s in Applied Statistics from Cornell University. Since then, I’ve designed and implemented Data Science solutions across several domains, including Manufacturing, Public Safety, and Retail. I’ve also managed Data Science strategy and teams for several analytical software companies.
When Domino approached me about coming on board, I was blown away by the Product and Domino’s thought leadership. Domino fought for the very things I knew our discipline needed, namely, to stop being a back-office experiment and start being a reliable, regular part of the enterprise, and to stop researching in a one-at-a-time skunkworks manner and start acting like a true research lab with collaboration, reproducibility, and proper management of knowledge-based assets. I wanted to be a part of that journey.
How big is your Data Science team? What strategies do you adopt to meet your data goals?
Domino has about a dozen employees with a solid Data Science background acting in a variety of roles. As the first company to begin defining the Data Science space seven years ago, we’ve had to blaze our own trails, which means we have to be agile.
Our data scientists speak at industry conferences on the latest innovations, train clients on new open-source techniques, help clients architect systems to support collaborative data science research, design product innovations to solve current and nascent problems for our industry, help customers succeed in their data science projects, analyze product usage metrics to enable smarter internal decision making, and much more.
Agility, collaboration, capturing of insights and failures, and rapid prototyping are just a few of the strategies that we emphasize across the variety of data science roles at Domino.
Data Science is an enterprising career. Yet, it’s the toughest. What unique challenges does one have to deal with to become a Data Scientist, and then lead Data Strategies for an enterprise?
Yes, there are challenges to becoming a Data Scientist and then leading data strategies for an enterprise, but I would argue that those challenges are not unique to Data Science. They are common to many technical professions.
The traits, skills, and work habits needed to excel as a data scientist are sometimes quite different from those needed to succeed at leading data strategies for an enterprise. Learning how to balance hands-on technical work with strategic and softer leadership is not easy. It helps to have a mentor. It helps to study. Most of all, it helps to practice and learn from your successes and failures.
We know about Data Science applications. Could you tell us more about the business side of Data Science, for example turning AI into a service, or training Big Data Teams, etc.?
As data science gets a seat at the executive table, more and more accountability is coming to our discipline. It is not uncommon for data science organizations to have revenue targets or at least some means of justifying their contribution to the company. This transition requires increased attention by data science managers and analytical executives on the fundamentals of project management and business decisioning.
The concepts will be mostly the same as you would find in other applications, but data science does put its own spin on things. Deciding what to work on, searching for prior art, defining the customer, acquiring success criteria, prototyping the final solution, providing an efficient process for building the product, capturing knowledge (successes, failures, and insights), navigating risk (including ethics and trust), deploying solutions, monitoring deployed assets, and calculating ROI are just a few of the essentials for ensuring business success.
Do you think Data Science/Predictive Intelligence could have done better in:
1. Predicting the origin of viruses, such as SARS in 2019 and Corona in 2020
2. Preventing the spread of COVID-19 from China to global hotspots
I don’t. I think it is unrealistic to expect AI and ML to solve all the world’s problems. In my opinion, there will be plenty of situations where AI and ML are not the right fit. Knowing when and how to spend your resources is just as important as knowing how to build the best data science product.
I think that analytics, in general, can help us understand the impact and spread of COVID-19 in a post-event manner. It can forecast the effect COVID-19 will have, which helps us determine how we should respond. I do not think it could have prevented the origin or the spread of the disease.
Data Science and AI Engineering teams work extensively with technologies. What kind of infrastructure does one need to scale the results from a start-up to a full-blown customer-centric company? What timeline should one focus on?
This is one specific problem that data science platforms are seeking to solve. Last year, Domino paused work on most of our product enhancements to completely rebuild the platform, from the ground up, on Kubernetes. Domino was the first data science platform to use Docker containers. We saw the potential of Docker to transform the way data science was done. We saw the ancillary benefits of Docker-based work for collaboration, environment management, knowledge asset management, and deployment of artifacts.
We see a similar transformation with Kubernetes as it takes the scale, security, and orchestration of containers to true enterprise levels. As data science work evolves from the back office to the main stage, it will need to be regulated and governed under IT controls. This is mandatory for the security and coherence of analytical work across the enterprise.
Domino facilitates this while allowing data scientists to use the tools they know without having to be DevOps experts. Companies not working on improving the speed and efficiency of their data science process are missing the boat. That is what is going to separate analytical winners and losers over the next 10 years. Timelines for data science projects have to come down from months to days. And they are.
What is the future of AI, ML Automation, and Robotics in Data Engineering?
Deep learning is, of course, transforming the industry. Most of the research coming out of academia will continue to be focused on deep neural networks. Applications of this technology will continue to improve. Chatbots will achieve near human-level abilities. Computer Vision has already surpassed human-level abilities and will continue to evolve.
At the same time, there will always be a need for more traditional ML and statistical modeling. Some business and research questions do not lend themselves to advanced AI methods. One thing I hope the future holds is less marketing hype about AI and more of an understanding of the differences between, and benefits of, various forms of analytics.