AI is transforming all business functions, and software development is no exception. Not only can machine learning techniques be used to accelerate the traditional software development lifecycle (SDLC), they present a completely new paradigm for inventing technology.
Traditionally, developing a computer program requires you to specify in advance exactly what you want the system to do and then hand-engineer all of the features of your technology. Encoding many tasks explicitly is possible, as computers were already quite powerful before the advent of AI.
There are many tasks and decisions, however, that are far too complex to teach to computers in a rigid, rule-based way. Even an activity as seemingly simple as identifying whether a photo or video on the internet is of a cat is beyond the reach of traditional software development. Given the vast possible permutations that cat photos can take, no team of engineers can possibly enumerate all of the rules that would reliably recognize cats vs. all of the other possible objects that can appear in media.
Machine Learning Fundamentally Changes The Software Development Paradigm
Enter AI techniques such as machine learning and deep learning. In these approaches, an engineer does not give the computer rules for how to make decisions and take actions. Instead, she curates and prepares domain-specific data, which is fed into learning algorithms that are iteratively trained and continuously improved. A machine learning model can infer from data which features and patterns are important, without a human explicitly encoding this knowledge. The outputs of ML models can even surprise humans and highlight perspectives or details we haven’t thought of ourselves.
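To make the contrast concrete, here is a minimal Python sketch, not taken from any real system. The first function hard-codes a decision rule the way a traditional program would; the second picks the cutoff from labeled examples instead. The feature values, labels, function names, and the hand-chosen cutoff of 10 are all hypothetical and exist only for illustration.

```python
# Traditional approach: a human chooses the rule and its cutoff by hand.
def classify_by_rule(value):
    return value > 10  # hypothetical, hand-picked cutoff

# Learned approach: pick the cutoff that makes the fewest mistakes
# on labeled examples, so the "rule" comes from the data.
def learn_threshold(examples):
    """Return the cutoff t that minimises errors for the rule `value > t`."""
    candidates = sorted({value for value, _ in examples})
    best_t, best_errors = None, len(examples) + 1
    for t in candidates:
        errors = sum((value > t) != label for value, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# Made-up (feature value, label) pairs.
examples = [(1, False), (2, False), (3, False),
            (6, True), (7, True), (9, True)]
learned = learn_threshold(examples)
print(learned)  # prints 3
```

On this toy data the hand-written rule would misclassify every positive example, while the learned cutoff separates the two groups. Real models learn far richer decision boundaries, but the division of labor is the same: the engineer supplies data, and the algorithm supplies the rule.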
Thus, the most profound impact of AI on computer programming is the unraveling of how humans perceive, define, and execute software development. Author, scientist, and Google research engineer Pete Warden is confident that “there will be a long ramp-up as knowledge diffuses through the developer community, but in ten years I predict most software jobs won’t involve programming.”
Andrej Karpathy, a former research scientist at OpenAI who now serves as Director of AI at Tesla, agrees, illustrating a future where “a large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.” Karpathy describes the sea change with a highly quotable insight: “Neural networks are not just another classifier, they represent the beginning of a fundamental shift in how we write software. They are Software 2.0.”
He describes the “classical stack” of Software 1.0 as explicit instructions to the computer, written by a programmer using languages such as Python or C++. A traditional software development lifecycle typically starts with requirements definition (i.e. a technical spec), then moves to design and development. Once viable prototypes are built, there’s QA testing. Finally, once a product passes muster, it is deployed to production and must be continuously maintained. Agile processes can make this cycle go faster, since engineers choose a smaller feature set to focus on for two- to four-week sprints rather than attempt to build an entire piece of software in one go. The process, whether agile or waterfall, is essentially the same, however.
Over time, these systems have become incredibly complex, requiring multiple dependencies and integrations as well as layers upon layers of functionality and interfaces. All of these components must be manually managed and updated by humans, leading to inconsistencies and unresolvable bugs.
By contrast, machine learning models extract important features and patterns from data. In Karpathy’s words, Software 2.0 is code written in the form of “neural network weights”, produced not by humans but by machine learning methods such as backpropagation and stochastic gradient descent. Updating a model entails retraining it with new data, which changes how the model behaves and performs.
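As a rough illustration of weights being "written" by an optimizer rather than a programmer, the Python sketch below fits a single linear unit with stochastic gradient descent. It is deliberately much simpler than a neural network: the gradient is worked out by hand, whereas backpropagation is what computes it automatically for multi-layer models. The data sets, learning rate, epoch count, and function name are assumptions chosen for the example.

```python
import random

def train_sgd(samples, epochs=2000, lr=0.01, seed=0):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * x
            b -= lr * error
    return w, b

# Made-up data: the "old" behaviour follows y = 2x + 1,
# the "new" behaviour follows y = -x + 4.
old_data = [(x, 2 * x + 1) for x in range(5)]
new_data = [(x, -x + 4) for x in range(5)]

w1, b1 = train_sgd(old_data)  # weights approach w = 2, b = 1
w2, b2 = train_sgd(new_data)  # retraining moves them toward w = -1, b = 4
```

Nobody edits `w` or `b` directly. Changing the program's behaviour means changing the data and retraining, which is the maintenance model the paragraph above describes.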
While machine learning development has its own debugging and maintenance challenges, Karpathy highlights that Software 2.0 has become both highly viable and valuable because “a large portion of real-world problems have the property that it is significantly easier to collect the data (or more generally, identify a desirable behavior) than to explicitly write the program.” Among the fields that stand to benefit most from Software 2.0 are computer vision, speech recognition, machine translation, gaming, robotics, and databases.
Karpathy also cited the benefits of the new paradigm:
- More homogeneous and easier to manage
- Can easily be baked into hardware
- Constant running time and memory use
- High degree of portability
- High degree of agility and integrability
- Easier to learn for future developers
- Better than the best human coder in certain functions/verticals (i.e. images/video, sound/speech, and text)
The pros are not without cons, however. The critical limitation of many machine learning approaches is our human inability to fully comprehend how such complex systems work, which leads them to appear to us as “black boxes”. Another challenge that derives from our lack of understanding and control is the unintended and embarrassing consequences of flawed models, such as algorithmic bias and bigoted bots.
Traditional Software Gets A Boost From ML Techniques
Traditional software development is not going away, however. Training a performant machine learning model is only a single step in productizing AI technology. As a popular Google paper asserts, only a fraction of real-world machine learning systems is composed of machine learning code.
Critical components such as data management, front-end product interfaces, and security will still need to be handled by regular software. However, technologies developed using the traditional SDLC can still benefit from machine learning approaches in the following ways:
1. Rapid Prototyping. Turning business requirements into technology products typically requires months if not years of planning, but machine learning is shortening this process by enabling less technical domain experts to develop technologies using either natural language or visual interfaces.
2. Intelligent Programming Assistants. Developers spend the vast majority of their time reading documentation and debugging code. Smart programming assistants can reduce this time by offering just-in-time support and recommendations, such as relevant documentation, best practices, and code examples. Examples of such assistants include Kite for Python and Codota for Java.
3. Automatic Analytics & Error Handling. Programming assistants can also learn from past experience to identify common errors and flag them automatically during the development phase. Once a technology has been deployed, machine learning can also be used to analyze system logs to flag errors quickly and even proactively. In the future, it may also become possible for software to change dynamically in response to errors without human intervention.
4. Automatic Code Refactoring. Clean code is critical for team collaboration and long-term maintenance. As enterprises upgrade their technologies, large-scale refactorings are an unavoidable and often painful necessity. Machine learning can be used to analyze code and automatically optimize it for interpretability and performance.
5. Precise Estimates. Software development notoriously goes over budget and over timelines. Reliable estimates require deep expertise, understanding of context, and familiarity with the implementation team. Machine learning can train on data from past projects – such as user stories, feature definitions, estimates, and actuals – to predict effort and budget more accurately.
6. Strategic Decision-Making. A significant portion of time is spent debating which products and features to prioritize and which to cut. An AI solution trained on both past development projects and business factors can assess the performance of existing applications and help both business leaders and engineering teams identify efforts that would maximize impact and minimize risk.
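The estimation idea in item 5 can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production estimator: it fits a straight line from story points to actual hours using ordinary least squares, and the past-project figures are invented for illustration. A realistic model would draw on many more features, such as team, feature type, and historical estimate accuracy.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical past projects: (story points, actual hours spent).
past_projects = [(3, 15), (5, 21), (8, 35), (13, 53)]
slope, intercept = fit_line(past_projects)

def predict_hours(story_points):
    """Predict effort for a new piece of work from its story points."""
    return slope * story_points + intercept

estimate = predict_hours(10)  # roughly 41.7 hours on this made-up data
```

The value of such a model lies less in the arithmetic than in the feedback loop: each completed project adds a new (estimate, actual) pair, and refitting keeps predictions aligned with how the team really performs.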
According to a Forrester Research report on AI’s impact on software development, the bulk of the interest in applying AI to software development lies in automated testing and bug detection tools.
Can AI Create AI?
The ultimate question is whether AI can create AI, thus subverting the need for humans to be involved in technology development at all. Indeed, we’re already seeing huge growth in AutoML solutions, technologies that aim to automate pieces of the machine learning model training process, reducing the workload on data scientists and engineers and enabling domain experts to train production-quality models. Solutions such as H2O.ai’s Driverless AI, Google Cloud’s AutoML, and Amazon SageMaker automate or streamline key components, such as data preparation, model search and tuning, and model deployment and scaling.
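To give a feel for what "model search and tuning" means at its simplest, the Python sketch below automates one decision a data scientist would otherwise make by hand: choosing k for a k-nearest-neighbors classifier. It is not how any of the products named above work internally; it is a toy search loop with made-up data and hypothetical function names, scoring each candidate by leave-one-out accuracy.

```python
def knn_predict(train, x, k):
    """Majority label (0 or 1) among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == label
    return hits / len(data)

def search_k(data, candidates):
    """Return the candidate k with the best leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))

# Made-up one-dimensional data: (feature value, label).
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (2.2, 1), (3.0, 1), (3.5, 1), (4.0, 1)]
candidates = [1, 3, 5]
best_k = search_k(data, candidates)
```

AutoML systems extend this loop in several directions, searching across model families, feature transformations, and many hyperparameters at once, but the core pattern of proposing a configuration, scoring it on held-out data, and keeping the best is the same.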
In the next article, we’ll examine the benefits and the limitations of AutoML systems and address the controversial question of whether non-technical experts can deploy performant machine learning models or whether you still need data scientists and machine learning engineers to achieve your business goals.