Why we’re building Voiceflow
I’m frequently asked about the “voice” market: in particular, how big it is and how big it could be. My easy answer through 2019 was to quote the latest smart speaker shipments and describe a wonderful future where voice interfaces were as ubiquitous as the phones in our pockets. The way I saw the world, voice was still fairly niche, and use cases were less obvious. This year, I’ve started to look at the world in a very different light, and it’s given new meaning to our mission at Voiceflow. To see the world in a new way, you just need to ask one question:
What does a bank teller do for a living?
This may seem like an obvious question, and the most common answer you would hear is that bank tellers help customers access their accounts, transfer money, and are generally helpful. In reality, the bank teller is a human interface: their purpose is to relay information between the customer and the bank’s IT system. The same question, and answer, could be posed for retail workers, drive-thru staff, call centres, and thousands of other roles where the primary job function is being an interface to relay information between a customer and the company’s APIs.
When you begin thinking this way about these jobs, you start to realize the enormous impact conversational AI will have on the world and its workforce. From retail to call centres and salespeople, we’re talking about hundreds of millions of jobs worldwide over the next decade being replaced or augmented by conversational AI.
Conversational AI is not a platform shift; it’s another form of automation in the same bucket of technologies as robotics, the assembly line, and cars. A bucket of technologies that replaces what was once done by humans with machines. Platform shifts replace technology; automation replaces humans.
Conversational AI will get better, it’s inevitable
It took 67 years for 50 million people to own cars and seven years for 50 million people to use the internet. Given the increasing pace of technology, it’s not hard to believe that conversational AI will automate large swaths of jobs over the next five years. Most disruptive technologies first tackle niche markets and are inferior in every way except for a single superior, unmatched trait. For cars, this superior trait was power that horses couldn’t match, but cars were far inferior in terms of versatility, reliability, and range. But as a technology grows in adoption and investment, its weaknesses are fixed, and the older technology is replaced.
I’ve spoken about the innovator’s dilemma at length before, and I’m clearly summarizing its core ideas again here, but only because they apply to conversational AI so well. Conversational AI’s unmatched trait is that it’s a computer born of silicon: it doesn’t need to sleep, rest, eat, or unionize. Conversational AIs are orders of magnitude cheaper than their human interface counterparts when performing the same job. However, they are worse than a human at pretty much everything else right now.
Despite conversational AIs being worse in so many ways, the dramatic cost savings are already pushing their adoption and subsequent investment, starting with easier, more programmatic human interface roles such as call centre routing. As the technology gains more traction and investment, we can expect the AI to handle increasingly complex conversations and, in tandem, to replace larger and larger swaths of the human interface workforce. I wouldn’t be surprised if, in 10 years, all major retail stores largely employ conversational AIs as frontline workers and only boutique retailers feature human associates as part of their luxury brand.
So what?
As this shift happens rapidly with or without our consent, we as a species of consumers need to ask ourselves one more question:
Do we want to be stuck having awful bot conversations all day?
For most, the answer is a resounding no. We all know the frustration of a poorly designed automated phone system that takes 2 minutes of routing for what would be 30 seconds with a human. Heck, even human interfaces aren’t all that helpful sometimes, as I know firsthand from working in retail at my first job (apologies again to anyone I tried to help). There’s a reason companies spend so much money training their employees to smile and be knowledgeable: we all want to have great conversations that are relevant and engaging.
But great conversations don’t come easy. There are entire textbooks written on the science behind great conversations and how we unknowingly cooperate as we progress through dialog. If you ask the average technologist, they will likely say that great conversations come through great technology, which is true in part, but it’s only half the equation. Great conversations are found at the intersection of great technology and great design.
As with every technology shift, a market often starts by focusing on the technology itself, where simply having it is a zero-to-one benefit. Think back to when the web first came out. It didn’t matter what your website looked like, as most of the benefit came from having a website at all. However, as adoption of a technology increases, and market competition increases with it through saturation, design becomes a significant competitive differentiator. As we look to conversational AI, it shouldn’t be a contrarian idea to say that having well-designed, great conversations with your customers is a competitive advantage.
The problem is today’s tools
Great tools for designing, testing, and launching conversational interfaces have not existed because, until recently, there wasn’t a great market opportunity: conversational AI only lived in call centres. For the past 20 years, call centres have been at the forefront of conversational AI. From there, today’s standard conversation design toolset of flowcharts, Word docs, and spreadsheets emerged.
You may be asking why three tools and not just one. It’s because conversations have incredible depth and variety compared to visual design, which is relatively “flat.” Docs, flowcharts, and spreadsheets each cover a different level of conversational depth. Script docs provide a surface-level view of the conversation and its “flow” if the conversation went perfectly, but they’re linear and only show one path. Flowcharts are the next layer of depth and display conversation logic and branching, but they cannot show deep response variety and error pathing without quickly becoming an unreadable mess. Spreadsheets offer the most conversational depth, as they can be used to design context, response variety, slot fulfillment, and error pathing, but they’re hard to read and follow.
There’s a clear inverse relationship between how much conversational depth you can display and how easy the design is to follow. Conversations have been designed this way for the past 20 years across every major company, and it continues to be the standard methodology today. We’ve worked with companies that, before adopting Voiceflow as their design platform, had a central spreadsheet design with 3,000+ rows for a single IVR.
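To make the three levels of depth concrete, here is a minimal Python sketch. It is purely illustrative: the `Step` structure, the banking example, and the three view functions are hypothetical names invented for this example, not Voiceflow’s data model. It shows one small design projected as a script (one linear path), a flowchart (branching only), and a spreadsheet (every prompt and reprompt as a row).

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Step:
    """One turn in a conversation design (hypothetical structure)."""
    prompts: list[str]                                             # response variety
    next_by_intent: dict[str, str] = field(default_factory=dict)  # branching
    reprompts: list[str] = field(default_factory=list)            # error pathing


# A tiny, made-up banking design used only for illustration.
design = {
    "greet": Step(
        prompts=["Hi, how can I help?", "Welcome back. What do you need today?"],
        next_by_intent={"check_balance": "balance", "transfer": "transfer"},
        reprompts=["Sorry, I didn't catch that. Balance or transfer?"],
    ),
    "balance": Step(prompts=["Your balance is {amount}."]),
    "transfer": Step(prompts=["Who would you like to send money to?"]),
}


def script_view(design, start, happy_intents):
    """Script doc: one linear happy path, using only the first prompt per step."""
    lines, step_id = [], start
    for intent in list(happy_intents) + [None]:
        lines.append(design[step_id].prompts[0])
        if intent is None:
            break
        step_id = design[step_id].next_by_intent[intent]
    return lines


def flowchart_view(design):
    """Flowchart: branching edges only, as (source, intent, target) tuples."""
    return [
        (src, intent, dst)
        for src, step in design.items()
        for intent, dst in step.next_by_intent.items()
    ]


def spreadsheet_view(design):
    """Spreadsheet: one row per prompt or reprompt, so full depth but hard to scan."""
    rows = []
    for step_id, step in design.items():
        rows += [(step_id, "prompt", text) for text in step.prompts]
        rows += [(step_id, "reprompt", text) for text in step.reprompts]
    return rows


script = script_view(design, "greet", ["check_balance"])
edges = flowchart_view(design)
rows = spreadsheet_view(design)
```

Even in this three-step toy, the script shows 2 lines, the flowchart 2 edges, and the spreadsheet 5 rows; the row count grows fastest as variety and error handling are added, which is the readability trade-off described above.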
Visual designs only take a glance to understand, whereas conversation designs require the actual experience. As Voiceflow’s Head of Product, Rob Hayes, says:
When designing conversations, the prototype is the design.
It’s just too hard to get a feel for a conversation from a messy spreadsheet. Most companies, even the largest tech companies, solve this prototyping need with “Wizard of Oz” (WoZ) testing, where one person acts as the assistant by reading a script behind a curtain. This doesn’t scale because it’s a manual process, and testers don’t get the full experience. For most companies today, WoZ testing is the best they can do to test iteratively with users without coding the entire experience, which is costly.
Lastly, imagine you’re a developer given a script doc, flowchart, and spreadsheet from your designer. Do you think you’d be able to interpret exactly the conversation that was in the designer’s head and create the right experience on the first try? For most, especially those new to conversational interfaces, the answer is a resounding no. The designer-to-developer handoff process is costly, slow, and requires many iterations to create the experience the designer has in mind.
Introducing Voiceflow, the standard creative platform for Conversational AI
Great conversations require great tooling, and that’s why we’re building Voiceflow. We’re working to provide teams with the tools they need to design, test, and launch the interfaces of the future across every conversational channel and modality, including screen, voice, and chat interfaces.
Many platforms are trying to be the full-stack solution right out of the gate, providing everything from design to NLP/NLU to analytics and content management. We believe this may work in an early market like today’s, where merely having a conversational experience provides most of the benefit. As with web and mobile, design will become more critical as adoption grows. We believe customers will want to have the best tool for each part of their workflow to create the best possible experience. This is why we’ve opted to perfect the design, prototyping, and designer-developer handoff sections of the workflow while ensuring Voiceflow is easy to integrate with any other tool (e.g., CMS) or underlying technology (e.g., NLP/NLU/ASR).
We’re still early in our journey and have many more product launches and big announcements to come before our long-term vision is made public. For many customers, Voiceflow is a design and prototyping tool. For others, Voiceflow is a platform to build, launch, and host their production experiences. Voiceflow plays both roles and will continue to do so, as our ultimate goal is to give customers a unified platform to design, test, launch, and manage great conversations across every channel.
The world is changing rapidly, and conversational AI is transforming how the world communicates. But one thing is increasingly certain: great conversations start with Voiceflow.