How 640 Million Smart Speakers by 2024 Will Reinvent At-Home Audio (Part 1)

November 6, 2020 by Andrew Cohen

WHAT’S HAPPENING: COVID Quarantine + the Smart Speaker Boom Is Leading to the Reinvention of At-Home Audio

Smart speaker ownership is growing at an astounding pace:

  • In 2021, it’s projected there will be 163 million smart speaker units installed worldwide, marking a 21% YoY growth

  • The global smart speaker installed base is forecasted to reach 640 million by 2024

  • 34% of people who don’t own a smart speaker say they are likely or very likely to buy one in the next six months

  • In 2019, the global installed base for smart speakers surpassed those of wearables and AR/VR products. By the end of 2021, it’s projected that there will be more smart speakers than tablets.

  • The US will continue to be the largest global market with 90 million installed devices and 46% YoY sales growth

  • Over the last few months, Amazon, Google, and Apple have all announced additions or updates to their smart speaker product suites, signaling that this is a priority sector for these big tech incumbents.

  • Since the COVID-19 outbreak:

    • 36% of US adult smart speaker owners say they are using their device more to listen to music and entertainment, and 52% of those aged 18-34 say the same

    • 35% of US adult smart speaker owners are listening to more news and information, and 50% of those aged 18-34 say the same

COVID is Forcing Audiences to Adjust their Audio Consumption Habits

COVID is accelerating the “at-homification” of everything. From film to concerts, fitness, shopping, education, food, beauty/skincare, healthcare, and everything in between — the global pandemic has forced brands and consumers to reevaluate how they interact. We may not be sheltering in place forever (fingers crossed), but our consumption habits and expectations for omnichannel services will never return to their pre-pandemic state.

Audio is no different.

As with any other entertainment medium, audio’s content formats and business models have always been defined by the context in which it is consumed. For audio, and podcasts specifically, that context has predominantly been out-of-home. In 2019, 64% of podcast listeners reported that they tend to listen on the go, either while commuting or walking somewhere.

However, these “on-the-go” scenarios have all but disappeared from our lives. 77% of US adults have changed their typical routine due to COVID, and 54% are staying at home unless absolutely necessary. During this “new normal”, consumers are building new habits around in-home audio. A recent report from Edison Research explains:

“With tens of millions of Americans no longer commuting, smart speakers are becoming even more important as a conduit for news and information”.

So, between the proliferation of smart speakers and the changing context of podcast consumption, how should creators and brands adapt to capitalize on this evolving landscape?

HOW Will Audio Formats Evolve to Fit New Consumption Environments?  

Changing consumption settings lead to changing content formats and business models. This idea isn’t new. It’s a familiar process with recent precedent.

A little over a decade ago, the proliferation of the smartphone led to the mobile video revolution.

Once everyone had a smartphone in their pocket, video also became an on-the-go medium. Programmers, platforms, and creators had to adjust to this new consumption environment. It became clear that merely repurposing traditional video for mobile distribution wasn’t going to satisfy consumers. When watching videos on a phone, whether on the train, in a waiting room, or even while watching TV on the couch, audiences have completely different needs and expectations for what delights them.

As user behaviors shifted over the last decade, billions of actively engaged smartphone viewers emerged. The platforms and creators that won them over (TikTok, Snapchat, Instagram, BuzzFeed, House of Highlights) have been the ones that best understand how to program for these unique consumption contexts. These technologists and programmers have also been revolutionary in many other areas, like social connectivity, AI, and data analytics, but here we focus on evaluating the at-home audio opportunity.

Josh Constine refers to the defining characteristic of quality, mobile-first, micro-entertainment as “content density”:

“I define Content Density as the entertainment value of a piece of content divided by its length — how many oohs, ahhs, huhs, or hahas per second. Content Density is a measure of broadcast efficiency. The higher the density, the faster and more frequently the content delivers on its purpose of being cute, informative, inspiring, impressive, alluring, or funny…if a creator truly respects their audience’s attention, they cut out the noise and deliver pure signal.”

We believe that the “at-homification” of audio will follow a similar path as the “out-of-homification” of video. While video became shorter and denser when it moved out of the home, audio will become shorter and denser when it moves into the home.

Historically, Traditional Audio Formats Were Developed for an Out-of-Home Audience 

“Drive time radio” set the standard. This long-form audio format relies on a sense of personal familiarity. The hosts become your interesting, funny, smart friends who you’re hanging out with during your commute. You listen to their informal, unstructured conversations to learn or laugh, and feel a sense of community during your daily drive.

This is still the structure of most podcasts today.

The shift from radio to podcasts broadened the accessibility of out-of-home audio — you no longer needed to be in a car to listen — but for the most part, it didn’t revolutionize the content itself. Out-of-home audio is out-of-home audio. What worked for Howard Stern works for Joe Rogan.

However, the Shift from Out-of-Home Audio to At-Home Audio on Smart Speakers Is a Dramatic Shift in Consumption Context, Similar to the Emergence of Mobile Video

The use cases are completely distinct. Most consumers won’t sit in front of their smart speaker to listen to an hour-long podcast, just as most don’t want to watch Game of Thrones on their phone, or a TikTok video on their TV.

And just as the mobile video revolution didn’t mean the death of at-home video, the emergence of a dedicated at-home audio vertical won’t diminish or cannibalize the out-of-home podcast market. Netflix and TikTok are coexisting just fine. Rather, we believe that the proliferation of smart speakers will introduce a multichannel programming ecosystem within the world of audio, just as smartphones did in the world of video. And, as we’ve seen with video, these two distinct consumption environments will necessitate distinct content and revenue models.

The best podcasts for smart speakers will master the art of “content density”: delighting users with the right content at the right time, for the right amount of time.

So, what will the future of microcasts (short-form podcasts) and smart speaker audio look like? We believe three key pillars will come to define the most effective in-home audio programming…

The 3 Core Pillars of Effective At-Home Audio

  1. Habitual / Routine-Based

    Out-of-home audio has always derived its value from capturing an essential part of our daily routine: commuting and traveling.

    By owning our consumption habits during these moments of transit, audio publishers have developed an unmatched level of audience intimacy and trust, which is the foundation of audio monetization. From the exploding podcast advertising market (podcast ad reads are 4.5x more valuable than banner ads, and total ad spend is projected to increase 55% in 2021) to derivative revenues like merch, touring / live events, and film / TV adaptations, the value of audio springs from the unique level of audience connection that this habit-driven medium evokes. Smart speaker audio will create value by leveraging the at-home context to distill and intensify this audience intimacy, engagement, and trust.

    Think of all the routines that happen within your home, big or small. From the time we wake up to the time we go to sleep, there are limitless opportunities for publishers and brands to enhance these moments and become an intimate part of our lives on an ongoing basis. As we discussed in our report on the beauty e-commerce space, there’s immense long-tail value in cultivating this “brand stickiness” by being a companion to the user during life’s many mundane yet frequent activities.

  2. High “Content Density”

    When it comes to smart speaker microcasts, most listeners will let you into their homes for a short amount of time, every day, but only if producers earn it by making that time count and genuinely enhancing the routine. A viral TikTok video that takes 15 seconds to watch might’ve taken days to develop, produce, and edit. Smart speaker listening will primarily be high-frequency and require low time commitments. Yet just because a microcast is 10% the length of a traditional podcast doesn’t mean it requires or deserves 10% of the effort. This programming will need to pack a punch, and the payoff will be daily active engagement.

  3. Personalized / Adaptive / Interactive

    In addition to being “dense” in order to earn a place in the user’s daily routines, microcasts also need to be highly personalized, adaptive, and interactive.

    AI is becoming an increasingly central component of smart speaker products. Microcast programmers must use this capability to create even “stickier” routine-based programming. Routines like cooking, skincare, and fitness are consistent and often regimented, yet they also evolve and vary over time. To function effectively as an additive companion to our household routines, the best microcast content will not be one-size-fits-all. Instead, it will be a perpetual dialog between user and programmer, resulting in content that becomes more beneficial over time.

To achieve this breakthrough in audio programming, we anticipate that more podcast studios will be acquired by companies that manufacture smart speakers. For example, last month Sonos led a $6.4 million Series A investment in premium podcast publisher QCODE. We’re already seeing the benefits of verticalization in other areas of the audio market. When Spotify recently launched new podcast features such as interactive polls and music integration, it worked with its subsidiary podcast companies like The Ringer, Parcast, and Anchor to roll them out.

What does all this mean for smart speakers and microcasts? With owned and operated creative capabilities, smart speaker manufacturers like Amazon, Google, and Apple could provide the type of personalized, adaptive, and interactive content that will help users get the most out of these products.

In Part 2 of this industry report, we brainstorm some fun examples, including…

  • Shoppable microcasts, audio-driven live commerce, and “dropcasts”

  • Sticky brand routines

  • Cooking / Food shows

  • Live sports companion content

  • Education

  • Fitness, Health, and Wellness

  • Serialized short-form storytelling

  • Gaming

Ping us anytime at hello@wearerockwater.com. We love to hear from our readers.
