Important considerations and analysis of advantages and disadvantages in smart speaker design

June 30, 2021


Needless to say, voice-controlled speakers (often called smart speakers) are a popular consumer product. According to data from market research firm eMarketer, in 2017, 35.6 million American consumers used voice-activated devices at least once a month, and this number grew at a compound annual growth rate of nearly 50%.

Future market forecasts are also relatively optimistic. Juniper Research predicts that by 2022, smart devices such as Amazon Echo, Google Home, Apple HomePod and Sonos One will be installed in most American households. They also predict that 70 million families will install at least one of these smart speakers in their homes, and the total number of devices installed will exceed 175 million. This is especially impressive for a product category that did not exist before November 2014.

However, these seemingly simple devices are far more complex than a microphone and a speaker attached to an Internet interface. Smart speakers contain many electronic functions, implemented with dozens of complex integrated circuits (ICs). Original equipment manufacturers (OEMs) that enter the smart speaker market with differentiated products must decide which features to provide, how to provide them, and which trade-offs are acceptable in such small, low-power devices.

What do smart speakers actually do, and how are they used in the home? In short, a smart speaker first captures and digitizes the end user's voice command, transmits the result over a network connection to a cloud service for interpretation, and then responds to the end user by carrying out the command or returning a result. Smart speakers can also search for and play audio content from devices that have a network or Bluetooth® connection. As shown in Figure 1, many smart speakers can interact with other devices in the home, such as lights, door locks, and temperature control systems.


As a media player, a smart speaker must be simple in design, elegant in appearance, and provide good sound quality. As a smart home hub, it must provide accurate speech recognition and connectivity for the entire set of smart devices in the home.

OEMs not only want their products to stand out; they also want to control how information is accessed and transmitted in a room or even the entire residence, and so become the sole digital media and home automation hub.

Make smart speakers a reality

Smart speakers need a lot of circuitry to operate well: a series of complex analog, digital, mixed-signal, and power management subsystems and interfaces, all of which must be interconnected.


In addition, many design issues remain to be solved, such as the number and type of microphones, audio output and speakers, power management, user interface, and wireless connections. For OEMs, the first question is whether to use a "black box" chipset, which includes a system-on-chip (SoC) for audio decoding and signal processing, and a microcontroller (MCU) that integrates Wi-Fi® and Bluetooth radios. Sometimes this also includes a custom power management IC (PMIC). However, such a "canned" solution leaves little design space for product differentiation. Let us now take a look at the design areas and challenges in smart speaker systems.


When choosing a microphone technology, the advantages and disadvantages of each technology may not be obvious. In this regard, we can choose any of the following options:

• "Analog" microphones based on microelectromechanical systems (MEMS). These have an integrated preamplifier and rely on an external 24-bit audio analog-to-digital converter (ADC) to output formatted digital codes to the SoC.

• MEMS-based "digital" microphones. These have a single-bit first-order delta-sigma modulator ADC that outputs a pulse-density modulated (PDM) digital bit stream, which requires further filtering to create a formatted digital code. Either the SoC or a digital signal processor (DSP) dedicated to speech recognition must handle this filtering. A separate voice DSP offloads much of the processing work from the SoC, but it also increases cost.
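To illustrate the filtering that a PDM bit stream needs before it becomes a usable digital code, here is a minimal Python sketch. It models a first-order delta-sigma modulator and then recovers multi-bit samples with a simple boxcar (block-average) decimator. The function names, the oversampling ratio of 64, and the boxcar filter are illustrative assumptions, not part of any particular microphone or SoC; real designs typically use multi-stage decimation filters.

```python
def pdm_modulate(samples):
    """First-order delta-sigma modulator: floats in [-1, 1] -> list of 0/1 bits."""
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output mapped to +1 / -1 (0 before the first bit)
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0.0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pdm_to_pcm(bits, decimation=64):
    """Boxcar decimator: average each block of `decimation` bits into one sample in [-1, 1]."""
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        ones = sum(bits[start:start + decimation])
        pcm.append(2.0 * ones / decimation - 1.0)
    return pcm


def demo_dc_level(level=0.5, n_bits=6400, decimation=64):
    """Modulate a constant input level and return the mean of the recovered samples."""
    pcm = pdm_to_pcm(pdm_modulate([level] * n_bits), decimation)
    return sum(pcm) / len(pcm)
```

Calling `demo_dc_level(0.5)` should return a value close to 0.5, showing that the density of ones in the bit stream carries the signal amplitude. This per-sample work is what lands on the SoC or voice DSP for every microphone channel.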

Digital microphones are more expensive than analog microphones, but the SoC front end for analog microphones also requires additional ADCs. Because the sensor size is constrained by the microphone package and the integrated ADC has its own performance limits, digital microphones also have a lower signal-to-noise ratio (SNR) and a smaller dynamic range than analog microphones with a separate ADC. The SNR of a common digital microphone is about 65dB, and the dynamic range is about 104dB. Because the ADC is integrated, you cannot further improve SNR or dynamic range through filtering and oversampling.

On the other hand, when an analog microphone is combined with an external ADC, its SNR or dynamic range (the two have the same meaning for the ADC) can be as high as 120dB. This external ADC is usually a 24-bit multichannel high-precision audio ADC that uses a third-order or fourth-order delta-sigma modulator with a high oversampling ratio. Such ADCs also integrate complex programmable digital decimation filters, PGAs with configurable automatic gain control, and miniature DSPs for additional noise filtering and equalization.

In a typical crowded room, or a room where music is playing, the ambient sound level can easily reach 60dB. Unless the end user is close to the microphone, or the design uses more microphones to lift the command well above the ambient sound, the lower dynamic range of a digital microphone may prevent the voice command from being recognized correctly. Increasing the dynamic range from 104dB to 120dB is a significant improvement that deserves serious consideration: every additional 6dB of dynamic range doubles the range of speech recognition. In some cases it is impractical or unnecessary to extend the range that far, but the extra margin still gives you more design freedom. With an additional 16dB of dynamic range, you can save cost by reducing the number of microphones required.

Adding digital microphones increases more than cost. The system must also route three signal traces (data and clock) for each pair of microphones to the SoC, subject to the number of PDM inputs available on the SoC itself, which complicates the layout and may not always be possible. Every signal trace picks up and/or radiates noise, which makes electromagnetic interference a bigger problem. Finally, the clock line running to each digital microphone can cause routing and jitter problems.

In contrast, analog microphones are currently available with differential outputs, which provide common-mode rejection along the signal routing. The ADC also provides bias power for each microphone, which reduces the complexity of the power tree for the array.
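The 6dB rule of thumb above can be checked with a few lines of Python. The sketch below assumes an idealized free-field model in which sound pressure falls by 6dB for each doubling of distance, so extra dynamic range converts to a pickup-range multiplier of 10^(dB/20). The function name is hypothetical, and real rooms with reflections and noise will behave less cleanly.

```python
def range_multiplier(extra_db):
    """Pickup-range multiplier gained from `extra_db` of additional dynamic range,
    assuming sound pressure falls 6dB per doubling of distance (free field)."""
    return 10 ** (extra_db / 20.0)


# 6dB of extra dynamic range roughly doubles the usable distance.
six_db = range_multiplier(6.0)

# The gap between the 104dB and 120dB figures quoted in the text.
full_gap = range_multiplier(120.0 - 104.0)
```

Under this assumption, the gap between 104dB and 120dB corresponds to a bit more than a sixfold increase in usable distance, which is why that margin can be traded for fewer microphones.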


Using analog microphones with a precision ADC can extend the microphone range and increase sensitivity, which not only reduces cost and complexity but also significantly reduces command recognition errors in noisy environments. As the second generation of smart speakers arrives, a low error rate will become an important market advantage.

You also don't need to start from scratch when adopting a multi-microphone design with speech recognition. The TI circular microphone board (CMB) reference design based on the PCM1864 (shown in Figure 3) uses two 4-channel audio ADCs to connect to an array of up to eight analog microphones, and can extract clear user voice commands in a noisy environment.

Speaker amplifier and power supply

For speaker amplifiers, you need to make trade-offs between output power (usually between 5W and 25W), power consumption, thermal performance, size, speaker protection, and sound fidelity.

A simple speaker system with a mid-range tweeter and woofer can produce excellent sound quality. At the same time, if combined with the latest audio processing technology, multiple speakers can provide a 360-degree audio experience.

You can also choose to perform a one-time room calibration to adjust and best match the spectral characteristics of the speakers, or use more complex adaptive tuning methods to compensate for the acoustics of the listening area. The TI PurePath Console graphical development suite supports simple one-time tuning with excellent results.

In terms of power consumption and thermal performance, one way to reduce continuous power consumption is to combine amplifier pulse-width modulation schemes with adaptive power supplies to reduce speaker power requirements. This technique uses a variable (non-fixed) switching frequency for the Class D output while changing the frequency based on the audio content. In other words, the more content, the higher the switching frequency; the less content, the lower the switching frequency.

To improve efficiency, you can also dynamically adjust the amplifier's output supply voltage according to the content. This technique is called envelope tracking. It tracks the audio content and raises the voltage (and with it the available output power) only when the music demands more power, especially in heavy bass passages where the signal has many peaks.
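As a rough illustration of envelope tracking, the following Python sketch computes a look-ahead peak envelope of normalized audio samples and maps it to a supply rail. The window length, the 5V to 12V rail limits, the 1V headroom, and all function names are assumptions chosen for the example; they do not describe any specific amplifier or power supply.

```python
def peak_envelope(samples, window=4):
    """For each sample, the largest magnitude within the next `window` samples (look-ahead)."""
    n = len(samples)
    return [max(abs(s) for s in samples[i:min(n, i + window)]) for i in range(n)]


def supply_voltage(env, v_min=5.0, v_max=12.0, headroom=1.0):
    """Map a normalized envelope value (0..1) to a supply rail clamped to [v_min, v_max]."""
    target = env * v_max + headroom
    return max(v_min, min(v_max, target))


def track(samples, window=4):
    """Supply voltage to request for each sample position."""
    return [supply_voltage(e) for e in peak_envelope(samples, window)]


# A quiet passage with a single loud peak: the rail rises just before the peak.
rails = track([0.0, 0.0, 0.0, 0.0, 1.0, 0.0], window=2)
```

The look-ahead matters because the supply needs time to rise before the peak arrives. A real controller would also limit how fast the rail falls to avoid clipping on closely spaced peaks.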

The stereo evaluation module reference design for digital-input, Class D, IV sense audio amplifiers (shown in Figure 4) not only accepts digital inputs in multiple formats and provides high-quality audio; its Class D topology also includes features that minimize power consumption at multiple output levels without sacrificing fidelity or performance.


Power management

As in most electronic systems, power management plays an important role in system design. The goal is to deliver power efficiently in order to reduce heat dissipation, which enables a smaller, lower-cost system and extends the battery runtime of portable systems. SoC and Wi-Fi chipsets are sometimes bundled with dedicated PMICs, but you may still prefer separate DC/DC converters, low-dropout regulators, and voltage monitors. A discrete implementation lets you modify functions (such as sequencing), adapt the circuit board layout, reduce noise and/or cost, and gain vendor flexibility.

You may also want to optimize the design beyond what a fixed integrated solution provides, for example by operating at a lower quiescent current or by using a higher switching frequency (such as 1.4MHz to 4MHz) so that smaller inductors can reduce the footprint. You can also use pulse-skip or ECO mode to save power under light load, provided the switching frequency does not drop into the audio band below 20kHz, which may cause audible noise. In addition, you may need flexibility in the system input voltage. The speaker amplifiers require a 12V to 24V power supply, which can be provided through an internal power supply or an external power adapter.
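The link between switching frequency and inductor size can be sketched with the standard ideal buck-converter ripple relation, L = Vout × (1 − Vout/Vin) / (ΔI × fsw). The Python example below uses assumed values (a 12V input, a 3.3V rail, and 0.3A of peak-to-peak ripple current) purely for illustration, and it ignores losses and component tolerances. The function names are hypothetical.

```python
def buck_inductance(v_in, v_out, ripple_a, f_sw_hz):
    """Inductance in henries for a target peak-to-peak ripple current
    in an ideal buck converter operating in continuous conduction."""
    duty = v_out / v_in
    return v_out * (1.0 - duty) / (ripple_a * f_sw_hz)


def in_audio_band(f_hz):
    """True if a switching (or pulse-skip burst) frequency falls below the 20kHz limit."""
    return f_hz < 20_000.0


# Same rail and ripple target at the two ends of the 1.4MHz to 4MHz range.
l_slow = buck_inductance(12.0, 3.3, 0.3, 1.4e6)
l_fast = buck_inductance(12.0, 3.3, 0.3, 4.0e6)
```

With these assumptions the required inductance falls from about 5.7µH at 1.4MHz to about 2.0µH at 4MHz, which is what allows a physically smaller inductor. The `in_audio_band` check reflects the caution about light-load modes pushing the effective frequency below 20kHz.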

An internal AC/DC power supply can provide the main power, but an external AC/DC wall adapter with an output voltage of 12V or 5V is more common, depending on the speaker power required. The main power supply can be provided through the micro USB interface for low-power speakers or the new streamlined USB Type-C™ for high-power speakers, replacing the bulky traditional wall AC/DC adapters and barrel sockets. Due to the different power levels of these adapters, using USB Type-C requires some level of handshake from the speaker to the adapter, or the use of an input USB current limit switch or a battery charger with integrated overcurrent and overvoltage protection.

For portable speakers, a technique called power path management lets an external AC/DC wall adapter charge the battery while simultaneously powering the speaker "in real time" through an integrated regulator. If you need a higher speaker amplifier power rail (such as 12V or 18V), one option is to use two 8V batteries and then boost the voltage as needed for the speaker amplifier. The battery charger must boost the input voltage to the higher battery voltage (if the adapter output voltage is 5V), and you need an additional boost converter on the speaker amplifier power rail to achieve higher peak power. In addition, a portable smart speaker system must have low standby power consumption and an efficient step-down converter to achieve a longer running time between charging cycles when the battery is the only power source.

Since the speakers are the main power consumers, a power supply closely matched to the amplifier's requirements enables a cost-effective, low-power design. The envelope tracking power supply reference design for audio power amplifiers (shown in Figure 5) is a good example of such a solution: it runs from an input voltage rail of 5.4V to 8.4V and delivers 2 × 20W into 8Ω loads (using a 7.2V power rail). It changes the output voltage according to the peak-to-peak envelope of the audio signal, thereby maintaining high efficiency across the output voltage range. In other words, it dynamically adjusts the power amplifier's supply based on the audio content to optimize power consumption.


User Interface

You must decide which type of user interface to provide based on the desired end user experience, because the man-machine interface is a major factor in the differentiation of the smart speaker market. Such interfaces may include low-cost simple buttons and single indicator LEDs, rotating LED arrays, small LCD displays, and LCD displays with touch input and tactile feedback.

LEDs are basically used to indicate status, and recently they are also used to improve the end user experience by generating dynamic colors in various patterns. Simpler systems may use single-color LEDs, but most systems use red, green, and blue (RGB) LEDs. If you choose multi-color LEDs, you need to determine how many RGB LEDs are used and whether the system processor, MCU, or a new multi-LED driver with integrated LED engine will control them. Each choice needs to weigh cost, power and system considerations. Using an integrated LED graphics engine can reduce the burden on the processor when it manages graphics generation, and drive the RGB LED array when the processor or MCU enters a low-power standby mode.

As shown in Figure 6, the various LED ring lighting patterns reference design illustrates how to design a multicolor RGB LED ring lighting subsystem using a new multichannel RGB LED driver with an integrated LED engine. It uses an ambient light sensor IC to control the LED brightness automatically.
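To make the idea of an LED ring pattern with ambient-light dimming concrete, here is a small Python sketch that builds one frame of a rotating "comet" pattern and scales its brightness from a lux reading. The function names, the 500 lux full-brightness point, and the 10% brightness floor are illustrative assumptions; a real design would hand frames like this to the LED driver's own pattern engine.

```python
def brightness_from_lux(lux, lux_max=500.0, floor=0.1):
    """Map an ambient light reading to a brightness factor in [floor, 1.0]."""
    return max(floor, min(1.0, lux / lux_max))


def ring_frame(num_leds, head, color, tail=4, brightness=1.0):
    """Return a list of (r, g, b) tuples for a comet whose head is at index `head`.
    LEDs behind the head fade linearly; all other LEDs are off."""
    frame = [(0, 0, 0)] * num_leds
    for k in range(tail):
        scale = brightness * (tail - k) / tail
        idx = (head - k) % num_leds
        frame[idx] = tuple(int(c * scale) for c in color)
    return frame


# One frame of a red comet on a 12-LED ring in a dim room (100 lux).
frame = ring_frame(12, head=0, color=(255, 0, 0), tail=2,
                   brightness=brightness_from_lux(100.0))
```

Advancing `head` by one on each timer tick rotates the pattern around the ring. Precomputing frames this way is the kind of work an integrated LED engine can take over while the processor sleeps.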


Mechanical panel buttons may be inexpensive, but they are more prone to mechanical failure and serve only a single function. This type of button requires the end user to "press and hold" to perform operations (up, down, scrolling). In the age of smartphones, this type of operation feels outdated and runs contrary to normal usage habits. In contrast, a capacitive touch-sensitive surface supports richer interaction and can enhance the user interface. It can detect the proximity of the end user without any physical force, and it supports backlighting that makes the interface easier to use in the dark. Beyond simple presses, a touch-sensitive surface can support "swipe" or "rotate" gestures, giving users a familiar interface that can make a smart speaker stand out. A well-designed capacitive touch controller can operate through various surfaces, such as plastic, glass, or metal, and can sit flush with the surface of the speaker enclosure.

The gesture-based capacitive touch speaker interface reference design (shown in Figure 7) provides an easy-to-use evaluation system for the multi-gesture capacitive touch interface of smart speakers using TI's capacitive touch MCU. This design supports tap, flick, slide and rotate gestures.
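The difference between a tap, a flick, and a slide can be sketched as a simple classification over a touch trace. The Python example below is a hypothetical illustration only: the trace format, the 5mm movement threshold, and the 200mm/s flick speed are assumptions, and it does not represent the algorithm used in the reference design or in any capacitive touch library.

```python
def classify_gesture(trace, move_threshold=5.0, flick_speed=200.0):
    """Classify a one-dimensional touch trace.

    trace: list of (time_s, position_mm) tuples from touch-down to release.
    Returns 'tap', 'flick', or 'slide'.
    """
    (t0, x0), (t1, x1) = trace[0], trace[-1]
    distance = abs(x1 - x0)
    duration = max(t1 - t0, 1e-6)  # avoid division by zero on very short traces
    if distance < move_threshold:
        return "tap"
    return "flick" if distance / duration >= flick_speed else "slide"


tap = classify_gesture([(0.0, 10.0), (0.1, 11.0)])
flick = classify_gesture([(0.0, 0.0), (0.1, 40.0)])
slide = classify_gesture([(0.0, 0.0), (1.0, 40.0)])
```

A rotate gesture could be handled the same way by using the angle around a circular sensor as the position value.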

Wireless connections

Finally, there is a basic out-of-the-box issue: a smart speaker that is not connected to the Internet will not work properly. You must decide on the best connection method given the speed requirements and power limitations.

Most smart speakers connect directly to the Internet via Wi-Fi. Here, the bandwidth of IEEE 802.11n is more than enough, and it also supports multi-room wireless speaker mesh connections. However, Wi-Fi power amplifiers consume a lot of power and may limit the running time of battery-powered smart speakers. Therefore, speakers that support Wi-Fi connections are usually plugged directly into a wall outlet or equipped with an AC adapter that supports continuous operation.

To cover a room as fully as possible or to improve stereo sound quality, users often want to use multiple smart speakers, which requires IEEE 802.11n/s support to form a mesh network. In a mesh network, any speaker can become the master (connected to the cloud) while the other speakers act as slaves. If the speaker operating as the master is powered off or disconnected from the network, the mesh network automatically assigns another speaker as the master. In a multi-speaker mesh network, the biggest problem is synchronization.

The Wi-Fi controller in the mesh network must have a reliable synchronization scheme to avoid trouble for users.
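The failover and synchronization behavior described above can be sketched in Python. The election rule (the alphabetically first online speaker becomes master) and the scheduling rule (start playback after the slowest link latency plus a margin) are assumptions made for the example; the article does not specify how a real mesh stack chooses a master or synchronizes playback.

```python
def elect_master(speakers):
    """speakers: dict mapping speaker name -> True if online.
    Returns the name of the master, or None if no speaker is online."""
    online = sorted(name for name, up in speakers.items() if up)
    return online[0] if online else None


def playback_start(now_ms, link_latency_ms, margin_ms=20):
    """Pick a shared start time late enough for the slowest speaker to receive it."""
    return now_ms + max(link_latency_ms.values()) + margin_ms


mesh = {"kitchen": True, "bedroom": True, "office": True}
first_master = elect_master(mesh)

mesh["bedroom"] = False            # the current master drops off the network
second_master = elect_master(mesh)

start_at = playback_start(1_000, {"kitchen": 12, "office": 35})
```

Every speaker then begins playback at `start_at` on a shared clock rather than on arrival of the audio, which is one simple way to keep rooms from drifting audibly out of step.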

Battery-powered portable speakers may hand off the Wi-Fi cloud connection to a nearby mobile device. To connect to a mobile device for an indirect cloud connection and/or to listen to content stored on the device, you need classic Bluetooth (Bluetooth basic rate) to maintain a continuous connection for streaming audio, because the power-saving scheme of Bluetooth Low Energy limits its bandwidth. Used alongside classic Bluetooth, Bluetooth Low Energy can handle control communication between devices.

Home automation is another function that currently exists in many households as a separate entity. A stand-alone hub connects to the Internet via Wi-Fi and links to dedicated lamps and thermostats through a wireless mesh network for home automation (implemented according to standards such as Zigbee®, Thread, or Z-Wave). As long as this additional stand-alone hub is present, smart speakers can legitimately claim to provide home automation over the Internet.


However, to eliminate the need for end users to purchase this additional wireless hub, a smart speaker can simply add a multi-band wireless MCU with an integrated RF power amplifier and become a home automation hub itself. The wireless MCU runs the protocol stack and controls the radio, so it does not burden the existing SoC or Wi-Fi network processor. It also supports communication over commonly used long-range home automation protocols in both the 2.4GHz and sub-1GHz frequency bands. Because Wi-Fi and Bluetooth also use the 2.4GHz band, you need to ensure that the radios coexist through a combination of hardware and software built into the integrated wireless MCU.

Look to the future

The smart speakers of the future will not be just stand-alone, audio-only devices. As flat-panel TVs become thinner and lighter, they require smaller speakers, which degrades TV sound. Soundbars that enhance the sound of flat-panel TVs will therefore become increasingly popular, and adding voice recognition is the obvious next step in their development.

To realize this vision, smart soundbars will need to include a set-top box for wireless video streaming, with a single HDMI cable connected to the TV, which serves as a large display. As flat-panel TVs become thinner and lighter, TV control circuits and power supplies can also move into the smart soundbar. Smart speakers and smart soundbars will then compete to become the hub of the entire home entertainment system. Once home automation connectivity is added, these devices will also compete to become the automation hub of the smart home.

Another new addition is the smart speaker display. Adding a display screen to a smart speaker is a natural extension of its function. Just as center console displays are increasingly being added to cars, consumers also expect home information and entertainment devices to provide a visual experience. The way content is requested and displayed will differ from the handheld smartphone or tablet experience. Since voice commands are the main way to request content and control the device, simplified search and control applications will be needed to deliver accurate results quickly. In addition, the displayed images can be simplified, with less need for touch interaction and with extra-large visuals suitable for viewing from a distance.

This will provide clear visual content, allowing consumers to get a more pleasant experience when interacting with smart speakers.

With this new display function, smart speakers can give way to smart soundbars in the living room and focus on areas outside it. Smart speakers can provide small personal displays, ranging from integrated LCD screens to large ultra-short-throw HD projections (using TI DLP® technology to create large displays on any surface). In high-traffic areas such as the kitchen or living room, smart devices need to be attractive and unobtrusive, and a flat-panel display the size of a tablet computer or larger does not always meet these standards. When users ask a smart speaker for information such as weather, recipes, or traffic, projection display technology can provide a more interactive experience than an anonymous voice alone. As a result, the role and importance of smart speakers in the home will continue to change and develop, bringing new trends and opportunities for designers to make their designs unique.