[see also: Qualcomm Advocates Parallel Computing By Joining HSA [OnQ blog from Qualcomm, Oct 3, 2012]]
Source and more information: QDSP6 V4: Qualcomm Gives Customers and Developers Programming Access to its DSP Core [BDTi, June 22, 2012] [Applications of Digital Signal Processing in Mobile Computing Devices]
At the January IEEE International Conference on Emerging Signal Processing Applications (IEEE-ESPA), Dr. Raj Talluri, Qualcomm’s Vice President of Product Management, used portions of his plenary talk [Applications of Digital Signal Processing in Mobile Computing Devices] to showcase key target applications for the QDSP6 architecture. Some of them were predictable case studies of already-established DSP opportunities: audio processing (encoding, decoding, transcoding, noise cancellation, bass boost, virtual surround and other enhancement functions), along with various types of still image and video processing tasks. The increasingly ubiquitous H.264 video codec received particular showcase …
… Other highlighted applications in the IEEE-ESPA presentation were more trendsetting. Talluri mentioned, for example, the conversion between 2-D and 3-D versions of a polygon- or pixel-based image or video, for appropriate-format output to an integrated or tethered display. He also noted the execution time and power consumption improvements that could be garnered by migrating an augmented reality application from a 100% CPU-based approach to one that fully leverages the integrated QDSP6 V3 DSP core (Figure 6).
Hexagon™ DSP Augmented Reality Demonstration on a Snapdragon S4 (MSM8960) from Qualcomm Applications DSP (ADSP) [QUALCOMMVlog YouTube channel, Feb 5, 2013]
From ‘Applications of Digital Signal Processing in Mobile Computing Devices’ by Raj Talluri [as reported by Susie Wee, Cisco VP & CTEO, Jan 13, 2012]
- Mobile computing apps are dominated by digital signal processing tasks
- There are compute modules that can be used in app processors on smartphones
- Apps processors need to continue to improve in performance while maintaining low power
Notable observations given before that by Raj Talluri:
- 300M smartphones sold per year in 2010. Qualcomm predicts 1B per year in 2015
- The highest volume of camera sales is in mobile phones, creating many opportunities for image processing and computer vision.
- Most mobile phones have multiple microphones. Can be used for lots of signal processing apps.
- There is a big opportunity for sensor fusion, given the large number of sensors in every mobile phone. Sensor fusion opportunity: shake detection – detect shakes, then remove blurring.
- Lots of signal processing is done to provide smooth gesture interactions on phone.
- When breaking down the amount of signal processing needed for games, multi-core CPU and GPU processing is key.
Raj Talluri, “Programmes of Innovative Development” [“Wireless & Mobile” session of rASiA.com Business Forum in Moscow, May 16, 2012, published on Aug 10, 2012]
[15:00] Enhancing User Experience
[15:04] Video Telephony + 3D Gaming
[15:45] “A little video clip which shows you other things we do in gaming”
- It’s blindingly fast
- It’s insanely crisp
- Augmented reality
- It’s effortlessly connected
- It’s innately social
- It’s abundantly armed
- Snapdragon Game command
- It’s unusually versatile
- It’s absurdly efficient
[17:00] 3D Positional Audio Using Open SL
- Advanced reverb and virtual surround
- 3D positional audio, bass boost, DSP acceleration
- Collaboration with SRS
– SRS Completes Integration of TruMedia onto Hexagon DSP based Snapdragon Platforms [SRS Labs press release via BusinessWire, June 29, 2012] which relates to the
– earlier agreement to Integrate SRS Audio Technology on Qualcomm’s Reference Design Development Platform [SRS Labs press release via BusinessWire, Dec 8, 2011] as well as to
– SRS Labs and Qualcomm Sign Licensing Agreement to Bring HD-Quality Audio to Mobile Devices [SRS Labs press release via BusinessWire, March 22, 2011]]
[17:47] Dolby Multi-channel Audio
- Dolby Digital Plus delivers a richer, cinematic audio experience to mobile
- Scalable and extensible for optimization to available bandwidth
- Supports many existing and emerging home theater, broadcast, online, and mobile applications
- Provides up to 7.1 channels of cinematic surround sound
- Enables compatibility with millions of existing home theater systems via simple conversion to Dolby Digital
– “The Qualcomm Snapdragon 800 processors also introduce the very latest mobile experiences. … HD multichannel audio with DTS-HD and Dolby Digital Plus for enhanced audio” in Qualcomm Announces Next Generation Snapdragon Premium Mobile Processors [Qualcomm press release, Jan 7, 2013]
– “Qualcomm recently announced it would support Dolby Digital Plus in its new Snapdragon chipset, allowing OEMs to choose to deliver Dolby 7.1 Surround sound at the chipset level.” in Dolby Labs’ CEO Discusses F1Q12 Results – Earnings Call Transcript [Seeking Alpha, Jan 31, 2012]]
[18:23] Multi-burst Photo – Continuous photo capture
- High-speed full resolution burst capture
- Zero shutter lag
- Continuous Auto-Focus
[19:24] Natural Human Interfaces in Next Generation Devices
- Ultrasonics processing
- Coded/structured light depth mapping
- Stylus based gestures
- Stereo sparse depth mapping
- Time-of-flight depth mapping
Custom DSP Architecture [#14 slide from At the Heart of Mobile Devices presentation by Qualcomm, Oct 25, 2012]
Qualcomm Leads in Global DSP Silicon Shipments [Wireless/DSP Market Bulletin, Forward Concepts, Nov 12, 2012]
When people speak of “DSP chips”, they are usually referring to discrete devices that are catalog or “off-the-shelf” units, although at the high end they tend to be customized for high-volume customers. And they correctly identify Texas Instruments as the DSP chip market leader. However, those DSP chips from TI, Freescale, ADI, NEC and others constitute only about 10% of the “DSP silicon” market in revenue terms, as I have detailed in a much earlier newsletter [May 4, 2009].
The largest market for “DSP silicon” is in embedded solutions, generally thought of as System-on-Chip (SoC) products. Of that SoC DSP market, cellphones constitute the largest segment, with baseband modem chips being the most significant. All baseband chips contain one or more DSP cores. Qualcomm, the clear baseband market leader, has long employed two DSP cores in each of its MSM modem chips, and of late is shipping three or more of its latest Hexagon DSP cores in its Snapdragon S4 chips. In calendar year 2011, Qualcomm shipped a reported 521 million MSM chips, and we estimate an average of 2.3 DSP cores per unit, resulting in 1.2 billion DSPs shipped in silicon. This (calendar) year, we estimate that the company will ship an average of 2.4 DSP cores with each (more complex) MSM chip. We estimate that Qualcomm will ship about 610 million MSM chips in 2012, for a total of 1.5 billion DSPs shipped in silicon for the full year. Clearly, Qualcomm leads the global unit market for DSP silicon shipments.
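The bulletin’s per-core arithmetic is easy to verify; the figures below come straight from the text, and the script simply multiplies and rounds them:

```python
# Sanity check of Forward Concepts' DSP-core shipment estimates
# (all input figures are quoted from the bulletin above).
msm_2011 = 521e6     # reported MSM chips shipped in CY2011
cores_2011 = 2.3     # estimated average DSP cores per MSM chip, 2011
msm_2012 = 610e6     # estimated MSM chips for CY2012
cores_2012 = 2.4     # estimated average DSP cores per MSM chip, 2012

dsp_2011 = msm_2011 * cores_2011   # about 1.2 billion DSP cores
dsp_2012 = msm_2012 * cores_2012   # about 1.46 billion, rounded to ~1.5B

print(f"2011: {dsp_2011 / 1e9:.2f}B cores, 2012: {dsp_2012 / 1e9:.2f}B cores")
```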
Qualcomm Intros Multimode LTE Snapdragon Chips in 28nm [Wireless/DSP Market Bulletin, Forward Concepts, Nov 1, 2011]
Although the company claims to have employed them in earlier Snapdragons, this is the first public announcement we have seen of Qualcomm’s Hexagon™ DSP cores which have been under development for several years. Hexagon cores are employed in both the modem and the multimedia subsystems of the S4. According to the company, Hexagon “merges the numeric support, parallelism and wide computation engine of a DSP with the advanced system architecture of a modern microprocessor.” Qualcomm plans to release more Hexagon details and benchmarks later this quarter.
CEVA Says 927 million Basebands Shipped with its DSP Cores in 2011 [Wireless/DSP Market Bulletin, Forward Concepts, Feb 2, 2012]
CEVA, Inc. is clearly the leading licensor of DSP baseband cores and 2011 was a good year for them. All-time high quarterly and annual revenues were up 22% and 34% year-over-year, respectively. Although cellphones constitute the bulk of its licensing business, the company is aggressively pursuing the consumer business with recent design wins in Smart TV and connectivity for smartphones and solid state drives. CEVA’s IP portfolio includes not only comprehensive technologies for cellular baseband, but also multimedia (HD video, Image Signal Processing (ISP) and HD audio), voice over packet (VoP), Bluetooth, Serial Attached SCSI (SAS) and Serial ATA (SATA). In 2011, CEVA claims that its IP was shipped in over 1 billion devices, powering (at least some) handsets from 7 out of the top 8 handset OEMs, including Nokia, Samsung, LG, Motorola, Sony and ZTE. Today, the company claims that more than 40% of handsets shipped worldwide are powered by a CEVA DSP core.
Voice Evolution – Higher Capacity, Better Quality, A Richer Experience [Qualcomm, May 3, 2012]
Qualcomm Enables the True HD-Voice Experience
Qualcomm offers market leading technologies that dramatically improve the quality of voice, and the overall voice experience. These enhancements include new wideband codecs, Fluence™ noise suppression, active noise cancellation, VoIP optimizations, the unique HDOn™ feature to support wideband codecs on narrowband channels, and many more.
From: The Voice Evolution [Qualcomm presentation, April 2012]
Qualcomm is working on further enhancements even in this traditional voice/audio space as evidenced by this job placement: Audio DSP Systems Engineer [Jan 23, 2013]
… Development and deployment of various audio signal processing algorithms for Qualcomm’s chipset solutions:
- Audio and Voice compression technologies
- Audio Pre/Post Processing such as echo cancellation, noise reduction, array signal processing, audio effects, blind bandwidth extension, companding, etc.
- Voice recognition
ADC and DAC
… Desired skills
Knowledge of echo cancellation, noise reduction, and array signal processing. Knowledge of voice recognition and speaker identification. Knowledge of fixed-point programming or assembly language. Knowledge of audio effects, virtualization, HRTF, and 3D audio. Knowledge of ultrasound signal processing. Knowledge of psycho-acoustic modeling. Audio codecs such as AAC/AAC+/MP3/Dolby/DTS, etc. Voice codecs such as AMR/AMR-WB/EVRC/EVRC-WB/EFR/TTY/CTM, etc.
In addition there is a growing partner program in the overall user experience enhancements Qualcomm Announces the Expansion of the Hexagon DSP Access Program at Uplinq 2012 [Qualcomm press release, June 27, 2012]
Qualcomm Incorporated (NASDAQ: QCOM) announced today at Uplinq 2012 the expansion of the Hexagon™ DSP Access Program on select Snapdragon™ S4 processors. The new expansion provides original equipment manufacturers (OEMs) and independent software vendors (ISVs) with added resources, including software development tools and support, which allow them to provide increased differentiation on multimedia features via the integrated Hexagon DSP in Snapdragon S4 processors. The program offers the ability to integrate proprietary algorithms while enabling best-in-class power dissipation and multi-threaded hardware for concurrency. A comprehensive set of multimedia baseline features is provided standard with the Snapdragon platform, and the Hexagon Access Program enables the customization and augmentation of the baseline feature sets and usage models included on Snapdragon processors.
“Qualcomm is committed to providing unparalleled usability for our customers,” said Raj Talluri, senior vice president of product management at Qualcomm. “The expansion of the Hexagon Access Program gives added support and resources to the Snapdragon ecosystem. Qualcomm is enabling power-competitive designs with differentiated multimedia features and customization of multimedia end-use cases via our highly efficient Hexagon DSP technology.”
Previously offered solely on the Snapdragon S3 MSM8660 platform, the Hexagon Access Program now includes select Snapdragon S4 processors, including the APQ8064, MPQ8064, MSM8960, APQ8060A, MSM8260A, MSM8660A, MSM8930, APQ8030, MSM8630, MSM8230, MSM8227 and MSM8627. Both OEMs and ISVs can optimize the features and performance of their multimedia software for execution on the fully integrated audio-video acceleration hardware in Snapdragon processors. Program participants will have access to software development tools that the OEM or ISV can utilize to compile or hand-code their proprietary algorithms. These tools are provided to assist OEMs and ISVs with their audio and video programming on supported processors.
Current ISV participants in Qualcomm’s Hexagon Access Program include: Berkeley Design Technology Inc. (BDTI), Bsquare, Mentor Graphics, Nextreaming, NXP Software, Qsound, SRS Labs, TATA ELXSI and Waves Audio.
The Qualcomm Snapdragon processors that are supported via a Hexagon DSP Tools Suite and via software and documentation as part of the Hexagon DSP Access Program are available to OEMs and ISVs now. For more information on access to Hexagon programming tools and optional hardware development boards and documentation for the customization of multimedia on these processors, please visit developer.qualcomm.com.
Forums – Qualcomm’s DSP Access Program [Qualcomm Developer Network, June 27, 2011]
Get ready for programming on a Qualcomm digital signal processor (DSP), enabling your multimedia features for a large mobile handset market. For the first time Qualcomm is opening up a new programmable processor that software developers can use to accelerate their algorithms and offload the main applications processor. In this session targeted towards mobile software developers, we will introduce the program, go over the tools available, the chipsets supported and the timeline for availability. We will show how developers can add their own customizations to Qualcomm’s audio and video processing engines and enable device makers to better differentiate their smartphone and tablet devices by augmenting the Snapdragon platform’s multimedia suite.
[19:08] This section talks about how you can use the deliverables we will provide in the Open DSP program. Now just to give you an idea of what we do with the DSP, when we supply our MSMs and our SW to the OEM. The DSP is not blank. It is doing something and this is what it is doing. The DSP is actually running the voice codecs part of the voice call. [19:41] … [19:54] Same thing for the audio.
…. [21:04] As Qualcomm moves to its next-generation of chipsets we will probably see more functionality coming to the DSP, more than just audio. [21:17]
Qualcomm’s DSP Access Program Debuts [Qualcomm Developer Network press release, March 21, 2011]
Program Enables Manufacturers (OEMs) and Independent Software Vendors (ISVs) to Optimize Multimedia Solutions Utilizing Qualcomm Audio and Video Acceleration Hardware
SAN DIEGO — March 22, 2011 — Qualcomm Incorporated (NASDAQ: QCOM) today announced that OEMs and ISVs will now be able to program their own audio and video codecs using optimized processors and hardware on select versions of Qualcomm’s Mobile Station Modem™ (MSM™) chipsets through the new Qualcomm Developer Network DSP Access Program. This allows OEMs to better differentiate their smartphone and tablet devices by augmenting or modifying the Snapdragon™ platform’s multimedia suite with their own features or procure differentiated features directly from ISVs.
Both OEMs and ISVs can optimize the features and performance of their multimedia software for execution on Qualcomm chipset audio-video acceleration hardware. Qualcomm will offer software development tools that the OEM or ISV can utilize to compile (C/C++) or hand-code (assembly) their proprietary algorithms on Qualcomm’s optimized audio-video processor architectures. These tools are provided with training and support documentation to assist OEMs and ISVs with their audio/video programming on supported chipsets. Additional details on the Qualcomm Developer Network DSP Access Program are available on the Qualcomm Developer Network (http://developer.qualcomm.com/multimedia).
“Our customers and developers can increase the differentiation of their products on select Qualcomm chipsets by offering new and unique multimedia features and/or customization of the priority and concurrency of their multimedia features,” said Steven Brightfield, director of product management at Qualcomm CDMA Technologies. “Access to our audio and video acceleration hardware enables OEMs and ISVs to give end users access to a wider range of multimedia content and a richer multimedia experience on their mobile devices.”
The Qualcomm chipsets that will be supported via tools and documentation as part of the Qualcomm Developer Network DSP Access Program are the MSM8x60™, MSM8960™, MSM8270™, MSM8x55™, MSM7x27™ and MSM7x30™. For additional information and inquiries on access to programming tools and hardware documentation for the multimedia acceleration subsystems on these chipsets please inquire on the Qualcomm Developer Network (http://developer.qualcomm.com/multimedia).
Why mobile developers should care about the hardware [Qualcomm Developer Network, Aug 4, 2010]
In today’s crowded apps marketplace, it can sometimes be difficult for your apps to stand out. So as a developer, how can you make your apps stand out from the crowd? Certainly well-known IPs such as Tetris or PAC-MAN don’t really need help. But for most games and apps out there, there’s a good chance that a better understanding of the underlying hardware can help you optimize and differentiate your premium apps for an even greater payoff.
Qualcomm’s Snapdragon platform offers an unprecedented combination of processing performance and optimized power consumption for the next generation of smart mobile devices. Because Snapdragon chipsets combine the CPU, GPU, connectivity, memory, GPS, and high performance multimedia capabilities into a powerfully integrated platform, building your applications for Qualcomm-based devices can help you take advantage of these optimized chipset features to create innovative, premium applications and content.
For example, you can use our hardware accelerated codecs to improve app performance and power consumption, or use the dedicated 2D hardware using OpenVG, allowing higher graphics quality. These are just a few examples of how you can take advantage of the hardware to differentiate your app. To learn more, download the whitepaper “Why Should Mobile Developers Care About the Hardware.”
For developers who want early access to Qualcomm-powered devices, you’ll be glad to hear that you can now pre-order the Snapdragon Mobile Development Platform online. With this development platform, you can begin developing your apps before commercial devices become available, thus maximizing your revenue potential. Some of the key technical specs of the MSM8655-based Snapdragon MDP are:
1 GHz CPU
Multi-touch capacitive touch screen
Adreno 205 GPU
12-megapixel camera
720p HD video decode and encode
Stereo 16mm loudspeakers
512 MB of RAM (2x 32b ports)
4GB on-board Flash
So get started. If you have an application worth showcasing, tell us about it.
Why Should Mobile Developers Care About the Hardware? [Qualcomm whitepaper, April 2, 2010, modified Sep 28, 2012 with new agreement included]:
hardware-based 3D audio effects solutions by taking advantage of embedded digital signal processors (DSPs) which provide greater user experiences
Qualcomm’s integrated chipset solutions (i.e., the Snapdragon platform) address audio performance challenges head-on by enabling developers to take advantage of embedded digital signal processors (DSPs), which provide greater user experiences. For example, many applications like mobile games play multiple sounds simultaneously, such as background music tracks playing while sound effects are triggered in the foreground (referred to as audio layering). Since the decoding of MP3 streams is already accomplished in the DSP, it is a natural fit to mix multiple audio streams in the hardware.
This enhanced audio functionality enables developers to implement their entire audio path at higher sampling rates than a software mixer could process, thereby yielding higher quality audio outputs. Moreover, by leveraging DSP decoder functionality, source audio streams can be encoded as AAC instead of MP3, resulting in smaller file sizes. These smaller AAC files further benefit developers by providing a reduction of bandwidth in applications where the files are transferred over the air, thereby enabling a better user experience.
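The mixing operation the whitepaper says the DSP performs in hardware is, at its core, per-sample summation with clipping to the PCM range. A minimal software sketch of that audio-layering step (illustrative only; the Snapdragon DSP does this at higher sampling rates in hardware):

```python
def mix_streams(a, b, bit_depth=16):
    """Mix two equal-length PCM sample streams by summation,
    clipping the result to the signed range of the given bit depth."""
    lo = -(1 << (bit_depth - 1))          # -32768 for 16-bit audio
    hi = (1 << (bit_depth - 1)) - 1       # +32767 for 16-bit audio
    return [max(lo, min(hi, x + y)) for x, y in zip(a, b)]

music = [1000, -2000, 30000, -30000]   # background track samples
sfx = [500, 500, 10000, -10000]        # foreground effect samples
print(mix_streams(music, sfx))          # note the clipping on the last two
```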
Presently, developers utilize graphics hardware solutions to give games and UIs a third dimension of depth. A similar dimension of depth is achievable with audio through Qualcomm’s QAudioFX™ 3D positional engine.
Developers designing applications around 3D positional audio solutions from the start (such as first-person games) will benefit greatly from the added dimensional depth QAudioFX will provide. For example, audio gaming cues placed around the user will signal a car passing from behind, or an enemy approaching from the side, long before they appear on the device screen. These types of enhanced audio functionalities enable developers to vastly expand the user’s 3D environment. Moreover, since the QAudioFX engine will be implemented in the DSP, developers incur little penalty in incorporating 3D positional audio functionality. Currently, QAudioFX and related features are not available to developers; support is planned on devices in the latter part of 2010.
Developers can also look forward to taking advantage of QAudioFX’s reverberation engine capabilities. One challenge developers face in working with quality reverb algorithms is that they require a significant amount of memory for delay buffers. Qualcomm’s DSP-based audio solution allocates delay buffers in the hardware, thereby freeing up memory for the application. For example, in a racing game, rather than switching sound files when the vehicle enters a tunnel, developers can make a single API call to enable reverb; another API call disables the reverb when the vehicle exits the tunnel.
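The whitepaper does not show the actual QAudioFX calls, so the single-call enable/disable pattern it describes can only be sketched with hypothetical names (`AudioEngine`, `enable_reverb`, and `disable_reverb` are illustrative, not Qualcomm’s API):

```python
class AudioEngine:
    """Hypothetical stand-in for a DSP-backed audio engine; on real
    hardware the reverb delay buffers would live in DSP memory, so
    toggling the effect costs the application almost nothing."""

    def __init__(self):
        self.reverb_on = False
        self.preset = None

    def enable_reverb(self, preset="tunnel"):
        # A single call switches the hardware effect on.
        self.reverb_on, self.preset = True, preset

    def disable_reverb(self):
        # A second call switches it off again.
        self.reverb_on = False

engine = AudioEngine()
engine.enable_reverb("tunnel")   # car enters the tunnel
# ... render game audio with reverb applied ...
engine.disable_reverb()          # car exits the tunnel
```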
Developers desiring music functionality in their applications can leverage CMX™, Qualcomm’s DSP-based MIDI synthesizer. Genuinely realistic-sounding instruments require a synthesizer with many articulators and a wavetable with large samples. While equivalent software synthesizers may be realized on today’s mobile processors, they come with a price: greater cycle and power consumption requirements. The output quality of Qualcomm’s CMX solution rivals PC sound cards, to the extent that developers may want to revisit MIDI in situations where they need to minimize audio file sizes. The hardware-accelerated CMX capabilities are currently available on BREW platforms. Other high-level operating systems have similar MIDI capabilities but are enabled through software solutions.
Hardware-based 3D audio effects solutions by Qualcomm will provide powerful differentiating performance advantages for developers seeking a competitive edge in creating applications that contain a greater immersive user experience.
DSP History [by Will Strauss of Forward Concepts, May 2009]
This history of digital signal processing was originally written by Will Strauss of Forward Concepts and published in May, 2009 as part of a market study, “DSP SILICON STRATEGIES ’09.” ©2009 Forward Concepts Co. Permission to use excerpts of this history is granted as long as proper attribution is included.
1. DSP History
The earliest record of digital filtering techniques (albeit on paper) was in solving problems of astronomy and the compilation of mathematical tables in the early 1600s. The great mathematician Laplace (c.1779) understood the “z-transform,” the mathematical basis of modern digital signal processing.
During the Great Depression of the ’30s, the U.S. Bureau of Standards retained its surplus employees and set them to developing a variety of mathematical tools. Perhaps the most useful of these was a technique to evaluate the Fourier transform from a number of discrete data points, and using only multiplications and additions.
This Discrete Fourier Transform (DFT) technique lay dormant for a number of years before sampled-data control systems came into common usage. It was then realized that the Bureau’s technique could be directly applied to analyzing the frequency makeup (or spectral content) in these systems, and further, that this technique was ideal for use with computers, and later, digital signal processors. The successor to the DFT, the Fast Fourier Transform, or FFT, is a basic DSP algorithm employed in all forms of spectral analysis from seismic data processing and radar image processing to MP3 audio compression and Wi-Fi, DSL and WiMAX communication, and soon 4th-generation cellular (LTE).
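The Bureau’s technique really does reduce spectral analysis to nothing but multiplications and additions of sampled data, as a direct DFT makes plain; the FFT computes the identical result in O(N log N) rather than O(N²) operations. A pure-Python sketch (illustrative, not production DSP code):

```python
import cmath
import math

def dft(samples):
    """Direct Discrete Fourier Transform: O(N^2) complex
    multiplications and additions over the sampled data points."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

# A cosine completing one cycle over 8 samples concentrates its
# energy in bin 1 and its mirror, bin 7 (the negative frequency).
x = [math.cos(2 * math.pi * t / 8) for t in range(8)]
mags = [round(abs(c), 6) for c in dft(x)]
print(mags)
```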
In the early ’70s, scientists were beginning to use off-the-shelf TTL (transistor-transistor-logic) discrete logic chips to implement specialized DSP “engines.” The first systems were relatively slow and consumed lots of space, but the second generation of IC implementations (c.1974) began to use bit-slice logic, like Advanced Micro Devices’ Am2901 TTL 4-bit arithmetic logic unit (ALU). In 1973, TRW bid a military project with the first practical parallel multiplier designs for use with bit-slice ALUs and shipped the first working parts in 1975. But, at several hundred dollars just for the multiplier chip, only the military and government laboratories could afford the approach.
Originally used for implementing “super” minicomputers, the Am2901 found application as the heart of Digital Equipment Corporation’s DECsystem 2020, Data General’s Nova 4 and other mid-sized computer systems of the day. The 2901 and associated chips (consisting of address generators, carry look-ahead logic, program sequencers and fast multipliers—along with memory and I/O circuitry) constituted a basic “building-block” approach to implementing a fast digital signal processor.
In the late ’70s, some of the first commercial applications of DSP used the Am2901 chip family to implement array processors for medical diagnostic equipment like CT (computer tomography) scanners and nuclear magnetic resonance (NMR, now called magnetic resonance imaging, or MRI) systems. The high sales price of such systems justified the high cost of the building-block DSP technology.
For military applications, like radar image processing, the building-block approach proved to be ideal. Because little else was available, the 2901 family chips (and its successors) were also applied to other military DSP programs, such as sonar. Although other IC houses made their own bit-slice chips (and sequencers, etc.), the 290x family architecture has become obsolete (though later implemented as CMOS data-path elements in several ASIC libraries).
Probably the first single-chip implementation of a DSP algorithm was the TMS280 (later renamed the 281A) chip in Texas Instruments’ Speak & Spell™ learning aid introduced in 1978. Implementing Linear Predictive Coding (LPC) for speech synthesis, the device was not programmable, but was controlled by a separate microprocessor (TMS370) and a large ROM containing the library of digitized words. All three chips were implemented in PMOS dynamic logic. The Speak & Spell design team was headed by Gene Frantz (now a TI Principal Fellow). The idea for the product came from Frantz’ boss at the time, Paul Breedlove, who came up with the idea through a series of brainstorming sessions on how to use a hot technology of the day…bubble memories.
This was clearly a consumer product, retailing for $49.95 (instead of TI’s design goal of $29.95) rather than the thousands of dollars inherent in earlier military implementations. Speak & Spell was such a wild success that TI couldn’t meet demand, so it kept raising the price. However, it proved the commercial viability of DSP technology in a consumer product.
In 1978, American Microsystems Inc. (AMI) announced the first programmable integrated circuit designed specifically for digital signal processing, the 12-bit S2811, designed by Dick Blasco and his group under the direction of Bill Nicholson. Although of truly innovative circuit design, the chip was implemented in a radical “V-groove” MOS technology and never yielded volume commercial products.
Although AMI was the first to announce a single-chip DSP, Intel Corporation was the first company to actually begin shipping a product. In 1979, Intel introduced the Intel 2920 DSP chip, designed by Marcian (Ted) Hoff (famous for the invention of what some count as the first single-chip MPU, the Intel 4004). Designed as a “drop-in” analog circuit replacement, complete with on-board A/D and D/A converters, the chip was called an “analog signal processor” by Intel; after all, it (digitally) processed analog signals. The 2920 did not have a parallel multiplier and was too slow (with a 600 ns cycle time) to perform useful work in the audio spectrum, where the initial high-volume DSP chip market was to eventually materialize. After a lack of success elsewhere, the second wafer lot was sold to U.S. Robotics for use in then-current 300 bps modems as adaptive equalizers. Although the 2920 was unsuccessful, Intel did not capitalize on the fact that (with the on-board A/D and D/A converters) this was the first single-chip codec, for which Intel was awarded the patent.
It was in 1980 that NEC announced the first practical programmable single-chip DSP for the merchant market, the 16-bit µPD7720. Although hampered by primitive development tools, the 122-ns NMOS chip had a (two-cycle) on-chip parallel multiplier that was fast enough to perform useful “work” in the audio spectrum.
This began the first generation of “true” DSP chips, most based on the “Harvard” architecture which employs separate data and memory buses for better real-time operation. In the same year, AT&T’s Bell Laboratories introduced the DSP-1, which also had an on-chip parallel multiplier. But the chip was intended for captive use by AT&T’s manufacturing arm, Western Electric Company. Consequently, NEC was the first merchant market vendor of a practical DSP chip.
In 1979, Ed Caudell of Texas Instruments designed the initial architecture of what was later to become TI’s first DSP chip. Caudell was earlier involved in designing TI’s very popular TMS1000 8-bit MCU. The effort was under TI’s Microprocessor Microcomputer Products (MMP) group headed by Wally Rhines (now CEO of Mentor Graphics) and Jerry Rogers (MMP Design Manager). Known as the Signal Processing Computer (SPC) Program, John Hughes was the Program Manager and Tony Leigh was the Design Manager. Dr. Surendar Magar was hired in 1980 to optimize the SPC architecture around DSP algorithms. Dr. Magar held a Ph.D. in signal processing and had been working for Plessey in the U.K. Soon after joining TI, Dr. Magar recommended the inclusion of a hardware multiplier, which was not in the original SPC specification. Wanda Gass (nee English) joined Magar as a key Design Engineer along with others, and the full logic was complete by the end of 1980.
The resulting design was implemented in 3.0 µm NMOS and introduced to the world in February 1982 through Dr. Magar’s classic ISSCC (International Solid State Circuits Conference) paper. The final product, the TMS32010, was announced by Caudell in April 1982 at the Paris, France ICASSP (International Conference on Acoustics, Speech and Signal Processing). The TMS32010 went into production in 1983, and the DSP Group at that time was headed by Dave French (later VP at Analog Devices Inc. and then CEO of Cirrus Logic Inc.).
TI’s early recognition of the potential of DSP carried it through seven years of “missionary” work before a profit was turned, and by then others realized that it was a market ripe for growth. Three other major semiconductor companies then joined TI and NEC in the programmable DSP chip market: AT&T Microelectronics (later Agere Systems, then merged with LSI Logic to become part of LSI Corp.), Motorola Semiconductor Products Sector (now Freescale Semiconductor) and Analog Devices Inc. Today, there are many other semiconductor houses that employ DSP technology in their products, but for the most part, those chips are not programmable by the user.
First-generation chips generally lacked parallelism, since at least two processor cycles were required to perform a complete multiply-accumulate (MAC) operation, and limited on-chip memory forced expensive additions (in board real estate and extra memory chips) for most applications. The more primitive “engines” of the day (like conventional MCUs) required as many as 60 clock cycles to perform a multiply-accumulate operation.
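As an illustration of why MAC throughput dominates DSP performance, consider the inner loop of an FIR filter, which is essentially nothing but back-to-back MACs. The sketch below is my own C illustration (the function name fir_sample is hypothetical, not from any vendor’s library):

```c
#include <stddef.h>

/* One output sample of an N-tap FIR filter costs N multiply-
 * accumulate (MAC) operations. A first-generation DSP needed
 * at least two cycles per MAC; a conventional MCU of the era
 * could need as many as 60. */
long fir_sample(const int *coeff, const int *delay, size_t ntaps)
{
    long acc = 0;                          /* accumulator */
    for (size_t i = 0; i < ntaps; i++)
        acc += (long)coeff[i] * delay[i];  /* one MAC per tap */
    return acc;
}
```

At, say, 60 cycles per MAC on an MCU versus two (later one) on a DSP, this single loop explains most of the early DSP chips’ advantage.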
Other chips offered to the market in this first-generation era were the ITT UDPI-01 (the first announced CMOS single-chip DSP, which never reached the sampling stage) and the Hitachi HD61810, another CMOS chip which saw mostly internal company use.
The most striking improvement in second-generation DSP chips was in the implementation of a single-cycle multiplier-accumulator (MAC), effectively doubling the bandwidth capability of the chips. Direct memory access (DMA) emerged as a way to quickly load new algorithms into the DSP chip. Enhancements to first-generation chips added serial communication ports and timers, and interrupt capability began to emerge for control applications. DSP instruction sets became richer, with event control capability, which further broadened chip utility—allowing true stand-alone capability for many more implementations.
The Fujitsu MB8764 (announced in 1983) was the first of this genre, followed by the TI TMS32020 (in 1985). The TMS32020 was the result of collaborative design efforts between Texas Instruments and their customer, ITT Corp. (then International Telephone and Telegraph Corp.). Dr. Surendar Magar of TI and Dr. Kristine Kneib of ITT were the principal architects of the -020.
Other single-chip DSPs were announced in this second-generation era by Toshiba (T6386/7), STC (DSP-128) and Matsushita (MN1901/9), but they were never successful in the merchant market. (The DSP-128 never sampled, according to our information.) During this time, AT&T continued internal DSP development with the DSP-2 chip. Thomson (now STMicroelectronics) introduced the ST68930/31 in 1986, but confined most marketing efforts to Europe.
Texas Instruments licensed the NMOS 32020 architecture to General Instrument Microelectronics (now Microchip Technology). GI moved the NMOS design into CMOS and in turn provided it back to TI which resulted in TI’s first CMOS DSP chip, the TMS320C25. The new chip was designed in Japan and principal designer of the -C25 was Takashi Takamizawa, later to become TI’s DSP Business Manager in Japan.
The first floating-point DSP chip from a major vendor reluctantly made it to market in this era. AT&T introduced its DSP32 for internal use at 8 MFLOPS in 1984 and began selling it on the merchant market in July, 1986. Thrust into the merchant world by the divestiture of the Bell operating companies, AT&T did not have a cohesive DSP-chip marketing direction until late 1987, when the company realized that its first integer DSP chip, the (third-generation-design) 18.2 MIPS DSP-16, could make it a credible force in the merchant market.
The period of the third generation became the “glory years” for DSP volume shipments by TI, which had captured over 60% of the world single-chip DSP market by 1986. Third-generation changes centered mainly on reconfigurable memory, with flexibility of on- and off-chip memory that could be variously configured for program, data, or coefficients. The degree of parallelism increased even further, with as many as three operations performed in a single clock cycle. Further-expanded instruction sets became evident. In late 1986, Zoran Corp. introduced the first single-chip DSP with CISC-like vector instructions to efficiently perform FFT functions for military applications.
Analog Devices introduced the ADSP-2100 chip, which was unique in that it had a 24-bit instruction word, 16-bit data paths, and had no on-board memory. But, it was designed to access two words of external data on every cycle and had an instruction set optimized to perform FFTs and zero-overhead loops. The ADSP-2100 and its faster successor, the ADSP-2100A, found heavy use in military and imaging applications, while newer members of the family, the ADSP-2101 and ADSP-2105 (both with on-board memory) saw a wider variety of applications.
The Motorola 56000 family of 24-bit chips (for both instructions and data paths) was the first integer DSP optimized for high-fidelity audio applications, and found early acceptance in professional audio processing and music synthesis. Other third-generation integer DSP chips included those from AT&T (the DSP-16A at 40 MIPS by 1988), Hitachi (the DSP-I, sold only in Japan) and TI (TMS320C50).
Coincident with the era of the third-generation of integer DSP chips, additional first-generation floating-point DSP chips were announced, including improved units from AT&T (the 25 MFLOP CMOS DSP32C was announced in 1987), Texas Instruments (TMS320C30), Zoran (ZR34325), Fujitsu (MB86232), STMicroelectronics (ST18940/41), NEC (µPD77230) and Oki (MSM699210).
The emergence of a fourth generation of DSP chips was heralded by announcements made in the early ’90s. Several fourth-generation integer DSP chips were characterized by on-chip codec circuitry, like Motorola’s DSP56156.
As the fourth generation evolved, geometries progressed to submicron (0.8 µm) levels and multiply-accumulate times continued to fall. Additional CISC instructions to accommodate key algorithms became evident for some chip designs.
Second-generation floating-point chips emerged coincident with the introduction of fourth-generation integer chips: AT&T’s DSP3210, Motorola’s DSP-96002, NEC’s µPD77240, TI’s TMS320C40, and somewhat later, Analog Devices’ ADSP-21020.
The fifth generation of DSP chips began in 1994 with Texas Instruments’ TMS320C54xx family. Introduced at a 20 ns instruction cycle (50 MIPS), it was TI’s first chip with a Viterbi accelerator, and the successor to its popular C25 and C50 families. With the accelerator, the chip was clearly intended for communication applications (like modems).
But, from a public relations standpoint, the C54 family was overshadowed by the 1995 sampling of TI’s TMS320C80 (earlier termed a Multimedia Video Processor—MVP). The C80 consisted of four 64-bit DSP cores along with a 32-bit RISC core. A less capable, but cheaper, version, the C82, was introduced with two DSP cores and the RISC core. Although an extremely powerful chip family, the C80’s programming complexity confined the bulk of its applications to sophisticated image processing, and the chip never achieved general industry acceptance.
Motorola Semiconductor’s Data Communications Operation melded its DSP56002 DSP core with a 68302 microprocessor (MPU) core on the same die, the M68356 “Signal Processing Communications Engine.” In a similar pairing, TI joined its C54 core with an ARM RISC core on a single die and found success in tens of millions of digital cellphones.
Although multiple-MAC designs were earlier available on specialized 8-bit video filter chips like the Inmos (acquired by STMicroelectronics) A121 and Zoran ZR33881, 16-bit programmable DSPs also began taking on multiple MACs late in this generation. Half-micron geometries led to sub-20 ns multiply-accumulate times—across several MACs or several DSP cores, leading to substantially higher bandwidth capability.
Continued improvements of second-generation floating-point DSP chips were introduced coincident with the fifth generation of fixed-point chips. Texas Instruments introduced the TMS320C32 and TMS320C44 chips, while Analog Devices introduced the ADSP-21060 SHARC™ (Super Harvard Architecture Computer).
The late-‘90s era of 0.35 um CMOS geometries saw the first introductions of VLIW and superscalar architectures for DSP.
Texas Instruments’ TMS320C6201 was the first user-programmable VLIW DSP chip available. Employing eight ALUs, two of which had MACs, the ‘C62 was initially capable of executing 1,600 raw MIPS and 400 DSP MIPS (MMACS) at 200 MHz. The VLIW approach requires an optimizing C-language compiler, and TI invested heavily in developing an efficient compiler. Until the family later moved to 0.18 um geometries, power consumption was a problem. The ‘C62x family was announced in February 1997 and began to ship in moderate volumes in Q1/98.
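The distinction between raw MIPS and DSP MIPS (MMACS) in those figures is simple arithmetic: every execution unit can issue one operation per cycle, but only the units containing multipliers count toward MAC throughput. A sketch of the calculation (peak_mips is my own hypothetical helper, not a vendor metric):

```c
/* Peak-rate arithmetic for a statically scheduled VLIW DSP:
 *   raw MIPS = clock (MHz) x number of execution units;
 *   MMACS    = clock (MHz) x number of units with multipliers.
 * For the 'C62 at 200 MHz: 8 units give 1600 raw MIPS, but only
 * 2 MAC-capable units give 400 MMACS. */
unsigned peak_mips(unsigned clock_mhz, unsigned units)
{
    return clock_mhz * units;
}
```

These are peak figures; sustained throughput depends on how well the compiler keeps all eight units busy, which is why the optimizing compiler mattered so much to the VLIW approach.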
Lucent Technologies (later Agere Systems) introduced the DSP16000 family of 16-bit DSPs, which featured dual ALUs and dual MACs. The 16000 was optimized for low power consumption and for channel-bank speech coding applications such as those required in cellular base stations or Internet Protocol (IP) telephony gateways. Initially rated at 400 MIPS (@200 MHz), the 16000 could go head-to-head with TI’s C62x family in applications where there was no need for the extra (non-DSP) MIPS provided by the TI chip. The DSP16000 began sampling in November 1997, and in 1998 Lucent began shipping the DSP16410, a chip consisting of two 16000 cores on a die. The DSP16410 has been a favorite in GSM cellular base station implementations.
In 1998, a startup company, ZSP Corporation, began sampling its ZSP16400 family of DSPs, which, like the Lucent product, had dual ALUs and dual MACs. However, the ZSP family was based on a 4-issue superscalar architecture and employed a different (proprietary) approach to feeding data to multiple MACs. Initially rated at 400 MIPS (@200 MHz), the ZSP design was acquired by LSI Logic Corp. in mid-1999, and volume shipments of the renamed ZSP400 family began in Q3/99. LSI Logic sold both ZSP chips and licensed ZSP cores to a number of companies. The ZSP chip operations were sold in mid-2006 by LSI Logic to Verisilicon Corp., which has since expanded the product offerings.
Third-generation floating-point chips emerged in this time frame, beginning with TI’s C67x family, based on the VLIW architecture of the fixed-point C62x family. The family is source-code compatible with the C62x and started with a 1-GOPS version. Analog Devices announced its own dual-ALU/dual-MAC product, the floating-point ADSP21100 (“Hammerhead”) family, which was code-compatible with its ADSP21000 (SHARC) family of products. Sampling began in Q4/99. Because of its code compatibility, the Hammerhead presented an instant upgrade for existing sockets.
In 2001, TI introduced a formal pairing of its C55-family DSP core and ARM9-family RISC cores on a single die, formalized as its OMAP™ product family (said to have evolved from Open Multimedia Applications Processor). For reasons explained later in this report, the pairing of DSP and RISC, rather than a single processor architecture for both, has considerable merit in many applications.
By 2003, 0.15 um DSP chips became commonplace and 0.13 um chips were in volume production. And DSP cores became an ever smaller percentage of the die area as peripherals and (mostly) on-board memory began to dominate the silicon die.
9. THE CURRENT DSP CROP [c.2009]
By early 2007, 65-nm versions of Texas Instruments’ C55 family were shipping in volume as part of the OMAP™ family, which became the market leader due to its deployment in hundreds of millions of cellphones annually. But the company’s 1-GHz C64 VLIW became TI’s flagship “catalog” product. One member of the C64 family includes both Viterbi and turbo-coding accelerators, clearly targeting the cellular base station market. The C64 family has since been expanded to three or more cores on a single die, again addressing the base station market. Other C64 implementations employ MPEG4 and graphics accelerators for video multimedia applications, under the DaVinci™ family name.
Motorola has fielded its MSC8144 chip, successor to those first announced in Q4/01. Based on four StarCore VLIW cores (each with four ALUs & MACs) and 11.5 Mbits of on-chip memory, the earlier MSC8122 chip was originally introduced at 300 MHz, but the successor MSC8144 is now shipping at over 1GHz, and at introduction was rated by Berkeley Design Technology Inc. (BDTI) as the fastest available DSP processor, even as a single-chip implementation.
Formally announced in Q3/01, Analog Devices’ TigerSHARC™, a floating-point VLIW design, was also targeted (unsuccessfully) for the cellular base station market. Uniquely, in addition to traditional “symbol-rate” baseband processing, the chip can also perform high-speed “chip-rate” signal processing functions (required for CDMA cellular operation) that competitors generally assign to ASIC or FPGA implementations. The chip provided a significant jump in performance over the earlier Hammerhead, but it is not code compatible with the ADSP21000 product family, so new tools were required to develop products based on the higher performance product.
In a similar vein, Analog Devices introduced the fixed-point Blackfin™ DSP family, based on the “Frio” core jointly developed with Intel Corporation. The first chip (the ADSP-21535) was announced in mid-2001 at a 300 MHz clock rate. It was priced at $27 @10K units. The initial superscalar design employed two 16-bit MACs, two 40-bit ALUs, and four 8-bit video ALUs, clearly targeting multimedia applications. By mid-2003, the chip family began sampling at 600 MHz (1.2 GMACS), and is currently available at 750 MHz. The architecture is said to scale to at least 1 GHz, a speed that ADI’s older 219x devices would be unlikely to achieve. As with the TigerSHARC, it is not code compatible with the earlier 2100 family devices, so the company has developed new tools to support the new architecture. However, with the 300 MHz version later priced at $4.95 @10K units, the chip achieved strong early market attention.
Intel chose to employ the Frio core in SoC (System on Chip) products under its PCA (Personal Internet Communications Architecture) banner. Rather than use the DSP nomenclature (a term which they equate to TI), Intel chose to call the Frio technology “Micro Signal Architecture.” The Frio architecture became part of Intel’s cellular baseband chip, code-named “Hermon” (after Mount Hermon, the highest mountain in Israel), which (along with Intel’s XScale RISC product line) was sold to Marvell Semiconductor in late 2006. Marvell has since expanded on the initial architecture, now offering a cellular baseband chip code-named “Tavor” (after the second highest mountain in Israel). Tavor found a home in RIM’s Blackberry “Bold” 3G cellphone introduced in late 2008.
Agere Systems introduced (Q2/03) the DSP16411, a 0.13 um successor to the earlier (Lucent) DSP16410 that has been popular in GSM base stations. Clearly, the faster, and code-compatible, 16411 is a natural for retrofitting GSM base stations to GPRS capability. The operation became a division of LSI Corp. shipping its Trident HP chip based on its workhorse DSP16000 and ARM7TDMI cores for the GSM/GPRS/EDGE cellphone market. The cellphone chip operation was later sold to Infineon, but LSI Corp. continues serving the cellular base station market with its Starpro line of multicore chips based on the StarCore DSP core jointly developed with (then) Motorola’s Semiconductor Division.
VeriSilicon now offers its new ZSP600 family as licensed cores. Based on a 6-issue superscalar architecture with 4 MACs running at up to 300 MHz in 0.13um CMOS, the chip occupies a unique niche. Although earlier designs, like the ZSP400 and ZSP500 were also sold as chips by LSI Logic, VeriSilicon only licenses the IP, in addition to applying the cores in its own ASIC chip designs.
The designs of programmable DSP chips continue to evolve, with VLIW becoming the base architecture of choice for most of the highest performance discrete chips. But all discrete DSP chip vendors are now treating their basic engines as ASIC cores, as central elements in an ASSP (application-specific standard product), like a digital still camera chip, or for a customer-specific (usually high-volume) design like a cellphone baseband chip.
Another clear trend is that licensable RISC cores have evolved to incorporate ever-increasing DSP functionality, either through customizable instruction-set architectures or through the addition of SIMD extensions.
The trend toward ASSPs for vertical markets, like cellphones, cameras and personal media players continues, with off-the-shelf discrete DSPs becoming a diminishing percentage of the (still-growing) DSP-centric silicon market.