Microsoft Tellme cloud service for WP7 ‘Mango’ and other systems

Microsoft Tellme Vision for Future Interactions [Aug 9, 2011]

Four Reasons We’ll Love Talking to Our TVs [Zig Serafin, General Manager, Microsoft Tellme, Aug 9, 2011]

Microsoft is making big bets on speech NUI. Microsoft Tellme is driving that forward, powering the speech experiences in Kinect for Xbox 360, Windows Phone, Bing Mobile and Microsoft Tellme IVR. Because speech fits well with NUI across devices of all screen sizes, Microsoft Tellme is truly at the center of the NUI evolution.

In developing the speech NUI, we’ve designed the Microsoft Tellme speech service as a system that continuously learns and adapts. The more you use it, the more it learns and improves — meeting, we hope, and often exceeding your expectations. It continually gets smarter through a natural feedback loop that spans mobile, entertainment, customer care and other interactions. It learns from the great diversity of ways people speak across these interactions. The Microsoft Tellme speech service currently processes more than 11 billion voice interactions a year, each one helping to improve the service and, along with it, your experience. It’s the ultimate crowd-sourcing example.

That the Microsoft Tellme speech service gets better with each interaction is important. But that’s not the coolest thing about the future of speech. We aspire to deliver services that are just as natural and easy as human conversation. We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that’s important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using. That is a pretty exciting future, and one where your TV experience will be as helpful and intuitive as it is natural today with Kinect for Xbox 360. In other words, you may never have to see another remote control on your coffee table again!

Look who’s talking: Speech in Mango [Bill Pardi, senior consumer writer in Windows Phone Engineering, Aug 3, 2011]

On a recent run around town with my wife to grab dinner and pick up one of the kids, a text message came in from my son. Not an unusual event in itself, but what made this message interesting is that my phone read it aloud to me — and I replied back with my voice.

Meet Voice-to-text, a new hands-free messaging feature coming this fall in Mango, and one that’s quickly become a personal favorite. And after seeing it in action on my test phone on our drive, my wife looked at me and said, “I want that for my car.”

Voice-to-text works for both text and instant messages, and it’s handy even when you’re not driving since it can slash the time you spend typing—a good thing at times even considering the fantastic keyboard on Windows Phone.

But the feature really shines when being hands-free is a necessity, like when I’m driving. My car has Bluetooth built in, and my Windows Phone is paired with it. When I’m driving and a message comes in, Windows Phone uses the Bluetooth connection and car’s sound system to narrate the message and record my response (pausing and resuming music or the radio if needed). The “conversation” goes something like this:

WP: [music pauses] You have a text message from Cody Pardi. You can say read it or ignore.
Me: Read it.
WP: “When will you be home?” You can say reply, call or I’m done.
Me: Reply.
WP: Say your message.
Me: “In about 20 minutes.”
WP: [The phone transcribes and repeats the message] You can say send, try again, or I’m done.
Me: Send. [music resumes]
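The exchange above is essentially a small dialog state machine. Here is a hypothetical sketch of that flow; the option names mirror the prompts in the transcript, but none of this reflects Windows Phone’s actual implementation:

```python
# Toy model of the Voice-to-text dialog above. "commands" is the sequence of
# things the user says; "transcriber" stands in for the cloud transcription
# step. All names here are illustrative, not Windows Phone internals.
def message_dialog(commands, transcriber):
    """Walk the incoming-message prompts; returns an (action, payload) pair."""
    it = iter(commands)
    first = next(it)                  # "read it" or "ignore"
    if first == "ignore":
        return ("ignored", None)
    nxt = next(it)                    # "reply", "call" or "I'm done"
    if nxt == "call":
        return ("calling", None)
    if nxt == "reply":
        spoken = next(it)             # the user's dictated reply
        text = transcriber(spoken)    # cloud transcription step
        if next(it) == "send":        # "send", "try again" or "I'm done"
            return ("sent", text)
    return ("done", None)
```

Running the transcript from the article through it, `message_dialog(["read it", "reply", "in about 20 minutes", "send"], transcriber)` ends in a sent message, while `["ignore"]` exits immediately.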

My initial thought when I used it for the first time was “this is a game changer” because it felt natural to use while driving without being a distraction. And it all just worked. In fact, I was so impressed with the technology I decided to sit down with Alex Perez Avila, a program manager for many of the voice features in Windows Phone, to get an inside look at how it all happens.

[Image: Speech dialog box]

Alex works in the Microsoft Tellme team, which develops the voice recognition and text-to-speech technology found in a growing number of Microsoft products including Office, Windows, and Xbox. He told me that competing smartphones are adding some voice features, mostly for existing phone options. Alex and his team, meanwhile, wanted to create something seamless that felt natural for completing everyday tasks such as calling someone in your contacts list or finding a local restaurant. “We think this will set Windows Phone apart,” he said.

Windows Phone taps the Microsoft Tellme cloud service for voice recognition and transcription. “No one else has it,” Alex said, “and we think customers are really going to like it.” The service, he notes, has built-in ways to learn from itself and improve recognition and transcription accuracy over time–all without putting additional software on the phone. The feature, he says, “will just get better and better as more people use it.”

I mentioned to Alex that I noticed my Mango phone can speak modern-day abbreviations such as TTYL (“talk to you later”), LOL (“laugh out loud”), and even Smile (“happy smiley face”). I asked him if Windows Phone could translate those back if I spoke them while composing a text message. “Yep. We understand a limited set of key phrases and will transcribe them as abbreviations.” He demonstrated—and indeed it worked as advertised.
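That two-way mapping can be sketched with a pair of lookup tables. TTYL and LOL come from the article; everything else here is an invented stand-in for the real (and much larger) phrase list:

```python
# Hypothetical sketch of two-way abbreviation handling: expand when reading
# aloud, collapse when transcribing dictation. Not the actual Tellme lexicon.
ABBREVIATIONS = {
    "TTYL": "talk to you later",
    "LOL": "laugh out loud",
}
PHRASES = {phrase: abbr for abbr, phrase in ABBREVIATIONS.items()}

def speak(message):
    """Expand known abbreviations before handing text to the TTS voice."""
    return " ".join(ABBREVIATIONS.get(w.upper(), w) for w in message.split())

def transcribe(spoken):
    """Collapse known key phrases back to abbreviations when dictating."""
    for phrase, abbr in PHRASES.items():
        spoken = spoken.replace(phrase, abbr)
    return spoken
```

So `speak("ok ttyl")` reads out the full phrase, and `transcribe("sure, talk to you later")` writes the message back as “sure, TTYL”.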

In addition to Voice-to-text, Alex walked me through several other Speech-related improvements on the way. In Mango, for example, Speech can be triggered even when the phone is locked by pressing and holding the Start button. You also have control over how and when text messages are read. By default, the phone reads messages aloud when connected to a Bluetooth headset or stereo (which is how Windows Phone knows to read my text messages in the car).

Speech Interface: Windows Phone Mango Preview [July 25, 2011]

While you may have already seen some of the new speech interface features in our Ultimate Windows Phone 7.5 Mango Preview, we thought it would be fun to give you an even more in-depth demonstration of what exactly you can do without having to touch or look at your phone. Windows Phone Mango’s updated speech interface is finally capable of performing all commands via a Bluetooth headset. This was not true with the original version of Windows Phone 7, which could only use a Bluetooth headset to make calls.

There are some great new accessibility-related Speech features coming in Mango—using voice to forward calls and set up a speed-dial list. When Alex showed me these, I was impressed. In one very cool example, he stored a number in a speed dial location and then dialed it, hands-free. Other things you can use Speech for in Windows Phone include:

  • Making a phone call by name or nickname
  • Redialing a number
  • Calling voicemail
  • Searching Bing
  • Turning on the speakerphone
  • Starting an app while in a call
  • Navigating Maps

All these features put together make voice an incredibly integrated part of Windows Phone in Mango, and I think it will set the bar for voice-recognition technology in a smartphone. To finish the story I started this post with, I told my wife that if she wanted that voice feature in her car she’d have to get a Windows Phone because her smartphone doesn’t do that.

“OK, fine with me,” she said.

Now that was something really worth hearing.
——————————————————–

Windows Phone around the world: Language support in Mango [July 6, 2011]

  • Voice-to-text and Voice-to-dial are available in 6 countries: France, Germany, Italy, Spain, the United Kingdom, and the United States.
  • Voice search is supported in 4 countries: France, Germany, United Kingdom, and the United States.

Compare this to the previous speech capability:
How To Use The Speech Feature | Windows Phone 7 [Oct 11, 2010]

Use voice commands to make a phone call (Call), search the web (Find), or open a mobile app (Open) on your Windows Phone 7.

Learn more about using the Speech Feature: Use Speech on my phone [Oct 9, 2010]

Speech Recognition Integration in Windows Phone 7 [July 24, 2010]

Microsoft Tellme, Microsoft’s voice recognition service, is deeply integrated into the upcoming release of Windows Phone 7. Once activated, you can direct the phone to call a friend, search for local businesses and launch applications, all using simple voice commands (“Call john,” “open app name,” etc.). The speech recognition feature takes advantage of the largest cloud-based speech platform in the industry for near real-time recognition and enhanced accuracy. Tellme was also recently added to the new Ford Fiesta vehicles with SYNC. (Source: Channel10, Microsoft PressPass)

OTHER SYSTEMS

TVs (via Kinect and Xbox)

E3: Xbox Kinect Voice Control [June 6, 2011]

Microsoft E3 2011 – Kinect Voice Control Dashboard [MS press conference, June 6, 2011]

Tellme and the Voice of Kinect [Aug 1, 2011]

There’s a great article on Microsoft News Center today that provides some interesting context around the development of the Kinect.

Back in the early 2000s, Bill Gates and other Microsoft execs had been talking a lot about enabling a connected media center for people’s homes, alluding at that time mostly to the Media Center PC. The problem? The traditional PC interface wasn’t widely accepted by people in their living rooms. Perhaps people didn’t want a keyboard on their coffee table.

At a certain point, the Xbox team realized they had a foothold in the living room like no other device, so their product was the natural one to bring Bill’s vision to reality. They built an entirely new kind of interface so people could access their entertainment in a more natural way – the result of that work was the Kinect, a big part of which is the audio or speech capability.

What’s interesting is the collaboration involved to create a device “that feels like Star Trek.” The underlying technologies powering Kinect’s speech interface had actually been in development at Microsoft for a long time, but no one had put them together in such a seamless way.

The Xbox team worked with one of the company’s senior researchers, Ivan Tashev, to “purify” the audio signal and allow our speech-recognition platform to do its job despite the often spacious and noisy characteristics of many people’s living rooms. As Tashev says in the article, “Basically in Kinect I have technologies that are a summary of the research I did for seven years.” You can read more about Tashev’s contribution to the project in this related article published by Microsoft Research and in a post here on Next.

The speech-recognition technology used in Kinect is provided by Microsoft Tellme, a flexible speech service also used to power the speech experiences in Windows Phone, Bing Mobile and other key Microsoft products. Microsoft acquired Tellme in 2007 to add to the company’s already robust research efforts in speech recognition and to gain valuable expertise in running cloud-based speech services.

These guys make it sound easy, but applying chalkboard-sized algorithms to cancel out random noises in a microphone audio stream is an epic challenge, and just one of many the team had to overcome in building the first Kinect. This is kind of the technology version of a quest story, like Jason and the Argonauts. You have heroes like Microsoft Tellme and Tashev overcoming villains like ambient conversations and echoes. Fortune steps in, in the form of the keyword “Xbox,” which ends up being a unique phonetic construction and thus the perfect choice for an aural “push to talk” button. In the end, three separate technology threads have been woven together in a way that advances the entire industry.

This is what I love about technology — it may be geeky, but it’s never dull.

Listening to Kinect [Apr 25, 2011]

I’ve been telling anyone who’ll listen recently that Natural User Interfaces are more than just touch, gesture and speech – though Kinect, perhaps the hottest NUI tech around, does two of these exceedingly well. Much of the focus of tinkering with Kinect has been with gesture, using the skeletal tracking capability. The speech capability of Kinect has had less focus, and a recent post by Rob Knies on the Microsoft Research site reminded me that it was perhaps time to give speech the spotlight for a moment.

The story starts with Ivan Tashev, who has spent the majority of his career in Microsoft Research, always focused on sound. He knew someday we’d be talking to computers, but didn’t quite know when the call would come. A few years ago, Alex Kipman from our Xbox team was looking for an audio capability that could be listening 100% of the time and didn’t rely on a button being pressed to signal “listening mode”. Added to this, Alex was looking for a system that could detect distinct voices in a noisy environment… oh, and do this at 4 meters. Regular readers will know that Alex was the driving force behind Kinect. Ivan’s call had arrived.

He figured most of the above was possible, but one big challenge remained. Stereo acoustic echo cancellation is a longstanding research problem, and it would be required to filter out the loudspeaker sound and zero in on the users talking to the system. It turned out the team needed an acoustic echo canceller roughly 10 times better than typical industrial devices offer.

Voice modality preferred

In his MIX talk, Ivan talks about preferred modalities for input – noting that the combination of speech and gesture can deliver a powerful multimodal interface. You issue one command with speech (for example a search) and select from a short result set with gesture.

Many months of work ensued on the development of the audio pipeline, with our Tellme group involved in building a solution that many thought was impossible. As Alex has reminded me on a few occasions, the development of Kinect is a story of making improbable (perhaps even impossible) things possible. Ivan says that Microsoft didn’t get into this position by accident — testament to many years of investment in something when we didn’t quite know what it would be used for. That’s the risk, and reward, of basic research, and something I’m personally proud that Microsoft continues to invest in.


34 days before Kinect shipped to the public, the audio work was complete. That’s some high risk, high reward timing!

The story doesn’t end there. Very soon, the Kinect for Windows SDK beta will include the ability to take advantage of the four-element microphone array with the acoustic noise and echo cancellation that Ivan and the team developed. Right at the end of the talk, Ivan gives some insight into the future capability of Kinect audio.

I’m looking forward to seeing what the tinkerers do with this audio wizardry.

‘Xbox: Play.’ — Microsoft Tellme and the Voice of Kinect for Xbox 360 [Microsoft feature story, Aug 1, 2011]

For years leading up to the launch of Kinect for Xbox 360, Microsoft was blending technologies for the connected living room, working toward its vision of a natural, powerful center for home entertainment.

At the same time, millions of people around the world had invited the newest iteration of video game consoles into their homes — the Xbox 360 video game and entertainment system, which was capable of handling games, movies, TV, music and photos — and it opened a world of Internet-connected possibilities.

“Bill Gates spoke about Microsoft’s strategy for the living room, with an intelligent entertainment center to enable amazing experiences,” says Thomas Soemo, principal program manager lead for the Xbox platform at Microsoft. “We knew that the Xbox 360 system was going to be a prime component of this vision.”

The challenge was that no one had ever really found an interface that worked well in the living room. Other industry attempts featured a keyboard to input commands on screen, which never resonated with consumers. The Xbox 360 Controller was great for games but limited for searching media — and unfamiliar territory for nongamers. There had to be a better way to interact.

“How do we solve this problem?” Soemo says. “How do we enable a very natural form of interaction with this device that also fits the social atmosphere of the living room? How do we achieve what feels like Star Trek? That’s the challenge we took on.”

With that challenge in front of them, the Xbox team set out to create the next-generation human-machine interface, capable of understanding requests and commands the way humans do — through speech and gesture. The resulting product, Kinect, has brought speech service beyond the telephone voice prompt and into millions of homes worldwide.

“We are witnessing the beginning of a revolution today,” Soemo says. “Speech recognition is entering the mainstream and redefining how people find, consume and interact with their media content on the Xbox 360.”

The Living Room Challenge

In creating the Kinect, one of the biggest engineering challenges was the living room itself. Living rooms tend to be larger rooms, leading to an unprecedented design requirement for the Xbox team — the Kinect’s microphone array would need to work seamlessly up to four meters away from the couch, much farther than other speech-recognition systems in the industry could comfortably handle.

Another complication was the fact that living rooms are social gathering places and are often filled with ambient noise, such as conversations, movie soundtracks and music.

“Imagine if everything you said could be interpreted by the Xbox 360 as a command,” says Keith Herold, a senior program manager lead with Microsoft Tellme, the company’s speech-recognition service that also powers Windows Phone 7 devices and appears in an array of other products. “That’s the big problem in the living room — how do we get the device to ignore everything but actual commands?”

To solve this, the Xbox team reached out to Ivan Tashev, a Microsoft Research principal software architectwith more than a dozen patents related to helping machines capture and interpret sound.

Tashev had been prototyping technologies for speech enhancement, audio processing, microphone arrays and echo cancellation. For the Xbox 360 system, he went to work purifying the audio signal so the Kinect could understand what it was being told. He used his expertise in echo cancellation to subdue everything coming out of the console — soundtracks, movie dialogue, game audio — as well as room noise the microphone would pick up. This was an immensely challenging problem based on advanced mathematics, but Tashev relished the task.

“Basically, in Kinect I have technologies that are a summary of the research I did for seven years,” he says. “We know what’s coming out of the console — it’s a constantly shifting, dynamic signal. The trick was to remove that outbound signal from the incoming signal. And to do it in real time.”
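The core of that trick can be sketched with a normalized least-mean-squares (NLMS) adaptive filter, a standard textbook approach to echo cancellation. This is purely an illustrative toy, not Kinect’s actual pipeline; the filter length and step size below are arbitrary:

```python
import numpy as np

# Toy NLMS echo canceller: estimate the echo path from the known console
# output ("far") and subtract the predicted echo from the mic signal.
def nlms_echo_cancel(far, mic, taps=32, mu=0.5, eps=1e-8):
    w = np.zeros(taps)            # adaptive filter estimating the echo path
    buf = np.zeros(taps)          # most recent far-end samples
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        echo_est = w @ buf        # predicted echo at the microphone
        e = mic[n] - echo_est     # residual: near-end speech plus error
        out[n] = e
        w += mu * e * buf / (buf @ buf + eps)   # NLMS weight update
    return out
```

On a simulated echo (the far-end signal scaled and delayed), the residual collapses toward zero once the filter converges, leaving only whatever near-end speech remains; Tashev’s real-time, constantly shifting version of this problem is vastly harder.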

Another challenge was to help the Kinect determine who is talking, focus on that source and ignore everything else. To solve this, Tashev used “beamforming” technology, which spotlights the person giving commands to the system.

“If there are four people in the room and one is talking, the spotlight goes to him or her, and if that person says ‘Xbox,’ then we start listening,” Tashev says.

In the end, the Kinect’s audio enhancement chain consists of six major stages that consecutively improve the quality of the speech signal, removing clutter, noise and reverberation from the room to help the speech recognizer do its job.
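A minimal illustration of the beamforming “spotlight” is a delay-and-sum beamformer: delay each microphone’s signal so the chosen direction lines up, then average, which reinforces the speaker and averages down uncorrelated noise. The delays and array geometry below are invented; Kinect’s real six-stage chain is far more sophisticated:

```python
import numpy as np

# Toy delay-and-sum beamformer for a small microphone array. Integer sample
# delays are assumed; a real system would use fractional delays per geometry.
def delay_and_sum(channels, delays):
    """Advance each mic channel by its steering delay, then average."""
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

With the steering delays matched to the speaker’s direction, the recombined output reproduces the source signal while off-axis sounds, arriving with different delays, partially cancel.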

Making the Natural Interface Natural

With the audio pipeline in place, the next step was to integrate that signal with the Microsoft Tellme speech service. For this phase of the project, the Xbox team turned to Herold’s team to bring Microsoft Tellme to the Xbox 360 platform.

[Image: Microsoft’s second-generation Kinect with voice recognition at E3, 2011]

The living room presents unique challenges for voice technology. Kinect’s microphone array needs to work seamlessly up to four meters away from the couch and contend with ambient noise such as conversations, movie soundtracks and music.

“Our job was to take the remaining audio, now at this point just a player’s commands, and do something rational with it,” says Herold. “This project required us to step up and push our boundaries well past telephony voice response and desktop speech, into a much more human environment. We needed to put ourselves in the mindset of the living room environment and all of the interactions that are possible there. We wanted to change the way people thought of speech technology.”

Adding to the challenge was the Xbox team’s allowable error rate, which seemed impossibly low for a system with so many variables.

“We never want a command to trigger random actions on the console,” Herold says. “The idea of ‘never’ is not achievable of course, but we picked a suitably small number for never.”

The solution to this problem was the software equivalent of a concept first developed for backpack-sized walkie-talkies in the 1940s — the transmit, or “push-to-talk,” button. This was embodied as the keyword “Xbox.”

“When you say ‘Xbox,’ the system knows you’re talking to it and what’s coming next is a command. If you don’t say it first, you haven’t pushed the virtual ‘push-to-talk’ button, and the system won’t listen,” Herold says.
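The keyword gate can be sketched in a few lines. The wake word comes from the article, but the command set and parsing here are invented:

```python
# Hypothetical "virtual push-to-talk" gate: nothing counts as a command
# unless the wake word comes first. Commands are illustrative only.
WAKE_WORD = "xbox"
COMMANDS = {"play", "pause", "bing"}

def interpret(utterance):
    """Return a command to run, or None when the system should stay quiet."""
    words = [w.strip(":,.").lower() for w in utterance.split()]
    if not words or words[0] != WAKE_WORD:
        return None                # wake word absent: ignore ambient speech
    if len(words) > 1 and words[1] in COMMANDS:
        return words[1]
    return None
```

So “Xbox: play” triggers playback, while the same word dropped into ordinary living-room conversation (“let’s play some music”) is ignored, which is the whole point of the keyword.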

Since the Kinect supports both speech and gestures, the combined Xbox and Microsoft Tellme team spent considerable time determining how to enable both forms of interaction in a way that was complementary and intuitive. Their guiding principle was the concept of the Natural User Interface (NUI), in which people communicate with machines in the most human way possible.

For example, speech might be the best modality to search through thousands of songs, since gesturing to scroll through such a vast list could be tedious. Telling the machine, “Xbox: Bing, The Beatles” allows the user to get what they want in the most natural way possible from the vast collection of content available through Xbox LIVE.

Once the list is narrowed, using gesture to select a specific song may be the most natural interaction. Graphics, text and sounds on screen help cue users to make the interface more intuitive and easy to use.

According to Herold, this is the strength of “multimodal” interfaces, which combine speech with touch, gesture or other forms of input: Each modality is used where it is stronger, and the combination becomes much more powerful.

Advancing the Platform

For the first iteration of the device, the Xbox team prioritized the commands that would resonate most with people in their living rooms. They decided that simple navigation functions and media playback controls— “Xbox: play. Xbox: pause.” — gave people something valuable, while also demonstrating the system’s potential.

“When you’re building a new product on new technology, you can try and do everything and it may work most of the time, or you can stay laser focused on the key scenarios and make them amazing,” Soemo says. “The first release of Kinect was about shipping a product that handled those key speech experiences extremely well.”

From the start, however, the team was thinking long term. When the Xbox team announced the next round of Kinect functionality at the recent E3 conference in June 2011, it was the next step in a vision that began years ago.

“For the launch of Kinect, we leapt over some major technology hurdles on our way to ‘Xbox: play.’ and ‘Xbox: pause.’,” Soemo says. “Nobody had ever done highly accurate speech recognition from up to four meters away, without a physical ‘push-to-talk’ button, in an environment filled with ambient noise, all while playing in 5.1 surround sound. Because of the collaboration among the Xbox, Microsoft Research and Microsoft Tellme teams, we were able to take science fiction and make it science fact.”

Soemo says the functionality announced at E3 is just the second iteration in the journey toward the Xbox 360 system becoming the entertainment hub for the home — redefining how people discover and use the range of media content available on Xbox LIVE and making the remote a thing of the past.

“We are laying a foundation that will transform how people interact with devices,” Soemo says. “We are at that cusp. With Kinect, we’ve put speech into the living room. Now, Microsoft will continue to push the boundaries of NUIs to enable seamless experiences that span devices and platforms.”

With that foundation in place, the Kinect’s latest functionality goes well beyond simple navigation and allows people to use voice commands to traverse very large media catalogs with ease, and the team doesn’t plan to stop there.

“What are the most amazing experiences with speech we can imagine?” Herold says. “Can we create technology that is as natural as talking to a friend? This is where we want to go, and it’s happening in front of our eyes.”

No keyboard necessary.

Autos

Ford SYNC

The driver’s seat just became a lot more powerful – now, get information on the go simply by asking. With Ford SYNC, drivers can ask for traffic reports, directions, local businesses, weather, sports scores, movies and more without taking their eyes off the road to look at a screen. Say the name of a business and Ford SYNC will give you turn-by-turn directions. Say “Home,” and Ford SYNC directs you back home.

Using Microsoft Tellme cloud-powered speech services, Ford SYNC connects you with the world outside your car.

Ford and Microsoft SYNC Up in Europe [Microsoft feature story, Feb 28, 2011]

Ford launches SYNC powered by Microsoft at CeBIT 2011.

More than three years ago Ford introduced SYNC, its award-winning connectivity technology built on the Windows Embedded Automotive platform to deliver rich, interactive experiences for drivers. Initially available only in North America, SYNC quickly became one of the industry’s most advanced voice-controlled connectivity and infotainment systems. At the end of 2010, Ford celebrated the installation of SYNC in more than 3 million vehicles.

This week at CeBIT, Ford President and CEO Alan Mulally will take to the stage to unveil the company’s global plans for SYNC. He will announce that, next year, European drivers will be able to benefit from a smarter, intuitive and simplified way of interacting with in-car technologies and their digital devices. The system will debut in the new Ford Focus next year with the goal of being in more than 2 million vehicles in the region by 2015.

Microsoft and Ford have spent more than five years building innovative functionality into SYNC. Since Bill Gates first announced the partnership at CES in 2007, both companies have continued to work together closely to develop new experiences to surprise and delight Ford customers. This includes the addition of the MyFord Touch interface, the Microsoft Tellme voice-activated app for SYNC Traffic, Directions and Information (TDI) Services, and other new features.

MyFord Touch powered by SYNC makes it easier to make phone calls, listen to music and get directions while in the car, while the voice-activated TDI system powered by Microsoft Tellme expands Ford SYNC’s cloud-based voice-command capabilities.

[Image: Ford SYNC automotive infotainment system, February 2011]

“We are pleased to announce that SYNC will soon be available to customers around the world,” Mulally said. “It is a smarter, safer and simpler way to connect drivers with in-car technologies and their digital lives. At Ford, we have always believed that the intelligent application of technology can help us deliver the very best customer experience and help us contribute to a better world, so we challenged ourselves to build technologically advanced cars that make driving greener, safer and smarter for all.”

MyFord Touch and the latest features of Ford SYNC demonstrate the flexibility of the Windows Embedded Automotive platform to offer Ford and third-party developers the opportunity to develop new and innovative features, such as mobile applications, an open API and Wi-Fi capability, while supporting the latest must-have consumer devices that are brought into the car.

Besides being able to play music the old-fashioned way — through CDs — users can also listen to all their favorite tracks via their smartphone, MP3 player and USB flash drives. They will also benefit from Internet “on the go” with SYNC’s Wi-Fi “hot spot” capability via a USB dongle or smartphone tether. Drivers are able to manage everything including climate control, mobile phone calls, satellite navigation and radio adjustments through voice control or an 8-inch, touch-screen LCD color display. They can even have e-mail messages read aloud and compose text message responses through voice command while on the move.

Some of the other features users will benefit from when SYNC launches in Europe include a voice-control system able to recognize 10,000 commands in each of 19 different languages.

Ford SYNC to be More Multilingual as Vocabulary Expands to Industry-Leading 19 Languages [Feb 27, 2011]

  • Ford SYNC® to expand its vocabulary from three to 19 languages, as Ford announces global rollout of the in-vehicle connectivity technology
  • New languages will be available first in Europe in 2012 with introduction in the Ford Focus
  • SYNC language expansion sets an industry benchmark for automotive voice recognition capability

Ford is expanding the reach of Ford SYNC globally with the European launch of its popular voice-controlled connectivity system, with the capability of now offering 19 languages.

SYNC was originally launched in North America in 2007 with three languages. With the additional 16 vernaculars, Ford will offer voice recognition capability, powered by Nuance Communications, in more languages than any other automaker offering voice control.

The expansion brings the convenience of SYNC to a much larger audience of potential customers, said Ford President and CEO Alan Mulally, who kicked off the global launch of SYNC this week at the 2011 CeBIT technology show in Hanover, Germany.

“We are pleased to announce that SYNC will soon be available to customers around the world,” Mulally said. “It is a smart and simple way to connect drivers with in-car technologies and their digital lives.”

Teaching a car to speak
At the heart of SYNC is the speech engine, and Ford is working with its speech technology partner, Nuance Communications, to deliver a similar experience across the multiple languages.

Ford leverages significant investments made by Nuance to support the broad dialect coverage required in larger regions such as the United States. Additionally, regions such as Europe present unique challenges, in part because of the proximity of different countries and the resulting need for multilingual solutions.

For the customer, that means SYNC can recognize 10,000 voice commands in any one of the available 19 languages, and can cope with variances in accents, vocabulary and local dialects.

If a German customer, for example, is driving in Italy, the system can provide directions in German but will use the correct Italian pronunciation for street names.

Within each international market, a unique set of abbreviations for text messaging also has been identified. For example, “cvd,” short for “Ci vediamo dopo” (“See you later” in Italian), was added so SYNC can read the full phrase aloud.

“We had to make sure the system would behave as people expect in different countries and different cultures,” said Mark Porter, supervisor, SYNC Product Development. “That means we had to solicit local, native-speaking input for common abbreviations used in SMS messages as well as support different units of distance and date formats.”
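The locale-specific abbreviation handling described above can be sketched as a lookup table keyed by market, applied before text-to-speech readback. This is purely illustrative; the table entries and function names are assumptions, not Ford's or Nuance's actual implementation.

```python
# Hypothetical sketch: expand locale-specific SMS abbreviations so the
# text-to-speech engine reads full phrases aloud. Table contents and
# function names are illustrative assumptions.
ABBREVIATIONS = {
    "it-IT": {"cvd": "Ci vediamo dopo"},          # "See you later"
    "en-US": {"brb": "be right back", "lol": "laughing out loud"},
}

def expand_for_readback(message: str, locale: str) -> str:
    """Replace known abbreviations for the given locale, word by word."""
    table = ABBREVIATIONS.get(locale, {})
    return " ".join(table.get(word.lower(), word) for word in message.split())

print(expand_for_readback("cvd a presto", "it-IT"))  # Ci vediamo dopo a presto
```

Unknown words pass through unchanged, so a message mixing abbreviations and ordinary text is still read sensibly.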

Song titles and artist names posed further challenges. A German owner, for instance, may have songs by artists of German, American, Spanish and other nationalities on an MP3 player. Due to phonetic differences between the languages, the system must be able to recognize a name whether it’s pronounced in German or deep southern American English.

“The in-car experience needs to be global in nature, supporting a variety of languages to ensure all commands, addresses and song titles are recognized, whether you’re from Germany, Portugal or France. Localization should not equal limitations,” said Arnd Weil, vice president, Nuance Automotive. “Working closely with Ford, we’ve customized the SYNC experience across multiple languages to ensure drivers in all regions experience the simplicity and convenience that in-car voice technology has to offer.”

With the language expansion, SYNC with MyFord Touch will be available in:

  • U.S. English
  • U.K. English
  • Australian English
  • European French
  • Canadian French
  • European Spanish
  • U.S. Spanish
  • European Portuguese
  • Brazilian Portuguese
  • German
  • Italian
  • Dutch
  • Russian
  • Turkish
  • Arabic
  • Korean
  • Japanese
  • Mandarin Chinese
  • Taiwanese Mandarin (supported through Mandarin Chinese)

Software, rather than hardware, solutions
As with many SYNC advancements over the years, the expanded language capabilities leverage the system’s flexible, software-based platform for a cost-effective and efficient solution.

Using a single, common hardware module equipped with Wi-Fi®, SYNC can be easily configured for language on the assembly line. An on-the-line server connects with the SYNC module wirelessly, determines the appropriate software installation – including language – and downloads the information to the vehicle.

Using a common module and Wi-Fi installation avoids the logistics of stocking unique modules with every possible combination of language and capability offered by SYNC. In fact, Ford would have had to produce more than 90 different hardware modules to accommodate all of the different languages installed at assembly plants around the world.
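The common-module approach amounts to the line-side server selecting one software image per vehicle from a catalog keyed by language and capability, rather than stocking 90+ distinct hardware modules. A minimal sketch of that selection logic, with all names, fields and image-naming conventions assumed for illustration:

```python
# Illustrative sketch of assembly-line provisioning: one common hardware
# module, with the language/capability software image chosen per vehicle
# and downloaded over Wi-Fi. All names here are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class VehicleOrder:
    vin: str
    market: str          # destination market, e.g. "DE" or "US"
    has_navigation: bool

# One image per (language, capability) combination lives on the server.
MARKET_LANGUAGE = {"DE": "German", "US": "U.S. English", "FR": "European French"}

def select_image(order: VehicleOrder) -> str:
    """Pick the software image to push to this vehicle's SYNC module."""
    language = MARKET_LANGUAGE[order.market]
    capability = "nav" if order.has_navigation else "base"
    return f"sync-{language.replace(' ', '-').lower()}-{capability}.img"

print(select_image(VehicleOrder("VIN123", "DE", True)))  # sync-german-nav.img
```

The point of the design is that the combinatorics live in software on the server, not in physical module inventory.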

Voice poised to become primary in-car communication interface
With independent research firms such as Datamonitor predicting that advanced speech recognition in the mobile world will triple by 2014 with similar growth for speech recognition in vehicles, Ford is ahead of the curve with the SYNC global language expansion plan.

“Ford is committed to making voice recognition the primary user interface inside the car throughout the world, helping all drivers keep their eyes on the road and hands on the wheel,” said Jim Buczkowski, a Henry Ford Technical Fellow and director of Electrical and Electronics Systems for Ford Research and Advanced Engineering. “This expansion of SYNC language capabilities is a huge step forward in bringing voice technology to every market Ford serves.”

The Ford Focus will be the first vehicle to launch with SYNC in Europe in 2012.

2012 Ford Focus – MyFord Touch voice command tour [Feb 25, 2011]

Dominic Colella, Ford Sync Systems Engineer, gives us a tour of MyFord Touch in the 2012 Ford Focus.

FORD AND NUANCE ADVANCE VOICE RECOGNITION OF SYNC: NOW FASTER, FRIENDLIER, MORE PERSONAL [July 15, 2010]

  • With the introduction of MyFord Touch™ driver connect technology, Ford makes it easier to control in-car systems with fewer steps and more natural language; customers can now speak more than 10,000 first-level commands, up from only 100 in first-generation SYNC®
  • Working with voice control leader Nuance, SYNC will recognize more direct voice commands such as “Call John Smith,” “Find ice cream” and “Add a phone,” allowing users to do more with fewer steps
  • Innovative features boost recognition accuracy and provide “Samantha,” the voice of SYNC, with smoother, more natural speech patterns
  • Consumer acceptance of voice control is increasing; the Harris Interactive® 2010 AutoTECHCAST survey found an 8 point year-over-year improvement, and industry analysts predict continued segment growth

Ford SYNC Voice Recognition (PDF)

Video:
MyFord Touch – Faster, Friendlier Voice Recognition Control

DEARBORN, Mich., July 15, 2010 – Ford made in-car voice activation a reality for millions of drivers with SYNC, first introduced in 2007. Now, Ford engineers – working with voice technology pioneers Nuance Communications (NASDAQ: NUAN) – plan to once again raise the bar with the next generation of SYNC, a system that can understand 100 times more commands than the original, thus delivering a more conversational experience between car and driver.

The voice upgrades will be available on the next generation of SYNC powering the new driver connect technology, MyFord Touch, launching this year on the new 2011 Ford Edge. The system will make it easier for drivers to use voice control and get what they want more quickly using more natural phrases.

“Ford is committed to making voice recognition the primary user interface inside of the car because it allows drivers to keep their eyes on the road and hands on the wheel,” said Jim Buczkowski, director of Ford electronics and electrical systems engineering. “The improvements we’ve made will make it easier for drivers to use and interact with it, even those customers that have never used voice recognition before.”

Improved vocabulary
At the heart of SYNC is the speech engine, and Ford is working with speech technology leader Nuance to create and integrate a vast library of possible driver requests. This library will enable the SYNC speech engine to listen for and respond to more voice commands directly, recognize different words that mean the same thing (aliases), and integrate a vast number of point-of-interest (POI) names and business types into its navigation system.

“With this latest generation of SYNC, users can control the system without having to learn nearly as many commands or navigate as many menus,” said Brigitte Richardson, Ford global voice control technology and speech systems lead engineer. “As we’ve gained processing power and learned more about how drivers use the system, we’ve been able to refine the interface. Customers can do more and say more from the top-level menu, helping them accomplish their tasks more quickly and efficiently.”

Examples of some improvements to SYNC powering MyFord Touch-equipped vehicles include:
More direct, first-level commands

  • “Call John Smith” dials the phone number associated with John in a connected phone’s phonebook directly – the user isn’t required to say “Phone” first
  • Direct commands related to destinations, like “Find a shoe store” or “Find a hotel,” place users in the navigation system menu where they will be walked through the POI search process
  • The command, “Add a phone,” will enter the phone pairing menu and walk users through the connection process – users don’t have to enter a phone submenu to initiate the pairing process

Quicker, easier entry and search

  • Navigation entries can be spoken as a single one-shot command; for example, “One American Road, Dearborn,” instead of requiring individual city, street and building number entries
  • Brand names are recognized by the navigation POI menu, allowing drivers to look for chain restaurants, shoe stores, department stores and more, as well as regional and local favorites
  • Direct tuning of radio stations by simply saying “AM 1270” or “FM 101.1,” or using SIRIUS station names or numbers such as “21” or “Alt-Nation”
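Direct tuning from a single utterance such as “AM 1270” or “FM 101.1” amounts to matching a band plus a frequency in one pass. A toy illustration of that parse, not the actual SYNC grammar:

```python
import re

# Toy sketch of one-shot radio tuning: a single utterance carries both
# the band and the frequency. Pattern and names are assumptions.
TUNE_PATTERN = re.compile(r"^(AM|FM)\s+(\d+(?:\.\d+)?)$", re.IGNORECASE)

def parse_tune_command(utterance: str):
    """Return (band, frequency) for a direct-tune utterance, else None."""
    m = TUNE_PATTERN.match(utterance.strip())
    if not m:
        return None
    return m.group(1).upper(), float(m.group(2))

print(parse_tune_command("FM 101.1"))  # ('FM', 101.1)
```

Because the whole request fits one utterance, no band-then-frequency submenu dialog is needed.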

Use of aliases

  • Within the climate menu, users can voice-request the same function using several different phrases, such as “Warmer,” “Increase temperature” or “Temperature up” – helping reduce the need for drivers to learn specific commands
  • When requesting a specific song from an MP3 player, users can now say “Play song [title]” in addition to saying “Play track [title]”
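The alias mechanism above can be thought of as normalizing several spoken phrases to one canonical command before dispatch. A minimal sketch, with hypothetical command identifiers:

```python
# Sketch of command aliasing: many phrasings map to one canonical
# action before the system dispatches it. Identifiers are hypothetical.
CANONICAL = {
    "warmer": "TEMP_UP",
    "increase temperature": "TEMP_UP",
    "temperature up": "TEMP_UP",
    "cooler": "TEMP_DOWN",
    "decrease temperature": "TEMP_DOWN",
}

def normalize(utterance: str):
    """Map a spoken phrase to its canonical command, or None if unknown."""
    return CANONICAL.get(utterance.strip().lower())

print(normalize("Increase temperature"))  # TEMP_UP
```

Adding a new phrasing is then a one-line table change rather than a new code path, which is why aliases reduce the commands a driver must memorize.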

Personalized access

  • If an occupant’s USB-connected device, such as an MP3 player, has been named, users can simply say the device name, such as “John Smith’s iPod,” rather than the less personal “USB” command

More friendly and adaptable
Ford voice engineers refined SYNC beginning with the two features customers interact with first: the voice recognition system and Samantha, the digital voice behind system commands.

To help SYNC react to driver commands more quickly and accurately, the team integrated Nuance’s Unsupervised Speaker Adaptation (USA) technology. USA learns the voice of a driver within the first three voice commands, quickly creating a user profile and adapting to tone, inflection and even dialect for a 50 percent improvement in recognition performance. USA then continues to learn during that same trip, even picking out another user and creating a second profile if the voice is markedly different. Currently SYNC can actively adapt to voices in English, French-Canadian and Mexican-Spanish – with more languages on tap.
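Conceptually, unsupervised adaptation of this kind maintains a per-speaker profile that is refreshed with each utterance, and spawns a second profile when a voice differs markedly. The sketch below illustrates only that general idea with toy feature vectors; it is not Nuance's USA algorithm:

```python
# Conceptual sketch of unsupervised speaker adaptation: keep a running
# per-speaker acoustic profile, refresh it each utterance, and create a
# new profile for a markedly different voice. Toy numbers throughout.

def update_profile(profile, features, weight=0.2):
    """Exponentially weighted update of a speaker's feature vector."""
    return [(1 - weight) * p + weight * f for p, f in zip(profile, features)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

profiles = []

def assign_speaker(features, threshold=3.0):
    """Match an utterance to the nearest profile, adapting it, or start a new one."""
    for i, p in enumerate(profiles):
        if distance(p, features) < threshold:
            profiles[i] = update_profile(p, features)
            return i
    profiles.append(list(features))
    return len(profiles) - 1
```

With each matched utterance the profile drifts toward the speaker's current voice, which is the intuition behind the system still recognizing an owner with a cold.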

“The power of the SYNC voice control system is its ability to understand and respond to more natural language commands – and the advanced adaptability of the speech recognition technology enables the system to train itself with each successive use,” said Michael Thompson, senior vice president and general manager, Nuance Mobile. “The adaptability of SYNC is pretty remarkable – a feature functionality Nuance and Ford worked hard to develop to ensure seamless customer interaction with the system every time it starts up. So even if the car owner has a cold or someone borrows the car, SYNC will adapt to the changed voice and process spoken commands without missing a beat.”

Initial interactions also involve Samantha, the “voice” of SYNC. In an attempt to help Samantha sound less computerized, Ford boosted the size of her speech profile approximately fivefold. The additional speech units will help Samantha speak in a smoother, more human voice as she helps vehicle occupants accomplish their in-car tasks such as making phone calls, playing songs from a connected digital device and getting directions.

Voice poised to become primary in-car communication interface
With smart phones expected to replace desktop and laptop PCs as the primary web access point by 2015, some industry analysts believe voice control will replace touch devices like keyboards and screens as the primary method of search. Dr. Philip E. Hendrix, founder and director of immr and analyst with GigaOM Pro, says that a majority of smart phones will have an optimized voice user interface by the end of 2012.

Research trends show strong consumer acceptance of voice recognition technology. The Harris Interactive 2010 AutoTECHCAST study found that 35 percent of drivers say they would be likely to adopt voice-activated controls or features in their vehicle, up from just over one-quarter (27 percent) in 2009. In recent Ford-conducted market research of SYNC owners, more than 60 percent reported they use the voice controls while driving.

Datamonitor, an independent research firm, predicts that the global market for advanced speech recognition in the mobile world will triple from 2009 to 2014. Market growth of speech recognition in vehicles is expected to grow at a similar rate, from $64.3 million in 2009 to $208.2 million in 2014.

Voice commands may reduce distracted driving
Ford knows that customers are increasingly using mobile electronics while driving, and studies show hands-free, voice-activated systems such as Ford SYNC offer significant safety benefits versus hand-held devices.

According to a 100-car study conducted by Virginia Tech Transportation Institute, driver inattention that may involve looking away from the road for more than a few seconds is a factor in nearly 80 percent of accidents. The improvements to SYNC should help drivers accomplish tasks hands-free using natural speech patterns and fewer commands, enabling them to focus on the task of driving.

Ford SYNC Voice Recognition [July 13, 2010]


FACT SHEET: FORD SYNC® VOICE-CONTROLLED COMMUNICATIONS & CONNECTIVITY SYSTEM [Sept 21, 2010]

Overview
Ford SYNC®, co-developed with Microsoft and using Nuance Communications voice recognition technology, allows customers to bring digital media players and Bluetooth®-enabled mobile phones into their vehicles and operate the devices via voice commands or with the steering wheel’s redundant audio controls. SYNC is an agnostic software platform that connects with the vast majority of makes and models of Bluetooth-enabled cell and smart phones from all network service providers, plus digital music players and USB memory sticks.

Facts

  • Launched in fall of 2007, first on the 2008 Focus, the most affordable Ford car at the time
  • SYNC has since been installed on more than 2.5 million cars, trucks and crossovers
  • SYNC will launch globally, in Europe and Asia-Pacific, in 2011 with the introduction of the new 2012 Focus
  • SYNC voice recognition available in U.S. English, Canadian-French, and North American Spanish (expanding to 21 languages next year)
  • In general, SYNC is installed on 70 percent of all Ford vehicles sold. More specifically, among 2010 models, it was selected by 81 percent of F-150 buyers, 85 percent of Fusion buyers and nearly 90 percent of Edge buyers
  • Ford market research results:
    • Post SYNC demonstration, non-Ford owners show a 3-fold increase in willingness to consider Ford
    • Of SYNC owners:
      • 32% see SYNC as having played an important or critical role in their purchase decision.
      • 60% of owners use the voice commands
      • 62% are completely satisfied, with 80% of heavy users completely satisfied
      • 77% would recommend; 92% of heavy users would recommend

Availability

  • SYNC, where optional, costs $395, the same price as when it launched in 2007.
    • No subscription necessary
  • On most Ford products, SYNC is optional on mid-level trim series (SEL and XLT) and standard on high-end trim series (Limited and Sport).
    • SYNC is available on the following 2010 models: Focus, Fusion, Fusion Hybrid, Taurus, Mustang, Edge, Flex, Escape, Escape Hybrid, Explorer, Explorer Sport Trac, Expedition, F-Series, E-Series, Super Duty (plus new 2011 Fiesta and Edge)
  • SYNC is standard on Lincoln models including the 2010 MKZ, MKS, MKX, MKT, Navigator (plus new 2011 MKZ Hybrid and MKX)

Standard SYNC Features

  • Bluetooth connectivity for mobile phones – Voice-activated, hands-free calling including automatic phonebook transfer
  • USB port for digital media players (such as Apple iPod and Microsoft Zune) and USB mass storage devices – Voice-activated access to digital music files including MP3, AAC, WMA, and WAV.
  • Audible text message readback – Text-to-speech engine capable of reading aloud incoming text messages from compatible Bluetooth-paired phones
  • Bluetooth streaming audio (A2DP) – Digital content, including music, podcasts, and Internet radio broadcasts can be played through the vehicle’s audio system

Standard SYNC Applications (for 2010 models)

  • 911 Assist™
    • First launched for the 2009 model year (and available for 2008 models as dealer-installed upgrade)
    • Commands SYNC to use the Bluetooth-paired cell phone to make an automatic call directly to a local 911 emergency operator in an air bag-deploying incident
    • No subscription: Free capability for the life of the vehicle
    • Video: http://www.youtube.com/v/sI3ixk5kDBM
  • Vehicle Health Report
    • First launched for the 2009 model year (and available for 2008 models as dealer-installed upgrade)
    • Provides personalized report on command including vehicle diagnostics, scheduled maintenance, recall information, and dealership coupons
    • Information sent via data-over-voice technology using Bluetooth-paired phone and accessed through the www.syncmyride.com website
    • No subscription: Free capability for the life of the vehicle
    • Video: http://www.youtube.com/v/p-CaBKLltTA
  • Traffic, Directions & Information
    • Launched for 2010 model year along with the addition of a GPS receiver as standard SYNC hardware
    • Delivers voice-activated, on-demand turn-by-turn directions, business search, traffic reports, and personalized information
    • Information services include weather, news, stock quotes, movie listings, sports scores, horoscopes, and travel connections
    • Leverages Bluetooth-paired and registered cell phone (smart phone or data plan not required)
    • Free for the first three years of vehicle ownership; continued access only $60 per year

Coming Soon

  • AppLink
    • Industry-first capability providing drivers access and control of smart phone apps using voice commands and vehicle controls
    • First launches on 2011 Fiesta
      • Software will be available by end of 2010 for owners via download and installation directly from www.syncmyride.com
    • Compatible with Android™ and BlackBerry® smart phones (Apple® iPhone compatibility coming in mid-2011)
    • First SYNC-enabled smart phone apps: Pandora Internet radio, Stitcher podcast radio, and OpenBeak (a Twitter client)
    • Standard SYNC feature; no subscription necessary (owner must have compatible smart phone and data service plan)

MyFord Touch™
The second generation of SYNC evolves the device connectivity system into the operating system behind the new MyFord and MyLincoln Touch driver interface launching on the 2011 Ford Edge and Lincoln MKX. MyFord Touch is a holistic approach to the driver interface, replacing many of the traditional vehicle buttons, knobs and gauges with clear, colorful LCD screens and intuitive 5-way buttons on the steering wheel. In MyFord Touch-equipped vehicles, SYNC now controls the functions of phone communications, entertainment/audio, navigation/services, and climate. Voice recognition has improved 100-fold, with SYNC now responding to over 10,000 commands at the first press of the “Talk” button.

MyFord Touch, powered by SYNC, will migrate next to the 2011 Ford Explorer and 2012 Focus and eventually be available on over 80% of Ford products globally.

Voice Activated Navigation System on SYNC with MyFord Touch [Aug 2, 2011]

Ford Drops Price of SYNC by $100, Making Hands-Free, Voice-Activated In-Car Connectivity More Affordable, Available to All [Aug 1, 2011]

  • Ford initiates new pricing strategy for SYNC®, making the hands-free, voice-activated connectivity system more affordable for customers; dropping option price to $295 makes SYNC the most capable and most affordable system on the market
  • Launching first on the 2012 Ford Explorer and Edge, SYNC will now be available as optional equipment on base trim levels, marking broader availability and more choice for customers
  • Making hands-free technology more affordable and available comes on the heels of Ford becoming the first automaker to announce its support for a nationwide ban on the use of hand-held mobile devices while driving

Ford is making hands-free, voice-controlled in-car connectivity even more affordable, announcing both a $100 price drop for Ford SYNC® along with expanded availability by offering it as an option on base trim levels for the first time.

“Ford SYNC is making a difference. Our customers love it and recommend it, and our dealers want it on more products,” said Ken Czubay, Ford vice president, U.S. Marketing, Sales and Service. “SYNC already has brought hands-free, voice-activated in-car connectivity to millions, helping keep drivers’ eyes on the road and hands on the wheel. Now, Ford is making it even easier for customers to afford exactly what they want.”

The move marks the company’s latest push to make voice control the primary and safest way for customers to access their favorite mobile devices while driving – a capability more and more drivers are clamoring for, according to the Consumer Electronics Association (CEA).

In a 2010 study, the CEA found that 55 percent of smartphone owners, for example, prefer voice commands as their primary in-car user interface. SYNC users agree, with internal Ford research showing more than 85 percent say they use voice controls while driving, up from 60 percent in previous studies.

This month, Ford became the first automaker to openly support the Safe Drivers Act of 2011, proposed federal legislation for a nationwide ban on the use of hand-held mobile devices while driving. To date, 10 states, including California and New York, have legally banned talking on a hand-held cellphone while driving, with many local municipalities also following suit enacting their own set of restrictions. Text messaging while driving is banned in 34 states.

The new SYNC pricing and choice strategy for 2012 ups the ante on how Ford is translating this trend into real-world actions that offer smarter in-vehicle connectivity solutions for customers.

“As the list of states banning hand-held calls and texting while driving continues to grow and legislators ponder a nationwide ban, Ford is strengthening its leadership position as the only full-line automaker with plans to offer available hands-free mobile device connectivity on 100 percent of its passenger vehicle lineup,” said Czubay.

SYNC has been installed already on more than 3 million vehicles since its debut in 2007.

The new pricing strategy makes SYNC the most capable and most affordable in-car connectivity system in the industry. The new pricing will be available first on the 2012 Ford Explorer and Edge base models. Customers who opt for SYNC will pay only $295 for the award-winning in-car connectivity system, previously priced at $395. In addition, SYNC will now be available on all trim levels, as the availability chart of the 2012 Ford Edge shows:

Ford Edge Trim Level | 2011 Model SYNC Availability | 2012 Model SYNC Availability
SE                   | Not available                | Optional
SEL                  | Optional                     | Standard
Limited              | Standard                     | Standard
Sport                | Standard                     | Standard

With the base SYNC package, customers will enjoy the core hands-free features and services that have quickly established SYNC as a must-have technology, with more than 76 percent of current SYNC users saying they would recommend the system to other customers. Those features include:

  • Hands-free, voice-activated calling via a Bluetooth®-connected mobile phone
  • Hands-free, voice-activated control of a USB-connected digital music player
  • 911 Assist™, the automated emergency calling service that is free for the life of the vehicle
  • Vehicle Health Report, the on-demand diagnostic and maintenance information service

In addition, customers who choose the base package will have the option to purchase a SYNC Services subscription, which expands voice-controlled features to include a cloud-based network of services. These include turn-by-turn directions, traffic reports, and business search information with available live operator assistance if needed. A SYNC Services subscription costs only $60 a year, besting the telematics services offered by the competition.

Ford dealers are excited about the prospect of being able to offer SYNC to a larger population of their customers.

James T. Seavitt, president of Village Ford in Dearborn, Mich., says he wouldn’t be surprised to see SYNC take rates soar even higher with the new pricing and base model availability. Seavitt notes that approximately 75 percent of the vehicles he currently sells have SYNC.

“Customers frequently ask about SYNC in our dealership as they continue to hear more about the benefits and convenience of hands-free connectivity while driving,” said Seavitt. “This move from Ford will help dealers put more customers in SYNC-equipped vehicles so they can experience why using their voice to control their favorite mobile devices in the car is a smarter choice.”

On Edge and Explorer alone, SYNC has already been a big hit on the showroom floor, with current take rates above 80 percent. With the new pricing strategy, SYNC is now expected to be installed on more than 95 percent of models sold.

During the next three years, Ford will introduce the new SYNC pricing and choice strategy across the entire North American Ford vehicle lineup.

Vehicles next in line after the 2012 Ford Explorer and Edge include the 2013 Ford Taurus, Focus, Escape and Flex.

Microsoft Tellme Puts Ford Drivers on Cloud 9 [May 13, 2010]

With more than 2 million SYNC-equipped vehicles on the road today, Ford has shown that people want a simpler and easier way to make phone calls, listen to music and get directions while in the car. And now, a whole new group of drivers is about to experience SYNC as Ford launches the all-new 2011 Ford Fiesta later this summer.

The newest addition to the small-car segment, the Fiesta will be available with Ford SYNC, powered by Microsoft, a fully integrated, in-car communications and entertainment system that gives drivers hands-free, voice-activated control over their mobile phones and media players. SYNC includes Microsoft Tellme’s voice-activated Traffic, Directions and Information (TDI) system — an interactive voice-powered service that expands Ford SYNC’s voice-command capabilities. The introduction of the Fiesta is the first time that an economy car will be available with this level of technology.

According to Microsoft Tellme research, 93 percent of motorists want the type of speech services provided by Tellme, and 80 percent say availability of the Tellme service would be a key factor in deciding which car to purchase. Vehicle manufacturers like Ford recognize this demand and look to Microsoft for a differentiated product offering with strong user appeal.

Ford SYNC with TDI breaks new ground in in-vehicle infotainment by taking full advantage of the power of Tellme’s speech recognition platform and the Windows Embedded Automotive platform. The flexibility of the Windows Embedded Automotive platform enables automakers to build upon it and create unique in-vehicle experiences for their consumers.

Microsoft Tellme and the new Ford Fiesta

Microsoft joins Ford to celebrate the launch of the new Ford Fiesta with Ford SYNC TDI technology.

In April, more than 200 media from all over the country participated in a two-day program in San Francisco where they got to kick the tires and test drive the Fiestas throughout the city, participate in demonstrations, and compete in some street-course activities. Representatives from Microsoft Tellme were also on hand to talk about its role in the Ford SYNC TDI system.

With coverage of more than 14 million business listings, personalized traffic information, turn-by-turn directions, and location-based search, Microsoft Tellme’s cloud-based voice applications give drivers access to real-time information that’s updated continually, ensuring that searches for businesses, addresses and routes are always current.

A question the Microsoft Tellme team gets often is, “How does it work?” SYNC automatically connects drivers’ mobile phones and media players with their vehicle’s in-car microphone and sound system, simply by pushing a button on the steering wheel.

To use TDI, it’s as simple as this:

  • Press the Voice button on the steering wheel and say “Services.” When you hear SYNC’s greeting, say “Traffic.”
  • When prompted, say the name of a personal saved destination, such as work or home or grandma’s house. You can even just say the name of a city.
  • SYNC will respond with a custom traffic report — as determined by the in-vehicle GPS receiver — to your destination.
  • When multiple routes are available, you will hear the estimated travel time on each route, based on distance and traffic conditions.
  • This all happens through a connection with your Bluetooth-enabled mobile phone, in a regular voice call, so there’s no need for a data plan or for Ford to add a costly embedded cellular radio.
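The steps above can be modeled as a tiny dialog state machine. The states and prompt wording below are assumptions for illustration, not the actual TDI dialog:

```python
# Hypothetical sketch of the TDI voice dialog as a state machine:
# "Services" -> "Traffic" -> destination -> traffic report.
# States and prompt text are illustrative assumptions.
def tdi_dialog(utterances):
    """Walk the Services/Traffic flow; return the prompts the driver hears."""
    transcript = []
    state = "idle"
    for spoken in utterances:
        if state == "idle" and spoken == "Services":
            transcript.append("SYNC: Services. Please say a command.")
            state = "services"
        elif state == "services" and spoken == "Traffic":
            transcript.append("SYNC: Traffic. Say a saved destination or a city.")
            state = "destination"
        elif state == "destination":
            transcript.append(f"SYNC: Traffic report for {spoken}...")
            state = "done"
    return transcript

for line in tdi_dialog(["Services", "Traffic", "Work"]):
    print(line)
```

Each driver utterance advances one state, mirroring the button-press-then-speak sequence in the bullets above.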

Customers are clearly excited about the Fiesta, with more than 1,000 retail orders already placed before the car is even available to the public. The Fiesta will make its debut on North American roadways later this summer.

KIA UVO

Stop reaching out to change the station while driving. With Kia UVO, use your voice to play a song or change the station, make or answer phone calls, send and receive SMS text messages and more. Say “Play artist Rolling Stones” and start listening. Turns out, you can always get what you want…it’s as simple as asking.

Kia UVO powered by Microsoft
Kia UVO is an innovative and intelligent in-car communications and entertainment system. Using UVO, drivers and passengers can quickly and directly access music files, change radio stations, make or answer phone calls, send and receive SMS text messages, and operate a rear-view camera when the driver shifts into reverse, all through voice-activated controls using Microsoft speech recognition technology.

Kia Motors unveils infotainment system for its vehicles powered by Microsoft® in the US [Jan 6, 2010]

    • Kia UVO, short for ‘Your Voice,’ features a breakthrough user interface that provides simple and easy access to Kia vehicles’ multimedia and infotainment systems
    • UVO is the first in-vehicle solution to integrate full Microsoft® intelligent speech engine technology

(SEOUL) January 5, 2010 – Kia Motors America (KMA) today unveiled an innovative and intelligent in-car communications and entertainment system, ‘UVO powered by Microsoft®,’ to be available in select Kia vehicles in the US starting this summer. UVO provides consumer-friendly voice- and touch-activated experiences for simple management of music files and hands-free mobile phone operation. Co-developed with Microsoft and based on Windows Embedded Auto software, UVO is an easy-to-use, hands-free solution that allows drivers and passengers to answer and place phone calls, receive and respond to SMS text messages, access music from a variety of media sources and create custom music experiences.

Understanding that drivers want and need intuitive controls, Kia Motors and Microsoft designed UVO to enable a new level of voice recognition through Microsoft speech technology. UVO users will be able to access media content and connect with people through simple, quick voice commands without having to navigate through menus. By supporting complex grammar, UVO needs only short voice commands to connect drivers and passengers with their desired functions. An interactive system, UVO responds to inquiries such as ‘What’s playing?’ and provides audible answers and related functions, helping to keep drivers’ eyes safely focused on the road.

UVO also brings advancements to in-car technology through an immersive user experience.  The interface features a 4.3-inch, full-color display that provides detailed information on media content, phonebook data and vehicle information; the screen also doubles as a rear-view camera when the shifter is put in reverse.  UVO is an open platform that seamlessly integrates with a wide variety of mobile phones, music players and other devices, making it easy for drivers to quickly pair devices.

“UVO powered by Microsoft® is a breakthrough for in-vehicle infotainment that allows drivers and passengers to safely and easily use all of their personal technologies to create personalized in-vehicle communications and entertainment experiences,” says Michael Sprague, Vice President, Marketing, KMA.  “Collaborating with Microsoft®, Kia Motors is able to offer drivers an experience that will provide our cars with a clear competitive advantage.”

“We are very excited with the customized approach Kia Motors is bringing to in-car infotainment,” says Kevin Dallas, General Manager of Microsoft’s Windows Embedded Business division.  “Kia’s UVO system demonstrates how the power of Windows Embedded technology can keep consumers connected to the devices, information and entertainment that matters to them most.”

Based on the award-winning Windows Embedded Auto platform, UVO can be updated easily as new consumer devices continue to be introduced to the market.

UVO will debut this summer in the all-new Kia Sorento and will be extended to additional Kia vehicles as part of the brand’s technological evolution.  Kia Sorento, Soul, Forte and Forte Koup already come standard and at no extra cost with Bluetooth® wireless technology connectivity, iPod®/MP3/USB connectivity, and a three-month SIRIUS® satellite radio subscription.

UVO will be shown for the first time at the 2010 International Consumer Electronics Show (CES) in Las Vegas, January 7-10, in both Kia Motors and Microsoft® booths; representatives from both companies will be on hand for demonstrations.

Key features of UVO, powered by Microsoft®:

  • Advanced Speech Recognition: Intelligent Microsoft® speech technology is trained to the system operator’s voice, creating a personal profile and allowing for up to two different voice profiles in various languages. Support for large grammar commands and faster response time means the content is delivered when you ask for it. Kia Motors’ UVO system is the first in-vehicle solution to integrate full Microsoft® speech engine technology.
  • Natural Interface Advancements: A full-color, easy-to-use in-dash monitor allows occupants to quickly scroll through media and mobile device content through intuitive voice and touch-screen commands.
  • Custom Media Experiences with MyMusic: UVO’s ‘Jukebox’ function features a 1GB hard drive for media storage, allowing users to rip music from CDs or an MP3 player into personal MyMusic folders and store up to 250 songs sorted by title and/or artist – all through voice commands. The system can shuffle through an MP3 player or AM/FM and SIRIUS® radio stations and instantly identify what’s playing, all through simple voice commands.
  • Rear Backup Camera: When the vehicle is put in reverse, a built-in rear backup camera uses UVO’s in-dash display to provide clearer images of the environment behind the car, assisting the driver in identifying objects that otherwise may be difficult to see.
  • Ability to Continuously Update Features and Services: Based on a flexible Windows Embedded Auto platform, updates and services can be delivered in a number of ways (over-the-air, over-the-Web) for Kia to continue to provide a superior user experience after the system enters the market.

Kia Motors is in the midst of a dramatic, design-led transformation, which has been delivering dynamically styled vehicles in several important segments at exactly the right time, contributing to the brand’s continued gains in market share.  The launch of the all-new Sorento, the official vehicle of the NBA and the first vehicle to be built at Kia Motors’ first U.S.-based manufacturing facility in West Point, Georgia, will further enhance the Kia lineup.

Kia Motors and Microsoft Usher in New Era of In-Car Technology [Jan 5, 2010]

Kia Motors America (KMA) and Microsoft today unveiled Kia UVO, powered by Microsoft, a new in-car infotainment system with advanced voice- and touch-activated features.

Kia UVO, Powered by Microsoft [Jan 7, 2010]

Microsoft’s Greg Baribault shows off Kia’s new in-car infotainment system, UVO, powered by Microsoft.

With UVO, drivers and passengers can quickly and directly access music files, change radio stations, make or answer phone calls, send and receive SMS text messages, and operate a rear-view camera when the driver shifts into reverse, all through voice-activated controls using Microsoft speech recognition technology. The hands-free system helps drivers stay focused on the road.

Features of UVO include advanced speech recognition; a 4.3-inch full-color display screen; and MyMusic, a jukebox-type function that enables drivers to shuffle between music sources including personal music folders, an MP3 player, or AM/FM and satellite radio.

Co-designed by Kia Motors and Microsoft, UVO is built on the award-winning Microsoft Windows Embedded Auto software platform. The system will be offered during the third quarter of 2010, starting with the 2011 Kia Sorento CUV.

Microsoft and Kia will demonstrate UVO at the 2010 International Consumer Electronics Show in Las Vegas this week.

Hands Free: Read & Reply to Email with Microsoft Tellme Speech [Oct 19, 2010]

Steven Bridgeland, Senior Product Manager for Microsoft’s Windows Embedded Business, discusses the speech integration built into Windows Embedded Automotive 7.

Spread the Word: Speech Recognition Is the “New Touch” in Computing [Oct 28, 2009]

Keyboards and mice still are the dominant methods for working with a PC or laptop. But big leaps in speech-recognition technology mean that talking to a computer may soon be as natural as using a mouse.

Leading Microsoft’s charge to that audible future is Zig Serafin, general manager of the Speech at Microsoft group. Serafin says his team’s goal is simply to create the world’s most advanced speech platform, one that spans cloud-based voice services, mobile phones and world-class servers for enterprise customers. “Voice is the new touch,” says Serafin. “It’s the natural evolution from keyboards and touch screens. Today, speech is rapidly becoming an expected part of our everyday experience across a variety of devices. Bill Gates articulated this vision a decade ago, and we’re seeing it happen today.”

Two years ago, Microsoft acquired Tellme Networks and subsequently merged Microsoft’s speech development team (formerly the Speech Components Group) with Tellme to form the Speech at Microsoft group. The group’s sophisticated speech-recognition technology and Web speech engine, which have been under development for more than a decade, are leading to a wave of voice-enabled products promising easier, faster interactions — spanning automobiles, smartphones, and personal productivity software.

For example, Ford Sync, powered by Microsoft and Tellme, provides in-dash voice-activated navigation and search. In addition, Bing for Mobile, Exchange Server 2010, Windows 7, and new Windows® phones such as the Samsung Intrepid from Sprint are all voice-enabled.

“See” Your Voice Mail

One of the most eagerly awaited features in Exchange Server 2010 is the new Voice Mail Preview, a capability that is poised to transform the way people retrieve and navigate voice mail. Using speech-to-text technology, Exchange 2010 automatically sends a text preview of voice mail right to the user’s inbox.

Instead of wondering whether the little red light on their phones is signaling an important call, people can scan text previews, right in Outlook, to determine message content and priority.

Exchange Server 2010’s voice mail feature turns an audio call into a text preview.

Rajesh Jha, corporate vice president of Microsoft Exchange, says Voice Mail Preview in Exchange 2010 makes it dramatically easier to visually sift through voice mail on your PC, mobile phone, or any popular Web browser to quickly determine the importance of a call. “For me, this feature is invaluable during meetings or other situations when actually listening to voice mail is not a viable option,” says Jha.

Exchange Server 2010 will launch at TechEd Europe, which runs Nov. 9–13 in Berlin.

“Hands-Free” Calling, Texting and Search

The Bing for Mobile application is a free, on-the-go version of Bing with voice-enabled search. Using this application, people simply speak their search query to retrieve results on their Windows phone.

The Bing 411 service works for any phone. People call 1-800-Bing-411, speak their search, and hear the results or get a text message of addresses, directions and other information for easy access later. Both Bing 411 and the Bing for Mobile application help users safely access important information wherever they may be, when typing on a phone is slow, impossible or inconvenient.

With the newly launched Samsung Intrepid from Sprint, the first Windows phone to use Microsoft’s Tellme voice user interface, the experience gets even better. People can speak a search query or dictate a text message, making it dramatically easier to accomplish tasks on the go. Intrepid users simply press the Tellme button on the phone and say what they want — whether that’s to dial a colleague, text a friend, or search Bing for the nearest hardware store or best happy hour.

“When you’re on the go, using only keystrokes to search can be cumbersome, especially if you’re multi-tasking. It takes over 20 strokes of the keypad to find a restaurant on the Web,” says Yusuf Mehdi, senior vice president of the Online Services Division at Microsoft. “With Bing for Mobile or Bing 411, you simply speak your query to get results quickly, easily and safely. Using your voice to simply ‘say what you want and get it’ helps you do more when you’re in a mobile scenario.”

Talk to Windows 7

An improved speech recognition feature in Windows 7, launched last week, enables people to control their computer completely by voice or by touch and voice. Using Windows Speech Recognition, people can easily launch applications, access commands and even convert their voice into text in any application that runs on Windows 7. In addition, software developers can tap into these capabilities to enable rich, natural speech interactions between users and Windows-based applications.

Partners such as HP are already leveraging these capabilities in their Windows 7-based PCs, with innovative applications that combine speech and touch to transform the user experience.

“By using the power of their voice, people can get their jobs done more efficiently,” says Ian LeGrow, group program manager for the Windows team at Microsoft. “With Windows Speech Recognition, the interactions between people and their computers can be more natural, not just in the future, but starting today.”

Voice at Your Service

The Speech at Microsoft group runs the Tellme platform, the world’s largest voice platform based on the VoiceXML standard, managing more than 6 million calls every day, helping businesses improve customer service.
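Applications on a VoiceXML platform such as Tellme's are described declaratively: a document defines the prompts to play, the grammar of utterances to recognize, and where to send the result. As a rough illustration only (the prompt text, grammar items and submit URL below are invented for this sketch, not taken from any Tellme deployment), a minimal VoiceXML 2.0 dialog might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="choice">
      <!-- Prompt rendered to the caller via text-to-speech -->
      <prompt>Say billing, reservations, or agent.</prompt>
      <!-- Inline SRGS grammar listing the accepted utterances -->
      <grammar type="application/srgs+xml" version="1.0" root="opts">
        <rule id="opts">
          <one-of>
            <item>billing</item>
            <item>reservations</item>
            <item>agent</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <!-- Hand the recognized value to the next step of the application -->
        <submit next="http://example.com/route" namelist="choice"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because the dialog logic lives in standard documents served over HTTP rather than in the telephony infrastructure, a business can update its call flow the same way it updates a web page — one reason a hosted, standards-based platform can manage call volumes at this scale.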

This month, the Speech at Microsoft group introduced an enhanced Outbound IVR (interactive voice response) Service on the Tellme platform to provide proactive customer service. With this service, businesses can provide interactive outbound messages that allow customers to act upon the alerts — to pay a bill, rebook a flight, or schedule delivery for a missed package, for example. The Outbound IVR Service is optimized to work across the phone (as a call or text), e-mail, instant messaging and the Web to deliver a personalized, efficient experience.

Says Jamie Bertasi, senior director for Speech at Microsoft, “We are delivering a steady stream of innovations to our platform in order to continue to deliver the best experience for the caller and best performance for the enterprise. By leveraging the power of the cloud and the billions of interactions we see every year, we are able to fine-tune the way companies engage their customers, enabling them to improve customer satisfaction while significantly reducing costs.”

Looking Ahead: What’s Next

According to analysts, the growing demand across industries for speech technology indicates that voice is poised to transform the user experience on a variety of fronts.

“Speech-recognition technology has matured to a level where it’s a primary catalyst for the next wave of innovation in the unified communications space,” said Nancy Jamison, principal analyst with Jamison Consulting. “Microsoft’s recent advancements in speech really strike at the heart of what true unified communications is all about — improving the user experience.”

By combining Tellme’s speech optimization and deployment experience with Microsoft’s cutting-edge speech technology, this new group brings together a cross-functional team of domain experts to drive speech technology to new heights. By using cloud-based technology, the Speech at Microsoft group is envisioning a future where speech recognition rivals human understanding.

Serafin says that his team of experts will remain committed to applying their many decades of experience to push the frontiers of voice-enabled technology that brings speech into everyday use.

“For perhaps the first time in the history of Microsoft, we have our world-class speech scientists and highly respected software-plus-services experts under one roof, and I believe the resulting collaboration will lead to pathbreaking innovation,” says Serafin. “The climate in our R&D environment is optimally charged to accelerate advances, leverage the power of software plus services, and revolutionize the ways customers interact with a wide range of Microsoft products.”

Bolstering that expertise is the recent addition of Larry Heck to the role of chief scientist for the Speech at Microsoft group. Heck first joined Microsoft as the partner architect for the Online Services Division R&D. Before that he led the creation, development and deployment of the search and advertising algorithms at Yahoo!, and before that he was the vice president of R&D at Nuance. Heck has joined the Speech at Microsoft group to help chart the course of next-generation elements of Microsoft’s speech platform.

“Speech belongs in the cloud. Only there can you reach the scale, the enormous volume of interactions required to create a speech system capable of rivaling human understanding,” said Heck. “With the formation of the Speech at Microsoft group, the unrivaled breadth of our platform today, and our cloud-based approach, this future is within sight.”

Speech, the Experience Game-Changer [Aug 3, 2010]

The growth of connected devices, from automobiles to your mobile phone, coupled with the increase in data consumption is signaling the beginning of a broad shift in technology toward an era of more integrated, natural experiences driven by speech, touch and gesture.

Today at the 2010 SpeechTEK Conference in New York, Zig Serafin, general manager of the Speech Group at Microsoft, delivered a keynote address describing Microsoft’s vision for speech and natural user interfaces (NUIs). Serafin demonstrated the latest in speech recognition technology that has been designed into upcoming Microsoft products. These products promise to deliver more elegant and accessible interfaces, allowing users to utilize their voices and, in some cases, their bodies to perform actions and access information.

During his address, Serafin demonstrated three speech innovations:

Kia UVO. Microsoft is creating more natural and safer automotive experiences using the Windows Embedded Automotive software platform and Microsoft Tellme Speech technologies. Starting later this year, Kia will begin offering the Kia UVO multimedia and infotainment system in its all-new Sportage, Sorento and Optima. The UVO system is the first in-vehicle solution to integrate full Microsoft speech engine technology, allowing users to easily access media content and connect with people through simple, quick voice commands without having to navigate through hierarchical menus.

Windows Phone 7. Microsoft is raising the bar for mobile device interactions with the development of Windows Phone 7. Speech has been seamlessly integrated into the phone experience, for functions such as search, navigation and dialing.

Kinect for Xbox 360. Microsoft is unlocking new communication and entertainment experiences with Kinect for Xbox 360. The Kinect system allows users to navigate the Xbox 360 experience and participate in new gaming challenges by using NUIs such as gestures and speech.

“Microsoft is creating rich, immersive and seamless experiences across devices, delivered from the cloud. Speech will become the tool we use to unlock the power of devices as their connectivity and capabilities accelerate,” Serafin told SpeechTEK attendees.

As NUIs become more advanced and integrated into today’s technology, customers will expect to be able to interact more naturally, whether in front of the TV, in the car, on the go with their mobile device, or when interacting with businesses through customer-care applications, Serafin explained.

Just as important as the NUI is the fundamental shift in the architecture of speech, a shift that is accelerating the rate of learning and innovation. Microsoft Tellme has embraced a cloud-based architecture for speech. This architecture takes the billions of speech interactions running on the Microsoft Tellme speech cloud and uses them to improve the underlying recognition engine and the understanding of a user’s intent. For example, in the upcoming release of Windows Phone 7, users of the Bing voice search technology will be able to ask, “Who is pitching for the Giants tonight?” and get a listing of starting pitchers as well as ticket and weather information for the game. This represents a more natural experience for the user.

Microsoft continues to make significant investments in NUI, and in the next 12 months will be delivering products and technologies that will fundamentally change, for the better, how users will expect to interact with their TVs, mobile devices, and cars.

For more information on Microsoft’s speech innovations, please visit the Microsoft Tellme pressroom. You can also read more about Microsoft Tellme’s recent partner win with Avis Budget Group.

Customer Spotlight: Microsoft Continues Momentum in Speech With New Customer Care Solution for Avis Budget Group, Recent Industry Awards [Aug 3, 2010]

Today at the SpeechTEK 2010 industry conference in New York City, Microsoft Corp. announced the addition of Avis Budget Group Inc. to its growing roster of enterprise customers using the Microsoft Tellme speech cloud platform. Avis Budget, parent company of Avis Rent A Car and Budget Rent A Car, two of the world’s leading rental car brands, recently deployed the first in a series of new customer care solutions. By taking advantage of Microsoft Tellme’s award-winning speech cloud platform, Avis and Budget are delivering improved service to their customers during the peak summer travel season.

“Delivering new, streamlined customer care experiences can help save our customers time while giving them greater control over managing their vehicle rental arrangements,” said Thomas M. Gartland, executive vice president, sales & marketing, Avis Budget Group. “By working with Microsoft Tellme, we are able to deliver immediate improvements to our customer experience, while also keeping long-term technology costs in check.”

The second phase of the solution will add new reservation booking capabilities and expand integration to customer data systems to deliver enhanced caller personalization. By choosing the Microsoft Tellme cloud-based speech platform, Avis Budget will be able to roll out additional services to customers in an accelerated timeframe with minimal demands on its own internal system.

Microsoft Tellme Receives 2010 Speech Engine “Winner” Award

Also at SpeechTEK, Microsoft Tellme was honored with Speech Technology Magazine’s 2010 Speech Engine “Winner” Award, which is given to the year’s best speech recognition engine. In naming Microsoft its Winner, Speech Technology Magazine noted the company’s strengths in cloud-based speech and its focus on the enterprise, mobile and automotive markets. In addition, Microsoft Tellme was named “Leader” in the Speech Self-Service Suite category, in which the company’s depth in functionality and customer satisfaction were highlighted.

“2010 is the year speech hits the mainstream. Speech is changing the way we interact with technology in our homes, in our cars, on our mobile devices and on our PCs,” said Zig Serafin, general manager of Microsoft Tellme. “We are honored to be recognized as leaders in speech technology and will continue our efforts to make speech a natural part of everyday interaction with technology.”

Microsoft and Toyota Announce Strategic Partnership on Next-Generation Telematics [Apr 11, 2011]

Microsoft Corp. and Toyota Motor Corp. (TMC) today announced that they have forged a strategic partnership and plan to build a global platform for TMC’s next-generation telematics services using the Windows Azure platform. Telematics is the fusing of telecommunications and information technologies in vehicles; it can encompass GPS systems, energy management and other multimedia technologies.

As part of the partnership, the two companies plan to participate in a 1 billion yen (approximately $12 million) investment in Toyota Media Service Co., a TMC subsidiary that offers digital information services to Toyota automotive customers. The two companies aim to help develop and deploy telematics applications on the Windows Azure platform, which includes Windows Azure and Microsoft SQL Azure, starting with TMC’s electric and plug-in hybrid vehicles in 2012. TMC’s goal is to establish a complete global cloud platform by 2015 that will provide affordable and advanced telematics services to Toyota automotive customers around the world.

As part of its smart-grid activities, aimed at achieving a low-carbon society through efficient energy use, TMC is conducting trials in Japan of its Toyota Smart Center pilot program, which plans to link people, automobiles and homes for integrated control of energy consumption. TMC believes that, as electric and plug-in hybrid vehicles become more popular, such systems will rely more on telematics services for achieving efficient energy management.

Microsoft has a long history of delivering platforms and services to the automotive market, including in-car infotainment systems built on the Windows Embedded Automotive platform, in-car mapping services with Bing and the Microsoft Tellme voice application, and many other consumer solutions.

“Today’s announcement of our partnership with TMC is a great example of how we continue to invest in the automotive industry and of our commitment to power the services that are important to consumers,” said Microsoft CEO Steve Ballmer. “It further validates the power of the cloud, as the Windows Azure platform will provide the enterprise-grade, scalable platform that TMC needs to deliver telematics in its automobiles worldwide.”

“This new partnership between Microsoft and Toyota is an important step in developing greater future mobility and energy management for consumers around the world. Creating these more efficient, more environmentally advanced products will be our contribution to society,” said Akio Toyoda, president of TMC. “To achieve this, it is important to develop a new link between vehicles, people and smart center energy-management systems.”


3 Responses to Microsoft Tellme cloud service for WP7 ‘Mango’ and other systems


  3. Antonio says:

    VC is all well and good if you can still do the basic things that made Voice Control capability a hit to begin with. Until I can use a simple command like, “previous track” to control my media playback (like I could in 2006), this all sucks. I don’t (shouldn’t) have to buy/rent/own a car (much less a Ford) or even be driving to do so. Although someone walking may have both hands free, they may not want to use them on their phone; and if they are running or weight training, they don’t want to pull their phone out of their pocket. If it’s about numbers, I’m fairly certain there are more people walking around listening to music as I type this than are driving. Which leads to the ultimate value prop, which is adoption of the technology. All those folks on foot will be your best friend when you want to sell the tools that allow them to ask their house, “What’s the weather like today?” because the technology was already visceral/ubiquitous etc. for them ’cause they use it every day on the street.

    Nuff said, bring it back! (please)
