Category Archives: speech

New York Times article discusses our work on in-vehicle navigation devices

Last week I was interviewed by Randall Stross for an article that appeared in the September 2 edition of the New York Times. Mr. Stross’ article, “When GPS Confuses, You May Be to Blame,” discusses research on in-vehicle personal navigation devices, including our work on comparing voice-only instructions to map+voice instructions [1].

Specifically, Mr. Stross reports on a driving simulator study published at AutomotiveUI 2009, in which we found that drivers spent significantly more time looking at the road ahead when navigation instructions were provided through a voice-only interface than when both voice instructions and a map were available. With voice-only instructions, drivers spent about 4 additional seconds per minute looking at the road ahead. Furthermore, we found evidence that this difference in visual attention also affected driving performance measures. These results led us to conclude that voice-only instructions might be safer to use than voice+map instructions. However, the majority of our participants preferred having a map in addition to the voice instructions.
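To make the glance measure concrete, here is a minimal sketch of how eyes-on-road time per minute might be computed from per-sample eye-tracker labels. The gaze labels, the 1 Hz sampling rate, and the exact numbers below are hypothetical, chosen only so the result matches the roughly 4-seconds-per-minute difference mentioned above; this is not the actual analysis code from the study.

```python
# Toy computation of eyes-on-road time; all data below is made up for illustration.

def eyes_on_road_fraction(gaze_labels):
    """gaze_labels: one label per eye-tracker sample, e.g. 'road' or 'device'."""
    return sum(1 for g in gaze_labels if g == "road") / len(gaze_labels)

# Hypothetical one-minute recordings at 1 sample per second (60 samples each).
voice_only = ["road"] * 56 + ["device"] * 4   # brief glances at the device
voice_map  = ["road"] * 52 + ["device"] * 8   # more glances at the map display

extra_seconds_per_minute = 60 * (eyes_on_road_fraction(voice_only)
                                 - eyes_on_road_fraction(voice_map))
print(extra_seconds_per_minute)  # 4.0 extra seconds per minute on the road ahead
```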

This preference for the map was the impetus for a follow-on study in which we explored projecting navigation instructions onto the real-world scene using augmented reality [2]. We found that augmented reality navigation aids allow for excellent visual attention to the road ahead as well as excellent driving performance.

References

[1] Andrew L. Kun, Tim Paek, Zeljko Medenica, Nemanja Memarovic, Oskar Palinko, “Glancing at Personal Navigation Devices Can Affect Driving: Experimental Results and Design Implications,” AutomotiveUI 2009

[2] Zeljko Medenica, Andrew L. Kun, Tim Paek, Oskar Palinko, “Augmented Reality vs. Street Views: A Driving Simulator Study Comparing Two Emerging Navigation Aids,” MobileHCI 2011

Co-chairing AutomotiveUI 2010

On November 11 and 12 I was at the AutomotiveUI 2010 conference, serving as program co-chair with Susanne Boll. The conference was hosted by Anind Dey at CMU and co-chaired by Albrecht Schmidt.

The conference was successful and really fun. I could go on about all the great papers and posters (including two posters from our group at UNH [1,2]), but in this post I’ll only mention two talks: John Krumm’s keynote and, selfishly, my own (this is my blog, after all). John gave an overview of his work with data from GPS sensors. He discussed predicting where people will go, as well as his experiences with location privacy and with creating road maps. Given that John is, according to his own website, the “all seeing, all knowing, master of time, space, and dimension,” this was indeed a very informative talk 😉 OK, in all seriousness, the talk was excellent. I find John’s work on predicting people’s destinations and selected routes the most interesting. One really interesting consequence of accurate predictions, shared by many people in the cloud, would be for cloud-hosted routing algorithms: if such an algorithm knew where all of us were going at any instant, it could propose routes that, taken together, make efficient use of roads, reduce pollution, and so on.
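To illustrate that last point, here is a toy sketch of a cloud routing service that knows each driver’s candidate routes (derived, say, from predicted destinations) and greedily spreads the drivers over road segments. The driver names, segment IDs, and the simple greedy load-balancing rule are my own assumptions for illustration; this is not John’s work or any deployed system.

```python
# Toy route assignment: spread predicted trips across road segments.
from collections import defaultdict

# Hypothetical candidate routes per driver; each route is a list of road-segment IDs.
candidate_routes = {
    "driver_a": [["s1", "s2", "s3"], ["s1", "s4", "s3"]],
    "driver_b": [["s1", "s2", "s3"], ["s5", "s2", "s3"]],
    "driver_c": [["s1", "s2", "s3"], ["s5", "s4", "s3"]],
}

def assign_routes(candidates):
    """For each driver in turn, pick the candidate route that adds the least
    load to segments already assigned to other drivers."""
    segment_load = defaultdict(int)
    assignment = {}
    for driver, routes in candidates.items():
        best = min(routes, key=lambda route: sum(segment_load[s] for s in route))
        assignment[driver] = best
        for s in best:
            segment_load[s] += 1
    return assignment, dict(segment_load)

if __name__ == "__main__":
    assignment, load = assign_routes(candidate_routes)
    print(assignment)  # drivers end up spread over s2 and s4 instead of all on s2
    print(load)
```

Even this greedy toy version keeps the three drivers from piling onto the same segments; a real cloud service would of course work with live traffic, travel times and far better optimization.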

My talk focused on collaborative work with Alex Shyrokov and Peter Heeman on multi-threaded dialogues. Specifically, I talked about designing spoken tasks for human-human dialogue experiments for Alex’s PhD work [3]. Alex wanted to observe how pairs of subjects switch between two dialogue threads while one of the subjects is also operating a simulated vehicle. Our hypothesis is that observed human-human dialogue behaviors can serve as a starting point for designing computer dialogue behaviors for in-car spoken dialogue systems. One of the suggestions we put forth in the paper is that the tasks used in human-human experiments should be engaging: these are the tasks that produce interesting dialogue behaviors and can thus teach us something about how humans manage multi-threaded dialogues.
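For readers unfamiliar with the term, here is a minimal sketch of what managing multi-threaded dialogue might look like: several dialogue threads compete for the floor, a higher-priority thread can take it, and the interrupted thread resumes once the urgent one has nothing more to say. The thread names, priorities and utterances are hypothetical; this is not the dialogue system studied in [3].

```python
# Toy multi-threaded dialogue manager: urgent threads interrupt, others resume later.
from dataclasses import dataclass, field

@dataclass
class DialogueThread:
    name: str
    priority: int                                  # higher number = more urgent
    pending: list = field(default_factory=list)    # utterances still to deliver

class MultiThreadDialogueManager:
    def __init__(self):
        self.threads = []

    def add(self, thread):
        self.threads.append(thread)

    def next_utterance(self):
        """Return the next utterance from the highest-priority thread that has
        something to say; lower-priority threads wait and are resumed later."""
        active = [t for t in self.threads if t.pending]
        if not active:
            return None
        thread = max(active, key=lambda t: t.priority)
        return thread.name, thread.pending.pop(0)

if __name__ == "__main__":
    dm = MultiThreadDialogueManager()
    dm.add(DialogueThread("navigation", priority=2,
                          pending=["Turn left in 200 meters.", "Turn left now."]))
    dm.add(DialogueThread("word_game", priority=1,
                          pending=["Your word is 'river'.", "Good, next word."]))
    utterance = dm.next_utterance()
    while utterance is not None:
        print(utterance)   # navigation speaks first, then the game thread resumes
        utterance = dm.next_utterance()
```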

Next year the conference moves back to Europe. The host will be Manfred Tscheligi in Salzburg, Austria. Judging by the number of submissions this year and the quality of the conference, we can look forward to many interesting papers next year, both from industry and from academia. Also, the location will be excellent – just think Mozart, Sound of Music (see what Rick Steves has to say), and world-renowned Christmas markets!

References

[1] Zeljko Medenica, Andrew L. Kun, Tim Paek, Oskar Palinko, “Comparing Augmented Reality and Street View Navigation,” AutomotiveUI 2010 Adjunct Proceedings

[2] Oskar Palinko, Sahil Goyal, Andrew L. Kun, “A Pilot Study of the Influence of Illumination and Cognitive Load on Pupil Diameter in a Driving Simulator,” AutomotiveUI 2010 Adjunct Proceedings

[3] Andrew L. Kun, Alexander Shyrokov, Peter A. Heeman, “Spoken Tasks for Human-Human Experiments: Towards In-Car Speech User Interfaces for Multi-Threaded Dialogue,” AutomotiveUI 2010

Talk at SpeechTEK 2010

On Tuesday (August 3, 2010) I attended SpeechTEK 2010. I had a chance to see several really interesting talks, including the lunch keynote by Zig Serafin, General Manager, Speech at Microsoft. He and two associates discussed, among other topics, the upcoming releases of Windows Phone 7 and of Kinect for Xbox 360 (formerly Project Natal). We also saw successful live demonstrations of both of these technologies.

One of Zig’s associates who took the stage was Larry Heck, Chief Scientist, Speech at Microsoft. Larry believes that three areas of research and development will combine to make speech a part of everyday interactions with computers. First, the advent of ubiquitous computing and the need for natural user interfaces (NUIs) mean that we cannot keep relying on GUIs and keyboards for many of our computing needs. Second, cloud computing makes it possible to gather rich data for training speech systems. Finally, with advances in speech technology we can expect search to move beyond typed keywords (which is what we do today sitting at our PCs) to conversational queries (which is what people are starting to do on mobile phones).

I attended four other talks with topics relevant to my research. Brigitte Richardson discussed her work on Ford’s Sync. It’s exciting to hear that Ford is coming out with an SDK that will allow developers to integrate devices with Sync. This is similar to our approach at Project54 – we also provide an SDK that can be used to write software for the Project54 system [1]. Eduardo Olvera of Nuance discussed the differences and similarities between designing interfaces for speech interaction and those for interaction on a small-form-factor screen. Karen Kaushansky of TellMe discussed similar issues, focusing on customer care. Finally, Kathy Lee, also of TellMe, discussed her work on a diary study exploring when people are willing to talk to their phones. This work reminded me of an experiment in which Ronkainen et al. asked participants to rate the social acceptability of mobile phone usage scenarios they viewed in video clips [2].

I also had a chance to give a talk reviewing some of the results of my collaboration with Tim Paek of Microsoft Research. Specifically, I discussed the effects of speech recognition accuracy and PTT button usage on driving performance [3], as well as the use of voice-only instructions for personal navigation devices [4]. The talk was very well received by an audience of over 25, with many follow-up questions. Tim also gave this talk earlier this year at Mobile Voice 2010.

For pictures from SpeechTEK 2010 visit my Flickr page.

References

[1] Andrew L. Kun, W. Thomas Miller, III, Albert Pelhe and Richard L. Lynch, “A software architecture supporting in-car speech interaction,” IEEE Intelligent Vehicles Symposium 2004

[2] Sami Ronkainen, Jonna Häkkilä, Saana Kaleva, Ashley Colley, Jukka Linjama, “Tap Input as an Embedded Interaction Method for Mobile Devices,” TEI 2007

[3] Andrew L. Kun, Tim Paek, Zeljko Medenica, “The Effect of Speech Interface Accuracy on Driving Performance,” Interspeech 2007

[4] Andrew L. Kun, Tim Paek, Zeljko Medenica, Nemanja Memarovic, Oskar Palinko, “Glancing at Personal Navigation Devices Can Affect Driving: Experimental Results and Design Implications,” AutomotiveUI 2009

MERL gift

I’m happy to report that I received a gift grant in the amount of $5,000 from Mitsubishi Electric Research Laboratories (MERL). The gift is intended to support my work on speech user interfaces and it was awarded by Dr. Kent Wittenburg, Vice President & Director of MERL.

This gift comes in the context of ongoing interactions between researchers at MERL and my group at UNH. Kent and Bent Schmidt-Nielsen hosted me several years ago for a demonstration of the Project54 system (I drove to Boston in a police SUV, which was fun), and I also gave a talk at MERL last fall. In 2009 my PhD student Zeljko Medenica worked as a summer intern at MERL under the direction of Bret Harsham (Bret recently gave a talk at UNH on some of this work – see picture below). Zeljko is headed back to MERL this summer and he will work under the direction of Garrett Weinberg.

I greatly appreciate MERL’s generous gift and I plan to use it to help fund a graduate student working on speech user interfaces. I hope to report back to Kent, Bent, Bret and Garrett on the student’s progress by the end of this summer.

Project54 on front page of New York Times

In a front page article of the March 11, 2010 edition of the New York Times Matt Richtel discusses in-vehicle electronic devices used by first responders. Based on a number of interviews, including one with me, Matt gets the point across that interactions with in-vehicle devices can distract first responders from the primary task for any driver: driving. The personal accounts from first responders are certainly gripping. Thanks Matt for bringing this issue to the public.

Enter Project54. According to Matt “[r]esearchers are working to reduce the risk.” He goes on to describe UNH’s Project54 system which allows officers to issue voice commands in order to interact with in-car electronic devices. This means officers can keep their eyes on the road and their hands on the wheel. The article includes praise for the Project54 system by Captain John G. LeLacheur of the New Hampshire State Police. The Project54 system was developed in partnership with the NHSP and almost every NHSP cruiser has the Project54 system installed.

Both the print and the online versions of the article begin with a picture of the Project54 in-car system. This great picture was taken by Sheryl Senter and it shows Sergeant Tom Dronsfield of the Lee, NH Police Department in action.