CISL – Language Lab

March 8, 2018

Anyone who’s struggled to get their point across in a foreign language they haven’t mastered knows the benefit of enlisting the full contents of the human communication toolbox. Waving your arms, imploring with your eyes, describing the word you lack with an assembly of the words you have, even throwing in a little English when nothing else comes to mind: all are part of the repertoire. It’s clunky, but our flexible human minds bridge the gap between the ideal words and meaning.

Could a computer do the same? The answer — as found in the Cognitive and Immersive Systems Laboratory (CISL) — is yes. CISL is developing a system that allows Mandarin Chinese language students to practice their new tongue with a cognitive agent. The computational goal is for the agent to understand human communication — including elements like spatial position, gesture, and facial expression — in a specific context, despite deficiencies in vocabulary, grammar, or pronunciation typical of language learners. Ultimately, the system will adapt to students’ ability level, coach them when necessary, and provide feedback on performance and exercises to improve.

Recently, CISL held a test of its “Mandarin Project” system with the help of students enrolled in Chinese 1, a class taught by Yalun Zhou, an expert in second language acquisition and an assistant professor of Communication and Media. The specific context of the conversation was taken from unit 4 of their class — at the restaurant. In Studio 2 of the Curtis R. Priem Experimental Media and Performing Arts Center (EMPAC), students entered a cognitive and immersive environment mimicking a restaurant, greeted a “waiter” (digitally represented as a panda), and tried to order a drink and food.

CISL Director Hui Su, himself a native Mandarin speaker, said the test was intended to collect data on the system’s performance, with a focus on areas where novice Chinese speakers encounter difficulty. By introducing the system to non-native speakers, CISL hopes to uncover patterns in their errors and equip the computer to understand and compensate for them. Su, a Rensselaer professor of practice in computer science, was impressed with the results of the first trial:

The dialogue to order food is very simple, but from the perspective of a non-native speaker, it’s already complicated. From greeting, to ordering a drink, to ordering food, and finishing payment is 10 to 20 rounds of conversation with a meaningful task. Nevertheless, all the student pairs were able to greet the waiter and order a drink, a few teams made it through the entire ordering process, and one team completed the payment step, which they haven’t even covered in class. Although we saw a lot of room for improvement, I’m very encouraged by what we’ve achieved thus far.

Zhou, who is aiding CISL in the project, said that for the students, the Mandarin Project represents a unique experience in academia.

The Chinese language learning experience that Studio 2 provides to RPI students is a one-of-a-kind educational experience that their counterparts in other universities cannot match… The ultimate goal of learning a foreign language is to use the target language to communicate with people of that language in a grammatically correct and culturally appropriate way. Research has shown that task-based, communicative language practices lead to the best results for that purpose. The immersive restaurant at Studio 2 provides a near real-life environment for RPI students to practice and test their abilities to complete a communicative task, i.e., ordering Chinese food with the AI waiter whose speech is generated from native speaker accents. This is an invaluable opportunity for the students who are unable to practice speaking with native speakers outside of the classroom.

The Mandarin Project (a reboot of an initiative launched in 2012 to combine narrative, game design, and augmented and virtual reality to teach Chinese) is the latest manifestation of CISL, which is dedicated to pioneering immersive and cognitive systems as an aid to collaborative problem-solving.

A collaboration launched in 2015 between Rensselaer and IBM Research, CISL was charged with developing a cognitive and immersive environment that could be used as a classroom, meeting room, design studio, or diagnosis room. Within a year, CISL had reached its first milestone, completing an initial “cognitive and immersive architecture” that allows a computer to integrate sensory information from multiple sources (like speech, spatial positioning, gesture), translate it into an understanding of events in the room, and offer an appropriate response.

This prototype powered a “meeting room” application run in a cognitive and immersive space in EMPAC Studio 2 that facilitates a business discussion of mergers and acquisitions. Although rudimentary in comparison with human understanding, the initial architecture established a framework that could be continually improved and adapted to other scenarios such as a classroom, design studio, or diagnosis room. The Mandarin Project is one such scenario.

Su divides the capabilities of the Mandarin Project cognitive and immersive system into three “areas.” Area 1 is about “capturing inputs,” such as capturing and combining verbal inputs with gestures, and capturing who is speaking about what, when, to whom, and in what context. Area 2 is about reasoning, planning and understanding — called “Mind-of-the-Room.” And Area 3 focuses on the computer’s meaningful response, which Su calls “immersive narrative generation.”

The Mandarin Project system builds on CISL’s initial architecture, adding improvements that fall into these three areas.

For language learners, the most important improvement in Area 1 is a “spatial context system,” which combines gesture and verbal input from multiple channels to generate meaning. As an example, students may point at a menu item and ask “waiter, what is this?” without naming the item (a word they may not know), and the system will work out from the gesture what “this” refers to.
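The idea of pairing a pointing gesture with a word like “this” can be illustrated with a minimal Python sketch. It is not CISL’s implementation: the `MenuItem` class, the `resolve_reference` function, the coordinate scheme, and the `max_distance` threshold are all hypothetical, and a real system would work from live speech and gesture tracking rather than a typed string and a single point.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class MenuItem:
    name: str
    x: float  # centre of the item on the displayed menu (hypothetical 0-1 units)
    y: float


# Words that refer to "whatever is being pointed at" (illustrative list only).
DEICTIC_WORDS = {"this", "that", "这个", "那个"}


def resolve_reference(
    utterance: str,
    pointing_at: Optional[Tuple[float, float]],
    menu: List[MenuItem],
    max_distance: float = 0.2,
) -> Optional[MenuItem]:
    """Return the menu item a deictic word most plausibly refers to, or None."""
    lowered = utterance.lower()
    # Naive substring check; a real system would use the recogniser's tokens.
    if not any(word in lowered for word in DEICTIC_WORDS):
        return None
    if pointing_at is None or not menu:
        return None
    px, py = pointing_at
    nearest = min(menu, key=lambda item: hypot(item.x - px, item.y - py))
    # Ignore gestures that land too far from every item.
    if hypot(nearest.x - px, nearest.y - py) > max_distance:
        return None
    return nearest
```

With a two-item menu, asking “Waiter, what is this?” while pointing near the first item would return that item, while the same question with no gesture returns nothing and the system would have to ask for clarification.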

Area 2 includes a lot of what Su calls “help functions.” A “request cue” feature allows users to ask in English or Chinese “what am I supposed to say next?” or “what did you say?” and the system will respond appropriately. Language switching allows users to request to “switch back to English” when needed. Users may request a transcription of input the system recognized, for their own review.

The Mandarin Project also has several help functions under development that were not part of the test, including a “pitch tone contour analysis.” Chinese syllables can be pronounced with four “tones,” each of which alters the meaning of the syllable and ultimately the word, and correct tone is often a daunting skill for students to master. As an aid to improve pronunciation, a “visual tone cue” will display the correct tone above words that are part of the dialogue. The pitch tone contour analysis will record the tones a user produced and compare them with the correct tones, showing users where they erred and how to improve.
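To give a feel for what comparing a spoken tone with the correct one could involve, here is a minimal Python sketch. It is an illustration under stated assumptions, not the project’s method: the templates use the conventional five-level textbook description of the four Mandarin tones (55, 35, 214, 51), the speaker’s pitch range (`low_hz`, `high_hz`) is assumed to be known in advance, and all function names are hypothetical.

```python
import math
from typing import Dict, List

# Idealised contours on the five-level pitch scale (1 = low, 5 = high).
TONE_TEMPLATES: Dict[int, List[float]] = {
    1: [5, 5, 5],  # high level
    2: [3, 4, 5],  # rising
    3: [2, 1, 4],  # dipping
    4: [5, 3, 1],  # falling
}


def to_five_level(pitch_hz: float, low_hz: float, high_hz: float) -> float:
    """Place a pitch on the 1-5 scale, on a log scale within the speaker's range."""
    span = math.log2(high_hz) - math.log2(low_hz)
    position = (math.log2(pitch_hz) - math.log2(low_hz)) / span
    return 1 + 4 * min(max(position, 0.0), 1.0)


def resample(values: List[float], n: int) -> List[float]:
    """Pick n evenly spaced samples, always including the first and last."""
    if len(values) == 1:
        return values * n
    return [values[round(i * (len(values) - 1) / (n - 1))] for i in range(n)]


def classify_tone(pitches_hz: List[float], low_hz: float, high_hz: float) -> int:
    """Return the tone whose template is closest to the measured contour."""
    contour = [to_five_level(p, low_hz, high_hz) for p in resample(pitches_hz, 3)]

    def distance(template: List[float]) -> float:
        return sum((c - t) ** 2 for c, t in zip(contour, template))

    return min(TONE_TEMPLATES, key=lambda tone: distance(TONE_TEMPLATES[tone]))


def tone_feedback(pitches_hz: List[float], expected_tone: int,
                  low_hz: float, high_hz: float) -> dict:
    """Compare the produced tone with the expected one for a single syllable."""
    produced = classify_tone(pitches_hz, low_hz, high_hz)
    return {"expected": expected_tone, "produced": produced,
            "correct": produced == expected_tone}
```

Real speech is far messier than three pitch samples per syllable, so a working system would need pitch tracking, syllable alignment, and handling of tone changes in connected speech; the sketch only shows the compare-and-report step.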

Su says developers are “just getting started” on Area 2, with several additional features planned. Among their ideas, “adaptive response” will track use of help functions and adjust the difficulty of the dialogue accordingly. A “tone help” function will offer correct verbal pronunciations of words that are mispronounced. An “alternate approach” will understand users who try to describe a word they don’t know with the words they do know. And a “personalized learning plan” function will generate exercises for each user based on their performance.
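The “adaptive response” idea, tracking how often help is requested and adjusting difficulty, can be sketched in a few lines of Python. Since the feature is described only as planned, everything here is an assumption made for illustration: the three level names, the five-turn window, and the 60 percent and 20 percent thresholds are invented, not taken from the project.

```python
from dataclasses import dataclass

LEVELS = ["beginner", "intermediate", "advanced"]  # hypothetical difficulty levels


@dataclass
class AdaptiveDifficulty:
    level: int = 1          # index into LEVELS; start in the middle
    help_requests: int = 0  # help-function uses in the current window
    turns: int = 0          # dialogue turns in the current window

    def record_turn(self, used_help: bool) -> None:
        """Log one dialogue turn and whether the learner asked for help."""
        self.turns += 1
        if used_help:
            self.help_requests += 1

    def adjust(self, window: int = 5, ease_above: float = 0.6,
               harden_below: float = 0.2) -> str:
        """After each full window of turns, move the level up or down one step."""
        if self.turns >= window:
            rate = self.help_requests / self.turns
            if rate > ease_above:
                self.level = max(0, self.level - 1)
            elif rate < harden_below:
                self.level = min(len(LEVELS) - 1, self.level + 1)
            # Start a fresh window so old turns do not dominate.
            self.help_requests = 0
            self.turns = 0
        return LEVELS[self.level]
```

A learner who asks for help on four of five turns would be moved to an easier dialogue, while one who gets through five turns unaided would be moved to a harder one.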

Some Area 2 improvements are dedicated to adapting a Chinese language speech recognition engine that was created for native users. The system was “trained” on native speakers, and tolerances for tone and pronunciation are set accordingly. But those settings can be altered for non-native speakers based on the experiences in Studio 2.

In Area 3, immersive narrative generation, the system is able to offer information about specific dishes on the menu, like the history of Peking duck. The system is designed to draw this data from DBpedia; at the moment the information is pre-loaded into the computer, but eventually the system will fetch it in real time.

Although the system is still very much a prototype, Zhou said she is certain her students are the beneficiaries.

The students enjoyed the immersive interaction and expressed that the immersive learning allowed them to gain an understanding of how their speaking ability is developing/progressing. From the point of view of the instructor, these “smart” technologies of the three areas are appealing because they “force” the students to interact with the AI waiter like a real customer in an authentic Chinese restaurant. The multimodal multitasking (e.g., listening and speaking to the waiter and reading the menu simultaneously, reading the transcription to diagnose errors and adjusting their speaking, and the instant help function) stimulates their desire to complete the food ordering task in real time. Although we are still constrained by the available technology for full function of pedagogical design, as a foreign language educator, I am thrilled to see how novice learners can practice in a simulated restaurant, be motivated to complete the communicative task, and increase their level of confidence regarding speaking.
