Natural Language Generation in the Second Life virtual environment

Project Report

Christos Christodoulopoulos

Introduction

Second Life is a 3-D virtual world created entirely by its residents. Each resident can create a part of the world by acquiring virtual land and building structures on it. Residents can also interact with the rest of the world via an avatar, i.e. a digital character that represents a real-life person. The interaction can take various forms, from personal communication (chatting) to commercial transactions.
Businesses and schools can create their own virtual environments (called "Virtual Land Estates") that they can use to support "media-enriched" collaboration between their members, develop cost-effective virtual experiences and reach a global audience on a scale that was previously impossible through conventional media (e.g. the World Wide Web).
The University of Edinburgh has created its own virtual environment (the Edinburgh University Islands) where many research projects are taking place. One of these projects is the Virtual World of Whisky (VWoW) tutorial room and tour. The VWoW tutorial room is a virtual reconstruction of a real-life whisky shop where users can book tutored tasting sessions enhanced by video, audio and interactive messaging.
This project aims to automatically generate whisky tasting notes that can be delivered via short text messages and audio at the user's request. The generated notes must reflect the attributes of the whisky in question, the user's experience level and the discourse history.

System Architecture

Figure 1 presents the complete system architecture. The following sections give an overview of the individual elements. The main emphasis is on the NLG component, which is presented in the Natural Language Generation section.
Figure 1. System Architecture Overview

I-X, I-X Process Panel and I-Room technologies

The basis for our system is provided by tools built with the I-X technology. I-X is a platform for the creation of intelligent systems for synthesis tasks (like planning) that provides:
an issue-handling style of architecture, with reasoning and functional capabilities provided as plug-ins. Also via plug-ins it allows for sophisticated constraint management, and a wide range of communications and visualisation capabilities.
In this implementation we will use the I-X Planner to organise the input/output handling and language generation activities.
The I-X planner is managed through a customised I-X process panel, which in this implementation provides a set of activities related to the generation of the tasting notes and the interaction with Second Life. For the actual interaction with the world of Second Life the University of Edinburgh's AIAI has created the I-Room project. The driving motivation was to create:
an "intelligent room" or "knowledgeable room" to act as a knowledge aid to support collaborative teleconferences and meetings
One of I-Room's main components is the I-Room Helper. The Helper acts as an agent between Second Life and the real world that passes the requests of the avatars to the process panel (and in turn to the I-X Planner) and delivers messages from the external system to the room (e.g. turning on screens to show a video or writing a chat message to the avatars in the room). In this project the I-Room Helper will be used to provide the external system with the whisky name string and to deliver the NLG output in the form of written tasting notes and audio.
Another component of the system directly related to Second Life is the Avatar Sensor. This I-Room component provides information about an avatar's name, id, location, etc. to the external system. In this project the Avatar Sensor will be used to provide the user's experience level (tied to his or her avatar), which in turn will serve as the User Model for our NLG system.

Knowledge Base

For the knowledge base of our system we will use the Topcat whisky data created by Jon Oberlander et al. for the ILEX project. We converted the flat-file data to a Web Ontology Language (OWL) knowledge base that will serve for better information representation and reasoning in our system. We used the Protégé API for the knowledge base lookup and the information retrieval.
The knowledge base contains information about more than 100 Scotch whiskies including: region (Highland, Speyside, etc.), attributes (bottling, colour, nose, palate, body and finish) and distillery (name, owner, location).

Natural Language Generation

In this section we describe the individual parts of the NLG component in our system. The design of these parts follows the analysis of Reiter and Dale, although some alterations were introduced for simplicity and to adapt better to the target corpora.

Target Corpus

Before we start building our NLG system we must first find a target corpus; in other words we must first be able to tell what kind of texts we want to generate.
Many of the available whisky tasting notes are nothing more than a list of all the attributes of a particular whisky. For instance the scotchmaltwhisky.co.uk website has the following tasting notes for the Laphroaig whisky:
Colour: Rich deep gold.
Nose: Very powerful, "medicine", smoke, seaweed and ozone characters overlaying a sweetness.
Body: Full and strong.
Taste: A massive peated burst of flavour with hints of sweetness at the end.
Finish: Long and savoury.
At the opposite end, there are very rich, manually crafted tasting notes that contain information that only a human whisky expert can have. A very good example of this type of tasting note can be found on the Scotch Malt Whisky Society's website. A sample text about one whisky is the following:
One of only three operating lowland distilleries, this one distills its spirit three times and has a three wood (expression of the malt, not the golf club...) This example is from a barrel and has picked up a good orangey gold colour for its age. The first impression on the nose is sweet, as one would expect. It becomes more and more hay-like with time, and we were reminded of lying in dry, hot, hay. The taste is minty and very clean, like breathing in sharply after crunching a mint. With water a little soap reveals itself to the nose, and a slight aniseedy, peppery sensation tingles the palate. Simple and sweet, this is a lowland that does exactly as it should.
Our NLG system aims to create tasting notes that lie between these two extremes, using all the information available from the knowledge base. The output text will contain sentences in the style of the emphasised parts of the above text. For this we will use a minimal amount of reasoning over the knowledge base data, and for the more "iconic" phrases (e.g. "like breathing in sharply after crunching a mint") we will use template expressions.

Document Planning

The first part of the NLG pipeline is the Document Planner. This module takes as input a 4-tuple of the form (k, c, u, d), representing the knowledge base, communicative goal, user model and discourse history respectively. For this implementation the communicative goal is always the description of a whisky (using tasting notes) and thus c only identifies the whisky in question.
The user model describes the user's level of expertise regarding the whisky tasting process. In our system we classify users as either experts or beginners. The beginner's tasting notes place more emphasis on the tasting process itself, giving instructions to the user, while the expert's tasting notes contain more verbose descriptions, like the ones by the Scotch Malt Whisky Society.
For simplicity the discourse history consists of only the previously mentioned whisky, although more advanced discourse history management schemes could be implemented. The output of the Document Planner is a structured message list. A message is a configuration of informational elements that are to be included in the generated text. Here we use one message per whisky attribute and therefore for every whisky description we have a DistilleryMessage, a ColourMessage, a NoseMessage and so forth.
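As an illustration of the planner input and the one-message-per-attribute rule, here is a minimal Python sketch. The class names `PlannerInput` and `Message`, the function `build_messages` and the dictionary layout of the knowledge base are assumptions made for this example; the actual system uses an OWL knowledge base and its own message classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    kind: str           # e.g. "ColourMessage"
    attribute: str      # e.g. "colour"
    values: list[str]   # attribute values copied from the knowledge base


@dataclass
class PlannerInput:
    # Hypothetical stand-in for the 4-tuple given to the Document Planner.
    knowledge_base: dict[str, dict[str, list[str]]]  # whisky -> attributes
    goal: str                                        # whisky to describe
    user_model: str                                  # "expert" or "beginner"
    previous_whisky: Optional[str] = None            # discourse history


def build_messages(inp: PlannerInput) -> list[Message]:
    """Create one message per attribute of the requested whisky."""
    attributes = inp.knowledge_base[inp.goal]
    return [
        Message(kind=f"{name.capitalize()}Message", attribute=name, values=list(vals))
        for name, vals in attributes.items()
    ]


# Attribute values taken from the sample output later in this report.
kb = {
    "Laphroaig": {
        "distillery": ["Laphroaig"],
        "colour": ["full-gold"],
        "nose": ["phenolic", "salty", "sherry"],
    }
}
messages = build_messages(PlannerInput(kb, "Laphroaig", "expert"))
print([m.kind for m in messages])
```

Keeping the discourse history as a single optional whisky name mirrors the simplification described above; a richer history would replace that one field.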
To order this message list we use a series of empirical rules, mainly derived from the target corpora. The ordering depends on the user model: for the beginner there is a fixed, intuitive order, whereas for the expert user the ordering is determined dynamically by the whisky's attribute values. Tables 1 and 2 present the default ordering and the expert ordering rules respectively. In the rule D(x)=W(x) of Table 2, D(x) and W(x) represent the attribute x of the previous and the current whisky respectively.
The document plan also determines the structure of the text, that is, how the messages are divided into sentences. As with the ordering, the structure depends on the user model. For beginners we chose to realise each message as a single sentence (with the exception of the distillery and bottling messages), as we wanted the tasting instructions to be as clear as possible.
For expert users we create structures of up to two messages per sentence. Since the ordering is dynamic, each sentence might be generated from a different combination of two messages each time. The only exception here is the body and finish messages, which are always combined into a single sentence. Again, this is an empirical rule extracted from the target corpora.
Table 1. Default Ordering
Position   Message
1st        Distillery
2nd        Bottling
3rd        Colour
4th        Body
5th        Nose
6th        Palate
7th        Finish
Table 2. Expert user exceptions
Condition                      Position
Distillery(status) = closed    1st
Year of bottling               1st
Colour != {Gold, Full-Gold}    2nd
x, D(x) = W(x)                 2nd
Body                           6th
Finish                         7th
Otherwise                      Default order
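The ordering rules of Tables 1 and 2 can be sketched in Python as follows. This is one possible reading of the tables, not the system's actual code: the function name, the dictionary keys `distillery_status` and `bottling_year`, and the numeric ranks are assumptions. In particular, the report does not say how promoted messages are interleaved with the remaining ones, so the sketch simply breaks ties by the default order, and it treats D(x)=W(x) as "the two whiskies share at least one value of attribute x".

```python
DEFAULT_ORDER = ["distillery", "bottling", "colour", "body", "nose", "palate", "finish"]
COMPARABLE = ["colour", "body", "nose", "palate", "finish"]


def order_messages(whisky, user_model, previous=None):
    """Return the attribute names in the order their messages should appear."""
    if user_model == "beginner":
        return list(DEFAULT_ORDER)  # Table 1: fixed order

    # Attributes sharing at least one value with the previous whisky (assumed
    # reading of the D(x)=W(x) rule).
    shared = set()
    if previous is not None:
        shared = {
            a for a in COMPARABLE
            if set(whisky.get(a, [])) & set(previous.get(a, []))
        }

    def rank(attr):
        # Rules are checked in the order they appear in Table 2.
        if attr == "distillery" and whisky.get("distillery_status") == "closed":
            return 1
        if attr == "bottling" and whisky.get("bottling_year") is not None:
            return 1
        if attr == "colour" and not set(whisky.get("colour", [])) <= {"gold", "full-gold"}:
            return 2
        if attr in shared:
            return 2
        if attr == "body":
            return 6
        if attr == "finish":
            return 7
        return 3  # everything else keeps its default relative order

    return sorted(DEFAULT_ORDER, key=lambda a: (rank(a), DEFAULT_ORDER.index(a)))


# Attribute values taken from the Laphroaig sample output in this report.
laphroaig = {
    "colour": ["full-gold"],
    "body": ["medium", "oily"],
    "nose": ["phenolic", "salty", "sherry"],
    "palate": ["oily", "salty"],
    "finish": ["round", "dry"],
}
print(order_messages(laphroaig, "expert"))
print(order_messages(laphroaig, "beginner"))
```

For the Laphroaig data no exception fires except the fixed body and finish positions, so the expert order differs from the beginner order only in moving the body message towards the end.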

Sentence Planning

The second part of our NLG system is Sentence Planning (or, in the terminology of Reiter and Dale, Microplanning). The decisions made in this module concern lexicalisation, referring expression generation and aggregation. Unlike in the Reiter and Dale approach, the sentence planning module is tightly coupled with the Surface Realisation module that we describe later on, in the sense that most of the choices made in this module are heavily affected by decisions at the realiser level.
The lexicalisation part of sentence planning deals with the lexical choices we make for each sentence. In our system the source data consist of attribute values stored as orthographic strings, and therefore most of the lexical choices are already determined for us. However, we have to decide which lexical forms to use when comparing two consecutive descriptions (if any). For this task we use only a minimal set of comparative/contrastive words ("...like...", "...but also..."), but this set is easily expandable. Another lexicalisation decision is the use of contrast between values of the same attribute. For instance, if the attribute contains both sweet and sour values we can generate the phrase "sweet yet sour". The rules for generating the contrast are, as before, extracted from the target corpora.
For the referring expressions we use a small number of rules to decide whether or not a sentence can contain one and whether it should be definite or indefinite. We rely on the surface realiser to determine the position of the expression, and the actual generation is done through a small template set. According to these rules a whisky may be referred to by its name with/without a definite article ([the] Laphroaig), by a definite description with/without its region (this [Islay] whisky/malt/dram/spirit) or by a pronoun (it).
We can distinguish two types of aggregation: across sentences (extra-sentential) or within a sentence (intra-sentential). Of the two, the former is less important in our system, as each sentence is essentially a narrative continuation of the previous one. Therefore our main concern is intra-sentential aggregation, which again depends heavily on the realisation choices we make. The aggregation can be either a simple connective (and) between the two messages of the sentence or the selection of a certain surface template if there is a "borderline" referring expression. Sentences (1) and (2) are examples of these two different aggregation types.
(1) [the nose is X] and [the palate is Y]
(2) [bottled in X this whisky] [has a Y colour]
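The two aggregation types in (1) and (2) can be sketched as follows. This is a minimal illustration, assuming that each partially realised message carries its text plus a boolean flag; the flag name `endsWithRefExp` and the function `aggregate` are hypothetical, and "borderline" is read here as "the referring expression sits at the end of the first fragment".

```python
def aggregate(first, second):
    """Combine two partially realised messages into one sentence.

    Each argument is a dict with a 'text' entry; 'first' also carries the
    hypothetical flag 'endsWithRefExp', 'second' the flag 'followsRefExp'.
    """
    if first["endsWithRefExp"] and second["followsRefExp"]:
        # Type (2): the second fragment attaches directly to the
        # referring expression that closes the first one.
        body = first["text"] + second["text"]
    else:
        # Type (1): simple connective between two complete clauses.
        body = first["text"] + " and " + second["text"]
    return body[0].upper() + body[1:] + "."


print(aggregate(
    {"text": "the nose is phenolic", "endsWithRefExp": False},
    {"text": "the palate is salty", "followsRefExp": False},
))
print(aggregate(
    {"text": "bottled in 1971 this whisky", "endsWithRefExp": True},
    {"text": " has a red colour", "followsRefExp": True},
))
```

The second fragment in the last call starts with a space, which is how a template that must follow a referring expression can be joined without extra logic.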

Surface Realisation

For the surface realisation part we rely mostly on templates, that is, orthographic strings which contain slots to be filled by variables from the data. This method was preferred to the alternative of a rule-based syntactic realiser for two main reasons. First, the templates are easily expandable; one can simply add more strings to each template file, thus achieving even greater variation in the output text. Second, the templates allow the creation of the more "stylistic" verbose descriptions that one can find in the Scotch Malt Whisky Society's notes. The amount of reasoning and deep-level syntactic description that these phrases require makes them extremely difficult to generate with a rule-based surface realiser.
There are however some drawbacks to this approach, the greatest of which is that all the templates have to be created manually, a process which is time-consuming and labour-intensive even for a small-scale domain like this. Another problem with the template solution is that the strings have to be constructed carefully so that they can aggregate into syntactically correct sentences; in other words, every template must be designed in such a way that it will join syntactically with every other possible template.
This last problem is not as challenging as it seems. Given some rule-based constraints (e.g. whether a template can follow a "borderline" referring expression or whether a template contains a referring expression or not) it is straightforward to create templates that join easily. For this system we have implemented the templates as rows in attribute template files, each containing a boolean structure (used for the rule-based reasoning) and a surface form (the orthographic string with the template slots). Below is a part of the nose template file:

//compare, hasRefExp, followsRefExp, "surface form"
false, false, false, "the first impression is {values}"
false, false, false, "the palate: {values}"
false, false, true, " has a {values} taste"
false, true, false, "{refExp} has a {values} taste"
false, true, false, "{values} in taste {refExp}"
true, false, false, "the palate: {compareVal}"
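To show how rows of this shape could be loaded and filled, here is a small Python sketch. It is not the system's own code: the function names `load_templates`, `join_values` and `fill` are assumptions, and the rows are parsed with Python's standard `csv` module, which may differ from how the real template files are read.

```python
import csv
import io

# The rows shown above, embedded as a string so the example is self-contained.
TEMPLATE_ROWS = '''\
//compare, hasRefExp, followsRefExp, "surface form"
false, false, false, "the first impression is {values}"
false, false, false, "the palate: {values}"
false, false, true, " has a {values} taste"
false, true, false, "{refExp} has a {values} taste"
false, true, false, "{values} in taste {refExp}"
true, false, false, "the palate: {compareVal}"
'''


def load_templates(text):
    """Parse template rows into dicts; lines starting with // are comments."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    templates = []
    for row in reader:
        if not row or row[0].startswith("//"):
            continue
        compare, has_ref, follows_ref, surface = row
        templates.append({
            "compare": compare == "true",
            "hasRefExp": has_ref == "true",
            "followsRefExp": follows_ref == "true",
            "surface": surface,
        })
    return templates


def join_values(values):
    """Render a value list as 'a', 'a and b' or 'a, b and c'."""
    if len(values) <= 1:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]


def fill(surface, **slots):
    """Replace each {slot} in the surface form with its value."""
    for name, value in slots.items():
        surface = surface.replace("{" + name + "}", value)
    return surface


templates = load_templates(TEMPLATE_ROWS)
# Pick the first non-comparative template that contains a referring expression.
candidates = [t for t in templates if t["hasRefExp"] and not t["compare"]]
sentence = fill(candidates[0]["surface"], refExp="It",
                values=join_values(["salty", "oily"])) + "."
print(sentence)
```

The boolean columns are what make rule-based selection possible: the planner filters on the flags first and only then chooses among the remaining surface forms.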

For the creation of the "stylistic" descriptions we used another template file containing the attribute values and their verbose descriptions. We can choose whether to describe more than one value with a single description. However, in order to use these multi-valued descriptions the attribute must contain only these values; otherwise we end up with problematic sentences. If only one value is being described, the template must contain a slot to hold all the remaining values. A part of the "fancy" descriptions template file is shown below:

//numOfValues, values, "surface form"
2, salty,oily, "briny and slightly oily - like salt and vinegar crisps in a boat's engine room"
1, smoky, "{restValues} with a subtle smokiness"
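The matching rule described above (a multi-valued description applies only when the attribute contains exactly those values; a single-valued one needs a slot for the remaining values) might be sketched like this. The names `FANCY`, `describe` and `join_values` are assumptions for illustration, and the fallback to a plain value list is a choice made for this sketch rather than documented behaviour.

```python
# (described values, surface form), mirroring the two rows shown above.
FANCY = [
    (("salty", "oily"),
     "briny and slightly oily - like salt and vinegar crisps in a boat's engine room"),
    (("smoky",), "{restValues} with a subtle smokiness"),
]


def join_values(values):
    """Render a value list as 'a', 'a and b' or 'a, b and c'."""
    if len(values) <= 1:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]


def describe(values, fancy=FANCY):
    """Return a verbose description if one applies, else the plain list."""
    for described, surface in fancy:
        if len(described) > 1:
            # Multi-valued description: usable only on an exact match.
            if set(described) == set(values):
                return surface
        elif described[0] in values:
            rest = [v for v in values if v != described[0]]
            if rest:  # the slot needs at least one remaining value
                return surface.replace("{restValues}", join_values(rest))
    return join_values(values)


print(describe(["smoky", "sherry", "malty"]))
print(describe(["salty", "oily", "sweet"]))
```

The second call falls back to the plain list because the attribute holds a value that the two-valued description does not cover.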

Sample Output

Figure 2 shows the actual output of the NLG system at the user's request for a description of the Laphroaig whisky. The system also generates audio output (via a Text-To-Speech synthesiser), which is delivered in the form of streaming audio (again through the I-Room Helper).
Figure 2. Second Life VWoW room snapshot. The generated text is shown in the form of chat messages by the I-Room Helper.

The transcription of the generated text is:

From Laphroaig distillery, this Islay malt matured over 10 years.
It has a full-gold colour and the aroma is phenolic and sherry together with the distant scent of the sea.
It has a salty and oily taste.
After a medium and oily body it leaves a dry and round taste.

If the user asks for another whisky description, the discourse history allows a direct comparison over the attributes of the two whiskies. An example is shown below:

Bottled in 1971, Millburn comes from Millburn distillery.
The colour is red and the nose: sherry like the Laphroaig, but also dry, rich and aromatic.
The taste is sherry and malty with a subtle smokiness.
The body: smooth, firm and full and with a warm, long and smoky finish.

The comparison is very clear: the two whiskies share one value of the nose attribute and this is expressed by "...like the Laphroaig, but also...". An important point to notice here is that if the two whiskies had more than one attribute value in common, the system would pick only the first one. If it stated every similarity, the text would become quite boring to read.
As the reader may have noticed, both these texts are addressed to an expert user. The tasting notes generated for the same whiskies for a beginner are quite different:
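The comparison behaviour just described can be sketched in a few lines of Python. The function names are hypothetical, and the value lists are read off the sample outputs in this report rather than taken from the knowledge base itself.

```python
def join_values(values):
    """Render a value list as 'a', 'a and b' or 'a, b and c'."""
    if len(values) <= 1:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]


def compare_values(current, previous, previous_name):
    """Describe the current values, mentioning only the first shared one."""
    shared = [v for v in current if v in previous]
    if not shared:
        return join_values(current)
    first = shared[0]  # further common values are deliberately ignored
    rest = [v for v in current if v != first]
    phrase = f"{first} like the {previous_name}"
    if rest:
        phrase += ", but also " + join_values(rest)
    return phrase


millburn_nose = ["dry", "rich", "sherry", "aromatic"]
laphroaig_nose = ["phenolic", "salty", "sherry"]
print(compare_values(millburn_nose, laphroaig_nose, "Laphroaig"))
```

Reporting a single shared value keeps the sentence short, which is the design choice motivated above.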

Distilled in Laphroaig distillery, this Islay whisky matured over 10 years.
Hold up your glass and see its full-gold colour.
Swirling the whisky around the glass will give the feel of the medium and oily body.
Hold you mouth open slightly to take in the phenolic, salty and sherry aroma.
Have a small sip and taste the oily and salty palate.
Enjoy the round and dry finish.

From Millburn distillery, this dram was bottled in 1971.
Hold up your glass and see its red colour.
Swirling the whisky around the glass will give the feel of the smooth, firm and full body.
Add a splash of water to unleash the dry, rich, sherry and aromatic nose.
Have a taste of the whisky, let the fire burn off and leave you with the smoky, sherry and malty taste.
Feel the warm, long and smoky finish.

Here the two texts are almost identical in form: there are only slight variations in the expressions used, no comparisons and no "fancy" descriptions, and the main focus shifts towards the tasting process itself.

Conclusions and Further Work

In this report we presented a Natural Language Generation system for the Second Life virtual environment. We described the overall architecture of the system and focused on the NLG component, analysing some of its design aspects. The output of the system is in the style of the Scotch Malt Whisky Society's tasting notes, although several limitations apply, mainly because of the lack of the required data. One example of this lack of descriptive power is the phrase "With water a little soap reveals itself to the nose" from the original target text: the change of the aroma after the addition of water is simply not recorded in our knowledge base. Another limitation that the data introduce is that we cannot describe a selected attribute value of a whisky more elaborately than it was originally described in the knowledge base. Therefore we cannot produce the expression "for its age" in the phrase "...has picked up a good orangey gold colour for its age" to describe the colour of a whisky, simply because that information does not exist.
One approach to overcoming these problems is to reason over the data using small-scale background knowledge. For instance, we could infer that if a whisky is more than X years old and has a full-gold colour where normally it would have a gold colour, then this whisky is special and therefore the expression "for its age" is worth mentioning. Another improvement that reasoning could bring is the insertion of parts of the distilling process into the generated text. For instance, if the nose of a whisky is sherry we can infer (with some certainty) that it has matured in a sherry barrel.
Another extension to the system would be to generate comparison texts between two different whiskies. This would require us to revisit some design decisions, mainly in Document and Sentence Planning.
A final touch could be the creation of a TTS voice for this domain that would contain all the names of the whiskies, the distilleries and their regions.

References

Ehud Reiter and Robert Dale, Building Natural Language Generation Systems. Cambridge University Press, Cambridge, 2000.

Artificial Intelligence Applications Institute, School of Informatics, University of Edinburgh, I-X: Technology for Intelligent Systems. http://www.aiai.ed.ac.uk/project/ix.

Artificial Intelligence Applications Institute, School of Informatics, University of Edinburgh, I-Room: A Room for Intelligent Interaction. http://www.aiai.ed.ac.uk/project/i-room.

R. Dale, J. Oberlander, M. Milosavljevic and A. Knott, Integrating Natural Language Generation and Hypertext to Produce Dynamic Documents. Interacting with Computers, 11(2), 109-135, 1998.


© Christos Christodoulopoulos, 2008