The general rule of thumb is that the more utterances per intent, the better. But a high number of utterances alone is not enough; it is imperative that the utterances also vary widely in form. We are working on a system that will help auto-generate these for you, but until then this guide can help you generate different forms of utterances.
Add command-like utterances, such as "Show cart", "Give cart", etc.
It helps to use an online thesaurus to come up with synonyms and add them.
Try to think of sentences even if they do not make grammatical sense. Often,
people are not grammatical when they speak. This could be because they were
in the middle of another thought or were thinking fast, but do not expect your
users to make only grammatical sentences.
Try variations with as many prepositions as possible.
One of the largest sources of errors is overlap between the utterances of different intents, or between the entities of two different entity types. If an utterance is not triggering the intent you expect, first check whether that utterance is part of the expected intent. If it is not, add the utterance to the intent. If it is already part of the intent, check whether the utterance is also part of another intent. If it is, check whether that is unintentional, or whether it should no longer be part of that intent because of later changes. If it is unintentional or can be removed, the best thing to do is to remove it.
However, if it makes sense to keep the utterance in the other intent, see whether the two intents can be combined. This works if they trigger the same functionality, or have the same statements/prompts in both conditions: when all entities are provided and when some are not.
If you cannot combine them because you have a special reason to keep them separate, such as complex logic based on the app state, then you will have to write handlers that condition on the intent(s) and entities to decide which app functionality is triggered. See using entities to trigger app actions above.
It is hard to come up with every single type of utterance a user may say and train the schema with all of them. Even though the machine learning system inside Slang is meant to generalize to more utterances than just the ones you provided, the system does have its limitations. To get a better intuition for the machine learning at work behind the scenes of Slang, read this section. However, the general rule of thumb is that the more data it has, the better it will perform.
One way of making the system better is to keep adding the utterances that your users actually used back to the schema, even if they worked well. Periodically, you should go through the utterances listed in the analytics page and add them back to the schema in the corresponding intent.
An utterance in the analytics page can be in one of the following conditions:
The utterance is triggered with the correct intent and entities and the utterance (with the entities abstracted out) is available in the examples list of the intent exactly, word for word. No action has to be taken here. This is the state all utterances have to be in over time.
The utterance is triggered with the wrong intent and the utterance is not in the examples set of the corresponding intent. Then add the utterance to the example set of the correct intent and mark the entities if any.
The utterance is triggered with the wrong intent but the utterance is in the given examples set. See if the same or similar utterance is available in the examples set of another intent. In such a case, you will have to remove the example from one of the intents, to remove the conflict. If this is not the case then raise a ticket to the Slang support team.
The utterance is triggered with the correct intent but the entities are not recognized properly. If the entity is of enum type then try adding the value to the enum list either as a value or synonym. Else, you can try to add the utterance back to the examples set of the correct intent and mark the corresponding entities.
The utterance is unrecognized. Add the utterance to the examples list of the corresponding intent.
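The conditions above can be summarized as a small triage routine. This is only a sketch: the function name, its boolean parameters and the `Action` labels are hypothetical and are not part of the Slang console or SDK. It simply encodes the manual review steps described in this section.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action needed"
    ADD_TO_CORRECT_INTENT = "add the utterance to the correct intent and mark entities"
    RESOLVE_CONFLICT = "look for the same example in another intent, else raise a ticket"
    FIX_ENTITIES = "extend the enum values/synonyms, or re-add with entities marked"


def triage(recognized, intent_correct, entities_correct, in_examples):
    """Map the state of one analytics-page utterance to a follow-up action."""
    if not recognized:
        return Action.ADD_TO_CORRECT_INTENT
    if not intent_correct:
        # Wrong intent: either the example is missing, or it conflicts elsewhere.
        return Action.RESOLVE_CONFLICT if in_examples else Action.ADD_TO_CORRECT_INTENT
    if not entities_correct:
        return Action.FIX_ENTITIES
    # Correct result: still add it back if it is not yet an exact example.
    return Action.NONE if in_examples else Action.ADD_TO_CORRECT_INTENT
```

For instance, `triage(True, False, True, True)` returns `Action.RESOLVE_CONFLICT`, which corresponds to the case where the wrong intent fires even though the utterance is already in the examples set.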
Also, for utterances made in a language other than English, see the section on making the schema better for other languages.
We are working on making the console more feature-rich and friendly to make this process easier, but for now doing it manually is the prescribed method.
This may seem like a tedious process, but we have seen that over the course of 4 to 5 rounds of analytics-to-buddy feedback with 15 testers/users of your buddy, you should get coverage in the high 90 percents.
Due to its close coupling with a UI, it feels very natural for users to command VAX with single words or pairs of words. A user can say savings account and expect the app either to perform some default function, such as showing the savings account, or to prompt for which action to execute given the entity value savings account.
To allow such cases to trigger intents, you will have to add an utterance that contains only the entity and no other words. You still have to mark up the entity, though.
For example, in the above case, if you choose savings account to trigger the show account statement functionality, then in that intent you can add the utterance savings account. The benefit of the markup is that any other type of account listed in the enum will now also trigger the intent.
For the second case, where you want the app to express that it has understood the entity savings account but is unsure of what action to take, you can add a dummy intent that captures this entity and, based on it, prompts for the action. See :ref:using_intents_for_collecting_missing_entities.
We've explained before that the intent triggered maps to the function that is called, and that the entities are the parameters to that function. However, it is not a hard and fast rule that there be a one-to-one mapping between intents and functions. Multiple intents can call the same function, and one intent can call multiple functions conditioned on the app state or the entities.
We will explain using an example.
Example 1: Assume a DTH app with the functionality to find where a channel is (what the channel number is) and also the functionality to find where a program is, with the utterances Where is HBO? and Where is Game of Thrones? respectively. You may want to have two intents, one that handles where a channel is and another that handles where a program is, because your app may have two different functions/database lookups/UI changes depending on whether the user asked for a channel or a program. However, in this case most utterances would have the same form, with program and channel interchangeable. In such a case it would make sense to keep only one intent that captures sentences which mean where, and to enumerate example sentences that cover both programs and channels, as opposed to having two intents with similar examples that differ only by the entity.
One major reason why you may want to use such logic is this: the only way the system can choose one intent over the other is by recognising the entity and knowing, through training, that one entity is found in one intent more than in the other. But if the user names a channel or program that the system cannot recognize, there is not enough information for the system to choose one intent over the other. This may lead to unpredictable behaviour, in that one intent will be chosen over the other depending on the state of training. In these cases it may make more sense to handle the similar example sentences in the same intent, so that the intent choice does not depend on the entity. You can then prompt the user with a sentence such as "We did not understand the program or channel you were searching for, can you try again?"
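A minimal sketch of this single-intent approach is shown below. It assumes a hypothetical `where` intent whose entities arrive in the client as a plain dictionary; the entity names `channel` and `program` and the action labels `find_channel` and `find_program` are made up for illustration and are not Slang API names.

```python
RETRY_PROMPT = ("We did not understand the program or channel you were "
                "searching for, can you try again?")


def handle_where(entities):
    """Handle one `where` intent by conditioning on which entity was tagged."""
    if "channel" in entities:
        return ("find_channel", entities["channel"])
    if "program" in entities:
        return ("find_program", entities["program"])
    # The intent was recognized but no entity was: one prompt covers both cases.
    return ("prompt", RETRY_PROMPT)
```

With this shape, `handle_where({"channel": "HBO"})` routes to the channel lookup, while an empty entity set produces the shared retry prompt instead of an arbitrary choice between two intents.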
So when should you use intents to decide the app functionality, and when should the intent only identify the use case, leaving the entities to decide the functionality? There is no right answer, and it depends heavily on the use case. But some of the factors that can decide this are:
Repeating the utterances. Are you repeating yourself too much by enumerating
the same type of utterances for the same functionality, or for functionality that
can be decided based only on the entity?
How do you want the prompts to work in the negative case: in the case that the entity
is unrecognized but the intent is recognized. Do you want different ways of
But in the end it comes down to the way the internals of intent classification
and entity tagging work. See the sections on the intuition of how
intent classification and entity tagging occur to understand more. Each has
its own pros and cons. Intent classification is less sensitive to sentence
structure, while entity tagging is sensitive to structure and can therefore be more
precise, but can fail for cases it has not been trained on. You may get the
intuition over time, but in the beginning, we suggest you try both methods out
in case you aren't happy with the performance. Over time we will update this
section with more cases where one choice worked better than the other.
See the section below on resolving cases where the entities or intents overlap
and therefore impact performance. If there is such overlap you may want to consider
clubbing intents together and conditioning on the entity.
Some use cases require negative utterances to trigger intents. An example is Don't show me morning flights. You may also have intents with utterances such as Show me morning flights. In the first case you want to trigger app functionality that does not show the morning flights, while in the second case you want to trigger app functionality that shows them.
One solution could be to have two intents: one containing show morning flights and other positive utterances, and another containing don't show morning flights and similar negative utterances, so that each sentence triggers the corresponding intent. However, though this may work well for sentences that are in the training set, it may not work as well for utterances that are not.
In addition, there may be utterances such as Hide the morning flights which are positive, in that they do not contain a 'not' or 'don't', but which trigger the opposite functionality of 'show'.
In such cases, it may be useful to have two intents, one for show and one for hide, and then mark up negation words such as 'don't', 'not', etc. as an entity called, say, negation (you are free to use your own name). This follows the concept of using entities to decide app functionality within the same intent, as explained above.
In your app logic you now have to condition on two elements: the intent and the negation entity. The logic flows as follows. Assume you have an app function called
add_positive_filter(time_of_day), which takes the time of day as a parameter and shows only flights for that time of day, and another called
add_negative_filter(time_of_day), which takes the time of day as a parameter and filters flights such that flights for that time of day are removed.
Now, assume the following utterances:
Show morning flights.
Don't show morning flights.
Hide morning flights.
Don't hide morning flights.
In the first utterance, the intent triggered will be show_flights with the entity negation not set and the entity time_of_day set as morning. In this case we will trigger add_positive_filter(time_of_day).
In the second utterance, the intent triggered will be show_flights with the entity negation set. The value of negation is unimportant; only the fact that it is set is crucial. We will now trigger the opposite of the show_flights intent, that is, add_negative_filter(time_of_day).
In the third utterance, the intent triggered will be hide_flights with the entity negation not set and the entity time_of_day set as morning. In this case we will trigger add_negative_filter(time_of_day).
In the fourth utterance, the intent triggered will be hide_flights with the entity negation set. The value of negation, as in the second case, is unimportant; only the fact that it is set is crucial. We will now trigger the opposite of the hide_flights intent, that is, add_positive_filter(time_of_day).
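The four cases can be sketched as a single dispatch on the intent and the presence of the negation entity. The two filter functions below are stand-ins that return a description instead of changing any UI, and `handle_flight_intent` together with the dictionary shape of `entities` is an assumption made for illustration, not the Slang client API.

```python
def add_positive_filter(time_of_day):
    """Stand-in: show only flights for this time of day."""
    return f"show only {time_of_day} flights"


def add_negative_filter(time_of_day):
    """Stand-in: remove flights for this time of day."""
    return f"remove {time_of_day} flights"


def handle_flight_intent(intent, entities):
    if intent not in ("show_flights", "hide_flights"):
        raise ValueError(f"unexpected intent: {intent}")
    # Only the presence of the negation entity matters, not its value.
    negated = "negation" in entities
    # A negated show behaves like hide, and a negated hide behaves like show.
    wants_show = (intent == "show_flights") != negated
    apply_filter = add_positive_filter if wants_show else add_negative_filter
    return apply_filter(entities["time_of_day"])
```

Because the negation is handled as an entity, a previously unseen phrasing that is classified as show_flights still resolves correctly as long as the negation word is tagged.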
When a user presses the trigger and speaks their intent, they may or may not mention all the entities that go with the intent, but to complete the action your use case may require the user to provide values for these entities.
One way of collecting these entities is the required entity prompting mechanism, which is described in more detail here. (link to entities)
However, there may be cases for which the missing entity prompt mechanism is not suitable. Some of them are:
The missing entity mechanism relies on the user saying only the entity and
not an entire sentence. For example, if the prompt is 'Where are you flying to?',
the missing entity prompt relies on the user saying only 'Bangalore' and not
'I want to fly to Bangalore' as the reply.
Your use case allows the user to mention two or more entities in reply to a prompt
that requested two or more entities, or the user may
respond with more entities than were prompted for. For example, the user is prompted
'Where do you want to fly from?' and the user replies, 'From Bangalore to Delhi'.
Your use case requires complex logic for the prompting system. For example,
you have a use case where the next prompt depends on the reply to the
previous prompt. Concretely: in a flight booking app, the user is asked
whether they are booking a return journey or a single journey, and based on the
answer you either prompt for a return journey date or not.
In such cases, you can design an 'intent' whose purpose is to collect entities rather than to figure out the true purpose (intention) of the action. The utterances for this intent can be designed to concentrate on collecting the entities.
Examples of utterances for these intents are:
I would like to travel to Bangalore
On the client side of the system (Android client etc.) you would then have to handle these intents in such a way that, when they are triggered, they are implicitly treated as helpers to another major intent. For example, assume a flight booking app has intents called book_ticket and city_entity_collection.
book_ticket would be triggered by utterances such as I want to fly from Bangalore or I want a ticket. In your app logic, you can maintain a variable that remembers the state of the intent. That means that if the user says I want to fly from Bangalore, you can then prompt for the destination city. If the user says Delhi, city_entity_collection would be triggered, and your app logic can reason that since the known intent is book_ticket and city_entity_collection has now been triggered, the entity(s) should be extracted and added to the set of known entities.
Can or should we combine the book_ticket and city_entity_collection intents? There may be a case for combining them, so that the same intent gets triggered both times. If the user says I want to fly from Bangalore and the book_ticket intent gets triggered, you prompt Where do you want to fly to? and the user says Delhi, which again triggers the book_ticket intent, but this time with the entity Delhi marked as city in the entities set. You can then write logic that takes this entity and adds it to the list of known entities under destination.
However, you may want to split up the two intents in order to reuse the second, entity-collecting intent in another flow, for example after a book_hotel_room intent.
Another reason why you may want to split up the two intents is that you don't necessarily want the utterances in the second intent to trigger the first intent. That is, you don't want the utterances in city_entity_collection to trigger book_ticket.
However, the reverse may happen too: even though the user is responding to the prompt with an entity, and you have implemented the city_entity_collection intent, the reply may trigger the book_ticket intent anyway. In such a case you will have to handle the entity as in the case explained above, where you had one intent.
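The state handling described above could be sketched as follows. The class name `BookingFlow`, the slot names `source` and `destination`, the generic `city` entity and the return values are assumptions for illustration only; a real client would call into the Slang SDK's callbacks rather than this stand-alone class. The sketch accepts a bare city reply whether it arrives through city_entity_collection or through book_ticket.

```python
PROMPTS = {
    "source": "Where do you want to fly from?",
    "destination": "Where do you want to fly to?",
}


class BookingFlow:
    """Remembers the active intent so a helper intent can fill in entities."""

    def __init__(self):
        self.active_intent = None
        self.pending_slot = None
        self.known = {}

    def on_intent(self, intent, entities):
        if intent == "book_ticket":
            self.active_intent = intent
        elif intent != "city_entity_collection" or self.active_intent is None:
            return "unhandled"
        entities = dict(entities)
        # A bare `city` entity answers whichever slot was last prompted for.
        if "city" in entities and self.pending_slot is not None:
            entities[self.pending_slot] = entities.pop("city")
        self.known.update({k: v for k, v in entities.items() if k in PROMPTS})
        for slot, prompt in PROMPTS.items():
            if slot not in self.known:
                self.pending_slot = slot
                return prompt
        self.pending_slot = None
        return "ready"
```

Calling `on_intent("book_ticket", {"source": "Bangalore"})` returns the destination prompt, and a following `on_intent("city_entity_collection", {"city": "Delhi"})` stores Delhi under `destination` and returns `"ready"`.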
The internal architecture of our system is such that utterances in other languages are translated using a translation tool, and the English NLU is then applied independently on the result. Though this works out of the box in many cases, there are other cases where the NLU does not understand the translated sentence because, though correct, it may be a strange way of saying that sentence in English. Read :ref:.._intuition_buddy_building for more details on how the NLU system works.
Largely there are two problems that arise from translations:
Changes in the utterances due to translation such that the NLU doesn't understand.
Changes in the entities due to translation such that the NLU doesn't understand.
We cover the explanations and the remedies in two different subsections below.
Here is an example from Hindi to English translation for a DTH-based app where this does not work, though the problem can easily extend to other languages: I want to subscribe to Star World. In Hindi, this would be spoken as: मुझे स्टार वर्ल्ड की सदस्यता चाहिए (Mujhe star world ki sadasyata chahiye), which translates to I want Star World membership. This isn't something one would say naturally in English, even though it seems correct.
This may affect the overall accuracy of the NLU for other languages.
Remedy: As with other things, we are working on features to make this easier and faster, and maybe even redundant in the future. But for now the manual process we prescribe is:
If a sentence does not work in another language, then use Google Translate's online tool to translate the spoken sentence from the language of choice to English.
Add the translated sentence back to the correct intent and mark up the entities accordingly.
Remember, even though translation works out of the box in many cases, this issue crops up because of the different ways a concept can be expressed in one's native language. Therefore, the issue is most likely to surface, and the remedy to be applied, when working with native speakers of the language, if one has access to them.
It may also be possible that after translation there is overlap between the utterances or entities.
When translating from another language, there may be cases where the over-enthusiastic translation tool translates entities that it should leave alone.
For example: The translation of the name of the Hindi movie Kabhi Kushi Kabhi Gham in English is 'Sometimes happiness sometimes sadness'. So the Hindi sentence मुझे कभी कुशी कभी घम देकन हें (Mujhe Kabhi Kushi Kabhi Gham dhekna hein), which should get translated to I want to see Kabhi Kushi Kabhi Gham, instead gets translated to I want to watch sometimes happiness sometimes sadness.
Of course this example can be easily extrapolated to other languages too.
Remedy: As with other things, we are working on features to make this easier and faster, and maybe even redundant in the future.
But for now the manual process we prescribe is:
If an entity does not work in another language, then use [Google Translate's online
tool](https://translate.google.co.in/) to translate the entity from the
language of choice (preferably in the language's original script and not the
transliterated version) to English.
Add the translated version of the entity to the corresponding synonyms list.
Our classification engine will try to classify the utterance into one of the known intents, or else report that it cannot place the utterance in any of the intents with confidence. See the section on the intuition of intents if you want to understand more about how this works. There are many use cases in which it is the required behavior for Slang to reply with intent unrecognized rather than choose some incorrect intent.
One set of tricks that you as a developer can employ is to post-process the utterance if the intent is unrecognized. You can write database look-ups or custom logic that runs on the sentence when the intent is not recognized.
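As one possible shape for such a post-processing step, the sketch below scans an unrecognized utterance for known channel names and returns a fallback guess. The in-memory `KNOWN_CHANNELS` list stands in for a database look-up, and the function name, the `where` intent and the return shape are hypothetical rather than part of Slang.

```python
KNOWN_CHANNELS = ["hbo", "star world"]  # stand-in for a database look-up


def post_process(utterance):
    """Return a fallback (intent, entities) guess, or None if nothing matches."""
    text = utterance.lower()
    for channel in KNOWN_CHANNELS:
        if channel in text:
            return ("where", {"channel": channel})
    return None
```

A substring match like this is deliberately crude; it is only meant to run after the classifier has already reported the intent as unrecognized.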
We would love to know of the post processing steps you performed. If feasible we can create and add yours to the library of steps, so that it is reusable for others and you.
Often, because of the language model of the ASR, it may recognize words that are only close to what your user actually said. One of the more legitimate ways to solve this is to use the API to bias the ASR here. But if for some reason that doesn't work either, another trick is to add the word that the ASR is recognizing to the synonyms of the word that your user actually wanted to say.
For example, if the user is saying add but the ASR picks it up as eight, then you can add eight to the synonyms of add or, if add is not an entity, add utterances with eight in place of add.
Caution: Treat this method only as a hack and not as a legitimate way of improving accuracy. This method is highly fragile, in that the synonyms you use may overlap with more legitimate words in the schema and may cause unwanted behavior. Therefore, please make a note of every time you use this hack; whenever you are debugging some unwanted behavior of the NLP, the hacks that you added are a good place to review.
This is a workaround specific to how ASR works. When there is a long string of numbers, say 789456123 (these could be telephone numbers, serial numbers or any other similar type of number), a user can speak it as one of:
seven eight nine...four five six
seven eighty nine...four five six
seven hundred and eighty nine...four five six
The ASR may not return it as one long string of numbers but a set of smaller strings, depending on how the user says it.
To overcome this, you can mark the entity as a list type and get an ordered list of smaller number strings. See how listing works here. Then on the client you can join all the smaller numbers in the list to get the larger string of numbers.
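The client-side join can be as small as the sketch below. It assumes the list entity reaches your code as an ordered Python list of strings or integers; the function name is hypothetical.

```python
def join_number_chunks(chunks):
    """Concatenate the ordered number chunks of a list entity into one string."""
    return "".join(str(chunk).strip() for chunk in chunks)
```

For example, `join_number_chunks(["7", "89", "456"])` and `join_number_chunks(["789", "456"])` both yield `"789456"`, regardless of how the ASR happened to split the digits.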
When Slang is integrated with your app, ideally every function of your app would be voice-enabled, but it is more likely that only some of your app's functions are actually handled by voice. In production, however, you will not be able to stop a user from trying to ask the app for whatever they intend. After all, this is one of the features of VAX vs. a traditional GUI: the user is not limited to the GUI to express their intent to the app.
For a good user experience, you probably want to handle those utterances and inform the user that the app understood what they were saying, but that the feature is not supported through voice. It is a better feeling for the user to be told this than to hear the 'I am sorry, I did not understand you' statement that the app would otherwise make if the utterance did not map to any of the intents.
The way to make this work is to have intents in the console for every possible action that a user may want to take using voice. Some of the actions may be handled and trigger callbacks in the app, while others may just trigger a prompt explaining to the user that they were understood, but that no action will be taken.
These do not have to stop at the functions that your app currently supports but voice doesn't. They can cover functions that your app does not support at all.
For example, you may have a train ticket booking app, and the customer unknowingly tries to book a hotel in a city. Instead of showing them train tickets for that city (since that is the closest function in your app that gets triggered) or taking no action at all, you can add a hotel booking intent to your buddy which, when triggered, informs the user that hotel booking is not currently supported by the app.
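On the client, the split between handled and prompt-only intents might look like the sketch below. The intent names `book_train_ticket` and `book_hotel`, the `respond` function and its return shape are all hypothetical; only the two prompt texts are taken from this section.

```python
SUPPORTED_INTENTS = {"book_train_ticket"}
UNSUPPORTED_PROMPTS = {
    "book_hotel": "Hotel booking is not currently supported by the app.",
}
FALLBACK_PROMPT = "I am sorry, I did not understand you"


def respond(intent):
    """Run supported intents; explain understood-but-unsupported ones."""
    if intent in SUPPORTED_INTENTS:
        return ("action", intent)
    if intent in UNSUPPORTED_PROMPTS:
        return ("prompt", UNSUPPORTED_PROMPTS[intent])
    return ("prompt", FALLBACK_PROMPT)
```

Keeping the unsupported intents in their own table makes it easy to move an intent into the supported set once the corresponding feature ships.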