If you haven’t already, you may want to read the introduction page first; it gives the general intuition for entities and entity types in the context of the other parts of the buddy. Here, just as we did on the intent page, we go a little deeper into what entities and entity types really are.
Entities are the data that you want extracted from the utterance spoken by the user. If the intent is the general notion of what the user wants to do, entities flesh out the command or request: the intent tells us what to do, while entities tell us what to act on or where to perform the action, and sometimes when or how to perform it. If the action involves a person or their account, entities can also tell us to whom, with whom or from whom.
The entity extraction system works as follows: at buddy creation time, you, the developer, mark up the utterances that you added for intent definition. As we explained for intents, we could write a hard pattern lookup or a traditional string search using regex or another search mechanism; however, that would require us to enumerate all possible patterns for utterances and the entities within them. Again, this is where machine learning comes in: behind the scenes, we use techniques that learn properties of the entities, such as their location in the sentence and the statistics of the words they occur with, to soften the lookup.
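To make the contrast concrete, here is a minimal Python sketch of the hard pattern lookup mentioned above. The pattern and the function name extract_cities are hypothetical and are not part of the buddy; the sketch only illustrates why a rigid pattern misses any phrasing it was not written for.

```python
import re

# Hypothetical hand-written pattern: it matches exactly one phrasing.
FLIGHT_PATTERN = re.compile(
    r"^book me a flight from (\w+) to (\w+)\.?$", re.IGNORECASE
)


def extract_cities(utterance):
    """Return (source, destination) if the utterance fits the pattern, else None."""
    match = FLIGHT_PATTERN.match(utterance.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# The phrasing the pattern was written for is recognised...
print(extract_cities("Book me a flight from Mumbai to Bangalore."))
# ...but a perfectly natural rewording is not, so every variant
# would need its own pattern.
print(extract_cities("I need to fly to Bangalore from Mumbai"))
```

The first call returns the two cities and the second returns None, even though both utterances carry the same information. Covering every such variation by hand is the enumeration problem that the learned, softer lookup is meant to avoid.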
Broadly, there are two properties that we can use to extract an entity: its location and its relatedness to the words before and after it; and its membership in a known list of values.
As we have seen, the first property is derived from the markup of utterances. The second property comes from the definition of entity types. An entity type is a collection of knowledge about a particular known type. For example, take the utterance ‘Book me a flight from Mumbai to Bangalore.’ The possible entities here are the source city, Mumbai, and the destination city, Bangalore. In aviation there is only a finite number of airports, so we can enumerate all of the known airports in a list; if a word or string of words is present in that list, there is a stronger case for tagging it.
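The list-membership property can be sketched in a few lines of Python. This is only an illustration under stated assumptions: KNOWN_CITIES is a deliberately tiny, hypothetical list, find_known_values is a made-up helper, and the real system combines this signal with the learned context signal rather than relying on exact matching.

```python
# Hypothetical, deliberately tiny list of known values for a "city" entity type.
KNOWN_CITIES = {"mumbai", "bangalore", "delhi", "new delhi"}


def find_known_values(utterance, known_values, max_words=3):
    """Return (start, end, text) word spans whose text appears in known_values.

    Longer spans are tried first, so "New Delhi" wins over "Delhi".
    """
    # Split on whitespace and drop trailing punctuation from each word.
    words = [w.strip(".,!?") for w in utterance.split()]
    found = []
    i = 0
    while i < len(words):
        matched = False
        for size in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate.lower() in known_values:
                found.append((i, i + size, candidate))
                i += size
                matched = True
                break
        if not matched:
            i += 1
    return found


spans = find_known_values("Book me a flight from Mumbai to Bangalore.", KNOWN_CITIES)
print(spans)
```

Note that the list only tells us that Mumbai and Bangalore are cities. Deciding which one is the source and which is the destination still depends on the first property, that is, on the words around each value.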
The difference between entities and entity types is like the difference between variables and types in a programming language. The type gives the system background information about the inherent qualities (values) of the concept, just as int and float depict numbers while strings depict sequences of alphanumeric characters, each with certain properties that can be used while programming. Entity types have inherent properties in the same way. Extending the analogy, variables are instances of these types, which allows multiple instances to share these properties. The best example is the source city and destination city in the utterance above: both share the common properties of being cities but, in the context of the utterance, they take two slightly different meanings.
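The analogy can be written down directly. The following Python sketch uses two hypothetical classes, EntityType and Entity, purely to illustrate the relationship; they are not the platform’s data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityType:
    """Shared knowledge about a kind of value, e.g. the set of known cities."""
    name: str
    known_values: frozenset

    def recognizes(self, value):
        # Case-insensitive membership check against the known values.
        return value.lower() in self.known_values


@dataclass
class Entity:
    """A named slot in an utterance; several entities can share one type."""
    name: str
    entity_type: EntityType
    value: Optional[str] = None


# One type, two "variables" of that type with different roles.
city = EntityType("city", frozenset({"mumbai", "bangalore"}))
source = Entity("source", city, "Mumbai")
destination = Entity("destination", city, "Bangalore")

print(source.entity_type is destination.entity_type)
print(city.recognizes(destination.value))
```

Both entities point at the same city type, so any knowledge added to the type benefits source and destination alike, while their names keep their roles in the utterance distinct.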
Some entity types are so general that we have implemented them for you to use directly. Some of them are listable items that we have put together for you. However, we have gone one step further for entity types like time, date and duration. There are many ways a user can say a date, for example: 24th April, Day after tomorrow. Instead of returning the raw string to you, we normalize it for you. Read __ for more information.
In some cases, it may not be possible to enumerate every single value in a list. Maybe you have an app that refers to area names in every town in a large country like India. It may be difficult to obtain such an exhaustive list, and you may always miss some values. In such cases, there is an option to expand entities, which naturally uses information about the words in their vicinity, and more. Read expand_entity_types for more information.
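To give a feel for what normalization means, here is a small Python sketch that turns a few spoken date forms into a calendar date. It is an illustration only: normalize_date, its tables, and the choice of an ISO date as output are assumptions made for this example, and it covers far fewer forms than a real implementation would. It also assumes the year of the reference date when the user gives only a day and a month.

```python
import re
from datetime import date, timedelta

MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
RELATIVE_DAYS = {"today": 0, "tomorrow": 1, "day after tomorrow": 2}

# Matches forms such as "24th April" or "24th of August".
DAY_MONTH = re.compile(r"(\d{1,2})(?:st|nd|rd|th)?(?: of)? ([a-z]+)")


def normalize_date(text, today):
    """Return a datetime.date for a few spoken date forms, or None if unrecognised."""
    cleaned = text.strip().lower()
    if cleaned in RELATIVE_DAYS:
        return today + timedelta(days=RELATIVE_DAYS[cleaned])
    match = DAY_MONTH.fullmatch(cleaned)
    if match and match.group(2) in MONTHS:
        try:
            # Assumption: use the year of the reference date.
            return date(today.year, MONTHS[match.group(2)], int(match.group(1)))
        except ValueError:
            return None  # e.g. "31st April" is not a real date
    return None


reference = date(2024, 1, 10)
print(normalize_date("24th April", reference).isoformat())
print(normalize_date("Day after tomorrow", reference).isoformat())
```

With the reference date above, both spoken forms become unambiguous calendar dates that an app can compare and store, which is much easier to work with than the raw strings.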
In the following, we use bracket_notation to mark up the entities in the utterances.
We look at the same utterances we dealt with in intents_examples, but this time we mark up the entities too.
In the ‘navigate’ use case:
Take me to my [cart](location)
Show me my [saved items](location)
Show me my [inbox](location)
Take me [back](location)
The location entity is used to help understand where the user wanted to navigate to.
The ‘filter’ intent is annotated below:
Show only [black](color) colour
Show the ones with [long sleeves](sleeves_type)
I want to see [office wear](wearing_occassion) clothes
We can use different entity_types to capture different types of attributes. For the fashion e-commerce domain, we have the color entity type to capture color values, sleeves_type to capture different types of sleeves such as long sleeves, half sleeves and sleeveless, and wearing_occassion to capture values such as office wear and party wear.
In the intent adding_to_cart:
Add this item to the [cart](cart_entity)
I'll take [this](item_context) item
I want to buy [this](item_context) item.
Here we see that an entity does not have to be a physical item or property; it can be an abstract one, such as the item_context entity with the value this. We can use it to capture context from the page. Other example values for this entity are next, previous, etc.
As we saw in the Examples of Entities and Entity Types, it may not be obvious what the scope of the values of an entity type should be. The answer depends on the possible values of the entity type and on the context in which it will be used.
In the e-commerce use case we saw, while the values of sleeve type and wearing occasion are different, the context of their usage may be the same, depending on your app. Both are attributes that, once collected, would probably help you filter the items that you show your user.
The choice we have is between separate entity types:
Show me [blue](colour) shirt.
Show me [sleeveless](sleeve_style) shirt.
and a single shared entity type:
Show me [blue](attribute) shirt.
Show me [sleeveless](attribute) shirt.
Both are fine, but it may be useful to keep the following in mind to help you make your choice.
If several of these entities appear next to each other, as in the following:
Show me [blue](color), [sleeveless](sleeve_style) shirt.
Show me [blue](attribute), [sleeveless](attribute) shirt.
then the options are to mark the individual entities as belonging to different entities/entity types, or to mark them as the same entity/entity type and use entity listing (listing_entities) to mark attribute as listable, so that it captures all the instances of attribute that appear close to each other.
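The practical difference between the two options shows up in the shape of the data your app receives. The Python sketch below is hypothetical: collect_entities and its listable parameter are made up for illustration, and the actual behaviour of entity listing is described in listing_entities.

```python
def collect_entities(extracted, listable=frozenset()):
    """Group (entity_name, value) pairs into a dict.

    Entities named in `listable` keep every value in a list;
    all other entities keep a single value (the last one seen).
    """
    result = {}
    for name, value in extracted:
        if name in listable:
            result.setdefault(name, []).append(value)
        else:
            result[name] = value
    return result


# Option 1: separate entity types, one value each.
separate = collect_entities([("color", "blue"), ("sleeve_style", "sleeveless")])

# Option 2: one shared, listable entity that gathers every attribute.
shared = collect_entities(
    [("attribute", "blue"), ("attribute", "sleeveless")],
    listable={"attribute"},
)

print(separate)
print(shared)
```

With separate types, your app knows which value is a color and which is a sleeve style. With a shared listable attribute, it receives one list and has to work out what each value means, which is fine if all it does is pass the values to a filter. The last call in the sketch below shows why the shared option needs listing: without it, only one of the two values would survive.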
Another benefit is that it helps you better organize your entities and their values.
When using expand entity types (expand_entity_types): although we don’t currently do so, in the future we may apply intelligent entity expansion based on the values that you have given.
The topics covered here, along with those on the intents and prompts pages, should be enough to develop most apps and use cases. However, there are many tips and tricks that we have learned over time that can help boost accuracy; those are included in the advanced topics.
We mark up utterances in the examples in this doc using the bracket notation, which we use for brevity. The text inside the square brackets is part of the utterance, while the entity that the text belongs to is given in the parentheses immediately following the brackets.
I would like to book a ticket from [Bangalore](source) to [Delhi](destination) for the [24th of August](start_date).
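The bracket notation is simple enough to parse mechanically. The following Python sketch shows one way to separate the plain utterance from its entity annotations; parse_markup and the shape of its output are assumptions made for this example, not a format used by the platform.

```python
import re

# [value](entity_name): value is part of the utterance, entity_name is the label.
MARKUP = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_markup(annotated):
    """Return (plain_utterance, entities) for a bracket-notation string.

    Each entity is a dict with the entity name, its value, and the
    character offsets of the value within the plain utterance.
    """
    plain_parts = []
    entities = []
    cursor = 0   # position in the annotated string
    length = 0   # length of the plain utterance built so far
    for match in MARKUP.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        length += len(before)
        value, name = match.group(1), match.group(2)
        entities.append(
            {"entity": name, "value": value, "start": length, "end": length + len(value)}
        )
        plain_parts.append(value)
        length += len(value)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), entities


plain, entities = parse_markup(
    "I would like to book a ticket from [Bangalore](source) "
    "to [Delhi](destination) for the [24th of August](start_date)."
)
print(plain)
for entity in entities:
    print(entity["entity"], "=", entity["value"])
```

The plain utterance is what the user would actually say, and each entity records which part of it was tagged and with what name.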