Trie

Have you ever wondered how auto-complete, spell checkers/auto-correct, browser history and etc., work? How are they so effective in suggesting words/phrases by just considering small parts of the terms you intended to type out?

Google’s auto-complete suggestions

Let’s discuss about some of the ways this can be done. We can store all the dictionary words and phrases which are typed out and then do like prefix search on those words based on the input. But this approach is so computationally expensive in addition to the massive storage which would require to store all these data.

So, how about we store the phrases/words in a prefix-preserved data structure like a linked list for example? Say for example these 2 words can be stored in the same linked list:

assassination classroom -> it can represent the words assassin, assassination and classroom, it can also collectively refer to the anime — assassination classroom

But this isn’t still effective in terms of time and space complexity, as for example assassination classroom and assassin’s creed would be placed on two separate linked lists, to search and suggest the phrases would be hard as now have to perform the search on two different linked lists if the prefix was just assassin.

Well, the most effective solution to these kind of scenarios is a special k-ary search tree called trie. Trie is a special and useful data structure that is based on the prefix of a string. They are used to represent the “retrieval” of data and that is where it gets its name trie.

Let’s first have a look at the structure of a binary tree node:

binary tree node

As you can see from the above code snippet, the 3 basic components required to construct a binary tree node are the 2 pointers to its left and right child respectively and the value of the node.

Similarly, a trie node must have two basic components, which are —

  1. children: this is an array of type Node, which are the links to possible character of keys, which can eventually branch out from the current node.
  2. isWord: this gives the idea whether there is a word formed by traversing the previous nodes(characters) up to the current node.
Trie node

Note: I have initialized the size of the children array to be 26(number of English alphabets), but it varies based on the character set which you choose to be represented by your trie.

Basic operations in a trie

The two most basic operations that we can do in a trie are — a. insert and b. search.

Steps in inserting a word into a trie:

  1. Traverse character by character for every character in the input string — the character as the key index of the children of the current node.
  2. If the character is new, create a node linking to that index of children array.
  3. Next, move to the existing or newly created node for the current character.
  4. Iteratively repeat this process, until the entire string is parsed, then mark the last node, isWord as true — to mark that this is particular traversal in the trie yields us a word.

Here’s a illustration of how the word apple inserted into an empty trie:

trie illustration of insertion of the word apple

From the above the diagram we can infer how the steps of insertion apply to this example:

  1. Initially the trie is empty so we create a new root node and link the next trie node to the index of key ‘a’ in the children array of the root node and since it isn’t the end of the word we mark the isWord as false.
  2. Now, we traverse to the next node which was linked to ‘a’ and we move simultaneously move to the next character in the input string which is ‘p’. So, the same step as for insertion of ‘a’ is repeated for ‘p’.
  3. This process is repeated until the insertion of ‘e’, now when we move to the next trie node linked to index of ‘e’, but there isn’t any more character in the input string to be parsed, so we mark the isWord as true — which signifies that ‘apple’ is a complete word in our trie.

Here’s a illustration of how the rest of the words in the word list are inserted into the trie:

trie illustration of insertion of words in the wordlist
  1. Insertion of “apps” is easy, as we once again repeat the process of insertion from root node, but this time around the creation of new nodes isn’t necessary for the prefix “app” as it is a prefix of “apple” too.
  2. Now when we move to the character ‘s’ in “apps”, we have to create a new node and link to the index of key ‘s’ as the character ‘s’ wasn’t parsed until this moment along this prefix. Next, we mark isWord as true to signify that “apps” is a complete word in our trie.
  3. Now when we move on to insert the word “ball”, and the same process is carried on.

I highly recommend you to practice the insertion operation first and get a clear understanding of what’s happening and why it’s happening before we move on the next topic.

Steps in searching a word in a trie:

  1. Traverse character by character for every character in the input string — the character as the key index of the children of the current node.
  2. Check if the child node is present for the particular character, if not return false — as this word isn’t present in the trie.
  3. Else, move to the next node, through link of the child node.
  4. Repeat this process iteratively.
  5. If the string is completely parsed, then check if the current node’s isWord is set to true, if so then the word is present — return true else return false.

Note: Checking just if the string is parsed isn’t enough, it is necessary to check for the isWord flag, as the following case maybe true too:

Suppose there is a trie which contains the word apple, but not the word app. On searching for app, you would have parsed the string apple’s prefix but not the word app itself as it isn’t present in the trie.

Here is the pseudo code for operations that can be performed on a trie. I have included another common operation found in a trie — startsWith — which says whether there are words in the trie with the same prefix as the input string:

Pseudo code for implementation of trie

Here is the code for implementation of a trie in java, you can try implementing the same in the language of your choice:

Java implementation of Trie — insert, search, startsWith functionalities

Now, we have coded a simple implementation of trie, I would recommend you guys to find the time and space complexity of various operations in a trie.

  1. Auto-complete: Trie provides the alphabetical ordering of data by keys. Trie even in the worst case gives fast auto-complete suggestions compared to imperfect hash table algorithms and key collisions in hash tables.
  2. Spell checker: Trie stores all the dictionary words. Using trie not only makes it easy to see the word in the dictionary but also simple to build an algorithm to include a collection of relevant words or suggestions.
  3. Browser history: Browsers keep track of the history of websites viewed by the user, so when the prefix of a previously visited URL is typed, it auto-suggests the website.
  1. You can try the same problem — implementation of a trie in leetcode:

2. Try building a trie for your choice of character set.

3. Try implementing delete operation on trie.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store