Michal Forišek's answer to Given a dictionary of words, how can we efficiently find a pair of words such that they don't have characters in common and that the sum of their lengths is maximized?

Given a dictionary of words, how can we efficiently find a pair of words such that they don't have characters in common and that the sum of their lengths is maximized?

Michal Forišek

teacher, scientist, competitive programmer · Author has 1.2K answers and 11.5M answer views · 11y ·

Originally Answered: Given a dictionary of words, how can we efficiently find a pair words s.t. they don't have characters in common and sum of their length is maximum? ·

To be consistent with John Kurlak's answer, I will use [math]n[/math] to denote the number of words and [math]a[/math] to denote the size of the alphabet. I will also use [math]\ell[/math] to denote maximum word length. Thus, the size of input is [math]O(\ell n)[/math] and this is also the time complexity of reading it.

The speedup for the naive quadratic-time solution is explained well in other answers. Here, I will explain a better algorithm for the case when the alphabet is small enough. Here, "small enough" will mean "we are able to spend [math]O(a2^a)[/math] time and [math]O(2^a)[/math] memory". Note that for lowercase English letters we have [math]a=26[/math] which makes this algorithm perfectly reasonable.

But first, a curiosity. I used the algorithm described below on the 62887 lowercase words in /usr/share/dict/words on my machine. Here's the best output: individualizing phototypesetter.

Here's the algorithm:

As in the other answers, we will use bitmasks to represent sets of letters. Thus, the integers from 0 to [math]2^a-1[/math] will represent all possible subsets of our alphabet, 0 being the empty set.

Our algorithm will consist of three steps:

For each set S of letters, find the longest word that consists of exactly those letters.
For each set S of letters, find the longest word that consists of at most those letters (i.e., some letters may be unused, but you cannot use a letter that does not belong to S).
For each word w, compute length(w) + length(longest word out of letters not in w) and pick the maximum.

Step 1 is easy: just take an array of size [math]2^a[/math], then read the input, and for each word update the corresponding cell. This can be done in [math]O(\ell n)[/math].

Step 2 can be done using dynamic programming. We process the subsets of our alphabet in the order 0, 1, ..., [math]2^a -1[/math]. (Note that this order that has the property that for any set S, all subsets of S are processed before S.) For each set of letters S, the best word for S is either the best word made out of exactly these letters (this we computed in phase 1), or at least one letter is unused. We try all possibilities for the unused letter and pick the best one. All of this takes [math]O(a2^a)[/math] time.

Step 3 is again easy. If we stored the set of letters for each word in step 1, step 3 can now be done in [math]O(n)[/math] time. Hence the overall time complexity is [math]O(\ell n + a2^a)[/math].

11.9K views ·

View upvotes

1 of 5 answers

Something went wrong. Wait a moment and try again.

View 4 other answers to this question

About · Careers · Privacy · Terms · Contact · Languages · Your Ad Choices · Press ·