Geeks With Blogs
Ulterior Motive Lounge UML Comics and more from Martin L. Shoemaker (The UML Guy),
Offering UML Instruction and Consulting for your projects and teams.
So in Part 4, I said that recognizing the music key would be tricky.

But why? Didn't I spend most of Part 3 explaining how cleverly I used M-SAPI so that users only had to say partial names to be recognized?

Well, yes; but I've long said that programming has a Conservation of Complexity law: the less complex for the users, the more complex for the programmers. (Be glad: that's the short version. My long discussion on Conservation of Complexity would take up the rest of this post.)

This flexibility leads to complexity because one short phrase can match multiple long phrases. For instance, one album in my collection is Forever Gold by B.B. King. It includes these songs:

2. How Blue Can You Get?
3. Every Day I Have the Blues
10. Catfish Blues
14. Other Night Blues

I also have some sample music provided with Windows Vista, including one track from Aaron Goldberg's Worlds: OAM's Blues. From Sports by Huey Lewis and the News, I have Honkytonk Blues. From Jonathan Richman's self-titled album, I have Blue Moon. From Celebrating the Best of Jazz by Louis Armstrong, there's St. Louis Blues and Black and Blue. From Am I Cool or What? (yes, that's a Garfield CD — go ahead, laugh, but it has The Temptations, Patti LaBelle, Carl Anderson, Natalie Cole, The Pointer Sisters, Lou Rawls, Diane Schuur, Valerie Pinkston, Desiree Goyette, and B.B. King), there's Monday Morning Blues. From True Blue by Madonna, there's True Blue. From Cargo by Men at Work, there's Blue for You. From All-Time Top 100 TV Themes, there's Hill Street Blues. From Tropico, there's Outlaw Blues. From another Forever Gold title with Ray Charles, there's Sentimental Blues. From my fellow Duelist Geoff Nostrant (a.k.a. Silvercord), there's blueshift. From Who's Next by The Who, there's Behind Blue Eyes.

So if all I say to Dee Jay is "Dee Jay, Play Blue", Dee Jay will be really confused. Seventeen of the songs listed above have "Blue" in the title. Now that's my fault as the user; but we can't blame the users if we want happy users. We want to cope with what real users do, not just force them to do what we want.
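To make the ambiguity concrete, here is a minimal sketch of partial-title matching. It is not Dee Jay's code: PartialTitleMatcher and Match are hypothetical names, and plain substring matching is an assumption that is much cruder than what the recognizer's grammar actually does. It only shows how one short phrase fans out into many candidates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Dee Jay: returns every title that
// contains all of the spoken words, ignoring case.
public static class PartialTitleMatcher
{
    public static List<string> Match(string spoken, IEnumerable<string> titles)
    {
        // Break the spoken phrase into individual words.
        string[] words = spoken.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Keep a title only if every spoken word appears somewhere in it.
        return titles
            .Where(title => words.All(
                word => title.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```

Calling PartialTitleMatcher.Match("Blue", titles) over the titles above returns every one that contains "blue" in any casing, including blueshift, which is exactly the pile of candidates Dee Jay has to sort out.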

So how do we make Dee Jay understand all these potential matches? As in Part 3, there's the obvious way and the lazy way. And once again, the lazy way (relying on Microsoft to solve the problem) is the smart way. When M-SAPI returns a RecognizedPhrase (or the subclass, RecognitionResult), it can include a list of equally good partial matches, called Homophones. Now we could quibble about that term: in grammar, homophones are words that sound the same but have different meanings. Here, the homophone phrases likely don't sound alike at all; rather, the recognized words form part of each phrase. But terminology aside, the concept is easy: every phrase in the Homophones list is just as good a match as the top-level phrase.

So remember from Part 2 that Dee Jay is designed to select one or more songs or albums or artists (i.e., media descriptors) that match a given phrase. Well, now we want the media descriptors that match the phrase and its Homophones. So the code for selecting all the matches looks something like this:


// Music commands may include a specifier.
string specifier = "";
if (e.Result.Semantics.ContainsKey(_Specifier))
{
    SemanticValue valSpecifier = e.Result.Semantics[_Specifier];
    if (valSpecifier.Confidence >= 0.8)
    {
        specifier = valSpecifier.Value.ToString();
    }
}

// Add the best match to the media phrase list.
List<RecognizedPhrase> testedPhrases = new List<RecognizedPhrase>();
List<MediaPhrase> phrases = new List<MediaPhrase>();
AddRecognizedMediaPhrase(command, e.Result, testedPhrases, phrases);

...

/// <summary>
/// Add a recognized phrase to a list of music phrases.
/// </summary>
/// <param name="command">The command being built.</param>
/// <param name="reco">The recognized phrase.</param>
/// <param name="testedPhrases">The phrases which have already been tested.</param>
/// <param name="phrases">The current list of music phrases.</param>
private void AddRecognizedMediaPhrase(string command,
    RecognizedPhrase reco, List<RecognizedPhrase> testedPhrases, List<MediaPhrase> phrases)
{
    // Avoid infinite recursion: skip any phrase we have already tested.
    if (testedPhrases.Contains(reco))
    {
        return;
    }
    testedPhrases.Add(reco);

    // Only confident items with music.
    if ((reco.Confidence >= 0.8) && (reco.Semantics.ContainsKey(_MusicKey)))
    {
        // Only matching commands.
        if ((reco.Semantics.ContainsKey(_Command)) && (reco.Semantics[_Command].Value.ToString() == command))
        {
            // Add the key. Don't duplicate.
            string key = reco.Semantics[_MusicKey].Value.ToString();
            if (!phrases.Contains(_Map[key]))
            {
                phrases.Add(_Map[key]);
            }
        }
    }

    // If we have homophones, add those, too.
    if ((reco.Homophones != null) && (reco.Homophones.Count > 0))
    {
        foreach (RecognizedPhrase phrase in reco.Homophones)
        {
            // Recurse on the homophone itself, not on reco again.
            AddRecognizedMediaPhrase(command, phrase, testedPhrases, phrases);
        }
    }
}
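
The "avoid infinite recursion" guard matters if two phrases can list each other as homophones. Here is a self-contained sketch of the same traversal under that assumption. Phrase is a stand-in for RecognizedPhrase, and HomophoneWalker and CollectKeys are hypothetical names, so this runs without M-SAPI and is not the real API.

```csharp
using System.Collections.Generic;

// Stand-in for RecognizedPhrase (an assumption for illustration only):
// just a key plus a list of equally good matches.
public sealed class Phrase
{
    public string Key { get; }
    public List<Phrase> Homophones { get; } = new List<Phrase>();
    public Phrase(string key) { Key = key; }
}

public static class HomophoneWalker
{
    // Collects the distinct keys reachable from 'start' via Homophones links.
    public static List<string> CollectKeys(Phrase start)
    {
        var visited = new HashSet<Phrase>();
        var keys = new List<string>();
        Visit(start, visited, keys);
        return keys;
    }

    private static void Visit(Phrase phrase, HashSet<Phrase> visited, List<string> keys)
    {
        // HashSet.Add returns false when the phrase was already visited,
        // which stops cycles from recursing forever.
        if (!visited.Add(phrase))
        {
            return;
        }

        // Add the key. Don't duplicate.
        if (!keys.Contains(phrase.Key))
        {
            keys.Add(phrase.Key);
        }

        foreach (Phrase homophone in phrase.Homophones)
        {
            Visit(homophone, visited, keys);
        }
    }
}
```

The sketch uses a HashSet for the visited check instead of List.Contains; for a handful of phrases the difference is negligible, but it keeps the "already tested?" test cheap if the homophone lists grow.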



So now we have a richer list of possible matches, based on the top phrase and its Homophones. But we could potentially make it richer still. While any RecognizedPhrase can have Homophones, a RecognitionResult can also have Alternates, a list of lower-confidence matches, each possibly including Homophones. So I could conceivably add code like this:


// If we have alternates, add those, too.
if ((e.Result.Alternates != null) && (e.Result.Alternates.Count > 0))
{
    foreach (RecognizedPhrase alt in e.Result.Alternates)
    {
        AddRecognizedMediaPhrase(command, alt, testedPhrases, phrases);
    }
}


But so far, I'm not very happy with the results when I do that. I need to experiment with different Confidence thresholds, and maybe tolerance on individual SemanticValues (as discussed in Part 4), to see if there's a good way to separate "good" alternates from "bad".
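
One way to run that experiment is to make the threshold a parameter rather than a hard-coded 0.8. The following is only a sketch under assumptions: Candidate stands in for a recognized alternate, and AlternateFilter and Keep are hypothetical names, not anything from M-SAPI or Dee Jay.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for a recognized alternate (an assumption for illustration):
// the text that was matched plus the recognizer's confidence in it.
public sealed class Candidate
{
    public string Text { get; }
    public float Confidence { get; }
    public Candidate(string text, float confidence)
    {
        Text = text;
        Confidence = confidence;
    }
}

public static class AlternateFilter
{
    // Keeps only the alternates at or above minConfidence, best first.
    public static List<Candidate> Keep(IEnumerable<Candidate> alternates, float minConfidence)
    {
        return alternates
            .Where(alt => alt.Confidence >= minConfidence)
            .OrderByDescending(alt => alt.Confidence)
            .ToList();
    }
}
```

With the threshold as an argument, the same recognition result can be run through Keep at, say, 0.5f and 0.7f, and the surviving lists compared to see where the "bad" alternates start creeping in.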

So now we have a great big list of possible media phrases that the user might have meant. How is Dee Jay to know which one is correct? Well, the same way any M-SAPI application should clarify user intentions: it's going to ask. But I have other commitments and some flaky hardware, so it will be a while before I can get to that.
Posted on Saturday, November 15, 2008 4:40 PM | .NET, M-SAPI


Comments on this post: Dee Jay, Part 5: Homophones and Alternates

Copyright © Martin L. Shoemaker | Powered by: GeeksWithBlogs.net