This post follows on from Adventures in Lip-sync: Part 1.
So, at this stage, we’ve got a string of gibberish from Repeat After Me and not much else. To get this working, there are four key ingredients:
- the audio soundtrack;
- the phoneme info from Repeat After Me;
- the graphic representations of the mouth shapes (visemes);
- and a dictionary to translate one to the other.
All I need to do is hook them up.
Step 4
I created a MovieClip with my mouth shapes on separate keyframes: the nine “main” mouth shapes, plus extras for ‘th’ sounds and a closed/relaxed mouth, 11 in total. Here are Preston Blair’s mouth shapes, which are similar to mine. I kept them pretty basic so they can be easily re-skinned or adapted next time. By keyframe number, they are:
1. M, B, P
2. A, I
3. H, EH, UH
4. EE
5. S, C
6. V
7. L
8. O, OH
9. W, OO
10. TH
11. (Closed)
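Since the rest of the code refers to these keyframes by number, a set of named constants can make the mapping easier to read. This is a minimal sketch in TypeScript (chosen so it runs outside Flash, and close enough to ActionScript 3 to port); the names `MouthFrame` and `mouthFrameLabel`, and the labels themselves, are illustrative rather than taken from the original project.

```typescript
// Hypothetical named constants for the 11 mouth keyframes listed above.
// Flash timeline frames are 1-based, so numbering starts at 1.
const MouthFrame = {
  M_B_P: 1,
  A_I: 2,
  H_EH_UH: 3,
  EE: 4,
  S_C: 5,
  V: 6,
  L: 7,
  O_OH: 8,
  W_OO: 9,
  TH: 10,
  CLOSED: 11, // closed / relaxed mouth
} as const;

// Reverse lookup for debug traces: frame number -> label (undefined if unknown).
function mouthFrameLabel(frame: number): string | undefined {
  const table: Record<string, number> = MouthFrame;
  for (const label of Object.keys(table)) {
    if (table[label] === frame) return label;
  }
  return undefined;
}
```

In ActionScript 3 the same idea would usually be a set of `public static const` fields on a class.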
Step 5
Next, I created an object called ref that I initialise like this:
private function initRef():void {
    ref = new Object();
    // vowels
    ref["AA"] = 2;  ref["AE"] = 2;  ref["AH"] = 3;  ref["AO"] = 8;
    ref["AW"] = 8;  ref["AX"] = 2;  ref["AXR"] = 5; ref["AY"] = 2;
    ref["EH"] = 3;  ref["ER"] = 3;  ref["EY"] = 2;  ref["IH"] = 3;
    ref["IX"] = 3;  ref["IY"] = 4;  ref["OW"] = 8;  ref["OY"] = 8;
    ref["UH"] = 9;  ref["UW"] = 9;
    // consonants
    ref["B"] = 1;   ref["CH"] = 5;  ref["D"] = 5;   ref["DH"] = 10;
    ref["DX"] = 1;  ref["F"] = 6;   ref["G"] = 5;   ref["HH"] = 3;
    ref["JH"] = 5;  ref["K"] = 5;   ref["L"] = 7;   ref["M"] = 1;
    ref["N"] = 5;   ref["NG"] = 5;  ref["P"] = 1;   ref["R"] = 5;
    ref["S"] = 5;   ref["SH"] = 5;  ref["T"] = 5;   ref["TH"] = 10;
    ref["V"] = 6;   ref["W"] = 9;   ref["Y"] = 5;   ref["Z"] = 5;
    ref["ZH"] = 5;
    // empty key -> closed mouth (frame 11)
    ref[""] = 11;
}
This gives me a reference, or dictionary of sorts, to convert the phoneme info from the data to the appropriate keyframe. Here’s a great reference that really helped me reduce the 40-odd phonemes to my 10 visemes.
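The `ref` object is effectively a hash map from phoneme to frame number. The TypeScript sketch below copies the same table into a typed record; the names `PHONEME_FRAMES`, `CLOSED_FRAME` and `frameForPhoneme`, and the fallback to the closed mouth for phonemes that aren't in the table, are assumptions added for illustration.

```typescript
// Phoneme (upper-case) -> mouth keyframe, copied from initRef().
const PHONEME_FRAMES: Readonly<Record<string, number>> = {
  AA: 2, AE: 2, AH: 3, AO: 8, AW: 8, AX: 2, AXR: 5, AY: 2,
  EH: 3, ER: 3, EY: 2, IH: 3, IX: 3, IY: 4, OW: 8, OY: 8,
  UH: 9, UW: 9,
  B: 1, CH: 5, D: 5, DH: 10, DX: 1, F: 6, G: 5, HH: 3,
  JH: 5, K: 5, L: 7, M: 1, N: 5, NG: 5, P: 1, R: 5,
  S: 5, SH: 5, T: 5, TH: 10, V: 6, W: 9, Y: 5, Z: 5, ZH: 5,
  "": 11,
};

const CLOSED_FRAME = 11;

// Case-insensitive lookup, like getViseme(). Unknown phonemes fall back
// to the closed mouth (an assumption; the original has no explicit fallback).
function frameForPhoneme(phoneme: string): number {
  const frame = PHONEME_FRAMES[phoneme.toUpperCase()];
  return frame ?? CLOSED_FRAME;
}
```

Upper-casing the key before lookup means the table only needs one entry per phoneme, whatever case the source data uses.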
Step 6
To tie the phonemes to the audio, I loaded the audio clip and used this nifty little Audio Cuepoint class by Armen Abrahamyan to fire events at the appropriate times.
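The Audio Cuepoint class’s internals aren’t shown here, but the general idea of firing events at set times during playback can be sketched. This TypeScript version is a hypothetical polling scheduler, not that class’s implementation: something calls `poll()` regularly (in Flash, perhaps on each ENTER_FRAME with `SoundChannel.position`, which is in milliseconds) and gets back every cuepoint that playback has passed since the previous call. `Cuepoint`, `CuepointScheduler`, `poll()` and `reset()` are illustrative names.

```typescript
// A hypothetical cuepoint: `text` should fire once playback reaches `timeMs`.
interface Cuepoint {
  timeMs: number;
  text: string;
}

// Minimal polling scheduler (an assumed approach): returns, in time order,
// every cuepoint whose time is <= the current playback position and that
// has not already been returned.
class CuepointScheduler {
  private readonly cues: Cuepoint[];
  private next = 0; // index of the first cuepoint not yet fired

  constructor(cues: Cuepoint[]) {
    // Sort a copy by time so firing order is guaranteed.
    this.cues = cues.slice().sort((a, b) => a.timeMs - b.timeMs);
  }

  poll(positionMs: number): Cuepoint[] {
    const fired: Cuepoint[] = [];
    while (this.next < this.cues.length) {
      const cue = this.cues[this.next];
      if (cue === undefined || cue.timeMs > positionMs) break;
      fired.push(cue);
      this.next++;
    }
    return fired;
  }

  // Rewind, e.g. when the audio is restarted from the beginning.
  reset(): void {
    this.next = 0;
  }
}
```

With polling, a cue can only fire as often as `poll()` is called, so at 24 fps it may land up to roughly one frame (about 42 ms) late, which is probably acceptable for mouth shapes.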
Step 7
The next thing is to get the phoneme data in. I wanted to reuse this class, so I put the data in a text file and loaded it via ActionScript. Looking at the data generated by Repeat After Me, I figured out that all that’s relevant to me is:
[phoneme name] {D [duration (ms)]; … }
That’s enough of a pattern for me to parse it out (watching for one gotcha: some phonemes start with a stray numeric character). To store the info, I created an XML object in a format the Audio Cuepoint class can understand:
private var cuepoints:XML = ;
And here’s the function I called once the load completed:
private function parseData(_data:String):void {
    var instructions:Array = _data.split("\n");
    var instructions_length:uint = instructions.length;
    var duration_delimiter:String = " {D ";
    //strip anything that's not a letter (e.g. a leading numeric character)
    var pattern:RegExp = /[^a-z]/gi;
    var time:Number = 0; // running start time of each phoneme, in ms
    for (var a:uint = 0; a < instructions_length; a++) {
        if (instructions[a].indexOf(duration_delimiter) > -1) {
            var instruction:Object = new Object();
            instruction.viseme = instructions[a].split(duration_delimiter)[0];
            instruction.viseme = instruction.viseme.replace(pattern, "");
            cuepoints.appendChild(new XML('' + instruction.viseme + ''));
            //get duration in milliseconds
            instruction.duration = Number(instructions[a].split(duration_delimiter)[1].split(";")[0].split("}")[0]);
            time += instruction.duration;
            visemes.push(instruction);
        }
    }
}
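To check the parsing logic outside Flash, here is a TypeScript sketch of the same approach. It assumes the ` {D ` delimiter that `parseData()` splits on, strips non-letters from the phoneme name, reads the duration up to the first `;` or `}`, and records each phoneme’s start time as the running total of earlier durations. It also splits on `\r?\n` so Windows line endings are tolerated. `parsePhonemes` and `TimedPhoneme` are hypothetical names, and skipping lines whose duration isn’t a number is an added assumption.

```typescript
// One parsed phoneme with its start time and length, both in milliseconds.
interface TimedPhoneme {
  phoneme: string;
  startMs: number;
  durationMs: number;
}

// Port of the parseData() idea: keep only lines containing " {D ",
// strip non-letters from the phoneme name, and read the duration
// up to the first ";" or "}".
function parsePhonemes(data: string): TimedPhoneme[] {
  const delimiter = " {D ";
  const result: TimedPhoneme[] = [];
  let time = 0; // running start time

  for (const line of data.split(/\r?\n/)) {
    const at = line.indexOf(delimiter);
    if (at === -1) continue; // not a phoneme line

    const phoneme = line.slice(0, at).replace(/[^a-z]/gi, "");
    const rest = line.slice(at + delimiter.length);
    let end = rest.search(/[;}]/);
    if (end === -1) end = rest.length;
    const durationMs = Number(rest.slice(0, end));
    if (isNaN(durationMs)) continue; // skip malformed lines (assumption)

    result.push({ phoneme, startMs: time, durationMs });
    time += durationMs;
  }
  return result;
}
```

For example, the made-up input lines `% {D 100}`, `1AA {D 120; P 176.0:0}` and `B {D 60}` would yield the phonemes `""`, `"AA"` and `"B"`, starting at 0, 100 and 220 ms.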
Step 8
All that’s left to do now is listen for my audio cuepoints, cross-reference the phoneme with the appropriate keyframe and call gotoAndStop().
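private function getViseme(_key:String):uint {
    var frame:* = ref[_key.toUpperCase()];
    // fall back to the closed mouth (frame 11) for phonemes missing from ref
    return (frame === undefined) ? 11 : uint(frame);
}

private function onCuepointFind(e:Event):void {
    var cue:AudioCuePoint = AudioCuePoint(e.target);
    trace("cuepoint find time: " + cue.cuepointTime + " / text: " + cue.cuepointText + "\n");
    // jump the mouth MovieClip to the keyframe for this phoneme
    mouth.gotoAndStop(getViseme(cue.cuepointText));
}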
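The lookup-and-jump step can be sketched in TypeScript too, with a fake object standing in for the mouth MovieClip so the logic can be checked without Flash. `FrameTarget`, `PhonemeTable`, `handleCuepoint` and the small demo table are hypothetical; the fallback to frame 11 for unknown phonemes is an assumption chosen to match the closed mouth.

```typescript
// Hypothetical stand-in for the mouth MovieClip; only gotoAndStop() is used.
interface FrameTarget {
  gotoAndStop(frame: number): void;
}

// Phoneme (upper-case) -> keyframe number.
type PhonemeTable = Readonly<Record<string, number>>;

// Mirrors getViseme() + onCuepointFind(): upper-case the cue text,
// look up its keyframe (or `fallbackFrame` if missing) and jump there.
function handleCuepoint(
  mouth: FrameTarget,
  table: PhonemeTable,
  cueText: string,
  fallbackFrame = 11,
): number {
  const frame = table[cueText.toUpperCase()] ?? fallbackFrame;
  mouth.gotoAndStop(frame);
  return frame;
}

// Usage with a recording fake instead of a real MovieClip.
const shownFrames: number[] = [];
const fakeMouth: FrameTarget = { gotoAndStop: (f) => { shownFrames.push(f); } };
const demoTable: PhonemeTable = { M: 1, AA: 2, TH: 10, "": 11 };
for (const cue of ["m", "AA", "th", "", "ZZ"]) {
  handleCuepoint(fakeMouth, demoTable, cue);
}
// shownFrames is now [1, 2, 10, 11, 11]
```

Passing the mouth and table in as parameters keeps the handler testable; in the Flash version they are simply class members.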
Here’s a clip in action. It’s not perfect, but for a 5-minute lip-synced animation, this saved me a huge amount of time!
Tags: actionscript, Animation, Flash, lip-sync