pepper.framework.sensor.asr module

class pepper.framework.sensor.asr.AbstractASR(language='en-GB')

Bases: object

Abstract Automatic Speech Recognition (ASR)

Parameters:	language (str) – Language Code <LC> and Region Code <RC>, joined as “LC-RC” (for example “en-GB”)

MAX_ALTERNATIVES = 10

language

Automatic Speech Recognition Language

Returns:	language – Language Code <LC> and Region Code <RC>, joined as “LC-RC”
Return type:	str
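
As an illustration of the “LC-RC” format, the sketch below splits a code such as the default 'en-GB' into its language and region parts. The helper name split_language_code is hypothetical and not part of pepper.

```python
def split_language_code(language):
    """Split an "LC-RC" code into (language code, region code).

    Hypothetical helper for illustration; not part of pepper.
    """
    parts = language.split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected a code of the form 'LC-RC', got %r" % (language,))
    language_code, region_code = parts
    return language_code, region_code


print(split_language_code("en-GB"))  # ('en', 'GB')
```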

transcribe(audio)

Transcribe Speech in Audio

Parameters:	audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns:	transcript
Return type:	List[UtteranceHypothesis]
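
A minimal sketch of how a concrete recogniser might follow this interface. The classes below are self-contained stand-ins that mirror the documented signatures, not pepper's actual implementation. FixedTranscriptASR is hypothetical, and a plain list of samples stands in for the numpy.ndarray so that the example has no dependencies.

```python
class UtteranceHypothesis(object):
    """Stand-in mirroring the documented (transcript, confidence) pair."""

    def __init__(self, transcript, confidence):
        self.transcript = transcript
        self.confidence = confidence


class AbstractASR(object):
    """Stand-in mirroring the documented AbstractASR interface."""

    MAX_ALTERNATIVES = 10

    def __init__(self, language="en-GB"):
        self._language = language

    @property
    def language(self):
        return self._language

    def transcribe(self, audio):
        raise NotImplementedError()


class FixedTranscriptASR(AbstractASR):
    """Hypothetical recogniser that ignores the audio content."""

    def transcribe(self, audio):
        if len(audio) == 0:
            return []  # no samples, so no hypotheses
        return [UtteranceHypothesis("hello", 0.9)]


asr = FixedTranscriptASR()
hypotheses = asr.transcribe([0, 12, -7, 3])  # list stands in for numpy.ndarray
print(asr.language, [(h.transcript, h.confidence) for h in hypotheses])
```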

class pepper.framework.sensor.asr.AbstractTranslator(source, target)

Bases: object

Abstract Translator

Parameters:	source (str) – Two Character Source Language Code
	target (str) – Two Character Target Language Code

source

Source Language

Returns:	source – Two Character Source Language Code
Return type:	str

target

Target Language

Returns:	target – Two Character Target Language Code
Return type:	str

translate(text)

Translate Text from Source to Target Language

Parameters:	text (str) – Text to Translate
Returns:	translated_text – Translated Text
Return type:	str
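
The translator interface can be sketched with a phrase-table implementation. Everything here is a stand-in: the AbstractTranslator class below only mirrors the documented signature, and LookupTranslator and its phrases argument are hypothetical names introduced for illustration.

```python
class AbstractTranslator(object):
    """Stand-in mirroring the documented AbstractTranslator interface."""

    def __init__(self, source, target):
        self._source = source
        self._target = target

    @property
    def source(self):
        return self._source

    @property
    def target(self):
        return self._target

    def translate(self, text):
        raise NotImplementedError()


class LookupTranslator(AbstractTranslator):
    """Hypothetical translator backed by a fixed phrase table."""

    def __init__(self, source, target, phrases):
        super(LookupTranslator, self).__init__(source, target)
        self._phrases = phrases

    def translate(self, text):
        # Fall back to the untranslated text for unknown phrases
        return self._phrases.get(text, text)


translator = LookupTranslator("en", "nl", {"good morning": "goedemorgen"})
print(translator.source, translator.target, translator.translate("good morning"))
```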

class pepper.framework.sensor.asr.BaseGoogleASR(language='en-GB', sample_rate=16000, hints=())

Bases: pepper.framework.sensor.asr.AbstractASR, pepper.framework.sensor.asr.GoogleTranslator

Abstract Base Google Automatic Speech Recognition (ASR)

Handles common parameters for SynchronousGoogleASR and StreamedGoogleASR

Parameters:	language (str) – Language Code <LC> and Region Code <RC>, joined as “LC-RC”
	sample_rate (int) – Number of Audio Samples per Second passed to ASR transcription (the default of 16000 works well)
	hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to
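
To make the sample_rate parameter concrete: the duration of a block of audio is its number of samples divided by the sample rate. The sketch below is plain arithmetic and makes no pepper calls. The helper name audio_duration is hypothetical, and single-channel audio is assumed.

```python
def audio_duration(num_samples, sample_rate=16000):
    """Return the duration in seconds of num_samples single-channel samples."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / float(sample_rate)


print(audio_duration(48000))  # 3.0
```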

transcribe(audio)

Transcribe Speech in Audio

Parameters:	audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns:	transcript
Return type:	List[UtteranceHypothesis]

class pepper.framework.sensor.asr.GoogleTranslator(source, target)

Bases: pepper.framework.sensor.asr.AbstractTranslator

Google Translator

Parameters:	source (str) – Two Character Source Language Code
	target (str) – Two Character Target Language Code

translate(text)

Translate Text from Source to Target Language

Parameters:	text (str) – Text to Translate
Returns:	translated_text – Translated Text
Return type:	str

class pepper.framework.sensor.asr.StreamedGoogleASR(language='en-GB', sample_rate=16000, hints=())

Bases: pepper.framework.sensor.asr.BaseGoogleASR

Streamed Google Automatic Speech Recognition (ASR)

Recognises Speech ‘live’ as it is spoken. Should be faster than SynchronousGoogleASR, because audio is processed while the speaker is still talking.

Parameters:	language (str) – Language Code <LC> and Region Code <RC>, joined as “LC-RC”
	sample_rate (int) – Number of Audio Samples per Second passed to ASR transcription (the default of 16000 works well)
	hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to

live

Live Speech Transcript String (Debug/Visual purposes only)

Returns:	live – Live Speech Transcript String
Return type:	str

transcribe(audio)

Transcribe Speech in Audio (Streamed)

Instead of a single Block of Audio, this method takes an Iterable of Audio frames. Frames are processed one at a time while the speaker is still producing audio, which makes transcription faster than the synchronous version and provides a live representation of the utterance.

TODO: this breaks the abstract specification (AbstractASR.transcribe takes a single numpy.ndarray). Is there a neater way to do this?

Parameters:	audio (Iterable[numpy.ndarray]) – Iterable of Audio Samples (Containing Speech)
Returns:	transcript
Return type:	List[UtteranceHypothesis]
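
The difference between the two call styles can be sketched with a generator that yields frames one at a time. This is an illustration under stated assumptions: the frames helper and the frame size of 4 are hypothetical, plain lists stand in for numpy.ndarray frames, and no Google service is contacted.

```python
def frames(samples, frame_size):
    """Yield consecutive frames of at most frame_size samples."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    for start in range(0, len(samples), frame_size):
        yield samples[start:start + frame_size]


samples = list(range(10))  # stand-in for recorded audio samples

# A synchronous recogniser would receive `samples` as one block;
# a streamed recogniser would instead iterate over frames like these.
for frame in frames(samples, 4):
    print(frame)
```

In a live setting the iterable would typically be fed by the microphone, so frames become available while the utterance is still in progress.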

class pepper.framework.sensor.asr.SynchronousGoogleASR(language='en-GB', sample_rate=16000, hints=())

Bases: pepper.framework.sensor.asr.BaseGoogleASR

Synchronous Google Automatic Speech Recognition (ASR)

Recognises Speech from a single, complete Block of Audio. Should be slower than StreamedGoogleASR, which processes audio while it is being spoken.

Parameters:	language (str) – Language Code <LC> and Region Code <RC>, joined as “LC-RC”
	sample_rate (int) – Number of Audio Samples per Second passed to ASR transcription (the default of 16000 works well)
	hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to

transcribe(audio)

Transcribe Speech in Audio

Parameters:	audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns:	hypotheses
Return type:	List[UtteranceHypothesis]

class pepper.framework.sensor.asr.UtteranceHypothesis(transcript, confidence)

Bases: object

Automatic Speech Recognition (ASR) Hypothesis

Parameters:	transcript (str) – Utterance Hypothesis Transcript
	confidence (float) – Utterance Hypothesis Confidence

confidence

Automatic Speech Recognition Hypothesis Confidence

Returns:	confidence
Return type:	float

transcript

Automatic Speech Recognition Hypothesis Transcript

Returns:	transcript
Return type:	str
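
Since transcribe returns a list of hypotheses, a caller usually needs to pick one. The documentation does not state whether the returned list is ordered, so the sketch below selects by confidence explicitly. The Hypothesis namedtuple is a stand-in for UtteranceHypothesis, best_hypothesis is a hypothetical helper, and the example transcripts and confidences are made up.

```python
from collections import namedtuple

# Stand-in for UtteranceHypothesis with the two documented fields
Hypothesis = namedtuple("Hypothesis", ["transcript", "confidence"])


def best_hypothesis(hypotheses):
    """Return the hypothesis with the highest confidence, or None if empty."""
    if not hypotheses:
        return None
    return max(hypotheses, key=lambda h: h.confidence)


hypotheses = [
    Hypothesis("recognise speech", 0.91),
    Hypothesis("wreck a nice beach", 0.42),
]
print(best_hypothesis(hypotheses).transcript)  # recognise speech
```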