pepper.framework.sensor.asr module

class pepper.framework.sensor.asr.AbstractASR(language='en-GB')[source]

Bases: object

Abstract Automatic Speech Recognition (ASR)

Parameters:language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC” (e.g. “en-GB”)
MAX_ALTERNATIVES = 10
language

Automatic Speech Recognition Language

Returns:language – Language Code <LC> & Region Code <RC> -> “LC-RC” (e.g. “en-GB”)
Return type:str
transcribe(audio)[source]

Transcribe Speech in Audio

Parameters:audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns:transcript
Return type:List[UtteranceHypothesis]
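
A concrete recogniser only needs to implement transcribe(). A minimal sketch, assuming only the interface documented above (DummyASR and its fixed transcript are hypothetical, for testing):

    import numpy as np

    from pepper.framework.sensor.asr import AbstractASR, UtteranceHypothesis


    class DummyASR(AbstractASR):
        """Hypothetical recogniser that returns a fixed phrase, for testing."""

        def transcribe(self, audio):
            # A real backend would send the audio samples to a speech
            # recognition engine and return its (ranked) hypotheses.
            return [UtteranceHypothesis("hello world", 1.0)]


    asr = DummyASR(language='en-GB')
    hypotheses = asr.transcribe(np.zeros(16000, dtype=np.int16))  # 1 s of silence at 16 kHz
    print(hypotheses[0].transcript, hypotheses[0].confidence)
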
class pepper.framework.sensor.asr.AbstractTranslator(source, target)[source]

Bases: object

Abstract Translator

Parameters:
  • source (str) – Two Character Source Language Code
  • target (str) – Two Character Target Language Code
source

Source Language

Returns:source – Two Character Source Language Code
Return type:str
target

Target Language

Returns:target – Two Character Target Language Code
Return type:str
translate(text)[source]

Translate Text from Source to Target Language

Parameters:text (str) – Text to Translate
Returns:translated_text – Translated Text
Return type:str
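
Similarly, a concrete translator only needs to implement translate(). A minimal sketch using just the constructor documented above (EchoTranslator is hypothetical):

    from pepper.framework.sensor.asr import AbstractTranslator


    class EchoTranslator(AbstractTranslator):
        """Hypothetical translator that returns its input unchanged."""

        def translate(self, text):
            # A real implementation would translate text from
            # self.source to self.target via a translation service.
            return text


    translator = EchoTranslator('nl', 'en')
    print(translator.source, '->', translator.target)  # nl -> en
    print(translator.translate('hallo'))
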
class pepper.framework.sensor.asr.BaseGoogleASR(language='en-GB', sample_rate=16000, hints=())[source]

Bases: pepper.framework.sensor.asr.AbstractASR, pepper.framework.sensor.asr.GoogleTranslator

Abstract Base Google Automatic Speech Recognition (ASR)

Handles common parameters for SynchronousGoogleASR and StreamedGoogleASR

Parameters:
  • language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC” (e.g. “en-GB”)
  • sample_rate (int) – Number of Audio Samples per second that will be handed to ASR transcription (16k is nice!)
  • hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to
transcribe(audio)[source]

Transcribe Speech in Audio

Parameters:audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns:transcript
Return type:List[UtteranceHypothesis]
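
The hints parameter biases recognition towards expected vocabulary, presumably forwarded by this class to the Google Cloud Speech API as speech context phrases. A construction sketch via a concrete subclass (the hint words are placeholders):

    from pepper.framework.sensor.asr import SynchronousGoogleASR

    # Bias the recogniser towards domain vocabulary; the exact effect
    # depends on how BaseGoogleASR forwards these hints to Google.
    asr = SynchronousGoogleASR(
        language='en-GB',
        sample_rate=16000,
        hints=('Pepper', 'robot'),
    )
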
class pepper.framework.sensor.asr.GoogleTranslator(source, target)[source]

Bases: pepper.framework.sensor.asr.AbstractTranslator

Google Translator

Parameters:
  • source (str) – Two Character Source Language Code
  • target (str) – Two Character Target Language Code
translate(text)[source]

Translate Text from Source to Target Language

Parameters:text (str) – Text to Translate
Returns:translated_text – Translated Text
Return type:str
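
A usage sketch, assuming Google Cloud Translation credentials are configured in the environment:

    from pepper.framework.sensor.asr import GoogleTranslator

    translator = GoogleTranslator(source='nl', target='en')  # Dutch to English
    print(translator.translate('Goedemorgen'))  # e.g. "Good morning"
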
class pepper.framework.sensor.asr.StreamedGoogleASR(language='en-GB', sample_rate=16000, hints=())[source]

Bases: pepper.framework.sensor.asr.BaseGoogleASR

Streamed Google Automatic Speech Recognition (ASR)

Recognises Speech ‘live’ as it is spoken. Should be faster than Synchronous ASR

Parameters:
  • language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC” (e.g. “en-GB”)
  • sample_rate (int) – Number of Audio Samples per second that will be handed to ASR transcription (16k is nice!)
  • hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to
live

Live Speech Transcript String (Debug/Visual purposes only)

Returns:live – Live Speech Transcript String
Return type:str
transcribe(audio)[source]

Transcribe Speech in Audio (Streamed)

Instead of a single Block of Audio, this function takes an Iterable of Audio frames. One frame is processed at a time while the audio is being generated by the speaker. This provides a faster ASR than the synchronous version, and a live representation of the utterance. TODO: this breaks the abstract specification, is there a neater way to do this?

Parameters:audio (Iterable[numpy.ndarray]) – Iterable of Audio Samples (Containing Speech)
Returns:transcript
Return type:List[UtteranceHypothesis]
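
A usage sketch: since transcribe() consumes an iterable of audio frames, a generator can feed frames as they arrive from a microphone. The silent frames below are placeholders; real use requires live speech audio and Google Cloud credentials:

    import numpy as np

    from pepper.framework.sensor.asr import StreamedGoogleASR

    asr = StreamedGoogleASR(language='en-GB', sample_rate=16000)

    def audio_frames():
        # Hypothetical stand-in for a microphone callback: yields ten
        # 100 ms frames (1600 samples each at 16 kHz), one at a time.
        for _ in range(10):
            yield np.zeros(1600, dtype=np.int16)

    for hypothesis in asr.transcribe(audio_frames()):
        print(hypothesis.transcript, hypothesis.confidence)
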
class pepper.framework.sensor.asr.SynchronousGoogleASR(language='en-GB', sample_rate=16000, hints=())[source]

Bases: pepper.framework.sensor.asr.BaseGoogleASR

Synchronous Google Automatic Speech Recognition (ASR)

Recognises Speech from a single block of Audio, after the utterance has finished. Slower than the Streamed ASR, which transcribes while the speaker is still talking

Parameters:
  • language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC” (e.g. “en-GB”)
  • sample_rate (int) – Number of Audio Samples per second that will be handed to ASR transcription (16k is nice!)
  • hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to
transcribe(audio)[source]

Transcribe Speech in Audio

Parameters:audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns:hypotheses
Return type:List[UtteranceHypothesis]
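
By contrast with the streamed variant, the synchronous recogniser takes the whole utterance as one block once the speaker has finished. A sketch under the same assumptions (silent placeholder audio, Google Cloud credentials configured):

    import numpy as np

    from pepper.framework.sensor.asr import SynchronousGoogleASR

    asr = SynchronousGoogleASR(language='en-GB', sample_rate=16000)

    utterance = np.zeros(3 * 16000, dtype=np.int16)  # 3 s placeholder buffer
    for hypothesis in asr.transcribe(utterance):
        print(hypothesis.transcript, hypothesis.confidence)
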
class pepper.framework.sensor.asr.UtteranceHypothesis(transcript, confidence)[source]

Bases: object

Automatic Speech Recognition (ASR) Hypothesis

Parameters:
  • transcript (str) – Utterance Hypothesis Transcript
  • confidence (float) – Utterance Hypothesis Confidence
confidence

Automatic Speech Recognition Hypothesis Confidence

Returns:confidence
Return type:float
transcript

Automatic Speech Recognition Hypothesis Transcript

Returns:transcript
Return type:str
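
Since transcribe() returns a list of hypotheses, a common pattern is to keep the one the recogniser is most confident about. A small sketch using only the documented constructor (the transcripts and scores are made up):

    from pepper.framework.sensor.asr import UtteranceHypothesis

    hypotheses = [
        UtteranceHypothesis("turn left", 0.72),
        UtteranceHypothesis("turn loft", 0.21),
    ]

    best = max(hypotheses, key=lambda hypothesis: hypothesis.confidence)
    print(best.transcript)  # "turn left"
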