pepper.framework.sensor.asr module
class pepper.framework.sensor.asr.AbstractASR(language='en-GB')
Bases: object
Abstract Automatic Speech Recognition (ASR)
Parameters: language (str) – Language Code <LC> & Region Code <RC>, formatted as “LC-RC” (e.g. 'en-GB')
MAX_ALTERNATIVES = 10
language
Automatic Speech Recognition Language
Returns: language – Language Code <LC> & Region Code <RC>, formatted as “LC-RC”
Return type: str
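The “LC-RC” format can be illustrated with a small parser. `split_language_code` is a hypothetical helper written for this page, not part of pepper; it only shows that a code such as 'en-GB' combines a two-character language code with a region code.

```python
def split_language_code(language):
    """Split an "LC-RC" code such as 'en-GB' into ('en', 'GB').

    Hypothetical helper: illustrates the format only, not pepper's own parsing.
    """
    parts = language.split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected 'LC-RC', got {!r}".format(language))
    language_code, region_code = parts
    return language_code, region_code


lc, rc = split_language_code("en-GB")
print("{} {}".format(lc, rc))  # en GB
```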
transcribe(audio)
Transcribe Speech in Audio
Parameters: audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns: transcript
Return type: List[UtteranceHypothesis]
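The contract of transcribe (one array of samples in, a list of hypotheses out) can be sketched with a stub subclass. So that the example runs without pepper installed, it defines minimal stand-ins for AbstractASR and UtteranceHypothesis that mirror only the documented signatures; the real classes may do more. `CannedASR` is hypothetical, and ranking by confidence and capping at MAX_ALTERNATIVES are assumptions about how a backend might use that constant.

```python
import numpy as np


class UtteranceHypothesis(object):
    """Minimal stand-in for the documented (transcript, confidence) pair."""

    def __init__(self, transcript, confidence):
        self.transcript = transcript
        self.confidence = confidence


class AbstractASR(object):
    """Minimal stand-in mirroring only the documented AbstractASR interface."""

    MAX_ALTERNATIVES = 10

    def __init__(self, language="en-GB"):
        self._language = language

    @property
    def language(self):
        return self._language

    def transcribe(self, audio):
        raise NotImplementedError()


class CannedASR(AbstractASR):
    """Hypothetical stub ASR that returns pre-configured hypotheses."""

    def __init__(self, hypotheses, language="en-GB"):
        super(CannedASR, self).__init__(language)
        self._hypotheses = list(hypotheses)

    def transcribe(self, audio):
        if audio.size == 0:
            return []  # no samples, nothing to recognise
        # Assumption: most confident first, capped at MAX_ALTERNATIVES
        ranked = sorted(self._hypotheses, key=lambda h: h.confidence, reverse=True)
        return ranked[:self.MAX_ALTERNATIVES]


asr = CannedASR([UtteranceHypothesis("hi pepper", 0.6),
                 UtteranceHypothesis("hi pepper!", 0.9)])
audio = np.zeros(16000, dtype=np.int16)  # one second of silence at 16 kHz
print([h.transcript for h in asr.transcribe(audio)])  # ['hi pepper!', 'hi pepper']
```

A stub like this makes it possible to test code that consumes transcribe() without network access to a speech service.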
class pepper.framework.sensor.asr.AbstractTranslator(source, target)
Bases: object
Abstract Translator
Parameters:
- source (str) – Two Character Source Language Code
- target (str) – Two Character Target Language Code
source
Source Language
Returns: source – Two Character Source Language Code
Return type: str
target
Target Language
Returns: target – Two Character Target Language Code
Return type: str
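A minimal stand-in for the documented translator interface is sketched below: two read-only properties, `source` and `target`, each holding a two-character language code. The validation in `__init__` is a hypothetical addition to make the two-character requirement explicit; the source does not say whether AbstractTranslator checks its arguments.

```python
class AbstractTranslator(object):
    """Minimal stand-in mirroring the documented source/target properties."""

    def __init__(self, source, target):
        # Hypothetical validation: the docs specify two-character codes
        for code in (source, target):
            if len(code) != 2 or not code.isalpha():
                raise ValueError(
                    "expected a two-character language code, got {!r}".format(code))
        self._source = source
        self._target = target

    @property
    def source(self):
        return self._source

    @property
    def target(self):
        return self._target


translator = AbstractTranslator("nl", "en")
print("{} -> {}".format(translator.source, translator.target))  # nl -> en
```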
class pepper.framework.sensor.asr.BaseGoogleASR(language='en-GB', sample_rate=16000, hints=())
Bases: pepper.framework.sensor.asr.AbstractASR, pepper.framework.sensor.asr.GoogleTranslator
Abstract Base Google Automatic Speech Recognition (ASR)
Handles the parameters shared by SynchronousGoogleASR and StreamedGoogleASR.
Parameters:
- language (str) – Language Code <LC> & Region Code <RC>, formatted as “LC-RC”
- sample_rate (int) – Number of audio samples per second handed to ASR transcription (16000 works well)
- hints (Tuple[str]) – Words or phrases the ASR should be extra sensitive to
transcribe(audio)
Transcribe Speech in Audio
Parameters: audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns: transcript
Return type: List[UtteranceHypothesis]
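As a rough illustration of how the three shared parameters might map onto a Google Cloud Speech-to-Text v1 request, the sketch below builds a plain dictionary using that API's RecognitionConfig field names (`encoding`, `sample_rate_hertz`, `language_code`, `max_alternatives`, `speech_contexts`). The mapping itself, the LINEAR16 encoding and the helper name `build_recognition_config` are assumptions; the source does not show how BaseGoogleASR builds its request.

```python
def build_recognition_config(language="en-GB", sample_rate=16000, hints=(),
                             max_alternatives=10):
    """Hypothetical helper: BaseGoogleASR parameters -> RecognitionConfig-style dict."""
    config = {
        "encoding": "LINEAR16",            # assumption: 16-bit signed PCM samples
        "sample_rate_hertz": sample_rate,
        "language_code": language,         # Google expects a BCP-47 tag such as 'en-GB'
        "max_alternatives": max_alternatives,
    }
    if hints:
        # Phrase hints bias recognition towards the given words or phrases
        config["speech_contexts"] = [{"phrases": list(hints)}]
    return config


config = build_recognition_config(hints=("Pepper",))
print(config["speech_contexts"])  # [{'phrases': ['Pepper']}]
```

Omitting `speech_contexts` when no hints are given keeps the request minimal; the default of 10 alternatives simply mirrors MAX_ALTERNATIVES.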
class pepper.framework.sensor.asr.GoogleTranslator(source, target)
Bases: pepper.framework.sensor.asr.AbstractTranslator
Google Translator
Parameters:
- source (str) – Two Character Source Language Code
- target (str) – Two Character Target Language Code
class pepper.framework.sensor.asr.StreamedGoogleASR(language='en-GB', sample_rate=16000, hints=())
Bases: pepper.framework.sensor.asr.BaseGoogleASR
Streamed Google Automatic Speech Recognition (ASR)
Recognises speech ‘live’, as it is spoken. Should be faster than SynchronousGoogleASR.
Parameters:
- language (str) – Language Code <LC> & Region Code <RC>, formatted as “LC-RC”
- sample_rate (int) – Number of audio samples per second handed to ASR transcription (16000 works well)
- hints (Tuple[str]) – Words or phrases the ASR should be extra sensitive to
live
Live Speech Transcript String (for debugging/visualisation only)
Returns: live – Live Speech Transcript String
Return type: str
transcribe(audio)
Transcribe Speech in Audio (Streamed)
Instead of a single block of audio, this method takes an iterable of audio frames. Frames are processed one at a time while the speaker is still talking, which makes ASR faster than the synchronous version and provides a live representation of the utterance (see live).
TODO: this breaks the abstract specification (AbstractASR.transcribe takes a single numpy.ndarray); is there a neater way to do this?
Parameters: audio (Iterable[numpy.ndarray]) – Iterable of Audio Samples (Containing Speech)
Returns: transcript
Return type: List[UtteranceHypothesis]
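Because the streamed transcribe expects an iterable of frames rather than one array, a recorded numpy.ndarray has to be cut into chunks first. The sketch below does this with a generator; `iter_frames` and the 30 ms frame length are illustrative assumptions, and on the robot the frames would more likely come straight from a live microphone.

```python
import numpy as np


def iter_frames(audio, sample_rate=16000, frame_ms=30):
    """Yield consecutive chunks of `audio`, each `frame_ms` milliseconds long.

    Hypothetical helper; the final frame may be shorter than the rest.
    """
    frame_size = int(sample_rate * frame_ms / 1000)  # 480 samples at 16 kHz, 30 ms
    for start in range(0, len(audio), frame_size):
        yield audio[start:start + frame_size]


audio = np.zeros(16000, dtype=np.int16)  # one second at 16 kHz
frames = list(iter_frames(audio))
print("{} {}".format(len(frames), len(frames[0])))  # 34 480
```

Since the generator is lazy, it could be passed directly as the `audio` argument of the streamed transcribe, so each frame is produced only when the recogniser asks for it.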
class pepper.framework.sensor.asr.SynchronousGoogleASR(language='en-GB', sample_rate=16000, hints=())
Bases: pepper.framework.sensor.asr.BaseGoogleASR
Synchronous Google Automatic Speech Recognition (ASR)
Recognises speech in a single, complete block of audio. Slower than StreamedGoogleASR, but follows the AbstractASR.transcribe specification.
Parameters:
- language (str) – Language Code <LC> & Region Code <RC>, formatted as “LC-RC”
- sample_rate (int) – Number of audio samples per second handed to ASR transcription (16000 works well)
- hints (Tuple[str]) – Words or phrases the ASR should be extra sensitive to
transcribe(audio)
Transcribe Speech in Audio
Parameters: audio (numpy.ndarray) – Audio Samples (Containing Speech)
Returns: hypotheses
Return type: List[UtteranceHypothesis]
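Conversely, the synchronous transcribe takes one array. If audio arrives as frames, it can be collected into a single block first, as in the sketch below; `collect_frames` is a hypothetical helper, and the int16 dtype of the empty result is an assumption.

```python
import numpy as np


def collect_frames(frames):
    """Join an iterable of 1-D audio frames into a single numpy.ndarray.

    Hypothetical helper: the synchronous ASR only sees audio once the
    whole utterance has been collected, which is why it cannot be 'live'.
    """
    frames = list(frames)
    if not frames:
        return np.zeros(0, dtype=np.int16)  # assumption: int16 samples
    return np.concatenate(frames)


frames = [np.ones(480, dtype=np.int16), np.ones(160, dtype=np.int16)]
block = collect_frames(frames)
print(block.shape)  # (640,)
```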
class pepper.framework.sensor.asr.UtteranceHypothesis(transcript, confidence)
Bases: object
Automatic Speech Recognition (ASR) Hypothesis
Parameters:
- transcript (str) – Utterance Hypothesis Transcript
- confidence (float) – Utterance Hypothesis Confidence
confidence
Automatic Speech Recognition Hypothesis Confidence
Returns: confidence
Return type: float
transcript
Automatic Speech Recognition Hypothesis Transcript
Returns: transcript
Return type: str
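A common way to consume the List[UtteranceHypothesis] returned by transcribe is to keep only the most confident hypothesis. The sketch below uses a stand-in class with the documented `transcript` and `confidence` attributes so it runs without pepper; `best_hypothesis` and its `min_confidence` threshold are hypothetical.

```python
class UtteranceHypothesis(object):
    """Stand-in mirroring the documented transcript/confidence attributes."""

    def __init__(self, transcript, confidence):
        self.transcript = transcript
        self.confidence = confidence


def best_hypothesis(hypotheses, min_confidence=0.0):
    """Return the most confident hypothesis, or None if none qualifies.

    Hypothetical helper; min_confidence is an illustrative threshold.
    """
    candidates = [h for h in hypotheses if h.confidence >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.confidence)


hypotheses = [UtteranceHypothesis("what is your name", 0.82),
              UtteranceHypothesis("what is your game", 0.11)]
print(best_hypothesis(hypotheses).transcript)  # what is your name
print(best_hypothesis(hypotheses, min_confidence=0.9))  # None
```

Returning None rather than raising lets a caller treat “nothing heard with enough confidence” the same way as an empty transcript.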