pepper.framework.sensor.asr module
class pepper.framework.sensor.asr.AbstractASR(language='en-GB')
    Bases: object

    Abstract Automatic Speech Recognition (ASR)

    Parameters: language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC”

    MAX_ALTERNATIVES = 10

    language
        Automatic Speech Recognition Language

        Returns: language – Language Code <LC> & Region Code <RC> -> “LC-RC”
        Return type: str

    transcribe(audio)
        Transcribe Speech in Audio

        Parameters: audio (numpy.ndarray) – Audio Samples (Containing Speech)
        Returns: transcript
        Return type: List[UtteranceHypothesis]
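A minimal standalone sketch of the AbstractASR contract: a concrete subclass supplies transcribe() and inherits the language handling. Everything here (EchoASR, its canned result, a plain list in place of a numpy.ndarray) is illustrative and not part of the pepper library.

```python
class UtteranceHypothesis:
    # Stand-in mirroring the documented transcript/confidence fields.
    def __init__(self, transcript, confidence):
        self.transcript = transcript
        self.confidence = confidence

class AbstractASR:
    MAX_ALTERNATIVES = 10

    def __init__(self, language='en-GB'):
        self._language = language

    @property
    def language(self):
        # Language Code <LC> & Region Code <RC> -> "LC-RC"
        return self._language

    def transcribe(self, audio):
        raise NotImplementedError()

class EchoASR(AbstractASR):
    def transcribe(self, audio):
        # A real implementation would recognise speech in the samples;
        # a canned hypothesis illustrates the documented return type.
        return [UtteranceHypothesis("hello robot", 0.95)]

asr = EchoASR('nl-NL')
hypotheses = asr.transcribe([0] * 16000)
```

Note that transcribe() returns a list of hypotheses (up to MAX_ALTERNATIVES), not a single string, so callers can weigh alternatives by confidence.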
class pepper.framework.sensor.asr.AbstractTranslator(source, target)
    Bases: object

    Abstract Translator

    Parameters:
        - source (str) – Two Character Source Language Code
        - target (str) – Two Character Target Language Code

    source
        Source Language

        Returns: source – Two Character Source Language Code
        Return type: str

    target
        Target Language

        Returns: target – Two Character Target Language Code
        Return type: str
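A standalone sketch of the AbstractTranslator contract: two-character source and target language codes, exposed as properties. The translate() method and its lookup table are assumptions for illustration only — the documented interface specifies just the constructor and the two properties.

```python
class ToyTranslator:
    def __init__(self, source, target):
        self._source = source
        self._target = target

    @property
    def source(self):
        # Two Character Source Language Code
        return self._source

    @property
    def target(self):
        # Two Character Target Language Code
        return self._target

    def translate(self, text):
        # Hypothetical method: a tiny lookup table stands in for the
        # real translation backend (e.g. the Google-based subclass).
        lookup = {("nl", "en"): {"hallo": "hello"}}
        return lookup.get((self._source, self._target), {}).get(text, text)

nl_en = ToyTranslator("nl", "en")
```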
class pepper.framework.sensor.asr.BaseGoogleASR(language='en-GB', sample_rate=16000, hints=())
    Bases: pepper.framework.sensor.asr.AbstractASR, pepper.framework.sensor.asr.GoogleTranslator

    Abstract Base Google Automatic Speech Recognition (ASR)

    Handles common parameters for SynchronousGoogleASR and StreamedGoogleASR

    Parameters:
        - language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC”
        - sample_rate (int) – Number of Audio Samples per second that will be handed to ASR transcription (16k is nice!)
        - hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to

    transcribe(audio)
        Transcribe Speech in Audio

        Parameters: audio (numpy.ndarray) – Audio Samples (Containing Speech)
        Returns: transcript
        Return type: List[UtteranceHypothesis]
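A toy illustration of what the hints parameter does: phrases the recogniser should be extra sensitive to. The confidence boost below is a stand-in for the speech-context biasing the Google backend performs internally, not its actual logic.

```python
def apply_hints(hypotheses, hints, boost=0.1):
    """hypotheses: list of (transcript, confidence) pairs."""
    boosted = []
    for transcript, confidence in hypotheses:
        # Nudge up any hypothesis that contains a hint phrase.
        if any(hint.lower() in transcript.lower() for hint in hints):
            confidence = min(1.0, confidence + boost)
        boosted.append((transcript, confidence))
    # Most confident hypothesis first.
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

result = apply_hints(
    [("hello pepper", 0.80), ("hello prepper", 0.85)],
    hints=("pepper",),
)
```

With the hint in place, "hello pepper" overtakes the otherwise more confident mishearing — which is the practical effect hints aim for.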
class pepper.framework.sensor.asr.GoogleTranslator(source, target)
    Bases: pepper.framework.sensor.asr.AbstractTranslator

    Google Translator

    Parameters:
        - source (str) – Two Character Source Language Code
        - target (str) – Two Character Target Language Code
class pepper.framework.sensor.asr.StreamedGoogleASR(language='en-GB', sample_rate=16000, hints=())
    Bases: pepper.framework.sensor.asr.BaseGoogleASR

    Streamed Google Automatic Speech Recognition (ASR)

    Recognises Speech ‘live’ as it is spoken. Should be faster than the Synchronous ASR.

    Parameters:
        - language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC”
        - sample_rate (int) – Number of Audio Samples per second that will be handed to ASR transcription (16k is nice!)
        - hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to

    live
        Live Speech Transcript String (Debug/Visual purposes only)

        Returns: live – Live Speech Transcript String
        Return type: str

    transcribe(audio)
        Transcribe Speech in Audio (Streamed)

        Instead of a single block of audio, this function takes an iterable of audio frames. One frame is processed at a time, while the audio is still being generated by the speaker. This makes the ASR faster than the synchronous version and provides a live representation of the utterance. TODO: this breaks the abstract specification; is there a neater way to do this?

        Parameters: audio (Iterable[numpy.ndarray]) – Iterable of Audio Samples (Containing Speech)
        Returns: transcript
        Return type: List[UtteranceHypothesis]
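A sketch of the streamed pattern documented above: transcribe() consumes an iterable of audio frames one at a time while speech is still being produced, keeping a running live transcript (cf. the live property). The generator and the word-per-frame "recognition" are toy stand-ins for microphone capture and the Google streaming backend.

```python
def microphone(words):
    # Pretend each yielded audio frame decodes to exactly one word.
    for word in words:
        yield word

class ToyStreamedASR:
    def __init__(self):
        self._live = ""

    @property
    def live(self):
        # Live Speech Transcript String (Debug/Visual purposes only)
        return self._live

    def transcribe(self, frames):
        words = []
        for frame in frames:          # one frame at a time, as it arrives
            words.append(frame)
            self._live = " ".join(words)
        return [(self._live, 1.0)]    # final hypothesis list

asr = ToyStreamedASR()
result = asr.transcribe(microphone(["hello", "there", "pepper"]))
```

The key design point is that the loop body runs before the iterable is exhausted, so partial results are available while the speaker is still talking — which is where the latency win over the synchronous version comes from.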
class pepper.framework.sensor.asr.SynchronousGoogleASR(language='en-GB', sample_rate=16000, hints=())
    Bases: pepper.framework.sensor.asr.BaseGoogleASR

    Synchronous Google Automatic Speech Recognition (ASR)

    Recognises Speech from a single block of Audio, after the utterance has been spoken (cf. StreamedGoogleASR for a faster, streamed alternative).

    Parameters:
        - language (str) – Language Code <LC> & Region Code <RC> -> “LC-RC”
        - sample_rate (int) – Number of Audio Samples per second that will be handed to ASR transcription (16k is nice!)
        - hints (Tuple[str]) – Words or Phrases that ASR should be extra sensitive to

    transcribe(audio)
        Transcribe Speech in Audio

        Parameters: audio (numpy.ndarray) – Audio Samples (Containing Speech)
        Returns: hypotheses
        Return type: List[UtteranceHypothesis]
class pepper.framework.sensor.asr.UtteranceHypothesis(transcript, confidence)
    Bases: object

    Automatic Speech Recognition (ASR) Hypothesis

    Parameters:
        - transcript (str) – Utterance Hypothesis Transcript
        - confidence (float) – Utterance Hypothesis Confidence

    confidence
        Automatic Speech Recognition Hypothesis Confidence

        Returns: confidence
        Return type: float

    transcript
        Automatic Speech Recognition Hypothesis Transcript

        Returns: transcript
        Return type: str
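Choosing among hypotheses: transcribe() returns a list of UtteranceHypothesis objects, and a caller typically acts on the most confident one. The class below mirrors the documented fields; in the real module it would be imported from pepper.framework.sensor.asr.

```python
class UtteranceHypothesis:
    def __init__(self, transcript, confidence):
        self.transcript = transcript
        self.confidence = confidence

hypotheses = [
    UtteranceHypothesis("wreck a nice beach", 0.42),
    UtteranceHypothesis("recognise speech", 0.87),
]

# Pick the hypothesis with the highest confidence.
best = max(hypotheses, key=lambda h: h.confidence)
```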