Realtime
Quickstart
Install
Provided the prerequisites for the Speechmatics Helm chart have been met, use the command below to install:
# Install the sm-realtime chart
helm upgrade --install speechmatics-realtime \
oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
--version 1.0.1 \
--set proxy.ingress.url="speechmatics.example.com"
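Once the install command completes, the standard Helm inspection commands can be used to confirm the release deployed and to review the values it was installed with:

```shell
# Show the release status (deployed, failed, etc.)
helm status speechmatics-realtime

# Show the user-supplied values overrides for the release
helm get values speechmatics-realtime
```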
Validate
Capacity check
You can confirm whether the transcribers and inference servers are available using:
kubectl get sessiongroups
Once the transcribers and inference servers have registered successfully, the output will show a non-zero CAPACITY:
NAME                                REPLICAS   CAPACITY   USAGE   VERSION   SPEC HASH
inference-server-enhanced-recipe1   1          480        0       1         b5784af49332f9948481195451eab6ca
rt-transcriber-en                   1          2          0       1         83929f2b9b2448cdc818d0e46e37600b
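Registration can take a little while after the pods start. Rather than re-running the command by hand, kubectl's standard watch flag can be used to follow the CAPACITY column until it becomes non-zero:

```shell
# Stream updates to the session groups; interrupt with Ctrl-C
# once every row reports a non-zero CAPACITY
kubectl get sessiongroups --watch
```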
Run a session
speechmatics rt transcribe \
--url wss://speechmatics.example.com/v2 \
--lang en \
--operating-point enhanced \
--ssl-mode insecure \
<audio-file>
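The `speechmatics` CLI used above is distributed with the Speechmatics Python package (the package name below is our assumption; check the Speechmatics docs for the authoritative install instructions). Note that `--ssl-mode insecure` skips certificate verification, which is useful for test clusters with self-signed certificates but should not be used in production:

```shell
# ASSUMPTION: the CLI is published on PyPI as speechmatics-python
pip install speechmatics-python
```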
Hardware recommendations
Below are the recommended Azure node sizes for running Realtime on Kubernetes:
Configuration
For detailed configuration options, refer to the sm-realtime Helm chart README.md.
The examples below show how to configure the Helm chart for different deployment scenarios.
global:
  transcriber:
    languages: ["ar", "ba", "be", "bg", "bn", "ca", "cmn", "cmn_en", "cmn_en_ms_ta", "cs", "cy", "da", "de", "el", "en", "en_ms", "en_ta", "eo", "es", "es-bilingual-en", "et", "eu", "fa", "fi", "fr", "ga", "gl", "he", "hi", "hr", "hu", "ia", "id", "it", "ja", "ko", "lt", "lv", "mn", "mr", "ms", "mt", "nl", "no", "pl", "pt", "ro", "ru", "sk", "sl", "sv", "sw", "ta", "th", "tl", "tr", "ug", "uk", "ur", "vi", "yue"]
# Enable all enhanced and standard inference server recipes
inferenceServerEnhancedRecipe1:
  enabled: true
inferenceServerEnhancedRecipe2:
  enabled: true
inferenceServerEnhancedRecipe3:
  enabled: true
inferenceServerEnhancedRecipe4:
  enabled: true
inferenceServerStandardAll:
  enabled: true
# Disable default enhanced inference server recipes
inferenceServerEnhancedRecipe1:
  enabled: false

# Enable a custom inference server deployment with just en models
inferenceServerCustom:
  enabled: true
  fullnameOverride: inference-server-en
  tritonServer:
    image:
      # Repository for the en-only inference server triton container
      repository: sm-gpu-inference-server-en
  inferenceSidecar:
    enabled: true
    # Configuration for custom model deployments
    registerFeatures:
      capacity: 600
      customModelCosts:
        "*:diar_standard": 0
        "*:body_standard": 0
        "*:diar_enhanced": 0
        "*:body_enhanced": 0
        "en:am_en_standard": 0
        "en:ensemble_en_standard": 20
        "en:lm_en_enhanced": 10
        "en:am_en_enhanced": 0
        "en:ensemble_en_enhanced": 20
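To see how `capacity` and `customModelCosts` interact, a worked example may help. This sketch assumes a session's token cost is simply the sum of the costs of the models it uses, which is our assumption about the accounting, not something the chart guarantees, so verify against the sm-realtime README:

```shell
# ASSUMPTION: a session's cost is the sum of its model costs.
# With the costs configured above, an enhanced "en" session would use
# lm_en_enhanced (10) + ensemble_en_enhanced (20) tokens
# (diar/body/am models are configured with cost 0).
per_session=$((10 + 20))

# Against the registered capacity of 600 tokens:
echo $((600 / per_session))   # -> 20 concurrent enhanced en sessions
```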
global:
  # Enable scaling for all sessiongroups resources
  sessionGroups:
    scaling:
      enabled: true
inferenceServerEnhancedRecipe1:
  sessionGroups:
    scaling:
      # Scale up inference server pods when there are 300 inference tokens remaining
      scaleOnCapacityLeft: 300
transcribers:
  sessionGroups:
    scaling:
      # Scale up transcriber pods when there is only capacity for 1 more session
      scaleOnCapacityLeft: 1
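The configuration examples in this section are Helm values overrides. One way to apply them is to save a snippet to a values file and pass it with `-f`; Helm's `--dry-run` flag renders the result without changing the cluster (the file name here is illustrative):

```shell
# Preview the rendered release with an overrides file before applying it
helm upgrade --install speechmatics-realtime \
  oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
  --version 1.0.1 \
  --set proxy.ingress.url="speechmatics.example.com" \
  -f scaling-values.yaml \
  --dry-run
```

Re-running the same command without `--dry-run` applies the change.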
Uninstall
Run the following command to uninstall Realtime from the cluster:
helm uninstall speechmatics-realtime
Depending on your configuration, you may also need to remove PVCs created by the Redis deployment:
# Find any left-over Redis PVCs, then delete them with `kubectl delete pvc`
kubectl get pvc | grep redis-data
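The find-and-delete steps above can be combined into a single pipeline. Review the matched names first, since deleting a PVC discards its data:

```shell
# Delete every PVC whose name contains "redis-data"
# (-r skips the delete if nothing matched)
kubectl get pvc -o name | grep redis-data | xargs -r kubectl delete
```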