Realtime
Quickstart
Install
Provided the prerequisites for the Speechmatics Helm chart have been met, use the command below to install:
# Install the sm-realtime chart
helm upgrade --install speechmatics-realtime \
oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
--version 1.0.1 \
--set proxy.ingress.url="speechmatics.example.com"
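Once the install command completes, the standard Helm inspection commands can be used to confirm the release deployed and to review the values it was installed with:

```shell
# Show the release status (deployed, failed, etc.)
helm status speechmatics-realtime

# Show the user-supplied values overrides for the release
helm get values speechmatics-realtime
```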
Validate
Capacity check
You can confirm whether the transcribers and inference servers are available using:
kubectl get sessiongroups
Once the transcribers and inference servers have registered successfully, the output will show a non-zero CAPACITY:
NAME                                REPLICAS   CAPACITY   USAGE   VERSION   SPEC HASH
inference-server-enhanced-recipe1   1          480        0       1         b5784af49332f9948481195451eab6ca
rt-transcriber-en                   1          2          0       1         83929f2b9b2448cdc818d0e46e37600b
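Registration can take a little while after the pods start. Rather than re-running the command by hand, kubectl's standard watch flag can be used to follow the CAPACITY column until it becomes non-zero:

```shell
# Stream updates to the session groups; interrupt with Ctrl-C
# once every row reports a non-zero CAPACITY
kubectl get sessiongroups --watch
```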
Run a session
speechmatics rt transcribe \
--url wss://speechmatics.example.com/v2 \
--lang en \
--operating-point enhanced \
--ssl-mode insecure \
<audio-file>
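The `speechmatics` CLI used above is distributed with the Speechmatics Python package (the package name below is our assumption; check the Speechmatics docs for the authoritative install instructions). Note that `--ssl-mode insecure` skips certificate verification, which is useful for test clusters with self-signed certificates but should not be used in production:

```shell
# ASSUMPTION: the CLI is published on PyPI as speechmatics-python
pip install speechmatics-python
```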
Hardware recommendations
Below are the recommended Azure node sizes for running Realtime on Kubernetes:
Configuration
For detailed configuration options, refer to the sm-realtime Helm chart README.md.
The examples below show how to configure the Helm chart for different deployment scenarios.
global:
  transcriber:
    languages: ["ar", "ba", "be", "bg", "bn", "ca", "cmn", "cmn_en", "cmn_en_ms_ta", "cs", "cy", "da", "de", "el", "en", "en_ms", "en_ta", "eo", "es", "es-bilingual-en", "et", "eu", "fa", "fi", "fr", "ga", "gl", "he", "hi", "hr", "hu", "ia", "id", "it", "ja", "ko", "lt", "lv", "mn", "mr", "ms", "mt", "nl", "no", "pl", "pt", "ro", "ru", "sk", "sl", "sv", "sw", "ta", "th", "tl", "tr", "ug", "uk", "ur", "vi", "yue"]
# Enable all enhanced and standard inference server recipes
inferenceServerEnhancedRecipe1:
  enabled: true
inferenceServerEnhancedRecipe2:
  enabled: true
inferenceServerEnhancedRecipe3:
  enabled: true
inferenceServerEnhancedRecipe4:
  enabled: true
inferenceServerStandardAll:
  enabled: true
# Disable default enhanced inference server recipes
inferenceServerEnhancedRecipe1:
  enabled: false

# Enable a custom inference server deployment with just en models
inferenceServerCustom:
  enabled: true
  fullnameOverride: inference-server-en
  tritonServer:
    image:
      # Repository for the en-only inference server triton container
      repository: sm-gpu-inference-server-en
  inferenceSidecar:
    enabled: true
    # Configuration for custom model deployments
    registerFeatures:
      capacity: 600
      customModelCosts:
        "*:diar_standard": 0
        "*:body_standard": 0
        "*:diar_enhanced": 0
        "*:body_enhanced": 0
        "en:am_en_standard": 0
        "en:ensemble_en_standard": 20
        "en:lm_en_enhanced": 10
        "en:am_en_enhanced": 0
        "en:ensemble_en_enhanced": 20
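To see how `capacity` and `customModelCosts` interact, a worked example may help. This sketch assumes a session's token cost is simply the sum of the costs of the models it uses, which is our assumption about the accounting, not something the chart guarantees, so verify against the sm-realtime README:

```shell
# ASSUMPTION: a session's cost is the sum of its model costs.
# With the costs configured above, an enhanced "en" session would use
# lm_en_enhanced (10) + ensemble_en_enhanced (20) tokens
# (diar/body/am models are configured with cost 0).
per_session=$((10 + 20))

# Against the registered capacity of 600 tokens:
echo $((600 / per_session))   # -> 20 concurrent enhanced en sessions
```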
global:
  # Enable scaling for all sessiongroups resources
  sessionGroups:
    scaling:
      enabled: true
inferenceServerEnhancedRecipe1:
  sessionGroups:
    scaling:
      # Scale up inference server pods when there are 300 inference tokens remaining
      scaleOnCapacityLeft: 300
transcribers:
  sessionGroups:
    scaling:
      # Scale up transcriber pods when there is only capacity for 1 more session
      scaleOnCapacityLeft: 1
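The configuration examples in this section are Helm values overrides. One way to apply them is to save a snippet to a values file and pass it with `-f`; Helm's `--dry-run` flag renders the result without changing the cluster (the file name here is illustrative):

```shell
# Preview the rendered release with an overrides file before applying it
helm upgrade --install speechmatics-realtime \
  oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
  --version 1.0.1 \
  --set proxy.ingress.url="speechmatics.example.com" \
  -f scaling-values.yaml \
  --dry-run
```

Re-running the same command without `--dry-run` applies the change.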
Uninstall
Run the following command to uninstall Realtime from the cluster:
helm uninstall speechmatics-realtime
Depending on your configuration, you may also need to remove PVCs created by the Redis deployment:
# Find any left-over Redis PVCs, then delete them with `kubectl delete pvc`
kubectl get pvc | grep redis-data
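The find-and-delete steps above can be combined into a single pipeline. Review the matched names first, since deleting a PVC discards its data:

```shell
# Delete every PVC whose name contains "redis-data"
# (-r skips the delete if nothing matched)
kubectl get pvc -o name | grep redis-data | xargs -r kubectl delete
```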