AppTek and Deluxe - AI-Enabled
Solutions with Human Expertise for the Media & Entertainment Industry

Deluxe and AppTek bring the Media & Entertainment industry’s premium solution for AI-enabled localization services with a human touch that caters to studios, streaming platforms and content creators worldwide. The fusion of Deluxe’s extensive workflow experience combined with AppTek’s award-winning science team offers enterprises the ability to create bespoke AI solutions that can be delivered on-premises or consumed through Deluxe’s platforms while preserving data privacy and exclusivity.

Automatic Dubbing

AppTek and Deluxe offer fully automatic and human-in-the-loop speaker-adaptive dubbing solutions designed to accelerate the dubbing process and help content creators reach more audiences across the globe. The automatic service is the first of its kind to transcribe and translate audio from a source language and present it back in similar-sounding voices in the target language while maintaining the time constraints, style, formality, gender and other context- or domain-specific information needed to produce a more accurate dubbed output.
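A dubbing pipeline of this kind chains speech recognition, translation and voice synthesis under the timing constraints of the source audio. The sketch below illustrates the timing-constraint idea only; the names and the characters-per-second heuristic are hypothetical stand-ins, not AppTek or Deluxe APIs:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One span of source-language speech with its time window."""
    start: float  # seconds into the source audio
    end: float
    text: str     # source-language transcript

def fit_to_timing(translation: str, segment: Segment,
                  chars_per_second: float = 15.0) -> bool:
    """Isochrony check: can the translated line be spoken within the
    source segment's time window at an assumed speaking rate?"""
    available = segment.end - segment.start
    needed = len(translation) / chars_per_second
    return needed <= available

seg = Segment(start=10.0, end=13.0, text="Guten Morgen, alle zusammen.")
print(fit_to_timing("Good morning, everyone.", seg))
```

A real system would adjust the translation or the synthesis rate when the check fails, rather than simply rejecting the line.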

Solutions for Subtitling

The combination of AppTek and Deluxe’s ASR and meta-aware MT technologies is designed to reduce manual labor and accelerate production timelines for captioning and subtitling workflows by making use of content metadata, such as genre, style and speaker gender, to produce more accurate machine translations. Additionally, advanced “Intelligent Line Segmentation” technology is a separate neural network trained on the segmentation decisions of professional captioners and subtitlers. It is used in combination with ASR and MT output to deliver more accurately segmented automated subtitles, either for fully automated workflows or to provide a significant jump-start to expert-in-the-loop ones.
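For contrast with the neural approach described above, the simplest rule-based baseline breaks subtitle text at word boundaries under a character limit (42 characters per line is a common subtitling convention). The sketch below is that crude baseline, shown only to make the segmentation problem concrete:

```python
def segment_line(text: str, max_chars: int = 42) -> list[str]:
    """Greedy word-boundary wrap: a rule-based stand-in for the
    neural segmentation described above. Words longer than
    max_chars end up on a line of their own."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            lines.append(current)  # close the current line
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

print(segment_line("a subtitle line that is too long to fit on one line", 25))
```

A trained model improves on this by breaking at linguistically natural points (after clauses, before conjunctions) the way professional subtitlers do, not merely where the character count forces it.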

Voice Conversion

Voice Conversion modifies the speech of a source speaker to make it sound like that of a target speaker without changing any linguistic information, allowing a single speaker to mimic the voice of another. The technology is available in same-language and cross-lingual modes and includes parameter controls such as pitch modification.
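Pitch modification is one of the parameter controls mentioned above. The numpy sketch below shows the underlying signal-processing idea with a naive resampling shift; it is illustrative only, since a real voice-conversion model preserves duration and linguistic content, which this toy does not:

```python
import numpy as np

def shift_pitch(signal: np.ndarray, semitones: float) -> np.ndarray:
    """Naive pitch shift by resampling: raising pitch by n semitones
    multiplies every frequency by 2**(n/12), but also shortens the
    signal, unlike production voice-conversion systems."""
    factor = 2 ** (semitones / 12.0)
    positions = np.arange(0, len(signal) - 1, factor)
    return np.interp(positions, np.arange(len(signal)), signal)

sr = 8000
t = np.arange(sr) / sr                # one second of audio
tone = np.sin(2 * np.pi * 220 * t)    # 220 Hz (A3)
octave_up = shift_pitch(tone, 12)     # +12 semitones: expect ~440 Hz
```

Neural voice conversion replaces this kind of crude spectral manipulation with models that separate speaker identity from linguistic content and resynthesize one without altering the other.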

Speech Synthesis

AppTek’s text-to-speech technology converts written text into speech using pre-built or customized voices. The technology can be used in a wide range of applications, such as offering accessibility to visually impaired users for critical text-based news crawls, e.g., announcing school closings.

Live Automatic Captioning

AppTek and Deluxe deliver fully automated, same-language captions for live content in multiple languages, dialects, demographics and domains. Our OmniCaption 300 closed captioning appliance was developed for and trained on broadcast news, sports, weather and other programming to offer high accuracy that broadcasters can depend on. Check out our video to view samples of our automatic captioning in action.

Language Identification

AppTek’s language identification technology determines the language of a given text or audio input. It is used for the more efficient management of subtitle or audio files, e.g. ahead of broadcast time, or to identify multiple languages within a given file and segment it for other automatic processes such as automatic speech recognition or machine translation.
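To make the task concrete, the toy classifier below identifies a text's language by stopword overlap. It is a deliberately simplistic illustration: production language identification, as described above, uses statistical models over both text and audio, while this sketch covers text and three hard-coded languages only:

```python
# Tiny stopword lists for three languages (illustrative, not exhaustive).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "und", "ist", "von", "zu"},
    "es": {"el", "y", "es", "de", "que"},
}

def identify_language(text: str) -> str:
    """Return the language whose stopword list overlaps the input most."""
    words = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))

print(identify_language("der Hund und die Katze ist zu Hause"))
```

Once the language of a file or segment is known, it can be routed to the matching ASR or MT model, which is the workflow role described above.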

Large Language Models and GPT

AppTek and Deluxe work with enterprise organizations in support of Large Language Models (LLMs) and Generative Pre-trained Transformers (GPT) to transform creative text generation and language-enabled workflows. Services include:

• Data Collection: Gather high-quality data from a variety of sources, ranging from pre-existing data sets such as books and articles to customer-supplied data, covering the range of domains and content that should be incorporated in the model.

• Data Preprocessing: Clean the text data by removing noise such as HTML tags, punctuation and special characters, and tokenize the text into smaller units such as words and sentences.

• Model Architecture: Decide on the architecture of the LLM, including the number of layers, the size of the hidden state, the number of attention heads and the sequence length.

• Training: Train the LLM on the preprocessed text data using large amounts of computing resources, such as GPUs or TPUs. The model is trained to predict the next word given the preceding words, and its parameters are updated during training according to how well its predictions match the ground truth.

• Fine-Tuning: Fine-tune the pre-trained LLM on a specific task, such as text generation or language translation, by providing it with task-specific training data and adjusting the model's parameters and hyper-parameters to fit the task.

• Evaluation: Evaluate the system on downstream tasks, covering text-generation quality as well as the model's propensity for bias, toxicity and cultural insensitivity. Conduct post-training analysis to identify any biases in the model's output and take corrective measures as needed.

• Deployment: Deploy the trained and fine-tuned LLM to a production environment, such as a web application or mobile app, where it can be used to generate text or perform other natural language processing tasks.
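The training objective named in the steps above (predict the next word given the preceding words) can be illustrated with a toy count-based model. This is a sketch of the objective only: production LLMs learn it with transformer networks over vast corpora, not word counts:

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus.
    A toy stand-in for the next-word-prediction objective above."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(model: dict[str, Counter], word: str) -> str:
    """Return the most frequent continuation seen in training."""
    return model[word.lower()].most_common(1)[0][0]

corpus = [
    "subtitles reach global audiences",
    "dubbing helps content reach global viewers",
]
model = train_bigram_lm(corpus)
print(predict_next(model, "reach"))
```

Each of the preceding steps refines this same core: better data and preprocessing improve the counts' real-world analogue, the architecture determines how much context beyond one word the model can use, and fine-tuning re-weights the learned predictions toward a specific task.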

Schedule a demo.

Take a comprehensive look at how AppTek's AI-enabled speech and language technologies, combined with Deluxe professional services, can transform your media content and localization strategy.

30-Year Leaders in Speech Technology
Copyright 2022 AppTek | Privacy Policy | Terms of Use