
2023

The development of Wordcab Transcribe

As a machine learning engineer and open-source enthusiast, I've always been driven by the desire to create solutions that bridge the gap between technological capability and universal accessibility.

Out of this pursuit, Wordcab Transcribe was born: a FastAPI-based API for transcribing audio files using faster-whisper and NVIDIA NeMo.

This journey of creating an open-source, production-ready transcription service has been both challenging and rewarding.


Turbocharge your tokenization by exploiting parallelism

Parallelize Hugging Face Tokenizers with num_proc


Processing large datasets can be time-consuming, especially when it comes to tokenizing text.

But what if you could reduce your tokenization time from hours to mere minutes? Without any extra effort? 🤯

In this blog post, we'll show you how to parallelize your tokenization using the num_proc parameter of Dataset.map() in the Hugging Face Datasets library.
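As a rough illustration of the idea, here is a minimal sketch, assuming the Hugging Face datasets and transformers libraries are installed. Passing num_proc to Dataset.map() splits the dataset into shards and tokenizes each shard in its own process. The helper names pick_num_proc and tokenize_dataset, the "text" column, and the dataset and model names in the commented usage are hypothetical choices for this example, not something taken from the post.

```python
import os
from typing import Optional


def pick_num_proc(requested: Optional[int] = None) -> int:
    """Hypothetical helper: use the requested worker count, capped at the CPU count."""
    available = os.cpu_count() or 1
    if requested is None:
        return available
    return max(1, min(requested, available))


def tokenize_dataset(dataset, tokenizer, text_column: str = "text", num_proc: Optional[int] = None):
    """Tokenize a datasets.Dataset across several processes.

    Assumes `dataset` is a datasets.Dataset and `tokenizer` is a transformers
    tokenizer; `text_column` is a hypothetical column name.
    """

    def tokenize_batch(batch):
        # With batched=True, batch[text_column] is a list of strings.
        return tokenizer(batch[text_column], truncation=True)

    return dataset.map(
        tokenize_batch,
        batched=True,                         # send many rows per call
        num_proc=pick_num_proc(num_proc),     # one worker process per shard
        remove_columns=dataset.column_names,  # keep only the tokenizer outputs
    )


# Usage (downloads data and a model; names are placeholders):
# from datasets import load_dataset
# from transformers import AutoTokenizer
# ds = load_dataset("imdb", split="train")
# tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# tokenized = tokenize_dataset(ds, tok, num_proc=8)
```

Combining batched=True with num_proc is usually where the biggest gains come from: batching amortizes per-call overhead, while num_proc spreads the shards over CPU cores.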

Wordcab Transcribe - An open-source ASR solution using Whisper, Docker and FastAPI

Automatic Speech Recognition (ASR) has become an essential tool for developers and businesses. With Wordcab Transcribe, you can leverage ASR in your projects without relying on expensive third-party platforms.

We've implemented an open-source ASR solution using Docker, FastAPI, and the faster-whisper library, a fast reimplementation of OpenAI's Whisper model.

The project uses CTranslate2 (through faster-whisper) under the hood to speed up audio processing, and it requires less than 5 GB of GPU VRAM with the large-v2 Whisper model.

In this blog post, we'll present the Wordcab Transcribe project and show you how to use it in your own applications.
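To give a feel for the engine underneath, the sketch below calls faster-whisper directly rather than going through Wordcab Transcribe's HTTP API, whose endpoints are not shown here. It assumes faster-whisper is installed and a CUDA GPU is available; the function names transcribe_file and format_timestamp and the audio path are hypothetical, and this sketch does not by itself confirm the VRAM figure above.

```python
from typing import List, Tuple


def transcribe_file(
    audio_path: str, model_size: str = "large-v2"
) -> Tuple[str, List[Tuple[float, float, str]]]:
    """Transcribe one audio file with faster-whisper on a CUDA GPU.

    Returns the detected language code and a list of (start, end, text) segments.
    """
    # Imported lazily so the helper below works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # float16 on the GPU; compute_type="int8_float16" would reduce VRAM further.
    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, info = model.transcribe(audio_path, beam_size=5)
    # `segments` is a generator: decoding actually happens while we iterate over it.
    return info.language, [(seg.start, seg.end, seg.text) for seg in segments]


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


# Usage (requires faster-whisper and a GPU; "meeting.wav" is a placeholder path):
# language, segments = transcribe_file("meeting.wav")
# for start, end, text in segments:
#     print(f"[{format_timestamp(start)} -> {format_timestamp(end)}]{text}")
```

This sketch loads the model on every call for simplicity. In a long-running service such as a FastAPI app, you would normally load the model once at startup and reuse it across requests.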

Keep your workstation clean - Docker

Optimize Docker Storage for Machine Learning

When working on machine learning, especially with large Docker images such as NVIDIA's images for training models on GPUs, it is important to manage your workstation's storage efficiently.

Docker is a great tool for containerization, providing a consistent environment for deploying applications.

However, as you build images and run containers, unused data such as stopped containers, old images, volumes, and build cache can accumulate on your system.

In this post, we'll cover how to use Docker commands to keep these unused files from cluttering your workstation's storage.
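As a rough sketch of what such a cleanup can look like, the helper below wraps a few standard Docker CLI prune commands with Python's subprocess module so the routine can be scripted. These are not necessarily the exact commands the full post covers. The names CLEANUP_COMMANDS, cleanup_commands, and run_cleanup are hypothetical, and dry_run defaults to True so nothing is deleted unless you ask for it. Running docker system df before and after shows how much space was reclaimed, and starting throwaway containers with docker run --rm avoids leaving stopped containers behind in the first place.

```python
import shutil
import subprocess
from typing import List

# Each command removes one category of unused Docker data; -f skips the prompt.
CLEANUP_COMMANDS: List[List[str]] = [
    ["docker", "container", "prune", "-f"],  # stopped containers
    ["docker", "image", "prune", "-f"],      # dangling (untagged) images
    ["docker", "builder", "prune", "-f"],    # build cache
    ["docker", "volume", "prune", "-f"],     # unused volumes (recent Docker: anonymous only by default)
]


def cleanup_commands(all_images: bool = False) -> List[List[str]]:
    """Return a copy of the prune commands, optionally pruning all unused images."""
    cmds = [list(cmd) for cmd in CLEANUP_COMMANDS]
    if all_images:
        # -a also removes tagged images not used by any container,
        # e.g. large GPU base images you no longer need.
        cmds[1].insert(3, "-a")
    return cmds


def run_cleanup(all_images: bool = False, dry_run: bool = True) -> List[str]:
    """Run (or, in dry-run mode, only list) the cleanup commands."""
    cmds = cleanup_commands(all_images)
    if not dry_run:
        if shutil.which("docker") is None:
            raise RuntimeError("docker CLI not found on PATH")
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return [" ".join(cmd) for cmd in cmds]


# Usage: print the plan first, then actually prune.
# print("\n".join(run_cleanup(all_images=True)))
# run_cleanup(all_images=True, dry_run=False)
```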

We are all learners

In today's fast-paced world of technology, learning and adapting to new developments is essential. This is especially true in the field of Artificial Intelligence (AI), which is advancing at an unprecedented rate.

As learners, we can find it overwhelming to keep up with the constant stream of new information and updates...