2023 – Present | Full-stack development | Technologies: Python, Flask, React.js, OpenAI API, FFmpeg, pydub, MoviePy, Gunicorn, asyncio.
Transcribly is a straightforward web application that uses AI to transcribe a range of media inputs: audio files, video files, or YouTube links.
Implementation:
Initial Project Setup:
The project began with a set of Python Worker files. These handled the media operations, most of them built on FFmpeg: converting between file types, downloading YouTube videos, and extracting audio from video files. Once the audio had been extracted, it moved on to pre-processing.
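The audio-extraction step can be illustrated with a minimal sketch that shells out to the FFmpeg CLI. The function names are hypothetical, and the choice of mono output at 16 kHz is an assumption made to keep files small, not necessarily what Transcribly does. The flags used (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard FFmpeg options.

```python
import subprocess
from typing import List


def build_extract_audio_cmd(input_path: str, output_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an FFmpeg command that drops the video stream and writes mono audio."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", input_path,         # source: a video or audio file
        "-vn",                    # discard any video stream
        "-ac", "1",               # downmix to one (mono) channel
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumed choice
        output_path,              # output format inferred from the extension, e.g. .mp3
    ]


def extract_audio(input_path: str, output_path: str) -> None:
    # check=True raises CalledProcessError if FFmpeg exits with a non-zero status.
    subprocess.run(build_extract_audio_cmd(input_path, output_path), check=True, capture_output=True)
```

Keeping command construction separate from execution makes the command easy to inspect or test without FFmpeg installed.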
Pre-Processing:
Pre-processing involved detecting silent stretches in the audio, removing them, and splitting the file into smaller segments at those silences. This reduced the overall file size and produced chunks small enough to send to the transcription API individually.
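pydub offers a `split_on_silence` helper for this step. The sketch below shows the underlying idea in plain Python instead, so it runs without audio libraries: it works on a list of float samples, marks fixed-size frames as silent when their RMS level falls below a threshold, and cuts the audio at silent runs that are long enough to count as a pause. The frame length, threshold, and minimum pause length are hypothetical parameters, not values taken from Transcribly.

```python
import math
from typing import List


def frame_rms(samples: List[float], start: int, end: int) -> float:
    """Root-mean-square level of samples[start:end]."""
    window = samples[start:end]
    if not window:
        return 0.0
    return math.sqrt(sum(s * s for s in window) / len(window))


def split_on_silence_sketch(
    samples: List[float], frame_len: int, silence_thresh: float, min_silence_frames: int
) -> List[List[float]]:
    """Return the non-silent chunks, dropping pauses of at least min_silence_frames frames."""
    n_frames = math.ceil(len(samples) / frame_len)
    silent = [frame_rms(samples, i * frame_len, (i + 1) * frame_len) < silence_thresh for i in range(n_frames)]

    # Find runs of consecutive silent frames that are long enough to be a pause.
    pauses = []
    i = 0
    while i < n_frames:
        if silent[i]:
            j = i
            while j < n_frames and silent[j]:
                j += 1
            if j - i >= min_silence_frames:
                pauses.append((i, j))
            i = j
        else:
            i += 1

    # Keep the audio between pauses; short silences inside a chunk are retained.
    chunks = []
    prev_end = 0
    for start, end in pauses + [(n_frames, n_frames)]:
        if start > prev_end:
            chunks.append(samples[prev_end * frame_len : start * frame_len])
        prev_end = end
    return chunks
```

With `frame_len=2`, `silence_thresh=0.1`, and `min_silence_frames=2`, four loud samples, four zeros, and four loud samples come back as two chunks of four samples each, with the pause removed.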
The application then sent the chunks to the OpenAI Whisper API asynchronously. Because the requests ran concurrently rather than one after another, total processing time dropped noticeably.
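A minimal sketch of concurrent chunk transcription with asyncio, assuming the transcription call is passed in as a coroutine function. In a real deployment this might wrap the OpenAI Python SDK's asynchronous client (`AsyncOpenAI`). The semaphore that caps concurrent requests is an added assumption to avoid hitting rate limits, and `fake_transcribe` is a hypothetical stand-in for the network call.

```python
import asyncio
from typing import Awaitable, Callable, List


async def transcribe_all(
    chunks: List[bytes],
    transcribe: Callable[[bytes], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Transcribe all chunks concurrently, returning results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(chunk: bytes) -> str:
        # At most max_concurrency requests are in flight at any moment.
        async with semaphore:
            return await transcribe(chunk)

    # gather runs the coroutines concurrently and preserves input order in its results.
    return list(await asyncio.gather(*(worker(c) for c in chunks)))


async def fake_transcribe(chunk: bytes) -> str:
    # Hypothetical stand-in for an API call: simulate latency, echo the payload.
    await asyncio.sleep(0.01)
    return chunk.decode()


if __name__ == "__main__":
    print(asyncio.run(transcribe_all([b"hello", b"world"], fake_transcribe)))
```

The speed-up comes from overlapping network waits: while one request is pending, the event loop can start others.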
Once the Whisper API responded, the chunk transcriptions were stitched back together in order, according to each chunk's stride, producing the complete transcript of the original file.
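The reassembly step can be sketched as follows, assuming each chunk's stride is recorded as its start offset (in milliseconds) within the original audio. The `ChunkTranscript` type and the `offset_ms` field are hypothetical. Sorting by offset makes the result independent of the order in which responses arrive.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ChunkTranscript:
    offset_ms: int  # hypothetical: where this chunk starts in the original audio
    text: str       # transcription returned for this chunk


def stitch(chunks: Iterable[ChunkTranscript]) -> str:
    """Join chunk transcriptions in audio order, skipping empty results."""
    ordered = sorted(chunks, key=lambda c: c.offset_ms)
    return " ".join(c.text.strip() for c in ordered if c.text.strip())


if __name__ == "__main__":
    parts = [ChunkTranscript(5000, " world."), ChunkTranscript(0, "Hello")]
    print(stitch(parts))
```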
Backend Implementation:
The backend used Flask to expose server endpoints that called methods of the Worker class. Each method handled a single operation, which kept the code readable and made it easier to maintain and extend.
When the frontend uploaded a media file through an API call, the backend passed it through a sequence of methods: extracting audio from video files, chunking the audio during pre-processing, and sending the chunks to the Whisper API. These steps ran asynchronously where possible, and the results were returned to the frontend as JSON.
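The one-operation-per-method structure can be illustrated with a hypothetical Worker class. The method names are assumptions, the extraction step is a placeholder, and fixed-size slicing stands in for silence-based chunking. The transcription function is injected so the pipeline can run without network access.

```python
from typing import Callable, List


class Worker:
    """Hypothetical Worker in which each method performs exactly one step."""

    def __init__(self, transcribe_chunk: Callable[[bytes], str], chunk_size: int = 1024):
        self.transcribe_chunk = transcribe_chunk  # injected, e.g. a Whisper API wrapper
        self.chunk_size = chunk_size

    def extract_audio(self, media: bytes) -> bytes:
        # Placeholder: a real implementation would call FFmpeg for video input.
        return media

    def chunk_audio(self, audio: bytes) -> List[bytes]:
        # Placeholder: fixed-size slices stand in for silence-based splitting.
        return [audio[i : i + self.chunk_size] for i in range(0, len(audio), self.chunk_size)]

    def transcribe(self, chunks: List[bytes]) -> List[str]:
        return [self.transcribe_chunk(c) for c in chunks]

    def run(self, media: bytes) -> str:
        # The full pipeline is simply the single-purpose steps composed in order.
        return " ".join(self.transcribe(self.chunk_audio(self.extract_audio(media))))


if __name__ == "__main__":
    worker = Worker(lambda chunk: chunk.decode(), chunk_size=5)
    print(worker.run(b"hellothere"))
```

Because each step is its own method, one stage (for example, chunking) can be swapped or tested in isolation without touching the others.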
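A minimal sketch of such an endpoint, assuming a route path of `/api/transcribe` and a form field named `file` (both hypothetical). `run_pipeline` is a placeholder for the extract, chunk, and concurrent-transcription steps. Since the view function is synchronous, it uses `asyncio.run` to drive the async pipeline to completion for each request.

```python
import asyncio
import io

from flask import Flask, jsonify, request

app = Flask(__name__)


async def run_pipeline(media: bytes, filename: str) -> dict:
    # Hypothetical stand-in for extract -> chunk -> concurrent Whisper calls.
    await asyncio.sleep(0)
    return {"filename": filename, "transcript": f"<{len(media)} bytes transcribed>"}


@app.route("/api/transcribe", methods=["POST"])
def transcribe_endpoint():
    upload = request.files.get("file")
    if upload is None:
        return jsonify({"error": "no file uploaded"}), 400
    media = upload.read()
    # A synchronous view starts a fresh event loop for this request's async work.
    result = asyncio.run(run_pipeline(media, upload.filename))
    return jsonify(result)


if __name__ == "__main__":
    # Quick local check with Flask's test client; no server is needed.
    client = app.test_client()
    resp = client.post("/api/transcribe", data={"file": (io.BytesIO(b"abcd"), "clip.mp3")})
    print(resp.status_code, resp.get_json())
```

Calling `asyncio.run` per request suits synchronous WSGI workers such as Gunicorn's default sync workers, where no event loop is already running in the request thread.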
Frontend Development:
The frontend was built with React.js and gives users a simple interface for submitting media and reading the transcripts returned by the API. It was hosted on render.com under the free plan, whose limited resources were nonetheless sufficient for the application to run.
Challenges and Future Features:
The main challenge was memory, particularly when processing large audio files. Future iterations aim to add features such as taking notes directly on a transcript. This interactive mode would let users issue additional commands on the transcribed text, making the application more versatile.
Learning Outcomes:
The project yielded several lessons. Working with the OpenAI API was straightforward, helped by clear documentation. On the backend, the key lessons were in asynchronous programming with asyncio, including event loops and integrating async code with Flask. On the frontend, the project involved learning React.js, hooks, and Material UI to build a responsive user interface.