Vidify

Scalable education video editor

When I created mawiwi.de, I was struggling with the videos.

I knew that this was something that I had to solve for Studymaniac to create a truly remarkable platform.

It would take me 2-3 hours to create a PowerPoint presentation, record a video and edit it.

So far so good, but...

"There is a mistake at 3:45!" "I do not understand the step at 6:25?"

The real problem occurred when I wanted to change something in the video, which happens regularly with educational videos.

Then I had to put in another 1-2 hours.

My goals

I needed a tool that would allow me to create and edit educational videos as quickly and easily as possible.

Since I wanted to create a global education platform, I also wanted automatic video translation to be easy to implement, so that the videos could scale across different languages.

Video Editor

How I built it

Most university lectures consist of a PowerPoint presentation and a voice-over. I also used this approach for videos at mawiwi.de. While it is imperfect as it does not include any animations, it is effective.

Therefore, the first step was to create a tool that would allow me to build some sort of web-based PowerPoint presentation without a steep learning curve. I decided to use tldraw for this, as I had been fascinated by Steve Ruiz's work for a long time.

The tldraw package is an open-source whiteboard library based on React. Its handwriting support, built on perfect-freehand, is excellent, which was vital for me because it lets teachers create videos on their tablets in the browser. Inspired by tlslides, a repository that uses tldraw to create presentations, I transformed tldraw into my adaptation of PowerPoint.

The plan was to convert each slide to an image to create the video's visuals. The duration of each slide should depend on the next step, the audio.
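To illustrate how slide durations can follow from the audio, here is a minimal Python sketch. It assumes each slide's audio is stored as a WAV file; the names `wav_duration`, `build_timeline`, and `SlideTiming` are hypothetical and not taken from the actual project.

```python
import wave
from dataclasses import dataclass


@dataclass
class SlideTiming:
    index: int    # position of the slide in the presentation
    start: float  # second at which the slide appears
    end: float    # second at which the slide disappears


def wav_duration(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_timeline(durations: list[float]) -> list[SlideTiming]:
    """Place slides back to back; each slide lasts as long as its audio."""
    timeline = []
    cursor = 0.0
    for index, duration in enumerate(durations):
        timeline.append(SlideTiming(index, cursor, cursor + duration))
        cursor += duration
    return timeline
```

With audio clips of 2.0 and 3.5 seconds, the second slide would run from 2.0 to 5.5 seconds.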

For every slide, I added a textarea for its script. Initially, I used Google Cloud TTS to create the audio on a Python backend running on FastAPI. To concatenate all images into a video and handle the audio stream, I used ffmpeg.
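One way the ffmpeg step could look is sketched below in Python. This is an assumption about the approach, not the project's actual pipeline: each slide image is combined with its audio into a short segment, and the segments are then joined with ffmpeg's concat demuxer. The function names and file names are hypothetical, and the sketch only builds the commands without running them.

```python
def segment_command(image: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that shows one still image for the length of its audio."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,   # repeat the still image as a video stream
        "-i", audio,                 # the slide's voice-over
        "-c:v", "libx264", "-tune", "stillimage",
        "-pix_fmt", "yuv420p",       # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",                 # stop when the audio ends
        output,
    ]


def concat_list(segments: list[str]) -> str:
    """Build the text file read by ffmpeg's concat demuxer.

    Assumes the segment file names contain no single quotes.
    """
    return "".join(f"file '{name}'\n" for name in segments)


def concat_command(list_file: str, output: str) -> list[str]:
    """Build an ffmpeg command that joins the segments without re-encoding."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", list_file,
        "-c", "copy",
        output,
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Joining with `-c copy` only works here because every segment is encoded with the same settings.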

Unfortunately, Google Cloud TTS was not good enough for my needs. The voices sounded unnatural, and the pronunciation (especially in German) was underwhelming. To offer an alternative, I added the option to record the audio for each slide directly in the browser, so I implemented a small audio recorder in React.

Audio Editor

This was more complicated than I thought, but still fun.

Audio Result

Edit: Recently, I added support for ElevenLabs voices, which are much better than Google Cloud TTS.

The result

Last year, I had to create some videos myself, and it was much quicker than before. I also added fun features like handwriting math recognition and a small LaTeX editor. These features sped up the creation process by at least 80%.

The nature of the editor allows instant rerendering of videos. This flexibility came in handy whenever I made a mistake or updated the language service. I could press rerender - and voilà!

Another super cool advantage is that multiple creators can work on videos together.

You can try it

An intern at Studymaniac created a website for the tool as his first project while I was teaching him the basics of web development.

You can try it yourself on https://vidify.cloud.

Tech stack

React
TailwindCSS
tldraw
Python (FastAPI)
ffmpeg
ElevenLabs for voices
Laravel for the backend of vidify.cloud