Tarteel


Posted on November 15, 2018 at 18:00

Tag: Open Source

Github Project

Paper

Tarteel is a hackathon-born initiative designed to be the world’s first open source audio collection of Quran recitation data. Our goal is to build a comprehensive database of recitations by regular individuals that we can use to train machine learning models. We hope to collect audio samples from a diverse demographic so that our applications work across the broad spectrum of Muslim backgrounds. Ultimately, we want to make it easier for ordinary people to read and recite the Quran correctly and with tajweed (measured, rhythmic tones), and to support their memorization.

Examples of the applications we hope to build are:

  • Speech-to-text recognition
  • Hifz (Memorization) correction
  • Tajweed (Recitation) correction

which can be used in:

  • Mosques to indicate where the Imam is reading
  • Schools to support teachers teaching Quran
  • Halaqas seeking to perfect their recitation of the Quran

Technical Details

My current role in this project is:

  1. Maintaining and developing the Django backend
  2. Developing a REST API that will be used in the future to get data from our database

The main feature I am working on right now is a tajweed evaluator that allows expert readers to highlight and annotate the mistakes of people who submitted audio files. Labelling the data allows us to train models that can recognize mistakes in audio files. I’m creating API endpoints that let experts submit evaluations from the website and let interested parties retrieve the labelled data for training.
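To make the idea of a labelled evaluation concrete, here is a minimal sketch of what one expert submission could look like before it is posted to an endpoint. Every name in it (`MistakeAnnotation`, `Evaluation`, `to_payload`, and all the fields) is hypothetical and chosen for illustration; it is not the project's actual schema or API.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class MistakeAnnotation:
    """One highlighted span in a recording (field names are hypothetical)."""
    start_sec: float   # where the highlighted span begins in the audio
    end_sec: float     # where the highlighted span ends
    rule: str          # the tajweed rule the expert flagged
    comment: str = ""  # optional free-text note from the expert


@dataclass
class Evaluation:
    """An expert's review of one submitted recording (hypothetical shape)."""
    recording_id: int
    surah: int
    ayah: int
    annotations: List[MistakeAnnotation]


def to_payload(evaluation: Evaluation) -> str:
    """Validate the spans, then serialize the evaluation to a JSON string."""
    for annotation in evaluation.annotations:
        if not 0 <= annotation.start_sec < annotation.end_sec:
            raise ValueError(
                f"invalid span: {annotation.start_sec}-{annotation.end_sec}"
            )
    # asdict() converts the nested dataclasses into plain dicts and lists.
    return json.dumps(asdict(evaluation), sort_keys=True)


evaluation = Evaluation(
    recording_id=42,
    surah=1,
    ayah=2,
    annotations=[MistakeAnnotation(1.5, 2.25, "madd", "vowel held too briefly")],
)
payload = to_payload(evaluation)
```

Keeping each mistake as a time span plus a rule name means the same record can serve both sides: the website can draw the highlight, and a training script can slice the audio by `start_sec` and `end_sec`.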

Transliteration

I added a feature to show the English transliteration of an ayah. This is a great way to expand our demographic base as it allows us to capture audio data from non-Arabic speakers, a truly niche and underrepresented demographic if you ask me. This was an interesting challenge since there was no easily interpretable format (CSV, JSON, XLS) to read verses from. So I turned to Beautiful Soup to scrape a website with transliterations (after getting tired of trying to find and learn the ‘right’ web scraping API for the job). Thankfully, all the sites were rather old static ones built with tables, which made the job pretty easy. The next step was just reading the requested verse from the resulting file, which was a rather simple task.
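The two steps described above (pull verse text out of table cells, then look up a requested verse) can be sketched as follows. To keep the example self-contained, it uses the standard library's `html.parser` as a stand-in for Beautiful Soup, so it is not the code used in the project. The HTML snippet, the `surah:ayah` key format, and the function names are all assumptions made for illustration.

```python
import json
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard a cell left open by malformed markup


# Made-up markup in the style of an old static page: one verse per table row.
SAMPLE_HTML = """
<table>
  <tr><td>1:1</td><td>Bismi Allahi alrrahmani alrraheemi</td></tr>
  <tr><td>1:2</td><td>Alhamdu lillahi rabbi alAAalameena</td></tr>
</table>
"""


def scrape_transliterations(html):
    """Return a {"surah:ayah": transliteration} dict from table rows."""
    parser = TableCellParser()
    parser.feed(html)
    return {row[0]: row[1] for row in parser.rows if len(row) >= 2}


def get_transliteration(saved_json, surah, ayah):
    """Look up one verse in the saved JSON; returns None if it is missing."""
    return json.loads(saved_json).get(f"{surah}:{ayah}")


verses = scrape_transliterations(SAMPLE_HTML)
saved = json.dumps(verses)  # the string that would be written to a file
```

Scraping once and saving the result to an easily parsed format means that serving a verse later is only a dictionary lookup, with no dependency on the original site staying online.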

Minification

This project was my first foray into backend web development. I had some front end experience (this site I guess…) but never knew much about the backend. I began with simple performance enhancements, code cleanup, refactoring, and utilities. Our first issue was the long loading times caused by the huge JS/JSON files we were sending. I fixed this by adding simple logic that only sends static files when they are needed. I then created a script that minifies [1] all JS/CSS files. This is apparently good practice when working in production environments.
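To show what minification does to a file, here is a deliberately naive sketch that shrinks a CSS string with a few regular expressions. It is not how a real minifier works: proper tools parse the source, while this toy version would mangle string literals that contain braces or semicolons. The function name and sample stylesheet are made up for illustration.

```python
import re


def minify_css(css: str) -> str:
    """Naively shrink CSS: drop comments and unnecessary whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove comments
    css = re.sub(r"\s+", " ", css)                        # collapse whitespace runs
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)           # trim around punctuation
    css = css.replace(";}", "}")                          # last semicolon is optional
    return css.strip()


SOURCE = """
/* page header */
.header {
    color: red;
    margin: 0 auto;
}
"""

minified = minify_css(SOURCE)
print(minified)  # .header{color: red;margin: 0 auto}
print(len(SOURCE), "->", len(minified), "characters")
```

Even on this tiny stylesheet the output is noticeably shorter, and the saving grows with file size, which is why minified assets are usually what gets served in production.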

Templates

Next was simplifying the HTML template for the homepage. The first thing that caught my eye was a beautiful 100+ line list of country codes. I replaced it with a simple for loop that loads the country codes from the backend instead. I also made some stylistic changes and added comments to make the template easier to read. I think the next step is using something more professional like django-countries, but we’ll worry about that when we really need to capture background.
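The idea of replacing a hand-written list with a loop can be sketched in plain Python. In a Django template the same thing would be written with the `{% for %}` tag over a list passed in from the view. The `COUNTRIES` subset and the `render_country_options` helper below are hypothetical, not the project's code.

```python
from html import escape

# Hypothetical subset; the full list would be supplied by the backend.
COUNTRIES = [("ID", "Indonesia"), ("CA", "Canada"), ("EG", "Egypt")]


def render_country_options(countries):
    """Build one <option> element per (code, name) pair, sorted by name."""
    lines = [
        f'<option value="{escape(code)}">{escape(name)}</option>'
        for code, name in sorted(countries, key=lambda pair: pair[1])
    ]
    return "\n".join(lines)


options_html = render_country_options(COUNTRIES)
print(options_html)
```

With the data kept in one place on the backend, adding or renaming a country becomes a one-line change instead of an edit to the template markup.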

  1. Minification basically removes all unnecessary whitespace, line breaks, and formatting in exchange for a small (but indecipherable) file. I used the babel-minify command line client for this.