An open-source initiative for the Torah world

The Torah world's spoken treasures, finally turned to text.

Most recorded Torah exists only as audio that no one can search, read, translate, or preserve. TorahScribe is building free, open, on-device speech-to-text to change that: accurate for Hebrew, Yiddish, Aramaic, and Yeshivish, and owned by the whole community, forever.

A communal public good in formation — seeking founding funders, data partners, and builders.

100,000s of hours of recorded shiurim — effectively unsearchable today
~0 shiurim captioned for the deaf & hard-of-hearing
Decades of recordings of gedolim sitting on aging, unindexed media
1 open model the entire community could freely run

The problem

Torah is being taught faster than it can be captured.

Every day, in batei midrash, in shuls, and on the phone, an ocean of Torah is spoken, and most of it vanishes into audio files that can't be searched, quoted, reviewed, printed for Shabbos, or made accessible to people who can't hear. Existing general-purpose transcription tools mangle it: they confuse Hebrew, Aramaic, and English, and get pesukim and Gemara phrases wrong.

A few proprietary services have started to solve this commercially — proof the need is real. But a capability this essential to limud haTorah shouldn't live behind a paywall or depend on one company. It should be open infrastructure: free to use, free to build on, and able to run anywhere.

Why now

Capable speech models now run on a phone — offline.

Until recently, accurate speech-to-text meant sending your audio to someone else's servers. That era is ending. Modern open models can run directly on a phone or laptop with no internet connection, which matters for three reasons:

Permanence & access

An open model on your device can't be shut down, rate-limited, paywalled, or lost. It works in a beis midrash with no Wi-Fi and on an old phone, and it keeps working indefinitely.

Privacy

Sensitive recordings — a beis din, a private chaburah, a family's testimony — never have to leave the device or touch a company's servers.

The window is open

The base models are now good enough and freely licensed. If the community doesn't build the open Torah layer today, the capability will calcify inside closed silos.

What we're building

A Torah-tuned transcription engine that belongs to the community.

Open & free, forever

Open model weights and open code under a permissive license. Anyone — a yeshiva, an archive, a developer, even another service — can use and build on it without permission or fees.

Runs on your device

Designed to run offline on a phone or laptop, not just in the cloud — so it's private, free to run, and works anywhere.

Tuned for Torah language

Built to handle what generic tools fail on: Hebrew, Aramaic, Hasidic Yiddish, and Yeshivish — with the names, pesukim, and Gemara phrases written correctly.

Community-owned

Stewarded as shared infrastructure, with rabbinic oversight for accuracy and the sensitivity that sacred texts deserve. Institutions keep ownership of their own transcripts.

Who it serves

One open tool, many mitzvos.

Accessibility

Automatic captions for the deaf and hard-of-hearing, opening up shiurim that have never been accessible to them.

Discovery & learning

Turn vast audio libraries into searchable, quotable, printable Torah text, findable by topic, source, or phrase.

Preservation

Convert decades of recordings of gedolim into permanent, searchable text: a zikaron that outlives the tape.

Translation & outreach

Accurate transcripts make translation tractable — extending a rav's Torah to learners in other languages.

Scaling limud Torah

Let dozens of Torah organizations build on one shared engine instead of each paying to solve transcription again.

Data sovereignty

A yeshiva can run it on its own archive and own its own transcripts — no per-seat dependency on any vendor.

How it's built

Standing on open shoulders, filling the gap no one has.

  • We don't start from scratch. We build on the best open Hebrew speech models already released to the community, and adapt them to the Torah register.
  • We measure honestly. Our first deliverable is an open Torah transcription benchmark — the yardstick the field is missing — so improvement is provable, not just claimed.
  • We go on-device. The goal is a genuine first: an open, Torah-tuned model that transcribes a shiur on a phone with no internet connection.
  • We do the hard languages. Hasidic Yiddish and Talmudic Aramaic — the parts of the mission that generic tools ignore and that have almost no open data today.
  • We keep a human in the loop. Rabbinic and scholarly review for sacred names, accurate quotations, and the kavod that Torah requires.

Roadmap

A practical path, phase by phase.

  • Phase 0: Benchmark & baseline

    Publish the first open Torah transcription benchmark and measure where today's models really stand on shiurim, daf yomi, and davening.

  • Phase 1: First Torah-tuned model

    Adapt an open Hebrew model to Torah content and ship a working demo, including a phone that transcribes a shiur offline.

  • Phase 2: Yiddish, Aramaic & code-switching

    Extend to Hasidic Yiddish and Talmudic Aramaic through data partnerships with the archives and communities that hold this audio.

  • Phase 3: Production & permanence

    Harden quality, release open weights and tools, and establish TorahScribe as durable communal infrastructure.

Get involved

Help make Torah transcription a permanent public good.

TorahScribe is a non-commercial, open initiative in formation. We're looking for founding partners who understand that shared infrastructure is the highest-leverage investment in the Torah world's future.

Funders & donors

Seed the benchmark, the first model, and the on-device demo. This is infrastructure, like roads, not an app: built once, used by everyone.

Data partners

Archives, yeshivos, and shiur platforms that can share audio, and in return have their own libraries made searchable and accessible.

Builders

ML engineers, Hebrew/Yiddish/Aramaic linguists, and Torah scholars who want to lend their expertise.