How It Works

A researcher's guide to every feature in LOOM CRISPR Search.

New to CRISPR or pathogen genomics? Start with Learn for a visual, step-by-step introduction before diving into this feature reference.

1. What is this tool?

LOOM CRISPR Search is an open-science platform for discovering CRISPR-based diagnostic targets across pathogen genomes. It uses a 195 KB WebAssembly binary (built on a Burrows-Wheeler Transform FM-index) to perform sub-millisecond exact-match searches across millions of genome sequences — entirely in your browser. No data leaves your machine.

The database contains 140,000 pre-computed CRISPR targets for 13 human pathogens, extracted from over 3 million genomes downloaded from NCBI.

2. Pathogen Cards & Diagnostic Priority

The landing page shows a card for each pathogen. Each card displays the number of targets and genomes indexed.

Diagnostic Priority Badge

Many cards have a colored badge in the top-right corner:

Dx High — Score ≥ 70. Greatest unmet diagnostic need.
Dx Medium — Score 40–69.
Dx Low — Score < 40. Good existing diagnostics already available.

The score (0–100) is a composite of five weighted factors:

CRISPR diagnostic gap (30%) — Are there published CRISPR-based diagnostics for this pathogen?
Disease burden (25%) — Annual deaths worldwide.
Case fatality rate (15%) — How lethal is the disease?
WHO classification (15%) — Is it on the WHO R&D Blueprint or priority pathogen list?
Outbreak potential (15%) — Pandemic/epidemic risk factors.
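
As a rough illustration of how a composite like this can be computed, here is a minimal Python sketch. The weights and badge thresholds come from the lists above; the assumption that each factor is first rated on a 0–100 scale, the factor key names, and the example values are hypothetical and may not match the app's internals.

```python
# Weights taken from the factor list above; they sum to 1.0.
PRIORITY_WEIGHTS = {
    "crispr_dx_gap": 0.30,
    "disease_burden": 0.25,
    "case_fatality": 0.15,
    "who_classification": 0.15,
    "outbreak_potential": 0.15,
}

def priority_score(factors):
    """Weighted sum of per-factor sub-scores (each assumed to be 0-100)."""
    return sum(PRIORITY_WEIGHTS[name] * factors[name] for name in PRIORITY_WEIGHTS)

def priority_badge(score):
    """Map a 0-100 score onto the badge thresholds listed above."""
    if score >= 70:
        return "Dx High"
    if score >= 40:
        return "Dx Medium"
    return "Dx Low"

# Hypothetical sub-scores, for illustration only.
example = {
    "crispr_dx_gap": 100,
    "disease_burden": 60,
    "case_fatality": 80,
    "who_classification": 100,
    "outbreak_potential": 40,
}
score = priority_score(example)
print(round(score, 1), priority_badge(score))
```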

Tip: Hover over any badge to see the exact numeric score and the factors behind it.

Top Diagnostic Opportunities

The three featured cards at the top highlight pathogens with the highest priority scores. These include a brief summary (e.g., "No CRISPR dx published · 21,000–143,000 deaths/yr") so you can immediately see why they rank high.

3. Targets Table — Columns Explained

After selecting a pathogen, the table shows one row per CRISPR target site (23-mer). Here's what each column means:

Sequence (23-mer)

The 23-nucleotide CRISPR target sequence, shown with color-coded bases (A=green, T=red, G=gold, C=blue). This includes the 20-nt guide + 3-nt PAM motif. Badges may appear next to the sequence (see Cross-Reactivity and Resistance sections below).

Gene

The gene the target falls within, or intergenic if it's between genes. Click on a gene tag to jump to the Disease Context tab and highlight that gene in the gene map.

Position

The start position (0-based offset) within the indexed genome.

PAM

The protospacer adjacent motif. NGG = SpCas9-compatible (most widely used). TTTN = Cas12a-compatible. – = no canonical PAM detected (still a valid target site for PAM-less Cas variants).
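
The PAM labels can be pictured with a small Python sketch. It assumes the NGG motif sits in the last three bases of the 23-mer (matching the 20-nt guide + 3-nt PAM layout described above) and that TTTN is read from the first four bases, since Cas12a PAMs lie on the 5' side of the protospacer. Both positions, and the order in which the motifs are checked, are assumptions for illustration rather than a description of the app's code.

```python
def classify_pam(site):
    """Return "NGG", "TTTN", or "-" for a 23-mer target site."""
    site = site.upper()
    if len(site) != 23:
        raise ValueError("expected a 23-mer")
    if site[21:] == "GG":   # N at position 21 can be any base
        return "NGG"
    if site[:3] == "TTT":   # N at position 4 can be any base
        return "TTTN"
    return "-"              # no canonical PAM detected

print(classify_pam("ACGTACGTACGTACGTACGTAGG"))
```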

Coverage

How many of the indexed genomes contain this exact 23-mer sequence, shown as both a count and a percentage. Example: "1,832 67.6%" means 1,832 of the 2,711 indexed Zika genomes contain this sequence.

Higher coverage = more conserved target = works across more strains = better diagnostic candidate.
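
The percentage is simply the count divided by the number of indexed genomes. A short Python check of the Zika example above (the function name is just for illustration):

```python
def coverage_percent(genomes_with_site, genomes_indexed):
    """Share of indexed genomes that contain the exact 23-mer, in percent."""
    if genomes_indexed <= 0:
        raise ValueError("genomes_indexed must be positive")
    return 100.0 * genomes_with_site / genomes_indexed

print(f"{1832:,} {coverage_percent(1832, 2711):.1f}%")  # prints "1,832 67.6%"
```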

Score

Guide RNA quality score (see next section).

Actions

BLAST link (see BLAST section below).

4. Guide RNA Quality Score

Each guide gets a composite quality score from 0 to 100, displayed as a colored circle:

86 — High quality (≥ 70). Likely to work well in the lab.
52 — Medium quality (40–69). Usable but may need optimization.
28 — Low quality (< 40). Potential issues with efficiency or specificity.

The score is computed from six factors:

GC content (25%) — Optimal range is 40–60%. Too high or too low reduces binding efficiency.
Poly-T terminator (20%) — Runs of 4+ T's can cause premature transcription termination. Penalized.
Seed-region GC (15%) — The 8–12 nt seed region (PAM-proximal) needs moderate GC for specificity.
Homopolymer runs (15%) — Long stretches of any single base reduce efficiency. Penalized.
PAM type (15%) — NGG (SpCas9) gets full marks; TTTN (Cas12a) gets partial; no PAM gets zero.
Self-complementarity (10%) — Sequences that can fold back on themselves are penalized.
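
The raw sequence features behind several of these factors are easy to compute yourself. The Python sketch below derives GC%, seed-region GC%, the longest homopolymer run, and a poly-T flag for one site. It assumes the guide is the first 20 bases of the 23-mer and uses a 10-nt seed (a value picked from the 8–12 nt range above); how the app converts these features into weighted sub-scores is not shown here.

```python
from itertools import groupby

def gc_percent(seq):
    """Percentage of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def max_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq.upper()))

def guide_features(site, seed_len=10):
    guide = site[:20].upper()   # assumes the 20-nt guide precedes a 3-nt PAM
    seed = guide[-seed_len:]    # PAM-proximal end of the guide
    return {
        "gc": gc_percent(guide),
        "seed_gc": gc_percent(seed),
        "max_homopolymer": max_homopolymer(guide),
        "poly_t": "TTTT" in guide,  # run of 4 or more T's
    }

print(guide_features("ACGTACGTACGTACGTACGTAGG"))
```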

Tip: Hover over any score badge to see the individual GC%, seed-region GC%, and max homopolymer run length.

5. Cross-Reactivity Badges

For the top 100 NGG guides per pathogen, we ran exact-match searches against 7 host genomes (human GRCh38, pig, bat, chicken, cow, camel, mouse) to check whether the guide sequence also appears in a host genome. A host match means the guide could bind host DNA in a sample and produce an off-target (false-positive) signal in a diagnostic assay.

specific — No hits in any host genome. This guide is pathogen-specific.
3 host hits — Found in 3 host genomes. Hover to see which hosts.

99.7% of the tested guides are specific. Guides without a badge were not among the top 100 NGG guides for that pathogen and have not been tested.
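
Conceptually, the badge is the result of an exact-match lookup per host. The Python sketch below uses plain substring search as a stand-in for the FM-index and checks only the forward strand; whether the real pipeline also checks the reverse complement is not stated here, so treat that as an open assumption. The host sequences are toy strings.

```python
def host_hits(site, host_genomes):
    """Names of hosts whose sequence contains the site as an exact match."""
    site = site.upper()
    return [name for name, genome in host_genomes.items() if site in genome.upper()]

def specificity_badge(hits):
    """Badge text in the style shown above."""
    if not hits:
        return "specific"
    return f"{len(hits)} host hit" + ("" if len(hits) == 1 else "s")

site = "ACGTACGTACGTACGTACGTAGG"
toy_hosts = {
    "human": "TTTT" + site + "CCCC",  # toy sequence that contains the site
    "pig": "G" * 40,                  # toy sequence without the site
}
print(specificity_badge(host_hits(site, toy_hosts)))
```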

6. BLAST Links

Every row has a BLAST link in the Actions column. Clicking it opens NCBI BLAST (blastn) in a new tab with the guide sequence pre-filled and the database set to nt (all NCBI nucleotide sequences).

You need to click the blue "BLAST" button on the NCBI page to run the search — NCBI does not allow automated submission via URL because each search uses compute resources on their servers.
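
If you want to build the same kind of link in your own scripts, a pre-filled BLAST URL can be assembled as in the Python sketch below. The query parameters shown (PAGE_TYPE, PROGRAM, DATABASE, QUERY) are the ones commonly seen in NCBI BLAST web-form links; the exact URL that LOOM generates may differ, so this is an approximation.

```python
from urllib.parse import urlencode

def blast_url(sequence):
    """Link that opens the NCBI blastn form with the sequence pre-filled."""
    params = {
        "PAGE_TYPE": "BlastSearch",
        "PROGRAM": "blastn",
        "DATABASE": "nt",
        "QUERY": sequence,
    }
    return "https://blast.ncbi.nlm.nih.gov/Blast.cgi?" + urlencode(params)

print(blast_url("ACGTACGTACGTACGTACGTAGG"))
```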

What BLAST tells you: Which organisms and genomic regions contain this exact (or similar) sequence. Use it to verify specificity beyond our pre-computed host checks, or to explore evolutionary conservation.

7. Drug-Resistance Overlay

Guides that overlap known drug-resistance mutation regions show a ☢ geneSymbol badge next to the sequence. This means the target site sits within a genomic region where drug-resistance mutations are known to occur.

Why it matters: Targeting resistance regions can be a double-edged sword — it could detect resistant strains specifically, but mutations in that region may also cause the guide to lose binding in resistant variants.

Hover over the badge to see the specific drug and mutation region name.
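
The overlap test itself is an interval check between the 23-nt target site and each resistance region. A minimal Python sketch, assuming 0-based half-open coordinates in a shared coordinate system; the Region fields and the example region are hypothetical:

```python
from typing import NamedTuple

class Region(NamedTuple):
    gene: str
    drug: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def overlapping_regions(position, regions, site_len=23):
    """Regions that share at least one base with the site starting at position."""
    site_end = position + site_len
    return [r for r in regions if position < r.end and r.start < site_end]

# Hypothetical region, for illustration only.
regions = [Region("geneX", "example drug", 100, 200)]
print(overlapping_regions(80, regions))
```

A check like this only works when guide positions and region coordinates refer to the same reference sequence, which is the issue behind the M. tuberculosis limitation noted below.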

Coverage by pathogen:

HIV-1 — Protease and reverse transcriptase inhibitor resistance regions (58 overlapping guides)
Influenza A — Neuraminidase inhibitor resistance (754 overlapping guides)
Hepatitis B — Polymerase inhibitor resistance (9,298 overlapping guides)
M. tuberculosis — Currently 0 overlaps (coordinate-system mismatch with multi-strain concatenated genome; known limitation)

8. Research Novelty Filter

The dropdown filter lets you view:

All targets — No filtering.
Novel (unstudied genes) — Targets in genes that have NOT been studied in published CRISPR diagnostic papers (based on our PubMed scan). These represent potential new research directions.
Published (studied genes) — Targets in genes already covered by published CRISPR diagnostic work.

Genes are identified using NCBI gene annotations and matched against PubMed-indexed publications via ontology-bridged symbol resolution.

research gap — This gene has no published CRISPR diagnostic studies. Potential novelty.
studied — Published CRISPR diagnostic work exists targeting this gene.
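
In effect, the filter is a set-membership test on each target's gene. The Python sketch below assumes each target carries a "gene" field and that the literature scan yields a set of studied gene symbols; both names are hypothetical, and how the app treats intergenic targets is not specified here (in this sketch they count as novel).

```python
def filter_targets(targets, studied_genes, mode="all"):
    """mode is "all", "novel" (unstudied genes), or "published" (studied genes)."""
    if mode == "all":
        return list(targets)
    if mode not in ("novel", "published"):
        raise ValueError("unknown mode: " + mode)
    want_studied = mode == "published"
    return [t for t in targets if (t["gene"] in studied_genes) == want_studied]

# Toy data, for illustration only.
targets = [{"sequence": "A" * 23, "gene": "geneA"}, {"sequence": "C" * 23, "gene": "geneB"}]
studied = {"geneA"}
print([t["gene"] for t in filter_targets(targets, studied, "novel")])
```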

9. Live Search

The Live Search tab lets you search for any DNA sequence against a pathogen's FM-index, loaded directly into your browser as a WASM binary.

Select a pathogen from the cards.
Go to the Live Search tab.
Click "Load Index" (downloads the FM-index, typically 5–200 MB).
Type or paste any DNA sequence. Results appear instantly (< 1 ms).

This searches the full indexed genome, not just pre-computed targets. You can search for primers, probes, or any sequence of interest.
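
For readers curious about why lookups are so fast, the Python sketch below shows textbook FM-index backward search on a toy string. It is a teaching example only: it builds the suffix array naively and stores full occurrence tables, whereas the app's Rust/WASM index is a separate, optimized implementation whose internals are not described here.

```python
def build_fm_index(text):
    """Build a tiny FM-index (first-column offsets and occurrence tables)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)                 # Burrows-Wheeler Transform
    alphabet = sorted(set(text))
    first, total = {}, 0  # first[c]: number of characters smaller than c
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first, occ, len(bwt)

def count_matches(index, pattern):
    """Count exact occurrences of pattern using backward search."""
    first, occ, n = index
    lo, hi = 0, n  # half-open range of suffix-array rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

index = build_fm_index("ACGTACGTTTACG")
print(count_matches(index, "ACG"))
```

The search loop runs once per character of the query and does not scan the genome, which is why query time depends on the length of the sequence you type rather than on the size of the index.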

Note: Very large indexes (e.g., Human GRCh38 at 1.5 GB) may take longer to download but still search in under 1 ms once loaded.

10. Panel Designer

The Panel Designer tab helps you design a minimal multiplex diagnostic panel — a small set of CRISPR guides that can distinguish between multiple pathogens in a single assay.

Select which pathogens to include (or use a preset: Respiratory, Hemorrhagic, STI, All).
Set minimum conservation threshold (default: 70%) and minimum quality score (default: 50).
Click Design Panel.

The algorithm uses a greedy set-cover approach: for each pathogen, it picks the highest-scoring guide that is unique to that pathogen (not found as a top candidate in others). The result is a table showing one distinguishing guide per pathogen.
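
The selection rule described above can be sketched in a few lines of Python. The input layout (a list of sequence, conservation %, score tuples per pathogen) and the function name are assumptions for illustration, and the app's tie-breaking and candidate lists may differ.

```python
def design_panel(candidates, min_conservation=70.0, min_score=50):
    """Pick, per pathogen, the best-scoring guide not shared with any other pathogen.

    candidates: {pathogen: [(sequence, conservation_pct, score), ...]}
    Returns {pathogen: chosen tuple, or None if no guide qualifies}.
    """
    panel = {}
    for pathogen, guides in candidates.items():
        # Sequences that appear among the candidates of any other pathogen.
        others = {
            seq
            for name, other_guides in candidates.items()
            if name != pathogen
            for seq, _, _ in other_guides
        }
        qualifying = [
            g for g in guides
            if g[1] >= min_conservation and g[2] >= min_score and g[0] not in others
        ]
        panel[pathogen] = max(qualifying, key=lambda g: g[2], default=None)
    return panel

# Toy candidates with shortened sequences, for illustration only.
toy = {
    "virus_a": [("AAA", 90.0, 80), ("CCC", 95.0, 90)],
    "virus_b": [("CCC", 85.0, 88), ("GGG", 75.0, 60), ("TTT", 60.0, 99)],
}
print(design_panel(toy))
```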

Click Export Panel CSV to download the panel as a spreadsheet for ordering oligos or sharing with your lab.

If a pathogen shows "no qualifying guides": Relax the conservation or score thresholds. Pathogens with very high genomic diversity (e.g., Dengue with 55,000 genomes) may not have any guide that reaches 70% conservation.

11. Disease Context Tab

After selecting a pathogen, the Disease Context tab shows biomedical ontology data:

Identifiers — NCBI TaxID, Disease Ontology (DOID), MONDO, ICD-11 codes.
Epidemiological context — WHO classification, case fatality rate, annual burden, geographic spread, vaccine status.
Diagnostic landscape — Current CRISPR dx status, existing rapid tests, diagnostic need assessment.
Gene Map — All annotated genes for this pathogen with symbol, product, and type. Click a gene tag in the targets table to jump here.
Drug-Resistance Regions — If applicable, a table of known resistance mutation regions with drug names and coordinates.

12. Exports

Two export buttons are available when viewing a pathogen's targets:

CSV — Comma-separated values, importable into Excel/Google Sheets. Includes columns: sequence, gene, position, PAM, occurrences, coverage %, guide score, GC%, seed GC%, and host specificity status.
JSON — Machine-readable format with the same fields, suitable for programmatic analysis.

The Panel Designer also has its own Export Panel CSV button for the designed panel.
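
Once exported, the CSV can be filtered with a few lines of Python. The header names and the two rows below are made up for illustration; check the header of your own export and adjust the keys accordingly.

```python
import csv
import io

# Hypothetical export excerpt; real column names and values may differ.
EXPORT_CSV = """sequence,gene,position,pam,occurrences,coverage_pct,guide_score
ACGTACGTACGTACGTACGTAGG,geneA,120,NGG,1832,67.6,86
TTTAACGTACGTACGTACGTACA,intergenic,450,TTTN,900,33.2,52
"""

def shortlist(csv_text, min_coverage=50.0, min_score=70):
    """Sequences whose coverage and guide score both meet the thresholds."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row["sequence"]
        for row in rows
        if float(row["coverage_pct"]) >= min_coverage
        and int(row["guide_score"]) >= min_score
    ]

print(shortlist(EXPORT_CSV))
```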

13. Methods & Data Sources

Full methodology details are available in the Data Sources tab within the app. Key points:

Genome source: NCBI Datasets API — complete genomes for each pathogen.
Indexing: BWT FM-index (Burrows-Wheeler Transform) built with the brenda Rust crate. All genomes for a pathogen are concatenated and indexed together.
Target extraction: All 23-mer sequences with NGG, TTTN, or no PAM are extracted and ranked by occurrence (conservation) across genomes.
Gene annotation: NCBI Gene database via Datasets V2 API.
Literature scan: PubMed/PMC queries per pathogen per gene for CRISPR diagnostic publications.
Diagnostic priority: Scored from WHO reports, GBD (Global Burden of Disease) data, and published literature reviews.
Cross-reactivity: Exact 23-mer matching against 7 host FM-indexes (human GRCh38, pig, bat, chicken, cow, camel, mouse).
All processing runs locally. No server. No data exfiltration. The entire app is a set of static files.
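
The target-extraction step listed above can be approximated by counting, for every k-mer, how many genomes contain it and then sorting by that count. The Python sketch below assumes conservation is counted once per genome (consistent with the Coverage column) and leaves out PAM labeling; the production pipeline works on the FM-index rather than on Python sets.

```python
from collections import Counter

def rank_sites(genomes, k=23):
    """Return (k-mer, number of genomes containing it), most conserved first."""
    genome_counts = Counter()
    for genome in genomes:
        genome = genome.upper()
        # A set, so each genome contributes at most once per distinct k-mer.
        kmers = {genome[i:i + k] for i in range(len(genome) - k + 1)}
        genome_counts.update(kmers)
    return genome_counts.most_common()

# Toy genomes with k=3 so the result is easy to read.
toy_genomes = ["ACGTA", "ACGGG", "TTACG"]
print(rank_sites(toy_genomes, k=3)[0])
```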

LOOM CRISPR Search — Open-science CRISPR target discovery

Alvaro Videla Godoy — 2026