LOOM CRISPR Search — Open-Science Target Discovery

Pathogens

Select a pathogen to browse pre-computed CRISPR targets or load its FM-index for live search

Select a pathogen, load its FM-index, then search any DNA sequence for exact matches across all genomes

PubMed Literature Scanner

Search NCBI PubMed for published research on any CRISPR target sequence. Identify which targets are novel (no prior publications) and which have existing literature — helping prioritize targets for new papers.

Enter a sequence or gene name to search published CRISPR literature, or use "Quick scan" to check all targets for a selected pathogen

Disease & Epidemiological Context

Biomedical ontology data linking each pathogen to standardized disease classifications, WHO surveillance context, gene annotations, and diagnostic landscape. All data is bundled — works fully offline.

Select a pathogen to view disease context and ontology data

NCBI RefSeq Viral

Complete reference sequences for all known viral genomes. The primary source for our viral pathogen indexes.

Dataset: RefSeq Viral Complete Genomes
Accessed: March 2026
Sequences: 703,000+
License: Public domain (US Government work)
URL: ncbi.nlm.nih.gov/datasets

Citation: Sayers EW, et al. "Database resources of the National Center for Biotechnology Information." Nucleic Acids Research, 2024, 52(D1):D33-D43.

NCBI RefSeq Bacterial

Reference genomes for bacterial pathogens including M. tuberculosis and V. cholerae.

Dataset: RefSeq Bacterial Genomes (selected pathogens)
Accessed: March 2026
Pathogens: Cholera, Tuberculosis
License: Public domain (US Government work)
URL: ncbi.nlm.nih.gov/datasets

Citation: O'Leary NA, et al. "Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation." Nucleic Acids Research, 2016, 44(D1):D733-D745.

Human Reference Genome

GRCh38.p14 (Genome Reference Consortium Human Build 38) used for off-target analysis — ensuring CRISPR guides don't match human sequences.

Assembly: GRCh38.p14 (GCF_000001405.40)
Accessed: March 2026
Size: ~3.1 Gbp
License: Public domain
URL: GRCh38.p14 at NCBI

Citation: Schneider VA, et al. "Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly." Genome Research, 2017, 27(5):849-864.

LOOM FM-Index Engine

The search engine powering this tool. A Burrows-Wheeler Transform (BWT) based FM-index compiled to 195 KB of WebAssembly, enabling sub-millisecond exact-match search in the browser.

Library: brenda (Rust crate)
Binary: 195 KB WASM
Method: FM-index with suffix array sampling
License: Open source

Method: Ferragina P, Manzini G. "Opportunistic data structures with applications." FOCS 2000. IEEE, 2000, pp. 390-398.
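
The counting step of an FM-index (backward search) can be illustrated with a minimal Python sketch. This is not the brenda crate's code: the function names are hypothetical, the suffix array is built naively, and suffix array sampling (needed to report match positions) is left out, so it only shows how exact-match counts fall out of the BWT.

```python
def build_fm_index(text):
    """Build the C table and occurrence table for text (naive, illustration only)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_table, occ, len(bwt)


def count_matches(index, pattern):
    """Count exact occurrences of pattern using backward search."""
    c_table, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

The loop runs once per pattern character, so the query cost in this sketch depends on the pattern length and not on the size of the indexed text.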

PubMed / NCBI E-utilities

Literature searches use NCBI's public E-utilities API to query PubMed for published CRISPR research related to target sequences.

API: NCBI E-utilities (esearch + esummary)
Rate limit: 3 requests/sec (without an API key)
Data: PubMed article metadata
License: Public access (NLM Terms of Service)
URL: E-utilities documentation

Citation: Sayers E. "E-utilities Quick Start." Entrez Programming Utilities Help, NCBI, 2024.
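
As a rough illustration of how a client might stay within the unauthenticated rate limit, the sketch below builds an esearch URL for PubMed and spaces calls at least one third of a second apart. The helper names and the query term are hypothetical, and no request is actually sent; the tool's real query construction is not shown on this page.

```python
import time
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20):
    """Build a PubMed esearch URL that asks for a JSON response."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


class RateLimiter:
    """Space calls so that at most max_per_sec are issued per second."""

    def __init__(self, max_per_sec=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Calling `limiter.wait()` before each request keeps a simple sequential scan under the limit; a "Quick scan" over many targets would call it once per query.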

NCBI Taxonomy

Standardized taxonomic classification for every pathogen — species name, lineage, genome type, and transmission mode. Powers the Disease Context taxonomy chips.

Dataset: NCBI Taxonomy Database
Accessed: March 2026
Pathogens: 12 (all indexed species)
License: Public domain (US Government work)
URL: ncbi.nlm.nih.gov/taxonomy

Citation: Schoch CL, et al. "NCBI Taxonomy: a comprehensive update on curation, resources and tools." Database, 2020, baaa062.

Disease Ontology

Standardized disease definitions, synonyms, and cross-references for each pathogen's primary disease. Provides the "What is it?" descriptions and alternative names in the Disease Context tab.

Dataset: Disease Ontology (DO)
Accessed: March 2026
Terms: 12 disease terms (DOID mapped)
License: CC0 1.0 (Public Domain)
URL: disease-ontology.org

Citation: Schriml LM, et al. "The Human Disease Ontology 2022 update." Nucleic Acids Research, 2022, 50(D1):D1255-D1261.

MONDO Disease Ontology

Cross-ontology disease mappings linking Disease Ontology, OMIM, Orphanet, and other vocabularies. Provides additional cross-references for each pathogen's disease.

Dataset: Monarch Disease Ontology (MONDO)
Accessed: March 2026
Terms: 12 disease terms (MONDO mapped)
License: CC BY 4.0
URL: mondo.monarchinitiative.org

Citation: Vasilevsky NA, et al. "Mondo: Unifying diseases for the world, by the world." medRxiv, 2022.

WHO Disease Surveillance

Epidemiological context from the World Health Organization — case fatality rates, annual case/death estimates, geographic spread, diagnostic landscape, and CRISPR diagnostic status.

Dataset: WHO Disease Outbreak News & fact sheets
Accessed: March 2026
Data: Epi stats for 12 pathogens
License: CC BY-NC-SA 3.0 IGO
URL: who.int/disease-outbreak-news

Citation: World Health Organization. "Disease Outbreak News." WHO, 2024-2026.

NCBI Gene / Datasets V2

Complete gene annotations for each pathogen's reference genome — gene symbols, names, positions, and types. Powers the Gene Map section with 8,283 annotated genes across all 12 pathogens.

API: NCBI Datasets V2 (annotation_report)
Accessed: March 2026
Genes: 8,283 across 12 pathogens
License: Public domain (US Government work)
URL: NCBI Datasets V2 API

Citation: Sayers EW, et al. "Database resources of the National Center for Biotechnology Information." Nucleic Acids Research, 2024, 52(D1):D33-D43.

Methods (Copy for Your Paper)

Copy this methods paragraph into your manuscript's Materials & Methods section:

CRISPR diagnostic target candidates were identified using LOOM CRISPR Search
(https://calm-mushroom-0185d800f.4.azurestaticapps.net), a BWT/FM-index-based pangenomic scanning tool.
For each pathogen, all available genome assemblies were downloaded from NCBI
RefSeq (accessed March 2026) and concatenated into a single corpus. A 23-mer
sliding window (20 bp guide + 3 bp PAM context) was applied to extract all
candidate target sequences. PAM classification identified NGG (SpCas9) and
TTTN (Cas12a/Cpf1) compatible sites. Targets were ranked by genome conservation
(occurrence count across all assemblies). Guide quality scores were computed
based on GC content (optimal 40-70%), seed region GC (last 12 nt), poly-T
terminator absence, homopolymer run length, and self-complementarity.
Off-target specificity was assessed by searching each candidate against 7
host reference genomes (human GRCh38, pig, bat, chicken, cow, camel, mouse)
using exact-match FM-index queries. Literature coverage was assessed by
automated PubMed scanning with ontology-enhanced synonym expansion (NCBI
Taxonomy, Disease Ontology, MONDO). Drug-resistance region overlap was
annotated using coordinates from WHO mutation catalogs and Stanford HIVDB.
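
The sequence features named in the methods paragraph can be computed along these lines. This is a sketch under assumptions: the function names are hypothetical, a poly-T terminator is taken to mean four or more consecutive T's, the seed region is taken as the last 12 nt of the 20 nt guide, and self-complementarity and the weighting that turns features into a single score are omitted because they are not specified here.

```python
import re


def gc_fraction(seq):
    """Fraction of G or C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"A+|C+|G+|T+", seq)), default=0)


def guide_features(guide):
    """Compute per-guide features for a 20 nt guide (PAM excluded)."""
    guide = guide.upper()
    gc = gc_fraction(guide)
    return {
        "gc": gc,
        "gc_in_range": 0.40 <= gc <= 0.70,   # optimal range from the methods text
        "seed_gc": gc_fraction(guide[-12:]),  # last 12 nt
        "has_poly_t": "TTTT" in guide,        # assumed threshold: 4 or more T's
        "max_homopolymer": longest_homopolymer(guide),
    }
```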

How To Cite This Tool

If you use LOOM CRISPR Search or data from this tool in your research, please cite:

LOOM CRISPR Search: Open-science CRISPR target discovery tool.
https://calm-mushroom-0185d800f.4.azurestaticapps.net (2026).
Genome data: NCBI RefSeq Viral & Bacterial, GRCh38.p14.
Disease context: Disease Ontology (CC0), MONDO (CC BY 4.0),
  WHO Disease Surveillance, NCBI Taxonomy, NCBI Gene.
Search engine: brenda FM-index (195 KB WASM).

CRISPR Glossary

Key terms used throughout this tool

PAM (Protospacer Adjacent Motif)

A short DNA sequence (2–6 bp) immediately adjacent to the target site that the CRISPR-Cas protein must recognize before it can bind and cut. Without the correct PAM, the enzyme ignores the target — even if the guide RNA matches perfectly. Different Cas enzymes require different PAMs.

NGG (SpCas9 PAM)

The PAM sequence required by SpCas9 (from S. pyogenes), the most widely used CRISPR nuclease. "N" = any nucleotide, "GG" = two guanines. The NGG must appear on the 3′ side of the 20-bp target. Example: ...ATCGATCGATCGATCGATCGAGG

TTTN (Cas12a / Cpf1 PAM)

The PAM sequence required by Cas12a (also called Cpf1), used in DETECTR diagnostic assays. "TTT" = three thymines, "N" = any nucleotide. Unlike SpCas9, this PAM sits on the 5′ side of the target. Example: TTTGATCGATCGATCGATCGATCG...
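
The two PAM rules above can be checked with a few lines of Python. This is a sketch, not the tool's classifier: the function name is hypothetical, it inspects only the strand it is given (a reverse-strand site would need the reverse complement), and it assumes the input is a 23-mer.

```python
def classify_pam(site):
    """Return the PAM classes a 23-mer site is compatible with."""
    site = site.upper()
    if len(site) != 23:
        raise ValueError("expected a 23-mer")
    labels = []
    # NGG: any base followed by GG on the 3' side of the 20 bp target
    if site[20] in "ACGT" and site[21:] == "GG":
        labels.append("NGG")
    # TTTN: three T's followed by any base on the 5' side of the target
    if site.startswith("TTT") and site[3] in "ACGT":
        labels.append("TTTN")
    return labels
```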

Guide RNA (gRNA / sgRNA)

A short RNA molecule (~20 nt of targeting sequence) that directs the Cas enzyme to a specific DNA target via Watson-Crick base pairing. In this tool, each 23-mer target represents a potential guide site: 20 bp of targeting sequence + 3 bp PAM. The PAM is read from the target DNA and is not part of the guide RNA itself.

23-mer

A 23-nucleotide DNA sequence. In CRISPR target design, a 23-mer typically means the 20 bp guide sequence plus a 3 bp PAM (e.g., 20 bp + NGG). This is the standard targeting unit for SpCas9.
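
A sliding window over a genome yields every 23-mer, and counting how many assemblies contain each one gives a simple conservation measure. The sketch below assumes that conservation counts each assembly at most once; the function names are hypothetical, and a real corpus would be far too large for this in-memory approach.

```python
from collections import Counter


def kmers(seq, k=23):
    """Return every k-mer in seq, in order, using a sliding window of width k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def conservation_counts(assemblies, k=23):
    """Count, for each k-mer, the number of assemblies that contain it."""
    counts = Counter()
    for seq in assemblies:
        counts.update(set(kmers(seq, k)))  # set(): count each assembly once
    return counts
```

Sorting candidates with `conservation_counts(assemblies).most_common()` would then put the most widely shared 23-mers first.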

Off-target

An unintended genomic site where a CRISPR guide could bind and cut due to partial sequence similarity. Good diagnostic guides should have zero off-targets in the human genome — which is why this tool can cross-check against GRCh38.
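
An exact-match host check can be sketched as follows. Python's substring test stands in for the FM-index query, the function names are hypothetical, and searching the reverse complement as well as the forward strand is an assumption about how such a check would typically be done. Exact matching does not catch the partial-similarity sites described above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_exact_host_match(guide, host_seq):
    """True if the guide occurs exactly on either strand of host_seq."""
    guide, host_seq = guide.upper(), host_seq.upper()
    return guide in host_seq or reverse_complement(guide) in host_seq
```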

FM-index

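A compressed full-text index based on the Burrows-Wheeler Transform (BWT). It enables exact substring matching across gigabytes of genome data; the cost of a count query grows with the pattern length rather than the genome size, which is what makes sub-millisecond lookups possible. LOOM compiles its FM-index engine to a 195 KB WebAssembly binary and searches pathogen genomes entirely in your browser.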

SpCas9

Streptococcus pyogenes Cas9, the original and most commonly used CRISPR nuclease. Recognizes the NGG PAM. Widely validated in diagnostics and therapeutics.

Cas12a (Cpf1)

An alternative CRISPR nuclease that recognizes TTTN PAMs and creates staggered (sticky-end) cuts. Used in the DETECTR diagnostic platform. Offers a different targeting range from SpCas9.

SHERLOCK / DETECTR

CRISPR-based diagnostic platforms. SHERLOCK (Cas13) and DETECTR (Cas12a) detect specific nucleic acid sequences with high sensitivity, used for rapid pathogen detection (e.g., SARS-CoV-2, Zika, Dengue).

Multiplexed Diagnostic Panel Designer

Select pathogens to build a syndromic diagnostic panel. The algorithm finds the minimum set of non-cross-reactive NGG guides that uniquely identifies each pathogen.
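
One simple way to approximate such a panel is to pick, for each pathogen, the best-scoring guide that no other selected pathogen shares. The sketch below is an assumption about the general approach, not the tool's algorithm: the function name and input shape are hypothetical, and cross-reactivity is reduced to exact sequence sharing.

```python
def design_panel(guides_by_pathogen):
    """Pick one non-shared guide per pathogen.

    guides_by_pathogen: {pathogen: {guide_sequence: score}} (hypothetical shape).
    Returns {pathogen: guide_sequence or None}.
    """
    panel = {}
    for pathogen, guides in guides_by_pathogen.items():
        # Collect every guide seen in any other pathogen
        others = set()
        for other, other_guides in guides_by_pathogen.items():
            if other != pathogen:
                others.update(other_guides)
        unique = {g: s for g, s in guides.items() if g not in others}
        # Highest score wins; ties broken by sequence for determinism.
        # None marks a pathogen with no non-cross-reactive guide.
        panel[pathogen] = max(unique, key=lambda g: (unique[g], g)) if unique else None
    return panel
```

With one guide per pathogen the panel is as small as it can be while still identifying each pathogen; a pathogen mapped to `None` would need relaxed filters or a different guide type.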

Select Pathogens for Panel

Filters: Min conservation, Min guide score, Host-specific only
