Learn the field, not just the interface

From first base pair to real research questions.

This page is for curious people who want to understand what LOOM is really doing. You will learn what a CRISPR guide is, why conserved regions matter, how off-target risk changes decisions, and why exact genome search can surface ideas worth testing in the lab.

You do not need a biology degree to begin. You do need curiosity, patience, and the willingness to think like both an engineer and a scientist.

Genome: A long molecular text you can search
Guide RNA: A programmable locator for a target site
Research gap: A conserved site nobody has studied yet

Your map from zero to hero

Watch the signal travel: biology -> search -> screening -> novelty

Concept map: six inputs feed LOOM's exact genome search for CRISPR discovery. Genome (A, C, G, T text), Guide RNA (programmable lockpick), PAM (the docking rule), Conservation (works across variants), Off-targets (bad cross-matches), Literature (what is already known).
1. Biology first: Genome + Guide RNA + PAM
2. Search exact sites: LOOM retrieves candidates
3. Pressure-test: conservation & off-target filter
4. Ask what is new: literature gap → experiment

Foundations

The fastest way to stop CRISPR from feeling mysterious is to reduce it to a few solid ideas you can carry everywhere.

Genome

A genome is searchable text

A pathogen genome is a very long string written in four letters: A, C, G, and T. LOOM treats it like text you can query exactly, the same way a search engine finds a phrase in a document.
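As a minimal sketch of what "query exactly" means, the snippet below reports every position where a pattern occurs letter for letter in a toy sequence. The function name `find_all` and the sequence are illustrative assumptions, and the plain scan is a teaching stand-in, not how LOOM searches large collections.

```python
def find_all(genome: str, query: str) -> list[int]:
    """Return every 0-based start position where query occurs exactly."""
    if not query:
        return []
    positions = []
    start = genome.find(query)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping occurrences are also reported.
        start = genome.find(query, start + 1)
    return positions


toy_genome = "ATGCGTACGTTAGCGTACG"  # made-up sequence for illustration
print(find_all(toy_genome, "CGTACG"))  # → [3, 13]
```

A single changed letter in the query returns an empty list, which is the whole point of exact search: there is no partial credit.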

Guide

A CRISPR guide is a programmable address

A guide RNA tells a CRISPR system where to go. If the guide matches a sequence in the pathogen and the local rules are satisfied, the system can bind or cut there.

PAM

Matching is not enough

Many CRISPR systems require a short motif next to the target, called a PAM. For SpCas9, the canonical PAM is NGG, where N is any base and the motif sits immediately 3' of the target sequence. If the sequence matches but the PAM is wrong, the target is often unusable.
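The PAM rule can be sketched as a second filter applied after the exact match. The example below assumes an NGG-style pattern read directly after the target on the forward strand only; the function names and sequences are hypothetical, and a real tool would also check the reverse strand.

```python
def pam_matches(pam: str, pattern: str = "NGG") -> bool:
    """True if pam fits pattern, where N in the pattern matches any base."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for p, b in zip(pattern, pam)
    )


def pam_valid_sites(genome: str, target: str, pattern: str = "NGG") -> list[int]:
    """Start positions where target occurs and is followed by a fitting PAM."""
    sites = []
    start = genome.find(target)
    while start != -1:
        pam_start = start + len(target)
        if pam_matches(genome[pam_start:pam_start + len(pattern)], pattern):
            sites.append(start)
        start = genome.find(target, start + 1)
    return sites


target = "ACGTACGTAGCTAGCTAGCT"  # made-up 20-base target
# Two copies of the target: the first followed by AGG, the second by ACT.
genome = "TT" + target + "AGG" + "CCCC" + target + "ACT"
print(pam_valid_sites(genome, target))  # → [2]
```

Both copies match the sequence, but only the first passes the PAM check, so only one site survives.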

Conservation

Diagnostics live or die on stability

If a target sequence mutates often, your assay breaks on new variants. Conserved regions survive because evolution has less freedom to change them without damaging the pathogen.

Specificity

Great targets avoid lookalikes

A useful target should appear in the pathogen you care about and stay absent from hosts or nearby lookalike organisms. Otherwise, your assay risks false signals.

Novelty

Research begins where published attention stops

Novel does not mean random. It means a region is conserved, plausible, and underexplored in the literature. The sweet spot is where biology and neglect overlap.

SVG labs

These are conceptual simulations. They are not full molecular dynamics or full FM-index internals. They are designed to teach the right mental model quickly and honestly.

Lab 1: How a guide finds a target

A guide needs both sequence agreement and a valid PAM. Step through the decision like a machine would.
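The step-through can be sketched per base. This example assumes the guide is written in DNA letters for simplicity and that the window holds the target followed by a three-base PAM; `inspect_site` and its output keys are illustrative names, not part of LOOM.

```python
def inspect_site(guide: str, window: str) -> dict:
    """Compare a guide with a window of len(guide) + 3 bases (target + PAM)."""
    if len(window) != len(guide) + 3:
        raise ValueError("window must be the guide length plus a 3-base PAM")
    target, pam = window[:len(guide)], window[len(guide):]
    # Positions where the guide and the genome disagree.
    mismatches = [i for i, (g, t) in enumerate(zip(guide, target)) if g != t]
    pam_ok = pam[1:] == "GG"  # NGG: any base, then two guanines
    return {
        "mismatches": mismatches,
        "pam": pam,
        "pam_ok": pam_ok,
        "usable": not mismatches and pam_ok,
    }


guide = "GACCGTAAGC"  # made-up short guide
print(inspect_site(guide, "GACCGTAAGCTGG"))  # clean target, valid PAM
print(inspect_site(guide, "GACCATAAGCTGG"))  # one mismatch at position 4
print(inspect_site(guide, "GACCGTAAGCTGA"))  # perfect match, wrong PAM
```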

Conceptual exact-match scan
Figure: guide matching simulation. A genome strip and a guide strip, with highlights showing match, mismatch, and PAM location.
Start with a clean target. You should see a guide lining up with a genome segment and a nearby NGG PAM.
Legend: matching bases, mismatch, PAM zone.

Lab 2: Why conserved regions beat flashy ones

Compare a mutation-prone surface protein with a quieter replication region as time passes.

Variant pressure model
Figure: conservation simulation. Two panels compare a Spike-style region and a polymerase-style region under mutation pressure over time. Immune-exposed region: mutates when the pathogen is under immune pressure. Replication machinery: often constrained by function; mutations are more costly. Starting year shown: 2020.
At the start of an outbreak, many targets look good. Over time, exposed regions often decay faster than constrained regions.

Lab 3: How exact search narrows the haystack

LOOM does not compare your query against every genome letter one by one. It uses an index, so each additional base of the query narrows the set of candidate positions.

Backwards-search intuition
Figure: indexed search simulation. Bars show candidate ranges shrinking as query characters are added. Each step uses one more base and cuts away impossible matches.
Use a short DNA pattern. The bars show a teaching example of how indexed exact search collapses possible matches into a tiny interval.
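To make the shrinking interval concrete, here is a toy FM-index style backward search. It is a teaching sketch under stated assumptions, not LOOM's implementation: the suffix array is built by naive sorting, ranks are computed by counting, and the names `build_index` and `backward_search` are illustrative.

```python
def build_index(text: str):
    """Build a toy suffix array, BWT, and first-column table for text."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix sort: fine for teaching, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # first[c] = number of characters in text strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(set(text)):
        first[ch] = total
        total += text.count(ch)
    return sa, bwt, first


def backward_search(bwt: str, first: dict, query: str):
    """Return the final half-open interval [lo, hi) and its size per step."""
    lo, hi = 0, len(bwt)
    sizes = []
    for ch in reversed(query):  # the query is consumed right to left
        if ch not in first:
            sizes.append(0)
            return 0, 0, sizes
        # Rank by counting is O(n); real indexes use precomputed rank tables.
        lo = first[ch] + bwt.count(ch, 0, lo)
        hi = first[ch] + bwt.count(ch, 0, hi)
        sizes.append(hi - lo)
        if lo >= hi:
            break
    return lo, hi, sizes


sa, bwt, first = build_index("ACGTACGTTACG")
lo, hi, sizes = backward_search(bwt, first, "GTTACG")
print(sizes)              # → [3, 3, 3, 2, 1, 1]
print(sorted(sa[lo:hi]))  # → [6]
```

Each number in `sizes` is how many suffixes still fit after one more base has been read. The work per step depends on the index, not on scanning the whole text again.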

How real research moves

Research is not one giant leap. It is a disciplined pipeline of filters. Most bad ideas die early, which is exactly what you want.

1. Start with a biological question

Do not begin with a random sequence. Begin with a concrete problem: detect a pathogen faster, distinguish a variant, avoid host cross-reactivity, or probe an underexplored gene.

2. Search for conserved candidate sites

Use indexed search and precomputed target databases to find sites that recur across many genomes. Conservation is your first rough filter for assay robustness.
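A rough version of this filter is the fraction of genomes that contain the candidate exactly. The sketch below assumes the genomes fit in memory as plain strings; the isolate names and sequences are made up for illustration.

```python
def conservation(target: str, genomes: dict[str, str]) -> float:
    """Fraction of genomes that contain target as an exact substring."""
    if not genomes:
        return 0.0
    hits = sum(1 for seq in genomes.values() if target in seq)
    return hits / len(genomes)


# Toy isolates: isolate_3 carries a single-base change inside the site.
isolates = {
    "isolate_1": "TTGACCGTAAGC",
    "isolate_2": "AAGACCGTAAGG",
    "isolate_3": "TTGACCGTTAGC",
    "isolate_4": "CCGACCGTAAGT",
}
print(conservation("GACCGTAA", isolates))  # → 0.75
print(conservation("GACCGT", isolates))    # → 1.0
```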

3. Check PAM compatibility

A perfect target with no compatible PAM may be useless for your chosen CRISPR enzyme. Pick the enzyme and PAM logic early.

4. Screen for off-targets

Search the same sequence against hosts and likely confounders. False positives can erase an otherwise beautiful design.
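A first-pass screen can be sketched as an exact lookup on both strands of each background sequence. The names and sequences here are hypothetical, and the check is deliberately limited to exact matches; near matches with a few mismatches can also matter and are not covered by this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def off_target_hits(guide: str, background: dict[str, str]) -> list[str]:
    """Names of background sequences containing the guide on either strand."""
    rc = reverse_complement(guide)
    return [
        name for name, seq in background.items()
        if guide in seq or rc in seq
    ]


# Toy background set: the host fragment carries the reverse-strand match.
background = {
    "host_fragment": "CCTTACGGTCAA",
    "related_virus": "GGGACCATAAGG",
    "unrelated": "ACACACACAC",
}
print(off_target_hits("GACCGTAA", background))  # → ['host_fragment']
```

An empty list is the result you are hoping for. Any hit means the candidate needs a closer look before it goes further.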

5. Scan the literature

Now ask whether the target, gene, or region has already been studied. Novelty matters most when combined with biological plausibility, not when novelty stands alone.

6. Design experiments with humility

Computational targets are a strong starting point, not a publication by themselves. You still need wet-lab validation, negative controls, sensitivity curves, and failure analysis.

What LOOM is especially good at

  1. Surfacing exact matches quickly across very large genome collections.
  2. Comparing conservation across many related sequences.
  3. Letting a scientist explore candidate targets without cloud infrastructure.

What LOOM does not replace

  1. Experimental validation.
  2. Clinical trial evidence.
  3. Deep mechanistic biology beyond exact sequence retrieval.

Why ontology matters

A gene can have many names. If you search for only one of them, you miss real research. Ontology integration is how we avoid that trap.

The problem

One gene, many names

The RSV SH protein is also called "small hydrophobic protein" in NCBI Gene. A scanner that searches only for "SH protein" misses hundreds of papers indexed under the official name. This is not a hypothetical — our v3 scanner had this exact bug, misclassifying a well-studied target as a confirmed research gap.

The fix

Ontology-backed synonym expansion

Our v4 scanner loads 8,283 gene annotations from NCBI Gene across all 12 pathogens. Every gene symbol, full name, and alias is merged with manually curated synonyms. The search uses the union of all name sources, so no known alias is missed.

NCBI Taxonomy

Canonical species identifiers (taxonomy IDs) for every pathogen. Ensures we are querying the right organism, not a namesake.

NCBI Gene

Official gene symbols, full names, and aliases. This is the primary source for synonym expansion — the layer that caught the RSV SH protein error.

Disease Ontology & MONDO

Standardised disease identifiers (DOID, MONDO) that link pathogens to the diseases they cause, enabling cross-database disease queries.

WHO GHO & ICD-11

Global health context: WHO priority classification, transmission routes, and ICD-11 codes. Lets us rank pathogens by public-health urgency.

How it works in practice

  1. The scanner loads ontology-enrichment.json at startup.
  2. For each pathogen gene, it collects every known alias from NCBI Gene.
  3. Aliases are merged with manual synonyms into a single expanded query.
  4. Every result is tagged with a synonym_source — so you can trace exactly which name matched.
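The steps above can be sketched as follows. The schema of ontology-enrichment.json is not shown on this page, so the record fields (`symbol`, `full_name`, `aliases`), the source labels, and the example record are all assumptions made for illustration.

```python
import re


def expand_names(gene_record: dict, manual_synonyms: list[str]) -> dict[str, str]:
    """Map every known name (lowercased) to the source it came from."""
    expanded = {}
    sources = (
        ("ncbi_symbol", [gene_record.get("symbol", "")]),
        ("ncbi_full_name", [gene_record.get("full_name", "")]),
        ("ncbi_alias", gene_record.get("aliases", [])),
        ("manual", manual_synonyms),
    )
    for source, names in sources:
        for name in names:
            key = name.strip().lower()
            if key and key not in expanded:  # first source to supply a name wins
                expanded[key] = source
    return expanded


def tag_matches(title: str, expanded: dict[str, str]) -> list[tuple[str, str]]:
    """Return (name, synonym_source) for every name found as a whole word."""
    lowered = title.lower()
    return [
        (name, source) for name, source in expanded.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    ]


# Hypothetical record; the field values are illustrative only.
record = {"symbol": "SH", "full_name": "small hydrophobic protein", "aliases": []}
names = expand_names(record, ["SH protein"])
print(tag_matches("Structure of the small hydrophobic protein of RSV", names))
print(tag_matches("SH protein ion channel activity", names))
```

The first title is found only through the full name, which is the kind of paper a search limited to "SH protein" would miss. The returned source label is what lets you trace which name produced the match.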

What this means for you

  1. Gap claims are more trustworthy: fewer papers are missed because of name mismatches, so fewer well-studied targets are mislabeled as gaps.
  2. Every result carries provenance, so you can verify it yourself.
  3. New pathogens inherit ontology coverage automatically from NCBI.

Your zero to hero roadmap

If you want this page to become a launchpad rather than a curiosity, follow a progression. Each stage compounds the one before it.

Stage 1: Learn the language

Become fluent with genome, gene, guide, PAM, conservation, specificity, assay, and variant. Until these feel natural, everything else stays foggy.

Stage 2: Learn to inspect targets

Open the search tool, pick one pathogen, and learn how to read the target table. Compare high-coverage sites to low-coverage sites. Check PAMs. Inspect genes.

Stage 3: Ask a small research question

Pick one underexplored gene and formulate a narrow hypothesis. Example: "Does this conserved replication-region site remain absent from published CRISPR diagnostic work?"

First project ideas

  1. Compare one structural gene and one replication gene for conservation stability.
  2. Build a shortlist of ten pathogen-specific guides for one pathogen, each with no exact match in the host genome.
  3. Audit one pathogen's literature gap and propose two unexplored target genes.

If you want to go deeper

  1. Study molecular biology and introductory genetics.
  2. Learn sequence alignment, assembly, indexing, and phylogenetics.
  3. Pair computational work with lab mentorship whenever possible.

Glossary you can actually use

Short definitions tuned for practical reading, not for memorizing jargon in isolation.

Guide RNA

A short programmable sequence that tells a CRISPR system where to bind or cut.

PAM

A short nearby motif required by many CRISPR enzymes. For SpCas9, the classic example is NGG.

Conservation

The fraction of genomes that keep the same target sequence. Higher conservation usually means a more durable assay.

Off-target

An unintended match elsewhere, especially in host genomes or related organisms, that can produce false positives or unwanted binding.

Exact search

Search that asks whether a sequence occurs exactly, letter for letter, without substitution or alignment scoring.

Research gap

A biologically meaningful space with little or no published CRISPR work, often where new discoveries become possible.