A genome is searchable text
A pathogen genome is a very long string written in four letters: A, C, G, and T. LOOM treats it like text you can query exactly, the same way a search engine finds a phrase in a document.
A guided, visual introduction to CRISPR diagnostics, pathogen genomics, and exact search
This page is for curious people who want to understand what LOOM is really doing. You will learn what a CRISPR guide is, why conserved regions matter, how off-target risk changes decisions, and why exact genome search can surface ideas worth testing in the lab.
You do not need a biology degree to begin. You do need curiosity, patience, and the willingness to think like both an engineer and a scientist.
Watch the signal travel: biology -> search -> screening -> novelty
The fastest way to stop CRISPR from feeling mysterious is to reduce it to a few solid ideas you can carry everywhere.
A guide RNA tells a CRISPR system where to go. If the guide matches a sequence in the pathogen and the local rules are satisfied, the system can bind or cut there.
Many CRISPR systems require a short motif next to the target, called a PAM. For SpCas9, the canonical PAM is NGG, where N is any base. If the sequence matches but the PAM is wrong, that enzyme generally cannot use the target.
If a target sequence mutates often, your assay breaks on new variants. Conserved regions survive because evolution has less freedom to change them without damaging the pathogen.
A useful target should appear in the pathogen you care about and stay absent from hosts or nearby lookalike organisms. Otherwise, your assay risks false signals.
Novel does not mean random. It means a region is conserved, plausible, and underexplored in the literature. The sweet spot is where biology and neglect overlap.
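To make the guide-plus-PAM idea concrete, here is a minimal Python sketch. It assumes SpCas9 with an NGG PAM sitting immediately after the matched sequence, and it checks only the strand it is given. The function name and the toy sequences are made up for illustration and are not part of LOOM.

```python
import re

# SpCas9 PAM: N (any base) followed by GG.
PAM_PATTERN = re.compile(r"[ACGT]GG")


def find_spcas9_sites(genome: str, guide: str) -> list[int]:
    """Return 0-based positions where `guide` occurs exactly in `genome`
    and the three bases right after the match form an NGG PAM.

    Assumption for this sketch: forward strand only, exact matches only.
    """
    sites = []
    start = genome.find(guide)
    while start != -1:
        pam_start = start + len(guide)
        pam = genome[pam_start:pam_start + 3]
        # fullmatch also rejects a PAM truncated by the end of the genome.
        if PAM_PATTERN.fullmatch(pam):
            sites.append(start)
        start = genome.find(guide, start + 1)
    return sites


# Toy data: the guide appears twice, but only the first copy has a valid PAM.
guide = "ATGCGTACCTGAAGTCAGCT"
genome = "TT" + guide + "TGG" + "CCC" + guide + "TAA"
usable_sites = find_spcas9_sites(genome, guide)
```

The second copy of the guide matches letter for letter but is followed by TAA, so it is rejected. That is the "sequence matches but the PAM is wrong" case in miniature.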
These are conceptual simulations. They are not full molecular dynamics or full FM-index internals. They are designed to teach the right mental model quickly and honestly.
A guide needs both sequence agreement and a valid PAM. Step through the decision like a machine would.
Compare a mutation-prone surface protein with a quieter replication region as time passes.
LOOM does not compare your query against every genome letter one by one. It uses an index, so each additional base of the query shrinks the set of candidate positions.
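If you want to see that narrowing in code, the sketch below builds a toy FM-index-style structure and runs a backward search, recording how many candidate positions remain after each base. It is a teaching sketch under simple assumptions: naive construction, short strings only, and hypothetical function names. It does not claim to mirror LOOM's actual index.

```python
def build_fm_index(text: str):
    """Build a toy FM-index (naive construction, fine for short strings)."""
    text += "$"  # sentinel that sorts before A, C, G, and T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)

    # first[c] = number of characters in the text that sort before c
    first, total = {}, 0
    for ch in sorted(set(text)):
        first[ch] = total
        total += text.count(ch)

    # occ[c][i] = number of times c appears in bwt[:i]
    occ = {ch: [0] for ch in first}
    for b in bwt:
        for ch, column in occ.items():
            column.append(column[-1] + (ch == b))
    return first, occ, len(bwt)


def candidate_counts(index, query: str) -> list[int]:
    """Backward search: consume the query right to left and record how many
    candidate positions survive after each base."""
    first, occ, n = index
    lo, hi = 0, n  # half-open interval over the sorted suffixes
    counts = []
    for ch in reversed(query):
        if ch not in first:
            counts.append(0)
            break
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        counts.append(hi - lo)
        if lo >= hi:
            break
    return counts


index = build_fm_index("GATTACAGATTAGAC")
narrowing = candidate_counts(index, "TTA")
```

For the toy text, `narrowing` is `[6, 2, 2]`: six positions end in A, two of them are preceded by T, and both of those are preceded by another T. The work done depends on the length of the query, not on a scan of the whole genome.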
Research is not one giant leap. It is a disciplined pipeline of filters. Most bad ideas die early, which is exactly what you want.
Do not begin with a random sequence. Begin with a concrete problem: detect a pathogen faster, distinguish a variant, avoid host cross-reactivity, or probe an underexplored gene.
Use indexed search and precomputed target databases to find sites that recur across many genomes. Conservation is your first rough filter for assay robustness.
A perfect target with no compatible PAM may be useless for your chosen CRISPR enzyme. Pick the enzyme and PAM logic early.
Search the same sequence against hosts and likely confounders. False positives can erase an otherwise beautiful design.
Now ask whether the target, gene, or region has already been studied. Novelty matters most when combined with biological plausibility, not when novelty stands alone.
Computational targets are a strong starting point, not a publication by themselves. You still need wet-lab validation, negative controls, sensitivity curves, and failure analysis.
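The conservation and screening steps above can be sketched in a few lines of Python. This is a minimal illustration, not LOOM's pipeline: genomes are plain strings, a hit means an exact match on either strand, and the 0.9 conservation cutoff is a placeholder you would choose for your own assay. All names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read from the opposite DNA strand."""
    return seq.translate(COMPLEMENT)[::-1]


def occurs(target: str, genome: str) -> bool:
    """Exact match on either strand (an assumption of this sketch)."""
    return target in genome or reverse_complement(target) in genome


def conservation(target: str, genomes: list[str]) -> float:
    """Fraction of genomes that contain the target exactly."""
    if not genomes:
        return 0.0
    return sum(occurs(target, g) for g in genomes) / len(genomes)


def screen(target, pathogen_genomes, background_genomes, min_conservation=0.9):
    """Apply two filters: conservation in the pathogen, absence elsewhere.

    `background_genomes` maps a label (host, lookalike organism) to a sequence.
    `min_conservation` is a placeholder threshold, not a recommended value.
    """
    score = conservation(target, pathogen_genomes)
    hits = [name for name, g in background_genomes.items() if occurs(target, g)]
    return {
        "conservation": score,
        "off_target_hits": hits,
        "passes": score >= min_conservation and not hits,
    }


# Toy data only: three of four pathogen genomes carry the target,
# one of them on the reverse strand, and the "host" also contains it.
pathogens = ["CCGATTACACC", "TTGATTACATT", "AATGTAATCAA", "CCCCCCCC"]
background = {"host": "GGGGGATTACAGG", "neighbor": "TTTTTTTT"}
report = screen("GATTACA", pathogens, background)
```

Here the target fails for two independent reasons: it is present in only 75% of the pathogen genomes, and it also appears in the host. Either reason alone is enough to drop a candidate early, which is the point of running cheap filters before lab work.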
A gene can have many names. If you search for only one of them, you miss real research. Ontology integration is how we avoid that trap.
The RSV SH protein is also called "small hydrophobic protein" in NCBI Gene. A scanner that searches only for "SH protein" misses hundreds of papers indexed under the official name. This is not hypothetical: our v3 scanner had this exact bug and misclassified a well-studied target as a confirmed research gap.
Our v4 scanner loads 8,283 gene annotations from NCBI Gene across all 12 pathogens. Every gene symbol, full name, and alias is merged with manually curated synonyms. The search uses the union of all name sources, so no known alias is missed.
Canonical species identifiers (taxonomy IDs) for every pathogen. Ensures we are querying the right organism, not a namesake.
Official gene symbols, full names, and aliases. This is the primary source for synonym expansion — the layer that caught the RSV SH protein error.
Standardised disease identifiers (DOID, MONDO) that link pathogens to the diseases they cause, enabling cross-database disease queries.
Global health context: WHO priority classification, transmission routes, and ICD-11 codes. Lets us rank pathogens by public-health urgency.
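The "union of all name sources" idea can be sketched as follows. The record layout, the source labels, and the example values are assumptions made for illustration. They are not the real schema of the scanner or of NCBI Gene.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical, simplified view of one gene annotation."""
    symbol: str
    full_name: str
    aliases: list[str] = field(default_factory=list)


def expand_synonyms(record: GeneRecord, curated: list[str]) -> list[dict]:
    """Merge every known name for a gene into one search list.

    Each entry remembers which source supplied it. Names are compared
    case-insensitively, and the first source to supply a name wins.
    """
    sources = [
        ("ncbi_symbol", [record.symbol]),
        ("ncbi_full_name", [record.full_name]),
        ("ncbi_alias", record.aliases),
        ("curated", curated),
    ]
    terms = {}
    for source, names in sources:
        for name in names:
            key = name.strip().lower()
            if key and key not in terms:
                terms[key] = {"term": name.strip(), "synonym_source": source}
    return list(terms.values())


# Illustrative values only.
sh = GeneRecord(symbol="SH", full_name="small hydrophobic protein")
search_terms = expand_synonyms(sh, curated=["SH protein", "Small hydrophobic protein"])
```

A literature query would then use every entry in `search_terms`, so a paper indexed under the official name is still found when you started from the short name. The curated duplicate of the full name is dropped because the annotation source already supplied it.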
Ontology data is loaded from ontology-enrichment.json at startup. Each matched name carries a synonym_source, so you can trace exactly which name matched.
If you want this page to become a launchpad rather than a curiosity, follow a progression. Each stage compounds the one before it.
Become fluent with genome, gene, guide, PAM, conservation, specificity, assay, and variant. Until these feel natural, everything else stays foggy.
Open the search tool, pick one pathogen, and learn how to read the target table. Compare high-coverage sites to low-coverage sites. Check PAMs. Inspect genes.
Pick one underexplored gene and formulate a narrow hypothesis. Example: "Does this conserved replication-region site remain absent from published CRISPR diagnostic work?"
Short definitions tuned for practical reading, not for memorizing jargon in isolation.
A short programmable sequence that tells a CRISPR system where to bind or cut.
A short nearby motif required by many CRISPR enzymes. For SpCas9, the classic example is NGG.
The fraction of genomes that keep the same target sequence. Higher conservation usually means a more durable assay.
An unintended match elsewhere, especially in host genomes or related organisms, that can produce false positives or unwanted binding.
Search that asks whether a sequence occurs exactly, letter for letter, without substitution or alignment scoring.
A biologically meaningful space with little or no published CRISPR work, often where new discoveries become possible.