What is Hoodini?
Hoodini is a gene-centric comparative genomics toolkit for microbial genomes. It is most useful for comparing many genomic neighborhoods around homologous genes or other genetic elements.
What can you do with Hoodini?
- Explore protein context — You found an interesting protein and want to see what genes surround it across different organisms
- Compare homolog neighborhoods — You have a list of homologs from a BLAST search and want to compare their genomic contexts
- Visualize gene clusters — You’re studying a protein family and want to visualize conserved operons or gene clusters
- Analyze non-coding regions — Compare CRISPR arrays, regulatory elements, or genomic islands
- Include custom genomes — Work with newly sequenced genomes, MAGs, or proprietary data not in NCBI
- Annotate defense systems — Automatically detect defense systems with PADLOC and DefenseFinder, plus CRISPR-Cas loci and mobile elements
Quick Start
1. Install
Hoodini is not yet available on Bioconda. Please follow the Installation guide for current installation methods.
2. Run
Choose the input mode that fits your data:
Single protein ID — Hoodini runs remote BLAST to find homologs (NCBI or UniProt IDs):

    hoodini run --input "WP_012345678.1" --output results/

File with protein IDs — Each ID is an independent query, no BLAST (NCBI or UniProt IDs):

    hoodini run --input my_proteins.txt --output results/

File with nucleotide IDs — For (pro)phages, plasmids, genomic islands, CRISPR arrays (NCBI nucleotide IDs only):

    hoodini run --input my_nucleotides.txt --output results/

Local data — For custom assemblies or data not in NCBI:

    hoodini run --inputsheet my_data.tsv --output results/

For --inputsheet, provide a TSV with protein_id or nucleotide_id column plus genomic data paths (gbf_path or faa_path + fna_path + gff_path).
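The input sheet described above is a plain tab-separated file, so it can be generated with a few lines of Python. The following is a minimal sketch, assuming the protein_id + gbf_path column layout. The second protein ID and both file paths are made-up placeholders, and the helper names `write_inputsheet` and `build_command` are illustrative, not part of Hoodini. The command is only assembled and printed, not executed.

```python
import csv
import tempfile
from pathlib import Path

# Column names come from the text above; IDs and paths are placeholders.
ROWS = [
    {"protein_id": "WP_012345678.1", "gbf_path": "genomes/isolate_A.gbff"},
    {"protein_id": "WP_087654321.1", "gbf_path": "genomes/isolate_B.gbff"},
]


def write_inputsheet(rows, path):
    """Write rows to a tab-separated file with a header line."""
    fieldnames = list(rows[0].keys())
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return path


def build_command(sheet_path, output_dir):
    """Assemble the CLI call shown above as an argument list (not run here)."""
    return ["hoodini", "run", "--inputsheet", str(sheet_path), "--output", str(output_dir)]


# Write the sheet to a temporary directory so the sketch leaves no stray files.
sheet = write_inputsheet(ROWS, Path(tempfile.mkdtemp()) / "my_data.tsv")
command = build_command(sheet, "results/")

print(sheet.read_text())
print(" ".join(command))
```

To launch the run from Python, the argument list could be passed to `subprocess.run(command, check=True)` once Hoodini is installed.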
3. Add annotations (optional)
    hoodini run --input proteins.txt --output results/ \
      --padloc --deffinder --cctyper --genomad

Default Tree Building
Hoodini automatically builds a phylogenetic tree to organize your results:
| Input type | Default tree method |
|---|---|
| Protein IDs | FAMSA alignment → VeryFastTree (fast ML) |
| Nucleotide IDs | AAI (Average Amino Acid Identity) distance tree |
You can customize the tree method with --tree-mode. See CLI Reference for all options.
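As a rough illustration of what an AAI distance means (the nucleotide row in the table above), here is a minimal sketch. It is not Hoodini's implementation. It assumes you already have percent identities for matched protein pairs between neighborhoods, takes AAI as their mean, and uses the common convention distance = 1 - AAI/100. The `pair_identities` values are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical percent identities of matched protein pairs between neighborhoods.
pair_identities = {
    ("A", "B"): [92.0, 88.0, 95.0],
    ("A", "C"): [61.0, 58.0],
    ("B", "C"): [63.0, 60.0, 57.0],
}


def aai(identities):
    """Average amino acid identity: mean percent identity over matched pairs."""
    return mean(identities)


def distance_matrix(names, identities_by_pair):
    """Build a symmetric distance matrix using 1 - AAI/100 (an assumed convention)."""
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = 1.0 - aai(identities_by_pair[(a, b)]) / 100.0
        dist[a][b] = d
        dist[b][a] = d
    return dist


names = ["A", "B", "C"]
dist = distance_matrix(names, pair_identities)
for a in names:
    print(a, [round(dist[a][b], 3) for b in names])
```

A matrix like this is the kind of input a distance-based tree method (for example neighbor joining or UPGMA) would consume. Which method Hoodini uses is not specified here.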
Pipeline Overview
Hoodini performs these stages (optional steps enabled by flags):
1. Initialize: Parse inputs and create output folder
2. IPG parsing: Resolve identical protein groups for candidate selection
3. Assembly download: Fetch assemblies and extract neighborhoods
4. Protein comparisons (optional): All-vs-all similarity for AAI trees
5. Nucleotide comparisons (optional): ANI/nucleotide links
6. Protein clustering: Group proteins by similarity for visualization
7. Tree construction: Build phylogenetic tree
8. Annotations (optional): PADLOC, DefenseFinder, domains, etc.
9. Visualization: Generate interactive HTML viewer and tables
Learn More
- CLI Reference — Complete reference for all commands and options
- Tutorial — Step-by-step walkthrough with real examples
- API Reference — Python API for programmatic access