CLI Reference
Hoodini provides a powerful command-line interface for comparative genomics analysis.
Commands Overview
hoodini run
The main pipeline command that orchestrates the complete analysis workflow.
hoodini run --input <accessions.txt> --output results/
Input Options
--input
Single file or literal input
# File with one accession per line
hoodini run --input accessions.txt
# Literal protein accession
hoodini run --input "WP_012345678.1"
When using a literal input, Hoodini performs a remote BLAST to expand the search set using --remote-evalue and --remote-max-targets.
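As a minimal sketch of the two input styles, the Python snippet below writes an accession list (one per line) and assembles the matching `hoodini run` argument lists. The helpers `write_accessions` and `run_args` are hypothetical and not part of Hoodini, and `WP_000000001.1` is a made-up placeholder accession. Only the `--input` and `--output` flags come from this reference. The commands are built, not executed.

```python
from pathlib import Path
import tempfile


def write_accessions(path, accessions):
    """Write one accession per line, skipping blanks and duplicates."""
    seen = []
    for acc in accessions:
        acc = acc.strip()
        if acc and acc not in seen:
            seen.append(acc)
    Path(path).write_text("\n".join(seen) + "\n")
    return seen


def run_args(input_value, output="results"):
    """Build an argument list for `hoodini run` (hypothetical helper)."""
    return ["hoodini", "run", "--input", str(input_value), "--output", output]


# File input: one accession per line.
tmp_dir = Path(tempfile.mkdtemp())
acc_file = tmp_dir / "accessions.txt"
written = write_accessions(
    acc_file,
    ["WP_012345678.1", "", "WP_012345678.1", "WP_000000001.1"],
)
file_cmd = run_args(acc_file)

# Literal input: a single protein accession passed directly.
literal_cmd = run_args("WP_012345678.1")
```

With Hoodini installed, either list could be handed to `subprocess.run(file_cmd, check=True)`.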
Output & Config
| Option | Type | Default | Description |
|---|---|---|---|
| --config | path | — | TOML config file |
| --output | path | results | Output directory |
| --force | flag | false | Overwrite existing output |
| --keep | flag | false | Keep intermediate files |
Performance
| Option | Type | Default | Description |
|---|---|---|---|
| --num-threads | int | 10 | Number of threads |
| --max-concurrent-downloads | int | 8 | Parallel NCBI downloads |
| --api-key | str | — | NCBI API key (or NCBI_API_KEY env var) |
Data Sources
| Option | Type | Default | Description |
|---|---|---|---|
| --assembly-folder | path | — | Use local assemblies instead of downloading |
Neighborhood Window
| Option | Type | Default | Description |
|---|---|---|---|
| --win-mode | str | win_nts | Window mode: win_nts or win_genes |
| --win | int | 20000 | Window size (nucleotides or genes) |
| --min-win | int | 2000 | Minimum window per side |
| --min-win-type | str | both | total, upstream, downstream, or both |
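The snippet below sketches one possible reading of these options in win_nts mode. It assumes that --win is applied to each side of the target gene, that the window is clipped at the contig ends, and that "upstream" means lower coordinates (strand is ignored). These are assumptions made for illustration; Hoodini's actual behavior may differ, and the function names are hypothetical.

```python
def window_bounds(gene_start, gene_end, contig_len, win=20000):
    """Clip a +/- `win` nucleotide window around a gene to the contig.

    Assumed semantics, not taken from Hoodini's source.
    """
    start = max(0, gene_start - win)
    end = min(contig_len, gene_end + win)
    return start, end


def passes_min_win(gene_start, gene_end, start, end,
                   min_win=2000, min_win_type="both"):
    """Check the flanks against a minimum size (assumed meaning of --min-win-type)."""
    upstream = gene_start - start      # flank at lower coordinates
    downstream = end - gene_end        # flank at higher coordinates
    if min_win_type == "upstream":
        return upstream >= min_win
    if min_win_type == "downstream":
        return downstream >= min_win
    if min_win_type == "total":
        return upstream + downstream >= min_win
    if min_win_type == "both":
        return upstream >= min_win and downstream >= min_win
    raise ValueError(f"unknown min_win_type: {min_win_type}")


# A gene close to the contig start: the upstream flank is truncated to 1500 nt.
bounds = window_bounds(gene_start=1500, gene_end=2500, contig_len=50000)
keep_both = passes_min_win(1500, 2500, *bounds, min_win_type="both")
keep_downstream = passes_min_win(1500, 2500, *bounds, min_win_type="downstream")
```

Under these assumptions, a target near a contig edge fails the default `both` check but would pass with `downstream` or `total`.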
Clustering
| Option | Type | Default | Description |
|---|---|---|---|
| --cand-mode | str | best_id | Candidate selection mode (see below) |
| --clust-method | str | diamond_deepclust | Clustering method |
Candidate selection modes (--cand-mode)
- best_id (default): Pick the single best representative per input homolog, prioritizing assembly quality and edge proximity. The protein ID must match the original query when possible.
- best_ipg: Same as best_id, but allows different protein IDs (e.g., GenBank vs RefSeq versions of the same protein).
- same_id: Keep all IPG records that share the same protein ID as the query. Especially relevant for non-redundant proteins (WP_, YP_, NP_) present in multiple assemblies.
- any_ipg: Keep all identical proteins from IPG regardless of ID. Can result in massive expansion if the protein exists in thousands of assemblies.
- one_id: Keep the first IPG record per input homolog, regardless of assembly quality (order from NCBI IPG).
Warning: Using any_ipg or same_id can dramatically increase the number of neighborhoods if your query protein is highly conserved across many assemblies.
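To make the difference between the modes concrete, here is a toy sketch that applies each mode to a small list of records. It is an illustration of the descriptions above, not Hoodini's implementation: the `IpgRecord` class, the integer `quality` score, and all accessions are hypothetical, and edge proximity is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpgRecord:
    protein_id: str
    assembly: str
    quality: int  # hypothetical score; higher means a better assembly


def select_candidates(records, query_id, mode="best_id"):
    """Toy version of the --cand-mode choices for one input homolog."""
    same = [r for r in records if r.protein_id == query_id]
    if mode == "any_ipg":
        return list(records)
    if mode == "same_id":
        return same
    if mode == "one_id":
        return list(records[:1])
    if mode == "best_ipg":
        return [max(records, key=lambda r: r.quality)] if records else []
    if mode == "best_id":
        # Prefer records matching the query ID; fall back to all records.
        pool = same or list(records)
        return [max(pool, key=lambda r: r.quality)] if pool else []
    raise ValueError(f"unknown mode: {mode}")


# Placeholder records, listed in the order the IPG report would return them.
records = [
    IpgRecord("ABC12345.1", "GCA_000000001.1", 3),
    IpgRecord("WP_012345678.1", "GCF_000000002.1", 2),
    IpgRecord("WP_012345678.1", "GCF_000000003.1", 1),
]
query = "WP_012345678.1"
counts = {
    mode: len(select_candidates(records, query, mode))
    for mode in ("best_id", "best_ipg", "same_id", "any_ipg", "one_id")
}
```

In this toy data, `best_id` and `best_ipg` each keep one record (from different assemblies), while `same_id` keeps two and `any_ipg` keeps all three, which is the expansion the warning refers to.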
Pairwise Comparisons
| Option | Type | Default | Description |
|---|---|---|---|
| --prot-links | flag | false | Compute protein similarity links |
| --nt-links | flag | false | Compute nucleotide links |
| --ani-mode | str | fastani | ANI calculation: skani or blastn |
| --nt-aln-mode | str | blastn | Nucleotide alignment: blastn, fastani, minimap2, intergenic_blastn |
| --min-pident | float | 30.0 | Minimum percent identity for AAI/wGRR |
Tree Construction
| Option | Type | Default | Description |
|---|---|---|---|
--tree-mode | str | fast_ml | Tree building method (see below) |
--tree-file | path | — | Input Newick tree for use_input_tree |
--aai-mode | str | wgrr | AAI mode: wgrr or aai |
--aai-subset-mode | str | target_region | Subset for AAI tree: target_prot, target_region, window |
Tree modes (--tree-mode)
| Mode | Description |
|---|---|
| taxonomy | NCBI taxonomy distances with single-linkage clustering |
| fast_nj | FAMSA distance matrix → DecentTree NJ/UPGMA |
| fast_ml | FAMSA alignment → VeryFastTree (default) |
| aai_tree | AAI/wGRR pairwise distances → DecentTree |
| ani_tree | ANI pairwise distances → DecentTree |
| use_input_tree | Load from --tree-file |
| foldmason_tree | AlphaFold structures → foldmason MSA → VeryFastTree |
| neigh_similarity_tree | Jaccard distance on protein cluster presence/absence |
| neigh_phylo_tree | Weighted neighborhood similarity using gene positions |
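For readers unfamiliar with the distance behind neigh_similarity_tree, the sketch below computes the Jaccard distance between neighborhoods represented as sets of protein cluster IDs. The neighborhood names and cluster IDs are invented for illustration, and how Hoodini builds a tree from such a matrix is not shown here.

```python
def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A ∩ B| / |A ∪ B| on cluster presence/absence."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty neighborhoods are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical neighborhoods, each a set of protein cluster IDs.
neighborhoods = {
    "hood_A": {"c1", "c2", "c3"},
    "hood_B": {"c2", "c3", "c4"},
    "hood_C": {"c5"},
}

names = sorted(neighborhoods)
matrix = [
    [jaccard_distance(neighborhoods[x], neighborhoods[y]) for y in names]
    for x in names
]
```

Here hood_A and hood_B share two of four clusters (distance 0.5), while hood_C shares none with either (distance 1.0).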
Remote BLAST
For single-query expansion:
| Option | Type | Default | Description |
|---|---|---|---|
| --remote-evalue | float | 1e-5 | E-value for remote BLAST |
| --remote-max-targets | int | 100 | Max hits for remote BLAST |
Annotations
| Option | Type | Default | Description |
|---|---|---|---|
| --padloc | flag | false | Run PADLOC defense system detection |
| --deffinder | flag | false | Run DefenseFinder |
| --cctyper | flag | false | Run CCTyper CRISPR-Cas typing |
| --genomad | flag | false | Run geNomad virus/plasmid detection |
| --ncrna | flag | false | Infernal ncRNA prediction |
| --domains | str | — | Comma-separated MetaCerberus domains |
| --emapper | flag | false | Run eggNOG-mapper |
| --blast | path | — | FASTA file for BLAST search against neighborhood nucleotides (e.g., IS elements) |
| --sorfs | flag | false | Re-annotate small ORFs in extracted regions |
Logging
| Option | Type | Default | Description |
|---|---|---|---|
| --quiet | flag | false | Silence non-error output |
| --debug | flag | false | Verbose debug logging |
hoodini download
Download databases and resources used by Hoodini.
hoodini download databases --threads 8
Subcommands
databases
Download all databases (~35 GB total)
Downloads all required databases for full functionality: PADLOC, DefenseFinder, geNomad, eggNOG-mapper, and supporting files.
hoodini download databases [OPTIONS]
| Option | Description |
|---|---|
| --force | Re-download existing files |
| --threads | Number of threads |
| --skip-padloc | Skip PADLOC models |
| --skip-deffinder | Skip DefenseFinder models |
| --skip-genomad | Skip geNomad database |
| --skip-emapper | Skip eggNOG-mapper data |
| --skip-parquet | Skip parquet files |
| --skip-contig-lengths | Skip contig length database |
hoodini utils
Utility commands for metadata helpers.
hoodini utils nuc2asmlen --output out.tsv input.tsv
Subcommands
nuc2asmlen
Convert nucleotide IDs to assembly lengths
hoodini utils nuc2asmlen --output out.tsv input.tsv
Takes a TSV with nucleotide/contig accessions and adds assembly and contig length metadata.
Configuration File
You can supply a TOML config file using --config. CLI flags always override config values.
hoodini run --config my_config.toml --input accessions.txt
Default values are defined in hoodini/config/defaults.toml under sections: general, window, tree, aai, ani, clustering, annotations, pairwise, and paths.
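The precedence rule (defaults, then config file, then CLI flags) can be sketched as a layered dictionary merge. This is an illustration only: `merge_settings` and the flat key names are hypothetical, whereas the real defaults file groups values into the sections listed above. The default values used here are the ones shown in the option tables of this reference.

```python
def merge_settings(defaults, config, cli):
    """Layer settings so that later sources win: defaults < config < CLI."""
    merged = dict(defaults)
    merged.update(config)
    # Only flags the user actually passed (not None) override the config.
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged


# Defaults as listed in the option tables (key names are hypothetical).
defaults = {"num_threads": 10, "win": 20000, "tree_mode": "fast_ml"}

# Values as they might be read from a TOML config file.
config = {"win": 10000, "tree_mode": "aai_tree"}

# Parsed CLI flags; None marks a flag that was not given.
cli = {"win": 5000, "num_threads": None}

settings = merge_settings(defaults, config, cli)
```

In this example the window size comes from the CLI, the tree mode from the config file, and the thread count from the defaults.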
See outputs for details on output files and directory structure.