Outputs

This page describes the files and folders produced by Hoodini and how to interpret them.

Output Directory Structure

- - - gff.parquet
    - hoods.parquet
    - protein_metadata.parquet
    - tree_metadata.parquet
    - protein_links.parquet
    - nucleotide_links.parquet
    - domains.parquet
    - domains_metadata.parquet
    - ncrna_metadata.parquet
  - tree.nwk
  - *.html
- assembly_list.txt
- all_neigh.tsv
- records.csv
- tree.nwk
- pairwise_aa.tsv
- nt_links.tsv
- aai_matrix.tsv
- ani_matrix.tsv
- domains.tsv

Core Outputs

These files are always produced:

Path	Description
`assembly_list.txt`	Assembly accessions selected for processing
`all_neigh.tsv`	Neighborhood coordinates and identifiers for each target
`records.csv`	Input records enriched with taxonomy and metadata
`neighborhood/neighborhoods.fasta`	Extracted neighborhood nucleotide sequences

The records.csv file contains your original input columns plus all metadata added during the pipeline, including taxonomy, assembly info, and any custom columns you provided.
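
To see which columns the pipeline added, you can compare the header of records.csv with the columns of your own input. The following is a minimal sketch using only the Python standard library; the function name `split_columns` and the column names in the usage comment are hypothetical and not part of Hoodini.

```python
import csv


def split_columns(records_lines, input_columns):
    """Split the records.csv header into (original, added) column names.

    records_lines: an open file handle or any iterable of CSV lines.
    input_columns: the column names you supplied in your own input.
    """
    reader = csv.DictReader(records_lines)
    header = reader.fieldnames or []
    known = set(input_columns)
    original = [c for c in header if c in known]
    added = [c for c in header if c not in known]
    return original, added


# Usage (column names below are placeholders, adjust to your input):
# with open("records.csv", newline="") as fh:
#     original, added = split_columns(fh, ["protein_id", "my_label"])
#     print("added by the pipeline:", added)
```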

Assembly Folder

Downloaded genome files (when not using local assemblies):

- - GCA_000001234.1_genomic.gbff.gz
  - GCA_000001234.1_genomic.fna.gz
  - GCA_000001234.1_genomic.gff.gz
  - GCA_000001234.1_protein.faa.gz

Comparative Analysis Outputs

These files are produced depending on your configuration:

Protein Links

pairwise_aa.tsv - Produced with --prot-links or when the tree mode is aai

Contains protein-protein similarity scores:

Column	Description
`query_id`	Source protein ID
`target_id`	Target protein ID
`pident`	Percent identity
`evalue`	E-value
`bitscore`	Bit score
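
As an illustration of how these columns can be used, the sketch below keeps only the stronger links. It assumes the file is tab-separated with a header row that uses the column names above, and that `pident` is on a 0-100 scale; the thresholds and the function name `filter_links` are arbitrary choices for the example.

```python
import csv


def filter_links(lines, min_pident=30.0, max_evalue=1e-5):
    """Return (query_id, target_id, bitscore) for links passing both cutoffs.

    lines: an open file handle or any iterable of tab-separated lines,
    with a header row (assumed, not confirmed by the documentation).
    """
    kept = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if float(row["pident"]) >= min_pident and float(row["evalue"]) <= max_evalue:
            kept.append((row["query_id"], row["target_id"], float(row["bitscore"])))
    return kept


# Usage:
# with open("pairwise_aa.tsv", newline="") as fh:
#     strong = filter_links(fh, min_pident=50.0)
```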

Nucleotide Links

nt_links.tsv - Produced with --nt-links

Contains nucleotide alignments for synteny visualization:

Column	Description
`query_hood`	Source neighborhood ID
`target_hood`	Target neighborhood ID
`query_start`	Start position in query
`query_end`	End position in query
`target_start`	Start position in target
`target_end`	End position in target
`identity`	Alignment identity
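
One way to summarize this table is to total the aligned length for each pair of neighborhoods. The sketch below assumes a tab-separated file with a header row and 1-based, inclusive coordinates; neither is confirmed here, so check a few rows first. Overlapping alignments are counted more than once, and `aligned_length_per_pair` is a hypothetical helper name.

```python
import csv
from collections import defaultdict


def aligned_length_per_pair(lines):
    """Sum query-side alignment lengths for each (query_hood, target_hood)."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines, delimiter="\t"):
        start = int(row["query_start"])
        end = int(row["query_end"])
        # abs() tolerates reverse-orientation hits where end < start.
        # The +1 assumes inclusive coordinates.
        totals[(row["query_hood"], row["target_hood"])] += abs(end - start) + 1
    return dict(totals)


# Usage:
# with open("nt_links.tsv", newline="") as fh:
#     for pair, length in aligned_length_per_pair(fh).items():
#         print(pair, length)
```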

Distance Matrices

Produced when the tree mode is aai or ani:

aai_matrix.tsv - Average Amino acid Identity matrix
ani_matrix.tsv - Average Nucleotide Identity matrix

Each file is a square matrix with pairwise distances between all neighborhoods.
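
A matrix file can be loaded into a nested dictionary for quick lookups. The exact layout is not specified here, so this sketch assumes a common one: the first row lists the neighborhood IDs after a leading empty or label cell, and each following row starts with its own ID. The helper names are hypothetical.

```python
import csv


def read_square_matrix(lines):
    """Parse a labeled square TSV matrix into (ids, {row_id: {col_id: value}})."""
    rows = [r for r in csv.reader(lines, delimiter="\t") if r]
    ids = rows[0][1:]  # skip the corner cell of the header row
    matrix = {}
    for row in rows[1:]:
        matrix[row[0]] = {col: float(v) for col, v in zip(ids, row[1:])}
    return ids, matrix


def is_symmetric(ids, matrix, tol=1e-9):
    """Check that value(a, b) equals value(b, a) for every pair of IDs."""
    return all(abs(matrix[a][b] - matrix[b][a]) <= tol for a in ids for b in ids)


# Usage:
# with open("aai_matrix.tsv", newline="") as fh:
#     ids, m = read_square_matrix(fh)
#     print(is_symmetric(ids, m))
```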

Annotation Outputs

Domains

domains.tsv - Produced with --domains

Column	Description
`protein_id`	Protein accession
`domain`	Domain name
`database`	Source database (Pfam, TIGRFAMs, COG)
`start`	Domain start position
`end`	Domain end position
`evalue`	E-value
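
These columns are enough to reconstruct a simple domain architecture per protein, that is, the domain names ordered by start position. The sketch below assumes a tab-separated file with a header row matching the column names above; the E-value cutoff and the function name `domain_architecture` are illustrative choices.

```python
import csv
from collections import defaultdict


def domain_architecture(lines, max_evalue=1e-3):
    """Map each protein_id to its domain names, ordered by start position."""
    hits = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        if float(row["evalue"]) <= max_evalue:
            hits[row["protein_id"]].append((int(row["start"]), row["domain"]))
    return {pid: [name for _, name in sorted(found)] for pid, found in hits.items()}


# Usage:
# with open("domains.tsv", newline="") as fh:
#     for protein, domains in domain_architecture(fh).items():
#         print(protein, " > ".join(domains))
```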

CRISPR-Cas

cctyper/ - Produced with --cctyper

- crispr_arrays.tsv
- cas_operons.tsv
- spacers.fasta
- repeats.fasta

Contains CRISPR array predictions and Cas protein classifications.

ncRNA

ncrna/ - Produced with --ncrna

- infernal_results.tblout
- ncrna_summary.tsv

Non-coding RNA predictions from Infernal searches against Rfam.

Mobile Elements

genomad/ - Produced with --genomad

- virus_summary.tsv
- plasmid_summary.tsv
- provirus_summary.tsv

Mobile genetic element predictions (viruses, plasmids, proviruses).

Visualization Bundle

The hoodini-viz/ folder contains everything needed for the interactive viewer:

- - gff.parquet
  - hoods.parquet
  - protein_metadata.parquet
  - tree_metadata.parquet
  - domains.parquet
  - nucleotide_links.parquet
  - protein_links.parquet
- tree.nwk
- hoodini-viz.html

Parquet Files

Parquet is a compact, columnar binary format. The viewer reads the following files:

File	Contents
`gff.parquet`	Gene annotations for all neighborhoods
`hoods.parquet`	Neighborhood metadata and sequences
`protein_metadata.parquet`	Protein info including clusters and domains
`tree_metadata.parquet`	Tree leaf metadata (taxonomy, custom columns)
`domains.parquet`	Domain annotations (if `--domains` used)
`nucleotide_links.parquet`	Synteny links (if `--nt-links` used)
`protein_links.parquet`	Protein similarity links (if `--prot-links` used)

TSV Files

Human-readable text versions for manual inspection:

File	Contents
`gff.gff`	Standard GFF3 format annotations
`hoods.txt`	Neighborhood coordinates
`protein_metadata.txt`	Protein annotations
`tree_metadata.txt`	Leaf metadata

Viewer HTML

hoodini-viz.html - Self-contained HTML file that loads the parquet files and displays the interactive visualization.

Tip: You can share the entire hoodini-viz/ folder. The HTML file loads data from the parquet files, so keep them together!
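
Before sharing the folder, it can help to confirm that the expected files are present. The sketch below searches the folder recursively by file name, so it does not depend on the exact subfolder layout. Treating the files without an "if ... used" condition as required is an assumption made for this example, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed required: files listed without an "if ... used" condition.
REQUIRED = (
    "hoodini-viz.html",
    "gff.parquet",
    "hoods.parquet",
    "protein_metadata.parquet",
    "tree_metadata.parquet",
)
# Present only when the matching option or tree mode was used.
OPTIONAL = (
    "domains.parquet",
    "nucleotide_links.parquet",
    "protein_links.parquet",
    "tree.nwk",
)


def list_file_names(folder):
    """Collect the names of all files found anywhere under folder."""
    return {p.name for p in Path(folder).rglob("*") if p.is_file()}


def check_bundle(names):
    """Return (missing_required, absent_optional) for a set of file names."""
    names = set(names)
    missing = [f for f in REQUIRED if f not in names]
    absent_optional = [f for f in OPTIONAL if f not in names]
    return missing, absent_optional


# Usage:
# missing, absent = check_bundle(list_file_names("hoodini-viz"))
# if missing:
#     print("bundle is incomplete:", missing)
```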

Tree File

tree.nwk - Newick format phylogenetic tree

Produced when tree_mode is not none. The tree type depends on your setting:

taxonomy: Tree based on NCBI taxonomy hierarchy
aai: Tree based on Average Amino acid Identity distances
ani: Tree based on Average Nucleotide Identity distances

Leaf names correspond to neighborhood IDs (the uid column in other files).
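
To check that every leaf maps to a known neighborhood, you can extract the leaf names and compare them with the uid values from another output file. This is a minimal sketch using a regular expression rather than a full Newick parser: it assumes unquoted labels without parentheses, commas, colons, or semicolons, and a tree with at least two leaves. For anything more complex, a dedicated tree library is the safer choice. The function names are hypothetical.

```python
import re

# A leaf label directly follows "(" or ","; internal node labels follow ")".
_LEAF = re.compile(r"[(,]\s*([^(),:;]+)")


def newick_leaves(newick):
    """Return leaf names in the order they appear in a Newick string."""
    names = (m.group(1).strip() for m in _LEAF.finditer(newick))
    return [n for n in names if n]


def unmatched_leaves(newick, uids):
    """Return leaf names that are not present in the given uid collection."""
    known = set(uids)
    return [leaf for leaf in newick_leaves(newick) if leaf not in known]


# Usage:
# with open("tree.nwk") as fh:
#     print(unmatched_leaves(fh.read(), known_uids))
```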