Outputs

This page describes the files and folders produced by Hoodini and how to interpret them.

Output Directory Structure

        • gff.parquet
        • hoods.parquet
        • protein_metadata.parquet
        • tree_metadata.parquet
        • protein_links.parquet
        • nucleotide_links.parquet
        • domains.parquet
        • domains_metadata.parquet
        • ncrna_metadata.parquet
      • tree.nwk
      • *.html
    • assembly_list.txt
    • all_neigh.tsv
    • records.csv
    • tree.nwk
    • pairwise_aa.tsv
    • nt_links.tsv
    • aai_matrix.tsv
    • ani_matrix.tsv
    • domains.tsv

Core Outputs

These files are always produced:

Path                               Description
assembly_list.txt                  Assembly accessions selected for processing
all_neigh.tsv                      Neighborhood coordinates and identifiers for each target
records.csv                        Input records enriched with taxonomy and metadata
neighborhood/neighborhoods.fasta   Extracted neighborhood nucleotide sequences

The records.csv file contains your original input columns plus all metadata added during the pipeline, including taxonomy, assembly info, and any custom columns you provided.
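
As a quick sanity check after a run, the core files listed above can be verified with a few lines of Python. This is a minimal sketch, not part of Hoodini: the function name `missing_core_outputs` and the `results` directory are hypothetical, and the relative paths are taken directly from the table above.

```python
from pathlib import Path

# Relative paths of the core outputs, as listed in the table above.
CORE_OUTPUTS = (
    "assembly_list.txt",
    "all_neigh.tsv",
    "records.csv",
    "neighborhood/neighborhoods.fasta",
)


def missing_core_outputs(output_dir):
    """Return the core output paths that are absent from output_dir."""
    root = Path(output_dir)
    return [rel for rel in CORE_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    # "results" is a placeholder for your own output directory.
    missing = missing_core_outputs("results")
    if missing:
        print("Missing core outputs:", ", ".join(missing))
    else:
        print("All core outputs are present.")
```

An empty list means all four files were found; anything else points to a run that stopped early or to a wrong output path.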


Assembly Folder

Downloaded genome files (when not using local assemblies):

      • GCA_000001234.1_genomic.gbff.gz
      • GCA_000001234.1_genomic.fna.gz
      • GCA_000001234.1_genomic.gff.gz
      • GCA_000001234.1_protein.faa.gz

Comparative Analysis Outputs

These files are produced depending on your configuration:

pairwise_aa.tsv - Produced with --prot-links or when tree_mode is aai

Contains protein-protein similarity scores:

Column      Description
query_id    Source protein ID
target_id   Target protein ID
pident      Percent identity
evalue      E-value
bitscore    Bit score
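
The columns above make it easy to filter the table for strong matches. The sketch below uses only the Python standard library and rests on several assumptions that the page does not state: that the file is tab-separated, that it has a header row with the column names listed above, and that pident is on a 0-100 scale. The function name `strong_hits`, the thresholds, and the sample rows are illustrative only.

```python
import csv
import io


def strong_hits(handle, min_pident=70.0, max_evalue=1e-5):
    """Yield rows that meet both the identity and the E-value threshold.

    handle is any open text stream over a tab-separated table with a
    header row (assumed layout of pairwise_aa.tsv).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if float(row["pident"]) >= min_pident and float(row["evalue"]) <= max_evalue:
            yield row


# Tiny in-memory stand-in for pairwise_aa.tsv (made-up values).
sample = io.StringIO(
    "query_id\ttarget_id\tpident\tevalue\tbitscore\n"
    "protA\tprotB\t92.5\t1e-50\t310\n"
    "protA\tprotC\t35.0\t1e-3\t48\n"
)
hits = list(strong_hits(sample))
print([(h["query_id"], h["target_id"]) for h in hits])
```

For a real run, replace the in-memory sample with `open("pairwise_aa.tsv", newline="")` and adjust the thresholds to your data.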

Annotation Outputs

domains.tsv - Produced with --domains

Column       Description
protein_id   Protein accession
domain       Domain name
database     Source database (Pfam, TIGRfam, COG)
start        Domain start position
end          Domain end position
evalue       E-value
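
One common use of this table is to reconstruct the domain architecture of each protein, that is, its domains ordered by position. The sketch below is an illustration under assumptions not confirmed by this page: a tab-separated file with a header row using the column names above, and integer start and end values. The function name `domains_by_protein` and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def domains_by_protein(handle):
    """Map each protein_id to its domains, sorted by start position.

    handle is any open text stream over a tab-separated table with a
    header row (assumed layout of domains.tsv).
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[row["protein_id"]].append(
            (int(row["start"]), int(row["end"]), row["domain"], row["database"])
        )
    return {protein: sorted(hits) for protein, hits in grouped.items()}


# Tiny in-memory stand-in for domains.tsv (made-up values).
sample = io.StringIO(
    "protein_id\tdomain\tdatabase\tstart\tend\tevalue\n"
    "protA\tDomainY\tPfam\t150\t260\t1e-20\n"
    "protA\tDomainX\tPfam\t5\t120\t1e-30\n"
    "protB\tDomainZ\tCOG\t10\t90\t1e-8\n"
)
architecture = domains_by_protein(sample)
for protein, hits in architecture.items():
    print(protein, [name for _, _, name, _ in hits])
```

Sorting the tuples orders domains by start coordinate first, so the printed list reads from the N-terminus to the C-terminus of each protein.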

Visualization Bundle

The hoodini-viz/ folder contains everything needed for the interactive viewer:

      • gff.parquet
      • hoods.parquet
      • protein_metadata.parquet
      • tree_metadata.parquet
      • domains.parquet
      • nucleotide_links.parquet
      • protein_links.parquet
    • tree.nwk
    • hoodini-viz.html

Parquet Files

Efficient binary format used by the viewer:

File                       Contents
gff.parquet                Gene annotations for all neighborhoods
hoods.parquet              Neighborhood metadata and sequences
protein_metadata.parquet   Protein info including clusters and domains
tree_metadata.parquet      Tree leaf metadata (taxonomy, custom columns)
domains.parquet            Domain annotations (if --domains used)
nucleotide_links.parquet   Synteny links (if --nt-links used)
protein_links.parquet      Protein similarity links (if --prot-links used)

TSV Files

Human-readable versions for inspection:

File                   Contents
gff.gff                Standard GFF3 format annotations
hoods.txt              Neighborhood coordinates
protein_metadata.txt   Protein annotations
tree_metadata.txt      Leaf metadata

Viewer HTML

hoodini-viz.html - Self-contained HTML file that loads the parquet files and displays the interactive visualization.

Tip: You can share the entire hoodini-viz/ folder. The HTML file loads data from the parquet files, so keep them together!


Tree File

tree.nwk - Newick format phylogenetic tree

Produced when tree_mode is not none. The tree type depends on your setting:

  • taxonomy: Tree based on NCBI taxonomy hierarchy
  • aai: Tree based on Average Amino Acid Identity (AAI) distances
  • ani: Tree based on Average Nucleotide Identity distances

The leaf names correspond to neighborhood IDs (uid column in other files).
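
Because leaf names are meant to match the uid values used elsewhere, it can be useful to list the leaves of tree.nwk and compare them against a set of uids. The sketch below is a deliberately small parser, not a full Newick implementation: it assumes simple unquoted labels without spaces, and the function names, the example tree, and the uid values are all hypothetical.

```python
import re


def leaf_names(newick):
    """Return leaf labels from a Newick string with simple unquoted labels.

    A leaf label directly follows '(' or ',' and runs until the next
    ':', ',', ')' or ';'. Labels that follow ')' belong to internal
    nodes and are skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def leaves_without_uid(newick, uids):
    """Return leaf names that do not appear in the given uid collection."""
    known = set(uids)
    return [name for name in leaf_names(newick) if name not in known]


# Made-up tree and uid values for illustration.
tree = "((hood_1:0.1,hood_2:0.2):0.3,hood_3:0.4);"
print(leaf_names(tree))
print(leaves_without_uid(tree, ["hood_1", "hood_3"]))
```

For trees with quoted labels or other Newick features, a dedicated phylogenetics library is the safer choice.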
