Outputs
This page describes the files and folders produced by Hoodini and how to interpret them.
Output Directory Structure
- hoodini-viz/
  - gff.parquet
  - hoods.parquet
  - protein_metadata.parquet
  - tree_metadata.parquet
  - protein_links.parquet
  - nucleotide_links.parquet
  - domains.parquet
  - domains_metadata.parquet
  - ncrna_metadata.parquet
  - tree.nwk
  - *.html
- assembly_list.txt
- all_neigh.tsv
- records.csv
- tree.nwk
- pairwise_aa.tsv
- nt_links.tsv
- aai_matrix.tsv
- ani_matrix.tsv
- domains.tsv
Core Outputs
These files are always produced:
| Path | Description |
|---|---|
| assembly_list.txt | Assembly accessions selected for processing |
| all_neigh.tsv | Neighborhood coordinates and identifiers for each target |
| records.csv | Input records enriched with taxonomy and metadata |
| neighborhood/neighborhoods.fasta | Extracted neighborhood nucleotide sequences |
The records.csv file contains your original input columns plus all metadata added by the pipeline, including taxonomy, assembly information, and any custom columns you provided.
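As a quick check of what the pipeline added, the sketch below compares the header of records.csv with the columns of your original input. It is a minimal sketch, assuming records.csv is comma-separated with a header row; the function name `added_columns` and the example column names are hypothetical.

```python
import csv
import io
from typing import List, Sequence, TextIO


def added_columns(input_columns: Sequence[str], records: TextIO) -> List[str]:
    """Return the records.csv columns that were not in the original input.

    Assumes records.csv is comma-separated and starts with a header row.
    """
    header = next(csv.reader(records))
    original = set(input_columns)
    # Keep the records.csv column order so the output is easy to scan.
    return [col for col in header if col not in original]


if __name__ == "__main__":
    # Hypothetical example: the input had two columns, the rest were added.
    demo = io.StringIO("protein_id,my_label,species,assembly\nWP_1,x,E. coli,GCF_1\n")
    print(added_columns(["protein_id", "my_label"], demo))
```

On a real run, pass an open file instead, e.g. `with open("records.csv", newline="") as fh: added_columns(my_columns, fh)`.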
Assembly Folder
Downloaded genome files (when not using local assemblies), shown here for the example accession GCA_000001234.1:
- GCA_000001234.1_genomic.gbff.gz
- GCA_000001234.1_genomic.fna.gz
- GCA_000001234.1_genomic.gff.gz
- GCA_000001234.1_protein.faa.gz
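These downloads use standard NCBI file formats, so ordinary tools can read them. As an illustration, here is a minimal standard-library reader for the gzip-compressed protein FASTA (`*_protein.faa.gz`). The function name `read_fasta_gz` is hypothetical; Biopython's `SeqIO` is a more complete alternative.

```python
import gzip
import os
import tempfile
from typing import Iterator, List, Optional, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


if __name__ == "__main__":
    # Self-contained demo: write a tiny file named like a downloaded proteome.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "GCA_000001234.1_protein.faa.gz")
        with gzip.open(path, "wt") as out:
            out.write(">prot1 example protein\nMKV\nLLA\n>prot2\nMSS\n")
        for name, seq in read_fasta_gz(path):
            print(name, len(seq))
```

Reading line by line through `gzip.open(..., "rt")` avoids decompressing whole proteomes into memory.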
Comparative Analysis Outputs
These files are produced depending on your configuration:
Protein Links
pairwise_aa.tsv - Produced with --prot-links or aai tree mode
Contains protein-protein similarity scores:
| Column | Description |
|---|---|
| query_id | Source protein ID |
| target_id | Target protein ID |
| pident | Percent identity |
| evalue | E-value |
| bitscore | Bit score |
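To keep only strong hits, you can filter pairwise_aa.tsv on identity and E-value. This sketch assumes the file is tab-separated with a header row naming the columns above, and that pident is on a 0 to 100 scale; the cutoffs and the function name `filter_hits` are illustrative, not Hoodini defaults.

```python
import csv
import io
from typing import Iterator, TextIO


def filter_hits(handle: TextIO, min_pident: float = 50.0,
                max_evalue: float = 1e-5) -> Iterator[dict]:
    """Yield pairwise_aa.tsv rows that pass both cutoffs.

    Assumes tab-separated values with a header row naming the documented
    columns (query_id, target_id, pident, evalue, bitscore).
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["pident"]) >= min_pident and float(row["evalue"]) <= max_evalue:
            yield row


if __name__ == "__main__":
    demo = io.StringIO(
        "query_id\ttarget_id\tpident\tevalue\tbitscore\n"
        "A\tB\t80.0\t1e-30\t150\n"
        "A\tC\t30.0\t1e-2\t20\n"
    )
    for hit in filter_hits(demo):
        print(hit["query_id"], hit["target_id"], hit["bitscore"])
```

Using a generator keeps memory flat even when the all-vs-all comparison produces many rows.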
Annotation Outputs
Domains
domains.tsv - Produced with --domains
| Column | Description |
|---|---|
| protein_id | Protein accession |
| domain | Domain name |
| database | Source database (Pfam, TIGRfam, COG) |
| start | Domain start position |
| end | Domain end position |
| evalue | E-value |
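One common use of domains.tsv is summarizing each protein's domain architecture. A minimal sketch, assuming a tab-separated file with a header row using the column names above; `domain_architectures` and its E-value cutoff are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def domain_architectures(handle: TextIO, max_evalue: float = 1e-3) -> Dict[str, List[str]]:
    """Map each protein_id to its domain names ordered by start position.

    Assumes a header row naming protein_id, domain, database, start, end, evalue.
    """
    hits = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["evalue"]) <= max_evalue:
            hits[row["protein_id"]].append((int(row["start"]), row["domain"]))
    # Sort each protein's domains from N- to C-terminus and keep only the names.
    return {pid: [name for _, name in sorted(doms)] for pid, doms in hits.items()}


if __name__ == "__main__":
    demo = io.StringIO(
        "protein_id\tdomain\tdatabase\tstart\tend\tevalue\n"
        "P1\tB_dom\tPfam\t120\t200\t1e-10\n"
        "P1\tA_dom\tPfam\t5\t80\t1e-20\n"
    )
    for pid, arch in domain_architectures(demo).items():
        print(pid, " - ".join(arch))
```

Because hits from all databases are listed together, overlapping domains from Pfam, TIGRfam and COG will all appear; filter on the database column first if you want a single source.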
Visualization Bundle
The hoodini-viz/ folder contains everything needed for the interactive viewer:
- gff.parquet
- hoods.parquet
- protein_metadata.parquet
- tree_metadata.parquet
- domains.parquet
- nucleotide_links.parquet
- protein_links.parquet
- tree.nwk
- hoodini-viz.html
Parquet Files
Efficient binary format used by the viewer:
| File | Contents |
|---|---|
| gff.parquet | Gene annotations for all neighborhoods |
| hoods.parquet | Neighborhood metadata and sequences |
| protein_metadata.parquet | Protein info including clusters and domains |
| tree_metadata.parquet | Tree leaf metadata (taxonomy, custom columns) |
| domains.parquet | Domain annotations (if --domains used) |
| nucleotide_links.parquet | Synteny links (if --nt-links used) |
| protein_links.parquet | Protein similarity links (if --prot-links used) |
TSV Files
Human-readable versions for inspection:
| File | Contents |
|---|---|
| gff.gff | Standard GFF3 format annotations |
| hoods.txt | Neighborhood coordinates |
| protein_metadata.txt | Protein annotations |
| tree_metadata.txt | Leaf metadata |
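Because gff.gff follows standard GFF3, any GFF3 reader works with it. Below is a minimal standard-library parser following the GFF3 layout (nine tab-separated columns, `key=value` attributes separated by `;`, percent-encoded special characters). Which attribute keys Hoodini writes into column 9 is not documented here, so the example attributes and the name `parse_gff3` are hypothetical.

```python
import io
from typing import Dict, Iterator, TextIO
from urllib.parse import unquote


def parse_gff3(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per feature line of a GFF3 file.

    Comment and directive lines (starting with '#') are skipped, and
    parsing stops at an embedded ##FASTA section.
    """
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # only sequences follow; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
        attributes = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key] = unquote(value)  # decode e.g. %3B -> ';'
        yield {"seqid": seqid, "source": source, "type": ftype,
               "start": int(start), "end": int(end), "strand": strand,
               "attributes": attributes}


if __name__ == "__main__":
    demo = io.StringIO(
        "##gff-version 3\n"
        "ctg1\tRefSeq\tCDS\t100\t400\t.\t+\t0\tID=cds1;product=example protein\n"
    )
    for feat in parse_gff3(demo):
        print(feat["type"], feat["start"], feat["end"], feat["attributes"])
```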
Viewer HTML
hoodini-viz.html - HTML viewer that loads the parquet files from the same folder and displays the interactive visualization.
Tip: You can share the entire hoodini-viz/ folder. The HTML file loads data from the parquet files, so keep them together!
Tree File
tree.nwk - Newick format phylogenetic tree
Produced when tree_mode is not none. The tree type depends on your setting:
- taxonomy: Tree based on NCBI taxonomy hierarchy
- aai: Tree based on Average Amino Acid Identity distances
- ani: Tree based on Average Nucleotide Identity distances
The leaf names correspond to neighborhood IDs (uid column in other files).
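To confirm that every leaf in tree.nwk has matching metadata, you can extract the leaf names and compare them with the uid values from another output file. This is a simplified regex-based sketch that assumes unquoted leaf labels without spaces; `newick_leaves` and `missing_leaves` are hypothetical helpers, and a full Newick parser such as Biopython's `Bio.Phylo` is more robust.

```python
import re
from typing import List, Set

# A leaf label is the text that directly follows "(" or "," and runs until
# the next "(", ")", ",", ":", ";" or whitespace. Internal node labels (which
# follow ")") are not matched. Quoted labels are not handled.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_leaves(newick: str) -> List[str]:
    """Return leaf names from a Newick string, in left-to-right order."""
    return _LEAF.findall(newick)


def missing_leaves(newick: str, uids: Set[str]) -> Set[str]:
    """Return tree leaves that have no matching uid in a metadata table."""
    return set(newick_leaves(newick)) - uids


if __name__ == "__main__":
    tree = "((uid_1:0.1,uid_2:0.2)95:0.3,uid_3:0.4);"
    print(newick_leaves(tree))
    print(missing_leaves(tree, {"uid_1", "uid_2"}))
```

How you collect the uid values depends on the file; for a tab-separated file with a header, `{row["uid"] for row in csv.DictReader(fh, delimiter="\t")}` is one option.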