
Use Cases

This page covers the most common analysis scenarios you can run with Hoodini Colab.

Single Protein Analysis

When you want to explore the genomic neighborhood of a specific protein.

Scenario: You found an interesting protein in a paper and want to see what genes surround it across different organisms.

Configuration:

  1. Select Single Input mode
  2. Enter an NCBI protein ID (e.g., WP_012345678.1)
  3. Configure Remote BLAST settings:
    • E-value: 1e-10 (default; lower values are more stringent)
    • Max targets: 100 (how many homologs to retrieve)
```
Input: WP_012345678.1
Mode: Single Input
Remote BLAST E-value: 1e-10
Max targets: 100
```

Single Input mode uses remote BLAST to automatically find homologous sequences. This only works with NCBI protein IDs.
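Because Single Input mode only accepts NCBI protein IDs, it can help to check an ID's shape before starting a run. The sketch below is a hypothetical pre-flight check, not part of Hoodini: the helper name `looks_like_protein_id` is illustrative, and the pattern is an assumption covering versioned RefSeq protein accessions (such as `WP_` or `NP_`) and GenBank protein accessions (three letters plus five or seven digits). It checks format only, not whether the record exists.

```python
import re

# Assumed accession patterns (not taken from Hoodini): RefSeq protein
# prefixes such as WP_ or NP_, or GenBank protein accessions of three
# letters followed by five or seven digits, each with a version suffix.
PROTEIN_ACCESSION = re.compile(r"^(?:[ANWXYZ]P_\d+|[A-Z]{3}\d{5}(?:\d{2})?)\.\d+$")


def looks_like_protein_id(accession: str) -> bool:
    """Return True if the string resembles a versioned NCBI protein accession."""
    return bool(PROTEIN_ACCESSION.match(accession.strip()))


if __name__ == "__main__":
    # A nucleotide accession such as NZ_CP000001.1 should be rejected here.
    for acc in ["WP_012345678.1", "NZ_CP000001.1", "AAA12345.1"]:
        print(acc, looks_like_protein_id(acc))
```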

Custom Homolog List

When you already have a curated list of sequences to compare.

Scenario: You ran your own BLAST search and want to analyze specific sequences rather than letting Hoodini choose.

Configuration:

  1. Select Input List mode
  2. Paste your IDs (one per line):
```
WP_000000001.1
WP_000000002.1
WP_000000003.1
NZ_CP000001.1
```

Unlike Single Input mode, you can mix NCBI protein IDs and nucleotide IDs in this mode.
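Since Input List mode accepts a mix of protein and nucleotide IDs, a quick local check can show how a pasted list breaks down before you run it. This is a minimal sketch under stated assumptions: `split_ids` is a hypothetical helper, and it classifies entries only by common RefSeq prefixes, so GenBank accessions without an underscore land in `unknown`.

```python
# Assumed RefSeq prefixes; GenBank accessions without an underscore are
# not classified by this sketch and end up in "unknown".
PROTEIN_PREFIXES = ("AP_", "NP_", "WP_", "XP_", "YP_", "ZP_")
NUCLEOTIDE_PREFIXES = ("AC_", "NC_", "NG_", "NT_", "NW_", "NZ_",
                       "NM_", "NR_", "XM_", "XR_")


def split_ids(pasted: str) -> dict[str, list[str]]:
    """Group one-per-line accessions by their RefSeq prefix."""
    groups: dict[str, list[str]] = {"protein": [], "nucleotide": [], "unknown": []}
    for line in pasted.splitlines():
        acc = line.strip()
        if not acc:
            continue  # skip blank lines left over from pasting
        if acc.startswith(PROTEIN_PREFIXES):
            groups["protein"].append(acc)
        elif acc.startswith(NUCLEOTIDE_PREFIXES):
            groups["nucleotide"].append(acc)
        else:
            groups["unknown"].append(acc)
    return groups


pasted = "WP_000000001.1\nWP_000000002.1\nWP_000000003.1\nNZ_CP000001.1\n"
print(split_ids(pasted))
```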

Analyzing Non-Coding Regions

For CRISPR arrays, regulatory elements, or genomic islands.

Scenario: You want to compare CRISPR arrays or other non-coding genomic regions.

Configuration:

  1. Select Input List mode
  2. Enter nucleotide accessions (e.g., NZ_CP000001.1)
  3. In Neighborhood Window:
    • Set appropriate window sizes for your elements
    • Consider larger windows for genomic islands
```
Input: NZ_CP000001.1, NZ_CP000002.1
Window upstream: 15000
Window downstream: 15000
```
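To pick window sizes, it helps to see which coordinates a window covers. The sketch below assumes windows are measured outward from the edges of the element, that "upstream" follows the element's strand (so on the minus strand it lies at higher coordinates), and that the result is clamped to the contig. Hoodini's exact convention is not stated on this page, and `neighborhood` is a hypothetical helper.

```python
def neighborhood(start: int, end: int, strand: str, upstream: int,
                 downstream: int, contig_length: int) -> tuple[int, int]:
    """Return 1-based inclusive (left, right) bounds of the window (assumed convention)."""
    if strand == "+":
        left, right = start - upstream, end + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        left, right = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp to the contig so the window never runs off either end.
    return max(1, left), min(contig_length, right)


# A 1 kb element with 15 kb windows on both sides, as in the example above.
print(neighborhood(20000, 21000, "+", 15000, 15000, 5_000_000))
```

Under these assumptions, a 15 kb window on each side of a 1 kb element yields a region of about 31 kb, which is why larger windows are worth considering for genomic islands that can span tens of kilobases.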

Defense System Survey

Comprehensive scan for antiphage defense systems.

Scenario: You want to catalog all defense systems in the neighborhoods around your genes of interest.

Configuration:

  1. Set up your input (any mode)
  2. Enable annotation tools:
    • PADLOC — Antiphage defense systems
    • DefenseFinder — Defense system detection
    • CCtyper — CRISPR-Cas systems
    • geNomad — Mobile genetic elements

First-time setup: Each tool requires downloading its database. Allow extra time (~2-5 min per tool) on the first run.
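As a rough budget for the first run, multiply the quoted 2-5 minutes by the number of tools you enable. This tiny sketch only does that arithmetic; `first_run_setup` is a hypothetical helper and the per-tool range is the approximate figure given above, not a measured value.

```python
# Approximate per-tool database download time in minutes (from the note above).
SETUP_MINUTES = (2, 5)


def first_run_setup(tools: list[str]) -> tuple[int, int]:
    """Estimate (min, max) minutes of first-run database downloads."""
    n = len(set(tools))  # count each tool once
    low, high = SETUP_MINUTES
    return low * n, high * n


print(first_run_setup(["PADLOC", "DefenseFinder", "CCtyper", "geNomad"]))  # (8, 20)
```

With all four tools enabled, expect roughly 8 to 20 minutes of extra setup before the analysis starts.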

Phylogenetic Context Analysis

Add evolutionary context with tree construction.

Scenario: You want to visualize how genomic neighborhoods vary across a phylogenetic tree.

Configuration:

  1. Set up your input
  2. In Tree Construction, select a method. The fastest option, taxonomy_tree, builds the tree from NCBI taxonomy; AAI- and ANI-based trees are slower alternatives:
```
Tree mode: taxonomy_tree
```

Custom Coordinates (Input Sheet)

For precise control over which genomic regions to analyze.

Scenario: You have specific coordinates from your own analysis or want to reproduce exact regions.

Configuration:

  1. Select Input Sheet mode
  2. Fill in the table with:
    • protein_id or nucleotide_id
    • assembly_id
    • start and end coordinates
    • strand (+/-)

Example table format:

| protein_id | assembly_id | start | end | strand |
|---|---|---|---|---|
| WP_000001.1 | GCF_000001.1 | 10000 | 25000 | + |
| WP_000002.1 | GCF_000002.1 | 50000 | 65000 | - |

You can also paste TSV data directly into the table.
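If you prepare coordinates in a script, you can check each row and emit TSV ready to paste into the table. This is a sketch with hypothetical helpers (`validate_row`, `to_tsv`). It assumes `start` is always smaller than `end` regardless of strand, as in the example rows above, and it writes only the columns shown in that example.

```python
import csv
import io

COLUMNS = ["protein_id", "assembly_id", "start", "end", "strand"]


def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in one Input Sheet row (empty if OK)."""
    problems = []
    if not row.get("protein_id") and not row.get("nucleotide_id"):
        problems.append("needs protein_id or nucleotide_id")
    if not row.get("assembly_id"):
        problems.append("missing assembly_id")
    start, end = int(row["start"]), int(row["end"])
    if not 0 < start < end:
        problems.append("start must be positive and smaller than end")
    if row.get("strand") not in ("+", "-"):
        problems.append("strand must be + or -")
    return problems


def to_tsv(rows: list[dict]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"protein_id": "WP_000001.1", "assembly_id": "GCF_000001.1",
     "start": 10000, "end": 25000, "strand": "+"},
    {"protein_id": "WP_000002.1", "assembly_id": "GCF_000002.1",
     "start": 50000, "end": 65000, "strand": "-"},
]
for row in rows:
    assert not validate_row(row), validate_row(row)
print(to_tsv(rows))
```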

Comparing Gene Clusters

Visualize conserved operons across species.

Scenario: You’re studying a biosynthetic gene cluster and want to see how it varies across organisms.

Configuration:

  1. Use Input List mode with proteins from the cluster
  2. Enable Protein Links to see sequence similarity
  3. Set Clustering to group similar genes:
    • Identity threshold: 0.3 (30%)
    • Coverage threshold: 0.8 (80%)
  4. Enable a tree method to order genomes
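To get a feel for what the 0.3 identity and 0.8 coverage thresholds admit, the sketch below applies them to a single pairwise alignment. The definitions are assumptions, not Hoodini's documented ones: identity is identical positions over alignment columns, and coverage must reach the threshold on both sequences (a bidirectional rule, similar to the coverage modes offered by common clustering tools). `Hit` and `passes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    identical: int         # identical aligned positions
    alignment_length: int  # alignment columns, including gaps
    query_span: int        # query residues covered by the alignment
    query_length: int
    target_span: int       # target residues covered by the alignment
    target_length: int


def passes(hit: Hit, min_identity: float = 0.3, min_coverage: float = 0.8) -> bool:
    """Assumed rule: identity and coverage on both sequences must meet the thresholds."""
    identity = hit.identical / hit.alignment_length
    coverage = min(hit.query_span / hit.query_length,
                   hit.target_span / hit.target_length)
    return identity >= min_identity and coverage >= min_coverage


# 40% identity, 87.5% coverage of the longer protein: grouped together.
print(passes(Hit(120, 300, 290, 300, 280, 320)))
```

Under this rule, a 30% identity cutoff is permissive enough to group distant homologs, while the 80% coverage requirement keeps a shared single domain from merging otherwise unrelated proteins.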

Batch Processing Tips

For large-scale analyses:

  • Start with a subset: Test with 10-20 sequences first
  • Use taxonomy trees: Faster than AAI/ANI for large datasets
  • Limit annotations: Each tool adds processing time
  • Add NCBI API key: NCBI allows more requests per second with a key (10 instead of 3 for E-utilities), which speeds up data downloads
```python
# Set the API key before running the analysis
import os

os.environ['NCBI_API_KEY'] = 'your_api_key_here'
```
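For the "start with a subset" tip, a reproducible random sample is more representative than the first few lines of a sorted list, and a fixed seed lets you rerun the same pilot. This is a minimal sketch: `pilot_subset` is a hypothetical helper, and the IDs generated below are placeholders rather than real accessions.

```python
import random


def pilot_subset(ids: list[str], size: int = 15, seed: int = 42) -> list[str]:
    """Pick a reproducible random subset of IDs for a quick test run."""
    unique = list(dict.fromkeys(ids))  # drop duplicates, keep first-seen order
    if len(unique) <= size:
        return unique
    return random.Random(seed).sample(unique, size)


# Placeholder IDs standing in for a large homolog list.
all_ids = [f"WP_{i:09d}.1" for i in range(1, 501)]
subset = pilot_subset(all_ids, size=15)
print("\n".join(subset))  # one ID per line, ready for Input List mode
```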

