Analysis Using Python

1. Species Abundance

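Keep the species-level rows (rank code "S" in column 4 of the standard six-column Kraken2 report), then write the clade percentage (column 1) and the scientific name (column 6) as CSV:

awk -F"\t" '$4 == "S"' kraken_report.txt > species_abundance.txt
awk -F"\t" '{print $1","$6}' species_abundance.txt > abundance.csv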
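If you would rather stay in Python, the two awk commands above can be sketched as a single function. This is a minimal sketch that assumes the standard six-column, tab-separated Kraken2 report. The function name `extract_rank` is hypothetical. It splits each line on tabs exactly as `awk -F"\t"` does, so the same function also covers genus rows with `rank="G"`.

```python
import pandas as pd


def extract_rank(lines, rank="S"):
    """Return Abundance/Taxon rows for one Kraken2 rank code.

    Hypothetical helper mirroring the awk commands: `lines` is any iterable of
    report lines (e.g. an open file), assumed to be the six-column layout.
    """
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and any rank code that is not an exact match
        # ("S" deliberately excludes sub-ranks such as "S1").
        if len(fields) < 6 or fields[3] != rank:
            continue
        rows.append({
            "Abundance": float(fields[0]),  # clade percentage (column 1)
            "Taxon": fields[5].strip(),     # name (column 6), indentation removed
        })
    return pd.DataFrame(rows, columns=["Abundance", "Taxon"])
```

For example, `with open("kraken_report.txt") as fh: df = extract_rank(fh, "S")` gives the same two columns as abundance.csv, with the indentation already stripped from the names.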
      

Python Code:

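import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# abundance.csv has no header: percentage first, species name second
df = pd.read_csv("abundance.csv", names=["Abundance", "Taxon"])
df["Taxon"] = df["Taxon"].str.strip()  # drop Kraken's indentation spaces
df = df.sort_values("Abundance", ascending=False)

plt.figure(figsize=(10, 6))
# hue + legend=False applies the palette per bar (seaborn >= 0.13)
sns.barplot(data=df, y="Taxon", x="Abundance", hue="Taxon", palette="mako", legend=False)
plt.title("Relative Abundance of Species")
plt.xlabel("Relative Abundance (%)")
plt.ylabel("Species")
plt.tight_layout()
plt.savefig("abundance_plot.png")
plt.show()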
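A full Kraken report can list hundreds of species, which makes the bar chart hard to read. One option, shown here as a sketch, is to plot only the most abundant taxa and sum the remainder into a single "Other" bar. The helper name `top_n_with_other` and the default of 20 taxa are illustrative choices, not part of the original workflow. Summing is reasonable because rows at a single rank (all "S" or all "G") describe non-overlapping clades.

```python
import pandas as pd


def top_n_with_other(df, n=20, value_col="Abundance", label_col="Taxon"):
    """Keep the n largest rows and collapse the rest into one "Other" row.

    Hypothetical helper; assumes all rows come from the same rank.
    """
    ranked = df.sort_values(value_col, ascending=False)
    top = ranked.head(n)
    rest = ranked.iloc[n:]
    if rest.empty:
        # Nothing left over, so no "Other" row is needed
        return top.reset_index(drop=True)
    other = pd.DataFrame({label_col: ["Other"], value_col: [rest[value_col].sum()]})
    return pd.concat([top, other], ignore_index=True)
```

To use it, run `df = top_n_with_other(df, n=20)` just before the `sns.barplot` call.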
      
Output image: Species Abundance Plot (abundance_plot.png)

2. Genus Abundance

Repeat the extraction for genus-level rows (rank code "G" in column 4):

awk -F"\t" '$4 == "G"' kraken_report.txt > genus_abundance.txt
awk -F"\t" '{print $1","$6}' genus_abundance.txt > genus_abundance.csv
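Column 1 of a Kraken2 report is the percentage of all reads in the sample, including unclassified ones, so the genus values usually sum to less than 100%. If you instead want each genus as a share of the genera listed, one way is to rescale the column, as in this sketch. `renormalize` is a hypothetical helper, and the plots in this section do not apply this step.

```python
import pandas as pd


def renormalize(df, value_col="Abundance"):
    """Return a copy whose value_col is rescaled to sum to 100."""
    total = df[value_col].sum()
    if total == 0:
        raise ValueError("cannot renormalize: abundances sum to zero")
    out = df.copy()  # leave the caller's DataFrame untouched
    out[value_col] = out[value_col] / total * 100
    return out
```

If you use it, label the axis accordingly (for example, "Share of listed genera (%)"), since the values no longer refer to all reads.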
      

Python Code:

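import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# genus_abundance.csv has no header: percentage first, genus name second
df = pd.read_csv("genus_abundance.csv", names=["Abundance", "Genus"])
df["Genus"] = df["Genus"].str.strip()  # drop Kraken's indentation spaces
df = df.sort_values("Abundance", ascending=False)

plt.figure(figsize=(10, 6))
# hue + legend=False applies the palette per bar (seaborn >= 0.13)
sns.barplot(data=df, y="Genus", x="Abundance", hue="Genus", palette="cubehelix", legend=False)
plt.title("Relative Abundance by Genus")
plt.xlabel("Relative Abundance (%)")
plt.ylabel("Genus")
plt.tight_layout()
plt.savefig("genus_abundance_plot.png")
plt.show()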
      
Output image: Genus Abundance Plot (genus_abundance_plot.png)

3. Heatmap Visualization

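Reuse genus_abundance.txt from Section 2 to build a one-column matrix (genus name first, percentage second) under a Genus,Sample1 header:

echo "Genus,Sample1" > genus_heatmap.csv
awk -F"\t" '{print $6","$1}' genus_abundance.txt >> genus_heatmap.csv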
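The single `Sample1` column suggests the same layout extends to several samples. As a sketch, assuming you have one genus table per sample (for example, each read with `pd.read_csv(..., names=["Abundance", "Genus"])` from that sample's own genus_abundance.csv), the tables can be merged into a genus-by-sample matrix, with 0 where a genus is not reported for a sample. The function name `build_matrix` and the sample labels are hypothetical.

```python
import pandas as pd


def build_matrix(samples):
    """Combine per-sample genus tables into a Genus x sample matrix.

    `samples` maps a sample name to a DataFrame with "Genus" and "Abundance"
    columns. Genera missing from a sample are filled with 0.
    """
    columns = {
        # Strip Kraken's indentation so the same genus aligns across samples
        name: df.assign(Genus=df["Genus"].str.strip()).set_index("Genus")["Abundance"]
        for name, df in samples.items()
    }
    matrix = pd.DataFrame(columns).fillna(0.0)  # union of genera, NaN -> 0
    matrix.index.name = "Genus"
    return matrix.sort_index()
```

Writing the result with `matrix.to_csv("genus_heatmap.csv")` produces a `Genus,Sample1,Sample2,...` header, so the heatmap code that follows can read it unchanged.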
      

Python Code:

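import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Rows are genera, columns are samples (only Sample1 here)
df = pd.read_csv("genus_heatmap.csv")
df["Genus"] = df["Genus"].str.strip()  # drop Kraken's indentation spaces
df.set_index("Genus", inplace=True)

# Scale the height with the number of genera, but keep at least 3 inches
plt.figure(figsize=(5, max(3, len(df) * 0.5)))
# fmt=".2f" shows two decimals instead of seaborn's default ".2g"
sns.heatmap(df, annot=True, fmt=".2f", cmap="YlGnBu", linewidths=0.5, cbar_kws={"label": "Relative Abundance (%)"})
plt.title("Genus-Level Heatmap from Kraken Report")
plt.tight_layout()
plt.savefig("kraken_genus_heatmap.png")
plt.show()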
      
Output image: Genus Heatmap (kraken_genus_heatmap.png)