File:Ncbi-prok-genomesize.svg

Original file (SVG file, nominally 1,260 × 720 pixels, file size: 917 KB)


Summary

Description
English: Log-log plot of the total number of annotated proteins in bacterial and archaeal genomes submitted to GenBank as a function of genome size. Based on data from NCBI genome reports.
Date
Source Own work
Author Estevezj
SVG development
The SVG code is valid.
This chart was created with R.
The file size of this SVG image may be unnecessarily large because its text has been converted to paths, which prevents translation.
Source code

R code

#!/usr/bin/Rscript
# File-Name:       prok-genomes-genes-graph.R           
# Date:            2013-01-11
# Author:          James Estevez (User:Estevezj)
# Purpose:         This generates a log-log plot of protein count as a function of genome size.
# Data Used:       ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
# License:         To the extent possible under law, the author(s) have
#                  dedicated all copyright and related and neighboring rights to this software to
#                  the public domain worldwide. This software is distributed without any
#                  warranty. You should have received a copy of the CC0 Public Domain Dedication
#                  along with this software. If not, see
#                  <https://creativecommons.org/publicdomain/zero/1.0/>.
library(grDevices)
library(ggplot2)
library(plyr)
library(taxize)

# Download our tables from NCBI's FTP site. Accessed Fri Jan 11 23:02:49 PST 2013

# Fetch the report once and keep a local copy, then read the local copy.
prok.file <- "ncbi-ftp-reports-prokaryotes.txt"
if (!file.exists(prok.file)) {
  download.file("ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt", prok.file)
}
prok <- read.table(prok.file, sep="\t", comment.char="!", header=TRUE, stringsAsFactors=FALSE)
# Clear missing values ('-')
prok.cut <- prok[(prok$Size..Mb. != '-') & (prok$Proteins != '-'),]

# Set classes
prok.cut$Size..Mb. <- as.numeric(prok.cut$Size..Mb.)
prok.cut$Proteins <- as.numeric(prok.cut$Proteins)
prok.cut$Group <- as.factor(prok.cut$Group)

# From which domain of life does each genome come?
groups <- levels(prok.cut$Group)
get_domain <- function(x) {
  # Take the first taxonomy hit for this group name
  first.hit <- classification(get_uid(x))[[1]]
  # Extract the superkingdom (domain) name from that classification
  kingdom <- as.character(first.hit[which(first.hit[, "Rank"] == "superkingdom"), 1])
  return(data.frame(Group = x, Domain = kingdom))
}
domains <- ldply(groups, get_domain)
prok.cut <- merge(prok.cut, domains, by = "Group")

# Draw our plot
p <- ggplot(prok.cut, aes(Size..Mb., Proteins, color = Domain))

# Save our plot to SVG
svg(filename='ncbi-prok-genomesize.svg', width = 14, height = 8)
p +  geom_point(alpha = 0.5, size = 2) +
  scale_y_log10() +
  scale_x_log10() +
  scale_shape(solid = FALSE) +
  ggtitle("The total genome size and the number of genes in bacteria and archaea.") +
  xlab('Genome size (Megabases)') +
  ylab("Number of protein coding genes") +
  scale_colour_brewer(type="qual", palette=3)
dev.off()
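
The filtering, type conversion, and merge steps in the script above can be tried without network access. The following is a minimal sketch on a made-up data frame: the `toy` rows, their values, and the hand-written `toy.domains` lookup (which stands in for the taxize query) are illustrative assumptions, not data from NCBI.

```r
# Toy stand-in for prokaryotes.txt; rows and values are invented for illustration.
toy <- data.frame(
  Group     = c("Proteobacteria", "Firmicutes", "Euryarchaeota", "Firmicutes"),
  Size..Mb. = c("4.6", "-", "1.7", "2.8"),
  Proteins  = c("4140", "2500", "-", "2600"),
  stringsAsFactors = FALSE
)

# Drop rows where either column holds the '-' placeholder, then convert to numeric.
toy.cut <- toy[(toy$Size..Mb. != "-") & (toy$Proteins != "-"), ]
toy.cut$Size..Mb. <- as.numeric(toy.cut$Size..Mb.)
toy.cut$Proteins  <- as.numeric(toy.cut$Proteins)

# Hand-written Group-to-Domain lookup (assumption: replaces the taxize lookup).
toy.domains <- data.frame(
  Group  = c("Proteobacteria", "Firmicutes", "Euryarchaeota"),
  Domain = c("Bacteria", "Bacteria", "Archaea"),
  stringsAsFactors = FALSE
)

# Attach a Domain column to each remaining genome row.
toy.cut <- merge(toy.cut, toy.domains, by = "Group")
print(toy.cut)
```

Filtering before calling `as.numeric` avoids the coercion warning and the `NA` values that the `'-'` placeholders would otherwise produce.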

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

File history

Date/Time | Dimensions | User | Comment
current: 07:00, 12 January 2013 | 1,260 × 720 (917 KB) | Estevezj | User created page with UploadWizard
