
Hello Applied Math faculty and students,

Please consider creating a project for the 2018 RTP Hackathon.

The event is being hosted by NCBI and the UNC Curriculum in Bioinformatics and Computational Biology and will run March 12th-14th. The goal of the Hackathon is to gather teams of bioinformaticians and computational biologists from across RTP (and beyond) to apply their data science skills for the development of software tools and pipelines for the analysis of large biological data sets. Ideally, a team’s effort will result in packaged, open-source software and a manuscript.


To get started, we are asking for short descriptions (~2 sentences) of potential projects. A list of past projects is provided below as examples of the types of projects we are looking for. Typical projects are small enough to be completed within a few days (though teams can finish them after the Hackathon) and often use public datasets. For projects oriented around specific lab datasets, NCBI asks that the data be made publicly available within 6 months after the Hackathon. Also, each project needs a team leader; this could be you or a post-doc/senior graduate student you feel comfortable having lead your project. Our goal is to select 5-8 projects with the folks from NCBI toward the beginning of January, so please submit your project before December 31st.


To submit a project, please fill out this form:


For more information about NCBI-Hackathons, please visit:





Examples of past projects include:

+ Identify phages and viruses from metagenomes

+ Classify SRA datasets by source

+ Identify QTLs in plants

+ Use machine learning to characterize viral sequences

+ Develop a Machine Learning Tool to Differentiate Between Synthetic and Natural Genomic Regions in Plants.

+ Compute human ancestral alleles from chimp, gorilla, orangutan and macaque, and provide API access to the ancestral allele for a given position on human genome GRCh38

+ Machine learning pipelines for germline rare variants linked to phenotypes

+ Building an interactive online environment to run NCBI-style hackathons

+ An integrated pipeline for novel virus discovery

+ Probabilistic identification of past viral exposure based on non-native sequences in host genome

+ Packaging and distributing an automatic corpus-updater for NLP tools

+ Phenotypic Indexing of (CRISPR-derived) mouse models

+ Expanding and publicizing a Shiny app for visualizing protein correlation profiling data

+ Building a pipeline for efficient partitioning of barcodes

+ Creating a public JBrowse database for all Staphylococcus aureus genomes

+ Simulating tumor genomes

+ Associating somatic mutations with clinical outcomes

+ Simplifying access to shared-data repositories from Python

+ Building a pipeline for searching for virus-associated protein domains in NGS datasets.
