I'm trying to develop a GUI in Python for analyzing tRNA-Seq data that can run on both Linux and Windows. It needs to run programs such as bowtie2, samtools and bedtools, which are easy to install with Anaconda on Linux but a headache on Windows. These programs can't be installed on Windows directly, so I had to install Windows Subsystem for Linux (WSL) and tried to install them that way.
I tried to build bowtie from source (having previously installed MinGW-W64 and MSYS), but I got several error messages saying my machine is not 64-bit. I modified the Makefile with Notepad and ran make and gmake, but nothing worked. I also tried to install Bowtie2 from mingw64.zip, but that didn't work either.
Install Bowtie Windows
This is anecdotal evidence but in the past I have found that Windows "home" editions are missing some system level features that result in strange messages/inability to run certain commercial bioinformatics programs (Vector NTI was one). Issues you noted with MinGW may be related to this. You should consider upgrading the home edition to pro, if you want to stick with Windows. That would allow you to install Windows subsystem for Linux (bash). Without this ability, Windows for bioinformatics analysis is going to prove limiting.
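For the Python GUI described in the question, one possible way to keep a single code path for both systems is to prefix each tool call with wsl when running on Windows. The sketch below is only an illustration under that assumption: the helper names wrap_for_platform and run_tool are hypothetical, and it presumes the tools are already installed inside the default WSL distribution.

```python
import platform
import subprocess


def wrap_for_platform(tool_args, system=None):
    """Return the argument list to run, prefixed with 'wsl' on Windows."""
    system = system or platform.system()
    if system == "Windows":
        # wsl.exe passes the remaining arguments to the default Linux distribution.
        return ["wsl"] + list(tool_args)
    return list(tool_args)


def run_tool(tool_args):
    """Run the tool and return the completed process (raises if it fails)."""
    return subprocess.run(wrap_for_platform(tool_args), check=True,
                          capture_output=True, text=True)
```

For example, wrap_for_platform(["bowtie2", "--version"], system="Windows") gives ["wsl", "bowtie2", "--version"]. File arguments given as Windows paths would still need translating to their Linux form before being passed through, which this sketch does not attempt.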
To provide system-wide access to miRge3.0, cutadapt, bowtie and bowtie-build, type (or copy and paste) and submit each of the following commands in the terminal:
NOTE: Make sure to change the path to your Python bin folder; replace /home/arun/.local/ with the corresponding path on your computer.
Mac comes with python2.7 installed by default. To use python3.7, creating an alias in .bash_profile will do the trick. Open a new terminal window. If you are familiar with the vim editor, open the file with vi .bash_profile; otherwise open it in a text editor with open -e .bash_profile. Then add the following line at the bottom of the file.
To provide system-wide access to miRge3.0, cutadapt, bowtie and bowtie-build, type (or copy and paste) and submit each of the following commands in the terminal:
NOTE: Make sure to change the path to your Python bin folder; replace /Users/loaneruser/Library/ with the corresponding path on your computer.
This package provides an R wrapper of the popular bowtie2 sequencing reads aligner and AdapterRemoval, a convenient tool for rapid adapter trimming, identification, and read merging. The package contains wrapper functions that allow for genome indexing and alignment to those indexes. The package also allows for the creation of .bam files via Rsamtools.
As with Bowtie, the first and basic step of running Bowtie2 is to build a Bowtie2 index from a reference genome sequence. The basic usage of the command bowtie2-build is:
$ bowtie2-build -f input_reference.fasta index_prefix
where input_reference.fasta is an input file of reference sequences in fasta format, and index_prefix is the prefix of the generated index files. Besides the option -f, which is used when the reference input file is a fasta file, the option -c can be used when the reference sequences are given on the command line.
An example of how to run Bowtie2 local alignment on Crane with paired-end fasta files and 8 CPUs is shown below:
bowtie2_alignment.submit
#!/bin/bash
#SBATCH --job-name=Bowtie2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

module load bowtie/2.3

bowtie2 -x index_prefix -f -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta -S bowtie2_alignments.sam --local -p $SLURM_NTASKS_PER_NODE
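If the index build and alignment steps above are driven from a Python program rather than typed by hand, the two command lines can be assembled as argument lists. This is a minimal sketch using only the options that appear in the text (-f, -x, -1, -2, -S, --local, -p); the function names are hypothetical, and nothing is executed here.

```python
def bowtie2_build_args(reference_fasta, index_prefix):
    """Argument list for building a Bowtie2 index from a fasta reference."""
    return ["bowtie2-build", "-f", reference_fasta, index_prefix]


def bowtie2_paired_local_args(index_prefix, reads_1, reads_2, sam_out, threads):
    """Argument list for a paired-end local alignment of fasta reads."""
    return ["bowtie2", "-x", index_prefix, "-f",
            "-1", reads_1, "-2", reads_2,
            "-S", sam_out, "--local", "-p", str(threads)]
```

The resulting lists can be handed to subprocess.run, which avoids shell quoting problems with file names that contain spaces.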
Use Bowtie Chart by MAQ Software to quickly compare values in one or more categories. Bowtie Chart by MAQ Software is ideal for displaying sales metrics, such as the flow of a sale from channel to category. The thickness of the bowtie's branches indicates the relative weight of each category. Create a full bowtie to showcase how a cumulative value splits into two distinct sub-categories. Alternately, you can create a half bowtie, showcasing the distribution of data within a source.
You can download this report and / or the logs used to generate it, to try running MultiQC yourself. The MultiQC_NGI package must be installed. Note that the example report has some user-specific config settings, seen in the multiqc_config.yaml file. It can also be run with the --test-db parameter, using the example data provided.
To install MultiQC, simply run pip install multiqc on the command line. If you use conda / bioconda, you can run conda install multiqc instead. See the installation instructions for more help.
Once installed, run the plugin by selecting your reads and reference sequence then clicking on Align/Assemble - Map to Reference in the toolbar. Bowtie is available as an option in the Algorithm drop-down menu.
Or some Bismark HTML summary reports: Bismark Summary Report WGBS, Bismark Summary Report RRBS (no deduplication), or a Bismark Summary for a single-cell experiment which summarises a larger number of samples (Bismark Summary single cells data (.txt)).
Here is an overview of the alignment modes that are currently supported by Bismark: Bismark alignment modes (pdf).
Changelog
19-11-2019: 0.22.3 released (click here for the Release Notes hosted on Github)
16-10-2019: 0.22.2 released (click here for the Release Notes hosted on Github)
21-04-2019: 0.22.1 released (click here for the Release Notes hosted on Github)
16-04-2019: 0.22.0 released (click here for the Release Notes hosted on Github)
14-03-2019: 0.21.0 released (click here for the Release Notes hosted on Github)
01-02-2019: 0.20.1 released (click here for the Release Notes hosted on Github)
16-08-2018: 0.20.0 released (click here for the Release Notes hosted on Github)
27-04-2018: 0.19.1 released (click here for the Release Notes hosted on Github)
13-10-2017: 0.19.0 released (click here for the Release Notes hosted on Github)
23-05-2017: 0.18.1 released (click here for the Release Notes hosted on Github)
15-05-2017: 0.18.0 released (click here for the Release Notes hosted on Github)
18-01-2017: 0.17.0 released (click here for the Release Notes hosted on Github)
25-07-2016: 0.16.3 released (click here for the Release Notes hosted on Github)
Bismark: Essential fixes (2 in total) to address a bug for Bowtie 2 alignments where reads that should be considered ambiguous were incorrectly assigned to the first alignment thread. These errors had crept in during releases 0.16.0 and 0.16.2. More info available on Github
Bismark: Added support for large Bowtie (1) index files ending in .ebwtl which had been added in Bowtie v1.1.0
Changed the Shebang in all scripts of the Bismark suite to #!/usr/bin/env perl instead of #!/usr/bin/perl
deduplicate_bismark: Does now bail with a useful error message when the input files are empty
bismark_genome_preparation: Added new option '--genomic_composition' so that the genomic composition can be calculated and written right at the genome preparation stage rather than by using bam2nuc
bam2nuc: Now also calculates a fold coverage for the various (di-)nucleotides. The changes in the nucleotide_stats text file are also picked up and plotted by bismark2report
bam2nuc: Added a new option '--genomic_composition_only' to just process the genomic sequence without requiring any data files
bismark2summary: Added option -o/--basename FILENAME to specify a certain filename. If not specified the name will remain bismark_summary_report.txt/html
bismark2summary: Added documentation and the options '--help' and '--version' to be consistent with the rest of Bismark
bismark2summary: Added option '--title STRING' to give the HTML report a different title
25-04-2016: 0.16.1 released (click here for the Release Notes hosted on Github)
Bismark: Removed a rogue warn/sleep statement for PE/Bowtie2 mode that had crept in during the last release...
20-04-2016: 0.16.0 released (click here for the Release Notes hosted on Github)
Bismark: File endings .fastq .fq .fastq.gz .fq.gz are now removed from the output file (unless they were specified with --basename) in a bid to reduce the length of the already long file names
Bismark: Enabled the new option --dovetail (which will be turned on by default for --pbat libraries) which will now allow dovetailing reads to be reported
Bismark: Changed the behaviour of corner cases where several non-directional alignments could have existed for the very same position but to different strands so that now the best alignment trumps the weaker one. As an example: If you relaxed the alignment criteria of a given alignment to allow 60 mismatches for PE alignment we did find an alignment to the OT strand with a combined AS of -324, but there also was an alignment to the CTOB strand with an AS of 0 (perfect alignment). The CTOB now trumps the OT alignment, and the methylation information is now reported for the bottom strand
New module: bismark2summary accepts Bismark BAM files as input. It will then try to identify Bismark reports, and optionally deduplication reports or methylation extractor (splitting) reports automatically based on the BAM file basename. It produces a tab delimited overview table (.txt) as well as a graphical HTML report. Examples can be found at Bismark Summary Report and Bismark Summary Report (.txt)
The new Bismark module bam2nuc calculates the average mono- and di-nucleotide coverage of libraries and compares this to the genomic average composition. bam2nuc can be called straight from within Bismark (option --nucleotide_coverage) or run stand-alone. bam2nuc creates a ...nucleotide_stats.txt file that is also automatically detected by bismark2report and incorporated into the HTML report
bismark_sitrep.tpl: Removed an extra function call in bismark_sitrep.tpl so that the M-bias 2 plot is drawn once the M-bias 1 plot has finished drawing (with certain browsers and data, parallel processing could have resulted in a white placeholder only)
Methylation extractor: Altering the file path handling of coverage2cytosine and bismark2bedGraph also required some changes in the methylation extractor
bismark2bedGraph: Input file path handling has been completely reworked. The output file which can be specified as -o output.bedGraph now has to be a single file name and mustn't contain any path information. A particular output folder may be specified with -dir /any/path/
bismark2bedGraph: Addressing the file path handling issue also fixed a similar issue with the option --remove_spaces when -o had been specified
coverage2cytosine: Changed zcat for gunzip -c when reading a gzipped coverage file. This should avoid some Mac platforms crashing because zcat invariably requires a file to end in .Z (which it doesn't...)
coverage2cytosine: Changed the way in which the coverage input file is handed over from the methylation_extractor to coverage2cytosine (previously the path information might have been part of the file name, but instead it will now be only part of the -dir output_directory option)
14-01-2016: 0.15.0 released (click here for the Release Notes hosted on Github)
Added option --se/--single_end [list]. This sets single-end mapping mode explicitly, giving a list of file names as [list]. The filenames may be provided as a comma (,) or colon (:) separated list
Added option --genome_folder /path/to/genome as alternative to supplying the genome as the first argument
Added an option --rg_tag to print an @RG header line as well as an RG:Z: tag to each read. The ID and SAMPLE fields default to 'SAMPLE', but can be specified manually with --rg_id or --rg_sample
Added new option --ambig_bam for Bowtie2-mode only, which writes out a single alignment for sequences with multiple alignments to a special file ending in .ambiguous.bam. The alignments are in Bowtie2 format and do not contain any Bismark specific entries such as the methylation call etc. These ambiguous BAM files are intended to be used as coverage estimators for variant callers. Works for single-end and paired-end alignments in single or multi-core mode
Added the new options --cram and --cram_ref to Bismark for both paired- and single-end alignments in single or multi-core mode. This option requires Samtools version 1.2 or higher. A genome FastA reference may be supplied as a single file with the option --cram_ref; if this is not specified the file is derived from the reference FastA file(s) used for the Bismark run, and written to the file Bismark_genome_CRAM_reference.mfa into the output directory.
deduplicate_bismark: Added better handling of cases when the input file was empty (died for percentage calculation instead of calling it N/A)
Added a note mentioning that Read1 and Read2 of paired-end files are expected to follow each other in two consecutive lines and possibly require name-sorting prior to deduplication. Also added a check that reads the first 100000 lines to see if the file appears to have been sorted and bail out if this is true
methylation extractor: Added support for CRAM files (this option requires Samtools version 1.2 or higher)
bismark2bedGraph: Changed the way gzip compressed input files are handled when using the UNIX sort command (i.e. with --scaffolds/--gazillion or without --ample_memory)
coverage2cytosine: Added option --gzip to compress output files. This currently only works for the default CpG_report and CX_report output files (and thus not with the option --gc or --split_files). The option --gzip is now also passed on from the bismark_methylation_extractor
coverage2cytosine: Added a check to bail if no information was found in the coverage file, e.g. if a wrong file path for a .cov.gz file had been specified
bismark_genome_preparation: Added process handling to the child processes
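The --se/--single_end option in the 0.15.0 notes above accepts file names separated by commas or colons. A minimal sketch of how such a list could be split is given here; the function name split_file_list is hypothetical, and this is not taken from the Bismark source.

```python
import re


def split_file_list(value):
    """Split a comma- or colon-separated string into file names, dropping empties."""
    return [name for name in re.split(r"[,:]", value) if name]
```

One consequence of accepting a colon as a separator is that file names containing a colon (for instance a Windows drive prefix) would be split in two, so such names would have to be avoided with this scheme.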
20-08-2015: 0.14.5 released - minor fix
deduplicate_bismark: Changed all instances of literal 'samtools' calls to '$samtools_path'
19-08-2015: 0.14.4 released
Bismark: Changed the FLAG values of paired-end alignments to the CTOT or CTOB strands so that reads can be properly displayed in SeqMonk when imported as BAM files. This change affects only paired-end alignments in --pbat or --non_directional mode. In detail we simply swapped the Read 1 and Read 2 FLAG values round so reads now resemble exactly concordant read pairs to the OT or OB strands. Note that results produced by the methylation extractor or further downstream of that are not affected by this change
Bismark: Input files specified with filepath information for FastA files are now handled properly in --multicore runs (this was fixed only for FastQ files in the previous patch)
Bismark: Unmapped and ambiguous files (options --unmapped and --ambiguous) are now written out as gzip compressed files by default
Bismark: Changed the default mode of operation to --bowtie2. Bowtie (1) alignments may still be chosen using the option --bowtie1
Bismark Genome Preparation: Changed the execution of the genome indexing of the parent process to system() rather than an exec() call since this seemed to lead to interesting faults when run in a pipeline setting
Bismark Genome Preparation: Changed the default indexing mode to --bowtie2. Bowtie (1) indexing is still available via the option --bowtie1
bismark2bedGraph: The coverage (.cov) and bedGraph (.bedGraph) files are now written out as gzip compressed files by default
coverage2cytosine: Added new option '--gc/--gc_context' to reprocess the genome to find methylation in GpC context. This might be useful for specialist applications where GpC methylases had been employed. The output format is exactly the same as for the normal cytosine report, and only positions covered by at least one read are reported (output file ends in .GpC_report.txt). In addition this will write out a Bismark coverage file (ending in GpC.cov)
deduplicate_bismark: Removed redundant closing statements to get rid of warning messages
deduplicate_bismark: The option --representative is no longer displayed in the help text. The option was once useful to determine the PCR bias that had been introduced by over digestion with bisulfite and is nearly always not what should be used for deduplication (it will be left in and is still functional for the time being though)
06-05-2015: 0.14.3 released
Bismark: Changed the renaming settings for paired-end files so that 'sam' within the filename no longer gets renamed to 'bam' (e.g. smallsample.sam > smallbample.sam)
Bismark: Input files specified with filepath information are now handled properly in --multicore runs
Bismark: The --multicore option currently requires the output files to be in BAM format, so specifying --sam at the same time has been disallowed
Methylation Extractor: fixed another bug for the same issue as in 0.14.1 that had crept into the 0.14.2 release (to do with --ignore_3prime)
coverage2cytosine: Changed the option --merge_CpG so that CGs starting at position 1 are not considered (since the 3-base sequence context of the bottom strand C at position 2 can not be determined)
27-03-2015: 0.14.2 released
Methylation Extractor: Added a bug fix for the same issue as in 0.14.1 that was overlooked in the earlier release
27-03-2015: 0.14.1 released
Bismark: Fixed the cleaning up stage in a --multicore run when --gzip had been specified as well
Bismark: Fixed the handling of files in a --multicore run when the input files had been specified including file path information
Bismark: Please note that the option -B/--basename in conjunction with --multicore is currently not supported (as in: disabled), but we are aiming to address this soon
Methylation Extractor: Fixed a bug with the position adjustment of paired-end reads when the reads should have been trimmed from their 3' ends (option --ignore_3prime)
deduplicate_bismark: Now also removing newline characters from the read conversion tag in case other programs interfered with the tag ordering and put this tag into the very last column
06-03-2015: 0.14.0 released - Bismark Parallelization
Bismark: Finally added parallelization to the Bismark alignment step using the option '--multicore int' which sets the number of parallel instances of Bismark to be run concurrently. At least in this first distribution this is achieved by forking the Bismark alignment step very early on so that each individual Spawn of Bismark (SoB?) processes only every n-th sequence (n being set by --multicore). Once all processes have completed, the individual BAM files, mapping reports, unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance. If system resources are plentiful this is a viable option to speed up the alignment process (we observed a near linear speed increase for up to --multicore 8 tested so far). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie/Bowtie2, Samtools, gzip etc...) and 10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --multicore specified will effectively lead to a linear increase in compute and memory requirements, so --multicore 4 for e.g. the GRCm38 mouse genome will probably use 20 cores and eat 40GB of RAM, but at the same time reduce the alignment time to 25-30%. You have been warned...
Bismark: Changed the default output to BAM. SAM output may be requested using the option --sam
Bismark: No longer generates a piechart (.png) with the alignment stats. bismark2report generates a much nicer report anyway
Methylation Extractor: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required. In some instances files containing e.g. -1-2 in their filename might previously have been identified as paired-end incorrectly
deduplicate_bismark: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required
deduplicate_bismark: Added option --version so that Clusterflow can report a version number
bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well
coverage2cytosine: Fixed a typo in the shebang which prevented coverage2cytosine from running
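The 0.14.0 notes above describe each parallel instance processing only every n-th sequence, with compute and memory growing linearly in --multicore. The sketch below illustrates both ideas in Python. It is not Bismark's implementation; the per-instance defaults of 5 cores and 10 GB are assumptions back-calculated from the "--multicore 4 uses about 20 cores and 40GB" example in the notes.

```python
def split_round_robin(records, n_instances):
    """Give instance i every n-th record, starting at offset i."""
    return [records[i::n_instances] for i in range(n_instances)]


def estimate_resources(multicore, cores_per_instance=5, mem_gb_per_instance=10):
    """Linear estimate of (cores, memory in GB) for a given --multicore value.

    The per-instance defaults are assumptions derived from the release note example.
    """
    return multicore * cores_per_instance, multicore * mem_gb_per_instance
```

With three instances and seven reads numbered 0 to 6, the first instance receives reads 0, 3 and 6, the second 1 and 4, and the third 2 and 5.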
27-12-2014: 0.13.1 released
Bismark Genome Preparation: Added a check for unique chromosome names to the Bismark indexer to avoid disappointments later
Methylation Extractor: Fixed a bug for the M-bias reports when the option --multicore was used, in which case only the numbers of one core were used to construct the report. Now every different thread writes out an individual M-bias table, and once the methylation extraction has completed all these individual files are merged into a single, cumulative table as it should be
Methylation Extractor: Added a new option --mbias_off, which processes the files as normal but does not write out any M-bias files. This option is meant for users who run the methylation extractor two times, the first time to figure out whether there is a bias that needs to be removed, and the second time using the --ignore options, but without overwriting the already existent M-bias reports
bismark2bedGraph: Deferred removal of the input file path information a little so that specifying file paths doesn't prevent bismark2bedGraph from finding the input files anymore
bismark2bedGraph: If the specified output directory doesn't exist it will be created
bismark2bedGraph: Changed the way scaffolds are sorted (with --gazillion/--scaffold specified) to -k3,3V (this was done following a suggestion by Volker Brendel, Indiana University: "The -k3,3V sort option is critical when the sequence names are numbered scaffolds (without left-buffering of zeros)")
coverage2cytosine: Added a new option --merge_CpG that will post-process the genome-wide report to write out an additional coverage file which has the top and bottom strand methylation evidence pooled into a single CpG dinucleotide entity. This may be the desirable input format for some downstream processing tools such as the R-package bsseq (by K.D. Hansen). For an example please see the RELEASE_NOTES file. This option is currently experimental, and only works if CpG context only and a single genome-wide report were specified (i.e. it doesn't work with the options --CX or --split_by_chromosome)
coverage2cytosine: Changed the processing of not-covered chromosomes so that they are sorted and not processed randomly. This should make runs more reproducible
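The -k3,3V change in the 0.13.1 notes above matters because plain text sorting puts scaffold10 before scaffold2. The following Python sketch shows the general idea of such a "version" ordering by comparing digit runs numerically. It only approximates what the sort V option does, and version_key is a hypothetical helper name.

```python
import re


def version_key(name):
    """Split a name into text and digit chunks so that numbers compare numerically."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]
```

Sorting ["scaffold10", "scaffold2", "scaffold1"] with key=version_key yields scaffold1, scaffold2, scaffold10, whereas the default string sort yields scaffold1, scaffold10, scaffold2.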
01-10-2014: 0.13.0 released
Bismark: Fixed renaming issue for SAM to BAM files (which would have replaced any occurrence of sam in the file name, e.g. sample1_... instead of the file extension .sam)
Methylation Extractor: Added new option '--multicore INT' to set the number of cores to be used for the methylation extraction process. If system resources are plentiful this is a viable option to speed up the extraction process (we observed a near linear speed increase for up to 10 cores specified). Please note that a typical process of extracting a BAM file and writing out '.gz' output streams will in fact use 3 cores per value of --multicore INT specified (1 for the methylation extractor itself, 1 for a Samtools stream, 1 for a GZIP stream), so --multicore 10 is likely to use around 30 cores of system resources. This option has no bearing on the speed of the bismark2bedGraph or genome-wide cytosine report processes
Methylation Extractor: Added two new options '--ignore_3prime INT' (for single-end alignments and Read 1 of paired-end alignments) and '--ignore_3prime_r2 INT' (for Read 2 of paired-end alignments) to remove positions that display a methylation call bias on the 3' end of reads
Methylation Extractor: The option --no_overlap is now the default for paired-end data. One may explicitly choose to include overlapping data with the option '--include_overlap'
Methylation Extractor: The splitting report will now be written out by default (previously optional --report)
Methylation Extractor: In paired-end mode, read-pairs which had been skipped because either read was shorter than a specified (very high) value of '--ignore' or '--ignore_r2' will now have the information of the other read extracted if it meets the length criteria (if applicable). Thanks to Andrew Dei Rossi for contributing a patch
bismark2bedGraph: Fixed the location of the sorting directory which could have failed if an output directory had been specified
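The SAM to BAM renaming issue fixed in 0.13.0 above is a common pitfall: replacing every occurrence of "sam" also changes names such as sample1_... A small sketch of the difference, written in Python for illustration only (Bismark itself is written in Perl, and sam_to_bam_name is a hypothetical name):

```python
import re


def sam_to_bam_name(filename):
    """Replace only a trailing .sam extension with .bam."""
    return re.sub(r"\.sam$", ".bam", filename)
```

Anchoring the pattern to the end of the string leaves the rest of the file name untouched, so sample1_R1.sam becomes sample1_R1.bam rather than bample1_R1.bam.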
21-07-2014: hotfix 0.12.5 released
Bismark: Added one additional check to the way ambiguous alignments are handled in Bowtie 2 mode. In more detail this adds a check whether the current ambiguous alignment is worse than the best alignment so far, in which case the sequence does not get flagged as ambiguous
21-07-2014: Version 0.12.4 released
Bismark: Improved the way ambiguous alignments are handled in Bowtie 2 mode. Previously, sequences were classified as ambiguously aligning as soon as a sequence produced several equally good alignments within the same alignment thread. Under certain circumstances however there may exist equally good alignments within the same alignment thread, but the sequence might have a better (unique) alignment in another thread. Such a unique alignment will now trump the ambiguous alignment as it should
Bismark: Got rid of 2 warning messages of MD-tag information for reads containing deletions (Bowtie 2 mode only) which accidentally made it through to the release
Bismark: Added '-x' to the invocation of Bowtie 2 for FastA sequences so that it works again (It used to work previously only because Bowtie 2 did not check it properly and automatically used bowtie2-align-s, but now it does check...)
Methylation Extractor: Line endings are now chomped at an earlier stage so that interfering with the optional fields in the Bismark BAM file doesn't break the methylation extractor (e.g. reordering of optional tags by Picard)
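The ambiguity handling described in the 0.12.4 notes above can be pictured with a small decision function. This is a simplified model rather than Bismark's code: each alignment thread is assumed to report its best alignment score (higher is better) together with how many alignments share that score, and a tie between two threads at the top score is assumed to count as ambiguous.

```python
def classify_read(thread_results):
    """Classify a read from per-thread (best_score, n_equally_good) tuples.

    A thread that found no alignment is represented by None (an assumption
    of this sketch).
    """
    hits = [result for result in thread_results if result is not None]
    if not hits:
        return "unaligned"
    top_score = max(score for score, _ in hits)
    # Only threads reaching the overall best score can make the read ambiguous.
    top_counts = [count for score, count in hits if score == top_score]
    if len(top_counts) == 1 and top_counts[0] == 1:
        return "unique"
    return "ambiguous"
```

Under this model a read with two equally good alignments of score -10 in one thread and a single alignment of score 0 in another is reported as unique, which is the behaviour the release note describes.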
23-06-2014: Version 0.12.3 released
Bismark: Replaced the XX-tag field (base-by-base mismatches to the reference, excluding indels) by an MD:Z: field that now properly reflects mismatches as well as indels
Bismark: Fixed the Hamming distance value (NM:i: field) for reads containing insertions (Bowtie 2 mode only) which was previously offset by the number of insertions in the read
methylation extractor/bismark2bedGraph: Changed the '--zero_based' option of the methylation extractor and bismark2bedGraph to write out an additional coverage file (ending in .zero.cov) which uses the UCSC zero-based, half-open standard
bismark2bedGraph: Changed the requirement of CpG context files to now start with CpG... (from CpG_...)
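The zero-based, half-open convention mentioned for the .zero.cov file in the 0.12.3 notes above differs from 1-based inclusive coordinates only in the start value. A minimal sketch of the conversion, assuming the input interval is 1-based and inclusive (the function name is hypothetical):

```python
def to_zero_based_half_open(start_1based, end_1based):
    """Convert a 1-based inclusive interval to 0-based half-open coordinates."""
    return start_1based - 1, end_1based
```

A single cytosine at 1-based position 100 becomes the interval (99, 100), whose length end minus start is 1.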
04-05-2014: Version 0.12.2 released
Bismark: Added support for the new 64-bit index files for very large genomes in Bowtie 2 mode. The large genome indexes (ending in .bt2l instead of .bt2 for small genomes) are generated automatically by bismark_genome_preparation and work just as well in the Bismark alignment step
Bismark: Fixed a bug that would omit the name of the second last chromosome from the SAM header if the genome had been supplied as Multi-FastA file. Everything else, including the alignments, would have been unaffected by this glitch
Bismark: When the option '--basename' is specified, SE ambiguous file names now feature an underscore in their file name. Also, the pie chart file names are now derived from the basename
Methylation Extractor: Introduced a length check when the options --ignore or --ignore_r2 were set to ensure that only reads that are long enough are being processed
29-04-2014: Version 0.12.1 released
Bismark: Added calculation of MAPQ values for SAM/BAM output generated with Bowtie 2 for both single-end and paired-end mode. The calculation is implemented like in Bowtie 2 itself. Mapping quality values are still unavailable for alignments performed with Bowtie and retain a value of 255 throughout
Bismark: Fixed an uninitialised value warning for PE alignments with Bowtie 2 that occurred whenever Read 2 aligned to the very start of a chromosome (this only affected the warning itself and had no impact on any results)
coverage2cytosine: all chromosomes or scaffolds are now processed irrespective of whether they were covered in the sequencing experiment or not. Previously, CpG/cytosine reports for genomes with lots of small scaffolds that were not covered by any reads might have had a variable number of lines between experiments
08-04-2014: Version 0.11.1 released
Bismark: The option --pbat now also works for use with Bowtie 2, in both single-end and paired-end mode. The only limitation to this is that it only works with FastQ files and uncompressed temporary files
Bismark: Changed the order in which the @SQ lines are written out to the SAM/BAM header from random to the same order they are being read in from the genomes folder (or the order of the files in which they occur within a multi-FastA file)
Bismark: Included a new option '-B/--basename basename' for output files instead of deriving these names from the input file. --basename takes precedence over the option --prefix.
Bismark: Unmapped or ambiguous files now end in .fq or .fa for FastQ or FastA files, respectively (instead of .txt files)
Methylation extractor: will no longer attempt to delete unused files if --mbias_only was specified
Methylation extractor: Added a test to see if a file that does not end in .bam is in fact a BAM file, and if this succeeds open the file using Samtools view
03-12-2013: Bug fix for deduplicate_bismark
deduplicate_bismark: fixed a bug for '--representative' mode where the final report was accidentally written to the SAM file instead of the report file. Please note that using '--representative' is nearly always what you DON'T WANT to do, since this selects for the most highly amplified PCR product/artefact and not a random read
27-11-2013: Version 0.10.1 released
Bismark methylation extractor: The methylation extractor does now detect automatically whether Bismark alignment file(s) were run in single-end or paired-end mode. The automatic detection can be overridden by manually specifying -s or -p and this option is only available for SAM/BAM files
bismark2bedGraph: When run in stand-alone mode, the coverage file will replace 'bedGraph' as the file ending with 'bismark.cov'. If the output filename is anything other than 'bedGraph', '.bismark.cov' will be appended to the filename
bismark2bedGraph: When run in stand-alone mode, '--counts' will be enabled by default for the coverage output
bismark2bedGraph: Added a new option '--scaffolds/--gazillion' for users working with unfinished genomes sporting tens or even hundreds of thousands of scaffolds/contigs/chromosomes. Such a large number of reference sequences frequently resulted in errors with pre-sorting reads to individual chromosome files because of the operating system's limitation of the number of filehandles that can be written to at any one time (typically this limit is anything between 128 and 1024 filehandles; to find out this limit on Linux, type: ulimit -a). To bypass the limitation of open filehandles, the option '--scaffolds' does not pre-sort methylation calls into individual chromosome files. Instead, all input files are temporarily merged into a single file (unless there is only a single file), and this file will then be sorted by both chromosome AND position using the UNIX sort command. Please be aware that this option might take a looooong time to complete, depending on the size of the input files, and the memory you allocate to this process (see '--buffer_size')
bismark2bedGraph: Added a new option '--ample_memory'. Using this option will not sort chromosomal positions using the UNIX sort command, but will instead use two arrays to sort methylated and unmethylated calls, respectively. This may result in a faster sorting process for very large files, but this comes at the cost of a larger memory footprint (as an estimate, two arrays of the length of the largest human chromosome 1 (250 million bp) consume around 16GB of RAM). Note however that due to the overhead of creating and looping through huge arrays this option might in fact be *slower* for small-ish files (up to a few million alignments). Note also that this option is not currently compatible with options '--scaffolds/--gazillion'. This option still needs some efficiency testing as to when it actually makes sense to use it, but it produces identical results to the default sort option. Thanks to Yi-Shiou Chen for contributing this twist
deduplicate_bismark: The deduplication script does now detect automatically whether a Bismark alignment file was run in single-end or paired-end mode (this happens separately for every file analysed). The automatic detection can be overridden by manually specifying -s or -p and this option is only available for SAM/BAM files
bismark2report: Specifying a single file for each of the optional reports will now work as intended, instead of being skipped
coverage2cytosine: Added some counting and statements to indicate when the run finished successfully (it proved to be difficult to follow the report process for a genome with nearly half a million scaffolds...)
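The two-array idea behind '--ample_memory' in this release can be illustrated with a small sketch. This is not Bismark's own code; the function name, the input tuples and the 1-based positions are assumptions made for illustration. Each call increments a counter at its position, and walking the arrays in index order then yields position-sorted output without invoking a sort.

```python
def tally_calls(calls, chrom_length):
    """Count methylated and unmethylated calls per position using two arrays.

    calls: iterable of (position, is_methylated); positions are assumed 1-based.
    Returns (position, methylated, unmethylated) tuples in position order.
    """
    meth = [0] * (chrom_length + 1)    # index 0 is unused
    unmeth = [0] * (chrom_length + 1)
    for pos, is_methylated in calls:
        if is_methylated:
            meth[pos] += 1
        else:
            unmeth[pos] += 1
    # Walking the arrays by index gives the position order for free.
    return [
        (pos, meth[pos], unmeth[pos])
        for pos in range(1, chrom_length + 1)
        if meth[pos] or unmeth[pos]
    ]
```

The memory cost grows with chromosome length rather than with the number of calls, which is consistent with the note that this option only pays off for very large files.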
10-11-2013: Version 0.10.0 released
Bismark: The option '--prefix' does now also work for the C->T and G->A transcribed temporary files to allow multiple instances of Bismark to be run on the same file in the same folder (e.g. using Bowtie and Bowtie 2 or some stricter and laxer parameters concurrently)
bismark2report: Changed the behavior of this module to automatically find all Bismark mapping reports in the current working directory, and to try and work out whether the optional reports are present as well (i.e. deduplication, splitting and M-bias reports). This uses the file basename and will fail if the files have been renamed at any stage
bismark2report: Added commas as separator for large numbers to improve readability
Bismark methylation extractor: will now delete unused methylation context files (e.g. CTOT and CTOB files for a directional library)
bismark2bedGraph: Dropped the option -k3,3 from the sort command to result in a dramatic speed increase while sorting. This option had been used previously to enable sorting by chromosome in addition to position, but should no longer be needed because the files are being read in sorted by chromosome already
bismark2bedGraph: This module does now produce these two output files: (1) A bedGraph file, which now contains a header line: 'track type=bedGraph'. The genomic start coords are 0-based, the end coords are 1-based. (2) A coverage file ending in .cov. This file replaces the former 'bedGraph --counts' file and is required to proceed with the subsequent step to generate a genome-wide cytosine report (the module doing this has been renamed to coverage2cytosine to reflect this file name change)
coverage2cytosine: Changed the name of this module from 'bedGraph2cytosine' to 'coverage2cytosine' to reflect the change that this module now requires the methylation coverage file produced by the bismark2bedGraph module (this coverage file replaces the former "bedGraph --counts" output)
coverage2cytosine: Previously, the cytosine report would always report every C position in any context, even though the default should have reported CpG positions only. This has now been fixed
Bismark genome preparation: Made a couple of changes to make the genome preparation fully non-interactive. This means that the path to the genome folder and to Bowtie (1/2) have to be specified up front (for Bowtie (1/2) it is otherwise assumed that it is in the PATH). Furthermore, already existing bisulfite indices in the target folder will be overwritten and the user is no longer prompted to confirm this. We got rid of the prompt because creating a second index (Bowtie 1 as well as 2) in the same folder in non-interactive mode got stuck in loops asking whether it is alright to proceed or not, generating terabyte-sized log files without ever starting to do anything useful...
deduplicate_bismark: Renamed the rather long deduplication script to this slightly shorter one. Also added some filehandle closing statements that might have caused buffering issues under certain circumstances
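The coordinate convention noted for the bedGraph file in this release (0-based start, 1-based end) can be sketched as follows. The function name, the tab-separated layout and the two-decimal percentage are illustrative assumptions, not a description of the exact output written by bismark2bedGraph.

```python
def bedgraph_line(chrom, pos_1based, meth, unmeth):
    """Format one bedGraph line for a cytosine given by its 1-based position.

    bedGraph intervals are half-open: the start is 0-based and the end is
    1-based, so a single base at 1-based position p spans (p - 1, p).
    """
    total = meth + unmeth
    if total == 0:
        raise ValueError("no methylation calls at this position")
    percentage = 100 * meth / total
    return f"{chrom}\t{pos_1based - 1}\t{pos_1based}\t{percentage:.2f}"
```

For example, a cytosine at 1-based position 100 with 3 methylated and 1 unmethylated call would be written with start 99 and end 100.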
08-16-2013: Version 0.9.0 released
Bismark: Implemented the new methylation call symbols 'U' and 'u' for methylated or unmethylated cytosines in unknown sequence context, respectively. If the sequence context bases contain an N, e.g. CN or CHN, the context cannot be determined accurately (previously, these cases were assumed to be in CHH context). These situations may arise whenever the reference sequence contains Ns, or when insertions in the read occur close to a cytosine position (bases inserted into the read have no direct equivalent in the reference sequence and were assumed to be Ns for the methylation call). In practical terms, the 'U/u' methylation calls will only occur for Bowtie 2 alignments because Bowtie 1 does not support gapped alignments or read alignments if the reference contains any N's. The Bismark report will now also include the 'U/u' statistics, such as count and % methylation, however only if run in Bowtie 2 mode
bismark2report: this new module generates a graphical interactive HTML report of the Bismark alignment, deduplication, splitting and M-bias statistics for convenient visualisation of what is going on. Since several different modules of Bismark may be included into this report that may or may not have been run, bismark2report requires the user to specify the relevant reports as input files. Many thanks to Phil Ewels for the conceptual design and his help with this report. Here are examples for a standard paired-end BS-Seq report, or for a single-end PBAT report
Bismark: Fixed a bug affecting the generation of the alignment overview pie chart which occurred for PBAT libraries only
Methylation Extractor: Added handling of the newly introduced methylation call U/u for cytosines in Unknown sequence context (CN or CHN). These methylation calls are simply ignored in the extraction process to not cause too much confusion for downstream analysis
bismark2bedGraph: Added a check to see whether input files start with CpG_* or not. If they don't, please include the option '--CX' when running bismark2bedGraph as a stand-alone tool
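The context rule behind the 'U/u' calls introduced in this release can be written down as a short sketch. The function name and the returned labels are assumptions for illustration; only the rule itself (CN or CHN cannot be resolved) comes from the notes above.

```python
def cytosine_context(downstream):
    """Classify a cytosine from the two reference bases that follow it.

    downstream: string of exactly two bases, e.g. "GA" for the trinucleotide CGA.
    """
    first, second = downstream[0].upper(), downstream[1].upper()
    if first == "G":
        return "CpG"        # CG followed by anything is still CpG
    if first == "N":
        return "Unknown"    # CN: context cannot be determined
    if second == "G":
        return "CHG"
    if second == "N":
        return "Unknown"    # CHN: could be either CHG or CHH
    return "CHH"
```

Before this release the two "Unknown" branches would have been counted as CHH.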
07-26-2013: Version 0.8.3 released
Bismark: Changed the FLAG values of paired-end SAM/BAM output files to comply with other downstream applications such as Picard. In addition, reads will no longer have /1 or /2 appended to the read IDs. For the time being, the old FLAG values and read ID tags can still be obtained using the option '--old_flag'. For more information on the change of FLAG tags please see the RELEASE NOTES or type 'bismark --help'
Methylation Extractor: Changed the additional check for the module GD::Graph::colour to an 'eval require ...' statement instead of using 'use'. This should now properly skip drawing the M-bias plot if the module is not installed on the system
Methylation Extractor: Implemented two quick tests for paired-end SAM/BAM files to see if the file had been sorted by chromosomal position prior to using the methylation extractor, because this would cause problems with the strand identity and overlaps since both read 1 and read 2 are expected to follow each other directly in the Bismark alignment file. The first test attempts to find an @SO (for sorted) tag in the SAM header. If this cannot be found, the first 100000 sequences are checked for whether or not their ID is the same. If the file appears to have been sorted, the methylation extractor will bail and ask for an unsorted file instead
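The second of the two tests described above can be sketched roughly as follows. This assumes that both mates of a pair carry an identical read ID (no /1 or /2 suffix, as is the case from this release onwards) and that a single mismatching pair is enough to flag the file; both the function name and that threshold are illustrative assumptions.

```python
from itertools import islice


def looks_position_sorted(read_ids, limit=100_000):
    """Return True if consecutive records do not form read 1 / read 2 pairs.

    read_ids: iterable of read IDs in file order.
    Only the first `limit` records are inspected.
    """
    ids = list(islice(read_ids, limit))
    # In an unsorted paired-end file, records 0+1, 2+3, ... share an ID.
    for first, second in zip(ids[0::2], ids[1::2]):
        if first != second:
            return True
    return False
```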
07-24-2013: Version 0.8.2 released
Bismark: Changed the TLEN values in paired-end SAM format generated by Bowtie 2 whenever one read was completely contained within the other; in such cases both TLEN values will be set to the length of the longer fragment
Bismark: Changed the output filename for Bowtie 2 files for single-end reads from '...bt2_bismark.sam' to '...bismark_bt2.sam' so that single-end and paired-end file names are more consistent
Methylation Extractor: Added a new option '--mbias_only'. If this option is specified, the M-bias plot(s) and their data are written out. The standard methylation report ('--report') is optional. Since this option will not extract any methylation data, neither bedGraph nor cytosine report conversion is allowed
Methylation Extractor: If a specific output directory and '--cytosine_report' are specified at the same time, the bedGraph2cytosine module will now use the bedGraph file located in the output directory as intended
Methylation Extractor: Added an additional check for the module GD::Graph::colour; if it can't be found on the system, drawing of the M-bias plot will be skipped
07-12-2013: Version 0.8.1 released
Methylation Extractor: Changed the function of the option '--ignore' to ignore a specified number of bp from the 5' end of single-end reads or Read 1 of paired-end files. In addition, added a new option '--ignore_r2' to ignore a specified number of bp from the 5' end of Read 2 of paired-end files. Since the first couple of bases in Read 2 of BS-Seq experiments show a severe bias towards non-methylation as a result of the end-repair of sonicated fragments with unmethylated cytosines (see M-bias plot), it is recommended that the first couple of bp of Read 2 are removed before starting downstream analysis. Please see the section on M-bias plots in the Bismark User Guide for more details
Methylation Extractor: Changed colours, legends and background colour of the M-bias plot
Bismark: Changed the way in which the alignment overview file is being named to actually work
07-12-13: Version 0.8.0 released
Bismark: Added new option '--prefix' to add a prefix to the output filenames. For example, '--prefix test' with 'file.fq' would result in the output file 'test.file.fq_bismark.sam' etc.
Bismark: Fixed a warning message that occurred when chromosomal sequences could not be extracted in paired-end Bowtie2 mode
Bismark: will now generate a pie chart with the alignment statistics once a run has finished; this gives a quick overview of how many sequences aligned uniquely and how many did not align, either because they produced no alignment at all, mapped to multiple locations, or because it was impossible to extract the chromosomal sequence
Methylation Extractor: upon completion, the methylation extractor will now produce an M-bias (methylation bias) plot, which shows the methylation proportion across each possible position in the reads (described in: Hansen et al., Genome Biology, 2012, 13:R83). The data for the M-bias plot will be written into a text file (to generate graphs by alternative means) and drawn into a .png file. The plot also contains the absolute number of methylation calls per position (methylated + unmethylated)
05-10-13: Version 0.7.12 released
Bismark: Removed a rogue sleep(1) command that would slow down single-end Bowtie 2 alignments for a single lane of HiSeq (200M sequences) from 1 day to 6 years and 4 months (roughly)
bismark2bedGraph: now keeps track of the temp files it just created in a session instead of using all files in the output folder ending in ".methXtractor.temp". This lets you kick off the bedGraph conversion step from already sorted, individual methXtractor.temp files if desired
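The figure quoted for the sleep(1) fix in this release can be checked with a few lines of arithmetic. The sketch assumes one second of sleep per sequence and a year of 365.25 days; both are assumptions made for the calculation.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sleep_overhead_years(n_reads, seconds_per_read=1):
    """Total time spent sleeping, in years, for a given number of reads."""
    return n_reads * seconds_per_read / SECONDS_PER_YEAR


years = sleep_overhead_years(200_000_000)
whole_years = int(years)
months = round((years - whole_years) * 12)
print(f"{whole_years} years and {months} months")
```

Under these assumptions the result agrees with the "6 years and 4 months (roughly)" stated above.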
04-22-13: Version 0.7.11 released
Bismark: Fixed non-functional single-end alignments with Bowtie2 which were accidentally broken by introducing the option '--pbat' in v0.7.10 (an evil 'if' instead of 'elsif'...)
For paired-end alignments with Bowtie 1, the option '--non_bs_mm' would accidentally confuse the number of mismatches of read 1 and read 2 whenever the first read aligned in reverse orientation, i.e. for OB and CTOT alignments. This has now been corrected
Previously, the option '--non_bs_mm' would potentially output non-integer values for Bowtie 2 alignments if the read (or reference) contained 'N' characters. Alignment scores from 'N's are now adjusted so that they count as mismatches similar to what Bowtie 1 does. This works fine for reads with up to and including 5 N's (which is quite a lot...)
Methylation extractor: To avoid duplication and keep code modular, the bedGraph conversion step invoked by the option '--bedGraph' has now been farmed out to the module 'bismark2bedGraph'. This script is independent of the methylation extractor and also works as a stand-alone tool from the methylation extractor output (uncompressed or gzip compressed files). To work well from within the methylation extractor this script (which is now included in the Bismark package) needs to reside in the same folder as the 'bismark_methylation_extractor' itself
bismark2bedGraph: Temporary chromosome files now have an input file name included in their file name to enable parallel processing of several files in the same directory at the same time
To avoid duplication and keep code modular, the bedGraph to genome-wide cytosine methylation report step invoked by the option '--cytosine_report' has now been split out to the module 'bedGraph2cytosine'. This script is independent of the methylation extractor and also works as a stand-alone tool from the Bismark bedGraph '--counts' output (uncompressed or gzip compressed files). To work well from within the methylation extractor this script (which is now included in the Bismark package) needs to reside in the same folder as the 'bismark_methylation_extractor' itself
Deduplication script: Fixed some warnings that were thrown if '--bam' was not specified
04-18-13: Version 0.7.10 released
Bismark: Added new option '--gzip' that causes temporary bisulfite conversion files to be written out in a GZIP compressed form to save disk space. This option is available for most alignment modes with the exception of paired-end FastA files
Added new option '--bam' that causes the output file to be written out in BAM format instead of the default SAM format. Bismark will attempt to use the path to Samtools that was specified with '--samtools_path', or, if it hasn't been specified explicitly, attempt to find Samtools in the PATH. If no installation of Samtools can be found the SAM output will be compressed with GZIP instead (yielding a .sam.gz output file)
Added new option '--samtools_path' to point Bismark to your Samtools installation, e.g. /home/user/samtools/. Does not need to be specified explicitly if Samtools is in the PATH
Added new option '--pbat' which is to be used for PBAT-Seq libraries (Post-Bisulfite Adapter Tagging; Kobayashi et al., PLoS Genetics, 2012). This is essentially the exact opposite of alignments in 'directional' mode, as it will only launch two alignment threads to the CTOT and CTOB strands instead of the normal OT and OB ones. The option '--pbat' works currently only for single-end and paired-end FastQ files for use with Bowtie1 and uncompressed temporary files only (there are no plans to extend this to other alignment modes at present)
Methylation extractor: The methylation extractor does now also read BAM files, however this requires a working copy of Samtools. The new option '--samtools_path' may point the methylation extractor to your Samtools installation, e.g. /home/user/samtools/. This does not need to be specified explicitly if Samtools is in the PATH
Added new option '--gzip' to write out the primary methylation extractor files (CpG_OT_..., CpG_OB_... etc) in a GZIP compressed form to save disk space. This option does not work on bedGraph and genome-wide cytosine reports as they are 'tiny' anyway
The methylation extractor does now treat InDel free reads differently than before which leads to a 60% increase in extraction speed for ungapped alignments in SAM format!
When sorting methylation calls for the bedGraph step, the methylation extractor does now use the output directory to store temporary sort files instead of the default /tmp/ directory
Deduplication script: The deduplication script does now also read BAM files, however this requires a working copy of Samtools. The new option '--samtools_path' may point the script to your Samtools installation, e.g. /home/user/samtools/. This does not need to be specified explicitly if Samtools is in the PATH
The deduplication script also received the new option '--bam' to write out deduplicated files directly in BAM format. If no installation of Samtools can be found the SAM output will be compressed with GZIP instead (yielding a .sam.gz output file)
03-01-13: Version 0.7.8 released
Bismark: Added new option '--non_bs_mm' which prints an extra column at the end of SAM files showing the number of non-bisulfite mismatches of a read. This option is not available in '--vanilla' format. Format for single-end reads: "XA:Z:mismatches". Format for paired-end reads: read 1: "XA:Z:mismatches", read 2: "XB:Z:mismatches"
Bismark: The mapping report file names were changed to _bismark_(SE/PE)_report.txt (Bowtie 1) or bt2_bismark_(SE/PE)_report.txt (Bowtie 2) to keep it more uniform
Methylation extractor: The input file(s) may now be specified with a file path which abrogates the need to be in the same directory as the input file(s) when calling the methylation extractor
Methylation extractor: Added new function '--buffer_size' to increase the physical memory used for sorting the output by chromosomal positions (only needed for bedGraph output)
Methylation extractor: Reference sequence files containing pipe ('|') characters were found to crash the methylation extractor as the chromosome name was used for filenames. These characters are now replaced with underscores when the reads are sorted during the bedGraph step
Updated the Bismark User Guide with sections for the bedGraph and genome-wide methylation report outputs, and Appendix IV is now showing alignment stats for the test data
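The fix for pipe characters in reference sequence names in this release amounts to sanitising the chromosome name before it is used in a filename. A minimal sketch, with a hypothetical function name:

```python
def filename_safe_chromosome(name):
    """Replace pipe characters in a chromosome name with underscores."""
    return name.replace("|", "_")
```

For example, an NCBI-style name such as 'gi|12345|ref|NC_000001.1|' would become 'gi_12345_ref_NC_000001.1_'.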
02-10-12: Version 0.7.7 released
When reading in the genome file, Bismark does now automatically remove \r line ending characters as well. This sometimes caused problems when genome files had been edited on Windows machines.
Added support for the Bowtie 2 options '--rdg int1,int2' and '--rfg int1,int2' to adjust the gap open and extension penalties for both read and reference sequence. This might be useful in very specialised circumstances (e.g. when handling PacBio data...)
The methylation extractor received a fairly extensive overhaul:
Renamed methylation_extractor to bismark_methylation_extractor
Added new function '-o/--output' to specify an output directory. This became necessary for integration into Galaxy
Added new function '--no_header' to suppress the Bismark version header in the output files if plain alignment data is more desirable
Added option '--bedGraph' to produce a bedGraph output file once the methylation extraction has finished; this reports the genomic location of a cytosine and its methylation state (in %). By default, only cytosines in CpG context will be sorted/reported
Implemented option '--cutoff threshold' to set the minimum number of times a methylation state has to be seen for that nucleotide before its methylation percentage is reported
Implemented option '--counts' which adds two additional columns to the bedGraph output file to enable further calculations: Column 5: count of methylated calls per position Column 6: count of unmethylated calls per position
Implemented option '--CX_context' so that the sorted bedGraph output file contains information on every single cytosine that was covered in the experiment irrespective of its sequence context
Added option '--cytosine_report' which produces a genome-wide methylation report for all cytosines. By default, the output uses 1-based chromosome coordinates and reports CpG context only. The output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide content and methylation state
Option '--CX_context' applies to the cytosine report as well. The output file will contain information on every single cytosine in the genome irrespective of its context. This applies to both forward and reverse strands
Implemented option '--zero_based' to use zero-based coordinates, as used in e.g. bed files, instead of 1-based coordinates
Implemented option '--genome_folder PATH' to be used to extract sequences from. Accepted formats are FastA files ending with '.fa' or '.fasta'
Added an option '--split_by_chromosome' which writes the cytosine report output to individual chromosome files instead of to one single very large file
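How the options '--cutoff' and '--counts' from the overhaul above interact can be sketched as follows. The sketch assumes that the cutoff applies to the total number of calls (methylated plus unmethylated) at a position; the function names and the two-decimal formatting are illustrative, and the leading chromosome and coordinate columns are left out for brevity.

```python
def methylation_percentage(meth, unmeth, cutoff=1):
    """Return the methylation percentage, or None if coverage is below cutoff."""
    total = meth + unmeth
    if total == 0 or total < cutoff:
        return None
    return 100.0 * meth / total


def report_fields(meth, unmeth, cutoff=1, counts=False):
    """Build the value columns for one position, or None if it is not reported."""
    percentage = methylation_percentage(meth, unmeth, cutoff)
    if percentage is None:
        return None
    fields = [f"{percentage:.2f}"]
    if counts:
        # The two extra columns added by '--counts'.
        fields += [str(meth), str(unmeth)]
    return fields
```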
23-08-12: Update to genome_methylation2bedGraph script
Added an option '--split_by_chromosome' to enable sorting of very large files. The methylation extractor output is first written into temporary files chromosome by chromosome. These temporary files can then be sorted by position and are deleted afterwards
Added an option '--counts' which adds 2 more columns to the output file to enable further calculations (technically no longer in bedGraph format then...) Column 5: count of methylated calls per position Column 6: count of unmethylated calls per position
31-07-12: Version 0.7.6 released
Reworked the way in which SAM files (both single and paired-end) are handled in the methylation extractor so that reads containing InDels, which may be generated by Bismark using Bowtie 2, are now handled as intended. Bismark users employing Bowtie 2 for alignments are strongly encouraged to upgrade to this version
Changed the way in which the methylation extractor identifies the read and genome conversion flags in SAM output. This might become relevant if the Bismark SAM mapping output was compressed/decompressed with CRAM or Goby at some point, since these tools may change the order of optional tags in a SAM entry. Thanks to Z. Zeno for pointing this out and contributing a patch
16-07-12: Version 0.7.5 released
Trailing read ID segment numbers (e.g. /1,/2 or /3) are now removed internally for Bowtie 2 alignments in paired-end mode as this might have caused no reads to align at all if the segment number was not 1 or 2. As of Bowtie 2 version 2.0.0-beta7 this behavior has been disabled for unpaired reads
The Bowtie 2 option -M is now deprecated (as of Bowtie 2 version 2.0.0-beta7). What used to be called -M mode is still the default mode, but adjusting the -M setting is deprecated. The options -D and -R should be used to adjust the effort expended to find valid alignments
Changed the default seed mismatch parameter (controlled by -n) to 1 (down from 2). This increases alignment speed noticeably and typically produces very similar results for good quality read data
Fixed a bug where the chromosomal sequence could not be extracted for very short genomic sequences for alignments with Bowtie 2
The methylation extractor and the Bismark alignment output deduplication script do now read both raw and gzipped (.gz) Bismark mapping files
Manual updated accordingly
26-04-12: Version 0.7.4 released
Introduced a new option '--temp_dir' specifying a directory to which the C-to-T or G-to-A transcribed temporary files can be written instead of using the same folder that contains the input files. This might become useful for implementation into Galaxy.
The input files to be aligned may now contain path information, e.g. /home/user/file.fq or ../temp/file.fq, and one no longer has to call Bismark from within the directory containing the input files.
05-04-12: Version 0.7.3 released
Corrected a bug for the TLEN field in paired-end SAM output. This value was occasionally calculated incorrectly if both reads were overlapping almost entirely with a difference of only a single bp between the end of one read and the start of the second read. This did not affect the output of the methylation extractor but merely the display of the read alignment itself
Removed a potential source of crashes with gzipped input files and the option -u/--qupto
methylation_extractor: Corrected a potential flaw for the 'remove overlap' option for paired-end alignments in --vanilla mode when the first read aligned in a reverse orientation
methylation_extractor: file endings of all files generated by the methylation extractor will be only a single '.txt' if the file was called .txt before
14-03-12: Version 0.7.2 released
methylation_extractor: changed the file endings of all files generated by the methylation extractor to '.txt'; this is to avoid confusing these files with SAM formatted Bismark output files
deduplicate_Bismark_alignment_output.pl: Fixed a bug for paired-end deduplication mode in SAM format, which only printed the second read alignment of a pair to the deduplicated file
trim_galore: Updated so that non-RRBS FastQ files are adapter and quality trimmed in a single pass
trim_galore: added an option --fastqc_args "..." to pass extra arguments to FastQC for easier integration into pipelines
trim_galore: Added some more documentation; trim_galore can now also be downloaded separately
validate_paired_end_files: Updated so that one can optionally write out unpaired single-end reads should a read-pair fail to be considered a valid paired-end read pair
29-02-12: Version 0.7.1 released
Adjusted Bismark so that white spaces or tab characters in the read IDs get replaced with underscores on the fly. This was necessary because some ID checks would fail as Bowtie2 truncates read IDs if it encounters spaces in the read ID (causing errors with the latest RTA version), whereas Bowtie 1 only truncates read IDs if 'tab' characters were found. More information about this can be found in the RELEASE_NOTES.
An RRBS QC pack is now available for download which contains a brief guide to RRBS, the Cutadapt-wrapping script trim_galore as well as a validate_paired_end_files script to remove read pairs for which at least one of the reads has been trimmed to a too short read length due to quality and/or adapter trimming.
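The on-the-fly read ID clean-up described for this release can be sketched in a few lines. The sketch assumes that every individual space or tab character becomes one underscore; whether runs of whitespace are collapsed is not stated in the notes, and the function name is hypothetical.

```python
import re


def sanitise_read_id(read_id):
    """Replace each space or tab in a read ID with an underscore."""
    return re.sub(r"[ \t]", "_", read_id)
```

With this, an ID such as 'HWI-ST1:1:1101 1:N:0:ATCACG' stays intact as a single token instead of being truncated at the space by the aligner.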
24-02-12: Version 0.7.0 released
Changed Bismark's behavior for "--directional" mode (default) to run only 2 parallel instances of Bowtie 1/2 to the original top (OT) and bottom (OB) strands, instead of 4 instances to all possible bisulfite strands. This change might result in somewhat faster alignment speed and mapping efficiency. It is still possible to run the 4-alignment strand mode for any combination of input file(s) and choice of aligner by specifying --non_directional.
Changed the --score_min default function for Bowtie 2 alignments to a more stringent setting of "L,0,-0.2" instead of using the Bowtie2 default function (which was "L,0,-0.6")
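The effect of the stricter '--score_min' default can be made concrete with a small calculation. In Bowtie 2, a function written as "L,a,b" is linear in the read length x, i.e. f(x) = a + b * x, and an alignment is only considered valid if its score is at least f(x). The function name below is illustrative.

```python
def min_alignment_score(read_length, intercept=0.0, slope=-0.2):
    """Minimum valid alignment score for a linear 'L,intercept,slope' function."""
    return intercept + slope * read_length


for length in (50, 100):
    strict = min_alignment_score(length)              # L,0,-0.2
    lax = min_alignment_score(length, slope=-0.6)     # L,0,-0.6
    print(f"{length} bp: L,0,-0.2 -> {strict:.1f}, L,0,-0.6 -> {lax:.1f}")
```

For a 100 bp read the threshold moves from about -60 to about -20, so far fewer mismatches and gaps are tolerated under the new default.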
06-02-12: Version 0.6.4 released
Adjusted the options -u and -s so that only the non-skipped part of the input file will be transcribed and analysed. This allows splitting up very large files into smaller chunks to allow parallel processing, e.g. -s 10000000 -u 20000000 would analyse sequences 10000001 to 20000000. The alignment report will be based on this reduced number of reads analysed
In paired-end mode, the options --unmapped and --ambiguous do now output unaligned or multiply aligned reads, respectively, to their correct output files as intended
Sequences in FastA format do now receive Phred score qualities of 40 throughout (ASCII 'I') to prevent the SAM to BAM conversion in SAMtools from failing
If a genomic sequence could not be extracted it will now also be counted and reported for use with Bowtie 1
Suppressed debugging warning messages that were printed in error for Bowtie2 alignments (single-end mode only)
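The adjusted behaviour of -s and -u in this release (skip the first s sequences, then stop after sequence u) maps directly onto a slice over the input records. A minimal sketch, with hypothetical function and parameter names:

```python
from itertools import islice


def select_reads(records, skip=0, upto=None):
    """Yield only records skip+1 .. upto (1-based), mirroring -s and -u."""
    return islice(records, skip, upto)
```

With skip=10000000 and upto=20000000 this yields sequences 10000001 to 20000000, matching the example given above, so consecutive chunks can be processed in parallel without overlap.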
04-01-12: Version 0.6.3 released
The methylation extractor does now also work with Bismark SAM output files
Fixed a bug caused when a read was called 0 (zero)
Changed the XX:Z mismatch field in the SAM output to display mismatching nucleotides of the reference sequence (instead of the read sequence ones)
15-12-11: Version 0.6.beta2 released
Added a parallelization option for Bowtie 2 alignments ('-p'). Since it makes use of the option '--reorder' this option requires a Bowtie 2 version of 2.0.0-beta5 or higher. This option is still experimental and is only recommended for use on very powerful hardware setups (i.e. lots of cores and memory).
08-12-11: Version 0.6.beta1 released
Bismark does now also support gapped alignments with Bowtie 2 (when specifying the option '--bowtie2')
The bismark_genome_preparation does now also generate Bowtie 2 bisulfite indexes
The Bismark default output has been changed to SAM format (for both Bowtie 1 and Bowtie 2)
The 'old' output is still available via the option '--vanilla'
Slightly increased the alignment efficiencies for Bowtie 1 alignments
Changed the default mapping behavior to the former option '--directional' ('--non-directional' re-enables four-strand output)
Changed the default maximum insert size parameter (-X/--maxins) for paired-end alignments to 500bp (up from 250bp)
The methylation extractor works currently only on the 'vanilla' Bismark output
The bismark2SAM script will now reverse qualities and methylation calls when reads were reverse-complemented
17-10-11: Version 0.5.4 released
Bismark will now accept input files in either uncompressed or gzipped format
Added the option -o/--output_dir to Bismark which lets you specify the folder for all Bismark output files instead of writing into the same folder as the input file(s). If the output directory does not exist already it will be created first
The path to the genome folder can now be absolute or relative (e.g. ../genomes/mouse/)
Changed the way unmapped or ambiguous reads are reported so that one output file (and/or ambiguous read file) is generated per input file. Their name will be derived from the input file name. For paired-end samples, the unmapped or ambiguous filenames can be discriminated by _1 and _2 in their file names
Added the number of sequences analysed in total to the paired-end report file (was only printed on screen previously)
Fixed a bug for the FastQ output for ambiguous reads where quality scores were not followed by a new line
20-09-11: Update to bismark2SAM script
The bismark2SAM script does now also report the methylation calls in a custom field (XM) for easier downstream processing. In addition, the second read of a paired-end alignment has a 2 at the end of the ID field to reflect the paired-end nature. Thanks to T. McBryan for implementing these new features.
13-09-11: Version 0.5.3 released
Increased the 'chunkmbs' default value to 512 MB (up from 256 MB)
Corrected a mix-up of the strand names of the complementary strands in the alignment report for single-end alignments (see release notes)
Fixed a bug in the genome_methylation_bismark2bedGraph script that was introduced during the 1-based (Bismark) to 0-based (bedGraph) coordinate adaptation in June 2011. Thanks to M.A. Bentley for his contributions to the new version.
Improved the bismark2SAM script to more accurately describe the origin of a bisulfite strand in the bitwise FLAG field. Thanks to E. Vidal for his contributions to the new version.
16-08-11: Version 0.5.2 released
Increased the 'chunkmbs' default value to 256 MB (up from 64 MB)
Bismark will now accept input files in both comma and space separated format
Fixed a bug in the methylation extractor which resulted in offset positions for reverse reads when the option '--ignore' was used (single-end only)
Included a check (and warning) whether the read IDs in the input files contain tab characters, as this will cause Bowtie to truncate the reads and result in no alignments
16-06-11: Version 0.5.1 released
The genome folder for the bismark_genome_preparation can now be specified either as absolute or relative path
Fixed a bug where a newline character was missing after the quality values in the unmapped reads FastQ output
Fixed a bug which prevented paired-end alignments in FastA format
Input files for the methylation extractor can now also have a relative path
21-04-11: Version 0.5.0 released
Bismark alignments should now also support FastQ files produced by Casava v1.8 which will be available soon
The Bismark output will now have an additional column (2 extra columns for paired-end data) with the basecall qualities (in Phred33/ Sanger format; left blank for FastA data)
A bug was fixed for the reporting of paired-end alignments whereby alignments to the CTOT strand were assigned to CTOB strand and vice versa
10-02-11: Version 0.4.1 released
Bisulfite genomes are now written into a multi-FastA file by default. This allows indexing of new genomes with tens of thousands of contigs or scaffolds
The internal reporting of paired-end alignments was changed, so that sequences which produce two identical alignments are preferentially assigned to the original strands as intended
04-02-11: Version 0.4.0 released
The option --directional is now also available for paired-end libraries. This will ignore alignments to strands which should theoretically not be sequenced
Fixed a strand confusion in the alignments summary report for paired-end alignments (this only affected the report but not any alignments as such)
26-01-11: Version 0.3.0 released
The Bismark User Guide replaces the previous documentation (INSTALL.txt and README.txt). It is easy to follow and contains many more details about BS-Seq and Bismark
A BS-Seq test dataset is now available for download. It contains 10K sequences (human, shotgun) in FastQ format, taken from the SRR020138 data set (Lister et al, 2009).
Both bismark and bismark_genome_preparation will now recognise reference genome sequences with either .fa or .fasta file extensions
18-01-11: Version 0.2.6 released
Fixed a bug which might have been caused by specifying very lax alignment parameters (allowing 10+ non-BS mismatches)
22-12-10: Version 0.2.5 released
Added the option '--un <filename>' to write out unaligned reads to the specified file
Added the option '--ambiguous <filename>' to write out ambiguously aligned reads to the specified file
18-11-10: Version 0.2.4 released
Added the option '-I/--minins <int>' to modify the minimum valid insert size for paired-end alignments
Added the option '-X/--maxins <int>' to modify the maximum valid insert size for paired-end alignments
Changed the remove_tree command in the genome preparation script to rm_tree to be compatible with older versions of Perl (thanks to S. Cooper for spotting this)
04-11-10: Version 0.2.3 released
Added the option '--directional' to Bismark to only report alignments to the original strands if the library was generated in a strand-specific manner
The alignment option '--best' will now be selected by default to ensure the best possible alignments
All Bismark output files will now end in .txt so they can be viewed or imported more easily
Changed the reporting format slightly to increase readability
13-09-10: Version 0.2.2 released
Fixed a bug whereby the methylation positions of certain reverse mapped reads were offset by a few bp (in the methylation extractor output)
08-09-10: Version 0.2.1 released
The Bismark aligner will now handle Multi-Fasta-Files (MFA) as intended.
07-09-10: Version 0.2.0 released
Non-CpG context methylation will now be subdivided into CHG and CHH context
Added the option '--chunkmbs <int>' to counteract Bowtie best-first memory chunk exhaustion warnings in --best and paired-end alignment mode
Added the option '--quiet' so that bowtie warnings can be suppressed if desired
FastA files no longer need the file extension '.fa' in order to work
Bismark will no longer tolerate non-unique chromosome names when reading the genome into memory
Fixed an issue with paired-end report files
The methylation extractor will by default produce individual output files for CpG, CHG and CHH context
The methylation extractor can optionally merge CHG and CHH context into 'non-CpG' context if desired
The methylation extractor will ensure that its version matches the Bismark version used to generate the Bismark mapping results file
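The CpG/CHG/CHH subdivision introduced in this release can be sketched in Python as follows. This is a simplified illustration, not Bismark's implementation: the function names are hypothetical, only the top strand is considered, H stands for A, C or T, and cytosines without enough known downstream sequence are left unclassified.

```python
H_BASES = {"A", "C", "T"}  # IUPAC code H: any base except G


def cytosine_context(sequence, position):
    """Classify the cytosine at a 0-based position as 'CpG', 'CHG' or 'CHH'.

    Hypothetical helper. Returns None if the base is not a C, or if the
    downstream bases needed for the call are missing or ambiguous.
    """
    seq = sequence.upper()
    if position < 0 or position >= len(seq) or seq[position] != "C":
        return None
    first = seq[position + 1 : position + 2]   # empty string at the sequence edge
    second = seq[position + 2 : position + 3]
    if first == "G":
        return "CpG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


def reported_context(context, merge_non_cpg=False):
    """Optionally merge CHG and CHH into a single 'non-CpG' label."""
    if merge_non_cpg and context in ("CHG", "CHH"):
        return "non-CpG"
    return context


for seq in ("ACGT", "ACAGT", "ACATT"):
    print(seq, cytosine_context(seq, 1))
```

The `merge_non_cpg` flag mirrors the optional merging of CHG and CHH described above; the three example sequences are classified as CpG, CHG and CHH respectively.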
09-08-10: Version 0.1.5 released
Fixed a bug whereby specifying '-n 0' as alignment parameter would not work correctly
06-08-10: Version 0.1.4 released
Bismark will no longer stop during the methylation call process when it encounters ambiguity bases in the reference genome
Fixed a strand-specificity mix-up in the single-end methylation extractor output
03-08-10: Version 0.1.3 released
The genome indexer will now properly (and recursively) remove any pre-existing bisulfite genome directory before creating a new one
The genome indexer will now convert DNA ambiguity codes (anything other than C, A, T or G) into N's
The genome indexer now also accepts FastA files with multiple sequence entries
Fixed a strand-specificity mix-up in the single-end methylation extractor output
The option to ignore bases in the methylation extractor does now correctly alter the position of the remaining methylation calls
Added an option to the methylation extractor to score overlapping methylation calls for paired-end alignments only once
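The ambiguity-code conversion listed for the genome indexer can be pictured with the short Python sketch below. It is an assumption-laden illustration rather than the indexer's actual code: the function name is hypothetical, and converting the sequence to upper case first is a choice made for this example only.

```python
import re


def mask_ambiguity_codes(sequence):
    """Replace every character other than A, C, G or T with 'N'.

    Hypothetical helper; upper-casing the input is an assumption of this sketch.
    """
    return re.sub(r"[^ACGT]", "N", sequence.upper())


print(mask_ambiguity_codes("ACGTRYKM"))
```

Here the IUPAC codes R, Y, K and M are each replaced by N, while the unambiguous bases are left untouched.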
17-06-10: Version 0.1.2 released
Both single-end and paired-end alignments have a new and final output format (see README.txt for more details)
Bismark and the Methylation Extractor will include their version info in the first line of the output file
Fixed a bug with the chromosome name resolution for paired-end alignments
Reads aligning to the very edges of chromosomes previously produced several error messages when trying to extract one additional bp to determine if Cs are in CpG context. These reads will be excluded.
The Bismark and Methylation Extractor --help option will give info about their output file format
15-06-10: Version 0.1.1 released
Bismark now also handles genome FastA files in formats other than the Ensembl format
Fixed a runtime bug with first alignments
14-06-10: Version 0.1 released
Initial release
All basic functions working