Genapsys' Base Caller: Mysterious, But Not Ideal?
When I wrote about Genapsys' pre-print on their sequencing system the other night, I intended that to be the last I wrote until some major news from them. But after launching that into the great Internet ether, I found myself lying awake wondering if a very simple idea had any merit. Painfully simple -- I almost didn't pursue it because it was so simple and obvious. But it turns out it appears to have merit -- there may be an obvious route to improving the accuracy of Genapsys' basecalling on homopolymers. And that also took me into ground I've thought about before -- going back to my first yea...
Source: Omics! Omics! - May 5, 2019 Category: Bioinformatics Authors: Keith Robison Source Type: blogs

Poking at Genapsys Preprint
Genapsys is continuing down the path of pre-launch information, most recently releasing a pre-print. I'm looking at this pre-print critically and unfortunately turning into a bit of Reviewer #3. Not that anything is fatal, and pre-publication review is a key value of pre-prints. If I were an actual reviewer I'd be writing mostly the same things and covering more vertebrate species than they sequenced (a human exome panel was included, though most samples were bacterial) -- I'd grouse about a missing figure (which I've provided), carp about critical details not provided and beef over a public data dep...
Source: Omics! Omics! - May 1, 2019 Category: Bioinformatics Authors: Keith Robison Source Type: blogs

Want to Run An Exciting Sequencing Group? Ginkgo Is Looking for You!
I've awakened from my blogging torpor to point out a really interesting career opportunity for the types who might read this space. Ginkgo Bioworks, one of the leading synthetic biology companies in the world, is looking for someone to run their existing Next Generation Sequencing group. It's a chance to run an energetic high-throughput sequencing group that works on a wide range of projects. And, as you might have guessed from the fact I'm writing about it here, you'd also get to be my boss. I'm hoping many will see that as a feature and not a bug.
Source: Omics! Omics! - April 23, 2019 Category: Bioinformatics Authors: Keith Robison Source Type: blogs

Mapping the Vikings using R
The commute to my workplace is 90 minutes each way. Podcasts are my friend. I’m a long-time listener of In Our Time and enjoyed the recent episode about The Danelaw. Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending in -by denote a farmstead, settlement or village; those ending in -thwaite mean a clearing or meadow. So how local are those names? Time for some quick and dirty maps using R. First, we’ll need a dataset of British place names. There are quite a few of these online, but t...
Source: What You're Doing Is Rather Desperate - April 3, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics ggplot2 history maps podcast rstats viking Source Type: blogs
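
The suffix rule described in the excerpt above (names ending in -by or -thwaite) can be sketched in a few lines. The original post works in R; this is a Python sketch only, and the function name `norse_suffix` and the short sample list are assumptions made for illustration, not the post's dataset or code. A plain suffix test is crude and will not separate genuine Norse names from coincidental matches.

```python
from collections import Counter

# Suffixes and their meanings as stated in the excerpt above.
NORSE_SUFFIXES = {
    "by": "farmstead, settlement or village",
    "thwaite": "clearing or meadow",
}


def norse_suffix(place_name):
    """Return the matching suffix for a place name, or None if there is none."""
    lowered = place_name.strip().lower()
    # Try longer suffixes first so the most specific ending is reported.
    for suffix in sorted(NORSE_SUFFIXES, key=len, reverse=True):
        # Require at least one character before the suffix.
        if len(lowered) > len(suffix) and lowered.endswith(suffix):
            return suffix
    return None


# Illustrative sample only; a real analysis would load a full gazetteer.
sample_places = ["Whitby", "Grimsby", "Braithwaite", "London", "Kirkby Lonsdale"]

suffix_counts = Counter(norse_suffix(name) for name in sample_places)

for name in sample_places:
    suffix = norse_suffix(name)
    meaning = NORSE_SUFFIXES.get(suffix, "no matching suffix")
    print(f"{name}: {meaning}")
```

Note that "Kirkby Lonsdale" is not matched, because the test looks only at the end of the full name; matching each word separately would be one possible refinement.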

Nanosens Publishes Proof-of-Concept for Point-of-Care CNV Diagnostic
Here's a killer technological challenge for anyone: design a scheme to detect vanishingly small concentrations of a valuable analyte in a biological fluid. The assay must require zero pipetting, work in the field at ambient temperature, generate results quickly, contain positive and negative controls, be usefully precise and accurate, and be usable by personnel with no formal technical training. Oh, and be dirt cheap as well.
Source: Omics! Omics! - March 24, 2019 Category: Bioinformatics Authors: Keith Robison Source Type: blogs

How long since your team scored 100+ points? This blog’s first foray into the fitzRoy R package
When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”, you said. So let’s test the tolerance of the long-time audience and celebrate the start of the 2019 season as we venture into the world of – Australian football (AFL) statistics! I’ve been hooked on the wonderful sport of AFL since attending my first game, the ANZAC Day match between the Sydney Swans and Melbourne in 2003, and have hardly missed a Swans home game since. However, I don’t think you need to be a sports fanatic &...
Source: What You're Doing Is Rather Desperate - March 22, 2019 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl fitzroy Source Type: blogs

This is not normal(ised)
“Sydney stations where commuters fall through gaps, get stuck in lifts” blares the headline. The story tells us that: Central Station, the city’s busiest, topped the list last year with about 54 people falling through gaps Wow! Wait a minute… Central Station, the city’s busiest Some poking around in the NSW Transport Open Data portal reveals how many people enter every Sydney train station on a “typical” day in 2016, 2017 and 2018. We could manipulate those numbers in various ways to estimate total, unique passengers for FY 2017-18 but I’m going to argue that the value as-is ...
Source: What You're Doing Is Rather Desperate - March 11, 2019 Category: Bioinformatics Authors: nsaunders Tags: australian news statistics smh trains transport Source Type: blogs
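
The normalisation argument in the excerpt above amounts to dividing incident counts by a measure of how busy each station is. A minimal Python sketch follows, assuming typical daily entries are used as-is as the denominator. The station names and all figures are made up for illustration (54 only echoes the count quoted in the excerpt); none of them come from the NSW Transport data, and the function name `incidents_per_1000_entries` is hypothetical.

```python
def incidents_per_1000_entries(incidents, daily_entries):
    """Incidents per 1,000 typical daily station entries."""
    if daily_entries <= 0:
        raise ValueError("daily_entries must be positive")
    return incidents * 1000 / daily_entries


# Hypothetical stations: (incidents in a year, typical daily entries).
stations = {
    "Busy Station": (54, 100_000),
    "Quiet Station": (10, 5_000),
}

# Ranking by raw counts puts the busiest station first.
raw_ranking = sorted(stations, key=lambda s: stations[s][0], reverse=True)

# Ranking by rate can reverse the order.
normalised_ranking = sorted(
    stations,
    key=lambda s: incidents_per_1000_entries(*stations[s]),
    reverse=True,
)

for name, (incidents, entries) in stations.items():
    rate = incidents_per_1000_entries(incidents, entries)
    print(f"{name}: {incidents} incidents, {rate:.2f} per 1,000 daily entries")
```

With these invented numbers the busy station tops the raw list but the quiet station has the higher rate, which is the effect a headline based on raw counts would hide.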

Using parameters in Rmarkdown
Nothing new or original here, just something that I learned about quite recently that may be useful for others. One of my more “popular” code repositories, judging by Twitter, is – well, Twitter. It mostly contains Rmarkdown reports which summarise meetings and conferences by analysing usage of their associated Twitter hashtags. The reports follow a common template where the major difference is simply the hashtag. So one way to create these reports is to use the previous one, edit to find/replace the old hashtag with the new one, and save a new file. That works…but what if we could define the hashta...
Source: What You're Doing Is Rather Desperate - March 4, 2019 Category: Bioinformatics Authors: nsaunders Tags: programming statistics automation reports rmarkdown Source Type: blogs

Beyond Generations: My Vocabulary for Sequencing Tech
Many writers have attempted to divide Next Generation Sequencing into Second Generation Sequencing and Third Generation Sequencing. Personally, I think it isn't helpful and just confuses matters. I'm not the biggest fan of Next Generation Sequencing (NGS) to start with, as like "post-modern architecture" (or heck, "modern architecture") it isn't future-proofed. Not that I wouldn't take a job with NGS in the title, but still not a favorite. High Throughput Sequencing feels a little better, but again doesn't leave room for distinguishing growth -- and HTS as an abbreviation i...
Source: Omics! Omics! - February 20, 2019 Category: Bioinformatics Authors: Keith Robison Source Type: blogs

Some thoughts on my recent Twitter break
Various people have suggested that taking a break from social networks – Twitter in particular – can be A Good Thing™. So I tried it, for a couple of weeks. Here’s what I learned. 1. Why a break? The reasons that everyone else cites, I guess. A sense that my stream has swung away from essential information towards noise and distraction, despite attempts to curate it carefully by following “good people”. A realisation that I was habitually reaching for it without thinking, or knowing why. The empty serotonin hit of checking for likes. For me, a growing sense of people existing in bubbles...
Source: What You're Doing Is Rather Desperate - February 19, 2019 Category: Bioinformatics Authors: nsaunders Tags: networking health productivity social networking twitter Source Type: blogs

An absolute beginner’s guide to creating data frames for a Stack Overflow [r] question
For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it seems highest-ranked is not always best. Nowhere is this more apparent to me than in the way many users create data frames. So here is my introductory guide “how not to create data frames”, aimed at beginners writing their first questions. 1. No need for vectors There is no need to create vectors f...
Source: What You're Doing Is Rather Desperate - February 7, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics data frame rstats stack overflow Source Type: blogs

Failing to Fetch An Interesting Result on Dog Oncogene Homologs
An idea for a little exploration occurred to me back at Infinity -- that is, 7.5 years ago -- but I never got around to trying it out. I had some downtime recently to play around, so I finally executed the experiment -- alas, it turns out not to be very interesting. Still, a negative result is a negative result.
Source: Omics! Omics! - February 7, 2019 Category: Bioinformatics Authors: Keith Robison Source Type: blogs

Price’s Protein Puzzle: 2019 update
Chains of amino acids strung together make up proteins and since each amino acid has a 1-letter abbreviation, we can find words (English and otherwise) in protein sequences. I imagine this pursuit began as soon as proteins were first sequenced, but the first reference to protein word-finding as a sport is, to my knowledge, “Price’s Protein Puzzle”, a letter to Trends in Biochemical Sciences in September 1987 [1]. Price wrote: It occurred to me that TIBS could organise a competition to find the longest word […] contained within any known protein sequence. The journal took up the challenge and publish...
Source: What You're Doing Is Rather Desperate - January 30, 2019 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics computing statistics algorithm amino acid search words Source Type: blogs
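
The word-finding idea in the excerpt above reduces to a substring search over one-letter amino acid codes. Below is a minimal Python sketch, assuming the 20 standard amino acid letters. The sequence and word list are invented for illustration, the function names `spellable` and `words_in_sequence` are hypothetical, and this is not the method used in the post.

```python
# One-letter codes of the 20 standard amino acids (no B, J, O, U, X or Z).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def spellable(word):
    """True if every letter of the word is a standard amino acid code."""
    return set(word.upper()) <= AMINO_ACIDS


def words_in_sequence(sequence, words):
    """Return the words found in the sequence, longest first, then A to Z."""
    seq = sequence.upper()
    hits = [w.upper() for w in words if spellable(w) and w.upper() in seq]
    return sorted(hits, key=lambda w: (-len(w), w))


# Made-up sequence and word list, for illustration only.
toy_sequence = "MKELVISLIVESTARTKQ"
toy_words = ["elvis", "lives", "start", "star", "boat", "water"]

print(words_in_sequence(toy_sequence, toy_words))
```

Filtering with `spellable` first discards words such as "boat" that can never occur, which matters once the word list is a full dictionary and the sequences are a whole protein database.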

Covaris Grabs A Spot on the Liquid Handler Deck
For as long as I can remember, Covaris has been the standard in DNA shearing for high throughput short read sequencing. Their benchtop units had their quirks -- custom tubes being the foremost -- but they were what everyone else was compared to. In 2013 when the American Society for Human Genetics was in town, the PacBio folks did me a great favor and loaned me an exhibit hall pass. Multiple companies were offering DNA shearing instruments -- and every one compared themselves against Covaris. Now they have a new offering, moving the instrument onto a liquid handling robot deck so that it is available for high-through...
Source: Omics! Omics! - January 30, 2019 Category: Bioinformatics Authors: Keith Robison Source Type: blogs