Twitter Coverage of the Bioinformatics Open Source Conference 2017
July 21-22 saw the 18th incarnation of the Bioinformatics Open Source Conference, which generally precedes the ISMB meeting. I had the great pleasure of attending BOSC way back in 2003 and delivering a short presentation on Bioperl. I knew almost nothing in those days, but everyone was very kind and appreciative. My trusty R code for Twitter conference hashtags pulled out 3268 tweets and, without further ado, here are: the Github repository, where you can view the markdown report in the code/R directory; and the published report at RPubs. The ISMB/ECCB meeting wraps today and analysis of Twitter coverage for that meeting will app...
Source: What You're Doing Is Rather Desperate - July 25, 2017 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics meetings statistics bosc Source Type: blogs

Hacking Highcharter: observations per group in boxplots
Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R. Highcharter has a nice simple function, hcboxplot(), to generate boxplots. I recently generated some for a project at work and was asked: can we see how many observations make up the distribution for each category? This is a common issue with boxplots and there are a few solutions such as: overlay the box on a jitter plot to get some idea of the number of points, or try a violin plot, or a so-called bee-swarm plot. In Highcharts, I figured there should be a method to ...
Source: What You're Doing Is Rather Desperate - July 24, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics Source Type: blogs
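
The question raised in the excerpt above (how many observations sit behind each box?) can be illustrated with a minimal base R sketch. This is not the Highcharter approach from the post, whose details are cut off in the excerpt; it assumes the built-in iris dataset as stand-in data and simply appends the group size to each category label.

```r
# Count the observations in each group (iris is a built-in example dataset,
# used here only as stand-in data)
counts <- table(iris$Species)

# Build category labels that carry the group size, e.g. "name (n = 50)"
labels <- paste0(names(counts), " (n = ", counts, ")")

# Draw the boxplot with the annotated labels on the category axis
boxplot(Sepal.Length ~ Species, data = iris, names = labels,
        ylab = "Sepal length (cm)")
```

Putting the count in the axis label keeps the chart uncluttered; overlaying the raw points, as the excerpt suggests, is the alternative when the shape of the distribution also matters.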

Chart golf: the “demographic tsunami”
In conclusion then: years as coloured bars: not great; Excel: no; “Tsunami”: hardly. Bonus section: population pyramids. I’ve always liked population pyramids, ever since I first learned about them in high school geography class. Here’s my attempt to animate one. The trick is to subset the data by gender, then create two geoms and set the values for one subset to be negative (but not the labels). More commonly, ages are binned and proportions rather than counts may be used, but I did neither in this case. I find it either mesmerising or a bit “too much”, depending on my mood. How about you? ...
Source: What You're Doing Is Rather Desperate - July 21, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics news smh sydney visualisation Source Type: blogs
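
The population pyramid trick described in the excerpt above can be sketched as follows. This is a static, minimal illustration, not the author's animated version: the data frame `pop`, its column names and all the counts are made up for the example, and the plotting step assumes the ggplot2 package is installed.

```r
# Made-up counts by age and gender (illustrative values only)
pop <- data.frame(
  age    = rep(c(0, 10, 20, 30), times = 2),
  gender = rep(c("Female", "Male"), each = 4),
  count  = c(120, 110, 100, 90, 125, 112, 98, 85)
)

# Negate the values for one subset so its bars extend in the opposite direction
pop$plot_value <- ifelse(pop$gender == "Male", -pop$count, pop$count)

# Plot only if ggplot2 is available (an assumption; it is not part of base R)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pop, aes(x = age, y = plot_value, fill = gender)) +
    geom_col(data = subset(pop, gender == "Female")) +  # one geom per subset
    geom_col(data = subset(pop, gender == "Male")) +
    scale_y_continuous(labels = abs) +  # show the axis labels as positive counts
    coord_flip()
  # print(p) draws the chart
}
```

Only the plotted values are negated; passing `abs` as the labelling function keeps the axis readable as counts on both sides.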

Visualising Twitter coverage of recent bioinformatics conferences
Back in February, I wrote some R code to analyse tweets covering the 2017 Lorne Genome conference. It worked pretty well. So I reused the code for two recent bioinformatics meetings held in Sydney: the Sydney Bioinformatics Research Symposium and the VIZBI 2017 meeting. So without further ado, here are the reports in markdown format, which display quite nicely when pushed to Github: Sydney Bioinformatics Research Symposium 2017 and VIZBI 2017. You can also dig around in the repository for the Rmarkdown, HTML and image files, if you like.
Source: What You're Doing Is Rather Desperate - June 20, 2017 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics meetings statistics sbrs2017 twitter vizbi2017 Source Type: blogs

An update to the nhmrcData R package
Just pushed an updated version of my nhmrcData R package to Github. A quick summary of the changes: (1) in response to feedback, added the packages required for vignette building as dependencies (Imports) (commit); (2) added 8 new datasets with funding outcomes by gender for 2003 – 2013, created from a spreadsheet that I missed first time around (commit, and see the README); (3) the vignette is not yet updated with new examples. So now you can generate even more depressing charts of funding rates for even more years. Enjoy and as ever, let me know if there are ...
Source: What You're Doing Is Rather Desperate - March 15, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics data nhmrc package rstats Source Type: blogs

The nhmrcData package: NHMRC funding outcomes data made tidy
Do you like R? Information about Australian biomedical research funding outcomes? Tidy data? If the answers to those questions are “yes”, then you may also like nhmrcData, a collection of datasets derived from funding statistics provided by the Australian National Health & Medical Research Council. It’s also my first R package (more correctly, R data package). Read on for the details. 1. Installation The package is hosted at Github and is in a subdirectory of a top-level repository, so it can be installed using the devtools package, then loaded in the usual way: devtools::install_github("neilf...
Source: What You're Doing Is Rather Desperate - March 8, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics Source Type: blogs

HTML vignettes crashing your RStudio? This may be the reason
Short version: if RStudio on Windows 7 crashes when viewing vignettes in HTML format, it may be because those packages specify knitr::rmarkdown as the vignette engine, instead of knitr::knitr. Longer version with details – read on. [Figure: HTML documentation for broom in RStudio] At work I run RStudio (currently version 1.0.136) on Windows 7 (because I have no choice). This works: open the Packages tab; click on broom; click on User guides, package vignettes and other documentation; click on HTML to see documentation for broom::broom. [Figure: HTML documentation for dplyr in RStudio] If I do the same for the dplyr package and choose the ...
Source: What You're Doing Is Rather Desperate - March 6, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics debug rstudio software windows Source Type: blogs

Twitter Coverage of the Lorne Genome Conference 2017
Things to know about Lorne in the state of Victoria, Australia. It’s situated on the Great Ocean Road, a major visitor attraction and a great way to see the scenic coastline of the region. It’s home to a number of life science conferences including Lorne Genome 2017. This week’s project then: use R to analyse coverage of the 2017 meeting on Twitter. I last did something similar for the ISMB meeting in 2012. How things have changed. Back then I prepared PDF reports using Sweave, retrieved tweets using the twitteR package and struggled with dates and times when plotting timelines. This time around I wrote RM...
Source: What You're Doing Is Rather Desperate - February 16, 2017 Category: Bioinformatics Authors: nsaunders Tags: genomics meetings R statistics conference lorne rstudio rtweet twitter Source Type: blogs

On the passing of Hans Rosling
It would be remiss not to mention briefly the passing of Hans Rosling. Data needs storytellers and the world needs advocates for evidence-based decision making. We have lost one of the best. For some insights into the man and his interesting (and at times challenging) life, I highly recommend this news feature. You can enjoy presentations at the Gapminder website: I’d start with the documentary The Joy of Stats. Perhaps I should not be surprised or annoyed – but I am – at the lack of coverage this story received at news outlets, particularly in Australia. Aside from an obituary at Guardian Australia (not ...
Source: What You're Doing Is Rather Desperate - February 9, 2017 Category: Bioinformatics Authors: nsaunders Tags: statistics communication gapminder obituary presentations Source Type: blogs

Hyetographs, hydrographs and highcharter
Dual y-axes: yes or no? What about if one of them is also reversed, i.e. values increase from the top of the chart to the bottom? Judging by this StackOverflow question, hydrologists are fond of both of these things. It asks whether ggplot2 can be used to generate a “rainfall hyetograph and streamflow hydrograph”, which looks like this: My first thought was “why?” but perhaps, as suggested on Twitter, the chart signifies rain falling from above. My view (and one held more widely) is that dual axes are to be discouraged unless (1) the variables measured in each case are directly comparable with reg...
Source: What You're Doing Is Rather Desperate - February 6, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics ggplot2 highcharter highcharts hydrology package Source Type: blogs

Nice graphic? Are they taking the p …
Yes, it started with a tweet: Nice graphic on urine components via https://t.co/sfuXNB02sF pic.twitter.com/vhVLahQ8su — Metabolomics (@metabolomics) January 31, 2017 By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D pie charts. Can it be fixed? So far as I know, there isn’t a tool to generate data by extracting labels from images, so I sat down and typed in the numbers manually. Here they are for download. The top and bottom pie cha...
Source: What You're Doing Is Rather Desperate - February 4, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics charts data visualisation Source Type: blogs

The real meaning of spurious correlations
Like many data nerds, I’m a big fan of Tyler Vigen’s Spurious Correlations, a humorous illustration of the old adage “correlation does not equal causation”. Technically, I suppose it should be called “spurious interpretations” since the correlations themselves are quite real, but then good marketing is everything. There is, however, a more formal definition of the term spurious correlation or more specifically, as the excellent Wikipedia page is now titled, spurious correlation of ratios. It describes the following situation: You take a bunch of measurements X1, X2, X3… And a second ...
Source: What You're Doing Is Rather Desperate - February 2, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics causation correlation proportionality ratios Source Type: blogs
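
The formal sense of spurious correlation of ratios mentioned in the excerpt above can be illustrated with a small simulation. This is a sketch under stated assumptions, not code from the post: the variable names, the uniform distributions and the sample size are all chosen for illustration. Three independent sets of measurements are generated, and two of them are then divided by the third.

```r
set.seed(42)   # fixed seed so the simulation is reproducible
n <- 1000

# Three independent sets of positive measurements (illustrative distributions)
x <- runif(n, min = 5, max = 15)
y <- runif(n, min = 5, max = 15)
z <- runif(n, min = 1, max = 10)

# x and y are independent, so their correlation should be close to zero
cor_raw <- cor(x, y)

# Dividing both by the same variable z induces a correlation between the ratios
cor_ratio <- cor(x / z, y / z)

cat("cor(x, y)     =", round(cor_raw, 3), "\n")
cat("cor(x/z, y/z) =", round(cor_ratio, 3), "\n")
```

The ratios correlate because both share the common divisor, not because x and y are related, which is the usual caution when comparing quantities normalised by the same total.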

Taking steps (in XML)
So the votes are in: Your established blog is mostly about your work. Your work changes. Do you continue at the current blog or start a new one? — Neil Saunders (@neilfws) January 23, 2017 I thank you, kind readers. So here’s the plan: (1) keep blogging here as frequently as possible (perhaps monthly), (2) on more general “how to do cool stuff with data and R” topics, (3) which may still include biology from time to time. Sounds OK? Good. So: let’s use R to analyse data from the iOS Health app. I own an iPhone. It comes with a Health app installed by default. Not being a big user of mobile...
Source: What You're Doing Is Rather Desperate - February 1, 2017 Category: Bioinformatics Authors: nsaunders Tags: personal statistics this blog health iOS parsing xml Source Type: blogs

A Change of Direction
In this post: a brief summary of what I got up to, work-wise, in 2016 and my plans for a rather different 2017. The short version: it’s goodbye bioinformatics and hello educational data science! It feels as though 2016 was a challenging year for many people in various ways. My year was no exception. I spent the first 6 months working as a data scientist for a healthcare technology startup. It was a new, different and very enjoyable experience. However, a change in the focus of their business resulted in my redundancy mid-way through the year. I took some time out, then began applying for jobs. I received a lot of su...
Source: What You're Doing Is Rather Desperate - January 1, 2017 Category: Bioinformatics Authors: nsaunders Tags: career education personal cese nsw Source Type: blogs

Evidence for a limit to effective peer review
I missed it first time around but apparently, back in October, Nature published a somewhat controversial article: Evidence for a limit to human lifespan. It came to my attention in a recent tweet: Just wow https://t.co/fupXIOAC43 pic.twitter.com/vsxT3VyTg6 — Nick Loman (@pathogenomenick) December 11, 2016 The source: a fact-check article from Dutch news organisation NRC titled “Nature article is wrong about 115 year limit on human lifespan”. NRC seem rather interested in this research article. They have published another more recent critique of the work, titled “Statistical problems, but not enou...
Source: What You're Doing Is Rather Desperate - December 18, 2016 Category: Bioinformatics Authors: nsaunders Tags: publications R statistics human longevity peer review Source Type: blogs