Debuting in a VFL/AFL Grand Final is rare
When Marlion Pickett runs onto the M.C.G for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day. The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and fitzRoy packages. library(dplyr) library(fitzRoy) afldata % select(Season, Round, Date, ID, First.name, Surname, Playing.for, Home.team, Home.score, Away.team, Away.score) %>% group_by(ID) %>% arrange(Date) %>% # a player's first game slice(1) %>% ungroup() %>% # grand finals only filter(Rou...
Source: What You're Doing Is Rather Desperate - September 26, 2019 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl grand final richmond rstats Source Type: blogs

Extracting Sydney transport data from Twitter
The @sydstats Twitter account uses this code base, and data from the Transport for NSW Open Data API to publish insights into delays on the Sydney Trains network. Each tweet takes one of two forms and is consistently formatted, making it easy to parse and extract information. Here are a couple of examples with the interesting parts highlighted in bold: Between 16:00 and 18:30 today, 26% of trips experienced delays. #sydneytrains The worst delay was 16 minutes, on the 18:16 City to Berowra via Gordon service. #sydneytrains I’ve created a Github repository with code and a report showing some ways in which this data ...
Source: What You're Doing Is Rather Desperate - September 10, 2019 Category: Bioinformatics Authors: nsaunders Tags: programming statistics rstats sydney sydstats transport Source Type: blogs

Twitter coverage of the useR! 2019 conference
Very briefly: last week was useR! conference time again, coming to you this time from Toulouse, France. I’ve retrieved 8 318 tweets that mention #user2019 and run them through my report generator, and here are the results. Take-home message this year: the R Ladies rock!
Source: What You're Doing Is Rather Desperate - July 15, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics twitter user2019 Source Type: blogs

Can random forest provide insights into how yeast grows?
I’m not saying this is a good idea, but bear with me. A recent question on Stack Overflow [r] asked why a random forest model was not working as expected. The questioner was working with data from an experiment in which yeast was grown under conditions where (a) the growth rate could be controlled and (b) one of 6 nutrients was limited. Their dataset consisted of 6 rows – one per nutrient – and several thousand columns, with values representing the activity (expression) of yeast genes. Could the expression values be used to predict the limiting nutrient? The random forest was not working as expected: not...
Source: What You're Doing Is Rather Desperate - June 26, 2019 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics genomics statistics expression random forest rstats yeast Source Type: blogs

Geelong and the curse of the bye
This week we return to Australian Rules Football, the R package fitzRoy and some statistics to ask – why can’t Geelong win after a bye? (with apologies to long-time readers who used to come for the science) Code and a report for this blog post are available at Github. First, some background. In 2011 the AFL expanded from 16 to 17 teams with the addition of the Gold Coast Suns. In the same year, a bye round (a week where some teams don’t play) was reintroduced to the competition. For the purposes of this discussion, we are interested only in bye rounds since 2011, and during the regular home/away season. ...
Source: What You're Doing Is Rather Desperate - June 26, 2019 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl geelong rstats Source Type: blogs

Is your phone giving you horns?
No. Why would you even ask that? Well, because this. I sense problems immediately. First, the story is tagged “evolution”. The horns are not arising through inheritance of advantageous mutations, so that isn’t evolution. Second: HORNS. — Alex Holcombe (@ceptional) June 20, 2019 Yes last time I checked, horns were external and pointed upwards. The X-ray seems to show an internal, downward-pointing bone growth. But wait, there’s more. The story mentions (but does not link to) three research articles. Here are the links. Two are freely-available. A morphological adaptation? The prevalence of...
Source: What You're Doing Is Rather Desperate - June 21, 2019 Category: Bioinformatics Authors: nsaunders Tags: australian news health mobile phone Source Type: blogs

Mapping the Vikings using R
The commute to my workplace is 90 minutes each way. Podcasts are my friend. I’m a long-time listener of In Our Time and enjoyed the recent episode about The Danelaw. Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending in -by denote a farmstead, settlement or village; those ending in -thwaite mean a clearing or meadow. So how local are those names? Time for some quick and dirty maps using R. First, we’ll need a dataset of British place names. There are quite a few of these online, but t...
Source: What You're Doing Is Rather Desperate - April 3, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics ggplot2 history maps podcast rstats viking Source Type: blogs
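
The suffix matching described in the Vikings excerpt above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the `places` vector is a hand-picked sample rather than the dataset the post goes on to use, and the names `places`, `suffix` and `norse` are made up for the example.

```r
# Hand-picked sample of place names (an assumption for illustration;
# the post itself works from a full dataset of British place names).
places <- c("Whitby", "Grimsby", "Derby", "Braithwaite", "Slaithwaite", "London")

# Classify each name by its ending. The "$" anchors the pattern to the end
# of the string, so only true suffixes match.
suffix <- ifelse(grepl("thwaite$", places), "-thwaite",
                 ifelse(grepl("by$", places), "-by", "other"))

norse <- data.frame(place = places, suffix = suffix, stringsAsFactors = FALSE)

# Count how many names fall into each suffix group.
print(table(norse$suffix))
```

With real data, the same `suffix` column could be used to colour points on a map, which is presumably where the quick and dirty maps come in.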

How long since your team scored 100+ points? This blog’s first foray into the fitzRoy R package
When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”, you said. So let’s test the tolerance of the long-time audience and celebrate the start of the 2019 season as we venture into the world of – Australian football (AFL) statistics! I’ve been hooked on the wonderful sport of AFL since attending my first game, the ANZAC Day match between the Sydney Swans and Melbourne in 2003, and have hardly missed a Swans home game since. However, I don’t think you need to be a sports fanatic &...
Source: What You're Doing Is Rather Desperate - March 22, 2019 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl fitzroy Source Type: blogs

This is not normal(ised)
“Sydney stations where commuters fall through gaps, get stuck in lifts” blares the headline. The story tells us that: Central Station, the city’s busiest, topped the list last year with about 54 people falling through gaps Wow! Wait a minute… Central Station, the city’s busiest Some poking around in the NSW Transport Open Data portal reveals how many people enter every Sydney train station on a “typical” day in 2016, 2017 and 2018. We could manipulate those numbers in various ways to estimate total, unique passengers for FY 2017-18 but I’m going to argue that the value as-is...
Source: What You're Doing Is Rather Desperate - March 11, 2019 Category: Bioinformatics Authors: nsaunders Tags: australian news statistics smh trains transport Source Type: blogs

Using parameters in Rmarkdown
Nothing new or original here, just something that I learned about quite recently that may be useful for others. One of my more “popular” code repositories, judging by Twitter, is – well, Twitter. It mostly contains Rmarkdown reports which summarise meetings and conferences by analysing usage of their associated Twitter hashtags. The reports follow a common template where the major difference is simply the hashtag. So one way to create these reports is to use the previous one, edit to find/replace the old hashtag with the new one, and save a new file. That works…but what if we could define the hasht...
Source: What You're Doing Is Rather Desperate - March 4, 2019 Category: Bioinformatics Authors: nsaunders Tags: programming statistics automation reports rmarkdown Source Type: blogs

Some thoughts on my recent Twitter break
Various people have suggested that taking a break from social networks – Twitter in particular – can be A Good Thing™. So I tried it, for a couple of weeks. Here’s what I learned. 1. Why a break? The reasons that everyone else cites, I guess. A sense that my stream has swung away from essential information towards noise and distraction, despite attempts to curate it carefully by following “good people”. A realisation that I was habitually reaching for it without thinking, or knowing why. The empty serotonin hit of checking for likes. For me, a growing sense of people existing in bubbles...
Source: What You're Doing Is Rather Desperate - February 19, 2019 Category: Bioinformatics Authors: nsaunders Tags: networking health productivity social networking twitter Source Type: blogs

An absolute beginner’s guide to creating data frames for a Stack Overflow [r] question
For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it seems highest-ranked is not always best. Nowhere is this more apparent to me than in the way many users create data frames. So here is my introductory guide “how not to create data frames”, aimed at beginners writing their first questions. 1. No need for vectors There is no need to create vectors f...
Source: What You're Doing Is Rather Desperate - February 7, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics data frame rstats stack overflow Source Type: blogs

Price’s Protein Puzzle: 2019 update
Chains of amino acids strung together make up proteins and since each amino acid has a 1-letter abbreviation, we can find words (English and otherwise) in protein sequences. I imagine this pursuit began as soon as proteins were first sequenced, but the first reference to protein word-finding as a sport is, to my knowledge, “Price’s Protein Puzzle”, a letter to Trends in Biochemical Sciences in September 1987 [1]. Price wrote: It occurred to me that TIBS could organise a competition to find the longest word […] contained within any known protein sequence. The journal took up the challenge and publis...
Source: What You're Doing Is Rather Desperate - January 30, 2019 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics computing statistics algorithm amino acid search words Source Type: blogs

Extracting data from news articles: Australian pollution by postcode
The recent ABC News article “Australia’s pollution mapped by postcode reveals nation’s dirty truth” is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps. So here’s how I got them, using R. The full details are in this Github repository. There you’ll find the code to generate this report. Essentially, the procedure goes like this: use rvest to create a data frame from the data table in the online article; clean and pre-process the data using dplyr; join the pollution data w...
Source: What You're Doing Is Rather Desperate - November 28, 2018 Category: Bioinformatics Authors: nsaunders Tags: australia environment statistics geospatial maps pollution rstats Source Type: blogs

Using OSX? Compiling an R package from source? Issues with ‘-fopenmp’? Try this.
You can file this one under “I may have the very specific solution if you’re having exactly the same problem.” So: if you’re running some R code and you see a warning like this: Warning message: In checkMatrixPackageVersion() : Package version inconsistency detected. TMB was built with Matrix version 1.2.14 Current Matrix version is 1.2.15 Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package And installation of TMB from source fails like this: install.packages("TMB", type = "so...
Source: What You're Doing Is Rather Desperate - November 19, 2018 Category: Bioinformatics Authors: nsaunders Tags: programming statistics compiler llvm osx Source Type: blogs

We do not wish to share
The article Cytotoxic T cells modulate inflammation and endogenous opioid analgesia in chronic arthritis contains a statement that I don’t recall seeing before: Availability of data and materials We do not wish to share our data at this moment. This seems odd for an open-access article, published by a “big on open-access” publisher: How is this possible @BioMedCentral ??https://t.co/cgmLq8Weay — Mick Watson (@BioMickWatson) November 15, 2018 However, according to the BMC policy on open data: Question: Do authors need to publish more data than they publish already? Response: We are not requiring ...
Source: What You're Doing Is Rather Desperate - November 15, 2018 Category: Bioinformatics Authors: nsaunders Tags: open access publications biomed central Source Type: blogs

Just use a scatterplot. Also, Sydney sprawls.
In conclusion: scatterplots – good; news article’s interpretation of factors affecting commute time – poor.
Source: What You're Doing Is Rather Desperate - July 18, 2018 Category: Bioinformatics Authors: nsaunders Tags: australia australian news statistics commuting congestion rstats smh sydney traffic Source Type: blogs

Using leaflet, just because
I love it when researchers take the time to share their knowledge of the computational tools that they use. So first, let me point you at Environmental Computing, a site run by environmental scientists at the University of New South Wales, which has a good selection of R programming tutorials. One of these is Making maps of your study sites. It was written with the specific purpose of generating simple, clean figures for publications and presentations, which it achieves very nicely. I’ll be honest: the sole motivator for this post is that I thought it would be fun to generate the map using Leaflet for R as an alterna...
Source: What You're Doing Is Rather Desperate - July 17, 2018 Category: Bioinformatics Authors: nsaunders Tags: R statistics leaflet maps rstats Source Type: blogs

Twitter coverage of the useR! 2018 conference
In summary: useR!, the conference for users of R, was held in Brisbane earlier this month; it sounded like a lot of fun; and here’s an analysis of tweets that used the #useR2018 hashtag during the week. The code that generated the report (which I’ve used heavily and written about before) is at Github too. A few changes were required compared with previous reports, due to changes in the rtweet package, and a weird issue with kable tables breaking markdown headers. I love that the most popular media attachment is a screenshot of a Github repo.
Source: What You're Doing Is Rather Desperate - July 16, 2018 Category: Bioinformatics Authors: nsaunders Tags: R statistics rstats twitter user2018 Source Type: blogs

Idle thoughts lead to R internals: how to count function arguments
“Some R functions have an awful lot of arguments”, you think to yourself. “I wonder which has the most?” It’s not an original thought: the same question as applied to the R base package is an exercise in the Functions chapter of the excellent Advanced R. Much of the information in this post came from there. There are lots of R packages. We’ll limit ourselves to those packages which ship with R, and which load on startup. Which ones are they? What packages load on starting R? Start a new R session and type search(). Here’s the result on my machine: search() [1] ".GlobalEnv&...
Source: What You're Doing Is Rather Desperate - June 22, 2018 Category: Bioinformatics Authors: nsaunders Tags: R statistics Source Type: blogs
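
For the function-arguments question in the excerpt above, here is a minimal base R sketch of one way to count arguments, along the lines of the Advanced R exercise the post mentions. It is restricted to the base package as an assumption for brevity (the post looks at all packages loaded on startup), and the names `objs`, `funs` and `n_args` are illustrative.

```r
# Fetch every object exported by the base package.
objs <- mget(ls("package:base"), inherits = TRUE)

# Keep only the functions.
funs <- Filter(is.function, objs)

# formals() returns the argument list of a closure. It returns NULL for
# primitive functions such as sum(), so those are counted as zero here.
n_args <- sapply(funs, function(f) length(formals(f)))

# Show the functions with the most formal arguments.
print(head(sort(n_args, decreasing = TRUE)))
```

The zero count for primitives is a limitation of `formals()`, not a claim that those functions take no arguments.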

PubMed retractions report has moved
A brief message for anyone who uses my PubMed retractions report. It’s no longer available at RPubs; instead, you will find it here at Github. Github pages hosting is great, once you figure out that docs/ corresponds to your web root :) Now I really must update the code and try to make it more interesting than a bunch of bar charts.
Source: What You're Doing Is Rather Desperate - May 23, 2018 Category: Bioinformatics Authors: nsaunders Tags: R statistics github pmretract pubmed retraction rstats Source Type: blogs

50% bananas
Today in “blog posts that have spent two years in the draft folder” – “Humans are 50% banana.” Perhaps you have heard this statement, or one like it. It seems to be widely-quoted. As an example it’s hard to go past this article from UK tabloid The Mirror which, in addition to the banana, also informs us that “the entire internet weighs about the same as one large strawberry”. I don’t even know where to begin with that one. A couple of years ago whilst between jobs and with time on my hands, I thought I’d go in search of the sou...
Source: What You're Doing Is Rather Desperate - May 9, 2018 Category: Bioinformatics Authors: nsaunders Tags: genomics humour banana human myths Source Type: blogs

Moving from RPubs to Github documents
If you still follow my Twitter feed – I pity you, as it’s been rather boring of late. Consisting largely of Github commit messages, many including the words “knit to github document”. Here’s why. RPubs, an early offering from RStudio, has been a great platform for easy and free publishing of HTML documents generated from RMarkdown and written in RStudio. That said, it’s always been very basic (e.g. no way to organise documents by content, tags). There’s been no real development of the platform for several years and of late, I’ve noticed it’s become less reliable. Bugs, ...
Source: What You're Doing Is Rather Desperate - April 4, 2018 Category: Bioinformatics Authors: nsaunders Tags: R statistics this blog github markdown publlishing reports rpubs Source Type: blogs

Farewell then, PubMed Commons
PubMed Commons, the NCBI’s experiment in comments for PubMed articles, has been discontinued. Thoroughly too, with all traces of it expunged from the NCBI website. Last time I wrote about the service, I concluded “all it needs now is more active users, more comments per user and a real API.” None of those things happened. Result: “NIH has decided that the low level of participation does not warrant continued investment in the project, particularly given the availability of other commenting venues.” NLM also write that “all comments are archived on our FTP site.” A CSV file is avail...
Source: What You're Doing Is Rather Desperate - March 15, 2018 Category: Bioinformatics Authors: nsaunders Tags: R statistics comments pubmed commons rstats Source Type: blogs

Twitter coverage of the Australian Bioinformatics & Computational Biology Society Conference 2017
You know the drill by now. Grab the tweets. Generate the report using RMarkdown. Push to Github. Publish to RPubs. This time it’s the Australian Bioinformatics & Computational Biology Society Conference 2017, including the COMBINE symposium. Looks like a good time was had by all in Adelaide. A couple of quirks this time around. First, the rtweet package went through a brief phase of returning lists instead of nice data frames. I hope that’s been discarded as a bad idea :) Second, results returned from a hashtag search that did not contain said hashtags, but were definitely from the meeting. Yet to get to th...
Source: What You're Doing Is Rather Desperate - November 20, 2017 Category: Bioinformatics Authors: nsaunders Tags: australia bioinformatics meetings statistics abacbs twitter Source Type: blogs

Mapping data using R and leaflet
The R language provides many different tools for creating maps and adding data to them. I’ve been using the leaflet package at work recently, so I thought I’d provide a short example here. Whilst searching for some data that might make a nice map, I came across this article at ABC News. It includes a table containing Australian members of parliament, their electorate and their voting intention regarding legalisation of same-sex marriage. Since I reside in New South Wales, let’s map the data for electorates in that state. Here’s the code at Github. The procedure is pretty straightforward: Obtain a...
Source: What You're Doing Is Rather Desperate - November 15, 2017 Category: Bioinformatics Authors: nsaunders Tags: australia australian news statistics leaflet maps politics ssm Source Type: blogs

XML parsing made easy: is that podcast getting longer?
Sometime in 2009, I began listening to a science podcast titled This Week in Virology, or TWiV for short. I thought it was pretty good and listened regularly up until sometime in 2016, when it seemed that most episodes were approaching two hours in duration. I listen to several podcasts when commuting to/from work, which takes up about 10 hours of my week, so I found it hard to justify two hours for one podcast, no matter how good. Were the episodes really getting longer over time? Let’s find out using R. One thing I’ve learned as a data scientist: management want to see the key points first. So here it is: T...
Source: What You're Doing Is Rather Desperate - October 12, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics parsing podcast rss twiv xml Source Type: blogs

Feels like a dry winter – but what does the data say?
A reminder that when idle queries pop into your head, the answer can often be found using R + online data. And a brief excursion into accessing the Weather Underground. One interesting aspect of Australian life, even in coastal urban areas like Sydney, is that sometimes it just stops raining. For weeks or months at a time. The realisation hits slowly: at some point you look around at the yellow-brown lawns, ovals and “nature strips” and say “gee, I don’t remember the last time it rained.” Thankfully in our data-rich world, it’s relatively easy to find out whether the dry spell is really ...
Source: What You're Doing Is Rather Desperate - October 11, 2017 Category: Bioinformatics Authors: nsaunders Tags: australia statistics ggplot2 rainfall sydney underground weather Source Type: blogs

Infographic-style charts using the R waffle package
Conclusion: that’s it, more or less. Are the icons any more effective than the squares? In certain settings, perhaps. You be the judge.
Source: What You're Doing Is Rather Desperate - September 8, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics packages rstats visualization waffle Source Type: blogs

Years as coloured bars
I keep seeing years represented by coloured bars. First it was that demographic tsunami chart. Then there are examples like the one on the right, which came up in a web search today. I even saw one (whispers) at work today. I get what they are trying to do – illustrate trends within categories over time – but I don’t think years as coloured bars is the way to go. To me, progression over time suggests that time should be an axis, so as the eye moves along the data from one end to the other, without interruption. What I want to see is categories over time, not time within categories. So what is the way to g...
Source: What You're Doing Is Rather Desperate - August 5, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics ggplot2 visualisation Source Type: blogs

To do analysis stuff
First there was “insert statistical method here”. Now we have R – making it easy “to do analysis stuff”. Via Elisabeth; I’ll hand you over now for an entertaining summary. To be fair, analysis stuff describes my working life quite well.
Source: What You're Doing Is Rather Desperate - August 2, 2017 Category: Bioinformatics Authors: nsaunders Tags: humour publications Source Type: blogs

Twitter Coverage of the ISMB/ECCB Conference 2017
Search all the hashtags ISMB (Intelligent Systems for Molecular Biology – which sounds rather old-fashioned now, doesn’t it?) is the largest conference for bioinformatics and computational biology. It is held annually and, when in Europe, jointly with the European Conference on Computational Biology (ECCB). I’ve had the good fortune to attend twice: in Brisbane 2003 (very enjoyable early in my bioinformatics career, but unfortunately the seed for the “no more southern hemisphere meetings” decision), and in Toronto 2008. The latter was notable for its online coverage and for me, the pleasure of...
Source: What You're Doing Is Rather Desperate - August 2, 2017 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics meetings statistics eccb ismb twitter Source Type: blogs

Twitter Coverage of the Bioinformatics Open Source Conference 2017
July 21-22 saw the 18th incarnation of the Bioinformatics Open Source Conference, which generally precedes the ISMB meeting. I had the great pleasure of attending BOSC way back in 2003 and delivering a short presentation on Bioperl. I knew almost nothing in those days, but everyone was very kind and appreciative. My trusty R code for Twitter conference hashtags pulled out 3268 tweets and without further ado here are: the Github repository, where you can view the markdown report in the code/R directory; and the published report at RPubs. The ISMB/ECCB meeting wraps today and analysis of Twitter coverage for that meeting will app...
Source: What You're Doing Is Rather Desperate - July 25, 2017 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics meetings statistics bosc Source Type: blogs

Hacking Highcharter: observations per group in boxplots
Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R. Highcharter has a nice simple function, hcboxplot(), to generate boxplots. I recently generated some for a project at work and was asked: can we see how many observations make up the distribution for each category? This is a common issue with boxplots and there are a few solutions such as: overlay the box on a jitter plot to get some idea of the number of points, or try a violin plot, or a so-called bee-swarm plot. In Highcharts, I figured there should be a method to ...
Source: What You're Doing Is Rather Desperate - July 24, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics Source Type: blogs

Chart golf: the “demographic tsunami”
In conclusion then: years as coloured bars: not great; Excel: no; “Tsunami”: hardly. Bonus section: population pyramids. I’ve always liked population pyramids, ever since I first learned about them in high school geography class. Here’s my attempt to animate one. The trick is to subset the data by gender, then create two geoms and set the values for one subset to be negative (but not the labels). More commonly, ages are binned and proportions rather than counts may be used, but I did neither in this case. I find it either mesmerising or a bit “too much”, depending on my mood. How about you...
Source: What You're Doing Is Rather Desperate - July 21, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics news smh sydney visualisation Source Type: blogs
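
The population pyramid trick from the excerpt above (subset by gender, two geoms, negative values for one subset but not its labels) might look something like this static sketch. It assumes ggplot2 is installed, the counts in `pop` are invented purely for illustration, and the animation step from the post is not attempted.

```r
library(ggplot2)

# Invented counts by age and gender, for illustration only.
pop <- data.frame(
  age    = rep(seq(0, 80, by = 20), times = 2),
  gender = rep(c("Female", "Male"), each = 5),
  count  = c(50, 60, 55, 40, 20, 52, 61, 54, 37, 15)
)

p <- ggplot(pop, aes(x = age)) +
  # One geom per gender subset; the male counts are negated so their bars
  # extend in the opposite direction.
  geom_col(data = subset(pop, gender == "Female"),
           aes(y = count, fill = gender)) +
  geom_col(data = subset(pop, gender == "Male"),
           aes(y = -count, fill = gender)) +
  # Label the axis with absolute values so the negated side still reads
  # as positive counts.
  scale_y_continuous(labels = abs) +
  # Flip so that age runs up the vertical axis, pyramid-style.
  coord_flip()

print(p)
```

Negating the values inside `aes()` rather than in the data keeps the original counts intact for any other use.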

Visualising Twitter coverage of recent bioinformatics conferences
Back in February, I wrote some R code to analyse tweets covering the 2017 Lorne Genome conference. It worked pretty well. So I reused the code for two recent bioinformatics meetings held in Sydney: the Sydney Bioinformatics Research Symposium and the VIZBI 2017 meeting. So without further ado, here are the reports in markdown format, which display quite nicely when pushed to Github: Sydney Bioinformatics Research Symposium 2017 and VIZBI 2017. You can dig around in the repository for the Rmarkdown, HTML and image files, if you like.
Source: What You're Doing Is Rather Desperate - June 20, 2017 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics meetings statistics sbrs2017 twitter vizbi2017 Source Type: blogs

An update to the nhmrcData R package
Just pushed an updated version of my nhmrcData R package to Github. A quick summary of the changes: in response to feedback, added the packages required for vignette building as dependencies (Imports) – commit; added 8 new datasets with funding outcomes by gender for 2003 – 2013, created from a spreadsheet that I missed first time around – commit and see the README; vignette is not yet updated with new examples. So now you can generate even more depressing charts of funding rates for even more years, such as the one featured on the right (click for full-size). Enjoy and as ever, let me know if there ...
Source: What You're Doing Is Rather Desperate - March 15, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics data nhmrc package rstats Source Type: blogs

The nhmrcData package: NHMRC funding outcomes data made tidy
Do you like R? Information about Australian biomedical research funding outcomes? Tidy data? If the answers to those questions are “yes”, then you may also like nhmrcData, a collection of datasets derived from funding statistics provided by the Australian National Health & Medical Research Council. It’s also my first R package (more correctly, R data package). Read on for the details. 1. Installation The package is hosted at Github and is in a subdirectory of a top-level repository, so it can be installed using the devtools package, then loaded in the usual way: devtools::install_github("neilf...
Source: What You're Doing Is Rather Desperate - March 9, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics Source Type: blogs

HTML vignettes crashing your RStudio? This may be the reason
Short version: if RStudio on Windows 7 crashes when viewing vignettes in HTML format, it may be because those packages specify knitr::rmarkdown as the vignette engine, instead of knitr::knitr. Longer version with details – read on. At work I run RStudio (currently version 1.0.136) on Windows 7 (because I have no choice). This works: open the Packages tab; click on broom; click on User guides, package vignettes and other documentation; click on HTML to see documentation for broom::broom. If I do the same for the dplyr package and choose the ...
Source: What You're Doing Is Rather Desperate - March 6, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics debug rstudio software windows Source Type: blogs

Twitter Coverage of the Lorne Genome Conference 2017
Things to know about Lorne in the state of Victoria, Australia: it’s situated on the Great Ocean Road, a major visitor attraction and a great way to see the scenic coastline of the region; it’s home to a number of life science conferences including Lorne Genome 2017. This week’s project then: use R to analyse coverage of the 2017 meeting on Twitter. I last did something similar for the ISMB meeting in 2012. How things have changed. Back then I prepared PDF reports using Sweave, retrieved tweets using the twitteR package and struggled with dates and time when plotting timelines. This time around I wrote RM...
Source: What You're Doing Is Rather Desperate - February 16, 2017 Category: Bioinformatics Authors: nsaunders Tags: genomics meetings R statistics conference lorne rstudio rtweet twitter Source Type: blogs

On the passing of Hans Rosling
It would be remiss not to mention briefly the passing of Hans Rosling. Data needs storytellers and the world needs advocates for evidence-based decision making. We have lost one of the best. For some insights into the man and his interesting (and at times challenging) life, I highly recommend this news feature. You can enjoy presentations at the Gapminder website: I’d start with the documentary The Joy of Stats. Perhaps I should not be surprised or annoyed – but I am – at the lack of coverage this story received at news outlets, particularly in Australia. Aside from an obituary at Guardian Australia (not ...
Source: What You're Doing Is Rather Desperate - February 9, 2017 Category: Bioinformatics Authors: nsaunders Tags: statistics communication gapminder obituary presentations Source Type: blogs

Hyetographs, hydrographs and highcharter
Dual y-axes: yes or no? What about if one of them is also reversed, i.e. values increase from the top of the chart to the bottom? Judging by this StackOverflow question, hydrologists are fond of both of these things. It asks whether ggplot2 can be used to generate a “rainfall hyetograph and streamflow hydrograph”, which looks like this: My first thought was “why?” but perhaps, as suggested on Twitter, the chart signifies rain falling from above. My view (and one held more widely) is that dual axes are to be discouraged unless (1) the variables measured in each case are directly comparable with reg...
Source: What You're Doing Is Rather Desperate - February 6, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics ggplot2 highcharter highcharts hydrology package Source Type: blogs

Nice graphic? Are they taking the p …
Yes, it started with a tweet: Nice graphic on urine components via https://t.co/sfuXNB02sF pic.twitter.com/vhVLahQ8su — Metabolomics (@metabolomics) January 31, 2017 By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D pie charts. Can it be fixed? So far as I know, there isn’t a tool to generate data by extracting labels from images, so I sat down and typed in the numbers manually. Here they are for download. The top and bottom pie c...
Source: What You're Doing Is Rather Desperate - February 4, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics charts data visualisation Source Type: blogs

The real meaning of spurious correlations
Like many data nerds, I’m a big fan of Tyler Vigen’s Spurious Correlations, a humorous illustration of the old adage “correlation does not equal causation”. Technically, I suppose it should be called “spurious interpretations” since the correlations themselves are quite real, but then good marketing is everything. There is, however, a more formal definition of the term spurious correlation or more specifically, as the excellent Wikipedia page is now titled, spurious correlation of ratios. It describes the following situation: You take a bunch of measurements X1, X2, X3… And a se...
Source: What You're Doing Is Rather Desperate - February 3, 2017 Category: Bioinformatics Authors: nsaunders Tags: R statistics causation correlation proportionality ratios Source Type: blogs

Taking steps (in XML)
So the votes are in: Your established blog is mostly about your work. Your work changes. Do you continue at the current blog or start a new one? — Neil Saunders (@neilfws) January 23, 2017 I thank you, kind readers. So here’s the plan: (1) keep blogging here as frequently as possible (perhaps monthly), (2) on more general “how to do cool stuff with data and R” topics, (3) which may still include biology from time to time. Sounds OK? Good. So: let’s use R to analyse data from the iOS Health app. I own an iPhone. It comes with a Health app installed by default. Not being a big user of mobile...
Source: What You're Doing Is Rather Desperate - February 1, 2017 Category: Bioinformatics Authors: nsaunders Tags: personal statistics this blog health iOS parsing xml Source Type: blogs

A Change of Direction
In this post: a brief summary of what I got up to, work-wise, in 2016 and my plans for a rather different 2017. The short version: it’s goodbye bioinformatics and hello educational data science! It feels as though 2016 was a challenging year for many people in various ways. My year was no exception. I spent the first 6 months working as a data scientist for a healthcare technology startup. It was a new, different and very enjoyable experience. However, a change in the focus of their business resulted in my redundancy mid-way through the year. I took some time out, then began applying for jobs. I received a lot of su...
Source: What You're Doing Is Rather Desperate - January 1, 2017 Category: Bioinformatics Authors: nsaunders Tags: career education personal cese nsw Source Type: blogs

Evidence for a limit to effective peer review
I missed it first time around but apparently, back in October, Nature published a somewhat controversial article: Evidence for a limit to human lifespan. It came to my attention in a recent tweet by Nick Loman (@pathogenomenick), December 11, 2016: “Just wow https://t.co/fupXIOAC43 pic.twitter.com/vsxT3VyTg6”. The source: a fact-check article from Dutch news organisation NRC titled “Nature article is wrong about 115 year limit on human lifespan”. NRC seem rather interested in this research article. They have published another more recent critique of the work, titled “Statistical problems, but not enou...
Source: What You're Doing Is Rather Desperate - December 18, 2016 Category: Bioinformatics Authors: nsaunders Tags: publications R statistics human longevity peer review Source Type: blogs

An Analysis of Contributions to PubMed Commons
I recently saw a tweet floating by which included a link to some recent statistics from PubMed Commons, the NCBI service for commenting on scientific articles in PubMed. Perhaps it was this post at their blog. So I thought now would be a good time to write some code to analyse PubMed Commons data. The tl;dr version: here’s the Github repository and the RPubs report. For further details and some charts, read on. Currently, there is no access to PubMed Commons data via the NCBI Entrez API aside from a PubMed search filter to return articles that have comments. However, a Google search for “pubmed commons api” ...
Source: What You're Doing Is Rather Desperate - December 1, 2016 Category: Bioinformatics Authors: nsaunders Tags: publications R statistics comments ncbi pubmed commons Source Type: blogs

Putting data on maps using R: easier than ever
(Figure: New Zealand earthquake density 2010 – November 2016.) Using R to add data to maps has been pretty straightforward for a few years now. That said, it seems easier than ever to do things like use map APIs (e.g. Google, OpenStreetMap), overlay quite complex data visualisations (e.g. “heatmap-style” densities) and even generate animations. A couple of key R packages in this space: ggmap and gganimate. To illustrate, I’ve used data from the recent New Zealand earthquake to generate some static maps and an animation. Here’s the Github repository and a report published at RPubs. Thanks to Florian Tesc...
Source: What You're Doing Is Rather Desperate - November 24, 2016 Category: Bioinformatics Authors: nsaunders Tags: R statistics earthquakes ggplot2 maps new zealand visualisation Source Type: blogs

The y-axis: to zero or not to zero
I don’t “do politics” at this blog, but I’m always happy to do charts. Here’s one that’s been doing the rounds on Twitter recently: A quick look at turnout data: It seems 2016 was nothing special for the Rep-candidate. It's the Dem-candidate that didn't get the vote out. pic.twitter.com/wby3gta26m — D Yanagizawa-Drott (@yanagiz) November 9, 2016 What’s the first thing that comes into your mind on seeing that chart? It seems that there are two main responses to the chart: Wow, what happened to all those Democrat voters between 2008 and 2016? Wow, that’s misle...
Source: What You're Doing Is Rather Desperate - November 20, 2016 Category: Bioinformatics Authors: nsaunders Tags: R statistics charts graph politics visualisation Source Type: blogs