Twitter coverage of the useR! 2019 conference
Very briefly: last week was useR! conference time again, coming to you this time from Toulouse, France. I’ve retrieved 8 318 tweets that mention #user2019 and run them through my report generator, and here are the results. Take-home message this year: the R Ladies rock!
Source: What You're Doing Is Rather Desperate - July 15, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics twitter user2019 Source Type: blogs

Can random forest provide insights into how yeast grows?
I’m not saying this is a good idea, but bear with me. A recent question on Stack Overflow [r] asked why a random forest model was not working as expected. The questioner was working with data from an experiment in which yeast was grown under conditions where (a) the growth rate could be controlled and (b) one of 6 nutrients was limited. Their dataset consisted of 6 rows – one per nutrient – and several thousand columns, with values representing the activity (expression) of yeast genes. Could the expression values be used to predict the limiting nutrient? The random forest was not working as expected: not...
Source: What You're Doing Is Rather Desperate - June 26, 2019 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics genomics statistics expression random forest rstats yeast Source Type: blogs

Geelong and the curse of the bye
This week we return to Australian Rules Football, the R package fitzRoy and some statistics to ask – why can’t Geelong win after a bye? (with apologies to long-time readers who used to come for the science). Code and a report for this blog post are available at GitHub. First, some background. In 2011 the AFL expanded from 16 to 17 teams with the addition of the Gold Coast Suns. In the same year, a bye round (a week where some teams don’t play) was reintroduced to the competition. For the purposes of this discussion, we are interested only in bye rounds since 2011, and during the regular home/away season. ...
Source: What You're Doing Is Rather Desperate - June 25, 2019 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl geelong rstats Source Type: blogs

Is your phone giving you horns?
No. Why would you even ask that? Well, because this. I sense problems immediately. First, the story is tagged “evolution”. The horns are not arising through inheritance of advantageous mutations, so that isn’t evolution. Second: HORNS. — Alex Holcombe (@ceptional) June 20, 2019 Yes last time I checked, horns were external and pointed upwards. The X-ray seems to show an internal, downward-pointing bone growth. But wait, there’s more. The story mentions (but does not link to) three research articles. Here are the links. Two are freely-available. A morphological adaptation? The prevalence of...
Source: What You're Doing Is Rather Desperate - June 21, 2019 Category: Bioinformatics Authors: nsaunders Tags: australian news health mobile phone Source Type: blogs

Mapping the Vikings using R
The commute to my workplace is 90 minutes each way. Podcasts are my friend. I’m a long-time listener of In Our Time and enjoyed the recent episode about The Danelaw. Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending in -by denote a farmstead, settlement or village; those ending in -thwaite mean a clearing or meadow. So how local are those names? Time for some quick and dirty maps using R. First, we’ll need a dataset of British place names. There are quite a few of these online, but t...
Source: What You're Doing Is Rather Desperate - April 3, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics ggplot2 history maps podcast rstats viking Source Type: blogs
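
A rough sketch of the suffix test described in the excerpt above, written in Python rather than the R the post uses. The function name `norse_suffix` and the sample names are illustrative assumptions, not taken from the post or its dataset, and a plain suffix match will also flag names whose ending is only a coincidence.

```python
# Suffixes named in the excerpt: -by (farmstead, settlement or village)
# and -thwaite (clearing or meadow).
NORSE_SUFFIXES = ("thwaite", "by")


def norse_suffix(place_name):
    """Return the matching suffix, or None if the name has neither ending."""
    lowered = place_name.strip().lower()
    for suffix in NORSE_SUFFIXES:
        # Require at least one character before the suffix so that a bare
        # "by" is not counted as a place name ending in -by.
        if len(lowered) > len(suffix) and lowered.endswith(suffix):
            return suffix
    return None


# Hand-picked sample names (an assumption for illustration only).
sample_places = ["Whitby", "Grimsby", "Braithwaite", "London"]
matches = {name: norse_suffix(name) for name in sample_places}
print(matches)
```

With a full place-name dataset, the same function could be used to filter rows before plotting their coordinates.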

How long since your team scored 100+ points? This blog’s first foray into the fitzRoy R package
When this blog moved from bioinformatics to data science I ran a Twitter poll to ask whether I should start afresh at a new site or continue here. “Continue here”, you said. So let’s test the tolerance of the long-time audience and celebrate the start of the 2019 season as we venture into the world of – Australian football (AFL) statistics! I’ve been hooked on the wonderful sport of AFL since attending my first game, the ANZAC Day match between the Sydney Swans and Melbourne in 2003, and have hardly missed a Swans home game since. However, I don’t think you need to be a sports fanatic &...
Source: What You're Doing Is Rather Desperate - March 22, 2019 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl fitzroy Source Type: blogs

This is not normal(ised)
“Sydney stations where commuters fall through gaps, get stuck in lifts” blares the headline. The story tells us that: “Central Station, the city’s busiest, topped the list last year with about 54 people falling through gaps”. Wow! Wait a minute… “Central Station, the city’s busiest”. Some poking around in the NSW Transport Open Data portal reveals how many people enter every Sydney train station on a “typical” day in 2016, 2017 and 2018. We could manipulate those numbers in various ways to estimate total, unique passengers for FY 2017-18 but I’m going to argue that the value as-is ...
Source: What You're Doing Is Rather Desperate - March 11, 2019 Category: Bioinformatics Authors: nsaunders Tags: australian news statistics smh trains transport Source Type: blogs
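
The point of the excerpt above is that raw incident counts favour busy stations, so a fair comparison divides by passenger volume. A minimal Python sketch of that idea follows. The function name, the per-1,000-entries scale and every number in the example are made up for illustration; none of them come from the post or from the NSW data.

```python
def incidents_per_1000_entries(incidents, daily_entries):
    """Scale an incident count by typical daily station entries."""
    if daily_entries <= 0:
        raise ValueError("daily_entries must be positive")
    return incidents * 1000 / daily_entries


# Hypothetical stations: (incident count, typical daily entries).
stations = {
    "Busy station": (50, 100_000),
    "Quiet station": (10, 5_000),
}

rates = {
    name: incidents_per_1000_entries(count, entries)
    for name, (count, entries) in stations.items()
}

# Rank by the normalised rate rather than by the raw count.
ranked = sorted(rates, key=rates.get, reverse=True)
print(ranked)
```

In this invented example the quiet station ranks first once counts are normalised, even though the busy station has five times as many incidents.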

Using parameters in Rmarkdown
Nothing new or original here, just something that I learned about quite recently that may be useful for others. One of my more “popular” code repositories, judging by Twitter, is – well, Twitter. It mostly contains Rmarkdown reports which summarise meetings and conferences by analysing usage of their associated Twitter hashtags. The reports follow a common template where the major difference is simply the hashtag. So one way to create these reports is to use the previous one, edit to find/replace the old hashtag with the new one, and save a new file. That works…but what if we could define the hashta...
Source: What You're Doing Is Rather Desperate - March 4, 2019 Category: Bioinformatics Authors: nsaunders Tags: programming statistics automation reports rmarkdown Source Type: blogs

Some thoughts on my recent Twitter break
Various people have suggested that taking a break from social networks – Twitter in particular – can be A Good Thing™. So I tried it, for a couple of weeks. Here’s what I learned. 1. Why a break? The reasons that everyone else cites, I guess. A sense that my stream has swung away from essential information towards noise and distraction, despite attempts to curate it carefully by following “good people”. A realisation that I was habitually reaching for it without thinking, or knowing why. The empty serotonin hit of checking for likes. For me, a growing sense of people existing in bubbles...
Source: What You're Doing Is Rather Desperate - February 19, 2019 Category: Bioinformatics Authors: nsaunders Tags: networking health productivity social networking twitter Source Type: blogs

An absolute beginner’s guide to creating data frames for a Stack Overflow [r] question
For better or worse I spend some time each day at Stack Overflow [r], reading and answering questions. If you do the same, you probably notice certain features in questions that recur frequently. It’s as though everyone is copying from one source – perhaps the one at the top of the search results. And it seems highest-ranked is not always best. Nowhere is this more apparent to me than in the way many users create data frames. So here is my introductory guide “how not to create data frames”, aimed at beginners writing their first questions. 1. No need for vectors There is no need to create vectors f...
Source: What You're Doing Is Rather Desperate - February 7, 2019 Category: Bioinformatics Authors: nsaunders Tags: R statistics data frame rstats stack overflow Source Type: blogs

Price’s Protein Puzzle: 2019 update
Chains of amino acids strung together make up proteins and since each amino acid has a 1-letter abbreviation, we can find words (English and otherwise) in protein sequences. I imagine this pursuit began as soon as proteins were first sequenced, but the first reference to protein word-finding as a sport is, to my knowledge, “Price’s Protein Puzzle”, a letter to Trends in Biochemical Sciences in September 1987 [1]. Price wrote: “It occurred to me that TIBS could organise a competition to find the longest word […] contained within any known protein sequence.” The journal took up the challenge and publish...
Source: What You're Doing Is Rather Desperate - January 30, 2019 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics computing statistics algorithm amino acid search words Source Type: blogs
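
As a toy illustration of the word-finding idea in the excerpt above, here is a minimal Python sketch. It is not the method used in the post. The sequence and word list are invented for the example, and the function names are assumptions. Only the 20 standard one-letter amino acid codes are treated as valid letters.

```python
# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_spellable(word):
    """True if every letter of the word is a standard amino acid code."""
    return word.isalpha() and set(word.upper()) <= AMINO_ACIDS


def longest_word(sequence, words):
    """Return the longest word found as a substring of the sequence, or None."""
    seq = sequence.upper()
    hits = [w for w in words if is_spellable(w) and w.upper() in seq]
    return max(hits, key=len, default=None)


# Invented sequence and word list, for illustration only.
toy_sequence = "MKTLSWIFTAGHELPERV"
toy_words = ["swift", "helper", "lip", "box"]
print(longest_word(toy_sequence, toy_words))
```

Here "box" is rejected before any searching because B, O and X are not among the 20 standard codes. A real search would swap in a dictionary file and a sequence database for the toy inputs.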

Extracting data from news articles: Australian pollution by postcode
The recent ABC News article Australia’s pollution mapped by postcode reveals nation’s dirty truth is interesting. It contains a searchable table, which is useful if you want to look up your own suburb. However, I was left wanting more: specifically, the raw data and some nice maps. So here’s how I got them, using R. The full details are in this GitHub repository. There you’ll find the code to generate this report. Essentially, the procedure goes like this: use rvest to create a data frame from the data table in the online article; clean and pre-process the data using dplyr; join the pollution data w...
Source: What You're Doing Is Rather Desperate - November 28, 2018 Category: Bioinformatics Authors: nsaunders Tags: australia environment statistics geospatial maps pollution rstats Source Type: blogs

Using OSX? Compiling an R package from source? Issues with ‘-fopenmp’? Try this.
You can file this one under “I may have the very specific solution if you’re having exactly the same problem.” So: if you’re running some R code and you see a warning like this: Warning message: In checkMatrixPackageVersion() : Package version inconsistency detected. TMB was built with Matrix version 1.2.14 Current Matrix version is 1.2.15 Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package And installation of TMB from source fails like this: install.packages("TMB", type = "source") clang: ...
Source: What You're Doing Is Rather Desperate - November 18, 2018 Category: Bioinformatics Authors: nsaunders Tags: programming statistics compiler llvm osx Source Type: blogs

We do not wish to share
The article Cytotoxic T cells modulate inflammation and endogenous opioid analgesia in chronic arthritis contains a statement that I don’t recall seeing before: Availability of data and materials: “We do not wish to share our data at this moment.” This seems odd for an open-access article, published by a “big on open-access” publisher: How is this possible @BioMedCentral ?? https://t.co/cgmLq8Weay (Mick Watson, @BioMickWatson, November 15, 2018) However, according to the BMC policy on open data: Question: Do authors need to publish more data than they publish already? Response: We are not requiring ...
Source: What You're Doing Is Rather Desperate - November 15, 2018 Category: Bioinformatics Authors: nsaunders Tags: open access publications biomed central Source Type: blogs

Just use a scatterplot. Also, Sydney sprawls.
In conclusion: Scatterplots – good. News article’s interpretation of factors affecting commute time – poor.
Source: What You're Doing Is Rather Desperate - July 18, 2018 Category: Bioinformatics Authors: nsaunders Tags: australia australian news statistics commuting congestion rstats smh sydney traffic Source Type: blogs