Gene names, data corruption and Excel: the final chapter?
I suppose that after Gene name errors and Excel: lessons not learned (2012); Data corruption using Excel: 12+ years and counting (2016); When your tools are broken, just change the data (2019-20); and Gene names, data corruption and Excel: a 2021 update (2021), it would be remiss of me not to mention: Microsoft fixes the Excel feature that was wrecking scientific data. Is it really fixed though? Users have to know that the feature exists, find it and toggle a checkbox. Given that the users most “at risk” probably open CSV files in Excel by default simply by clicking on them…I’m ...
Source: What You're Doing Is Rather Desperate - October 27, 2023 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics errors excel genes Source Type: blogs

Price’s Protein Puzzle: 2023 update
One of the joys (?) of having been online for…quite some time now…is watching topics reappear every few years or so. What is the longest coherent word or phrase present in the amino acid sequence of a real protein?— Dr. Caroline Bartman (@Caroline_Bartma) July 21, 2023 Yes, it’s Price’s Protein Puzzle which I last wrote about back in 2019. The good news is that my code still runs, so I’ve updated the results of an English word search versus the UniProt Reviewed (Swiss-Prot) protein database. Just for fun I threw in a few other languages too. So what’s new? In terms of...
Source: What You're Doing Is Rather Desperate - July 26, 2023 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics statistics algorithm amino acid rstats search words Source Type: blogs

The “curse of the bye” revisited
A while ago we looked at Geelong and the curse of the bye. And since the AFL media have outdone themselves this year with “curse of the bye” articles: see for example here, here, here and here, I decided to revisit the topic in more depth. If you like that kind of thing head over to the report at GitHub. It has lots of charts like this one. Executive summary: once you take into account scheduling and expected results, there’s little if any evidence for significantly more losses coming off a bye round. I doubt that will prevent the same spate of articles next season.
Source: What You're Doing Is Rather Desperate - July 10, 2023 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl bye rstats Source Type: blogs

Has your knowledge stopped updating?
Some years ago I read an article – I forget where – describing how our general knowledge often becomes frozen in time. Asked to name the tallest building in the world you confidently proclaim “the Sears Tower!”, because for most of your childhood that was the case – never mind that the record was surpassed long ago and it isn’t even called the Sears Tower anymore. From memory the example in the article was of a middle-aged speaker who constantly referred to a figure of 4 billion for the human population – again, because that’s what he learned in school and had never mentally ...
Source: What You're Doing Is Rather Desperate - January 27, 2023 Category: Bioinformatics Authors: nsaunders Tags: education R statistics readr read_csv rstats tidyverse Source Type: blogs

Editing metadata in trail camera images using R, magick and exiftool
I have a new hobby: camera traps, also known as trail cameras. Strapped to trees in my local bushland they sit in wait, firing automatically when triggered by a passing animal. Once in a while, something quite magical happens. The camera model I chose is the Campark T85 which, for me, had the right combination of features and price point. One useful feature is the ability to transfer images and video to a phone wirelessly (albeit through a rather clunky phone app). Unfortunately, images retrieved in this way have one major flaw: an almost-complete absence of metadata. There is no GPS in the camera of course, but th...
Source: What You're Doing Is Rather Desperate - October 25, 2022 Category: Bioinformatics Authors: nsaunders Tags: environment programming statistics campark exiftool metadata photography rstats trail camera Source Type: blogs

Using R to detect the pressure wave from the 2022 Hunga Tonga eruption in personal weather station data
It seems like an age ago, but in fact it was only mid-January 2022 when this happened: The satellite imagery from the Hunga Tonga eruption is unreal. Direct your attention to the lower right. The eruption then shock wave is simply incredible. pic.twitter.com/OTLCgyEozQ— Taylor Trogdon (@TTrogdon) January 15, 2022 Wow. Now, pause for a moment and try to recall the last time you read any news about Tonga since the event. The eruption sent an atmospheric pressure wave, clearly visible in this imagery, around the world. Friends online reported that this was detected by their personal weather stations (PWS) which made ...
Source: What You're Doing Is Rather Desperate - March 29, 2022 Category: Bioinformatics Authors: nsaunders Tags: australia environment statistics world news hunga tonga rstats weather wunderground Source Type: blogs

Using R/fitzRoy to ask: how many times a V/AFL team with the same lineup has played together?
If you sit in the intersection of “likes Australian Rules football / finds sport statistics interesting / is on Twitter”, you’ve probably come across Swamp. One of his recent tweets tells us that: No V/@AFL premiership winning lineup have all played together in another V/@AFL match, there has always been at least one person missing. All MELB 2021 premiership players are still at the club in 2022 @melbournefc— Swamp (@sirswampthing) March 16, 2022 You may go on to ask: has any team lineup from one of the almost 16 000 recorded games played together again in another game? And if so, how often? Th...
Source: What You're Doing Is Rather Desperate - March 28, 2022 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl fitzroy rstats Source Type: blogs

Enhancement of old colour photographs using Generative Adversarial Networks
It’s almost Christmas, I haven’t posted anything in a while and I see that WordPress has an Image Compare feature, so let’s have some colourful fun. When I’m not at the computer writing R code, I can often be found at the computer processing photographs. Or at the computer browsing Twitter, which is how I came across Stuart Humphryes, a digital artist who enhances autochromes. Autochromes are early colour photographs, generated using a process patented by the Lumière brothers in 1903. You can find and download many examples of them online. Stuart uses a variety of software tools to clean, enhanc...
Source: What You're Doing Is Rather Desperate - December 23, 2021 Category: Bioinformatics Authors: nsaunders Tags: multimedia enhancement gan image photography processing python Source Type: blogs

Gene names, data corruption and Excel: a 2021 update
It’s an old favourite of this blog, isn’t it. We had Gene name errors and Excel: lessons not learned (2012). Followed by Data corruption using Excel: 12+ years and counting (2016). Perhaps most depressingly of all, the conclusion of the trilogy, When your tools are broken, just change the data (2019-20). Well, I’m happy (?) to see the publication of the latest instalment, inspired in part by the title of my first post: Gene name errors: Lessons not learned, from Mark Ziemann’s group. Here’s the accompanying Twitter thread. Summary: it’s even worse than we thought. Tagging this one with t...
Source: What You're Doing Is Rather Desperate - August 3, 2021 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics statistics errors excel genes hgnc Source Type: blogs

How I resurrected my ancient PhD thesis using R/bookdown (and some other tools)
An ancient thesis. I’ve long admired the look of publications generated using the R bookdown package, and thought it would be fun and educational to publish one myself. The problem is that I am not writing a book and have no plans to do so any time soon. Then I remembered that I’ve already written a book. There it is on the right. It’s called “Cloning, sequence analysis and studies on the expression of the nirS gene, encoding cytochrome cd1 nitrite reductase, from Thiosphaera pantotropha”. Catchy title, hey. It’s from my former life, as a biochemistry graduate turned reluctant molecular mi...
Source: What You're Doing Is Rather Desperate - July 22, 2021 Category: Bioinformatics Authors: nsaunders Tags: personal statistics bookdown oxford phd rstats thesis thesisdown Source Type: blogs

Florence Nightingale’s “rose charts” (and others) in ggplot2
It’s been a while. I hope you are all well. Shall we make some charts? About this time last year, one of my life-long dreams came true when I was told that I could work from home indefinitely. One effect of this – I won’t say downside – is that I don’t get through as many podcast episodes as I used to. Only a select few podcasts make the cut, and one of those is 99% Invisible. I first heard Florence Nightingale and her Geeks Declare War on Death, an episode of the Cautionary Tales podcast, premiered as a special episode of 99% Invisible. It discusses Nightingale’s work as a stat...
Source: What You're Doing Is Rather Desperate - March 16, 2021 Category: Bioinformatics Authors: nsaunders Tags: R statistics crimea florence nightingale ggplot2 podcast polar area rstats Source Type: blogs

When your tools are broken, just change the data
It’s been 3 years since we last visited that old favourite recurring topic, data corruption by Excel. Specifically, the unwanted auto-conversion of identifiers that look like dates, e.g. SEPT1, to – well, dates. Here’s a new twist – well, a two-year-old twist in fact, as I don’t keep up to date with this field any longer: TIL that SEPT genes were renamed in 2017 to SEPTIN genes by the HGNC https://t.co/2UadZUMLCS pic.twitter.com/jCo0Hcf6sf — mdziemann (@mdziemann) October 8, 2019 Yes, in 2017 the HGNC decided that the solution to this long-standing issue is to rename the offending gen...
Source: What You're Doing Is Rather Desperate - October 9, 2019 Category: Bioinformatics Authors: nsaunders Tags: bioinformatics computing genomics software excel genes hgnc nomenclature Source Type: blogs
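
To make the problem concrete, here is a minimal sketch in base R that flags identifiers resembling a month name followed by a day number, the kind that a spreadsheet may silently turn into dates. The function name looks_like_date and the regular expression are illustrative assumptions, not Excel's actual parsing rules, and apart from SEPT1 the example symbols do not come from the post.

```r
# Hypothetical helper: flag identifiers that look like "month + day number".
# The pattern is an illustrative assumption, not Excel's real conversion logic.
looks_like_date <- function(symbols) {
  pattern <- "^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)[0-9]{1,2}$"
  grepl(pattern, toupper(symbols))
}

# Example symbols; SEPTIN1 is the renamed form of SEPT1
genes <- c("SEPT1", "MARCH1", "DEC1", "TP53", "SEPTIN1")

# Keep only the symbols at risk of being read as dates
at_risk <- genes[looks_like_date(genes)]
print(at_risk)
```

A check like this could be run over an identifier column before a file is shared with spreadsheet users; the renamed symbol SEPTIN1 no longer matches the pattern.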

Debuting in a VFL/AFL Grand Final is rare
When Marlion Pickett runs onto the M.C.G for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day. The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and fitzRoy packages.
    library(dplyr)
    library(fitzRoy)
    afldata <- get_afltables_stats()
    afldata %>%
      select(Season, Round, Date, ID, First.name, Surname, Playing.for, Home.team, Home.score, Away.team, Away.score) %>%
      group_by(ID) %>%
      arrange(Date) %>%
      # a player's first game
      slice(1) %>%
      ungrou...
Source: What You're Doing Is Rather Desperate - September 26, 2019 Category: Bioinformatics Authors: nsaunders Tags: australia sport statistics afl grand final richmond rstats Source Type: blogs

Extracting Sydney transport data from Twitter
The @sydstats Twitter account uses this code base, and data from the Transport for NSW Open Data API to publish insights into delays on the Sydney Trains network. Each tweet takes one of two forms and is consistently formatted, making it easy to parse and extract information. Here are a couple of examples with the interesting parts highlighted in bold: Between 16:00 and 18:30 today, 26% of trips experienced delays. #sydneytrains The worst delay was 16 minutes, on the 18:16 City to Berowra via Gordon service. #sydneytrains I’ve created a GitHub repository with code and a report showing some ways in which this data ...
Source: What You're Doing Is Rather Desperate - September 10, 2019 Category: Bioinformatics Authors: nsaunders Tags: programming statistics rstats sydney sydstats transport Source Type: blogs
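
Because the two tweet forms are so regular, a couple of regular expressions are enough to pull out the numbers. The sketch below uses base R on the two example tweets quoted above; the patterns and variable names are assumptions made for illustration and are not taken from the author's repository.

```r
# The two example tweets quoted in the post
tweets <- c(
  "Between 16:00 and 18:30 today, 26% of trips experienced delays. #sydneytrains",
  "The worst delay was 16 minutes, on the 18:16 City to Berowra via Gordon service. #sydneytrains"
)

# First form: percentage of trips delayed
pct_match <- regmatches(tweets[1], regexec("([0-9]+)% of trips", tweets[1]))[[1]]
delayed_pct <- as.integer(pct_match[2])

# Second form: worst delay in minutes, departure time and route
worst_pattern <- "worst delay was ([0-9]+) minutes, on the ([0-9]{2}:[0-9]{2}) (.+) service"
worst_match <- regmatches(tweets[2], regexec(worst_pattern, tweets[2]))[[1]]
worst_minutes <- as.integer(worst_match[2])
departure <- worst_match[3]
route <- worst_match[4]

print(list(delayed_pct = delayed_pct, worst_minutes = worst_minutes,
           departure = departure, route = route))
```

regexec() returns the full match followed by each capture group, which is why the extracted values start at index 2.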