Sir Henry at Rawlinson End

Well, it was bound to happen.

The story so far….

While the 1980 movie “Sir Henry at Rawlinson End” (IMDB link) may be obscure, the radio series that precipitated it is probably even more so. Although Stanshall intended to compress the original BBC radio series into the eponymous 1978 album, by then his alcoholism had taken a turn for the worse, and the album is just a shadow of the original radio series. The almost-sequel Sir Henry at N’didi’s Kraal (1984) was quite a step down from even that album.

The recordings of the original BBC series can be found online, but to understand the more obscure references in the series I relied on a script published online – until that site went offline. Something’s supposed to be happening at the Vivian Stanshall Appreciation Society and Archive, but that page seems empty, too. The Wayback Machine has a last capture from April 2018. Thankfully, I had archived a version in case the site went down, so I’ve decided to put it up here. For posterity, and all that; and to honour the memory and legacy of one of the greatest British eccentrics and comedic masterminds ever.

As time goes on I might clean up the text here and there.

Note that no copyright infringement is intended.

Now, read on… The script is online here.

Download it in .docx or PDF:

About Firth’s “Apps for Depression” meta-analysis…

WARNING: Long rant. Shield your eyes and read with a fluffy animal, narcotics, and a bucket for vomit/stool within reach.

Great! A meta-analysis on smartphone apps for depression.

UPDATE: After I wrote this blog, Joe Firth got in touch with me on Twitter. He explained that (paraphrasing here) the reason for rating studies as “low” risk of bias was that if the target variable of interest (depression in this case) was reported in the paper, it shouldn’t really matter whether it was or wasn’t included in the trial registration. Although I see merit in that argument, I would still be distrustful of any study that has funny things going on between the trial registration and publication of the final study. Personally I would prefer to see all studies without a verified prospective trial registration rated as being at “unknown risk of bias”, since we simply cannot know what’s missing and what isn’t. For example, researchers might have chosen to measure both the CES-D and PHQ-9, but left out the CES-D scores because the results were non-significant – without a prospective trial registration, we’d have nothing to go on.

The Mental Elf is a website which usually provides excellent, readable blogs by researchers who look at new studies, critically examining them and presenting annotated results to a broad audience in accessible language. Do visit them and give them a follow on Twitter.

Recently they featured a blog on a 2017 meta-analysis by Firth et al, published in the prestigious journal World Psychiatry (link to fulltext). This being my field of study, I was of course very happy to see a meta-analysis on this subject, especially in such an authoritative journal.

The Mental Elf blog was convincing: but being an inveterate methodological terrorist and an incorrigible pedant, I checked out the Firth meta-analysis myself. It was preregistered in PROSPERO (albeit just 2 days before the searches were performed), which is still a good thing, and goes beyond what most meta-analyses in the field do.

Thus having high expectations, but also knowing the dire state of research in the field, my heart sank when I saw the risk of bias assessment for the included trials. Although Firth et al. claim to have followed Cochrane’s guidelines for assessing risk of bias (see Handbook), it was immediately clear that they hadn’t really, at least not how I would have.

Before my rant starts, I’d like to stress here that I’m not singling out Firth et al because I hate their meta-analysis, or because they have a publication in a prestigious journal and I don’t, or because they looked at me in a funny way at a conference – I’ve seen excellent earlier work from the authors and John Torous is one of my favourite authors in the field of mHealth.

However, their meta-analysis is a good example of the lack of real assessment of bias in most meta-analyses. It’s an expectation thing: I expect meta-analysts to go beyond a perfunctory “we scanned the paper and everything seemed to be in order” to provide an in-depth synthesis of the available evidence, and to annotate on the generalisability of findings. Maybe my standards are simply too high, but as soon as I started looking into the meta-analysis I started… Seeing things.

Selective^2 outcome reporting?

How did I know they hadn’t really followed Cochrane’s handbook? Because of a research practice that is endemic in eHealth and in psychological research in general: selective outcome reporting. Or, in the case of meta-analyses: selective selective outcome reporting.

Traditionally, meta-analysts check whether study authors actually report all the stuff they said they would be measuring, to prevent something naughty called ‘selective outcome reporting’. For example: your Amazing Intervention™ for depression is finished, and you set out to measure depression as its primary outcome measure. You’re also interested in anxiety and stress, because you know they’re related to depression. But! Shock horror, you find that Amazing Intervention™ only has an effect on anxiety. Problem? Well… What if you just report the outcomes for anxiety and tell the world that Amazing Intervention™ works wonders for anxiety? Nobody need ever know about the null findings for depression and stress, right?

To prevent this from happening in trials, and in randomised controlled trials (RCTs) specifically, a number of trial registries exist (WHO, ISRCTN, ANZCTR) where, ideally, researchers state a priori what, how and when they’re going to measure something. That way it is much harder to hide results, or to claim that an intervention was effective for something when in reality the researcher just kept looking until he/she found something that looked good.

Over the years, I’ve become quite suspicious of selective outcome reporting bias summaries in meta-analyses: most of them are demonstrably wrong if you actually bother to check trial registries against published research, like Cochrane advises. I’m currently working on a paper that investigates naughty stuff in trial reporting for eHealth trials in anxiety and depression (see preliminary results on this poster which I presented at a conference in 2017. Yes I know it’s an awful poster, I was in a hurry).

In that project, I compare trial protocols from international trial registries to published reports of those studies – and a consistent finding is that in many studies, outcome measures change, disappear, appear, or are switched between primary and secondary outcome measure between the protocol and the published papers. So where does the Firth meta-analysis come in?

What’s up with the Risk of Bias assessments?

In Firth’s meta-analysis, the red flag is their summary table of their risk of bias assessment (table 2 in the published paper). As you can see: for all studies, the risk of bias assessment for selective outcome reporting is “low risk of bias” (coded as “+” in the table; highlights are mine).

Table 2 from Firth, 2017b. Reproduced under Fair Use provision for criticism and commentary.

Well, this seemed highly improbable to me.

Firstly, in a sample of 18 studies it is highly unlikely that none of these studies have issues with selective outcome reporting.

Secondly, if you follow Cochrane’s handbook, you’re supposed to trace trial protocols and compare those to the published work (see section 8.14.2 in the Cochrane handbook here). Now, outcome switching or omission is naughty, but it needn’t be an issue for a meta-analysis if the outcome of interest is reported – it’s a moot point whether a primary outcome is reported as secondary, or vice versa – as long as the outcome is there.

Yes, well, and? Well… 5 out of 18 included studies don’t have a trial registration available – or at least none is mentioned in the paper and I couldn’t find them in trial registries (cf. Enock et al., Howells et al., Kuhn et al., Oh et al., Roepke et al.). Most of these were submitted to journals that claim adherence to the ICMJE guidelines, which have required prospective preregistration of trials since 2005 (source). Most of the papers themselves claim adherence to the Helsinki declaration which, as not many people know, has required public and prospective preregistration since the 2008 update (source, see points 21, 22, 35, 36). Nobody seems to check for this.

My opinion: In all of these 5 cases, the risk of bias assessment should read “unknown” since we simply cannot know whether selective outcome reporting took place.

Moreover, Arean et al.’s paper points to a trial registration which is very obviously not for the same study (NCT00540865, see here). That protocol refers to a different intervention (not an app in sight) in a different population, and specifies recruitment times which don’t even overlap with the paper. At first I thought it might be a simple typo in the registration number, but the authors repeatedly and consistently refer to this registration in several other papers on this particular app. Quite an editorial/reviewer oversight, in multiple places. Again, this should have read “unknown” risk of bias at best, and perhaps “high” risk of bias.

Moreover, the study by Ly et al. has the PHQ-9 and the BDI as prespecified primary outcome measures (see protocol). The published study relegates the PHQ-9 to a secondary outcome measure, but for this meta-analysis that’s not an issue. However, Firth et al. report using only the BDI in Table 1, rather than pooling the effect sizes of the PHQ-9 and BDI as they say they would in the methods: “For studies which used more than one measure of depression, a mean total change was calculated by pooling outcomes from each measure.” (Firth et al., 2017, p. 288). This is odd, and might be an oversight, but it is a deviation from both the methods section and the PROSPERO protocol.
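For what it’s worth, the quoted sentence suggests something like averaging the (standardised) change scores across measures within a study. The paper doesn’t spell out the exact formula, so this little Python sketch is purely my assumption of what “pooling outcomes from each measure” could mean:

```python
def pooled_change(changes_per_measure):
    """Average standardised change scores across several depression
    measures within one study. This is one plausible reading of
    'pooling outcomes from each measure' - an assumption, not the
    formula Firth et al. actually used."""
    return sum(changes_per_measure) / len(changes_per_measure)

# A hypothetical study reporting standardised change on both PHQ-9 and BDI:
study_effect = pooled_change([0.45, 0.55])
```

Under that reading, the Ly et al. entry in Table 1 should have combined the PHQ-9 and BDI changes, rather than using the BDI alone.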

Missing, presumed accounted for.

Missing data is the scourge of eHealth and mHealth research, and this is no different in the Firth meta-analysis. Even though all but 2 included studies are rated “low risk of bias” for incomplete outcome data (see figure 2 in Firth), it is questionable whether any type of missing data correction can account for substantial amounts of missing data.

For example, look at figure 1 in Arean et al., which shows post-test attrition of 26–33%, going up to 65–77% at follow-up. Is this an issue? Well… This is where application of the Cochrane handbook comes down to interpretation. Here’s what they have to say about missing data:

“The risk of bias arising from incomplete outcome data depends on several factors, including the amount and distribution across intervention groups, the reasons for outcomes being missing, the likely difference in outcome between participants with and without data, what study authors have done to address the problem in their reported analyses, and the clinical context.” (source; emphasis mine).

It comes down to interpretation: were the missing data approaches from study authors ‘good enough’ to account for systematic differences in missing data between different arms in the RCTs? The answer could be “yes”, resulting in “low risk of bias” for the “incomplete outcome data” criterion.

Then again, it could equally be argued that large amounts of missing data – endemic in eHealth/mHealth studies – are something that cannot simply be fixed by any kind of statistical correction, especially since these corrections almost always assume that data are missing at random, which they seldom are (for an excellent discussion of these issues by people way cleverer than me, see Mavridis et al. here). Either way, you’re left with an imprecise result, which is not the meta-analysts’ fault – but it is something that should be discussed in detail rather than dismissed as “accounted for” and checked off with a “+” in a risk of bias table.

Clearly, this is an area where the Cochrane handbook could use some updating, and the interpretation of whether missing data are ‘accounted for’ is a judgement call. However, relying on study authors to address missing data, in my opinion at least, misses a major point of a meta-analysis: to give an objective-as-possible synthesis of available evidence and to catalogue and discuss its limitations in relation to the generalisability of the results; which is something that – with respect to all authors – I find lacking in both the Mental Elf blog and the Firth meta-analysis.

Wait, there’s more.

  • Two included studies have an obvious financial conflict of interest: Arean et al. and Roepke et al., both of which include authors paid by the app developer (see here and here [paywalled], respectively). This is usually coded under “other sources of bias”, and as such Firth et al. have coded these correctly in Table 2. Or at least I assume that’s the reason for coding these studies as high risk of bias – there may be more reasons. But none of this is discussed in the paper, which may represent an author choice. I think it would have been good to include this essential information, since a) both studies report some of the highest effect sizes in the meta-analysis, and b) neither of these studies has a study protocol available, leaving the door wide open for all sorts of questionable research practices and methodological ‘tricks’. I’ve reviewed a number of mHealth papers recently: quite a few were methodologically weak high-N ‘experimercials’ conducted by mobile app firms, with no trial registrations available (at best, retrospective ones).

Quite a COI to reduce to a single “-” in a table, and not a mention in the text. From Arean et al., JMIR 2016, reproduced under Fair Use for commentary and criticism.

  • There is something very odd going on in the main forest plot (figure 2 in the paper), which, for reasons not adequately explained or explored, mixes studies with active and wait-list control conditions. This was registered in the protocol, but to me it makes no sense: why would you pool the outcomes of such differing interventions with different control conditions? (This is probably where a large part of the statistical heterogeneity comes from – see table 3 in Firth.)
  • The pooled g=0.383 from this forest plot is also the ‘main’ finding of the paper. Again, this makes no sense – much less so when you read carefully and realise that these figures are (granted, as per protocol) changes in depressive scores, i.e. pre-post effect sizes. There are quite a number of problems with this: for starters, pre- and post-measurements are not independent of each other, and calculating effect sizes this way requires the pre-post correlation, which is usually unknown (for a more comprehensive explanation of why you shouldn’t use pre-post scores in meta-analyses, see Cuijpers et al. here). Secondly, despite randomisation, differences in pre-test scores can still exist – randomisation is not some magical process that makes differences at baseline disappear.
  • Finally, the inclusion of studies using completers-only analyses means that the pre-post scores will be skewed towards the ‘good’ patients that bothered to fill in post-test measurements (see also “missing data”).
  • Also, the column headings for “Lower limit” and “Upper limit” appear to be switched in figure 3 (probably the fault of Comprehensive Meta-Analysis, which is a horribly buggy piece of software).
  • The reference to Reid et al, 2014 is obviously incorrect: that paper only reports secondary outcomes from the trial. The correct reference should be Reid et al, 2011. Not a biggie, but it’s a bit sloppy.

“No evidence of publication bias”?

Hahahaha! No, but seriously: of course there is publication bias, it’s a given. It’s just that we’re probably not able to detect it very well. Here’s a screenshot from the WHO trials register, where I did a quick and dirty search for ((smartphone OR mobile) AND (depression OR mood)) up until Jan 1st 2016 (I generously don’t expect anything after that to be published yet).

Notice anything? It’s the “Results available” column. Some of these studies go back a decade; many were never published. Of course there is publication bias – and these are only the preregistered studies. The Firth et al. sample alone contains 5 unregistered studies; Zarquon knows how many more unregistered ones are out there (this is what makes publication bias an ‘unknowable unknown’).

The tests currently available for detecting publication bias are synthetic, statistical measures which are highly dependent on the data. At best they are mostly useless; at worst they are positively misleading – especially when they ‘conclude’ that there is no evidence of publication bias, as in Firth et al.’s case. For an interesting overview of current methods and their performance in meta-analyses, see Carter et al. here.
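As an aside, the best-known of these synthetic measures, Egger’s regression test, fits in a dozen lines – which also shows how little it has to work with. A sketch in Python (illustrative only, not the exact implementation used in any package; a real test would also compute the intercept’s standard error and a p-value):

```python
import statistics

def egger_intercept(effects, ses):
    """Egger's regression asymmetry test, sketch version: regress the
    standardised effect z = effect/SE on precision = 1/SE by ordinary
    least squares. An intercept far from zero suggests funnel-plot
    asymmetry (small studies with big effects)."""
    z = [y / s for y, s in zip(effects, ses)]
    precision = [1 / s for s in ses]
    mx, my = statistics.mean(precision), statistics.mean(z)
    sxx = sum((x - mx) ** 2 for x in precision)
    sxy = sum((x - mx) * (yi - my) for x, yi in zip(precision, z))
    slope = sxy / sxx
    return my - slope * mx  # the intercept = asymmetry estimate

# With a constant true effect and no small-study bias, the intercept is ~0:
print(egger_intercept([2.0, 2.0, 2.0], [1.0, 0.5, 0.25]))
```

With the small study counts typical of mHealth meta-analyses, such a regression has next to no power – which is precisely why “no evidence of publication bias” should never be read as “no publication bias”.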

Case in point: screenshot from WHO trial registry. I didn’t follow up on all of these, but many of these trials on mobile apps have disappeared in a file drawer. Tell me again how there’s “no evidence of publication bias”.


Frail Fail-safe….No!

Apart from other ways to detect publication bias, Firth et al report that “Additionally, a “fail-safe N” was used to account for the file draw[sic] problem” (p.289); and happily report that “…the fail-safe N was 567 (estimating that 567 unpublished “null” studies would need to exist for the actual p value to exceed 0.05).” (p.292).

Please, meta-analysts, for the love of Zarquon, it’s 2018: stop using fail-safe N. Only people who desperately want to “prove” the efficacy of something use it. Perhaps we should abandon tests for publication bias altogether, since it’s an “unknowable unknown” and reports of “no publication bias” are profoundly misleading.
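For the record, part of why fail-safe N refuses to die is that it is trivially easy to compute. Here is Rosenthal’s classic Stouffer-based version as a Python sketch (one-tailed critical z of 1.645) – note that it runs entirely on p-values and says nothing whatsoever about effect sizes, bias, or study quality:

```python
import math

def failsafe_n(z_values, z_alpha=1.645):
    """Rosenthal's fail-safe N: the number of unpublished 'null' (z = 0)
    studies needed to drag the Stouffer combined Z below z_alpha.
    Combined Z = sum(z) / sqrt(k + N), so N = (sum(z)/z_alpha)^2 - k."""
    z_sum = sum(z_values)
    k = len(z_values)
    return max(0, math.floor((z_sum / z_alpha) ** 2 - k))

# Ten modest studies (z = 2.0 each) already yield a triple-digit N:
print(failsafe_n([2.0] * 10))
```

Which is exactly the problem: a big, reassuring-sounding number drops out of almost any set of nominally significant studies, regardless of how biased the underlying literature is.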

Are you quite done yelling, Dr Kok?

OK, now what? Well… Probably nothing. I’m probably wrong about all of this, nobody is going to care, nothing is going to change, this blog will be read by 2 people and a Google bot, and that’s it. Because who am I to argue?

In the end, it’s mostly disappointment on my part. Again, I’ve seen earlier work from the authors of this meta-analysis and generally it’s been very good. The field needs a good meta-analysis on mHealth. I’m confident that Firth and team had the best of intentions: they pre-registered the meta-analysis in PROSPERO, the results are available as an Open Access paper, and yet… I somehow expected a more in-depth discussion from a meta-analysis in a prestigious journal (if you’re the deluded type who thinks the IF says something about quality – it doesn’t – World Psychiatry has a staggering impact factor of 26.561; but using the IF is for the statistically illiterate, so… erm, let’s not go there now).

Firth et al. will probably be widely cited as an authoritative meta-analysis, especially since the take-home message for a lot of people will be “apps are moderately effective in treating depression”. But the field of mHealth deserves a more nuanced discussion of all the caveats and methodological issues that restrict the generalisability of these findings in quite substantial ways.

So I’m actually hoping that I’m wrong about all of this, rather than that the authors dropped the ball on a number of things. Oh, and the same author team has published another meta-analysis on apps for anxiety – I did not look at that study in detail, but it shares some included studies with this meta-analysis on depression apps. Find that meta-analysis here (paywalled, unfortunately).

What’s the take-away?

Well, for my part it reinforces a point made before by others: the quality of research into mHealth is very, very dubious – so much so that we cannot draw any reliable conclusions on the efficacy and effectiveness of mHealth interventions for depression. My fear is that the Firth et al. meta-analysis will cause an oversimplification effect: future trials of mHealth applications will duly power their analyses to detect an expected effect size of g=0.4, without knowing exactly where this effect size came from.

TAKE HOME MESSAGE: Triallists, meta-analysts, colleagues: our field deserves better. We can do better. We should do better.

Final note

The blog authors on Mental Elf conclude by repeating a cheap shot at the RCT that I’ve been hearing a lot in the past years: the RCT is apparently ‘too slow’ to keep up with technological developments. There is nothing inherently “slow” about the randomised controlled trial, but ineffective and poorly piloted recruitment practices in eHealth trials have led to disappointing participant accrual rates, which in turn have given the RCT an undeserved bad rap (after all, it couldn’t possibly be the case that researchers have wildly overestimated their ability to attract participants, right?).

A bad workman blames his tools, so don’t blame ineptitude in recruiting participants on the RCT. If eHealth and mHealth are to be evidence-based, they are to adhere to rigorous standards of scientific research. So don’t ditch the RCT just yet, do proper pilot studies, be realistic and optimise your recruitment strategies rather than blaming the RCT as a research tool.


Firth J, Torous J, Nicholas J, Carney R, Pratap A, Rosenbaum S, et al. The efficacy of smartphone-based mental health interventions for depressive symptoms: a meta-analysis of randomized controlled trials. World Psychiatry. 2017;16(3):287–298. PMID: 28941113 [Open Access]



Throughout the digging through studies and writing this blog I was repeatedly conflicted, and interested in cracking open a beer and blasting some Motörhead instead of wasting my spare Friday ranting on the Internet. I opted for rooibos tea and Lana del Rey instead.

This blog is CC BY-NC. Feel free to tell me how wrong I am on Twitter.

Robin looks for meta-analysis alternatives 1: JamoviMeta.

Meta-analyses. So much meta, many analyses. I’ve done a few: two are under review, and two are almost ready for submission. The common thread in all of them is the Comprehensive Meta-Analysis (CMA) software package. CMA has brought the practice of meta-analysis (or ‘an exercise in mega-silliness’, as Eysenck called it) to a broader audience because of its relative ease of use. The downside of this ease of use is the unbridled proliferation of biased meta-analyses that serve only to ‘prove’ that something works, but let’s not get into that – my blood pressure is high enough as it is.

Some years back, CMA changed from one-off purchases to an annual subscription plan, ranging from $195 to $895 per year per user, obviously taking hints from other lucrative subscription-based plans (I’m looking at you, Office365). Moreover, CMA has a number of very irritating bugs and glitches: to name a few, there are issues with copying and pasting data, issues with ‘high-resolution’ graphics outputting nothing but a black screen, issues with the system locale, etc. etc. On the whole, CMA is a bit cumbersome and expensive to work with, and I’ve been telling myself to go and learn R for years now – if only to use the metafor package, which is widely regarded as excellent.

Would I like some cheese with my whine?

However, I never found the time to take up the learning curve needed for R (i.e., I’m too stupid and lazy), and while recently whining on Twitter about how someone (most definitely not me) should make a graphical front-end for R that doesn’t pre-suppose advanced degrees in computer science, voodoo black arts and advanced nerdery; Wolfgang Viechtbauer pointed me to JamoviMeta.

In my quest to find a suitable alternative to CMA that even full-on unapologetic troglodytes like me can understand – let’s give it a test drive!

DISCLAIMER: Most of the time I have no idea what I’m doing, as will be readily apparent to any expert after even a cursory glance.


I was redirected to a github page, which instructed me to first download Jamovi, and add the module MetaModel.jmo.

I’d never heard of Jamovi before, but let’s give it a shot – the installer seems straightforward. MetaModel is an add-on for the Jamovi software package, which is itself a fairly new initiative aimed at an “open” statistics package. I’m not entirely sure whether Jamovi itself is an add-on to R, but at this point that’s not particularly relevant for what I want to do.

The main screen of Jamovi looks simple, clean and friendly. Now, to ‘sideload’ MetaModel. Nothing in the menu, so: click Modules, sideload, find the downloaded MetaModel.jmo and import it.


JamoviMeta main window

It’s not immediately apparent where I should start – the boxes with labels like “Group one sample size” look inviting as text boxes, but entering information doesn’t work. Using the horizontal arrow to shift the 3 bubbles with “A” from the left panel to the right doesn’t work either, and just flashes the little yellow ruler(?) in the text box which isn’t a text box.

Entering variables (note how the dialogue box resembles SPSS).

The grey arrow pointing to the right brings me to a spreadsheet-like… well, spreadsheet. Ah! The A, B, C refer to columns in this spreadsheet, and the software expects data in the format you’d expect: study name, sample size, means, standard deviations. Jamovi seems to automatically recognise the type of data I’ve entered, but also seems thrown off by my use of a comma instead of a period. Incidentally, this is/was a major issue with CMA, which depends on your computer’s ‘locale’ settings – if you’re from a country that uses dots for thousands and commas for decimals (e.g., €10.000,00) and you send a data file to a colleague who has US numbering (e.g., $10,000.00), the data get all screwed up. Adding variable labels isn’t immediately apparent either, but double-clicking a column header and then double-clicking the letter of the column lets you change the label.
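For illustration, here is roughly what a locale-robust number parser has to cope with. This is a hypothetical Python helper, not code from Jamovi or CMA – and note that a comma-only string like “3,000” remains genuinely ambiguous; the sketch assumes a lone comma is a decimal mark:

```python
def parse_decimal(s):
    """Normalise a number string that may use either '.' or ',' as the
    decimal separator (hypothetical helper; not part of any package)."""
    s = s.strip()
    if ',' in s and '.' in s:
        # Whichever separator occurs last is taken as the decimal mark;
        # the other one is treated as a thousands separator.
        if s.rfind(',') > s.rfind('.'):
            s = s.replace('.', '').replace(',', '.')
        else:
            s = s.replace(',', '')
    elif ',' in s:
        # Assumption: a lone comma is a decimal mark, e.g. "3,14".
        s = s.replace(',', '.')
    return float(s)
```

So "10.000,00" (European) and "10,000.00" (US) both come out as ten thousand, which is exactly the kind of guessing no statistics package should have to do – hence the copy-paste misery when locales differ.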

Variable labels & type window

Having entered the data, I go back to “Analyse” and try to enter my newly made data into MetaModel. Everything works, except… it won’t accept the sample sizes for my data, and flashes the yellow ruler(?) in red when I try. Ah – this probably means it wants continuous data, but the sample sizes had been interpreted as ordinal data, as denoted by the three bubbles (same icons as in SPSS).

With this corrected, MetaModel goes straight to work (apparently), and tells me “Need to specify ‘vi’ or ‘sei’ argument”. Well, obviously. More random clicking is in order, I think – that’s never failed me, since psychology students are taught to keep clicking until the window says p<0.05 or smaller*). Then again, I’ve only just entered data and haven’t actually told MetaModel what to do, so it’s no surprise that nothing works.

I flip open ‘Model options’, ‘plots’ and ‘publication bias’.

…I quickly close ‘publication bias’ again, as it only shows options for fail-safe N. Let us never mention fail-safe N again; I hope the developer removes this option ASAP. I am aware of the current discussion about how Trim & Fill probably doesn’t work very well either (nor does anything else, apart from 3PSM, apparently), but I think everyone can agree that fail-safe N should never be used.

Clicking around a bit (I won’t go into all the different meta-analysis model estimators), I find out that I have to choose either ‘Raw Mean Difference’ or ‘Log Transformed Ratio of Means’ to make the “Need to specify ‘vi’ or ‘sei’ argument” message go away. Not sure what that is about. However, all of this looks encouraging, and it’s time for real data.

I prepared a small data file in CMA, based on a meta-analysis we’re currently working on, using Excel as an intermediary – CMA’s data import/export capabilities are non-existent, and I needed to change all decimal commas to decimal points – and copy-pasted the data into MetaModel. Small issue: there’s no fixed column for subgroups within studies (or maybe I’m just doing it wrong), so I renamed the studies to Kok 2014 A, B, etc.

JamoviMeta data window

CMA data window


However, running the analyses from here on was straightforward, easy and quick. The results are pretty much consistent with CMA (I used the DerSimonian-Laird model estimator, which I think is the CMA default). I saw no strange differences or outliers, apart from a few (not particularly large) differences in effect sizes. These are probably due to subtle differences in calculations – I take it both CMA and MetaModel have their own sets of computational assumptions, which would explain the small variations. Kendall’s tau was even spot on.
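For the curious: the DerSimonian-Laird estimator itself is simple enough to sketch in a few lines of Python (illustrative only – CMA, MetaModel and metafor each layer their own refinements and corrections on top of this):

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.
    Returns (pooled effect, tau^2, standard error of the pooled effect)."""
    w = [1 / v for v in variances]          # inverse-variance weights
    k = len(effects)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)      # between-study variance
    # Re-weight with tau^2 added to each study's variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, tau2, se
```

When the studies agree (Q below k−1), tau² truncates to zero and the result collapses to the fixed-effect estimate – which is also why different packages that tweak the truncation or weighting can produce the small effect-size discrepancies seen above.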

MetaModel main results

CMA main results (click for bigger image)


MetaModel has tackled one of my biggest gripes with CMA: high-quality images. CMA’s so-called ‘high resolution’ outputs have always been quirky, ugly and too low-resolution for most journals, as it would only export to Word (ugh), Powerpoint (really?) and .WMF (WTF?). In MetaModel, right-clicking e.g. the funnel plot gives you the option

Right-click graphics export options

to export the image to a high-quality PDF which looks crisp and clear (download sample PDFs of the MetaModel funnel plot and MetaModel Forest plot here).

MetaModel forest plot

CMA “high resolution” forest plot

MetaModel funnel plot

CMA funnel plot (with imputed studies)


If this is a ‘beta’, it looks and works better than OpenMetaAnalyst ever did (although, to be fair, I should revisit that some time). The developer (Kyle Hamilton) has done an impressive job in coding a relatively simple but very usable module for meta-analysis. It is lightyears faster than CMA (which can crawl to a virtual standstill on my i3 laptop) and can output high-quality graphics. Also, it runs analyses in real time, so there’s no need to keep mashing that “-> Run analyses” button after making small changes. Choosing Jamovi as a front-end was a good bet – its interface looks friendly, modern and crisp. Of course, features are missing and this was just a very quick test run, but my first impression is very good. I’d very much like to see where this is going.


The good:

  • Pretty much MWAM (Moron Without A Manual) proof.
  • Feels much more modern than CMA. Looks better. MUCH faster.
  • More model estimators than CMA.
  • Contour-enhanced funnel plots and prediction intervals. Nice addition.
  • So far, no glitches or crashes.
  • It’s free!


The not-so-good:

  • Hover-over hints (contextual information when you hover over a button) would be nice
  • Error messages aren’t especially helpful


The ugly:

  • Fail-safe N.


On the wishlist:

  • Modern alternatives for publication bias, e.g. p-curve, p-uniform, PET(-PEESE) or 3PSM.
  • 95% CIs around I²
  • Support for multiple subgroups and timepoints?


*) Only a slight exaggeration: this is what students teach themselves.





High-resolution Risk of Bias assessment graph… in Excel!


Back in 2015 or 2016 when I made this, there weren’t too many alternatives to RevMan for RoB assessment graphs. You can still use the Excel thing below, but I strongly advise you to use other, newer (and vastly better) methods like the {robvis} package in R.

For more information and hands-on guidelines for {robvis}, see this excellent online resource.



The old post:

Some years ago, I found myself ranting and raving at the RevMan software kit, which is the official Cochrane Collaboration software suite for doing systematic reviews. Unfortunately, either because I’m an idiot or because the software is an idiot (possibly both), I found it impossible to export a Risk of Bias assessment graph at a resolution that was even remotely acceptable to journals. These days journals tend to accept only vector-based graphics or bitmap images in HUGE resolutions (presumably so they can scale these down to unreadable smudges embedded in a .pdf). At that time I had a number of meta-analyses on my hands so I decided to recreate the RevMan-style risk of bias assessment graph, but in Excel. This way anyone can make crisp-looking risk of bias assessment graphs at a resolution higher than 16dpi (or whatever pre-1990 graphics resolution RevMan appears to use…)

The sheet is relatively easy to use, just follow the embedded instructions. You need (1) percentages from your own risk of bias assessment (2) basic colouring skills that I’m sure you’ve picked up before the age of 3. All you basically do to make the risk of bias assessment graph is colour it in using Excel. It does involve a bit of fiddling with column and row heights and widths, but it gives you nice graphs like these:

Sample Risk of Bias assessment graph

Like anything I ever do, this comes with absolutely no guarantee of any kind, so don’t blame me if this Excel file blows up your computer, kills your pets, unleashes the Zombie Apocalypse or makes Jason Donovan record a new album.

Download available here (licensed under Creative Commons BY-SA):

Risk of Bias Graph in Excel – v2.6

MD5: 1FF2E1EED7BFD1B9D209E408924B059F


UPDATE November 2017 – I only just noticed that the first criterion says “Random sequence allocation” where it should of course say “Random sequence generation”. Version 2.6 fixes this.

UPDATE January 2017 – another friendly person noted that I’m an idiot and hadn’t fixed the column formatting problem in the full Cochrane version of the Excel sheet. Will I ever learn? Probably not. Version 2.5 corrects this (and undoubtedly introduces new awful bugs).

UPDATE September 2016 – a friendly e-mailer noted that the sheet was protected to disallow column formatting (which makes the thing useless). Version 2.4 corrects this.





eMental Health interview with VGCt [Dutch]

Nothing like an interview on eMental Health to make you feel important

I’m still reeling from the festivities surrounding my H-index increase from 3 (“aggressively mediocre“) to 4 (“impressively flaccid but with mounting tumescence“)*. Best gift I got: a sad, weary stare from my colleagues. Yay! But back to eMental Health (booooo hisssss).

Some while back I did an interview (in Dutch) with Anja Greeven from the Dutch Association for Cognitive Behavioural Therapy [Vereniging voor Gedragstherapie en Cognitieve Therapie] for their Science Update newsletter in December 2015. It’s about life, the universe and everything; but mostly about eHealth and eMental Health; implementation (or lack thereof), wishful thinking, perverse incentives (you have a filthy mind) and that robot therapist we’ve all been dreaming about (sorry, Alan Turing).

Kudos to me for the wonderful contradiction where I call everyone predicting the future a liar and a charlatan; after which I blithely shoot myself in the foot by trying to predict the future. In my defence, I never claimed I wasn’t a liar and a charlatan. It was great fun blathering on about all kinds of things, and massive respect to Anja who had to wade through a 2-hour recording of my irritating voice to find things that might pass as making sense to someone, presumably.

Anyway, the interview is in Dutch, so good luck Google Translating it!

Link to the VGCt interview in .pdf [Dutch]


*) Real proper technical sciencey descriptions for these numbers, actually. The views expressed in this interview are my own; and nobody I know or work for would ever endorse the silly incoherent drivel I’ve put forward in this interview.

Save the Data! Data integrity in Academia

Data integrity is integral to reproducibility.

I recently read something on an Internet web site called Facebook; it’s supposed to be quite the thing at the moment. Friend and skeptical academic James Coyne, whose fearless stabs at the methodologically pathetic and conceptually weak I much admire, instafacetweetbooked a post over at Mind the Brain, pointing to a case in post-publication peer review that made me wonder whether I was looking at serious academic discourse or toddlers in kindergarten trying to smash each other’s sand castles. James and I have co-authored a manuscript about the shortcomings in psychotherapy research which is available freely here, and I’m ashamed to say that I still haven’t met up with James in person, although he’s tried to get a hold of me more than once when he was in Amsterdam.

Anyway, case in point: after post-publication reviewers highlighted flaws in the original analysis, the original authors manipulated the published data to make it seem as though the post-publication peer reviewers were a bunch of idiots who didn’t know what they were doing. This is clearly pathetic and must have been immensely frustrating for the post-publication reviewers (it was a heroic feat in itself to be able to prove such devious manipulations in the first place – thankfully, they took close note of the data set time stamps).

What can be done? Checking time stamps is trivial, but so is manipulating time stamps. My mind immediately took to what nerdy computery types like me have used for a very, very long time: file checksums. We use these things to check whether, for example, the file we just downloaded didn’t get corrupted somewhere along the sewer pipes of the Internet. Best known, probably, are MD5-hashes, a cryptographic hash of the information in a file. An MD5-hash is composed of 32 hexadecimal characters (0-9, A-F), which yields 16^32 = 2^128 ≈ 3.4 × 10^38 different combinations – for everyday purposes, a practically unique fingerprint of a file. That’ll do nicely to catalogue all the Internet’s cat memes with unique hashes from decades past and aeons to come, and then some. So, if I were to download nyancat.png from, I could calculate the hash of that downloaded file using, e.g., the excellent md5check.exe by Angus Johnson, which gives me a unique 32-character hash; which I could then compare with the hash as shown on. Few things are worse than corrupted cat memes, really, but let’s consider that these hashes are equally useful to check whether a piece of, say, security software wasn’t tampered with somewhere between the programmer’s keyboard and your hard drive – it’s the computer equivalent of putting a tiny sliver of sellotape on the cookie jar to see that nobody’s nicking your Oreos.

How can all this help us in science and the case stated above? Let’s try to corrupt some data. Let’s look at the SPSS sample data file “anticonvulsants.sav” as included in IBM SPSS 21. It’s a straightforward data set from a multi-centre pharmacological trial of an anticonvulsant vs. placebo, in which patients were followed for a number of weeks and the number of convulsions per patient per week was recorded as a continuous scale variable. The MD5 hash (“checksum”) for this data file is F5942356205BF75AD7EDFF103BABC6D3 as reported by md5check.exe.


First, I duplicate the file (anticonvulsants (2).sav), and md5check.exe tells me that the checksum matches the original [screenshot] – these files are bit-for-bit exactly the same. The more astute observer will wonder why changing the filename didn’t change the checksum (bit-for-bit, right?). In short: the hash covers the file’s contents, while the filename lives in the filesystem rather than in the file itself – but Google most assuredly is your friend if you really must know more.

Now, to test the anti-tamper check, let’s say we’re being mildly optimistic about the number of convulsions that our new anticonvulsant can prevent. Let’s look at patient 1FSL from centre 07057.  He’s on our swanky new anticonvulsant, and the variable ‘convulsions’ tells us he’s had 2, 6, 4, 4, 6 and 3 convulsions each week, respectively. But I’m sure the nurses didn’t mean to report that. Perhaps they mistook his spasmodic exuberance during spongey-bathtime as a convulsion? Anyway. I’m sure they meant to report 2 fewer convulsions per week as he gets the sponge twice a week, so I subtract 2 convulsions for each week, leaving us with 0, 4, 2, 2, 4 and 1 convulsions.

Let’s save the file and compare checksums against the original data file.


Oh dear. The data done broke. The resulting checksum for the… enhanced dataset is E3A79623A681AD7C9CD7AE6181806E8A, which is completely different from the original checksum, which was F5942356205BF75AD7EDFF103BABC6D3 (are you convulsing yet?).

Since, in practice, no two different files share an MD5 hash, changing even a single bit of information in a data file changes the checksum – and a regular number takes up quite a few more bits than one. Be it data corruption or malicious intent, if there’s a checksum mismatch then there’s a problem. Is this a good point to remind you that replication is a fundamental underpinning of science? Yes it is.

This was just a simple proof-of-concept and I’m sure this has been done before. The wealth of ‘open data’ means that data are open to both honest re-analysis and dishonest re-analysis. To ensure data integrity when graciously uploading raw data with a manuscript, why not include some kind of digital watermark? In this example, I’ve used the humble (and quite vulnerable) MD5-hash to show how an untampered dataset would pass the checksum test, making sure that re-analysts are all singing from the same datasheet as the original authors, to horribly butcher a metaphor. Might I suggest: “Supplement A1. Raw Data File. MD5 checksum F5942356205BF75AD7EDFF103BABC6D3”.


New paper in the bulletin of the EHPS


What’s up with the speed of eHealth implementation?

Fresh off the virtual press at the bulletin of the European Health Psychology Society: Jeroen Ruwaard and I investigate the rapid pace of eHealth implementation. Many bemoan the slow implementation and uptake of eHealth, but aren’t we in fact going too quickly? We examine four arguments to implement unvalidated (i.e., not evidence-based) interventions and find them quite lacking in substance, if not style.

Ruwaard, J. J., & Kok, R. N. (2015). Wild West eHealth: Time to Hold our Horses? The European Health Psychologist, 17(1).

Download the fulltext here [free, licensed under CC-BY].

Re-amping Ritual, Rejoice!

In preparing the re-issue of our critically acclaimed but sold-out 2003 debut album The Apotheosis, we decided to have a little fun and re-record a few of the old tracks, just a tad shy of 10 years later. And wow, has technology come a long way since 2002/2003. We now do basically everything ourselves. No wait, we literally do everything ourselves, apart from mixing and mastering. Most of us still remember fiddling about on little 4-track Tascam recorders that used ordinary cassette tapes; nowadays we do 8-track digital stuff in Protools in unimaginable sound quality without even batting an eyelid. Now, I’ve always been a big fan of re-amping.

The contenders. Left to right: Røde NT1000, Shure SM58, Audio Technica ATM25 (x2), Audio Technica ATM21, Audio Technica ATM31R, Audio Technica AT4033a, AKG D112.


Long story short, re-amping means not recording a thundering amp while you play, but recording just the instrument and playing it back through an amp later. This has a number of advantages, but for DIY-types like us the biggest is having total control over your sound while you’re not playing. Essentially you get to be the bass player and sound engineer in one, and you don’t have to play something, listen back, put down your bass, fiddle with your amp/microphone, put on your bass, play something, and do it all over again, ad nauseam. Armed with a nice selection of microphones we set to work with an Ampeg 8×10 loaned to us by Tom of the almighty Dead Head. I used my SVP-PRO (we are inseparable) and trusty Peavey power amp, and started experimenting with microphone placement and combinations.

D112, ATM25 and AT4033a in action. NT1000 to the far left in the corner, not in the pic.


The winning combination turned out to be the ATM25 off-axis, right on the edge of the cone at 45 degrees, edged back just about an inch, with the AT4033a at 70 cms (2.3 feet), just about in the vertical centre of the 8×10.



I used Audacity to make these cool plots, and the graphs clearly show the differences in microphone signals. At the end of the day, the D112 was too boomy anywhere near the speaker cone (the very proximity effect the D112 is ‘famous’ for), while the ATM25 sounded simply more gritty, dark and… well, evil. The AT4033a complemented the ATM25 perfectly, topping off the ATM25’s low-end gurgle with a snappy, gnarly high-mid end. Interestingly, the Røde NT1000 stashed away in the far corner picked up quite some lows and mids, as you can see by the huge hump below 100Hz, but I’m not sure we’re going to use it (there is quite an audible rattle in there somewhere from something vibrating).


Shoddily pasted graph showing the frequency responses of the different mikes in their different settings. Note the huge low-end response on the NT1000 condenser!

Here are some sound samples, straight from the board with just a touch of compression (1:2.5, 0.1msec attack, 2sec decay).

Røde NT-1000

Audio Technica AT4033a

Audio Technica ATM25

AKG D112

Quick thought

“Meta-analyses are ventriloquist’s dummies. Sitting on a wise man’s knee they may be made to utter words of wisdom; elsewhere, they say nothing, or talk nonsense, or indulge in sheer diabolism.” – Adapted from Aldous Huxley

Corrected JMIR citation style for Mendeley desktop


100 out of 100 academics agree that working with Endnote is about as enjoyable as putting your genitals through a rusty meat grinder while listening to Justin Bieber’s greatest hits at full blast and being waterboarded with liquid pig shit. I’ve spent countless hours trying to salvage the broken mess that Endnote leaves and have even lost thousands of carefully cleaned and de-duplicated references for a systematic review due to a completely moronic ‘database corruption’ that was unrecoverable.

Thankfully, there is an excellent alternative in the free, open source (FOSS) form of Mendeley Desktop, available for Windows, OS X, iToys and even Linux (yay!).

One of the big advantages of Mendeley over Endnote, apart from it not looking like the interface from a 1980s fax machine, is the ability to add, customise and share your own citation styles in the .csl (basically xml/Zotero) markup. While finishing my last revised paper I found out that the shared .csl file for the Journal of Medical Internet Research (a staple journal for my niche) is quite off, throwing random, unnecessary fields into the bibliography that do not conform to JMIR’s instructions for authors.

The online repository of Mendeley is pretty wonky and the visual editor isn’t too user-friendly, so I busted out some seriously nerdy h4xx0rz-skillz (which chiefly involved pressing backspace a lot).

Get it.

Well, with some judicious hacking, I present to you a fixed JMIR .csl file for Mendeley (and probably Zotero, too). Download the JMIR .csl HERE (probably need to click ‘save as’, as your browser will try to display the xml stream). It’s got more than a few rough edges but it works for the moment. Maybe I’ll update it some time.

According to the original file, credits mostly go out to Michael Berkowitz, Sebastian Karcher and Matt Tracy. And a bit of me. And a bit of being licensed under a Creative Commons Attribution-ShareAlike 3.0 License. Don’t forget to set the Journal Abbreviation Style correctly in the Mendeley user interface.

Oh, I also have a Mendeley profile. Which may or may not be interesting. I’ve never looked at it. Tell me if there’s anything interesting there. So, TL;DR: Mendeley is FOSS (Free Open Source Software), Endnote is POSS (Piece of Shit Software).

Update: A friendly blogger from Zoteromusings informed me in the comments that I was wrong: Mendeley is indeed not FOSS but just free to use, and not open source. Endnote is still a piece of shit, though. I was right about that 😉