Research should be transparent, accountable, and accessible. The tools that make it should be too.



I’ve been replacing software with open source alternatives for years, and a few folk have asked how this works with research. One of the most important aspects of being an independent researcher is making research easier to access, understand, and use in the real world. And just like my much-laboured point that research is not value neutral, and should be done in line with our social change ethos, technology can and should be operated with a conscience. That’s a lot easier to do a) when the people dominating the industry aren’t lining themselves up behind proto-dictators and b) when we know what our software is actually doing.

And it can be free. Maybe I should have led with that. (And if you listen carefully you’ll also hear the plaintive cry of an anguished tech billionaire.) 

As an independent consultant and researcher, I use a variety of research-specific and day-to-day software. In my view, technology (including software) is not inherently evil or problematic, but it sits at the vanguard of a highly problematic system: it depends on who owns it, runs it, benefits from it, and who bears the burdens. Many of the practices of large tech and software corporations are not ethically defensible. For example, a basic tenet of social research is safeguarding sensitive or private information. But if that information is stored or shared within ecosystems designed to trawl it for marketing, AI development, and so on, how can we guarantee this?

Countering tech monopolies is an approach many would like to take, but it can be difficult or daunting to find good alternatives to the dominant corporations. To help other researchers, students, and others working on similar things, I keep a list of the open-source software and approaches I use. This is also to help draw attention to good projects. Remember: community-run projects need citing, just as you would cite other resources you draw on. If you use a community-made resource regularly, consider donating to it.

Community-supported, open-source software provides a model for collaborative working that challenges the profit- and extraction-centric logics of big tech companies. I like to think of this aspect as a kind of prefigurative politics: it’s a mode of relating to others that anticipates a better, fairer system, even as it has to compromise to survive in the current one. Because the code is open, anyone familiar with programming can audit what is happening to the data, which gives greater confidence. It also typically means the software is free to use. To make that happen, the project is usually supported by small donations from its community of users or, in the case of research-focused software, sometimes hosted by universities or academic projects.

There’s an additional benefit in avoiding the ludicrous e-waste we generate in the constant search for false novelty. For hardware like laptops, desktops, and smartphones, open-source software and firmware can often rescue older tech by being less resource-heavy (see especially ‘operating systems’) – that means saving a device from waste and getting more life out of it. The most ‘sustainable’ tech is the stuff you already have (or someone else has). Personally, I like using second-hand devices (from a reputable place) in this way; you just need to remember to scrub the storage drives: don’t rely on the previous (or next) owner to do it.

The list below will be updated as I learn more. And I’ve assumed folk reading this are not experts – because the transition away from the broligarchy needs to be accessible!

Research & analysis - open source alternatives

Anyone needing dedicated, and often very expensive, research software for qualitative, quantitative, field-based and desk-based work may find this useful. If you work for a university, check that you’re actually allowed to use software that isn’t prescribed for interviewing, surveying, storing, etc. (Yes, that is a thing, a very annoying thing that even runs contrary to the goal of protecting data, but I digress.)

Try Zotero – “collect, organize, annotate, cite, and share research”, as they say. It’s free, community-supported, and open source. It can even speak to other software, such as web browsers (see below on that). I like its group-work credentials: you can use free software to collaborate with people and organisations that don’t have expensive subscription models. Very handily, you can tell it precisely which referencing format to output (e.g. all the Chicagos and Harvards) to match your university, publication, or other style guide. Also check out CiteAs, especially for citing more obscure things.
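If you ever want to script around your library (for bigger projects or collaborations), Zotero also has a web API. Here’s a minimal sketch using the third-party pyzotero package – the library ID and API key are placeholders you create yourself in your Zotero account settings, so treat this as one illustrative route rather than anything Zotero obliges you to do.

```python
# Minimal sketch: reading items from a Zotero library over its web API,
# via the third-party 'pyzotero' package (pip install pyzotero).
# LIBRARY_ID and API_KEY are placeholders - generate your own in your
# Zotero account settings (zotero.org/settings/keys).
from pyzotero import zotero

LIBRARY_ID = "1234567"    # placeholder: your numeric user or group ID
API_KEY = "your-api-key"  # placeholder: a private key you create yourself

zot = zotero.Zotero(LIBRARY_ID, "user", API_KEY)

# Fetch the five most recently added top-level items and print basic details.
for item in zot.top(limit=5):
    data = item["data"]
    print(data.get("title", "(untitled)"), "|", data.get("date", "n.d."))
```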

A note to students: learn to cite and compile a bibliography manually first! All referencing software will output occasional errors, and you’ll only spot this if you know what you’re looking at.  

Try CATMA – this is a good alternative to archive/corpus management and analysis software like NVivo. I’ve been recommending it to students who a) hate NVivo’s complexity and b) cannot access it through their university (which varies a lot). It handles thematic analysis (a widely used qualitative research approach) very well, plus you can manage groups for collaboration, including new users. It’s also pretty resource-efficient and can run in-browser for older systems. All free! Combining this with Zotero makes for easy solo or group work on annotating and analysing documents, and keeping track of the references. I organised a participatory discourse analysis summer school on greenwashing with CATMA, and folk (all new users) really liked it.

  • For replacing the university mainstay SPSS, there are a few options and you might want to experiment. GNU PSPP, made by the Free Software Foundation – who also support a brilliant range of other software – has a long history and is well supported. Don’t worry too much about the GNU side of things; you can download a version of PSPP straight to a Windows, macOS, or Linux system (more on these below…). The other option is JASP, which is maintained by academics at the University of Amsterdam.

    And let’s get this out of the way: lovers of R language statistics already know their options, and they probably never used SPSS anyway. Well done on learning R. For everyone else, you can learn new languages like R, and this genuinely will replace proprietary software, make you part of a pretty handy community, and even help you get started with programming. My only issue with recommending R directly is the learning curve. It isn’t practical or efficient for people just dabbling in statistics, but hardcore quants researchers will probably find it rewarding.

  • Note to students: Manual transcription can be annoying. But I do still think it has the edge if typing is accessible to you. This isn’t an anti-tech stance (although transcription software still discriminates against many accents and languages); it’s methodological, and just regular logical. Any output from your software will still have to be checked over line-by-line manually for the inevitable errors. It (probably) won’t capture things like sighing, laughing, etc., but these are important. And methodologically, the whole point is getting familiar with your data. Listening carefully and reproducing what you hear is just part of that – treat transcribing as the first analytical pass over your data. Finally – if your super-accurate AI software is doing the work, that’s because it’s passing it to a cloud-based service where you are most likely giving away rights over it. That’s ethically dodgy (is a participant’s interview data ‘yours’ to give away?). You can install a local-only model to ensure nothing cloud-based is involved, and eat up your own processing power and energy instead. By the time you’ve done all that, you could have just transcribed it manually.

    Answer 2: we’re still doing machines, so what’s best for avoiding data leakage and the big AIs? There’s only one solution I’ve played with here, which is Whishper – a local-only, open-source audio-to-text tool built on openly available speech-recognition models (so your data stays where it should – in your hands). There’s a bit of messing around to install it locally, but this isn’t especially complicated. The issue is that both its speed and accuracy are resource-dependent – if you have lots of processing power (a graphics processing unit might help), plus the space to store the largest model locally, you’ll get better results. That’s inherent to how this kind of AI works; you’re just doing the work on your own hardware. And remember: there’s still an error rate! In English, that averages about 4% with the largest model. (There’s a minimal code sketch of the local-only approach at the end of this list.)

  • Paid-for platforms for academic literature are at the root of the grotesque academic publishing industry. I’ve worked with a fabulous group recently to establish a new journal founded on principles of egalitarianism, open access, peer review, and accessibility – but this is only one field. Unfortunately there are no immediate solutions to this systemic problem, and even open access tools can only index what is there. That said, there are some great indexing tools. Elsevier runs the paid-for industry standard ‘Scopus’, and many academics maintain an Elsevier boycott. Semantic Scholar is non-profit and free to use (it also has a free public API – see the sketch after this list). It isn’t open source, but the free versions of Lens offer full functionality for patent and literature searching.

    Try Citation Gecko for a way to visualise relevant literature as a web of interconnecting knowledge (what, at its best, academic literature ought to be). Connected Papers and Litmaps offer similar things, but with a pricing model (and a free basic option). This is a great way of finding where the relevant arguments are, even if in reality it’s just a neat way of visualising citation links. Students: this is a nifty visual reminder that academic papers do not stand alone – your reviewing should ask ‘what is missing? Where can I contribute? Who dis/agrees with whom?’, not ‘which single paper agrees with me?’.
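As promised above, here’s a minimal sketch of the local-only transcription idea. It isn’t Whishper itself (that’s a self-hosted app you install and use through the browser); it uses the open-source openai-whisper Python package, which runs the same family of speech-to-text models entirely on your own machine. The file name and model size below are placeholders.

```python
# Minimal sketch: local-only transcription with the open-source
# 'openai-whisper' package (pip install openai-whisper; it also needs ffmpeg).
# This is not Whishper itself, but the same family of models run locally,
# so no audio leaves your machine.
import whisper

# Larger models ("medium", "large") are more accurate but need more memory,
# storage, and time - exactly the resource trade-off described above.
model = whisper.load_model("small")

# "interview.wav" is a placeholder path to your own audio file.
result = model.transcribe("interview.wav")

# The raw text still needs a careful manual pass for errors, pauses,
# laughter, and so on - the software only gets you a first draft.
print(result["text"])
```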
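And, since Semantic Scholar came up above: its public Graph API is free for light use, which is handy for scripted literature searches. The sketch below runs a simple keyword query – the search string and field list are just examples, and you should check their documentation for current rate limits.

```python
# Minimal sketch: a free-text search against Semantic Scholar's public
# Graph API. No account is needed for light use, but mind the rate limits.
# The query and fields below are only examples.
import requests

response = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={
        "query": "greenwashing discourse analysis",  # example query
        "fields": "title,year,externalIds",
        "limit": 5,
    },
    timeout=30,
)
response.raise_for_status()

# Print a quick, citable shortlist: year and title for each result.
for paper in response.json().get("data", []):
    print(paper.get("year"), "-", paper.get("title"))
```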