You can also join our Discord server!
If you found Harmony helpful, you can leave us a review!
🧠 What Does Harmony Do?
Psychologists and social scientists often have to match items in different questionnaires, such as:
“I often feel anxious”
“Feeling nervous, anxious or afraid”
This process is called harmonisation.
🔹 The Problem
- Harmonisation is a time-consuming and subjective process.
- Researchers have to manually go through long PDFs of questionnaires.
- Extracting questions and putting them into Excel is tedious.
🔹 The Solution: Harmony
Harmony uses natural language processing (NLP) and generative AI models to:
- Automatically match similar questionnaire items.
- Help researchers work across multiple languages.
- Save time and effort in data harmonisation.
Looking for Examples?
Check out our examples repository for hands-on demonstrations.
The Harmony Project
Harmony is an AI-powered tool that helps researchers compare items from questionnaires and identify similar content.
🔹 Try Harmony: Harmony Web App
🔹 Read our blog: Harmony Blog
Who to Contact?
🔹 Harmony Team: harmonydata.ac.uk
🔹 Thomas Wood: fastdatascience.com
Getting started with the Harmony R library
- Check out this video walkthrough of installing and running R on Windows 10.
- You can run the walkthrough Python notebook in Google Colab with a single click.
- You can also download an R Markdown notebook to run in RStudio.
- You can run the walkthrough R notebook in Google Colab with a single click.
Installing R library
You can install the development version of harmonydata from GitHub with:
# install.packages("devtools")  # If you don't have devtools installed already.
library(devtools)
#> Loading required package: usethis
devtools::install_github("harmonydata/harmony_r")
#> Using GitHub PAT from the git credential store.
#> Skipping install of 'harmonydata' from a github remote, the SHA1 (5104e35b) has not changed since last install.
#> Use `force = TRUE` to force installation
Or you can install it from CRAN:
install.packages("harmonydata")
Setting up domain
Before starting, you can set the API endpoint that harmonydata uses with set_url(). By default it uses the remote Harmony API at https://api.harmonydata.ac.uk
For example, if you want to use Harmony locally, you can run the Harmony API as a Docker container, which by default listens on localhost at port 8000. To start it, run:
docker run -p 8000:8000 harmonydata/harmonylocal
Then, in R, point the library at your local Harmony instance running in Docker:
harmonydata::set_url("http://localhost:8000")
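If you switch between the hosted API and a local container, one option is to pick the endpoint from an environment variable. The sketch below is only an illustration: the variable name HARMONY_URL and the helper resolve_harmony_url() are assumptions made for this example and are not part of harmonydata. Only set_url() and the default URL come from the text above.

```r
# Hypothetical helper: choose the Harmony endpoint from an environment variable.
# HARMONY_URL is an assumed name; harmonydata does not read it by itself.
resolve_harmony_url <- function(default = "https://api.harmonydata.ac.uk") {
  url <- Sys.getenv("HARMONY_URL", unset = "")
  # Fall back to the hosted API when the variable is unset or empty.
  if (nzchar(url)) url else default
}

harmony_url <- resolve_harmony_url()
print(harmony_url)

# With the package installed, the chosen endpoint can then be applied:
# harmonydata::set_url(harmony_url)
```

Setting HARMONY_URL to http://localhost:8000 before starting R would then select the local container without editing the script.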
Parsing a raw file into an Instrument
To read in a raw (unstructured) PDF or Excel file, use load_instruments_from_file(). It sends the file to the REST API in a POST request, which converts the file into an Instrument object in JSON. The function returns the parsed instruments as a list.
library(harmonydata)
instrument = load_instruments_from_file(path = "examples/GAD-7.pdf")
names(instrument[[1]])
#> [1] "file_id"         "instrument_id"   "instrument_name" "file_name"
#> [5] "language"        "questions"
You can also pass a URL that points to the questionnaire.
instrument_2 = load_instruments_from_file("https://medfam.umontreal.ca/wp-content/uploads/sites/16/GAD-7-fran%C3%A7ais.pdf")
names(instrument_2[[1]])
#> [1] "file_id"         "instrument_id"   "instrument_name" "file_name"
#> [5] "language"        "questions"
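Once an instrument is loaded, you will usually want to look at its questions. The sketch below uses a mocked list in place of a real API response so that it runs without a network connection. The top-level names mirror the output above, but the per-question fields question_no and question_text are assumptions for illustration, so check names(instrument[[1]]$questions[[1]]) on a real result before relying on them.

```r
# Mock of the list shape returned by load_instruments_from_file().
# The fields inside each question are assumed, not confirmed by the text above.
instrument <- list(list(
  instrument_name = "Example questionnaire",
  language = "en",
  questions = list(
    list(question_no = "1", question_text = "I often feel anxious"),
    list(question_no = "2", question_text = "Feeling nervous, anxious or afraid")
  )
))

# Pull the text of every question in the first instrument into a character vector.
question_texts <- vapply(
  instrument[[1]]$questions,
  function(q) q$question_text,
  character(1)
)
print(question_texts)
```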
Matching instruments
You can get a list containing the results of the match. It includes a similarity score for each question compared with every other question, including those in the other questionnaire.
instruments = append(instrument, instrument_2)
match = match_instruments(instruments)
names(match)
#> [1] "instruments"
#> [2] "questions"
#> [3] "matches"
#> [4] "query_similarity"
#> [5] "closest_catalogue_instrument_matches"
#> [6] "instrument_to_instrument_similarities"
#> [7] "clusters"
Here is what the matches look like. match$matches is a nested list: element [[i]][[j]] holds the similarity score between question i and question j, so the diagonal is 1 and the values are symmetric. The 14 x 14 printed output is laid out here with one row per question.
match$matches
            [[1]]      [[2]]      [[3]]      [[4]]      [[5]]      [[6]]      [[7]]      [[8]]      [[9]]     [[10]]     [[11]]     [[12]]     [[13]]     [[14]]
[[1]]           1  0.5830621  0.6179736  0.4357673  0.4945895  0.5529693  0.7089151  0.2380928  0.2814474   0.894249  0.6634801  0.5109949  0.5931828  0.4505574
[[2]]   0.5830621          1  0.7629658  0.4594004  0.4558097 -0.4613766  0.5173815 -0.2566257 -0.2383574    0.60493  0.8852125  0.5615149 -0.4793222 -0.4719152
[[3]]   0.6179736  0.7629658          1  0.3895614  0.3963558  0.4716267  0.6041647  0.2892596  0.2572643  0.6280157  0.7662809  0.5106027  0.4637058  0.5593851
[[4]]   0.4357673  0.4594004  0.3895614          1  0.6178267  0.3250091  0.3117914  0.1839352  0.2985738  0.4527453  0.4667662  0.5440194  0.3540848  0.2137841
[[5]]   0.4945895  0.4558097  0.3963558  0.6178267          1  0.3895386  0.4360376  0.2008711  0.2626984  0.4596621  0.4473513  0.6250208  0.4114662  0.2880645
[[6]]   0.5529693 -0.4613766  0.4716267  0.3250091  0.3895386          1  0.4438164  0.3468708  0.3111583  0.5644366  0.5049124  0.5719854  0.9502258  0.3653329
[[7]]   0.7089151  0.5173815  0.6041647  0.3117914  0.4360376  0.4438164          1 -0.1535627  -0.153154   0.612879   0.541166  0.5295712  0.5013311  0.8445888
[[8]]   0.2380928 -0.2566257  0.2892596  0.1839352  0.2008711  0.3468708 -0.1535627          1  0.5548581  0.2341754  0.3289153  0.3237803  0.3217046  0.1625244
[[9]]   0.2814474 -0.2383574  0.2572643  0.2985738  0.2626984  0.3111583  -0.153154  0.5548581          1  0.3128226 -0.3486197  0.2828471  0.3370971  -0.217787
[[10]]   0.894249    0.60493  0.6280157  0.4527453  0.4596621  0.5644366   0.612879  0.2341754  0.3128226          1   0.712629  0.5177428  0.6094118  0.4456488
[[11]]  0.6634801  0.8852125  0.7662809  0.4667662  0.4473513  0.5049124   0.541166  0.3289153 -0.3486197   0.712629          1  0.6538957  0.5488661   0.539001
[[12]]  0.5109949  0.5615149  0.5106027  0.5440194  0.6250208  0.5719854  0.5295712  0.3237803  0.2828471  0.5177428  0.6538957          1  0.6412413  0.4908774
[[13]]  0.5931828 -0.4793222  0.4637058  0.3540848  0.4114662  0.9502258  0.5013311  0.3217046  0.3370971  0.6094118  0.5488661  0.6412413          1  0.4567534
[[14]]  0.4505574 -0.4719152  0.5593851  0.2137841  0.2880645  0.3653329  0.8445888  0.1625244  -0.217787  0.4456488   0.539001  0.4908774  0.4567534          1
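A nested list like this is awkward to work with directly, so it can help to turn it into a numeric matrix first. The following is a minimal base R sketch, not part of harmonydata: it uses a toy 3 x 3 list with the same shape as match$matches (values copied from the first three questions in the output above) and then finds, for each question, the index of its closest other question.

```r
# Toy nested list in the same shape as match$matches
# (values for the first three questions of the output above).
matches <- list(
  list(1, 0.5830621, 0.6179736),
  list(0.5830621, 1, 0.7629658),
  list(0.6179736, 0.7629658, 1)
)

# Flatten each inner list into a numeric vector, then stack the vectors as rows.
sim <- do.call(rbind, lapply(matches, unlist))

# Blank out the diagonal so a question is never matched with itself.
diag(sim) <- NA

# For each row, the column index of the highest remaining score.
best <- apply(sim, 1, which.max)
print(best)
```

On the real result you would replace the toy list with match$matches. Row and column indices then follow the order of the questions in the combined instruments list.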
Running harmonydata locally from a docker image
To run the Harmony API locally, first pull the Docker image from a terminal.
1. Pull docker image
docker pull harmonydata/harmonyapi
2. Run docker image
docker run -p 8000:80 harmonydata/harmonyapi
3. Configure harmonydata to run locally
Set the URL to use localhost. Don't forget to expose port 8000:
set_url(harmony_url = "http://localhost:8000")
How do I cite Harmony?
You can cite our validation paper:
McElroy, Wood, Bond, Mulvenna, Shevlin, Ploubidis, Scopel Hoffmann, Moltrecht, Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data. BMC Psychiatry 24, 530 (2024), https://doi.org/10.1186/s12888-024-05954-2
A BibTeX entry for LaTeX users is
@article{mcelroy2024using,
  title={Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data},
  author={McElroy, Eoin and Wood, Thomas and Bond, Raymond and Mulvenna, Maurice and Shevlin, Mark and Ploubidis, George B and Hoffmann, Mauricio Scopel and Moltrecht, Bettina},
  journal={BMC Psychiatry},
  volume={24},
  number={1},
  pages={530},
  year={2024},
  publisher={Springer}
}
