Data security and logistics for remote transcription
November 30, 2021 11:00 AM
I need to have 11 audio files transcribed, and for confidentiality reasons, I cannot upload them to an automated transcription program. My volunteer undergraduate research assistants are still working from home due to COVID, using their personal laptops. My utmost concern here is data security, and I need to figure out hardware/software needs. Can I set them up to do the transcription securely, efficiently, and also cheaply?
The last time I supervised transcription was about 10 years ago, and it was in an in-person research lab. We had foot pedals and bought software to play the files (I think?) and the audio files were stored on the dedicated study computer.
We had initially planned to have these files transcribed in person, but that was pre-COVID. The audio files are not sensitive, but they have participant names in them, so they’re stored on a secure server that research assistants have access to on an as-needed basis following rigorous confidentiality and data security training. The undergrads will be required to wear headphones and only transcribe in a private setting.
I need to figure out a few things to see if this is feasible:
*I need to make sure that the mp3s don’t get stored anywhere outside of the secure server when they’re played (e.g., downloaded to their personal computers, or stored in their iTunes library).
*I need to figure out whether I need to get them foot pedals for transcription, and if so, what foot pedals to buy (considering that most have Macs and I think some have PCs).
*I need to figure out whether I need to buy any special software for them.
It is NOT feasible to buy them all dedicated laptops for this purpose.
I am considering hiring a professional transcriptionist instead if that is cheaper and more secure, but that would raise other issues (e.g., it would involve sending the files remotely which makes me uncomfortable, there are some confusing aspects of the files that will make it hard to understand what to transcribe and what to redact for someone who’s not involved in this work).
I would recommend sending a quick email to the folks at your REB/IRB, as you may need to amend your research ethics approval given the change from in-person! They can also likely provide advice on how to best maintain data security and confidentiality.
I am part of the REB at my uni, and we generally recommend OneDrive for storage of data (which you may or may not use at your uni, but you might have an equivalent), which allows the PI to set up a secure folder that can be accessed remotely by those who receive permissions (i.e., students). Can your students just access the secure server from home? This should mean they would not have to store the files. At our uni we also request that audio recordings & transcripts are identified by code, with a master list linking name to code that is stored separately. This would help protect confidentiality.
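If it helps, the code-plus-master-list idea is easy to script. Here is a minimal Python sketch, assuming you start from a plain list of participant names. The function names, the `P` prefix, and the `_interview` file name pattern are all made up for illustration, so adjust them to whatever your IRB/REB protocol specifies.

```python
import csv
import secrets


def assign_codes(names, prefix="P"):
    """Map each participant name to a random, non-sequential code."""
    codes = {}
    used = set()
    for name in names:
        code = f"{prefix}{secrets.token_hex(3).upper()}"
        while code in used:  # very unlikely, but never reuse a code
            code = f"{prefix}{secrets.token_hex(3).upper()}"
        used.add(code)
        codes[name] = code
    return codes


def write_master_list(codes, path):
    """Write the name-to-code master list (store it apart from the audio)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["participant_name", "code"])
        for name, code in sorted(codes.items()):
            writer.writerow([name, code])


def coded_filename(code, suffix=".mp3"):
    """Build a file name that identifies a recording by code only."""
    return f"{code}_interview{suffix}"
```

You would call `assign_codes` once, rename the recordings with `coded_filename`, and save the output of `write_master_list` somewhere the transcribers cannot reach. Random codes are used here rather than 001, 002, ... so that the code itself says nothing about recruitment order.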
I have no specific recommendations for software, I just use the default media player on my computer these days which is... not very efficient, but not terrible. Maybe others have recommendations. I honestly would not bother with foot pedals though unless you had 100s of interviews to transcribe!
posted by DTMFA at 11:56 AM on November 30, 2021 [1 favorite]
Response by poster: Yes, I will most definitely be amending the IRB with whatever we decide.
The students can access the secure server from home. However, if they try to play a file stored on the secure server, I'm concerned that the file might then get stored in the local database of whatever software is used to play it. For example, if I try playing a test audio file stored on the server, it opens in iTunes and gets added to my iTunes library.
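One rough way to check for this kind of accidental copy after the fact is to hash the study recordings and then look for audio files with the same hash on the laptop. The Python sketch below assumes you already have SHA-256 hashes of the originals; `find_local_copies`, `sha256_of`, and the list of audio suffixes are illustrative names, not part of any existing tool. It only catches byte-for-byte copies, so a re-encoded file or a recording made off the speakers would not show up.

```python
import hashlib
from pathlib import Path

# Assumed set of audio extensions to inspect; extend as needed.
AUDIO_SUFFIXES = {".mp3", ".m4a", ".wav"}


def sha256_of(path):
    """Hash a file in chunks so long recordings are not read in at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_local_copies(search_root, known_hashes):
    """Return audio files under search_root that match a study recording."""
    matches = []
    for path in Path(search_root).rglob("*"):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            if sha256_of(path) in known_hashes:
                matches.append(path)
    return matches
```

A research assistant could run `find_local_copies` over their music and downloads folders at the end of a session and delete anything it reports.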
I will consult with our IT person as well. If we go with the virtual computer (which I have set up for some staff, but for $reasons might not work here), I'd still love software/pedal tips.
posted by quiet coyote at 12:48 PM on November 30, 2021
> The students can access the secure server from home. However, if they try to play a file stored in the secure server, I'm concerned that the file might then get stored in the local database whatever software is used to play it.
It depends on what the secure server is exactly, and how it is set up. Here's a couple of ways:
1. the secure server is just a file server. A research assistant establishes a secure connection to the server and logs in with their credentials to gain access to the audio files. The research assistant downloads a copy of the audio file to their personal laptop. The research assistant plays the audio file and transcribes it using software on their laptop, independently of the secure server. When the research assistant has finished doing the transcription work, perhaps they upload a transcribed document to the secure server.
2. the secure server provides users some kind of remote access to a desktop computing environment running somewhere on a server managed by IT, with software to play and transcribe the audio, as well as access to the audio files. A research assistant installs some remote access client software on their personal laptop, and uses that to establish a secure connection to the server and log in with their credentials to gain access to the remote environment. The audio files are never copied back to the assistant's personal laptop, although clearly the audio signal has to be sent back through the remote access client software so the student can listen to it, in order to transcribe.
Way 1 would describe patterns of sharing files using an SFTP server or Dropbox or OneDrive. The playing of the audio and the transcription of the audio all happen on the research assistant's personal laptop.
Way 2 might describe things like using Citrix VDI to remotely log in to a Windows environment running on a server somewhere.
A risk with relying on people's personal laptops to do sensitive / confidential work is that you have no control and no guarantee about what software is installed on the laptop or who uses the machine. E.g. maybe the laptops are already full of malware that is logging every keystroke.
For organisations that have serious data privacy and confidentiality concerns for regulatory or commercial reasons, work would only ever happen on laptops or computers owned and managed and locked down by the IT department of the organisation, never through anyone's personal equipment, so they could make some guarantees about what software was or was not installed, and if it had received up to date security patches, etc.
posted by are-coral-made at 1:09 PM on November 30, 2021
Maybe I'm simplifying this too much, but can you set up a Windows machine on site that has access to the secured files? You can then give each undergrad their own login to the machine, and each of them could remote desktop into that machine from their own laptops or whatnot (Windows and Mac both work) and listen to the audio that way, via remote desktop. The audio comes from the remote desktop app, not their personal machine's audio players. This doesn't solve any sort of external hardware needs like foot pedals though.
We effectively do this at work to give access to files we don't want employees storing on their own machines, and these are things we're pretty darn strict about. But it may require some IT help to get it set up.
posted by cgg at 1:45 PM on November 30, 2021 [1 favorite]
@cgg has the right idea: remote desktop / VNC into a machine, so the local machine never has direct access to the file. If they want to "steal" it, they have to basically record locally instead of stealing the file directly.
Depending on what's available, it may even be possible to connect remotely to a VirtualBox or similar VM.
Keep in mind that foot pedals are often paired with custom software for playback control. Whether that software is compatible with the access solution you decide on is yet ANOTHER technical hurdle to consider.
posted by kschang at 2:13 PM on November 30, 2021
Also, check in with your IT department: mention you have audio files to transcribe and ask what software they would use for an initial speech-to-text conversion, so your students are editing for punctuation and accuracy. Microsoft Dictate has pretty amazing AI capabilities, if your IT doesn’t come up with something specific.
We use VPN and Box.
posted by childofTethys at 3:39 PM on November 30, 2021
> I need to make sure that the mp3s don’t get stored anywhere outside of the secure server when they’re played (e.g., downloaded to their personal computers, or stored in their iTunes library)
There is, in general, no way to prevent digital audio that some remote user can actually listen to being captured and stored, given a sufficiently savvy and motivated remote user.
The best you can do is prevent automated and/or accidental capture and storage by not giving your remote users direct access to the MP3s concerned. Playing them via something like RDP sound sharing from a VM on your end is probably as close to technically secure as you can get. It's not very secure, and the round-trip latency between a local footswitch and a remote audio stream is going to be quite unpleasant for your transcribers. If you gave me that work, the first thing I'd do is bypass all your security controls so I could just grab the audio and work with my own preferred choice of tools for exactly that reason. I wouldn't tell you I'd done this.
If you want to keep the whole process secure, I think your best chance of demonstrably achieving that is to pay somebody whose ability to generate ongoing business rests on a reputation for handling this kind of work securely.
> Can I set them up to do the transcription securely, efficiently, and also cheaply?
I would be very very surprised to learn that your genuine options here don't boil down to "pick any two".
posted by flabdablet at 5:52 PM on November 30, 2021
Re foot pedals: when I was doing a bunch of transcription years ago I found that Express Scribe’s keyboard shortcuts did everything I needed in terms of making it easy to back up and replay and fast forward as required. You could also set shortcuts to vary playback speeds for easier or tougher portions, and on Windows, at least, you could set the shortcuts to be systemwide (or at least to work in MS Word) so that you could do your transcription in other editors. It was cheap, too.
posted by col_pogo at 8:47 AM on December 1, 2021
I am a researcher with similar concerns. A workaround that has been fine for me so far has been to play recordings or live transcribe from a laptop or other phone into a Pixel with the Recorder app from Google. The transcriptions are much better for in person conversations, and I am finding the troubleshooting somewhat annoying for long recordings (think over 90 minutes at a time). That said, it is a superior choice, especially given that transcription is local to the device and not over the cloud.
You can Google which devices have this app (last I checked a year or so ago, I think I had to have a certain kind of Pixel), and see if this might factor into your budget. Literally getting a refurbished device for this purpose, and then finding a way to fix any inaccuracies, could be the way to go.
posted by pearl228 at 6:02 PM on December 6, 2021
I may have the terminology wrong (since I was an end user), but I think I've seen a virtual computer used for a similar purpose. The virtual machine was locked down in certain (data protective) ways (e.g. I couldn't access my own computer's drives while logged in), and there were logs to audit if there were any issues. Specialized software can be installed on virtual machines as well.
posted by oceano at 11:51 AM on November 30, 2021