Walk a non-programmer through Watson, GitHub and BlueMix
June 1, 2016 12:38 PM

Would a kind programmer help me activate and install the software necessary so I can use IBM Watson's Speech to Text application?

Based on a previous Ask, I decided to try Watson Speech to Text as I was looking for some software that would transcribe long (20 to 60 minute) WAV files as I am a slow, horrible typist. In the past I have tried other software (such as Dragon) but my main difficulty is that all of the audio files involve multiple speakers.

Anyway, I tried the free Watson demo and while the result wasn't perfect, it was acceptable enough that I could go back into the file and clean up what was missed with relative ease. Yeah!

Now I'm at the stage where I'm trying to move past the free online demo and actually install and use the solution, and I'm already stumped. I was hesitant about attempting this because I'm not a programmer. I haven't done any "programming" since learning Waterloo Structured BASIC eons ago. That said, I've worked on the periphery of the tech field, I interact with programmers and developers, I understand the basic concepts and I'm good about following instructions. I pored over the instructions multiple times and finally decided I'd give it a shot.

Sadly, I'm pretty much stuck on step one.

I'm trying to follow the procedure here.

Create a Bluemix Account


Sign up in Bluemix, or use an existing account. Watson Services in Beta are free to use.

Done, I think.
In my GitHub account I found Watson Speech to Text and added the service to my dashboard. (Forgive me if I'm using the wrong terminology and feel free to tell me what the correct vocabulary is.)

Download and install the Cloud-foundry CLI tool


Edit the manifest.yml file and change the application name to something unique.

And here's the problem. I don't see the manifest.yml file. In my Cloud Foundry file (C:\Program Files\Cloud Foundry) I have:


I've clicked on the .exe files and when I do, something happens in the background (I see the command prompt screen with code scrolling by) but that's it.

I know enough to open the admin command prompt (as opposed to the limited user locked-down one) but I'm pretty much stuck after that.

- services:
  - speech-to-text-service-standard
  command: node app.js
  path: .
  memory: 512M

The name you use will determine your application url initially, e.g. .mybluemix.net.

Install Node.js

Connect to Bluemix in the command line tool.

$ cf api https://api.ng.bluemix.net
$ cf login -u

Create the Speech to Text service in Bluemix.

$ cf create-service speech_to_text standard speech-to-text-service-standard

Push it live!

$ cf push

I'm faring no better with the GitHub instructions.

I realize this is way over my head, and I should probably leave programming to the programmers, but if somebody is willing to lead me through this in a step-by-step fashion, I'd really appreciate it. It would be nice to learn something new (or more realistically be exposed to something new, as I'm not sure I'd really be learning anything except following instructions.) I'm on Windows 8.1 Professional.
posted by sardonyx to Computers & Internet (13 answers total) 6 users marked this as a favorite
The instructions on GitHub assume you've cloned (via Git) or downloaded the speech-to-text-nodejs repository. This is where manifest.yml is located. Fortunately, you don't need to figure out Git -- you just click the big green "Clone or download" button on GitHub and pick "Download ZIP". Extract that wherever.

It looks like you also need NodeJS installed. Then you should be able to go to the directory where you extracted everything -- in Command Prompt! -- and run "npm install". This installs the dependencies you need for the app. (This is all in the instructions, actually.)

From there you're probably okay, assuming cf.exe and node.exe are in your Windows path. If they're not, you probably need to restart Command Prompt. This is also assuming that a Unix environment isn't required, but I'm not seeing anything to suggest that it is.
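
One way to check the "in your Windows path" part without guessing is a small script. This is only a sketch, and it assumes Python is installed, which the instructions themselves do not require; the tool names checked are the ones mentioned in this thread.

```python
import shutil

def find_tools(names=("cf", "node", "npm")):
    """Return a dict mapping each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    for name, location in find_tools().items():
        status = location if location else "NOT FOUND on PATH (try opening a new Command Prompt)"
        print(f"{name}: {status}")
```

If any line reports NOT FOUND, closing Command Prompt and opening a new one is the first thing to try.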
posted by neckro23 at 12:53 PM on June 1, 2016

I've clicked on the .exe files and when I do, something happens in the background

Try running the command from a command line window; that will let you see the message being returned. You will most likely need to change to the proper directory with cd C:\directory\where\command\lives\

I don't see the manifest.yml file. In my Cloud Foundry file (C:\Program Files\Cloud Foundry) I have

I believe you're confused: the manifest.yml file should be in the root of the Watson code (you can see it here in their github repository), so you'd need to have that cloned (i.e., copied) to a folder on your machine. If you don't already have that code locally, try the "Clone or Download" button here.

ON EDIT: neckro23 makes a really good point: if things aren't working for you because Windows says it doesn't know the command you're trying to run, close your prompt and open a new one (or just open a new one if you need to keep track of what happened in your old one) to see if refreshing your profile gets you the commands you need.
posted by yerfatma at 12:55 PM on June 1, 2016

Response by poster: Thanks both of you! I'm afraid I'm still going to need loads of hand holding, but I think I'm making some progress.

Now I have CloudFoundry and nodejs downloaded into my Program Files folder. I also have cli-master in my Temp folder. (This contains about 20 subfolders and a bunch of different files including .ymls and .jsons.)

Additionally I have the Watson speech-to-text-nodejs-master in my Downloads folder. I don't see an .exe or a way to extract it, but I do see the infamous (AHA!) manifest.yml file.

I presume I need to (or should) move the speech-to-text and cli-master files into the more permanent file structure (like into the Program folder), or is that a mistake? And if I need to do that, do I attempt to edit the manifest.yml file before I move it or after? (And if moving is required is there a preferred method I should use?)

I want to at least get the basic foundation correct before I start making edits in the command prompt, as I know how badly things can go wrong there, even with the two of you so graciously helping.
posted by sardonyx at 1:32 PM on June 1, 2016

I don't think the app cares where you put the folder. You shouldn't have to be fooling around with Program Files at all (beyond installing node and cloud-foundry). I don't know what you mean by "cli-master" though.

Doing the node/npm stuff in Command Prompt won't mess up your system, everything will stay in that folder. I'm not as sure about the cloud-foundry stuff, but a quick look suggests that it only changes things in the app folder. If you screw something up, just re-extract the files.
posted by neckro23 at 2:10 PM on June 1, 2016

I think I should clarify here: The app (the speech-to-text-nodejs thing that you download as a zip) isn't the kind that installs. It just runs directly from the folder using Node. The "npm install" step doesn't install things globally either, it just puts the libraries the app needs in the app's "node_modules" folder.

You'll have to run the app via the command line every time, probably. You *might* be able to run the server by double-clicking server.js once you get everything set up, but I'm not 100% sure this would work.
posted by neckro23 at 2:17 PM on June 1, 2016

Response by poster: Sorry. I don't understand how to edit the manifest.yml file.

As I mentioned that file is in the temp downloads folder (so C:\Users\sardonxy\Downloads\speech-to-text-nodejs-master.zip) and the .yml file is there.

Because it's a zip file I assume I would need to open/unpack it and set it up somewhere more permanent (like the Program Files folder) but I don't see a way to do that. Also, I don't really see a need to do it as if I drill down into the folder, it contains five subfolders and a dozen-and-a-half files, including the .yml file.

So when I go to edit it, I have to change the directory to that listed above, but of course I get the "directory name is invalid" message, even after closing and re-opening the command prompt.

Assuming that I could get the directory to be valid, how do I open and edit the .yml file?

One of the things that has gone right, as far as I can tell is nodejs seems to have installed correctly. It's sitting in my program files and even has a node.exe file which, when I click on it, opens up the command prompt.

(I have no problem running this from the command prompt, if that's what is required.)

Again, sorry for needing such remedial instructions, but I do truly appreciate all the assistance.
posted by sardonyx at 6:50 PM on June 1, 2016

You do need to unzip the .zip file for this to work. You should be able to unzip the file by right-clicking the .zip file and selecting Extract All. This will leave you with a folder called speech-to-text-nodejs-master in your Downloads folder. It's fine to leave it there; you don't need to move it to Program Files or anywhere else unless you want to.

The (unzipped) speech-to-text-nodejs-master folder will contain all the contents of the .zip file, including manifest.yml, which you will now be able to edit. You probably want to do this with a plain text editor like Notepad, not in a full-fledged word processor like Word.

Once you've made the changes to manifest.yml, use Command Prompt to change into the application folder:
cd C:\Users\sardonxy\Downloads\speech-to-text-nodejs-master
Since that folder now exists, you shouldn't get the "directory invalid" message, and you can proceed with the next step in the instructions.
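
For anyone who would rather script the unzip step than right-click, here is a minimal sketch using Python's standard zipfile module. It assumes Python is installed and that the zip sits in the Downloads folder under the file name mentioned above; both are assumptions, so adjust the paths to match your machine.

```python
import zipfile
from pathlib import Path

def extract_app(zip_path, dest_dir):
    """Extract the zip into dest_dir and return the paths of any manifest.yml files found there."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(dest.rglob("manifest.yml"))

if __name__ == "__main__":
    # Assumed location; change this if the zip was saved somewhere else.
    downloads = Path.home() / "Downloads"
    zip_file = downloads / "speech-to-text-nodejs-master.zip"
    if zip_file.exists():
        for manifest in extract_app(zip_file, downloads):
            print("Found", manifest)
    else:
        print("Zip not found at", zip_file)
```

The folder that contains the printed manifest.yml is the one to cd into.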

Hope that helps!
posted by Gerald Bostock at 10:04 PM on June 1, 2016

Oh hey, this is relevant to my interests! I run the Metafilter Podcast transcript server, and I work with/at Cloud Foundry. If you're technical, I'd say "it's a PaaS like Heroku, but more flexible." Bluemix is IBM's branded version of Cloud Foundry. The cf app pushes the app to the appropriate servers, but you can hit the "deploy to Bluemix" button to skip all the cli stuff entirely. I think that you did that, and you might be able to view your app?

I got it running fairly quickly, though I'm not going to post the link here, because of billing things. I'll send a link in a Mefi Mail. It only accepts WAV, FLAC or OPUS file formats.
posted by Pronoiac at 12:18 AM on June 2, 2016 [1 favorite]

Response by poster: When I open manifest.yml in Notepad (stupidly I should have tried this on my own without the prompt, but thanks for reminding me of that Gerald Bostock) here's what I see:
label: speech_to_text
plan: standard
- name: speech-to-text-demo
path: .
command: npm start
memory: 512M
- speech-to-text-service-standard
NODE_ENV: production

So as I understand it, I swap out "speech-to-text-demo" with something like "audio-transcriber" and hit save and continue from there. Okay that will be my next step.

Pronoiac, the sad part is that I'm techie enough to understand exactly what you mean by "If you're technical, I'd say 'it's a PaaS like Heroku, but more flexible,'" having written countless sentences exactly like that over the years, but I'm not techie enough to actually move beyond the conceptual and into the doing. That's what makes this kind of thing particularly frustrating.
posted by sardonyx at 8:31 AM on June 2, 2016
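
The manifest edit described above (swapping "speech-to-text-demo" for a new name) is easiest in Notepad, but it can also be sketched in code. The function below is an illustration only: it changes the value on the first name: line and leaves everything else alone. The sample text reuses lines quoted in this thread, and its indentation is an assumption.

```python
import re

def rename_app(manifest_text, new_name):
    """Replace the value on the first 'name:' line, leaving every other line untouched."""
    pattern = re.compile(r"^([ \t]*-?[ \t]*name:[ \t]*).*$", re.MULTILINE)
    return pattern.sub(lambda match: match.group(1) + new_name, manifest_text, count=1)

# Sample built from lines quoted in the thread; the indentation is assumed.
sample = (
    "- name: speech-to-text-demo\n"
    "  path: .\n"
    "  command: npm start\n"
    "  memory: 512M\n"
)
print(rename_app(sample, "audio-transcriber"))
```

The new name also ends up in the app's URL, so it should be something nobody else on Bluemix is likely to have picked.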

Could you ask one of the programmers and developers that you interact with? Offer some beers or dinner, perhaps?
posted by dozo at 12:02 PM on June 5, 2016

Huh, I'm not used to writing longer comments.


We have a haiku about Cloud Foundry:
Here is my source code
Run it on the cloud for me
I do not care how

You can scale up and down without manually stopping or restarting anything, and you can tell the platform "just make sure we've got one running," and the platform will monitor the app and restart instances as needed.


If you start up the speech to text nodejs app, you get your own copy of the demo site. You can do this at the command line, or by hitting the "deploy to Bluemix" button; the second is *much* simpler.


I took the latest podcast and let Watson process it, on my own copy of the app; here's the output, which is a big wall of text. It's a lot more coherent than the Google web speech API attempt.

There is a limit to how much audio you can send at once; on my copy of the app I got:
"Session closed. Reason: Payload exceeds the 104857600 bytes limit."
This isn't in the app, but in the Watson API. They mention it as Audio transmission: "Lets the client pass as much as 100 MB of audio to the service as a continuous stream of data chunks or as a one-shot delivery." That's about 10 minutes of WAV data. I don't think this is a paid feature.
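
The "about 10 minutes" figure can be checked with a little arithmetic. The sketch below assumes uncompressed CD-quality WAV (44.1 kHz, 16-bit, stereo), which is a guess about the recording rather than something stated in the error, and it ignores the small WAV header. A second, lower-quality format is included for comparison.

```python
LIMIT_BYTES = 104857600  # the limit quoted in the error message

def minutes_of_wav(limit_bytes, sample_rate_hz, bits_per_sample, channels):
    """Minutes of uncompressed PCM audio that fit in limit_bytes (header size ignored)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return limit_bytes / bytes_per_second / 60

cd_quality = minutes_of_wav(LIMIT_BYTES, 44100, 16, 2)
mono_16k = minutes_of_wav(LIMIT_BYTES, 16000, 16, 1)
print(f"44.1 kHz, 16-bit, stereo: {cd_quality:.1f} minutes")
print(f"16 kHz, 16-bit, mono:     {mono_16k:.1f} minutes")
```

So how many minutes fit under the limit depends heavily on how the WAV was recorded.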

Splitting up audio into 5-10 minute chunks could be done automatically; I might ask for longer segments as a wishlist item, along with something like paragraphs when someone else starts talking. It's ok, but not quite what I need right now.
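
As a rough sketch of what "done automatically" could look like, the function below cuts a WAV into fixed-length pieces using Python's standard wave module. The function name, the output file naming, and the 5-minute default are all made up for illustration; it handles uncompressed PCM WAV only and cuts at fixed times, so a cut may land mid-word.

```python
import wave
from pathlib import Path

def split_wav(source_path, out_dir, chunk_seconds=300):
    """Write consecutive chunk_seconds-long pieces of source_path into out_dir; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(str(source_path), "rb") as source:
        params = source.getparams()
        frames_per_chunk = int(source.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = source.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the recording
            index += 1
            chunk_path = out_dir / f"{Path(source_path).stem}_part{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as chunk:
                chunk.setparams(params)   # same channels, sample width and rate as the source
                chunk.writeframes(frames)  # the frame count in the header is corrected on close
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Calling split_wav("interview.wav", "chunks") with a hypothetical file name would produce interview_part001.wav, interview_part002.wav and so on, each of which can be sent separately.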


Judging by the documentation, I think their target audience is web developers. I'm not sure IT people or, say, Geek Squad would follow this well.
posted by Pronoiac at 2:04 PM on June 5, 2016

Response by poster: Okay I got a bit further and then got stuck again.

--I edited the manifest.yml file and changed the name to audio-transcriber.
--I installed nodejs and got it running! I can log into Bluemix.
--I created the Speech-to-Text service (and got the this-service-will-incur-a-cost warning)
--I entered the command cf push audio-transcriber and got the following messages:

Creating app audio-transcriber in...(my Bluemix account)

Binding audio-transcriber.mybluemix.net to audio-transcriber

Error processing app files in 'C:\Users\sardonyx': open C:\users\sardonxy\AppData\Local\Microsoft\Windows\INetCache\Content.Word\LongStringOfLetters&Numbers.tmp: This process cannot access the file because it is being used by another process


Okay, I should have said in my initial post I used to associate with programmers and developers. My career has taken me further away from the tech field. I still have to interact with them but it's much, much rarer, and there really isn't anybody these days I'd feel comfortable offering a case of beer (or pay for a couple hours of their time) to do this for me.

Any suggestions?
posted by sardonyx at 7:18 PM on June 8, 2016

Response by poster: I logged out and logged in again and tried to repeat the process.

After attempting to recreate the speech-to-text service I got the message: Speech-to-text-service-standard already exists.

Typing cf push (as per the instructions) gets me:
FAILED Manifest file is not found in current directory. please provide either an app name or manifest.

Typing cf push audio-transcriber gets me the file is being used by another process message again.

So I'm guessing I somehow edited manifest.yml incorrectly. Or have it in the incorrect directory.
posted by sardonyx at 7:28 PM on June 8, 2016
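
Both messages above are consistent with cf being run from the home folder (C:\Users\sardonyx) rather than from the extracted app folder: one says outright that no manifest is in the current directory, and the other shows cf trying to process files under the home folder, including a temporary file that Word has open. That is a reading of the error text, not a confirmed diagnosis. A quick check before pushing might look like the sketch below; the function name is made up, and the package.json test assumes the app is an ordinary Node project.

```python
from pathlib import Path

def ready_to_push(folder="."):
    """Return (ok, message) saying whether folder looks like the extracted app folder."""
    folder = Path(folder).resolve()
    if not (folder / "manifest.yml").is_file():
        return False, f"No manifest.yml in {folder}; cd into the extracted app folder first."
    if not (folder / "package.json").is_file():
        return False, f"manifest.yml found, but no package.json in {folder}."
    return True, f"{folder} looks like the app folder; cf push can be run from here."

if __name__ == "__main__":
    ok, message = ready_to_push()
    print(message)
```

If the check fails, changing into the speech-to-text-nodejs-master folder with cd and trying again is the thing to test first.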
