Semi-automating complicated Google Scholar searches via cmd or similar
August 21, 2020 3:03 PM

My searches (mainly on Google Scholar) are getting more involved and more time-consuming. A recent case is a historic hydrocarbon site (mostly coal tar and benzene) near sea level, so I am looking for field remediation efforts and precedents, preferably at high-latitude coastal sites. My question is: how can I semi-automate complicated searches?

Here's a recent search as a case in point.
start:
bioremediation estuary [including citations, excluding patents as they turned up nothing useful; 24,100 hits]
bioremediation "close to sea level" [nothing useful]
bioremediation "at sea level" [250 hits, but still a bit scattergun]

bioremediation "near sea level" [47 hits]

[then I added each of the following strings to the line above]
pah = 3 [abbrev. for polycyclic aromatic hydrocarbons]
"coal tar" = 0
benzene = 4
hydrocarbon = 14; hydrocarbon +benzene = 4, including Zhang 2013 below
-"tropical" "temperate" = 22
[I actually had 10 strings]
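The hand-tallied counts above are the kind of list a script could print. As a minimal Python sketch (the function name `format_report` is hypothetical, and the counts are simply the ones reported above, typed in by hand), sorting the sub-terms by hit count makes the zero-hit terms easy to spot and drop:

```python
from typing import Dict, List


def format_report(base: str, counts: Dict[str, int]) -> List[str]:
    """One line per sub-term, highest hit count first.

    Zero-hit terms are flagged as candidates to drop next round.
    """
    lines = [f"Base query: {base}"]
    # Sort by hit count (descending), then alphabetically to break ties.
    for term, hits in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        flag = "  <- no hits, drop?" if hits == 0 else ""
        lines.append(f"{hits:>6}  {term}{flag}")
    return lines


if __name__ == "__main__":
    # Counts from the manual search above (the hypothetical input data).
    counts = {
        "pah": 3,
        '"coal tar"': 0,
        "benzene": 4,
        "hydrocarbon": 14,
        '-"tropical" "temperate"': 22,
    }
    for line in format_report('bioremediation "near sea level"', counts):
        print(line)
```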

The benzene search included:
Zhang et al. 2010, Phytoremediation in engineered wetlands, with 115 citations. One of these is Zhang's continuation of this work into a pilot study at a cold-region, coastal hydrocarbon site. (I have yet to explore the other 114.)

Zhang et al. 2013, Pilot-Scale ... In-Situ Bioremediation ... Site In Newfoundland And Labrador, has proven very useful, both as a paper to work forward from and for understanding my site. It has 4 citations, all of which look very useful.

But I'm doing this one line at a time in Scholar, which is tedious. Is there a way to automate Scholar, e.g. via PowerShell or cmd, so that it takes the string below and a list of sub-terms, searches them one by one, and prints a list to the screen? I know it won't be perfect, but it could work as a first cut at gauging how much value a search term has, especially as many terms get 0 hits.

bioremediation "near sea level" [main string, with sub-terms below]
pah
"coal tar"
benzene
hydrocarbon
-"tropical" "temperate"
posted by unearthed to Computers & Internet (3 answers total) 3 users marked this as a favorite
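The request above boils down to "main string + each sub-term, one query at a time". A minimal Python sketch of that step, assuming the terms are pasted as plain text with the main string on the first line (the names `read_terms`, `build_queries` and `scholar_url` are hypothetical). It only builds the Google Scholar results URLs using the `q` parameter; filters such as excluding patents are left at Scholar's defaults and are not covered here.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# Google Scholar's search page; only the q (query) parameter is set below.
SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def read_terms(text: str) -> Tuple[str, List[str]]:
    """First non-blank line is the main string; the rest are sub-terms."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[0], lines[1:]


def build_queries(base: str, subterms: List[str]) -> List[str]:
    """Append each sub-term to the main string: one query per sub-term."""
    return [f"{base} {term}" for term in subterms]


def scholar_url(query: str) -> str:
    """Percent-encode the query (quotes, spaces) into a results URL."""
    return SCHOLAR_SEARCH + "?" + urlencode({"q": query})


if __name__ == "__main__":
    terms = '''bioremediation "near sea level"
pah
"coal tar"
benzene
hydrocarbon
-"tropical" "temperate"'''
    base, subterms = read_terms(terms)
    for query in build_queries(base, subterms):
        print(scholar_url(query))
```

Opening each printed URL in a browser (for example with Python's `webbrowser.open_new_tab(url)`) keeps a person in the loop but removes the retyping; the hit counts still have to be read off each page.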
 
Best answer: You are looking at learning basic webscraping. Many of the free tutorials you'll find begin with scraping Google results.

Doing this in PowerShell alone will be agonizing.

Looking back from experience, I would personally start with Python and the Scrapy framework, since you'll have the greatest variety of resources and tutorials, and the barrier to entry isn't particularly high. Getting Python installed on Windows these days is as simple as installing Anaconda. Good luck!
posted by aspersioncast at 8:24 AM on August 22, 2020 [2 favorites]
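Following the web-scraping suggestion above, the core of a hit-count scraper can be small. This sketch uses only Python's standard library rather than Scrapy, to keep it short, and everything about Scholar's page text is an assumption: Scholar has no official API, its HTML can change without notice, and rapid automated requests tend to be answered with a CAPTCHA instead of results. The names `parse_hit_count`, `fetch_html` and `survey`, the `hl=en` parameter, the User-Agent string and the 20-second delay are all illustrative choices, not anything Scholar documents.

```python
import re
import time
import urllib.request
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode

# Assumptions about Scholar's English results page: the header reads like
# "About 24,100 results" or "47 results", and a search with no hits says
# "did not match any articles". Both may change at any time.
COUNT_RE = re.compile(r"(\d[\d,]*)\s+results?\b")
NO_HITS = "did not match any articles"


def parse_hit_count(html: str) -> Optional[int]:
    """Hit count from a results page: 0 for a no-match page, None if
    neither pattern appears (for example a CAPTCHA page)."""
    if NO_HITS in html:
        return 0
    match = COUNT_RE.search(html)
    if match:
        return int(match.group(1).replace(",", ""))
    return None


def fetch_html(query: str) -> str:
    """Download one Scholar results page (needs network access)."""
    url = "https://scholar.google.com/scholar?" + urlencode({"hl": "en", "q": query})
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def survey(queries: List[str],
           fetch: Callable[[str], str] = fetch_html,
           delay: float = 20.0) -> List[Tuple[str, Optional[int]]]:
    """Run the queries one at a time, pausing between requests, and print
    a count per query ("?" means the page could not be parsed)."""
    results = []
    for i, query in enumerate(queries):
        if i:
            time.sleep(delay)  # go slowly: rapid requests tend to trigger a CAPTCHA
        count = parse_hit_count(fetch(query))
        results.append((query, count))
        label = "?" if count is None else str(count)
        print(f"{label:>8}  {query}")
    return results
```

Calling `survey(['bioremediation "near sea level" pah', 'bioremediation "near sea level" benzene'])` would print one count per line. The `fetch` parameter exists so the parsing and looping can be checked against saved pages without touching the network at all.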


+1 for Python and a web-scraping library
posted by turkeyphant at 8:34 PM on August 22, 2020


Response by poster: Thanks, aspersioncast, for suggesting Anaconda; I now have it running on my system.

Because my profile is open, a non-member also contacted me and offered to write a script. I have high hopes for it, but I won't have time to test it for a week or so.
posted by unearthed at 8:36 PM on August 27, 2020


This thread is closed to new comments.