A database lookup is a wish your heart makes
June 21, 2019 12:06 PM

Please help me find out what I need to learn to make this database lookup project!

I am in Scientific Affairs at a big dietary supplement company. One of the things we do is “stacking analysis” – we look at the amounts of the active ingredients in a proposed formula, and add them to the amounts of those ingredients in various combinations of our existing products people might consume the same day, and compare these totals to the safe level for that ingredient. Another thing we do is analyze mixtures of our products that our customers make themselves off-label, comparing the total amounts of the actives to safe levels, and looking for incompatible preparation conditions (for example Product A is ineffective in acidic solutions and Product B is acidic). We do this manually, on spreadsheets.

I want to write an app to do this, but there’s a lot I don’t know. That’s in addition to the things I don’t know I don’t know. Hope me.

  • The people who run our ERP with all the product information have agreed to send me weekly spreadsheets of the data I specify. Can I drop them in a shared folder and have the app automatically use the most recent spreadsheet as a database, or do I need to convert the spreadsheet data into another form?
  • I’d like to pull data from the USDA Food Composition Databases -- what’s involved in that?
  • Safe levels and product incompatibilities would need to be manually collected somewhere and updated periodically.
  • I would like to make a central app that maybe runs in a browser, rather than something people have to download. Where would my app be hosted? Can I put the whole thing on our department shared drive? If not, then where/how?
  • This app will contain formula information, so there are possible data security issues. I am thinking if we have it on the shared drive, people have to be onsite or work through a VPN to get to it, and then users would also each have a unique login. What other things should I be thinking about?
  • I am really lacking a sort of big picture of how this could work. I just saw this today on w3schools and said aha!
    Using SQL in Your Web Site To build a web site that shows data from a database, you will need: • An RDBMS database program (i.e. MS Access, SQL Server, MySQL) • To use a server-side scripting language, like PHP or ASP • To use SQL to get the data you want • To use HTML / CSS to style the page
    So if I learn/have those things I'm good? Would this also apply to a website that is locally hosted? Indeed, can I think of this project as a locally hosted website rather than a software application?
How would YOU approach this project?
Thanks for any advice you can give me on making this thing happen!
posted by pH Indicating Socks to Computers & Internet (15 answers total) 7 users marked this as a favorite
Do you have any internal collaboration systems? Myself, I would use SharePoint - if it were on-premises, I would use its capabilities to treat Excel spreadsheets like databases - but I would need to be using SharePoint "enterprise" with a combination of SQL Server Reporting Services integrated into it (or PowerBI). Might even use something called "Access Web Databases" for the actual database needs.

In the cloud (SharePoint Online), some of these features are available; others (like Access Web Databases) are not.

In a more generic fashion, I would want to use a web content management system where user login/authentication is already handled for me. It should have some sort of list/relationship or database ability built in to structure and classify information.
posted by jkaczor at 12:17 PM on June 21, 2019

I'm a SQL person who had to deal with Excel a lot. So this colors my response.

It is a medium-easy task to have SQL import a source file, like Excel or csv or tab-delimited, to a table. It is medium-easy to use SQL Reporting Services (does the html for you) to design reports to output results based on some criteria. You can google your way through these tasks.

It is quite easy to write an Excel PowerPivot that pulls in a source file and lets you write transforms and filter functions. I'm definitely no expert but I did get myself to a functional point in two days writing something for a customer that was easier done in Excel than SQL, using google and some youtube.
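If you end up going the Python route instead, that same import-and-summarize step can be sketched with nothing but the standard library. Here's a rough illustration using sqlite3 in place of SQL Server; the column names and sample rows are invented for the example:

```python
import csv
import io
import sqlite3

# Stand-in for the weekly ERP export; in practice you'd open the real .csv file.
SAMPLE_CSV = """product,ingredient,amount_mg
Product A,Vitamin C,500
Product B,Vitamin C,250
"""

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE actives (product TEXT, ingredient TEXT, amount_mg REAL)")

# csv.DictReader rows are dicts, so named placeholders line up with the headers.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
conn.executemany(
    "INSERT INTO actives VALUES (:product, :ingredient, :amount_mg)",
    list(reader),
)

# Total amount of each active across products -- the core "stacking" query.
for ingredient, total in conn.execute(
    "SELECT ingredient, SUM(amount_mg) FROM actives GROUP BY ingredient"
):
    print(ingredient, total)
```

Swap the in-memory connection for a file and the sample string for the real export, and you have the import half of the problem done.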
posted by Lyn Never at 12:20 PM on June 21, 2019 [1 favorite]

Typically if it is a website, you cannot put it on a shared drive.

But - if you have Microsoft Access, you can host a database on a shared drive (but I don't recommend doing that...).

You're asking a big thing here - basically a system design/architecture/blueprint - and we don't know enough about your currently available internal systems.
posted by jkaczor at 12:21 PM on June 21, 2019

(In my SharePoint world, I would leverage "PowerPivot" to make my pivoted Excel spreadsheet (like Lyn Never mentions) available on the web)
posted by jkaczor at 12:23 PM on June 21, 2019

I would make a website using Django and host it on a service that charges you for how much the site is used, like AWS. Since it is internal to your company, your site would not get many hits and it would be cheap or even free.

The Django Girls tutorial will teach you most of what you need to know.
posted by tofu_crouton at 12:24 PM on June 21, 2019

Response by poster: Thanks for your responses thus far!
Jkaczor, I don't understand what you mean when you say "a website cannot be hosted on a shared drive"? I was using website to mean "HTML and associated files which are accessed through a browser" and I can do THAT on a shared drive, no? Also, I know virtually nothing and have no support from IT (who would really prefer we didn't use the computers at all) so yes, I'm looking for guidance from my fellow mefites. If you mefites can rescue young women from human trafficking, you can rescue my head from my rectum, at least I hope you can!
posted by pH Indicating Socks at 12:37 PM on June 21, 2019 [1 favorite]

If you go the AWS route, they offer a free tier of services, including a machine type called "t2.micro" that gives you the opportunity to try things out and learn some things without risk of incurring huge costs.

Once you have something running, you can then upgrade your machine type, storage, and other options to a level that meets demand and budget.
posted by They sucked his brains out! at 12:39 PM on June 21, 2019

IMHO, some time spent putting carts and horses in the right order would be helpful.

1. Are you and your colleagues basically doing this process now, manually? You say that you are, but your questions about the USDA database (which you'd already be pulling from?) and data security (which would still be an issue with a manual process) suggest that you are not yet doing it in the full fashion you'd like your app to handle.

2. If data security really isn't an issue for your manual process, then it can wait for the moment.

3. Ditto the question of how to pull the "most recent" spreadsheet. If you have everything else working, this will be easy.

If the answer to #1 is no, I'd strongly suggest that you hack your way through, by hook or by crook, a manual version of the process and document that first. That is, do it, entering data by hand, copying stuff from a website into a spreadsheet by hand, etc. Then document it to a sufficient level of detail that someone who had only IT smarts in spreadsheets and any other tool you use -- but no substantive smarts in chemical interactions -- could follow the steps and reliably get a good result.

Then consider whether all of the steps and all of the outcomes provide value to the business, and eliminate any that do not. This also helps you make the case for funding, staff, etc. to automate this.
posted by alittleknowledge at 12:39 PM on June 21, 2019 [6 favorites]

I'd start with the algorithm of how you are going to find compatible products.

I'd do this as a vector space of products: generate permutations of products, multiply the product matrix by each permutation, and check whether the totals are below USDA limits or fail on compatibility.

The people who currently do this work would probably recognize these steps if it were described to them.

This is an inefficient algorithm, due to the permutations part - that is likely to be a lot of product combinations. The people who do this work already know this. But it can be done in parallel.
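A minimal Python sketch of that brute-force check, using combinations rather than permutations (order doesn't matter when you're summing amounts). The amounts and limits here are invented for illustration:

```python
from itertools import combinations

# Hypothetical data: mg of each active per product, and safe daily limits.
products = {
    "Product A": {"caffeine": 100, "vitamin_c": 500},
    "Product B": {"caffeine": 80},
    "Product C": {"vitamin_c": 400},
}
limits = {"caffeine": 150, "vitamin_c": 2000}

def over_limit(combo):
    """Return the ingredients whose summed amounts exceed the safe level."""
    totals = {}
    for name in combo:
        for ingredient, mg in products[name].items():
            totals[ingredient] = totals.get(ingredient, 0) + mg
    return [i for i, total in totals.items() if total > limits[i]]

# Check every pair and triple of products.
for size in (2, 3):
    for combo in combinations(products, size):
        problems = over_limit(combo)
        if problems:
            print(combo, "exceeds", problems)
```

With real data the number of combinations explodes, but each check is independent, which is what makes the parallel version straightforward.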
posted by bdc34 at 1:33 PM on June 21, 2019

I just saw this today on w3schools and said aha!

Using SQL in Your Web Site To build a web site that shows data from a database, you will need: • An RDBMS database program (i.e. MS Access, SQL Server, MySQL) • To use a server-side scripting language, like PHP or ASP • To use SQL to get the data you want • To use HTML / CSS to style the page

So if I learn/have those things I'm good?

This is describing the skills of a generic web developer. Not saying you can't do this, I'm just saying this is a thing that a lot of people do as their full-time job. The system you're describing doesn't sound that complex, but if you have no experience with programming it's not going to be easy to just dive in and build it.

I’d like to pull data from the USDA Food Composition Databases -- what’s involved in that?

Does this data change often? On the page you linked to, there is a "Downloads" item in the top menu bar; one of the options there is "BFPD Access Database Files", which lets you download a copy (in MS Access format) of what I assume is the full database. So this at least is easy, as long as the data is fairly static. If you need to update it frequently, this is more of a hassle, but it could easily be automated with a cron job (basically a script that would run automatically on some schedule to download a newer copy of the DB).

I second the advice above on really carefully documenting the process. I know you didn't give us all the details because they're not relevant, but having very specific requirements makes it much easier to do it right when you're automating it.

I also second the use of Django (a python webapp framework) for this sort of project, but that's probably just because it's what I'm used to. This would be instead of using PHP or ASP. Again, this doesn't sound like a terribly difficult project for somebody who has some background with databases and webapps, but it's not a weekend project.
posted by number9dream at 1:43 PM on June 21, 2019 [1 favorite]

Your problem and level of technical ability is basically what low code cloud based database platforms are designed for. At my organization, we use Quick Base very extensively, but there are plenty of competitors. These will require a budget, and deliver the database, hosting, security, and a variety of integrations for you, at a technical level that you can definitely handle. Nthing the advice that you will not get good results trying to roll your own solution at your current ability level, as you were gesturing at in your "General" section.
posted by Kwine at 1:51 PM on June 21, 2019 [2 favorites]

Best answer: I was using website to mean "HTML and associated files which are accessed through a browser" and I can do THAT on a shared drive, no?

It sounds like you've got limited resources, so, yes, this might be a useful hack which will significantly simplify what you need to do.*

As step 1 of this hack, I would a) not use a database, b) not use a webserver, and c) not create an app. Instead, I'd write a script (I'd suggest using Python) which you run on the command line to read the Excel file, fetch the USDA data, and write out static HTML files.

This will be enough to keep you busy for a while, and it will also form the core of a later conversion to full stack database+webserver+app.

For opening the Excel file in Python, you'll either have to export the Excel file as .csv and use the csv library, or install the pandas library so that you can open the .xls directly.

For fetching the data from USDA, the keyword you need is "API". Using that keyword, I found documentation for the NDB REST API. The friendliest way to access a REST API in Python is to use the requests library.
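To make step 1 concrete, here's a rough skeleton of the read-rows, write-static-HTML part of the script. The column names and sample data are invented, and the USDA fetch is left out:

```python
import csv
import html
import io

# Stand-in for the weekly export; in real use, open("products.csv") instead.
SAMPLE_CSV = """product,ingredient,amount_mg
Product A,Vitamin C,500
Product B,Vitamin C,250
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Build the table body -- html.escape guards against odd characters in the data.
cells = "\n".join(
    "<tr>" + "".join(f"<td>{html.escape(v)}</td>" for v in row.values()) + "</tr>"
    for row in rows
)
page = f"""<!DOCTYPE html>
<html><body>
<table border="1">
<tr><th>Product</th><th>Ingredient</th><th>Amount (mg)</th></tr>
{cells}
</table>
</body></html>"""

# Write the report somewhere your colleagues can open it in a browser.
with open("stacking_report.html", "w") as f:
    f.write(page)
```

Drop the output file on the shared drive and anyone can open it in a browser - no server required.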

To install the pandas and requests libraries, you'll need to use pip after installing Python. On Linux, run:

sudo pip install requests
sudo pip install pandas

On Windows, open cmd.exe in admin mode and run (pick the right one for the Python version you install):

C:\Python27\Scripts\pip.exe install requests
C:\Python27\Scripts\pip.exe install pandas


C:\Python37\Scripts\pip.exe install requests
C:\Python37\Scripts\pip.exe install pandas

Learning about those things should keep you busy for a while, and will result in usable web pages before you have to add the complexity of web servers, databases, and security.

*You will run into security limits doing it this way if you try to do certain things in Javascript, but if you're just creating static HTML this shouldn't be a problem.
posted by clawsoon at 1:55 PM on June 21, 2019 [2 favorites]

Does this data change often? On the page you linked to, there is a "Downloads" item in the top menu bar; one of the options there is "BFPD Access Database Files", which lets you download a copy (in MS Access format) of what I assume is the full database.

I didn't notice that! It would be even better for the hack I'm suggesting, because it wouldn't require you to learn REST or the requests library, or to sign up for a government API key. For my Python hack idea, I'd download the "BFPD ASCII Files" link directly below the Access Database link. The files in the ASCII download are .csv, which you'll be able to open directly in Python with the built-in csv library I mentioned. Download the "BFPD ASCII Files" once for development purposes; later, you can set up a cron job (Linux) or Scheduled Task (Windows) to download it once a day or so.
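The scheduled download itself is only a few lines of Python with the standard library. The URL below is a placeholder - substitute the real "BFPD ASCII Files" link:

```python
import urllib.request
from datetime import date
from pathlib import Path

# Placeholder URL -- replace with the actual BFPD ASCII Files download link.
BFPD_URL = "https://example.com/bfpd_ascii.zip"

def dated_filename(base: str, day: date) -> str:
    """Keep a dated copy so yesterday's data survives a bad download."""
    return f"{base}_{day.isoformat()}.zip"

def fetch(url: str = BFPD_URL, dest_dir: str = ".") -> Path:
    dest = Path(dest_dir) / dated_filename("bfpd", date.today())
    urllib.request.urlretrieve(url, dest)  # the actual network call
    return dest

if __name__ == "__main__":
    print(fetch())
```

Point cron or Task Scheduler at this script and you get a fresh, dated copy every day without touching it again.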
posted by clawsoon at 2:04 PM on June 21, 2019

One way to do this would be to:

1) Set up a RDBMS (MySQL or Postgres are both free-as-in-beer), on a virtual or real piece of hardware, and import the various CSV dumps from BFPD ASCII Files there. You'd have a table for Products, one for Serving Size, etc. You would also import your in-house data here as well. Importing CSV is usually fairly easy (in the sense that it's a task someone else has already solved a hundred times over).

2) Set up a web server capable of running some DB-enabled programmable scripting. This is where you write code that connects to the aforementioned RDBMS and constructs and executes SQL. It then returns a representation of the results of those queries to the browser (e.g. as JSON). This requires you to learn SQL and the scripting language.

3) The browser then interprets that data and displays it in a human-readable way. There are lots of browser-side toolkits for doing things like calling up a JSON API endpoint and turning the results into a table (AG Grid is one I started using recently to replace another older implementation of the same ideas). This requires you to learn JS/HTML/CSS.

4) All of this is hosted on an internal network accessed via LAN or VPN to prevent data leakage. Sounds like your IT department is at least capable of handling that part. Oh, you probably also want to implement some kind of login and access logging, which shouldn't be too bad (add a USERS table and maybe also a REMOTE_ACCESS_LOG table and you could do all that in the DB).
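As a toy illustration of steps 1 and 2, with sqlite3 standing in for MySQL/Postgres and a plain function standing in for the server-side endpoint (table and column names invented):

```python
import json
import sqlite3

# Step 1: the RDBMS, loaded from the CSV dumps (hardcoded here for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, ingredient TEXT, amount_mg REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Product A", "Vitamin C", 500), ("Product B", "Vitamin C", 250)],
)

# Step 2: what the server-side script would return to the browser as JSON.
def ingredient_totals_json() -> str:
    rows = conn.execute(
        "SELECT ingredient, SUM(amount_mg) AS total_mg "
        "FROM products GROUP BY ingredient"
    ).fetchall()
    return json.dumps([{"ingredient": i, "total_mg": t} for i, t in rows])

print(ingredient_totals_json())
```

Step 3 is then browser-side code that fetches that JSON and renders it as a table.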

An enthusiastic amateur can probably teach themselves to do all of this, but it will take you a lot of effort and time because there are a number of dependencies you ALSO have to teach yourself how to do before you get to the core functionality [e.g., step 1 above really reads more like (1a) install linux [possibly on a virtual host], (1b) install rdbms, (1c) configure rdbms, (1d) learn how to import csv to rdbms, (1e) construct a cron job or other script to do this daily].

As far as web apps go it's what a professional (e.g., yours truly) would consider low-to-medium difficulty (I could probably rough out a proof of concept in a week or two, a weekend if I really rushed -- I can do this because I've done it before and I already know all the moving parts and how they fit together), depending on how fancy you want to get with search capabilities and display.

Do you have a budget for this? This is what software development consultants and freelancers are for. If you don't have a budget, total up all the time you and everyone else spends manually doing this, estimate the amount of time saved by having this purpose-built tool alleviating all that tedium, and try to pitch your boss on splashing out for some skilled help. I'm not saying you can't teach yourself all these skills, but like so many other things it's probably cheaper just to hire a professional, not to mention that the result will be more polished.
posted by axiom at 8:40 PM on June 21, 2019

If you are asking this question, it probably means that you're new to web and software development.

My recommendation would be: Don't bother with the whole stack. You'll get so stuck figuring this out that you won't focus on what actually matters.

Make it in Google Sheets, based on what you have, but better. Create a user interface: have one tab/sheet be the data, using IMPORTXML to pull in feeds from external sources. The other tabs/sheets are "calculators" that look up data and use data validation, VLOOKUPs, and calculations to produce the result. Use protected ranges and cell highlighting to make it clear which cells people should change and which they shouldn't. Make the sheet private and share it only with those who need to use it.
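For instance, a calculator cell might pull an ingredient's safe level from the data tab with something like (the sheet and range names here are made up):

```
=VLOOKUP(A2, Data!A:C, 3, FALSE)
```

That finds the value of A2 in column A of the Data sheet and returns the matching entry from the third column; the FALSE makes it an exact match.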

You don't need a relational database. You don't need a webapp.
posted by suedehead at 12:56 PM on June 22, 2019 [4 favorites]
