Access database - photos, how to proceed
June 27, 2018 7:26 PM
I have someone building me an Access database for my plant info and plant library - they haven't handled a lot of image data before so I'm hoping to smooth the way for them. I'm sure my query is all over the place so please bear with me.
The information side is straightforward but how should we handle the images? The system will be set up to be able to output a maximum of 6 images per species when a query is made. We may have a selection box setup so a user can pre-choose image content types e.g. plant form, roots image, fruit colour, flower colour, fall colour etc.
Should the images be duplicated for the DB to query, or can it work off the core set? All images are in one folder.
Or
Thumbnails made from each one
Or
Something else?
I believe I want three image folders, 1 - the main one with fullsize images, 2 – a thumbnail folder for the report form, 3 – a mid-size image folder with one dimension being around 1000pixels.
Should the images receive a random number as a key? Will it be a problem to also have a readable plant name so the images can still be browsed 'manually'?
There are curremtly about 10000 image files but this could easily double. The primary format is jpg, but also some .tif, .png and a few RAW (but happy if these are not indexed). File sizes range from 100'sKb to 10Mb.
The information side is straightforward but how should we handle the images? The system will be set up to be able to output a maximum of 6 images per species when a query is made. We may have a selection box setup so a user can pre-choose image content types e.g. plant form, roots image, fruit colour, flower colour, fall colour etc.
Should the images be duplicated for the DB to query, or can it work off the core set? All images are in one folder.
Or
Thumbnails made from each one
Or
Something else?
I believe I want three image folders, 1 - the main one with fullsize images, 2 – a thumbnail folder for the report form, 3 – a mid-size image folder with one dimension being around 1000pixels.
Should the images receive a random number as a key? Will it be a problem to also have a readable plant name so the images can still be browsed 'manually'?
There are curremtly about 10000 image files but this could easily double. The primary format is jpg, but also some .tif, .png and a few RAW (but happy if these are not indexed). File sizes range from 100'sKb to 10Mb.
Should the images be duplicated for the DB to query, or can it work off the core set?
Not only should it work off the core set, it should not break in nasty ways whenever the core set gets updated. The entire point of a database is to establish a Single Source of Truth for every data item. Once the data duplication rot sets into a design, it can get very difficult to excise it and duplicated data inevitably leads to inconsistent data.
You definitely do not want to push Access into working with the huge database files that would result from attempting to embed images inside them. That's pretty much the canonical example of "breaks in nasty ways". Access is already quite fragile enough without deliberately engaging in this kind of provocation.
Should the images receive a random number as a key?
Quite seriously, the fact that you are asking this means that your DB implementor is already out of their depth. I agree with toomanycurls that you need to be talking to people who know how to do this.
posted by flabdablet at 8:24 PM on June 27, 2018
Not only should it work off the core set, it should not break in nasty ways whenever the core set gets updated. The entire point of a database is to establish a Single Source of Truth for every data item. Once the data duplication rot sets into a design, it can get very difficult to excise it and duplicated data inevitably leads to inconsistent data.
You definitely do not want to push Access into working with the huge database files that would result from attempting to embed images inside them. That's pretty much the canonical example of "breaks in nasty ways". Access is already quite fragile enough without deliberately engaging in this kind of provocation.
Should the images receive a random number as a key?
Quite seriously, the fact that you are asking this means that your DB implementor is already out of their depth. I agree with toomanycurls that you need to be talking to people who know how to do this.
posted by flabdablet at 8:24 PM on June 27, 2018
There are a lot of answers to this question (and a lot more pieces of information I'd like to know in order to truly answer the question), and toomanycurls covers a lot of good information in their answer. Rather than a lot of back and forth, I'm going to describe how I'd implement this, if I were doing it.
First off, don't store the images in the database. Databases are really not designed for large chunks of data. My approach would be to take a hash of each image file (md5 or sha1 are fine - they are relatively fast and collisions would not be disastrous). Not the file name, mind you, but the actual file data. Then I would store these in a set of directories, using the hex representation of the hash to construct the paths:
let's say IMAGE1.JPG hashes to 926835f08d828001b970100210a6be3e804a0cc6. I would store this as:
/IMAGES/92/68/35/f08d828001b970100210a6be3e804a0cc6.jpg
I would then have a table in the database, something like:
HASH | ORIGINAL_NAME | FILETYPE | {OTHER METADATA}
If access speed is very important, I might consider pre-generating thumbnails, and storing them alongside the image file:
/IMAGES/92/68/35/f08d828001b970100210a6be3e804a0cc6_thumb.jpg
Or generate them on the fly, but cache them once generated.
The reason for this is that a lot of filesystems are not going to like having a single directory with 10000+ images in them, and this ensures that you won't have a ton of images in each directory, but also that the tree won't be too deep and take too long for the application to navigate.
I would advise against trying to make the tree manually browseable, but if that's a requirement, create a separate tree that uses symlinks into the hashed file structure.
If you want to go really serious about it, look at a cloud service like Cloudinary or imgix, but those are better suited to a web app than an Access DB.
posted by jferg at 8:29 PM on June 27, 2018
First off, don't store the images in the database. Databases are really not designed for large chunks of data. My approach would be to take a hash of each image file (md5 or sha1 are fine - they are relatively fast and collisions would not be disastrous). Not the file name, mind you, but the actual file data. Then I would store these in a set of directories, using the hex representation of the hash to construct the paths:
let's say IMAGE1.JPG hashes to 926835f08d828001b970100210a6be3e804a0cc6. I would store this as:
/IMAGES/92/68/35/f08d828001b970100210a6be3e804a0cc6.jpg
I would then have a table in the database, something like:
HASH | ORIGINAL_NAME | FILETYPE | {OTHER METADATA}
If access speed is very important, I might consider pre-generating thumbnails, and storing them alongside the image file:
/IMAGES/92/68/35/f08d828001b970100210a6be3e804a0cc6_thumb.jpg
Or generate them on the fly, but cache them once generated.
The reason for this is that a lot of filesystems are not going to like having a single directory with 10000+ images in them, and this ensures that you won't have a ton of images in each directory, but also that the tree won't be too deep and take too long for the application to navigate.
I would advise against trying to make the tree manually browseable, but if that's a requirement, create a separate tree that uses symlinks into the hashed file structure.
If you want to go really serious about it, look at a cloud service like Cloudinary or imgix, but those are better suited to a web app than an Access DB.
posted by jferg at 8:29 PM on June 27, 2018
In the name of not abusing the edit window - the using of hashes makes the assumption that once an image is put into the system, it's not ever edited. It can be deleted, and a new one (with a new hash) put in, but the minute an image is edited, it breaks the hash.
posted by jferg at 8:30 PM on June 27, 2018
posted by jferg at 8:30 PM on June 27, 2018
BTW does this have anything to do with your preceding question where flabdablet appears to have developed an image annotation solution using Inkscape and SVG for you? If so, continuing to use web-oriented technologies would be much more advisable than bringing in Microsoft Access, in general, even in the case of a computer which isn't connected to a network. A web browser can open files from the computer's hard drive or it can connect to a lightweight web server on the same computer.
posted by XMLicious at 8:31 PM on June 27, 2018
posted by XMLicious at 8:31 PM on June 27, 2018
Also if the interface were to go through a web browser and it was all on the same computer you probably wouldn't need to worry about creating image thumbnails as a browser on a modern desktop computer can easily handle displaying and resizing many 10MB images at a time with no network latency, unless all 20000 images were visible at once or something.
posted by XMLicious at 8:42 PM on June 27, 2018
posted by XMLicious at 8:42 PM on June 27, 2018
Brief answer as I will pause things while I consider all this. I would pefer this was a system that initially ran on a local system - I often end up living in places with crap/spotty internet. Ah I see you have that bit sorted XMLicious
toomanycurls - How broadly is this system of plant imagery going to be used?
Just me on a PC initially but programmer is fully aware I may want to put it on a server later for a very small group as well as when I travel.
flabdablet - Thanks a lot for the heads up.
jferg - Thank you your framework/approach is very helpful.
XMLicious - Yes flabdablet built me a very useful annotation system, but this query is not linked to that one. I didn't fully realise this - "A web browser can open files from the computer's hard drive or it can connect to a lightweight web server on the same computer." and no, I'd never be looking at more than 10 to 30 images simultaneously.
posted by unearthed at 11:05 PM on June 27, 2018
toomanycurls - How broadly is this system of plant imagery going to be used?
Just me on a PC initially but programmer is fully aware I may want to put it on a server later for a very small group as well as when I travel.
flabdablet - Thanks a lot for the heads up.
jferg - Thank you your framework/approach is very helpful.
XMLicious - Yes flabdablet built me a very useful annotation system, but this query is not linked to that one. I didn't fully realise this - "A web browser can open files from the computer's hard drive or it can connect to a lightweight web server on the same computer." and no, I'd never be looking at more than 10 to 30 images simultaneously.
posted by unearthed at 11:05 PM on June 27, 2018
If toomanycurls's suggestion of using an off-the-shelf asset management system isn't workable—perhaps you have no budget but free programming help?—and you'll be the only one entering information, my first thought for approaching it would be to see how well your computer handles editing a 20000-line spreadsheet with the kind of data you'll be storing. (Actually, do you already have the spreadsheet?)
Then, if that provides a comfortable enough editing interface with the normal spreadsheet features, I would save the spreadsheet in .csv file format and work on a system that takes in the spreadsheet and generates a bunch of interlinked HTML files from it which present the information and images together in a browser, the way you want it to look.
Hence, you would edit your spreadsheet and put image names / paths in six columns along with all the other data on each plant, then run some kind of program or script and it would crank away for a while and update the HTML files with any changes.
Then the whole thing would just be a bunch of files and you could save them to an external hard drive or burn them to a Blu-ray disc to back it all up or to look at it on another computer, or you could upload them to a web host or online file storage service to make it accessible over the internet and searchable via Google.
That's all assuming you just want to display the information and the images together like a reference database and don't have something more complicated in mind when you say "database".
posted by XMLicious at 11:56 PM on June 27, 2018
Then, if that provides a comfortable enough editing interface with the normal spreadsheet features, I would save the spreadsheet in .csv file format and work on a system that takes in the spreadsheet and generates a bunch of interlinked HTML files from it which present the information and images together in a browser, the way you want it to look.
Hence, you would edit your spreadsheet and put image names / paths in six columns along with all the other data on each plant, then run some kind of program or script and it would crank away for a while and update the HTML files with any changes.
Then the whole thing would just be a bunch of files and you could save them to an external hard drive or burn them to a Blu-ray disc to back it all up or to look at it on another computer, or you could upload them to a web host or online file storage service to make it accessible over the internet and searchable via Google.
That's all assuming you just want to display the information and the images together like a reference database and don't have something more complicated in mind when you say "database".
posted by XMLicious at 11:56 PM on June 27, 2018
Just me on a PC initially but programmer is fully aware I may want to put it on a server later for a very small group as well as when I travel.
Given that eventual requirement, I would recommend avoiding Access.
Yes, you can theoretically share an Access back-end between multiple Access front-ends, but doing so in a not-completely-fragile fashion involves about as much work as implementing a web-service layer in between a browser for a front end and a more robust DB in the back-end (all of which can, in most cases, be implemented using software that won't cause you licensing headaches).
Access has existed since before the Web did, and that shows in many, many ways. It's got a reasonably good report generator for textual data, but for mixed media it's a complete bear. As a result, it's much harder to find pre-canned Access applications to do this kind of thing than pre-canned Web-based stuff.
Access is a pretty good entry-level tool for learning about databases and maybe doing a bit of proof-of-concept prototyping. But in 2018, using it for anything beyond toys you're happy to learn how to tinker with personally rapidly becomes an expensive waste of time and resources.
posted by flabdablet at 12:08 AM on June 28, 2018
Given that eventual requirement, I would recommend avoiding Access.
Yes, you can theoretically share an Access back-end between multiple Access front-ends, but doing so in a not-completely-fragile fashion involves about as much work as implementing a web-service layer in between a browser for a front end and a more robust DB in the back-end (all of which can, in most cases, be implemented using software that won't cause you licensing headaches).
Access has existed since before the Web did, and that shows in many, many ways. It's got a reasonably good report generator for textual data, but for mixed media it's a complete bear. As a result, it's much harder to find pre-canned Access applications to do this kind of thing than pre-canned Web-based stuff.
Access is a pretty good entry-level tool for learning about databases and maybe doing a bit of proof-of-concept prototyping. But in 2018, using it for anything beyond toys you're happy to learn how to tinker with personally rapidly becomes an expensive waste of time and resources.
posted by flabdablet at 12:08 AM on June 28, 2018
assuming you just want to display the information and the images together like a reference database and don't have something more complicated in mind when you say "database".
If I understand the requirement correctly, the point is to provide a method of identifying plants that meet specific criteria (e.g. from the earlier question: "I'd like to enter one or more plant attributes e.g. pH tolerance; blue flowers; 2 metres tall etc and have a list returned").
So to me, that says web form with a bunch of filter boxes for search criteria and a Submit button, and when you hit Submit you get back the first of a set of linked web pages with nicely formatted descriptions of the plant types that meet those search criteria complete with photos, and stepping from one matching plant to the next and back should be smooth and very very fast because the pages should have bugger-all on them but IMG tags and a bit of text. And if you want to get a bit fancy, clicking on a photo should bring up a full-resolution original in its own browser tab (not as one of those furshlugginer lightbox overlays, ack pfui).
And if the filter boxes can be neatly incorporated into the results pages, so that starting a new search or adding additional filter criteria is about as easy as clicking the link to the next match on the existing criteria, so much the better.
This kind of stuff is what dynamic Web pages were invented for.
posted by flabdablet at 12:16 AM on June 28, 2018
If I understand the requirement correctly, the point is to provide a method of identifying plants that meet specific criteria (e.g. from the earlier question: "I'd like to enter one or more plant attributes e.g. pH tolerance; blue flowers; 2 metres tall etc and have a list returned").
So to me, that says web form with a bunch of filter boxes for search criteria and a Submit button, and when you hit Submit you get back the first of a set of linked web pages with nicely formatted descriptions of the plant types that meet those search criteria complete with photos, and stepping from one matching plant to the next and back should be smooth and very very fast because the pages should have bugger-all on them but IMG tags and a bit of text. And if you want to get a bit fancy, clicking on a photo should bring up a full-resolution original in its own browser tab (not as one of those furshlugginer lightbox overlays, ack pfui).
And if the filter boxes can be neatly incorporated into the results pages, so that starting a new search or adding additional filter criteria is about as easy as clicking the link to the next match on the existing criteria, so much the better.
This kind of stuff is what dynamic Web pages were invented for.
posted by flabdablet at 12:16 AM on June 28, 2018
flabdablet, providing a method of identifying plants that meet specific criteria is exactly what I want (need). Very succinctly put.
posted by unearthed at 1:28 AM on June 28, 2018
posted by unearthed at 1:28 AM on June 28, 2018
If we're only talking a simple table of plant attributes with something of the order of ten thousand rows, and you're happy to maintain it by editing a spreadsheet, this kind of filtering doesn't even need a proper database engine. I would expect a single HTML page with a few hundred lines of Javascript embedded in it to be quite adequate.
posted by flabdablet at 6:40 AM on June 28, 2018
posted by flabdablet at 6:40 AM on June 28, 2018
An example of what flabdablet describes there, from earlier in this decade, is the MIT SIMILE project's free and standardized Exhibit widget; this cereal characters demo may be closest to what you describe, though there's only one image associated with each entry in that case.
posted by XMLicious at 1:44 PM on June 28, 2018
posted by XMLicious at 1:44 PM on June 28, 2018
« Older California: What to pay mother's helper? | How do encourage my company to decline unethical... Newer »
This thread is closed to new comments.
There are a couple of things about asset management to cover which you didn't directly ask but you really need to know before paying for this database.
-asset resizing is a capability your system should handle on the fly. I assume you have some image resolution you want to see images at by default in addition to having the original size and thumbnail. If the system you're having built or buying doesn't do that then you are not getting a good product
-asset query can be done in several ways which any worth its salt asset management system will do. You should be about to index on any metadata that you are willing to capture and maintain about your asset. First you'll want to define parameters that you'll want to search by (plant name, color, desx, etc.) and go through the effort of tagging your existing image library
Bluntly, whether you're powering some internal or personal image lookup tool or a web site you are describing a consumer off the shelf technology that can be plug and play with the majority of your work done in configuration rather than forcing this into Access where is definitely does not belong.
My cred for this is having worked on asset production and management software (where it was built in house for good cause and by developers who worked in bleeding edge software technology). Memail me if you want to talk in more detail. I can look up vendors I explored for competitive analysis to share with you.
posted by toomanycurls at 8:06 PM on June 27, 2018