How can I help my coworkers create XML documents to store metadata?
April 28, 2009 9:50 AM Subscribe
Generating XML documents help? I work with the county health department, and have been asked to come up with a way to create, store and manipulate metadata for various datasets we store internally on a network drive. Basically I think the simplest thing to do is to create an XML file stored in the same directory as each dataset. Preferably I can do this with InfoPath 2003 or free software, using the Dublin Core standard for metadata. I've seen the 5 year old answers on Dublin Core Metadata from this Ask, but I need to let my coworkers create the data first, and that thread mostly covers storage/database solutions. Looking through various DC list archives hasn't been helpful.
I realize that having an xml file for each dataset may make it hard to actually search and manipulate the data, but at least we'd be storing it somewhere in a standard form.
We have Office 2003 on all the relevant machines, to the best of my knowledge, and I think InfoPath 2003 should be able to create forms which create XML compliant with Dublin Core (either simple or qualified, preferably the latter). It would be rather difficult to have our IT department roll out any sort of complex IT solution or have a server set up with the authority I have. I could however use some sort of database file stored on the network drive.
I'd really prefer to have forms or generate the XML programmatically, rather than having them hand edit a template or something. The idea is to make storing the data as smooth as possible.
Does anybody have any suggestions for me or strong useful resources that you can point me to? I'm not exactly sure how to take the XML schemas for Dublin Core and create a compliant form from them.
I'm not married to Dublin Core if there is another standard, or to InfoPath 2003 if there is a better program that won't cost anything.
Okay, I've re-read your original post a few times and I think you're asking your question too specifically. I think that what you're describing is the need for a type of product called a Document Management System.
I would just install Alfresco, the pre-eminent open source DMS, load all of your files into it, and just go nuts trying to customize it to work exactly the way you want it to. I am recommending this kind of kamikaze, attack-your-problem approach because DMS is such an enormous, sprawling, and complicated field that you will simply get lost if you try to read about it and plan everything out ahead of time. Alfresco is very much a high-quality product that is competitive with the commercial solutions out there and you will not lose anything by doing what I'm describing.
The worst case is that you'll run into some shortcoming with Alfresco and have to scrap it and start over with a different product, but simply charging forward and trying everything with Alfresco will probably help you to understand your problem much better and what the possibilities are. (Best case, you get a laurel wreath and a promotion by vicariously using the intuition of this content management / document management engineer.)
posted by XMLicious at 10:41 AM on April 28, 2009
And also, Infopath is not a DMS; it's not even remotely the same type of thing. I think you were completely barking up the wrong tree there.
Alfresco interoperates with Microsoft Office in a variety of ways, as do most of the DMS solutions you would probably consider, if you had some other reason you wanted Office involved.
posted by XMLicious at 10:50 AM on April 28, 2009
Response by poster: Is the metadata going to have the same fields and possible values for each field every time? If it is, something like Infopath would be such incredible overkill that it would be like using a bazooka to kill a cockroach.
Same fields, but different possible values for each field. We pull regularly from 30 databases, and irregularly from a lot more.
Did someone really just tell you "make metadata"? Is the metadata going to actually be used for anything in particular - i.e. is it going to produce some sort of indexing or organizing capability down the road? I regard the concept of metadata as something almost like a trap - it's waaaaaay too easy to get fascinated with ontologies and standards and "semantic web" type notions without realizing that you aren't really accomplishing anything of value or practical use.
Well I was just asked to find a standardized format so others could create metadata. While I'm hoping to have indexing and organizing capability in the future, what we really need is basic metadata to be saved in some sort of standardized format. Where the dataset came from, who created it, when it was downloaded, when it was created. They were going to expand on a geospatial format, but I think dublin core is more appropriate.
XML is just text, it's really, really easy to create. I am normally in favor of elegantly or robustly engineered solutions, and government agencies often dig themselves into holes with custom solutions for things, but what you've described so far is so simple I'm inclined to say that you should just take a whack at doing this yourself in whatever programming language or tool you're familiar with which could accomplish it, or if you aren't confident in programming yourself just recruit the first programmer you come across and let them use whatever language or tools they're most familiar with.
Well yes, and if it was me creating it for every dataset, I might be willing to hand code the XML using a template doc. But I've got a wide range of computer competency to provide for, and want to go with a low common denominator because I've found that implementations of technology that have high barriers to entry in terms of knowledge don't get successfully implemented. The more I can do to make things simple to use, the more likely I am to be able to get the metadata created. Once it's created in a machine readable format, I or others can do whatever they want with it.
To try to summarize: a form user interface and programmatic generation of XML are such universal capabilities today that you probably should not focus on finding the technologically optimum way of doing this, but rather you should try to find an approach that is easiest or quickest or cheapest or some other criterion like that.
Easiest to implement and simple to use then.
posted by gryftir at 11:16 AM on April 28, 2009
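For what it's worth, the "XML is just text" point can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a recommendation of a particular tool: the `<metadata>` wrapper element, the function names, and the choice of which Dublin Core element holds which piece of information are all assumptions made for illustration. Only the `dc` namespace URI comes from the Dublin Core element set itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return an XML string with one dc:* element per (name, value) pair.

    The <metadata> wrapper is a hypothetical root element chosen for this
    sketch; pick whatever root your own schema or tooling expects.
    """
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return ET.tostring(root, encoding="unicode")


def write_dc_record(xml_path, fields):
    """Write the record next to the dataset as a UTF-8 XML file."""
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        handle.write(build_dc_record(fields))
```

A form, a spreadsheet macro, or a script could all feed `build_dc_record` a list such as `[("title", ...), ("creator", ...), ("source", ...), ("date", ...)]`. Special characters in the values are escaped by the library, which is the main thing hand-edited templates tend to get wrong.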
Infopath can be used to build forms from an XML schema, so this would be an easy way to let others "fill in" an XML document. However, there will still be problems. Namely all the documents still need to be managed - and now you have twice as many (the document, and the associated document metadata file).
Windows SharePoint Services is free, and can be used to support the Dublin Core standard. It integrates tightly with Office and the rest of the Microsoft server stack and has loads more functionality. Unless you can set up and maintain your own Windows 2003 server, it would require IT involvement.
Some questions: How many documents do you have on your network drive? Are the documents critical to your business? Why don't you want IT involved?
posted by askmehow at 11:42 AM on April 28, 2009
Response by poster: IT won't implement any sort of server/enterprise system including a DMS like you describe without moving bureaucratic mountains, including possibly competitive bidding. And we have no budget for the time it would take to implement it. I think I am stuck with programs that can work locally or as a remote file without any real involvement from IT or any additional server/enterprise setup.
posted by gryftir at 11:44 AM on April 28, 2009
Arrgh. I can't find a simple screen shot or video that shows it, but I'm pretty sure that with Alfresco you'd be able to set something up where your user would drag and drop one of these files into the system and they'd be presented with a metadata form to fill out. They might even provide a sample setup of this with Dublin Core fields somewhere. The files would be accessible just like a shared drive - really through Windows Explorer exactly the same way you do now: this is shown (along with a whole lot of other stuff, one hour long) in the video "webinar" Replacing Your Shared Drive With Alfresco. (Clicking on that link will probably ask to install the WebEx video player plugin first.)
...but on preview it sounds like you would have IT barriers on the DMS route. IT in a government agency... *shudder* I feel your pain.
To run with your original idea to create XML files: if you don't have an in-house programmer with a preference for development platforms or tools they've already got experience with, I would say create this as a Microsoft XAML Browser Application. This would require installing the free .NET Framework on each user's computer. The form could be as easy to use as the sort of forms that appear on web sites and the skills needed to create and modify them would be very similar to the skills used to create a web form. If you want more details on how to approach this solution I can elaborate.
posted by XMLicious at 12:17 PM on April 28, 2009
Best answer: in that case, maybe you can create an access database and a data entry form that is stored on the network file share alongside the documents. ask users to enter new/edit document meta-data in the db when appropriate.
this meets your IT requirements, and it's in a format that can be easily exported when a more robust solution becomes important.
posted by askmehow at 12:19 PM on April 28, 2009
Best answer: the only issue i see with using XML files is that you are then going to have a whole new set of documents to manage, plus you will need to maintain a relationship between a document and its meta-data file.
i think it will serve you better in the long run to keep the metadata for all of the existing files in a single datastore (mdb, xml, xls, csv, whatever).
posted by askmehow at 12:25 PM on April 28, 2009
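As a rough illustration of the single-datastore idea, here is a minimal Python sketch that keeps one CSV file with one row per dataset, keyed on the dataset's path. The column names, the `upsert_record` function, and the choice of CSV over the other formats mentioned are all assumptions for the example. It also rewrites the whole file on every save, so it is not safe if two people save at the same moment.

```python
import csv
import os

# Hypothetical column set: the dataset path is the key, the rest are
# simple Dublin Core style fields.
FIELDS = ["dataset_path", "title", "creator", "source", "date"]


def upsert_record(csv_path, record):
    """Insert or replace the row whose dataset_path matches the record.

    Returns the number of rows in the store after the update.
    """
    rows = {}
    if os.path.exists(csv_path):
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                rows[row["dataset_path"]] = row
    # Missing fields are stored as empty strings rather than rejected.
    rows[record["dataset_path"]] = {name: record.get(name, "") for name in FIELDS}
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows.values())
    return len(rows)
```

Keying on the path makes the document-to-metadata relationship explicit in one place, though a renamed or moved dataset still needs its row updated by hand.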
Response by poster: XMLicious: Well... there are people who do SAS code. That's as close to an in-house programmer as we get. I am not any sort of serious programmer. I'm pretty sure our two health department IT people aren't programmers, or if they are it's not a skill that's called upon. They do sysop stuff, control directory permissions, and do computer repair. That's why I liked InfoPath 2003, since it seems designed specifically for end users. Also our computers are locked down, so already installed software is useful.
Unfortunately InfoPath 2003 has poor documentation for doing stuff like creating a form to generate XML using one of the DC XML schemas, and my Google-fu failed me. Seriously, I can import the XML schema, and then I try to make a form element and get errors.
Askmehow: Yeah I understand the central repository vs. distributed xml files issue. It sucks, but I feel like I need to be able to have users simply click on an XML file in the same directory as the dataset file. It really does need to be very simple.
Is there some way to link the Access database to the file or the folder the file is in, or create a link in each folder to the relevant entry in the database? I suppose I could do the same thing with an Excel spreadsheet, but only one person can edit those at a time with the way our system is set up. I don't know if Access databases have the same problem.
Is there some way to enter the metadata once and have it go to a single data store and also get an XML file?
Anyway current metadata is all locked in people's heads. If I can get a real simple system to get it out and use it then that's what I need to do. Then if it's distributed but not indexed... well that would possibly justify some IT involvement, or may require someone to periodically spider our network drive for XML files, download them all, read the data, and put it into a database.
posted by gryftir at 1:31 PM on April 28, 2009
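The "periodically spider the network drive" idea could look roughly like the Python sketch below, which walks a folder tree, parses each XML file, and loads any Dublin Core elements into a SQLite database. Everything specific here is an assumption: the `index_metadata` function, the `dc_values` table layout, and the choice to treat every `.xml` file as a candidate and silently skip the ones that don't parse.

```python
import os
import sqlite3
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_PREFIX = "{" + DC_NS + "}"


def index_metadata(root_dir, db_path):
    """Rebuild a SQLite index of dc:* elements found under root_dir.

    Returns the number of XML files that contributed at least one value.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dc_values "
        "(xml_path TEXT, element TEXT, value TEXT)"
    )
    conn.execute("DELETE FROM dc_values")  # start fresh on every run
    indexed = 0
    for folder, _subdirs, files in os.walk(root_dir):
        for name in files:
            if not name.lower().endswith(".xml"):
                continue
            xml_path = os.path.join(folder, name)
            try:
                tree = ET.parse(xml_path)
            except ET.ParseError:
                continue  # not well-formed XML, skip it
            rows = [
                (xml_path, el.tag[len(DC_PREFIX):], el.text or "")
                for el in tree.iter()
                if el.tag.startswith(DC_PREFIX)
            ]
            if rows:
                conn.executemany("INSERT INTO dc_values VALUES (?, ?, ?)", rows)
                indexed += 1
    conn.commit()
    conn.close()
    return indexed
```

Because the index is rebuilt from the XML files each time, the files stay the source of truth and the database is disposable, which fits the "distributed now, indexed later" plan.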
I hate to tell you but you're really starting to describe writing a DMS from scratch. But I understand the strictures you're probably under from your IT staff. (It still might be easier to arrange some Machiavellian scheme to have them deposed and replaced with your puppet IT regime than build a DMS from scratch, though.)
Infopath and the products that compete with it aren't general-purpose editors for every XML file, I think you can see: they're a sort of "platform" that a programmer would build on top of to create more specialized editing programs. But Infopath's capabilities are way, way more than what would be needed to accomplish what you're describing. It's really for someone who is trying to make an easy-to-use editor that would fit into a large, complicated software system already passing XML around all over the place, rather than the sort of application you're describing; that's probably why the documentation is bad.
My guess from using similar systems would be that the conventional usage of Infopath would probably involve adding custom Infopath instructions into the Dublin Core schema. The sort of thing you are hoping to do with it is what the people who invented XML back in the nineties dreamed of, but their hopes have not been realized, for better or for worse.
posted by XMLicious at 1:47 PM on April 28, 2009
Response by poster: Hmm... looking over the access help file, I'm wondering if I can enter stuff into the database using a form, set a Dublin Core schema, then export the entry into XML.
Infopath may indeed be overkill.
posted by gryftir at 3:02 PM on April 28, 2009
Best answer: this is exactly what infopath was designed for.
here is the schema for the dublin core standard: http://dublincore.org/schemas/xmls/simpledc20021212.xsd
here is how you create an infopath form from an existing xsd schema:
http://msdn.microsoft.com/en-us/library/bb251017.aspx
here is more information about the various moving parts of infopath forms: http://office.microsoft.com/en-us/infopath/HA100626851033.aspx
the issue is, and i promise it's the last time i'll bring it up, you will end up with 1 metadata file per document. this can become very difficult to manage over time.
for instance -
*if document names and/or locations change.
*if a document gets modified but not the metadata.
*if you have 2 different documents with the same name in different locations.
all of these are issues that DMS solves, and will be alleviated to some extent with a central repository of document meta-data. essentially there are no easy, cheap and fast solutions to complex problems.
i think you will be able to come up with something that helps in the short term, but you should bear in mind that you are building a stop-gap solution to what is ultimately an IT problem.
to that end, i would keep a document of all assumptions, tricks, and implementation decisions that you've made while building your solution. this will definitely help the people that inherit and/or work with it in the future.
posted by askmehow at 3:37 PM on April 28, 2009
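One partial mitigation for the drift problems listed above (renamed files, datasets modified without the metadata being updated) is to store a content fingerprint alongside the metadata when it is written, then compare later. The Python sketch below is only an illustration of that idea. The function names and the three status labels are assumptions, and it does nothing about two different files sharing a name, beyond the fact that their fingerprints would differ.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_record(dataset_path, recorded_fingerprint):
    """Compare a dataset with the fingerprint saved in its metadata.

    Returns "missing", "changed", or "ok".
    """
    if not os.path.exists(dataset_path):
        return "missing"  # renamed, moved, or deleted
    if file_fingerprint(dataset_path) != recorded_fingerprint:
        return "changed"  # dataset modified after the metadata was written
    return "ok"
```

Run periodically over every metadata record, a check like this at least turns silent drift into a list of files that someone needs to look at.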
let me try again w/the links:
dublin core
creating infopath form from xsd schema
infopath moving parts
posted by askmehow at 3:39 PM on April 28, 2009
This thread is closed to new comments.
Did someone really just tell you "make metadata"? Is the metadata going to actually be used for anything in particular - i.e. is it going to produce some sort of indexing or organizing capability down the road? I regard the concept of metadata as something almost like a trap - it's waaaaaay too easy to get fascinated with ontologies and standards and "semantic web" type notions without realizing that you aren't really accomplishing anything of value or practical use.
XML is just text, it's really, really easy to create. I am normally in favor of elegantly or robustly engineered solutions, and government agencies often dig themselves into holes with custom solutions for things, but what you've described so far is so simple I'm inclined to say that you should just take a whack at doing this yourself in whatever programming language or tool you're familiar with which could accomplish it, or if you aren't confident in programming yourself just recruit the first programmer you come across and let them use whatever language or tools they're most familiar with.
To try to summarize: a form user interface and programmatic generation of XML are such universal capabilities today that you probably should not focus on finding the technologically optimum way of doing this, but rather you should try to find an approach that is easiest or quickest or cheapest or some other criterion like that.
posted by XMLicious at 10:30 AM on April 28, 2009