<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: What's the best way to set up a tagging system?</title>
	<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system/</link>
	<description>Comments on Ask MetaFilter post What's the best way to set up a tagging system?</description>
	<pubDate>Mon, 24 Apr 2006 23:04:31 -0800</pubDate>
	<lastBuildDate>Mon, 24 Apr 2006 23:04:31 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: What&apos;s the best way to set up a tagging system?</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system</link>	
		<description>Database experts: I&apos;m setting up a tagging system for a site I run. What&apos;s the most efficient way to go about it? &lt;br /&gt;&lt;br /&gt; As part of the next version of the mp3 hosting site I run, I want to give users the ability to tag each mp3 they upload with various keywords that people can search later on down the track, much like we can do here with threads. I&apos;d like to know what the general consensus is on the best method of setting something like this up, and how best to implement it.&lt;br&gt;
&lt;br&gt;
I thought of making a table, say &apos;tags&apos;, and in that have one new record for each tag a file has. For instance, a user uploads an mp3 and the site assigns it the id &apos;12&apos;. The user selects, from a predetermined list, some tags that best describe the file he&apos;s uploading: &apos;comedy&apos;, &apos;diy&apos;, and &apos;modern french literature&apos;. Three new records are created in the &apos;tags&apos; table which correspond to those three tags, i.e.:&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
id | tagid | fileid&lt;br&gt;
-------------------&lt;br&gt;
 1   comedy    12&lt;br&gt;
 2    diy      12&lt;br&gt;
 3    mfl      12&lt;br&gt;
&lt;br&gt;
As more files are added and tagged, records are created for those files too, and eventually we have a &apos;tags&apos; table with a trillion records which we can interrogate at our leisure.&lt;br&gt;
&lt;br&gt;
Is this the best / most efficient way to do this?</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2006:site.36957</guid>
		<pubDate>Mon, 24 Apr 2006 22:46:50 -0800</pubDate>
		<dc:creator>Savvas</dc:creator>
		
			<category>tags</category>
		
			<category>php</category>
		
			<category>mysql</category>
		
			<category>databases</category>
		
	</item> <item>
		<title>By: paulsc</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572741</link>	
		<description>Presuming fileid is a &lt;a href=&quot;http://en.wikipedia.org/wiki/Foreign_key&quot;&gt;foreign key&lt;/a&gt; from some table with details about the file, it has possibilities.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572741</guid>
		<pubDate>Mon, 24 Apr 2006 23:04:31 -0800</pubDate>
		<dc:creator>paulsc</dc:creator>
	</item><item>
		<title>By: nicwolff</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572744</link>	
		<description>No, because you&apos;d be duplicating tag names in that table. Normally, you&apos;d have a table of tags and a &quot;join table&quot; that associates the tags with the files:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;create table files ( &lt;br&gt;
&amp;nbsp;&amp;nbsp;file_id serial primary key, &lt;br&gt;
&amp;nbsp;&amp;nbsp;file_name varchar(128), &lt;br&gt;
&amp;nbsp;&amp;nbsp;file bytea  -- or whatever your DB calls it &lt;br&gt;
);&lt;br&gt;
&lt;br&gt;
create table tags (&lt;br&gt;
&amp;nbsp;&amp;nbsp;tag_id serial primary key,&lt;br&gt;
&amp;nbsp;&amp;nbsp;tag_name varchar(128)&lt;br&gt;
);&lt;br&gt;
&lt;br&gt;
create table file_has_tag (&lt;br&gt;
&amp;nbsp;&amp;nbsp;file_id integer references files on delete cascade,&lt;br&gt;
&amp;nbsp;&amp;nbsp;tag_id integer references tags on delete cascade&lt;br&gt;
);&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
The &quot;tags&quot; table becomes your &quot;predetermined list&quot; of tags. To get all the files with some tag, you do, for example:&lt;br&gt;
&lt;br&gt;
&lt;code&gt;SELECT files.* FROM files JOIN file_has_tag USING (file_id) JOIN tags USING (tag_id) WHERE tag_name = &apos;comedy&apos;;&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
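For anyone who wants to try this without a database server, here is a rough, self-contained sketch of the same three-table layout and query. It assumes Python&apos;s built-in sqlite3 module rather than the PostgreSQL-style DDL above, the file names are made up, and the composite primary key, the UNIQUE constraint and the extra index are my own additions for illustration.&lt;br&gt;
&lt;br&gt;
```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the site's real one.
SCHEMA = """
CREATE TABLE files (
    file_id   INTEGER PRIMARY KEY,
    file_name TEXT NOT NULL
);
CREATE TABLE tags (
    tag_id   INTEGER PRIMARY KEY,
    tag_name TEXT NOT NULL UNIQUE
);
CREATE TABLE file_has_tag (
    file_id INTEGER NOT NULL REFERENCES files(file_id) ON DELETE CASCADE,
    tag_id  INTEGER NOT NULL REFERENCES tags(tag_id) ON DELETE CASCADE,
    PRIMARY KEY (file_id, tag_id)  -- also serves lookups by file_id
);
-- Separate index so that lookups by tag_id are covered as well.
CREATE INDEX file_has_tag_by_tag ON file_has_tag (tag_id);
"""


def files_with_tag(conn, tag_name):
    """Return the names of all files carrying tag_name, sorted by name."""
    rows = conn.execute(
        "SELECT files.file_name FROM files "
        "JOIN file_has_tag USING (file_id) "
        "JOIN tags USING (tag_id) "
        "WHERE tags.tag_name = ? ORDER BY files.file_name",
        (tag_name,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data: file 12 is the upload from the question.
conn.executemany(
    "INSERT INTO files (file_id, file_name) VALUES (?, ?)",
    [(12, "standup.mp3"), (13, "lecture.mp3")],
)
conn.executemany(
    "INSERT INTO tags (tag_id, tag_name) VALUES (?, ?)",
    [(1, "comedy"), (2, "diy"), (3, "modern french literature")],
)
conn.executemany(
    "INSERT INTO file_has_tag (file_id, tag_id) VALUES (?, ?)",
    [(12, 1), (12, 2), (12, 3), (13, 3)],
)

print(files_with_tag(conn, "comedy"))  # prints ['standup.mp3']
```
&lt;br&gt;
The tag name is passed as a bound parameter (the question mark) rather than pasted into the SQL string, which is worth copying whatever language the site is written in.&lt;br&gt;
&lt;br&gt;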
You&apos;ll want to index the file_has_tag table on both file_id and tag_id.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572744</guid>
		<pubDate>Mon, 24 Apr 2006 23:24:33 -0800</pubDate>
		<dc:creator>nicwolff</dc:creator>
	</item><item>
		<title>By: devilsbrigade</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572748</link>	
		<description>Keep in mind that, depending on the database, a JOIN can be more expensive than the alternatives. &lt;br&gt;
&lt;br&gt;
Some databases in particular have full-text search capabilities that will build their own keyword tables. If your database has this, it would be worth checking whether it is faster than the JOIN and its corresponding SQL.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572748</guid>
		<pubDate>Mon, 24 Apr 2006 23:30:38 -0800</pubDate>
		<dc:creator>devilsbrigade</dc:creator>
	</item><item>
		<title>By: MetaMonkey</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572749</link>	
		<description>This similar &lt;a href=&quot;http://ask.metafilter.com/mefi/34897&quot;&gt;AskMe on setting up a tagging system&lt;/a&gt; may be handy.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572749</guid>
		<pubDate>Mon, 24 Apr 2006 23:32:04 -0800</pubDate>
		<dc:creator>MetaMonkey</dc:creator>
	</item><item>
		<title>By: C.Batt</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572759</link>	
		<description>Go with 2 tables as you originally outlined.&lt;br&gt;
&lt;br&gt;
With 1 table and a Tag field containing a delimited list of tags, your queries will either be slow, or just look weird.  You could make it work, but it probably isn&apos;t a great idea.&lt;br&gt;
&lt;br&gt;
With 3 tables you&apos;re basically saying that every time a user adds a file and tags it, your process has to look up the tag in the Tags table to see if it already exists.  If it exists, it creates a record in the association table using the key of the existing Tag record and the new tagged item.  If it doesn&apos;t exist, the process inserts the tag into the Tags table and creates a record in the association table using the key of the newly inserted tag.  (I hope that makes some sense.)&lt;br&gt;
&lt;br&gt;
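In case the look-up-then-insert step is easier to follow as code, here is a minimal sketch of it. It assumes Python&apos;s built-in sqlite3 module, the table names tags and file_has_tag from earlier in the thread, and a helper called tag_file that I made up for the example.&lt;br&gt;
&lt;br&gt;
```python
import sqlite3


def tag_file(conn, file_id, tag_name):
    """Attach tag_name to file_id, creating the tag row on first use."""
    # Look the tag up first; reuse its key if it already exists.
    row = conn.execute(
        "SELECT tag_id FROM tags WHERE tag_name = ?", (tag_name,)
    ).fetchone()
    if row is None:
        # Not seen before: insert it and take the newly generated key.
        tag_id = conn.execute(
            "INSERT INTO tags (tag_name) VALUES (?)", (tag_name,)
        ).lastrowid
    else:
        tag_id = row[0]
    # Either way, record the association between the file and the tag.
    # OR IGNORE makes tagging the same file twice a harmless no-op.
    conn.execute(
        "INSERT OR IGNORE INTO file_has_tag (file_id, tag_id) VALUES (?, ?)",
        (file_id, tag_id),
    )
    return tag_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE);
CREATE TABLE file_has_tag (
    file_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (file_id, tag_id)
);
""")

first = tag_file(conn, 12, "comedy")   # creates the tag row
second = tag_file(conn, 13, "comedy")  # reuses the existing tag_id
```
&lt;br&gt;
With 2 tables the whole function collapses to a single insert of the tag text next to the file id, which is the trade-off being described here.&lt;br&gt;
&lt;br&gt;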
Anyhow, the main difference between 2 tables and 3 tables is one of performance vs. storage space (respectively).  2 tables will have slightly better performance overall and 3 tables slightly better storage.  I say slightly because &quot;Tag&quot; data is really quite insignificant (varchar(50) perhaps?) and generally won&apos;t affect either parameter.&lt;br&gt;
&lt;br&gt;
One thing to consider is that a Tag&apos;s meaning is the Tag itself.  There isn&apos;t any other data really associated to the tag.  Unless you see a point in having each Tag in the system defined as a unique entity, then there&apos;s no point for the 3rd table other than satisfying certain aspects of academic correctness.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572759</guid>
		<pubDate>Tue, 25 Apr 2006 00:06:38 -0800</pubDate>
		<dc:creator>C.Batt</dc:creator>
	</item><item>
		<title>By: C.Batt</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572762</link>	
		<description>GAH.  Okay, I skipped the part where you mentioned &quot;pre-determined list&quot; of tags.  If you want to present a pick-list to the user, go with 3 tables as it&apos;ll ease the creation of the list.  If you want to let them assign tags &quot;free form&quot;, go with 2 tables.&lt;br&gt;
&lt;br&gt;
/ d&apos;oh!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572762</guid>
		<pubDate>Tue, 25 Apr 2006 00:15:04 -0800</pubDate>
		<dc:creator>C.Batt</dc:creator>
	</item><item>
		<title>By: Savvas</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572771</link>	
		<description>Rather than having the tags in a database of their own, I have them held in an array within the script itself. When I want to extract information from the tag table for a particular file, I&apos;m doing this:&lt;br&gt;
&lt;br&gt;
select * from tags where fileid=$id_of_file;&lt;br&gt;
&lt;br&gt;
and then just parsing the information that gets returned.&lt;br&gt;
&lt;br&gt;
I don&apos;t know anything about joins or references or anything like that, my SQL knowledge is relatively basic and limited to putting stuff in and getting stuff out, so I have no idea if this is a more efficient way of doing things.&lt;br&gt;
&lt;br&gt;
I&apos;m intrigued by the meaning of &quot;on delete cascade&quot; that you mention in your post, nicwolff. Does this imply that when a particular record gets removed from the file table, its associated tags are removed automatically also? If so, that&apos;s fucking great and I need to do that.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572771</guid>
		<pubDate>Tue, 25 Apr 2006 00:49:04 -0800</pubDate>
		<dc:creator>Savvas</dc:creator>
	</item><item>
		<title>By: Savvas</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572772</link>	
		<description>Oops, I meant tags in a table, not database, of their own.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572772</guid>
		<pubDate>Tue, 25 Apr 2006 00:49:59 -0800</pubDate>
		<dc:creator>Savvas</dc:creator>
	</item><item>
		<title>By: flabdablet</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#572923</link>	
		<description>IIRC, it means that if a record is removed from the files table (or the tags table, because cascade deletion is set for both), the records in the file_has_tag table that reference it also get removed.&lt;br&gt;
&lt;br&gt;
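A quick way to see the cascade in action is the sketch below. It assumes Python&apos;s built-in sqlite3 module and made-up sample rows. Note that SQLite only enforces foreign keys once the pragma is switched on, and in MySQL, as far as I know, they are only enforced on InnoDB tables.&lt;br&gt;
&lt;br&gt;
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys (and so cascades) unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE files (file_id INTEGER PRIMARY KEY, file_name TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE file_has_tag (
    file_id INTEGER REFERENCES files(file_id) ON DELETE CASCADE,
    tag_id  INTEGER REFERENCES tags(tag_id) ON DELETE CASCADE
);
INSERT INTO files VALUES (12, 'standup.mp3'), (13, 'lecture.mp3');
INSERT INTO tags VALUES (1, 'comedy'), (2, 'diy');
INSERT INTO file_has_tag VALUES (12, 1), (12, 2), (13, 1);
""")


def link_count(conn):
    """Number of rows currently in the join table."""
    return conn.execute("SELECT COUNT(*) FROM file_has_tag").fetchone()[0]


before = link_count(conn)  # three links to start with

# Deleting file 12 also removes its two rows from file_has_tag.
conn.execute("DELETE FROM files WHERE file_id = 12")
after_file_delete = link_count(conn)

# Deleting the 'comedy' tag removes the last remaining link (file 13).
conn.execute("DELETE FROM tags WHERE tag_id = 1")
after_tag_delete = link_count(conn)

print(before, after_file_delete, after_tag_delete)  # prints 3 1 0
```
&lt;br&gt;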
You can still have files with no tags, or tags with no files, but you won&apos;t get stale records in your file_has_tag table that point to nonexistent files or tags.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-572923</guid>
		<pubDate>Tue, 25 Apr 2006 07:45:02 -0800</pubDate>
		<dc:creator>flabdablet</dc:creator>
	</item><item>
		<title>By: reklaw</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#573098</link>	
		<description>It&apos;s worth pointing out that if you have a predetermined list, what you&apos;re doing isn&apos;t really &quot;tags&quot; -- it&apos;s categories, or sections. The whole point of tags is that they are freeform.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-573098</guid>
		<pubDate>Tue, 25 Apr 2006 10:35:55 -0800</pubDate>
		<dc:creator>reklaw</dc:creator>
	</item><item>
		<title>By: Savvas</title>
		<link>http://ask.metafilter.com/36957/Whats-the-best-way-to-set-up-a-tagging-system#573608</link>	
		<description>Yeah but if you don&apos;t tell them then I won&apos;t, reklaw.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36957-573608</guid>
		<pubDate>Tue, 25 Apr 2006 20:04:01 -0800</pubDate>
		<dc:creator>Savvas</dc:creator>
	</item>
	</channel>
</rss>
