Do I really need to keep free tagging on my sites?
August 1, 2012 5:27 PM

It's 2012 and ten of my sites, all driven by syndicated content, have tagging. What value does tagging/folksonomy bring to the table, if any? Is it time to retire tagging?

I maintain a small network of ten Drupal sites. All of them use a freetagging vocabulary for posts alongside a traditional category vocabulary. The content on the biggest site is predominantly made up of blog entries about social media/marketing.

Content is submitted by users (sometimes directly, usually pulled in via RSS that has been voluntarily submitted), and curated by a small editorial team that doesn't make any effort to prune or normalize the tags the users add to their posts. As a result, we have about 27,000 rows in our term table for our largest site, almost all of which are tags. In some cases, contributors add no tags; in other cases I've seen as many as two dozen tags on a single eight- or nine-paragraph post.

I don't see much sign that there's an emergent taxonomy going on: There are a number of different conventions in use. Some tags are capitalized, some aren't; some are pluralized, some aren't; some use spaces, some don't; some start with Twitter-style hash marks, most do not. Because our contributors don't really think of themselves as writers for our sites, I don't think they have much incentive to self-police or organize. Because we really only expose tags by the byline/dateline and treat them as equal with categories for purposes of display, tags don't play a big part in the organization of the sites: There are no tag clouds, popularity-driven tag lists, etc. We also don't really expose them as something an individual user can follow or otherwise collect in a personalized space.

When I look at the numbers, our combined taxonomy indexes (categories and tags) comprise less than a half of a percent of actual traffic.

We don't have any interest in cleaning what we have up, either. It would be a huge task, and our editorial team would be on the hook for policing whatever we were left with. Some of that could be handled with simple rules (lower case, no space, no plurals) but I'm not sure that would be to any good end.

I did some analysis on the number of posts per tag and found that the vast majority of the tags in the database are applied to single posts: 12,602 tags are used just once, and 2,700 tags are used just 2-5 times.

I see the value of tags in certain circumstances. Certainly on a social bookmarking site I can see how they help surface bookmarks and allow communities to organize in the absence of a formal taxonomy tailored to them. I also see their value in organizing fast-moving bodies of information where there's a big universe of things and creating individual categories in a more rigidly maintained taxonomy would be onerous. I also understand that tags can be valuable for improving site indexation, and are possibly good for SEO (to the extent they generally constitute single-word links to pages full of content related to that keyword).

At the same time, I've got concerns about our situation: We have over ten thousand tag archive pages that have one post each, which means indexation probably isn't really being helped. Rather, I think we're inviting spiders to spend their time crawling down blind alleys.

So, I'm pretty close to pulling the trigger on tags and weathering whatever complaints may come, but I'm doing my due diligence here (and elsewhere) by asking what I might be missing. Anything? My gut tells me tagging was added as a "feature" at some point when that was simply what was done for an up-and-coming site that wanted to show how Web-savvy it was, so I'm more than ready to consign the feature to the same hole those little 80x15 buttons from 2003 have gone.
posted by mph to Computers & Internet (9 answers total) 6 users marked this as a favorite
 
Would it make sense to gauge what share of your search-driven traffic lands on the taxonomy index pages, rather than their share of overall traffic? Here on MeFi I find the tags handy (in fact essential for some purposes) but primarily use them when I'm searching for a remembered post.

Do you not do any "recommendation engine" type stuff like the "Related Questions" at the bottom of this page? I would think that something of that sort is valuable for retaining eyeballs and tags could be helpful in driving it.
posted by XMLicious at 5:42 PM on August 1, 2012 [1 favorite]


I find tags useful, so I'd be wary of throwing them out.

For the spiders, you could just change your robots.txt to suggest that search engines not index the tag pages.
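A rough sketch of what that could look like, checked with Python's standard-library robots.txt parser. The `/tags/` prefix, the example.com URLs, and the "ExampleBot" user agent are all assumptions for illustration; the real prefix depends on how the site's tag pages are aliased. Note that a Disallow rule asks well-behaved crawlers not to fetch those URLs, which is not quite the same thing as guaranteeing they stay out of an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/tags/" is an assumed URL prefix for tag index pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tags/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Tag archive pages should be off limits; ordinary posts should not be.
tag_page_allowed = parser.can_fetch("ExampleBot", "https://example.com/tags/social-media")
post_page_allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/some-post")

print("tag page crawlable:", tag_page_allowed)
print("post page crawlable:", post_page_allowed)
```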

You should match tags far more broadly. I mean, remove the # in #hashtags, and spaces, and use case-insensitive matches, and then see how many of those are used only once. Matching singular and plural is trickier; it's a known problem with known plans of attack, though I can't recommend anything offhand.
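Something like the following might be a starting point for that recount. It is only a sketch: the sample tags are made up, and each list entry stands for one application of a tag to a post. It deliberately leaves singular/plural alone.

```python
from collections import Counter


def normalize_tag(tag: str) -> str:
    """Drop leading '#' marks, remove all whitespace, and ignore case."""
    without_hash = tag.strip().lstrip("#")
    return "".join(without_hash.split()).casefold()


def single_use_count(usage: Counter) -> int:
    """Number of distinct tags that were applied exactly once."""
    return sum(1 for times_used in usage.values() if times_used == 1)


# Made-up sample data: one entry per tag application.
raw_tags = ["Social Media", "#socialmedia", "socialmedia", "SEO", "seo", "Drupal"]

raw_usage = Counter(raw_tags)
merged_usage = Counter(normalize_tag(tag) for tag in raw_tags)

print("single-use tags before matching:", single_use_count(raw_usage))
print("single-use tags after matching:", single_use_count(merged_usage))
```

If the single-use number drops a lot after matching, the tag pool is less fragmented than the raw count suggests.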
posted by Pronoiac at 7:57 PM on August 1, 2012


Best answer: You could remove all the tags that are only used once. That would accomplish 80% of your goal with 1% of the effort. It can probably be done with a few SQL commands by someone who knows their way around the Drupal SQL table structure (obviously make backups first, etc etc etc).
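To give an idea of the shape of it, here is a sketch against an in-memory SQLite database. The `term` and `term_post` tables and their columns are invented for illustration and are NOT Drupal's actual schema, which differs between Drupal versions; the real table and column names would need to be looked up and the queries adapted (and run against a backup first).

```python
import sqlite3


def delete_single_use_tags(conn: sqlite3.Connection, tag_vocabulary_id: int) -> int:
    """Delete tags in one vocabulary that are attached to exactly one post.

    Returns the number of tags removed. Table names are hypothetical.
    """
    single_use = [
        row[0]
        for row in conn.execute(
            """
            SELECT t.term_id
            FROM term AS t
            JOIN term_post AS tp ON tp.term_id = t.term_id
            WHERE t.vocabulary_id = ?
            GROUP BY t.term_id
            HAVING COUNT(*) = 1
            """,
            (tag_vocabulary_id,),
        )
    ]
    params = [(term_id,) for term_id in single_use]
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany("DELETE FROM term_post WHERE term_id = ?", params)
        conn.executemany("DELETE FROM term WHERE term_id = ?", params)
    return len(single_use)


# Tiny made-up data set: vocabulary 1 = categories, vocabulary 2 = free tags.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE term (term_id INTEGER PRIMARY KEY, vocabulary_id INTEGER, name TEXT);
    CREATE TABLE term_post (term_id INTEGER, post_id INTEGER);
    INSERT INTO term VALUES (1, 1, 'Marketing'), (2, 2, 'seo'), (3, 2, '#seo'), (4, 2, 'drupal');
    INSERT INTO term_post VALUES (1, 10), (2, 10), (2, 11), (3, 12), (4, 10), (4, 12);
    """
)

removed = delete_single_use_tags(conn, tag_vocabulary_id=2)
remaining = [name for (name,) in conn.execute("SELECT name FROM term ORDER BY term_id")]
print("removed:", removed, "remaining:", remaining)
```

Restricting the query to the tag vocabulary keeps a category that happens to have a single post from being deleted along with the tags.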

Honestly, the tags that are used 2-10 times are the MOST useful, because they help you find that ONE (or two or three) other articles on the site with related content.

So the ones used 2-5 times are not worthless, in fact they are the absolutely most useful of all.

The tags that are useless are the ones that are used only once *AND* the ones that are used dozens and dozens (or even worse, hundreds or thousands) of times--both are meaningless for opposite reasons.
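One way to see how a tag pool splits along those lines is to bucket tags by posts-per-tag. This is a sketch with made-up counts; the cutoff of 10 posts for the "connective" band is my assumption (roughly the 2-10 range above), and where exactly "too broad" begins is a judgment call.

```python
from collections import Counter


def band_for(posts_per_tag: int, max_useful: int = 10) -> str:
    """Classify a tag by how many posts carry it."""
    if posts_per_tag <= 1:
        return "single-use"  # also covers unused tags
    if posts_per_tag <= max_useful:
        return "connective"  # links a handful of related posts
    return "too-broad"  # behaves more like a category


def summarize(usage: dict[str, int], max_useful: int = 10) -> Counter:
    """Count how many tags fall into each band."""
    return Counter(band_for(count, max_useful) for count in usage.values())


# Made-up sample: tag name -> number of posts carrying it.
sample_usage = {
    "seo": 340,
    "#seo": 1,
    "klout": 3,
    "drupal": 7,
    "pinterest tips": 1,
    "marketing": 95,
}

bands = summarize(sample_usage)
for band in ("single-use", "connective", "too-broad"):
    print(band, bands[band])
```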

I agree with XMLicious that even though they may not constitute a huge percentage of use or hits, they may be used most of all by your most important users--the power users so to speak, who are really trying to find a specific piece of your content, re-find something they have seen before, or are vitally interested in the topic at hand, not just randomly and casually browsing.

Just for example, when **I** am trying to find a post or bit of information that I posted on our own web site, I most often find it by related tags or the 'similar pages' recommendations that are mostly created by the tags, followed by the search. In fact, what happens most often is I search and find a similar or related page and then drill down through the tags or related pages to find the exact page I was seeking. I suspect other power users use similar methods--that's certainly the type of thing I do when trying to find something on MF, for example.

FWIW we use the simple 'similar pages' drupal module to include 5 or so 'possibly related stories' at the bottom of each story or blog post. http://drupal.org/project/similar That means users get a list of related articles just at the moment they are thinking, "Hey, I've finished this article now, what should I do next?"
posted by flug at 8:34 PM on August 1, 2012


If there were some way to automatically detect and mark synonyms, that would make the tags far more useful--but as you say, with your number of tags it just isn't worth the effort to spend much time, even doing things like that.

Also, agree with Pronoiac that using robots.txt to keep search spiders out of all--or at least almost all--tag pages is wise.
posted by flug at 8:38 PM on August 1, 2012


I'm not seeing what the downside of keeping them is.
posted by gjc at 6:36 AM on August 2, 2012


Response by poster: Thanks, all.

One other downside I didn't mention is overall site performance: We can see where the queries to generate tag index pages are hammering us when a spider comes through, and the editing interface slows drastically when our editorial team tries to modify tags on the larger sites because it's trying to autocomplete from a giant pool. We also see transient memory errors in some parts of the admin interface when a cached taxo query expires.

XMLicious, I read that MeTa from a few days ago re: how MeFi generates recommended posts. I bookmarked it for future reference because it sounds really elegant. We use an external service right now, though. Probably one you've seen on other sites around the Web: It offers a mix of internal content and links to external content.

flug, thanks for the insight into how to interpret that node-per-tag analysis. I had an inkling of what you were saying about the most used ones: They pretty much parallel (in unnormalized fashion) what we're using as formal categories, anyhow. I hadn't really stopped to think about that 2-5-posts-per-tag grouping, though.

I think the jury is still out, but I'm feeling less certain that it's a no-brainer to ditch them.
posted by mph at 10:27 AM on August 2, 2012


There is a lot to respond to and I am headed out the door, but one place where tags are helping even in their current state is search. Tags often have higher relevance in search engines, but even when the value is flat, their linked value increases search relevance.

There are ways to clean up the tags that you have, but a better approach is to use an internal tag-stemming approach with internal search (tag, tags, tagging all seen as one by search). Look up Porter Stemming for a path to start to get there.
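To illustrate the idea only: the sketch below groups tags by a very crude suffix-stripped form. It is NOT the Porter algorithm, which has many more rules and handles far more cases; `crude_stem` is a hypothetical helper that only knows about "-ing" and plural "-s" on single-word tags, and it will mis-stem plenty of real words.

```python
def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer, not Porter."""
    stem = word.casefold()
    if stem.endswith("ing") and len(stem) > 5:
        stem = stem[:-3]
        # Undo consonant doubling: "tagging" -> "tagg" -> "tag".
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
    elif stem.endswith("s") and not stem.endswith("ss") and len(stem) > 3:
        stem = stem[:-1]
    return stem


def group_by_stem(tags: list[str]) -> dict[str, list[str]]:
    """Map each stem to the original tags that reduce to it."""
    groups: dict[str, list[str]] = {}
    for tag in tags:
        groups.setdefault(crude_stem(tag), []).append(tag)
    return groups


# Made-up sample tags.
groups = group_by_stem(["tag", "tags", "tagging", "Blogs", "blogging", "SEO"])
for stem, originals in groups.items():
    print(stem, "<-", originals)
```

The stored tags stay untouched here; only the search-side key changes, which is the appeal of stemming at query time over rewriting 27,000 rows.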

Also, folksonomy rarely will evolve into a taxonomy, but it does identify gaps in the existing taxonomy for updates there. But, the flat nature is helpful. If you track who has placed tags and cluster around that aspect, it can be helpful as well. Pure folksonomic tagging would also let people who aren't creating content tag. Author tagging is a little redundant as the terms they use are also often in the text.
posted by vanderwal at 3:28 PM on August 2, 2012


Response by poster: Wow, Thomas. What a pleasure to see you join this question! I attended one of your sessions at Networld Interop a few years back, in my past life as a tech journo.

After putting the idea of ditching tags in front of our editorial team, they pointed out that free tagging is useful to them for organizing short-term navigational aids we don't want to bake into the permanent category taxonomy. So we're looking at keeping a free tagging taxonomy but limiting access to the curators so they can more easily fulfill their trendspotting function without struggling against their contributors' inconsistent notions.
posted by mph at 5:00 PM on August 2, 2012


Mike, happy to give feedback. Also happy to chat offline.

It sounds like you have a good middle ground approach. It is always good to keep free tagging running. How it is used and displayed (along with to whom) is the craft portion.
posted by vanderwal at 9:51 AM on August 3, 2012


This thread is closed to new comments.