In PHP, can you sort by a word which is not the first?
June 21, 2011 1:52 PM

PHPfilter: in a bit over my head: is there a native function to sort an array by a word which isn't always in a fixed position, or do I need to create one? (And if I need to create one, is this the best way to do it?)

I have a form set up to receive reports and process them, breaking each report into an array populated with individual records, and then splitting the records off into other arrays based on different criteria.

I need to be able to sort the final arrays based on words which are not necessarily in a fixed position in each element.

A simplified example: Here's an array in its original form (assume each element is on its own line):
JAdventure Stilton
JFantasy Abbott
JFantasy Osborne
JFantasy Toriyama
JFiction Blume
JFiction Walt
JHist Gonzalez
JHist Osborne
JMystery Erickson
JMystery Sobol
JScifi Dezago
JScifi Star
L.P. JFiction Park
L.P. JHist Tripp

Here's the form I need it to be in:
JFantasy Abbott
JFiction Blume
JScifi Dezago
JMystery Erickson
JHist Gonzalez
JFantasy Osborne
JHist Osborne
L.P. JFiction Park
JMystery Sobol
JScifi Star
JAdventure Stilton
JFantasy Toriyama
L.P. JHist Tripp
JFiction Walt

Is there a native function to do this kind of sorting?

Assuming the answer is no, would the following be the most efficient way to manage what I need done? (Forgive my pseudocode; I'm still just trying to wrap my head around the problem.)
1: RM "^JAdventure " if present:
	a. foreach($jmedia as $key => $value)
	b. if preg_match "^JAdventure " (
		i. preg_replace "^JAdventure " with nothing
		ii. $jmedia[$key][1] = "JAdventure "
		)
2. sort
3. foreach($jmedia as $key =>$value)
	a. if $jmedia[$key][1] then (
		i. add $jmedia[$key][1] as a string to the start of $jmedia[$key]
		ii. remove $jmedia[$key][1]
		)
posted by johnofjack to Computers & Internet (15 answers total) 1 user marked this as a favorite
 
I asked my boyfriend, who is a php whiz.

boyfriend: usort
boyfriend: but really, wtf, use a database
posted by Stephanie Duy at 2:07 PM on June 21, 2011


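So, as I understand it, you want to sort on the last word of each line? There's probably a more efficient way to do this, but in rough untested pseudo-ish code:

$keyedLines = array();
foreach ($lines as $line) {
  $words = explode(" ", $line);
  $last = $words[count($words) - 1];
  // append the whole line so two records with the same last word don't overwrite each other
  $keyedLines[$last . " " . $line] = $line;
}

ksort($keyedLines); // sorts in place by key; returns a boolean, not the sorted array
$sorted = array_values($keyedLines);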
posted by cgg at 2:07 PM on June 21, 2011 [1 favorite]


Major caveat: I'm not a PHP developer by trade.

It looks like you're shoving all kinds of data together into one field and using, e.g., the prefix J to indicate metadata. That's doable, but it gets ugly and awkward fast and goes wrong very easily. You can avoid the problem by storing your data as something other than one string per record.

My first reaction is that Stephanie Duy's boyfriend is exactly right and this is why we have databases.

If you really want to keep this as an in-memory data structure, then treating it as a general programming problem, you probably want a custom data structure, most likely implemented as a class: i.e., a "Record" object with a variable for Format (Book, LP, etc.), one for Author (Walt, Park), one for Genre (Fantasy, Scifi), and whatever other values you need to store, plus a custom sorting function that would sort on, say, Author. A quick bit of googling implies that you just need to define a comparison function for your class (the usort() examples in the PHP manual call theirs cmp()) and hand it to usort(), though as I say, I have minimal first-hand experience with PHP and should not be trusted farther than you can throw me.
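
A rough sketch of the Record idea, in case it helps. The class name, its three fields, and the compareByAuthor() method are all hypothetical names made up for illustration; nothing here comes from the poster's actual reports.

```php
<?php
// Hypothetical Record class: one object per report entry.
class Record
{
    public $format;  // e.g. "L.P.", or "" for a regular book
    public $genre;   // e.g. "JFantasy"
    public $author;  // e.g. "Abbott"

    public function __construct($format, $genre, $author)
    {
        $this->format = $format;
        $this->genre  = $genre;
        $this->author = $author;
    }

    // Comparator for usort(): returns a negative number, zero, or a
    // positive number depending on how the two authors compare.
    public static function compareByAuthor($a, $b)
    {
        return strcmp($a->author, $b->author);
    }
}

$records = array(
    new Record('', 'JAdventure', 'Stilton'),
    new Record('', 'JFantasy', 'Abbott'),
    new Record('L.P.', 'JFiction', 'Park'),
);

// Sort the objects in place by author.
usort($records, array('Record', 'compareByAuthor'));

foreach ($records as $r) {
    echo trim($r->format . ' ' . $r->genre . ' ' . $r->author), "\n";
}
```

Sorting on a different field would only mean writing another small comparator; the records themselves stay untouched.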
posted by Tomorrowful at 2:18 PM on June 21, 2011 [1 favorite]


Response by poster: Ah, sorry. This is coming from a database, from proprietary software which isn't very user friendly or customizable. It's a followup to an earlier question.

I'm doing this myself because the IT department has bigger fish to fry....

The author's name actually wouldn't be the last word of each element, either. An example element:
CD JFICTION KRULICK / Krulik, Nancy E
Katie Kazoo, Switcheroo. Drat! You copycat! ; Doggone it! [sound recor
item ID: 32054056852493 Date of discharge: 01/14/2011

I couldn't see how to get usort to do it, but that's probably just because I don't know what I'm doing.

On that site I did stumble onto array_multisort, though, which I think might work--dupe the array, remove cruft from one but not the other, multisort? Would that be incredibly inefficient?

... Hm, no, let me rephrase that: would that be as efficient as possible given the situation (unfriendly software, bloated uncustomizable reports, noob with a computer and a PHP form, etc.)?
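
For illustration, a minimal sketch of the "dupe the array, strip the cruft from the copy, multisort" idea. It assumes, as in the simplified example, that the sort word is the last word of each element; $records and $keys are made-up names, and the real reports would need a different rule for finding the author.

```php
<?php
$records = array(
    'JAdventure Stilton',
    'JFantasy Osborne',
    'JFiction Blume',
    'JHist Osborne',
    'L.P. JHist Tripp',
);

// Build a parallel array holding only the word to sort on
// (assumption: it is the last word of each element).
$keys = array();
foreach ($records as $record) {
    $words  = explode(' ', $record);
    $keys[] = end($words);
}

// Sort $keys and reorder $records the same way.
// Ties in $keys (the two Osbornes) are broken by comparing $records.
array_multisort($keys, SORT_ASC, SORT_STRING, $records, SORT_ASC, SORT_STRING);

print_r($records);
```

The original strings are never modified; only the throwaway $keys copy is stripped down.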
posted by johnofjack at 2:56 PM on June 21, 2011


It looks like your actual data is multiline. That changes the approach. What is producing this output now? Are you left with a .txt file you need to parse? Or do you have some existing PHP which is spitting this out?
posted by artlung at 3:20 PM on June 21, 2011


Response by poster: It's coming out incredibly verbose, as text. I'm cleaning it up in PHP, removing the parts of it that we never need and doing some rudimentary grouping and sorting.

Although the original is multiline, each record is indeed its own element in the arrays. The text dump is broken up by whitespace; the linebreaks are converted to HTML (if that matters).

It's ugly, it's subpar, it's inelegant ... I just want something that works and is relatively efficient given the absurd constraints.

The crux of the problem is that I work in a bureaucracy and this is what we're stuck with (except that when I'm completely caught up on everything else I can tinker with PHP and kludge together possible solutions).
posted by johnofjack at 3:45 PM on June 21, 2011


Your pseudocode is doing it the hard way. You can easily use explode to split strings into arrays (and implode to make them strings again), and array_pop to process the elements in an array. array_multisort() is handy but requires each column in its own array.

If you have an array of books like in your example in $a, something like this... I'm assuming that the category can be more than one word but the author is always one word.
$authorlist = array(); // blank array
$catlist = array();
foreach ($a as $line) {
 $b = explode(" ", $line); // split the line into words (separator goes first)
 $authorlist[] = array_pop($b); // put last element in author list
 $catlist[] = implode(" ", $b); // everything but last element since that was popped last line
}
array_multisort($authorlist, SORT_ASC, $catlist, SORT_ASC); // sorts by author first then category
Arrays in PHP are pretty inefficient but it shouldn't really matter unless you're processing boatloads of data and expecting them to show up on a webpage quickly. The alternative is putting your data into a lightweight SQL database like MySQL...
posted by neckro23 at 3:46 PM on June 21, 2011


Oh, I forgot to turn the arrays back into strings. Change SORT_ASC to SORT_DESC (reverse the sorting order) and you can use array_pop() to spit out the strings again (you have to reverse order because pop takes the element off the end of the array, not the beginning):
$newlist = array();
while (($author = array_pop($authorlist)) !== null) { // array_pop() returns null once the array is empty
 $category = array_pop($catlist);
 $newlist[] = $category . " " . $author; // or echo it or whatever you want to output here
}

posted by neckro23 at 3:51 PM on June 21, 2011


Don't use explode() or array_pop() or any of this stuff.

Stephanie's boyfriend is right, use usort. usort allows you to sort with an arbitrary callback function. Since you're asking for something that's "as efficient as possible" and you can't sort in the database, usort is your best option.
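
A minimal sketch of what that could look like for the simplified example, assuming the word to sort on is the last word of each line. Both function names (last_word and compare_by_last_word) are hypothetical helpers, not built-ins.

```php
<?php
// Hypothetical helper: returns the last space-separated word of a line.
function last_word($line)
{
    $words = explode(' ', $line);
    return end($words);
}

// Comparator: order two lines by their last word,
// falling back to the whole line when the last words match.
function compare_by_last_word($a, $b)
{
    $result = strcmp(last_word($a), last_word($b));
    return ($result !== 0) ? $result : strcmp($a, $b);
}

$lines = array(
    'JScifi Star',
    'L.P. JFiction Park',
    'JHist Osborne',
    'JFantasy Abbott',
    'JFantasy Osborne',
);

// usort() calls the comparator on pairs of elements and sorts in place.
usort($lines, 'compare_by_last_word');

print_r($lines);
```

For the real multiline records, only last_word() would need to change, to whatever rule picks the author out of each element.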
posted by holloway at 5:13 PM on June 21, 2011


Response by poster: I got array_multisort to work. It can handle a 100-page document in 0.30 seconds, so I'm calling that good enough.

Thanks for the suggestions, everyone.
posted by johnofjack at 5:18 PM on June 21, 2011


Response by poster: usort() still puzzles me. I'll take another look at the documentation; it looks useful even though I don't get it.
posted by johnofjack at 5:18 PM on June 21, 2011


usort is a callback function. You can implement an arbitrary sorting with it by returning either -1, 0, or 1 when it gives you two items.
posted by holloway at 5:23 PM on June 21, 2011


IANA PHP Developer, but usort() isn't a callback function, it just requires you to provide it one, traditionally known as a comparator. strcmp() is one example. If, for example, you wanted to do a case-insensitive sort (and didn't know about the prewritten one), you could write:


function lcmp($a, $b)
{
    // compare lowercased copies so case is ignored
    $a = strtolower($a);
    $b = strtolower($b);
    if ($a == $b) {
        return 0;
    }
    return ($a < $b) ? -1 : 1;
}


And then pass 'lcmp' as the second parameter to usort(). usort() will use your lcmp() to determine which string goes before another. There are many other kinds of sort you can implement with this. For example, you might imagine a sort where you don't treat accented characters as distinct.
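
For reference, the prewritten case-insensitive comparison is strcasecmp(). A short sketch of handing a comparator to usort() by name, using a made-up $names array:

```php
<?php
$names = array('walt', 'Abbott', 'sobol', 'Blume');

// strcasecmp() is PHP's built-in case-insensitive string comparison,
// so it can be passed to usort() directly as the comparator.
usort($names, 'strcasecmp');

print_r($names);
```

A hand-written comparator such as lcmp() is passed the same way, by its name as a string.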

"The crux of the problem is that I work in a bureaucracy and this is what we're stuck with."

I work as an Application Developer for a research university. 9 times out of 10 that proprietary software random dept bought can be accessed in the normal manner with SQL, if IT were inclined to let you. But sometimes they're not so you get a webcam pointed at the parking garage sign because it's easier than teaching whoever's officially "in charge" of the system to let your trivial webservice have read only access to a view within the database. On the plus side, one of the sysadmins really wanted an excuse to learn AI vision techniques.
posted by pwnguin at 8:44 PM on June 21, 2011 [1 favorite]


Of course usort isn't literally a callback function, but the whole point of it is to use a callback function, which is what it's called in PHP. It's the appropriate function when there's no simpler way of sorting with another inbuilt function.
posted by holloway at 8:52 PM on June 21, 2011


Depending on what else your code does, the "right" way to do this is probably going to be to parse that horrendous text blob into a more useful format when you pull it from the database. So...
$books = array(
    array(
        'genre' => 'Adventure',
        'author' => 'Stilton',
        'lp' => false
    ),
    array(
        'genre' => 'Fantasy',
        'author' => 'Abbott',
        'lp' => false
    ),
    // ...
);
And then write a usort callback to sort by the author field.
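
A minimal sketch of such a callback, assuming the array layout above. The function name by_author is made up, and the third book is filled in from the question's example list.

```php
<?php
$books = array(
    array('genre' => 'Adventure', 'author' => 'Stilton', 'lp' => false),
    array('genre' => 'Fantasy',   'author' => 'Abbott',  'lp' => false),
    array('genre' => 'Hist',      'author' => 'Tripp',   'lp' => true),
);

// Hypothetical comparator: order two book arrays by their 'author' field.
function by_author($a, $b)
{
    return strcmp($a['author'], $b['author']);
}

// Sorts $books in place using the comparator above.
usort($books, 'by_author');

foreach ($books as $book) {
    echo $book['author'], ' (', $book['genre'], ")\n";
}
```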
posted by robtoo at 11:23 PM on June 21, 2011

