Searching Questions (about CPAN)

Over recent months, search.cpan.org (s.c.o) has been down quite a lot. Lots of Perl people probably don’t notice: they, like me, use MetaCPAN by default, and quite possibly have browser shenanigans set up to munge s.c.o URLs into MetaCPAN ones. (For those wanting to do the same, Dave Cross has an excellent explanation of how to set it up in Firefox, and how it all works, over at perlhacks.com.)

My worry, one that a number of people share judging by IRC discussions, is that those new-ish to Perl in particular take away a terrible impression of a famously “dying” community, built around a “dead” language, when they see that “CPAN is down”. CPAN isn’t down, of course, but since the page with the most google-juice when you search for any given module is down, I can forgive those who misunderstand. Again on IRC, we get a lot of people asking about this, and I’d say around 50% lead with “is CPAN down?”

Frustrating! So what’s to be done? Well, I’m not sure I’m in a position to decide that; that said, I’ve emailed (I think) the admins of s.c.o to see if there’s any way I can assist them in keeping their service up. That solves the immediate problem.

From those same conversations, however, there seems to be a growing body of opinion that MetaCPAN might be a better default experience anyway. I note it’s already the search engine you get if you start from http://cpan.org/, for instance. So what do we need to know before we can ask anybody to consider making and co-ordinating the necessary changes?

Credit to RJBS for suggesting these 4 questions, which I hope I’ve paraphrased accurately:

  1. What proportion of links to search.cpan.org that exist out in the wild would sco.metacpan.org cope with?
  2. Would the admins of search.cpan.org be happy to make the changes for it to continue to exist at a “legacy” hostname of some kind?
  3. Is search.cpan.org’s downtime really as big a problem as we perceive? Is MetaCPAN any more reliable?
  4. Does anybody in the community have other objections, and can they be persuaded that this solution is workable despite them?

I think question 1 is easy to test. The rewrite rules used by sco.metacpan.org are published on GitHub, and some crawling of the web (and dumps of Wikipedia?) should provide a corpus of URLs from which we can divine what formats exist in the wild. This corpus can be checked for distinct structures and compared against the rewrite rules. To chase out edge cases, some proportion of the URLs could also be tested for real, to make sure they return sane results. I plan to make a start on this, or at least on obtaining a corpus of URLs from the real world, over this coming weekend.
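
A minimal sketch of the "check for distinct structures" step might look something like the following. It is in Python purely for illustration. The shape names and patterns are hypothetical guesses at common s.c.o URL forms, not the actual sco.metacpan.org rewrite rules; the real rules would need to be substituted in before drawing any conclusions.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URL shapes, for illustration only. These are NOT the real
# sco.metacpan.org rewrite rules; swap the published rules in here.
SHAPES = [
    ("perldoc", re.compile(r"/perldoc")),
    ("search", re.compile(r"/search")),
    ("dist", re.compile(r"/dist/[^/]+(/.*)?")),
    ("author", re.compile(r"/~[^/]+/?")),
    ("author_file", re.compile(r"/~[^/]+/.+")),
]


def classify(url):
    """Return the name of the first shape whose pattern matches the whole path."""
    path = urlsplit(url).path
    for name, pattern in SHAPES:
        if pattern.fullmatch(path):
            return name
    return "unrecognised"


def tally(urls):
    """Count how many URLs in the corpus fall into each shape."""
    return Counter(classify(url) for url in urls)
```

Running `tally` over a harvested corpus and looking at the size of the "unrecognised" bucket would give a first estimate of the proportion of links that no rule covers. It says nothing about whether the covered ones redirect somewhere sane, which is why testing a sample for real still matters.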

It seems a little too soon to ask question 2; I’m going to leave that until we have some answers on the others, and think about broaching the subject (and finding out who I should broach it with!) at a later date. If anybody else wants to run with it, I promise I won’t be upset with them.

Question 3 seems easy to test, although it’ll take some time to get meaningful results. I plan to start tracking availability of both services from this weekend; I’ll make the results public, hopefully with some simple visualisations of uptime in the last (day, week, month, quarter), which will at least inform the discussion. I’ve recently been asked if s.c.o is really down as much as I think it is (dunno), and about MetaCPAN’s reliability, and this seems the cleanest way to settle those questions. Data (mostly) doesn’t lie.
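
As a rough sketch of what that tracking could involve (again in Python, for illustration only): poll each site periodically, record a timestamped up/down result, and report the fraction of successful checks per window. The function names, the 10-second timeout, the "any 2xx/3xx response counts as up" rule, and treating a month as 30 days and a quarter as 90 days are all assumptions made for the example, not a description of any existing monitoring setup.

```python
from datetime import timedelta
from urllib.request import urlopen

# Assumed window lengths; "month" and "quarter" are approximations.
WINDOWS = {
    "day": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "quarter": timedelta(days=90),
}


def probe(url, timeout=10):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def uptime(samples, now, window):
    """Fraction of successful checks in (now - window, now], or None if no data.

    `samples` is an iterable of (timestamp, ok) pairs, e.g. one per probe() call.
    """
    recent = [ok for timestamp, ok in samples if now - window < timestamp <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)
```

Note that this measures the fraction of checks that succeeded, not true wall-clock uptime, so the polling interval limits how short an outage it can see. Probing from a single location also cannot distinguish a site outage from a local network problem.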

Question 4 I simply cannot answer. But, perhaps you can? Leave a comment, or hassle me on IRC (jkg) or Twitter (jkg) and perhaps I can collate some opinions from the community, if people feel that would help move the conversation along.

I’m not pushing for this move to happen immediately; I also lack the clout and connections to make it happen at all. If you’re worried about search.cpan.org going away on the strength of this blogpost: please, don’t be! All I’m trying to do is facilitate the discussion about the future of CPAN search a little. I do have a preferred outcome (I wouldn’t be doing the work to preserve the status quo), but this is bigger than little ol’ me.

Customary radio silence now resumes.
