Archives For 2009/12/31

JetCrawl

2010/01/16

In an effort to provide realistic data in places.sqlite, I wrote a data generator in Python that inserts many records into the Places database. That was a good start.

The data is entirely made up of random strings, so you end up with URLs like this: http://ffhjhfj.uwtgbz.wsc

The same goes for tags and so on.
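To give an idea of how such URLs come about, here is a rough sketch of a random URL builder. The original generator was written in Python and is not shown in this post, so the function names (randomWord, randomUrl) and the segment lengths are assumptions made purely for illustration.

```javascript
// Hypothetical sketch: this is NOT the original Python generator.
// Builds a string of lowercase letters using the supplied random source.
function randomWord(length, rand) {
    var letters = "abcdefghijklmnopqrstuvwxyz";
    var word = "";
    for (var i = 0; i < length; i++) {
        word += letters.charAt(Math.floor(rand() * letters.length));
    }
    return word;
}

// Assembles a fake three-part URL; the lengths 7, 6 and 3 are assumptions
// chosen to mirror the sample URL above.
function randomUrl(rand) {
    rand = rand || Math.random;
    return "http://" + randomWord(7, rand) + "." +
        randomWord(6, rand) + "." + randomWord(3, rand);
}

var sampleUrl = randomUrl();
```

Passing in the random source makes the output repeatable in a test, which the real generator may or may not have done.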

In order to make things more realistic and “testable” inside xpcshell, I created a crawler using Jetpack and standard Firefox XPCOM components. I have a feeling that QA might be interested in this as well, so I posted it to the Jetpack Gallery.

It is not configurable from the outside yet, but I have plans for that. I also plan to have JetCrawl surf every night for a set period to grow the collection of data I have to work with.

I need this data to craft a new Places Query API that is fast and well tested against a rather large collection of bookmarks and history. The URLs that JetCrawl uses are taken from the Alexa Top 100, so it is a common set to boot.

Automated tests will work better with this data, since the crawler hits predictable URLs and the actual Places APIs create the data in the first place.

If you want to try it out, please use a new profile. You can stop it by closing the tab. There is no UI as of yet.

For ages, I have relied on Douglas Crockford’s excellent supplant() for my string formatting needs in JS. It works great with whatever objects you already have hanging around in your code:

var myObj = { name: "Conan", surname: "The Barbarian" };

"{name} {surname}".supplant(myObj);

// result:

"Conan The Barbarian"
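Note that supplant() is not built into JavaScript, so the snippet above only runs once it has been defined. For reference, here is a sketch along the lines of the version Crockford published; treat the exact wording as an approximation rather than the canonical source.

```javascript
// Sketch of supplant(), approximating Crockford's published version.
String.prototype.supplant = function (o) {
    return this.replace(/\{([^{}]*)\}/g, function (match, key) {
        var r = o[key];
        // Only strings and numbers are substituted;
        // anything else leaves the {key} placeholder untouched.
        return typeof r === "string" || typeof r === "number" ? r : match;
    });
};

var myObj = { name: "Conan", surname: "The Barbarian" };
var greeting = "{name} {surname}".supplant(myObj);
```

Placeholders with no matching property are left in place, which makes missing data easy to spot in the output.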

This is great, reliable and all that…

But it is a bit verbose when you do not already have an object with the right properties. I looked for an existing alternative, but most were still too verbose, relying on hacks like “{$1} {$2}” or “$1 $2”, or worse.

I want a “printf” or Python-style string interpolation:

#python

"%s %s" % ("Conan", "The Barbarian",)

# result:

"Conan The Barbarian"

That’s what I want! Strings only is fine for now; it doesn’t have to be feature-rich. Eventually it would be nice if it could handle integers, floats and so on.

After some questions and answers on IRC in #js, I was able to get something rudimentary going. I started with supplant() and added support for arrays and plain string arguments.

Behold! String.printf():


String.prototype.printf = function (obj) {
    var args = arguments;
    var i = -1;
    // Positional mode is used for a single array or for plain string arguments.
    var useArguments = typeof args[0] === "string";
    if (obj instanceof Array || useArguments) {
        return this.replace(/%s/g, function (match) {
            i++;
            if (useArguments) {
                if (typeof args[i] === "string") {
                    return args[i];
                }
                throw new Error("Arguments element is an invalid type");
            }
            // Leave the placeholder alone if the array runs out of values,
            // rather than printing "undefined".
            return i < obj.length ? obj[i] : match;
        });
    }
    // Named mode: replace {key} with obj[key], exactly as supplant() does.
    return this.replace(/{([^{}]*)}/g, function (match, key) {
        var r = obj[key];
        return typeof r === "string" || typeof r === "number" ? r : match;
    });
};

Examples:

"{f} {b}".printf({f: "foo", b: "bar"});

"%s %s".printf(["foo", "bar"]);

"%s %s".printf("foo", "bar");

// all of which give this result:

"foo bar"
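As for the wish to handle integers and floats eventually, one possible direction is sketched below. This is not part of String.printf() above: formatTyped is a hypothetical standalone helper, and its %d and %f handling is an assumption that relies on parseInt and parseFloat rather than real printf-style precision.

```javascript
// Hypothetical helper, not part of the String.printf() shown in this post.
// Supports %s, %d, %f and a literal "%%".
function formatTyped(template) {
    var args = Array.prototype.slice.call(arguments, 1);
    var i = 0;
    return template.replace(/%([sdf%])/g, function (match, kind) {
        if (kind === "%") {
            return "%"; // "%%" becomes a single percent sign
        }
        if (i >= args.length) {
            throw new Error("Not enough arguments for format string");
        }
        var value = args[i++];
        if (kind === "d") {
            return String(parseInt(value, 10)); // truncates toward zero
        }
        if (kind === "f") {
            return String(parseFloat(value)); // no fixed precision, unlike C or Python
        }
        return String(value);
    });
}

var line = formatTyped("%s is %d years old, %f%% sure", "Conan", 30.9, 99.5);
// line is "Conan is 30 years old, 99.5% sure"
```

Keeping this as a plain function rather than another String.prototype method avoids clashing with the printf() above while the design is still in flux.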

Let me know if you think of more tweaks for this, or of any problems. It pains me every time I see concatenation in JS code. One question: which is more efficient, ‘+’ or String.replace()?
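One way to approach the efficiency question is simply to measure it. The sketch below is a rough timing harness and not a rigorous benchmark: timeIt is a hypothetical helper, the iteration count is arbitrary, and the numbers will vary by engine and input.

```javascript
// Rough timing sketch; timeIt is a hypothetical helper, not a real benchmark tool.
function timeIt(fn, iterations) {
    var start = Date.now();
    var last;
    for (var n = 0; n < iterations; n++) {
        last = fn();
    }
    return { ms: Date.now() - start, result: last };
}

var first = "foo";
var second = "bar";

// Plain concatenation.
var concat = timeIt(function () {
    return first + " " + second;
}, 100000);

// Placeholder substitution with String.replace() and a callback.
var replaced = timeIt(function () {
    var values = [first, second];
    var i = 0;
    return "%s %s".replace(/%s/g, function () {
        return values[i++];
    });
}, 100000);

console.log("concat: " + concat.ms + " ms, replace: " + replaced.ms + " ms");
```

Both variants build the same string, so any difference in the timings comes from the approach rather than the output.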

Cheers!