Archives For 2009/02/28

At Mozilla, we need to understand how Firefox is used in the wild. Knowing what “typical” profiles look like, and having automated tests that attempt to model real-world situations, is a big plus for writing well-performing code.

Just in case anyone else needs to collect data about Firefox use or model “typical” user data for performance testing, here is how Drew and I quickly put together our “Places” toolkit.

The Sprint info page is here: https://wiki.mozilla.org/Firefox/Sprints/Places_DB_Creation_Scripts

We needed:

1. a client-side script that collects places.sqlite metrics

The client-side script is JavaScript written by Drew.

It runs a bunch of aggregate SQL queries against your Places SQLite database and posts the results to the collection URL: https://places-stats.mozilla.com/stats

and

2. a server-side script to generate a places.sqlite database based on the metrics we are collecting.

I focused on the database generation.
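
To make the collection half concrete, here is a minimal Python sketch of the kind of aggregate queries described above. It is not Drew's script (which is JavaScript and also posts the results); the in-memory table, its three columns, the sample rows, and the function name collect_metrics are all assumptions for illustration.

```python
import sqlite3

def collect_metrics(conn):
    """Run a few aggregate queries and return the results as a dict."""
    cur = conn.cursor()
    metrics = {}
    metrics["place_count"] = cur.execute(
        "SELECT COUNT(*) FROM moz_places").fetchone()[0]
    metrics["avg_visit_count"] = cur.execute(
        "SELECT AVG(visit_count) FROM moz_places").fetchone()[0]
    metrics["max_visit_count"] = cur.execute(
        "SELECT MAX(visit_count) FROM moz_places").fetchone()[0]
    return metrics

# Stand-in for a real places.sqlite: an in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_places "
    "(id INTEGER PRIMARY KEY, url TEXT, visit_count INTEGER)")
conn.executemany(
    "INSERT INTO moz_places (url, visit_count) VALUES (?, ?)",
    [("http://a.example/", 1), ("http://b.example/", 3), ("http://c.example/", 8)])

metrics = collect_metrics(conn)
print(metrics)
```

Only summary numbers leave the machine in this sketch, never the URLs themselves, which is the point of collecting aggregates.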

For now, the goal of the generator is to create a test (mock) SQLite database with as many records as we wish, or one sized from the min, max, or average of the stats that users post to the places-stats collection URL.

So the basic flow is:

1. have users visit https://places-stats.mozilla.com and run the collection script.
2. get a large number of users (and varied types of users) posting their stats to the collection url
3. be able to produce a “power user”, “average user”, and “light user” places.sqlite database on the fly from data hosted at places-stats.mozilla.com
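
Step 3 can be sketched roughly as follows, assuming each submission is a dict of counts and that “light”, “average”, and “power” map to the minimum, mean, and maximum of each metric. That mapping, the metric names, and the sample numbers are assumptions for illustration, not collected data.

```python
from statistics import mean

def build_profiles(submissions):
    """Map light/average/power profiles to the min/mean/max of each metric."""
    keys = submissions[0].keys()
    return {
        "light": {k: min(s[k] for s in submissions) for k in keys},
        "average": {k: round(mean(s[k] for s in submissions)) for k in keys},
        "power": {k: max(s[k] for s in submissions) for k in keys},
    }

# Made-up sample submissions, not real collected stats.
submissions = [
    {"places": 1200, "bookmarks": 40},
    {"places": 50000, "bookmarks": 900},
    {"places": 9000, "bookmarks": 260},
]

profiles = build_profiles(submissions)
print(profiles["power"])
```

Each resulting profile is just a set of target row counts that a generator could then fill with mock rows.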

I wrote a Python script for the aggregate data collection and database generation.

To make this an easy, fast exercise in software re-use, I used Django’s db module to reverse engineer the Places schema into a set of Python models.

Once you have Django set up, you can run the famous ‘manage.py inspectdb’, which reads your SQLite database schema and outputs the corresponding django.db model classes.

It’s trivial to inject new rows into the database using django.db:


place = MozPlaces(
    url=my_url,
    title=my_title,
    rev_host=reverse_host(my_url),
    visit_count=1,
    hidden=0,
    typed=1,
    favicon=new_favicon(),
    frecency=1)
place.save()

(‘MozPlaces’ is a django.db ORM class)
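
The reverse_host() helper is called above but not shown. Here is a minimal sketch, assuming rev_host holds the hostname with its characters reversed plus a trailing dot; check that format against a real places.sqlite before relying on it. It is written for Python 3 and is a guess at the helper, not the code from the patch.

```python
from urllib.parse import urlsplit

def reverse_host(url):
    """Return the URL's hostname reversed, with a trailing dot.

    Assumption: this mirrors the rev_host format; it is not the patch's code.
    """
    host = urlsplit(url).hostname or ""
    return host[::-1] + "."

rev = reverse_host("http://www.mozilla.org/firefox")
print(rev)
```

Storing the host reversed lets a prefix match on rev_host find every page under a domain and its subdomains.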

Wow, that was easy, but wait, there is more to do.

We are not even attempting to generate ‘real’ place data; we just want the rows in the database to seem real. We can generate random host, domain, and TLD data like this:


import random
import string

# Assumption: ALPHA is not shown in the original; lowercase ASCII fits the output.
ALPHA = string.ascii_lowercase

def url_parts():
    """
    Return a dictionary like: {'proto': 'http',
                               'host': 'www',
                               'domain': 'foo',
                               'tld': 'com'}
    """
    protocol = ['https', 'http', 'ftp']
    # random.sample picks without replacement, so no letter repeats within a part
    host_len = random.randint(4, 26)
    host = "".join(random.sample(ALPHA, host_len))
    domain_len = random.randint(2, 26)
    domain = "".join(random.sample(ALPHA, domain_len))
    tld_len = random.randint(2, 3)
    tld = "".join(random.sample(ALPHA, tld_len))
    proto = random.choice(protocol)
    return {'proto': proto, 'host': host, 'domain': domain, 'tld': tld}
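
The URLs in the sample output have the shape proto://host.domain.tld/ followed by a long hex token. How generate.py builds that token is not shown, so this sketch uses a random UUID as a stand-in; build_url and its path parameter are hypothetical names.

```python
import uuid

def build_url(parts, path=None):
    """Join the pieces from a url_parts()-style dict into a URL string."""
    if path is None:
        # Assumption: any unique token will do as the path component.
        path = uuid.uuid4().hex
    return "{proto}://{host}.{domain}.{tld}/{path}".format(path=path, **parts)

parts = {"proto": "http", "host": "www", "domain": "foo", "tld": "com"}
example = build_url(parts, path="abc")
print(example)
```

A unique path keeps every generated URL distinct even if two random hosts happen to collide.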

Python’s random module has a ton of cool features. Output from the program shows that we end up with crazy-looking hosts:


% python builddb/generate.py

h = httplib2.Http(os.tmpnam())
########################################################
Creating 131901 Places
Creating about 191594 History Visits
Creating about 12779 Bookmarks
Creating 101 Keywords
Creating 2173 Input History Records
########################################################
131901
Place #1 created
https://rmxwunibhvqzgjfclasypedko.zjrlundpaocs.kc/00000120269538042dedec07007f000000010001
Place #2 created
http://hlbgtm.wjxbdquyraotliek.au/000001202695391f62a5444e007f000000010001
Place #3 created
http://zdlxfpavecirty.urjawdvzoxgqemcikl.fp/00000120269539d794891209007f000000010001
Place #4 created
http://viwzykb.ofwxjmvltr.oa/0000012026953ab4b233317e007f000000010001
Place #5 created
https://yphswltjfmrbqogcd.qvd.ozd/0000012026953b539bc78b95007f000000010001
Place #6 created
ftp://pncqvksgazieuhdlofwxrtbymj.oekt.rbk/0000012026953c1f28a069ce007f000000010001
Place #7 created
http://lsmqeaojpxibvgnukwztcryhfd.isryhudzoeqjxtcankfgm.sg/0000012026953ca74487966d007f000000010001

My favorite site of the lot is “yphswltjfmrbqogcd.qvd.ozd” :)

The generation script populates “Places”, History, Bookmarks, Favicons, Input History and Keywords. I still have a few more entity types to generate, but this is sufficient for the testing we need to do now.

The current patch is here: https://bug480340.bugzilla.mozilla.org/attachment.cgi?id=367263

The bug is here: https://bugzilla.mozilla.org/show_bug.cgi?id=480340

The basic lesson learned is that you can build an effective, one-off data collection/metrics tool quickly and easily. I am sure others at Mozilla need tools like this, so do not hesitate to ping me with questions.

There has been a lot of discussion about ORMs, web frameworks, and MozStorage on the Mozilla newsgroups lately. Coincidentally, I have been slogging through using MozStorage with Places (bookmarking) code in my day-to-day work. I really miss my days of lazy, lazy ORM-y development, you know, Django models:


my_old_macs = Computer.objects.filter(model__exact='Mac IIci').order_by('-date_aquired')

The result object ‘my_old_macs’ is a wrapped query that has not executed yet. Once you begin iterating, it executes and returns the rows as Computer objects.


for mac in my_old_macs:
    print(mac.model)
    print(mac.nickname)
    print(mac.date_aquired)

Ahhh, the beauty and simplicity. Here is the Model reference.
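
The “wrapped query that has not executed yet” idea can be sketched in a few lines of plain Python with sqlite3. This is a simplification to show the lazy pattern, not how Django implements its query sets; the LazyQuery class, the table, and the sample rows are made up for illustration.

```python
import sqlite3

class LazyQuery:
    """Hold SQL and parameters; run them only when iterated."""

    def __init__(self, conn, sql, params=()):
        self.conn = conn
        self.sql = sql
        self.params = params
        self.executed = False

    def __iter__(self):
        # The database is touched here, not in __init__.
        self.executed = True
        return iter(self.conn.execute(self.sql, self.params).fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computer (model TEXT, nickname TEXT)")
conn.executemany("INSERT INTO computer VALUES (?, ?)",
                 [("Mac IIci", "Wedge"), ("Mac IIcx", "Biggs")])

query = LazyQuery(conn,
                  "SELECT model, nickname FROM computer WHERE model = ?",
                  ("Mac IIci",))
ran_before = query.executed   # nothing has hit the database yet
rows = list(query)            # iterating triggers the SELECT
ran_after = query.executed
print(ran_before, ran_after, rows)
```

Deferring execution like this is what lets a filter and an ordering be chained together before any SQL is sent.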

I need this kind of easy-to-use (and yet sophisticated) ORM-style database connectivity in Firefox for the 60%+ of the time when a simple, bloated ORM does the trick.

There is a related bug in Bugzilla.

I have spent a lot of time lately (mostly weekends) hacking on some very buggy and naive ORM code that mimics Django, a little :)

I have attached it to bug 394372.

I would love some feedback. I know I am doing some things wrong and bad, but I think I have some good concepts fleshed out.

Here is the basic usage:


var id = new Field('id','INTEGER',null,false,true,true,null);
var make_model = new Field('make_model','VARCHAR',128,false,false,false,null);
var fields = [id,make_model];
var computer = new Model(fields,'myDbTable');
var models = [computer];
var orm = new Orm('computers.sqlite',models);

// create the database:
orm.createDB();

// let's insert:
computer.save({make_model:'Mac IIci'});
computer.save({make_model:'Mac IIcx'});
computer.save({make_model:'Mac IIvx'});

// get a computer
var myIIci = computer.filter(['make_model__eq__Mac IIci']);

// Not working yet, but the style I am going for:

// update a computer
myIIci.save({make_model:'Mac IIci MK2'});

// delete a computer
myIIci.delete();

// JOIN query:

var nerdsWithIIcis = nerd.filter(['computer__make_model__eq__Mac IIci']);
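
The filter strings above pack a field, an operator, and a value into one 'field__op__value' token. Here is a rough Python sketch of how such a token could be turned into a parameterized WHERE fragment. It is not the code attached to the bug: the function name parse_filter and the operators other than eq are hypothetical, and the JOIN form (with a leading table name) is deliberately not handled.

```python
# Hypothetical operator table; only "eq" appears in the examples above.
OPERATORS = {"eq": "=", "gt": ">", "lt": "<"}

def parse_filter(expr):
    """Split 'field__op__value' into a WHERE fragment and its parameter."""
    # maxsplit=2 keeps any later double underscores inside the value.
    field, op, value = expr.split("__", 2)
    if not field.replace("_", "").isalnum():
        raise ValueError("unexpected field name: %r" % field)
    return "%s %s ?" % (field, OPERATORS[op]), value

clause, param = parse_filter("make_model__eq__Mac IIci")
print(clause, param)
```

Passing the value as a bound parameter rather than splicing it into the SQL string avoids quoting problems with values such as 'Mac IIci'.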

Let me know what you think. You can do pretty amazing things with Django, and yes, you still have to write SQL here and there for performance reasons.

I hang out in #places, nick: ddahl

Cheers!