Feb 062010

I have about 14,000 data points collected from backups. I started in Oct 2007, but have not done anything with the data. I’d like your help.

I have put the data into a PostgreSQL database and dumped it. The dump is here: http://langille.org/tmp/dlt_stats.sql.gz

The table looks like:

$ psql dan
psql (8.4.2)
Type "help" for help.

dan=# \d dlt_stats
                                      Table "public.dlt_stats"
   Column    |            Type             |                       Modifiers
 id          | integer                     | not null default nextval('dlt_stats_id_seq'::regclass)
 backup_date | timestamp without time zone | not null
 dlt_id      | text                        | not null
 errors      | integer                     | not null


The dlt_id is the label on the tape. The errors field presents the number of correctable errors per GB for that backup job. I run a script after each Bacula backup job which grabs the data from the SCSI device. I then toss that data into a flat file, which I have since imported into PostgreSQL

What I’d like is graphs on a webpage. Use whatever tools you want. I’d like to be able to select a label, and see the results. I think a line graph is best for this, but you do what you prefer.

Bonus points for pretty, simple, and elegant. Your results will be published unless otherwise requested by you. I’ll use the results of the graphs to decide what tapes to toss, what tapes to use for full backups etc.

Thank you.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

  5 Responses to “Graphing”

  1. That gives me a first impression, is that something you’d base the work on?

    dlt=# select dlt_id, count(*), min(backup_date), max(backup_date), min(errors), max(errors), avg(errors)
    from dlt_stats
    group by dlt_id order by 7, 6 desc, 5;

    Then I’d have a look at http://matplotlib.sourceforge.net/, there are some recipes to be using it in a web application. There’s a default interactive rendering that could be easier to use from the command line, though.

  2. Good point.

    My initial goal was something like this:

    select dlt_id, backup_date, errors from dlt_stats order by dlt_id, backup_date;

    Then plot each dlt_id on it’s own graph. This would show any trends over time.

    It would be interesting to see your data set plotted against each other.

    The min/max and avg is interesting. Please ignore any zero values. They will be invalid (the batch job failed to record the actual errors).


    select dlt_id, count(*), min(backup_date), max(backup_date), min(errors), max(errors), avg(errors)
    from dlt_stats where errors != 0
    group by dlt_id order by 7, 6 desc, 5


  3. If you have pl-R installed and you’re running the web server on the same machine as the DB server (or are willing to fool around with NFS / SMB type stuff), you can use R it to draw your graphs for you.

  4. Yes, they are both on the same machine.

  5. Hi,

    My solution is a new approch, hope that it’s good for you.

    I use, XML feature from postgresql, XSLT processing and SVG for graphics rendering.

    You can see a result with a SVG compatible browser (Firefox 3.6 for example) here : http://tmafg.free.fr/bordel/hornet/postgresql/result.xml

    The idea is to generate an XML compatible with my XSLT file, so the request is :
    SELECT query_to_xml(‘SELECT * FROM dlt_stats WHERE dlt_id=”DLT7000-002” AND errors0 ORDER BY backup_date;’, false, false, ”);

    where you have to concatenate with “”

    Process the xml generated with the xsl file located here (http://tmafg.free.fr/bordel/hornet/postgresql/result.xsl) and voila !!!