Format of ETCHA Sketch files

This is a very preliminary format for this data. Please drop me a note if you have suggestions or complaints, or if you would like to contribute your sketches to the data set.

The ETCHA Sketches consist of sketches collected from four domains: floor plans, geometry, family trees, and electrical circuits. In addition, some strokes are labeled with geometric primitives: line, arc, ellipse, polyline, polygon, and other. These labels were assigned under four different conditions, each with its own semantics:

  1. Best: Best interpretation in isolation
  2. Context: Best interpretation in context
  3. IsA: Is the stroke a...  This condition allows more than one label per stroke.
  4. CanBeA: Can the stroke be considered a...  This condition allows more than one label per stroke.
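If you process the labels programmatically, it can help to encode these rules. The Python sketch below is a minimal illustration and is not part of the data set's tools. The names Condition, PRIMITIVES and check_labels are hypothetical. It assumes that the condition attribute uses the strings Best, Context, IsA and CanBeA, and that labels are the lower-case primitive names; only "Best" and "polyline" actually appear in the examples in this document. It treats Best and Context as allowing one label per stroke from each labeler.

```python
from enum import Enum


# Hypothetical model of the four labeling conditions. The string values are
# assumed to match the condition attribute in the XML files.
class Condition(Enum):
    BEST = "Best"        # best interpretation in isolation
    CONTEXT = "Context"  # best interpretation in context
    ISA = "IsA"          # "Is the stroke a...": several labels allowed
    CANBEA = "CanBeA"    # "Can the stroke be considered a...": several labels allowed

    @property
    def allows_multiple(self) -> bool:
        return self in (Condition.ISA, Condition.CANBEA)


# The geometric primitives used as labels (assumed lower-case, as in "polyline").
PRIMITIVES = {"line", "arc", "ellipse", "polyline", "polygon", "other"}


def check_labels(condition: str, labels: list) -> None:
    """Raise ValueError if one labeler's labels for one stroke break the rules."""
    cond = Condition(condition)  # raises ValueError for an unknown condition string
    unknown = set(labels) - PRIMITIVES
    if unknown:
        raise ValueError(f"unknown primitive(s): {sorted(unknown)}")
    if not cond.allows_multiple and len(labels) > 1:
        raise ValueError(f"{cond.value} allows only one label per stroke")
```

For example, check_labels("CanBeA", ["line", "polyline"]) passes, while check_labels("Best", ["line", "arc"]) raises ValueError.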
There are two ways to download the database: as a set of XML files or as a PostgreSQL dump file. Both are available from the download page.

XML files

There are six files in all. Together they comprise the ETCHA Sketches.

  1. Final label set (final-labels.xml): the strokes and labels that at least two labelers agreed on. It contains a combination of point and shape elements.
  2. Floor plans: the full sketches with no labels, simply a series of sketch elements as described below.
  3. Family tree: the unlabeled family-tree sketches, in the same format.
  4. Geometry: the unlabeled geometry sketches, in the same format.
  5. Circuits: the unlabeled circuit sketches, in the same format.
  6. All of the labels (all-labels.xml): every label assigned, including multiple and sometimes contradictory labels for the same stroke. It is useful for exploring how people differed in their labeling of the strokes.

The sketches and labels are described in simple XML files. Currently we only have classifications of strokes into primitive shapes, but the format is designed to represent higher-level domain structures as well.

The XML format described here is out-of-date. Please see the new format description at http://rationale.csail.mit.edu/ETCHASketches/format/.

There are two main tags: the sketch tag and the shape tag.

The sketch tag has one attribute, id, and several child elements: sketcher, which identifies the author of the sketch, and study, the name of the study in which the sketch was collected. Any number of point elements can follow, each with id, x, y and time attributes. The shape tag groups these points into strokes. Here is a quick example:

<sketch id="4715">
  <sketcher>74</sketcher>
  <study>family-trees</study>
  <point id="1044912" x="441.0" y="121.0" time="113835076"/>
  <point id="1044913" x="439.0" y="120.0" time="113835086"/>
  ...

  <shape type="Stroke" id="23600">
    <part type="POINT">1044912</part>
    <part type="POINT">1044913</part>
    ...
  </shape>
  ...
</sketch>
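The following is a minimal Python sketch for reading this structure with the standard library's xml.etree.ElementTree. It assumes that points and stroke shapes are direct children of sketch, as they are in the example above. parse_sketches is a hypothetical helper name. This page does not specify which element wraps several sketches in one file, so the code simply searches for every sketch element.

```python
import xml.etree.ElementTree as ET


def parse_sketches(xml_text: str) -> dict:
    """Return {sketch_id: {"sketcher", "study", "strokes"}}.

    "strokes" maps each stroke id to its points as (x, y, time) tuples,
    in the order the part elements list them.
    """
    root = ET.fromstring(xml_text)
    sketches = {}
    # iter() also yields the root itself, so this handles a bare <sketch>
    # document as well as several sketches under a wrapper element.
    for sk in root.iter("sketch"):
        points = {
            p.get("id"): (float(p.get("x")), float(p.get("y")), int(p.get("time")))
            for p in sk.findall("point")
        }
        strokes = {}
        for shape in sk.findall("shape"):
            if shape.get("type") != "Stroke":
                continue  # only stroke shapes group raw points
            strokes[shape.get("id")] = [
                points[part.text.strip()]
                for part in shape.findall("part")
                if part.get("type") == "POINT"
            ]
        sketches[sk.get("id")] = {
            "sketcher": sk.findtext("sketcher"),
            "study": sk.findtext("study"),
            "strokes": strokes,
        }
    return sketches
```

If you call parse_sketches on the example above with the "..." lines removed, it returns stroke 23600 as a list of its two (x, y, time) points.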

The shape tag is also used to represent the different labelings of the strokes. It can carry attributes for the condition under which the label was assigned (condition), the person who assigned it (labeler), the label itself (type), the sketch the stroke came from (sketch), and the study that sketch came from (study).

This example, from the final-labels.xml file, identifies stroke 11262 as a polyline under the Best condition:

<SHAPE condition="Best" type="polyline"
       study="circuits" sketch="128">
  <PART type="stroke">11262</PART>
</SHAPE>
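The label files use upper-case SHAPE and PART tags, whereas the sketch files use lower-case tags, so a reader should compare tag names case-insensitively. The sketch below is minimal. It assumes that every label SHAPE carries a condition attribute and refers to its strokes through PART elements with type="stroke". Label and parse_labels are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class Label(NamedTuple):
    stroke_id: str
    condition: str          # e.g. "Best"
    shape_type: str         # the primitive, e.g. "polyline"
    study: str
    sketch: str
    labeler: Optional[str]  # only present in all-labels.xml


def parse_labels(xml_text: str) -> List[Label]:
    """Return one Label per (SHAPE, stroke PART) pair in a label file."""
    root = ET.fromstring(xml_text)
    labels = []
    for shape in root.iter():
        # Label files use upper-case tags, so compare case-insensitively and
        # skip shapes without a condition (e.g. plain strokes in sketch files).
        if shape.tag.lower() != "shape" or shape.get("condition") is None:
            continue
        for part in shape:
            if part.tag.lower() == "part" and part.get("type") == "stroke":
                labels.append(Label(
                    stroke_id=part.text.strip(),
                    condition=shape.get("condition"),
                    shape_type=shape.get("type"),
                    study=shape.get("study"),
                    sketch=shape.get("sketch"),
                    labeler=shape.get("labeler"),
                ))
    return labels
```

On the final-labels.xml example above, this yields one Label for stroke 11262 with labeler set to None.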

Here is an example from the all-labels.xml file, which additionally identifies the person who assigned each label. In this case, two different labelers both labeled stroke 11262 as a polyline:
<SHAPE condition="Best" type="polyline"
       study="circuits" sketch="128"
       labeler="Anonymous--1404223902">
  <PART type="stroke">11262</PART>
</SHAPE>
<SHAPE condition="Best" type="polyline"
       study="circuits" sketch="128"
       labeler="Anonymous--572464613">
  <PART type="stroke">11262</PART>
</SHAPE>
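Because all-labels.xml records each labeler separately, you can compute agreement yourself. The sketch below keeps the (stroke, condition, type) combinations that at least two distinct labelers assigned. This mirrors the description of the final label set given earlier, but it is only an illustration and not necessarily the exact procedure used to produce final-labels.xml. agreed_labels is a hypothetical name. The code also assumes that the file has a single root element wrapping the SHAPE elements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def agreed_labels(xml_text: str, min_labelers: int = 2) -> set:
    """Return (stroke_id, condition, type) triples assigned by >= min_labelers labelers."""
    root = ET.fromstring(xml_text)
    who = defaultdict(set)  # (stroke, condition, type) -> set of labeler ids
    for shape in root.iter():
        if shape.tag.lower() != "shape" or shape.get("labeler") is None:
            continue
        for part in shape:
            if part.tag.lower() == "part" and part.get("type") == "stroke":
                key = (part.text.strip(), shape.get("condition"), shape.get("type"))
                who[key].add(shape.get("labeler"))
    return {key for key, labelers in who.items() if len(labelers) >= min_labelers}
```

When run on the two all-labels.xml entries above (wrapped in a single root element), it returns the single triple ("11262", "Best", "polyline").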

All of the ids are consistent across the different data sets and with the database file, except for the point ids.
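In practice, this means that stroke ids are the key for combining files. A label's PART type="stroke" value can be looked up among the shape type="Stroke" elements of the sketch files, but point ids should only be resolved inside the file in which they appear. The sketch below is minimal and assumes that a label's stroke id equals the id attribute of the matching stroke shape. label_strokes is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def label_strokes(sketch_xml: str, label_xml: str) -> dict:
    """Map each labeled stroke id to (number of points, list of assigned types).

    The join key is the stroke id, which is shared across files. Point ids are
    not, so points are counted only within the sketch file that defines them.
    Labels whose stroke is not in this sketch file are skipped.
    """
    point_counts = {
        shape.get("id"): len(shape.findall("part"))
        for shape in ET.fromstring(sketch_xml).iter("shape")
        if shape.get("type") == "Stroke"
    }
    joined = {}
    for shape in ET.fromstring(label_xml).iter():
        if shape.tag.lower() != "shape":
            continue
        for part in shape:
            if part.tag.lower() == "part" and part.get("type") == "stroke":
                sid = part.text.strip()
                if sid in point_counts:
                    entry = joined.setdefault(sid, (point_counts[sid], []))
                    entry[1].append(shape.get("type"))
    return joined
```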

PostgreSQL dump file

We are making the SQL file available because it has been a great resource for us. Unfortunately, because of the wide variety of databases, and even differences between versions of the same database, the only version I can support is the version of PostgreSQL that our group currently uses. The file should load into other versions of PostgreSQL but is unlikely to work with other databases, because PostgreSQL has native geometric data types that other databases lack (most notably the path type, which we use to represent strokes).

To load the database, first install PostgreSQL version 7.4.6. Then create a database and load the dump file into it:

createdb etchasketches
gunzip -c etcha-sketches.sql.gz | psql etchasketches

This creates several tables. The main tables are sketches, strokes, and labels. The remaining tables, which the main tables reference, are authors, classes, devices, hand_classified_strokes, pen_modifiers, and stroke_types.
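Strokes in the database are stored using PostgreSQL's path type. When a client returns a path as text, PostgreSQL prints an open path as [(x1,y1),...] and a closed path as ((x1,y1),...). The sketch below parses that text form into a list of points. It is based on the path output format described in the PostgreSQL documentation rather than on the dump itself. This page does not name the strokes column that holds the path, and parse_pg_path is a hypothetical helper.

```python
import re
from typing import List, Tuple

# One "(x,y)" pair; allows signs, decimals and exponents.
_POINT = re.compile(r"\(\s*([-+0-9.eE]+)\s*,\s*([-+0-9.eE]+)\s*\)")


def parse_pg_path(text: str) -> Tuple[List[Tuple[float, float]], bool]:
    """Parse PostgreSQL's text output for a path into (points, is_closed).

    Open paths print as [(x1,y1),...,(xn,yn)], closed ones as ((x1,y1),...).
    """
    s = text.strip()
    if s.startswith("["):
        closed = False
    elif s.startswith("(("):
        closed = True
    else:
        raise ValueError(f"not a path literal: {text!r}")
    points = [(float(x), float(y)) for x, y in _POINT.findall(s)]
    if not points:
        raise ValueError(f"no points found in: {text!r}")
    return points, closed
```

For example, parse_pg_path("[(441,121),(439,120)]") returns ([(441.0, 121.0), (439.0, 120.0)], False).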
Contact: Mike Oltmans