The information in the ETCHA Sketches consists of sketches collected from four domains: floor plans, geometry, family trees, and electrical circuits. In addition, some strokes are labeled with geometric primitives: line, arc, ellipse, polyline, polygon, and other. These labels were assigned with 4 different semantics:
Data set | Description |
Final label set | This is the set of strokes and labels that at least two labelers agreed on. It contains a combination of point and shape elements. |
Floor plans / Family tree / Geometry / Circuits | These are the full sketches with no labels. Each is just a series of sketch elements as described below. |
All of the labels | This includes multiple and sometimes contradictory labels for the strokes. It is useful for exploring how people differed in their labeling of the strokes. |
The sketches and labels are described in simple XML files. Currently we only have stroke classifications into primitive shapes, but our format is designed to represent higher-level domain structures as well.
The XML format described here is out of date. Please see the new format description at http://rationale.csail.mit.edu/ETCHASketches/format/.
There are two main tags: the sketch tag and the shape tag.
The sketch tag has one attribute, id, and several components, including sketcher, which is the author of the sketch, and study, which is the name of the study in which the sketch was collected. Any number of points can then be listed, each as a point element with id, x, y, and time attributes. The shape tag is used to collect the points together into strokes. Here is a quick example:
<sketch id="4715">
<sketcher>74</sketcher>
<study>family-trees</study>
<point id="1044912" x="441.0" y="121.0" time="113835076"/>
<point id="1044913" x="439.0" y="120.0" time="113835086"/>
...
<shape type="Stroke" id="23600">
<part type="POINT">1044912</part>
<part type="POINT">1044913</part>
...
</shape>
...
</sketch>
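As an illustration, this layout can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch rather than part of the distribution: it assumes the older format shown above (point elements directly under sketch, and shape elements of type "Stroke" listing their point ids in order), and the Sketch class and parse_sketch function are hypothetical names. The sample string is the example above with the elided parts removed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (x, y, time) for one sampled pen position.
Point = Tuple[float, float, int]


@dataclass
class Sketch:
    sketch_id: str
    sketcher: str
    study: str
    strokes: Dict[str, List[Point]] = field(default_factory=dict)


def parse_sketch(xml_text: str) -> Sketch:
    """Parse one <sketch> element (older format) into strokes of points."""
    root = ET.fromstring(xml_text)
    # Index every <point> child by its id so strokes can refer to it.
    points: Dict[str, Point] = {
        p.get("id"): (float(p.get("x")), float(p.get("y")), int(p.get("time")))
        for p in root.findall("point")
    }
    sketch = Sketch(root.get("id"), root.findtext("sketcher", ""),
                    root.findtext("study", ""))
    # Each <shape type="Stroke"> lists its points, in order, as <part type="POINT">.
    for shape in root.findall("shape"):
        if shape.get("type") == "Stroke":
            sketch.strokes[shape.get("id")] = [
                points[part.text.strip()]
                for part in shape.findall("part")
                if part.get("type") == "POINT"
            ]
    return sketch


SAMPLE = """<sketch id="4715">
<sketcher>74</sketcher>
<study>family-trees</study>
<point id="1044912" x="441.0" y="121.0" time="113835076"/>
<point id="1044913" x="439.0" y="120.0" time="113835086"/>
<shape type="Stroke" id="23600">
<part type="POINT">1044912</part>
<part type="POINT">1044913</part>
</shape>
</sketch>"""

sketch = parse_sketch(SAMPLE)
print(sketch.study, sketch.strokes["23600"])
# family-trees [(441.0, 121.0, 113835076), (439.0, 120.0, 113835086)]
```

Indexing the points first keeps the lookup for each stroke part constant-time, since a stroke only refers to its points by id.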
The shape tag is also used to represent the different labelings of the strokes. It can carry attributes for the condition it was labeled in (condition), the labeler it was labeled by (labeler), the type of the stroke (type), the sketch the stroke came from (sketch), and the study that the sketch came from (study). This example is from the final-labels.xml file and identifies stroke number 11262 as a polyline:
<SHAPE condition="Best" type="polyline"
study="circuits" sketch="128">
<PART type="stroke">11262</PART>
</SHAPE>
Here is an example from the all-labels.xml file, which additionally identifies the person assigning the label:
<SHAPE condition="Best" type="polyline"
study="circuits" sketch="128"
labeler="Anonymous--1404223902">
<PART type="stroke">11262</PART>
</SHAPE>
<SHAPE condition="Best" type="polyline"
study="circuits" sketch="128"
labeler="Anonymous--572464613">
<PART type="stroke">11262</PART>
</SHAPE>
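The all-labels.xml records can be used to check how often labelers agreed. The sketch below groups labels by stroke id and keeps every label chosen by at least two labelers, mirroring the description of the final label set; it is an illustration only, not necessarily how final-labels.xml was produced. The labels_by_stroke and agreed_labels functions, the condition filter defaulting to "Best", and the <labels> wrapper element around the sample records are all hypothetical, since the root element of all-labels.xml is not shown here.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict
from typing import Dict, List

# Records copied from the all-labels.xml example above; the <labels>
# wrapper element is hypothetical (the real root element is not shown).
ALL_LABELS = """<labels>
<SHAPE condition="Best" type="polyline"
study="circuits" sketch="128"
labeler="Anonymous--1404223902">
<PART type="stroke">11262</PART>
</SHAPE>
<SHAPE condition="Best" type="polyline"
study="circuits" sketch="128"
labeler="Anonymous--572464613">
<PART type="stroke">11262</PART>
</SHAPE>
</labels>"""


def labels_by_stroke(xml_text: str, condition: str = "Best") -> Dict[str, Dict[str, str]]:
    """Map stroke id -> {labeler: type} for SHAPE records in one condition.

    Keying by labeler makes each labeler count once per stroke (a later
    record from the same labeler overwrites an earlier one). Records with
    no labeler attribute all share the key "".
    """
    votes: Dict[str, Dict[str, str]] = defaultdict(dict)
    for shape in ET.fromstring(xml_text).iter("SHAPE"):
        if shape.get("condition") != condition:
            continue
        for part in shape.findall("PART"):
            if part.get("type") == "stroke" and part.text:
                votes[part.text.strip()][shape.get("labeler", "")] = shape.get("type")
    return votes


def agreed_labels(xml_text: str, min_agree: int = 2,
                  condition: str = "Best") -> Dict[str, List[str]]:
    """Return, per stroke, every label chosen by at least `min_agree` labelers."""
    result: Dict[str, List[str]] = {}
    for stroke_id, by_labeler in labels_by_stroke(xml_text, condition).items():
        counts = Counter(by_labeler.values())
        agreed = sorted(label for label, n in counts.items() if n >= min_agree)
        if agreed:
            result[stroke_id] = agreed
    return result


print(agreed_labels(ALL_LABELS))  # {'11262': ['polyline']}
```

Returning a list per stroke, rather than a single majority label, allows for the possibility that two different labels each reach the agreement threshold.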
All of the index IDs are consistent across the different datasets and with the database file, except for the point IDs.
path type that we use to represent strokes).
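Because stroke IDs, unlike point IDs, are shared across the files, labels can be attached to parsed strokes with a plain lookup on the stroke ID. A minimal sketch with a hypothetical helper, attach_labels, assuming the strokes have already been parsed into a stroke-ID-to-points mapping and the labels into a stroke-ID-to-label-list mapping:

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # whatever point representation the caller uses


def attach_labels(
    strokes: Dict[str, Sequence[P]],
    labels: Dict[str, List[str]],
) -> Dict[str, Tuple[Sequence[P], List[str]]]:
    """Pair each stroke's points with its labels, joined on the stroke id.

    Strokes with no labels get an empty list; label entries whose stroke id
    is not in `strokes` (for example, strokes from other sketches) are dropped.
    Point ids are never used as a join key, since they are not consistent
    with the database file.
    """
    return {sid: (pts, labels.get(sid, [])) for sid, pts in strokes.items()}
```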
To load the database, first install PostgreSQL version 7.4.6. Create a database:
createdb etchasketches
Then load the compressed SQL dump into it:
gunzip -c etcha-sketches.sql.gz | psql etchasketches
This will create several tables. The main tables are sketches, strokes, and labels. The other tables, which are referenced by these, are authors, classes, devices, hand_classified_strokes, pen_modifiers, and stroke_types.