Automatic Sign Detection with Application to an Assistive Robot

This Jupyter Notebook describes the dataset used for the paper (to appear):

Shakeel, Amlaan, Peining Che, Xian Liu, Yamuna Rajasekhar, and John Femiani. 2018. “Automatic Sign Detection with Application to an Assistive Robot.” In AAAI-18 Workshop on Artificial Intelligence Applied to Assistive Technologies and Smart Environments (ATSE).
Data (tar.gz), Paper (PDF), Video, Poster

Or, as bibtex:

  title      = "Automatic Sign Detection with Application to an Assistive Robot",
  booktitle  = "{AAAI-18} Workshop on Artificial Intelligence Applied to
                Assistive Technologies and Smart Environments ({ATSE})",
  author     = "Shakeel, Amlaan and Che, Peining and Liu, Xian and Rajasekhar,
                Yamuna and Femiani, John",
  year       =  2018,
  conference = "AAAI18"

The notebook links to the data and gives guidance on how to read the data for your own applications.

NOTE: We have not released the code for this project, but the data is available.

Accessing the Data

In [56]:
%pylab inline
from PIL import Image
import os
from glob import glob
import xmltodict
import munch

Download the data (the tar.gz archive linked above) and extract it to a location of your choosing. Mine is the path shown below.

In [ ]:
annotations_root = '/mnt/S/Teams/Vision/aaai18/' #TODO: Change me to your local download location

The xml files with the annotation information are in an 'Annotations' subfolder.

In [12]:
xmls = glob(os.path.join(annotations_root, 'Annotations', '*'))

The images associated with each XML file are in the 'Images' folder, but there is some sloppiness about how we associate the image and the XML. For most images, the stem (basename without the extension) of the XML file and the corresponding image are the same; in that case the 'path' value recorded in the XML file is for the video that the images were extracted from. For some XML files, however, the image has a name that is not related to the XML file and you need to look at the XML 'path' data to figure it out.

This is reflected in the parse function below; first I look for an image with the same stem as the XML file and then, if that fails, I look at annotation.path from the XML file.

In [95]:
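def parse(xml):
    # Most images share the stem of their XML file.
    filename = os.path.basename(xml).replace('.xml', '.jpg')
    image_path = os.path.join(annotations_root, 'Images', filename)
    with open(xml, 'rb') as f:
        annotation = munch.munchify(xmltodict.parse(f)).annotation
    if not os.path.isfile(image_path):
        # Otherwise fall back to the file name recorded in annotation.path.
        filename = os.path.basename(annotation.path)
        image_path = os.path.join(annotations_root, 'Images', filename)
    result = munch.Munch()
    result.annotation = annotation
    result.image_path = image_path
    result.image = Image.open(image_path)
    # xmltodict returns a single item (not a list) when there is only one <object>.
    if not isinstance(annotation.object, list):
        annotation.object = [annotation.object]
    # PIL crop boxes are (left, upper, right, lower).
    result.cropped_images = [result.image.crop((int(o.bndbox.xmin), int(o.bndbox.ymin),
                                                int(o.bndbox.xmax), int(o.bndbox.ymax)))
                             for o in annotation.object]
    return result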
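
The two-step lookup described above can also be written as a small pure function, which makes it easy to check which rule an annotation ends up using. This is only a sketch in Python 3 with the standard library: the function name, the `available_images` argument, and the file names in the usage lines are made up for illustration and are not part of the dataset.

```python
import os

def resolve_image_name(xml_path, recorded_path, available_images):
    """Return the image file name for an annotation, or None if neither rule matches.

    `available_images` is a set of file names from the 'Images' folder
    (a hypothetical argument, used here so the function never touches the disk).
    """
    # Rule 1: the image has the same stem as the XML file.
    stem = os.path.splitext(os.path.basename(xml_path))[0]
    same_stem = stem + '.jpg'
    if same_stem in available_images:
        return same_stem
    # Rule 2: fall back to the file name recorded in the XML 'path' field.
    fallback = os.path.basename(recorded_path)
    if fallback in available_images:
        return fallback
    return None

# Made-up file names, for illustration only.
images = {'a.jpg', 'frame_0001.jpg'}
by_stem = resolve_image_name('/x/Annotations/a.xml', '/videos/clip.mp4', images)
by_path = resolve_image_name('/x/Annotations/b.xml', '/videos/frame_0001.jpg', images)
missing = resolve_image_name('/x/Annotations/c.xml', '/videos/clip.mp4', images)
```

Returning None rather than raising lets you count how many annotations, if any, have no matching image before you try to open anything.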

Let's look at an example. The last XML file in the list is from a video.

This shows how the annotation.object data from the XML file contains the name and bounding box of the objects in the image.

In [130]:
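example = parse(xmls[-1])
imshow(example.image)
for o in example.annotation.object:
    # Outline each bounding box; the red, unfilled styling is just a display choice.
    R = Rectangle((float(o.bndbox.xmin), float(o.bndbox.ymin)),
                  width=float(o.bndbox.xmax) - float(o.bndbox.xmin),
                  height=float(o.bndbox.ymax) - float(o.bndbox.ymin),
                  fill=False, edgecolor='red')
    gca().add_patch(R)
    # Label the box with the object's name.
    text(float(o.bndbox.xmin), float(o.bndbox.ymax), o.name, verticalalignment='top', color='red')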
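
If you would rather not depend on xmltodict and munch, the same fields can be read with the standard library. The sketch below is Python 3 and assumes only the layout implied by the fields used in this notebook (object, name, and bndbox with xmin/ymin/xmax/ymax). The XML snippet is made up for illustration and is not copied from the dataset, and `read_objects` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up annotation containing only the fields used in this notebook.
SAMPLE_XML = """
<annotation>
  <path>example.jpg</path>
  <object>
    <name>door</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>exit left</name>
    <bndbox><xmin>5</xmin><ymin>6</ymin><xmax>50</xmax><ymax>30</ymax></bndbox>
  </object>
</annotation>
"""

def read_objects(xml_text):
    """Return a list of (name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        # Same order as a PIL crop box: (left, upper, right, lower).
        coords = tuple(int(float(box.findtext(key)))
                       for key in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append((obj.findtext('name'), coords))
    return objects

objects = read_objects(SAMPLE_XML)
```

Because `findall` always returns a list, there is no need for the single-object special case that the xmltodict version has to handle.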

In order to inspect the data, let's group the objects by their name.

In the next cell I build a dictionary that associates each label with a list of (XML, index) pairs, with each pair holding the XML file and the index of an object within that file.

In [116]:
object_by_label = {}
for i, xml in enumerate(xmls):
    annotation = munch.munchify(xmltodict.parse(open(xml))).annotation
    annotation.object = annotation.object if isinstance (annotation.object, list) else [annotation.object]
    for j, o in enumerate(annotation.object):
        name = o.name
        if name not in object_by_label:
            object_by_label[name] = []
        object_by_label[name].append((xml, j))
    print '\rprocessed {} of {} files'.format(i, len(xmls)), 
8870 of 8871
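
The grouping step is easier to see on toy data. Here is a minimal sketch in Python 3 using collections.defaultdict; the `toy` dictionary stands in for the parsed XML files and its file names are made up.

```python
from collections import defaultdict

def group_by_label(annotations):
    """Group objects by label.

    `annotations` maps an XML file name to the list of object names it
    contains (a toy stand-in for the parsed annotation files).
    """
    groups = defaultdict(list)
    for xml, names in annotations.items():
        for j, name in enumerate(names):
            # Store where the object lives: which file, and which index in it.
            groups[name].append((xml, j))
    return dict(groups)

toy = {'a.xml': ['door', 'exit left'], 'b.xml': ['door']}
grouped = group_by_label(toy)
```

Storing (file, index) pairs instead of the cropped images keeps the dictionary small; the image is only loaded when a sample is actually displayed.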

Now we can see how many instances of each label we have:

In [117]:
for name in object_by_label:
    print name, ":", len(object_by_label[name])
exit left : 2091
exit forward : 3391
exit right : 914
exit both : 857
elevator : 282
elevator door : 588
door : 5039
stairs : 409
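
The labels are far from balanced. As a quick sketch (Python 3), the counts printed above can be turned into shares of the total; the numbers in `counts` are copied from that output rather than recomputed from the files.

```python
# Instance counts copied from the output above.
counts = {
    'exit left': 2091,
    'exit forward': 3391,
    'exit right': 914,
    'exit both': 857,
    'elevator': 282,
    'elevator door': 588,
    'door': 5039,
    'stairs': 409,
}

total = sum(counts.values())
# Print labels from most to least frequent with their share of all objects.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print('{:<14} {:>5} {:6.1%}'.format(name, n, n / total))
```

Doors alone make up a bit over a third of the labeled objects, which is worth keeping in mind if you train a classifier on this data.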

In order to get a sense of the data, let's look at some of the images (cropped to the objects).

In [131]:
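def show_samples(label, rows=5, cols=8):
    # Under %pylab, `random` is numpy.random, so this draws rows*cols indices.
    examples = random.choice(len(object_by_label[label]), rows*cols)
    for i in range(len(examples)):
        xml, j = object_by_label[label][examples[i]]
        # Show the j-th cropped object of this file in the next grid cell.
        subplot(rows, cols, i + 1)
        imshow(parse(xml).cropped_images[j])
        xticks([]); yticks([]);# xlabel(name, fontsize=8)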

In [124]:
show_samples('exit left')

In [132]:
show_samples('exit forward')

In [133]:
show_samples(u'exit left')
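
Note that random.choice under %pylab is numpy.random.choice, which samples with replacement by default, so the same object can appear more than once in a grid. If you want distinct examples, one option is to draw the indices without replacement. The sketch below is Python 3 with the standard library only; `sample_indices` and its `seed` argument are hypothetical names, not part of the notebook.

```python
import random

def sample_indices(n_items, rows=5, cols=8, seed=None):
    """Pick up to rows*cols distinct indices from range(n_items)."""
    rng = random.Random(seed)
    # Never ask for more samples than there are items.
    k = min(n_items, rows * cols)
    return rng.sample(range(n_items), k)

# For example, the 282 'elevator' instances counted earlier.
indices = sample_indices(282)
small = sample_indices(10)
```

With fewer items than grid cells the function simply returns every index once, in random order.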