Jarvis was a project to convince a youBot to act like Iron Man’s robotic helpers (at least for circuit boards). I built the object detection and grasp generation module, which finds the target board and identifies a grasp vector in real time while detecting and avoiding your hand.


System Dependencies

ROS Package Dependencies

How to Use

First, make sure all dependencies have been installed and that a Vicon (or other localization system) is broadcasting the pose of the Kinect.


roslaunch perception.launch

This will open an RViz window, ensure the Kinect localization is running, and start a slew of nodes. Most of them are associated with openni_launch and openni_tracker.

Once the nodes have started, stand in front of the Kinect in the psi “surrender” position so that the tracker can lock on to you. Hold the position until a green cylinder appears around your left forearm.
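
The green cylinder suggests one simple way to keep the arm out of the object search: discard any point that falls inside a cylinder running along the forearm. This is a rough sketch in plain Python, not the Jarvis code. It assumes the tracker supplies elbow and hand positions in the same frame as the point cloud, and the function names and the 6 cm radius are illustrative.

```python
def in_forearm_cylinder(point, elbow, hand, radius=0.06):
    """Return True if `point` lies inside the cylinder from `elbow` to `hand`.

    All positions are (x, y, z) tuples in metres, in one shared frame
    (an assumption; the real frames depend on the tracker setup).
    """
    axis = [h - e for h, e in zip(hand, elbow)]
    length_sq = sum(a * a for a in axis)
    if length_sq == 0.0:
        return False  # degenerate cylinder: elbow and hand coincide
    rel = [p - e for p, e in zip(point, elbow)]
    # Fraction of the way along the axis (0 = elbow, 1 = hand)
    t = sum(r * a for r, a in zip(rel, axis)) / length_sq
    if t < 0.0 or t > 1.0:
        return False  # beyond either end cap
    # Squared distance from the point to the cylinder axis
    dist_sq = sum((r - t * a) ** 2 for r, a in zip(rel, axis))
    return dist_sq <= radius * radius


def remove_forearm_points(points, elbow, hand, radius=0.06):
    """Keep only the points that are outside the forearm cylinder."""
    return [p for p in points
            if not in_forearm_cylinder(p, elbow, hand, radius)]
```

Running a filter like this before looking for the board would leave fewer points for the slower segmentation step to process.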

Take the PCB (or other flat object) in your left hand and hold it still or move it very slowly. Two things should appear on the screen: a red blob where the algorithm thinks the grabbable object is and a large vector indicating the grasp.
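
One plausible way to get from the red blob to a grasp vector, assuming the object is flat, is to fit a plane to the blob's points and use the plane normal as the approach direction. The sketch below is plain Python and is not taken from the Jarvis code. `fit_plane_normal` is a hypothetical helper, and pointing the normal back toward a sensor at the origin is an assumption about the coordinate frame.

```python
import math


def fit_plane_normal(points, squarings=8):
    """Return (centroid, unit normal) of the least-squares plane through
    a list of (x, y, z) points. The normal is the direction of least
    variance, found by power iteration (repeated matrix squaring)."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [[sum((p[i] - centroid[i]) * (p[j] - centroid[j]) for p in points) / n
            for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    if trace == 0.0:
        raise ValueError("all points are identical")
    # Shift so the smallest-variance direction of cov becomes the
    # dominant eigenvector of m.
    m = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
         for i in range(3)]
    for _ in range(squarings):
        m = [[sum(m[i][k] * m[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
        scale = max(abs(x) for row in m for x in row)
        m = [[x / scale for x in row] for row in m]  # avoid overflow
    # Every column now points along the dominant eigenvector; take the
    # longest one so the result is never a near-zero vector.
    columns = [[m[i][j] for i in range(3)] for j in range(3)]
    best = max(columns, key=lambda c: sum(x * x for x in c))
    norm = math.sqrt(sum(x * x for x in best))
    normal = [x / norm for x in best]
    # Assumption: the sensor sits at the origin, so flip the normal to
    # point from the object back toward the sensor.
    if sum(a * b for a, b in zip(normal, centroid)) > 0.0:
        normal = [-x for x in normal]
    return tuple(centroid), tuple(normal)
```

The centroid and normal together describe a candidate grasp: a point on the board and a direction from which to approach it. On noisy sensor data the fit would normally be wrapped in an outlier-rejection step such as RANSAC.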


In a delicious bit of irony, Kinect is built to work on Windows while ROS exists only on Linux, so you need a third-party Kinect driver. There are at least three:

  • OpenNI
  • libfreenect
  • OpenNI 2

Different packages depend on different drivers. Make sure you know which ones you need because installations of multiple drivers can interfere with each other.

Code Walkthrough

Full code (along with other components - human interaction, path planning, compliant motor control etc.): link

Gotchas and Warnings

OpenNI (the foundation that maintained the Kinect drivers behind many third-party Kinect applications) is defunct at the time of this writing (December 2014). PrimeSense, the company behind it, was acquired by Apple, and the OpenNI website has been shut down: link. There are some mirrors: link

Point cloud segmentation is slow. With this in mind, on future projects I would:

  • Do as much processing with OpenCV and 2D images as possible.
  • Use PCL GPU, which is currently in beta.
  • Set up the processing algorithm so it does segmentation as little as possible. This could mean segmenting once at the start, then using a particle filter (or another type of filter) to track the object instead of trying to find it anew every time step. Here’s a good example of this tactic: link
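
The tracking idea in the last item can be sketched as a bootstrap particle filter over the object's 3D centroid. This is a minimal illustration in plain Python, not the code used in Jarvis. The function names, the random-walk motion model, and the noise parameters are all assumptions, and `local_centroid` stands in for whatever cheap measurement replaces a full segmentation pass.

```python
import math
import random


def local_centroid(points, center, radius):
    """Cheap measurement: mean of the points within `radius` of `center`.
    Returns None when no point is close enough (track lost)."""
    near = [p for p in points
            if sum((a - b) ** 2 for a, b in zip(p, center)) <= radius ** 2]
    if not near:
        return None
    return tuple(sum(p[i] for p in near) / len(near) for i in range(3))


def particle_filter_step(particles, measurement, rng,
                         motion_std=0.01, meas_std=0.02):
    """One predict / weight / resample cycle on (x, y, z) particles."""
    # Predict: random-walk motion model (the object moves slowly).
    predicted = [tuple(c + rng.gauss(0.0, motion_std) for c in p)
                 for p in particles]
    # Weight: Gaussian likelihood of the measurement for each particle.
    weights = []
    for p in predicted:
        d_sq = sum((a - b) ** 2 for a, b in zip(p, measurement))
        weights.append(math.exp(-0.5 * d_sq / meas_std ** 2))
    if sum(weights) == 0.0:
        # Every particle is far from the measurement: re-seed around it.
        return [tuple(m + rng.gauss(0.0, meas_std) for m in measurement)
                for _ in particles]
    # Resample in proportion to weight.
    return rng.choices(predicted, weights=weights, k=len(particles))


def estimate(particles):
    """Point estimate of the object position: the particle mean."""
    n = len(particles)
    return tuple(sum(p[i] for p in particles) / n for i in range(3))
```

In a loop, a full segmentation would run once to seed the particles. After that, each frame calls `local_centroid` around the current `estimate` and feeds the result to `particle_filter_step`. The expensive search runs again only when `local_centroid` returns None.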

I think some kind of implicit or explicit model is essential for robotic grasping, and for AI in general. There are many ways to do this (see the grasping review paper).


Main PCL site:

Best Point Cloud Library resource I could find:

OpenNI 2: