Telerobotics Bottlenecks

Note: this is the expanded prose form of a talk I did recently on bottlenecks in telerobotics.

Over the course of this talk, I want to argue for, and hopefully convince you of, five points. I’ve framed them as aggressively as possible, to make the risk of failure real.

Telerobotics is as worth paying attention to as the heavyweight topics in this workshop like climate change or aging.
Telerobotics is a tool to give people god-like abilities, not a subset of robotics that will be subsumed by advanced AI.
Despite decades of work and sweet demo videos, telerobotics is not a ‘solved’ problem.
Without intervention, general purpose telerobotics may never come to fruition. On the flipside, it’s possible to intervene with a coordinated research program so that it does.
We can take a definite approach towards these interventions by designing ARPA-like programs to coordinate and fund this work.

Telerobotics has a PR problem. To some of you, telerobotics is a meaningless word. Maybe it evokes a vision of something that looks like an iPad on a Segway, someone moving a small robot arm around while mirrored by a giant chonker that is clearly straight out of the 1950s, or a pair of tiny pincers moving at glacial speed to suture a wound. Even in the robotics world, telerobotics is a low-status backwater. It’s a seedy bar full of weird people you go to find a one night stand or get drunk at (or in this case publish a few papers) but definitely aren’t proud of. The problem is that people are weirdly stuck on what telerobotics means right now. They have either forgotten or never thought about what telerobotics could be.

One issue is simply that the word “telerobotics” means very different things to different people. “Telerobotics” is a pretty nebulous suitcase word¹: in different contexts it can mean everything from basically any remote-control system to only the zoom-on-wheels systems that you see in the wild. In addition to causing miscommunication, confusion about what telerobotics is makes it very hard to say what ‘working’ means, which is in itself often a weirdly powerful bottleneck. I’m going to talk specifically about what I call “general purpose telerobotics” to sidestep some of the nebulosity; people can still define telerobotics however they want. That still leaves a question, though: if I’m going to argue that there are solvable bottlenecks to “working general purpose telerobotics,” what does that phrase mean? Feeling around the edges, there are two essential aspects to “working” general purpose telerobotics: a sense of interacting with the world (which rules out a lot of what I think of as remote control, like drones and cars) and a sense of that interaction being an extension of your own body. So if you come away with one thing, remember that working general purpose telerobotics is a system that enables a person to seamlessly interact with the world via a machine.

That definition of general purpose telerobotics is still quite abstract, so I want to briefly frame the possibility space. A Freud-inspired quote I love from Ada Palmer in the Terra Ignota series frames the possibilities for telerobotics perfectly: “All technology is a prosthetic god, a set of tools we weak humans strap on to give ourselves the powers we crave: computers for omniscience; trackers for omnipresence; medicine for immortality; armor for invulnerability; guns for Heaven’s wrathful thunderbolts.” The goal of telerobotics is to give people superpowers, or, more poetically, enable them to take on aspects of godhood. The ability to effectively teleport anywhere in the world and act with motorized limbs can give you the speed of Hermes. The ability to be massive and lift with hydraulic muscles can give you the strength of Hercules. The ability to hold still for arbitrary amounts of time and translate moving your hand a foot into moving a millimeter can give you the deftness of Athena’s weaving-inventing fingers. The ability to use a tiny robot as an extension of your own body can give you Metis’ shape changing abilities.

Those powers are all well and good, you might say, but won’t AI-powered robots just be able to do anything someone could do with a telerobotic system but faster and better? I’m going to ignore questions of that timeline and feasibility because it’s a deep belief-based rabbit hole. (I will briefly note that one of the cool things about telerobotics is that it can also bootstrap autonomous robots by providing training data about how humans handle situations.) Instead, the importance of telerobotics is (roughly) orthogonal to autonomous robots because telerobotics is a way to empower people, not simply accomplish tasks. Telerobotics can enable exploration and discovery in the same way that writing, drawing, and music creation tools can, even though you can now turn all of them over to AI that will do a decent job.

Another misconception that might be implicitly or explicitly floating in your head is that general purpose telerobotics is a solved problem. And after seeing things like this image of Jeff Bezos gleefully manipulating Converge Robotics Group’s system or the Toyota T-HR3 it’s easy to understand why. Unfortunately, these demos are systems designed for just one thing: demos. Their ‘serious context of use’ is “triggering the feeling in demo-ees and outside observers that they are just the tip of the iceberg and either work now or will soon work given additional well-scoped development work.” That’s not the case. There’s no path from the demo system to a working system without a paradigm shift, which requires the hard, uncertain, and hard-to-value-capture research work I’ll address shortly. (This isn’t to say that systems designed for a serious context of use cannot create amazing demos: SpaceX is the undisputed champion in that realm.)

Misleading hints that telerobotics is a solved problem fall into two categories besides full demo systems: academic work that highlights progress in one subsystem and specialized commercial telerobots. Academics boldly declare that their novel method crushes others on benchmarks that have little to do with a working system, publish a paper, and then move on to the next project. CEOs who make specialized telerobots for sorting or medicine declare that they have general purpose telerobotics in the bag. In reality, overhauling their system architecture would require just as much work as building one from scratch, and their incentives are to automate and dominate a market, not to make a system that is good at many diverse things. At the end of the day there is no amount of money you could pay right now to get a working general purpose telerobotic system (remember, seamless interaction) on a deterministic schedule.

The extant work that gestures towards telerobotics is actually scattered across a number of different domains in what I would call “lineages” that developed almost independently. They’re worth laying out both to give you a sense of the different mindsets and evolutionary pressures on telerobotic technology and because the relative isolation between the lineages contributes to telerobotics bottlenecks to some extent. Nuclear material handling is the original telerobotics lineage. It’s concerned with doing a small number of reasonably dexterous tasks on the other side of a lead wall and still supports at least one company. Surgical robots like the da Vinci robots are another old lineage with clear commercial applications. They are concerned with tiny, slow, precise motions directed from across a room and sometimes farther away. Space agencies have been doing telerobotic work for a shockingly long time considering how they’re never used; the goal there is to do fiddly repair tasks with a large latency and no line-of-sight to the robot. There’s a whole class of military telerobotics intended to defuse roadside bombs and other explosives. There are plenty of shuttered projects that resemble the cobbled-together proof-of-concept-but-going-nowhere systems of the academic lineages, but the ones that are actually deployed are much closer to either the nuclear or surgical lineages. The telepresence “iPad on a Segway” lineage lives on a completely different branch of the family tree. Vaguely related is the whole world of “avatars” that focus on trying to “make it feel like you’re there” but in my opinion end up prematurely optimizing for form over function. There are a number of academic lineages on components of the bigger picture, from hands to haptics to compliant arms to interfaces. These projects tend to declare superiority on some (often new) axis, possibly do a proof of concept, and then are shelved in a form that is mostly useless to bigger systems. Some academic projects do involve cobbling together a larger system in order to demonstrate a specific novel insight.

If all of this work over the course of decades has failed to get us onto a path towards general purpose telerobotics, why should we expect that we would be able to change that now? “Why now” arguments are always just narratives, but there are some reasons for optimism. The quality of VR hardware, software, and peripherals has been continuously improving for almost a decade with varying amounts of hype. The whole point of these systems is to enable you to seamlessly interact with a digital world. It’s reasonable to believe that it might be possible to replace that digital world with the real world as mediated through a robot. A specific VR development that I think is underrated for telerobotics is the interaction paradigms that VR game designers have created.

Advances in deep learning could let us sprinkle a bit of AI pixie dust on telerobotics in two areas. First, work like OpenAI’s Project Dactyl suggests that deep learning could enable much better low-level control for delicate manipulation tasks, potentially enabling a telerobot’s operator to have a more video-game-like interface to the world. Second, the many branches of scene understanding could unlock everything from simulated touch sensations to predicting a few moments into the future to combat latency. As a last “why now,” the burgeoning drone industry has driven a lot of work on high-power-density motors that could potentially enable better telerobotic arms and manipulators.

Given the combination of “why now” reasons and the slew of work that touches on general-purpose telerobotics, one might expect that it’s just a matter of time (and maybe some mixing and shaking) before general purpose telerobotics arrives. Miserabile dictu, I’d argue that’s not the case (otherwise I wouldn’t be giving this talk). People have created many high-potential pieces for sure, but in a nutshell, the bottleneck to general purpose telerobotics is in building a system. Building a general telerobotics system requires coordinating many disciplines through an absolutely huge design space, which is still real research despite the amount of work that’s already been done. There are three major bottlenecks constraining this coordinated systems work:

The technology itself, especially the densely interconnected nature of the system’s architecture.
A serious context of use.
Constraints imposed by institutional incentives.

Often, the technology itself is the last thing you want to talk about. However, I’m going to jump straight to it because I hope that by giving you a sense of the high dimensional design knottiness we’re facing, the bottlenecks imposed by serious contexts of use and institutional structures will make more sense. I will be skimming the surface for the purpose of brevity, not to try to pull one over on you — I’m more than happy to go into any details about assertions later.

You could think of any telerobotic system as having five different sub-systems:

To start, there is the information connection between the operator and the robot. The biggest thing we’re concerned about there is the closed loop latency from the operator to the robot and back to the operator. Latency means that the operator is giving commands to the robot in the future and acting on information from the past. Latency is a big deal both because our ability to act leans heavily on quick feedback loops with the world and because latency does nasty things to control loops that in the most extreme cases can lead to things like a force-feedback device ripping off someone’s arm.
The second subsystem is the human interface: your VR headset and haptic gloves, screen and mouse, fully articulated exoskeleton, or vat of variable stiffness smart fluid. It is the combination of hardware and software that the person sees, hears, and feels (and tastes and smells) and also issues commands through. The design space here is massive: do you try to map as close to 1:1 as possible between what the operator experiences and does and what the robot experiences and does? Does the operator give high-level commands? What senses do you use? Do you show the operator a direct view from the robot’s sensors, do you reconstruct the robot’s world, or something in between? How (if at all) do you show the operator what the robot is going to do?
The third subsystem is the robot’s actuators — the devices it uses to physically interact with the world. Even ignoring locomotion, there is a massive set of design possibilities, each with its own constraints. Robot arms are traditionally designed so that you tell them to go somewhere and they go exactly there (to within some tolerance, the smaller the better) — this is called positional control. The trick is that in order to do this well, the arms must be extremely strong (and are correspondingly expensive). Without a lot of extra work if you tell them to go somewhere that happens to be inside a wall, teapot, or someone’s head they will happily go exactly there, with the bad results you might expect. People have done a lot of work on both sensors to detect+avoid potential collisions and on ‘compliant’ arms that act more like our default-bendy human arms, but both approaches come with massive downsides. Grippers/hands are another actuator can of worms: can you get away with ye olde pincers? (I would argue no.) But human hands are actually incredible and we are nowhere near being able to make robotic copies of them. Do you eschew the whole hand-like paradigm and use a tentacle or a balloon filled with coffee grounds?
The fourth subsystem is the robot’s sensors — both the hardware it uses to collect information about its environment and the software it uses to process that information. Cameras (of many varieties, numbers, and positions), depth sensors, and pressure/tactile are the standard possibilities here. Does the software need to recognize objects? Reconstruct the environment?
Finally, there are the control systems that touch both the robot and the operator. Control systems are responsible for everything from mapping between a joystick movement and a motor voltage to figuring out what to do if the operator tells the robot “grab that.”
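
To make the latency point above concrete, here is a minimal sketch (not drawn from any particular telerobot) of a proportional controller pushing a robot's position toward a target, where every velocity command takes effect only after a fixed delay. The one-dimensional plant, the gain of 5, the 10 ms time step, and the name `simulate` are all illustrative assumptions.

```python
from collections import deque


def simulate(delay_steps, gain=5.0, dt=0.01, steps=1000, target=1.0):
    """Return the position trace of a proportional loop with delayed commands."""
    position = 0.0
    # Commands still "in flight": the robot acts on the one issued delay_steps ago.
    in_flight = deque([0.0] * delay_steps)
    trace = []
    for _ in range(steps):
        command = gain * (target - position)  # velocity command from current error
        in_flight.append(command)
        applied = in_flight.popleft()         # the command that actually arrives now
        position += dt * applied
        trace.append(position)
    return trace


# Hypothetical comparison: no delay, 100 ms of delay, 500 ms of delay.
for delay_steps in (0, 10, 50):
    trace = simulate(delay_steps)
    peak = max(abs(p) for p in trace)
    print(f"delay={delay_steps * 10} ms  peak |position|={peak:.2f}  final={trace[-1]:.2f}")
```

With these assumed numbers the loop settles with no delay and with 100 ms of delay, but at 500 ms it oscillates with growing amplitude. Nothing changed except the latency.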

The fact that there are many design decisions for each subsystem is not a bottleneck in and of itself. The bottleneck lies in a trifecta of facts:

The design choices in every subsystem are tightly coupled to every other subsystem.
Adjusting design choices is slow and/or expensive because it often requires many small acts of invention.
There are several potential non-overlapping ‘design regimes’ and it is unclear which one to pursue, so addressing the previous two points requires either parallelization or the possibility that you go down a path in a regime that you realize is the wrong one.

The most straightforward example of this coupling is how latency affects everything. If you have extremely low latency, you can get away with an extremely dumb control system and an interface that attempts to map every operator movement onto the robot and every robot sensation onto the operator. Increase the latency to levels you would get over the internet or to the other side of the world, however, and you either need sensors+a control system that can reconstruct the environment for the operator to interact with directly or one of a million ways to abstract the operator’s commands. “Just make a telerobot for local networks and then expand” is a reasonable response. However, unless you had a plan for building a global network dedicated to only low-latency telerobotics, there is very little knowledge that would transfer out of applications with short distances and local networks because doing so would require not just a few tweaks but jumping to an entirely different design regime.
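
A back-of-the-envelope calculation suggests why the long-distance regime cannot simply be engineered away. The sketch below assumes the signal travels through optical fiber with a refractive index of about 1.47 along an ideal great-circle path, and it ignores routing, switching, and processing delays, so real figures would be worse. The function name is hypothetical.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47      # assumed typical value for silica fiber
EARTH_CIRCUMFERENCE_KM = 40_075    # approximate equatorial circumference


def min_round_trip_ms(path_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    fiber_speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S / FIBER_REFRACTIVE_INDEX
    return 2 * path_km / fiber_speed_km_per_s * 1000


same_building = min_round_trip_ms(0.1)
antipode = min_round_trip_ms(EARTH_CIRCUMFERENCE_KM / 2)
print(f"same building: {same_building:.4f} ms")
print(f"other side of the world: {antipode:.0f} ms")
```

Under these assumptions the floor for a round trip to the far side of the planet is on the order of 200 ms before any network overhead, which is why a design tuned for a local network does not carry over with a few tweaks.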

I want to throw some more examples of subsystem coupling at you just to give a taste of their knottiness. Most of the environments where you would want to use a telerobotic system involve the potential to damage things. If you don’t have a compliant arm, you need coupled sensors and control systems that can make sure you’re not destroying things and their requirements depend on how fast the interface enables the operator to move the robot; if you do have a compliant arm, you need a control system that can make sure the arm is doing what you actually want it to, and an interface that enables the operator to actually indicate that intent. The same sort of design coupling hinges on whether the interface uses a mouse, VR controllers, haptic gloves, or a full force-feedback exoskeleton. Also on whether the raw sensor data is just piped to the operator or if the system is able to ‘understand’ the environment to some extent (and how is it doing it?). Also on how much autonomy you put into the robot and in which situations. The list goes on.
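
The stiff-versus-compliant tradeoff can be sketched with a static spring model, which is a deliberate simplification rather than how any real arm controller works. The arm is treated as a spring pulling its tip toward the commanded position. The stiffness values, the 5 cm overshoot into a wall, the 10 N load, and the function names are all assumed purely for illustration.

```python
def contact_force_n(commanded_m, wall_m, stiffness_n_per_m):
    """Force pressed into an obstacle when the commanded position lies beyond it."""
    penetration_m = max(0.0, commanded_m - wall_m)
    return stiffness_n_per_m * penetration_m


def sag_under_load_m(load_n, stiffness_n_per_m):
    """Static position error caused by an unmodeled load on the arm's tip."""
    return load_n / stiffness_n_per_m


STIFF = 50_000.0    # N/m, assumed for a rigid position-controlled arm
COMPLIANT = 500.0   # N/m, assumed for a soft, compliant arm

for name, stiffness in (("stiff", STIFF), ("compliant", COMPLIANT)):
    force = contact_force_n(commanded_m=0.55, wall_m=0.50, stiffness_n_per_m=stiffness)
    sag_mm = sag_under_load_m(load_n=10.0, stiffness_n_per_m=stiffness) * 1000
    print(f"{name}: {force:.0f} N into the wall, {sag_mm:.1f} mm sag under a 10 N load")
```

In this toy model the stiff arm tracks its target closely but presses hard into anything in the way, while the compliant arm is gentle on contact but ends up far from where it was told to go. Each choice pushes a different burden onto the sensors, the control system, and the interface.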

You could do research and engineering work to improve any of the subsystems: better compliant arms, better hands, better scene understanding, improved simulation, better sensors, less latency, better interfaces. However, each of those is a hard, almost-infinitely optimizable problem in and of itself. Each subsystem doesn’t need to be perfect in order to create a working general-purpose telerobotic system, but the imperfections in each subsystem need to be compensated for by the rest of the system. As a result, an effort towards general-purpose telerobotics requires significant coordination to do well, involves a lot of potentially wasted work, and creates hard-to-capture value, because once you find a regime that works decently well, nothing can stop other groups from jumping straight to that regime and optimizing it.

Enabling technologies like telerobotics need to be developed in a serious context of use. Unfortunately, there is no single obvious serious context of use for telerobotics. Serious contexts of use act like the second level of a multi-level evolutionary system to prevent you from getting stuck at a local optimum. Without a serious context of use, you inevitably optimize around demos and fail to catch the subtle but important design decisions informed by friction with the real world. This is especially true for a general purpose technology like telerobotics where ‘working’ cannot be boiled down to a set of metrics. The context of use for general purpose telerobotics is also important because it needs to actually demand general purpose telerobotics instead of either specialized teleoperation or automation.

There are several potential candidates, each with their own downsides.

Space exploration, specifically enabling an operator in a spaceship or habitat to operate a robot to do mining or maintenance, demands a lot of non-routine dexterous manipulation. Unfortunately, there are about twelve people at most in space at any time right now, none of them have the bandwidth to do iterated tinkering, and the times when a telerobot would be useful are far less frequent than they would be if asteroid mining were an actual thing that people did.
Oil/Gas/Mining could save tons of money in insurance, safety equipment, and travel costs by using telerobots. The things people need to do are varied enough that automation requires something close to generally intelligent robots. A few gas companies are even funding some academic telerobotics projects. However, this case is for the most part a cost-cutting measure that will be expensive, uncertain, and take a long time to create: a combination that needs to fight against momentum and asymmetric career risk in big companies. Optimistically, some companies already use underwater Remotely Operated Vehicles (ROVs) to do maintenance and repair: if there is room to expand their capabilities, it may present a promising serious context of use.
Elder Care is in some ways an ideal context of use because it requires doing such a wide range of things and is a very big deal globally. At the same time, the bar for “working” is extremely high because we’re talking about people’s lives, homes, and the opinions of older adults. Housework is similar to elder care but with both lower stakes and less seriousness because it’s basically a cost-cutting measure.
Labwork would unlock 24/7 worldwide collaborations and let people utilize rare equipment that would otherwise be inaccessible but it has very little ‘slack’ — researchers don’t have the bandwidth to use a suboptimal tool while it’s improved and there is very little capturable upside.
There of course may be others!

General purpose telerobotics feels so tantalizingly close that it’s incredibly reasonable to ask (especially among this crowd) “why can’t a visionary startup funded by deep-pocketed investors unlock general purpose telerobotics?” Two other important questions, depending on your worldview, might be “why couldn’t an academic lab build a prototype system that makes general purpose telerobotics an inevitability?” and “why couldn’t a well-resourced corporation like an oil and gas company do the work to make general purpose telerobotics inevitable?”

At the end of the day, a startup needs to be a good investment. That means that the longer the timeline and the more the uncertainty, the larger the potential returns need to be. There is no irrefutable proof that a researchy program shouldn’t be a startup unless perhaps there is no conceivable way that its output can be sold. However, there are several pieces of evidence that, given the work that needs to be done at this point in time,² it’s too early for a general purpose telerobotics startup. First, general purpose robotics is hard and takes a long time. Boston Dynamics was in large part funded by DARPA for decades before they released a commercial product (Spot). It still took three years between the announcement and actual sales (2016-2019), and once Boston Dynamics figured out the right place in design space for a product, a ton of competitors jumped in. Boston Dynamics itself would have been a bad investment, but a Spot company that started in 2016 based on their previous work may have been a good investment. Second, history is littered with general purpose robotics companies that people hoped would be good investments: Willow Garage, Rethink, Fetch. The robotics companies that were good investments are the ones that produce a specialized robot for a particular market. Despite ambitions otherwise, none of these ever go from specialized to generalized products.

Academic labs don’t have the resources or incentives to create a general purpose telerobotic system. Simultaneously working on all five areas will require a lot of expensive systems engineering work and granting organizations — both public and private — rarely give grants for speculative systems research. On top of that, the academic system rewards telerobotics researchers for using off the shelf parts to show that their particular new technique improves performance on a made-up benchmark so that they can publish a paper and move on.

Nominally, industrial labs should be a good home for a general-purpose telerobotics program. However, no existing company can see itself capturing enough value from the work to justify the cost. For most of them it makes sense to focus on automation, which is also sexier.

Given that bleak picture, what do potential paths forward look like? I’ll start from specifics (what design regimes are most promising), then move to the more general (what a full program might look like), and finish with the full scenario in which general purpose telerobotics shifts from a laughable impossibility to an inevitability. The caveat, of course, is that this is a plan but not the only plan, much like how open houses come with sample furniture and fake fruit. As a result it’s a little bit uncomfortable to assert these things, because the whole point of program design is to rigorously figure all of this out, and the goal of each step is to update each subsequent step.

One regime is what I would call a dynamically supervisory system. Imagine an interface that works something like a first-person RPG where everything can be highlighted and interacted with. The goal of the interface would be to give you as much awareness of what’s going on as possible, let you know the possibility space of interactions, and give you an idea of what the robot is going to do before it does it. This type of interaction would require a sophisticated object-based physical model of the world to be generated and refreshed in real time. The robot would need to have some library of affordances for different things in the world and be able to plan local interactions with them. Perhaps the person has direct control in ‘free space’ so that they can get the robot into the best position to plan from so the planning problem becomes as easy as possible. In this situation you would also need either a compliant arm or a ton of sensors combined with a blisteringly fast control system.

Another regime is what I would call simulated direct drive. Imagine an interface where you are the robot — however, in order to get this to work with inevitable latency, you’re actually a simulated robot. In this scenario, the operator is interacting with a high-fidelity environment that is a simulation of the robot’s environment t+2Δ in the future, where Δ is the delay between the robot and the operator. This regime would need to capture the robot’s environment in a way that’s legible to the human interface: maybe not necessarily semantic objects, but definitely surfaces and some sense of what will happen to the surfaces over time so that they could be simulated. While you wouldn’t need the control system on the robot’s end to do planning, per se, you would also need some kind of reactive system that predicts how the person would react to unexpected input. This is analogous to how reflex arcs work in the nervous system.
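
One way to read the 2Δ horizon, taking Δ to be the one-way delay (an assumption on my part): sensor data needs Δ to reach the operator, and the operator's response needs another Δ to reach the robot, so the operator should be shown the world as it will be 2Δ after the data was captured. The sketch below swaps in the simplest possible predictor, constant-velocity extrapolation, where a real system would need a proper simulation. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TrackedSurface:
    position_m: Vec3      # last position reported by the robot's sensors
    velocity_m_s: Vec3    # estimated velocity at that same moment


def predict_for_operator(surface: TrackedSurface, one_way_delay_s: float) -> Vec3:
    """Extrapolate to the moment the operator's response would reach the robot."""
    horizon_s = 2 * one_way_delay_s  # one trip for the data, one for the command
    return tuple(
        p + v * horizon_s
        for p, v in zip(surface.position_m, surface.velocity_m_s)
    )


# Hypothetical example: something sliding along x at 0.5 m/s, 100 ms one-way delay.
sliding = TrackedSurface(position_m=(1.0, 0.0, 0.5), velocity_m_s=(0.5, 0.0, 0.0))
shown_to_operator = predict_for_operator(sliding, one_way_delay_s=0.1)
print(shown_to_operator)
```

Anything this predictor gets wrong (a collision, a person walking into frame) is exactly what the robot-side reactive system would have to absorb.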

How can this design space exploration happen? We’ve already established that it’s constrained in academic, startup, and industrial contexts. My (biased) hypothesis is that the work to explore these regimes could happen in the context of a DARPA-like program that coordinates between several different groups: academic labs, contract engineering companies, and organizations with serious contexts of use like underwater ROV operators. The program would start by running workshops and nailing down the chunks of work that need to be done and who would do them, along with an intermediate program goal that isn’t “solve general purpose telerobotics.” One possibility is a modifiable research platform that could be to telerobotics in the late 2020s what the PR2 was to robotics in the early 2010s. The program could fund several approaches at once (which is very hard to do in a single organization), requiring that they are actually tested in a real ROV situation. Over time, as it becomes clear which approach works best and coordination costs increase, the program could internalize the work to produce a single research platform. You could imagine spinning up a nonprofit or low-profit organization to manufacture and disseminate these.

A single five-year program, no matter how successful, is unlikely to get us all the way there. Once people have had a chance to muck around with the research platforms, a followup program could run a prize competition similar to the DARPA Urban Challenge that incentivizes team creation around optimizing and building systems within several serious contexts of use; it’s hard to over-optimize one system that needs to both clean a house and repair an underwater gas line. The goal again would not be to get general purpose telerobotics over the finish line, but to shift perception of its potential in the same way that the DARPA Urban Challenge shifted perception around autonomous vehicles. At this point some of the people from the top teams would form companies to go after specific contexts of use with the core technology in ways that are actually investable, and market mechanisms could take over. At that point I would say we successfully shifted general purpose telerobotics from impossible to inevitable.

¹ A word that has been filled with so many meanings by different people that it resembles an overstuffed suitcase.
² Not being a good investment now does not preclude being a good investment later.
