Abstract: Robots have been around for almost 4 decades and have helped improve efficiency across multiple industries through automation. Yet innovation in this space has slowed down because robots have only been capable of narrow applications focused on specific tasks. There seems to be a resurgence in the field of robotics, and we may be heading towards ubiquitous general-purpose robots in the coming decades.
The human mind is a marvellous thing in this universe. It enabled us to build skyscrapers, communicate across large distances, fly across the ocean and assemble space stations, and yet we don’t understand how it works. We leverage all sorts of innovative tools and machines to reduce physical labour, automate work, increase efficiency and conduct remote operations in the process.
One of the earliest inventions to automate work dates back to 1500 BC in Egypt where waterclocks were built to strike bells at the end of a full hour. While new machines gradually arose after the industrial revolution, the proliferation of the idea behind robots can be credited to sci-fi authors such as Isaac Asimov. Since the 1970s, robots have been used across a variety of industries, particularly automotive and manufacturing, thanks to advances in integrated circuit chips and academic research.
Recent estimates claim there are 2.7 million industrial robots in use globally. While the number is certainly not negligible, the adoption of robots hasn't been universal. This can be attributed to two major reasons:
- Robot applications are narrow: They are programmed to execute a particular task in a structured environment and can't be easily re-programmed to perform additional tasks
- High Capex and risk of obsolescence: Robots are quite expensive and almost all robots are replaced every 4-5 years if the application process or business changes
To enable an economy powered by automation that augments human capabilities, building general-purpose robots capable of performing a broad range of tasks in the real world will be essential. Like the mainframe computers of the 60s and 70s, today's robots can be operated only by experts for specialized tasks in a specific environment. Just as integrated chips and advances in operating systems gave rise to the PC era, advances in computer vision, machine learning and sensor technologies could pave the way for affordable, general-purpose robots.
The gap between conventional robots and general-purpose robots could be bridged by collaborative robots, also known as cobots. Cobots are sometimes categorized as 'human augmentation technologies' that aid humans rather than replace them. Unlike a conventional robot designed for a specific task and not for collaboration, cobots such as those developed by Universal Robots can be re-deployed for multiple tasks. They tend to be easily programmable, safe and lightweight compared to industrial robots.
Let's talk about Cobots another day and dive into the progress made in building general-purpose robots.
Robots have been predominantly built to work in carefully designed, structured environments. A major requirement for robots to become ubiquitous, execute hard tasks and collaborate with humans is the ability to work in unstructured environments, where they must deal with uncertainty and constant variability. But before we dive into how general-purpose robots (GPRs) can be built to work in unstructured environments, let's understand the different aspects of a robot.
A robot, just as any machine, can be broken down into hardware and software.
A top-level view of robot hardware can be broken down into 4 parts: body, electronic circuits, sensors and actuators.
The body is the frame that gives the structure to the robot. While most of the world's robots are high-powered arms, they could also be human-like, dog-shaped or specifically designed.
The electronic circuits power the robot, provide compute resources and help perform a myriad of functions that translate inputs into outputs.
Sensors are essential for understanding the world around the robot. They can be used to replicate human senses such as vision, smell, touch and hearing.
Actuators help convert perception into action to achieve the desired output. They convert electrical energy into physical motion.
While the hardware can certainly be incrementally improved upon, the true innovations in GPR will come from software.
A simplified approach to how humans interact with the world and how we expect GPRs to do the same essentially boils down to 3 broad steps:
i. Perception: Sense/Detect movements, objects or force
ii. Computation: Analyze and compute the information received
iii. Action: Move or perform an action as a response
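The three steps above can be sketched as a tiny control loop. This is only an illustrative toy, not how any particular robot is programmed: the distance sensor, the 0.5 m safety threshold, the wheel speeds and all function names are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical sensor reading: distance to the nearest obstacle in metres."""
    distance_m: float


def perceive(raw_distance_m: float) -> Reading:
    # Perception: turn a raw sensor value into a structured reading
    # (negative values are treated as sensor noise and clamped to zero).
    return Reading(distance_m=max(0.0, raw_distance_m))


def compute(reading: Reading, safe_distance_m: float = 0.5) -> str:
    # Computation: decide on a command from the perceived state.
    return "stop" if reading.distance_m < safe_distance_m else "forward"


def act(command: str) -> float:
    # Action: map the command to an assumed wheel speed in metres per second.
    return {"stop": 0.0, "forward": 0.2}[command]


def control_step(raw_distance_m: float) -> float:
    # One pass through the loop: perception -> computation -> action.
    return act(compute(perceive(raw_distance_m)))


if __name__ == "__main__":
    for raw in (2.0, 0.3):
        print(raw, "->", control_step(raw))
```

In a real robot each of the three functions hides most of the difficulty; the sketch only shows how the stages hand data to one another.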
While machine vision is quite straightforward (a robot is given a bunch of digital cameras to capture images), machine perception is extremely challenging. For a machine to perceive something, it needs to recognize the true representation of what it captures. E.g.: if the camera captures a cat, the machine should perceive:
a. if it is a 2D (an image of an image) cat picture or a 3D cat
b. if 3D cat, whether it is a living cat or a cat-like object, and
c. what the cat's intentions are, so that the machine can decide whether or not to act on what it perceives
Perception isn't limited to vision; it can draw on other sensors such as LIDAR (remote sensing), microphones (Natural Language Processing for hearing), accelerometers (movement), piezoelectric transducers (pressure/force for the sense of touch), mass spectrometers (smell), pH meters (taste), etc.
As the machine receives information from different sensors, it also needs to process that information. E.g.: A self-driving car should recognize an emergency (an ambulance in the same lane) or a dangerous situation (a child in the middle of the road), and differentiate between a stop sign placed by traffic regulators and a playful sticker on the back of another car.
A Neural Network (sometimes termed Deep Learning) is usually used to process the information. A neural network essentially consists of machine learning algorithms that simulate dense connections of nodes, loosely similar to the functioning of a human brain, to recognize patterns and make decisions as a human would. 'Deep' in deep learning alludes to the depth, i.e. the number of layers, in the neural network.
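To make "connections of nodes" and "depth" a little more concrete, here is a minimal sketch of a forward pass through a two-layer network in plain Python. The weights are hand-picked for illustration and nothing is trained; the sigmoid activation and the layer sizes are assumptions, not a description of any production model.

```python
import math


def sigmoid(x: float) -> float:
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    # One layer: every output node takes a weighted sum of all inputs,
    # adds a bias, and applies the activation function.
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    # "Depth" is simply how many layers the signal passes through.
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Illustrative, hand-picked weights: 2 inputs -> 2 hidden nodes -> 1 output.
LAYERS = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]

if __name__ == "__main__":
    print(forward([1.0, 0.0], LAYERS))
```

Training would consist of adjusting the numbers in LAYERS so that the outputs match labelled examples; that step is deliberately left out here.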
Thanks to millions of years of evolution, the human body makes coordinated, high-accuracy limb movements seem very simple. But replicating this with a robot is a nightmare. Action is arguably the most important task of the robot, since perception and computation are performed to help execute an action. On the other hand, the final action is about translating commands into movements, which is relatively simpler than the abstract concepts of perception and 'thinking'.
Different approaches to building a GPR:
Neural network-based applications are certainly advancing within structured environments for specific tasks, but general-purpose applications in unstructured environments are still a far cry. E.g.: Today's Deep Learning approaches are able to capture the intricate details of a cup in an image, such as the direction of its handle, but they have had only partial success in interacting with the cup and spectacular failures when encountering a previously unseen type of cup.
While deep-learning models have been the go-to approach in AI applications focused on particular tasks, reinforcement learning and recursive cortical networks have seen the most progress in the field of GPRs.
Deep Learning vs Reinforcement Learning:
Although both models enable a system to learn autonomously, the crucial difference lies in the approach. Deep learning models learn from a training dataset and apply that training to new data, whereas reinforcement learning models learn by dynamically adjusting their actions based on continuous feedback. Google's AI research group has been working extensively on reinforcement learning models at scale, with the aim of enabling robots to learn many distinct tasks from large, diverse real-robot datasets within a single model.
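The idea of "adjusting actions based on continuous feedback" can be illustrated with one of the simplest reinforcement learning setups, an epsilon-greedy multi-armed bandit. This is a toy sketch under stated assumptions, not the approach used by Google or any robotics company: the two actions, their success probabilities and the function name run_bandit are all invented for the example.

```python
import random


def run_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from reward feedback."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)    # how often each action was tried
    values = [0.0] * len(success_probs)  # running estimate of each action's reward
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(success_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(values)), key=values.__getitem__)
        # The environment returns a reward of 1 (success) or 0 (failure).
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: nudge the estimate towards the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


if __name__ == "__main__":
    # Hypothetical example: grasp strategy 0 succeeds 20% of the time, strategy 1 80%.
    values, counts = run_bandit([0.2, 0.8])
    print("estimated success rates:", values)
    print("times each strategy was tried:", counts)
```

No labelled dataset is involved: the agent starts with no knowledge and shifts towards the better action only because of the rewards it observes along the way.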
Recursive Cortical Network(RCN) vs Neural Network:
Unlike conventional neural networks that leverage matrix multiplication, RCN uses a probabilistic graphical model. Just as a human mind forms lateral connections to retain features and reconstruct them, the RCN model simulates and regenerates different features of an object such as shape, contours, corners and other basic elements. This enables an RCN model to train and infer at a much higher efficiency as compared to other models.
As mentioned above, there are different approaches to building the software component/learning method for GPR. While the learning techniques are different, the basic approach, as explained in this PhD dissertation, in building a GPR could be as follows:
i. Primitive Skills: For a robot to execute sophisticated behaviours, it has to learn the foundations/building blocks such as grasping, dropping, pushing etc. Labelled as primitive skills, these are what a robot needs to develop in order to act upon the information it receives through its sensors in an unstructured environment. A collection of different primitive skills is compiled into a large repertoire of building blocks.
ii. Sequential Tasks: A task such as sorting objects based on colour essentially involves a sequence of primitive skills, such as picking up the object, decoding the characteristics by which the objects are sorted, and dropping them into the right repository. Methods such as transfer learning (an ML method where a model developed for one task is reused as the starting point for a model on a second task) and RCN enable knowledge sharing from one task to similar ones in this process. E.g.: Skills transferred from pushing an object to closing a door.
Such skill transfer requires constant interactions in different environments that can be parameterized and shared to improve the model, and achieve generalization as well as fast adaptation.
iii. Hierarchical Tasks: High-level, long-horizon tasks such as “preparing dinner” or “driving to a destination” require prolonged interaction with the environment. For such tasks, researchers observed that the sequential composition of primitive skills becomes less tractable due to the exponential growth of possible combinations. Therefore, to reduce the problem, it is suggested that the levels of abstraction be increased through meta-learning, i.e. restructuring the composition of tasks when needed through latent hierarchical programs based on probability distributions.
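The first two levels, and the combinatorial problem behind the third, can be sketched in a few lines. This is a deliberately simplified illustration and not the method from the dissertation: the three skills, the dictionary-based state and every name below are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Skill = Callable[[State], State]


def pick_up(state: State) -> State:
    # Primitive skill: the robot is now holding the object.
    return {**state, "holding": state["object"]}


def read_colour(state: State) -> State:
    # Primitive skill: decide the target bin from the object's colour.
    return {**state, "target_bin": state["colour"] + "_bin"}


def drop(state: State) -> State:
    # Primitive skill: release the object into the chosen bin.
    return {**state, "holding": "", "placed_in": state["target_bin"]}


# The repertoire of building blocks.
REPERTOIRE: Dict[str, Skill] = {
    "pick_up": pick_up,
    "read_colour": read_colour,
    "drop": drop,
}


def run_sequence(skill_names: List[str], state: State) -> State:
    # A sequential task is an ordered chain of primitive skills.
    for name in skill_names:
        state = REPERTOIRE[name](state)
    return state


def count_sequences(num_skills: int, length: int) -> int:
    # Number of distinct skill sequences of a given length: num_skills ** length.
    return num_skills ** length


if __name__ == "__main__":
    start = {"object": "block", "colour": "red"}
    print(run_sequence(["pick_up", "read_colour", "drop"], start))
    print(count_sequences(10, 20))
```

With only 10 primitive skills, a task that is 20 steps long already has 10^20 candidate sequences, which is why long-horizon tasks call for a higher level of abstraction rather than a flat search over sequences.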
This is an extremely hard challenge to tackle, and startups and researchers are working towards solving this as yet unsolved problem. Building a GPR is probably one of the most difficult problems out there. In addition to building robots that can fluently interact in any kind of environment, designing robots that can sense and respond to human emotions is presumably much more difficult than building a logic-oriented intelligent machine.
Sanctuary AI is on a mission to build and scale embodied artificial general intelligence (AGI).
The premise behind Sanctuary AI, led by Geordie Rose, is that the clearest path to cutting through this extraordinarily difficult problem is to mimic biological systems and the types of intelligence that they need to navigate the world. This means not copying the brain, but thinking about what properties of the brain are required for an intelligent agent, for example, to know how to reach out and grasp an object, to walk on uneven terrain, or to reason about the world.
The idea behind the company is that the world built by humans has been built for humans, so embodied synths can come to understand the environment just as humans do, through reinforcement learning. Instead of using pure reinforcement learning, where the robots would need to learn from scratch via trial and error (both time-consuming and dangerous), the company leverages teleoperation to provide the building blocks for the robots.
Dr Gildert and her synth (short for synthetic beings) Credits: PCMag
Backed by $150 million from the likes of Peter Thiel, Elon Musk, Jeff Bezos, Khosla Ventures and more, Vicarious, a 10-year-old startup, was the first company to break the CAPTCHA test. The company developed the Recursive Cortical Network approach to AI. In this fascinating podcast, CEO Scott Phoenix sheds light on his perspective on building an AGI. Vicarious also seems to take an embodiment approach to eventually building a GPR. To build a human-like brain, Scott believes, the system needs to learn a model of the actual world as well as high-level concepts through reasoning.
In this blogpost, the Vicarious team asserts RCN models have two distinct advantages over deep learning models: better performance in general, and the capability to deal with adversarial examples (handcrafted inputs that are used to fool a neural network). The company is also the first to take the robots-as-a-service approach, which reduces both the risks and the costs associated with accelerating product cycles.
Flexiv, a general-purpose robotics startup founded in 2016, is focusing on developing and manufacturing adaptive robots which integrate industrial-grade force control, computer vision and AI technologies. The robots have the capacity to adapt to uncertain environments and overcome disturbances and offer the ability to be redeployed for new tasks.
Covariant AI is a startup aiming to build a universal AI that allows robots to see, reason, and act on the world around them. The company provides an AI layer that can be added to any existing robot, enabling robots to learn new skills rather than requiring explicit programming. Called the Covariant Brain, the software allows robots to learn general tasks and adapt to new ones by breaking down complex tasks into simple steps and applying general skills to complete them.
Covariant has raised $147M in funding, and the Covariant Brain is powering a wide range of industrial robots across various industries with drastically different types of products to manipulate.
Robust AI is another company tackling the challenge by building a cognitive brain that can adapt to uncertain environments. The company combines multiple techniques, including deep learning and an old AI technique called symbolic AI, in a hybrid approach to enable robots to perform multiple tasks in an unstructured environment.
MicroPsi Industries is a software company that provides ready-to-use AI systems for controlling industrial robots. As the name suggests, the company originated from AGI research, particularly MicroPsi theory. It develops generic cognitive machines that allow robots to learn from humans and act in dynamic environments.
GPR vs AGI:
I'd like to make a clear distinction between AGI and GPR. Artificial General Intelligence is when machines can achieve human-level cognitive thinking and creativity, whereas a GPR is merely a better AI that can do a set of tasks better than humans. While GPRs will have the capacity to do multiple tasks and work in an unstructured environment, they wouldn't exhibit creativity, emotion or lack of interest as humans would. I belong to the Deutsch camp, which believes that until we understand what creativity is and come up with a theory of it, building an AGI is unlikely.
Certain companies such as Sanctuary and Vicarious are definitely hoping to build AGI by understanding the mind and coming up with a theory in the process, and it is an infinitely hard problem to solve.
Will GPRs replace jobs? Yes, that is bound to happen. As AIs get better and better at specific tasks, it will become harder for corporations not to replace humans with AIs, citing efficiency: cheaper, more productive and more accurate. What does this mean for the workforce? The answer likely won't be binary. While some jobs will be replaced, new jobs will be created that require either creativity or control over these machines. There is interesting research claiming that robots are associated with an increase in the span of control for supervisors remaining within the organization, while diminishing the need for managers to monitor worker activities to ensure production quality.
GPRs could also mean humans are relieved of non-creative tasks that they don’t want to do. But this needs to be supplemented in parallel by a Universal Basic Income model, a large-scale skill development programme run by governments/corporations, or both.
Geordie Rose believes that building an AGI will help us understand the mind, and claims that perhaps the greatest test in calling ourselves an intelligent species is to understand how intelligence and the mind work. He argues, and I agree, that an intelligent machine could free humans from the tasks they don’t want to do and let them focus on the tasks they would like to do. People could still do all of the things they’ve ever done, against the background of a bunch of intelligent machines running around. Just because Michael Jordan was the best basketball player or Lionel Messi is the best football player doesn't stop other athletes or even amateurs from playing their favourite sports; the same would apply to all possible tasks.
Resources and Recommendations:
A great explanation on Neural Networks by the famed 3Blue1Brown.
A great talk by Yuke Zhu on the roadmap to building General Purpose Robots.
Geordie Rose on the TDS podcast.
Scott Phoenix on the podcast with Garry Tan.
The Robot Report on the latest news in the robot industry.