The makeup of a robotic system
Robotics is both a thrilling and demanding field, because building a robot draws on so many different capabilities and disciplines, ranging from mechatronic design and hardware engineering to some of the more philosophical aspects of software engineering.
The depth of a sophisticated robotic system is staggering, and the way its individual components come together into something that works is genuinely impressive. In this blog post by AICoreSpot, we will look at what it takes to build a robot that is sophisticated by today's benchmarks, slicing robotic skills along several dimensions.
The definition of a ‘robot’
In broad terms, a robot is a machine with sensors, actuators, and some form of computing device that has been programmed to exhibit some degree of autonomous behavior. Put simply, the robot (the agent) gathers percepts from its environment through its sensors, processes them, and acts back on that environment through its effectors, which make up the robot's body; the percepts it receives in turn depend on the actions it has taken.
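To make this sense-think-act loop concrete, here is a minimal sketch in Python. The sensor readings, the decision rule, and the actuator interface are all invented placeholders rather than any particular robot's API:

```python
import time

class Robot:
    """Minimal sense-think-act agent. Sensor/actuator details are placeholders."""

    def read_sensors(self):
        # In a real robot this would query encoders, cameras, lidar, etc.
        return {"distance_to_obstacle_m": 1.2}

    def decide(self, percepts):
        # Trivial policy: stop if an obstacle is close, otherwise drive forward.
        if percepts["distance_to_obstacle_m"] < 0.5:
            return {"forward_velocity": 0.0}
        return {"forward_velocity": 0.3}

    def actuate(self, command):
        # In a real robot this would send motor commands to the effectors.
        print(f"commanded velocity: {command['forward_velocity']} m/s")

if __name__ == "__main__":
    robot = Robot()
    for _ in range(3):                 # closed loop: sense -> decide -> act
        percepts = robot.read_sensors()
        action = robot.decide(percepts)
        robot.actuate(action)
        time.sleep(0.1)
```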
What, one might wonder, is the difference between a robot and any other machine? Two points illustrate the distinction.
- Robots have at least one closed feedback loop between sensing and actuation that does not require human intervention. This excludes things like an RC car, or a machine that endlessly repeats the same motion but never recovers if you nudge it slightly, such as Theo Jansen's kinetic sculptures.
- Robots are embodied agents that operate in the physical world. This excludes things like chatbots, and even smart speakers, which, however impressive as showcases of AI, do not really fall under the 'robot' umbrella.
The holistic picture – aware, independent, and automatic
Not all robots are created equal. That is not to say that simple robots are without purpose: whether simple or complicated, in both technology and budget, building a robot for a specific application is a serious engineering challenge. Put another way, over-engineering is not always necessary.
We will now classify the capabilities of a robot into three categories: aware, independent, and automatic. This roughly tracks how low- or high-level a robot's abilities are, though, as you will see, it is not a perfectly precise mapping.
Automatic: The robot is operated by a human and executes motion commands issued by that operator in a constrained environment. Think, for instance, of industrial robots on assembly lines. We do not need to program very sophisticated intelligence for a robot to assemble part of an airplane from a fixed set of instructions. What is required is fast, reliable, repeatable operation. The motion trajectories of these robots are programmed and calibrated by a specialist for a very specific task, and the environment is tailored so that humans and robots can work together with little conflict.
Independent: The robot can now execute tasks in an uncertain environment with limited human supervision. One of the most prevalent examples is the autonomous vehicle, which is, in essence, a robot. Today's autonomous vehicles can detect and avoid other vehicles and pedestrians, perform lane changes, and navigate the rules of traffic with fairly high success rates, despite the flak they receive in the media for not being 100% reliable.
Aware: This edges a bit into the "sci-fi" camp of robotics, but research is inching closer to this reality every day. We can call a robot aware when it can establish two-way communication with humans. Unlike the previous two categories, an aware robot is not just a piece of engineering that takes commands and makes tedious tasks easier, but one that enables genuine collaboration. In theory, aware robots can understand the world at a higher level of abstraction than their more primitive counterparts, and can infer human intentions from verbal or nonverbal cues, still with the overall objective of solving a real-life problem.
A useful example is a robot that helps humans assemble furniture. Such a robot works in the same physical and task space as the humans, so it should adapt to where we choose to position ourselves or which part of the furniture we are assembling without getting in the way. It can understand our instructions or requests, learn from our demonstrations, and tell us what it perceives or what it might do next in language we understand, so that we can actively help ensure the robot is used to its full potential.
Let's look at robots along three distinct dimensions, and at the classification of skills a robot needs in order to achieve awareness along each dimension.
| Dimension 1 – Self-awareness and control | | |
| --- | --- | --- |
| Abstract | Task and behavior planning | Semantic understanding and reasoning |
| Behavioral | Motion planning (paths and trajectories) | Navigation (localization and mapping) |
| Functional | Controls (position and velocity) | |
This dimension describes how aware the robot is of itself.
| Dimension 2 – Spatial awareness | | | |
| --- | --- | --- | --- |
| Abstract | Task and behavior planning | Semantic understanding and reasoning | NLU and dialog |
| Behavioral | Motion planning (paths and trajectories) | Navigation (localization and mapping) | |
| Functional | Perception (detection and tracking) | | |
This dimension describes how aware the robot is of its environment and of its own relationship to that environment.
| Dimension 3 – Cognition and expression | | |
| --- | --- | --- |
| Abstract | Semantic understanding and reasoning | NLU and dialog |
| Behavioral | Natural language processing | |
| Functional | Perception (detection and tracking) | Speech (recognition and synthesis) |
This dimension asks: how well can the robot reason about the world and express its perceptions, beliefs, and intent to other independent agents?
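For readers who prefer to see structure as data, the three dimensions above can be collected into a simple lookup table. The key names below merely mirror the tables and are only one of many ways to organize them:

```python
# The three dimensions from the tables above, expressed as a simple lookup.
# Names mirror the tables; the structure itself is just one way to organize them.
SKILL_MATRIX = {
    "self_awareness_and_control": {
        "abstract":   ["task/behavior planning", "semantic understanding and reasoning"],
        "behavioral": ["motion planning", "navigation (localization and mapping)"],
        "functional": ["controls (position and velocity)"],
    },
    "spatial_awareness": {
        "abstract":   ["task/behavior planning", "semantic understanding and reasoning", "NLU and dialog"],
        "behavioral": ["motion planning", "navigation (localization and mapping)"],
        "functional": ["perception (detection and tracking)"],
    },
    "cognition_and_expression": {
        "abstract":   ["semantic understanding and reasoning", "NLU and dialog"],
        "behavioral": ["natural language processing"],
        "functional": ["perception (detection and tracking)", "speech (recognition and synthesis)"],
    },
}

def skills_at(level):
    """Collect every skill listed at a given level of abstraction."""
    return sorted({s for dim in SKILL_MATRIX.values() for s in dim.get(level, [])})

if __name__ == "__main__":
    print(skills_at("functional"))
```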
The most important takeaway as we move from automatic to aware is the robot's growing capacity to function "in the wild". While an industrial robot may be built to execute repetitive tasks with superhuman speed and strength, a home service robot will often trade that task-specific performance for the more general abilities needed for human interaction and for navigating uncertain and/or unfamiliar environments.
Looking further into robotic skills
Building a robot requires a combination of skills at different levels of abstraction. These skills are all critical facets of a robot's software stack and demand considerably different areas of expertise. That brings us back to the core point of this post: it is not easy to build a highly capable robot, and it is certainly not easy to do so as one person.
Functional abilities
| Functional abilities | Controls | Perception | Speech |
| --- | --- | --- | --- |
| | (position, velocity, etc.) | (detection, tracking, etc.) | (recognition, synthesis, etc.) |
These are the low-level, foundational capabilities of a robot. Without a solid set of functional skills, we would have a hard time making our robot succeed at anything higher up the skill matrix.
Sitting at the lowest level of abstraction, functional skills are closely tied to direct interaction with the robot's sensors and actuators. They can be divided along the acting and sensing modalities.
Acting
- Controls is about making sure the robot can execute physical commands reliably. Whatever kind of robot we are dealing with, it needs to respond to inputs predictably if it is to interact with the world. It is a simple concept but a challenging undertaking, spanning everything from regulating electrical current or fluid pressure to coordinating multiple actuators through a full motion trajectory. A small control-loop sketch follows after this list.
- Speech synthesis acts on the world in a different way: it sits on the human-robot interaction (HRI) side of things. We can think of it as the robot's ability to express its state, perceptions, beliefs, and intent in a way humans can understand. Speech synthesis goes beyond talking in a monotone robotic voice; it may simulate emotion or emphasis to help get information across. Nor is it limited to speech: many social robots also use visual cues such as facial expressions, lights and colors, or movements. Take, for instance, MIT's Leonardo.
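As promised in the controls bullet, here is a minimal sketch of the kind of low-level control loop involved: a textbook PID controller driving a toy one-dimensional "joint" toward a setpoint. The gains and plant model are made up for illustration and are not tuned for any real actuator:

```python
class PIDController:
    """Textbook PID loop: drives a measured value toward a setpoint."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

if __name__ == "__main__":
    # Toy 1-D plant: a joint whose velocity is proportional to the commanded effort.
    # Gains and dynamics are illustrative, not tuned for any real hardware.
    pid = PIDController(kp=2.0, ki=0.1, kd=0.05)
    position, dt = 0.0, 0.01
    for step in range(300):
        effort = pid.update(setpoint=45.0, measurement=position, dt=dt)  # target: 45 degrees
        position += effort * dt
    print(f"final joint angle: {position:.1f} degrees")
```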
Sensing
- Controls requires some degree of proprioceptive (self) sensing. To regulate the robot's state, we rely on sensors such as encoders and inertial measurement units.
- Perception deals with exteroceptive (environmental) sensing. It primarily involves line-of-sight sensors such as radar, sonar, and lidar, in addition to cameras. Perception algorithms require significant processing to interpret a sea of noisy pixels and/or distance readings. Abstracting this data to detect and localize objects, track them over space and time, and use them for higher-level planning is what makes perception both exciting and challenging (see the sketch after this list). Finally, coming back to social robotics, vision also lets robots infer the state of humans for nonverbal communication.
- Speech recognition is another form of exteroceptive sensing. Getting from raw audio to text accurate enough for the robot to process is not easy, however effortless smart voice assistants have made it seem. This field of work is known as automatic speech recognition (ASR).
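The sketch below, referenced in the perception bullet, shows a deliberately naive form of exteroceptive perception: grouping the returns of a simulated lidar scan into object detections and reporting their centroids. The scan values and thresholds are invented, and a real detector would be far more robust:

```python
import math

def detect_objects(ranges, angle_increment, max_range=5.0, gap=0.3):
    """Group consecutive lidar returns into object clusters (very naive detection).

    ranges: list of distances (meters), one per beam, sweeping counter-clockwise.
    Clusters are split wherever neighboring returns differ by more than `gap`.
    """
    clusters, current = [], []
    for i, r in enumerate(ranges):
        if r >= max_range:                       # no return on this beam
            if current:
                clusters.append(current)
                current = []
            continue
        if current and abs(r - ranges[i - 1]) > gap:
            clusters.append(current)
            current = []
        angle = i * angle_increment
        current.append((r * math.cos(angle), r * math.sin(angle)))  # to x, y
    if current:
        clusters.append(current)
    # Report each detected object by the centroid of its points.
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) for c in clusters]

if __name__ == "__main__":
    # Fake 180-degree scan: mostly out of range, with one nearby object around 30-40 degrees.
    scan = [5.0] * 180
    for i in range(30, 41):
        scan[i] = 1.0
    print(detect_objects(scan, angle_increment=math.radians(1.0)))
```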
Behavioral capabilities
| Behavioral abilities | Motion planning | Navigation | Natural language processing |
| --- | --- | --- | --- |
| | (paths, trajectories, etc.) | (localization, mapping, etc.) | |
Behavioral abilities sit a step above the relatively "raw" sensor-to-actuator processing loops discussed in the functional section. A robust set of behavioral skills simplifies our interactions with robots, both as programmers and as users.
At the functional level, we have so far described capabilities that let the robot respond to very concrete, numeric goals. For example:
- Robot arm: “Shift the elbow joint to a 45-degree angle and the shoulder joint to a 90-degree angle in < 2.5 seconds with < 10% overshoot. Then, exert a force of 2 N on the gripper.”
- Self-driving car: "Accelerate to 40 mph without exceeding an acceleration limit of 0.1 g, and turn the steering wheel to achieve a turning radius of 10 m."
At the behavioral level, commands might instead look like:
- Robot arm: “Hold the door handle.”
- Self-driving car: "Turn left at the upcoming intersection while obeying traffic laws and respecting ride-comfort limits for the occupants."
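One way to picture the difference between the two levels is as two software interfaces, where a single behavioral command expands into several functional ones. The classes and joint targets below are hypothetical and do not correspond to any real robot API:

```python
# Hypothetical interfaces contrasting the two levels of command described above.

class ArmFunctionalLayer:
    """Functional level: explicit, numeric targets for individual actuators."""

    def move_joint(self, joint, angle_deg, max_time_s):
        print(f"moving {joint} to {angle_deg} deg within {max_time_s} s")

    def set_gripper_force(self, newtons):
        print(f"applying {newtons} N at the gripper")


class ArmBehavioralLayer:
    """Behavioral level: task-space goals that expand into functional commands."""

    def __init__(self, functional):
        self.functional = functional

    def grasp(self, object_name):
        # A real implementation would call perception + inverse kinematics here;
        # the joint targets below are placeholders.
        print(f"grasping '{object_name}'")
        self.functional.move_joint("shoulder", 90.0, max_time_s=2.5)
        self.functional.move_joint("elbow", 45.0, max_time_s=2.5)
        self.functional.set_gripper_force(2.0)


if __name__ == "__main__":
    arm = ArmBehavioralLayer(ArmFunctionalLayer())
    arm.grasp("door handle")   # one behavioral command, several functional ones
```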
Factoring out these motion planning and navigation behaviors requires a combination of models of the robot and/or the physical world, and, of course, our set of functional capabilities such as perception and control.
- Motion planning coordinates multiple actuators in the robot to carry out more sophisticated tasks. Instead of moving individual joints to specific setpoints, we use kinematic and dynamic models of the robot to operate in task space, for instance the pose of a manipulator's end effector or the lane a vehicle occupies on a multi-lane highway. Getting from a start to a goal configuration requires a path plan and a trajectory that specifies how to execute that path over time.
- Navigation builds a representation of the environment (mapping) and an estimate of the robot's state within it (localization), so the robot can operate in that environment. The representation might be simple primitives such as polygonal walls and obstacles, an occupancy grid, an HD map of highways, and so on. A path-planning sketch over such an occupancy grid follows after this list.
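As mentioned in the navigation bullet, here is a bare-bones path planner over an occupancy grid, using breadth-first search. A production planner would use A*, sampling-based methods, or lattice planners and account for the robot's footprint and kinematics; this only shows the core idea:

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search over an occupancy grid (0 = free, 1 = occupied).

    Returns a list of (row, col) cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:        # walk back from goal to start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

if __name__ == "__main__":
    occupancy = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
    ]
    print(plan_path(occupancy, start=(0, 0), goal=(2, 0)))
```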
Of course, functional and behavioral capabilities do not operate in isolation. Motion planning in a space with obstacles requires perception and navigation; navigating the world requires controls and motion planning.
On the language side, natural language processing (NLP) is what takes us from raw text, whether it came from speech or was typed directly, to something actionable for the robot. For example, if a robot is given the command "get me a snack from the fridge", the NLP engine needs to interpret it at the right level to carry out the task. Putting it all together (a toy decomposition follows after this list):
- Going to the fridge is a navigation problem that most likely requires a map of the area.
- Recognizing the snack and estimating its 3D position relative to the robot is a perception problem.
- Grabbing the snack without knocking other objects over is a motion planning problem.
- Returning to the human, wherever they were when they issued the command, is again a navigation problem. Someone may have shut a door while the robot was working, or dropped something in its path, so the robot may need to replan based on these changes to the environment.
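The toy decomposition below strings these steps together for the snack example. The pattern matching and the skill names are purely illustrative; real systems use grammars, semantic parsers, or learned models instead of keyword checks:

```python
def decompose(command):
    """Map a fetch-style instruction onto navigation/perception/manipulation steps."""
    command = command.lower()
    if "from the fridge" in command and ("get" in command or "bring" in command):
        item = "snack" if "snack" in command else "item"
        return [
            ("navigate", "fridge"),            # needs a map of the area
            ("detect", item),                  # perception: find and localize the item
            ("grasp", item),                   # motion planning around nearby objects
            ("navigate", "person"),            # may require replanning if paths changed
            ("handover", item),
        ]
    raise ValueError(f"no task plan known for: {command!r}")

if __name__ == "__main__":
    for step in decompose("Get me a snack from the fridge"):
        print(step)
```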
Abstract capabilities
| Abstract capabilities | Task and behavior planning | Semantic understanding and reasoning | NLU and dialog |
| --- | --- | --- | --- |
In simple terms, abstract capabilities are the bridge between human behavior and robot behavior. All of the capabilities in the table above perform some translation that lets humans more easily express their intent to robots, and likewise lets robots more easily express themselves to humans.
Task and behavior planning builds on the key principles of composition and abstraction. An instruction like "bring me a snack from the fridge" can be broken down into a set of basic behaviors (perception, motion planning, navigation, etc.) that can be parameterized and thus generalized to other instructions such as "throw the empty plastic bottle in the trash." Sharing a common vocabulary like this makes it easier for programmers to add functionality to robots, and for end users to apply their robots to a wider range of problems. Modeling tools such as finite-state machines and behavior trees have been vital in implementing such modular systems; a minimal behavior-tree sketch follows below.
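Here is the behavior-tree sketch promised above: a sequence node that runs parameterized leaf actions in order and fails as soon as one of them fails. The node types and leaf behaviors are illustrative; mature libraries such as py_trees provide full-featured versions of the same idea:

```python
SUCCESS, FAILURE = "success", "failure"

class Sequence:
    """Runs children in order; fails as soon as one child fails."""

    def __init__(self, children):
        self.children = children

    def tick(self):
        for child in self.children:
            if child.tick() == FAILURE:
                return FAILURE
        return SUCCESS

class Action:
    """Leaf node wrapping a single parameterized behavior."""

    def __init__(self, name, func):
        self.name, self.func = name, func

    def tick(self):
        print(f"running: {self.name}")
        return SUCCESS if self.func() else FAILURE

if __name__ == "__main__":
    # The lambdas stand in for real skills (navigation, perception, grasping).
    fetch_snack = Sequence([
        Action("navigate to fridge", lambda: True),
        Action("detect snack",       lambda: True),
        Action("grasp snack",        lambda: True),
        Action("navigate to person", lambda: True),
    ])
    print(fetch_snack.tick())
```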
Semantic understanding and reasoning brings abstract knowledge into the robot's internal model of the world. For instance, in navigation we saw that a map cell can be marked as either occupied or free. In reality, there are many more semantics that could enrich the robot's task space beyond "move to x, avoid y." Does the location have distinct rooms? Is part of the occupied space movable if the situation calls for it? Are there places where items can be stored and retrieved later? Where are particular objects usually found, so the robot can search for them more efficiently? A sketch of such a semantic map follows below.
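A sketch of what such semantics might look like in code: regions of a map labeled with rooms and with the object classes usually found there, so the robot can decide where to search first. All of the entries are invented for illustration:

```python
# One simple way to attach semantics to an otherwise purely geometric map:
# label regions and record where object classes are usually found.
SEMANTIC_MAP = {
    "kitchen":     {"region": (0.0, 0.0, 4.0, 3.0), "likely_objects": ["snack", "bottle", "mug"]},
    "living_room": {"region": (4.0, 0.0, 9.0, 5.0), "likely_objects": ["remote", "book"]},
}

def where_to_search(object_name):
    """Return rooms worth searching first for a given object class."""
    return [room for room, info in SEMANTIC_MAP.items()
            if object_name in info["likely_objects"]]

if __name__ == "__main__":
    print(where_to_search("snack"))   # -> ['kitchen']
```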
Natural language understanding and dialog is, essentially, a two-way exchange of semantic understanding between robots and humans. The whole point of abstracting away our world model was so that humans could work with it more easily. Below are examples of both directions of interaction:
- Human-to-robot: The goal here is to share knowledge with the robot to improve its semantic understanding of the world. Examples include teaching it new skills or reducing its uncertainty about the world.
- Robot-to-human: If the robot fails to execute a plan or understand a command, can it explain to the human why it failed? Perhaps a door was shut on the way to the goal, or the robot did not know what a particular word meant and can ask you to define it. A sketch of this kind of feedback follows after this list.
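The robot-to-human direction can be as simple as mapping internal failure codes to sentences a person can act on, as in the sketch below. The failure codes and message templates are invented:

```python
FAILURE_EXPLANATIONS = {
    "blocked_path":   "I could not reach the {goal} because the way there is blocked.",
    "unknown_word":   "I do not know what '{word}' means. Can you describe it?",
    "object_missing": "I looked for the {object} near the {goal} but could not find it.",
}

def explain_failure(code, **details):
    """Turn an internal failure code into a sentence a human can act on."""
    template = FAILURE_EXPLANATIONS.get(code, "Something went wrong, but I cannot explain it yet.")
    return template.format(**details)

if __name__ == "__main__":
    print(explain_failure("blocked_path", goal="fridge"))
    print(explain_failure("unknown_word", word="macchiato"))
```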
This is all quite conceptual at this point: can a robot really be programmed to collaborate with humans at such a high level of communication? It is by no means easy, but research tackles problems like these every day. A good measure of effective human-robot interaction is whether the human and the robot together learn to avoid running into the same issue repeatedly, thereby improving the user experience.
Conclusion
We hope this look at the makeup of robotic systems gave you some useful information. As we said earlier, no taxonomy is bullet-proof. Recall one point from the beginning: a robot has at least one closed feedback loop between sensing and actuation that does not require human intervention. The sketch below shows one way our taxonomy could be laid out as a set of nested feedback loops.
And since nothing these days escapes machine learning, let's also briefly look at machine learning's dominant role in sophisticated robotics.
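As one illustration, this sketch nests the three levels as feedback loops running at different rates: a fast functional loop inside a slower behavioral loop inside an abstract task loop. The rates, numbers, and function bodies are placeholders rather than a real architecture:

```python
def functional_loop(target, state):
    # e.g. a position controller, normally running at hundreds of Hz
    return state + 0.5 * (target - state)

def behavioral_loop(waypoints, state):
    # e.g. follow a planned path, emitting targets for the functional loop
    for target in waypoints:
        for _ in range(5):
            state = functional_loop(target, state)
    return state

def task_loop():
    # e.g. "bring a snack": plan, execute, and replan if something changes
    state = 0.0
    plan = [1.0, 2.0, 3.0]            # toy 1-D "path"
    state = behavioral_loop(plan, state)
    print(f"final state: {state:.2f}")

if __name__ == "__main__":
    task_loop()
```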
- Processing visual and audio data has been an active research area for decades, but the advent of neural networks as function approximators, that is, deep learning, has made modern perception and ASR systems more useful than ever before.
- Learning has also proved its worth at higher levels of abstraction. Text processing with neural networks has moved the needle on natural language processing and understanding. Likewise, neural networks have enabled end-to-end systems that learn to produce motion, behavior, and/or task plans from complex observation sources such as images and range sensors.
The reality is that machine learning is outpacing our hand-crafted expertise when it comes to processing such high-dimensional data. Still, we should remember that machine learning should not become a crutch, because of its single largest drawback: we are not yet able to explain why learned systems behave the way they do, which means we cannot give the kinds of guarantees that conventional methods provide. The hope is to eventually refine our collective scientific understanding so we are not dependent on data-driven black-box approaches. Knowing how and why robots learn will only make them more capable in the years ahead.