Editor’s Note: This article was originally published on OLogic’s website and was reprinted with permission.
Wouldn’t it be nice to have a much more natural interaction with robots? Well, maybe now we have a technology that can help us reach that goal. Or at least that was my initial thought before researching this article. That’s why the positive section comes first, but it’s actually the negative section that raises the most interesting aspects of this new technology and the potential issues it faces, both technically and commercially.
ChatGPT hit the headlines in November 2022 with a much more conversational approach to artificial intelligence (AI). AI broadly refers to any computer program or system that can perform tasks that typically require human intelligence, such as learning, problem-solving, decision-making, and language understanding. Where ChatGPT differs is in the type of AI it uses, Natural Language Processing (NLP), which lets it generate much more human-like text.
Clearly, this will have an impact on robotics, so here we take a closer look at the positives and potential negatives of NLP. First, let’s be clear about what type of robotics we’re discussing here. For OLogic, robotics is a field of engineering and science that involves the design, construction, and operation of robots. These robots have improved over time to be more intelligent and interactive, which has led to the growth of what is known as corobotics or cobots, where robots are designed specifically to work safely in the same environment as humans. These robots can sense, process, and act autonomously or semi-autonomously. As AI has evolved, robotic engineers have used various techniques, such as machine vision and reinforcement learning, to enable robots to perform tasks in a wide range of applications, such as manufacturing, logistics, healthcare, and exploration.
ChatGPT is the latest, and coolest, AI tool to be launched, and it has the potential to enhance robots by improving their communication and decision-making capabilities. So what are the potential positives and negatives of this new type of AI in our robotics context?
Potential positive impact of ChatGPT
As an AI language model, ChatGPT can offer a variety of ways to improve robotics. Here are some possible ways:
Natural Language Processing (NLP): One of the main advantages is how ChatGPT can be used to improve the natural language processing capabilities of robots, enabling them to better understand and respond to human language. This can be especially useful in human-robot interactions and customer service applications.
A recent example of this was when a team of programmers outfitted Boston Dynamics’ robot dog, Spot, with OpenAI’s ChatGPT and Google’s Text-to-Speech modulation in a viral video.
We integrated ChatGPT with our robots.
We had a ton of fun building this!
Read on for the details: pic.twitter.com/DRC2AOF0eU
— Santiago (@svpino) April 25, 2023
The result was very interesting and a good indication of what’s possible. However, when trying to apply these NLP technologies, Spot still needed specific phrases and procedures to be able to “converse” with a human.
Microsoft, which has a multibillion-dollar investment in OpenAI, has also released some guidelines for ChatGPT-robotics integration. The goal of this development activity is to make interaction with robots more natural, moving them from robots that rely on lines of code to perform tasks to robots that follow natural language instructions.
Machine Vision: ChatGPT can also help improve machine vision, which is essential for robots to “see” and navigate their surroundings. By training robots on ChatGPT-generated synthetic data or using ChatGPT to augment existing datasets, engineers can provide additional training examples that help robots recognize and interpret visual data more accurately. This would help robots perform their tasks with greater efficiency and effectiveness.
Reinforcement Learning: ChatGPT can be used to improve the learning capabilities of robots through reinforcement learning. This involves training robots to make decisions based on feedback from their environment, allowing them to adapt and improve over time.
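As a toy illustration of that feedback loop, here is a minimal tabular Q-learning sketch. The two actions and the reward model are invented for the example and do not come from any real robot stack; the point is only that the robot's value estimates converge toward whichever action the environment actually rewards.

```python
import random

def q_learning(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn which of two actions pays off, purely from reward feedback."""
    rng = random.Random(seed)
    q = {"left": 0.0, "right": 0.0}  # value estimate per action
    for _ in range(episodes):
        # Epsilon-greedy: mostly exploit the best estimate, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(["left", "right"])
        else:
            action = max(q, key=q.get)
        # Invented environment: "right" pays more on average than "left".
        reward = rng.gauss(1.0, 0.1) if action == "right" else rng.gauss(0.2, 0.1)
        # Nudge the estimate toward the observed reward.
        q[action] += alpha * (reward - q[action])
    return q

print(q_learning())  # "right" ends up with the higher value estimate
```

Real robot learning works over far larger state and action spaces, but the adapt-from-feedback loop is the same shape.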
Data Analysis: ChatGPT can help improve data analysis in robotics by enabling robots to process and analyze large amounts of data quickly and accurately. This can be particularly useful in fields such as logistics and manufacturing, where robots need to make decisions based on real-time data.
Collaborative Learning: Finally, ChatGPT can help enable robots to learn from one another through collaborative learning. This involves sharing data and insights between robots to improve their collective intelligence and effectiveness.
Overall, ChatGPT has the potential to significantly improve robotics by enhancing their learning, decision-making, and communication capabilities (thanks ChatGPT for the closing sentence).
Now for the Negatives
There is the potential for negative consequences associated with the use of ChatGPT or any other AI technology in robotics. Here are some examples of the types of issues it faces to be commercially and technically viable, as well as some of the problems it creates:
Probabilistic, not deterministic: This is probably one of the key issues to be overcome. ChatGPT is based on a probabilistic methodology, meaning its predictions incorporate randomness: the same input can produce different outputs. The opposite is deterministic, which tells us something can be predicted exactly, without the added complication of randomness. Robots today are very much in the deterministic camp. They are programmed to know exactly what to do in each situation they encounter, which is why most robots work in semi-controlled environments.
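The distinction can be sketched in a few lines of Python: a deterministic (greedy) choice always returns the same action for the same input, while sampling from the same distribution can return different actions on different runs. The action names and probabilities here are invented for illustration.

```python
import random

# An invented "next action" distribution, standing in for a model's output.
probs = {"pick": 0.6, "place": 0.3, "wait": 0.1}

def deterministic_choice(probs):
    # Greedy: identical input always yields the identical answer.
    return max(probs, key=probs.get)

def probabilistic_choice(probs, rng):
    # Sampling: the answer can differ from run to run.
    r = rng.random()
    cumulative = 0.0
    for action, p in probs.items():
        cumulative += p
        if r < cumulative:
            return action
    return action  # fallback for floating-point edge cases

rng = random.Random(42)
print(deterministic_choice(probs))  # always "pick"
print({probabilistic_choice(probs, rng) for _ in range(50)})  # varies by run
```

A classically programmed robot behaves like the first function; an LLM-driven one behaves more like the second, which is exactly what makes safety guarantees harder.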
Google Research and Everyday Robots tried to overcome these issues in a project called PaLM-SayCan. The premise of the research was to “Do As I Can, Not As I Say”. Results from their research can be found on GitHub here. There are multiple examples and trials on the site, with the most recent summary showing that SayCan combined with the improved language model (PaLM), which they refer to as PaLM-SayCan, improved the robotics performance of the entire system compared to a previous Large Language Model (LLM) called FLAN.
PaLM-SayCan chose the correct sequence of skills 84% of the time and executed them successfully 74% of the time, reducing errors by half compared to FLAN. This was particularly exciting because it showed for the first time that an improvement in language models translates into a similar improvement in robotics performance.
Robot Learning (RL): To help with the development of more natural language interaction and to further learning-based robotics, large amounts of training data are required. If you’re interested in this technology area and want to learn more, the Bridge Dataset has collected data from 33,078 teleoperated demonstrations, 8,887 rollouts from a scripted policy, and 21 different environments.
Training Model: One of the less-mentioned drawbacks of LLM machine learning is the vast amount of data required to train the model, which leads to two major problems. First, there is the vast amount of computing power needed to process all that data; it literally costs many millions of dollars every time the dataset is updated. Second, it takes months to retrain the model. You can’t simply add a new section or subset; you have to retrain the LLM on the whole dataset. If you have company sustainability targets and greenhouse gas goals, you may want to check just how many resources are going into your query before you get caught up in the hype.
Bigger doesn’t equal better: This one is critical if we are to overcome the main technological constraints of real-world deployments of NLP. The surprising limitation of NLP models, and LLMs in general, is that giving them more data to train on does not result in better performance. In fact, it may do the opposite, as datasets become unclean due to an averaging effect over all the data.
A way to understand this visually is to consider a model trained on 16 colors. The difference between blue and yellow is very clear, so the model can interpret the information and provide an accurate prediction. If you feed more data into the model, say 500 colors, to try and provide more subtle and granular information, the result is that the model can no longer distinguish between them. The peaks and troughs of the dataset average out so that there is no clear demarcation between colors, resulting in errors when the model replies to a query.
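A rough numerical version of that color analogy, with entirely invented numbers: pack class "prototypes" evenly into a fixed range, add a fixed amount of measurement noise, and classify by nearest prototype. With 16 classes the spacing dwarfs the noise; with 500 classes crammed into the same range, neighbors blur together and accuracy collapses.

```python
import random

def accuracy(n_classes, noise=0.01, trials=5000, seed=0):
    """Nearest-prototype accuracy for n evenly spaced classes on [0, 1]."""
    rng = random.Random(seed)
    step = 1.0 / n_classes
    correct = 0
    for _ in range(trials):
        true_class = rng.randrange(n_classes)
        # Observation = class center plus Gaussian measurement noise.
        observed = (true_class + 0.5) * step + rng.gauss(0.0, noise)
        # Classify by which prototype bin the observation falls into.
        predicted = min(max(int(observed / step), 0), n_classes - 1)
        correct += predicted == true_class
    return correct / trials

print(accuracy(16))   # near-perfect: class spacing is much larger than noise
print(accuracy(500))  # far worse: spacing is comparable to the noise
```

This is only a cartoon of the averaging effect, but it captures why finer-grained classes in the same representation space stop being separable.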
This “more doesn’t equal better” reality raises real concerns about how to improve the technology. Just throwing more data at it is not the answer, as it is a path of diminishing returns, one that is reached very quickly. This is also why OpenAI’s Sam Altman announced that there will be no GPT-5 any time soon.
The current way to try and overcome this is through prompt engineering.
Prompt Engineering: Because bigger doesn’t mean better and LLMs do not actually understand the world as humans do, they produce many errors, false results, and plain fabrications where facts are made up. This has led to a new technical job known as prompt engineering, where a person spends time refining the questions given to the LLM in such a way as to guide the model in the correct direction, essentially coaching the model on the type of information it needs to retrieve in order to produce the correct answer. As you can imagine, this is a skill in itself and again leaves the result open to human bias.
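A minimal sketch of what that coaching looks like in practice, using an invented template and example; real prompt formats vary by model and application, and none of this wording is an official recommendation.

```python
def build_prompt(question, role=None, context=None, constraints=None):
    """Assemble a refined prompt from a bare question plus steering text."""
    parts = []
    if role:
        parts.append(f"You are {role}.")          # steer the model's persona
    if context:
        parts.append(f"Context: {context}")       # ground it in the task
    parts.append(f"Question: {question}")
    if constraints:
        parts.append("Constraints: " + "; ".join(constraints))  # limit outputs
    return "\n".join(parts)

naive = build_prompt("Which bin does this part go in?")
refined = build_prompt(
    "Which bin does this part go in?",
    role="a warehouse robot controller",
    context="bins are labeled A-D; part is a 6mm hex bolt",
    constraints=["answer with a single bin label", "say UNKNOWN if unsure"],
)
print(refined)
```

The naive and refined prompts ask the same question, but the second one narrows the space of acceptable answers, which is the whole job.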
Distillation Model: Also known as knowledge distillation, this is impacting future financial investment in these types of models. While ChatGPT takes hundreds of millions of dollars and several months to train each release, a group at Stanford University used the knowledge distillation methodology to build a chatbot demo called Alpaca, recently released (and terminated) with some insightful results. This $600 ChatGPT lookalike was found to perform very similarly to OpenAI’s GPT-3.5 model. How did they do this? They gathered 52,000 question-answering examples from OpenAI’s text-davinci-003 (known more commonly as GPT-3.5) and used them to retrain a LLaMA model into an instruction-following form.
I don’t think people realize what a big deal it is that Stanford retrained a LLaMA model, into an instruction-following form, by **cheaply** fine-tuning it on inputs and outputs **from text-davinci-003**.
It means: If you allow any sufficiently wide-ranging access to your AI…
— Eliezer Yudkowsky (@ESYudkowsky) March 14, 2023
What does this mean to the average person? Basically, you can get 80% of the performance for a lot less than the cost of ChatGPT. So, unless OpenAI restricts access to the inputs and outputs of these LLMs, or successfully sues everyone, it is very hard to see how these companies will maintain their competitive advantage and commercial edge.
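For intuition, the core distillation idea, training a small “student” against a large “teacher’s” softened output distribution rather than hard labels, can be sketched as follows. The logit values are invented, and pipelines like Alpaca actually fine-tune on teacher-generated instruction/response text rather than raw logits; this only illustrates the loss at the heart of the method.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

teacher = [3.0, 1.0, 0.2]   # teacher is confident in class 0
matched = [2.9, 1.1, 0.3]   # student that mimics the teacher -> low loss
wrong   = [0.2, 1.0, 3.0]   # student that disagrees -> high loss
print(distillation_loss(teacher, matched))
print(distillation_loss(teacher, wrong))
```

The temperature softens both distributions so the student learns not just the teacher’s top answer but how it spreads confidence across the alternatives, which is why a cheap student can capture so much of an expensive teacher’s behavior.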
Unintended Bias: This is recognized by the general AI industry, and like other AI models, ChatGPT is prone to the same problem. AI systems such as ChatGPT can learn and replicate biases present in the data they are trained on. If the data is biased, the AI system can perpetuate that bias, leading to discrimination or unfair treatment. This needs careful attention and management to ensure that we remove any bias discovered in practice.
Unemployment: More generally in society, one of the big concerns for robotics and AI is how they impact work. As they continue to advance, the worry is that they may replace human workers in certain industries, leading to job displacement and unemployment. This can have social and economic consequences if the displaced workers are not able to find new jobs or acquire the skills necessary to work alongside robots.
Dependence: In general, over-reliance on robotics and AI can make humans overly dependent on these systems, leading to a loss of skills and abilities. This can be particularly problematic if the technology fails or malfunctions, leading to errors or accidents.
Ethical Concerns: The use of robotics and AI raises ethical concerns about their impact on society, particularly in areas like privacy, autonomy, and accountability. For example, there may be concerns about how personal data is collected and used, who is responsible for errors or accidents caused by robots, and how decisions made by robots are justified.
It is important to recognize and address these potential negative consequences as we continue to develop and deploy robotics and AI technologies. This can be done through careful consideration of the ethical implications of these technologies, ongoing monitoring and evaluation of their impact, and proactive measures to mitigate potential risks.
It’s an exciting area of technology and one we will keep a close eye on as we introduce these technologies into our development projects at OLogic. As the distilled variants have a footprint of only a few gigabytes, OLogic is looking forward to running these distilled models on our very own Edge AIoT PumpkinPi. This will bring new edge applications to the market in a much more affordable and competitive way than the current big-compute trajectory.
The Pumpkin Pi i350 EVK is an Edge AI platform designed for mainstream AI + IoT applications that require vision and voice edge processing, such as facial, object, gesture, and motion recognition, LPR, voice activation and speech recognition, sound isolation, bio-tech and biometric measurements, and more.
Thanks, ChatGPT, for your input too. It was insightful, though at a very high level.
About the Author
Ted Larson is the CEO of OLogic, a research and development outsourcing company with a focus on robotics. OLogic has worked on products for companies such as Hasbro, Facebook, Google, Motorola, HP, and Amazon. Larson is a computer software and electronics expert with 30+ years of experience designing and building commercial products.
Prior to OLogic, he founded an internet software company called the Urbanite Network, a web server content publishing platform for media customers, grew the company to over 70 employees, and raised over $10 million in private equity and venture capital. Prior to Urbanite, Larson held positions at Hewlett-Packard, Iomega, and the Los Alamos National Laboratory. He has both a BS and an MS in computer science from Cal Poly, San Luis Obispo.