Project 01: Robots Learn to Talk

[Please note this is a ‘workbench’ project which means it is currently work in progress.]

[Progress Update: The planning phase is almost complete. The 2.0 Ideation phase is well progressed and I am finalising the 3.0 Objectives and Assumptions section. I am basically ready to define the detailed project phases and make a start on this project – see below.]

Introduction

I did not deliberately build the ability to learn a language and speak into the Xzistor Concept brain model. Halfway through, I simply realised that speech will automatically ‘evolve’ in these robots if they have a hearing sense (microphone) and a means to produce sounds (speaker). It will come about as a natural consequence of, among others, the Motion Algorithm and the Association-forming Algorithm. I was excited by this fact and thought I should pursue an advanced project to demonstrate what so many have tried before.

Note: I do not regard robot speech as just a simple standalone piece of code that records words and plays them back like a parrot. It is also not an Alexa-type lookup table with pre-programmed phrases that are played in response to ‘sound recognised’ voice commands. If that were the objective, well – Troopy does this already – I have added two fun sound files below where you can hear me and Troopy exchange some military-style communications (squelch and all!) just to amuse some friends’ kids. But this is simply playback of pre-recorded sound files and not what we need to achieve with this project! Language in Xzistor Concept robots develops precisely like it would in humans… so it will take time to get to full language competence, but as always, I aim to explain the fundamental principles and leave further detailed developments to others, while I tackle the next topic. My approach will involve motion, reflexes, emotions, learning, thinking, association-forming and it will require a key ingredient… laziness. Yes, Xzistor Concept robots will only learn to speak if it offers an easier way to get what they want – easier than physical exertion. See! Laziness is not such a bad thing! 🙂

Pre-recorded voice of Ano to Troopy (military style):

Pre-recorded voice of Troopy responding to request above (military style):

In preparation for collaborating on this project I want to recommend that you look at the Robot Entertainment demo video and make sure you understand the basics of ‘playing’. The reason for this will soon become clear!


1.0 – Aim

Guidance Text: Clearly define the aim of the project. Be clear on what it aims to prove (what it is – and what it isn’t).

Using the Xzistor Concept brain model, demonstrate in the simplest way possible the principles of how an Xzistor Concept robot with a hearing sense (microphone) and a sound-generating ability (speaker) can develop speech similar to the way humans learn to speak.

2.0 Ideation

Guidance Text: Creative (often messy) initial notes about approaches, options and conceptual aspects that could support achieving the aim. This will include initial tests / experiments / prototyping to inform our thinking before deciding on the final objectives and assumptions. This upfront work will save time later!

Let’s look carefully at our aim.

Ok, so the aim is to keep it simple. As always, we will only aim to prove the principle and let others explore the many detailed applications spawned from this.

I will be exploring many ideas here and doing some preparatory experiments / tests before setting down my final objectives for this project.

Idea 2.1 – Microphone

We need to provide Troopy with a sense of hearing …I am curious to know if I can use the webcam already fitted to Troopy. I vaguely remember this Logitech 910e boasts stereo microphones…but need to investigate! Oh, I see it is actually called the Logitech c920.

Here it is:

The dual microphones can be seen clearly (perforated areas) on either side of the camera. Here is what the description on Amazon says: ‘Dual Microphones: The two-microphone system on this HD webcam – one on each side of the lens – captures natural stereo audio while filtering out background noise.’

I currently already feed the HD video signal from Troopy through the umbilical cord to the laptop where the Xzistor Concept brain model is running. Now to test if the sound is coming through too! This is exciting!
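Before the video, here is a minimal Java sketch of how one could check this from the laptop side, assuming the C920’s microphones show up as a normal Windows capture device. It uses the standard javax.sound.sampled API; the class name and the chosen audio format are my own illustration and not part of the Xzistor brain model.

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Mixer;
    import javax.sound.sampled.TargetDataLine;

    public class MicCheck {
        public static void main(String[] args) throws Exception {
            // List every audio device Java can see - the C920 should appear here
            // if its microphones are being passed through with the video signal.
            for (Mixer.Info info : AudioSystem.getMixerInfo()) {
                System.out.println(info.getName() + " - " + info.getDescription());
            }

            // Grab a short burst from the default capture device and report the
            // peak level, just to confirm sound is actually arriving.
            AudioFormat format = new AudioFormat(16000f, 16, 1, true, false); // 16 kHz, 16-bit, mono
            TargetDataLine line = AudioSystem.getTargetDataLine(format);
            line.open(format);
            line.start();
            byte[] buffer = new byte[3200];          // roughly 0.1 s of audio
            int read = line.read(buffer, 0, buffer.length);
            line.stop();
            line.close();

            int peak = 0;
            for (int i = 0; i + 1 < read; i += 2) {
                int sample = (buffer[i + 1] << 8) | (buffer[i] & 0xFF); // little-endian 16-bit sample
                peak = Math.max(peak, Math.abs(sample));
            }
            System.out.println("Peak level: " + peak + (peak > 0 ? " - sound is coming through!" : " - silence"));
        }
    }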

Bear with me as I initiate my little Lego friend!

Ok, check out the video below:

Idea 2.2 – Speaker

Troopy has a speaker which resides in the EV3 Brick as can be seen in the photograph below (the perforated lines on the side of the hub):

Lego Self-Balancing Robot

As an aside: The interesting Lego self-balancing robot above – built by another robotics enthusiast – could easily be turned into an Xzistor Concept robot – all we need to do is to include the following senses: colour sensing, ultrasonic distance sensing and inertia sensing (by making use of the standard Lego gyro sensor).

Back to the speaker.

I found that sending sounds to the onboard speaker over WiFi is a bit cumbersome, as only a very specific mono wav file format (16-bit PCM, mono, sample rate 24 kHz or 32 kHz) will work. But I had another idea: to use a Bluetooth speaker.

The reason for this is that Troopy’s Java brain program has no problem playing sound files over the laptop speakers, but has trouble sending them to the EV3 brick speaker. If I can send the sound from the laptop to the robot via Bluetooth – that would make things much easier. But I need to find a small Bluetooth speaker.
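If this works, the laptop-side code should stay trivial, because a paired Bluetooth speaker simply becomes the laptop’s default audio output. A minimal sketch using the standard Java sound API (the file name babble.wav is hypothetical):

    import java.io.File;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Clip;

    public class SpeakerCheck {
        public static void main(String[] args) throws Exception {
            // Once the Bluetooth speaker is paired and set as the default output,
            // normal Java playback should land on it - no robot-side code needed.
            AudioInputStream audio = AudioSystem.getAudioInputStream(new File("babble.wav")); // hypothetical file
            Clip clip = AudioSystem.getClip();
            clip.open(audio);
            clip.start();
            Thread.sleep(clip.getMicrosecondLength() / 1000 + 200); // wait for playback to finish
            clip.close();
        }
    }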

I do not know how small (compact) they make Bluetooth speakers – I’ll take a look on the Internet.

Amazingly – they come in quite small sizes. I found many and refined the search to these two candidates!

TECHPOOL (on the left) and i-Star (on the right)

I have decided to try my luck with the i-Star Bluetooth speaker (on the right). £12.99 + delivery (but shop around online as they are getting cheaper!). I recommend going for plastic rather than metal to keep the overall weight of your robot down – this will save your robot’s motors. I have ordered one and will continue here when it arrives!

Hi, I have received it now – I thought I’d drop in a little video of me testing it (below):

Idea 2.3 – Sound Wave Analyser App

Once we get the above microphone and Bluetooth speaker working, we might need a piece of software to analyse the incoming sound and put it into a graph or histogram (I am thinking about whether I can make this simpler!). This is to discretise sound into a digital approximation which we can turn into a sequence of numbers. The numbers will be the representation (or brain state) of the sound in Troopy’s brain.

I am becoming more and more convinced there is a way to simplify this approach. Because I always like to simplify things as much as possible and decompose all complexity into simpler constituent parts, I think I will build this project in a much simpler way, which I will clearly explain when we set the Objectives for the project – I am just worried I will fail to convince you that I can teach robots to speak if I make it too simple. 🙂

I still think it is worthwhile to have a play with a sound wave analyser which will prove helpful later for discussions on more sophisticated implementations.

I am now scouring the Internet to see what sound wave analysis tools are available for free online. I was hoping to find this in Java code – but could not find what I am looking for off the bat. I did find this free app which will help to explain why I want this (see below).

RTSPECT is a free program for displaying a real time waveform and spectrum display of an audio signal. With RTSPECT you can monitor the waveform and spectral shape of sounds being played into the computer’s microphone or line input ports. RTSPECT can display one or two-channel audio signals.

A handy tool to better understand sound wave spectra is this Windows Tool for Real-time Waveforms & Spectra which you can download for free courtesy of the University College London’s PSYCHOLOGY AND LANGUAGE SCIENCES team, part of the Faculty of Brain Sciences:

Real-time Sound Wave Spectrum Analyser: https://www.phon.ucl.ac.uk/resource/sfs/rtspect/

Have a play with this tool, as we will be talking a little about sound wave spectrum analysis as a means of generating a unique digital representation of words in the brain of the robot. I had a bit of a play in the video below:

The idea is to take a sound (e.g. a word spoken to Troopy by a tutor) and digitise it quite coarsely, say approximate it with 20 numerical values. This can then be translated into a simple numerical sequence that can be fed into the robot’s digital brain. The image below explains the principle:


We can use the values in the sequence and play back a kind of ‘robot speech’ equivalent by assigning pitch values to the numerical values in the sequence (yes – it will be clunky until we increase the fidelity). This will create a beep-type sequence version of the word. When we eventually add more digital sample points, the sound will improve to the point where it sounds very much like the spoken word.

Note to self: Use the leJOS Java command Sound.playTone(frequency, duration, volume). This will play tones on the EV3 brick (these are not wav files, just tones).
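To make the idea above a little more concrete, here is a rough Java sketch of both steps: squashing a recorded word down to about 20 numbers, and then playing those numbers back as tones on the EV3 brick. The leJOS Sound.playTone call is the one referred to in the note above; everything else (method names, the value-to-pitch mapping) is my own illustrative choice and not a fixed part of the design.

    import lejos.hardware.Sound;   // leJOS EV3 API (runs on the brick)

    public class RobotSpeech {

        // Reduce a recorded word (raw 16-bit samples) to roughly 20 numbers by
        // averaging the absolute amplitude in 20 equal time slices.
        static int[] discretise(short[] samples, int points) {
            int[] values = new int[points];
            int sliceLen = Math.max(1, samples.length / points);
            for (int i = 0; i < points; i++) {
                long sum = 0;
                int start = i * sliceLen;
                int end = Math.min(samples.length, start + sliceLen);
                for (int j = start; j < end; j++) sum += Math.abs(samples[j]);
                values[i] = (int) (sum / Math.max(1, end - start));
            }
            return values;
        }

        // Map each value to a pitch and play the sequence as a clunky 'robot word'.
        static void playAsTones(int[] values) {
            for (int v : values) {
                int freq = 200 + (v % 1000);       // crude value-to-pitch mapping
                Sound.playTone(freq, 100, 50);     // frequency (Hz), duration (ms), volume
            }
        }
    }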

Idea 2.4 – Babbling

I suggested earlier that you take a look at the demo video on Robot Entertainment, and specifically at the concept of ‘playing’. The vocal equivalent of playing is ‘babbling’. This means Troopy will randomly (instinctively – if you like) utter sounds just like an infant. Why this is important will become clearer later – but basically it will create opportunities for the robot to learn, just as playing does.

Idea 2.5 – Phasing

This is not technical – but more about how I structure the project. I see an opportunity here to break the project up into a few distinct phases – starting really simple and then moving from principle to principle in order to arrive at a more detailed discussion (but I do not plan to let things get too complicated). I also see an opportunity to use assumptions to save time on aspects that would otherwise require fairly comprehensive and time-consuming tests. I always remind myself that my time is best spent cracking the principles and then leaving others to flesh out the detail as spin-off projects… (plenty of MSc’s and PhD’s waiting here!). If this sounds a bit vague, don’t worry, it will be clarified when we set down the final objectives for the project in the next section.

I thought, as a courtesy to those who might be quite unfamiliar with the topic discussed here, I would add a few introductory phases to make sure everybody is aligned as we step into the project. I will also draw parallels with a human baby learning to talk. So some interim steps are added below:

Phase 1

What is hunger? Show the hunger graph on the laptop dashboard. Show what happens if Troopy is presented with food. There is the sensing of the proximity of the food. Then there is the ‘stop and suckle’ reflex. (I have come back to this because I see they call it the ‘suck’ reflex – which is one of many baby reflexes.) Check out this video on YouTube: https://www.youtube.com/watch?v=_JVINnp7NZ0

Of all the baby reflexes explained, we will model only the root and suck reflexes in Troopy (just to show we can build in any reflex we desire). Explain the difference between the current feeding arrangement and how this will be adapted to drive the robot towards speech.

Key focus: How Troopy’s hunger currently works.

Key words: Hunger

Phase 2

Introduce the ‘Elsa feeder’ food source and explain how it will work. It is mobile, so it can be anywhere in the confine. It will indicate the proximity of food and present Elsa, the food and an ID colour for optical recognition in future. Explain how association-forming will work.

Key focus: How a ‘feeder’ personality can provide a mobile food source.

Key words: Mobile feeder

Phase 3

Explain the concept of ‘babbling’ and the parallels with ‘playing’. Set the robot up to babble and explain the practical arrangements, i.e. a 3-second linger to allow me to move the food source in front of Troopy. Explore the change in emotions towards Elsa, the food source and the colour blue. Explain that for now the colour blue will be used to optically recognise food. Early crying will introduce the learning that sound can elicit help.

Key focus: How Troopy can learn to use voice commands to elicit food from the mobile ‘feeder’.

Key words: Command food

Phase 4

Demonstrate that Elsa has now become a revered object (image) which can cause a positive emotion in Troopy. Troopy now wants to get more emotional reward from Elsa even when not hungry. Just seeing Elsa makes Troopy feel good. Now soothing words become a source of reward because they are associated with the food reward: ‘Good robot!’ Show Troopy’s positive emotions when he hears ‘Good robot!’ – we should see a slight smile on his face.

Key focus: How Troopy now experiences the words ‘Good robot!’ as emotional reward. This can act as a motivator to learn other things… Troopy will now do things to get that ‘Good robot!’ response.

Key words: Praise reward

Phase 5

Troopy will learn to elicit emotional reward from Elsa by mimicking/repeating a word after her when it is linked to the correct image or object. Elsa says ‘Apple’ and holds up an apple. Troopy has learned the skill (coordination) to repeat her word ‘Apple’. He sees her smile and hears her say ‘Good robot!’. He only goes to this effort to elicit the emotional reinforcement/reward from Elsa.

Key focus: How Troopy repeats words for emotional reward.

Key words: Mimic for praise

Phase 6

We can make this a discussion, based on the principle demonstrated, of how the robot’s future learning of words and expanded vocabulary will work. We can see whether we want to spend more time showcasing further detail or whether we leave it to other researchers.

Key focus: How Troopy will learn more words and phrases (a phrase is just a longer ‘word’). Also how he will learn to use new words to get what he wants: ‘What is this? What do you want? The apple or the orange? No, that is the orange. This is the apple. Do you want me to give you the apple? Give apple! There you go, there is your apple. Good robot!’

The praise words have now got Troopy to ask for the apple… and receiving the apple adds further reward.


Key words: Expanding vocabulary

For the first phase we can implement this ‘babbling’ reflex. Some thoughts about this initial phase are below:

Idea 2.6 – Mobile Feeder

The idea to explore here is to construct a ‘mobile feeder’ that can bring food to Troopy. This device should achieve the following:

1.) Bring food to Troopy anywhere in the Learning Confine

2.) Allow Troopy to sense that he is in contact with the food (this will trigger the stop-and-suckle reflex – same as for human babies)

3.) Allow Troopy to observe visually the object/person/doll that is presenting the food (this is so Troopy can develop an appreciation for this object/person over time)

4.) Apart from the object/person bringing the food, Troopy should also be able to see the food (either physically or an image of it).

5.) As an interim step, this ‘mobile feeder’ also just needs a consistent colour band (say BLUE) that can be easily tracked by Troopy, ahead of the more complex objects/images he will learn to associate with receiving food. We can then test to see if Troopy develops a ‘positive emotional state’ when looking at this colour band.

Option: We can make the ‘mobile feeder’ say words which will equally become associated with food over time.

We can explore this ‘mobile feeder’ a little more to develop the idea further. See below:

Mobile Feeder – more detail: I will present Troopy’s front colour sensor with a specific colour, GREEN, which will tell the brain program on the PC to start ‘filling’ Troopy’s stomach. The suckling reflex will make Troopy freeze and suck(le) for food (remember the invisible straw, like Simmy). I will make this food taste better than any other food source in the Learning Confine. Troopy will make a babble tone and I will make the utterance activity stay present (linger without fading) in the mind for 3 seconds – this will give me time to present the food (GREEN card). When his stomach is full, he will pull away and play, unless something more urgent happens. In Troopy’s brain the utterance will be managed in the muscle coordination centre of the mind as if it were caused by throat muscles.
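Here is a very rough sketch of how this feeding loop could look in the brain program. All names (stomachLevel, freezeMotors, babble and so on) are placeholders of my own, just to show the sequence described above; none of this is the actual Xzistor brain model code.

    // A minimal sketch of the feeding loop described above - illustrative only.
    public class FeedingSketch {
        static final int GREEN = 1;              // colour code returned by the sensor (assumed)
        static double stomachLevel = 0.0;        // 0.0 = empty, 1.0 = full
        static long lastBabbleMs = 0;

        static void brainCycle(int frontColour, long nowMs) {
            // Babbling: utter a random tone and let it 'linger' for 3 seconds so
            // the tutor has time to present the GREEN card.
            if (nowMs - lastBabbleMs > 3000) {
                babble();
                lastBabbleMs = nowMs;
            }

            if (frontColour == GREEN && stomachLevel < 1.0) {
                freezeMotors();                                     // stop-and-suckle reflex
                stomachLevel = Math.min(1.0, stomachLevel + 0.05);  // stomach fills
            } else if (stomachLevel >= 1.0) {
                resumePlay();                                       // full: pull away and play
            }
        }

        static void babble() { /* play a random tone */ }
        static void freezeMotors() { /* stop all drive motors */ }
        static void resumePlay() { /* hand control back to normal behaviour */ }
    }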

The Food Server Device is seen here from two angles. Troopy will see the FRONT VIEW. There will be 3 image recognition areas of interest in Troopy’s Field Of View (FOV): 1 – Elsa’s face, 2 – the food bowl, 3 – the lower blue panel. We will start by just using the lower blue panel for association-forming. When Troopy’s front colour sensor sees the GREEN card, he will experience a suckle (stop and suck) reflex and his stomach will start to fill.

I have just ordered some of these, which will be easy to load images into. I like the backward tilt, which will avoid shadows that can confuse Troopy. I can stick the GREEN card to the bottom and we do not need real food – an image of food will suffice for now.

I received the ‘sign holder’ today and built a ‘mobile feeder’ as shown above – I think it came out quite well. This is going to be a very important device for our discussions going forward, because it will not only explain how Xzistor robots learn to talk but other important aspects as well – for instance how they get attached to objects. Below is a video showing how this simple device works.

Below we can see, through Troopy’s own eyes, how he will observe Elsa with the food bowl through his onboard webcam.

There are three areas of interest here: 1.) Elsa’s face, 2.) the food bowl and 3.) the blue colour band at the bottom.

Each of these three areas has its purpose. Let’s discuss them individually:

Elsa’s Face

1.) Elsa’s face will be pixelated into a mosaic pattern and converted into a string of values representing the image in the robot brain. This image need not be identical every time; as angles and lighting change slightly, the robot brain will allow for a threshold of, say, 70% correlation. This threshold/tolerance is a separate interesting discussion for later. The main thing to remember is that no matter how complex the face is, it will always just be turned into a string of numerals in the robot brain (and thus a state of 0’s and 1’s). As time goes on Troopy will learn that this set of pixels means food is being offered – which will make his hunger go away. And that is therefore a good thing. Elsa’s words (voice) will also start to trigger positive emotions.
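As an illustration of what this pixelation and the 70% threshold could look like in code, here is a minimal sketch under my own assumptions (e.g. that the face region has already been cropped out of the webcam frame, and a per-cell tolerance of 30 brightness levels). It is not face recognition software and not part of the brain model itself.

    import java.awt.image.BufferedImage;

    public class MosaicMatch {
        static final double THRESHOLD = 0.70;    // required correlation with the stored pattern

        // Pixelate the region into a coarse grid of brightness values (the 'string
        // of numerals' that represents the image in the robot brain).
        static int[] mosaic(BufferedImage region, int grid) {
            int[] values = new int[grid * grid];
            int cellW = region.getWidth() / grid, cellH = region.getHeight() / grid;
            for (int gy = 0; gy < grid; gy++) {
                for (int gx = 0; gx < grid; gx++) {
                    long sum = 0;
                    for (int y = 0; y < cellH; y++) {
                        for (int x = 0; x < cellW; x++) {
                            int rgb = region.getRGB(gx * cellW + x, gy * cellH + y);
                            int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                            sum += (r + g + b) / 3;            // simple brightness
                        }
                    }
                    values[gy * grid + gx] = (int) (sum / (cellW * cellH));
                }
            }
            return values;
        }

        // Fraction of mosaic cells that are 'close enough' to the stored pattern.
        static boolean matches(int[] seen, int[] stored) {
            int hits = 0;
            for (int i = 0; i < seen.length; i++) {
                if (Math.abs(seen[i] - stored[i]) < 30) hits++;  // per-cell tolerance (assumed)
            }
            return (double) hits / seen.length >= THRESHOLD;
        }
    }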

The Food Bowl

2.) The food bowl will also just be an image (set of pixels) that will be turned into numerals. As time goes by Troopy will learn that this set of pixels also means food – which will make his hunger go away. Again, this will be a good thing that triggers positive emotions.

The Blue Colour Band

3.) The blue colour band at the bottom will offer a helpful intermediate step which can simplify things. We can use this ‘consistent colour patch’ ahead of any face/object recognition, because only the colour blue needs to be recognised and associated with food. This will simplify things but still demonstrate the principle. As above, this will trigger positive emotions.

One of the first tests we can do, after having Elsa feed Troopy a couple of times, is to see whether this specific hue of blue will trigger positive emotions when placed in Troopy’s field of view. This will prove the point that an object (which is just a simple or complex set of pixels) can become an object of desire/happiness, as it will make the robot happy when looking at it – even in the absence of hunger or food.
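A sketch of how this first test could be wired up on the vision side, with thresholds that are purely illustrative: the only question asked of the camera frame is whether the bottom of the image is dominated by the chosen blue, and that yes/no answer is what gets handed to the emotion side of the brain model.

    import java.awt.image.BufferedImage;

    public class BlueBandCheck {
        // Returns true when the bottom quarter of the frame is mostly the chosen blue.
        static boolean blueBandVisible(BufferedImage frame) {
            int bandTop = frame.getHeight() * 3 / 4;           // look at the bottom quarter
            int bluePixels = 0, total = 0;
            for (int y = bandTop; y < frame.getHeight(); y++) {
                for (int x = 0; x < frame.getWidth(); x++) {
                    int rgb = frame.getRGB(x, y);
                    int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                    if (b > 120 && b > r + 40 && b > g + 40) bluePixels++;  // 'blue enough' (assumed thresholds)
                    total++;
                }
            }
            return (double) bluePixels / total > 0.5;          // band fills most of the region
        }
    }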

This is quite exciting! Cannot wait to show you this! This positive emotion experienced by Troopy will be real happiness – as real as it is in your and my brain!

Note: I just want to confirm that object/face recognition software is not part of the Xzistor Concept brain model and is not needed. In advanced Xzistor Concept instantiations we will use an approach where the field of view is pixelated with more detail around the ‘pupil’ and diminishing detail as we move away from the pupil (a bit like a human eye).

This is the end of the Ideation Phase. We can now consolidate our initial thoughts into clear Objectives and Assumptions.

3.0 – Objectives & Assumptions

Guidance Text: Nailing down very clear objectives as well as assumptions that will achieve the aim of the project.

Objective 3.1

Objective 3.1: Explain how Troopy’s hunger currently works (text + video).

  • Assumption 3.1.1: Assume the viewer has no prior knowledge of how hunger is modelled by the Xzistor Concept brain model.
  • Assumption 3.1.2: Assume that by demonstrating the baby ‘suck’ reflex we prove that we can implement all the other human baby reflexes (and many non-human ones).
  • Assumption 3.1.3: Assume we do not need to give Troopy physical food; we can tell his brain directly that he is being fed (stomach is filling up).

Objective 3.2

Objective 3.2: Explain how Troopy’s static food source can be replaced by a ‘mobile feeder’ food source (text + video). Along with the optical image of Elsa’s face and the food, the ‘mobile feeder’ will say the words ‘Yes! Good robot!’ as the robot starts sucking. This phrase will become associated with being fed, Elsa’s face and the food image.


  • Assumption 3.2.1: Assume the 3rd party (human or robot) that delivers the food can be simplified to a photographic image rather than real 3D objects (e.g. pictures of the Elsa doll and food bowl will suffice).
  • Assumption 3.2.2: As a simplification, assume a single colour panel (blue) can be used to demonstrate the association-forming with the ‘mobile feeder’ ahead of using the more detailed face and food bowl images.
  • Assumption 3.2.3: Assume we do not need to let Troopy ‘hear’ the phrase ‘Yes! Good robot!’. Instead we will directly inject a representative ‘audio state’ representing this sound pattern into his brain for association-forming (a small hypothetical sketch follows this list). The logic behind this will be explained.
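To make Assumption 3.2.3 a little more concrete, here is a tiny hypothetical sketch (my own names and values, not the brain model’s actual code) of what ‘direct injection’ could look like: a fixed numeric pattern stands in for the heard phrase and is handed to the same association step that a real hearing sense would eventually feed.

    // Hypothetical sketch of 'direct injection' (Assumption 3.2.3): instead of
    // analysing real audio, a fixed pattern represents the phrase in the brain.
    public class DirectInjection {
        // Stand-in 'audio state' for the phrase 'Yes! Good robot!' (arbitrary values).
        static final int[] GOOD_ROBOT_STATE = { 7, 3, 9, 1, 4, 8, 2, 6, 5, 0 };

        // Called at the moment the feeder would have spoken the phrase, so the state
        // gets associated with the feeding event exactly as a heard sound would be.
        static void onFeederSpeaks(AssociationMemory memory) {
            memory.associateWithCurrentContext(GOOD_ROBOT_STATE);
        }

        // Placeholder for the brain model's association-forming step.
        interface AssociationMemory {
            void associateWithCurrentContext(int[] sensoryState);
        }
    }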

Objective 3.3

Objective 3.3: Explain how Troopy can learn to use voice commands to elicit food from the mobile ‘feeder’ (text + video). Troopy is rewarded for saying the right word after the mother (‘mobile feeder’) and is then given food while hearing ‘Yes! Good robot!’.

  • Assumption 3.3.1: Assume instinctive ‘babbling’ can be modelled by 3 random utterances. One of these (say BEE-BOO) can coincide with Elsa’s (‘mobile feeder’s’) term for food (BEE-BOO). Troopy is then rewarded with food from the ‘mobile feeder’ (Elsa) and BEE-BOO becomes the learnt command to elicit food (see the sketch after this list). P.S.: This is how human infants learn – the first repetition of the required word is purely coincidental, and then the robot learns by reward that it is a useful word. Where is Pavlov when you need him!
  • Assumption 3.3.2: Assume this illustrates that Troopy will learn that there is reward in repeating Elsa’s (‘mobile feeder’s’) words and he will gradually learn other words by repeating them after her. He does this to get a ‘Yes! Good robot!’, which now elicits positive emotions in the robot brain. We do this to save a very time-consuming exercise of letting Troopy learn more words – instead we explain how it will work in the robot brain (just as in the human brain).
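Here is the sketch referred to under Assumption 3.3.1 – a toy illustration, with invented names, of how a coincidental babble can be turned into a learnt food command through reward. The real brain model handles this through its association-forming, so treat this only as a cartoon of the mechanism.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    public class BabbleLearning {
        static final String[] BABBLES = { "BEE-BOO", "BAH-BAH", "DOO-DEE" };
        static final Map<String, Double> association = new HashMap<>();
        static final Random random = new Random();

        static String babble(double hunger) {
            // When hungry, prefer the babble with the strongest learnt association;
            // otherwise (or early on) just utter something at random.
            String best = BABBLES[random.nextInt(BABBLES.length)];
            if (hunger > 0.5) {
                for (String b : BABBLES) {
                    if (association.getOrDefault(b, 0.0) > association.getOrDefault(best, 0.0)) best = b;
                }
            }
            return best;
        }

        // Called when the 'mobile feeder' responds with food and 'Yes! Good robot!'.
        static void reward(String utterance) {
            association.merge(utterance, 0.2, Double::sum);    // strengthen the link
        }
    }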

At this point we have demonstrated the principle of a robot developing speech just like humans do!

Important Note!

Having demonstrated the principle of how a robot will learn to talk, we can now have a play with adding an actual hearing sense with sound wave analysis and associated brain states. This becomes ‘tinkering’ in my books, as it simply adds more granularity to what we have already proven – but some Labbers have indicated they would appreciate it if we could work away the assumptions. The point should have been proven by then, so we will just develop the demonstration further to make it more obvious how it works. It will also help those working on this topic to move closer to more realistic speech. Again, my MO is to prove the point and move on to the next topic.

Longer Term Objectives

Objective 3.4: Give Troopy an actual sense of hearing and integrate it to allow association-forming.

  • Assumption 3.4.1: We will assume we can break sounds (words) down using the sound wave analysis tool and digitise these into a sensory hearing brain state representation of the sound. This will now be used in association-forming and will be a little more complicated than the direct injection method.

Objective 3.5: Give Troopy a voice based on discretised sound patterns (this will understandably start off quite crude).

Objective 3.6: Increase granularity to gradually move closer to better quality sounds heard and reproduced by Troopy.

Objective 3.7: Demonstrate real word mimicking and expanding vocabulary.

Objective 3.8: Demonstrate phrases rather than single words.

CONTENT TO BE ADDED BEYOND THIS POINT

4.0 – Project Planning & Stages

Guidance Text: Break the project into logical stages to gradually progress to the final experiment that will meet the objectives.

5.0 – Experimental Setup and Parts Procurement

Guidance Text: Set up the lab (Learning Confine, Robot and PC) up for the experiment and procure additional hardware and software that may be required.

6.0 – Work-benching through the Project Stages

Guidance Text: Work the plan in a systematic way in collaboration with Labbers to meet the objectives.

7.0 – Results

Guidance Text: Published results in the form of reports, posts and/or demo videos showing that the objectives have been met. Labbers get a chance to review and contribute – and suggest the way forward.

8.0 – Compliance Check

Guidance Text: A final check to ensure the objectives have substantively been met. Again Labbers will be asked to do a final critical check to verify results.