Thank You to Jovencito Talasan
One method of neural network programming is Q-learning, which is applied here to AI robots. Q-learning is an off-policy method: it can run on top of any strategy that wanders through the Markov Decision Process (MDP). It uses the observed information to approximate the optimal state-action value function, from which the optimal policy can be constructed. In short, only the state-action function (the Q table) needs to be stored and updated.
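To make the idea of "storing and updating only the state-action function" concrete, here is a minimal sketch of a single tabular Q-learning update. The class name, method names, and the discount factor `gamma` are assumptions made for illustration; the robot program in this post uses its own fields and applies no explicit discount, which corresponds to `gamma = 1`.

```java
// Illustrative sketch only; QUpdateSketch is a hypothetical name, not part of the robot program.
class QUpdateSketch {
    // Largest Q-value available in a state. The table is indexed [action][state],
    // the same layout as the Q table used in this post.
    static float maxQ(float[][] q, int state) {
        float best = q[0][state];
        for (int a = 1; a < q.length; a++) {
            if (q[a][state] > best) {
                best = q[a][state];
            }
        }
        return best;
    }

    // One Q-learning step: move Q[action][state] toward reward + gamma * max Q(nextState, .).
    // alpha is the learning rate; gamma is an assumed discount factor.
    static void update(float[][] q, int state, int action, float reward,
                       int nextState, float alpha, float gamma) {
        float target = reward + gamma * maxQ(q, nextState);
        q[action][state] += alpha * (target - q[action][state]);
    }

    public static void main(String[] args) {
        float[][] q = new float[4][16]; // 4 actions, 16 states, all zero initially
        q[2][5] = 2.0f;                 // pretend action 2 is already valued in state 5
        update(q, 3, 0, 1.0f, 5, 0.35f, 1.0f);
        System.out.println("Q[0][3] after one update: " + q[0][3]);
    }
}
```

The update is off-policy because the target uses the best action in the next state, regardless of which action the robot actually takes next.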
This is the IntelliBrain Mobot's flow chart:

The language used for the neural network programming is Java. The program was then loaded onto the controller.
public class ObstacleAvoidance
{
public static float LEARN_RATE = 0.35f; // the learning rate constant
public static float EXPLORE_RATE = 0.15f; // the rate at which it tries out random actions
private static Servo mLeftServo;
private static Servo mRightServo;
byte [][] state;
byte [][]action;
private static float [][] Q; // table of Q-values (action,state)
byte i = -1; // last state
byte a = -1; // last action
byte a1 = 0; // action with highest q-value
float r = 0; // Last reward
public ObstacleAvoidance(byte actions, byte states) // constructor
{
    state = new byte[states][4]; // 16 unique states, one flag for each of the 4 sensors
    action = new byte[actions][2]; // 4 unique actions, one value for each of the 2 actuators
    initAction(action);
    initState(state);
    Q = new float[actions][states]; // table of Q-values, indexed [action][state]
    //setRandomValues(Q);
    setLearnedQValues(Q);
}
public static void main(String [] args)
{
byte actions = 4;
byte states = 16;
long iterMax=100000;
ObstacleAvoidance qutil = new ObstacleAvoidance(actions, states);
ObstacleAvoidance qbrain = new ObstacleAvoidance(actions, states);
byte[] e = new byte[8]; // The environment values
byte [] motorCommand = new byte[2]; // Commands to send to motors
mLeftServo=IntelliBrain.getServo(1);
mRightServo=IntelliBrain.getServo(2);
for (long iter=0;iter<=3)
e[0]=1;
else
e[0]=0;
if(e[5]<=4)
e[1]=1;
else
e[1]=0;
if (e[6]<=3)
e[2]=1;
else
e[2]=0;
if (e[7]<=3)
e[3]=1;
else
e[3]=0;
}
// fill the action table with every combination of the two actuator values
public static void initAction(byte [][] action)
{
    byte count = 0;
    for(byte i=0;i<2;++i)
        for(byte j=0;j<2;++j)
        {
            action[count][0] = i;
            action[count][1] = j;
            ++count;
        }
}
// fill the state table with every combination of the four binary sensor flags
public static void initState(byte [][] state)
{
    byte count = 0;
    for(byte i=0;i<2;++i)
        for(byte j=0;j<2;++j)
            for(byte k=0;k<2;++k)
                for(byte l=0;l<2;++l)
                {
                    state[count][0] = i;
                    state[count][1] = j;
                    state[count][2] = k;
                    state[count][3] = l;
                    ++count;
                }
}
public static void setRandomValues(float [][] Q)
{
for(byte i=0;i=0)
{
r = getRewardValue(e);
a1 = getMaxAction(j);
Q[a][i] = Q[a][i] + LEARN_RATE * (r + Q[a1][j] - Q[a][i]);
//System.out.println("Q["+a+"]["+i+"] = " + Q[a][i]);
}
i = j;
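// Greedy: with probability 1 - EXPLORE_RATE, exploit the best known action
float rand = (float)Math.random();
if(rand > EXPLORE_RATE)
{
    a = getMaxAction(j);
    //System.out.println("Q["+a+"]["+j+"] = " + Q[a][j]);
}
// Explore: otherwise pick one of the 4 actions at random
else
    a = (byte)(Math.random() * 4);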
buzzer.beep();
return a;
}
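// find the largest Q-value for a given state (j), and return its action
public byte getMaxAction(byte state)
{
    float max = -1000;
    byte action = 0;
    // loop bound was lost in the source; Q.length (the number of actions) is assumed
    for(byte a=0;a<Q.length;++a)
    {
        if(Q[a][state] > max)
        {
            max = Q[a][state];
            action = a;
        }
    }
    return action;
}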
// return the index of the state whose four flags match the sensor readings e[0..3]
public byte getState(byte [] e)
{
    byte count = 0;
    for(byte i=0;i<2;++i)
        for(byte j=0;j<2;++j)
            for(byte k=0;k<2;++k)
                for(byte l=0;l<2;++l)
                {
                    if (e[0] == state[count][0])
                        if (e[1] == state[count][1])
                            if (e[2] == state[count][2])
                                if (e[3] == state[count][3])
                                    return count;
                    ++count;
                }
    return -1; // no matching state found
}
public float getRewardValue(byte [] e)
{
    byte reward=0;
    if(a==0)
        reward += 3; // good reward for choosing action 0
    if (e[0] == 1||e[1] == 1||e[2] == 1||e[3] == 1)
        reward -= 2; // bad reward: penalize when any sensor flag is set
    return reward;
}
}
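The state lookup and the explore/exploit choice in the listing can also be sketched in a more compact form. The version below is a minimal sketch under stated assumptions, not the program that ran on the robot: the class and method names are hypothetical, and it uses `java.util.Random` instead of `Math.random()` so the behaviour can be reproduced with a seed. Because `initState` enumerates the four flags with `e[0]` changing slowest, the index found by the table search in `getState` equals the four flags read as a binary number.

```java
import java.util.Random;

// Hypothetical helper class for illustration; not part of the original program.
class StateAndPolicySketch {
    // Packs four 0/1 sensor flags into an index 0..15. e[0] is the most
    // significant bit, matching the ordering produced by initState.
    static int stateIndex(byte[] e) {
        return (e[0] << 3) | (e[1] << 2) | (e[2] << 1) | e[3];
    }

    // Returns the action with the largest Q-value for the given state.
    // The table is indexed [action][state], as in the listing.
    static int greedyAction(float[][] q, int state) {
        int best = 0;
        for (int a = 1; a < q.length; a++) {
            if (q[a][state] > q[best][state]) {
                best = a;
            }
        }
        return best;
    }

    // Epsilon-greedy choice: exploit when the random draw exceeds exploreRate,
    // otherwise explore by picking any action uniformly at random.
    static int chooseAction(float[][] q, int state, float exploreRate, Random rng) {
        if (rng.nextFloat() > exploreRate) {
            return greedyAction(q, state);
        }
        return rng.nextInt(q.length);
    }

    public static void main(String[] args) {
        float[][] q = new float[4][16];
        byte[] sensors = {1, 0, 1, 1};
        int s = stateIndex(sensors);
        q[2][s] = 5.0f; // pretend action 2 has been learned as best in this state
        int chosen = chooseAction(q, s, 0.15f, new Random());
        System.out.println("state=" + s + " action=" + chosen);
    }
}
```

Computing the index directly avoids the nested search and the `-1` return path, at the cost of assuming every flag is strictly 0 or 1.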
To view the result of the IntelliBrain neural network programming on AI robots, click here.
1 comment so far.
Tals, please post about fuzzy logic, since that is now the thesis topic of the graduating CoE students, especially for Robot Soccer Goalie Behaviour.