Hello, if you have any need, please feel free to consult us, this is my wechat: wx91due
AMME5520
Advanced Control and Optimisation
Project 1
1 Problem Introduction
In this project, the planning and control of a theoretical robot waiter is investigated, modelled as the segway-like inverted pendulum given in Figure 1. This diagram also shows the five state variables of the robot,
x =lθ θ(˙) v px pg ϕ]T ,
which are the body angle and angular velocity, speed, x and y positions, and yaw angle respectively. The path planning and control of the robot will be designed to allow traversal of a restaurant with semi-randomly placed circular tables, such that the robot avoids collisions and maintains balance and stability at all times.
(a) Side view (b) Top view
Figure 1: Model and state variables of the proposed waiter robot
2 Path Planning
2.1 Occupancy Map
(a) Base occupancy (tables only)
(b) Occupancy with padding (grey)
Figure 2: Occupancy maps for the restaurant with obstacles (tables) in black and padding in grey
To model the restaurant, a 2D occupancy map is generated by randomly creating circles with radius R = 0.5m, and a minimum separation of 3R between each table and the edges of the restaurant. For a restaurant of 10 × 10m with 10 tables, this results in the occupancy map in Figure 2a. For simplification, the robot is modelled as a point object on the 2D surface of the restaurant. To account for the width of the robot a padding of width Rrobot is added around the tables and restaurant edges. Additional padding of δR = 10cm is added to allow for small deviations from the path, which mitigates against collisions when the path is planned next to an obstacle. The final padded occupancy map is given by Figure 2b.
2.2 Probabilistic Roadmap
Most path planning algorithms, such as those investigated in this report, are designed to search over a finite number of nodes in a graph. For a robot operating in a continuous space such as a restaurant, there are two commonly employed methods to discretise the space into a finite set of nodes. The first is grid based sampling, where the continuous space is broken into a grid with a specified coarseness, with each grid space representing a node. Adjacent grid spaces typically define the edges of the graph, although other methods that take into account the possible motions of a real robot are also used. The second method is to create a probabilistic roadmap (PRM), where points are randomly sampled from the space to form nodes, and edges are either determined by nodes within a fixed radius of each other, or by choosing the K nearest neighbours.
The PRM implemented in this project uses the K nearest neighbours method, as it can reduce the chance of having isolated nodes, as well as reducing the density of connections when the number of nodes is increased (this is mainly for visual benefit). If an edge between nodes would collide with an obstacle, the next nearest node is selected until each node has K connections (where possible). Collisions are detected by drawing a line on the occupancy grid from one node to another using Bresenham’s line algorithm [1], then checking if any point on the line is an obstacle. Bresenham’s algorithm was used as it is a very efficient method to generate a line on a grid, because it does not require floating point operations.
(a) N = 80, K = 3, d = 0
(b) N = 80, K = 3, d = 77cm
(c) N = 150, K = 5, d = 57cm
Figure 3: PRMs generated by adjusting N , K and d. Best viewed zoomed in.
Once built, the resulting directed graph is made undirected by connecting any one-way edges, so some nodes may have more than K connections. To improve the distribution of nodes, nodes can also be sampled such that they must be some minimum distance from each other to be accepted. The three parameters that determine the roadmap are then the number of nodes N , the number of neighbours K , and the minimum separation between nodes d. Some example roadmaps with varying parameters are given in Figure 3, with the final roadmap used in the remainder of the project being Figure 3c. The parameters were iteratively adjusted until the restaurant floor was well covered and sharp angles between edges were reduced, which benefits the implementation in Section 4.
2.3 Dijkstra’s Algorithm
The first path finding algorithm implemented is Dijkstra’s algorithm, which works by iteratively updating an estimated arrival cost at each node in the graph. Dijkstra’s algorithm assumes that every edge has a positive cost, i.e. g(x → y) > 0 for all x and y in the graph. The algorithm works by initialising the cost to reach every node (except the start) as ∞ , and a queue of unsolved nodes Q containing all the nodes. The node x with the minimum estimated arrival cost Va is then chosen and removed from the queue, i.e.
x = arg min Va (z).
z=Q
For each neighbour y of x, we check if Va (x) + g(x → y) < Va (y), and if so we have found a shorter path to y. The estimate of Va (y) is updated, and the predecessor node for y is set to x. This process repeats until the target node has been selected, at which point the path from x to y can be found by tracing back through the predecessor nodes. Dijkstra’s algorithm guarantees that the optimal path between two nodes will be found, and is generally a good choice when there is no additional information about the type of graph given.
2.4 Heuristics and A*
While Dijkstra’s guarantees an optimal path, it can often search more nodes than it needs to. Given prior knowledge of the graph properties, it is possible to design a heuristic that modifies our choice of the node to remove from the queue, and reach the end node faster. This is the key difference between Dijkstra’s and A*, and the new choice of x is now,
x = arg min(Va (z) + h(z)) ,
z=Q
where his the chosen heuristic. If his both admissible and consistent, then it will also guarantee the optimal path is found. If the heuristic is a good estimate of the true cost to go V (x), then A* will search less nodes than Dijkstra’s, as the selection of x will be more directed to the goal. A heuristic is admissible when it always underestimates the true cost to go, such that h(x) ≤ V (x), and it is consistent when h(x) ≤ g(x → y)+h(y). There are cases however where it may be advantageous to have a heuristic that is not admissible or consistent, as discussed below.
The performance of Dijkstra’s algorithm will now be compared with A* for the following heuristics, given the knowledge that the graph represents a physical 2D space
. The Euclidean distance from x to the final node. This is both consistent and admissible, guaranteeing the optimal path. Unless specified otherwise, A* refers to the implementation with this heuristic.
. The square of the distance. While not consistent or admissible, in computationally restricted environments this can lead to faster searches with minimal effect on the path optimality (situation dependent).
. The inverse of the distance. Neither consistent or admissible, this guarantees every node in the graph will be searched before the final node (in Matlab, 1/0 = ∞), and is an example of a poor choice of heuristic.
(a) Dijkstra (b) A*
Figure 4: Comparison of each path finding algorithm when traversing from node 3 to node 46
From Figures 4a and 4b, we can see that while Dijkstra and A* both give the same path, A* searches far fewer nodes, due to the heuristic influencing the choice of x. When using the squared distance heuristic, the path is not always optimal, however it avoids the computationally expensive call to the square root function, searches less nodes, and returns a path that in many cases may be “good enough”. The node choice is just weighted more to those closest to the end node, rather than the cost of traversing those nodes. By contrast, using the inverse distance gives the optimal path, but searches every node, and so would never be beneficial to use over the standard A* heuristic. These observations are further validated by applying each algorithm for 100 random combinations of start and end nodes, then taking the average across the trials, the results of which are presented in Table 1.
Table 1: Averaged statistics for each path finding algorithm across 100 random start and finish nodes
|
Dijkstra |
A* |
A* (squared) |
A* (inverse) |
Cost (m) |
6.026 |
6.026 |
6.527 |
6.026 |
Nodes searched |
79.9 |
17.6 |
8.27 |
150 |
Time (ms) |
0.759 |
0.263 |
0.116 |
2.53 |
3 Angle and Speed Control
3.1 Model Parameters
The robot will be modelled after the Segway Minipro [2], with the following model parameters (parameters marked with a ‘*’ were not directly provided, but estimated from those that were):
. Wheel radius ρ = 131mm
. Segway radius Rrobot = 273mm
. Distance from centre of body to centre of wheel axis l = 240mm
. Maximum torque umax = 70Nm
. Body mass m = 8kg*
. Combined wheel mass M = 5kg*
Assuming the robot can be represented as a thin rod of uniform density (the body) attached to thin discs of uniform density (the wheels), we can calculate the moments of inertia JP (body about its centre of mass) and JM (wheels about the axis perpendicular to their face) as:
JP = 12 m(2l)
= 0.096kg m2
JM = Mρ2
= 0.0686kg m2
With a gravitational acceleration of g = −9.81ms−2 , we can then calculate the physical constants for the system as:
µ 1 = JM + ρ2 (m + M) = 0.292kg m2
µ2 = JP + ml2
= 0.384kg m2
µ3 = ρml
= 0.157kg m2
µ4 = glm
= −11.8kg m2 s−2
3.2 Operating Points
The dynamics for the robot are described by the equations from the assignment sheet, with the exception of the dynamics for ˙(v) , which are corrected as θ(¨) = f1 (θ , θ(˙)) − f2 (θ)u1 . Since there is no air resistance in the model, the only equilibrium point of the robot at constant velocity is when θ = 0, as there is no counteracting force.
This applies for any constant velocity, whether the robot is stationary or moving at v = 1ms−1 . Since f2 and f4 are strictly positive, u1 must also be 0 at equilibrium for θ(¨) and ˙(v) to remain at 0.
3.3 Linearisation of Dynamics
By decoupling the positions and yaw angle from the system, we can linearise around the operating point xop =[0 0 0]T , which will be a valid linearisation for any velocity.
To do so, we compute the Jacobian and evaluate it at xop , then split it by state and control variables to create the linearised system ˙(x) = Ax + Bu1 .
3.4 LQR Design
3.4.1 Choice of weight matrices
With the system linearised, the main design feature for anLQR controller is the cost matrices Q and R. As a starting point these matrices were designed using Bryson’s method, which requires choosing the maximum desired value for each state and control variable, ¯(x)i and ¯(u)i. Q and R are then diagonal matrices, with qi =¯(x)i(−)2 and ri =¯(u)i(−)2 .
This normalises the costs so that they are all dimensionless, and allows easier design of weights than pole placement. The weights of each state and control now determine the importance of minimising that state, and hence the controller response. To choose values for ¯(x) and ¯(u) , we make an initial guess and simulate the response for a non zero starting position. We can see in Figure 5 that the control limit of 70Nm is exceeded, and θ and v take a while to converge. The weights are then iteratively updated (e.g. ¯(q) is reduced), and a settling time of Ts = 5s is specified by solving the LQR gain for the modified A matrix, Amod :α =
4
Ts
Amod = A + αI
This guarantees the real part of the system poles are less than α , which approximately determines the settling time. Using this modification and the final weight matrices, the improved response in Figure 5 has shorter settling time for most states to converge, and does not exceed torque limits to do so. The tradeoff is slightly higher overshoot to achieve this.
0 2 4 6
Time (s)
0 2 4 6
Time (s)
0 2 4 6
Time (s)
0 0.1 0.2 0.3 0.4
Time (s)
Figure 5: System response for different weightings in Q, R and Ts , with initial condition of x0 = I0.61 0.35 −0.5]T
3.4.2 Varying initial and reference conditions
The designed controller is then simulated for a variety of semi-randomly generated initial conditions (θ and θ between ±0.7, v between ±1), tracking both a 0 and 1ms−1 velocities.
While some conditions are easier to reach convergence from (e.g. states are closer to 0, or θ and θ have opposite sign), the controller consistently converges to the reference signal in a similar time regardless. Only velocity plots are shown in Figure 6, but all other states also converge.
0 2 4 6
Time (s)
(a) Reference of v = 0
0 2 4 6
Time (s)
(b) Reference of v = 1ms−1
Figure 6: Controller performance for different initial conditions and reference velocities
3.4.3 Linear system response
Interestingly, the trend of the linear system response appears to be unaffected by how far the system starts from the linearisation point xop , as shown in Figure 7. Instead, the linear and non-linear responses differ most at the point of overshoot, with the difference roughly proportional to the amount of overshoot. This demonstrates the robust properties of LQR, which can stabilise a system even when it differs from the linear model.
x0: [0.17, -0.09, 0.20]
0 2 4 6
Time (s)
(a) Starting close to xop
x0: [1.31, 0.70, -1.50]
0 2 4 6
Time (s)
(b) Starting far from xop
Figure 7: Comparison between linear and non-linear system response