
Universidade de Aveiro
Departamento de Electrónica, Telecomunicações e Informática
2019

Pedro Nunes

Controlo/manutenção da posição de veículos robóticos flutuantes aéreos e subaquáticos, baseado em visão.

Vision-based station-keeping of floating aerial and underwater unmanned vehicles.



Dissertation presented to the Universidade de Aveiro in fulfilment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Doutor Francisco José Curado Mendes Teixeira, Level 2 Researcher at the Department of Electronics, Telecommunications and Informatics (DETI) of the Universidade de Aveiro, and under the scientific co-supervision of Doutor José Nuno Panelas Nunes Lau, Assistant Professor at the Department of Electronics, Telecommunications and Informatics (DETI) of the Universidade de Aveiro.


the jury

president: Doutor Pedro Nicolau Faria da Fonseca, Assistant Professor at the Universidade de Aveiro (by delegation of the Rector of the Universidade de Aveiro)

examiners committee: Doutor Vítor Manuel Ferreira dos Santos, Associate Professor at the Universidade de Aveiro

Doutor Francisco José Curado Mendes Teixeira, Level 2 Researcher at the Universidade de Aveiro (supervisor)


acknowledgements

I would like to dedicate this work to my grandmother, who gave me the chance to pursue higher education and who, through her example, always motivated me to work hard and give the best I can. I would like to thank my parents for making me the person I am and for supporting me through every difficult moment. I would like to thank my girlfriend for always showing me the positive side of every situation, for being a best friend, and for giving me the emotional support needed to carry on with a smile. I would like to thank all my friends for the moments we shared. Last but not least, I want to thank my supervisors, Prof. Francisco Curado and Prof. Nuno Lau, for helping me as much as they did and for believing in me. Everyone mentioned in this short excerpt was a fundamental and indispensable part of my formation as a person, which appears to be never-ending, and of the conclusion of this great journey. The airship used in this work at the Intelligent and Robotic Systems (IRIS) laboratory of DETI is owned by, and was kindly made available by, the Instituto de Telecomunicações de Aveiro, thanks to the support of Prof. Paulo Monteiro.


Resumo: This dissertation addresses the problem of controlling/maintaining the position of floating robotic vehicles, i.e., the procedure intended to keep an aerial or underwater vehicle hovering over a given environment at a fixed, previously established position relative to a local or global reference frame. The method proposed to achieve this objective, in the absence of GPS positioning data, is a vision-based control system, designated in the robotics literature as "visual-servo control". The approach exploits visual landmarks present in the terrain overflown by the vehicle, taking advantage of the digital video cameras normally available as standard equipment on this type of autonomous platform. The system implemented in the scope of this work relies on very low-cost digital video cameras and computing systems, which are shown to fully satisfy the requirements of the envisaged application. The test environment proposed to validate the implemented control system is a small airship (blimp) that can be operated inside a building with sufficient interior space. Due to several operational difficulties experienced during the dissertation work, which made tests with the real vehicle impossible, a computer simulator of the blimp was developed, parameterized with real values and intended to provide the signals required by the vision-based control system responsible for maintaining the vehicle's position. Although the different functional components developed in this work were not completely integrated into a fully functional system, the different functional blocks were implemented and can later be integrated so as to achieve the objectives of the proposed work.


Abstract: This dissertation addresses the problem of "station-keeping" of a robotic floating vehicle, i.e., the procedure of maintaining a hovering underwater vehicle or a lighter-than-air ship at a pre-established 3D position relative to a ground-referenced frame. The method proposed to achieve this objective, in the absence of GPS signals, is a vision-based control scheme designated in the robotics literature as visual-servo control. The approach exploits the availability of visual features observed on the ground surveyed by the vehicle and takes advantage of standard digital video cameras normally available on unmanned platforms such as those contemplated in the present context. The system implemented in the context of this work relies on inexpensive video cameras and computational resources, which are shown to fulfil the requirements of the envisioned application. The proposed test bench for the control system is a small blimp that can be operated inside a sufficiently large building. Due to difficulties in the operation of this air vehicle experienced during the work, the dissertation developments included a dynamics-based computer simulator of the blimp, parameterized with real physical parameters and designed to provide the signal inputs required by the visual-servoing system responsible for station-keeping of the vehicle. Although the different functional components developed in this work have not been completely integrated into a fully functional version of the system, the theoretical and experimental frameworks of the proposed approach have been established and a set of operational blocks has been implemented which can later be integrated in order to achieve the objectives of the project.


Contents

List of Figures
List of Tables
Acronyms

1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Thesis layout

2 Literature and Fundamental Concepts
2.1 Vision
2.1.1 Camera Projection
2.1.2 RANSAC
2.1.3 Pinhole Camera Model
2.1.4 Camera Calibration
2.1.5 SIFT Algorithm
2.2 Control
2.2.1 Visual Servoing

3 Tools and Methods
3.1 Blimp
3.2 Hardware configuration
3.2.1 Blimp Components
3.2.2 Low Level Controller
3.2.3 High Level Controller
3.3 Camera sensor
3.4 Software
3.4.1 Visual processing
3.4.2 Control algorithms
3.4.3 Blimp Dynamics and Kinematics

4 Results and Discussion
4.1 Camera calibration
4.2 Simulated Environment
4.3 1-D Linear Actuator
4.4 Blimp Simulation
4.5 Visual Servoing

5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work

References


List of Figures

2.1 Comparison between different projection transformations.
2.2 Demonstration of the RANSAC method on an image with a high number of points of interest.
2.3 Conversion from Euclidean R3 to Euclidean R2.
2.4 Behavior of light rays as a function of the lens curvature.
2.5 The effects of distortion on a chessboard pattern, followed by the correction of the distortion.
2.6 Visual representation of Difference of Gaussians.
2.7 Implementation of the Difference of Gaussians procedure on a sample image composed of a high-contrast square.
2.8 Implementation of the Difference of Gaussians procedure on a sample image composed of a high-contrast circle.
2.9 Detection of local extrema in an octave.
2.10 Representation of the descriptors detected in images 2.7 and 2.8, respectively.
2.11 Representation of multiple gradients, with different magnitudes and orientations.
2.12 Representation of a quadrant of a descriptor.
2.13 Application of the SIFT algorithm on an image with different orientations.
2.14 Representation of an open-loop system.
2.15 Representation of a closed-loop system.
2.16 Practical results (feature trajectory, velocity vector, trajectory of center point) of an IBVS control system.
3.1 Blimp provided for the implementation of this project.
3.2 Blimp schematic with the origin of the body-fixed referential placed at a point on the gondola.
3.3 Block diagram of the integrated circuits.
3.4 Representation of possible camera distortions.
3.5 Representation of the registered positions and the areas that define the state of the vehicle.
3.6 Diagram of the movement state during the position registration.
3.7 Motor thrust in the XB direction versus the voltage applied to the motor, Vmotor.
4.1 Test 1: Calibration procedure with the ideal straight lines placed on the columns of the chessboard pattern.
4.2 Test 2: Calibration procedure with the ideal straight lines placed on the columns of the chessboard pattern.
4.3 Test 3: Calibration procedure with the ideal straight lines placed on the columns of the chessboard pattern.
4.4 Undistortion of the chessboard pattern used in the calibration procedure.
4.5 Graphical representation of the trajectory performed by the camera pose.
4.6 Test 1: Distance of x and y components from the reference point.
4.7 Test 1: Total distance from the reference point.
4.8 Test 1: Velocity of the simulated camera pose.
4.9 Test 2: Distance of x and y components from the reference point.
4.10 Test 2: Total distance from the reference point.
4.11 Test 2: Velocity of the simulated camera pose.
4.12 Test 3: Distance of x and y components from the reference point.
4.13 Test 3: Total distance from the reference point.
4.14 Test 3: Velocity of the simulated camera pose.
4.15 Test 4: Effects of the rotation on the camera frame.
4.16 Test 4: Effects of the rotation on the camera frame.
4.17 Test 4: Matching of rotated frames with different directions.
4.18 Visualization of the 1D actuator test, with the reference frame on the left and the current frame on the right.
4.19 Test 1: Total distance from the reference point measured by the application.
4.20 Test 1: Velocity of the camera pose measured by the application.
4.21 Simulink model overview.
4.22 Simulink block: Dynamics.
4.23 Simulink blocks: Lateral and Longitudinal dynamics.
4.24 Simulink block: Kinematics.
4.25 Simulink block: Main thruster response.
4.26 Simulation results: linear velocities of the vehicle.
4.27 Simulation results: angular velocities of the vehicle.
4.28 Simulation results: Euler angles.
4.29 Simulation results: 3D position of the vehicle (note that coordinate ZI grows downwards, according to the North-East-Down (NED) referential convention).


List of Tables

3.1 Camera specifications.
4.1 Test 1: Distortion measurements from application vs MathWorks.
4.2 Test 1: Reprojection errors from application calibration vs MathWorks calibration.
4.3 Test 2: Distortion measurements from application vs MathWorks.
4.4 Test 2: Reprojection errors from application calibration vs MathWorks calibration.
4.5 Test 3: Distortion measurements from application vs MathWorks.
4.6 Test 3: Reprojection errors from application calibration vs MathWorks calibration.
4.7 1-D linear actuator minimum resolution.
4.8 Table of positions set to the actuator.
4.9 Points used for the calculation of the interaction matrix.


Acronyms

API Application programming interface.

DC Direct current.

DOF Degrees Of Freedom.

DoG Difference of Gaussians.

FPS Frames Per Second.

IBVS Image Based Visual Servo.

LoG Laplacian of Gaussians.

LTA Lighter than Air.

MW MathWorks.

PBVS Position Based Visual Servo.

PIC Peripheral Interface Controller.

PWM Pulse Width Modulation.

RANSAC Random Sample Consensus.

RC Radio Controller.

RPi Raspberry Pi.

SIFT Scale-Invariant Feature Transform.


Chapter 1

Introduction

1.1 Motivation

The work described in this dissertation is motivated by the need to develop reliable, low-cost control systems capable of maintaining a hovering vehicle at a pre-established, fixed 3D position in a global reference frame. This problem is designated in the navigation and control literature as "station-keeping", a designation that is applied in different contexts: satellite orbital control and dynamic positioning of oceanic ships, rigs, and submarine platforms. The problem arises frequently in the context of underwater exploration by robotic underwater vehicles, either remotely operated vehicles (ROVs) or autonomous underwater vehicles (AUVs), where global positioning signals (GPS) are not available. There are several types of missions in ocean exploration, e.g. close monitoring of sea-floor habitats at large depths, that require observation vehicles to remain at a fixed position even in the presence of external disturbances such as underwater currents, without the inconvenience of using mooring systems or other mechanical apparatus. For this, it is important that the vehicle can autonomously estimate and maintain its position relying as much as possible on standard sensors such as sonar altimeters and video cameras.

The solution of the station-keeping problem is also of interest for the control of unmanned, lighter-than-air vehicles, including blimps, that can be operated in scenarios deprived of GPS signals, such as inside a large building or a covered stadium. Although external disturbances such as strong wind are normally negligible, some air turbulence is normally present in these scenarios. Additionally, in some of these environments, mooring the blimp is not a practical (or even acceptable) solution due to its interference with other activities occurring in the same space. As such, vehicle drifts are practically unavoidable without adequate methods of determining and controlling its desired position.

Since the positioning and control problems posed in both scenarios are very similar, a blimp is a very interesting platform, in terms of cost and ease of operation, to test a diversity of methods that can be adopted to solve the station-keeping problem for both airborne and submarine systems. These reasons justified the adoption of a blimp as the testing platform for this project. Its utilization was also motivated by the existence, at the Intelligent and Robotic Systems (IRIS) laboratory of the DETI, of a small blimp owned and kindly made available by the Institute of Telecommunications of Aveiro (IT-UA).

Vision-based control is proposed to address the problem posed in this work, motivated by the following considerations:


1. Vehicles of the class considered in this context normally use video cameras as standard sensors, since their typical missions consist of video image acquisition tasks; as such, there is no need to use dedicated cameras or additional sensors for the current purpose;

2. The above-mentioned consideration facilitates the configuration and greatly reduces the cost of the equipment; the proposed configuration for the blimp only requires the integration of inexpensive, current off-the-shelf components such as dedicated micro-controllers or single-board computers for low-level control of actuators and video processing;

3. Visual-servo control methods are very well documented in the literature and their superior efficacy has been demonstrated when applied to the solution of robot control problems in diverse scenarios.

1.2 Objectives

The present work aims at the development and implementation of an affordable vision-based system for station-keeping of a floating vehicle, relying on the acquisition of visual features observed in an unstructured environment. The implementation of the proposed control approach implies the configuration of dedicated hardware and the development of software for digital image acquisition and processing, the implementation of vision-based control algorithms, and the integration of these components to implement a solution for station-keeping of the vehicle in three-dimensional space relying on a reduced set of actuators.

In order to validate the system under development and assess its performance in real conditions, it is proposed to use a small blimp that can be operated inside the building of the Intelligent and Robotic Systems (IRIS) laboratory of DETI-UA.

1.3 Thesis layout

This dissertation is organized as follows:

• Chapter 2 - Literature and Fundamental Concepts: Brief description of the subjects underlying the realization of this project.

• Chapter 3 - Tools and Methods: Brief introduction to the components utilized, accompanied by a description of the work developed on each component.

• Chapter 4 - Results and Discussion: Presentation of the results obtained from the developed work, followed by a critical commentary on the respective subject.

• Chapter 5 - Conclusion and Future Work: A final comment on the developed work as a whole.


Chapter 2

Literature and Fundamental Concepts

2.1 Vision

2.1.1 Camera Projection

Projective transformation is a concept which refers to the mapping of world features to an image plane. It provides developers the tools to create applications that require precise two-dimensional representations of 3D objects imaged by digital cameras. Without this mathematical formulation, any procedure that goes further than image observation would be inconceivable. Such problems are very common and are observed regularly: e.g., football field stripes that are parallel to each other do not appear as such on screen, and circles are captured and represented as ellipses.

With the rapid evolution of computer vision algorithms there was a demand for research in order to provide greater reliability in visual transformations, since systems for the automation of tasks such as autonomous driving, face unlocking, and object detection, which utilize world features as input for their decision making, require the error to be as low as possible.

For the solution of the problems stated above, it was necessary to know which properties could be maintained in both Euclidean and projective representations. After several analyses, it was found that the property of straightness should be preserved in all representations. Thus, the requirement for any successful projective transformation of a plane is that the mapping of points preserves straight lines [1].

A camera sensor is the most common tool utilized by any digital image capturing method; however, it is crucial to know its properties beforehand so that it is possible to utilize computer vision algorithms to their full potential.

By capturing images of world features, which are three-dimensional, the camera outputs this information in a single plane, named an image or a frame, which is two-dimensional, thus losing one dimension. This is the natural behavior of the projection process, also known as central projection.

Central projection is a process which projects various rays from distinct points, for example the vertices of a cube, making these rays converge to one point called the center of projection. Between the center of projection and the points chosen to be transformed, a plane, known as the image plane, is intersected by these rays, thus forming the image point corresponding to the world point of each ray. A representation of this projection system can be seen in figure 2.3.

Homography

Homography, in many frameworks, is a term utilized to designate the transformation of coordinates between the same points in different images.

To understand the definition of homography, it is first required to understand the definition of homogeneous coordinates.

Definition 2.1.1. Homogeneous coordinates are a system of coordinates used in projective geometry, similar to the Cartesian coordinates used in Euclidean geometry.

With this information it is possible to understand the definition of homography, which is given as follows [1]:

Definition 2.1.2. A homography is an invertible mapping h from P2 (homogeneous 3-coordinate vectors) to itself such that three points x1, x2 and x3 lie on the same line if and only if h(x1), h(x2) and h(x3) do.

Translating this definition into algebraic terms, the following equation is obtained:

$$ \mathbf{x}'_i = H\,\mathbf{x}_i \;\Leftrightarrow\; \begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \qquad (2.1) $$

This gives the point x'_i as the product of a 3×3 non-singular matrix H and x_i, a homogeneous 3-coordinate vector.

A homography matrix provides a projective transformation with 8 Degrees Of Freedom (DOF), since its nine entries are only defined up to a common scale factor, leaving eight independent parameters.

The homography matrix can also be applied to inhomogeneous coordinates, as image points and 2D world points are measured directly in some applications. The relation is given as

$$ x' = \frac{x'_1}{x'_3} = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}} \qquad (2.2) $$

$$ y' = \frac{x'_2}{x'_3} = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}, \qquad (2.3) $$

where points (x, y) and (x′, y′) are the world and image points, respectively.

To obtain the homography matrix H, a minimum of 4 point correspondences is necessary; additionally, to generate eight independent linear equations, no three of these 4 points may be collinear.
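As an illustration of this estimation step, the following minimal Python sketch (assuming the NumPy and OpenCV libraries are available; the point coordinates are hypothetical values, not data from this work) computes H from four correspondences and maps a point through it as in equations 2.2 and 2.3:

```python
import numpy as np
import cv2

# Four non-degenerate point correspondences (hypothetical values):
# reference-plane points (x, y) and the matching image points (x', y').
src_pts = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=np.float32)
dst_pts = np.array([[12, 8], [110, 15], [105, 118], [5, 109]], dtype=np.float32)

# With exactly four correspondences the 8 DOF of H are fully determined (up to scale).
H, _ = cv2.findHomography(src_pts, dst_pts)

# Map an arbitrary point through H using homogeneous coordinates.
p = np.array([50.0, 50.0, 1.0])
p_mapped = H @ p
print(p_mapped[:2] / p_mapped[2])   # inhomogeneous image coordinates
```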

To understand all the components of the projective transformation matrix, it is necessary to first understand transformations that involve fewer DOF. [1] organizes the projective transformations into a hierarchy, in ascending order of the DOF provided by each transformation.


The second transformation in the hierarchy of transformations is the similarity transform, which is represented as follows:

$$ \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad (2.4) $$

with s representing the difference in scale, θ the rotation about the normal axis, and (t_x, t_y) the translation in the x and y axes respectively. This provides 4 DOF and, as a result, the invariant properties of this transformation are the ratio of distances between two points, the angles formed by intersecting lines, and the circular points, i.e., fixed points utilized for the computation of circles. The creation of the similarity matrix requires the correspondence of at least 2 points.

The third transformation in the hierarchy of transformations is the affine transformation, which is presented as follows:

$$ \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \qquad (2.5) $$

with

$$ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad (2.6) $$

a 2×2 non-singular matrix that can be decomposed as follows:

$$ A = R(\theta)\,R(-\phi)\,D\,R(\phi), \qquad (2.7) $$

in which R represents a rotation matrix about a given axis and D represents a diagonal matrix.

By having two angles, φ and θ, an extra 2 DOF are gained in relation to the similarity transform, due to the rotation of the scaling directions. Therefore, the affine transformation accounts for scaling in two orthogonal directions.

The invariant properties of the affinity are:

• Parallelism, due to the fact that points at infinity are mapped to points at infinity under the affine transformation, which means that parallel lines continue to intersect at infinity, the necessary condition for parallelism.

• The ratio of lengths of parallel line segments.

• The ratio of areas, given that the scaling in the rotated directions, introduced by the transformation, cancels out in the ratio of areas.

The final projective transformation in the hierarchy is the homography transformation.

It is capable of 8 DOF due to the addition of a vector v to the affine matrix. v is responsible for the non-linear effects of the projectivity, thus allowing the projective transformation to map vanishing points to finite points, as opposed to an affine transformation, which would keep these points at infinity.


The algebraic representation of the homography matrix is given as:

$$ H = \begin{pmatrix} A & \mathbf{t} \\ \mathbf{v}^{T} & 1 \end{pmatrix}. \qquad (2.8) $$

Figure 2.1 demonstrates the differences between the different projection transformations.

Figure 2.1: Comparison between different projection transformations (panels: Homography, 8 dof; Affine, 6 dof; Similarity, 4 dof).

2.1.2 RANSAC

Random Sample Consensus (RANSAC) is a robust iterative method to estimate the parameters of a mathematical model from a set of observed data that contains outliers. Its robustness property means that outliers present in the data are filtered out to avoid their influence on the values of the estimates. Therefore, RANSAC can also be used as an outlier detection method. The algorithm was first published by Fischler and Bolles in 1981. A simple example of its application is the process of fitting a line in two dimensions to a set of observed points containing outliers originating, for example, from erroneous measurements.

RANSAC can be applied in computer vision in order to increase the robustness of the calculations made in projective transformation estimation [2]. To achieve this, a set of 4 random sample correspondences is selected, followed by the estimation of a homography matrix. After obtaining the H matrix, every correspondence is compared with the result predicted by H, and is classified as an inlier or an outlier.

6

Page 27: Pedro Nunes Controlo/manuten˘c~ao da posi˘c~ao de ve culos

• Inliers: the correspondences that fit the homography estimation, i.e., true positives.

• Outliers: correspondences that fail to match the projective transformation, i.e., false positives.

The procedure to decide whether a given correspondence is an inlier is to verify which correspondences lie inside a threshold margin around the model that is consistent with the initial projective estimation. If the corresponding point exceeds the given threshold value, it is classified as an outlier. If the correspondence is in conformity with the projective estimation, it is considered an inlier and the count of inlier correspondences is incremented.

When the RANSAC method has finished testing every candidate estimation, the estimation with the largest number of inliers is the one selected for the homography calculation. A visual representation of this method can be observed in figure 2.2.

Figure 2.2: Demonstration of the RANSAC method on an image with a high number of points of interest. (Source: [3])
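A minimal sketch of this inlier-selection procedure is shown below (assuming OpenCV and NumPy; the synthetic correspondences, noise levels and 3-pixel threshold are purely illustrative, not values used in this work):

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

# Synthetic scene: 100 correct correspondences related by a known homography,
# plus 30 gross outliers simulating wrong matches.
src = rng.uniform(0, 400, size=(130, 2)).astype(np.float32)
H_true = np.array([[1.0, 0.05, 20.0], [-0.03, 1.0, 10.0], [1e-4, 0.0, 1.0]])
src_h = np.hstack([src, np.ones((130, 1))])
dst = src_h @ H_true.T
dst = (dst[:, :2] / dst[:, 2:]).astype(np.float32)
dst[100:] += rng.uniform(-80, 80, size=(30, 2)).astype(np.float32)   # outliers

# RANSAC: repeatedly fit H to 4 random correspondences and keep the estimate
# with the most inliers (reprojection error below the threshold).
H_est, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
print("inliers:", int(inlier_mask.sum()), "of", len(src))
```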

2.1.3 Pinhole Camera Model

The pinhole camera model relates points in space to an image plane through several mathematical equations. This section provides some definitions and derivations of the fundamental equations, as this project relies heavily on camera projections due to the necessity of representing spatial Cartesian coordinates on an image plane, and vice-versa. This model is required in any computer vision application that utilizes real-world features for its execution, since it provides relative, or absolute, measurements used for further calculations such as distances, velocities, traffic lane assistance and much more.

By observing figure 2.3, the procedure taken in order to transform the 3D point A into the 2D point a, located in the image plane, is as follows:

7

Page 28: Pedro Nunes Controlo/manuten˘c~ao da posi˘c~ao de ve culos

Considering the point A = (X, Y, Z), where Z is the distance from the camera center to point A, when a conversion to homogeneous coordinates is applied, we obtain the following point:

$$ \begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \qquad (2.9) $$

where f represents the focal length of the camera sensor. Afterwards, it is necessary to account for the offset in relation to the principal point, p = (p_x, p_y).

Definition 2.1.3. The principal point is the point originating from the intersection of the image plane with the axis perpendicular to it (the principal axis).

This will form the point:

$$ \begin{pmatrix} fX + p_x Z \\ fY + p_y Z \\ Z \end{pmatrix} = \begin{pmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \qquad (2.10) $$

Considering:

$$ K = \begin{pmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{pmatrix} \qquad (2.11) $$

The matrix K is also known as the camera calibration matrix, which will be explained in further detail in section 2.1.4.

Considering that the principal axis of the camera coincides with the z-axis, we have:

$$ a = K\,[\,I \mid 0\,]\,A \qquad (2.12) $$

To account for camera rotation and translation (since it is impossible for all points to lie on the principal axis), it is necessary to apply the rotation R and translation t, which represent the orientation of the camera coordinate frame in relation to the world coordinate frame. The relation between the 2D point a and the 3D point A is then given as follows:

$$ a = RA' + t, \qquad (2.13) $$

where:

$$ A' = \begin{pmatrix} fX + Zp_x \\ fY + Zp_y \\ Z \end{pmatrix}. \qquad (2.14) $$

With this representation it is possible to observe that equation 2.13 provides 9 degrees of freedom (DOF): 3 from the K matrix, considered to be the internal parameters of the camera; 3 from the matrix R and 3 from the vector t, which together constitute the parameters known as external parameters or external orientation [1].


Figure 2.3: Conversion from Euclidean R3 to Euclidean R2. (Source: [1])

Most camera sensors do not have equal scales in both axial directions. This can cause unequal scale factors for the measured pixels in each direction. Thus, by modifying equation 2.14 it is possible to correct this behavior, which provides the equation:

$$ K = \begin{pmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{pmatrix} \qquad (2.15) $$

where f_x and f_y are the focal lengths in the respective directions and s is the skew parameter, which in most cases is considered to be 0. The skew factor has to be accounted for mostly in applications that capture points from other image planes, e.g., a photograph of a camera photograph. Taking this into account, it is possible to conclude that, considering A = (X, Y, Z)^T and the camera center on the same axis as the principal point, the image plane point a = (a_x, a_y) will be:

$$ \begin{cases} a_x = s\, f_x\, X/Z + p_x \\ a_y = f_y\, Y/Z + p_y \end{cases} \qquad (2.16) $$
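As a concrete illustration of equations 2.11 to 2.16, the following sketch (assuming NumPy; the intrinsic and extrinsic values are made-up placeholders, not the parameters of the camera used in this work) projects a world point to pixel coordinates:

```python
import numpy as np

# Hypothetical intrinsic parameters (focal lengths, principal point, zero skew).
fx, fy, s = 800.0, 810.0, 0.0
px, py = 320.0, 240.0
K = np.array([[fx, s, px],
              [0., fy, py],
              [0., 0., 1.]])

# Hypothetical extrinsic parameters: rotation R (here identity) and translation t.
R = np.eye(3)
t = np.array([0.1, -0.05, 0.0])

def project(A_world):
    """Project a 3D point to pixel coordinates using a = K [R | t] A (eqs. 2.12-2.16)."""
    A_cam = R @ A_world + t        # world -> camera frame
    a_hom = K @ A_cam              # homogeneous image point (fX + px*Z, fY + py*Z, Z)
    return a_hom[:2] / a_hom[2]    # divide by Z to obtain (a_x, a_y)

print(project(np.array([0.5, 0.2, 3.0])))
```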

2.1.4 Camera Calibration

This section addresses one of the most important subjects for this application, and for most computer-vision-related development, due to the fact that, in the pursuit of a balance between cost and quality, inexpensive cameras often produce visible distortions which primarily affect the accuracy of measurement calculations and three-dimensional reconstructions.

Intrinsic parameters

While the calculation of the internal parameters was already described in section 2.1.3, it is necessary to include the concept of distortion in order to obtain the correct visual representation of the captured frames.

Distortion is a common occurrence in camera sensors and is related to physical attributes of the camera, such as changing zoom values, lens curvature and lens size (worse for small lenses), and can be divided into two different effects, radial and tangential.

Radial distortions occur when light rays intersect a camera lens that has considerable curvature, which causes the light rays to bend in different ways around the lens (fig. 2.4). Light tends to bend more at the edges of the lens than at the optical center, therefore the resulting effect is greater towards the edges of the image. Tangential distortions happen when the camera lens is not parallel to the camera sensor, which causes the centers of curvature of the lens surfaces not to be strictly collinear [4]. It is important to note that this is not a common behavior for most camera manufacturers, and as such, this distortion component can be neglected if the framework utilized does not provide these values.

Figure 2.4: Behavior of light rays as a function of the lens curvature. Source: T.E. of Encyclopaedia Britannica, "Lens," October 2019. [Online; visited December 11, 2019].

For radial distortion, consider a point a = (u, v) given by an ideal pinhole projection, i.e., non-distorted, and its conversion to the homogeneous point (u, v, 1).

Considering only radial distortion, it is possible to model the effect as:

$$ (u_d, v_d) = L(r)\,(u, v), \qquad (2.17) $$

where the coordinates (u_d, v_d) represent the point in the image after the distortion; (u, v) the ideal image position; r the radial distance from the center of radial distortion; and L(r) the distortion factor, which is a function of the radius r.

An approximation of L(r) (r ∈ R) is given by a Taylor expansion, represented in equation 2.18, where the parameters k_1, k_2, k_3, ..., k_n (n ∈ N) are considered to be part of the intrinsic parameters of the camera.

$$ L(r) = 1 + k_1 r + k_2 r^2 + k_3 r^3 + \dots + k_n r^n \qquad (2.18) $$

Thus, by combining equations 2.16, 2.17 and 2.18, the expression for the radial distortion, measured in pixels, is given as:

$$ \begin{cases} \delta u^{(r)} = u\,(k_1 r_i^2 + k_2 r_i^4 + \dots) \\ \delta v^{(r)} = v\,(k_1 r_i^2 + k_2 r_i^4 + \dots) \end{cases} \qquad (2.19) $$


Tangential distortion, by applying an approach similar to the one used for radial distortion, can be obtained with the following expression:

$$ \begin{cases} \delta u^{(t)} = 2p_1 uv + p_2(r^2 + 2u^2) \\ \delta v^{(t)} = p_1(r^2 + 2v^2) + 2p_2 uv, \end{cases} \qquad (2.20) $$

where [p_1, p_2] are the coefficients of the tangential distortion and (δu, δv) are the errors, in pixels, related to the distortion. Combining the procedures from equations 2.16, 2.17, 2.19 and 2.20, an accurate representation in the image plane, without taking extrinsic parameters into account, is given by

$$ \begin{cases} u = D_u\, s\,(u + \delta u^{(r)} + \delta u^{(t)}) + p_x \\ v = D_v\,(v + \delta v^{(r)} + \delta v^{(t)}) + p_y, \end{cases} \qquad (2.21) $$

where (D_u, D_v) is a representation of the distortion effects, s the skew parameter introduced in equation 2.16 and (p_x, p_y) the central point of the image obtained in equation 2.10.

Figure 2.5 demonstrates an example of an algorithm performing a calibration with distortion correction. The image on the left side portrays the calibration algorithm detecting the chessboard pattern and drawing a line between points in the same row, which is considered to be a straight line in world coordinates. The image on the right is the final result after applying the undistortion method.

Figure 2.5: The effects of distortion on a chessboard pattern, followed by the correction of the distortion.
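A sketch of how such a correction is typically applied in practice (assuming OpenCV; the camera matrix, distortion coefficients and file names below are placeholders, not the values obtained in this work):

```python
import numpy as np
import cv2

# Placeholder intrinsic matrix and distortion coefficients (k1, k2, p1, p2, k3).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.25, 0.08, 0.001, -0.0005, 0.0])

img = cv2.imread("chessboard.png")          # distorted input frame
undistorted = cv2.undistort(img, K, dist)   # applies the radial/tangential model (eqs. 2.19-2.21)
cv2.imwrite("chessboard_undistorted.png", undistorted)
```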

Extrinsic parameters

The extrinsic parameters, already introduced in equation 2.13, provide the parameters required to transform object coordinates into a camera-centered coordinate frame [1]. They are composed of a rotation matrix and a translation vector. The rotation matrix, utilized in the extrinsic parameters, is represented by Euler angles about the x, y and z axes, providing, together with the translation, 6 DOF of movement.

The construction of the 3D rotation matrix is as follows:

By considering θ_X = roll (x axis), θ_Y = pitch (y axis) and θ_Z = yaw (z axis) as the Euler angles associated with each axis, we obtain the following matrices:

$$ R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_X & -\sin\theta_X \\ 0 & \sin\theta_X & \cos\theta_X \end{pmatrix} \qquad (2.22) $$

$$ R_y = \begin{pmatrix} \cos\theta_Y & 0 & \sin\theta_Y \\ 0 & 1 & 0 \\ -\sin\theta_Y & 0 & \cos\theta_Y \end{pmatrix} \qquad (2.23) $$

$$ R_z = \begin{pmatrix} \cos\theta_Z & -\sin\theta_Z & 0 \\ \sin\theta_Z & \cos\theta_Z & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (2.24) $$

Afterwards, by multiplying the rotation matrices of all axes in a specific order, it is possible to obtain the joint rotation matrix:

$$ R = R_Z R_Y R_X \;\Leftrightarrow\; R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix}, \qquad (2.25) $$

with:

$$ \begin{aligned} r_{11} &= \cos\theta_Z\cos\theta_Y & r_{12} &= \sin\theta_X\sin\theta_Y\cos\theta_Z - \cos\theta_X\sin\theta_Z \\ r_{13} &= \cos\theta_X\cos\theta_Z\sin\theta_Y + \sin\theta_Z\sin\theta_X & r_{21} &= \cos\theta_Y\sin\theta_Z \\ r_{22} &= \sin\theta_X\sin\theta_Y\sin\theta_Z + \cos\theta_X\cos\theta_Z & r_{23} &= \cos\theta_X\sin\theta_Z\sin\theta_Y - \sin\theta_X\cos\theta_Z \\ r_{31} &= -\sin\theta_Y & r_{32} &= \sin\theta_X\cos\theta_Y \\ r_{33} &= \cos\theta_X\cos\theta_Y \end{aligned} \qquad (2.26) $$

An important property of rotation matrices, and of matrices in general, is that multiplication is not commutative. Therefore, it is necessary to respect the order specified in equation 2.25 in order to obtain the correct result.

Another component of the extrinsic parameters is the translation vector,

$$ t = (t_{x_i}, t_{y_i}, t_{z_i}), \qquad (2.27) $$

which is obtained as the offset of the projection center of the camera, point C in figure 2.3, in relation to the object coordinate system, with the Z axis perpendicular to the image plane, i.e., with all of the 3 angles of rotation equal to 0.

By adding the translation vector to the previously obtained rotation matrix, it is possible to transform the object coordinates into the camera coordinates:

$$ \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{pmatrix} \begin{pmatrix} X_i \\ Y_i \\ Z_i \end{pmatrix} + \begin{pmatrix} t_{x_i} \\ t_{y_i} \\ t_{z_i} \end{pmatrix}. \qquad (2.28) $$
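As a numerical illustration of equations 2.22 to 2.28, the following sketch (assuming NumPy; the Euler angles and translation are arbitrary example values) builds R and transforms an object point into camera coordinates:

```python
import numpy as np

def rotation_matrix(roll, pitch, yaw):
    """Joint rotation matrix R = Rz @ Ry @ Rx (eq. 2.25); angles in radians."""
    cx, sx = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cz, sz = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Example values: transform an object point into camera coordinates (eq. 2.28).
R = rotation_matrix(np.deg2rad(5), np.deg2rad(-3), np.deg2rad(30))
t = np.array([0.2, -0.1, 1.5])
X_obj = np.array([1.0, 0.5, 2.0])
x_cam = R @ X_obj + t
print(x_cam)
```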

This concludes all the steps necessary to accurately map a world point into an image coordinate, but it raises the question of how to acquire the values needed for the procedure, i.e., r, k, t, R.


Methods for computation of the parameters

While there are multiple methods used for the computation of the camera model parameters, the recommended procedure among computer vision developers is a simple practical test that involves a grid, e.g., a chessboard pattern.

As a grid is mainly composed of straight lines, a property well maintained by projective transformation, it is possible to compute every intrinsic parameter. The external parameters can also be obtained in the same process, with the restriction that the developer must have information regarding the length of a straight line in the observed object, e.g., an edge of the chessboard pattern or the diameter of a circumference.

A calibration algorithm analyses the chessboard pattern and, for the intrinsic parameters, by assuming that all of the chessboard pattern features are coplanar, it is possible to neglect the depth value and rely solely on the (x, y) coordinates of the square vertices that form the chessboard, relate them to pixels, and correct any curvature seen in the image plane.

If the developer intends to obtain the extrinsic parameters, in the example of the chessboard pattern, the size of the square edges must be provided to the algorithm, which utilizes this information to relate the camera coordinates to the object coordinates. This is highly desirable for applications that require the camera to operate from a fixed base position, e.g., robotic arms or cranes.

For validation of the calibration process, a reprojection error method is applied.

Definition 2.1.4. Reprojection error is an alternative method of quantifying error in each of the two images. [1]

The reprojection error method is applied, in this case, to the image before the camera calibration and to the image after the application of the calibration. The method is executed by analyzing the points which were utilized to perform the calibration and verifying whether these points are in the same position as before. Through the mean value of all the deviation distances, it is possible to assess the success of the calibration process.

For greater robustness of the calibration method, it is recommended to use an array of images with more than one point of view per test, followed by averaging all the mean values of the reprojection errors. Usually, a mean error of more than 0.25 pixels per image is considered a bad calibration, and it is advised to redo the process with a different set of images.
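A condensed sketch of such a chessboard calibration with OpenCV is given below (a minimal example under stated assumptions: the board size, square size and file names are placeholders rather than the setup used in this work):

```python
import glob
import numpy as np
import cv2

board = (9, 6)            # inner corners of the chessboard pattern (placeholder)
square = 0.025            # square edge length in metres (needed for extrinsics)

# 3D object points of the (coplanar) chessboard corners, with Z = 0.
objp = np.zeros((board[0] * board[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for fname in glob.glob("calib_*.png"):                 # several viewpoints recommended
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Estimates K, the distortion coefficients and one (R, t) pair per image.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Mean reprojection error: project the corners back and compare with the detections.
errors = []
for objp_i, imgp_i, rvec, tvec in zip(obj_points, img_points, rvecs, tvecs):
    proj, _ = cv2.projectPoints(objp_i, rvec, tvec, K, dist)
    errors.append(cv2.norm(imgp_i, proj, cv2.NORM_L2) / len(proj))
print("mean reprojection error [px]:", np.mean(errors))
```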

2.1.5 SIFT Algorithm

The Scale-Invariant Feature Transform (SIFT) algorithm is one of the available tools that enables the detection of relevant features in an image, so that it is possible to compare multiple images and create matches even if they contain differences in position, angle or scale.

It is stated in [5][6] that the SIFT algorithm can extract a large quantity of features, with almost real-time execution, regardless of rotation and scaling, offering some robustness to changes in brightness and 3D camera point of view. It is also resilient to occlusion, clutter and noise disruptions.

In order to maximize the efficiency of this procedure, the cost of feature extraction can be minimized by imposing a number of steps in which the most demanding operations are applied only to candidates that pass the initial tests.

The major steps for the acquisition of features are as follows [5]:


• Scale-space extrema detection.

• Keypoint localization.

• Orientation assignment.

• Keypoint descriptor.

Definition 2.1.5. A keypoint is a locally distinct point in an image, identified by its pixel coordinates.

Scale-space extrema detection

The first step in keypoint detection consists in finding areas in the image that are invariant to scale change. This can be obtained by applying a Laplacian of Gaussians (LoG) function, i.e., the Laplacian of a Gaussian-blurred image, to a given image. This process identifies the locations with the highest difference in contrast, considered to be points of interest; an example is given in figure 2.6.

In order to reduce the computational cost of this method, the LoG is replaced with the Difference of Gaussians (DoG) method, providing a considerable improvement in the time necessary for the completion of these calculations while achieving almost the same results [6][5][7].

The Difference of Gaussians, represented in figure 2.6, locates local extrema by applying several Gaussian blur filters to the input image, creating a stack of images also known as an octave. In order to provide similar keypoints in images with different points of view, the method also changes the scale parameters, separated by a constant multiplicative value k, generating multiple octaves. Afterwards, for each octave, adjacent blurred images are subtracted, providing DoG images. If the octave has scale values different from the original image, after the completion of the DoG method the resulting image is resampled to the initial size, i.e., that of the original image.

This process is repeated for a predefined number of octaves, which is chosen by the user, although optimal values for the calculations have been determined in studies [5] and are assumed as default values in many SIFT tools available today.

Figures 2.7 and 2.8 demonstrate the input (left half of the image) and the final outcome (right half of the image) of the Difference of Gaussians function.

After completing the Difference of Gaussians, the next procedure is to find local maxima and minima, which can be done by comparing a pixel to its neighbors in the same image as well as in the images at the scale above and below, as seen in figure 2.9. Being considered a local extremum indicates to the algorithm that a potential keypoint has been identified. The cost of this procedure is minimal due to the fact that most of the sample points are eliminated during the DoG procedure.
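A minimal sketch of the DoG computation for one octave (assuming OpenCV; the number of scales, the base sigma and the value of k are illustrative defaults, not the exact parameters of the SIFT implementation used):

```python
import cv2
import numpy as np

img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

sigma0, k, n_scales = 1.6, 2 ** 0.5, 5
# Stack of progressively blurred images (one octave).
blurred = [cv2.GaussianBlur(img, (0, 0), sigma0 * (k ** i)) for i in range(n_scales)]
# Difference of Gaussians: subtract adjacent blurred images.
dog = [blurred[i + 1] - blurred[i] for i in range(n_scales - 1)]

# A pixel is a candidate keypoint if it is an extremum among its 26 neighbours
# (8 in its own DoG image, 9 in the scale above and 9 in the scale below).
```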

Keypoint localization

The second stage consists in eliminating bad extrema from the potential keypoint pool. This stage is necessary because bad keypoints are susceptible to noise or can be poorly localized, for example keypoints along edges, as seen in figure 2.10.

Through the application of a 3D quadratic function to the local sample points, it is possible to find the interpolated location of the maximum; with this method, noticeable improvements in matching and stability were observed [6].


Figure 2.6: Visual representation of Difference of Gaussians. (Source: [5])

Figure 2.7: Implementation of the Difference of Gaussians procedure on a sample image composed of a high-contrast square.

Figure 2.8: Implementation of the Difference of Gaussians procedure on a sample image composed of a high-contrast circle.


Figure 2.9: Detection of local extrema in an octave. (Source: [5])

In the case of poorly localized keypoints, the instability associated with this problem is due to the fact that all local extrema near an edge would be considered by the algorithm as identical to each other, which would translate into a large number of false detections.

Figure 2.10: Representation of the descriptors detected in images 2.7 and 2.8, respectively.

Orientation assignment

The third step is responsible for assigning a dominant orientation to each accepted keypoint, in order to provide matches of keypoints regardless of the rotation between the images.

In [5], a new technique was implemented which uses the scale of the keypoint to select the Gaussian-smoothed image with the closest scale value, allowing all computations to be made in a scale-invariant manner.

Next, by using pixel differences for each image sample, values of gradient magnitude and orientation are obtained, which are used to form an orientation histogram covering 360 degrees over a region with a radius of 1.5 times the scale of the keypoint; the dominant orientation is given by the largest bin, or by the summation of all the magnitudes of the corresponding gradients with the same direction if that sum surpasses the value of the former, as seen in figure 2.11.

There is the possibility that more than one keypoint will be generated at the same location, which is caused by having multiple peaks with similar magnitudes near each other. However, keypoints that share the same location and magnitude will not share the same dominant orientation. This occurrence is not common, but it is considered beneficial to stability in matching [5].

Figure 2.11: Representation of multiple gradients, with different magnitudes and orientations. (Source: [5])

Keypoint descriptor

The final stage in the SIFT algorithm is computing a descriptor for an image region that is distinctive but still robust to changes such as illumination or 3D viewpoint. When identifying three-dimensional objects, allowing shifts in the positions of the gradients was observed to drastically increase the recognition accuracy under 3D rotation. The SIFT algorithm utilizes this methodology by implementing shifts in gradient positions, with the creation of orientation histograms over 4x4 sample regions, yielding 128 orientation values. For a visual representation, a quadrant of one descriptor is shown in figure 2.12. To smooth the gradient shifts, a trilinear interpolation is applied, distributing the value of each gradient over adjacent histogram bins.

By having all the values of the orientation histograms, the descriptor is formed, but there is still one further step necessary to increase its robustness.

In order to minimize the effects of changing luminance values, two additional transformations of the feature vector are applied:

1. The normalization of the vector for linear changes.

2. The implementation of a threshold value that limits the magnitude of the gradients.


Figure 2.12: Representation of a quadrant of a descriptor. (Source: [5])

Keypoint Matching

With every possible descriptor acquired, the next step is to relate them in order to obtain the common points in different images. This is accomplished by comparing descriptors with the most similar gradient histograms, favoring the keypoints with the smallest reprojected distance from the keypoint in the compared image.

Figure 2.13 demonstrates a good example of the robustness of SIFT feature matching. As can be observed, in an environment that has suffered a rotation of 135°, it was still possible to acquire a fair number of matches, the majority of which were inliers.

A normalization of the gradient histograms allows the comparison between descriptors to be rotation invariant, since the most dominant gradient in two descriptors will have the same direction, thus enabling a direct comparison of the histograms.

The robustness and accuracy of this process can be increased further by utilizing the RANSAC method, which translates to a rate of inliers among the total matches of almost 90% in a 2D environment.

Figure 2.13: Application of the SIFT algorithm on an image with different orientations.
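A minimal sketch of this detection and matching pipeline (assuming OpenCV with its SIFT implementation available; the file names and the 0.75 ratio threshold are illustrative assumptions):

```python
import cv2

img1 = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("current.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-dimensional descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Compare descriptors: for each one keep the two nearest neighbours and apply
# the ratio test to reject ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]
print("tentative matches:", len(good))
# The surviving matches can then be fed to the RANSAC homography estimation of
# section 2.1.2 to discard the remaining outliers.
```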


2.2 Control

Control systems can be described as a set of techniques that range from the modelling of a physical system in a mathematical environment to the automation of said physical system. This branch of engineering translates into an easier path to the conclusion of a project since, with a mathematical model, it is possible to simulate the behavior of the system beforehand without damaging it or spending resources. For example, in a space exploration agency, if every sensor measurement test executed by an engineer required the spaceship to be fully functional, it would generate increasing wear on the engine system as well as energy costs. For this reason, the modelling of autonomous systems is a necessary step for an optimal approach to any system with a moderate level of complexity.

This section provides a brief explanation of the basic concepts of the control discipline, allowing an understanding of the procedure adopted in this application, its limitations, and its advantages.

Transfer Function

The transfer function is a model that relates the output of a system with its input signals. It can be expressed in many forms and domains. For a clearer explanation of this matter, an example is given. Consider a Direct Current (DC) motor controlled by a low-level controller, e.g., a Peripheral Interface Controller (PIC): it receives an input voltage, within its functional range, and the motor converts this energy into a mechanical force at a rate that can be measured in rotations per minute (rpm). As the conversion is different for every produced motor, even with the same specifications, it is necessary to adjust the PIC controller output in order for the motor to perform as desired. With a transfer function for the specified DC motor, the controller can define the required voltage in advance, which can be represented as:

$$ O(z) = G(z)\,I(z), \qquad (2.29) $$

with I(z) representing the input, O(z) the output and G(z) the transfer function of the system. The equation is represented in the frequency domain, which can be obtained by a z-transform, a procedure that converts a discrete-time signal to the frequency domain.

As can be shown for any linear system, the transfer function can be determined as the ratio between the z-transforms of the output and the input.
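As an illustration, a first-order discrete transfer function G(z) = b / (1 − a z⁻¹) corresponds to the difference equation y_k = a y_{k−1} + b u_k; the sketch below (with arbitrary example coefficients, not an identified motor model) simulates its step response:

```python
# Step response of a hypothetical first-order discrete plant y_k = a*y_{k-1} + b*u_k,
# i.e. G(z) = b / (1 - a z^-1).
a, b = 0.9, 0.5          # example coefficients (pole at 0.9, gain b)
y, u = 0.0, 1.0          # initial output and constant step input

for k in range(30):
    y = a * y + b * u
    print(k, round(y, 3))  # y converges to b*u/(1-a) = 5.0
```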

Open-Loop Controller

An open-loop controller is a system where the input does not account for the error between the output and the reference. Thus, this type of control system cannot correct its behavior during execution; however, if the system is well known, it is considerably cheaper and less complex to implement. A visual representation of a general open-loop control system is given in figure 2.14, with r_k representing the reference value, u_k the input to the transfer function or physical plant, and y_k the discrete output value, with k = {1, 2, ..., n}, n ∈ N.


Figure 2.14: Representation of an open-loop system.

Closed-Loop Controller

A closed-loop controller is a system where a feedback signal is introduced in order to allow the controller to correct the output signal. An example is given in figure 2.15, with rk representing the reference value, uk the input applied to the transfer function or the physical plant, ek the error between the desired value and the output, and yk the discrete output value, with k = {1, 2, ..., n}, n ∈ N.

Figure 2.15: Representation of a closed-loop system.

2.2.1 Visual Servoing

Visual servo control is a method that uses vision as an input in order to control the behavior of a robotic vehicle. There are two types of visual servoing, denominated Image Based Visual Servo (IBVS) and Position Based Visual Servo (PBVS) [8][9].

In both of these concepts the goal is to minimize an error usually defined as

e(t) = s(m(t), a)− s∗, (2.30)

where e(t) is the error, s(m(t), a) is a vector of visual features, with m(t) a set of coordinates of the interest points and a an additional set of parameters that provide information about the system. s∗ is the desired value of the features, i.e., the features in the reference frame. Both approaches are capable of controlling the motion of a camera mounted on a robot with 6 or fewer DOF.

To design a velocity controller, after acquiring the feature vector s, it is possible to obtain its time derivative with the equation presented below:

s′ = Ls vc,    (2.31)

where vc is the spatial velocity of the camera, composed of its linear and angular velocities, vc = (vc, ωc), and Ls ∈ Rk×6 is defined as the interaction matrix related to s, which is the main focus of the visual servoing concept [8].

By combining equations 2.30 and 2.31, and considering Le = Ls, it is possible to obtain the derivative of the error as a function of the camera velocity:

e′ = Le vc    (2.32)

As the velocity is the quantity to be determined (the error is already known), by applying the Moore-Penrose pseudo-inverse of Le and imposing an exponential decoupled decrease of the error, the following control law is obtained:

vc = −λ Le⁺ e    (2.33)

The way the interaction matrix is obtained differs between IBVS and PBVS.

Image Based Visual Servo

This procedure uses the internal parameters of the pinhole camera model to transform a world point into an image coordinate, equation 2.16, which is used to form the interaction matrix Ls:

Ls = [ −1/Z    0       x/Z    xy         −(1 + x²)    y
       0       −1/Z    y/Z    1 + y²     −xy          −x ]    (2.34)

For this particular case, it is assumed that the values of the depth (Z) coordinate are known. This procedure also requires the matrix to be recomputed in every iteration of the application.
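As an illustration of equations 2.33 and 2.34, the sketch below (written for this document; the variable names, the fixed depth and the sample points are assumptions, not values from the project) stacks the 2×6 interaction matrix of each image point and computes the camera velocity command from the feature error with the Moore-Penrose pseudo-inverse.

import numpy as np

def interaction_matrix(x, y, Z):
    """2x6 interaction matrix of one normalized image point (x, y) at depth Z (eq. 2.34)."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(points, desired, Z, lam=1.0):
    """Camera velocity vc = -lambda * pinv(L) * e (eq. 2.33).

    points, desired: (N, 2) arrays of current and desired normalized coordinates;
    Z: assumed depth of the points (here a single known value)."""
    L = np.vstack([interaction_matrix(x, y, Z) for x, y in points])   # (2N, 6)
    e = (points - desired).reshape(-1)                                 # feature error
    return -lam * np.linalg.pinv(L) @ e                                # (vx, vy, vz, wx, wy, wz)

# Hypothetical usage: four points shifted along x produce a command dominated by an x translation.
desired = np.array([[-0.1, -0.1], [0.1, -0.1], [0.1, 0.1], [-0.1, 0.1]])
current = desired + np.array([0.05, 0.0])
print(ibvs_velocity(current, desired, Z=1.0))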

Fortunately, there is the possibility of using an approximation of the interaction matrix, which allows the control algorithm to function without estimating the depth of each point online and, more importantly, without recalculating the matrix at every iteration.

A more practical approach can be followed if a matrix Le is used as a replacement for Ls, where Le is the interaction matrix that results when the camera pose is at the desired position, e = e′ = 0.

Thus, the approximation of the interaction matrix, Le⁺, becomes a constant, with the only requirement being the depth value, Z, of each desired point. Figure 2.16 shows one executed test; it is possible to conclude that the system successfully converges to the desired position, however both the velocity of convergence and the path to the desired position are undesired behaviors [8]. This is due to the intense rotation motions required for the camera to converge to the position where e = 0.


Figure 2.16: Practical results (feature trajectory, velocity vector, trajectory of the center point) of an IBVS control system with 4 points as visual features, using the constant approximation of the interaction matrix computed at the desired pose. (Source: [8])

Position Based Visual Servo

The purpose of this section is primarily to demonstrate the difference between the two visual servo models, with a brief explanation, since this method cannot be applied in this project: the setup of the designed system does not allow its implementation, as the vision methods applied do not use stereo vision nor know the 3D model of the captured object.

This methodology [8][9][10] uses the intrinsic camera parameters and the knowledge of the 3D object towards which the camera will converge, e.g., a fixed QR code in a production line.

With a setup that uses more than one camera, it is possible to apply this methodology without previous knowledge of the observed object, since camera triangulation calculations make it possible to reconstruct the observed world scene.

An ideal implementation would acquire the rotation and translation values of the camera pose, equation 2.28, which would serve as input to the PBVS system, defined by a model similar to the one introduced in equation 2.33.

Considering θu the angle/axis parameterization of the rotation and tc the translation vector, the components required to obtain the linear and angular velocities, (vc, ωc) respectively, are:

e = s = (tc, θu)

Le = [ R    0
       0    Lθu ]    (2.35)

The model presented has the advantage of decoupling the translational and rotational motion, which results in a trajectory very similar to a straight line when converging to the desired position, which is excellent for stability. The output of a decoupled PBVS is then as follows:

vc = −λ Rᵀ tc
ωc = −λ θu    (2.36)
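As a small illustration of the decoupled law in equation 2.36 (a sketch written for this document, not the project implementation; the pose values are hypothetical), the linear and angular velocity commands can be computed directly from an estimated rotation matrix R, translation tc and angle/axis vector θu:

import numpy as np

def pbvs_velocity(R, t_c, theta_u, lam=1.0):
    """Decoupled PBVS law (eq. 2.36): vc = -lambda * R^T * tc, wc = -lambda * theta_u."""
    v_c = -lam * R.T @ t_c
    w_c = -lam * theta_u
    return v_c, w_c

# Hypothetical pose error: 0.2 m offset along x and a 10-degree yaw error.
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0,            0.0,           1.0]])
t_c = np.array([0.2, 0.0, 0.0])
theta_u = angle * np.array([0.0, 0.0, 1.0])   # rotation axis z scaled by the angle

print(pbvs_velocity(R, t_c, theta_u))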


Chapter 3

Tools and Methods

3.1 Blimp

The vehicle used in this project for the application of the control algorithm is a small-scale indoor blimp, made available by the Institute of Telecommunications (IT) of Aveiro; see figure 3.1.

Figure 3.1: Blimp provided for the implementation of this project

A blimp is a Lighter than Air (LTA) vehicle, meaning that the gases inside it, i.e., helium or hydrogen, have a lower density than atmospheric air, thus providing the lifting force necessary to maintain altitude. Another property that justifies the selection of an LTA vehicle for assignments that require hovering is that any movement displaces the surrounding air particles, which gain kinetic energy and, in reaction, apply a force in the opposing direction [11]. This effect means that the vehicle, if symmetrical, maintains a static position if no external forces are applied. This is a positive quality for this dissertation in particular, as it provides good setup times for the vision algorithm.

This type of blimp is considered a nonholonomic system, i.e., it has more total degrees of freedom (6 DOF) than controllable ones (3 DOF), with controllable DOF only in the surge, heave and yaw components; a visual representation can be seen in figure 3.2.

Figure 3.2: Blimp schematic with the origin of the body-fixed referential placed at a point on the gondola. (Source: [12]).

Ideally, by fixing the vehicle at a given altitude, the remaining possible movements will lie on the horizontal plane, which results in a simpler control algorithm, as only 2-dimensional movements can be executed. In reality, 3D movements can occur, such as roll, pitch and altitude changes, which could be disruptive for the control algorithm. However, the probability of their occurrence is low, since the blimp is robust to these alterations: assuming that no undesired external force is applied, as the tests occur in a controlled environment, the forces responsible for maintaining the blimp state do not change.

Since it was not possible to perform tests with the real blimp, the dynamic model of a real blimp similar to the one existing in the IRIS laboratory was implemented in Matlab/Simulink in order to obtain realistic values for the dynamic behavior of the vehicle and introduce these values in the vision-based position estimation algorithms.

3.2 Hardware configuration

3.2.1 Blimp Components

The packaging of this particular blimp already provides:

1. Main motor responsible for the vehicle translational movements.

2. Servo responsible for the angle of the main motor in relation to its horizontal plane, enabling simultaneous propulsion of the blimp in surge and heave.

3. Electronic Speed Controller for the main motor.

4. Tail rotor responsible for the blimp angular orientation.

5. 11.8 V li-ion battery with 3 cells and 2500 mAh capacity.

6. Radio Controller (RC).


7. Additional 200g of weight capacity at full inflation with helium gas.

The setup applied in this project does not use the RC, since the purpose of station-keeping is to release the user from the responsibility of controlling the vehicle. Therefore, an Arduino™ is used to control all motors, following the procedure detailed in the next section. The Arduino is connected to a Raspberry Pi (RPi) through a serial protocol. The RPi is responsible for capturing the images, sending them to the command center, which in this project is a laptop computer, and then transmitting the commands of the command center to the Arduino. An overview can be seen in fig. 3.3.

[Block diagram: Command Centre (Personal Computer) — wireless — High Level Controller (RPi) — serial connection — Low Level Controller (Arduino)]

Figure 3.3: Block diagram of the integrated circuits.

An additional 5 V battery was added to power the control units; using a DC-to-DC voltage regulator instead would raise power isolation problems, since the 3 electromechanical components consume large amounts of current. This choice lowers the cost and the complexity of the blimp circuitry, at the expense of additional weight.

3.2.2 Low Level Controller

The Arduino Leonardo, combined with a motor shield based on a conventional SN754410 H-bridge, was chosen as the controlling unit of the blimp's electromechanical components because of its availability, modular capabilities and the documentation provided.

In order to control the speed of the electromechanical components, the setup uses 3 Pulse Width Modulation (PWM) signals, which can be generated by the Arduino board either through existing functions provided by the manufacturer or by changing register values of the 16 MHz Microchip ATmega32U4 processor present on the board.

Most conventional servos require a 50 Hz input signal with pulse widths ranging from 1 ms to 2 ms for a full 270-degree rotation. As the servo in the blimp defines the angle of the main motor in relation to its horizontal plane, only 2 positions are desirable, specifically 0° (Forward) or 180° (Reverse), which correspond to pulse widths of 1 ms and 1.6 ms, respectively.

The Arduino functions developed by the manufacturer were not able to provide the resolution needed for purely translational movements. As such, the main thruster would exert a force in a direction that would result in undesirable altitude changes. By register manipulation, a more precise control of the angle between the main thruster and its horizontal plane was possible, thus minimizing what would otherwise be a systematic error in the blimp movement trajectory.

The rotor responsible for controlling the orientation of the vehicle is required to have invertible polarity. This is achieved with the H-bridge of the SN754410.


The H-bridge operates in an Enable Input configuration, which requires a single PWM input signal and 2 DC input signals. The duty cycle of the PWM signal, in combination with the supply voltage, determines the motor speed. The frequency of the PWM signal must be set within a range of values: frequencies that are too low would not average the input voltage to a DC value, generating unwanted speed changes and vibrations, while extremely high frequencies would cause the H-bridge to dissipate power at a higher rate, increasing its temperature and the battery drain.

For the main thruster, since it operates in an ON/OFF fashion, the use of the functions provided with the Arduino Application Programming Interface (API) is justifiable. These functions emulate the behavior of an RC controller, resulting in the Arduino outputting a signal similar to the one used by a servo, which is then received by the electronic speed controller, where the required output voltage for the main thruster is finally generated.

3.2.3 High Level Controller

In early tests it was observed that the RPi single-board computer did not have the processing capacity to execute feature detection at an acceptable rate. Therefore, its primary functions became establishing a wireless connection between the blimp and the command center, using a UDP socket as the communication channel, capturing a continuous stream of frames, and forwarding the control signals to the Arduino.

The frame capture is set at a rate of 10 frames per second (FPS), as higher FPS values would demand processing that the command center cannot deliver, since it cannot finish a single iteration in less than 0.1 s; this would cause frequent transmission interruptions and undesirable waiting periods at the output of the control algorithm, making the system unstable.

Furthermore, in order to establish a successful connection between the RPi and the command center, it was necessary to divide the captured frames before transmission, because each socket has a maximum capacity of 2¹⁶ = 65536 bytes per packet and every 640×480 grey frame contains 307200 bytes.

Any delay in the transmission of the video stream is mitigated because the UDP protocol, as opposed to TCP, does not wait for confirmation messages from the receiver; as a consequence, the transmitter proceeds with the transmission, allowing the receiver to acquire a new frame whenever it is able to, disregarding the frames that could not be obtained at the moment of reception.

To ensure that different frames do not appear as a single frame at the receiver, a message is sent at the start of the transmission informing the command center that the first packet of the frame is about to be sent. If the transmission is interrupted, the command center discards the received data and waits for the next frame.
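A minimal sketch of this transmission scheme is shown below (written for this document; the chunk size, address, port and header message are assumptions, not the values used in the project). Each frame is announced with a small header message and then sent in chunks below the UDP packet limit.

import socket
import numpy as np

CHUNK = 60000                      # payload per packet, kept below the 65536-byte limit (assumed value)
ADDR = ("192.168.1.10", 5005)      # command-center address and port (hypothetical)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_frame(frame: np.ndarray) -> None:
    """Announce a new frame, then send its bytes in chunks over UDP."""
    data = frame.tobytes()                    # a 640x480 grey frame yields 307200 bytes
    sock.sendto(b"FRAME_START", ADDR)         # header so the receiver resets its buffer
    for i in range(0, len(data), CHUNK):
        sock.sendto(data[i:i + CHUNK], ADDR)

# Hypothetical usage with a synthetic grey frame:
send_frame(np.zeros((480, 640), dtype=np.uint8))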

This methodology was found to be suitable for this dissertation, as the station-keeping problem implies that the vehicle moves at speeds under 0.1 m/s, meaning that the period between frame acquisitions does not impact the success of the algorithm as much as the accuracy of the visual position estimation does.

Once the command center finishes the vision-related estimation of the vehicle position and evaluates the next required control action, several one-byte messages are sent to the onboard computer (RPi) through the UDP socket, which redirects the information to the low-level controller over the serial connection.


3.3 Camera sensor

The RPi Camera Version 2, distributed by the Raspberry Pi organization, was used in this project, as the specifications of its Sony IMX219 sensor, displayed in table 3.1, were satisfactory for this work.

Specification                 Value
Sensor resolution             3280 × 2464 pixels
Pixel size                    1.12 µm × 1.12 µm
Focal length                  3.04 mm
Linux integration             V4L2 driver available
Horizontal field of view      62.2°
Vertical field of view        48.8°

Table 3.1: Camera specifications.

For this application, each video frame is set to 640 pixels of width and 480 pixels of height, with an update rate of 10 frames per second. This is facilitated by the camera being compatible with the Linux V4L2 drivers, which allow the user to change its preset parameters.

The camera is mounted at an angle such that the image plane is as parallel as possible to the scene plane, in order to minimize 3D transformations.

For accurate planar measurements, a prior calibration of the camera is required, as most cameras exhibit lens distortion, which results in features being misrepresented in the image plane. Figure 3.4 exhibits two commonly observed lens distortion effects on a checkerboard pattern, used in calibration procedures. It is possible to conclude that, in an uncalibrated camera, as the distance from the central point increases, the error associated with each point becomes larger.

Figure 3.4: Representation of possible camera distortions. (Source: [13])


3.4 Software

This section describes every step executed on the command center (laptop computer), which is responsible for every estimation regarding the vehicle position, velocity and rotation. At startup, the program sets all the parameters required for the distortion-correction mapping, i.e., the undistortion of the frames. Afterwards, a UDP client connects to the server, in this case the blimp's onboard computer (RPi).

After successfully establishing a connection between the RPi and the command center, the next step is to begin the loop where the station-keeping algorithm is executed. For synchronization purposes, a dedicated thread is created, responsible for continuously acquiring frames from the video stream. The frames are stored locally before the feature detection algorithm runs, preventing their corruption while features are being detected and matched.

3.4.1 Visual processing

Detecting and Matching Features

Feature detection is executed by the SIFT algorithm provided by OpenCV. It calculates up to 4000 keypoints, as it considers the whole image for analysis instead of a region of interest.

As the matches present considerable error in each iteration, further calculations are required to correctly identify and remove the outliers among the matches and obtain the lowest possible error. Homography estimation with the RANSAC algorithm outputs a mask that identifies inliers and outliers in the image with good results; however, in rare cases, e.g., images with repetitive patterns, a small number of false positives is accepted. To nullify these exceptions as much as possible, the initial homography is applied to the points of the reference frame, producing a prediction of the P(x, y) values in the frame of comparison. By creating a rectangular zone surrounding each predicted point, it is possible to assume that a true positive match was found if the matched keypoint, in the frame of comparison, is contained in the prediction region.

The homography matrix is accepted as a good estimate of the position deviation if the number of true positive matches surpasses a threshold value, with a default of 20 matches.

Initially, if the homography estimation did not meet the set threshold, the algorithm would divide the image into smaller equal parts and apply the homography calculations to the adjacent sections. However, executing multiple homography estimations in a single iteration requires processing costs that are not compatible with real-time operation. Therefore, instead of dividing the frame into separate sections, the solution applied disregards frames that do not meet the minimum number of true-positive matches. This methodology yields a faster algorithm, with a minimal penalty in the position calculation, as the probability of the next frame acquiring enough matches for a positive correspondence, assuming normal conditions, is high enough to sustain measurements at the maximum possible rate.
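The sketch below illustrates this detection-and-verification step with the OpenCV API (an illustrative sketch written for this document; the matcher settings, the ratio test and the half-width of the prediction region are assumptions, not the project's exact parameters).

import cv2
import numpy as np

sift = cv2.SIFT_create(nfeatures=4000)

def match_to_reference(ref_gray, cur_gray, region=15, min_matches=20):
    """SIFT matching + RANSAC homography + prediction-region check on two grey frames."""
    kp1, des1 = sift.detectAndCompute(ref_gray, None)
    kp2, des2 = sift.detectAndCompute(cur_gray, None)

    # Brute-force matching with Lowe's ratio test (assumed settings).
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]
    if len(good) < min_matches:
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Project the reference keypoints with H and count matches whose keypoint in the
    # current frame falls inside a rectangular region around the predicted point.
    pred = cv2.perspectiveTransform(src, H).reshape(-1, 2)
    true_pos = np.sum(np.all(np.abs(pred - dst.reshape(-1, 2)) <= region, axis=1))
    return H if true_pos >= min_matches else None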

Robustness improvements

It was observed that, as the camera drifts further from the reference position, the frames get progressively fewer matches, since the image being compared has fewer features in common with the reference frame. To provide the vehicle with more space for movement, while retaining the ability to come back to the initial position, a chain of frames was created. Each index of the chain contains:

1. The actual frame.

2. Position in relation to the reference frame.

3. Frame counter.

4. Time stamp of the frame acquisition.

5. Index number of the frame used as reference.

6. Strength value.

The motive for preserving the index of an intermediary image is to maintain the connection between the camera position and the reference point. As this technique is processing-intensive, it is set to trigger only when the distance between the central points, Pc = (x, y), of the frames being processed exceeds a threshold value set by the user. In this condition, every approved frame is inserted in the chain with a strength value of s = m/a, where m ∈ N is the number of true-positive matches and a > 0 is the age parameter, which increases every iteration.

When a link is completed with a frame, every image related to that link resets its age parameter. If the age parameter is never reset, it means that the image has not found a link for some time and, as a result, it is removed from the chain.

In this work, every 5 iterations the chain is sorted by link position and then by strength, which keeps the strongest frame in each link position.

If, during a comparison, a frame does not obtain the minimum number of true positive matches, it is compared with the next frame in the chain. In the end, if no link is found among the frames inside the region of possible locations, the algorithm disregards the frame and acquires another, as the frame could be affected by quality-degrading occurrences such as motion blur or a very intense lens flare.

The region of possible locations, mentioned before, is a region centered around the vehicle outside of which it is physically impossible for the vehicle to appear. The region takes into account the instantaneous linear speed, the maximum linear speed, and the time since the last update.
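A possible representation of one entry of this chain is sketched below (illustrative only; the field names are this document's, not the project's, and the strength/age update follows the s = m/a rule described above).

from dataclasses import dataclass
import numpy as np

@dataclass
class ChainEntry:
    frame: np.ndarray          # the stored grey frame
    position: tuple            # (x, y) position relative to the reference frame
    counter: int               # frame counter
    timestamp: float           # time stamp of the frame acquisition
    ref_index: int             # index of the frame used as reference for this entry
    matches: int               # m: number of true-positive matches when inserted
    age: int = 1               # a: increases every iteration, reset when the link is reused

    @property
    def strength(self) -> float:
        return self.matches / self.age      # s = m / a

def sort_chain(chain):
    """Every 5 iterations: sort by link position, then by strength (strongest first)."""
    chain.sort(key=lambda e: (e.position, -e.strength))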

3.4.2 Control algorithms

This section focuses on the treatment of the input data and on the methods applied for speed estimation and position maintenance.

Point Registration

This step is executed every time a frame, along with its data, is stored in the chain. Since the point registration algorithm requires, as inputs, the central points and the time stamps of the stored frames, this execution order is justified.

The algorithm outputs an object that contains a 2D point, a time stamp in milliseconds and a variable indicating whether the vehicle is moving, stationary, or in an uncertain state.


As can be observed in figure 3.5, consecutive estimated positions can contain inconsistent distance values, and if every distance were considered for the velocity calculations, errors would appear. Therefore, two threshold regions are created surrounding the last acknowledged position of the vehicle. A state is defined by the intersection of the estimated points with these regions.

Figure 3.5: Representation of the registered positions during point registration and the threshold areas (stationary and uncertain regions) that define the state of the vehicle. The image on the left portrays the position estimations for a moving vehicle; the image on the right, for a stationary vehicle.

When a state different from uncertain is obtained, the algorithm corrects every uncertain state given prior to this event, redefining those uncertain states to the newly acquired certain state, i.e., stationary or moving. This methodology, represented by the diagram shown in figure 3.6, imposes a minimum detectable velocity on the instantaneous velocity calculation, meaning that, for very slow drift movements, the algorithm would not be able to detect the velocity of the vehicle, thus creating instability in the control algorithm. For this problem, two solutions were applied:

• Calculating the average velocity every 10 intervals (approximately 1 second).

• Visual Servoing


Figure 3.6: Diagram of the movement-state decision during position registration.
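The decision logic of figure 3.6 can be sketched as follows (an illustrative sketch for this document; the region radii and the representation of the registry are assumptions):

import math

STATIONARY_R = 3.0    # radius of the stationary region, in pixels (assumed value)
UNCERTAIN_R = 8.0     # radius of the uncertainty region, in pixels (assumed value)

registry = []         # list of (point, timestamp, state) tuples

def register_point(point, timestamp, last_position):
    """Classify a new point as 'stationary', 'moving' or 'uncertain' and update the registry."""
    d = math.dist(point, last_position)
    if d <= STATIONARY_R:
        state = "stationary"
    elif d <= UNCERTAIN_R:
        state = "uncertain"
    else:
        state = "moving"

    # Once a certain state is known, re-label every previous uncertain registry with it.
    if state != "uncertain":
        for i, (p, t, s) in enumerate(registry):
            if s == "uncertain":
                registry[i] = (p, t, state)

    registry.append((point, timestamp, state))
    return state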

Visual Servoing

For the execution of the visual servoing algorithm, the program selects the 10 most robust points of the images analyzed during a 5-second window. With every measurement, the algorithm keeps track of the matched points; if a point is matched multiple times during this period, a variable counting the number of times that point has been matched is incremented.

At the end of the setup time, the 10 points with the most matches are used to create the interaction matrix. This procedure is important because the visual servoing algorithm requires, at every iteration, the measured distance of the points from the current position to the desired position.

After acquiring the required set of points, the interaction matrix is calculated by applying equation 2.34. The set of points and the interaction matrix are stored and, if the captured image is near the reference image, i.e., directly linked to the reference, the homography estimation, if successful, updates the current position of the points.

The visual servoing algorithm outputs a vector composed of 6 variables, in which the first 3 elements are the instantaneous linear velocity of the center point of the captured image and the last 3 are the instantaneous angular velocity of the camera frame.

With the acquisition of the velocity vectors, the control algorithm can correct the drifting movements more precisely than with position-based control alone.

3.4.3 Blimp Dynamics and Kinematics

Simplified system dynamics

The dynamic model adopted was taken from [12] and uses the physical parameters of the system determined in indoor free-flight experimental tests with a blimp (Table 4, column 'Fr' of the above-mentioned reference). Since the dimensions of that blimp are approximately double those of our blimp, and in order to deal with this scaling, the physical parameters taken from that paper were modified accordingly before being applied in our simulator.

In the present context, {I} denotes an inertial reference frame and {B} denotes the reference frame attached to the body of the vehicle (body frame).

The 6-DOF nonlinear dynamic equations of motion of a vehicle, based on Newtonian mechanics, can be expressed in compact form by the following equation, which is a simplified form of the general expression presented in [14]:

Mν̇ + C(ν)ν + D(ν)ν + g(η) = τ,

where M represents the system inertia matrix (including added mass), C(ν) corresponds to the Coriolis-centripetal matrix (including added mass) as a function of the vehicle velocity vector, D(ν) is the damping matrix, also a function of the velocity vector, g(η) denotes the vector of gravitational and buoyancy forces and moments, and τ represents the vector of external forces and moments, including the control inputs. Vector η = [x, y, z, φ, θ, ψ]ᵀ denotes the 3D position (expressed in referential {I}) and the Euler angles of the vehicle body reference {B} with respect to {I}.

Assuming the vehicle moves at very low speed (so the Coriolis forces are negligible), the former expression can be simplified as

Mν̇ + D(ν)ν + g(η) = τ.    (3.1)

The formulation proposed in [12] corresponds to the linearized version of equation 3.1 about a trimmed flight trajectory and relies on the following assumptions:

• The trimmed condition is a steady rectilinear level flight.

• The mass of the blimp is constant.

• The origin of the body-fixed axis system (xb, yb, zb) is placed at a point in the gondola (at the bottom of the blimp).

• The blimp is symmetric about the xb–zb plane, and both the center of gravity and the center of buoyancy lie in that plane.


The formulation adopted also decomposes the system generalized inertia matrices into two components: the mass matrix, M, and the added mass matrix, A.

Based on the above assumptions, the linearized equations of motion are divided into two sets representing the (uncoupled) longitudinal (LG) and lateral (LT) modes of the vehicle as:

MLG ẋLG = ALG xLG + QLG
xLG = [u, w, q, θ]ᵀ
QLG = [τX, τZ, τM, 0]ᵀ    (3.2)

and

MLT ẋLT = ALT xLT + QLT
xLT = [v, p, r, φ, ψ]ᵀ
QLT = [τY, τL, τN, 0, 0]ᵀ,    (3.3)

where the following variables denote the perturbed dynamic and kinematic states around the trimmed trajectory:

[u, v, w]ᵀ : linear (surge, sway, and heave) velocities expressed in {B},
[p, q, r]ᵀ : angular velocities (roll, pitch, and yaw rates) expressed in {B},
[φ, θ, ψ]ᵀ : Euler angles,
[τX, τY, τZ]ᵀ : external forces applied along the respective axes, and
[τL, τM, τN]ᵀ : external moments centered on the axes [xB, yB, zB].

From the above formulation it is possible to derive two differential equations relating the longitudinal and lateral state vectors, xLG and xLT respectively, with the system parameters and the applied forces and moments as

ẋLG = (MLG)⁻¹ (ALG xLG + QLG)    (3.4)

and

ẋLT = (MLT)⁻¹ (ALT xLT + QLT)    (3.5)

The state of the system, given by the solutions of these differential equations, is obtained by the Simulink solver applied to the dynamics model described above.
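Outside Simulink, the same state equations can be propagated with a simple fixed-step integrator, as in the sketch below (an illustrative sketch for this document; the matrices MLG and ALG and the input QLG are placeholders to be filled with the scaled parameters from [12], not values given here).

import numpy as np

def simulate_longitudinal(M_LG, A_LG, Q_LG, x0, dt=0.01, t_end=100.0):
    """Forward-Euler integration of M_LG * x_dot = A_LG * x + Q_LG(t) (eqs. 3.2/3.4).

    M_LG, A_LG: 4x4 matrices; Q_LG: function of time returning the input vector;
    x0: initial perturbed state [u, w, q, theta]."""
    M_inv = np.linalg.inv(M_LG)
    steps = int(t_end / dt)
    x = np.array(x0, dtype=float)
    history = np.zeros((steps, x.size))
    for k in range(steps):
        x_dot = M_inv @ (A_LG @ x + Q_LG(k * dt))   # eq. 3.4
        x = x + dt * x_dot                           # Euler step
        history[k] = x
    return history

# Hypothetical usage: a 2 V step on the main thruster at t = 10 s would be converted into a
# surge force inside Q_LG using the thruster gains described in the thruster model section.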

System kinematics

The kinematic state of the blimp is represented by its pose, η (defined above), together with the body-referenced velocity vector ν = [u, v, w, p, q, r]ᵀ; these are related to the kinematic state of the vehicle represented in the inertial frame {I} through η̇ = J(η)ν, with

J(η) = [ J1      0₃ₓ₃
         0₃ₓ₃    J2 ],

where J(η) is the transformation matrix that maps the kinematic quantities from the body-fixed frame to the inertial frame and vice-versa, with (using the simplified notation cψ = cos(ψ), sψ = sin(ψ), tθ = tan(θ), etc.)


J1 = [ cψcθ    −sψcφ + cψsθsφ    sψsφ + cψcφsθ
       sψcθ    cψcφ + sφsθsψ     −cψsφ + sθsψcφ
       −sθ     cθsφ              cθcφ ]

and

J2 = [ 1    sφtθ     cφtθ
       0    cφ       −sφ
       0    sφ/cθ    cφ/cθ ].

This topic is covered in detail in the above-mentioned reference [14].
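These transformation blocks can be written directly from the Euler angles, as in the following sketch (an illustrative example written for this document as a transcription of the expressions above, not project code):

import numpy as np

def J_matrices(phi, theta, psi):
    """Build J1 (linear-velocity rotation) and J2 (angular-velocity transformation)
    from the Euler angles, following the expressions above."""
    cphi, sphi = np.cos(phi), np.sin(phi)
    cth, sth, tth = np.cos(theta), np.sin(theta), np.tan(theta)
    cpsi, spsi = np.cos(psi), np.sin(psi)

    J1 = np.array([
        [cpsi * cth, -spsi * cphi + cpsi * sth * sphi,  spsi * sphi + cpsi * cphi * sth],
        [spsi * cth,  cpsi * cphi + sphi * sth * spsi, -cpsi * sphi + sth * spsi * cphi],
        [-sth,        cth * sphi,                       cth * cphi],
    ])
    J2 = np.array([
        [1.0, sphi * tth,  cphi * tth],
        [0.0, cphi,       -sphi],
        [0.0, sphi / cth,  cphi / cth],
    ])
    return J1, J2

# Example: eta_dot = J(eta) @ nu, assembling the 6x6 block-diagonal matrix.
J1, J2 = J_matrices(0.0, 0.05, 0.1)
J = np.block([[J1, np.zeros((3, 3))], [np.zeros((3, 3)), J2]])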

Thruster model

Similarly to the configuration of our real blimp, the vehicle modeled here has two thrusters: one Main Thruster (MT) and a Tail Rotor (TR). Both thrusters can rotate in opposite directions, allowing inversion of the resultant thrust. The MT can be tilted by a controllable angle, ρ, in order to provide simultaneous longitudinal and vertical components of thrust.

The thruster model adopted for the MT is similar to the one presented in [15]. According to that model, the thruster response in gram-force per volt (gf/V) is approximately linear in the voltage range [−4, −0.5] V with a gain of 1.6 gf/V, and in the range [0.5, 4] V with a gain of 3 gf/V, as shown in figure 3.7; as also shown in this figure, the thruster response has a dead-zone in the input voltage interval [−0.5, 0.5] V. The dynamic model of the TR is similar, but this motor is assumed to have significantly less power than the MT.
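This piecewise response can be expressed as a simple static map, sketched below for reference (an illustrative sketch for this document, using the gains and dead-zone described above; making the map continuous at the dead-zone edges is an assumption of this example, not a detail taken from [15]).

def main_thruster_force(v_motor: float) -> float:
    """Static thrust response of the MT in gram-force: piecewise linear with a dead-zone.

    Gains and dead-zone follow the model described above: 1.6 gf/V for negative voltages,
    3 gf/V for positive voltages, and zero output inside [-0.5, 0.5] V."""
    if -0.5 <= v_motor <= 0.5:
        return 0.0                       # dead-zone: no thrust produced
    if v_motor > 0.5:
        return 3.0 * (v_motor - 0.5)     # positive branch, 3 gf/V (offset assumed for continuity)
    return 1.6 * (v_motor + 0.5)         # negative branch, 1.6 gf/V (offset assumed for continuity)

print(main_thruster_force(2.0))          # step input of 2 V used in the simulation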

Figure 3.7: Motor thrust in the XB direction versus the voltage applied to the motor, Vmotor. (Source: [15])


Chapter 4

Results and Discussion

4.1 Camera calibration

First in the order of the experimental procedure is the camera calibration, needed to correct camera distortions and to obtain the physical camera parameters, studied in section 2.1.3, which are required for the IBVS procedure. The experiment consists in comparing the values of an ideal chessboard pattern with one captured by the camera.

Considering the ideal position of the chessboard pattern to be when the pattern is contained in the X and Y axes of the camera coordinate reference, the coordinates of each vertex will be (i, j), with i and j representing the column and the row of the contrasting squares. Given this information, it is possible to compare it with the chessboard captured by the camera and afterwards obtain the distortion coefficients, the focal lengths along the x and y axes, and the center of projection of the given sensor.

By providing the algorithm with the length of the squares' edges, the calibration procedure can establish the correspondence between world distances and image pixels. The estimation of the extrinsic parameters thus becomes possible, which outputs the translation and rotation vectors of the chessboard reference with respect to the camera coordinate reference for each image.

For the evaluation of the calibration procedure, the reprojection error of the vertices is measured, providing the per-pixel error of each image.

The setup used consisted of a chessboard grid with contrasting squares of 35 mm × 35 mm, captured in distinct positions and orientations.

For validation of the implementation executed in this project, the results are compared with the calibration results obtained with a similar tool available in the Matlab Computer Vision toolbox (MathWorks (MW), Corp.).
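The calibration itself follows the usual OpenCV chessboard workflow, sketched below (an illustrative sketch for this document; the number of inner corners and the image folder are assumptions, while the 35 mm square size comes from the setup described above).

import cv2
import numpy as np
import glob

SQUARE = 35.0                  # square edge length in mm (from the setup above)
PATTERN = (9, 6)               # inner corners per row/column (assumed board geometry)

# Ideal chessboard vertices: (i*35, j*35, 0) in the board's own reference.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):          # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Outputs: intrinsic matrix K, distortion coefficients (k1, k2, p1, p2, k3),
# and the per-image extrinsic rotation/translation vectors.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)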

Five tests were executed with the results displayed below:


Test 1

Figure 4.1: Test 1: Calibration procedure with the ideal straight lines placed on the columns of the chessboard pattern.

Intrinsic parameters obtained from the calibration.

KApp = [ 499.0095    0           315.5381
         0           499.8903    245.4767
         0           0           1 ]

KMW  = [ 503.5673    0           319.3442
         0           502.8614    250.5455
         0           0           1 ]    (4.1)


Coefficient    Application      MathWorks
k1             2.23 × 10⁻¹      2.23 × 10⁻¹
k2             −4.34 × 10⁻¹     −4.36 × 10⁻¹
k3             −6.96 × 10⁻²     0
p1             −7.32 × 10⁻⁴     0
p2             −2.03 × 10⁻³     −2.37 × 10⁻³

Table 4.1: Test 1: Distortion measurements from the application vs MathWorks.

Image    Application    MathWorks
1        0.13           0.14
2        0.10           0.10
3        0.15           0.15
4        0.17           0.17
5        0.13           0.13
6        0.11           0.10
7        0.10           0.09
8        0.17           0.17
9        0.12           0.13
10       0.14           0.14
11       0.16           0.18
12       0.11           0.09
13       0.14           0.15
14       0.18           0.18
15       0.21           0.22
16       0.19           0.18
17       0.16           0.18
18       0.17           0.17
19       0.12           0.13
20       0.18           0.18
21       0.11           0.13
22       0.17           0.17
23       0.14           0.14
24       0.10           0.11
Mean     0.15           0.15

Table 4.2: Test 1: Reprojection errors from the application calibration vs the MathWorks calibration.


Test 2

Figure 4.2: Test 2: Calibration procedure with the ideal straight lines placed on the columns of the chessboard pattern.

Intrinsic parameters obtained from the calibration.

KApp = [ 506.93    0         317.10
         0         505.06    254.13
         0         0         1 ]

KMW  = [ 506.92    0         318.95
         0         505.09    254.52
         0         0         1 ]    (4.2)

Coefficient    Application      MathWorks
k1             2.38 × 10⁻¹      2.41 × 10⁻¹
k2             −5.10 × 10⁻¹     −5.3 × 10⁻¹
k3             6.17 × 10⁻²      1.28 × 10⁻¹
p1             3.46 × 10⁻⁴      0
p2             −9.51 × 10⁻⁴     0

Table 4.3: Test 2: Distortion measurements from the application vs MathWorks.


No.     Application    MathWorks
1       0.37           0.36
2       0.15           0.16
3       0.20           0.20
4       0.11           0.10
5       0.16           0.16
6       0.16           0.16
7       0.17           0.18
8       0.20           0.21
9       0.18           0.19
10      0.22           0.21
11      0.11           0.11
Mean    0.19           0.19

Table 4.4: Test 2: Reprojection errors from the application calibration vs the MathWorks calibration.

Test 3

Intrinsic parameters obtained from the calibration.

KApp = [ 845.31    0         319.46
         0         843.26    245.67
         0         0         1 ]

KMW  = [ 894.97    0         320.00
         0         895.09    248.57
         0         0         1 ]    (4.3)

Coefficient    Application      MathWorks
k1             7.18 × 10⁻¹      8.77 × 10⁻³
k2             −3.76            −5.304 × 10⁻¹
k3             6.33             1.283 × 10⁻¹
p1             8.14 × 10⁻⁵      −3.83 × 10⁻⁴
p2             −2.03 × 10⁻³     −8.91 × 10⁻⁶

Table 4.5: Test 3: Distortion measurements from the application vs MathWorks.

Analysis of the results obtained from the tests performed.

Regarding the validation of the focal lengths obtained in the tests against the manufacturer specifications, tests 1 and 2 can be considered successful, as the calculations in equation 4.4 confirm that the focal lengths of the captured images are almost identical to the value declared by the manufacturer.

On the contrary, test 3 was considered invalid, as the focal lengths were miscalculated. This is due to the fact that the chessboard captured in the images was not placed on a flat plane, which resulted in a pincushion-type distortion effect that was not accounted for in the calculations of either framework.

fx = (503.5673 × 3280/640) [px] × 1.12 [µm/px] ≈ 2.9 mm
fy = (502.8614 × 2464/480) [px] × 1.12 [µm/px] ≈ 2.9 mm    (4.4)

Figure 4.3: Test 3: Calibration procedure with the ideal straight lines placed on the columns of the chessboard pattern.

By analyzing the distortion coefficients, it is possible to classify the type of distortion present in every image through the interpretation of the k1 value. A negative k1 corresponds to a pincushion effect and a positive k1 to a barrel effect, and the effect is more pronounced for larger absolute values. In the rare case where k1 is 0, it is assumed that the camera sensor has no radial distortion.

In this project, after a successful calibration, the camera distortion present in the captured images is a barrel effect with values close to 0. The tangential distortion is minimal and is therefore considered negligible in order to decrease the complexity of the calculations. This decision is reinforced by the fact that the MW framework set the tangential distortion to 0, thus not factoring it into the undistortion of the images.

The final metric for the evaluation of the calibration tests is the reprojection error, i.e., the error per pixel. It was verified that the measurements from the algorithm were consistent with the measurements provided by the Matlab tools. Additionally, the mean value of the reprojection error is considered acceptable, as it is comfortably below the 0.25 px value assumed to be the limit of the error in the computer vision standard.


No.    Application    MathWorks
1      0.19           0.19
2      0.17           0.18
3      0.18           0.17
4      0.14           0.15
5      0.19           0.18
6      0.20           0.19
7      0.17           0.17
8      0.15           0.16
9      0.17           0.16
10     0.13           0.14
11     0.17           0.18
12     0.17           0.17
13     0.18           0.16
14     0.13           0.14
15     0.17           0.18
16     0.18           0.18
17     0.13           0.14
18     0.16           0.15
19     0.12           0.13
20     0.12           0.13
21     0.17           0.17
22     0.12           0.13
23     0.13           0.13
24     0.16           0.17

Table 4.6: Test 3: Reprojection errors from the application calibration vs the MathWorks calibration.

The obtained results are applied in several modules of the application, namely, in order:

1. Undistortion of every captured frame from the sensor.

2. Projection of image coordinates into world scene points, necessary for the attainment of the interaction matrix used in the IBVS method.

3. Correction of the offset seen by the camera, derived from angular rotations and altitude, or depth, variations.


By applying these results in an undistortion algorithm, it is possible to obtain the correct mapping of the points onto the image plane. An example of this mapping, using the results obtained in the first test, can be seen in figure 4.4.

Figure 4.4: Undistortion of the chessboard pattern used in the calibration procedure.
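With the intrinsic matrix and distortion coefficients obtained by the application in Test 1, the undistortion mapping can be applied as in the sketch below (an illustrative sketch for this document using the OpenCV API; the input and output image paths are hypothetical).

import cv2
import numpy as np

# Intrinsic matrix and distortion coefficients obtained by the application in Test 1.
K = np.array([[499.0095, 0.0, 315.5381],
              [0.0, 499.8903, 245.4767],
              [0.0, 0.0, 1.0]])
dist = np.array([2.23e-1, -4.34e-1, -7.32e-4, -2.03e-3, -6.96e-2])  # k1, k2, p1, p2, k3

frame = cv2.imread("chessboard.png")                 # hypothetical captured frame
undistorted = cv2.undistort(frame, K, dist)          # remaps pixels to the ideal pinhole model
cv2.imwrite("chessboard_undistorted.png", undistorted)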

The extrinsic parameters are not considered in this project because the translation and rotation vectors obtained refer to the chessboard pattern, which is not used any further in the measurements. Their verification is also not important, since the world scene is unknown in every test, which describes well the station-keeping problem for research purposes. If the robot were assembled with the intention of maintaining a stationary position in a known setting, e.g., an assembly line, it would be beneficial to validate the extrinsic parameters obtained from the calibration tests, as the camera would then be able to estimate position and orientation measurements, assuming that the observed object is declared in the estimation algorithms.

4.2 Simulated Environment

A simulation of a world scenario was designed in order to test the accuracy of the position and velocity estimations of the vision algorithm with as little interference as possible from external factors. To accomplish this, a large image of 3000 px × 3000 px was designed, composed of different geometric features randomly distributed across a white background. This results in a world scene with well-defined features that allows tests with considerable deviations from the reference point, in order to analyze and verify the robustness of the method applied in this project.

The simulator also allows the verification of the velocity measurements, as the value of the motion of the frames within the large image is set before the test. In order to obtain the distance traveled per second, every simulated test is executed at 10 FPS.

Test 1

This test was created for the measurement of 1D translation movements, one at a time. To accomplish this, the simulator creates a square trajectory whose starting and finishing points coincide. The velocity set for this test was 3 px/s.

Figure 4.5: Graphical representation of the trajectory performed by the camera pose.


Figure 4.6: Test 1: Distance of x and y components from the reference point.

Figure 4.7: Test 1: Total distance from the reference point.


Figure 4.8: Test 1: Velocity of the simulated camera pose.

Test 2

This test was developed in order to assess the robustness of the algorithm when the camera pose undergoes non-linear movements. To accomplish this, the simulator has the camera move in one direction with a 20% chance of moving in one of the orthogonal directions at any given time. The velocity of the camera pose movements is set at 3 px/s.

Figure 4.9: Test 2: Distance of x and y components from the reference point.


Figure 4.10: Test 2: Total distance from the reference point.

Figure 4.11: Test 2: Velocity of the simulated camera pose.

Test 3

For this test, the camera pose follows a trajectory along a diagonal line. This is done in order to verify the application measurements when both 2D components are acting at the same time. Each increment of this test changes the x and y coordinates by 1 px/s, which translates to a combined velocity of √2 ≈ 1.41 px/s.


Figure 4.12: Test 3: Distance of x and y components from the reference point.

Figure 4.13: Test 3: Total distance from the reference point.

Test 4

This test serves as a validation of the estimation of the yaw rotations that can occur on the blimp. In order to achieve this, a 3D model capable of providing 9 controllable DOF was created. It is composed of 18 spheres, divided into an equal number of spheres along two blocks. A visual representation is given in figure 4.17.


Figure 4.14: Test 3: Velocity of the simulated camera pose.

Figure 4.15: Test 4: Effects of the rotation on the camera frame.


Figure 4.16: Test 4: Effects of the rotation on the camera frame.

Figure 4.17: Test 4: Matching of rotated frames with different directions.


Analysis

From the results obtained in the simulation, it is possible to conclude that the position estimation is accurate and robust for 2-dimensional movements, which is the behavior assumed to be predominant for floating vehicles with a streamlined structure, i.e., capable of maintaining their pitch and roll angles when an external force is applied, such as the blimp used in this project.

From the first test, by observing figures 4.6 and 4.7, it is possible to verify that, even when the camera pose is far from the reference point, the application is capable of providing estimations of the vehicle position with very small errors in relation to the real value. The inconvenience of this method is the notable increase in the demand for computational resources, which is proportional to the distance of the vehicle from the reference point.

The estimations of the translational velocity of the camera pose are consistent across all the tests performed, and it is possible to observe that the estimations of the instantaneous velocity can suffer from large variations between samples, see figures 4.11 and 4.14. This is an undesirable attribute when applied to a control algorithm. Thus, in order to minimize this effect, the algorithm averages the velocity estimations over each second (10 samples), providing a more stable estimation of the translational velocity of the camera pose.

From this point onwards, the instantaneous translational velocity estimations are not considered any further, as they do not offer any beneficial attributes.

The second test portrays a behavior that is not common for an indoor blimp, although it is relevant given that, in the practical tests, the estimation of the projection matrices will not be as accurate as in the simulated estimations. Thus, it is very likely that a similar jitter pattern will be present in any practical test where the algorithm faces an inconsistent detection of features, e.g., a green field or the bottom of the ocean.

The estimation of the vehicle position in the second test is consistent with the results obtained in the first test. It is possible to observe, in figures 4.9 and 4.10, that the position estimation of the developed application had errors that could be neglected in a real application.

In regards to the velocity estimation, it is possible to verify (see figure 4.11) that the average velocity estimation performed well during the execution of the test. The errors observed had a maximum value of 0.1 px/s in relation to the real instantaneous velocity of the simulator. This is due to the fact that the simulator, at that position, had a small number of figures, which created difficulties for the estimation algorithm, as it could not obtain a healthy number of correspondences.

The third test serves as a conclusion of the 2D translation movements, in which there is a change of position along both axes. The importance of this test is to evaluate movements as similar as possible to those of a realistic environment for this project, as it is impossible to perform a perfect 1D translation movement with a floating vehicle.

By observing figures 4.12 and 4.13, it is possible to verify that the vehicle can be tracked accurately with almost no errors. The measurements of the average translational velocity were close to the simulated values, with a maximum error of approximately 0.24 px/s; thus, it is possible to conclude that the measurements obtained were satisfactory and can be used as input to the control algorithm responsible for connecting the vehicle kinematics with the vision algorithm.

The final test performed was useful to understand the effects of the angular rotation of the captured frames in relation to the vehicle. It was possible to observe that the results of the projection matrices, when the vehicle rotated around the Z axis, would describe a circle around the center point of the frame.

To understand this behavior, considering the similarity matrix (equation 2.4), the following deduction is obtained:

p = [ tx + (1 − α)cu − β cv
      ty + β cu + (1 − α)cv
      1 ],    (4.5)

with:

α = s cos(θ)
β = s sin(θ).    (4.6)

As can be observed, the points describe a circle around the center of rotation, which is correct for the points present in the image, since the reference of the frame also suffers this effect. However, this is not equal to the movement of the vehicle, since the vehicle's reference does not undergo any linear movement when the vehicle performs yaw rotations.

In order to correct this problem, by subtracting the terms in equation 4.7 from the projection transformation result, the actual position of the vehicle reference is obtained.

(1 − α)cu − β cv
β cu + (1 − α)cv    (4.7)

With all the tests performed in the simulator, it is possible to conclude that, for 2D movements, the results obtained for the position estimation are efficient and robust. Also, the ability to store multiple frames along the trajectory of the vehicle movement provides a larger allowed distance radius for its movement. On the other hand, in some cases the translational velocity estimation is not sufficiently accurate to be applied to the control algorithm. Therefore, when possible, it is preferable to use the visual servoing algorithm, since it provides much better estimations of the linear and angular velocities.
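The correction of equations 4.5-4.7 can be applied directly to the estimated similarity transform, as in the sketch below (an illustrative sketch for this document; the way the parameters α, β, tx and ty are read from the transform is an assumption of this example).

def vehicle_translation(A, cu, cv):
    """Remove the apparent circular motion induced by a yaw rotation (eqs. 4.5-4.7).

    A: 2x3 similarity transform [[alpha, -beta, tx], [beta, alpha, ty]] estimated between
    the reference and the current frame (assumed parameterization);
    (cu, cv): coordinates of the frame center (center of rotation)."""
    alpha, beta = A[0][0], A[1][0]
    tx, ty = A[0][2], A[1][2]
    # Apparent displacement of the frame center, written as in eq. 4.5.
    px = tx + (1.0 - alpha) * cu - beta * cv
    py = ty + beta * cu + (1.0 - alpha) * cv
    # Rotation-induced offset around the center (eq. 4.7), to be subtracted.
    off_x = (1.0 - alpha) * cu - beta * cv
    off_y = beta * cu + (1.0 - alpha) * cv
    return px - off_x, py - off_y        # the remaining (tx, ty) is the vehicle translation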

4.3 1-D Linear Actuator

This experimental test allows the analysis of the application's estimations when frames are acquired through the camera sensor.

The actuator travels along a runway with a length of 4 m in only 2 directions, thus having only 1 controllable DOF. Table 4.7 presents the minimum resolution associated with the measurements obtained by the actuator.

Measurement    Resolution
Position       2.8 × 10⁻⁴
Velocity       4.5 × 10⁻³

Table 4.7: 1-D linear actuator minimum resolution.

The resolution values provided by the actuator imply that the error associated with the actuator should not impose any type of limitation on the measurements, as the resolution values are much smaller than the values measured by the vision algorithm. However, when initializing the velocity and acceleration at which the actuator moves along its runway, it is necessary to choose values that allow the structure containing the camera sensor to oscillate as little as possible. These oscillations arise when the acceleration values are aggressive in relation to the values defined for the velocity.

The framework used for the actuator provides two options for setting its acceleration, which are described as:

• Defining at which rate it accelerates.

• Defining the acceleration range until it reaches the desired velocity.

The larger the acceleration range of the actuator, the lower the acceleration, which translates to lower vibrations of the structure due to the lower force applied to it.

The problem associated with choosing small values for the acceleration parameters is that the deceleration values are equal to them, which can result in the actuator not stopping at the specified position.

For the execution of the tests, the velocity was set to 0.1 m/s with an acceleration range of 20 cm.

In order to obtain the relation between pixels and the metric system, the distance from the camera sensor to the object scene was measured with a tape measure.

The camera sensor was placed at approximately 1 m from the ground; knowing that the captured frames would be undistorted and assuming the image plane to be parallel to the object (figure 4.18), the relation could be calculated with a simple rule of three (equation 4.8).

105 px / 20 cm = 5.25 [px/cm],    (4.8)

where the relation between the 105 px and the 20 cm was measured experimentally.

Figure 4.18: Visualization of the 1D actuator test, with the reference frame on the left and the current frame on the right.


Test 1

For this test, the array of positions to be traveled was given as follows:

Order    Position [cm]    Position [px]
1        50               262.5
2        20               105
3        0                0
4        50               262.5
5        80               420
6        20               105

Table 4.8: Table of positions set for the actuator.

Figure 4.19: Test 1: Total distance from the reference point measured by the application.

Discussion

The results obtained from the vision estimation with the camera sensor are on par with what was presented in section 4.2.

The position estimation, represented in figure 4.19, is slightly worse, due to possible vibrations of the actuator or to difficulty by the algorithm in obtaining good frames for comparison, the latter being improbable since the object measured was composed of well-defined features. Nevertheless, it is possible to consider the position estimation accurate and robust, since the estimated values were similar to the real values.

In terms of the velocity estimation, in figure 4.20, it can be observed that the algorithm did not perform at an acceptable level: it did not average the 0.1 m/s (2 px/s) for the majority of the test, detected movements when the vehicle was immobile, and overshot to a maximum value of 0.5 m/s (11 px/s), which is 5 times higher than the real value.

Figure 4.20: Test 1: Velocity of the camera pose measured by the application.

4.4 Blimp Simulation

Simulink model

In order to simulate the behavior of the blimp in a realistic manner, the dynamics and kinematics models presented above were implemented in the Matlab/Simulink environment.

Figure 4.21: Simulink model overview


Figure 4.22: Simulink block: Dynamics

Simulation results

Illustrative simulation conditions

The test results presented in the following figures were obtained under the following conditions:

• Simulation duration: 100 s

• Main thruster: Input = step 2 V @ t=10 sec;

• Main thruster tilt angle (rho) = 10 degrees

• Tail rotor input: 0 V

Model geometry

1. The origin of {B} is placed at the bottom surface of the vehicle (gondola), instead of at its center, as in reference [12];

2. The center of the Main Thruster is positioned at the origin of {B} ;

3. The center of the tail Rotor is placed on the xB axis at coordinate xt;

4. Due to assumption 2 above, the actuation of the MT does not generate moments around the yB axis;

5. Due to assumption 3 above, the actuation of the TR only generates moments around the zB axis;

6. The blimp has neutral buoyancy: in the absence of external forces, its altitude remains constant;


Figure 4.23: Simulink blocks: Lateral and Longitudinal dynamics

7. When the surge velocity is positive (u > 0), the blimp descends due to an aerodynamic effect; in order to compensate for this effect, it is necessary to apply a positive tilt angle ρ to the MT; notice that coordinate ZI grows downwards, according to the North-East-Down (NED) referential convention.

Model limitations

• This model implements the linearized dynamics equations around a trimmed trajectory, which assumes very small disturbances, including angular deviations from the nominal trajectory;

• The model is particularly sensitive to large perturbations of the yaw angle relative to the direction of the straight-line trajectory;

• In order to implement more complex trajectories, it is recommended to compose the desired path as a sequence of straight-line segments; between segments, it is possible to rotate the vehicle using the TR with the MT inactive.

Figure 4.24: Simulink block: Kinematics

Figure 4.25: Simulink block: Main thruster response

We notice that the pitching disturbances caused by the surge force applied by the main thruster, observed in the results obtained with the simulated model, consist of small-amplitude, fast-decaying oscillations, as shown in figures 4.27 and 4.28. These oscillations are similar to those reported in [15], which were obtained with both a real vehicle and a linearized dynamics model.


Figure 4.26: Simulation results: linear velocities of the vehicle.

Figure 4.27: Simulation results: angular velocities of the vehicle.


Figure 4.28: Simulation results: Euler angles.

Figure 4.29: Simulation results: 3D position of the vehicle (notice that coordinate Z_I grows downwards, according to the North-East-Down (NED) referential convention).


4.5 Visual Servoing

For the validation of the visual servoing calculations, we formed the interaction matrix from 10 points expressed in camera coordinates, which were then transformed into the corresponding world coordinates by applying equation 2.14.

Point   Position u [px]   Position v [px]
1       235.9             270
2       311.9             188
3       557.88            416.65
4       625               114
5       632               417
6       591.42            112.36
7       525.89            442
8       595.4             383.32
9       633.7             183.8
10      576.67            417.65

Table 4.9: Points used for the calculation of the interaction matrix.

With the points in world coordinates, the next step is forming the matrix $L_e \in \mathbb{R}^{k \times 6}$ with $k = 20$ (two rows per point). By applying the pseudo-inverse, the matrix $L_e^+$ is obtained; computing the velocity command then only requires, as input, the difference between the points used for the creation of $L_e$ and the actual position of the points.
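The sketch below shows how this computation could look in Python, using the classical point-feature interaction matrix and the standard IBVS law $v = -\lambda L_e^+ (s - s^*)$ from the visual servoing literature [8, 10]; the depths Z, the use of raw pixel units, and the sign convention of the error are assumptions, so its numerical output is not expected to reproduce equations 4.9 to 4.11 exactly.

```python
import numpy as np

# Minimal IBVS sketch: build the classical point-feature interaction matrix,
# take its pseudo-inverse and map a feature error to a velocity command.
# Depths Z, pixel units and the error sign convention are assumptions.
def interaction_matrix(points, Z=1.0):
    """Two rows per point: 10 points give the 20 x 6 matrix used in the text."""
    rows = []
    for x, y in points:
        rows.append([-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x])
    return np.array(rows)

def ibvs_velocity(current, desired, lam=1.0, Z=1.0):
    """Standard IBVS law v = -lambda * L_e^+ * (s - s*)."""
    L = interaction_matrix(desired, Z)
    error = (np.asarray(current) - np.asarray(desired)).reshape(-1)
    return -lam * np.linalg.pinv(L) @ error   # [vx, vy, vz, w_x, w_y, w_z]

# Illustrative use, mirroring the structure of Test 1: the same 105 px offset
# applied to the x coordinate of every point (first points of Table 4.9).
desired = np.array([[235.9, 270.0], [311.9, 188.0], [557.88, 416.65]])
current = desired + np.array([105.0, 0.0])
v = ibvs_velocity(current, desired)
```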

Test 1

Following equation 2.33, with λ = 1 and an offset of 105 pixels inserted on the x coordinate of each point, we obtain the vector of velocities:

\[
v_x = 105, \quad v_y = 0, \quad v_z = 0, \quad \omega_{\theta x} = 0, \quad \omega_{\theta y} = 0, \quad \omega_{\theta z} = 0 \quad [\mathrm{px/s}]. \tag{4.9}
\]

Test 2

By inserting an offset of 105 pixels on the y coordinate of each point, we obtain the vector of velocities:

\[
v_x = 0, \quad v_y = 105, \quad v_z = 0, \quad \omega_{\theta x} = 0, \quad \omega_{\theta y} = 0, \quad \omega_{\theta z} = 0 \quad [\mathrm{px/s}]. \tag{4.10}
\]

Test 3

The final test was performed by inserting an offset of 105 pixels on both the x and y coordinates of each point, obtaining the following:

\[
v_x = 105, \quad v_y = 105, \quad v_z = 0, \quad \omega_{\theta x} = 0, \quad \omega_{\theta y} = 0, \quad \omega_{\theta z} = 0 \quad [\mathrm{px/s}]. \tag{4.11}
\]

Discussion

Although only a few tests were performed, it is possible to confirm the consistency of the output for the translational movements by applying distance values with no measurement error, i.e., the same offset for all of the points.


Chapter 5

Conclusion and Future Work

5.1 Conclusion

The work developed in the context of this dissertation included a diversity of components, including hardware configuration and tests (electrical motors and motor drivers, standalone computers and cameras) and software development for camera calibration, visual feature extraction, visual odometry, and visual-servo control. The preliminary tests were performed with auxiliary platforms in order to permit the acquisition of “ground-truth” data required to validate the vision-based algorithms in experimental trials. Due to practical difficulties, in the final part of the work it was not possible to perform tests with the real blimp available at the IRIS laboratory, a limitation that prevented a more realistic assessment of the feasibility and efficacy of the methods under development.

In order to cope with this limitation, while providing a realistic test bench for the system, a dynamics-based simulator of the blimp was developed, parameterized with real physical parameters and designed to provide the signal inputs required by the visual-servoing system responsible for station-keeping of the vehicle. Although the different functional components developed in this work have not been fully integrated, as would be necessary to implement a fully functional version of the system, the theoretical and experimental frameworks of the proposed approach have been established, and a set of operational blocks has been implemented which can later be integrated in order to achieve the initial objectives of the project.

5.2 Future Work

For the future development of vision-based station-keeping of the vehicle, the additional integrations that could be made to this project, and that would benefit the efficiency and robustness of the application, are:

• Application of Visual Servoing in real-time execution.

• Implementation of measurements with a camera sensor combined with an inertial unit.

• Introduction of sensors that provide additional information on the vehicle altitude.

• Addition of external sensors for ground-truthing of the position.


References

[1] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.

[2] E. Dubrofsky, “Homography estimation,” Master’s thesis, University of British Columbia, Vancouver, 2009.

[3] A. Gartia, “Camera calibration and fundamental matrix estimation with RANSAC.”[Online; visited at 1-December-2019].

[4] J. Heikkila and O. Silven, “A four-step camera calibration procedure with implicit image correction,” in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1106–1112, June 1997.

[5] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[6] M. Brown and D. Lowe, “Invariant features from interest point groups,” in British Machine Vision Conference, pp. 656–665, 2002.

[7] G. Bradski, “Introduction to SIFT (Scale-Invariant Feature Transform),” 2013. [Online; visited at 14-September-2019].

[8] F. Chaumette and S. Hutchinson, “Visual servo control. I. Basic approaches,” IEEE Robotics & Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006.

[9] F. Chaumette and S. Hutchinson, “Visual servo control. II. Advanced approaches,” IEEE Robotics & Automation Magazine, vol. 14, no. 1, pp. 109–118, 2007.

[10] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on visual servo control,” IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651–670, 1996.

[11] S. van der Zwaan, A. Bernardino, and J. Santos-Victor, “Vision based station keeping and docking for an aerial blimp,” in Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2000) (Cat. No.00CH37113), vol. 1, pp. 614–619, Oct 2000.

[12] T. Yamasaki and N. Goto, “Identification of blimp dynamics via flight tests,” Transactions of the Japan Society for Aeronautical and Space Sciences, vol. 46, no. 153, pp. 195–205, 2003.

[13] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.


[14] T. I. Fossen, Marine Control Systems: Guidance, Navigation, and Control of Ships, Rigs and Underwater Vehicles. Marine Cybernetics, 2002.

[15] Q. Tao, J. Cha, M. Hou, and F. Zhang, “Parameter identification of blimp dynamics through swinging motion,” in Proc. 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, November 18-21, 2018, pp. 1186–1191, 2018.
