/usr/share/doc/libeigen3-dev/html/TopicInsideEigenExample.html is in libeigen3-doc 3.3.4-4.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.13"/>
<title>Eigen: What happens inside Eigen, on a simple example</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtreedata.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/searchdata.js"></script>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { init_search(); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
<link href="eigendoxy.css" rel="stylesheet" type="text/css">
<!-- -->
<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
<!-- <script type="text/javascript"> -->
<!-- </script> -->
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<!-- <a name="top"></a> -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td id="projectlogo"><img alt="Logo" src="Eigen_Silly_Professor_64x64.png"/></td>
<td style="padding-left: 0.5em;">
<div id="projectname"><a href="http://eigen.tuxfamily.org">Eigen</a>
 <span id="projectnumber">3.3.4</span>
</div>
</td>
<td> <div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.13 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('TopicInsideEigenExample.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
</div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">What happens inside <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a>, on a simple example </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><hr/>
<p>Consider the following example program:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include<Eigen/Core></span></div><div class="line"></div><div class="line"><span class="keywordtype">int</span> main()</div><div class="line">{</div><div class="line"> <span class="keywordtype">int</span> size = 50;</div><div class="line"> <span class="comment">// VectorXf is a vector of floats, with dynamic size.</span></div><div class="line"> <a class="code" href="classEigen_1_1Matrix.html">Eigen::VectorXf</a> u(size), v(size), w(size);</div><div class="line"> u = v + w;</div><div class="line">}</div></div><!-- fragment --><p>The goal of this page is to understand how <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> compiles it, assuming that SSE2 vectorization is enabled (GCC option -msse2).</p>
<h1><a class="anchor" id="WhyInteresting"></a>
Why it's interesting</h1>
<p>Maybe you think, that the above example program is so simple, that compiling it shouldn't involve anything interesting. So before starting, let us explain what is nontrivial in compiling it correctly – that is, producing optimized code – so that the complexity of <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a>, that we'll explain here, is really useful.</p>
<p>Look at the line of code </p><div class="fragment"><div class="line">u = v + w; <span class="comment">// (*)</span></div></div><!-- fragment --><p>The first important thing about compiling it, is that the arrays should be traversed only once, like </p><div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = 0; i < size; i++) u[i] = v[i] + w[i];</div></div><!-- fragment --><p> The problem is that if we make a naive C++ library where the VectorXf class has an operator+ returning a VectorXf, then the line of code (*) will amount to: </p><div class="fragment"><div class="line"><a class="code" href="group__matrixtypedefs.html#ga8028d921d43acd5605eabad41c254ef2">VectorXf</a> tmp = v + w;</div><div class="line"><a class="code" href="group__matrixtypedefs.html#ga8028d921d43acd5605eabad41c254ef2">VectorXf</a> u = tmp;</div></div><!-- fragment --><p> Obviously, the introduction of the temporary <em>tmp</em> here is useless. It has a very bad effect on performance, first because the creation of <em>tmp</em> requires a dynamic memory allocation in this context, and second as there are now two for loops: </p><div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = 0; i < size; i++) tmp[i] = v[i] + w[i];</div><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = 0; i < size; i++) u[i] = tmp[i];</div></div><!-- fragment --><p> Traversing the arrays twice instead of once is terrible for performance, as it means that we do many redundant memory accesses.</p>
<p>The second important thing about compiling the above program, is to make correct use of SSE2 instructions. Notice that <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> also supports AltiVec and that all the discussion that we make here applies also to AltiVec.</p>
<p>SSE2, like AltiVec, is a set of instructions allowing to perform computations on packets of 128 bits at once. Since a float is 32 bits, this means that SSE2 instructions can handle 4 floats at once. This means that, if correctly used, they can make our computation go up to 4x faster.</p>
<p>However, in the above program, we have chosen size=50, so our vectors consist of 50 float's, and 50 is not a multiple of 4. This means that we cannot hope to do all of that computation using SSE2 instructions. The second best thing, to which we should aim, is to handle the 48 first coefficients with SSE2 instructions, since 48 is the biggest multiple of 4 below 50, and then handle separately, without SSE2, the 49th and 50th coefficients. Something like this:</p>
<div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = 0; i < 4*(size/4); i+=4) u.packet(i) = v.packet(i) + w.packet(i);</div><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> i = 4*(size/4); i < size; i++) u[i] = v[i] + w[i];</div></div><!-- fragment --><p>So let us look line by line at our example program, and let's follow <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> as it compiles it.</p>
<h1><a class="anchor" id="ConstructingVectors"></a>
Constructing vectors</h1>
<p>Let's analyze the first line:</p>
<div class="fragment"><div class="line"><a class="code" href="classEigen_1_1Matrix.html">Eigen::VectorXf</a> u(size), v(size), w(size);</div></div><!-- fragment --><p>First of all, VectorXf is the following typedef: </p><div class="fragment"><div class="line"><span class="keyword">typedef</span> Matrix<float, Dynamic, 1> <a class="code" href="group__matrixtypedefs.html#ga8028d921d43acd5605eabad41c254ef2">VectorXf</a>;</div></div><!-- fragment --><p>The class template <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a> is declared in <a class="el" href="ForwardDeclarations_8h_source.html">src/Core/util/ForwardDeclarations.h</a> with 6 template parameters, but the last 3 are automatically determined by the first 3. So you don't need to worry about them for now. Here, <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a><float, Dynamic, 1> means a matrix of floats, with a dynamic number of rows and 1 column.</p>
<p>The <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a> class inherits a base class, <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>. Don't worry about it, for now it suffices to say that <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> is what unifies matrices/vectors and all the expressions types – more on that below.</p>
<p>When we do </p><div class="fragment"><div class="line"><a class="code" href="classEigen_1_1Matrix.html">Eigen::VectorXf</a> u(size);</div></div><!-- fragment --><p> the constructor that is called is Matrix::Matrix(int), in <a class="el" href="Matrix_8h_source.html">src/Core/Matrix.h</a>. Besides some assertions, all it does is to construct the <em>m_storage</em> member, which is of type DenseStorage<float, Dynamic, Dynamic, 1>.</p>
<p>You may wonder, isn't it overengineering to have the storage in a separate class? The reason is that the <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a> class template covers all kinds of matrices and vector: both fixed-size and dynamic-size. The storage method is not the same in these two cases. For fixed-size, the matrix coefficients are stored as a plain member array. For dynamic-size, the coefficients will be stored as a pointer to a dynamically-allocated array. Because of this, we need to abstract storage away from the <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a> class. That's DenseStorage.</p>
<p>Let's look at this constructor, in <a class="el" href="DenseStorage_8h_source.html">src/Core/DenseStorage.h</a>. You can see that there are many partial template specializations of DenseStorages here, treating separately the cases where dimensions are Dynamic or fixed at compile-time. The partial specialization that we are looking at is: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> T, <span class="keywordtype">int</span> _Cols> <span class="keyword">class </span>DenseStorage<T, <a class="code" href="namespaceEigen.html#ad81fa7195215a0ce30017dfac309f0b2">Dynamic</a>, <a class="code" href="namespaceEigen.html#ad81fa7195215a0ce30017dfac309f0b2">Dynamic</a>, _Cols></div></div><!-- fragment --><p>Here, the constructor called is DenseStorage::DenseStorage(int size, int rows, int columns) with size=50, rows=50, columns=1.</p>
<p>Here is this constructor: </p><div class="fragment"><div class="line"><span class="keyword">inline</span> DenseStorage(<span class="keywordtype">int</span> size, <span class="keywordtype">int</span> rows, <span class="keywordtype">int</span>) : m_data(<a class="code" href="namespaceinternal.html">internal</a>::aligned_new<T>(size)), m_rows(rows) {}</div></div><!-- fragment --><p>Here, the <em>m_data</em> member is the actual array of coefficients of the matrix. As you see, it is dynamically allocated. Rather than calling new[] or malloc(), as you can see, we have our own internal::aligned_new defined in <a class="el" href="Memory_8h_source.html">src/Core/util/Memory.h</a>. What it does is that if vectorization is enabled, then it uses a platform-specific call to allocate a 128-bit-aligned array, as that is very useful for vectorization with both SSE2 and AltiVec. If vectorization is disabled, it amounts to the standard new[].</p>
<p>As you can see, the constructor also sets the <em>m_rows</em> member to <em>size</em>. Notice that there is no <em>m_columns</em> member: indeed, in this partial specialization of DenseStorage, we know the number of columns at compile-time, since the _Cols template parameter is different from Dynamic. Namely, in our case, _Cols is 1, which is to say that our vector is just a matrix with 1 column. Hence, there is no need to store the number of columns as a runtime variable.</p>
<p>When you call <a class="el" href="classEigen_1_1PlainObjectBase.html#ad12a492bcadea9b65ccd9bc8404c01f1">VectorXf::data()</a> to get the pointer to the array of coefficients, it returns DenseStorage::data() which returns the <em>m_data</em> member.</p>
<p>When you call <a class="el" href="structEigen_1_1EigenBase.html#adb0d244a512a784ded5319bd1a848440">VectorXf::size()</a> to get the size of the vector, this is actually a method in the base class <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>. It determines that the vector is a column-vector, since ColsAtCompileTime==1 (this comes from the template parameters in the typedef VectorXf). It deduces that the size is the number of rows, so it returns VectorXf::rows(), which returns DenseStorage::rows(), which returns the <em>m_rows</em> member, which was set to <em>size</em> by the constructor.</p>
<h1><a class="anchor" id="ConstructionOfSumXpr"></a>
Construction of the sum expression</h1>
<p>Now that our vectors are constructed, let's move on to the next line:</p>
<div class="fragment"><div class="line">u = v + w;</div></div><!-- fragment --><p>The executive summary is that operator+ returns a "sum of vectors" expression, but doesn't actually perform the computation. It is the operator=, whose call occurs thereafter, that does the computation.</p>
<p>Let us now see what <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> does when it sees this:</p>
<div class="fragment"><div class="line">v + w</div></div><!-- fragment --><p>Here, v and w are of type VectorXf, which is a typedef for a specialization of <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a> (as we explained above), which is a subclass of <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>. So what is being called is</p>
<div class="fragment"><div class="line"><a class="code" href="classEigen_1_1MatrixBase.html#acc4d02dad6db83c4235dafbaaebe140f">MatrixBase::operator+</a>(<span class="keyword">const</span> MatrixBase&)</div></div><!-- fragment --><p>The return type of this operator is </p><div class="fragment"><div class="line">CwiseBinaryOp<internal::scalar_sum_op<float>, <a class="code" href="group__matrixtypedefs.html#ga8028d921d43acd5605eabad41c254ef2">VectorXf</a>, VectorXf></div></div><!-- fragment --><p> The <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> class is our first encounter with an expression template. As we said, the operator+ doesn't by itself perform any computation, it just returns an abstract "sum of vectors" expression. Since there are also "difference of vectors" and "coefficient-wise product of vectors" expressions, we unify them all as "coefficient-wise binary operations", which we abbreviate as "CwiseBinaryOp". "Coefficient-wise" means that the operations is performed coefficient by coefficient. "binary" means that there are two operands – we are adding two vectors with one another.</p>
<p>Now you might ask, what if we did something like</p>
<div class="fragment"><div class="line">v + w + u;</div></div><!-- fragment --><p>The first v + w would return a <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> as above, so in order for this to compile, we'd need to define an operator+ also in the class <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a>... at this point it starts looking like a nightmare: are we going to have to define all operators in each of the expression classes (as you guessed, <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> is only one of many) ? This looks like a dead end!</p>
<p>The solution is that <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> itself, as well as <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a> and all the other expression types, is a subclass of <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>. So it is enough to define once and for all the operators in class <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>.</p>
<p>Since <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> is the common base class of different subclasses, the aspects that depend on the subclass must be abstracted from <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>. This is called polymorphism.</p>
<p>The classical approach to polymorphism in C++ is by means of virtual functions. This is dynamic polymorphism. Here we don't want dynamic polymorphism because the whole design of <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> is based around the assumption that all the complexity, all the abstraction, gets resolved at compile-time. This is crucial: if the abstraction can't get resolved at compile-time, <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a>'s compile-time optimization mechanisms become useless, not to mention that if that abstraction has to be resolved at runtime it'll incur an overhead by itself.</p>
<p>Here, what we want is to have a single class <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> as the base of many subclasses, in such a way that each <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> object (be it a matrix, or vector, or any kind of expression) knows at compile-time (as opposed to run-time) of which particular subclass it is an object (i.e. whether it is a matrix, or an expression, and what kind of expression).</p>
<p>The solution is the <a href="http://en.wikipedia.org/wiki/Curiously_Recurring_Template_Pattern">Curiously Recurring Template Pattern</a>. Let's do the break now. Hopefully you can read this wikipedia page during the break if needed, but it won't be allowed during the exam.</p>
<p>In short, <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> takes a template parameter <em>Derived</em>. Whenever we define a subclass Subclass, we actually make Subclass inherit <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a><Subclass>. The point is that different subclasses inherit different <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> types. Thanks to this, whenever we have an object of a subclass, and we call on it some <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> method, we still remember even from inside the <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> method which particular subclass we're talking about.</p>
<p>This means that we can put almost all the methods and operators in the base class <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>, and have only the bare minimum in the subclasses. If you look at the subclasses in <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a>, like for instance the <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> class, they have very few methods. There are coeff() and sometimes coeffRef() methods for access to the coefficients, there are rows() and cols() methods returning the number of rows and columns, but there isn't much more than that. All the meat is in <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>, so it only needs to be coded once for all kinds of expressions, matrices, and vectors.</p>
<p>So let's end this digression and come back to the piece of code from our example program that we were currently analyzing,</p>
<div class="fragment"><div class="line">v + w</div></div><!-- fragment --><p>Now that <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> is a good friend, let's write fully the prototype of the operator+ that gets called here (this code is from <a class="el" href="MatrixBase_8h_source.html">src/Core/MatrixBase.h</a>):</p>
<div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived></div><div class="line"><span class="keyword">class </span>MatrixBase</div><div class="line">{</div><div class="line"> <span class="comment">// ...</span></div><div class="line"></div><div class="line"> <span class="keyword">template</span><<span class="keyword">typename</span> OtherDerived></div><div class="line"> <span class="keyword">const</span> CwiseBinaryOp<internal::scalar_sum_op<typename internal::traits<Derived>::Scalar>, Derived, OtherDerived></div><div class="line"> <a class="code" href="classEigen_1_1MatrixBase.html#acc4d02dad6db83c4235dafbaaebe140f">operator+</a>(<span class="keyword">const</span> MatrixBase<OtherDerived> &other) <span class="keyword">const</span>;</div><div class="line"></div><div class="line"> <span class="comment">// ...</span></div><div class="line">};</div></div><!-- fragment --><p>Here of course, <em>Derived</em> and <em>OtherDerived</em> are VectorXf.</p>
<p>As we said, <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> is also used for other operations such as substration, so it takes another template parameter determining the operation that will be applied to coefficients. This template parameter is a functor, that is, a class in which we have an operator() so it behaves like a function. Here, the functor used is internal::scalar_sum_op. It is defined in src/Core/Functors.h.</p>
<p>Let us now explain the internal::traits here. The internal::scalar_sum_op class takes one template parameter: the type of the numbers to handle. Here of course we want to pass the scalar type (a.k.a. numeric type) of VectorXf, which is <code>float</code>. How do we determine which is the scalar type of <em>Derived</em> ? Throughout <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a>, all matrix and expression types define a typedef <em>Scalar</em> which gives its scalar type. For example, VectorXf::Scalar is a typedef for <code>float</code>. So here, if life was easy, we could find the numeric type of <em>Derived</em> as just </p><div class="fragment"><div class="line"><span class="keyword">typename</span> Derived::Scalar</div></div><!-- fragment --><p> Unfortunately, we can't do that here, as the compiler would complain that the type Derived hasn't yet been defined. So we use a workaround: in <a class="el" href="ForwardDeclarations_8h_source.html">src/Core/util/ForwardDeclarations.h</a>, we declared (not defined!) all our subclasses, like <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a>, and we also declared the following class template: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> T> <span class="keyword">struct </span>internal::traits;</div></div><!-- fragment --><p> In <a class="el" href="Matrix_8h_source.html">src/Core/Matrix.h</a>, right <em>before</em> the definition of class <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a>, we define a partial specialization of internal::traits for T=<a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a><any template parameters>. In this specialization of internal::traits, we define the Scalar typedef. So when we actually define <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a>, it is legal to refer to "typename internal::traits\<Matrix\>::Scalar".</p>
<p>Anyway, we have declared our operator+. In our case, where <em>Derived</em> and <em>OtherDerived</em> are VectorXf, the above declaration amounts to: </p><div class="fragment"><div class="line"><span class="keyword">class </span>MatrixBase<VectorXf></div><div class="line">{</div><div class="line"> <span class="comment">// ...</span></div><div class="line"></div><div class="line"> <span class="keyword">const</span> CwiseBinaryOp<internal::scalar_sum_op<float>, <a class="code" href="group__matrixtypedefs.html#ga8028d921d43acd5605eabad41c254ef2">VectorXf</a>, VectorXf></div><div class="line"> <a class="code" href="classEigen_1_1MatrixBase.html#acc4d02dad6db83c4235dafbaaebe140f">operator+</a>(<span class="keyword">const</span> MatrixBase<VectorXf> &other) <span class="keyword">const</span>;</div><div class="line"></div><div class="line"> <span class="comment">// ...</span></div><div class="line">};</div></div><!-- fragment --><p>Let's now jump to <a class="el" href="CwiseBinaryOp_8h_source.html">src/Core/CwiseBinaryOp.h</a> to see how it is defined. As you can see there, all it does is to return a <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> object, and this object is just storing references to the left-hand-side and right-hand-side expressions – here, these are the vectors <em>v</em> and <em>w</em>. Well, the <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> object is also storing an instance of the (empty) functor class, but you shouldn't worry about it as that is a minor implementation detail.</p>
<p>Thus, the operator+ hasn't performed any actual computation. To summarize, the operation <em>v</em> + <em>w</em> just returned an object of type <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> which did nothing else than just storing references to <em>v</em> and <em>w</em>.</p>
<h1><a class="anchor" id="Assignment"></a>
The assignment</h1>
<p>At this point, the expression <em>v</em> + <em>w</em> has finished evaluating, so, in the process of compiling the line of code </p><div class="fragment"><div class="line">u = v + w;</div></div><!-- fragment --><p> we now enter the operator=.</p>
<p>What operator= is being called here? The vector u is an object of class VectorXf, i.e. <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a>. In <a class="el" href="Matrix_8h_source.html">src/Core/Matrix.h</a>, inside the definition of class <a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a>, we see this: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> OtherDerived></div><div class="line"><span class="keyword">inline</span> Matrix& operator=(<span class="keyword">const</span> MatrixBase<OtherDerived>& other)</div><div class="line">{</div><div class="line"> eigen_assert(m_storage.data()!=0 && <span class="stringliteral">"you cannot use operator= with a non initialized matrix (instead use set()"</span>);</div><div class="line"> <span class="keywordflow">return</span> Base::operator=(other.derived());</div><div class="line">}</div></div><!-- fragment --><p> Here, Base is a typedef for <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a><<a class="el" href="classEigen_1_1Matrix.html" title="The matrix class, also used for vectors and row-vectors. ">Matrix</a>>. So, what is being called is the operator= of <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a>. Let's see its prototype in <a class="el" href="MatrixBase_8h_source.html">src/Core/MatrixBase.h</a>: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> OtherDerived></div><div class="line">Derived& operator=(<span class="keyword">const</span> MatrixBase<OtherDerived>& other);</div></div><!-- fragment --><p> Here, <em>Derived</em> is VectorXf (since u is a VectorXf) and <em>OtherDerived</em> is <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a>. More specifically, as explained in the previous section, <em>OtherDerived</em> is: </p><div class="fragment"><div class="line">CwiseBinaryOp<internal::scalar_sum_op<float>, <a class="code" href="group__matrixtypedefs.html#ga8028d921d43acd5605eabad41c254ef2">VectorXf</a>, VectorXf></div></div><!-- fragment --><p> So the full prototype of the operator= being called is: </p><div class="fragment"><div class="line">VectorXf& <a class="code" href="classEigen_1_1MatrixBase.html#a373bf62ad398162df5a71963ed7cbeff">MatrixBase<VectorXf>::operator=</a>(<span class="keyword">const</span> MatrixBase<CwiseBinaryOp<internal::scalar_sum_op<float>, VectorXf, VectorXf> > & other);</div></div><!-- fragment --><p> This operator= literally reads "copying a sum of two VectorXf's into another VectorXf".</p>
<p>Let's now look at the implementation of this operator=. It resides in the file <a class="el" href="Assign_8h_source.html">src/Core/Assign.h</a>.</p>
<p>What we can see there is: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived></div><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> OtherDerived></div><div class="line"><span class="keyword">inline</span> Derived& <a class="code" href="classEigen_1_1MatrixBase.html#a373bf62ad398162df5a71963ed7cbeff">MatrixBase<Derived></a></div><div class="line"><a class="code" href="classEigen_1_1MatrixBase.html#a373bf62ad398162df5a71963ed7cbeff"> ::operator=</a>(<span class="keyword">const</span> MatrixBase<OtherDerived>& other)</div><div class="line">{</div><div class="line"> <span class="keywordflow">return</span> internal::assign_selector<Derived,OtherDerived>::run(derived(), other.derived());</div><div class="line">}</div></div><!-- fragment --><p>OK so our next task is to understand internal::assign_selector :)</p>
<p>Here is its declaration (all that is still in the same file <a class="el" href="Assign_8h_source.html">src/Core/Assign.h</a>) </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived, <span class="keyword">typename</span> OtherDerived,</div><div class="line"> <span class="keywordtype">bool</span> EvalBeforeAssigning = int(OtherDerived::Flags) & <a class="code" href="group__flags.html#ga0972b20dc004d13984e642b3bd12532e">EvalBeforeAssigningBit</a>,</div><div class="line"> <span class="keywordtype">bool</span> NeedToTranspose = Derived::IsVectorAtCompileTime</div><div class="line"> && OtherDerived::IsVectorAtCompileTime</div><div class="line"> && int(Derived::RowsAtCompileTime) == int(OtherDerived::ColsAtCompileTime)</div><div class="line"> && int(Derived::ColsAtCompileTime) == int(OtherDerived::RowsAtCompileTime)</div><div class="line"> && int(Derived::SizeAtCompileTime) != 1></div><div class="line"><span class="keyword">struct </span>internal::assign_selector;</div></div><!-- fragment --><p>So internal::assign_selector takes 4 template parameters, but the 2 last ones are automatically determined by the 2 first ones.</p>
<p>EvalBeforeAssigning is here to enforce the EvalBeforeAssigningBit. As explained <a href="TopicLazyEvaluation.html">here</a>, certain expressions have this flag which makes them automatically evaluate into temporaries before assigning them to another expression. This is the case of the <a class="el" href="classEigen_1_1Product.html" title="Expression of the product of two arbitrary matrices or vectors. ">Product</a> expression, in order to avoid strange aliasing effects when doing "m = m * m;" However, of course here our <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> expression doesn't have the EvalBeforeAssigningBit: we said since the beginning that we didn't want a temporary to be introduced here. So if you go to <a class="el" href="CwiseBinaryOp_8h_source.html">src/Core/CwiseBinaryOp.h</a>, you'll see that the Flags in internal::traits<<a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a>> don't include the EvalBeforeAssigningBit. The Flags member of <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> is then imported from the internal::traits by the EIGEN_GENERIC_PUBLIC_INTERFACE macro. Anyway, here the template parameter EvalBeforeAssigning has the value <code>false</code>.</p>
<p>NeedToTranspose is here for the case where the user wants to copy a row-vector into a column-vector. We allow this as a special exception to the general rule that in assignments we require the dimesions to match. Anyway, here both the left-hand and right-hand sides are column vectors, in the sense that ColsAtCompileTime is equal to 1. So NeedToTranspose is <code>false</code> too.</p>
<p>So, here we are in the partial specialization: </p><div class="fragment"><div class="line">internal::assign_selector<Derived, OtherDerived, false, false></div></div><!-- fragment --><p>Here's how it is defined: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived, <span class="keyword">typename</span> OtherDerived></div><div class="line"><span class="keyword">struct </span>internal::assign_selector<Derived,OtherDerived,false,false> {</div><div class="line"> <span class="keyword">static</span> Derived& run(Derived& dst, <span class="keyword">const</span> OtherDerived& other) { <span class="keywordflow">return</span> dst.lazyAssign(other.derived()); }</div><div class="line">};</div></div><!-- fragment --><p>OK so now our next job is to understand how lazyAssign works :)</p>
<div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived></div><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> OtherDerived></div><div class="line"><span class="keyword">inline</span> Derived& MatrixBase<Derived></div><div class="line"> ::lazyAssign(<span class="keyword">const</span> MatrixBase<OtherDerived>& other)</div><div class="line">{</div><div class="line"> EIGEN_STATIC_ASSERT_SAME_MATRIX_SIZE(Derived,OtherDerived)</div><div class="line"> eigen_assert(rows() == other.rows() && cols() == other.cols());</div><div class="line"> internal::assign_impl<Derived, OtherDerived>::run(derived(),other.derived());</div><div class="line"> <span class="keywordflow">return</span> derived();</div><div class="line">}</div></div><!-- fragment --><p>What do we see here? Some assertions, and then the only interesting line is: </p><div class="fragment"><div class="line">internal::assign_impl<Derived, OtherDerived>::run(derived(),other.derived());</div></div><!-- fragment --><p>OK so now we want to know what is inside internal::assign_impl.</p>
<p>Here is its declaration: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived1, <span class="keyword">typename</span> Derived2,</div><div class="line"> <span class="keywordtype">int</span> Vectorization = internal::assign_traits<Derived1, Derived2>::Vectorization,</div><div class="line"> <span class="keywordtype">int</span> Unrolling = internal::assign_traits<Derived1, Derived2>::Unrolling></div><div class="line"><span class="keyword">struct </span>internal::assign_impl;</div></div><!-- fragment --><p> Again, internal::assign_selector takes 4 template parameters, but the 2 last ones are automatically determined by the 2 first ones.</p>
<p>These two parameters <em>Vectorization</em> and <em>Unrolling</em> are determined by a helper class internal::assign_traits. Its job is to determine which vectorization strategy to use (that is <em>Vectorization</em>) and which unrolling strategy to use (that is <em>Unrolling</em>).</p>
<p>We'll not enter into the details of how these strategies are chosen (this is in the implementation of internal::assign_traits at the top of the same file). Let's just say that here <em>Vectorization</em> has the value <em>LinearVectorization</em>, and <em>Unrolling</em> has the value <em>NoUnrolling</em> (the latter is obvious since our vectors have dynamic size so there's no way to unroll the loop at compile-time).</p>
<p>So the partial specialization of internal::assign_impl that we're looking at is: </p><div class="fragment"><div class="line">internal::assign_impl<Derived1, Derived2, LinearVectorization, NoUnrolling></div></div><!-- fragment --><p>Here is how it's defined: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived1, <span class="keyword">typename</span> Derived2></div><div class="line"><span class="keyword">struct </span>internal::assign_impl<Derived1, Derived2, LinearVectorization, NoUnrolling></div><div class="line">{</div><div class="line"> <span class="keyword">static</span> <span class="keywordtype">void</span> run(Derived1 &dst, <span class="keyword">const</span> Derived2 &src)</div><div class="line"> {</div><div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> size = dst.size();</div><div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> packetSize = internal::packet_traits<typename Derived1::Scalar>::size;</div><div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> alignedStart = internal::assign_traits<Derived1,Derived2>::DstIsAligned ? 0</div><div class="line"> : internal::first_aligned(&dst.coeffRef(0), size);</div><div class="line"> <span class="keyword">const</span> <span class="keywordtype">int</span> alignedEnd = alignedStart + ((size-alignedStart)/packetSize)*packetSize;</div><div class="line"></div><div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> index = 0; index < alignedStart; index++)</div><div class="line"> dst.copyCoeff(index, src);</div><div class="line"></div><div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> index = alignedStart; index < alignedEnd; index += packetSize)</div><div class="line"> {</div><div class="line"> dst.template copyPacket<Derived2, Aligned, internal::assign_traits<Derived1,Derived2>::SrcAlignment>(index, src);</div><div class="line"> }</div><div class="line"></div><div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> index = alignedEnd; index < size; index++)</div><div class="line"> dst.copyCoeff(index, src);</div><div class="line"> }</div><div class="line">};</div></div><!-- fragment --><p>Here's how it works. <em>LinearVectorization</em> means that the left-hand and right-hand side expression can be accessed linearly i.e. you can refer to their coefficients by one integer <em>index</em>, as opposed to having to refer to its coefficients by two integers <em>row</em>, <em>column</em>.</p>
<p>As we said at the beginning, vectorization works with blocks of 4 floats. Here, <em>PacketSize</em> is 4.</p>
<p>There are two potential problems that we need to deal with: </p><ul>
<li>first, vectorization works much better if the packets are 128-bit-aligned. This is especially important for write access. So when writing to the coefficients of <em>dst</em>, we want to group these coefficients by packets of 4 such that each of these packets is 128-bit-aligned. In general, this requires to skip a few coefficients at the beginning of <em>dst</em>. This is the purpose of <em>alignedStart</em>. We then copy these first few coefficients one by one, not by packets. However, in our case, the <em>dst</em> expression is a VectorXf and remember that in the construction of the vectors we allocated aligned arrays. Thanks to <em>DstIsAligned</em>, <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> remembers that without having to do any runtime check, so <em>alignedStart</em> is zero and this part is avoided altogether. </li>
<li>second, the number of coefficients to copy is not in general a multiple of <em>packetSize</em>. Here, there are 50 coefficients to copy and <em>packetSize</em> is 4. So we'll have to copy the last 2 coefficients one by one, not by packets. Here, <em>alignedEnd</em> is 48.</li>
</ul>
<p>Now come the actual loops.</p>
<p>First, the vectorized part: the 48 first coefficients out of 50 will be copied by packets of 4: </p><div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> index = alignedStart; index < alignedEnd; index += packetSize)</div><div class="line">{</div><div class="line"> dst.template copyPacket<Derived2, Aligned, internal::assign_traits<Derived1,Derived2>::SrcAlignment>(index, src);</div><div class="line">}</div></div><!-- fragment --><p>What is copyPacket? It is defined in src/Core/Coeffs.h: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Derived></div><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> OtherDerived, <span class="keywordtype">int</span> StoreMode, <span class="keywordtype">int</span> LoadMode></div><div class="line"><span class="keyword">inline</span> <span class="keywordtype">void</span> MatrixBase<Derived>::copyPacket(<span class="keywordtype">int</span> index, <span class="keyword">const</span> MatrixBase<OtherDerived>& other)</div><div class="line">{</div><div class="line"> eigen_internal_assert(index >= 0 && index < size());</div><div class="line"> derived().template writePacket<StoreMode>(index,</div><div class="line"> other.derived().template packet<LoadMode>(index));</div><div class="line">}</div></div><!-- fragment --><p>OK, what are writePacket() and packet() here?</p>
<p>First, writePacket() here is a method on the left-hand side VectorXf. So we go to <a class="el" href="Matrix_8h_source.html">src/Core/Matrix.h</a> to look at its definition: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keywordtype">int</span> StoreMode></div><div class="line"><span class="keyword">inline</span> <span class="keywordtype">void</span> writePacket(<span class="keywordtype">int</span> index, <span class="keyword">const</span> PacketScalar& x)</div><div class="line">{</div><div class="line"> internal::pstoret<Scalar, PacketScalar, StoreMode>(m_storage.data() + index, x);</div><div class="line">}</div></div><!-- fragment --><p> Here, <em>StoreMode</em> is <em><a class="el" href="group__enums.html#gga45fe06e29902b7a2773de05ba27b47a1ae12d0f8f869c40c76128260af2242bc8">Aligned</a></em>, indicating that we are doing a 128-bit-aligned write access, <em>PacketScalar</em> is a type representing a "SSE packet of 4 floats" and internal::pstoret is a function writing such a packet in memory. Their definitions are architecture-specific, we find them in <a class="el" href="SSE_2PacketMath_8h_source.html">src/Core/arch/SSE/PacketMath.h</a>:</p>
<p>The line in <a class="el" href="SSE_2PacketMath_8h_source.html">src/Core/arch/SSE/PacketMath.h</a> that determines the PacketScalar type (via a typedef in <a class="el" href="Matrix_8h_source.html">Matrix.h</a>) is: </p><div class="fragment"><div class="line"><span class="keyword">template</span><> <span class="keyword">struct </span>internal::packet_traits<float> { <span class="keyword">typedef</span> __m128 type; <span class="keyword">enum</span> {size=4}; };</div></div><!-- fragment --><p> Here, __m128 is a SSE-specific type. Notice that the enum <em>size</em> here is what was used to define <em>packetSize</em> above.</p>
<p>And here is the implementation of internal::pstoret: </p><div class="fragment"><div class="line"><span class="keyword">template</span><> <span class="keyword">inline</span> <span class="keywordtype">void</span> internal::pstore(<span class="keywordtype">float</span>* to, <span class="keyword">const</span> __m128& from) { _mm_store_ps(to, from); }</div></div><!-- fragment --><p> Here, __mm_store_ps is a SSE-specific intrinsic function, representing a single SSE instruction. The difference between internal::pstore and internal::pstoret is that internal::pstoret is a dispatcher handling both the aligned and unaligned cases, you find its definition in <a class="el" href="GenericPacketMath_8h_source.html">src/Core/GenericPacketMath.h</a>: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Scalar, <span class="keyword">typename</span> Packet, <span class="keywordtype">int</span> LoadMode></div><div class="line"><span class="keyword">inline</span> <span class="keywordtype">void</span> internal::pstoret(Scalar* to, <span class="keyword">const</span> Packet& from)</div><div class="line">{</div><div class="line"> <span class="keywordflow">if</span>(LoadMode == <a class="code" href="group__enums.html#gga45fe06e29902b7a2773de05ba27b47a1ae12d0f8f869c40c76128260af2242bc8">Aligned</a>)</div><div class="line"> internal::pstore(to, from);</div><div class="line"> <span class="keywordflow">else</span></div><div class="line"> internal::pstoreu(to, from);</div><div class="line">}</div></div><!-- fragment --><p>OK, that explains how writePacket() works. Now let's look into the packet() call. Remember that we are analyzing this line of code inside copyPacket(): </p><div class="fragment"><div class="line">derived().template writePacket<StoreMode>(index,</div><div class="line"> other.derived().template packet<LoadMode>(index));</div></div><!-- fragment --><p>Here, <em>other</em> is our sum expression <em>v</em> + <em>w</em>. The .derived() is just casting from <a class="el" href="classEigen_1_1MatrixBase.html" title="Base class for all dense matrices, vectors, and expressions. ">MatrixBase</a> to the subclass which here is <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a>. So let's go to <a class="el" href="CwiseBinaryOp_8h_source.html">src/Core/CwiseBinaryOp.h</a>: </p><div class="fragment"><div class="line"><span class="keyword">class </span>CwiseBinaryOp</div><div class="line">{</div><div class="line"> <span class="comment">// ...</span></div><div class="line"> <span class="keyword">template</span><<span class="keywordtype">int</span> LoadMode></div><div class="line"> <span class="keyword">inline</span> PacketScalar packet(<span class="keywordtype">int</span> index)<span class="keyword"> const</span></div><div class="line"><span class="keyword"> </span>{</div><div class="line"> <span class="keywordflow">return</span> m_functor.packetOp(m_lhs.template packet<LoadMode>(index), m_rhs.template packet<LoadMode>(index));</div><div class="line"> }</div><div class="line">};</div></div><!-- fragment --><p> Here, <em>m_lhs</em> is the vector <em>v</em>, and <em>m_rhs</em> is the vector <em>w</em>. So the packet() function here is Matrix::packet(). The template parameter <em>LoadMode</em> is <em><a class="el" href="group__enums.html#gga45fe06e29902b7a2773de05ba27b47a1ae12d0f8f869c40c76128260af2242bc8">Aligned</a></em>. So we're looking at </p><div class="fragment"><div class="line"><span class="keyword">class </span>Matrix</div><div class="line">{</div><div class="line"> <span class="comment">// ...</span></div><div class="line"> <span class="keyword">template</span><<span class="keywordtype">int</span> LoadMode></div><div class="line"> <span class="keyword">inline</span> PacketScalar packet(<span class="keywordtype">int</span> index)<span class="keyword"> const</span></div><div class="line"><span class="keyword"> </span>{</div><div class="line"> <span class="keywordflow">return</span> internal::ploadt<Scalar, LoadMode>(m_storage.data() + index);</div><div class="line"> }</div><div class="line">};</div></div><!-- fragment --><p> We let you look up the definition of internal::ploadt in <a class="el" href="GenericPacketMath_8h_source.html">GenericPacketMath.h</a> and the internal::pload in <a class="el" href="SSE_2PacketMath_8h_source.html">src/Core/arch/SSE/PacketMath.h</a>. It is very similar to the above for internal::pstore.</p>
<p>Let's go back to CwiseBinaryOp::packet(). Once the packets from the vectors <em>v</em> and <em>w</em> have been returned, what does this function do? It calls m_functor.packetOp() on them. What is m_functor? Here we must remember what particular template specialization of <a class="el" href="classEigen_1_1CwiseBinaryOp.html" title="Generic expression where a coefficient-wise binary operator is applied to two expressions. ">CwiseBinaryOp</a> we're dealing with: </p><div class="fragment"><div class="line">CwiseBinaryOp<internal::scalar_sum_op<float>, <a class="code" href="group__matrixtypedefs.html#ga8028d921d43acd5605eabad41c254ef2">VectorXf</a>, VectorXf></div></div><!-- fragment --><p> So m_functor is an object of the empty class internal::scalar_sum_op<float>. As we mentioned above, don't worry about why we constructed an object of this empty class at all – it's an implementation detail, the point is that some other functors need to store member data.</p>
<p>Anyway, internal::scalar_sum_op is defined in src/Core/Functors.h: </p><div class="fragment"><div class="line"><span class="keyword">template</span><<span class="keyword">typename</span> Scalar> <span class="keyword">struct </span>internal::scalar_sum_op EIGEN_EMPTY_STRUCT {</div><div class="line"> <span class="keyword">inline</span> <span class="keyword">const</span> Scalar operator() (<span class="keyword">const</span> Scalar& a, <span class="keyword">const</span> Scalar& b)<span class="keyword"> const </span>{ <span class="keywordflow">return</span> a + b; }</div><div class="line"> <span class="keyword">template</span><<span class="keyword">typename</span> PacketScalar></div><div class="line"> <span class="keyword">inline</span> <span class="keyword">const</span> PacketScalar packetOp(<span class="keyword">const</span> PacketScalar& a, <span class="keyword">const</span> PacketScalar& b)<span class="keyword"> const</span></div><div class="line"><span class="keyword"> </span>{ <span class="keywordflow">return</span> internal::padd(a,b); }</div><div class="line">};</div></div><!-- fragment --><p> As you can see, all what packetOp() does is to call internal::padd on the two packets. Here is the definition of internal::padd from <a class="el" href="SSE_2PacketMath_8h_source.html">src/Core/arch/SSE/PacketMath.h</a>: </p><div class="fragment"><div class="line"><span class="keyword">template</span><> <span class="keyword">inline</span> __m128 internal::padd(<span class="keyword">const</span> __m128& a, <span class="keyword">const</span> __m128& b) { <span class="keywordflow">return</span> _mm_add_ps(a,b); }</div></div><!-- fragment --><p> Here, _mm_add_ps is a SSE-specific intrinsic function, representing a single SSE instruction.</p>
<p>To summarize, the loop </p><div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> index = alignedStart; index < alignedEnd; index += packetSize)</div><div class="line">{</div><div class="line"> dst.template copyPacket<Derived2, Aligned, internal::assign_traits<Derived1,Derived2>::SrcAlignment>(index, src);</div><div class="line">}</div></div><!-- fragment --><p> has been compiled to the following code: for <em>index</em> going from 0 to the 11 ( = 48/4 - 1), read the i-th packet (of 4 floats) from the vector v and the i-th packet from the vector w using two __mm_load_ps SSE instructions, then add them together using a __mm_add_ps instruction, then store the result using a __mm_store_ps instruction.</p>
<p>There remains the second loop handling the last few (here, the last 2) coefficients: </p><div class="fragment"><div class="line"><span class="keywordflow">for</span>(<span class="keywordtype">int</span> index = alignedEnd; index < size; index++)</div><div class="line"> dst.copyCoeff(index, src);</div></div><!-- fragment --><p> However, it works just like the one we just explained, it is just simpler because there is no SSE vectorization involved here. copyPacket() becomes copyCoeff(), packet() becomes coeff(), writePacket() becomes coeffRef(). If you followed us this far, you can probably understand this part by yourself.</p>
<p>We see that all the C++ abstraction of <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> goes away during compilation and that we indeed are precisely controlling which assembly instructions we emit. Such is the beauty of C++! Since we have such precise control over the emitted assembly instructions, but such complex logic to choose the right instructions, we can say that <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> really behaves like an optimizing compiler. If you prefer, you could say that <a class="el" href="namespaceEigen.html" title="Namespace containing all symbols from the Eigen library. ">Eigen</a> behaves like a script for the compiler. In a sense, C++ template metaprogramming is scripting the compiler – and it's been shown that this scripting language is Turing-complete. See <a href="http://en.wikipedia.org/wiki/Template_metaprogramming">Wikipedia</a>. </p>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
</ul>
</div>
</body>
</html>
|