This file is indexed.

/usr/share/doc/ganeti/html/design-htools-2.3.html is in ganeti-doc 2.16.0~rc2-1build1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Synchronising htools to Ganeti 2.3 &#8212; Ganeti 2.16.0~rc2 documentation</title>
    <link rel="stylesheet" href="_static/style.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '2.16.0~rc2',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true,
        SOURCELINK_SUFFIX: '.txt'
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Ganeti 2.4 design" href="design-2.4.html" />
    <link rel="prev" title="Ganeti 2.3 design" href="design-2.3.html" /> 
  </head>
  <body>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="design-2.4.html" title="Ganeti 2.4 design"
             accesskey="N">next</a></li>
        <li class="right" >
          <a href="design-2.3.html" title="Ganeti 2.3 design"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">Ganeti 2.16.0~rc2 documentation</a> &#187;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="synchronising-htools-to-ganeti-2-3">
<h1>Synchronising htools to Ganeti 2.3<a class="headerlink" href="#synchronising-htools-to-ganeti-2-3" title="Permalink to this headline"></a></h1>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Created:</th><td class="field-body">2011-Mar-18</td>
</tr>
<tr class="field-even field"><th class="field-name">Status:</th><td class="field-body">Implemented</td>
</tr>
<tr class="field-odd field"><th class="field-name">Ganeti-Version:</th><td class="field-body">2.5.0</td>
</tr>
</tbody>
</table>
<p>Ganeti 2.3 introduces a number of new features that change the cluster
internals significantly enough that the htools suite needs to be
updated accordingly in order to function correctly.</p>
<div class="section" id="shared-storage-support">
<h2>Shared storage support<a class="headerlink" href="#shared-storage-support" title="Permalink to this headline"></a></h2>
<p>Currently, the htools algorithms presume a model where all of an
instance’s resources is served from within the cluster, more
specifically from the nodes comprising the cluster. While is this
usual for memory and CPU, deployments which use shared storage will
invalidate this assumption for storage.</p>
<p>To account for this, we need to move some assumptions from being
implicit (and hardcoded) to being explicitly exported from Ganeti.</p>
<div class="section" id="new-instance-parameters">
<h3>New instance parameters<a class="headerlink" href="#new-instance-parameters" title="Permalink to this headline"></a></h3>
<p>It is presumed that Ganeti will export for all instances a new
<code class="docutils literal"><span class="pre">storage_type</span></code> parameter, that will denote either internal storage
(e.g. <em>plain</em> or <em>drbd</em>), or external storage.</p>
<p>Furthermore, a new <code class="docutils literal"><span class="pre">storage_pool</span></code> parameter will classify, for both
internal and external storage, the pool out of which the storage is
allocated. For internal storage, this will be either <code class="docutils literal"><span class="pre">lvm</span></code> (the pool
that provides space to both <code class="docutils literal"><span class="pre">plain</span></code> and <code class="docutils literal"><span class="pre">drbd</span></code> instances) or
<code class="docutils literal"><span class="pre">file</span></code> (for file-storage-based instances). For external storage,
this will be the respective NAS/SAN/cloud storage that backs up the
instance. Note that for htools, external storage pools are opaque; we
only care that they have an identifier, so that we can distinguish
between two different pools.</p>
<p>If these two parameters are not present, the instances will be
presumed to be <code class="docutils literal"><span class="pre">internal/lvm</span></code>.</p>
</div>
<div class="section" id="new-node-parameters">
<h3>New node parameters<a class="headerlink" href="#new-node-parameters" title="Permalink to this headline"></a></h3>
<p>For each node, it is expected that Ganeti will export what storage
types it supports and pools it has access to. So a classic 2.2 cluster
will have all nodes supporting <code class="docutils literal"><span class="pre">internal/lvm</span></code> and/or
<code class="docutils literal"><span class="pre">internal/file</span></code>, whereas a new shared storage only 2.3 cluster could
have <code class="docutils literal"><span class="pre">external/my-nas</span></code> storage.</p>
<p>Whatever the mechanism that Ganeti will use internally to configure
the associations between nodes and storage pools, we consider that
we’ll have available two node attributes inside htools: the list of internal
and external storage pools.</p>
</div>
<div class="section" id="external-storage-and-instances">
<h3>External storage and instances<a class="headerlink" href="#external-storage-and-instances" title="Permalink to this headline"></a></h3>
<p>Currently, for an instance we allow one cheap move type: failover to
the current secondary, if it is a healthy node, and four other
“expensive” (as in, including data copies) moves that involve changing
either the secondary or the primary node or both.</p>
<p>In presence of an external storage type, the following things will
change:</p>
<ul class="simple">
<li>the disk-based moves will be disallowed; this is already a feature
in the algorithm, controlled by a boolean switch, so adapting
external storage here will be trivial</li>
<li>instead of the current one secondary node, the secondaries will
become a list of potential secondaries, based on access to the
instance’s storage pool</li>
</ul>
<p>Except for this, the basic move algorithm remains unchanged.</p>
</div>
<div class="section" id="external-storage-and-nodes">
<h3>External storage and nodes<a class="headerlink" href="#external-storage-and-nodes" title="Permalink to this headline"></a></h3>
<p>Two separate areas will have to change for nodes and external storage.</p>
<p>First, then allocating instances (either as part of a move or a new
allocation), if the instance is using external storage, then the
internal disk metrics should be ignored (for both the primary and
secondary cases).</p>
<p>Second, the per-node metrics used in the cluster scoring must take
into account that nodes might not have internal storage at all, and
handle this as a well-balanced case (score 0).</p>
</div>
<div class="section" id="n-1-status">
<h3>N+1 status<a class="headerlink" href="#n-1-status" title="Permalink to this headline"></a></h3>
<p>Currently, computing the N+1 status of a node is simple:</p>
<ul class="simple">
<li>group the current secondary instances by their primary node, and
compute the sum of each instance group memory</li>
<li>choose the maximum sum, and check if it’s smaller than the current
available memory on this node</li>
</ul>
<p>In effect, computing the N+1 status is a per-node matter. However,
with shared storage, we don’t have secondary nodes, just potential
secondaries. Thus computing the N+1 status will be a cluster-level
matter, and much more expensive.</p>
<p>A simple version of the N+1 checks would be that for each instance
having said node as primary, we have enough memory in the cluster for
relocation. This means we would actually need to run allocation
checks, and update the cluster status from within allocation on one
node, while being careful that we don’t recursively check N+1 status
during this relocation, which is too expensive.</p>
<p>However, the shared storage model has some properties that changes the
rules of the computation. Speaking broadly (and ignoring hard
restrictions like tag based exclusion and CPU limits), the exact
location of an instance in the cluster doesn’t matter as long as
memory is available. This results in two changes:</p>
<ul class="simple">
<li>simply tracking the amount of free memory buckets is enough,
cluster-wide</li>
<li>moving an instance from one node to another would not change the N+1
status of any node, and only allocation needs to deal with N+1
checks</li>
</ul>
<p>Unfortunately, this very cheap solution fails in case of any other
exclusion or prevention factors.</p>
<p>TODO: find a solution for N+1 checks.</p>
</div>
</div>
<div class="section" id="node-groups-support">
<h2>Node groups support<a class="headerlink" href="#node-groups-support" title="Permalink to this headline"></a></h2>
<p>The addition of node groups has a small impact on the actual
algorithms, which will simply operate at node group level instead of
cluster level, but it requires the addition of new algorithms for
inter-node group operations.</p>
<p>The following two definitions will be used in the following
paragraphs:</p>
<dl class="docutils">
<dt>local group</dt>
<dd>The local group refers to a node’s own node group, or when speaking
about an instance, the node group of its primary node</dd>
<dt>regular cluster</dt>
<dd>A cluster composed of a single node group, or pre-2.3 cluster</dd>
<dt>super cluster</dt>
<dd>This term refers to a cluster which comprises multiple node groups,
as opposed to a 2.2 and earlier cluster with a single node group</dd>
</dl>
<p>In all the below operations, it’s assumed that Ganeti can gather the
entire super cluster state cheaply.</p>
<div class="section" id="balancing-changes">
<h3>Balancing changes<a class="headerlink" href="#balancing-changes" title="Permalink to this headline"></a></h3>
<p>Balancing will move from cluster-level balancing to group
balancing. In order to achieve a reasonable improvement in a super
cluster, without needing to keep state of what groups have been
already balanced previously, the balancing algorithm will run as
follows:</p>
<ol class="arabic simple">
<li>the cluster data is gathered</li>
<li>if this is a regular cluster, as opposed to a super cluster,
balancing will proceed normally as previously</li>
<li>otherwise, compute the cluster scores for all groups</li>
<li>choose the group with the worst score and see if we can improve it;
if not choose the next-worst group, so on</li>
<li>once a group has been identified, run the balancing for it</li>
</ol>
<p>Of course, explicit selection of a group will be allowed.</p>
<div class="section" id="super-cluster-operations">
<h4>Super cluster operations<a class="headerlink" href="#super-cluster-operations" title="Permalink to this headline"></a></h4>
<p>Beside the regular group balancing, in a super cluster we have more
operations.</p>
<div class="section" id="redistribution">
<h5>Redistribution<a class="headerlink" href="#redistribution" title="Permalink to this headline"></a></h5>
<p>In a regular cluster, once we run out of resources (offline nodes
which can’t be fully evacuated, N+1 failures, etc.) there is nothing
we can do unless nodes are added or instances are removed.</p>
<p>In a super cluster however, there might be resources available in
another group, so there is the possibility of relocating instances
between groups to re-establish N+1 success within each group.</p>
<p>One difficulty in the presence of both super clusters and shared
storage is that the move paths of instances are quite complicated;
basically an instance can move inside its local group, and to any
other groups which have access to the same storage type and storage
pool pair. In effect, the super cluster is composed of multiple
‘partitions’, each containing one or more groups, but a node is
simultaneously present in multiple partitions, one for each storage
type and storage pool it supports. As such, the interactions between
the individual partitions are too complex for non-trivial clusters to
assume we can compute a perfect solution: we might need to move some
instances using shared storage pool ‘A’ in order to clear some more
memory to accept an instance using local storage, which will further
clear more VCPUs in a third partition, etc. As such, we’ll limit
ourselves at simple relocation steps within a single partition.</p>
<p>Algorithm:</p>
<ol class="arabic">
<li><p class="first">read super cluster data, and exit if cluster doesn’t allow
inter-group moves</p>
</li>
<li><p class="first">filter out any groups that are “alone” in their partition
(i.e. no other group sharing at least one storage method)</p>
</li>
<li><p class="first">determine list of healthy versus unhealthy groups:</p>
<blockquote>
<div><ol class="arabic simple">
<li>a group which contains offline nodes still hosting instances is
definitely not healthy</li>
<li>a group which has nodes failing N+1 is ‘weakly’ unhealthy</li>
</ol>
</div></blockquote>
</li>
<li><p class="first">if either list is empty, exit (no work to do, or no way to fix problems)</p>
</li>
<li><p class="first">for each unhealthy group:</p>
<blockquote>
<div><ol class="arabic simple">
<li>compute the instances that are causing the problems: all
instances living on offline nodes, all instances living as
secondary on N+1 failing nodes, all instances living as primaries
on N+1 failing nodes (in this order)</li>
<li>remove instances, one by one, until the source group is healthy
again</li>
<li>try to run a standard allocation procedure for each instance on
all potential groups in its partition</li>
<li>if all instances were relocated successfully, it means we have a
solution for repairing the original group</li>
</ol>
</div></blockquote>
</li>
</ol>
</div>
<div class="section" id="compression">
<h5>Compression<a class="headerlink" href="#compression" title="Permalink to this headline"></a></h5>
<p>In a super cluster which has had many instance reclamations, it is
possible that while none of the groups is empty, overall there is
enough empty capacity that an entire group could be removed.</p>
<p>The algorithm for “compressing” the super cluster is as follows:</p>
<ol class="arabic">
<li><p class="first">read super cluster data</p>
</li>
<li><p class="first">compute total <em>(memory, disk, cpu)</em>, and free <em>(memory, disk, cpu)</em>
for the super-cluster</p>
</li>
<li><p class="first">computer per-group used and free <em>(memory, disk, cpu)</em></p>
</li>
<li><p class="first">select candidate groups for evacuation:</p>
<blockquote>
<div><ol class="arabic simple">
<li>they must be connected to other groups via a common storage type
and pool</li>
<li>they must have fewer used resources than the global free
resources (minus their own free resources)</li>
</ol>
</div></blockquote>
</li>
<li><p class="first">for each of these groups, try to relocate all its instances to
connected peer groups</p>
</li>
<li><p class="first">report the list of groups that could be evacuated, or if instructed
so, perform the evacuation of the group with the largest free
resources (i.e. in order to reclaim the most capacity)</p>
</li>
</ol>
</div>
<div class="section" id="load-balancing">
<h5>Load balancing<a class="headerlink" href="#load-balancing" title="Permalink to this headline"></a></h5>
<p>Assuming a super cluster using shared storage, where instance failover
is cheap, it should be possible to do a load-based balancing across
groups.</p>
<p>As opposed to the normal balancing, where we want to balance on all
node attributes, here we should look only at the load attributes; in
other words, compare the available (total) node capacity with the
(total) load generated by instances in a given group, and computing
such scores for all groups, trying to see if we have any outliers.</p>
<p>Once a reliable load-weighting method for groups exists, we can apply
a modified version of the cluster scoring method to score not
imbalances across nodes, but imbalances across groups which result in
a super cluster load-related score.</p>
</div>
</div>
</div>
<div class="section" id="allocation-changes">
<h3>Allocation changes<a class="headerlink" href="#allocation-changes" title="Permalink to this headline"></a></h3>
<p>It is important to keep the allocation method across groups internal
(in the Ganeti/Iallocator combination), instead of delegating it to an
external party (e.g. a RAPI client). For this, the IAllocator protocol
should be extended to provide proper group support.</p>
<p>For htools, the new algorithm will work as follows:</p>
<ol class="arabic simple">
<li>read/receive cluster data from Ganeti</li>
<li>filter out any groups that do not supports the requested storage
method</li>
<li>for remaining groups, try allocation and compute scores after
allocation</li>
<li>sort valid allocation solutions accordingly and return the entire
list to Ganeti</li>
</ol>
<p>The rationale for returning the entire group list, and not only the
best choice, is that we anyway have the list, and Ganeti might have
other criteria (e.g. the best group might be busy/locked down, etc.)
so even if from the point of view of resources it is the best choice,
it might not be the overall best one.</p>
</div>
<div class="section" id="node-evacuation-changes">
<h3>Node evacuation changes<a class="headerlink" href="#node-evacuation-changes" title="Permalink to this headline"></a></h3>
<p>While the basic concept in the <code class="docutils literal"><span class="pre">multi-evac</span></code> iallocator
mode remains unchanged (it’s a simple local group issue), when failing
to evacuate and running in a super cluster, we could have resources
available elsewhere in the cluster for evacuation.</p>
<p>The algorithm for computing this will be the same as the one for super
cluster compression and redistribution, except that the list of
instances is fixed to the ones living on the nodes to-be-evacuated.</p>
<p>If the inter-group relocation is successful, the result to Ganeti will
not be a local group evacuation target, but instead (for each
instance) a pair <em>(remote group, nodes)</em>. Ganeti itself will have to
decide (based on user input) whether to continue with inter-group
evacuation or not.</p>
<p>In case that Ganeti doesn’t provide complete cluster data, just the
local group, the inter-group relocation won’t be attempted.</p>
</div>
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">Synchronising htools to Ganeti 2.3</a><ul>
<li><a class="reference internal" href="#shared-storage-support">Shared storage support</a><ul>
<li><a class="reference internal" href="#new-instance-parameters">New instance parameters</a></li>
<li><a class="reference internal" href="#new-node-parameters">New node parameters</a></li>
<li><a class="reference internal" href="#external-storage-and-instances">External storage and instances</a></li>
<li><a class="reference internal" href="#external-storage-and-nodes">External storage and nodes</a></li>
<li><a class="reference internal" href="#n-1-status">N+1 status</a></li>
</ul>
</li>
<li><a class="reference internal" href="#node-groups-support">Node groups support</a><ul>
<li><a class="reference internal" href="#balancing-changes">Balancing changes</a><ul>
<li><a class="reference internal" href="#super-cluster-operations">Super cluster operations</a><ul>
<li><a class="reference internal" href="#redistribution">Redistribution</a></li>
<li><a class="reference internal" href="#compression">Compression</a></li>
<li><a class="reference internal" href="#load-balancing">Load balancing</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#allocation-changes">Allocation changes</a></li>
<li><a class="reference internal" href="#node-evacuation-changes">Node evacuation changes</a></li>
</ul>
</li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="design-2.3.html"
                        title="previous chapter">Ganeti 2.3 design</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="design-2.4.html"
                        title="next chapter">Ganeti 2.4 design</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="_sources/design-htools-2.3.rst.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <form class="search" action="search.html" method="get">
      <div><input type="text" name="q" /></div>
      <div><input type="submit" value="Go" /></div>
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="design-2.4.html" title="Ganeti 2.4 design"
             >next</a></li>
        <li class="right" >
          <a href="design-2.3.html" title="Ganeti 2.3 design"
             >previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="index.html">Ganeti 2.16.0~rc2 documentation</a> &#187;</li> 
      </ul>
    </div>
    <div class="footer" role="contentinfo">
        &#169; Copyright 2018, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Google Inc..
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.7.
    </div>
  </body>
</html>