<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Linux HA integration — Ganeti 2.16.0~rc2 documentation</title>
<link rel="stylesheet" href="_static/style.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '2.16.0~rc2',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Submitting jobs from logical units" href="design-lu-generated-jobs.html" />
<link rel="prev" title="Improving location awareness of Ganeti" href="design-location.html" />
</head>
<body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="design-lu-generated-jobs.html" title="Submitting jobs from logical units"
accesskey="N">next</a></li>
<li class="right" >
<a href="design-location.html" title="Improving location awareness of Ganeti"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Ganeti 2.16.0~rc2 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="linux-ha-integration">
<h1><a class="toc-backref" href="#id2">Linux HA integration</a><a class="headerlink" href="#linux-ha-integration" title="Permalink to this headline">¶</a></h1>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Created:</th><td class="field-body">2013-Oct-24</td>
</tr>
<tr class="field-even field"><th class="field-name">Status:</th><td class="field-body">Implemented</td>
</tr>
<tr class="field-odd field"><th class="field-name">Ganeti-Version:</th><td class="field-body">2.7.0</td>
</tr>
</tbody>
</table>
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#linux-ha-integration" id="id2">Linux HA integration</a><ul>
<li><a class="reference internal" href="#current-state-and-shortcomings" id="id3">Current state and shortcomings</a></li>
<li><a class="reference internal" href="#proposed-changes" id="id4">Proposed changes</a><ul>
<li><a class="reference internal" href="#ganeti-ocf-agents" id="id5">Ganeti OCF agents</a></li>
<li><a class="reference internal" href="#master-role-agent" id="id6">Master role agent</a><ul>
<li><a class="reference internal" href="#future-improvements" id="id7">Future improvements</a></li>
</ul>
</li>
<li><a class="reference internal" href="#node-role-agent" id="id8">Node role agent</a><ul>
<li><a class="reference internal" href="#id1" id="id9">Future improvements</a></li>
</ul>
</li>
<li><a class="reference internal" href="#risks" id="id10">Risks</a></li>
<li><a class="reference internal" href="#code-status" id="id11">Code status</a></li>
</ul>
</li>
<li><a class="reference internal" href="#future-work" id="id12">Future work</a></li>
</ul>
</li>
</ul>
</div>
<p>This is a design document detailing the integration of Ganeti and Linux HA.</p>
<div class="section" id="current-state-and-shortcomings">
<h2><a class="toc-backref" href="#id3">Current state and shortcomings</a><a class="headerlink" href="#current-state-and-shortcomings" title="Permalink to this headline">¶</a></h2>
<p>Ganeti doesn’t currently support any self-healing or self-monitoring.</p>
<p>Work is under way to improve the situation in this regard:</p>
<ul class="simple">
<li>The <a class="reference internal" href="design-autorepair.html"><span class="doc">autorepair system</span></a> will take care
of self repairing a cluster in the presence of offline nodes.</li>
<li>The <a class="reference internal" href="design-monitoring-agent.html"><span class="doc">monitoring agent</span></a> will take care
of exporting data to monitoring.</li>
</ul>
<p>What is still missing is a way to self-detect “obvious” failures rapidly
and to:</p>
<ul class="simple">
<li>Maintain the master role active.</li>
<li>Offline resources that are obviously faulty so that the autorepair
system can perform its work.</li>
</ul>
</div>
<div class="section" id="proposed-changes">
<h2><a class="toc-backref" href="#id4">Proposed changes</a><a class="headerlink" href="#proposed-changes" title="Permalink to this headline">¶</a></h2>
<p>Linux-HA provides software that delivers high availability of services
through automatic failover of resources. In particular, Pacemaker can be
used together with Heartbeat or Corosync to make sure a resource is kept
active on a self-monitoring cluster.</p>
<div class="section" id="ganeti-ocf-agents">
<h3><a class="toc-backref" href="#id5">Ganeti OCF agents</a><a class="headerlink" href="#ganeti-ocf-agents" title="Permalink to this headline">¶</a></h3>
<p>The Ganeti agents will be slightly special in the HA world. The
following will apply:</p>
<ul class="simple">
<li>The agents can be configured cluster-wide by tags (which will be read
on the nodes via ssconf_cluster_tags) and locally by files on the
filesystem that allow them to “simulate” a particular condition (e.g.
simulate a failure even if none is detected).</li>
<li>The agents will be able to run in “full” or “partial” mode: in
“partial” mode they will always succeed, and thus never fail a
resource as long as a node is online, is running the Linux HA software
and is responding to the network. In “full” mode they will also check
resources like the cluster master IP or master daemon, and act if they
are missing.</li>
</ul>
<p>Note that Ganeti needs OCF agents for this: simply relying on the LSB
scripts will not work for the Ganeti service.</p>
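<p>As a rough illustration of the tag-based configuration described above, the following Python sketch parses the contents of the cluster tags ssconf file and checks whether a given tag is set. The one-tag-per-line layout, the path shown in the comment and the function names are assumptions made for this example; the shipped agents may read the tags differently.</p>

```python
def parse_cluster_tags(ssconf_text):
    """Return the set of cluster tags found in ssconf-style text.

    Assumes one tag per line; blank lines are ignored.
    """
    return {line.strip() for line in ssconf_text.splitlines() if line.strip()}


def tag_enabled(tags, tag):
    """Tell whether a cluster-wide tag is present."""
    return tag in tags


# Hypothetical usage on a node (the path is an assumption):
#   with open("/var/lib/ganeti/ssconf_cluster_tags") as fd:
#       tags = parse_cluster_tags(fd.read())
sample = "ocf:node-offline:use-powercycle\nsome-other-tag\n"
tags = parse_cluster_tags(sample)
print(tag_enabled(tags, "ocf:node-offline:use-powercycle"))
```

<p>Keeping the parsing separate from the file access makes it easy to exercise the logic without a running cluster.</p>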
</div>
<div class="section" id="master-role-agent">
<h3><a class="toc-backref" href="#id6">Master role agent</a><a class="headerlink" href="#master-role-agent" title="Permalink to this headline">¶</a></h3>
<p>This agent will manage the Ganeti master role. It needs to be configured
as a sticky resource (you don’t want to flap the master role around, do
you?) that is active on only one node. You can require quorum or fencing
to protect your cluster from multiple masters.</p>
<p>The agent will implement a stateless resource that considers itself
“started” only on the master node, “stopped” on all master candidates
and in error mode on all other nodes.</p>
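<p>The three states above can be sketched as a mapping from a node's Ganeti role to an OCF exit code. This is only an illustration: the numeric values are the standard OCF ones, but the function name, its arguments and the choice of the generic error code for non-candidate nodes are assumptions, not necessarily what the shipped agent does.</p>

```python
# Standard OCF exit codes used by resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def master_monitor_status(node, master, master_candidates):
    """Map a node's Ganeti role to the status the master agent reports.

    - master node            -> resource is "started"
    - other master candidate -> resource is "stopped"
    - any other node         -> error (the exact code is an assumption)
    """
    if node == master:
        return OCF_SUCCESS
    if node in master_candidates:
        return OCF_NOT_RUNNING
    return OCF_ERR_GENERIC


candidates = {"node1", "node2", "node3"}
for name in ("node1", "node2", "node4"):
    print(name, master_monitor_status(name, "node1", candidates))
```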
<p>Note that if not all your nodes are master candidates this resource
might have problems:</p>
<ul class="simple">
<li>If all nodes are configured to run the resource, Heartbeat may decide
to “fence” (aka stonith) all your non-master-candidate nodes if told
to do so. This might not be what you want.</li>
<li>If only master candidates are configured as nodes for the resource,
beware of promotions and demotions, as nothing will automatically
update Pacemaker should a change happen at the Ganeti level.</li>
</ul>
<p>Other solutions, such as reporting the resource just as “stopped” on
non-master-candidate nodes as well, might mean that Pacemaker would
choose the “wrong” node to promote to master, which is also a bad idea.</p>
<div class="section" id="future-improvements">
<h4><a class="toc-backref" href="#id7">Future improvements</a><a class="headerlink" href="#future-improvements" title="Permalink to this headline">¶</a></h4>
<ul class="simple">
<li>Ability to work better with non-master-candidate nodes</li>
<li>Stateful resource that can “safely” transfer the master role between
online nodes (with queue drain and such)</li>
<li>Implement “full” mode, with detection of the cluster IP and the master
node daemon.</li>
</ul>
</div>
</div>
<div class="section" id="node-role-agent">
<h3><a class="toc-backref" href="#id8">Node role agent</a><a class="headerlink" href="#node-role-agent" title="Permalink to this headline">¶</a></h3>
<p>This agent will manage the Ganeti node role. It needs to be configured
as a cloned resource that is active on all nodes.</p>
<p>In partial mode it will always return success (and thus trigger a
failure only upon an HA-level or network failure). Full mode, which
initially will not be implemented, could also check for the node daemon
being unresponsive or other local conditions (TBD).</p>
<p>When a failure happens, the HA notification system will trigger on all
other nodes, including the master. The master will then be able to
offline the node. Any other work to restore instance availability should
then be done by the autorepair system.</p>
<p>The following cluster tags are supported:</p>
<ul class="simple">
<li><code class="docutils literal"><span class="pre">ocf:node-offline:use-powercycle</span></code>: Try to powercycle a node using
<code class="docutils literal"><span class="pre">gnt-node</span> <span class="pre">powercycle</span></code> when offlining.</li>
<li><code class="docutils literal"><span class="pre">ocf:node-offline:use-poweroff</span></code>: Try to power off a node using
<code class="docutils literal"><span class="pre">gnt-node</span> <span class="pre">power</span> <span class="pre">off</span></code> when offlining (requires OOB support).</li>
</ul>
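<p>To make the effect of these tags concrete, here is a minimal Python sketch of how the master might choose what to run when a node fails. It only builds the command lines and executes nothing. The function name, the use of <code class="docutils literal"><span class="pre">gnt-node</span> <span class="pre">modify</span> <span class="pre">--offline=yes</span></code> for the offlining step, the order of the commands and the behaviour when both tags are set are assumptions made for this example.</p>

```python
USE_POWERCYCLE = "ocf:node-offline:use-powercycle"
USE_POWEROFF = "ocf:node-offline:use-poweroff"


def offline_commands(node, cluster_tags):
    """Build the commands the master could run to offline a failed node.

    Nothing is executed here; each command is returned as an argument list.
    The ordering (mark offline first, then power actions) is an assumption.
    """
    commands = [["gnt-node", "modify", "--offline=yes", node]]
    if USE_POWERCYCLE in cluster_tags:
        commands.append(["gnt-node", "powercycle", node])
    if USE_POWEROFF in cluster_tags:
        # Requires OOB support, as noted in the tag description.
        commands.append(["gnt-node", "power", "off", node])
    return commands


for cmd in offline_commands("node3.example.com", {USE_POWEROFF}):
    print(" ".join(cmd))
```

<p>Returning argument lists rather than running the commands keeps the sketch safe to try on a machine that is not part of a cluster.</p>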
<div class="section" id="id1">
<h4><a class="toc-backref" href="#id9">Future improvements</a><a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h4>
<ul class="simple">
<li>Handle draining differently than offlining</li>
<li>Handle different modes of “stopping” the service</li>
<li>Implement “full” mode</li>
</ul>
</div>
</div>
<div class="section" id="risks">
<h3><a class="toc-backref" href="#id10">Risks</a><a class="headerlink" href="#risks" title="Permalink to this headline">¶</a></h3>
<p>Running Ganeti with Pacemaker puts the stability of your Ganeti cluster
at greater risk. Events like:</p>
<ul class="simple">
<li>stopping heartbeat or corosync on a node</li>
<li>corosync or heartbeat being killed for any reason</li>
<li>temporary failure in a node’s networking</li>
</ul>
<p>will trigger potentially dangerous operations such as node offlining or
master role failover. Moreover, if the autorepair system is running,
these events can also trigger instance failovers or migrations, and
disk replacements.</p>
<p>Also note that operations like master-failover or a manual node-modify
might interact badly with this setup, depending on the way your HA system
is configured (see below).</p>
<p>This is, of course, an inherent problem with any Linux-HA installation,
but it is probably more visible with Ganeti, given that our resources
tend to be more heavyweight than many others managed in HA clusters
(e.g. an IP address).</p>
</div>
<div class="section" id="code-status">
<h3><a class="toc-backref" href="#id11">Code status</a><a class="headerlink" href="#code-status" title="Permalink to this headline">¶</a></h3>
<p>This code is heavily experimental, and Linux-HA is a very complex
subsystem. <em>We might not be able to help you</em> if you decide to run this
code: please make sure you fully understand high availability on your
production machines. Ganeti only ships this code as an example, and it
might need customization or complex configuration on your side for it
to run properly.</p>
<p><em>Ganeti does not automate HA configuration for your cluster</em>. You need
to do this job by hand. Good luck, don’t get it wrong.</p>
</div>
</div>
<div class="section" id="future-work">
<h2><a class="toc-backref" href="#id12">Future work</a><a class="headerlink" href="#future-work" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li>Integrate the agents better with the Ganeti monitoring</li>
<li>Add hooks for managing HA at node add/remove/modify/master-failover
operations</li>
<li>Provide a stonith system through Ganeti’s OOB system</li>
<li>Provide an OOB system that does “shunning” of offline nodes, for
emulating a real OOB, at least on all nodes</li>
</ul>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
© Copyright 2018, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Google Inc.
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.7.
</div>
</body>
</html>