<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Chained jobs — Ganeti 2.16.0~rc2 documentation</title>
<link rel="stylesheet" href="_static/style.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '2.16.0~rc2',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Unit tests for cmdlib / LogicalUnit’s" href="design-cmdlib-unittests.html" />
<link rel="prev" title="RADOS/Ceph support in Ganeti" href="design-ceph-ganeti-support.html" />
</head>
<body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="design-cmdlib-unittests.html" title="Unit tests for cmdlib / LogicalUnit’s"
accesskey="N">next</a></li>
<li class="right" >
<a href="design-ceph-ganeti-support.html" title="RADOS/Ceph support in Ganeti"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">Ganeti 2.16.0~rc2 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="chained-jobs">
<h1><a class="toc-backref" href="#id1">Chained jobs</a><a class="headerlink" href="#chained-jobs" title="Permalink to this headline">¶</a></h1>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Created:</th><td class="field-body">2011-Jul-14</td>
</tr>
<tr class="field-even field"><th class="field-name">Status:</th><td class="field-body">Implemented</td>
</tr>
<tr class="field-odd field"><th class="field-name">Ganeti-Version:</th><td class="field-body">2.5.0</td>
</tr>
</tbody>
</table>
<div class="contents topic" id="contents">
<p class="topic-title first">Contents</p>
<ul class="simple">
<li><a class="reference internal" href="#chained-jobs" id="id1">Chained jobs</a><ul>
<li><a class="reference internal" href="#current-state-and-shortcomings" id="id2">Current state and shortcomings</a></li>
<li><a class="reference internal" href="#proposed-changes" id="id3">Proposed changes</a><ul>
<li><a class="reference internal" href="#implementation-details" id="id4">Implementation details</a><ul>
<li><a class="reference internal" href="#status-while-waiting-for-dependencies" id="id5">Status while waiting for dependencies</a></li>
<li><a class="reference internal" href="#cost-of-deserialization" id="id6">Cost of deserialization</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#other-discussed-solutions" id="id7">Other discussed solutions</a><ul>
<li><a class="reference internal" href="#job-level-attribute" id="id8">Job-level attribute</a></li>
<li><a class="reference internal" href="#client-side-logic" id="id9">Client-side logic</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
<p>This is a design document about the innards of Ganeti’s job processing.
Readers are advised to study previous design documents on the topic:</p>
<ul class="simple">
<li><a class="reference internal" href="design-2.0.html#jqueue-original-design"><span class="std std-ref">Original job queue</span></a></li>
<li><a class="reference internal" href="design-2.3.html#jqueue-job-priority-design"><span class="std std-ref">Job priorities</span></a></li>
<li><a class="reference internal" href="design-lu-generated-jobs.html"><span class="doc">LU-generated jobs</span></a></li>
</ul>
<div class="section" id="current-state-and-shortcomings">
<h2><a class="toc-backref" href="#id2">Current state and shortcomings</a><a class="headerlink" href="#current-state-and-shortcomings" title="Permalink to this headline">¶</a></h2>
<p>Ever since the introduction of the job queue with Ganeti 2.0 there have
been situations where we wanted to run several jobs in a specific order.
Due to the job queue’s current design, such a guarantee cannot be
given. Jobs are run according to their priority, their ability to
acquire all necessary locks and other factors.</p>
<p>One way to work around this limitation is to do some kind of job
grouping in the client code. Once all jobs of a group have finished, the
next group is submitted and waited for. However, there are different
kinds of clients for Ganeti, some of which don’t share code (e.g. Python
clients vs. htools), so each of them would have to implement this logic
separately. This design instead proposes a solution implemented as part
of the job queue in the master daemon.</p>
</div>
<div class="section" id="proposed-changes">
<h2><a class="toc-backref" href="#id3">Proposed changes</a><a class="headerlink" href="#proposed-changes" title="Permalink to this headline">¶</a></h2>
<p>With the implementation of <a class="reference internal" href="design-2.3.html#jqueue-job-priority-design"><span class="std std-ref">job priorities</span></a> the processing code was re-architected
and became a lot more versatile. It now returns jobs to the queue in
case the locks for an opcode can’t be acquired, allowing other
jobs/opcodes to be run in the meantime.</p>
<p>The proposal is to add a new, optional property to opcodes to define
dependencies on other jobs. Job X could define opcodes with a dependency
on the success of job Y and would only be run once job Y is finished. If
there’s a dependency on success and job Y failed, job X would fail as
well. Since such dependencies would use job IDs, the jobs still need to
be submitted in the right order.</p>
<p>The new attribute’s value would be a list of two-element tuples. Each
tuple contains a job ID and a list of requested statuses for the job
depended upon. Only final statuses are accepted
(<code class="docutils literal"><span class="pre">canceled,</span> <span class="pre">success,</span> <span class="pre">error</span></code>). An empty list is
equivalent to specifying all final statuses (except
<code class="docutils literal"><span class="pre">canceled</span></code>, which is treated specially).
An opcode runs only once all its dependency requirements have been
fulfilled.</p>
<p>Any job referring to a canceled job is also canceled unless it
explicitly lists <code class="docutils literal"><span class="pre">canceled</span></code> as a requested
status.</p>
<p>If a referenced job cannot be found in either the normal queue or the
archive, referring jobs fail because the status of the referenced job
can’t be determined.</p>
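<p>The dependency rules described above can be summarised in a short sketch. It only illustrates the semantics stated in this document and is not Ganeti’s implementation: the function names <code class="docutils literal"><span class="pre">check_dependency</span></code> and <code class="docutils literal"><span class="pre">check_all</span></code> as well as the outcome constants are hypothetical.</p>

```python
# Illustrative sketch only; all names here are hypothetical.
FINAL_STATUSES = frozenset(["canceled", "success", "error"])

# Possible outcomes for the depending opcode
CONTINUE = "continue"  # requirement fulfilled, the opcode may run
WAIT = "wait"          # dependency has not reached a final status yet
CANCEL = "cancel"      # dependency was canceled and this was not requested
FAIL = "fail"          # dependency unknown or finished in another status


def check_dependency(dep_status, requested):
    """Decide what happens to an opcode depending on one other job.

    dep_status is the current status of the job depended upon, or None
    if that job was found neither in the queue nor in the archive.
    requested is the list of requested final statuses (may be empty).
    """
    unknown = set(requested) - FINAL_STATUSES
    if unknown:
        raise ValueError("Not a final status: %s" % sorted(unknown))

    if dep_status is None:
        return FAIL
    if dep_status not in FINAL_STATUSES:
        return WAIT
    if dep_status == "canceled" and "canceled" not in requested:
        return CANCEL
    # An empty list accepts every remaining final status
    if not requested or dep_status in requested:
        return CONTINUE
    return FAIL


def check_all(depends, get_status):
    """Evaluate a whole dependency list; get_status maps job ID to status."""
    for job_id, requested in depends:
        outcome = check_dependency(get_status(job_id), requested)
        if outcome != CONTINUE:
            return outcome
    return CONTINUE
```

<p>In this sketch an opcode is allowed to run only when <code class="docutils literal"><span class="pre">check_all</span></code> returns <code class="docutils literal"><span class="pre">CONTINUE</span></code>; any other outcome is reported for the first dependency that is not yet satisfied.</p>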
<p>With this change, clients can submit all wanted jobs in the right order
and proceed to wait for changes on all these jobs (see
<code class="docutils literal"><span class="pre">cli.JobExecutor</span></code>). The master daemon will take care of executing them
in the right order, while still presenting the client with a simple
interface.</p>
<p>Clients using the <code class="docutils literal"><span class="pre">SubmitManyJobs</span></code> interface can use relative job IDs
(negative integers) to refer to jobs in the same submission.</p>
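<p>As a rough illustration of relative job IDs, the following sketch rewrites them into absolute ones once the IDs of a submission are known. It assumes that <code class="docutils literal"><span class="pre">-1</span></code> refers to the job immediately preceding the current one in the same submission, <code class="docutils literal"><span class="pre">-2</span></code> to the one before that, and so on; this interpretation and the helper names are assumptions made for the example.</p>

```python
def is_relative(job_id):
    """Relative job IDs are negative integers."""
    return isinstance(job_id, int) and job_id != abs(job_id)


def resolve_relative_deps(jobs, assigned_ids):
    """Return a copy of jobs with relative dependency IDs made absolute.

    jobs is a list of jobs in submission order, each a list of opcode
    dictionaries; assigned_ids holds the job ID assigned to each of them.
    Assumption: -1 means "the job submitted just before this one".
    """
    resolved = []
    for idx, ops in enumerate(jobs):
        # IDs of the jobs preceding the current one in this submission
        previous = assigned_ids[:idx]
        new_ops = []
        for op in ops:
            op = dict(op)  # shallow copy, leave the caller's data alone
            if "depend" in op:
                deps = []
                for dep_id, statuses in op["depend"]:
                    if is_relative(dep_id):
                        if abs(dep_id) not in range(1, idx + 1):
                            raise ValueError("Relative job ID %d points "
                                             "outside the submission" % dep_id)
                        dep_id = previous[dep_id]
                    deps.append([dep_id, statuses])
                op["depend"] = deps
            new_ops.append(op)
        resolved.append(new_ops)
    return resolved
```

<p>With three jobs that were assigned the IDs 6151, 7687 and 9218, a dependency list of <code class="docutils literal"><span class="pre">[[-2,</span> <span class="pre">["success"]],</span> <span class="pre">[-1,</span> <span class="pre">["success"]]]</span></code> on the third job would resolve to jobs 6151 and 7687 under this assumption.</p>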
<p>Example data structures:</p>
<div class="highlight-javascript"><div class="highlight"><pre><span></span><span class="c1">// First job</span>
<span class="p">{</span>
  <span class="s2">"job_id"</span><span class="o">:</span> <span class="s2">"6151"</span><span class="p">,</span>
  <span class="s2">"ops"</span><span class="o">:</span> <span class="p">[</span>
    <span class="p">{</span> <span class="s2">"OP_ID"</span><span class="o">:</span> <span class="s2">"OP_INSTANCE_REPLACE_DISKS"</span><span class="p">,</span> <span class="cm">/*...*/</span> <span class="p">},</span>
    <span class="p">{</span> <span class="s2">"OP_ID"</span><span class="o">:</span> <span class="s2">"OP_INSTANCE_FAILOVER"</span><span class="p">,</span> <span class="cm">/*...*/</span> <span class="p">},</span>
  <span class="p">],</span>
<span class="p">}</span>
<span class="c1">// Second job, runs in parallel with first job</span>
<span class="p">{</span>
  <span class="s2">"job_id"</span><span class="o">:</span> <span class="s2">"7687"</span><span class="p">,</span>
  <span class="s2">"ops"</span><span class="o">:</span> <span class="p">[</span>
    <span class="p">{</span> <span class="s2">"OP_ID"</span><span class="o">:</span> <span class="s2">"OP_INSTANCE_MIGRATE"</span><span class="p">,</span> <span class="cm">/*...*/</span> <span class="p">}</span>
  <span class="p">],</span>
<span class="p">}</span>
<span class="c1">// Third job, depending on success of previous jobs</span>
<span class="p">{</span>
  <span class="s2">"job_id"</span><span class="o">:</span> <span class="s2">"9218"</span><span class="p">,</span>
  <span class="s2">"ops"</span><span class="o">:</span> <span class="p">[</span>
    <span class="p">{</span> <span class="s2">"OP_ID"</span><span class="o">:</span> <span class="s2">"OP_NODE_SET_PARAMS"</span><span class="p">,</span>
      <span class="s2">"depend"</span><span class="o">:</span> <span class="p">[</span>
        <span class="p">[</span><span class="mi">6151</span><span class="p">,</span> <span class="p">[</span><span class="s2">"success"</span><span class="p">]],</span>
        <span class="p">[</span><span class="mi">7687</span><span class="p">,</span> <span class="p">[</span><span class="s2">"success"</span><span class="p">]],</span>
      <span class="p">],</span>
      <span class="s2">"offline"</span><span class="o">:</span> <span class="kc">true</span><span class="p">,</span> <span class="p">},</span>
  <span class="p">],</span>
<span class="p">}</span>
</pre></div>
</div>
<div class="section" id="implementation-details">
<h3><a class="toc-backref" href="#id4">Implementation details</a><a class="headerlink" href="#implementation-details" title="Permalink to this headline">¶</a></h3>
<div class="section" id="status-while-waiting-for-dependencies">
<h4><a class="toc-backref" href="#id5">Status while waiting for dependencies</a><a class="headerlink" href="#status-while-waiting-for-dependencies" title="Permalink to this headline">¶</a></h4>
<p>Jobs waiting for dependencies are certainly not in the queue anymore and
therefore need to change their status from “queued”. While waiting for
opcode locks the job is in the “waiting” status (the constant is named
<code class="docutils literal"><span class="pre">JOB_STATUS_WAITLOCK</span></code>, but the actual value is <code class="docutils literal"><span class="pre">waiting</span></code>). There are
two possibilities:</p>
<ol class="arabic">
<li><p class="first">Introduce a new status, e.g. “waitdeps”.</p>
<p>Pro:</p>
<ul class="simple">
<li>Clients know for sure a job is waiting for dependencies, not locks</li>
</ul>
<p>Con:</p>
<ul class="simple">
<li>Code and tests would have to be updated/extended for the new status</li>
<li>List of possible state transitions certainly wouldn’t get simpler</li>
<li>Breaks backwards compatibility, older clients might get confused</li>
</ul>
</li>
<li><p class="first">Use existing “waiting” status.</p>
<p>Pro:</p>
<ul class="simple">
<li>No client changes necessary, less code churn (note that there are
clients which don’t live in Ganeti core)</li>
<li>Clients don’t need to know the difference between waiting for a job
and waiting for a lock; it doesn’t make a difference</li>
<li>Fewer state transitions (see commit <code class="docutils literal"><span class="pre">5fd6b69479c0</span></code>, which removed
many state transitions and disk writes)</li>
</ul>
<p>Con:</p>
<ul class="simple">
<li>Not immediately visible what a job is waiting for, but it’s the
same issue with locks; this is the reason why the lock monitor
(<code class="docutils literal"><span class="pre">gnt-debug</span> <span class="pre">locks</span></code>) was introduced; job dependencies can be shown
as “locks” in the monitor</li>
</ul>
</li>
</ol>
<p>Based on these arguments, the proposal is to do the following:</p>
<ul>
<li><p class="first">Rename the <code class="docutils literal"><span class="pre">JOB_STATUS_WAITLOCK</span></code> constant to <code class="docutils literal"><span class="pre">JOB_STATUS_WAITING</span></code> to
reflect its actual meaning: the job is waiting for something</p>
</li>
<li><p class="first">While waiting for dependencies and locks, jobs are in the “waiting”
status</p>
</li>
<li><p class="first">Export dependency information in the lock monitor; example output:</p>
<div class="highlight-javascript"><div class="highlight"><pre><span></span><span class="nx">Name</span>      <span class="nx">Mode</span> <span class="nx">Owner</span> <span class="nx">Pending</span>
<span class="nx">job</span><span class="o">/</span><span class="mi">27491</span> <span class="o">-</span>    <span class="o">-</span>     <span class="nx">success</span><span class="o">:</span><span class="nx">job</span><span class="o">/</span><span class="mi">34709</span><span class="p">,</span><span class="nx">job</span><span class="o">/</span><span class="mi">21459</span>
<span class="nx">job</span><span class="o">/</span><span class="mi">21459</span> <span class="o">-</span>    <span class="o">-</span>     <span class="nx">success</span><span class="p">,</span><span class="nx">error</span><span class="o">:</span><span class="nx">job</span><span class="o">/</span><span class="mi">14513</span>
</pre></div>
</div>
</li>
</ul>
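<p>To make the proposed lock monitor output more concrete, the sketch below formats dependency information in the layout of the example above: one row per job depended upon, with the requested statuses and the waiting jobs in the “Pending” column. The function names and the input structure are hypothetical and chosen for illustration only.</p>

```python
def format_pending(statuses, waiting_job_ids):
    """Build the "Pending" column: requested statuses, then waiting jobs."""
    return "%s:%s" % (",".join(statuses),
                      ",".join("job/%d" % job_id
                               for job_id in waiting_job_ids))


def format_dependencies(entries):
    """Format (job_id, statuses, waiting_job_ids) tuples as a text table."""
    rows = [("Name", "Mode", "Owner", "Pending")]
    for job_id, statuses, waiters in entries:
        # Mode and owner do not apply to job dependencies
        rows.append(("job/%d" % job_id, "-", "-",
                     format_pending(statuses, waiters)))

    # Pad every column to the width of its longest cell
    widths = [max(len(row[col]) for row in rows) for col in range(4)]
    return "\n".join(
        " ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows)


print(format_dependencies([
    (27491, ["success"], [34709, 21459]),
    (21459, ["success", "error"], [14513]),
]))
```

<p>Presenting dependencies in the same table as locks means that existing tooling around the lock monitor would not need a separate view for them.</p>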
</div>
<div class="section" id="cost-of-deserialization">
<h4><a class="toc-backref" href="#id6">Cost of deserialization</a><a class="headerlink" href="#cost-of-deserialization" title="Permalink to this headline">¶</a></h4>
<p>To determine the status of a dependency job the job queue must have
access to its data structure. Other queue operations already do this,
e.g. archiving, watching a job’s progress and querying jobs.</p>
<p>Initially (Ganeti 2.0/2.1) the job queue shared the job objects
in memory and protected them using locks. Ganeti 2.2 (see <a class="reference internal" href="design-2.2.html"><span class="doc">design
document</span></a>) changed the queue to read and deserialize jobs
from disk. This significantly reduced locking and code complexity.
Nowadays inotify is used to wait for changes on job files when watching
a job’s progress.</p>
<p>Reading from disk and deserializing certainly has some cost associated
with it, but it’s a significantly simpler architecture than
synchronizing in memory with locks. At the stage where dependencies are
evaluated the queue lock is held in shared mode, so different workers
can read at the same time (deliberately ignoring CPython’s interpreter
lock).</p>
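<p>The lookup of a dependency job on disk might look roughly like the following sketch. It assumes, purely for illustration, that each job is stored as a JSON file named <code class="docutils literal"><span class="pre">job-ID</span></code> in a flat queue or archive directory; the real on-disk layout may differ, and <code class="docutils literal"><span class="pre">load_job</span></code> is a hypothetical helper.</p>

```python
import json
import os


def load_job(job_id, queue_dir, archive_dir):
    """Read and deserialize a job, looking in the queue, then the archive.

    Returns None if the job file exists in neither directory, in which
    case the status of the job cannot be determined.
    Assumption: jobs are JSON files named "job-ID" in flat directories.
    """
    filename = "job-%s" % job_id
    for directory in (queue_dir, archive_dir):
        path = os.path.join(directory, filename)
        try:
            with open(path) as handle:
                return json.load(handle)
        except FileNotFoundError:
            # Not in this directory, try the next one
            continue
    return None
```

<p>Because the function only reads files, several workers could call it concurrently while the queue lock is held in shared mode.</p>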
<p>It is expected that the majority of executed jobs won’t use
dependencies and therefore won’t be affected.</p>
</div>
</div>
</div>
<div class="section" id="other-discussed-solutions">
<h2><a class="toc-backref" href="#id7">Other discussed solutions</a><a class="headerlink" href="#other-discussed-solutions" title="Permalink to this headline">¶</a></h2>
<div class="section" id="job-level-attribute">
<h3><a class="toc-backref" href="#id8">Job-level attribute</a><a class="headerlink" href="#job-level-attribute" title="Permalink to this headline">¶</a></h3>
<p>At first glance it might seem better to define dependencies on
previous jobs at the job level. However, it turns out that having the
option of defining only a single opcode in a job as having such a
dependency can be useful as well. The code complexity in the job queue
is equivalent if not simpler.</p>
<p>Since opcodes are guaranteed to run in order, clients can just define
the dependency on the first opcode.</p>
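<p>A client wanting job-level behaviour could therefore use a small helper like the one sketched here, which attaches the dependency to the first opcode of a job. The helper name <code class="docutils literal"><span class="pre">depend_on</span></code> and its default of requiring success are assumptions made for this example.</p>

```python
def depend_on(ops, job_id, statuses=("success",)):
    """Return a copy of a job's opcodes whose first opcode depends on job_id.

    Since opcodes run in order, a dependency on the first opcode delays
    the whole job.
    """
    if not ops:
        raise ValueError("A job needs at least one opcode")
    first = dict(ops[0])
    # Keep dependencies already defined on the first opcode
    first["depend"] = (list(first.get("depend", [])) +
                       [[job_id, list(statuses)]])
    return [first] + [dict(op) for op in ops[1:]]
```

<p>The remaining opcodes are copied unchanged, so only the first one carries the <code class="docutils literal"><span class="pre">depend</span></code> attribute.</p>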
<p>Another reason for the choice of an opcode-level attribute is that the
current LUXI interface for submitting jobs is a bit restricted and would
need to be changed to allow the addition of job-level attributes,
potentially requiring changes in all LUXI clients and/or breaking
backwards compatibility.</p>
</div>
<div class="section" id="client-side-logic">
<h3><a class="toc-backref" href="#id9">Client-side logic</a><a class="headerlink" href="#client-side-logic" title="Permalink to this headline">¶</a></h3>
<p>There’s at least one implementation of a batched job executor embedded
in the <code class="docutils literal"><span class="pre">burnin</span></code> tool’s code. While certainly possible, a client-side
solution should be avoided due to the different clients already in use.
For one, the <a class="reference internal" href="rapi.html"><span class="doc">remote API</span></a> client shouldn’t import
non-standard modules. htools are written in Haskell and can’t use Python
modules. A batched job executor contains quite a lot of logic. Even if
cleanly abstracted in a (Python) library, sharing code between different
clients is difficult if not impossible.</p>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
© Copyright 2018, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Google Inc.
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.7.
</div>
</body>
</html>