/usr/share/doc/simgrid/html/FAQ.html is in simgrid-doc 3.10-7.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>SimGrid: Frequently Asked Questions</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
$(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
$(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<link href="stylesheet.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">SimGrid
 <span id="projectnumber">3.10</span>
</div>
<div id="projectbrief">Versatile Simulation of Distributed Systems</div>
</td>
</tr>
</tbody>
</table>
</div>
<div id="navrow1" class="tabs">
<ul class="tablist">
<li><a href="http://simgrid.gforge.inria.fr/"><span>Home page</span></a></li>
<li><a href="http://simgrid.gforge.inria.fr/documentation.html"><span>Online documentation</span></a></li>
<li><a href="https://gforge.inria.fr/projects/simgrid"><span>Dev's Corner</span></a></li>
<li> <div id="MSearchBox" class="MSearchBoxInactive">
<span class="left">
<img id="MSearchSelect" src="search/mag_sel.png"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
alt=""/>
<input type="text" id="MSearchField" value="Search" accesskey="S"
onfocus="searchBox.OnSearchFieldFocus(true)"
onblur="searchBox.OnSearchFieldFocus(false)"
onkeyup="searchBox.OnSearchFieldChange(event)"/>
</span><span class="right">
<a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
</span>
</div>
</li>
</ul>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.1.2 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
<div id="nav-tree">
<div id="nav-tree-contents">
<div id="nav-sync" class="sync"></div>
</div>
</div>
<div id="splitbar" style="-moz-user-select:none;"
class="ui-resizable-handle">
</div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('FAQ.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
onmouseover="return searchBox.OnSearchSelectShow()"
onmouseout="return searchBox.OnSearchSelectHide()"
onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Data Structures</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark"> </span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark"> </span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark"> </span>Enumerator</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark"> </span>Groups</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(8)"><span class="SelectionMark"> </span>Pages</a></div>
<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0"
name="MSearchResults" id="MSearchResults">
</iframe>
</div>
<div class="header">
<div class="headertitle">
<div class="title">Frequently Asked Questions </div> </div>
</div><!--header-->
<div class="contents">
<div class="toc"><h3>Table of Contents</h3>
<ul><li class="level1"><a href="#faq_simgrid">I'm new to SimGrid. I have some questions. Where should I start?</a><ul><li class="level2"><a href="#faq_interfaces">What is the difference between MSG and SimDag? Do they serve the same purpose?</a></li>
<li class="level2"><a href="#faq_visualization">Visualizing and analyzing the results</a></li>
<li class="level2"><a href="#faq_C">Argh! Do I really have to code in C?</a></li>
</ul>
</li>
<li class="level1"><a href="#faq_howto">Feature related questions</a><ul><li class="level2"><a href="#faq_MIA">"Could you please add (your favorite feature here) to SimGrid?"</a></li>
<li class="level2"><a href="#faq_MIA_MSG">MSG features</a><ul><li class="level3"><a href="#faq_MIA_examples">I want some more complex MSG examples!</a></li>
<li class="level3"><a href="#faq_MIA_taskdup">Missing in action: MSG Task duplication/replication</a></li>
<li class="level3"><a href="#faq_MIA_asynchronous">I want to do asynchronous communications in MSG</a></li>
<li class="level3"><a href="#faq_MIA_thread_synchronization">I need to synchronize my MSG processes</a></li>
<li class="level3"><a href="#faq_MIA_host_load">Where is the get_host_load function hidden in MSG?</a></li>
<li class="level3"><a href="#faq_MIA_communication_time">How can I get the *real* communication time?</a></li>
</ul>
</li>
<li class="level2"><a href="#faq_MIA_SimDag">SimDag related questions</a><ul><li class="level3"><a href="#faq_SG_comm">Implementing communication delays between tasks.</a></li>
<li class="level3"><a href="#faq_SG_DAG">How to implement a distributed dynamic scheduler of DAGs.</a></li>
</ul>
</li>
<li class="level2"><a href="#faq_MIA_generic">Generic features</a><ul><li class="level3"><a href="#faq_more_processes">Increasing the amount of simulated processes</a></li>
<li class="level3"><a href="#faq_MIA_batch_scheduler">Is there a native support for batch schedulers in SimGrid?</a></li>
<li class="level3"><a href="#faq_MIA_checkpointing">I need a checkpointing thing</a></li>
</ul>
</li>
<li class="level2"><a href="#faq_platform">Platform building and Dynamic resources</a><ul><li class="level3"><a href="#faq_platform_example">Where can I find SimGrid platform files?</a></li>
<li class="level3"><a href="#faq_platform_alnem">How can I automatically map an existing platform?</a></li>
<li class="level3"><a href="#faq_platform_synthetic">Generating synthetic but realistic platforms</a></li>
<li class="level3"><a href="#faq_platform_random">Using random variable for the resource power or availability</a></li>
</ul>
</li>
</ul>
</li>
<li class="level1"><a href="#faq_troubleshooting">Troubleshooting</a><ul>
<li class="level2"><a href="#faq_trouble_changelog">The feature X stopped to work after my last update</a></li>
<li class="level2"><a href="#faq_trouble_lib_compil">SimGrid compilation and installation problems</a><ul><li class="level3"><a href="#faq_trouble_lib_config">cmake fails!</a></li>
<li class="level3"><a href="#faq_trouble_distcheck">Dude! "ctest" fails on my machine!</a></li>
</ul>
</li>
<li class="level2"><a href="#faq_trouble_compil">User code compilation problems</a><ul><li class="level3"><a href="#faq_trouble_err_logcat">"gcc: _simgrid_this_log_category_does_not_exist__??? undeclared (first use in this function)"</a></li>
<li class="level3"><a href="#faq_trouble_pthreadstatic">"gcc: undefined reference to pthread_key_create"</a></li>
<li class="level3"><a href="#faq_trouble_lib_msg_deprecated">"gcc: undefined reference to MSG_*"</a></li>
</ul>
</li>
<li class="level2"><a href="#faq_trouble_errors">Runtime error messages</a><ul><li class="level3"><a href="#faq_flexml_limit">"surf_parse_lex: Assertion `next limit' failed."</a></li>
<li class="level3"><a href="#faq_trouble_errors_big_fat_warning">I'm told that my XML files are too old.</a></li>
</ul>
</li>
<li class="level2"><a href="#faq_trouble_valgrind">Valgrind-related and other debugger issues</a><ul><li class="level3"><a href="#faq_trouble_vg_longjmp">longjmp madness in valgrind</a></li>
<li class="level3"><a href="#faq_trouble_vg_libc">Valgrind spits tons of errors about backtraces!</a></li>
<li class="level3"><a href="#faq_trouble_backtraces">Truncated backtraces</a></li>
</ul>
</li>
<li class="level2"><a href="#faq_deadlock">There is a deadlock in my code!!!</a></li>
<li class="level2"><a href="#faq_surf_network_latency">I get weird timings when I play with the latencies.</a></li>
<li class="level2"><a href="#faq_bugrepport">So I've found a bug in SimGrid. How to report it?</a></li>
</ul>
</li>
</ul>
</div>
<div class="textblock"><h1><a class="anchor" id="faq_simgrid"></a>
I'm new to SimGrid. I have some questions. Where should I start?</h1>
<p>You are at the right place... Having a look at <a href="http://www.loria.fr/~quinson/blog/2010/06/28/Tutorial_at_HPCS/">the slides of the HPCS'10 tutorial</a> (or at these <a href="http://graal.ens-lyon.fr/~alegrand/articles/slides_g5k_simul.pdf">ancient slides</a>, or at these <a href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">"obsolete" slides</a>) may give you some insight into what SimGrid can help you do and what its limitations are. Then you should definitely read the <a class="el" href="group__MSG__examples.html">MSG examples</a>.</p>
<p>If you are stuck at any point and if this FAQ cannot help you, please drop us a mail to the user mailing list: <a href="#" onclick="location.href='mai'+'lto:'+'sim'+'gr'+'id-'+'us'+'er@'+'li'+'sts'+'.g'+'for'+'ge'+'.in'+'ri'+'a.f'+'r'; return false;">simgr<span style="display: none;">.nosp@m.</span>id-u<span style="display: none;">.nosp@m.</span>ser@l<span style="display: none;">.nosp@m.</span>ists<span style="display: none;">.nosp@m.</span>.gfor<span style="display: none;">.nosp@m.</span>ge.i<span style="display: none;">.nosp@m.</span>nria.<span style="display: none;">.nosp@m.</span>fr</a>.</p>
<h2><a class="anchor" id="faq_interfaces"></a>
What is the difference between MSG and SimDag? Do they serve the same purpose?</h2>
<p>It depends on how you define "purpose", I guess ;)</p>
<p>Both allow you to build a prototype of an application, which you can then run within the simulator. Both share the same simulation kernel, which is the core of the SimGrid project. They differ in the way you express your application.</p>
<p>With SimDag, you express your code as a collection of interdependent parallel tasks. So, in this model, applications can be seen as a DAG of tasks. This is the interface of choice for people wanting to port old code designed for SimGrid v1 or v2 to the current version of the framework.</p>
<p>With MSG, your application is seen as a set of communicating processes, exchanging data by way of messages and performing computation on their own.</p>
<h2><a class="anchor" id="faq_visualization"></a>
Visualizing and analyzing the results</h2>
<p>It is sometimes convenient to "see" how the agents are behaving. If you like colors, you can use <code>tools/MSG_visualization/colorize.pl </code> as a filter on your MSG outputs. It works directly with INFO. Beware, INFO() prints on stderr, so do not forget to redirect it if you want to filter (e.g. with bash): </p>
<pre class="fragment">./msg_test small_platform.xml small_deployment.xml 2>&1 | ../../tools/MSG_visualization/colorize.pl
</pre><p>We also have a more graphical output. Have a look at section <a class="el" href="options.html#options_tracing">Configuring the tracing subsystem</a>.</p>
<h2><a class="anchor" id="faq_C"></a>
Argh! Do I really have to code in C?</h2>
<p>Currently, bindings on top of MSG are supported for Java, Ruby and Lua. You can find some documentation about them on the doc page. Note that the bindings are released separately from the main distribution and so have their own version numbers.</p>
<p>Moreover, if you use C++, you should be able to use the SimGrid library as a standard C library and everything should work fine (simply <em>link</em> against this library; recompiling SimGrid with a C++ compiler won't work and it wouldn't help if you could).</p>
<p>For now, we do not feel a real demand for any other language. But if you think there is one, please speak up!</p>
<h1><a class="anchor" id="faq_howto"></a>
Feature related questions</h1>
<h2><a class="anchor" id="faq_MIA"></a>
"Could you please add (your favorite feature here) to SimGrid?"</h2>
<p>Here is the deal. The whole SimGrid project (MSG, SURF, ...) is meant to be kept as simple and generic as possible. We cannot add functions for everybody's needs when these functions can easily be built from the ones already in the API. Most of the time this is possible, and when it was not, we have always upgraded the API accordingly. When somebody asks us a question like "How to do that? Is there a function in the API to simply do this?", we're always glad to answer and help. However, if we don't need this code for our own use, there is no chance we're going to write it... it's your job! :) The counterpart to our answers is that once you come up with a neat implementation of this feature (task duplication, RPC, thread synchronization, ...), you should send it to us and we will be glad to add it to the distribution. Thus, other people will take advantage of it (and we don't have to answer this question again and again ;).</p>
<p>You'll find in this section a few "Missing In Action" features. Many people have asked about them and we have given hints on how to simply implement them with MSG. Feel free to contribute...</p>
<h2><a class="anchor" id="faq_MIA_MSG"></a>
MSG features</h2>
<h3><a class="anchor" id="faq_MIA_examples"></a>
I want some more complex MSG examples!</h3>
<p>Many people have come to ask me for a more complex example and each time, they have realized afterward that the basics were in the previous three examples.</p>
<p>Of course, they often needed more complex functions like <a class="el" href="group__m__process__management.html#gac00bbc4ebc824d14a3f6719de8190618" title="Suspend the process.">MSG_process_suspend()</a>, <a class="el" href="group__m__process__management.html#ga9f25a30269f39c7683d18f9d4e1fd331" title="Resume a suspended process.">MSG_process_resume()</a> and MSG_process_isSuspended() (to perform synchronization), or <a class="el" href="group__msg__deprecated__functions.html#gae72af17e16cda4efcfad301a4170accd" title="Test whether there is a pending communication on a channel.">MSG_task_Iprobe()</a> and <a class="el" href="group__msg__task__usage.html#gab7c18061b167e38f28886512ea73be6b" title="Sleep for the specified number of seconds.">MSG_process_sleep()</a> (to avoid blocking receptions), or even <a class="el" href="group__m__process__management.html#ga943d27b38754b2af9f78e80cc29db306" title="Creates and runs a new msg_process_t.">MSG_process_create()</a> (to design asynchronous communications or computations). But the examples are sufficient to start.</p>
<p>We know. We should add some more examples, but not really more complex ones... We should add some examples that illustrate other functionalities (like how to simply encode asynchronous communications, RPC, process migrations, thread synchronization, ...) and we will do it when we have a little bit more time. We have tried to document the examples so that they are understandable. Tell us if something is not clear and once again feel free to participate! :)</p>
<h3><a class="anchor" id="faq_MIA_taskdup"></a>
Missing in action: MSG Task duplication/replication</h3>
<p>There is no task duplication in MSG. When you create a task, you can process it or send it somewhere else. As soon as a process has sent this task, it no longer has it. It's gone. The receiver process now holds the task. However, you could decide upon reception to create a "copy" of a task, but you have to handle by yourself the semantics associated with this "duplication".</p>
<p>As we already said, we prefer keeping the API as simple as possible. This kind of feature is rather easy for users to implement, and the semantics to associate with it really depend on the user. Having a <em>generic</em> task duplication mechanism is not that trivial (in particular because of the data field). That is why I would recommend that you write it by yourself, even if I can give you advice on how to do it.</p>
<p>You have the following functions to get information about a task: <a class="el" href="group__m__task__management.html#ga2adce7c9dbe8ecab1d6db1fdc420ea80" title="Return the name of a msg_task_t.">MSG_task_get_name()</a>, <a class="el" href="group__m__task__management.html#gae6c900f519f6280a9e7a615a2e47cb96" title="Returns the computation amount needed to process a task msg_task_t.">MSG_task_get_compute_duration()</a>, <a class="el" href="group__m__task__management.html#gada131788294fe436d9cc1044319b111f" title="Returns the remaining computation amount of a task msg_task_t.">MSG_task_get_remaining_computation()</a>, <a class="el" href="group__m__task__management.html#ga5d7514ea11b8a51bad0cbc7ad0636856" title="Returns the size of the data attached to a task msg_task_t.">MSG_task_get_data_size()</a>, and <a class="el" href="group__m__task__management.html#gab0ef7c694fcc69282cf4da632ae64d5b" title="Return the user data of a msg_task_t.">MSG_task_get_data()</a>.</p>
<p>You could use a dictionary (<a class="el" href="group__XBT__dict__cons.html#gac60ea15fce6ea593be308de876712f04" title="Dictionary data type (opaque structure)">xbt_dict_t</a>) of dynars (<a class="el" href="group__XBT__dynar__cons.html#gac826571988d2b63ae225e5c62ecdbc79" title="Dynar data type (opaque type)">xbt_dynar_t</a>). If you still don't see how to do it, please come back to us...</p>
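<p>To illustrate why the data field is the tricky part, here is a minimal sketch in plain C. It does not use the MSG API: <code>sketch_task_t</code>, <code>sketch_task_dup()</code> and <code>copy_double_payload()</code> are hypothetical names made up for this example. With real MSG tasks, you would read the fields through the accessor functions listed above. The point is that only the user knows how to copy the payload, so the copy function is passed as a callback.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a task; this is NOT the MSG task type. */
typedef struct {
  char *name;
  double compute_amount; /* amount of computation */
  double data_size;      /* size of the data to transfer, in bytes */
  void *data;            /* opaque user payload */
} sketch_task_t;

/* User-supplied callback that knows how to copy the opaque payload. */
typedef void *(*payload_copy_f)(const void *payload);

/* Copy a task. The payload is copied by the caller-provided callback,
 * or shared with the original when no callback is given.
 * Returns NULL on allocation failure. */
sketch_task_t *sketch_task_dup(const sketch_task_t *src, payload_copy_f copy)
{
  sketch_task_t *dst = malloc(sizeof *dst);
  if (dst == NULL)
    return NULL;
  size_t len = strlen(src->name) + 1;
  dst->name = malloc(len);
  if (dst->name == NULL) {
    free(dst);
    return NULL;
  }
  memcpy(dst->name, src->name, len);
  dst->compute_amount = src->compute_amount;
  dst->data_size = src->data_size;
  dst->data = copy ? copy(src->data) : src->data;
  return dst;
}

/* Example copier for a payload that is a single double. */
void *copy_double_payload(const void *payload)
{
  double *copy = malloc(sizeof *copy);
  if (copy != NULL)
    *copy = *(const double *)payload;
  return copy;
}
```

<p>Passing <code>NULL</code> as the callback gives a shallow copy in which both tasks share the same payload. Whether that is acceptable is exactly the kind of semantic decision that is left to you.</p>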
<h3><a class="anchor" id="faq_MIA_asynchronous"></a>
I want to do asynchronous communications in MSG</h3>
<p>In the past (version &lt;= 3.4), there was no function to perform asynchronous communications. It could easily be implemented by creating a new process when needed, though. Since version 3.5, we have introduced the following functions:</p>
<ul>
<li><a class="el" href="group__msg__task__usage.html#gaa280111a523d70bb36061e6e52ca1b42" title="Sends a task on a mailbox.">MSG_task_isend()</a></li>
<li><a class="el" href="group__msg__task__usage.html#gadb4623d587d75ad3b078928f0f1b27a5" title="Starts listening for receiving a task from an asynchronous communication.">MSG_task_irecv()</a></li>
<li><a class="el" href="group__msg__task__usage.html#gafd22dad001b95804f5e93aa146c84fd3" title="Checks whether a communication is done, and if yes, finalizes it.">MSG_comm_test()</a></li>
<li><a class="el" href="group__msg__task__usage.html#ga2ae956f1e7a1014652d85b5f8db42aa3" title="Wait for the completion of a communication.">MSG_comm_wait()</a></li>
<li><a class="el" href="group__msg__task__usage.html#ga01ce9097c976b96664e6785f6ab5d4b6" title="This function is called by a sender and permit to wait for each communication.">MSG_comm_waitall()</a></li>
<li><a class="el" href="group__msg__task__usage.html#ga62772e7b378bc485c114231af4b8f596" title="This function waits for the first communication finished in a list.">MSG_comm_waitany()</a></li>
<li><a class="el" href="group__msg__task__usage.html#gaa1b107438e7a295058e098a8c2c1bc4a" title="Destroys a communication.">MSG_comm_destroy()</a></li>
</ul>
<p>We refer you to the description of these functions for more details on their usage as well as to the example section on <a class="el" href="use.html#MSG_ex_asynchronous_communications">Asynchronous communications</a>.</p>
<h3><a class="anchor" id="faq_MIA_thread_synchronization"></a>
I need to synchronize my MSG processes</h3>
<p>You obviously cannot use pthread_mutexes or pthread_conds since we handle every scheduling-related decision within SimGrid.</p>
<p>In the past (version &lt;= 3.3.4) you could do it by playing with <a class="el" href="group__m__process__management.html#gac00bbc4ebc824d14a3f6719de8190618" title="Suspend the process.">MSG_process_suspend()</a> and <a class="el" href="group__m__process__management.html#ga9f25a30269f39c7683d18f9d4e1fd331" title="Resume a suspended process.">MSG_process_resume()</a> or with fake communications (using <a class="el" href="group__msg__deprecated__functions.html#ga1ae6d2a80ca4f35d7a19215f409ce00f" title="Listen on a channel and wait for receiving a task.">MSG_task_get()</a>, <a class="el" href="group__msg__deprecated__functions.html#ga3e310a2bf7efc32d3aac8dad2eacd71d" title="Put a task on a channel of an host and waits for the end of the transmission.">MSG_task_put()</a> and <a class="el" href="group__msg__deprecated__functions.html#gae72af17e16cda4efcfad301a4170accd" title="Test whether there is a pending communication on a channel.">MSG_task_Iprobe()</a>).</p>
<p>Since version 3.4, you can use classical synchronization structures. See page <a class="el" href="group__XBT__synchro.html">Synchro stuff</a> or simply check in include/xbt/synchro_core.h.</p>
<h3><a class="anchor" id="faq_MIA_host_load"></a>
Where is the get_host_load function hidden in MSG?</h3>
<p>There is no such thing because its semantics would not be really clear. Of course, it is something about the amount of host throughput, but there are as many definitions of "host load" as people asking for this function. First, you have to remember that resource availability may vary over time, which makes any notion of load harder to define.</p>
<p>It may be an instantaneous value or an average one. Moreover, it may be only the power of the computer, or may take the background load into account, or may even take the currently running tasks into account. In some SURF models, communications have an influence on computational power. Should it be taken into account too?</p>
<p>Above all, it is nearly impossible to predict the load beforehand in the simulator since it depends on too many parameters (background load variation, bandwidth sharing algorithmic complexity), some of which are not even known beforehand (other tasks starting at the same time). So, getting this information is really hard (just like in real life). It's not just that we want MSG to be as painful as real life. But as it is in some way realistic, we face some of the same problems as we would face in real life.</p>
<p>How would you do it for real? The most common option is to use something like NWS that performs active probes. The best solution is probably to do the same within MSG, as in the following code snippet. It is very close to what you would have to do outside of the simulator, and thus only gives you information that you could also get in a real setting, which preserves the realism of your simulation.</p>
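<pre class="fragment">double get_host_load(void) {
  /* Active probe: run a tiny task and measure how long it takes */
  msg_task_t task = MSG_task_create("test", 0.001, 0, NULL);
  double date = MSG_get_clock();
  MSG_task_execute(task);
  date = MSG_get_clock() - date;   /* simulated time spent computing */
  MSG_task_destroy(task);
  /* Amount of computation divided by elapsed time: the speed currently available */
  return (0.001/date);
}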
</pre><p>Of course, it may not match your personal definition of "host load". In this case, please detail what you mean on the mailing list, and we will extend this FAQ section to fit your taste if possible.</p>
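<p>The arithmetic behind such a probe can be sketched in plain C, outside of SimGrid. <code>probe_speed()</code> and <code>available_fraction()</code> are hypothetical helper names, and the nominal speed of the host is assumed to be known by the caller. The guard against a zero elapsed time is an addition of this sketch.</p>

```c
/* Speed seen by a probe: amount of work divided by the elapsed time.
 * Returns 0.0 when no time has elapsed, to avoid dividing by zero. */
double probe_speed(double work, double start, double end)
{
  double elapsed = end - start;
  return elapsed > 0.0 ? work / elapsed : 0.0;
}

/* Fraction of the nominal speed that the probe actually obtained. */
double available_fraction(double observed_speed, double nominal_speed)
{
  return nominal_speed > 0.0 ? observed_speed / nominal_speed : 0.0;
}
```

<p>For instance, a probe of 1 unit of work that takes 2 seconds on a host whose nominal speed is 1 unit per second obtained half of the host, under this particular definition of load.</p>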
<h3><a class="anchor" id="faq_MIA_communication_time"></a>
How can I get the *real* communication time?</h3>
<p>Communications are synchronous. Thus, if you simply take the time before and after a communication, the difference includes not only the actual transmission time but also the time spent waiting for the other party to be ready. However, getting the <em>real</em> communication time is not really hard either. The following solution is a good starting point.</p>
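<pre class="fragment">/* task_comp_size, task_comm_size, slaves, slaves_count, i and PORT_22
 * are assumed to be defined elsewhere in your code. */
int sender()
{
  /* The payload carries the date at which the sender was ready */
  double *send_date = calloc(1, sizeof(double));
  msg_task_t task = MSG_task_create("Task", task_comp_size, task_comm_size,
                                    send_date);
  *send_date = MSG_get_clock();
  MSG_task_put(task, slaves[i % slaves_count], PORT_22);
  XBT_INFO("Send completed");
  return 0;
}
int receiver()
{
  msg_task_t task = NULL;
  double time1, time2;
  double *send_date;
  time1 = MSG_get_clock();              /* the receiver is ready */
  MSG_task_get(&amp;task, PORT_22);
  time2 = MSG_get_clock();              /* the task has arrived */
  send_date = MSG_task_get_data(task);
  /* The transfer starts when both parties are ready: keep the later date */
  if (time1 &lt; *send_date)
    time1 = *send_date;
  XBT_INFO("Communication time : \"%f\" ", time2 - time1);
  free(send_date);
  MSG_task_destroy(task);
  return 0;
}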
</pre><h2><a class="anchor" id="faq_MIA_SimDag"></a>
SimDag related questions</h2>
<h3><a class="anchor" id="faq_SG_comm"></a>
Implementing communication delays between tasks.</h3>
<p>A classic question of SimDag newcomers is about how to express a communication delay between tasks. The thing is that in SimDag, both computation and communication are seen as tasks. So, if you want to model a data dependency between two DAG tasks t1 and t2, you have to create 3 SD_tasks: t1, t2 and c and add dependencies in the following way:</p>
<pre class="fragment">SD_task_dependency_add(NULL, NULL, t1, c);
SD_task_dependency_add(NULL, NULL, c, t2);
</pre><p>This way task t2 cannot start before the termination of communication c which in turn cannot start before t1 ends.</p>
<p>When creating task c, you have to associate an amount of data (in bytes) corresponding to what has to be sent by t1 to t2.</p>
<p>Finally, to schedule the communication task c, you have to build a list comprising the workstations on which t1 and t2 are scheduled (w1 and w2 for example) and build a communication matrix that should look like [0; amount; 0; 0].</p>
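<p>The matrix above can be built as follows in plain C. This sketch assumes that the matrix is stored row by row in a flat array, so that entry <code>[src * n + dst]</code> is the amount sent from the src-th to the dst-th workstation of the list, which is consistent with [0; amount; 0; 0] for a transfer from w1 to w2. <code>make_comm_matrix()</code> is a hypothetical helper, not a SimDag function.</p>

```c
#include <stdlib.h>

/* Build the flattened n x n communication matrix for a single transfer.
 * Entry [src * n + dst] holds the bytes sent from list position src to
 * list position dst; every other entry is zero.
 * Returns NULL on bad arguments or allocation failure. */
double *make_comm_matrix(size_t n, size_t src, size_t dst, double bytes)
{
  if (n == 0 || src >= n || dst >= n)
    return NULL;
  double *matrix = calloc(n * n, sizeof *matrix);
  if (matrix == NULL)
    return NULL;
  matrix[src * n + dst] = bytes;
  return matrix;
}
```

<p>With the list (w1, w2), calling <code>make_comm_matrix(2, 0, 1, amount)</code> yields the four values 0, amount, 0, 0. The caller owns the array and must <code>free()</code> it once the task has been scheduled.</p>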
<h3><a class="anchor" id="faq_SG_DAG"></a>
How to implement a distributed dynamic scheduler of DAGs.</h3>
<p>Distribution is somehow "contagious". If you start making distributed decisions, there is no way to handle DAGs directly anymore (unless I am missing something). You have to encode your DAGs in terms of communicating processes to make the whole scheduling process distributed. Here is an example of how you could do that. Assume T1 has to be done before T2.</p>
<pre class="fragment">int your_agent(int argc, char *argv[]) {
  ...
  T1 = MSG_task_create(...);
  T2 = MSG_task_create(...);
  ...
  while(1) {
    ...
    if(cond) MSG_task_execute(T1);
    ...
    /* T2 may only run once T1 has no computation left */
    if((MSG_task_get_remaining_computation(T1) == 0.0) &amp;&amp; (you_re_in_a_good_mood))
      MSG_task_execute(T2);
    else {
      /* do something else */
    }
  }
}
</pre><p>If you decide that the distributed part is not that important and that the DAG is really the level of abstraction you want to work with, then you should give <a class="el" href="group__SD__API.html">SimDag</a> a try.</p>
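<p>The dependency test that each agent has to perform can be sketched in plain C, without any SimGrid call. <code>dag_task_t</code>, <code>dag_task_is_ready()</code> and <code>dag_run()</code> are hypothetical names. Each task is assumed to have at most one predecessor, and "executing" a task simply sets its remaining computation to zero.</p>

```c
#include <stddef.h>

/* Hypothetical stand-in for a task of the DAG; NOT an MSG or SimDag type. */
typedef struct {
  double remaining; /* remaining computation; 0 means done */
  int pred;         /* index of the single predecessor, or -1 if none */
} dag_task_t;

/* A task is ready when it is unfinished and its predecessor, if any, is done. */
int dag_task_is_ready(const dag_task_t *tasks, size_t i)
{
  if (tasks[i].remaining <= 0.0)
    return 0;
  return tasks[i].pred < 0 || tasks[tasks[i].pred].remaining <= 0.0;
}

/* Execute every task as soon as its predecessor is done. The execution
 * order is recorded in order[], which must have room for n entries.
 * Returns the number of tasks executed. */
size_t dag_run(dag_task_t *tasks, size_t n, size_t *order)
{
  size_t done = 0;
  int progressed = 1;
  while (progressed) {
    progressed = 0;
    for (size_t i = 0; i < n; i++) {
      if (dag_task_is_ready(tasks, i)) {
        tasks[i].remaining = 0.0; /* stands for the execution of the task */
        order[done++] = i;
        progressed = 1;
      }
    }
  }
  return done;
}
```

<p>The loop stops as soon as a full pass makes no progress, so tasks caught in a dependency cycle are simply never executed.</p>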
<h2><a class="anchor" id="faq_MIA_generic"></a>
Generic features</h2>
<h3><a class="anchor" id="faq_more_processes"></a>
Increasing the amount of simulated processes</h3>
<p>Here are a few tricks you can apply if you want to increase the number of processes in your simulations.</p>
<ul>
<li><b>A few thousand simulated processes</b> (soft tricks)<br/>
SimGrid can use either the pthreads library or UNIX98 contexts. On most systems, the number of pthreads is limited, so your simulation may be limited for a stupid reason. This is especially true with the current Linux pthreads: I cannot get more than 2000 simulated processes with pthreads on my box. UNIX98 contexts allow me to raise the limit to 25,000 simulated processes on my laptop.<br/>
<br/>
The <code>--with-context</code> option of the <code>./configure</code> script allows you to choose between UNIX98 contexts (<code>--with-context=ucontext</code>) and the pthread version (<code>--with-context=pthread</code>). The default value is ucontext when the script detects a working UNIX98 context implementation. On Windows boxes, the provided value is discarded and an adapted version is picked.<br/>
<br/>
We experienced some issues with contexts on some rare systems (Solaris 8 and lower, or old Alpha Linuxes, come to mind). The main problem is that the configure script detects the contexts as functional when they are not. If you happen to use such a system, switch manually to the pthread version, and provide us with a good patch for the configure script so that it is done automatically ;)</li>
</ul>
<ul>
<li><b>Hundred thousands of simulated processes</b> (hard-core tricks)<br/>
As explained above, SimGrid can use UNIX98 contexts to represent and handle the simulated processes. Thanks to this, the main limitation to the number of simulated processes becomes the available memory.<br/>
<br/>
Here are some tricks I had to use in order to run a token ring between 25,000 processes on my laptop (1Gb memory, 1.5Gb swap).<br/>
<ul>
<li>First of all, make sure your code runs for a few hundreds processes before trying to push the limit. Make sure it's valgrind-clean, i.e. that valgrind does not report neither memory error nor memory leaks. Indeed, numerous simulated processes result in <em>fat</em> simulation hindering debugging.</li>
<li>It was really boring to write 25,000 entries in the deployment file, so I wrote a little script <code>examples/gras/mutual_exclusion/simple_token/make_deployment.pl</code>, which you may want to adapt to your case. You could also think about hijacking the SURFXML parser (have look at <a class="el" href="platform.html#pf_flexml_bypassing">Bypassing the XML parser with your own C functions</a>).</li>
<li>The deployment file became quite big, so I had to do what is in the FAQ entry <a class="el" href="FAQ.html#faq_flexml_limit">"surf_parse_lex: Assertion `next limit' failed."</a></li>
<li>Each UNIX98 context has its own stack. As debugging stack problems is quite hairy, the default size is a bit overestimated so that users don't get into trouble. You want to tune this size to increase the number of processes. This is the <code>STACK_SIZE</code> define in <code>src/xbt/xbt_context_sysv.c</code>, which is 128 KB by default. Reduce this as much as you can, but be warned that if this value is too low, you'll get a segfault. The token ring example, which is quite simple, runs with 40 KB stacks.</li>
<li>You may tweak the logs to reduce the stack size further. When logging something, we try to build the string to display in a char array on the stack. The size of this array is constant (and equal to XBT_LOG_BUFF_SIZE, defined in include/xbt/log.h). If the string is too large to fit this buffer, we move to a dynamically sized buffer. In that case, we have to traverse the log event arguments once to compute the size we need for the buffer, malloc it, and traverse the argument list again to do the actual job.<br/>
The idea here is to set XBT_LOG_BUFF_SIZE to 1, forcing the logs to use a dynamic array each time. This allows us to lower the stack size further, at the price of some performance loss...<br/>
This allowed me to reduce the stack size to ... 4 KB. That is, on my 1 GB laptop, I can run more than 250,000 processes!</li>
</ul>
</li>
</ul>
<h3><a class="anchor" id="faq_MIA_batch_scheduler"></a>
Is there a native support for batch schedulers in SimGrid?</h3>
<p>No, there is no native support for batch schedulers and none is planned, because this is a very specific need (and doing it in a generic way is thus very hard). However, some people have implemented their own batch schedulers. Vincent Garonne wrote one during his PhD and put his code in the contrib directory of our SVN so that others can keep working on it. You may find inspiring ideas in it.</p>
<h3><a class="anchor" id="faq_MIA_checkpointing"></a>
I need a checkpointing thing</h3>
<p>Actually, it depends on whether you want to checkpoint the simulation, or to simulate checkpoints.</p>
<p>The first one could help if your simulation is a long-running process that you want to keep alive despite hardware issues. It could also help to <em>rewind</em> the simulation by jumping back to an old checkpoint to cancel recent calculations.<br/>
Unfortunately, such a thing will probably never exist in SimGrid. One would have to duplicate all data structures, because doing a rewind at the simulator level is very, very hard (not to mention the malloc/free operations that might have been done in between). Instead, you may be interested in the Libckpt library (<a href="http://www.cs.utk.edu/~plank/plank/www/libckpt.html">http://www.cs.utk.edu/~plank/plank/www/libckpt.html</a>). This is the checkpointing solution used in the Condor project, for example. It makes it easy to create checkpoints (at the OS level, creating something like core files) and to rerun them when needed.</p>
<p>If you want to simulate checkpoints instead, it means that you want the state of an executing task (in particular, the progress made towards completion) to be saved somewhere. So if a host (and the task executing on it) fails (cf. <a class="el" href="group__msg__simulation.html#ggaf79b56c0bd3b78b539b0cb4c12e56425a69965b44e0393c3ba81482bb975c55e5" title="System shutdown. The host on which you are running has just been rebooted. Free your datastructures a...">MSG_HOST_FAILURE</a>), then the task can be restarted from the last checkpoint.<br/>
</p>
<p>Actually, such a thing does not exist in SimGrid either, but it's just because we don't think it is fundamental and it may be done in the user code at relatively low cost. You could for example use a watcher that periodically gets the remaining amount of things to do (using <a class="el" href="group__m__task__management.html#gada131788294fe436d9cc1044319b111f" title="Returns the remaining computation amount of a task msg_task_t.">MSG_task_get_remaining_computation()</a>), or fragment the task into smaller subtasks.</p>
<h2><a class="anchor" id="faq_platform"></a>
Platform building and Dynamic resources</h2>
<h3><a class="anchor" id="faq_platform_example"></a>
Where can I find SimGrid platform files?</h3>
<p>There are several little examples in the archive, in the examples/msg directory. From time to time, we are asked for other files, but we don't have much at hand right now.</p>
<p>You should refer to the Platform Description Archive (<a href="http://pda.gforge.inria.fr">http://pda.gforge.inria.fr</a>) project to see the other platform files we have available, as well as the Simulacrum simulator, meant to generate SimGrid platforms using all classical generation algorithms.</p>
<h3><a class="anchor" id="faq_platform_alnem"></a>
How can I automatically map an existing platform?</h3>
<p>We are working on a project called ALNeM (Application-Level Network Mapper) whose goal is to automatically discover the topology of an existing network. Its output will be a platform description file following the SimGrid syntax, so everybody will be able to map their own lab network (and contribute it to the catalog project). This tool is not ready yet, but it is moving forward quite fast. Just stay tuned.</p>
<h3><a class="anchor" id="faq_platform_synthetic"></a>
Generating synthetic but realistic platforms</h3>
<p>The third possibility to get a platform file (after manual or automatic mapping of real platforms) is to generate synthetic platforms. Getting a realistic result is not a trivial task, and moreover, nobody is really able to define what "realistic" means when speaking of topology files. You can find some more thoughts on this topic in these <a href="http://graal.ens-lyon.fr/~alegrand/articles/Simgrid-Introduction.pdf">slides</a>.</p>
<p>If you are looking for an actual tool, we have a little tool to annotate Tiers-generated topologies. This Perl script is in the <code>tools/platform_generation/</code> directory of the SVN. Dinda et al. released a very comparable tool, called GridG.</p>
<p>The specified computing power will be available to up to 6 sequential tasks without sharing. If more tasks are placed on this host, the resource will be shared accordingly. For example, if you schedule 12 tasks on the host, each will get half of the computing power. Please note that although sound, this model was never scientifically assessed. Please keep this fact in mind when using it.</p>
<h3><a class="anchor" id="faq_platform_random"></a>
Using random variable for the resource power or availability</h3>
<p>The best way to model the resource power using a random variable is to use an availability trace that is driven by a probability distribution. This can be done using the function tmgr_trace_generator_value() shown below. The date and value generators are created with one of tmgr_event_generator_new_uniform(), tmgr_event_generator_new_exponential() or tmgr_event_generator_new_weibull() (if you need other generators, adding them to src/surf/trace_mgr.c should be quite trivial and your patch will be welcomed). Once your trace is created, you have to connect it to the resource with the function sg_platf_new_trace_connect().</p>
<p>The process is very similar if you want to model the resource availability with a random variable (deciding whether it's on/off instead of deciding its speed): use the function tmgr_trace_generator_state() or tmgr_trace_generator_avail_unavail() instead of tmgr_trace_generator_value().</p>
<p>Unfortunately, all this currently lacks proper documentation, and there is not even a proper example of use. You'll thus have to check the header file include/simgrid/platf.h and experiment a bit by yourself. The following code should be a good starting point, and contributing a small, clean example would be a good way to help the SimGrid project.</p>
<div class="fragment"><div class="line">tmgr_trace_generator_value(<span class="stringliteral">"mytrace"</span>,tmgr_event_generator_new_exponential(.5),</div>
<div class="line"> tmgr_event_generator_new_uniform(100000,9999999));</div>
<div class="line"> </div>
<div class="line">sg_platf_trace_connect_cbarg_t myconnect = SG_PLATF_TRACE_CONNECT_INITIALIZER;</div>
<div class="line">myconnect.kind = SURF_TRACE_CONNECT_KIND_BANDWIDTH;</div>
<div class="line">myconnect.trace = <span class="stringliteral">"mytrace"</span>;</div>
<div class="line">myconnect.element = <span class="stringliteral">"mylink"</span>;</div>
<div class="line"></div>
<div class="line">sg_platf_trace_connect(myconnect);</div>
</div><!-- fragment --><h1><a class="anchor" id="faq_troubleshooting"></a>
Troubleshooting</h1>
<h2><a class="anchor" id="faq_trouble_changelog"></a>
The feature X stopped working after my last update</h2>
<p>I guess that you want to read the ChangeLog file, which always contains all the information that could be important to users during an upgrade. Actually, you may want to read it (along with the NEWS file, which highlights the most important changes) even before you upgrade your copy of SimGrid.</p>
<p>Backward compatibility is very important to us, as we want to provide a scientific tool that still lets you evaluate, several years from now, the code you write today. That being said, we sometimes change the interfaces to make them more usable. When we do so, we always keep the old interface as DEPRECATED. If you still want to use it, define the SIMGRID_DEPRECATED preprocessor symbol before including the SimGrid headers:</p>
<pre class="fragment">#define SIMGRID_DEPRECATED
#include <msg/msg.h>
</pre><h2><a class="anchor" id="faq_trouble_lib_compil"></a>
SimGrid compilation and installation problems</h2>
<h3><a class="anchor" id="faq_trouble_lib_config"></a>
cmake fails!</h3>
<p>We know only one reason for the configure to fail:</p>
<ul>
<li><b>You are using a broken build environment</b><br/>
If the symptom is that the configury magic complains about gcc not being able to build executables, you are probably missing the libc6-dev package. Damn Ubuntu.</li>
</ul>
<p>If you experience another kind of issue, please get in touch with us. We are always interested in improving our portability to new systems.</p>
<h3><a class="anchor" id="faq_trouble_distcheck"></a>
Dude! "ctest" fails on my machine!</h3>
<p>Don't assume we never run this target, because we do. Check <a href="http://cdash.inria.fr/CDash/index.php?project=Simgrid">http://cdash.inria.fr/CDash/index.php?project=Simgrid</a> (click on previous if there is no result for today: results are produced only by 11am, French time) and <a href="https://buildd.debian.org/status/logs.php?pkg=simgrid">https://buildd.debian.org/status/logs.php?pkg=simgrid</a> if you don't believe us.</p>
<p>If it's failing on your machine in a way not experienced by the autobuilders above, please drop us a mail on the mailing list so that we can check it out. Make sure to read <a class="el" href="FAQ.html#faq_bugrepport">So I've found a bug in SimGrid. How to report it?</a> before you do so.</p>
<h2><a class="anchor" id="faq_trouble_compil"></a>
User code compilation problems</h2>
<h3><a class="anchor" id="faq_trouble_err_logcat"></a>
"gcc: _simgrid_this_log_category_does_not_exist__??? undeclared (first use in this function)"</h3>
<p>This is because you are using the log mechanism, but you didn't create any default category in this file. You should refer to <a class="el" href="group__XBT__log.html">Logging support</a> for all the details, but you simply forgot to call one of <a class="el" href="group__XBT__log.html#ga5094a0e812d0012e6ee4d2257b1a13f1">XBT_LOG_NEW_DEFAULT_CATEGORY()</a> or <a class="el" href="group__XBT__log.html#ga8a4327fc994afcfb2eaebea0c4d1b00a">XBT_LOG_NEW_DEFAULT_SUBCATEGORY()</a>.</p>
<h3><a class="anchor" id="faq_trouble_pthreadstatic"></a>
"gcc: undefined reference to pthread_key_create"</h3>
<p>This indicates that one of the libraries SimGrid depends on (libpthread here) was missing from the linker command line. Dependencies of libsimgrid are expressed directly in the dynamic library, so it is very unlikely that you see this message when linking dynamically.</p>
<p>If you compile your code statically (and if you use a pthread version of SimGrid – see <a class="el" href="FAQ.html#faq_more_processes">Increasing the amount of simulated processes</a>), you must absolutely specify <code>-lpthread</code> on the linker command line. As usual, this should come after <code>-lsimgrid</code> on this command line.</p>
<h3><a class="anchor" id="faq_trouble_lib_msg_deprecated"></a>
"gcc: undefined reference to MSG_*"</h3>
<p>Since version 3.7, the whole m_channel_t mechanism is deprecated, so the functions related to it may be removed in future releases.</p>
<p>List of functions:</p>
<ul>
<li>XBT_PUBLIC(int) <a class="el" href="group__m__host__management.html#gaafceb2773bb9e39878592ff9a9a81a01" title="Return the current number MSG hosts.">MSG_get_host_number(void)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(m_host_t *) MSG_get_host_table(void);</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) <a class="el" href="group__msg__deprecated__functions.html#ga4abdb2c7da1b9d89187ce51211db3f25" title="Return the last value returned by a MSG function (except MSG_get_errno...).">MSG_get_errno(void)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) <a class="el" href="group__msg__deprecated__functions.html#ga1ae6d2a80ca4f35d7a19215f409ce00f" title="Listen on a channel and wait for receiving a task.">MSG_task_get(m_task_t * task, m_channel_t channel)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) <a class="el" href="group__msg__deprecated__functions.html#ga48bf7268e2f079ba1ad51a00209028cd" title="Listen on a channel and wait for receiving a task with a timeout.">MSG_task_get_with_timeout(m_task_t * task, m_channel_t channel, double max_duration)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) <a class="el" href="group__msg__deprecated__functions.html#gad09e64171486db6b0004653125313971" title="Listen on channel and waits for receiving a task from host.">MSG_task_get_from_host(m_task_t * task, int channel, m_host_t host)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) MSG_task_get_ext(m_task_t * task, int channel, double max_duration, m_host_t host);</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) <a class="el" href="group__msg__deprecated__functions.html#ga3e310a2bf7efc32d3aac8dad2eacd71d" title="Put a task on a channel of an host and waits for the end of the transmission.">MSG_task_put(m_task_t task, m_host_t dest, m_channel_t channel)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) <a class="el" href="group__msg__deprecated__functions.html#ga6e5db824a4317c3a1279eeaa79b6511d" title="Does exactly the same as MSG_task_put but with a bounded transmition rate.">MSG_task_put_bounded(m_task_t task, m_host_t dest, m_channel_t channel, double max_rate)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) <a class="el" href="group__msg__deprecated__functions.html#gac979ed68d1b602dd04169cbed23828a9" title="Put a task on a channel of an host (with a timeout on the waiting of the destination host) and waits ...">MSG_task_put_with_timeout(m_task_t task, m_host_t dest, m_channel_t channel, double max_duration)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(int) <a class="el" href="group__msg__deprecated__functions.html#gae72af17e16cda4efcfad301a4170accd" title="Test whether there is a pending communication on a channel.">MSG_task_Iprobe(m_channel_t channel)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(int) <a class="el" href="group__msg__deprecated__functions.html#ga99bfe9edc985c6d2cce3947f81780660" title="Test whether there is a pending communication on a channel, and who sent it.">MSG_task_probe_from(m_channel_t channel)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(int) <a class="el" href="group__msg__deprecated__functions.html#gaa44c47c7ed2f78243eca7379437e1cc0" title="Return the number of tasks waiting to be received on a channel and sent by host.">MSG_task_probe_from_host(int channel, m_host_t host)</a>;</li>
</ul>
<ul>
<li>XBT_PUBLIC(MSG_error_t) MSG_set_channel_number(int number);</li>
</ul>
<ul>
<li>XBT_PUBLIC(int) MSG_get_channel_number(void);</li>
</ul>
<p>If you want them, you have to compile SimGrid v3.7 with the option "-Denable_msg_deprecated=ON". Using them should print a warning telling you which new function to use instead.</p>
<h2><a class="anchor" id="faq_trouble_errors"></a>
Runtime error messages</h2>
<h3><a class="anchor" id="faq_flexml_limit"></a>
"surf_parse_lex: Assertion `next limit' failed."</h3>
<p>This is because your platform file is too big for the parser.</p>
<p>Actually, the message comes directly from FleXML, the technology on top of which the parser is built. FleXML has the bad idea of fetching the whole document in memory before parsing it. And moreover, the memory buffer size must be determined at compilation time.</p>
<p>We use a value which seems big enough for our needs without bloating the simulators' footprint. But of course your mileage may vary. In this case, just edit src/surf/surfxml.l and modify the definition of FLEXML_BUFFERSTACKSIZE. E.g.</p>
<pre class="fragment">#define FLEXML_BUFFERSTACKSIZE 1000000000
</pre><p>Then recompile and everything should be fine, provided that your version of Flex is recent enough (>= 2.5.31). If not, the compilation process should warn you.</p>
<p>A while ago, we worked on FleXML to reduce its memory consumption a bit, but these issues remain. There are two things we should do:</p>
<ul>
<li>use a dynamic buffer instead of a static one so that the only limit becomes your memory, not a stupid constant fixed at compilation time (maybe not so difficult).</li>
<li>change the parser so that it does not need to get the whole file in memory before parsing (seems quite difficult, but I'm a complete newbie wrt flex stuff).</li>
</ul>
<p>These are changes to FleXML itself, not SimGrid. But since we kinda hijacked the development of FleXML, I can assure you that any patches would be really welcome and quickly integrated.</p>
<p><b>Update:</b> A new version of FleXML (1.7) was released. Most of the work was done by William Dowling, who uses it in his own work. The good point is that it now uses a dynamic buffer, and that the memory usage was greatly improved. The downside is that William also changed some things internally, and this breaks the hack we devised to bypass the parser, as explained in <a class="el" href="platform.html#pf_flexml_bypassing">Bypassing the XML parser with your own C functions</a>. Indeed, this is not a classical usage of the parser, and Will didn't imagine that we may have used (and even documented) such a crude usage of FleXML. So, we now have to repair the bypassing functionality to use the latest FleXML version and fix the memory usage in SimGrid.</p>
<h3><a class="anchor" id="faq_trouble_errors_big_fat_warning"></a>
I'm told that my XML files are too old.</h3>
<p>The format of the XML platform description files is sometimes improved. For example, we decided to change the units used in SimGrid from MBytes, MFlops and seconds to Bytes, Flops and seconds to ease people exchanging small messages. We also reworked the route descriptions to allow more compact descriptions.</p>
<p>That is why the XML files are versioned using the 'version' attribute of the root tag. Currently, it should read: </p>
<pre class="fragment"> <platform version="2">
</pre><p>If your files are too old, you can use the simgrid_update_xml.pl script which can be found in the tools directory of the archive.</p>
<h2><a class="anchor" id="faq_trouble_valgrind"></a>
Valgrind-related and other debugger issues</h2>
<p>If you don't already, you really should use valgrind to debug your code; it's almost magic.</p>
<h3><a class="anchor" id="faq_trouble_vg_longjmp"></a>
longjmp madness in valgrind</h3>
<p>This is when valgrind starts complaining about longjmp things, just like:</p>
<pre class="fragment">==21434== Conditional jump or move depends on uninitialised value(s)
==21434== at 0x420DBE5: longjmp (longjmp.c:33)
==21434==
==21434== Use of uninitialised value of size 4
==21434== at 0x420DC3A: __longjmp (__longjmp.S:48)
</pre><p>This is the sign that you didn't use the exception mechanism properly. Most probably, you have a <code>return;</code> somewhere within a <code>TRY{}</code> block. This is <b>evil</b>, and you must not do this. Did you read the section about <a class="el" href="group__XBT__ex.html">Exception support</a>?</p>
<h3><a class="anchor" id="faq_trouble_vg_libc"></a>
Valgrind spits tons of errors about backtraces!</h3>
<p>It may happen that valgrind, the memory debugger beloved by any decent C programmer, spits tons of warnings like the following:</p>
<pre class="fragment">==8414== Conditional jump or move depends on uninitialised value(s)
==8414== at 0x400882D: (within /lib/ld-2.3.6.so)
==8414== by 0x414EDE9: (within /lib/tls/i686/cmov/libc-2.3.6.so)
==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
==8414== by 0x414F937: _dl_open (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414== by 0x4150F4C: (within /lib/tls/i686/cmov/libc-2.3.6.so)
==8414== by 0x400B105: (within /lib/ld-2.3.6.so)
==8414== by 0x415102D: __libc_dlopen_mode (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414== by 0x412D6B9: backtrace (in /lib/tls/i686/cmov/libc-2.3.6.so)
==8414== by 0x8076446: xbt_dictelm_get_ext (dict_elm.c:714)
==8414== by 0x80764C1: xbt_dictelm_get (dict_elm.c:732)
==8414== by 0x8079010: xbt_cfg_register (config.c:208)
==8414== by 0x806821B: MSG_config (msg_config.c:42)
</pre><p>This problem lies somewhere in the libc when using backtraces, and there is very little we can do ourselves to fix it. Instead, here is how to tell valgrind to ignore the error. Add the following to your ~/.valgrind.supp (or create this file if needed). Make sure to change the obj line to match your system (change 2.3.6 to the actual version you are using, which you can retrieve with a simple "ls /lib/ld*.so").</p>
<pre class="fragment">{
name: Backtrace madness
Memcheck:Cond
obj:/lib/ld-2.3.6.so
fun:dl_open_worker
fun:_dl_open
fun:do_dlopen
fun:dlerror_run
fun:__libc_dlopen_mode
}</pre><p>Then, you have to tell valgrind to use this suppression file by passing the <code>--suppressions=$HOME/.valgrind.supp</code> option on the command line. You can also add the following to your ~/.bashrc so that it gets passed automatically. Actually, it passes a few more options to valgrind, which happen to be my personal settings. Check the valgrind documentation for more information.</p>
<pre class="fragment">export VALGRIND_OPTS="--leak-check=yes --leak-resolution=high --num-callers=40 --tool=memcheck --suppressions=$HOME/.valgrind.supp" </pre><h3><a class="anchor" id="faq_trouble_backtraces"></a>
Truncated backtraces</h3>
<p>When debugging SimGrid, it's easier to pass the --disable-compiler-optimization flag to the configure script if valgrind or gdb get fooled by the optimizations done by the compiler. But you should remove this flag once everything works, before going into production (before launching your 1252135 experiments), or everything will run at only half of SimGrid's true potential.</p>
<h2><a class="anchor" id="faq_deadlock"></a>
There is a deadlock in my code!!!</h2>
<p>Unfortunately, we cannot debug every piece of code written with SimGrid. We furthermore believe that the framework provides enough information for you to debug such problems yourself. If the textual output is not enough, make sure to check the <a class="el" href="FAQ.html#faq_visualization">Visualizing and analyzing the results</a> FAQ entry to see how to get a graphical one.</p>
<p>Now, if you come up with a really simple example that deadlocks and you're absolutely convinced that it should not, you can ask on the list. Just be aware that you'll be severely punished if the mistake is on your side... We have plenty of FAQ entries to write and new features to implement for the impenitents! ;)</p>
<h2><a class="anchor" id="faq_surf_network_latency"></a>
I get weird timings when I play with the latencies.</h2>
<p>OK, first of all, remember that units should be Bytes, Flops and Seconds. If you don't use such units, some SimGrid constants (e.g. the SG_TCP_CTE_GAMMA constant used in most network models) won't have the right unit and you'll end up with weird results.</p>
<p>Here is what happens with a single transfer of size L on a link (bw,lat) when nothing else happens.</p>
<pre class="fragment">0-----lat--------------------------------------------------t
|-----|**** real_bw =min(bw,SG_TCP_CTE_GAMMA/(2*lat)) *****|
</pre><p>In more complex situations, this min is the solution of a complex max-min linear system. Have a look <a href="http://lists.gforge.inria.fr/pipermail/simgrid-devel/2006-April/thread.html">here</a> and read the two threads "Bug in SURF?" and "Surf bug not
fixed?". You'll have a few other examples of such computations. You can also read "A Network Model for Simulation of Grid Application" by Henri Casanova and Loris Marchal to have all the details. The fact that the real_bw is smaller than bw is easy to understand. The fact that real_bw is smaller than SG_TCP_CTE_GAMMA/(2*lat) is due to the window-based congestion mechanism of TCP. With TCP, you can't exploit your huge network capacity if you don't have a good round-trip-time because of the acks...</p>
<p>Anyway, what you get is t=lat + L/min(bw,SG_TCP_CTE_GAMMA/(2*lat)).</p>
<pre class="fragment">if you set (bw,lat)=(100 000 000, 0.00001), you get t = 1.00001 (you fully use your link)
if you set (bw,lat)=(100 000 000, 0.0001),  you get t = 1.0001  (you're on the limit)
if you set (bw,lat)=(100 000 000, 0.001),   you get t = 10.001  (ouch!)
</pre>
<p>This bound on the effective bandwidth of a flow is not the only thing that may make your result be unexpected. For example, two flows competing on a saturated link receive an amount of bandwidth inversely proportional to their round trip time.</p>
<h2><a class="anchor" id="faq_bugrepport"></a>
So I've found a bug in SimGrid. How to report it?</h2>
<p>We do our best to hammer away any bugs in SimGrid, but this is still an academic project, so please be patient if/when you find bugs in it. If you do, the best solution is to drop an email to either the simgrid-user or the simgrid-devel mailing list and explain the issue to us. You can also decide to open a formal bug report using the <a href="https://gforge.inria.fr/tracker/?atid=165&group_id=12&func=browse">relevant interface</a>. You need to log in on the server to be able to submit bugs.</p>
<p>We will do our best to solve any problem reported, but you need to help us find the issue. Just telling "it segfaults" isn't enough. Telling "It segfaults when running the attached simulator" doesn't really help either. You may find the following article interesting to see how to write informative bug reports: <a href="http://www.chiark.greenend.org.uk/~sgtatham/bugs.html">http://www.chiark.greenend.org.uk/~sgtatham/bugs.html</a> (it is not SimGrid specific at all, but it's full of good advice).</p>
<dl class="section author"><dt>Author</dt><dd>Da SimGrid team <a href="#" onclick="location.href='mai'+'lto:'+'sim'+'gr'+'id-'+'de'+'vel'+'@l'+'ist'+'s.'+'gfo'+'rg'+'e.i'+'nr'+'ia.'+'fr'; return false;">simgr<span style="display: none;">.nosp@m.</span>id-d<span style="display: none;">.nosp@m.</span>evel@<span style="display: none;">.nosp@m.</span>list<span style="display: none;">.nosp@m.</span>s.gfo<span style="display: none;">.nosp@m.</span>rge.<span style="display: none;">.nosp@m.</span>inria<span style="display: none;">.nosp@m.</span>.fr</a> </dd></dl>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
<li class="footer">Generated on Sun Nov 17 2013 21:34:47 for SimGrid by
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.1.2 </li>
</ul>
</div>
</body>
</html>