/usr/share/openmpi/help-mpi-btl-openib.txt is in openmpi-common 1.4.3-2.1ubuntu3.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
# -*- text -*-
#
# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
# University Research and Technology
# Corporation. All rights reserved.
# Copyright (c) 2004-2005 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
# University of Stuttgart. All rights reserved.
# Copyright (c) 2004-2006 The Regents of the University of California.
# All rights reserved.
# Copyright (c) 2006-2008 Cisco Systems, Inc. All rights reserved.
# Copyright (c) 2007-2008 Mellanox Technologies. All rights reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#
# This is the US/English help file for Open MPI's OpenFabrics support
# (the openib BTL).
#
[ini file:file not found]
The Open MPI OpenFabrics (openib) BTL component was unable to find or
read an INI file that was requested via the
btl_openib_device_param_files MCA parameter. Please check this file
and/or modify the btl_openib_device_param_files MCA parameter:
%s
#
[ini file:not in a section]
In parsing the OpenFabrics (openib) BTL parameter file, values were
found that were not in a valid INI section. These values will be
ignored. Please re-check this file:
%s
At line %d, near the following text:
%s
#
[ini file:unexpected token]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored). Please re-check this file:
%s
At line %d, near the following text:
%s
#
[ini file:expected equals]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored). An equals sign ("=") was expected but was not found.
Please re-check this file:
%s
At line %d, near the following text:
%s
#
[ini file:expected newline]
In parsing the OpenFabrics (openib) BTL parameter file, unexpected
tokens were found (this may cause significant portions of the INI file
to be ignored). A newline was expected but was not found. Please
re-check this file:
%s
At line %d, near the following text:
%s
#
[ini file:unknown field]
In parsing the OpenFabrics (openib) BTL parameter file, an
unrecognized field name was found. Please re-check this file:
%s
At line %d, the field named:
%s
This field, and any other unrecognized fields, will be skipped.
#
[no device params found]
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: %s
Device name: %s
Device vendor ID: 0x%04x
Device vendor part ID: %d
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
#
[init-fail-no-mem]
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically indicates that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occurred
here:
Local host: %s
OMPI source: %s:%d
Function: %s()
Device: %s
Memlock limit: %s
You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
#
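A quick way to see the locked-memory limit that a process actually runs under is to query it from inside that process. The following is a minimal sketch, assuming a Unix-like host where Python's standard `resource` module is available; the helper name `describe_memlock_limit` is hypothetical and is not part of Open MPI. Note that the limit seen by an interactive shell may differ from the one seen by processes started by a resource manager or other daemon, so the check is most useful when run the same way the MPI job is launched.

```python
import resource


def describe_memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK values as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def fmt(value):
        # RLIM_INFINITY is how the OS reports an "unlimited" setting.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value} bytes"

    return fmt(soft), fmt(hard)


soft_limit, hard_limit = describe_memlock_limit()
print(f"memlock soft limit: {soft_limit}")
print(f"memlock hard limit: {hard_limit}")
```

If either value is reported in bytes rather than as "unlimited", that is a reasonable first thing to raise with the system administrator.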
[init-fail-create-q]
The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue. This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(e.g., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?). The failure occurred here:
Local host: %s
OMPI source: %s:%d
Function: %s()
Error: %s (errno=%d)
Device: %s
You may need to consult with your system administrator to get this
problem fixed.
#
[pp rnr retry exceeded]
The OpenFabrics "receiver not ready" retry count on a per-peer
connection between two MPI processes has been exceeded. In general,
this should not happen because Open MPI uses flow control on per-peer
connections to ensure that receivers are always ready when data is
sent.
This error usually means one of two things:
1. There is something awry within the network fabric itself.
2. A bug in Open MPI has caused flow control to malfunction.
#1 is usually more likely. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.
Below is some information about the host that raised the error and the
peer to which it was connected:
Local host: %s
Local device: %s
Peer host: %s
You may need to consult with your system administrator to get this
problem fixed.
#
[srq rnr retry exceeded]
The OpenFabrics "receiver not ready" retry count on a shared receive
queue or XRC receive queue has been exceeded. This error can occur if
the mca_btl_openib_ib_rnr_retry is set to a value less than 7 (where 7
is the default value and effectively means "infinite retry"). If your
rnr_retry value is 7, there might be something awry within the network
fabric itself. In this case, you should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.
Below is some information about the host that raised the error and the
peer to which it was connected:
Local host: %s
Local device: %s
Peer host: %s
You may need to consult with your system administrator to get this
problem fixed.
#
[pp retry exceeded]
The InfiniBand retry count between two MPI processes has been
exceeded. "Retry count" is defined in the InfiniBand spec 1.2
(section 12.7.38):
The total number of times that the sender wishes the receiver to
retry timeout, packet sequence, etc. errors before posting a
completion error.
This error typically means that there is something awry within the
InfiniBand fabric itself. You should note the hosts on which this
error has occurred; it has been observed that rebooting or removing a
particular host from the job can sometimes resolve this issue.
Two MCA parameters can be used to control Open MPI's behavior with
respect to the retry count:
* btl_openib_ib_retry_count - The number of times the sender will
attempt to retry (defaulted to 7, the maximum value).
* btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
to 10). The actual timeout value used is calculated as:
4.096 microseconds * (2^btl_openib_ib_timeout)
See the InfiniBand spec 1.2 (section 12.7.34) for more details.
Below is some information about the host that raised the error and the
peer to which it was connected:
Local host: %s
Local device: %s
Peer host: %s
You may need to consult with your system administrator to get this
problem fixed.
#
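The timeout formula in the message above, 4.096 microseconds * (2^btl_openib_ib_timeout), can be evaluated directly to see what a given setting means in wall-clock terms. This is a small worked sketch; the function name `local_ack_timeout_seconds` is hypothetical and only restates the formula from the message.

```python
def local_ack_timeout_seconds(ib_timeout: int) -> float:
    """Evaluate 4.096 microseconds * 2**ib_timeout and return seconds."""
    if ib_timeout < 0:
        raise ValueError("ib_timeout must be non-negative")
    return 4.096e-6 * (2 ** ib_timeout)


# Show how quickly the timeout grows with the exponent.
for value in (10, 14, 20):
    seconds = local_ack_timeout_seconds(value)
    print(f"btl_openib_ib_timeout={value}: {seconds * 1e3:.3f} ms")
```

With the default of 10 stated in the message, the formula gives 4.096 us * 1024, or roughly 4.19 milliseconds per attempt; each increment of the parameter doubles that figure.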
[no active ports found]
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them). This
is most certainly not what you wanted. Check your cables, subnet
manager configuration, etc. The openib BTL will be ignored for this
job.
Local host: %s
#
[error in device init]
WARNING: There was an error initializing an OpenFabrics device.
Local host: %s
Local device: %s
#
[no devices right type]
WARNING: No OpenFabrics devices of the right type were found within
the requested bus distance. The OpenFabrics BTL will be ignored for
this run.
Local host: %s
Requested type: %s
If the "requested type" is "<any>", this usually means that *no*
OpenFabrics devices were found within the requested bus distance.
#
[default subnet prefix]
WARNING: There is more than one active port on host '%s', but the
default subnet GID prefix was detected on more than one of these
ports. If these ports are connected to different physical IB
networks, this configuration will fail in Open MPI. This version of
Open MPI requires that every physically separate IB subnet that is
used between connected MPI processes have a different subnet ID
value.
Please see this FAQ entry for more details:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_default_gid_prefix to 0.
#
[ibv_fork requested but not supported]
WARNING: fork() support was requested for the OpenFabrics (openib)
BTL, but it is not supported on the host %s. Deactivating the
OpenFabrics BTL.
#
[ibv_fork_init fail]
WARNING: fork() support was requested for the OpenFabrics (openib)
BTL, but the library call ibv_fork_init() failed on the host %s.
Deactivating the OpenFabrics BTL.
#
[wrong buffer alignment]
Wrong buffer alignment %d configured on host '%s'. The alignment must
be greater than zero and a power of two. Using the default %d instead.
#
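The alignment rule in the message above (greater than zero and a power of two) can be checked with a single bit test. The sketch below is illustrative only: both function names are hypothetical, and the fallback value is passed in by the caller because the message does not state what the default is.

```python
def is_valid_alignment(value: int) -> bool:
    """True if value is greater than zero and a power of two."""
    # A power of two has exactly one bit set, so value & (value - 1) is 0.
    return value > 0 and (value & (value - 1)) == 0


def effective_alignment(configured: int, default: int) -> int:
    """Return the configured alignment, or the default if it is invalid."""
    return configured if is_valid_alignment(configured) else default


for candidate in (0, 48, 64):
    print(candidate, is_valid_alignment(candidate))
```

Here 64 passes, while 0 and 48 would trigger the fallback to the default.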
[of error event]
The OpenFabrics stack has reported a network error event. Open MPI
will try to continue, but your job may end up failing.
Local host: %s
MPI process PID: %d
Error number: %d (%s)
This error may indicate connectivity problems within the fabric;
please contact your system administrator.
#
[of unknown event]
The OpenFabrics stack has reported an unknown network error event.
Open MPI will try to continue, but the job may end up failing.
Local host: %s
MPI process PID: %d
Error number: %d
This error may indicate that you are using an OpenFabrics library
version that is not currently supported by Open MPI. You might try
recompiling Open MPI against your OpenFabrics library installation to
get more information.
#
[specified include and exclude]
ERROR: You have specified more than one of the btl_openib_if_include,
btl_openib_if_exclude, btl_openib_ipaddr_include, or btl_openib_ipaddr_exclude
MCA parameters. These four parameters are mutually exclusive; you can only
specify one.
For reference, the values that you specified are:
btl_openib_if_include: %s
btl_openib_if_exclude: %s
btl_openib_ipaddr_include: %s
btl_openib_ipaddr_exclude: %s
#
[nonexistent port]
WARNING: One or more nonexistent OpenFabrics devices/ports were
specified:
Host: %s
MCA parameter: mca_btl_if_%sclude
Nonexistent entities: %s
These entities will be ignored. You can disable this warning by
setting the btl_openib_warn_nonexistent_if MCA parameter to 0.
#
[invalid mca param value]
WARNING: An invalid MCA parameter value was found for the OpenFabrics
(openib) BTL.
Problem: %s
Resolution: %s
#
[no qps in receive_queues]
WARNING: No queue pairs were defined in the btl_openib_receive_queues
MCA parameter. At least one queue pair must be defined. The
OpenFabrics (openib) BTL will therefore be deactivated for this run.
Local host: %s
#
[invalid qp type in receive_queues]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.
Valid queue pair types are "P" for per-peer and "S" for shared receive
queue.
Local host: %s
btl_openib_receive_queues: %s
Bad specification: %s
#
[invalid pp qp specification]
WARNING: An invalid per-peer receive queue specification was detected
as part of the btl_openib_receive_queues MCA parameter. The
OpenFabrics (openib) BTL will therefore be deactivated for this run.
Per-peer receive queues require between 2 and 5 parameters:
1. Buffer size in bytes (mandatory)
2. Number of buffers (mandatory)
3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
4. Credit window size (optional; defaults to (low_watermark / 2))
5. Number of buffers reserved for credit messages (optional;
defaults to (num_buffers*2-1)/credit_window)
Example: P,128,256,128,16
- 128 byte buffers
- 256 buffers to receive incoming MPI messages
- When the number of available buffers reaches 128, re-post 128 more
buffers to reach a total of 256
- If the number of available credits reaches 16, send an explicit
credit message to the sender
- Defaulting to ((256 * 2) - 1) / 16 = 31; this many buffers are
reserved for explicit credit messages
Local host: %s
Bad queue specification: %s
#
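The defaulting rules for per-peer queues listed in the message above can be worked through in a few lines. This is a minimal sketch and not Open MPI's parser: the function name and the dictionary keys are hypothetical, integer division is assumed (it reproduces the 31 in the message's own example), and rejecting a zero credit window is an added safeguard for the sketch.

```python
def parse_pp_spec(spec: str) -> dict:
    """Parse a per-peer spec such as 'P,128,256,128,16' and fill defaults."""
    fields = spec.split(",")
    if fields[0] != "P":
        raise ValueError(f"not a per-peer specification: {spec!r}")
    values = [int(field) for field in fields[1:]]
    if not 2 <= len(values) <= 5:
        raise ValueError("per-peer queues take between 2 and 5 parameters")

    size, num = values[0], values[1]
    # Optional fields fall back to the defaults described in the message.
    low = values[2] if len(values) > 2 else num // 2
    window = values[3] if len(values) > 3 else low // 2
    if window <= 0:
        # Assumption of this sketch: avoid dividing by a zero window.
        raise ValueError("credit window must be greater than zero")
    reserved = values[4] if len(values) > 4 else (num * 2 - 1) // window

    return {"size": size, "num": num, "low": low,
            "window": window, "reserved": reserved}


print(parse_pp_spec("P,128,256,128,16"))
print(parse_pp_spec("P,128,256"))
```

For the message's example, the reserved count comes out as ((256 * 2) - 1) / 16 = 31. With only the two mandatory fields, the same rules give a low watermark of 128, a credit window of 64, and 7 reserved buffers.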
[invalid srq specification]
WARNING: An invalid shared receive queue specification was detected as
part of the btl_openib_receive_queues MCA parameter. The OpenFabrics
(openib) BTL will therefore be deactivated for this run.
Shared receive queues can take between 2 and 4 parameters:
1. Buffer size in bytes (mandatory)
2. Number of buffers (mandatory)
3. Low buffer count watermark (optional; defaults to (num_buffers / 2))
4. Maximum number of outstanding sends a sender can have (optional;
defaults to (low_watermark / 4))
Example: S,1024,256,128,32
- 1024 byte buffers
- 256 buffers to receive incoming MPI messages
- When the number of available buffers reaches 128, re-post 128 more
buffers to reach a total of 256
- A sender will not send to a peer unless it has fewer than 32
outstanding sends to that peer.
Local host: %s
Bad queue specification: %s
#
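The shared receive queue rules in the message above follow the same pattern as the per-peer ones, with a different fourth field. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and integer division is assumed for the defaults.

```python
def parse_srq_spec(spec: str) -> dict:
    """Parse an SRQ spec such as 'S,1024,256,128,32' and fill defaults."""
    fields = spec.split(",")
    if fields[0] != "S":
        raise ValueError(f"not a shared receive queue specification: {spec!r}")
    values = [int(field) for field in fields[1:]]
    if not 2 <= len(values) <= 4:
        raise ValueError("shared receive queues take between 2 and 4 parameters")

    size, num = values[0], values[1]
    # Optional fields fall back to the defaults described in the message.
    low = values[2] if len(values) > 2 else num // 2
    max_sends = values[3] if len(values) > 3 else low // 4

    return {"size": size, "num": num, "low": low, "max_sends": max_sends}


print(parse_srq_spec("S,1024,256,128,32"))
print(parse_srq_spec("S,1024,256"))
```

Both calls produce the same result, because the message's example values for the low watermark (128) and outstanding sends (32) happen to match the defaults derived from 256 buffers.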
[rd_num must be > rd_low]
WARNING: The number of buffers for a queue pair specified via the
btl_openib_receive_queues MCA parameter must be greater than the low
buffer count watermark. The OpenFabrics (openib) BTL will therefore
be deactivated for this run.
Local host: %s
Bad queue specification: %s
#
[biggest qp size is too small]
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment. The OpenFabrics (openib) BTL will therefore be
deactivated for this run.
Local host: %s
Largest buffer size: %d
Maximum send fragment size: %d
#
[biggest qp size is too big]
WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is larger than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter). This
means that memory will be wasted because the largest possible incoming
message fragment will not fill a buffer allocated for incoming
fragments.
Local host: %s
Largest buffer size: %d
Maximum send fragment size: %d
#
[freelist too small]
WARNING: The maximum freelist size that was specified was too small
for the requested receive queue sizes. The maximum freelist size must
be at least equal to the sum of the largest number of buffers posted
to a single queue plus the corresponding number of reserved/credit
buffers for that queue. It is suggested that the maximum be quite a
bit larger than this for performance reasons.
Local host: %s
Specified freelist size: %d
Minimum required freelist size: %d
#
[XRC with PP or SRQ]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.
Note that XRC ("X") queue pairs cannot be used with per-peer ("P") and
SRQ ("S") queue pairs. This restriction may be removed in future
versions of Open MPI.
Local host: %s
btl_openib_receive_queues: %s
#
[XRC with BTLs per LID]
WARNING: An invalid queue pair type was specified in the
btl_openib_receive_queues MCA parameter. The OpenFabrics (openib) BTL
will be deactivated for this run.
XRC ("X") queue pairs cannot be used when (btls_per_lid > 1). This
restriction may be removed in future versions of Open MPI.
Local host: %s
btl_openib_receive_queues: %s
btls_per_lid: %d
#
[XRC on device without XRC support]
WARNING: You configured the OpenFabrics (openib) BTL to run with %d
XRC queues. The device %s does not have XRC capabilities; the
OpenFabrics btl will ignore this device. If no devices are found with
XRC capabilities, the OpenFabrics BTL will be disabled.
Local host: %s
#
[No XRC support]
WARNING: The Open MPI build was compiled without XRC support, but XRC
("X") queues were specified in the btl_openib_receive_queues MCA
parameter. The OpenFabrics (openib) BTL will therefore be deactivated
for this run.
Local host: %s
btl_openib_receive_queues: %s
#
[XRC with wrong OOB]
WARNING: You must use the "xoob" OpenFabrics (openib) connection
manager when XRC ("X") queues are specified in the
btl_openib_receive_queues MCA parameter. Either remove the X queues
from btl_openib_receive_queues or select the "xoob" connection
manager by setting btl_openib_connect to "xoob". The OpenFabrics BTL
will therefore be deactivated for this run.
Local host: %s
Number of XRC RQs: %d
#
[SRQ or PP with wrong OOB]
WARNING: You must use the "oob" OpenFabrics (openib) connection
manager when PP ("P") or SRQ ("S") queues are specified in the
btl_openib_receive_queues MCA parameter. Either remove the P or S
queues from btl_openib_receive_queues or select the "oob"
connection manager by setting btl_openib_connect to "oob". The
OpenFabrics BTL will therefore be deactivated for this run.
Local host: %s
Number of SRQs: %d
Number of PPRQs: %d
#
[non optimal rd_win]
WARNING: The rd_win specification is non-optimal. For maximum performance it is
advisable to configure rd_win greater than (rd_num - rd_low), but currently
rd_win = %d and (rd_num - rd_low) = %d.
#
[apm without lmc]
WARNING: You can't enable APM support with LMC bit configured to 0.
APM support will be disabled.
#
[apm with wrong lmc]
Cannot provide %d alternative paths with LMC bit configured to %d.
#
[apm not enough ports]
WARNING: APM over ports requires at least 2 active ports, but only a
single active port was found. Disabling APM over ports.
#
[locally conflicting receive_queues]
Open MPI detected two devices on a single server that have different
"receive_queues" parameter values (in the openib BTL). Open MPI
currently only supports one OpenFabrics receive_queues value in an MPI
job, even if you have different types of OpenFabrics adapters on the
same host.
Device 2 (in the details shown below) will be ignored for the duration
of this MPI job.
You can fix this issue by one or more of the following:
1. Set the MCA parameter btl_openib_receive_queues to a value that
is usable by all the OpenFabrics devices that you will use.
2. Use the btl_openib_if_include or btl_openib_if_exclude MCA
parameters to select exactly which OpenFabrics devices to use in
your MPI job.
Finally, note that the "receive_queues" values may have been set by
the Open MPI device default settings file. You may want to look in
this file and see if your devices are getting receive_queues values
from this file:
%s/mca-btl-openib-device-params.ini
Here is more detailed information about the receive_queues value
conflict:
Local host: %s
Device 1: %s (vendor 0x%x, part ID %d)
Receive queues: %s
Device 2: %s (vendor 0x%x, part ID %d)
Receive queues: %s
#
[eager RDMA and progress threads]
WARNING: The openib BTL was directed to use "eager RDMA" for short
messages, but the openib BTL was compiled with progress threads
support. Short eager RDMA is not yet supported with progress threads;
its use has been disabled in this job.
This is a warning only; your job will attempt to continue.
#
[ptmalloc2 with no threads]
WARNING: It appears that ptmalloc2 was compiled into this process via
-lopenmpi-malloc, but there is no thread support. This combination is
known to cause memory corruption in the openib BTL. Open MPI is
therefore disabling the use of the openib BTL in this process for this
run.
Local host: %s
#
[cannot raise btl error]
The OpenFabrics driver in Open MPI tried to raise a fatal error, but
failed. Hopefully there was an error message before this one that
gave some more detailed information.
Local host: %s
Source file: %s
Source line: %d
Your job is now going to abort, sorry.
#
[no iwarp support]
Open MPI does not support iWARP devices with this version of OFED.
You need to upgrade to a later version of OFED (1.3 or later) for Open
MPI to support iWARP devices.
(This message is being displayed because you told Open MPI to use
iWARP devices via the btl_openib_device_type MCA parameter)
#
[unsupported queues configuration]
The Open MPI receive queue configurations for the OpenFabrics devices
on two nodes are incompatible, meaning that MPI processes on two
specific nodes were unable to communicate with each other. This
generally happens when you are using OpenFabrics devices from
different vendors on the same network. You should be able to use the
mca_btl_openib_receive_queues MCA parameter to set a uniform receive
queue configuration for all the devices in the MPI job, and therefore
be able to run successfully.
Local host: %s
Local adapter: %s (vendor 0x%x, part ID %d)
Local queues: %s
Remote host: %s
Remote adapter: (vendor 0x%x, part ID %d)
Remote queues: %s
#
[conflicting transport types]
Open MPI detected two different OpenFabrics transport types in the same InfiniBand network.
Such a mixed network transport configuration is not supported by Open MPI.
Local host: %s
Local adapter: %s (vendor 0x%x, part ID %d)
Local transport type: %s
Remote host: %s
Remote adapter: (vendor 0x%x, part ID %d)
Remote transport type: %s