/usr/share/pyshared/ZODB/ConflictResolution.txt is in python-zodb 1:3.9.7-2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 | ===================
Conflict Resolution
===================
Overview
========
Conflict resolution is a way to resolve transaction conflicts that would
otherwise abort a transaction. As such, it risks data integrity in order to
try to avoid throwing away potentially computationally expensive transactions.
The risk of harming data integrity should not be underestimated. Writing
conflict resolution code takes some responsibility for transactional
integrity away from the ZODB, and puts it in the hands of the developer
writing the conflict resolution code.
The current conflict resolution code is implemented with a storage mix-in
found in ZODB/ConflictResolution.py. The idea's proposal, and an explanation
of the interface, can be found here:
http://www.zope.org/Members/jim/ZODB/ApplicationLevelConflictResolution
Here is the most pertinent section, somewhat modified for this document's
use:
A new interface is proposed to allow object authors to provide a method
for resolving conflicts. When a conflict is detected, then the database
checks to see if the class of the object being saved defines the method,
_p_resolveConflict. If the method is defined, then the method is called
on the object. If the method succeeds, then the object change can be
committed, otherwise a ConflictError is raised as usual.
::
def _p_resolveConflict(oldState, savedState, newState):
Return the state of the object after resolving different changes.
Arguments:
oldState
The state of the object that the changes made by the current
transaction were based on.
The method is permitted to modify this value.
savedState
The state of the object that is currently stored in the
database. This state was written after oldState and reflects
changes made by a transaction that committed before the
current transaction.
The method is permitted to modify this value.
newState
The state after changes made by the current transaction.
The method is not permitted to modify this value.
This method should compute a new state by merging changes
reflected in savedState and newState, relative to oldState.
If the method cannot resolve the changes, then it should raise
ZODB.POSException.ConflictError.
Consider an extremely simple example, a counter::
>>> from persistent import Persistent
>>> class PCounter(Persistent):
... '`value` is readonly; increment it with `inc`.'
... _val = 0
... def inc(self):
... self._val += 1
... @property
... def value(self):
... return self._val
... def _p_resolveConflict(self, oldState, savedState, newState):
... oldState['_val'] = (
... savedState.get('_val', 0) +
... newState.get('_val', 0) -
... oldState.get('_val', 0))
... return oldState
...
By "state", the excerpt above means the value used by __getstate__ and
__setstate__: a dictionary, in most cases. We'll look at more details below,
but let's continue the example above with a simple successful resolution
story.
First we create a storage and a database, and put a PCounter in the database.
>>> import ZODB
>>> db = ZODB.DB('Data.fs')
>>> import transaction
>>> tm_A = transaction.TransactionManager()
>>> conn_A = db.open(transaction_manager=tm_A)
>>> p_A = conn_A.root()['p'] = PCounter()
>>> p_A.value
0
>>> tm_A.commit()
Now get another copy of 'p' so we can make a conflict. Think of `conn_A`
(connection A) as one thread, and `conn_B` (connection B) as a concurrent
thread. `p_A` is a view on the object in the first connection, and `p_B`
is a view on *the same persistent object* in the second connection.
>>> tm_B = transaction.TransactionManager()
>>> conn_B = db.open(transaction_manager=tm_B)
>>> p_B = conn_B.root()['p']
>>> p_B.value
0
>>> p_A._p_oid == p_B._p_oid
True
Now we can make a conflict, and see it resolved.
>>> p_A.inc()
>>> p_A.value
1
>>> p_B.inc()
>>> p_B.value
1
>>> tm_B.commit()
>>> p_B.value
1
>>> tm_A.commit()
>>> p_A.value
2
We need to synchronize connection B, in any of a variety of ways, to see the
change from connection A.
>>> p_B.value
1
>>> trans = tm_B.begin()
>>> p_B.value
2
A very similar class found in real world use is BTrees.Length.Length.
This conflict resolution approach is simple, yet powerful. However, it
has a few caveats and rough edges in practice. The simplicity, then, is
a bit of a disguise. Again, be warned, writing conflict resolution code
means that you claim significant responsibilty for your data integrity.
Because of the rough edges, the current conflict resolution approach is slated
for change (as of this writing, according to Jim Fulton, the ZODB
primary author and maintainer). Others have talked about different approaches
as well (see, for instance, http://www.python.org/~jeremy/weblog/031031c.html).
But for now, the _p_resolveConflict method is what we have.
Caveats and Dangers
===================
Here are caveats for working with this conflict resolution approach.
Each sub-section has a "DANGERS" section that outlines what might happen
if you ignore the warning. We work from the least danger to the most.
Conflict Resolution Is on the Server
------------------------------------
If you are using ZEO or ZRS, be aware that the classes for which you have
conflict resolution code *and* the classes of the non-persistent objects
they reference must be available to import by the *server* (or ZRS
primary).
DANGERS: You think you are going to get conflict resolution, but you won't.
Ignore `self`
-------------
Even though the _p_resolveConflict method has a "self", ignore it.
Don't change it. You make changes by returning the state. This is
effectively a class method.
DANGERS: The changes you make to the instance will be discarded. The
instance is not initialized, so other methods that depend on instance
attributes will not work.
Here's an example of a broken _p_resolveConflict method.
>>> class PCounter2(PCounter):
... def __init__(self):
... self.data = []
... def _p_resolveConflict(self, oldState, savedState, newState):
... self.data.append('bad idea')
... return super(PCounter2, self)._p_resolveConflict(
... oldState, savedState, newState)
...
Now we'll prepare for the conflict again.
>>> p2_A = conn_A.root()['p2'] = PCounter2()
>>> p2_A.value
0
>>> tm_A.commit()
>>> trans = tm_B.begin() # sync
>>> p2_B = conn_B.root()['p2']
>>> p2_B.value
0
>>> p2_A._p_oid == p2_B._p_oid
True
And now we will make a conflict.
>>> p2_A.inc()
>>> p2_A.value
1
>>> p2_B.inc()
>>> p2_B.value
1
>>> tm_B.commit()
>>> p2_B.value
1
>>> tm_A.commit() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ConflictError: database conflict error...
oops!
>>> tm_A.abort()
>>> p2_A.value
1
>>> trans = tm_B.begin()
>>> p2_B.value
1
Watch Out for Persistent Objects in the State
---------------------------------------------
If the object state has a reference to Persistent objects (instances
of classes that inherit from persistent.Persistent) then these references
*will not be loaded and are inaccessible*. Instead, persistent objects
in the state dictionary are ZODB.ConflictResolution.PersistentReference
instances. These objects have the following interface::
class IPersistentReference(zope.interface.Interface):
'''public contract for references to persistent objects from an object
with conflicts.'''
oid = zope.interface.Attribute(
'The oid of the persistent object that this reference represents')
database_name = zope.interface.Attribute(
'''The name of the database of the reference, *if* different.
If not different, None.''')
klass = zope.interface.Attribute(
'''class meta data. Presence is not reliable.''')
weak = zope.interface.Attribute(
'''bool: whether this reference is weak''')
def __cmp__(other):
'''if other is equivalent reference, return 0; else raise ValueError.
Equivalent in this case means that oid and database_name are the same.
If either is a weak reference, we only support `is` equivalence, and
otherwise raise a ValueError even if the datbase_names and oids are
the same, rather than guess at the correct semantics.
It is impossible to sort reliably, since the actual persistent
class may have its own comparison, and we have no idea what it is.
We assert that it is reasonably safe to assume that an object is
equivalent to itself, but that's as much as we can say.
We don't compare on 'is other', despite the
PersistentReferenceFactory.data cache, because it is possible to
have two references to the same object that are spelled with different
data (for instance, one with a class and one without).'''
So let's look at one of these. Let's assume we have three, `old`,
`saved`, and `new`, each representing a persistent reference to the same
object within a _p_resolveConflict call from the oldState, savedState,
and newState [#get_persistent_reference]_. They have an oid, `weak` is
False, and `database_name` is None. `klass` happens to be set but this is
not always the case.
>>> isinstance(new.oid, str)
True
>>> new.weak
False
>>> print new.database_name
None
>>> new.klass is PCounter
True
There are a few subtleties to highlight here. First, notice that the
database_name is only present if this is a cross-database reference
(see cross-database-references.txt in this directory, and examples
below). The database name and oid is sometimes a reasonable way to
reliably sort Persistent objects (see zope.app.keyreference, for
instance) but if your code compares one PersistentReference with a
database_name and another without, you need to refuse to give an answer
and raise an exception, because you can't know how the unknown
database_name sorts.
We already saw a persistent reference with a database_name of None. Now
let's suppose `new` is an example of a cross-database reference from a
database named '2' [#cross-database]_.
>>> new.database_name
'2'
As seen, the database_name is available for this cross-database reference,
and not for others. References to persistent objects, as defined in
seialize.py, have other variations, such as weak references, which are
handled but not discussed here [#instantiation_test]_
Second, notice the __cmp__ behavior [#cmp_test]_. This is new behavior
after ZODB 3.8 and addresses a serious problem for when persistent
objects are compared in an _p_resolveConflict, such as that in the ZODB
BTrees code. Prior to this change, it was not safe to use Persistent
objects as keys in a BTree. You needed to define a __cmp__ for them to
be sorted reliably out of the context of conflict resolution, but then
during conflict resolution the sorting would be arbitrary, on the basis
of the persistent reference's memory location. This could have lead to
inconsistent state for BTrees (or BTree module buckets or tree sets or sets).
Here's an example of how the new behavior stops potentially incorrect
resolution.
>>> import BTrees
>>> treeset_A = conn_A.root()['treeset'] = BTrees.family32.OI.TreeSet()
>>> tm_A.commit()
>>> trans = tm_B.begin() # sync
>>> treeset_B = conn_B.root()['treeset']
>>> treeset_A.insert(PCounter())
1
>>> treeset_B.insert(PCounter())
1
>>> tm_B.commit()
>>> tm_A.commit() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ConflictError: database conflict error...
>>> tm_A.abort()
Third, note that, even if the persistent object to which the reference refers
changes in the same transaction, the reference is still the same.
DANGERS: subtle and potentially serious. Beyond the two subtleties above,
which should now be addressed, there is a general problem for objects that
are composites of smaller persistent objects--for instance, a BTree, in
which the BTree and each bucket is a persistent object; or a
zc.queue.CompositePersistentQueue, which is a persistent queue of
persistent queues. Consider the following situation. It is actually solved,
but it is a concrete example of what might go wrong.
A BTree (persistent object) has a two buckets (persistent objects). The
second bucket has one persistent object in it. Concurrently, one thread
deletes the one object in the second bucket, which causes the BTree to dump
the bucket; and another thread puts an object in the second bucket. What
happens during conflict resolution? Remember, each persistent object cannot
see the other. From the perspective of the BTree object, it has no
conflicts: one transaction modified it, causing it to lose a bucket; and the
other transaction did not change it. From the perspective of the bucket,
one transaction deleted an object and the other added it: it will resolve
conflicts and say that the bucket has the new object and not the old one.
However, it will be garbage collected, and effectively the addition of the
new object will be lost.
As mentioned, this story is actually solved for BTrees. As
BTrees/MergeTemplate.c explains, whenever savedState or newState for a bucket
shows an empty bucket, the code refuses to resolve the conflict: this avoids
the situation above.
>>> bucket_A = conn_A.root()['bucket'] = BTrees.family32.II.Bucket()
>>> bucket_A[0] = 255
>>> tm_A.commit()
>>> trans = tm_B.begin() # sync
>>> bucket_B = conn_B.root()['bucket']
>>> bucket_B[1] = 254
>>> del bucket_A[0]
>>> tm_B.commit()
>>> tm_A.commit() # doctest: +ELLIPSIS
Traceback (most recent call last):
...
ConflictError: database conflict error...
>>> tm_A.abort()
However, the story highlights the kinds of subtle problems that units
made up of multiple composite Persistent objects need to contemplate.
Any structure made up of objects that contain persistent objects with
conflict resolution code, as a catalog index is made up of multiple
BTree Buckets and Sets, each with conflict resolution, needs to think
through these kinds of problems or be faced with potential data
integrity issues.
.. ......... ..
.. FOOTNOTES ..
.. ......... ..
.. [#get_persistent_reference] We'll catch persistent references with a class
mutable.
>>> class PCounter3(PCounter):
... data = []
... def _p_resolveConflict(self, oldState, savedState, newState):
... PCounter3.data.append(
... (oldState.get('other'),
... savedState.get('other'),
... newState.get('other')))
... return super(PCounter3, self)._p_resolveConflict(
... oldState, savedState, newState)
...
>>> p3_A = conn_A.root()['p3'] = PCounter3()
>>> p3_A.other = conn_A.root()['p']
>>> tm_A.commit()
>>> trans = tm_B.begin() # sync
>>> p3_B = conn_B.root()['p3']
>>> p3_A.inc()
>>> p3_B.inc()
>>> tm_B.commit()
>>> tm_A.commit()
>>> old, saved, new = PCounter3.data[-1]
.. [#cross-database] We need a whole different set of databases for this.
See cross-database-references.txt in this directory for a discussion of
what is going on here.
>>> databases = {}
>>> db1 = ZODB.DB('1', databases=databases, database_name='1')
>>> db2 = ZODB.DB('2', databases=databases, database_name='2')
>>> tm_multi_A = transaction.TransactionManager()
>>> conn_1A = db1.open(transaction_manager=tm_multi_A)
>>> conn_2A = conn_1A.get_connection('2')
>>> p4_1A = conn_1A.root()['p4'] = PCounter3()
>>> p5_2A = conn_2A.root()['p5'] = PCounter3()
>>> conn_2A.add(p5_2A)
>>> p4_1A.other = p5_2A
>>> tm_multi_A.commit()
>>> tm_multi_B = transaction.TransactionManager()
>>> conn_1B = db1.open(transaction_manager=tm_multi_B)
>>> p4_1B = conn_1B.root()['p4']
>>> p4_1A.inc()
>>> p4_1B.inc()
>>> tm_multi_B.commit()
>>> tm_multi_A.commit()
>>> old, saved, new = PCounter3.data[-1]
.. [#instantiation_test] We'll simply instantiate PersistentReferences
with examples of types described in ZODB/serialize.py.
>>> from ZODB.ConflictResolution import PersistentReference
>>> ref1 = PersistentReference('my_oid')
>>> ref1.oid
'my_oid'
>>> print ref1.klass
None
>>> print ref1.database_name
None
>>> ref1.weak
False
>>> ref2 = PersistentReference(('my_oid', 'my_class'))
>>> ref2.oid
'my_oid'
>>> ref2.klass
'my_class'
>>> print ref2.database_name
None
>>> ref2.weak
False
>>> ref3 = PersistentReference(['w', ('my_oid',)])
>>> ref3.oid
'my_oid'
>>> print ref3.klass
None
>>> print ref3.database_name
None
>>> ref3.weak
True
>>> ref3a = PersistentReference(['w', ('my_oid', 'other_db')])
>>> ref3a.oid
'my_oid'
>>> print ref3a.klass
None
>>> ref3a.database_name
'other_db'
>>> ref3a.weak
True
>>> ref4 = PersistentReference(['m', ('other_db', 'my_oid', 'my_class')])
>>> ref4.oid
'my_oid'
>>> ref4.klass
'my_class'
>>> ref4.database_name
'other_db'
>>> ref4.weak
False
>>> ref5 = PersistentReference(['n', ('other_db', 'my_oid')])
>>> ref5.oid
'my_oid'
>>> print ref5.klass
None
>>> ref5.database_name
'other_db'
>>> ref5.weak
False
>>> ref6 = PersistentReference(['my_oid']) # legacy
>>> ref6.oid
'my_oid'
>>> print ref6.klass
None
>>> print ref6.database_name
None
>>> ref6.weak
True
.. [#cmp_test] All references are equal to themselves.
>>> ref1 == ref1 and ref2 == ref2 and ref4 == ref4 and ref5 == ref5
True
>>> ref3 == ref3 and ref3a == ref3a and ref6 == ref6 # weak references
True
Non-weak references with the same oid and database_name are equal.
>>> ref1 == ref2 and ref4 == ref5
True
Everything else raises a ValueError: weak references with the same oid and
database, and references with a different database_name or oid.
>>> ref3 == ref6
Traceback (most recent call last):
...
ValueError: can't reliably compare against different PersistentReferences
>>> ref1 == PersistentReference(('another_oid', 'my_class'))
Traceback (most recent call last):
...
ValueError: can't reliably compare against different PersistentReferences
>>> ref4 == PersistentReference(
... ['m', ('another_db', 'my_oid', 'my_class')])
Traceback (most recent call last):
...
ValueError: can't reliably compare against different PersistentReferences
|