* [PATCH 0/2] dm thin: Flush data device before committing metadata to avoid data corruption
@ 2019-12-04 14:07 Nikos Tsironis
From: Nikos Tsironis @ 2019-12-04 14:07 UTC (permalink / raw)
  To: snitzer, agk, dm-devel; +Cc: thornber, ntsironis

The thin provisioning target maintains per thin device mappings that map
virtual blocks to data blocks in the data device.

When we write to a shared block (internal snapshots) or provision a new
block (external snapshots), we copy the shared block to a new data block
(COW), update the mapping for the relevant virtual block and then issue
the write to the new data block.

Suppose the data device has a volatile write-back cache and the
following sequence of events occurs:

1. We write to a shared block
2. A new data block is allocated
3. We copy the shared block to the new data block using kcopyd (COW)
4. We insert the new mapping for the virtual block in the btree for that
   thin device.
5. The commit timeout expires and we commit the metadata, which now
   includes the new mapping from step (4).
6. The system crashes and the data device's cache has not been flushed,
   meaning that the COWed data are lost.

The next time we read that virtual block of the thin device we read it
from the data block allocated in step (2), since the metadata have been
successfully committed. The data are lost due to the crash, so we read
garbage instead of the old, shared data.
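
The failure sequence above can be reproduced with a small userspace toy
model. This is only a sketch and not dm-thin code: struct toy_dev, the
dev_*() helpers and cow_survives_crash() are hypothetical names, the
metadata commit is assumed to be durable on its own, and the "garbage"
string stands in for whatever stale contents the newly allocated block
happens to hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_BLOCKS  4
#define BLOCK_SIZE 16

/* Hypothetical toy data device: writes sit in a volatile cache until flushed. */
struct toy_dev {
	char media[NR_BLOCKS][BLOCK_SIZE];	/* non-volatile storage */
	char cache[NR_BLOCKS][BLOCK_SIZE];	/* volatile write-back cache */
	bool dirty[NR_BLOCKS];
};

static void dev_write(struct toy_dev *d, int b, const char *data)
{
	assert(b >= 0 && b < NR_BLOCKS);
	strncpy(d->cache[b], data, BLOCK_SIZE - 1);
	d->cache[b][BLOCK_SIZE - 1] = '\0';
	d->dirty[b] = true;
}

/* Write every dirty cached block back to non-volatile storage. */
static void dev_flush(struct toy_dev *d)
{
	for (int b = 0; b < NR_BLOCKS; b++) {
		if (d->dirty[b]) {
			memcpy(d->media[b], d->cache[b], BLOCK_SIZE);
			d->dirty[b] = false;
		}
	}
}

/* A crash discards everything that was only in the volatile cache. */
static void dev_crash(struct toy_dev *d)
{
	memset(d->dirty, 0, sizeof(d->dirty));
}

static const char *dev_read(const struct toy_dev *d, int b)
{
	assert(b >= 0 && b < NR_BLOCKS);
	return d->dirty[b] ? d->cache[b] : d->media[b];
}

/*
 * Replays steps 1-6. Returns true if, after the crash, the virtual block
 * still reads the shared data through the committed mapping.
 */
bool cow_survives_crash(bool flush_before_commit)
{
	struct toy_dev dev;
	int committed_map, pending_map;

	memset(&dev, 0, sizeof(dev));
	dev_write(&dev, 0, "shared");	/* shared block, already durable */
	dev_write(&dev, 1, "garbage");	/* stale contents of a free block */
	dev_flush(&dev);
	committed_map = 0;		/* virtual block -> data block 0 */

	dev_write(&dev, 1, dev_read(&dev, 0));	/* steps 2-3: allocate + COW */
	pending_map = 1;			/* step 4: new mapping */

	if (flush_before_commit)
		dev_flush(&dev);		/* the ordering proposed here */

	committed_map = pending_map;		/* step 5: metadata commit */
	dev_crash(&dev);			/* step 6 */

	return strcmp(dev_read(&dev, committed_map), "shared") == 0;
}
```

In this model cow_survives_crash(false) reads the stale block contents
through the committed mapping, while cow_survives_crash(true) still
returns the shared data.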

Moreover, apart from internal and external snapshots, the same issue
exists for newly provisioned blocks, when block zeroing is enabled.
After the system recovers the provisioned blocks might contain garbage
instead of zeroes.

For more information regarding the implications of this please see the
relevant commit.

To solve this and avoid the potential data corruption we have to flush
the pool's data device before committing its metadata.

This ensures that the data blocks of any newly inserted mappings are
properly written to non-volatile storage and won't be lost in case of a
crash.
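
The ordering can be sketched in userspace as a pre-commit callback that
the metadata layer invokes before it commits. This is a minimal sketch
under stated assumptions, not the code from the patches: struct
toy_metadata, struct toy_pool, toy_set_pre_commit(), toy_commit() and
toy_flush_data_dev() are all hypothetical names, and skipping the commit
when the flush fails is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, non-zero on error. */
typedef int (*pre_commit_fn)(void *context);

/* Hypothetical stand-in for the pool's metadata object. */
struct toy_metadata {
	pre_commit_fn pre_commit;
	void *pre_commit_context;
	unsigned committed_transactions;
};

/* Hypothetical stand-in for the pool that owns the data device. */
struct toy_pool {
	unsigned data_flushes;
	int flush_error;	/* non-zero simulates a failed flush */
};

void toy_set_pre_commit(struct toy_metadata *md, pre_commit_fn fn, void *ctx)
{
	assert(md != NULL);
	md->pre_commit = fn;
	md->pre_commit_context = ctx;
}

/* Run the pre-commit callback first; commit only if it succeeded. */
int toy_commit(struct toy_metadata *md)
{
	assert(md != NULL);
	if (md->pre_commit) {
		int r = md->pre_commit(md->pre_commit_context);

		if (r)
			return r;	/* assumption: no commit without a flush */
	}
	md->committed_transactions++;
	return 0;
}

/* Callback registered by the pool: flush the data device. */
int toy_flush_data_dev(void *context)
{
	struct toy_pool *pool = context;

	if (pool->flush_error)
		return pool->flush_error;
	pool->data_flushes++;
	return 0;
}
```

With the callback registered via toy_set_pre_commit(), every successful
toy_commit() is preceded by exactly one data device flush, and a failed
flush leaves the committed transaction count unchanged.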

Nikos Tsironis (2):
  dm thin metadata: Add support for a pre-commit callback
  dm thin: Flush data device before committing metadata

 drivers/md/dm-thin-metadata.c | 29 +++++++++++++++++++++++++++++
 drivers/md/dm-thin-metadata.h |  7 +++++++
 drivers/md/dm-thin.c          | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

-- 
2.11.0
