virtualization.lists.linux-foundation.org archive mirror
* Fix potential data loss and corruption due to Incorrect BIO Chain Handling
@ 2025-11-21  8:17 zhangshida
  2025-11-21  8:17 ` [PATCH 1/9] block: fix data loss and stale date exposure problems during append write zhangshida
                   ` (10 more replies)
  0 siblings, 11 replies; 40+ messages in thread
From: zhangshida @ 2025-11-21  8:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-block, nvdimm, virtualization, linux-nvme, gfs2, ntfs3,
	linux-xfs, zhangshida, starzhangzsd

From: Shida Zhang <zhangshida@kylinos.cn>

Hello everyone,

We have recently encountered a severe data loss issue on kernel version 4.19,
and we suspect the same underlying problem may exist in the latest kernel versions.

Environment:
*   **Architecture:** arm64
*   **Page Size:** 64KB
*   **Filesystem:** XFS with a 4KB block size

Scenario:
The issue occurs while running a MySQL instance where one thread appends data
to a log file, and a separate thread concurrently reads that file to perform
CRC checks on its contents.

Problem Description:
Occasionally, the reading thread detects data corruption. Specifically, it finds
that stale data has been exposed in the middle of the file.

We have captured four instances of this corruption in our production environment.
In each case, we observed a distinct pattern:
*   The corruption starts at an offset that aligns with the beginning of an XFS extent.
*   The corruption ends at an offset that is aligned to the system's `PAGE_SIZE` (64KB in our case).

Corruption Instances:
1.  **Start:** `0x73be000`, **End:** `0x73c0000` (Length: 8KB)
2.  **Start:** `0x10791a000`, **End:** `0x107920000` (Length: 24KB)
3.  **Start:** `0x14535a000`, **End:** `0x145b70000` (Length: 8280KB)
4.  **Start:** `0x370d000`, **End:** `0x3710000` (Length: 12KB)
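
The pattern in the four instances above can be re-checked with a few lines of arithmetic. The following is only a sketch for readers who want to verify the numbers; the names `corrupt_range`, `reported_ranges`, `REPORT_PAGE_SIZE` and the helper functions are hypothetical and do not come from the kernel or from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64KB pages and 4KB filesystem blocks, as stated in the environment. */
#define REPORT_PAGE_SIZE  (64u * 1024u)
#define REPORT_BLOCK_SIZE (4u * 1024u)

/* Hypothetical container for one reported corruption range. */
struct corrupt_range {
	uint64_t start;
	uint64_t end;
};

/* The four ranges listed in the report. */
static const struct corrupt_range reported_ranges[] = {
	{ 0x73be000ULL,   0x73c0000ULL   },
	{ 0x10791a000ULL, 0x107920000ULL },
	{ 0x14535a000ULL, 0x145b70000ULL },
	{ 0x370d000ULL,   0x3710000ULL   },
};

/* Length of a range in KB. */
static uint64_t range_len_kb(const struct corrupt_range *r)
{
	return (r->end - r->start) / 1024;
}

/* True if the range ends on a PAGE_SIZE (64KB) boundary. */
static bool end_is_page_aligned(const struct corrupt_range *r)
{
	return r->end % REPORT_PAGE_SIZE == 0;
}

/* True if the range starts on a filesystem block (4KB) boundary. */
static bool start_is_block_aligned(const struct corrupt_range *r)
{
	return r->start % REPORT_BLOCK_SIZE == 0;
}
```

Calling `range_len_kb()` and `end_is_page_aligned()` on each entry of `reported_ranges` reproduces the lengths listed above and confirms that every end offset is a multiple of 64KB. Whether a start offset coincides with an XFS extent boundary cannot be derived from the offsets alone; the sketch only checks 4KB block alignment.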

After analysis, we believe the root cause lies in the handling of chained bios, specifically
in how out-of-order I/O completion is handled.

Consider a bio chain in which `bi_remaining` is decremented as each bio in the chain completes.
For example, suppose a chain consists of three bios (bio1 -> bio2 -> bio3) with the
following `bi_remaining` counts:
1->2->2
A problem arises if the bios complete in reverse order.
If bio3 completes first, the counts become:
1->2->1
Then bio2 completes:
1->1->0

Because `bi_remaining` has reached zero, the final `end_io` callback for the entire chain
is triggered, even though not all bios in the chain have actually finished processing.
This premature completion can lead to stale data being exposed, as seen in our case.
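
To make the sequence above easier to follow, here is a small user-space model of it. This is only a sketch of the behaviour as described in this report, not kernel code: `struct toy_bio` and every `toy_*` function are hypothetical names invented for illustration, and the "unchecked" path models the suspected behaviour rather than a confirmed reading of `block/bio.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a chained bio; NOT the kernel's struct bio. */
struct toy_bio {
	int remaining;           /* models bi_remaining */
	struct toy_bio *parent;  /* next bio in the chain, NULL for the last */
	bool end_io_called;      /* set when the final completion fires */
};

/* Drop one count; returns true when the count reaches zero. */
static bool toy_remaining_done(struct toy_bio *bio)
{
	assert(bio->remaining > 0);
	return --bio->remaining == 0;
}

/* Checked completion: move to the parent only once this bio is fully done. */
static void toy_endio_checked(struct toy_bio *bio)
{
	while (bio) {
		if (!toy_remaining_done(bio))
			return;
		if (!bio->parent) {
			bio->end_io_called = true;
			return;
		}
		bio = bio->parent;
	}
}

/*
 * Unchecked completion, modelling the suspected problem: this bio's
 * count is dropped, but the parent is completed regardless of whether
 * this bio's own count reached zero.
 */
static void toy_endio_unchecked(struct toy_bio *bio)
{
	if (!bio->parent) {
		toy_endio_checked(bio);
		return;
	}
	(void)toy_remaining_done(bio);  /* result deliberately ignored */
	toy_endio_checked(bio->parent);
}

/*
 * Replays the reverse-order scenario (bio3 completes, then bio2) and
 * reports whether the final end_io fired while bio1 was still pending.
 */
static bool toy_premature_end_io(bool checked)
{
	struct toy_bio bio3 = { .remaining = 2, .parent = NULL };
	struct toy_bio bio2 = { .remaining = 2, .parent = &bio3 };
	struct toy_bio bio1 = { .remaining = 1, .parent = &bio2 };

	toy_endio_checked(&bio3);          /* counts: 1->2->1 */
	if (checked)
		toy_endio_checked(&bio2);  /* counts: 1->1->1 */
	else
		toy_endio_unchecked(&bio2); /* counts: 1->1->0 */

	return bio3.end_io_called && bio1.remaining != 0;
}
```

In this model, `toy_premature_end_io(false)` returns true: the final completion fires while bio1 is still outstanding, matching the 1->1->0 state above. `toy_premature_end_io(true)` returns false, because the checked path stops at bio2 until its own count reaches zero.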

The core issue appears to be that `bio_chain_endio` does not check whether the current bio's
`bi_remaining` count has reached zero before proceeding to the next bio in the chain.

Proposed Fix:
Removing `__bio_chain_endio` and allowing the standard `bio_endio` to handle the completion
logic should resolve this issue, as `bio_endio` correctly manages the `bi_remaining` counter.

Shida Zhang (9):
  block: fix data loss and stale date exposure problems during append
    write
  block: export bio_chain_and_submit
  gfs2: use bio_chain_and_submit for simplification
  xfs: use bio_chain_and_submit for simplification
  block: use bio_chain_and_submit for simplification
  fs/ntfs3: use bio_chain_and_submit for simplification
  zram: use bio_chain_and_submit for simplification
  nvmet: fix the potential bug and use bio_chain_and_submit for
    simplification
  nvdimm: use bio_chain_and_submit for simplification

 block/bio.c                       |  3 ++-
 drivers/block/zram/zram_drv.c     |  3 +--
 drivers/nvdimm/nd_virtio.c        |  3 +--
 drivers/nvme/target/io-cmd-bdev.c |  3 +--
 fs/gfs2/lops.c                    |  3 +--
 fs/ntfs3/fsntfs.c                 | 12 ++----------
 fs/squashfs/block.c               |  3 +--
 fs/xfs/xfs_bio_io.c               |  3 +--
 fs/xfs/xfs_buf.c                  |  3 +--
 fs/xfs/xfs_log.c                  |  3 +--
 10 files changed, 12 insertions(+), 27 deletions(-)

-- 
2.34.1



Thread overview: 40+ messages
2025-11-21  8:17 Fix potential data loss and corruption due to Incorrect BIO Chain Handling zhangshida
2025-11-21  8:17 ` [PATCH 1/9] block: fix data loss and stale date exposure problems during append write zhangshida
2025-11-21  9:34   ` Johannes Thumshirn
2025-11-22  7:08     ` Stephen Zhang
2025-11-21 10:31   ` Christoph Hellwig
2025-11-21 16:13     ` Andreas Gruenbacher
2025-11-22  7:25       ` Stephen Zhang
2025-11-28  3:22       ` Stephen Zhang
2025-11-28  5:55         ` Christoph Hellwig
2025-11-28  6:26           ` Stephen Zhang
2025-11-22 12:15   ` Ming Lei
2025-11-21  8:17 ` [PATCH 2/9] block: export bio_chain_and_submit zhangshida
2025-11-21 10:32   ` Christoph Hellwig
2025-11-21 17:12   ` Andreas Gruenbacher
2025-11-22  7:02     ` Stephen Zhang
2025-11-21  8:17 ` [PATCH 3/9] gfs2: use bio_chain_and_submit for simplification zhangshida
2025-11-21  8:17 ` [PATCH 4/9] xfs: " zhangshida
2025-11-21  8:17 ` [PATCH 5/9] block: " zhangshida
2025-11-21  8:17 ` [PATCH 6/9] fs/ntfs3: " zhangshida
2025-11-21  8:17 ` [PATCH 7/9] zram: " zhangshida
2025-11-21  8:17 ` [PATCH 8/9] nvmet: fix the potential bug and " zhangshida
2025-11-21  8:17 ` [PATCH 9/9] nvdimm: " zhangshida
2025-11-21 10:37 ` Fix potential data loss and corruption due to Incorrect BIO Chain Handling Christoph Hellwig
2025-11-22  6:38   ` Stephen Zhang
2025-11-24  6:22     ` Christoph Hellwig
2025-11-27  7:05       ` Stephen Zhang
2025-11-27  7:14         ` Christoph Hellwig
2025-11-27  7:40           ` Gao Xiang
2025-11-27 14:46             ` Christoph Hellwig
2025-11-28  1:32               ` Stephen Zhang
2025-11-28  1:29           ` Stephen Zhang
2025-11-22  3:35 ` Ming Lei
2025-11-22  6:42   ` Stephen Zhang
2025-11-22  7:46     ` Andreas Gruenbacher
2025-11-22 12:01     ` Ming Lei
2025-11-22 14:56       ` Andreas Gruenbacher
2025-11-23  3:14         ` Stephen Zhang
2025-11-23 13:48         ` Ming Lei
2025-11-24  1:28           ` Stephen Zhang
2025-11-24  2:00             ` Stephen Zhang
