public inbox for linux-raid@vger.kernel.org
* [PATCH v4 0/9] Don't set MD_BROKEN on failfast bio failure
@ 2025-09-15  3:42 Kenta Akagi
From: Kenta Akagi @ 2025-09-15  3:42 UTC (permalink / raw)
  To: Song Liu, Yu Kuai, Mariusz Tkaczyk, Shaohua Li, Guoqing Jiang
  Cc: linux-raid, linux-kernel, Kenta Akagi

Changes from V3:
- The error handling in md_error() is now serialized, and a new helper
  function, md_bio_failure_error, has been introduced.
- MD_FAILFAST bio failures are now processed by md_bio_failure_error
  instead of signaling via FailfastIOFailure.
- RAID10: Fix missing reschedule of failfast read bio failure
- Regardless of failfast, writes that succeed on retry in
  narrow_write_error are now reported to the upper layer as successful
Changes from V2:
- Prevent the array from being marked broken on any failfast I/O
  failure, not just metadata I/O
- Reflecting the review, update raid{1,10}_error to clear
  FailfastIOFailure so that devices are properly marked Faulty.
Changes from V1:
- Avoid setting MD_BROKEN instead of clearing it
- Add pr_crit() when setting MD_BROKEN
- Fix the message that may be shown after all rdevs have failed:
  "Operation continuing on 0 devices"

v3: https://lore.kernel.org/linux-raid/20250828163216.4225-1-k@mgml.me/
v2: https://lore.kernel.org/linux-raid/20250817172710.4892-1-k@mgml.me/
v1: https://lore.kernel.org/linux-raid/20250812090119.153697-1-k@mgml.me/

When multiple MD_FAILFAST bios fail simultaneously on Failfast-enabled
rdevs in RAID1/RAID10, the following issues can occur:
* MD_BROKEN is set and the array halts, even though this should not occur
  under the intended Failfast design.
* Writes retried through narrow_write_error succeed, but the I/O is still
  reported as BLK_STS_IOERR
* RAID10 only: If a Failfast read I/O fails, it is not retried on any
  remaining rdev, and as a result, the upper layer receives an I/O error.
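
The second issue above can be modeled in userspace roughly as follows.
This is only an illustrative sketch with invented names (retry_write,
MODEL_STS_*), not the kernel's narrow_write_error: the failed write is
retried in smaller chunks, and when every chunk succeeds, the bio should
be completed with success rather than the original error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of completing a write after a chunk-by-chunk retry.
 * All identifiers are invented for illustration. */
enum model_status { MODEL_STS_OK, MODEL_STS_IOERR };

/* chunk_ok[i] says whether the retry of chunk i succeeded. */
static enum model_status retry_write(const bool *chunk_ok, size_t nchunks)
{
	for (size_t i = 0; i < nchunks; i++)
		if (!chunk_ok[i])
			return MODEL_STS_IOERR; /* a chunk really failed */

	/* Every chunk was rewritten successfully; the series makes this
	 * case report success instead of the original bio's error. */
	return MODEL_STS_OK;
}
```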

Simultaneous bio failures across multiple rdevs are uncommon; however,
rdevs serviced via nvme-tcp can still experience them due to something as
simple as an Ethernet fault. The issues can be reproduced with the
following steps.

# prepare nvmet/nvme-tcp and md array #
sh-5.2# cat << 'EOF' > loopback-nvme.sh
set -eu
# $1: subsystem index, $2: backing block device
nqn="nqn.2025-08.io.example:nvmet-test-$1"
back="$2"
cd /sys/kernel/config/nvmet/
mkdir "subsystems/${nqn}"
echo 1 > "subsystems/${nqn}/attr_allow_any_host"
mkdir "subsystems/${nqn}/namespaces/1"
echo -n "${back}" > "subsystems/${nqn}/namespaces/1/device_path"
echo 1 > "subsystems/${nqn}/namespaces/1/enable"
ports="ports/1"
if [ ! -d "${ports}" ]; then
        mkdir "${ports}"
        cd "${ports}"
        echo 127.0.0.1 > addr_traddr
        echo tcp       > addr_trtype
        echo 4420      > addr_trsvcid
        echo ipv4      > addr_adrfam
        cd ../../
fi
ln -s "/sys/kernel/config/nvmet/subsystems/${nqn}" "${ports}/subsystems/"
nvme connect -t tcp -n "${nqn}" -a 127.0.0.1 -s 4420
EOF

sh-5.2# chmod +x loopback-nvme.sh
sh-5.2# modprobe -a nvme-tcp nvmet-tcp
sh-5.2# truncate -s 1g a.img b.img
sh-5.2# losetup --show -f a.img
/dev/loop0
sh-5.2# losetup --show -f b.img
/dev/loop1
sh-5.2# ./loopback-nvme.sh 0 /dev/loop0
connecting to device: nvme0
sh-5.2# ./loopback-nvme.sh 1 /dev/loop1
connecting to device: nvme1
sh-5.2# mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 \
--failfast /dev/nvme0n1 --failfast /dev/nvme1n1
...
mdadm: array /dev/md0 started.

# run fio #
sh-5.2# fio --name=test --filename=/dev/md0 --rw=randrw --rwmixread=50 \
--bs=4k --numjobs=9 --time_based --runtime=300s --group_reporting --direct=1

# Reproduce the issue by blocking NVMe traffic while fio is running #
sh-5.2# iptables -A INPUT -m tcp -p tcp --dport 4420 -j DROP;
sh-5.2# sleep 10; # twice the default KATO value
sh-5.2# iptables -D INPUT -m tcp -p tcp --dport 4420 -j DROP


Patches 1–3 are preparatory changes for patch 4.
Patch 4 prevents an MD_FAILFAST bio failure from failing the array.
Patch 5 reports success to the upper layer, regardless of failfast,
when a write retried in narrow_write_error succeeds without marking
a badblock.
Patch 6 fixes an issue where writes are not retried in a no-bbl
configuration when an MD_FAILFAST bio fails on the last rdev.
Patch 7 adds the missing retry path for failfast read errors in RAID10.
Patches 8–9 adjust the pr_crit() messages in raid{1,10}_error.
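
The core policy change of patch 4 can be sketched in userspace as a toy
model. All names here (model_array, model_handle_failure) are invented
for illustration and only reflect the series' stated intent, not the
actual md_error()/raid{1,10}_error code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the error-handling decision for RAID1/RAID10. */
struct model_array {
	int working_disks;
	bool broken; /* models MD_BROKEN */
};

/* Handle a bio failure on one rdev. failfast_bio is true when the
 * failed bio carried MD_FAILFAST. Returns true when the bio should be
 * retried (without failfast) instead of failing the array. */
static bool model_handle_failure(struct model_array *arr, bool failfast_bio)
{
	if (arr->working_disks > 1) {
		/* Redundancy remains: mark the rdev Faulty as usual. */
		arr->working_disks--;
		return false;
	}

	if (failfast_bio)
		/* Last rdev, but the failure may be transient (e.g. a
		 * dropped nvme-tcp connection): don't set MD_BROKEN,
		 * retry the bio without failfast instead. */
		return true;

	/* A normal bio failed on the last rdev: the array is broken. */
	arr->broken = true;
	return false;
}
```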

Kenta Akagi (9):
  md/raid1,raid10: Set the LastDev flag when the configuration changes
  md: serialize md_error()
  md: introduce md_bio_failure_error()
  md/raid1,raid10: Don't set MD_BROKEN on failfast bio failure
  md/raid1,raid10: Set R{1,10}BIO_Uptodate when successful retry of a
    failed bio
  md/raid1,raid10: Fix missing retries Failfast write bios on no-bbl
    rdevs
  md/raid10: fix failfast read error not rescheduled
  md/raid1,raid10: Add error message when setting MD_BROKEN
  md/raid1,raid10: Fix: Operation continuing on 0 devices.

 drivers/md/md.c     |  61 ++++++++++++++---
 drivers/md/md.h     |  12 +++-
 drivers/md/raid1.c  | 156 ++++++++++++++++++++++++++++++++------------
 drivers/md/raid10.c | 115 ++++++++++++++++++++++++++------
 4 files changed, 270 insertions(+), 74 deletions(-)

-- 
2.50.1



Thread overview: 39+ messages
2025-09-15  3:42 [PATCH v4 0/9] Don't set MD_BROKEN on failfast bio failure Kenta Akagi
2025-09-15  3:42 ` [PATCH v4 1/9] md/raid1,raid10: Set the LastDev flag when the configuration changes Kenta Akagi
2025-09-18  1:00   ` Yu Kuai
2025-09-18 14:02     ` Kenta Akagi
2025-09-21  7:54   ` Xiao Ni
2025-09-21 14:48     ` Kenta Akagi
2025-09-15  3:42 ` [PATCH v4 2/9] md: serialize md_error() Kenta Akagi
2025-09-18  1:04   ` Yu Kuai
2025-09-21  6:11     ` Kenta Akagi
2025-09-15  3:42 ` [PATCH v4 3/9] md: introduce md_bio_failure_error() Kenta Akagi
2025-09-18  1:09   ` Yu Kuai
2025-09-18 14:56     ` Kenta Akagi
2025-09-15  3:42 ` [PATCH v4 4/9] md/raid1,raid10: Don't set MD_BROKEN on failfast bio failure Kenta Akagi
2025-09-18  1:26   ` Yu Kuai
2025-09-18 15:22     ` Kenta Akagi
2025-09-19  1:36       ` Yu Kuai
2025-09-20  6:30         ` Kenta Akagi
2025-09-20  9:51           ` Yu Kuai
2025-09-23 15:54             ` Kenta Akagi
2025-09-15  3:42 ` [PATCH v4 5/9] md/raid1,raid10: Set R{1,10}BIO_Uptodate when successful retry of a failed bio Kenta Akagi
2025-09-17  9:24   ` Li Nan
2025-09-17 13:20     ` Kenta Akagi
2025-09-18  6:39       ` Li Nan
2025-09-18 15:36         ` Kenta Akagi
2025-09-19  1:37           ` Li Nan
2025-09-15  3:42 ` [PATCH v4 6/9] md/raid1,raid10: Fix missing retries Failfast write bios on no-bbl rdevs Kenta Akagi
2025-09-17 10:06   ` Li Nan
2025-09-17 13:33     ` Kenta Akagi
2025-09-18  6:58       ` Li Nan
2025-09-18 16:23         ` Kenta Akagi
2025-09-19  1:28           ` Li Nan
2025-09-15  3:42 ` [PATCH v4 7/9] md/raid10: fix failfast read error not rescheduled Kenta Akagi
2025-09-18  7:38   ` Li Nan
2025-09-18 16:12     ` Kenta Akagi
2025-09-19  1:20       ` Li Nan
2025-09-15  3:42 ` [PATCH v4 8/9] md/raid1,raid10: Add error message when setting MD_BROKEN Kenta Akagi
2025-09-15  3:42 ` [PATCH v4 9/9] md/raid1,raid10: Fix: Operation continuing on 0 devices Kenta Akagi
2025-09-15  7:19   ` Paul Menzel
2025-09-15  8:19     ` Kenta Akagi
