linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/3] Do not set MD_BROKEN on failfast io failure
@ 2025-08-28 16:32 Kenta Akagi
From: Kenta Akagi @ 2025-08-28 16:32 UTC (permalink / raw)
  To: Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, Kenta Akagi

Changes from V2:
- Prevent the array from being marked broken on any failfast IO
  failure, not just on metadata IO failures.
- Following review feedback, update raid{1,10}_error to clear
  FailfastIOFailure so that devices are properly marked Faulty.
Changes from V1:
- Avoid setting MD_BROKEN instead of clearing it
- Add pr_crit() when setting MD_BROKEN
- Fix a message that could be shown after all rdevs had failed:
  "Operation continuing on 0 devices"

v2: https://lore.kernel.org/linux-raid/20250817172710.4892-1-k@mgml.me/
v1: https://lore.kernel.org/linux-raid/20250812090119.153697-1-k@mgml.me/

With nvme-tcp, for example, a failfast bio fails immediately if the
connection to the target is briefly lost and the device enters a
reconnecting state, even though the device would recover within a
few seconds. This behavior is by design for failfast IO.

However, md treats failfast IO failures as fatal and may mark the
array MD_BROKEN when a connection is lost.

For example, if an initiator (that is, a machine running the md
module) briefly loses all of its connections, the array is marked
MD_BROKEN and subsequent writes fail.
This is the issue I am currently facing, and the one this series aims to fix.

The first patch changes the behavior when an MD_FAILFAST IO fails on
the last rdev. The second and third patches modify the pr_crit() messages.

Kenta Akagi (3):
  md/raid1,raid10: Do not set MD_BROKEN on failfast io failure
  md/raid1,raid10: Add error message when setting MD_BROKEN
  md/raid1,raid10: Fix: Operation continuing on 0 devices.

 drivers/md/md.c     | 14 +++++++++-----
 drivers/md/md.h     | 13 +++++++------
 drivers/md/raid1.c  | 32 ++++++++++++++++++++++++++------
 drivers/md/raid10.c | 35 ++++++++++++++++++++++++++++-------
 4 files changed, 70 insertions(+), 24 deletions(-)

-- 
2.50.1




Thread overview: 13+ messages
2025-08-28 16:32 [PATCH v3 0/3] Do not set MD_BROKEN on failfast io failure Kenta Akagi
2025-08-28 16:32 ` [PATCH v3 1/3] md/raid1,raid10: " Kenta Akagi
2025-08-29  2:54   ` Li Nan
2025-08-29 12:21     ` Kenta Akagi
2025-08-30  8:48       ` Li Nan
2025-08-30 18:10         ` Kenta Akagi
2025-09-01  3:22           ` Li Nan
2025-09-01  4:22             ` Kenta Akagi
2025-09-01  7:48               ` Yu Kuai
2025-09-01 16:48                 ` Kenta Akagi
2025-09-05 15:07                   ` Kenta Akagi
2025-08-28 16:32 ` [PATCH v3 2/3] md/raid1,raid10: Add error message when setting MD_BROKEN Kenta Akagi
2025-08-28 16:32 ` [PATCH v3 3/3] md/raid1,raid10: Fix: Operation continuing on 0 devices Kenta Akagi
