linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/3] md/raid1,raid10: don't broken array on failfast metadata write fails
From: Kenta Akagi @ 2025-08-17 17:27 UTC
  To: Song Liu, Yu Kuai, Mariusz Tkaczyk, Guoqing Jiang
  Cc: linux-raid, linux-kernel, Kenta Akagi

Changes from V1:
- Avoid setting MD_BROKEN in the first place, instead of clearing it afterwards
- Add pr_crit() when setting MD_BROKEN
- Fix the message that may be shown after all rdevs have failed:
  "Operation continuing on 0 devices"

A failfast bio, for example in the case of nvme-tcp,
will fail immediately if the connection to the target is
lost for a few seconds and the device enters a reconnecting
state - even though it would recover if given a few seconds.
This behavior is exactly as intended by the design of failfast.

However, md treats a super_write failure with failfast as fatal.
For example, if an initiator - that is, a machine loading the md module -
loses all connections for a few seconds, the array becomes
broken and subsequent writes are no longer possible.
This is the issue I am currently facing, and which this series aims to fix.

The 1st patch changes the behavior on super_write MD_FAILFAST IO failures.
The 2nd and 3rd patches modify the output of pr_crit.
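
As a rough illustration of the scenario described above, here is a toy
user-space model in C. It is not the kernel code: struct toy_array,
toy_metadata_write_error() and toy_lose_all_paths() are hypothetical names
made up for this sketch, and the "tolerate" branch only reflects the intent
stated in this cover letter (do not mark the array broken when the failed
metadata write was failfast), not the actual diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for array state; not the kernel's struct mddev. */
struct toy_array {
	int working_devs;   /* member devices still usable */
	bool broken;        /* models the MD_BROKEN flag */
	bool need_rewrite;  /* models "retry the metadata write later" */
};

/*
 * Models the array being told that a metadata write failed on one device.
 * failfast:          the failed write was issued with failfast semantics.
 * tolerate_failfast: false = the problem described in this cover letter,
 *                    true  = the intended behavior (an assumption).
 */
static void toy_metadata_write_error(struct toy_array *a, bool failfast,
				     bool tolerate_failfast)
{
	assert(a->working_devs >= 1);

	if (a->working_devs > 1) {
		/* Not the last device: drop it, the array continues degraded. */
		a->working_devs--;
		return;
	}

	/* Last remaining device. */
	if (failfast && tolerate_failfast) {
		/* Keep the device and retry instead of breaking the array. */
		a->need_rewrite = true;
		return;
	}

	a->broken = true;
}

/* Two-device mirror whose paths all drop briefly: both writes fail fast. */
static struct toy_array toy_lose_all_paths(bool tolerate_failfast)
{
	struct toy_array a = { .working_devs = 2 };

	toy_metadata_write_error(&a, true, tolerate_failfast);
	toy_metadata_write_error(&a, true, tolerate_failfast);
	return a;
}
```

In this model, toy_lose_all_paths(false) ends with broken set, which
corresponds to the array no longer accepting writes. toy_lose_all_paths(true)
leaves the last device in place with need_rewrite set, so the metadata write
can be retried once the connection is back.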

Kenta Akagi (3):
  md/raid1,raid10: don't broken array on failfast metadata write fails
  md/raid1,raid10: Add error message when setting MD_BROKEN
  md/raid1,raid10: Fix: Operation continuing on 0 devices.

 drivers/md/md.c     |  9 ++++++---
 drivers/md/md.h     |  7 ++++---
 drivers/md/raid1.c  | 26 ++++++++++++++++++++------
 drivers/md/raid10.c | 26 ++++++++++++++++++++------
 4 files changed, 50 insertions(+), 18 deletions(-)

-- 
2.50.1



Thread overview: 11+ messages
2025-08-17 17:27 [PATCH v2 0/3] md/raid1,raid10: don't broken array on failfast metadata write fails Kenta Akagi
2025-08-17 17:27 ` [PATCH v2 1/3] " Kenta Akagi
2025-08-18  2:05   ` Yu Kuai
2025-08-18  2:48     ` Yu Kuai
2025-08-18 12:48       ` Kenta Akagi
2025-08-18 15:45         ` Yu Kuai
2025-08-20 17:09           ` Kenta Akagi
2025-08-23  1:54   ` Li Nan
2025-08-27 17:31     ` Kenta Akagi
2025-08-17 17:27 ` [PATCH v2 2/3] md/raid1,raid10: Add error message when setting MD_BROKEN Kenta Akagi
2025-08-17 17:27 ` [PATCH v2 3/3] md/raid1,raid10: Fix: Operation continuing on 0 devices Kenta Akagi
