* [PATCH 0/2] md: flush deadlock bugfix
@ 2024-05-25 18:52 linan666
2024-05-25 18:52 ` [PATCH 1/2] md: change the return value type of md_write_start to void linan666
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: linan666 @ 2024-05-25 18:52 UTC (permalink / raw)
To: song, yukuai3
Cc: linux-raid, linux-kernel, linan666, yi.zhang, houtao1, yangerkun
From: Li Nan <linan122@huawei.com>
I recently identified a flush deadlock issue, which can be resolved
by this patch set. After testing for a day in an environment where the
problem can be easily reproduced, I did not encounter the issue again.
Before the md flush handling gets a complete rework, fix the issue with
this patch set first.
Li Nan (2):
md: change the return value type of md_write_start to void
md: fix deadlock between mddev_suspend and flush bio
drivers/md/md.h | 2 +-
drivers/md/md.c | 40 +++++++++++++++++++---------------------
drivers/md/raid1.c | 3 +--
drivers/md/raid10.c | 3 +--
drivers/md/raid5.c | 3 +--
5 files changed, 23 insertions(+), 28 deletions(-)
--
2.39.2
^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] md: change the return value type of md_write_start to void
  2024-05-25 18:52 [PATCH 0/2] md: flush deadlock bugfix linan666
@ 2024-05-25 18:52 ` linan666
  2024-05-28 12:53   ` Yu Kuai
  2024-05-25 18:52 ` [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio linan666
  2024-06-10 20:52 ` [PATCH 0/2] md: flush deadlock bugfix Song Liu
  2 siblings, 1 reply; 7+ messages in thread
From: linan666 @ 2024-05-25 18:52 UTC (permalink / raw)
To: song, yukuai3
Cc: linux-raid, linux-kernel, linan666, yi.zhang, houtao1, yangerkun

From: Li Nan <linan122@huawei.com>

Commit cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
md_write_start()") made md_write_start() abort with false when mddev is
suspended, which fixed a deadlock when mddev_suspend() was called while
holding reconfig_mutex. Since mddev_suspend() now includes
lockdep_assert_not_held(), it can no longer be called with
reconfig_mutex held, which makes the abort unnecessary. Remove it and
change the function's return type to void.
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.h     |  2 +-
 drivers/md/md.c     | 14 ++++----------
 drivers/md/raid1.c  |  3 +--
 drivers/md/raid10.c |  3 +--
 drivers/md/raid5.c  |  3 +--
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index ca085ecad504..487582058f74 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -785,7 +785,7 @@ extern void md_unregister_thread(struct mddev *mddev, struct md_thread __rcu **t
 extern void md_wakeup_thread(struct md_thread __rcu *thread);
 extern void md_check_recovery(struct mddev *mddev);
 extern void md_reap_sync_thread(struct mddev *mddev);
-extern bool md_write_start(struct mddev *mddev, struct bio *bi);
+extern void md_write_start(struct mddev *mddev, struct bio *bi);
 extern void md_write_inc(struct mddev *mddev, struct bio *bi);
 extern void md_write_end(struct mddev *mddev);
 extern void md_done_sync(struct mddev *mddev, int blocks, int ok);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 509e5638cea1..14d6e615bcbb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8638,12 +8638,12 @@ EXPORT_SYMBOL(md_done_sync);
  * A return value of 'false' means that the write wasn't recorded
  * and cannot proceed as the array is being suspend.
  */
-bool md_write_start(struct mddev *mddev, struct bio *bi)
+void md_write_start(struct mddev *mddev, struct bio *bi)
 {
 	int did_change = 0;
 
 	if (bio_data_dir(bi) != WRITE)
-		return true;
+		return;
 
 	BUG_ON(mddev->ro == MD_RDONLY);
 	if (mddev->ro == MD_AUTO_READ) {
@@ -8676,15 +8676,9 @@ bool md_write_start(struct mddev *mddev, struct bio *bi)
 	if (did_change)
 		sysfs_notify_dirent_safe(mddev->sysfs_state);
 	if (!mddev->has_superblocks)
-		return true;
+		return;
 	wait_event(mddev->sb_wait,
-		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) ||
-		   is_md_suspended(mddev));
-	if (test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags)) {
-		percpu_ref_put(&mddev->writes_pending);
-		return false;
-	}
-	return true;
+		   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
 }
 EXPORT_SYMBOL(md_write_start);
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b8a71ca66dd..0d80ff471c73 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1687,8 +1687,7 @@ static bool raid1_make_request(struct mddev *mddev, struct bio *bio)
 	if (bio_data_dir(bio) == READ)
 		raid1_read_request(mddev, bio, sectors, NULL);
 	else {
-		if (!md_write_start(mddev,bio))
-			return false;
+		md_write_start(mddev,bio);
 		raid1_write_request(mddev, bio, sectors);
 	}
 	return true;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a4556d2e46bf..f8d7c02c6ed5 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1836,8 +1836,7 @@ static bool raid10_make_request(struct mddev *mddev, struct bio *bio)
 	    && md_flush_request(mddev, bio))
 		return true;
 
-	if (!md_write_start(mddev, bio))
-		return false;
+	md_write_start(mddev, bio);
 
 	if (unlikely(bio_op(bio) == REQ_OP_DISCARD))
 		if (!raid10_handle_discard(mddev, bio))
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2bd1ce9b3922..a84389311dd1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6078,8 +6078,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 		ctx.do_flush = bi->bi_opf & REQ_PREFLUSH;
 	}
 
-	if (!md_write_start(mddev, bi))
-		return false;
+	md_write_start(mddev, bi);
 	/*
 	 * If array is degraded, better not do chunk aligned read because
 	 * later we might have to read it again in order to reconstruct
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] md: change the return value type of md_write_start to void
  2024-05-25 18:52 ` [PATCH 1/2] md: change the return value type of md_write_start to void linan666
@ 2024-05-28 12:53   ` Yu Kuai
  0 siblings, 0 replies; 7+ messages in thread
From: Yu Kuai @ 2024-05-28 12:53 UTC (permalink / raw)
To: linan666, song
Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun, yukuai (C), Yu Kuai

On 2024/05/26 2:52, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
>
> Commit cc27b0c78c79 ("md: fix deadlock between mddev_suspend() and
> md_write_start()") made md_write_start() abort with false when mddev is
> suspended, which fixed a deadlock when mddev_suspend() was called while
> holding reconfig_mutex. Since mddev_suspend() now includes
> lockdep_assert_not_held(), it can no longer be called with
> reconfig_mutex held, which makes the abort unnecessary. Remove it and
> change the function's return type to void.

Nice cleanup, feel free to add:

Reviewed-by: Yu Kuai <yukuai3@huawei.com>

> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>  drivers/md/md.h     |  2 +-
>  drivers/md/md.c     | 14 ++++----------
>  drivers/md/raid1.c  |  3 +--
>  drivers/md/raid10.c |  3 +--
>  drivers/md/raid5.c  |  3 +--
>  5 files changed, 8 insertions(+), 17 deletions(-)
> [...]

^ permalink raw reply	[flat|nested] 7+ messages in thread
* [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio
  2024-05-25 18:52 [PATCH 0/2] md: flush deadlock bugfix linan666
  2024-05-25 18:52 ` [PATCH 1/2] md: change the return value type of md_write_start to void linan666
@ 2024-05-25 18:52 ` linan666
  2024-05-28 13:12   ` Yu Kuai
  2024-08-17 22:49   ` Michael
  2024-06-10 20:52 ` [PATCH 0/2] md: flush deadlock bugfix Song Liu
  2 siblings, 2 replies; 7+ messages in thread
From: linan666 @ 2024-05-25 18:52 UTC (permalink / raw)
To: song, yukuai3
Cc: linux-raid, linux-kernel, linan666, yi.zhang, houtao1, yangerkun

From: Li Nan <linan122@huawei.com>

A deadlock occurs when mddev is being suspended while some flush bio is
in progress. It is a complex issue:

T1. The first flush is at its ending stage; it clears 'mddev->flush_bio'
    and tries to submit data, but is blocked because mddev is suspended
    by T4.
T2. The second flush sets 'mddev->flush_bio' and attempts to queue
    md_submit_flush_data(), which is already running (T1) and won't
    execute again if it is on the same CPU as T1.
T3. The third flush increments active_io and tries to flush, but is
    blocked because 'mddev->flush_bio' is not NULL (set by T2).
T4. mddev_suspend() is called and waits for active_io to drop to 0,
    but it was incremented by T3.

T1                      T2                  T3                  T4
(flush 1)               (flush 2)           (flush 3)           (suspend)
md_submit_flush_data
 mddev->flush_bio = NULL;
 .
 .                      md_flush_request
 .                       mddev->flush_bio = bio
 .                       queue submit_flushes
 .                      .
 .                      .                   md_handle_request
 .                      .                    active_io + 1
 .                      .                    md_flush_request
 .                      .                     wait !mddev->flush_bio
 .                      .
 .                      .                                       mddev_suspend
 .                      .                                        wait !active_io
 .                      .
 .                      submit_flushes
 .                       queue_work md_submit_flush_data
 .                        //md_submit_flush_data is already running (T1)
 .
 md_handle_request
  wait resume

The root cause is the non-atomic inc/dec of active_io during the flush
process: active_io is decremented before md_submit_flush_data() is
queued, and incremented again soon after md_submit_flush_data() runs:

  md_flush_request
    active_io + 1
    submit_flushes
      active_io - 1
      md_submit_flush_data
        md_handle_request
          active_io + 1
          make_request
          active_io - 1

If active_io is instead decremented after md_handle_request() rather
than within submit_flushes(), make_request() can be called directly
instead of md_handle_request() in md_submit_flush_data(), and active_io
is incremented and decremented only once during the whole flush
process. This fixes the deadlock.

Additionally, the only behavioral difference after the fix is that the
return value of make_request() is no longer handled. After the previous
patch cleaned up md_write_start(), make_request() only returns an error
in raid5_make_request() when called from dm-raid, see commit
41425f96d7aa ("dm-raid456, md/raid456: fix a deadlock for dm-raid456
while io concurrent with reshape"). Since dm always splits data and
flush operations into two separate bios, the io size of a flush
submitted by dm is always 0, so make_request() will not be called in
md_submit_flush_data(). To prevent future modifications from
introducing issues, add a WARN_ON to catch any error returned by
make_request() in this context.
Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/md.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 14d6e615bcbb..9bb7e627e57f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -550,13 +550,9 @@ static void md_end_flush(struct bio *bio)
 
 	rdev_dec_pending(rdev, mddev);
 
-	if (atomic_dec_and_test(&mddev->flush_pending)) {
-		/* The pair is percpu_ref_get() from md_flush_request() */
-		percpu_ref_put(&mddev->active_io);
-
+	if (atomic_dec_and_test(&mddev->flush_pending))
 		/* The pre-request flush has finished */
 		queue_work(md_wq, &mddev->flush_work);
-	}
 }
 
 static void md_submit_flush_data(struct work_struct *ws);
@@ -587,12 +583,8 @@ static void submit_flushes(struct work_struct *ws)
 			rcu_read_lock();
 		}
 	rcu_read_unlock();
-	if (atomic_dec_and_test(&mddev->flush_pending)) {
-		/* The pair is percpu_ref_get() from md_flush_request() */
-		percpu_ref_put(&mddev->active_io);
-
+	if (atomic_dec_and_test(&mddev->flush_pending))
 		queue_work(md_wq, &mddev->flush_work);
-	}
 }
 
 static void md_submit_flush_data(struct work_struct *ws)
@@ -617,8 +609,20 @@ static void md_submit_flush_data(struct work_struct *ws)
 		bio_endio(bio);
 	} else {
 		bio->bi_opf &= ~REQ_PREFLUSH;
-		md_handle_request(mddev, bio);
+
+		/*
+		 * make_request() will never return error here, it only
+		 * returns error in raid5_make_request() by dm-raid.
+		 * Since dm always splits data and flush operation into
+		 * two separate io, io size of flush submitted by dm
+		 * always is 0, make_request() will not be called here.
+		 */
+		if (WARN_ON_ONCE(!mddev->pers->make_request(mddev, bio)))
+			bio_io_error(bio);
 	}
+
+	/* The pair is percpu_ref_get() from md_flush_request() */
+	percpu_ref_put(&mddev->active_io);
 }
 
 /*
-- 
2.39.2

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio
  2024-05-25 18:52 ` [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio linan666
@ 2024-05-28 13:12   ` Yu Kuai
  2024-08-17 22:49   ` Michael
  1 sibling, 0 replies; 7+ messages in thread
From: Yu Kuai @ 2024-05-28 13:12 UTC (permalink / raw)
To: linan666, song
Cc: linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun, yukuai (C)

Hi,

On 2024/05/26 2:52, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
>
> Deadlock occurs when mddev is being suspended while some flush bio is in
> progress. It is a complex issue.
> [...]
> Fixes: fa2bbff7b0b4 ("md: synchronize flush io with array reconfiguration")
> Signed-off-by: Li Nan <linan122@huawei.com>

The patch itself looks correct. However, there was a plan to remove the
flush handling and submit the flush bio directly to the underlying
disks, like dm does, because md_flush_request(), which is on the fast
path, grabs a disk-level spinlock (mddev->lock) and will affect
performance.

I'm fine with taking this patch first; I'll leave the decision to Song.

Thanks,
Kuai

> [...]

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio
  2024-05-25 18:52 ` [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio linan666
  2024-05-28 13:12   ` Yu Kuai
@ 2024-08-17 22:49   ` Michael
  1 sibling, 0 replies; 7+ messages in thread
From: Michael @ 2024-08-17 22:49 UTC (permalink / raw)
To: linan666
Cc: houtao1, linux-kernel, linux-raid, song, yangerkun, yi.zhang, yukuai3

  git send-email \
    --in-reply-to=20240525185257.3896201-3-linan666@huaweicloud.com \
    --to=linan666@huaweicloud.com \
    --cc=houtao1@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] md: flush deadlock bugfix
  2024-05-25 18:52 [PATCH 0/2] md: flush deadlock bugfix linan666
  2024-05-25 18:52 ` [PATCH 1/2] md: change the return value type of md_write_start to void linan666
  2024-05-25 18:52 ` [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio linan666
@ 2024-06-10 20:52 ` Song Liu
  2 siblings, 0 replies; 7+ messages in thread
From: Song Liu @ 2024-06-10 20:52 UTC (permalink / raw)
To: linan666
Cc: yukuai3, linux-raid, linux-kernel, yi.zhang, houtao1, yangerkun

On Sat, May 25, 2024 at 4:00 AM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> I recently identified a flush deadlock issue, which can be resolved
> by this patch set. After testing for a day in an environment where the
> problem can be easily reproduced, I did not encounter the issue again.
>
> Before a complete overwrite of the md flush, first fix the issue with
> this patch set.
>
> Li Nan (2):
>   md: change the return value type of md_write_start to void
>   md: fix deadlock between mddev_suspend and flush bio

Applied the set to md-6.11. Thanks!

Song

^ permalink raw reply	[flat|nested] 7+ messages in thread
end of thread, other threads: [~2024-08-17 23:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-25 18:52 [PATCH 0/2] md: flush deadlock bugfix linan666
2024-05-25 18:52 ` [PATCH 1/2] md: change the return value type of md_write_start to void linan666
2024-05-28 12:53   ` Yu Kuai
2024-05-25 18:52 ` [PATCH 2/2] md: fix deadlock between mddev_suspend and flush bio linan666
2024-05-28 13:12   ` Yu Kuai
2024-08-17 22:49   ` Michael
2024-06-10 20:52 ` [PATCH 0/2] md: flush deadlock bugfix Song Liu