* [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log()
@ 2023-06-27 8:56 Yu Kuai
2023-06-27 8:56 ` [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
2023-06-27 8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
0 siblings, 2 replies; 4+ messages in thread
From: Yu Kuai @ 2023-06-27 8:56 UTC (permalink / raw)
To: logang, hch, song, shli
Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun
From: Yu Kuai <yukuai3@huawei.com>
Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
work") introduce a new problem:
// caller hold reconfig_mutex
r5l_exit_log
flush_work(&log->disable_writeback_work)
r5c_disable_writeback_async
wait_event
/*
* conf->log is not NULL, and mddev_trylock()
* will fail, wait_event() can never pass.
*/
conf->log = NULL
patch 1 revert this patch, an patch 2 fix the original problem in a
different way.
Noted this problem is just found by code review, and I think this is
probably the reason that some mdadm tests is broken.
Yu Kuai (2):
md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
finishing work"
md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
drivers/md/raid5-cache.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 4+ messages in thread* [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" 2023-06-27 8:56 [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai @ 2023-06-27 8:56 ` Yu Kuai 2023-06-27 8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai 1 sibling, 0 replies; 4+ messages in thread From: Yu Kuai @ 2023-06-27 8:56 UTC (permalink / raw) To: logang, hch, song, shli Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun From: Yu Kuai <yukuai3@huawei.com> This reverts commit b13015af94cf405f73ff64ce0797269554020c37. Because this will cause that r5c_disable_writeback_async() to wait forever, since caller hold reconfig_mutex and conf->log is not NULL: wait_event conf->log == NULL || (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) && (locked = mddev_trylock(mddev))) This problem is found by code review, and the null-ptr-deref this patch fixed will be fixed by another approch in the next patch. Signed-off-by: Yu Kuai <yukuai3@huawei.com> --- drivers/md/raid5-cache.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 47ba7d9e81e1..083288e36949 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -3168,13 +3168,12 @@ void r5l_exit_log(struct r5conf *conf) { struct r5l_log *log = conf->log; + conf->log = NULL; + /* Ensure disable_writeback_work wakes up and exits */ wake_up(&conf->mddev->sb_wait); flush_work(&log->disable_writeback_work); md_unregister_thread(&log->reclaim_thread); - - conf->log = NULL; - mempool_exit(&log->meta_pool); bioset_exit(&log->bs); mempool_exit(&log->io_pool); -- 2.39.2 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-06-27 8:56 [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai 2023-06-27 8:56 ` [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai @ 2023-06-27 8:56 ` Yu Kuai 2023-06-27 15:12 ` kernel test robot 1 sibling, 1 reply; 4+ messages in thread From: Yu Kuai @ 2023-06-27 8:56 UTC (permalink / raw) To: logang, hch, song, shli Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun From: Yu Kuai <yukuai3@huawei.com> r5l_reclaim_thread() already check that 'conf->log' is not NULL in the beginning, however, r5c_do_reclaim() and r5c_do_reclaim() will dereference 'conf->log' again, which will cause null-ptr-deref if 'conf->log' is set to NULL from r5l_exit_log() concurrently. Fix this problem by don't dereference 'conf->log' again in r5c_do_reclaim() and r5c_do_reclaim(). Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support") Signed-off-by: Yu Kuai <yukuai3@huawei.com> --- drivers/md/raid5-cache.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 083288e36949..0cd95d9c3370 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log) * for write through mode, returns log->next_checkpoint * for write back, returns log_start of first sh in stripe_in_journal_list */ -static sector_t r5c_calculate_new_cp(struct r5conf *conf) +static sector_t r5c_calculate_new_cp(struct r5l_log *log) { struct stripe_head *sh; - struct r5l_log *log = conf->log; sector_t new_cp; unsigned long flags; @@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) return log->next_checkpoint; spin_lock_irqsave(&log->stripe_in_journal_lock, flags); - if (list_empty(&conf->log->stripe_in_journal_list)) { + if (list_empty(&log->stripe_in_journal_list)) { /* all stripes flushed */ spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); return log->next_checkpoint; } - sh = list_first_entry(&conf->log->stripe_in_journal_list, + sh = list_first_entry(&log->stripe_in_journal_list, struct stripe_head, r5c); new_cp = sh->log_start; spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags); @@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf) static sector_t r5l_reclaimable_space(struct r5l_log *log) { - struct r5conf *conf = log->rdev->mddev->private; - return r5l_ring_distance(log, log->last_checkpoint, - r5c_calculate_new_cp(conf)); + r5c_calculate_new_cp(log)); } static void r5l_run_no_mem_stripe(struct r5l_log *log) @@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num) } } -static void r5c_do_reclaim(struct r5conf *conf) +static void r5c_do_reclaim(struct r5l_log *log) { - struct r5l_log *log = conf->log; + struct r5conf *conf = log->rdev->mddev->private; struct stripe_head *sh; int count = 0; unsigned long flags; @@ -1525,7 +1522,7 @@ static void r5l_do_reclaim(struct r5l_log *log) log->io_list_lock); } - next_checkpoint = r5c_calculate_new_cp(conf); + next_checkpoint = r5c_calculate_new_cp(log); spin_unlock_irq(&log->io_list_lock); if (reclaimable == 0 || !write_super) @@ -1554,7 +1551,7 @@ static void r5l_reclaim_thread(struct md_thread *thread) if (!log) return; - r5c_do_reclaim(conf); + r5c_do_reclaim(log); r5l_do_reclaim(log); } -- 2.39.2 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() 2023-06-27 8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai @ 2023-06-27 15:12 ` kernel test robot 0 siblings, 0 replies; 4+ messages in thread From: kernel test robot @ 2023-06-27 15:12 UTC (permalink / raw) To: Yu Kuai, logang, hch, song, shli Cc: oe-kbuild-all, linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun Hi Yu, kernel test robot noticed the following build warnings: [auto build test WARNING on next-20230626] url: https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-raid5-cache-Revert-md-raid5-cache-Clear-conf-log-after-finishing-work/20230627-165746 base: next-20230626 patch link: https://lore.kernel.org/r/20230627085611.4186951-3-yukuai1%40huaweicloud.com patch subject: [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() config: x86_64-buildonly-randconfig-r003-20230627 (https://download.01.org/0day-ci/archive/20230627/202306272247.VWiGIFDe-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce: (https://download.01.org/0day-ci/archive/20230627/202306272247.VWiGIFDe-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202306272247.VWiGIFDe-lkp@intel.com/ All warnings (new ones prefixed by >>): drivers/md/raid5-cache.c: In function 'r5l_do_reclaim': >> drivers/md/raid5-cache.c:1496:24: warning: unused variable 'conf' [-Wunused-variable] 1496 | struct r5conf *conf = log->rdev->mddev->private; | ^~~~ vim +/conf +1496 drivers/md/raid5-cache.c a39f7afde358ca Song Liu 2016-11-17 1493 0576b1c618ef22 Shaohua Li 2015-08-13 1494 static void r5l_do_reclaim(struct r5l_log *log) 0576b1c618ef22 Shaohua Li 2015-08-13 1495 { a39f7afde358ca Song Liu 2016-11-17 @1496 struct r5conf *conf = log->rdev->mddev->private; 0576b1c618ef22 Shaohua Li 2015-08-13 1497 sector_t reclaim_target = xchg(&log->reclaim_target, 0); 170364619ac21c Christoph Hellwig 2015-10-05 1498 sector_t reclaimable; 170364619ac21c Christoph Hellwig 2015-10-05 1499 sector_t next_checkpoint; a39f7afde358ca Song Liu 2016-11-17 1500 bool write_super; 0576b1c618ef22 Shaohua Li 2015-08-13 1501 0576b1c618ef22 Shaohua Li 2015-08-13 1502 spin_lock_irq(&log->io_list_lock); a39f7afde358ca Song Liu 2016-11-17 1503 write_super = r5l_reclaimable_space(log) > log->max_free_space || a39f7afde358ca Song Liu 2016-11-17 1504 reclaim_target != 0 || !list_empty(&log->no_space_stripes); 0576b1c618ef22 Shaohua Li 2015-08-13 1505 /* 0576b1c618ef22 Shaohua Li 2015-08-13 1506 * move proper io_unit to reclaim list. We should not change the order. 0576b1c618ef22 Shaohua Li 2015-08-13 1507 * reclaimable/unreclaimable io_unit can be mixed in the list, we 0576b1c618ef22 Shaohua Li 2015-08-13 1508 * shouldn't reuse space of an unreclaimable io_unit 0576b1c618ef22 Shaohua Li 2015-08-13 1509 */ 0576b1c618ef22 Shaohua Li 2015-08-13 1510 while (1) { 170364619ac21c Christoph Hellwig 2015-10-05 1511 reclaimable = r5l_reclaimable_space(log); 170364619ac21c Christoph Hellwig 2015-10-05 1512 if (reclaimable >= reclaim_target || 0576b1c618ef22 Shaohua Li 2015-08-13 1513 (list_empty(&log->running_ios) && 0576b1c618ef22 Shaohua Li 2015-08-13 1514 list_empty(&log->io_end_ios) && a8c34f915976e3 Shaohua Li 2015-09-02 1515 list_empty(&log->flushing_ios) && 04732f741dce5e Christoph Hellwig 2015-10-05 1516 list_empty(&log->finished_ios))) 0576b1c618ef22 Shaohua Li 2015-08-13 1517 break; 0576b1c618ef22 Shaohua Li 2015-08-13 1518 170364619ac21c Christoph Hellwig 2015-10-05 1519 md_wakeup_thread(log->rdev->mddev->thread); 170364619ac21c Christoph Hellwig 2015-10-05 1520 wait_event_lock_irq(log->iounit_wait, 170364619ac21c Christoph Hellwig 2015-10-05 1521 r5l_reclaimable_space(log) > reclaimable, 170364619ac21c Christoph Hellwig 2015-10-05 1522 log->io_list_lock); 0576b1c618ef22 Shaohua Li 2015-08-13 1523 } 170364619ac21c Christoph Hellwig 2015-10-05 1524 f2ca7c4bb59762 Yu Kuai 2023-06-27 1525 next_checkpoint = r5c_calculate_new_cp(log); 0576b1c618ef22 Shaohua Li 2015-08-13 1526 spin_unlock_irq(&log->io_list_lock); 0576b1c618ef22 Shaohua Li 2015-08-13 1527 a39f7afde358ca Song Liu 2016-11-17 1528 if (reclaimable == 0 || !write_super) 0576b1c618ef22 Shaohua Li 2015-08-13 1529 return; 0576b1c618ef22 Shaohua Li 2015-08-13 1530 0576b1c618ef22 Shaohua Li 2015-08-13 1531 /* 0576b1c618ef22 Shaohua Li 2015-08-13 1532 * write_super will flush cache of each raid disk. We must write super 0576b1c618ef22 Shaohua Li 2015-08-13 1533 * here, because the log area might be reused soon and we don't want to 0576b1c618ef22 Shaohua Li 2015-08-13 1534 * confuse recovery 0576b1c618ef22 Shaohua Li 2015-08-13 1535 */ 4b482044d24f3d Shaohua Li 2015-10-08 1536 r5l_write_super_and_discard_space(log, next_checkpoint); 0576b1c618ef22 Shaohua Li 2015-08-13 1537 0576b1c618ef22 Shaohua Li 2015-08-13 1538 mutex_lock(&log->io_mutex); 170364619ac21c Christoph Hellwig 2015-10-05 1539 log->last_checkpoint = next_checkpoint; a39f7afde358ca Song Liu 2016-11-17 1540 r5c_update_log_state(log); 0576b1c618ef22 Shaohua Li 2015-08-13 1541 mutex_unlock(&log->io_mutex); 0576b1c618ef22 Shaohua Li 2015-08-13 1542 170364619ac21c Christoph Hellwig 2015-10-05 1543 r5l_run_no_space_stripes(log); 0576b1c618ef22 Shaohua Li 2015-08-13 1544 } 0576b1c618ef22 Shaohua Li 2015-08-13 1545 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-06-27 15:13 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-06-27 8:56 [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai 2023-06-27 8:56 ` [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai 2023-06-27 8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai 2023-06-27 15:12 ` kernel test robot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox