public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log()
@ 2023-06-27  8:56 Yu Kuai
  2023-06-27  8:56 ` [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
  2023-06-27  8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
  0 siblings, 2 replies; 4+ messages in thread
From: Yu Kuai @ 2023-06-27  8:56 UTC (permalink / raw)
  To: logang, hch, song, shli
  Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Commit b13015af94cf ("md/raid5-cache: Clear conf->log after finishing
work") introduce a new problem:

// caller hold reconfig_mutex
r5l_exit_log
 flush_work(&log->disable_writeback_work)
			r5c_disable_writeback_async
			 wait_event
			  /*
			   * conf->log is not NULL, and mddev_trylock()
			   * will fail, wait_event() can never pass.
			   */
 conf->log = NULL

patch 1 revert this patch, an patch 2 fix the original problem in a
different way.

Noted this problem is just found by code review, and I think this is
probably the reason that some mdadm tests is broken.

Yu Kuai (2):
  md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after
    finishing work"
  md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()

 drivers/md/raid5-cache.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work"
  2023-06-27  8:56 [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
@ 2023-06-27  8:56 ` Yu Kuai
  2023-06-27  8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
  1 sibling, 0 replies; 4+ messages in thread
From: Yu Kuai @ 2023-06-27  8:56 UTC (permalink / raw)
  To: logang, hch, song, shli
  Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

This reverts commit b13015af94cf405f73ff64ce0797269554020c37.

Because this will cause that r5c_disable_writeback_async() to wait
forever, since caller hold reconfig_mutex and conf->log is not NULL:

wait_event
 conf->log == NULL ||
 (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags) &&
  (locked = mddev_trylock(mddev)))

This problem is found by code review, and the null-ptr-deref this patch
fixed will be fixed by another approch in the next patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 47ba7d9e81e1..083288e36949 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -3168,13 +3168,12 @@ void r5l_exit_log(struct r5conf *conf)
 {
 	struct r5l_log *log = conf->log;
 
+	conf->log = NULL;
+
 	/* Ensure disable_writeback_work wakes up and exits */
 	wake_up(&conf->mddev->sb_wait);
 	flush_work(&log->disable_writeback_work);
 	md_unregister_thread(&log->reclaim_thread);
-
-	conf->log = NULL;
-
 	mempool_exit(&log->meta_pool);
 	bioset_exit(&log->bs);
 	mempool_exit(&log->io_pool);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-06-27  8:56 [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
  2023-06-27  8:56 ` [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
@ 2023-06-27  8:56 ` Yu Kuai
  2023-06-27 15:12   ` kernel test robot
  1 sibling, 1 reply; 4+ messages in thread
From: Yu Kuai @ 2023-06-27  8:56 UTC (permalink / raw)
  To: logang, hch, song, shli
  Cc: linux-raid, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

r5l_reclaim_thread() already check that 'conf->log' is not NULL in the
beginning, however, r5c_do_reclaim() and r5c_do_reclaim() will
dereference 'conf->log' again, which will cause null-ptr-deref if
'conf->log' is set to NULL from r5l_exit_log() concurrently.

Fix this problem by don't dereference 'conf->log' again in
r5c_do_reclaim() and r5c_do_reclaim().

Fixes: a39f7afde358 ("md/r5cache: write-out phase and reclaim support")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/raid5-cache.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 083288e36949..0cd95d9c3370 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1148,10 +1148,9 @@ static void r5l_run_no_space_stripes(struct r5l_log *log)
  * for write through mode, returns log->next_checkpoint
  * for write back, returns log_start of first sh in stripe_in_journal_list
  */
-static sector_t r5c_calculate_new_cp(struct r5conf *conf)
+static sector_t r5c_calculate_new_cp(struct r5l_log *log)
 {
 	struct stripe_head *sh;
-	struct r5l_log *log = conf->log;
 	sector_t new_cp;
 	unsigned long flags;
 
@@ -1159,12 +1158,12 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 		return log->next_checkpoint;
 
 	spin_lock_irqsave(&log->stripe_in_journal_lock, flags);
-	if (list_empty(&conf->log->stripe_in_journal_list)) {
+	if (list_empty(&log->stripe_in_journal_list)) {
 		/* all stripes flushed */
 		spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
 		return log->next_checkpoint;
 	}
-	sh = list_first_entry(&conf->log->stripe_in_journal_list,
+	sh = list_first_entry(&log->stripe_in_journal_list,
 			      struct stripe_head, r5c);
 	new_cp = sh->log_start;
 	spin_unlock_irqrestore(&log->stripe_in_journal_lock, flags);
@@ -1173,10 +1172,8 @@ static sector_t r5c_calculate_new_cp(struct r5conf *conf)
 
 static sector_t r5l_reclaimable_space(struct r5l_log *log)
 {
-	struct r5conf *conf = log->rdev->mddev->private;
-
 	return r5l_ring_distance(log, log->last_checkpoint,
-				 r5c_calculate_new_cp(conf));
+				 r5c_calculate_new_cp(log));
 }
 
 static void r5l_run_no_mem_stripe(struct r5l_log *log)
@@ -1419,9 +1416,9 @@ void r5c_flush_cache(struct r5conf *conf, int num)
 	}
 }
 
-static void r5c_do_reclaim(struct r5conf *conf)
+static void r5c_do_reclaim(struct r5l_log *log)
 {
-	struct r5l_log *log = conf->log;
+	struct r5conf *conf = log->rdev->mddev->private;
 	struct stripe_head *sh;
 	int count = 0;
 	unsigned long flags;
@@ -1525,7 +1522,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
 				    log->io_list_lock);
 	}
 
-	next_checkpoint = r5c_calculate_new_cp(conf);
+	next_checkpoint = r5c_calculate_new_cp(log);
 	spin_unlock_irq(&log->io_list_lock);
 
 	if (reclaimable == 0 || !write_super)
@@ -1554,7 +1551,7 @@ static void r5l_reclaim_thread(struct md_thread *thread)
 
 	if (!log)
 		return;
-	r5c_do_reclaim(conf);
+	r5c_do_reclaim(log);
 	r5l_do_reclaim(log);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
  2023-06-27  8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
@ 2023-06-27 15:12   ` kernel test robot
  0 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2023-06-27 15:12 UTC (permalink / raw)
  To: Yu Kuai, logang, hch, song, shli
  Cc: oe-kbuild-all, linux-raid, linux-kernel, yukuai3, yukuai1,
	yi.zhang, yangerkun

Hi Yu,

kernel test robot noticed the following build warnings:

[auto build test WARNING on next-20230626]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/md-raid5-cache-Revert-md-raid5-cache-Clear-conf-log-after-finishing-work/20230627-165746
base:   next-20230626
patch link:    https://lore.kernel.org/r/20230627085611.4186951-3-yukuai1%40huaweicloud.com
patch subject: [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread()
config: x86_64-buildonly-randconfig-r003-20230627 (https://download.01.org/0day-ci/archive/20230627/202306272247.VWiGIFDe-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230627/202306272247.VWiGIFDe-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306272247.VWiGIFDe-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/md/raid5-cache.c: In function 'r5l_do_reclaim':
>> drivers/md/raid5-cache.c:1496:24: warning: unused variable 'conf' [-Wunused-variable]
    1496 |         struct r5conf *conf = log->rdev->mddev->private;
         |                        ^~~~


vim +/conf +1496 drivers/md/raid5-cache.c

a39f7afde358ca Song Liu          2016-11-17  1493  
0576b1c618ef22 Shaohua Li        2015-08-13  1494  static void r5l_do_reclaim(struct r5l_log *log)
0576b1c618ef22 Shaohua Li        2015-08-13  1495  {
a39f7afde358ca Song Liu          2016-11-17 @1496  	struct r5conf *conf = log->rdev->mddev->private;
0576b1c618ef22 Shaohua Li        2015-08-13  1497  	sector_t reclaim_target = xchg(&log->reclaim_target, 0);
170364619ac21c Christoph Hellwig 2015-10-05  1498  	sector_t reclaimable;
170364619ac21c Christoph Hellwig 2015-10-05  1499  	sector_t next_checkpoint;
a39f7afde358ca Song Liu          2016-11-17  1500  	bool write_super;
0576b1c618ef22 Shaohua Li        2015-08-13  1501  
0576b1c618ef22 Shaohua Li        2015-08-13  1502  	spin_lock_irq(&log->io_list_lock);
a39f7afde358ca Song Liu          2016-11-17  1503  	write_super = r5l_reclaimable_space(log) > log->max_free_space ||
a39f7afde358ca Song Liu          2016-11-17  1504  		reclaim_target != 0 || !list_empty(&log->no_space_stripes);
0576b1c618ef22 Shaohua Li        2015-08-13  1505  	/*
0576b1c618ef22 Shaohua Li        2015-08-13  1506  	 * move proper io_unit to reclaim list. We should not change the order.
0576b1c618ef22 Shaohua Li        2015-08-13  1507  	 * reclaimable/unreclaimable io_unit can be mixed in the list, we
0576b1c618ef22 Shaohua Li        2015-08-13  1508  	 * shouldn't reuse space of an unreclaimable io_unit
0576b1c618ef22 Shaohua Li        2015-08-13  1509  	 */
0576b1c618ef22 Shaohua Li        2015-08-13  1510  	while (1) {
170364619ac21c Christoph Hellwig 2015-10-05  1511  		reclaimable = r5l_reclaimable_space(log);
170364619ac21c Christoph Hellwig 2015-10-05  1512  		if (reclaimable >= reclaim_target ||
0576b1c618ef22 Shaohua Li        2015-08-13  1513  		    (list_empty(&log->running_ios) &&
0576b1c618ef22 Shaohua Li        2015-08-13  1514  		     list_empty(&log->io_end_ios) &&
a8c34f915976e3 Shaohua Li        2015-09-02  1515  		     list_empty(&log->flushing_ios) &&
04732f741dce5e Christoph Hellwig 2015-10-05  1516  		     list_empty(&log->finished_ios)))
0576b1c618ef22 Shaohua Li        2015-08-13  1517  			break;
0576b1c618ef22 Shaohua Li        2015-08-13  1518  
170364619ac21c Christoph Hellwig 2015-10-05  1519  		md_wakeup_thread(log->rdev->mddev->thread);
170364619ac21c Christoph Hellwig 2015-10-05  1520  		wait_event_lock_irq(log->iounit_wait,
170364619ac21c Christoph Hellwig 2015-10-05  1521  				    r5l_reclaimable_space(log) > reclaimable,
170364619ac21c Christoph Hellwig 2015-10-05  1522  				    log->io_list_lock);
0576b1c618ef22 Shaohua Li        2015-08-13  1523  	}
170364619ac21c Christoph Hellwig 2015-10-05  1524  
f2ca7c4bb59762 Yu Kuai           2023-06-27  1525  	next_checkpoint = r5c_calculate_new_cp(log);
0576b1c618ef22 Shaohua Li        2015-08-13  1526  	spin_unlock_irq(&log->io_list_lock);
0576b1c618ef22 Shaohua Li        2015-08-13  1527  
a39f7afde358ca Song Liu          2016-11-17  1528  	if (reclaimable == 0 || !write_super)
0576b1c618ef22 Shaohua Li        2015-08-13  1529  		return;
0576b1c618ef22 Shaohua Li        2015-08-13  1530  
0576b1c618ef22 Shaohua Li        2015-08-13  1531  	/*
0576b1c618ef22 Shaohua Li        2015-08-13  1532  	 * write_super will flush cache of each raid disk. We must write super
0576b1c618ef22 Shaohua Li        2015-08-13  1533  	 * here, because the log area might be reused soon and we don't want to
0576b1c618ef22 Shaohua Li        2015-08-13  1534  	 * confuse recovery
0576b1c618ef22 Shaohua Li        2015-08-13  1535  	 */
4b482044d24f3d Shaohua Li        2015-10-08  1536  	r5l_write_super_and_discard_space(log, next_checkpoint);
0576b1c618ef22 Shaohua Li        2015-08-13  1537  
0576b1c618ef22 Shaohua Li        2015-08-13  1538  	mutex_lock(&log->io_mutex);
170364619ac21c Christoph Hellwig 2015-10-05  1539  	log->last_checkpoint = next_checkpoint;
a39f7afde358ca Song Liu          2016-11-17  1540  	r5c_update_log_state(log);
0576b1c618ef22 Shaohua Li        2015-08-13  1541  	mutex_unlock(&log->io_mutex);
0576b1c618ef22 Shaohua Li        2015-08-13  1542  
170364619ac21c Christoph Hellwig 2015-10-05  1543  	r5l_run_no_space_stripes(log);
0576b1c618ef22 Shaohua Li        2015-08-13  1544  }
0576b1c618ef22 Shaohua Li        2015-08-13  1545  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-06-27 15:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-27  8:56 [PATCH -next 0/2] md/raid5-cache: fix a deadlock in r5l_exit_log() Yu Kuai
2023-06-27  8:56 ` [PATCH -next 1/2] md/raid5-cache: Revert "md/raid5-cache: Clear conf->log after finishing work" Yu Kuai
2023-06-27  8:56 ` [PATCH -next 2/2] md/raid5-cache: fix null-ptr-deref in r5l_reclaim_thread() Yu Kuai
2023-06-27 15:12   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox