linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
@ 2024-06-07 19:06 Gulam Mohamed
  2024-06-08  5:20 ` Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Gulam Mohamed @ 2024-06-07 19:06 UTC (permalink / raw)
  To: linux-block, linux-kernel; +Cc: yukuai1, hch, axboe

1. Userspace sends the command "losetup -d" which uses the open() call
   to open the device
2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the
   function loop_clr_fd()
3. If LOOP_CLR_FD is the first command received at the time, then the
   AUTOCLEAR flag is not set and deletion of the
   loop device proceeds ahead and scans the partitions (drop/add
   partitions)

        if (disk_openers(lo->lo_disk) > 1) {
                lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
                loop_global_unlock(lo, true);
                return 0;
        }

 4. Before scanning partitions, it will check to see if any partition of
    the loop device is currently opened
 5. If any partition is opened, then it will return EBUSY:

    if (disk->open_partitions)
                return -EBUSY;
 6. So, after receiving the "LOOP_CLR_FD" command and just before the above
    check for open_partitions, if any other command
    (like blkid) opens any partition of the loop device, then the partition
    scan will not proceed and EBUSY is returned as shown in above code
 7. But in "__loop_clr_fd()", this EBUSY error is not propagated
 8. We have noticed that this is causing the partitions of the loop to
    remain stale even after the loop device is detached resulting in the
    IO errors on the partitions
Fix:
Defer the detach of the loop device to the release function, which is
called when the last close happens, by setting the lo_flags to
LO_FLAGS_AUTOCLEAR at the time of detach, i.e. in the loop_clr_fd() function.

Test case involves the following two scripts:

script1.sh:

while [ 1 ];
do
        losetup -P -f /home/opt/looptest/test10.img
        blkid /dev/loop0p1
done

script2.sh:

while [ 1 ];
do
        losetup -d /dev/loop0
done

Without fix, the following IO errors have been observed:

kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
        phys_seg 1 prio class 0
kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
        phys_seg 1 prio class 0
kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
        read

Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
---
v4<-v3:
1. Defer the loop detach to last close of loop device
2. Removed the use of lo_open due to following reasons:

Setting the lo_state to Lo_rundown in loop_clr_fd() may not help in
stopping the incoming open(), when the loop is being detached, as the
open() could invoke the lo_open() before the lo_state is set to Lo_rundown
and increment the disk_openers refcnt later.
As the actual cleanup is deferred to last close, in release, there is no
chance for the open() to kick in to take the reference. Because both open()
and release() are protected by open_mutex and hence they cannot run in
parallel.
So, lo_open() and setting lo_state to Lo_rundown are not needed. Removing
the loop state Lo_rundown as it's not used anymore.

 drivers/block/loop.c | 44 ++++++++------------------------------------
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 28a95fd366fe..4936cadc1a63 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -41,7 +41,6 @@
 enum {
 	Lo_unbound,
 	Lo_bound,
-	Lo_rundown,
 	Lo_deleting,
 };
 
@@ -1131,7 +1130,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
 	return error;
 }
 
-static void __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo)
 {
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
@@ -1139,14 +1138,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
 		blk_queue_write_cache(lo->lo_queue, false, false);
 
-	/*
-	 * Freeze the request queue when unbinding on a live file descriptor and
-	 * thus an open device.  When called from ->release we are guaranteed
-	 * that there is no I/O in progress already.
-	 */
-	if (!release)
-		blk_mq_freeze_queue(lo->lo_queue);
-
 	spin_lock_irq(&lo->lo_lock);
 	filp = lo->lo_backing_file;
 	lo->lo_backing_file = NULL;
@@ -1164,8 +1155,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	mapping_set_gfp_mask(filp->f_mapping, gfp);
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
-	if (!release)
-		blk_mq_unfreeze_queue(lo->lo_queue);
 
 	disk_force_media_change(lo->lo_disk);
 
@@ -1180,11 +1169,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 		 * must be at least one and it can only become zero when the
 		 * current holder is released.
 		 */
-		if (!release)
-			mutex_lock(&lo->lo_disk->open_mutex);
 		err = bdev_disk_changed(lo->lo_disk, false);
-		if (!release)
-			mutex_unlock(&lo->lo_disk->open_mutex);
 		if (err)
 			pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
 				__func__, lo->lo_number, err);
@@ -1232,25 +1217,8 @@ static int loop_clr_fd(struct loop_device *lo)
 		loop_global_unlock(lo, true);
 		return -ENXIO;
 	}
-	/*
-	 * If we've explicitly asked to tear down the loop device,
-	 * and it has an elevated reference count, set it for auto-teardown when
-	 * the last reference goes away. This stops $!~#$@ udev from
-	 * preventing teardown because it decided that it needs to run blkid on
-	 * the loopback device whenever they appear. xfstests is notorious for
-	 * failing tests because blkid via udev races with a losetup
-	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
-	 * command to fail with EBUSY.
-	 */
-	if (disk_openers(lo->lo_disk) > 1) {
-		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
-		loop_global_unlock(lo, true);
-		return 0;
-	}
-	lo->lo_state = Lo_rundown;
+	lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
 	loop_global_unlock(lo, true);
-
-	__loop_clr_fd(lo, false);
 	return 0;
 }
 
@@ -1724,15 +1692,19 @@ static void lo_release(struct gendisk *disk)
 	if (disk_openers(disk) > 0)
 		return;
 
+	/*
+	 * Clear the backing device information if this is the last close of
+	 * a device that's been marked for auto clear, or on which LOOP_CLR_FD
+	 * has been called.
+	 */
 	mutex_lock(&lo->lo_mutex);
 	if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) {
-		lo->lo_state = Lo_rundown;
 		mutex_unlock(&lo->lo_mutex);
 		/*
 		 * In autoclear mode, stop the loop thread
 		 * and remove configuration after last close.
 		 */
-		__loop_clr_fd(lo, true);
+		__loop_clr_fd(lo);
 		return;
 	}
 	mutex_unlock(&lo->lo_mutex);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-07 19:06 [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open Gulam Mohamed
@ 2024-06-08  5:20 ` Christoph Hellwig
  2024-06-10  3:44 ` Chaitanya Kulkarni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-08  5:20 UTC (permalink / raw)
  To: Gulam Mohamed; +Cc: linux-block, linux-kernel, yukuai1, hch, axboe

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-07 19:06 [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open Gulam Mohamed
  2024-06-08  5:20 ` Christoph Hellwig
@ 2024-06-10  3:44 ` Chaitanya Kulkarni
  2024-06-11 14:58 ` kernel test robot
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2024-06-10  3:44 UTC (permalink / raw)
  To: Gulam Mohamed, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: yukuai1@huaweicloud.com, hch@lst.de, axboe@kernel.dk

On 6/7/24 12:06, Gulam Mohamed wrote:
> 1. Userspace sends the command "losetup -d" which uses the open() call
>     to open the device
> 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the
>     function loop_clr_fd()
> 3. If LOOP_CLR_FD is the first command received at the time, then the
>     AUTOCLEAR flag is not set and deletion of the
>     loop device proceeds ahead and scans the partitions (drop/add
>     partitions)
>
>          if (disk_openers(lo->lo_disk) > 1) {
>                  lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
>                  loop_global_unlock(lo, true);
>                  return 0;
>          }
>
>   4. Before scanning partitions, it will check to see if any partition of
>      the loop device is currently opened
>   5. If any partition is opened, then it will return EBUSY:
>
>      if (disk->open_partitions)
>                  return -EBUSY;
>   6. So, after receiving the "LOOP_CLR_FD" command and just before the above
>      check for open_partitions, if any other command
>      (like blkid) opens any partition of the loop device, then the partition
>      scan will not proceed and EBUSY is returned as shown in above code
>   7. But in "__loop_clr_fd()", this EBUSY error is not propagated
>   8. We have noticed that this is causing the partitions of the loop to
>      remain stale even after the loop device is detached resulting in the
>      IO errors on the partitions
> Fix:
> Defer the detach of loop device to release function, which is called
> when the last close happens, by setting the lo_flags to LO_FLAGS_AUTOCLEAR
> at the time of detach i.e in loop_clr_fd() function.
>
> Test case involves the following two scripts:
>
> script1.sh:
>
> while [ 1 ];
> do
>          losetup -P -f /home/opt/looptest/test10.img
>          blkid /dev/loop0p1
> done
>
> script2.sh:
>
> while [ 1 ];
> do
>          losetup -d /dev/loop0
> done
>
> Without fix, the following IO errors have been observed:
>
> kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
> kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
>          phys_seg 1 prio class 0
> kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
>          phys_seg 1 prio class 0
> kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
>          read
>
> Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>
> ---
>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck

I did run the blktests related to this patch [1]; without this patch I can
see the following messages :-
[  320.404176] __loop_clr_fd: partition scan of loop0 failed (rc=-16)
[  322.908994] __loop_clr_fd: partition scan of loop0 failed (rc=-16)

with this patch applied, these messages are gone when running the same test
posted in [1] ..

[1]
https://lore.kernel.org/all/ymanwmgtn76jg56vmjbg5vxcegfng2ewccgntmtzskwl6qx42d@g3iyvqldgais/T/



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-07 19:06 [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open Gulam Mohamed
  2024-06-08  5:20 ` Christoph Hellwig
  2024-06-10  3:44 ` Chaitanya Kulkarni
@ 2024-06-11 14:58 ` kernel test robot
  2024-06-13 21:10   ` Gulam Mohamed
  2024-06-12  5:19 ` Christoph Hellwig
  2025-03-31 14:10 ` Zhu Yanjun
  4 siblings, 1 reply; 9+ messages in thread
From: kernel test robot @ 2024-06-11 14:58 UTC (permalink / raw)
  To: Gulam Mohamed
  Cc: oe-lkp, lkp, linux-block, ltp, linux-kernel, yukuai1, hch, axboe,
	oliver.sang



Hello,

kernel test robot noticed "ltp.ioctl09.fail" on:

commit: 02ab74c165fb204557fe6cde80eda0633fbc4412 ("[PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open")
url: https://github.com/intel-lab-lkp/linux/commits/Gulam-Mohamed/loop-Fix-a-race-between-loop-detach-and-loop-open/20240608-031123
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/all/20240607190607.17705-1-gulam.mohamed@oracle.com/
patch subject: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20240608
with following parameters:

	disk: 1HDD
	fs: ext4
	test: syscalls-03/ioctl09



compiler: gcc-13
test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz (Ivy Bridge) with 8G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202406112130.a572f72-oliver.sang@intel.com


Running tests.......
<<<test_start>>>
tag=ioctl09 stime=1717978971
cmdline="ioctl09"
contacts=""
analysis=exit
<<<test_output>>>
tst_test.c:1734: TINFO: LTP version: 20240524-32-ge2c52c5bb
tst_test.c:1618: TINFO: Timeout per run is 0h 02m 30s
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
ioctl09.c:48: TPASS: access /sys/block/loop0/loop0p1 succeeds
ioctl09.c:56: TPASS: access /dev/loop0p1 succeeds
ioctl09.c:51: TPASS: access /sys/block/loop0/loop0p2 fails
ioctl09.c:59: TPASS: access /dev/loop0p2 fails
ioctl09.c:48: TPASS: access /sys/block/loop0/loop0p1 succeeds
ioctl09.c:56: TPASS: access /dev/loop0p1 succeeds
ioctl09.c:48: TPASS: access /sys/block/loop0/loop0p2 succeeds
ioctl09.c:56: TPASS: access /dev/loop0p2 succeeds
tst_device.c:263: TWARN: ioctl(/dev/loop0, LOOP_CLR_FD, 0) no ENXIO for too long

Summary:
passed   8
failed   0
broken   0
skipped  0
warnings 1
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=3 termination_type=exited termination_id=4 corefile=no
cutime=3 cstime=42
<<<test_end>>>
INFO: ltp-pan reported some tests FAIL
LTP Version: 20240524-32-ge2c52c5bb

       ###############################################################

            Done executing testcases.
            LTP Version:  20240524-32-ge2c52c5bb
       ###############################################################




The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240611/202406112130.a572f72-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-07 19:06 [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open Gulam Mohamed
                   ` (2 preceding siblings ...)
  2024-06-11 14:58 ` kernel test robot
@ 2024-06-12  5:19 ` Christoph Hellwig
  2025-03-31 14:10 ` Zhu Yanjun
  4 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-06-12  5:19 UTC (permalink / raw)
  To: Gulam Mohamed; +Cc: linux-block, linux-kernel, yukuai1, hch, axboe

On Fri, Jun 07, 2024 at 07:06:07PM +0000, Gulam Mohamed wrote:
> Setting the lo_state to Lo_rundown in loop_clr_fd() may not help in
> stopping the incoming open(), when the loop is being detached, as the
> open() could invoke the lo_open() before the lo_state is set to Lo_rundown
> and increment the disk_openers refcnt later.
> As the actual cleanup is deferred to last close, in release, there is no
> chance for the open() to kick in to take the reference. Because both open()
> and release() are protected by open_mutex and hence they cannot run in
> parallel.
> So, lo_open() and setting lo_state to Lo_rundown is not needed. Removing
> the loop state Lo_rundown as its not used anymore.

Looks like LTP still expects Lo_rundown to be set.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-11 14:58 ` kernel test robot
@ 2024-06-13 21:10   ` Gulam Mohamed
  2024-06-14  5:45     ` hch
  0 siblings, 1 reply; 9+ messages in thread
From: Gulam Mohamed @ 2024-06-13 21:10 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-block@vger.kernel.org, ltp@lists.linux.it,
	linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, hch@lst.de,
	axboe@kernel.dk

Hi,

> -----Original Message-----
> From: kernel test robot <oliver.sang@intel.com>
> Sent: Tuesday, June 11, 2024 8:28 PM
> To: Gulam Mohamed <gulam.mohamed@oracle.com>
> Cc: oe-lkp@lists.linux.dev; lkp@intel.com; linux-block@vger.kernel.org;
> ltp@lists.linux.it; linux-kernel@vger.kernel.org; yukuai1@huaweicloud.com;
> hch@lst.de; axboe@kernel.dk; oliver.sang@intel.com
> Subject: Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach
> and loop open
> 
> 
> 
> Hello,
> 
> kernel test robot noticed "ltp.ioctl09.fail" on:
> 
> commit: 02ab74c165fb204557fe6cde80eda0633fbc4412 ("[PATCH V4 for-
> 6.10/block] loop: Fix a race between loop detach and loop open")
> url: https://urldefense.com/v3/__https://github.com/intel-lab-
> lkp/linux/commits/Gulam-Mohamed/loop-Fix-a-race-between-loop-detach-
> and-loop-open/20240608-
> 031123__;!!ACWV5N9M2RV99hQ!Niww5tWxpW_rqqBaG_-
> w8CbDvJjcC6AwSb4gYZL3tS7fUrBcYesefSCbVL8GrWLJ0R8W_jyMsgUDi0HVVA
> _7Fk4$
> base:
> https://urldefense.com/v3/__https://git.kernel.org/cgit/linux/kernel/git/axboe
> /linux-block.git__;!!ACWV5N9M2RV99hQ!Niww5tWxpW_rqqBaG_-
> w8CbDvJjcC6AwSb4gYZL3tS7fUrBcYesefSCbVL8GrWLJ0R8W_jyMsgUDi0HVSM
> 8EomQ$  for-next patch link:
> https://urldefense.com/v3/__https://lore.kernel.org/all/20240607190607.177
> 05-1-
> gulam.mohamed@oracle.com/__;!!ACWV5N9M2RV99hQ!Niww5tWxpW_rqq
> BaG_-
> w8CbDvJjcC6AwSb4gYZL3tS7fUrBcYesefSCbVL8GrWLJ0R8W_jyMsgUDi0HVEcY
> Yz3s$
> patch subject: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach
> and loop open
> 
> in testcase: ltp
> version: ltp-x86_64-14c1f76-1_20240608
> with following parameters:
> 
> 	disk: 1HDD
> 	fs: ext4
> 	test: syscalls-03/ioctl09
> 
> 
> 
> compiler: gcc-13
> test machine: 4 threads 1 sockets Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz
> (Ivy Bridge) with 8G memory
> 
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> 
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes:
> | https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/20240611213
> | 0.a572f72-
> oliver.sang@intel.com__;!!ACWV5N9M2RV99hQ!Niww5tWxpW_rqqBaG_
> | -
> w8CbDvJjcC6AwSb4gYZL3tS7fUrBcYesefSCbVL8GrWLJ0R8W_jyMsgUDi0HVDgL
> 6MVc$
> 
> 
> Running tests.......
> <<<test_start>>>
> tag=ioctl09 stime=1717978971
> cmdline="ioctl09"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_test.c:1734: TINFO: LTP version: 20240524-32-ge2c52c5bb
> tst_test.c:1618: TINFO: Timeout per run is 0h 02m 30s
> tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
> ioctl09.c:48: TPASS: access /sys/block/loop0/loop0p1 succeeds
> ioctl09.c:56: TPASS: access /dev/loop0p1 succeeds
> ioctl09.c:51: TPASS: access /sys/block/loop0/loop0p2 fails
> ioctl09.c:59: TPASS: access /dev/loop0p2 fails
> ioctl09.c:48: TPASS: access /sys/block/loop0/loop0p1 succeeds
> ioctl09.c:56: TPASS: access /dev/loop0p1 succeeds
> ioctl09.c:48: TPASS: access /sys/block/loop0/loop0p2 succeeds
> ioctl09.c:56: TPASS: access /dev/loop0p2 succeeds
> tst_device.c:263: TWARN: ioctl(/dev/loop0, LOOP_CLR_FD, 0) no ENXIO for
> too long
> 
> Summary:
> passed   8
> failed   0
> broken   0
> skipped  0
> warnings 1
> incrementing stop
> <<<execution_status>>>
> initiation_status="ok"
> duration=3 termination_type=exited termination_id=4 corefile=no
> cutime=3 cstime=42
> <<<test_end>>>
> INFO: ltp-pan reported some tests FAIL
> LTP Version: 20240524-32-ge2c52c5bb
> 
> 
> ###############################################################
> 
>             Done executing testcases.
>             LTP Version:  20240524-32-ge2c52c5bb
> 
> ###############################################################
> 
> 
> 
> 
> The kernel config and materials to reproduce are available at:
> https://urldefense.com/v3/__https://download.01.org/0day-
> ci/archive/20240611/202406112130.a572f72-
> oliver.sang@intel.com__;!!ACWV5N9M2RV99hQ!Niww5tWxpW_rqqBaG_-
> w8CbDvJjcC6AwSb4gYZL3tS7fUrBcYesefSCbVL8GrWLJ0R8W_jyMsgUDi0HVsf4L
> rkk$
> 
> 
> 
> --
> 0-DAY CI Kernel Test Service
> https://urldefense.com/v3/__https://github.com/intel/lkp-
> tests/wiki__;!!ACWV5N9M2RV99hQ!Niww5tWxpW_rqqBaG_-
> w8CbDvJjcC6AwSb4gYZL3tS7fUrBcYesefSCbVL8GrWLJ0R8W_jyMsgUDi0HVQRs
> yTxc$

I looked at the LTP test case failure and also at the function tst_detach_device_by_fd() which failed. Our kernel patch will defer all attempts to detach a loop device to the last close, to fix an issue.
The tst_detach_device_by_fd() in the LTP test case will open the loop device and repeatedly check for the error code ENXIO. As the new approach, as I mentioned above, will defer the detach to the last close, and the last close happens *only* when the LTP test function tst_detach_device_by_fd() returns, the test will obviously fail. So, can you please modify the LTP test case to accommodate the new behaviour of the kernel code for loop detach?
Please let us know your comments.

Regards,
Gulam Mohamed.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-13 21:10   ` Gulam Mohamed
@ 2024-06-14  5:45     ` hch
  2024-06-14 23:35       ` Gulam Mohamed
  0 siblings, 1 reply; 9+ messages in thread
From: hch @ 2024-06-14  5:45 UTC (permalink / raw)
  To: Gulam Mohamed
  Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-block@vger.kernel.org, ltp@lists.linux.it,
	linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com, hch@lst.de,
	axboe@kernel.dk

On Thu, Jun 13, 2024 at 09:10:37PM +0000, Gulam Mohamed wrote:
> I looked at the LTP test case failure and also the function tst_detach_device_by_fd() which failed. Our kernel patch will defer all the attempts to detach a loop device to the last close, to fix an issue.
> The tst_detach_device_by_fd() in LTP test case will open the loop device and repeatedly checks for error code ENXIO. As the new approach, as I mentioned above, will defer the detach to last close and the last close happens *only* when the LTP test function tst_detach_device_by_fd() returns, the test will obviously fail. So, Can you please modify the LTP test case to accommodate the new behaviour of kernel code for loop detach?
> Please let us know your comments.

I still think simply setting the rundown state is the better approach..


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-14  5:45     ` hch
@ 2024-06-14 23:35       ` Gulam Mohamed
  0 siblings, 0 replies; 9+ messages in thread
From: Gulam Mohamed @ 2024-06-14 23:35 UTC (permalink / raw)
  To: hch@lst.de
  Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-block@vger.kernel.org, ltp@lists.linux.it,
	linux-kernel@vger.kernel.org, yukuai1@huaweicloud.com,
	axboe@kernel.dk

Hi Christoph,

> -----Original Message-----
> From: hch@lst.de <hch@lst.de>
> Sent: Friday, June 14, 2024 11:16 AM
> To: Gulam Mohamed <gulam.mohamed@oracle.com>
> Cc: kernel test robot <oliver.sang@intel.com>; oe-lkp@lists.linux.dev;
> lkp@intel.com; linux-block@vger.kernel.org; ltp@lists.linux.it; linux-
> kernel@vger.kernel.org; yukuai1@huaweicloud.com; hch@lst.de;
> axboe@kernel.dk
> Subject: Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach
> and loop open
> 
> On Thu, Jun 13, 2024 at 09:10:37PM +0000, Gulam Mohamed wrote:
> > I looked at the LTP test case failure and also the function
> tst_detach_device_by_fd() which failed. Our kernel patch will defer all the
> attempts to detach a loop device to the last close, to fix an issue.
> > The tst_detach_device_by_fd() in LTP test case will open the loop device
> and repeatedly checks for error code ENXIO. As the new approach, as I
> mentioned above, will defer the detach to last close and the last close
> happens *only* when the LTP test function tst_detach_device_by_fd()
> returns, the test will obviously fail. So, Can you please modify the LTP test case
> to accommodate the new behaviour of kernel code for loop detach?
> > Please let us know your comments.
> 
> I still think simply setting the rundown state is the better approach..

Thanks for the review Christoph. I am sending the V5 as you suggested.

Regards,
Gulam Mohamed.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open
  2024-06-07 19:06 [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open Gulam Mohamed
                   ` (3 preceding siblings ...)
  2024-06-12  5:19 ` Christoph Hellwig
@ 2025-03-31 14:10 ` Zhu Yanjun
  4 siblings, 0 replies; 9+ messages in thread
From: Zhu Yanjun @ 2025-03-31 14:10 UTC (permalink / raw)
  To: Gulam Mohamed, linux-block, linux-kernel; +Cc: yukuai1, hch, axboe

On 07.06.24 21:06, Gulam Mohamed wrote:
> 1. Userspace sends the command "losetup -d" which uses the open() call
>     to open the device
> 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the
>     function loop_clr_fd()
> 3. If LOOP_CLR_FD is the first command received at the time, then the
>     AUTOCLEAR flag is not set and deletion of the
>     loop device proceeds ahead and scans the partitions (drop/add
>     partitions)
> 
>          if (disk_openers(lo->lo_disk) > 1) {
>                  lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
>                  loop_global_unlock(lo, true);
>                  return 0;
>          }
> 
>   4. Before scanning partitions, it will check to see if any partition of
>      the loop device is currently opened
>   5. If any partition is opened, then it will return EBUSY:
> 
>      if (disk->open_partitions)
>                  return -EBUSY;
>   6. So, after receiving the "LOOP_CLR_FD" command and just before the above
>      check for open_partitions, if any other command
>      (like blkid) opens any partition of the loop device, then the partition
>      scan will not proceed and EBUSY is returned as shown in above code
>   7. But in "__loop_clr_fd()", this EBUSY error is not propagated
>   8. We have noticed that this is causing the partitions of the loop to
>      remain stale even after the loop device is detached resulting in the
>      IO errors on the partitions
> Fix:
> Defer the detach of loop device to release function, which is called
> when the last close happens, by setting the lo_flags to LO_FLAGS_AUTOCLEAR
> at the time of detach i.e in loop_clr_fd() function.
> 
> Test case involves the following two scripts:
> 
> script1.sh:
> 
> while [ 1 ];
> do
>          losetup -P -f /home/opt/looptest/test10.img
>          blkid /dev/loop0p1
> done
> 
> script2.sh:
> 
> while [ 1 ];
> do
>          losetup -d /dev/loop0
> done
> 
> Without fix, the following IO errors have been observed:
> 
> kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16)
> kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700
>          phys_seg 1 prio class 0
> kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0
>          phys_seg 1 prio class 0
> kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page
>          read
> 
> Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com>

This is for v6.10 stable; should we add "Cc: stable@vger.kernel.org"
so that the engineers who work on the stable branch can find this commit
and apply it?

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Zhu Yanjun

> ---
> v4<-v3:
> 1. Defer the loop detach to last close of loop device
> 2. Removed the use of lo_open due to following reasons:
> 
> Setting the lo_state to Lo_rundown in loop_clr_fd() may not help in
> stopping the incoming open(), when the loop is being detached, as the
> open() could invoke the lo_open() before the lo_state is set to Lo_rundown
> and increment the disk_openers refcnt later.
> As the actual cleanup is deferred to last close, in release, there is no
> chance for the open() to kick in to take the reference. Because both open()
> and release() are protected by open_mutex and hence they cannot run in
> parallel.
> So, lo_open() and setting lo_state to Lo_rundown is not needed. Removing
> the loop state Lo_rundown as its not used anymore.
> 
>   drivers/block/loop.c | 44 ++++++++------------------------------------
>   1 file changed, 8 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 28a95fd366fe..4936cadc1a63 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -41,7 +41,6 @@
>   enum {
>   	Lo_unbound,
>   	Lo_bound,
> -	Lo_rundown,
>   	Lo_deleting,
>   };
>   
> @@ -1131,7 +1130,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
>   	return error;
>   }
>   
> -static void __loop_clr_fd(struct loop_device *lo, bool release)
> +static void __loop_clr_fd(struct loop_device *lo)
>   {
>   	struct file *filp;
>   	gfp_t gfp = lo->old_gfp_mask;
> @@ -1139,14 +1138,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
>   	if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
>   		blk_queue_write_cache(lo->lo_queue, false, false);
>   
> -	/*
> -	 * Freeze the request queue when unbinding on a live file descriptor and
> -	 * thus an open device.  When called from ->release we are guaranteed
> -	 * that there is no I/O in progress already.
> -	 */
> -	if (!release)
> -		blk_mq_freeze_queue(lo->lo_queue);
> -
>   	spin_lock_irq(&lo->lo_lock);
>   	filp = lo->lo_backing_file;
>   	lo->lo_backing_file = NULL;
> @@ -1164,8 +1155,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
>   	mapping_set_gfp_mask(filp->f_mapping, gfp);
>   	/* This is safe: open() is still holding a reference. */
>   	module_put(THIS_MODULE);
> -	if (!release)
> -		blk_mq_unfreeze_queue(lo->lo_queue);
>   
>   	disk_force_media_change(lo->lo_disk);
>   
> @@ -1180,11 +1169,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
>   		 * must be at least one and it can only become zero when the
>   		 * current holder is released.
>   		 */
> -		if (!release)
> -			mutex_lock(&lo->lo_disk->open_mutex);
>   		err = bdev_disk_changed(lo->lo_disk, false);
> -		if (!release)
> -			mutex_unlock(&lo->lo_disk->open_mutex);
>   		if (err)
>   			pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
>   				__func__, lo->lo_number, err);
> @@ -1232,25 +1217,8 @@ static int loop_clr_fd(struct loop_device *lo)
>   		loop_global_unlock(lo, true);
>   		return -ENXIO;
>   	}
> -	/*
> -	 * If we've explicitly asked to tear down the loop device,
> -	 * and it has an elevated reference count, set it for auto-teardown when
> -	 * the last reference goes away. This stops $!~#$@ udev from
> -	 * preventing teardown because it decided that it needs to run blkid on
> -	 * the loopback device whenever they appear. xfstests is notorious for
> -	 * failing tests because blkid via udev races with a losetup
> -	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
> -	 * command to fail with EBUSY.
> -	 */
> -	if (disk_openers(lo->lo_disk) > 1) {
> -		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
> -		loop_global_unlock(lo, true);
> -		return 0;
> -	}
> -	lo->lo_state = Lo_rundown;
> +	lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
>   	loop_global_unlock(lo, true);
> -
> -	__loop_clr_fd(lo, false);
>   	return 0;
>   }
>   
> @@ -1724,15 +1692,19 @@ static void lo_release(struct gendisk *disk)
>   	if (disk_openers(disk) > 0)
>   		return;
>   
> +	/*
> +	 * Clear the backing device information if this is the last close of
> +	 * a device that's been marked for auto clear, or on which LOOP_CLR_FD
> +	 * has been called.
> +	 */
>   	mutex_lock(&lo->lo_mutex);
>   	if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) {
> -		lo->lo_state = Lo_rundown;
>   		mutex_unlock(&lo->lo_mutex);
>   		/*
>   		 * In autoclear mode, stop the loop thread
>   		 * and remove configuration after last close.
>   		 */
> -		__loop_clr_fd(lo, true);
> +		__loop_clr_fd(lo);
>   		return;
>   	}
>   	mutex_unlock(&lo->lo_mutex);


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-03-31 14:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-07 19:06 [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open Gulam Mohamed
2024-06-08  5:20 ` Christoph Hellwig
2024-06-10  3:44 ` Chaitanya Kulkarni
2024-06-11 14:58 ` kernel test robot
2024-06-13 21:10   ` Gulam Mohamed
2024-06-14  5:45     ` hch
2024-06-14 23:35       ` Gulam Mohamed
2024-06-12  5:19 ` Christoph Hellwig
2025-03-31 14:10 ` Zhu Yanjun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).