From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-180.mta1.migadu.com (out-180.mta1.migadu.com [95.215.58.180]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A250B3234 for ; Mon, 31 Mar 2025 14:10:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.180 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743430258; cv=none; b=bdgYj1SfhK+OBpiC+aArPTL1uuo/Pcj8WgRdsk+u6pAtQOQ/W7+LTNYEaVvYNnT/XnOxFBC3mN1fnsGgbCi2cLc0ZWYD/MX4h3buoZ3E2rj41JC4qJTKPIk9mg/WFYHv/TlrYOC26uKiWZYnt2/wN/dbzs5m46QhrPLZQilW4Xo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1743430258; c=relaxed/simple; bh=o6YiP+hfdJ1vSNRtwyyAk/naULoqj66JwmyD1tDu4q8=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=e3ZoIB41idqOuKO29f5SpJCOuWAnE0gX535NfOLFsqfW4PVSchhsm3buRQqnBX63QYHAlU5N7z/7cyq5hcetPbfVV22VOhv2ocLx1wn5UsRvpW67L926mNLf/MjIF2em+C2K0cp+md3/p7L4SJ22YKy+fF8m7l/PJEpPHKc1J9s= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=kycF3tPT; arc=none smtp.client-ip=95.215.58.180 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="kycF3tPT" Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1743430252; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WY9rTWqlWYUdXrsjV4+jqRxNBOZeu76XLfJnw4sSWRc=; b=kycF3tPT9QrKetZdCxdw8Hlpcrq3xkS3oFLWKWa4AZVtrJ4rQAVk9v3GoOnPUA9EF74lrD BhLnfNE+2PVHO3zW4vlovcYDPzHBaQ2nv6v3APhj0iOtCRlB2CyIVHNZ8scBr3+FPTw2nZ jlGDcxcw5Wva4Q9iOHNHjHpWiR+6XOs= Date: Mon, 31 Mar 2025 16:10:47 +0200 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH V4 for-6.10/block] loop: Fix a race between loop detach and loop open To: Gulam Mohamed , linux-block@vger.kernel.org, linux-kernel@vger.kernel.org Cc: yukuai1@huaweicloud.com, hch@lst.de, axboe@kernel.dk References: <20240607190607.17705-1-gulam.mohamed@oracle.com> Content-Language: en-US X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Zhu Yanjun In-Reply-To: <20240607190607.17705-1-gulam.mohamed@oracle.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Migadu-Flow: FLOW_OUT On 07.06.24 21:06, Gulam Mohamed wrote: > 1. Userspace sends the command "losetup -d" which uses the open() call > to open the device > 2. Kernel receives the ioctl command "LOOP_CLR_FD" which calls the > function loop_clr_fd() > 3. If LOOP_CLR_FD is the first command received at the time, then the > AUTOCLEAR flag is not set and deletion of the > loop device proceeds ahead and scans the partitions (drop/add > partitions) > > if (disk_openers(lo->lo_disk) > 1) { > lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > loop_global_unlock(lo, true); > return 0; > } > > 4. Before scanning partitions, it will check to see if any partition of > the loop device is currently opened > 5. If any partition is opened, then it will return EBUSY: > > if (disk->open_partitions) > return -EBUSY; > 6. So, after receiving the "LOOP_CLR_FD" command and just before the above > check for open_partitions, if any other command > (like blkid) opens any partition of the loop device, then the partition > scan will not proceed and EBUSY is returned as shown in above code > 7. But in "__loop_clr_fd()", this EBUSY error is not propagated > 8. We have noticed that this is causing the partitions of the loop to > remain stale even after the loop device is detached resulting in the > IO errors on the partitions > Fix: > Defer the detach of loop device to release function, which is called > when the last close happens, by setting the lo_flags to LO_FLAGS_AUTOCLEAR > at the time of detach i.e in loop_clr_fd() function. > > Test case involves the following two scripts: > > script1.sh: > > while [ 1 ]; > do > losetup -P -f /home/opt/looptest/test10.img > blkid /dev/loop0p1 > done > > script2.sh: > > while [ 1 ]; > do > losetup -d /dev/loop0 > done > > Without fix, the following IO errors have been observed: > > kernel: __loop_clr_fd: partition scan of loop0 failed (rc=-16) > kernel: I/O error, dev loop0, sector 20971392 op 0x0:(READ) flags 0x80700 > phys_seg 1 prio class 0 > kernel: I/O error, dev loop0, sector 108868 op 0x0:(READ) flags 0x0 > phys_seg 1 prio class 0 > kernel: Buffer I/O error on dev loop0p1, logical block 27201, async page > read > > Signed-off-by: Gulam Mohamed This is for v6.10 stable, should add "Cc: stable@vger.kernel.org"? So the engineers who work the stable branch can find this commit and apply it? Reviewed-by: Zhu Yanjun Zhu Yanjun > --- > v4<-v3: > 1. Defer the loop detach to last close of loop device > 2. Removed the use of lo_open due to following reasons: > > Setting the lo_state to Lo_rundown in loop_clr_fd() may not help in > stopping the incoming open(), when the loop is being detached, as the > open() could invoke the lo_open() before the lo_state is set to Lo_rundown > and increment the disk_openers refcnt later. > As the actual cleanup is deferred to last close, in release, there is no > chance for the open() to kick in to take the reference. Because both open() > and release() are protected by open_mutex and hence they cannot run in > parallel. > So, lo_open() and setting lo_state to Lo_rundown is not needed. Removing > the loop state Lo_rundown as its not used anymore. > > drivers/block/loop.c | 44 ++++++++------------------------------------ > 1 file changed, 8 insertions(+), 36 deletions(-) > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > index 28a95fd366fe..4936cadc1a63 100644 > --- a/drivers/block/loop.c > +++ b/drivers/block/loop.c > @@ -41,7 +41,6 @@ > enum { > Lo_unbound, > Lo_bound, > - Lo_rundown, > Lo_deleting, > }; > > @@ -1131,7 +1130,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode, > return error; > } > > -static void __loop_clr_fd(struct loop_device *lo, bool release) > +static void __loop_clr_fd(struct loop_device *lo) > { > struct file *filp; > gfp_t gfp = lo->old_gfp_mask; > @@ -1139,14 +1138,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release) > if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags)) > blk_queue_write_cache(lo->lo_queue, false, false); > > - /* > - * Freeze the request queue when unbinding on a live file descriptor and > - * thus an open device. When called from ->release we are guaranteed > - * that there is no I/O in progress already. > - */ > - if (!release) > - blk_mq_freeze_queue(lo->lo_queue); > - > spin_lock_irq(&lo->lo_lock); > filp = lo->lo_backing_file; > lo->lo_backing_file = NULL; > @@ -1164,8 +1155,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release) > mapping_set_gfp_mask(filp->f_mapping, gfp); > /* This is safe: open() is still holding a reference. */ > module_put(THIS_MODULE); > - if (!release) > - blk_mq_unfreeze_queue(lo->lo_queue); > > disk_force_media_change(lo->lo_disk); > > @@ -1180,11 +1169,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release) > * must be at least one and it can only become zero when the > * current holder is released. > */ > - if (!release) > - mutex_lock(&lo->lo_disk->open_mutex); > err = bdev_disk_changed(lo->lo_disk, false); > - if (!release) > - mutex_unlock(&lo->lo_disk->open_mutex); > if (err) > pr_warn("%s: partition scan of loop%d failed (rc=%d)\n", > __func__, lo->lo_number, err); > @@ -1232,25 +1217,8 @@ static int loop_clr_fd(struct loop_device *lo) > loop_global_unlock(lo, true); > return -ENXIO; > } > - /* > - * If we've explicitly asked to tear down the loop device, > - * and it has an elevated reference count, set it for auto-teardown when > - * the last reference goes away. This stops $!~#$@ udev from > - * preventing teardown because it decided that it needs to run blkid on > - * the loopback device whenever they appear. xfstests is notorious for > - * failing tests because blkid via udev races with a losetup > - * /do something like mkfs/losetup -d causing the losetup -d > - * command to fail with EBUSY. > - */ > - if (disk_openers(lo->lo_disk) > 1) { > - lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > - loop_global_unlock(lo, true); > - return 0; > - } > - lo->lo_state = Lo_rundown; > + lo->lo_flags |= LO_FLAGS_AUTOCLEAR; > loop_global_unlock(lo, true); > - > - __loop_clr_fd(lo, false); > return 0; > } > > @@ -1724,15 +1692,19 @@ static void lo_release(struct gendisk *disk) > if (disk_openers(disk) > 0) > return; > > + /* > + * Clear the backing device information if this is the last close of > + * a device that's been marked for auto clear, or on which LOOP_CLR_FD > + * has been called. > + */ > mutex_lock(&lo->lo_mutex); > if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) { > - lo->lo_state = Lo_rundown; > mutex_unlock(&lo->lo_mutex); > /* > * In autoclear mode, stop the loop thread > * and remove configuration after last close. > */ > - __loop_clr_fd(lo, true); > + __loop_clr_fd(lo); > return; > } > mutex_unlock(&lo->lo_mutex);