[PATCH] loop: Make explicit loop device destruction lazy

* [PATCH] loop: Make explicit loop device destruction lazy
@ 2012-09-28  6:09 Dave Chinner
  2012-09-28  8:41 ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2012-09-28  6:09 UTC
  To: linux-kernel; +Cc: linux-fsdevel, axboe

From: Dave Chinner <dchinner@redhat.com>

xfstests has always had random test failures caused by loop devices
failing to be torn down, leaving filesystems that cannot be
unmounted. This causes test runs to stop immediately.

Over the past 6 or 7 years we've added hacks like explicit umount
-d commands for loop mounts, losetup -d after umount -d fails, etc.,
but the problems still persist.  Recently, the frequency of
loop-related failures increased again to the point that xfstests 259
will reliably fail with a stray loop device that was not torn down.

That is despite the fact that the test is about as simple as it gets -
loop 5 or 6 times running mkfs.xfs with different parameters:

        lofile=$(losetup -f)
        losetup $lofile "$testfile"
        "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!"
        sync
        losetup -d $lofile

And losetup -d $lofile is failing with EBUSY on 1-3 of these loops
every time the test is run.

Turns out that blkid is running simultaneously with losetup -d, so
the kernel sees an elevated reference count on the loop device and
returns EBUSY.  But why is blkid running? It's obvious, isn't it? udev
has decided to try and find out what is on the block device as a result
of a creation notification. And it is racing with mkfs, so it might
still be scanning the device when mkfs finishes and we try to tear it
down.

So, make losetup -d force autoremove behaviour. That is, when the
last reference goes away, tear down the device. xfstests wants it
*gone*, not causing random teardown failures when we know that all
the operations the tests have specifically run on the device have
completed and are no longer referencing the loop device.
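
As a point of comparison, userspace can request the same deferred teardown itself by setting LO_FLAGS_AUTOCLEAR through the LOOP_SET_STATUS64 ioctl. The following is a minimal sketch under that assumption, not something taken from xfstests or util-linux; request_autoclear() is a hypothetical helper name, and whether the flag is honoured this way may depend on the kernel version.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

/*
 * Ask the kernel to detach a loop device once its last opener closes it.
 * Returns 0 on success or a negative errno value.
 * request_autoclear() is a hypothetical helper, not a util-linux API.
 */
int request_autoclear(const char *loopdev)
{
	struct loop_info64 info;
	int fd, ret = 0;

	fd = open(loopdev, O_RDWR);
	if (fd < 0)
		return -errno;

	/* Read the current status so that only the flag of interest changes. */
	memset(&info, 0, sizeof(info));
	if (ioctl(fd, LOOP_GET_STATUS64, &info) < 0) {
		ret = -errno;
		goto out;
	}

	info.lo_flags |= LO_FLAGS_AUTOCLEAR;
	if (ioctl(fd, LOOP_SET_STATUS64, &info) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

If nothing else has the device open, closing the helper's own descriptor is the last close and should trigger the teardown; if blkid still holds it open, the teardown should happen when blkid closes it.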

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 drivers/block/loop.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3bba655..187b573 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -976,8 +976,21 @@ static int loop_clr_fd(struct loop_device *lo)
 	if (lo->lo_state != Lo_bound)
 		return -ENXIO;

-	if (lo->lo_refcnt > 1)	/* we needed one fd for the ioctl */
-		return -EBUSY;
+	/*
+	 * If we've explicitly asked to tear down the loop device,
+	 * and it has an elevated reference count, set it for auto-teardown when
+	 * the last reference goes away. This stops $!~#$@ udev from
+	 * preventing teardown because it decided that it needs to run blkid on
+	 * loopback devices whenever they appear. xfstests is notorious for
+	 * failing tests because blkid via udev races with a losetup
+	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
+	 * command to fail with EBUSY.
+	 */
+	if (lo->lo_refcnt > 1) {
+		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
+		mutex_unlock(&lo->lo_ctl_mutex);
+		return 0;
+	}

 	if (filp == NULL)
 		return -EINVAL;
-- 
1.7.10
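
To make the change in semantics concrete, here is a small userspace model of the reference-count logic. It is a sketch only: struct model_loop, MODEL_FLAG_AUTOCLEAR and the model_* functions are hypothetical names that loosely mirror the patch, and the release path assumes that the driver tears the device down on the last close when the autoclear flag is set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is not kernel code. */
#define MODEL_FLAG_AUTOCLEAR 0x4

struct model_loop {
	int refcnt;      /* open references, including the caller's own fd */
	unsigned flags;
	bool bound;      /* true while a backing file is attached */
};

/* Old behaviour: refuse to detach while anyone else holds the device open. */
int model_clr_fd_old(struct model_loop *lo)
{
	if (!lo->bound)
		return -ENXIO;
	if (lo->refcnt > 1)
		return -EBUSY;
	lo->bound = false;
	return 0;
}

/* Patched behaviour: report success and defer the detach to the last close. */
int model_clr_fd_lazy(struct model_loop *lo)
{
	if (!lo->bound)
		return -ENXIO;
	if (lo->refcnt > 1) {
		lo->flags |= MODEL_FLAG_AUTOCLEAR;
		return 0;
	}
	lo->bound = false;
	return 0;
}

/* Drop one reference; the last one performs any deferred teardown. */
void model_release(struct model_loop *lo)
{
	if (--lo->refcnt > 0)
		return;
	if (lo->flags & MODEL_FLAG_AUTOCLEAR) {
		lo->bound = false;
		lo->flags &= ~MODEL_FLAG_AUTOCLEAR;
	}
}
```

With two references held (say, losetup and a blkid started by udev), model_clr_fd_old() returns -EBUSY, while model_clr_fd_lazy() returns 0 and the device stays bound until the second model_release() call.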


* Re: [PATCH] loop: Make explicit loop device destruction lazy
  2012-09-28  6:09 [PATCH] loop: Make explicit loop device destruction lazy Dave Chinner
@ 2012-09-28  8:41 ` Jens Axboe
  2012-09-28 14:38   ` Dave Jones
  2012-09-28 15:02   ` Jeff Moyer
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2012-09-28  8:41 UTC
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel

On 2012-09-28 08:09, Dave Chinner wrote:
> Turns out that blkid is running simultaneously with losetup -d, and
> so it sees an elevated reference count and returns EBUSY.  But why
> is blkid running? It's obvious, isn't it? udev has decided to try
> and find out what is on the block device as a result of a creation
> notification. And it is racing with mkfs, so might still be scanning
> the device when mkfs finishes and we try to tear it down.
> 
> So, make losetup -d force autoremove behaviour. That is, when the
> last reference goes away, tear down the device. xfstests wants it
> *gone*, not causing random teardown failures when we know that all
> the operations the tests have specifically run on the device have
> completed and are no longer referencing the loop device.

I hear that %^#@#! blkid behavior, it is such a pain in the neck. I
don't know how many times I've had to explain that behaviour to people
who run write testing with tracing and wonder wtf there are reads in the
trace.

Patch looks fine, seems like the sane thing to do (lazy-remove on last
drop) for this case.

-- 
Jens Axboe


* Re: [PATCH] loop: Make explicit loop device destruction lazy
  2012-09-28  8:41 ` Jens Axboe
@ 2012-09-28 14:38   ` Dave Jones
  2012-09-29  5:51     ` Jens Axboe
  2012-09-28 15:02   ` Jeff Moyer
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Jones @ 2012-09-28 14:38 UTC
  To: Jens Axboe; +Cc: Dave Chinner, linux-kernel, linux-fsdevel

On Fri, Sep 28, 2012 at 10:41:50AM +0200, Jens Axboe wrote:

 > > Turns out that blkid is running simultaneously with losetup -d, and
 > > so it sees an elevated reference count and returns EBUSY.  But why
 > > is blkid running? It's obvious, isn't it? udev has decided to try
 > > and find out what is on the block device as a result of a creation
 > > notification. And it is racing with mkfs, so might still be scanning
 > > the device when mkfs finishes and we try to tear it down.
 > > 
 > I hear that %^#@#! blkid behavior, it is such a pain in the neck. I
 > don't know how many times I've had to explain that behaviour to people
 > who run write testing with tracing, wonder wtf there are reads in the
 > trace.
 
Could this problem explain this bug too? https://bugzilla.redhat.com/show_bug.cgi?id=853674

	Dave


* Re: [PATCH] loop: Make explicit loop device destruction lazy
  2012-09-28 14:38   ` Dave Jones
@ 2012-09-29  5:51     ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2012-09-29  5:51 UTC
  To: Dave Jones, Dave Chinner, linux-kernel, linux-fsdevel

On 2012-09-28 16:38, Dave Jones wrote:
> Could this problem explain this bug too ? https://bugzilla.redhat.com/show_bug.cgi?id=853674

It certainly looks like it!

-- 
Jens Axboe



* Re: [PATCH] loop: Make explicit loop device destruction lazy
  2012-09-28  8:41 ` Jens Axboe
  2012-09-28 14:38   ` Dave Jones
@ 2012-09-28 15:02   ` Jeff Moyer
  2012-09-29  5:50     ` Jens Axboe
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Moyer @ 2012-09-28 15:02 UTC
  To: Jens Axboe; +Cc: Dave Chinner, linux-kernel, linux-fsdevel

Jens Axboe <axboe@kernel.dk> writes:

> I hear that %^#@#! blkid behavior, it is such a pain in the neck. I
> don't know how many times I've had to explain that behaviour to people
> who run write testing with tracing, wonder wtf there are reads in the
> trace.
>
> Patch looks fine, seems like the sane thing to do (lazy-remove on last
> drop) for this case.

Do we also want to prevent further opens?


* Re: [PATCH] loop: Make explicit loop device destruction lazy
  2012-09-28 15:02   ` Jeff Moyer
@ 2012-09-29  5:50     ` Jens Axboe
  2012-10-01 14:00       ` Jeff Moyer
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2012-09-29  5:50 UTC
  To: Jeff Moyer; +Cc: Dave Chinner, linux-kernel, linux-fsdevel

On 2012-09-28 17:02, Jeff Moyer wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>> Patch looks fine, seems like the sane thing to do (lazy-remove on last
>> drop) for this case.
> 
> Do we also want to prevent further opens?

That's not a bad idea, at least it would be the logical thing to do. But
it does get into the realm of potentially breaking existing behaviour.
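
One way to picture Jeff's suggestion is the userspace model below. It is purely illustrative: struct model_loop_dev, the dying flag, and the -ENXIO return value are assumptions made for this sketch, and nothing in the patch implements them.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of "no new opens after a deferred detach". */
struct model_loop_dev {
	int refcnt;   /* current number of openers */
	bool dying;   /* set once an explicit detach has been deferred */
};

/* Mark the device so that it is torn down when the last opener leaves. */
void model_defer_detach(struct model_loop_dev *lo)
{
	lo->dying = true;
}

/* Proposed rule: once teardown is pending, new openers are turned away. */
int model_open(struct model_loop_dev *lo)
{
	if (lo->dying)
		return -ENXIO;   /* error code chosen for illustration only */
	lo->refcnt++;
	return 0;
}
```

Under this rule the reference count can only fall after the detach request, so the teardown cannot be postponed indefinitely by new openers. The compatibility concern is that any program which opens the device in that window would start seeing an error.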

-- 
Jens Axboe



* Re: [PATCH] loop: Make explicit loop device destruction lazy
  2012-09-29  5:50     ` Jens Axboe
@ 2012-10-01 14:00       ` Jeff Moyer
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Moyer @ 2012-10-01 14:00 UTC
  To: Jens Axboe; +Cc: Dave Chinner, linux-kernel, linux-fsdevel

Jens Axboe <axboe@kernel.dk> writes:

> On 2012-09-28 17:02, Jeff Moyer wrote:
>> Do we also want to prevent further opens?
>
> That's not a bad idea, at least it would be the logical thing to do. But
> it does get into the realm of potentially breaking existing behaviour.

What do you think could rely on the existing behaviour (that isn't
broken by design)?

-Jeff
