* [PATCH 0/2] Two fixes for loop devices
@ 2017-06-16 5:02 NeilBrown
2017-06-16 5:02 ` [PATCH 2/2] loop: Add PF_LESS_THROTTLE to block/loop device thread NeilBrown
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: NeilBrown @ 2017-06-16 5:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, linux-kernel
Hi Jens,
one of these is a resend of a patch I sent a while back.
The other is new - loop closes files differently from close()
and in a way that can confuse NFS.
Thanks,
NeilBrown
---
NeilBrown (2):
loop: use filp_close() rather than fput()
loop: Add PF_LESS_THROTTLE to block/loop device thread.
drivers/block/loop.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--
Signature
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  2017-06-16 5:02 [PATCH 0/2] Two fixes for loop devices NeilBrown
@ 2017-06-16 5:02 ` NeilBrown
  2017-06-16 7:36   ` Christoph Hellwig
  2017-06-16 5:02 ` [PATCH 1/2] loop: use filp_close() rather than fput() NeilBrown
  2017-06-16 14:29 ` [PATCH 0/2] Two fixes for loop devices Jens Axboe
  2 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2017-06-16 5:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, linux-kernel

When a filesystem is mounted from a loop device, writes are throttled
by balance_dirty_pages() twice: once when writing to the filesystem
and once when loop_handle_cmd() writes to the backing file.  This
double-throttling can trigger positive feedback loops that create
significant delays.  The throttling at the lower level is seen by the
upper level as a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a local NFS
server.  It reduces the throttling on the lower layer so that it can
proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device and write
(e.g. with dd) several large files which combine to consume
significantly more than the limit set by /proc/sys/vm/dirty_ratio or
dirty_bytes.  Measure the total time taken.

When I do this directly on a device (no loop device) the total time
for several runs (mkfs, mount, write 200 files, umount) is fairly
stable: 28-35 seconds.
When I do this over a loop device the times are much worse and less
stable: 52-460 seconds, half below 100 seconds, half above.
When I apply this patch, the times become stable again, though not as
fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total overhead still
seems too high, but this is a big improvement.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/loop.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 9c457ca6c55e..6ed7c4506951 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -843,10 +843,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
 	kthread_stop(lo->worker_task);
 }
 
+static int loop_kthread_worker_fn(void *worker_ptr)
+{
+	current->flags |= PF_LESS_THROTTLE;
+	return kthread_worker_fn(worker_ptr);
+}
+
 static int loop_prepare_queue(struct loop_device *lo)
 {
 	kthread_init_worker(&lo->worker);
-	lo->worker_task = kthread_run(kthread_worker_fn,
+	lo->worker_task = kthread_run(loop_kthread_worker_fn,
 			&lo->worker, "loop%d", lo->lo_number);
 	if (IS_ERR(lo->worker_task))
 		return -ENOMEM;

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH 2/2] loop: Add PF_LESS_THROTTLE to block/loop device thread.
  2017-06-16 5:02 ` [PATCH 2/2] loop: Add PF_LESS_THROTTLE to block/loop device thread NeilBrown
@ 2017-06-16 7:36   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-06-16 7:36 UTC (permalink / raw)
To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-kernel

why isn't loop using kthread_create_worker()?  Why isn't the less
throttle a flag to kthread_create_worker()?  I hate all this open
coding..

^ permalink raw reply	[flat|nested] 10+ messages in thread
* [PATCH 1/2] loop: use filp_close() rather than fput()
  2017-06-16 5:02 [PATCH 0/2] Two fixes for loop devices NeilBrown
  2017-06-16 5:02 ` [PATCH 2/2] loop: Add PF_LESS_THROTTLE to block/loop device thread NeilBrown
@ 2017-06-16 5:02 ` NeilBrown
  2017-06-16 7:37   ` Christoph Hellwig
  2017-06-17 0:01   ` Al Viro
  2017-06-16 14:29 ` [PATCH 0/2] Two fixes for loop devices Jens Axboe
  2 siblings, 2 replies; 10+ messages in thread
From: NeilBrown @ 2017-06-16 5:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, linux-kernel

When a loop device is being shut down, the backing file is closed
with fput().  This is different from how close(2) closes files - it
uses filp_close().  The difference is important for filesystems which
provide a ->flush file operation, such as NFS.  NFS assumes a flush
will always be called on last close, and gets confused otherwise.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 drivers/block/loop.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ebbd0c3fe0ed..9c457ca6c55e 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -682,7 +682,7 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 	if (error)
 		goto out_putf;
 
-	fput(old_file);
+	filp_close(old_file, NULL);
 	if (lo->lo_flags & LO_FLAGS_PARTSCAN)
 		loop_reread_partitions(lo, bdev);
 	return 0;
@@ -1071,12 +1071,12 @@ static int loop_clr_fd(struct loop_device *lo)
 	loop_unprepare_queue(lo);
 	mutex_unlock(&lo->lo_ctl_mutex);
 	/*
-	 * Need not hold lo_ctl_mutex to fput backing file.
+	 * Need not hold lo_ctl_mutex to close backing file.
 	 * Calling fput holding lo_ctl_mutex triggers a circular
 	 * lock dependency possibility warning as fput can take
 	 * bd_mutex which is usually taken before lo_ctl_mutex.
 	 */
-	fput(filp);
+	filp_close(filp, NULL);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] loop: use filp_close() rather than fput()
  2017-06-16 5:02 ` [PATCH 1/2] loop: use filp_close() rather than fput() NeilBrown
@ 2017-06-16 7:37   ` Christoph Hellwig
  2017-06-17 0:01   ` Al Viro
  1 sibling, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2017-06-16 7:37 UTC (permalink / raw)
To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] loop: use filp_close() rather than fput()
  2017-06-16 5:02 ` [PATCH 1/2] loop: use filp_close() rather than fput() NeilBrown
  2017-06-16 7:37   ` Christoph Hellwig
@ 2017-06-17 0:01   ` Al Viro
  2017-06-18 4:30     ` NeilBrown
  1 sibling, 1 reply; 10+ messages in thread
From: Al Viro @ 2017-06-17 0:01 UTC (permalink / raw)
To: NeilBrown; +Cc: Jens Axboe, linux-block, linux-kernel

On Fri, Jun 16, 2017 at 03:02:09PM +1000, NeilBrown wrote:
> When a loop device is being shutdown the backing file is
> closed with fput().  This is different from how close(2)
> closes files - it uses filp_close().
>
> The difference is important for filesystems which provide a ->flush
> file operation such as NFS.  NFS assumes a flush will always
> be called on last close, and gets confused otherwise.

Huh?  You do realize that mmap() + close() + modify + msync() + munmap()
will have IO done *after* the last flush, right?

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] loop: use filp_close() rather than fput()
  2017-06-17 0:01 ` Al Viro
@ 2017-06-18 4:30     ` NeilBrown
  0 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2017-06-18 4:30 UTC (permalink / raw)
To: Al Viro, Trond Myklebust
Cc: Jens Axboe, linux-block, linux-kernel, Linux NFS Mailing list

[-- Attachment #1: Type: text/plain, Size: 2004 bytes --]

On Sat, Jun 17 2017, Al Viro wrote:

> On Fri, Jun 16, 2017 at 03:02:09PM +1000, NeilBrown wrote:
>> When a loop device is being shutdown the backing file is
>> closed with fput().  This is different from how close(2)
>> closes files - it uses filp_close().
>>
>> The difference is important for filesystems which provide a ->flush
>> file operation such as NFS.  NFS assumes a flush will always
>> be called on last close, and gets confused otherwise.
>
> Huh?  You do realize that mmap() + close() + modify + msync() + munmap()
> will have IO done *after* the last flush, right?

Yes I do ... or rather I did.  I didn't make that connection this time.
The sequence you describe causes exactly the same sort of problem.

I sent a patch to Trond to add a vfs_fsync() call to nfs_file_release()
but he claims the current behaviour is "working as expected".  I didn't
quite know what to make of that..
https://www.spinics.net/lists/linux-nfs/msg62603.html

To provide the full picture:
When an NFS file has dirty pages, they (indirectly) hold extra
references on the superblock, using nfs_sb_active().  This means that
when the filesystem is unmounted, the superblock remains active until
all the writes complete.  This contrasts with every other filesystem,
where all writes will complete before the umount returns.

When you open/write/close, there will be no dirty pages at umount time
(because close() flushes) so this doesn't cause a problem.  But when
you mmap, or use a loop device, then dirty pages can still be around
to keep the superblock alive.

The observable symptom that brought this to my attention was that

  umount -a -t nfs
  disable network
  sync

can hang in sync, because the NFS filesystems can still be waiting to
write out data.

If nfs_file_release() adds vfs_fsync(), or maybe if __fput() calls
filp->f_op->flush(), then loop.c wouldn't need to use filp_close().

Which would you prefer?

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Two fixes for loop devices
  2017-06-16 5:02 [PATCH 0/2] Two fixes for loop devices NeilBrown
  2017-06-16 5:02 ` [PATCH 2/2] loop: Add PF_LESS_THROTTLE to block/loop device thread NeilBrown
  2017-06-16 5:02 ` [PATCH 1/2] loop: use filp_close() rather than fput() NeilBrown
@ 2017-06-16 14:29 ` Jens Axboe
  2017-06-18 4:33   ` NeilBrown
  2 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2017-06-16 14:29 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-block, linux-kernel

On 06/15/2017 11:02 PM, NeilBrown wrote:
> Hi Jens,
> one of these is a resend of a patch I sent a while back.
> The other is new - loop closes files differently from close()
> and in a way that can confuse NFS.

Are you wanting to get these into 4.12, or defer to 4.13?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Two fixes for loop devices
  2017-06-16 14:29 ` [PATCH 0/2] Two fixes for loop devices Jens Axboe
@ 2017-06-18 4:33 ` NeilBrown
  2017-06-18 15:06   ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: NeilBrown @ 2017-06-18 4:33 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On Fri, Jun 16 2017, Jens Axboe wrote:

> On 06/15/2017 11:02 PM, NeilBrown wrote:
>> Hi Jens,
>> one of these is a resend of a patch I sent a while back.
>> The other is new - loop closes files differently from close()
>> and in a way that can confuse NFS.
>
> Are you wanting to get these into 4.12, or defer to 4.13?

I'm happy either way.  I just want them to land eventually, so I can
tick them off my list.

The conversation with Al might result in a different fix for the
filp_close() problem, but I think that it is still more correct to call
filp_close(), because what loop is doing is a lot like closing a file
that it has been writing to.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] Two fixes for loop devices
  2017-06-18 4:33 ` NeilBrown
@ 2017-06-18 15:06   ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2017-06-18 15:06 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-block, linux-kernel

On 06/17/2017 10:33 PM, NeilBrown wrote:
> On Fri, Jun 16 2017, Jens Axboe wrote:
>
>> On 06/15/2017 11:02 PM, NeilBrown wrote:
>>> Hi Jens,
>>> one of these is a resend of a patch I sent a while back.
>>> The other is new - loop closes files differently from close()
>>> and in a way that can confuse NFS.
>>
>> Are you wanting to get these into 4.12, or defer to 4.13?
>
> I'm happy either way.  I just want them to land eventually, so I can
> tick them off my list.
>
> The conversation with Al might result in a different fix for the
> filp_close() problem, but I think that it is still more correct to call
> filp_close(), because what loop is doing is a lot like closing a file
> that it has been writing to.

I'll toss 2/2 in for now for 4.13, please resend the other once
the issues have been ironed out.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-06-18 15:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz  follow: Atom feed
-- links below jump to the message on this page --
2017-06-16 5:02 [PATCH 0/2] Two fixes for loop devices NeilBrown
2017-06-16 5:02 ` [PATCH 2/2] loop: Add PF_LESS_THROTTLE to block/loop device thread NeilBrown
2017-06-16 7:36   ` Christoph Hellwig
2017-06-16 5:02 ` [PATCH 1/2] loop: use filp_close() rather than fput() NeilBrown
2017-06-16 7:37   ` Christoph Hellwig
2017-06-17 0:01   ` Al Viro
2017-06-18 4:30     ` NeilBrown
2017-06-16 14:29 ` [PATCH 0/2] Two fixes for loop devices Jens Axboe
2017-06-18 4:33   ` NeilBrown
2017-06-18 15:06     ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).