* [RFC][PATCH] nfsd regression since delayed fput()
@ 2013-10-16 18:52 Al Viro
2013-10-17 4:48 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2013-10-16 18:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-fsdevel
Background: nfsd v[23] had throughput regression since delayed fput
went in; every read or write ends up doing fput() and we get a pair
of extra context switches out of that (plus quite a bit of work
in queue_work itselfi, apparently). Use of schedule_delayed_work()
gives it a chance to accumulate a bit before we do __fput() on all
of them. I'm not too happy about that solution, but... on at least
one real-world setup it reverts about 10% throughput loss we got from
switch to delayed fput.
If anybody has better ideas, I'll gladly take those instead - this
approach feels kludgy ;-/
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/file_table.c b/fs/file_table.c
index abdd15a..0ca5fa4 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -317,7 +317,7 @@ void fput(struct file *file)
}
if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
- schedule_work(&delayed_fput_work);
+ schedule_delayed_work(&delayed_fput_work, 1);
}
}
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-16 18:52 [RFC][PATCH] nfsd regression since delayed fput() Al Viro
@ 2013-10-17 4:48 ` Linus Torvalds
2013-10-17 18:14 ` J. Bruce Fields
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2013-10-17 4:48 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel
On Wed, Oct 16, 2013 at 11:52 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> If anybody has better ideas, I'll gladly take those instead - this
> approach feels kludgy ;-/
It looks kludgy indeed, but I do like the concept of basically
rate-limiting delayed_fput_work.
At the same time, I worry that it will cause the same kinds of things
that we discussed for user processes: delayed fput can result in
visible semantic differences (eg silly-renames) if it is delayed past
other operations that care about elevated reference counts.
Now I hope that people don't nfs-(re)export nfs filesystems, so the
nfs silly-rename isn't necessarily an issue, but I could imagine other
similar things.
I dunno. I like the rate-limiting, I worry about unintended consequences.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-17 4:48 ` Linus Torvalds
@ 2013-10-17 18:14 ` J. Bruce Fields
2013-10-17 18:29 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: J. Bruce Fields @ 2013-10-17 18:14 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Al Viro, linux-fsdevel
On Wed, Oct 16, 2013 at 09:48:06PM -0700, Linus Torvalds wrote:
> On Wed, Oct 16, 2013 at 11:52 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > If anybody has better ideas, I'll gladly take those instead - this
> > approach feels kludgy ;-/
>
> It looks kludgy indeed, but I do like the concept of basically
> rate-limiting delayed_fput_work.
>
> At the same time, I worry that it will cause the same kinds of things
> that we discussed for user processes: delayed fput can result in
> visible semantic differences (eg silly-renames) if it is delayed past
> other operations that care about elevated reference counts.
>
> Now I hope that people don't nfs-(re)export nfs filesystems, so the
> nfs silly-rename isn't necessarily an issue, but I could imagine other
> similar things.
NFS isn't exportable and there aren't any plans to change that.
--b.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-17 18:14 ` J. Bruce Fields
@ 2013-10-17 18:29 ` Linus Torvalds
2013-10-17 18:39 ` Al Viro
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2013-10-17 18:29 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Al Viro, linux-fsdevel
On Thu, Oct 17, 2013 at 11:14 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> NFS isn't exportable and there aren't any plans to change that.
The thing is, other filesystems have other rules. Not everybody
necessarily supports the Unix "you can remove files that are still in
use" semantics, so I could imagine EBUSY etc happening..
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-17 18:29 ` Linus Torvalds
@ 2013-10-17 18:39 ` Al Viro
2013-10-17 19:09 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2013-10-17 18:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: J. Bruce Fields, linux-fsdevel
On Thu, Oct 17, 2013 at 11:29:46AM -0700, Linus Torvalds wrote:
> On Thu, Oct 17, 2013 at 11:14 AM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > NFS isn't exportable and there aren't any plans to change that.
>
> The thing is, other filesystems have other rules. Not everybody
> necessarily supports the Unix "you can remove files that are still in
> use" semantics, so I could imagine EBUSY etc happening..
Yes, but... To get anything of that kind you would need to have a kernel
thread open a file on such fs, do IO and fput() it, then have something attempt
to unlink() it and fail with EBUSY due to fput() being delayed. If that
something is *not* the same kernel thread, it's already racy - after all, if
that unlink() happens while the kernel thread does IO, we are going to get
what we are going to get. And if it is the same thread... What would it
be? knfsd? Exporting a filesystem with possible-EBUSY-on-unlink?
I don't see any exportable examples of that...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-17 18:39 ` Al Viro
@ 2013-10-17 19:09 ` Linus Torvalds
2013-10-17 20:12 ` Al Viro
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2013-10-17 19:09 UTC (permalink / raw)
To: Al Viro; +Cc: J. Bruce Fields, linux-fsdevel
On Thu, Oct 17, 2013 at 11:39 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Yes, but... To get anything of that kind you would need to have a kernel
> thread open a file on such fs, do IO and fput() it, then have something attempt
> to unlink() it and fail with EBUSY due to fput() being delayed. If that
> something is *not* the same kernel thread, it's already racy
Umm. Let me remind you why this thing got started in the first place.
- the kernel thread is "knfsd"
- the "IO and fput" is a read over NFS
- the unlink is an NFS unlink operation
Yeah, that wasn't so hard to trigger after all, was it? It's just a
client that reads a file and unlinks it within less than a jiffy of
reading it. Boom.
All it needs is a non-unixy filesystem getting nfs-exported. Like FAT.
Now, I actually think that FAT is fine with unlinking a file that is
in use, but my whole argument was "unintended consequences". It is
*not* obvious that keeping a reference is valid.
Look at rmdir() instead of unlink(), for example. Even POSIX allows
EBUSY for a directory entry that is still in use, and requires it for
a non-empty one. So even with a Unix filesystem, I can imagine
somebody (and by somebody, I mean "knfsd, as response to normal NFS
client traffic") doing:
- read file (delayed fput)
- unlink file
- rmdir directory that file used to be in and that *should* be empty
- delayed fput happens now.
and even a posix-compliant filesystem could have issues with this
(again, the *example* for a filesystem with issues would be NFS, and I
realize you don't re-export it, but the point is, filesystem semantics
can make this problematic).
Again, the above is an *example* of subtle issues with delaying the fput.
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-17 19:09 ` Linus Torvalds
@ 2013-10-17 20:12 ` Al Viro
2013-10-17 22:33 ` Linus Torvalds
0 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2013-10-17 20:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: J. Bruce Fields, linux-fsdevel
On Thu, Oct 17, 2013 at 12:09:00PM -0700, Linus Torvalds wrote:
> All it needs is a non-unixy filesystem getting nfs-exported. Like FAT.
>
> Now, I actually think that FAT is fine with unlinking a file that is
> in use, but my whole argument was "unintended consequences". It is
> *not* obvious that keeping a reference is valid.
>
> Look at rmdir() instead of unlink(), for example. Even POSIX allows
> EBUSY for a directory entry that is still in use, and requires it for
> a non-empty one. So even with a Unix filesystem, I can imagine
> somebody (and by somebody, I mean "knfsd, as response to normal NFS
> client traffic") doing:
>
> - read file (delayed fput)
> - unlink file
> - rmdir directory that file used to be in and that *should* be empty
> - delayed fput happens now.
>
> and even a posix-compliant filesystem could have issues with this
> (again, the *example* for a filesystem with issues would be NFS, and I
> realize you don't re-export it, but the point is, filesystem semantics
> can make this problematic).
Sure, I agree that such a scenario might happen; the question was "which
fs would have to be exported to step on that?" and AFAICS none of the
exportable filesystems have any problems of that sort.
> Again, the above is an *example* of subtle issues with delaying the fput.
And we have a tool for dealing with those, if they turn out to be real -
flush_delayed_fput(), doing all pending delayed ones. I would rather
avoid using it unless we have a case where it's really needed, though;
it can have interesting consequences wrt locking order, etc.
Frankly, in case of knfsd I'm a lot more concerned about the amount of
struct file (all for the same few disk files) sitting around opened and
waiting to be closed, just because there's a client saturating a 10G
link with read requests...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-17 20:12 ` Al Viro
@ 2013-10-17 22:33 ` Linus Torvalds
2013-10-25 14:18 ` Al Viro
0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2013-10-17 22:33 UTC (permalink / raw)
To: Al Viro; +Cc: J. Bruce Fields, linux-fsdevel
On Thu, Oct 17, 2013 at 1:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> And we have a tool for dealing with those, if they turn out to be real -
> flush_delayed_fput(), doing all pending delayed ones. I would rather
> avoid using it unless we have a case where it's really needed, though;
> it can have interesting consequences wrt locking order, etc.
Fair enough..
> Frankly, in case of knfsd I'm a lot more concerned about the amount of
> struct file (all for the same few disk files) sitting around opened and
> waiting to be closed, just because there's a client saturating a 10G
> link with read requests...
I agree that that might be an issue - we've had somewhat similar
issues with RCU freeing (tons and tons of pending work), although
quite frankly the RCU grace periods can end up being much longer than
a jiffy, so I don't know how noticeable it is. But there could be
latency reasons to try to avoid having *too* much outstanding work,
so..
Maybe we should have some (per-cpu) counter, and every N cases we
should use a zero timeout?
Linus
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC][PATCH] nfsd regression since delayed fput()
2013-10-17 22:33 ` Linus Torvalds
@ 2013-10-25 14:18 ` Al Viro
0 siblings, 0 replies; 9+ messages in thread
From: Al Viro @ 2013-10-25 14:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: J. Bruce Fields, linux-fsdevel
On Thu, Oct 17, 2013 at 03:33:07PM -0700, Linus Torvalds wrote:
> On Thu, Oct 17, 2013 at 1:12 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > And we have a tool for dealing with those, if they turn out to be real -
> > flush_delayed_fput(), doing all pending delayed ones. I would rather
> > avoid using it unless we have a case where it's really needed, though;
> > it can have interesting consequences wrt locking order, etc.
>
> Fair enough..
>
> > Frankly, in case of knfsd I'm a lot more concerned about the amount of
> > struct file (all for the same few disk files) sitting around opened and
> > waiting to be closed, just because there's a client saturating a 10G
> > link with read requests...
>
> I agree that that might be an issue - we've had somewhat similar
> issues with RCU freeing (tons and tons of pending work), although
> quite frankly the RCU grace periods can end up being much longer than
> a jiffy, so I don't know how noticeable it is. But there could be
> latency reasons to try to avoid having *too* much outstanding work,
> so..
>
> Maybe we should have some (per-cpu) counter, and every N cases we
> should use a zero timeout?
Maybe... For now, though, I'd rather go for dumb approach and see how
much PITA it leaves - we can always add extra complexity if it turns
out to be a problem.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-10-25 14:18 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-16 18:52 [RFC][PATCH] nfsd regression since delayed fput() Al Viro
2013-10-17 4:48 ` Linus Torvalds
2013-10-17 18:14 ` J. Bruce Fields
2013-10-17 18:29 ` Linus Torvalds
2013-10-17 18:39 ` Al Viro
2013-10-17 19:09 ` Linus Torvalds
2013-10-17 20:12 ` Al Viro
2013-10-17 22:33 ` Linus Torvalds
2013-10-25 14:18 ` Al Viro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).