From: "NeilBrown" <neilb@suse.de>
To: "Jens Axboe" <axboe@kernel.dk>
Cc: "Christian Brauner" <brauner@kernel.org>,
"Al Viro" <viro@zeniv.linux.org.uk>,
"Oleg Nesterov" <oleg@redhat.com>,
"Chuck Lever" <chuck.lever@oracle.com>,
"Jeff Layton" <jlayton@kernel.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/2] Allow a kthread to declare that it calls task_work_run()
Date: Wed, 06 Dec 2023 09:16:21 +1100
Message-ID: <170181458198.7109.790647899711986334@noble.neil.brown.name>
In-Reply-To: <3609267c-3fcd-43d6-9b43-9f84bef029a2@kernel.dk>
On Wed, 06 Dec 2023, Jens Axboe wrote:
> On 12/5/23 2:58 PM, Jens Axboe wrote:
> > On 12/5/23 2:28 PM, NeilBrown wrote:
> >> On Tue, 05 Dec 2023, Christian Brauner wrote:
> >>> On Mon, Dec 04, 2023 at 03:09:44PM -0700, Jens Axboe wrote:
> >>>> On 12/4/23 2:02 PM, NeilBrown wrote:
> >>>>> It isn't clear to me what _GPL is appropriate, but maybe the rules
> >>>>> changed since last I looked..... are there rules?
> >>>>>
> >>>>> My reasoning was that the call is effectively part of the user-space
> >>>>> ABI. A user-space process can call this trivially by invoking any
> >>>>> system call. The user-space ABI is explicitly a boundary which the GPL
> >>>>> does not cross. So it doesn't seem appropriate to prevent non-GPL
> >>>>> kernel code from doing something that non-GPL user-space code can
> >>>>> trivially do.
> >>>>
> >>>> By that reasoning, basically everything in the kernel should be non-GPL
> >>>> marked. And while task_work can get used by the application, it happens
> >>>> only indirectly or implicitly. So I don't think this reasoning is sound
> >>>> at all, it's not an exported ABI or API by itself.
> >>>>
> >>>> For me, the more core of an export it is, the stronger the reason it
> >>>> should be GPL. FWIW, I don't think exporting task_work functionality is
> >
> >>>
> >>> Yeah, I'm not too fond of that part as well. I don't think we want to
> >>> give modules the ability to mess with task work. This is just asking for
> >>> trouble.
> >>>
> >>
> >> Ok, maybe we need to reframe the problem then.
> >>
> >> Currently fput(), and hence filp_close(), take control away from kernel
> >> threads in that they cannot be sure that a "close" has actually
> >> completed.
> >>
> >> This is already a problem for nfsd. When renaming a file, nfsd needs to
> >> ensure any cached "open" that it has on the file is closed (else when
> >> re-exporting an NFS filesystem it can result in a silly-rename).
> >>
> >> nfsd currently handles this case by calling flush_delayed_fput(). I
> >> suspect you are no more happy about exporting that than you are about
> >> exporting task_work_run(), but this solution isn't actually 100%
> >> reliable. If some other thread calls flush_delayed_fput() between nfsd
> >> calling filp_close() and that same nfsd calling flush_delayed_fput(),
> >> then the second flush can return before the first flush (in the other
> >> thread) completes all the work it took on.
> >>
> >> What we really need - both for handling renames and for avoiding
> >> possible memory exhaustion - is for nfsd to be able to reliably wait for
> >> any fput() that it initiated to complete.
> >>
> >> How would you like the VFS to provide that service?
> >
> > Since task_work happens in the context of your task already, why not
> > just have a way to get it stashed into a list when final fput is done?
> > This avoids all of this "let's expose task_work" and using the task list
> > for that, which seems kind of pointless as you're just going to run it
> > later on manually anyway.
> >
> > In semi pseudo code:
> >
> > bool fput_put_ref(struct file *file)
> > {
> > 	return atomic_dec_and_test(&file->f_count);
> > }
> >
> > void fput(struct file *file)
> > {
> > 	if (fput_put_ref(file)) {
> > 		...
> > 	}
> > }
> >
> > and then your nfsd_file_free() could do:
> >
> > 	ret = filp_flush(file, id);
> > 	if (fput_put_ref(file))
> > 		llist_add(&file->f_llist, &l->to_free_llist);
> >
> > or something like that, where l->to_free_llist is where ever you'd
> > otherwise punt the actual freeing to.
>
> Should probably have the put_ref or whatever helper also init the
> task_work, and then reuse the list in the callback_head there. Then
> whoever flushes it has to call ->func() and avoid exposing ____fput() to
> random users. But you get the idea.

Interesting ideas - thanks.

So maybe the new API would be

  fput_queued(struct file *f, struct llist_head *q)
and
  flush_fput_queue(struct llist_head *q)

with the meaning being that fput_queued() is just like fput() except
that any file needing __fput() is added to the 'q'; and that
flush_fput_queue() calls __fput() on any files in 'q'.

So to close a file nfsd would:

  fget(f);
  filp_close(f);
  fput_queued(f, &my_queue);

though possibly we could have a
  filp_close_queued(f, q)
as well.

I'll try that out - but am happy to hear alternate suggestions for names :-)
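
To make the intended semantics concrete, here is a minimal userspace
model of the two calls. It is only a sketch, not kernel code: struct
file is a stand-in, struct fput_queue replaces the llist_head with a
plain unlocked list, and model_final_fput(), q_next and released are
hypothetical names used purely for illustration. fput_put_ref() follows
the helper in the quoted pseudo code above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct file; NOT the kernel definition. */
struct file {
	long f_count;		/* reference count (atomic in the kernel) */
	struct file *q_next;	/* hypothetical link used while queued */
	int released;		/* set once the final release has run */
};

/* Stand-in for struct llist_head: a plain list with no locking. */
struct fput_queue {
	struct file *first;
};

/* Hypothetical stand-in for __fput(): the expensive final release. */
static void model_final_fput(struct file *f)
{
	f->released = 1;
}

/* Drop one reference; return nonzero if it was the last one. */
static int fput_put_ref(struct file *f)
{
	assert(f->f_count > 0);
	return --f->f_count == 0;
}

/* Like fput(), but defer the final release onto q instead of running it. */
static void fput_queued(struct file *f, struct fput_queue *q)
{
	if (fput_put_ref(f)) {
		f->q_next = q->first;
		q->first = f;
	}
}

/* Run the final release for every file on q; return how many were done. */
static int flush_fput_queue(struct fput_queue *q)
{
	struct file *f = q->first;
	int n = 0;

	q->first = NULL;
	while (f) {
		struct file *next = f->q_next;

		model_final_fput(f);
		n++;
		f = next;
	}
	return n;
}
```

In this model the caller owns the queue, so once flush_fput_queue()
returns, every file whose last reference that caller dropped has been
released. That is the guarantee nfsd is missing with
flush_delayed_fput().
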
Thanks,
NeilBrown