linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>,
	linux-nfs@vger.kernel.org,  lorenzo.bianconi@redhat.com,
	chuck.lever@oracle.com, netdev@vger.kernel.org,  kuba@kernel.org
Subject: Re: [PATCH v8 3/6] NFSD: add write_version to netlink command
Date: Tue, 23 Apr 2024 08:42:58 -0400	[thread overview]
Message-ID: <4931ecd2e702c95a4da0e1f0c4d9e439f941a451.camel@kernel.org> (raw)
In-Reply-To: <171375700742.7600.2829467433799282531@noble.neil.brown.name>

On Mon, 2024-04-22 at 13:36 +1000, NeilBrown wrote:
> On Wed, 17 Apr 2024, Jeff Layton wrote:
> > On Wed, 2024-04-17 at 10:43 +1000, NeilBrown wrote:
> > > On Wed, 17 Apr 2024, Jeff Layton wrote:
> > > > On Wed, 2024-04-17 at 09:05 +1000, NeilBrown wrote:
> > > > > On Wed, 17 Apr 2024, Jeff Layton wrote:
> > > > > > On Wed, 2024-04-17 at 07:48 +1000, NeilBrown wrote:
> > > > > > > On Tue, 16 Apr 2024, Jeff Layton wrote:
> > > > > > > > On Tue, 2024-04-16 at 13:16 +1000, NeilBrown wrote:
> > > > > > > > > On Tue, 16 Apr 2024, Lorenzo Bianconi wrote:
> > > > > > > > > > Introduce write_version netlink command through a "declarative" interface.
> > > > > > > > > > This patch introduces a change in behavior since for version-set userspace
> > > > > > > > > > is expected to provide a NFS major/minor version list it wants to enable
> > > > > > > > > > while all the other ones will be disabled. (procfs write_version
> > > > > > > > > > command implements imperative interface where the admin writes +3/-3 to
> > > > > > > > > > enable/disable a single version.
> > > > > > > > > 
> > > > > > > > > It seems a little weird to me that the interface always disables all
> > > > > > > > > version, but then also allows individual versions to be disabled.
> > > > > > > > > 
> > > > > > > > > Would it be reasonable to simply ignore the "enabled" flag when setting
> > > > > > > > > version, and just enable all versions listed??
> > > > > > > > > 
> > > > > > > > > Or maybe only enable those with the flag, and don't disable those
> > > > > > > > > without the flag?
> > > > > > > > > 
> > > > > > > > > Those don't necessarily seem much better - but the current behaviour
> > > > > > > > > still seems odd.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > I think it makes sense.
> > > > > > > > 
> > > > > > > > We disable all versions, and enable any that have the "enabled" flag set
> > > > > > > > in the call from userland. Userland technically needn't send down the
> > > > > > > > versions that are disabled in the call, but the current userland program
> > > > > > > > does.
> > > > > > > > 
> > > > > > > > I worry about imperative interfaces that might only say -- "enable v4.1,
> > > > > > > > but disable v3" and leave the others in their current state. That
> > > > > > > > requires that both the kernel and userland keep state about what
> > > > > > > > versions are currently enabled and disabled, and it's possible to get
> > > > > > > > that wrong.
> > > > > > > 
> > > > > > > I understand and support your aversion for imperative interfaces.
> > > > > > > But this interface, as currently implemented, looks somewhat imperative.
> > > > > > > The message sent to the kernel could enable some interfaces and then
> > > > > > > disable them.  I know that isn't the intent, but it is what the code
> > > > > > > supports.  Hence "weird".
> > > > > > > 
> > > > > > > We could add code to make that sort of thing impossible, but there isn't
> > > > > > > much point.  Better to make it syntactically impossible.
> > > > > > > 
> > > > > > > Realistically there will never be NFSv4.3 as there is no need - new
> > > > > > > features can be added incrementally. 
> > > > > > > 
> > > > > > 
> > > > > > There is no need _so_far_. Who knows what the future will bring? Maybe
> > > > > > we'll need v4.3 in order to add some needed feature?
> > > > > > 
> > > > > > >  So we could just pass an array of
> > > > > > > 5 active flags: 2,3,4.0,4.1,4.2.  I suspect you wouldn't like that and
> > > > > > > I'm not sure that I do either.  A "read" would return the same array
> > > > > > > with 3 possible states: unavailable, disabled, enabled.  (Maybe the
> > > > > > > array could be variable length so 5.0 could be added one day...).
> > > > > > > 
> > > > > > 
> > > > > > A set of flags is basically what this interface is. They just happen to
> > > > > > also be labeled with the major and minorversion. I think that's a good
> > > > > > thing.
> > > > > 
> > > > > Good for whom?  Labelling allows for labelling inconsistencies.
> > > > > 
> > > > 
> > > > Now you're just being silly. You wanted a variable length array, but
> > > > what if the next slot is not v4.3 but 5.0? A positional interpretation
> > > > of a slot in an array is just as just as subject to problems.
> > > 
> > > I don't think it could look like a imperative interface, which the
> > > current one does a bit.
> > > 
> > > > 
> > > > > Maybe the kernel should be able to provide an ordered list of labels,
> > > > > and separately an array of which labels are enabled/disabled.
> > > > > Userspace could send down a new set of enabled/disabled flags based on
> > > > > the agreed set of labels.
> > > > > 
> > > > 
> > > > How is this better than what's been proposed? One strength of netlink is
> > > > that the data is structured. The already proposed interface takes
> > > > advantage of that.
> > > > 
> > > > > Here is a question that is a bit of a diversion, but might help us
> > > > > understand the context more fully:
> > > > > 
> > > > >   Why would anyone disable v4.2 separately from v4.1 ??
> > > > > 
> > > > 
> > > > Furthermore, what does it mean to disable v4.1 but leave v4.2 enabled?
> > > 
> > > Indeed!
> > > 
> > > > 
> > > > > I understand that v2, v3, v4.0, v4.1 are effectively different protocols
> > > > > and you might want to ensure that only the approved/tested protocol is
> > > > > used.  But v4.2 is just a few enhancements on v4.1.  Why would you want
> > > > > to disable it?
> > > > > 
> > > > > The answer I can think of that there might be bugs (perish the
> > > > > thought!!) in some of those features so you might want to avoid using
> > > > > them.
> > > > > But in that case, it is really the features that you want to suppress,
> > > > > not the protocol version.
> > > > > 
> > > > > Maybe I might want to disable delegation - to just write delegation.
> > > > > Can I do that?  What if I just want to disable server-side copy, but
> > > > > keep fallocate and umask support?
> > > > > 
> > > > > i.e.  is a list of versions really the granularity that we want to use
> > > > > for this interface?
> > > > > 
> > > > 
> > > > Our current goal is to replace rpc.nfsd with a new program that works
> > > > via netlink. An important bit of what rpc.nfsd does is start the NFS
> > > > server with the settings in /etc/nfs.conf. Some of those settings are
> > > > vers3=, vers4.0=, etc. that govern how /proc/fs/nfsd/versions is set.
> > > > We have an immediate need to deal with those settings today, and
> > > > probably will for quite some time.
> > > > 
> > > > I'm not opposed to augmenting that with something more granular, but I
> > > > don't think we should block this interface and wait on that. We can
> > > > extend the interface at some point in the future to take a new feature
> > > > bitmask or something, and just declare that (e.g.) declaring vers4.2=n
> > > > disables some subset of those features.
> > > 
> > > I agree that we don't want to block "good" while we wait for "perfect".  I
> > > just want to be sure that what we have is "good".
> > > 
> > > The current /proc interface uses strings like "v3" and "v4.1".
> > > The proposed netlink interface uses pairs on u32s - "major" and "minor".
> > > So we lose some easy paths for extensibility.  Are we comfortable with
> > > that?
> > > 
> > > This isn't a big deal - certainly not a blocked.  I just don't feel
> > > entirely comfortable about the current interface and I'm exploring to
> > > see if there might be something better.
> > > 
> > > 
> > 
> > Ok, I'm not convinced that anything you've proposed so far is better
> > than what we have, but I'm open-minded.
> 
> Thanks.  I admit that I don't think anything I've come up with is better
> either.
> 
> > 
> > At this point we have these to-dos:
> > 
> > 1) make the threads interface accept an array of integers rather than a
> > singleton. This is needed when sunrpc.pool_mode is not global.
> > 
> 
> While this I think probably still makes sense, I'm feeling less
> enthusiastic about it and probably wouldn't complain if it didn't
> happen.
> 

Ok. I think making it an array is fine. For now, it will just error out
when you send down more than one value, but I think we can make it work
the way you'd expect. Regardlness, cleaning up and unifying the pooled
vs. non-pooled server handling seems like a good thing to do. The
existing one is a long-standing kludge.

> I don't like that fact that we need to configure a number of threads.  I
> would much rather it grow/shrink dynamically.  I've got more patches
> that are heading in this direction but they aren't ready yet.
> 
> If we do land these and they prove to be effective, then configuring
> per-pool threads would become extremely uninteresting.  The demand on
> each pool would adjust each pool separately.
> 
> So if use an array here turns out to be problematic for some reason, I
> won't complain.
> 

+1 on making the thread pool sizing dynamic. I had started looking at
what heuristics we should use, but I haven't gotten very far yet.

Some general thoughts:

nfsd threads spend most of their time waiting. They are only very rarely
cpu-bound, so the max size of the thread pools should mostly scale with
the amount of memory in the host.

I have a patch that adds a new percpu counter to count how often we go
to wake a thread and don't find one, and have to queue it. It clearly
hits a lot more when there are fewer running nfsd threads.

I don't think that's sufficient to know when to spawn a new thread
though. For that, we probably ought to be looking at how long RPCs are
sitting and waiting for a thread to pick them up. When that starts
growing then we probably need to bring in more threads.

Conversely, when a thread is sitting idle for a long time (10s? 60s?),
we can probably spin it down. Maybe we can have a new LRU for the
threads and have a periodic thread reaper job that checks for those.


> > 2) have the interface pass down the scope string in the THREADS_SET
> > instead of making userland change the UTS namespace before starting
> > threads. This should just mean adding a new optional string attribute to
> > the threads interfaces.
> > 
> > ...anything else we're missing that needs doing here? We'd really like
> > to see this in -next soon, so we can possibly make v6.10 with this.
> 
> I haven't yet found any obvious gaps.  I agree that we can and should
> move forward promptly.
> 

Great. I think Lorenzo is planning to send another set soon that should
hopefully be reasonable for merge.

> 
> > 
> > Thanks,
> > -- 
> > Jeff Layton <jlayton@kernel.org>
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>

  reply	other threads:[~2024-04-23 12:43 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-15 19:44 [PATCH v8 0/6] convert write_threads, write_version and write_ports to netlink commands Lorenzo Bianconi
2024-04-15 19:44 ` [PATCH v8 1/6] nfsd: move nfsd_mutex handling into nfsd_svc callers Lorenzo Bianconi
2024-04-15 19:44 ` [PATCH v8 2/6] NFSD: convert write_threads to netlink command Lorenzo Bianconi
2024-04-16 21:55   ` NeilBrown
2024-04-16 22:39     ` Jeff Layton
2024-04-17 13:04       ` Jeff Layton
2024-04-16 22:28   ` NeilBrown
2024-04-16 22:47     ` Jeff Layton
2024-04-15 19:44 ` [PATCH v8 3/6] NFSD: add write_version " Lorenzo Bianconi
2024-04-16  3:16   ` NeilBrown
2024-04-16 10:26     ` Jeff Layton
2024-04-16 21:48       ` NeilBrown
2024-04-16 22:33         ` Jeff Layton
2024-04-16 23:05           ` NeilBrown
2024-04-17  0:10             ` Jeff Layton
2024-04-17  0:43               ` NeilBrown
2024-04-17 11:25                 ` Jeff Layton
2024-04-22  3:36                   ` NeilBrown
2024-04-23 12:42                     ` Jeff Layton [this message]
2024-04-15 19:44 ` [PATCH v8 4/6] SUNRPC: introduce svc_xprt_create_from_sa utility routine Lorenzo Bianconi
2024-04-15 19:44 ` [PATCH v8 5/6] SUNRPC: add a new svc_find_listener helper Lorenzo Bianconi
2024-04-15 19:44 ` [PATCH v8 6/6] NFSD: add listener-{set,get} netlink command Lorenzo Bianconi
2024-04-16 19:05 ` [PATCH v8 0/6] convert write_threads, write_version and write_ports to netlink commands Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4931ecd2e702c95a4da0e1f0c4d9e439f941a451.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=kuba@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=lorenzo@kernel.org \
    --cc=neilb@suse.de \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).