From: "J. Bruce Fields" <bfields@fieldses.org>
To: Richard Weinberger <richard@nod.at>
Cc: linux-nfs@vger.kernel.org, david@sigma-star.at,
luis.turcitu@appsbroker.com, david.young@appsbroker.com,
david.oberhollenzer@sigma-star.at,
trond.myklebust@hammerspace.com, anna.schumaker@netapp.com,
chris.chilvers@appsbroker.com
Subject: Re: [RFC PATCH 0/6] nfs-utils: Improving NFS re-exports
Date: Thu, 17 Feb 2022 11:33:32 -0500
Message-ID: <20220217163332.GA16497@fieldses.org>
In-Reply-To: <20220217131531.2890-1-richard@nod.at>
On Thu, Feb 17, 2022 at 02:15:25PM +0100, Richard Weinberger wrote:
> This is the second iteration of the NFS re-export improvement series for nfs-utils.
> While the kernel side didn't change at all and is still small,
> the userspace side saw many more changes.
> Please note that this is still an RFC, there is room for improvement.
>
> The core idea is adding a new export option: reexport=
> Using reexport= it is possible to mark an export entry in the exports file
> explicitly as an NFS re-export and select a strategy for how unique identifiers
> should be provided.
> "remote-devfsid" is the strategy I have proposed in my first patch,
> I understand that this one is dangerous. But I still find it useful in some
> situations.
> "auto-fsidnum" and "predefined-fsidnum" are new and use a SQLite database as
> backend to keep track of generated ids.
> For a more detailed description see patch "exports: Implement new export option reexport=".
Thanks, I'll try to take a look.
Before upstreaming, I would like us to pick just one. These kinds of
options tend to complicate testing and documentation and debugging.
For an RFC, though, I think it makes sense, so I'm fine with keeping
"reexport=" while we're still exploring the different options. And,
hey, maybe we end up adding more than one after we've upstreamed the
first one.
> I chose SQLite because nfs-utils already uses it and, using SQL, ids can be nicely
> generated and maintained. It will also scale for large setups where the number
> of subvolumes is high.
>
> Besides id generation, this series also addresses the reboot problem.
> If the re-exporting NFS server reboots, uncovered NFS subvolumes are not yet
> mounted and file handles become stale.
> Now mountd/exportd keeps track of uncovered subvolumes and makes sure they get
> uncovered while nfsd starts.
>
> The whole set of features is currently opt-in via --enable-reexport.
> I'm also not sure about the rearrangement of the reexport code,
> currently it is a helper library.
>
> Please let me know whether you like this approach.
> If so I'd tidy it up and submit it as non-RFC.
>
> TODOs/Open questions:
> - When re-exporting, fs.nfs.nfs_mountpoint_timeout should be set to 0
> to make subvolumes not vanish.
> Is this something exportfs should do automatically when it sees an export entry with a reexport= option?
Setting the timeout to 0 doesn't help with re-export server reboots.
A reboot is another case where a client could hand us a filehandle
for a filesystem that isn't mounted yet.
I think you want to keep a path with each entry in the database. When
mountd gets a request for a filesystem it hasn't seen before, it stats
that path, which should trigger the automounts.
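
A minimal sketch of that idea, in Python rather than the C that nfs-utils
uses. The table name `fsidnums`, its columns, and the function names are all
hypothetical and are not taken from the patch series. The lookup calls
os.statvfs(), in the spirit of the statfs() named in the title of patch 5/6;
whether that or a plain stat() is enough to trigger the automount is an
assumption that would need checking on a real re-export setup.

```python
import os
import sqlite3


def open_db(path=":memory:"):
    """Open the (hypothetical) re-export database and create the table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS fsidnums ("
        " num INTEGER PRIMARY KEY AUTOINCREMENT,"
        " path TEXT UNIQUE NOT NULL)"
    )
    return db


def fsidnum_for_path(db, path):
    """Return the id stored for path, allocating a new one on first use."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT OR IGNORE INTO fsidnums (path) VALUES (?)", (path,))
    row = db.execute(
        "SELECT num FROM fsidnums WHERE path = ?", (path,)
    ).fetchone()
    return row[0]


def resolve_fsidnum(db, num):
    """Map an id back to its path and touch the path.

    Touching the path is what is meant to trigger any pending automount.
    Returns the path, or None if the id is unknown or the path is unreachable.
    """
    row = db.execute(
        "SELECT path FROM fsidnums WHERE num = ?", (num,)
    ).fetchone()
    if row is None:
        return None
    path = row[0]
    try:
        os.statvfs(path)
    except OSError:
        return None
    return path
```

With the path stored next to the id, a request for an id that mountd has not
seen since the reboot can be answered by calling something like
resolve_fsidnum() before replying.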
And it'd be good to have a test case with a client (Linux client or
pynfs) that, say, opens a file several mounts deep, then reboots the
reexport server, then tries to, say, write to the file descriptor after
the reboot. (Or maybe there's a way to force the mounts to expire as a
shortcut instead of doing a full reboot.)
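
The client side of such a test could look roughly like the sketch below. The
function name and the `disrupt` callback are hypothetical: `disrupt` stands
for whatever reboots the re-export server (or forces the mounts to expire)
and waits until it is serving again, and `path` would be a file several
mounts deep on the client's mount of the re-export. A stale filehandle would
normally show up as an OSError with errno ESTALE on the second write.

```python
import os


def write_survives_disruption(path, disrupt, payload=b"after-reboot\n"):
    """Open path, run disrupt(), then write through the same descriptor.

    Returns (True, None) if the write after the disruption succeeds,
    otherwise (False, errno).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # Make sure the file is really open and usable before the disruption.
        os.write(fd, b"before\n")
        os.fsync(fd)

        disrupt()  # placeholder: reboot the re-export server and wait

        try:
            os.write(fd, payload)
            os.fsync(fd)  # push the data out so server-side errors surface
        except OSError as err:
            return False, err.errno
        return True, None
    finally:
        try:
            os.close(fd)
        except OSError:
            pass  # close can also fail on a stale handle; already reported
```

Run with a no-op `disrupt` on a local file, the function simply returns
(True, None), which is a cheap sanity check of the harness itself.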
> - exportd saw only minimal testing so far, I wasn't aware of it yet. :-S
> - Currently there is no way to release the shared memory which contains the database lock.
> I guess it could be released via exportfs -f, which is the very last exec in nfs-server.service
> - Add a tool to import/export entries from the reexport database which obeys the shared lock.
> - When doing v4->v4 or v3->v4 re-exports, the very first read access to a file blocks for a few seconds until
> the client does a retransmit.
> v3->v3 works fine. More investigation needed.
Might want to strace mountd and look at the communication over the
/proc/fs/nfsd/*/channel files, maybe mountd is failing to respond to an
upcall.
--b.
>
> Looking forward to your feedback!
>
> Thanks,
> //richard
>
> Richard Weinberger (6):
> Implement reexport helper library
> exports: Implement new export option reexport=
> export: Implement logic behind reexport=
> export: Record mounted volumes
> nfsd: statfs() every known subvolume upon start
> export: Garbage collect orphaned subvolumes upon start
>
> configure.ac | 12 +
> support/Makefile.am | 4 +
> support/export/Makefile.am | 2 +
> support/export/cache.c | 241 +++++++++++++++++-
> support/export/export.h | 3 +
> support/include/nfslib.h | 1 +
> support/nfs/Makefile.am | 1 +
> support/nfs/exports.c | 73 ++++++
> support/reexport/Makefile.am | 6 +
> support/reexport/reexport.c | 477 +++++++++++++++++++++++++++++++++++
> support/reexport/reexport.h | 53 ++++
> utils/exportd/Makefile.am | 8 +-
> utils/exportd/exportd.c | 17 ++
> utils/exportfs/Makefile.am | 4 +
> utils/mount/Makefile.am | 6 +
> utils/mountd/Makefile.am | 6 +
> utils/mountd/mountd.c | 1 +
> utils/mountd/svc_run.c | 18 ++
> utils/nfsd/Makefile.am | 6 +
> utils/nfsd/nfsd.c | 10 +
> 20 files changed, 934 insertions(+), 15 deletions(-)
> create mode 100644 support/reexport/Makefile.am
> create mode 100644 support/reexport/reexport.c
> create mode 100644 support/reexport/reexport.h
>
> --
> 2.31.1