From: Dominique Martinet <asmadeus@codewreck.org>
To: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Christian Schoenebeck <linux_oss@crudebyte.com>, v9fs@lists.linux.dev
Subject: Re: cache fixes (redux)
Date: Thu, 28 Dec 2023 09:32:17 +0900 [thread overview]
Message-ID: <ZYzCERk_dg1-CEjb@codewreck.org> (raw)
In-Reply-To: <CAFkjPT=8CZHATaraBrqAZGDKjQLOp=U1gdgteJ5jpXRGJyBojQ@mail.gmail.com>
Eric Van Hensbergen wrote on Wed, Dec 27, 2023 at 11:39:19AM -0600:
> On Wed, Dec 27, 2023 at 2:11 AM Dominique Martinet <asmadeus@codewreck.org>
> wrote:
> > I'm not sure multi-walk is possible in practice: even opening a file
> > directly through e.g. cat /mnt/long/path/to/file will have the VFS check
> > permissions along the path, so the VFS will need to stat all elements
> > along the way.
>
> Well, the server will do perm checks (ideally on walks), so do we really
> have to? Feels like overkill to check in both places; in principle we
> expect/assume server perm checks.
The server should definitely check as well, yes.
I'm not sure it's possible to tell the Linux VFS that the server already
checked, though; ultimately the plan back when I was working on 9p for my
previous job was to plug MPI-IO into a user-space client library
(e.g. https://github.com/martinetd/space9 -- now unmaintained) so we
wouldn't have to deal with the VFS...
But once again, if you can manage to tell it that, I'm all for it. Some
servers might not check properly but that can/should be fixed when it
comes up.
> it did strike me that inodes are only unique per filesystem and I doubt the
> servers guarantee uniqueness across underlying mounts - but in principle we
> shouldn't have to rely on looking up inodes by qid.path anyway... However,
> nothing I'm planning should make this worse, but it's something to keep an
> eye out for.
We've had collisions a few times -- most "real" file systems I'm aware
of pseudo-randomize inode generation, but tmpfs in particular numbers
inodes linearly per mount, so it's very easy to trigger collisions.
qemu mixes in some bits from st_dev to avoid this:
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/9pfs/9p.c?ref_type=heads#L881
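For illustration, here's a rough sketch of that mitigation in Python (not qemu's exact algorithm -- the hash constant and prefix width here are made up for the example): mix bits derived from st_dev into the top of the 64-bit qid.path, so two mounts that hand out the same inode number still produce distinct paths.

```python
def mixed_qid_path(st_dev: int, st_ino: int, prefix_bits: int = 16) -> int:
    """Reserve the top `prefix_bits` of the 64-bit qid.path for a
    per-device prefix; keep the low bits for the inode number.
    (Illustrative sketch, not qemu's exact scheme.)"""
    ino_mask = (1 << (64 - prefix_bits)) - 1
    if st_ino & ~ino_mask:
        # Inode doesn't fit below the prefix; a real server needs a
        # full remapping table at this point (qemu grows one on demand).
        raise OverflowError("inode number does not fit under the prefix")
    # Cheap multiplicative hash of st_dev, truncated to 64 bits.
    h = (st_dev * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    prefix = h >> (64 - prefix_bits)
    return (prefix << (64 - prefix_bits)) | st_ino

# Two tmpfs mounts can both hand out inode 2; with distinct st_dev
# the mixed qid.paths differ.
assert mixed_qid_path(st_dev=0x31, st_ino=2) != mixed_qid_path(st_dev=0x32, st_ino=2)
```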
> > > - QUESTION: it seems like P9_QTLINK and P9_QTSYMLINK are no
> > > longer used, can these be recovered? I was thinking of reusing one of
> > > these to mark unlinked files which we could do a more comprehensive
> > > compare for if it's actually necessary.
> >
> > Since we don't support reconnects, unlinked files have been working just
> > fine with servers that keep the underlying files open (e.g. qemu)
> >
> > What's your plan exactly?
>
>
> See above, mark fid/qids that are unlinked with a different type I think.
> I’m gonna look at how other filesystems handle this case before doing
> anything.
Local filesystems don't do anything - the inode is just kept around, all
the way down to disk afaik (disk-backed tmpfiles flush data to disk, so
there must be some way to address them...)
(The last iput with nlink==0 calls evict_inode, which does the required
cleanup.)
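The behavior is easy to demonstrate from user space: unlink() removes only the name, and the inode (with its data) survives until the last open file descriptor goes away.

```python
import os
import tempfile

dirpath = tempfile.mkdtemp()
path = os.path.join(dirpath, "scratch")

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
os.write(fd, b"still here")
os.unlink(path)                      # name gone, nlink drops to 0

assert not os.path.exists(path)      # no directory entry anymore
os.lseek(fd, 0, os.SEEK_SET)
assert os.read(fd, 32) == b"still here"  # data still readable via the fd

os.close(fd)                         # last reference: inode is evicted here
os.rmdir(dirpath)
```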
NFS doesn't delete the backing file but renames it instead ("silly
rename") - you've probably seen .nfs1234 files around if you've used an
NFS mount; the client won't let you delete them while the file is open:
rm: cannot remove '.nfs000000000000f42300000001': Device or resource busy
The file is actually unlinked when it is closed (the DCACHE_NFSFS_RENAMED
check in nfs_dentry_iput). That's necessary because the connection to the
server can drop at any time, and reconnecting needs to be able to open
the file again.
Servers will also scrub these files after they've been unused for a
while.
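A minimal sketch of the silly-rename idea (hypothetical helper, not the actual NFS client code, which lives in the dcache): instead of removing a still-open file, rename it to a hidden .nfs-style name, and only unlink that name at last close.

```python
import os

def silly_unlink(path, is_open):
    """Remove `path` if nobody has it open; otherwise rename it to a
    hidden sillyname and return that name so last-close can clean it up.
    (Hypothetical helper modeled on the NFS client's behavior; the real
    client tracks this on the dentry via DCACHE_NFSFS_RENAMED.)"""
    if not is_open(path):
        os.unlink(path)
        return None
    sillyname = os.path.join(os.path.dirname(path),
                             ".nfs%016x" % os.stat(path).st_ino)
    os.rename(path, sillyname)
    return sillyname
```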
For 9p, since we don't handle reconnect, I think we're fine just not
doing anything -- servers that don't close backing files will de facto
keep the file alive through their fd. There's a process with the file
open, so we have an open fid somewhere and the inode won't get killed
either. When the last fd is closed the inode is evicted and all fids are
clunked, so the server releases its fd as well and the backing
filesystem cleans up appropriately.
If we want to handle reconnect at some point (CEA/Bull did that work but
never tried to publish it...), then we ought to do silly renames too,
but servers would really need background cleanup as well or mounts will
end up full of .nfsxxx files...
> > We don't have any way of driving protocol changes forward (and there's
> > no compatibility negotiation on client connect...), but I recall
> > Christian also was looking forward to some other changes so we might
> > want to try to move this forward again...
> >
>
> Well, those bits are unused (I think) so it shouldn't be a protocol change.
Right, I didn't realize it was just local use -- we can do whatever we
want in memory yes.
I'd avoid reusing the bits and instead define some new ones at the end
of the permissible range, to keep "can come from the server" separate
from client-internal use, but I'll leave this up to you -- ultimately,
as long as there's no conflict with what's on the wire we should be
fine.
> However we can bump to a
> new protocol when necessary (9p2023? it's been 23 years, why not?). Big on
> my list would be qid.path -> 128 bit and qid.version -> 128 bit, and maybe
> include the filesystem uid in the qid to differentiate underlying mounts,
> and maybe i_generation. This would help with cache maintenance and
> uniqueness concerns. I'm also thinking of how we might handle security
> differently (use kernel TLS, public key auth integration, etc., although
> those might not require protocol changes). I'm also in favor of dirread
> semantics for cache modes so we retrieve attributes with the file list.
> What other things should we be considering? Maybe time to start putting
> together an RFC.
Right; there's plenty to improve if we start...
My free time is basically rock bottom as it has been for a few years so
I probably won't help much, but I wouldn't push back.
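For reference, the current 9P qid is 13 bytes on the wire (type[1] version[4] path[8], little-endian); widening path and version to 128 bits as suggested above would roughly double that. A quick sketch of both layouts (the widened one is purely speculative -- no such protocol revision exists):

```python
import struct

def pack_qid(qtype: int, version: int, path: int) -> bytes:
    """Current 9P wire format: type[1] version[4] path[8], little-endian."""
    return struct.pack("<BIQ", qtype, version, path)

def pack_qid_wide(qtype: int, version: int, path: int) -> bytes:
    """Speculative '9p2023' layout with 128-bit version and path,
    packed as two little-endian 64-bit halves each (illustrative only)."""
    lo = (1 << 64) - 1
    return struct.pack("<BQQQQ", qtype,
                       version & lo, version >> 64,
                       path & lo, path >> 64)

assert len(pack_qid(0x00, 1, 42)) == 13
assert len(pack_qid_wide(0x00, 1, 1 << 100)) == 33
```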
--
Dominique Martinet | Asmadeus
Thread overview: 18+ messages
2023-12-26 16:13 cache fixes (redux) Eric Van Hensbergen
2023-12-27 8:10 ` Dominique Martinet
2023-12-27 18:07 ` Eric Van Hensbergen
[not found] ` <CAFkjPT=8CZHATaraBrqAZGDKjQLOp=U1gdgteJ5jpXRGJyBojQ@mail.gmail.com>
2023-12-28 0:32 ` Dominique Martinet [this message]
2023-12-28 10:44 ` Christian Schoenebeck
2023-12-28 15:07 ` Eric Van Hensbergen
2023-12-28 21:48 ` asmadeus
2023-12-29 12:22 ` Christian Schoenebeck
2023-12-29 12:50 ` asmadeus
2023-12-29 16:16 ` Eric Van Hensbergen
2023-12-29 16:06 ` Eric Van Hensbergen
2023-12-29 15:50 ` Eric Van Hensbergen
2024-01-03 18:09 ` cache fixes (redux) (a tale of many inode numbers...) Eric Van Hensbergen
2024-01-03 20:24 ` cache fixes (redux) asmadeus
[not found] ` <CAFkjPT=61yygntbhSJMMWeK4JBrm85EZpz387aPHD_xmHVBbog@mail.gmail.com>
2024-01-03 22:42 ` Fwd: " Eric Van Hensbergen
2024-01-03 22:48 ` Eric Van Hensbergen
2024-01-04 12:37 ` Christian Schoenebeck
2024-01-04 15:59 ` Eric Van Hensbergen