linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vegard Nossum <vegard.nossum@oracle.com>
To: Al Viro <viro@kernel.org>, linux-fsdevel@vger.kernel.org
Cc: linux-doc@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC] documentation on filesystem exposure to RCU pathwalk from fs maintainers' POV
Date: Mon, 26 Feb 2024 07:35:17 +0100	[thread overview]
Message-ID: <5879f44e-1c28-4be1-a684-9bf4fbf6966b@oracle.com> (raw)
In-Reply-To: <Zdu58Jevui1ySBqa@duke.home>


Hi,

A couple of clarity-related suggestions:

On 25/02/2024 23:06, Al Viro wrote:
> ===================================================================
> The ways in which RCU pathwalk can ruin filesystem maintainer's day
> ===================================================================
> 
> The problem: exposure of filesystem code to lockless environment
> ================================================================
> 
> Filesystem methods can usually count upon VFS-provided warranties
> regarding the stability of objects they are called to act upon; at the
> very least, they can expect the dentries/inodes/superblocks involved to
> remain live throughout the operation.
> 
> Life would be much more painful without that; however, such warranties
> do not come for free.  The problem is that access patterns are heavily
> biased; every system call getting an absolute pathname will have to
> start at root directory, etc.  Having each of them in effect write
> "I'd been here" on the same memory objects would cost quite a bit.
> As the result, we try to keep the fast path stores-free, bumping
> no refcounts and taking no locks.  Details are described elsewhere,
> but the bottom line for filesystems is that some methods may be called
> with much looser warranties than usual.  Of course, from the filesystem

There is a slight apparent contradiction here between "at the very
least, they can expect [...] to remain live throughout the operation" in
the first paragraph (which sounds like they _do_ have these guarantees)
and most of the second paragraph (which says they _don't_ have these
guarantees).

I *think* what you are saying is that dentries/inodes/sbs involved will
indeed stay live (i.e. allocated), but that there are OTHER warranties
you might usually expect that are not there, such as objects not being
locked and potentially changing underneath your filesystem's VFS
callback or being in a partial state or other indirectly pointed-to
objects not being safe to access.

The source of the confusion for me is that you say "such warranties do
not come for free" and it sounds like it refers to the liveness warranty
that you just mentioned, whereas it actually refers to all the other
warranties that you did NOT mention yet at this point in the document.

How about changing it like this:

"""
Filesystem methods can usually count upon a number of VFS-provided
warranties regarding the stability of the dentries/inodes/superblocks
they are called to act upon. For example, they always can expect these
objects to remain live throughout the operation; life would be much more
painful without that.

However, such warranties do not come for free and other warranties may
not always be provided. [...]
"""

(As a side note, you may also want to actually link the docs we have for
RCU lookup where you say "details are described elsewhere".)

> What methods are affected?
> ==========================
> 
> 	The list of the methods that could run into that fun:
> 
> ========================	==================================	=================
> 	method			indication that the call is unsafe	unstable objects
> ========================	==================================	=================

I'd wish for explicit definitions of "unsafe" (which is a terminology
you do use more or less consistently in this doc) and "unstable". The
definitions don't need mathematical precision, but there should be a
quick one-line explanation of each.

I think "the call is unsafe" means that it doesn't have all the usual
safety warranties (as detailed above).

I think "unstable" means "not locked, can change underneath the
function" (but not that it can be freed), but it would be good to have
it spelled out.

> ->d_hash(d, ...) 		none - any call might be		d
> ->d_compare(d, ...)		none - any call might be		d
> ->d_revalidate(d, f)		f & LOOKUP_RCU				d
> ->d_manage(d, f)		f					d
> ->permission(i, m)		m & MAY_NOT_BLOCK			i
> ->get_link(d, i, ...)		d == NULL				i
> ->get_inode_acl(i, t, f)	f == LOOKUP_RCU				i
> ========================	==================================	=================

FWIW, thanks for writing this document, it's a really useful addition.


Vegard

  reply	other threads:[~2024-02-26  6:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-25 22:06 [RFC] documentation on filesystem exposure to RCU pathwalk from fs maintainers' POV Al Viro
2024-02-26  6:35 ` Vegard Nossum [this message]
2024-02-26  9:13   ` Al Viro
2024-02-26  7:58 ` Randy Dunlap
2024-02-26 18:35 ` [RFC][v2] " Al Viro
2024-02-27  8:04   ` [RFC][v3] " Al Viro
2024-02-28  4:20     ` Akira Yokosawa
2024-02-28 13:02     ` [RFC][v4] " Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5879f44e-1c28-4be1-a684-9bf4fbf6966b@oracle.com \
    --to=vegard.nossum@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).