linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 2/4] fs/dcache: Split __d_lookup_done()
Date: Wed, 27 Jul 2022 04:36:49 +0100	[thread overview]
Message-ID: <YuCy0Zzt/mq3IBz7@ZenIV> (raw)
In-Reply-To: <YuCxiWkbfWTT+p1f@ZenIV>

On Wed, Jul 27, 2022 at 04:31:21AM +0100, Al Viro wrote:
> On Mon, Jun 13, 2022 at 04:07:10PM +0200, Sebastian Andrzej Siewior wrote:
> > __d_lookup_done() wakes waiters on dentry::d_wait inside a preemption
> > disabled region. This violates the PREEMPT_RT constraints as the wake up
> > acquires wait_queue_head::lock which is a "sleeping" spinlock on RT.
> 
> I'd probably turn that into something like
> 
> __d_lookup_done() wakes waiters on dentry->d_wait.  On PREEMPT_RT we are
> not allowed to do that with preemption disabled, since the wakeup
> acquired wait_queue_head::lock, which is a "sleeping" spinlock on RT.
> 
> Calling it under dentry->d_lock is not a problem, since that is also
> a "sleeping" spinlock on the same configs.  Unfortunately, two of
> its callers (__d_add() and __d_move()) are holding more than just ->d_lock
> and that needs to be dealt with.
> 
> The key observation is that wakeup can be moved to any point before
> dropping ->d_lock.
> 
> > As a first step to solve this, move the wake up outside of the
> > hlist_bl_lock() held section.
> > 
> > This is safe because:
> > 
> >   1) The whole sequence including the wake up is protected by dentry::lock.
> > 
> >   2) The waitqueue head is allocated by the caller on stack and can't go
> >      away until the whole callchain completes.
> 
> 	That's too vague and in one case simply incorrect - the call
> of d_alloc_parallel() in nfs_call_unlink() does *not* have wq in stack
> frame of anything in the callchain.  Incidentally, another unusual caller
> (d_add_ci()) has a bug (see below).  What really matters is that we can't
> reach destruction of wq without __d_lookup_done() under ->d_lock.
> 
> Waiters get inserted into ->d_wait only after they'd taken ->d_lock
> and observed DCACHE_PAR_LOOKUP in flags.  As long as they are
> woken up (and evicted from the queue) between the moment __d_lookup_done()
> has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe,
> since the waitqueue ->d_wait points to won't get destroyed without
> having __d_lookup_done(dentry) called (under ->d_lock).
> 
> ->d_wait is set only by d_alloc_parallel() and only in case when
> it returns a freshly allocated in-lookup dentry.  Whenever that happens,
> we are guaranteed that __d_lookup_done() will be called for resulting
> dentry (under ->d_lock) before the wq in question gets destroyed.
> 
> With two exceptions wq lives in call frame of the caller of
> d_alloc_parallel() and we have an explicit d_lookup_done() on the
> resulting in-lookup dentry before we leave that frame.
> 
> One of those exceptions is nfs_call_unlink(), where wq is embedded into
> (dynamically allocated) struct nfs_unlinkdata.  It is destroyed in
> nfs_async_unlink_release() after an explicit d_lookup_done() on the
> dentry wq went into.
> 
> Remaining exception is d_add_ci().  There wq is what we'd found in
> ->d_wait of d_add_ci() argument.  Callers of d_add_ci() are two
> instances of ->d_lookup() and they must have been given an in-lookup
> dentry.  Which means that they'd been called by __lookup_slow() or
> lookup_open(), with wq in the call frame of one of those.
> 
> [[[
> Result of d_alloc_parallel() in d_add_ci() is fed to
> d_splice_alias(), which *NORMALLY* feeds it to __d_add() or
> __d_move() in a way that will have __d_lookup_done() applied to it.
> 
> 	However, there is a nasty possibility - d_splice_alias() might
> legitimately fail without having marked the sucker not in-lookup.  dentry
> will get dropped by d_add_ci(), so ->d_wait won't end up pointing to freed
> object, but it's still a bug - retain_dentry() will scream bloody murder
> upon seeing that, and for a good reason; we'll get hash chain corrupted.
> It's impossible to hit without corrupted fs image (ntfs or case-insensitive
> xfs), but it's a bug.  Fix is a one-liner (add d_lookup_done(found);
> right after
>         res = d_splice_alias(inode, found);
> 	if (res) {
> in d_add_ci()) and with that done the last sentence about d_add_ci() turns
> into
> ]]]
> 
> Result of d_alloc_parallel() in d_add_ci() is fed to
> d_splice_alias(), which either returns non-NULL (and d_add_ci() does
> d_lookup_done()) or feeds dentry to __d_add() that will do
> __d_lookup_done() under ->d_lock.  That concludes the analysis.
> 
> 
> PS: I'm not sure we need to do this migration of wakeup in stages;
> lift it into the caller of __d_lookup_done() as the first step,
> then move the damn thing all the way to end_dir_add().  Analysis
> can go into either...

PPS: I'm OK with the series in its current form; resplit mentioned above
is a matter of taste and if you prefer to keep the commit boundaries
as-is, I'm fine with that.  Commit message really needs to be changed,
though.  Would you be OK with the changes as above?

  reply	other threads:[~2022-07-27  3:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-13 14:07 fs/dcache: Resolve the last RT woes Sebastian Andrzej Siewior
2022-06-13 14:07 ` [PATCH 1/4] fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT Sebastian Andrzej Siewior
2022-06-13 14:07 ` [PATCH 2/4] fs/dcache: Split __d_lookup_done() Sebastian Andrzej Siewior
2022-07-27  3:31   ` Al Viro
2022-07-27  3:36     ` Al Viro [this message]
2022-06-13 14:07 ` [PATCH 3/4] fs/dcache: Use __d_lookup_unhash() in __d_add/move() Sebastian Andrzej Siewior
2022-07-26  3:00   ` Al Viro
2022-07-26  7:47     ` [PATCH 3/4 v2] " Sebastian Andrzej Siewior
2022-06-13 14:07 ` [PATCH 4/4] fs/dcache: Move wakeup out of i_seq_dir write held region Sebastian Andrzej Siewior
2022-07-08 15:52 ` fs/dcache: Resolve the last RT woes Thomas Gleixner
2022-07-25 15:49 ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YuCy0Zzt/mq3IBz7@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=bigeasy@linutronix.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).