All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 2/4] fs/dcache: Split __d_lookup_done()
Date: Wed, 27 Jul 2022 04:36:49 +0100	[thread overview]
Message-ID: <YuCy0Zzt/mq3IBz7@ZenIV> (raw)
In-Reply-To: <YuCxiWkbfWTT+p1f@ZenIV>

On Wed, Jul 27, 2022 at 04:31:21AM +0100, Al Viro wrote:
> On Mon, Jun 13, 2022 at 04:07:10PM +0200, Sebastian Andrzej Siewior wrote:
> > __d_lookup_done() wakes waiters on dentry::d_wait inside a preemption
> > disabled region. This violates the PREEMPT_RT constraints as the wake up
> > acquires wait_queue_head::lock which is a "sleeping" spinlock on RT.
> 
> I'd probably turn that into something like
> 
> __d_lookup_done() wakes waiters on dentry->d_wait.  On PREEMPT_RT we are
> not allowed to do that with preemption disabled, since the wakeup
> acquired wait_queue_head::lock, which is a "sleeping" spinlock on RT.
> 
> Calling it under dentry->d_lock is not a problem, since that is also
> a "sleeping" spinlock on the same configs.  Unfortunately, two of
> its callers (__d_add() and __d_move()) are holding more than just ->d_lock
> and that needs to be dealt with.
> 
> The key observation is that wakeup can be moved to any point before
> dropping ->d_lock.
> 
> > As a first step to solve this, move the wake up outside of the
> > hlist_bl_lock() held section.
> > 
> > This is safe because:
> > 
> >   1) The whole sequence including the wake up is protected by dentry::lock.
> > 
> >   2) The waitqueue head is allocated by the caller on stack and can't go
> >      away until the whole callchain completes.
> 
> 	That's too vague and in one case simply incorrect - the call
> of d_alloc_parallel() in nfs_call_unlink() does *not* have wq in stack
> frame of anything in the callchain.  Incidentally, another unusual caller
> (d_add_ci()) has a bug (see below).  What really matters is that we can't
> reach destruction of wq without __d_lookup_done() under ->d_lock.
> 
> Waiters get inserted into ->d_wait only after they'd taken ->d_lock
> and observed DCACHE_PAR_LOOKUP in flags.  As long as they are
> woken up (and evicted from the queue) between the moment __d_lookup_done()
> has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe,
> since the waitqueue ->d_wait points to won't get destroyed without
> having __d_lookup_done(dentry) called (under ->d_lock).
> 
> ->d_wait is set only by d_alloc_parallel() and only in case when
> it returns a freshly allocated in-lookup dentry.  Whenever that happens,
> we are guaranteed that __d_lookup_done() will be called for resulting
> dentry (under ->d_lock) before the wq in question gets destroyed.
> 
> With two exceptions wq lives in call frame of the caller of
> d_alloc_parallel() and we have an explicit d_lookup_done() on the
> resulting in-lookup dentry before we leave that frame.
> 
> One of those exceptions is nfs_call_unlink(), where wq is embedded into
> (dynamically allocated) struct nfs_unlinkdata.  It is destroyed in
> nfs_async_unlink_release() after an explicit d_lookup_done() on the
> dentry wq went into.
> 
> Remaining exception is d_add_ci().  There wq is what we'd found in
> ->d_wait of d_add_ci() argument.  Callers of d_add_ci() are two
> instances of ->d_lookup() and they must have been given an in-lookup
> dentry.  Which means that they'd been called by __lookup_slow() or
> lookup_open(), with wq in the call frame of one of those.
> 
> [[[
> Result of d_alloc_parallel() in d_add_ci() is fed to
> d_splice_alias(), which *NORMALLY* feeds it to __d_add() or
> __d_move() in a way that will have __d_lookup_done() applied to it.
> 
> 	However, there is a nasty possibility - d_splice_alias() might
> legitimately fail without having marked the sucker not in-lookup.  dentry
> will get dropped by d_add_ci(), so ->d_wait won't end up pointing to freed
> object, but it's still a bug - retain_dentry() will scream bloody murder
> upon seeing that, and for a good reason; we'll get hash chain corrupted.
> It's impossible to hit without corrupted fs image (ntfs or case-insensitive
> xfs), but it's a bug.  Fix is a one-liner (add d_lookup_done(found);
> right after
>         res = d_splice_alias(inode, found);
> 	if (res) {
> in d_add_ci()) and with that done the last sentence about d_add_ci() turns
> into
> ]]]
> 
> Result of d_alloc_parallel() in d_add_ci() is fed to
> d_splice_alias(), which either returns non-NULL (and d_add_ci() does
> d_lookup_done()) or feeds dentry to __d_add() that will do
> __d_lookup_done() under ->d_lock.  That concludes the analysis.
> 
> 
> PS: I'm not sure we need to do this migration of wakeup in stages;
> lift it into the caller of __d_lookup_done() as the first step,
> then move the damn thing all the way to end_dir_add().  Analysis
> can go into either...

PPS: I'm OK with the series in its current form; resplit mentioned above
is a matter of taste and if you prefer to keep the commit boundaries
as-is, I'm fine with that.  Commit message really needs to be changed,
though.  Would you be OK with the changes as above?

  reply	other threads:[~2022-07-27  3:36 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-13 14:07 fs/dcache: Resolve the last RT woes Sebastian Andrzej Siewior
2022-06-13 14:07 ` [PATCH 1/4] fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT Sebastian Andrzej Siewior
2022-06-13 14:07 ` [PATCH 2/4] fs/dcache: Split __d_lookup_done() Sebastian Andrzej Siewior
2022-07-27  3:31   ` Al Viro
2022-07-27  3:36     ` Al Viro [this message]
2022-06-13 14:07 ` [PATCH 3/4] fs/dcache: Use __d_lookup_unhash() in __d_add/move() Sebastian Andrzej Siewior
2022-07-26  3:00   ` Al Viro
2022-07-26  7:47     ` [PATCH 3/4 v2] " Sebastian Andrzej Siewior
2022-06-13 14:07 ` [PATCH 4/4] fs/dcache: Move wakeup out of i_seq_dir write held region Sebastian Andrzej Siewior
2022-07-08 15:52 ` fs/dcache: Resolve the last RT woes Thomas Gleixner
2022-07-25 15:49 ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YuCy0Zzt/mq3IBz7@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=bigeasy@linutronix.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.