linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Matthew Wilcox <willy@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 0/4 v2] fs/dcache: Resolve the last RT woes.
Date: Wed, 27 Jul 2022 13:49:00 +0200	[thread overview]
Message-ID: <20220727114904.130761-1-bigeasy@linutronix.de> (raw)

This is v2 of the series, v1 is available at
   https://https://lkml.kernel.org/r/.kernel.org/all/20220613140712.77932-1-bigeasy@linutronix.de/

v1…v2:
   - Make patch around Al's description of a bug in d_add_ci(). I took
     the liberty to make him Author and added his signed-off-by since I
     sinmply added a patch-body around his words.

   - The reasoning of why delaying the wakeup is reasonable has been
     replaced with Al's analysis of the code.

   - The split of wake up has been done differently (and I hope this is
     what Al meant). First the wake up has been pushed to the caller and
     then delayed to end_dir_add() after preemption has been enabled.

   - There is still __d_lookup_unhash(), __d_lookup_unhash_wake() and
     __d_lookup_done() is removed.
     __d_lookup_done() is removed because it is exported and the return
     value changes which will affect OOT users which are not aware of
     it.
     There is still d_lookup_done() which invokes
     __d_lookup_unhash_wake(). This can't remain in the header file due
     to cyclic depencies which in turn can't resolve wake_up_all()
     within the inline function.

The original cover letter:

PREEMPT_RT has two issues with the dcache code:

   1) i_dir_seq is a special cased sequence counter which uses the lowest
      bit as writer lock. This requires that preemption is disabled.

      On !RT kernels preemption is implictly disabled by spin_lock(), but
      that's not the case on RT kernels.

      Replacing i_dir_seq on RT with a seqlock_t comes with its own
      problems due to arbitrary lock nesting. Using a seqcount with an
      associated lock is not possible either because the locks held by the
      callers are not necessarily related.

      Explicitly disabling preemption on RT kernels across the i_dir_seq
      write held critical section is the obvious and simplest solution. The
      critical section is small, so latency is not really a concern.

   2) The wake up of dentry::d_wait waiters is in a preemption disabled
      section, which violates the RT constraints as wake_up_all() has
      to acquire the wait queue head lock which is a 'sleeping' spinlock
      on RT.

      There are two reasons for the non-preemtible section:

         A) The wake up happens under the hash list bit lock

         B) The wake up happens inside the i_dir_seq write side
            critical section

       #A is solvable by moving it outside of the hash list bit lock held                                                                                                                      
       section.

       Making #B preemptible on RT is hard or even impossible due to lock
       nesting constraints.

       A possible solution would be to replace the waitqueue by a simple
       waitqueue which can be woken up inside atomic sections on RT.

       But aside of Linus not being a fan of simple waitqueues, there is
       another observation vs. this wake up. It's likely for the woken up
       waiter to immediately contend on dentry::lock.

       It turns out that there is no reason to do the wake up within the
       i_dir_seq write held region. The only requirement is to do the wake
       up within the dentry::lock held region. Further details in the
       individual patches.

       That allows to move the wake up out of the non-preemptible section
       on RT, which also reduces the dentry::lock held time after wake up.

Thanks,

Sebastian



             reply	other threads:[~2022-07-27 11:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-27 11:49 Sebastian Andrzej Siewior [this message]
2022-07-27 11:49 ` [PATCH 1/4 v2] fs/dcache: d_add_ci() needs to complete parallel lookup Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 2/4 v2] fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RT Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 3/4 v2] fs/dcache: Move the wakeup from __d_lookup_done() to the caller Sebastian Andrzej Siewior
2022-07-27 11:49 ` [PATCH 4/4 v2] fs/dcache: Move wakeup out of i_seq_dir write held region Sebastian Andrzej Siewior
2022-07-30  4:41 ` [PATCH 0/4 v2] fs/dcache: Resolve the last RT woes Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220727114904.130761-1-bigeasy@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).