From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/6 v3] xfs: lockless buffer lookups
Date: Wed, 13 Jul 2022 10:03:21 -0700
Message-ID: <Ys762Vr5+jLgMUWZ@magnolia>
In-Reply-To: <Ys76W8V72KJmXN+B@magnolia>
On Wed, Jul 13, 2022 at 10:01:15AM -0700, Darrick J. Wong wrote:
> On Fri, Jul 08, 2022 at 09:52:53AM +1000, Dave Chinner wrote:
> > Hi folks,
> >
> > Current work to merge the XFS inode life cycle with the VFS inode
> > life cycle is finding some interesting issues. If we have a path
> > that hits buffer trylocks fairly hard (e.g. a non-blocking
> > background inode freeing function), we end up hitting massive
> > contention on the buffer cache hash locks:
>
> Hmm. I applied this to a test branch and this fell out of xfs/436 when
> it runs rmmod xfs. I'll see if I can reproduce it more regularly, but
> thought I'd put this out there early...
...and I should have mentioned that this VM was running with
MKFS_OPTIONS='-i nrext64=1 -d rmapbt=1' and always_cow turned on.
--D
> XFS (sda3): Unmounting Filesystem
> =============================================================================
> BUG xfs_buf (Not tainted): Objects remaining in xfs_buf on __kmem_cache_shutdown()
> -----------------------------------------------------------------------------
>
> Slab 0xffffea000443b780 objects=18 used=4 fp=0xffff888110edf340 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff)
> CPU: 3 PID: 30378 Comm: modprobe Not tainted 5.19.0-rc5-djwx #rc5 bebda13a030d0898279476b6652ddea67c2060cc
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x34/0x44
> slab_err+0x95/0xc9
> __kmem_cache_shutdown.cold+0x39/0x1e9
> kmem_cache_destroy+0x49/0x130
> exit_xfs_fs+0x50/0xc57 [xfs 370e1c994a59de083c05cd4df389f629878b8122]
> __do_sys_delete_module.constprop.0+0x145/0x220
> ? exit_to_user_mode_prepare+0x6c/0x100
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7fe7d7877c9b
> Code: 73 01 c3 48 8b 0d 95 21 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 65 21 0f 00 f7 d8 64 89 01 48
> RSP: 002b:00007fffb911cab8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
> RAX: ffffffffffffffda RBX: 0000555a217adcc0 RCX: 00007fe7d7877c9b
> RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000555a217add28
> RBP: 0000555a217adcc0 R08: 0000000000000000 R09: 0000000000000000
> R10: 00007fe7d790fac0 R11: 0000000000000206 R12: 0000555a217add28
> R13: 0000000000000000 R14: 0000555a217add28 R15: 00007fffb911ede8
> </TASK>
> Disabling lock debugging due to kernel taint
> Object 0xffff888110ede000 @offset=0
> Object 0xffff888110ede1c0 @offset=448
> Object 0xffff888110edefc0 @offset=4032
> Object 0xffff888110edf6c0 @offset=5824
>
> --D
>
> > - 92.71% 0.05% [kernel] [k] xfs_inodegc_worker
> > - 92.67% xfs_inodegc_worker
> > - 92.13% xfs_inode_unlink
> > - 91.52% xfs_inactive_ifree
> > - 85.63% xfs_read_agi
> > - 85.61% xfs_trans_read_buf_map
> > - 85.59% xfs_buf_read_map
> > - xfs_buf_get_map
> > - 85.55% xfs_buf_find
> > - 72.87% _raw_spin_lock
> > - do_raw_spin_lock
> > 71.86% __pv_queued_spin_lock_slowpath
> > - 8.74% xfs_buf_rele
> > - 7.88% _raw_spin_lock
> > - 7.88% do_raw_spin_lock
> > 7.63% __pv_queued_spin_lock_slowpath
> > - 1.70% xfs_buf_trylock
> > - 1.68% down_trylock
> > - 1.41% _raw_spin_lock_irqsave
> > - 1.39% do_raw_spin_lock
> > __pv_queued_spin_lock_slowpath
> > - 0.76% _raw_spin_unlock
> > 0.75% do_raw_spin_unlock
> >
> > This is basically hammering the pag->pag_buf_lock from lots of CPUs
> > doing trylocks at the same time. Most of the buffer trylock
> > operations ultimately fail after we've done the lookup, so we're
> > really hammering the buf hash lock whilst making no progress.
> >
> > We can also see significant spinlock traffic on the same lock just
> > under normal operation when lots of tasks are accessing metadata
> > from the same AG, so let's avoid all this by creating a lookup fast
> > path that leverages the rhashtable's ability to do RCU-protected
> > lookups.
> >
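For readers who haven't looked at the patches yet, the lookup fast path
being described is roughly the following shape. This is only a sketch
against the generic rhashtable API; the struct and function names
(cache_buf, buf_hash_params, cache_buf_lookup) are illustrative, not
the identifiers used in the series:

	#include <linux/rhashtable.h>
	#include <linux/rcupdate.h>
	#include <linux/atomic.h>

	struct cache_buf {
		u64			key;	/* e.g. the disk block number */
		atomic_t		hold;	/* reference count */
		struct rhash_head	hash;	/* rhashtable linkage */
	};

	static const struct rhashtable_params buf_hash_params = {
		.key_len	= sizeof(u64),
		.key_offset	= offsetof(struct cache_buf, key),
		.head_offset	= offsetof(struct cache_buf, hash),
	};

	/*
	 * Lockless lookup: walk the hash under rcu_read_lock() and only
	 * take a reference via atomic_inc_not_zero(), so a buffer that is
	 * being torn down (hold == 0) is skipped rather than resurrected.
	 * For this to be safe the free side must defer the actual kfree()
	 * with kfree_rcu(); that part is omitted here.
	 */
	static struct cache_buf *
	cache_buf_lookup(struct rhashtable *ht, u64 key)
	{
		struct cache_buf *buf;

		rcu_read_lock();
		buf = rhashtable_lookup(ht, &key, buf_hash_params);
		if (!buf || !atomic_inc_not_zero(&buf->hold))
			buf = NULL;
		rcu_read_unlock();

		return buf;
	}

The point is simply that a cache hit never touches the per-AG hash
spinlock at all, which is what the profile above is asking for.
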
> > This is a rework of the initial lockless buffer lookup patch I sent
> > here:
> >
> > https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/
> >
> > And the alternative cleanup sent by Christoph here:
> >
> > https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/
> >
> > This version isn't quite as short as Christoph's, but it does roughly
> > the same thing in killing the two-phase _xfs_buf_find() call
> > mechanism. It separates the fast and slow paths a little more
> > cleanly and doesn't have context dependent buffer return state from
> > the slow path that the caller needs to handle. It also picks up the
> > rhashtable insert optimisation that Christoph added.
> >
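Building on the sketch above, the get path with the insert optimisation
Christoph added would look something like the following. Again this is
illustrative only and reuses the types from the previous snippet; the
generic rhashtable_lookup_get_insert_fast() helper is what folds the
second hash walk into the insert on the slow path:

	/* Additional includes on top of the previous sketch. */
	#include <linux/slab.h>
	#include <linux/err.h>

	/*
	 * Get-or-create: try the lockless lookup first; only on a miss do
	 * we allocate and fall into the slow path.  There the lookup and
	 * insert are a single hash walk: rhashtable_lookup_get_insert_fast()
	 * either inserts our new buffer (returns NULL) or hands back the
	 * one somebody else inserted first.
	 */
	static struct cache_buf *
	cache_buf_get(struct rhashtable *ht, u64 key)
	{
		struct cache_buf *buf, *new;

		/* Fast path: no locks, no allocation. */
		buf = cache_buf_lookup(ht, key);
		if (buf)
			return buf;

		/* Slow path: allocate, then lookup-or-insert in one pass. */
		new = kzalloc(sizeof(*new), GFP_KERNEL);
		if (!new)
			return NULL;
		new->key = key;
		atomic_set(&new->hold, 1);

		rcu_read_lock();
		buf = rhashtable_lookup_get_insert_fast(ht, &new->hash,
							buf_hash_params);
		if (!buf) {
			/* No existing entry: our buffer is now in the cache. */
			rcu_read_unlock();
			return new;
		}
		/* Lost the race (or insert error): back out the new buffer. */
		if (IS_ERR(buf) || !atomic_inc_not_zero(&buf->hold))
			buf = NULL;
		rcu_read_unlock();
		kfree(new);

		return buf;
	}

The slow path here is deliberately simplified (no retry on error, no
per-AG state); it only shows why the separate "insert" lookup goes away.
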
> > This series passes fstests under several different configs and does
> > not cause any obvious regressions in scalability testing that has
> > been performed. Hence I'm proposing this as potential 5.20 cycle
> > material.
> >
> > Thoughts, comments?
> >
> > Version 3:
> > - rebased onto linux-xfs/for-next
> > - rearranged some of the changes to avoid repeated shuffling of code
> > to different locations
> > - fixed typos in commits
> > - s/xfs_buf_find_verify/xfs_buf_map_verify/
> > - s/xfs_buf_find_fast/xfs_buf_lookup/
> >
> > Version 2:
> > - https://lore.kernel.org/linux-xfs/20220627060841.244226-1-david@fromorbit.com/
> > - based on 5.19-rc2
> > - high speed collision of original proposals.
> >
> > Initial versions:
> > - https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/
> > - https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/
> >
> >