All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [GIT PULL] xfs: CIL and log scalability improvements
Date: Fri, 4 Jun 2021 19:15:33 -0700	[thread overview]
Message-ID: <20210605021533.GH26380@locust> (raw)
In-Reply-To: <20210605020354.GG26380@locust>

On Fri, Jun 04, 2021 at 07:03:54PM -0700, Darrick J. Wong wrote:
> On Fri, Jun 04, 2021 at 01:29:28PM +1000, Dave Chinner wrote:
> > Hi Darrick,
> > 
> > Can you please pull the CIL and log improvements from the tag listed
> > below?
> 
> I tried that and threw the series at fstests, which crashed all VMs with
> the following null pointer dereference:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0 
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 2 PID: 731060 Comm: mount Not tainted 5.13.0-rc4-djwx #rc4
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> RIP: 0010:xlog_cil_init+0x2f7/0x370 [xfs]
> Code: b4 7e a0 bf 1c 00 00 00 e8 c6 3b 8d e0 85 c0 78 0c c6 05 13 7f 12 00 01 e9 7b fd ff ff 4
> 2 48 c7 c6 f8 bf 7f a0 <48> 8b 39 e8 f0 24 04 00 31 ff b9 fc 05 00 00 48 c7 c2 d9 b3 7e a0
> RSP: 0018:ffffc9000776bcd0 EFLAGS: 00010286
> RAX: 00000000fffffff0 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 00000000fffffff0 RSI: ffffffffa07fbff8 RDI: 00000000ffffffff
> RBP: ffff888004cf3c00 R08: ffffffffa078fb40 R09: 0000000000000000
> R10: 000000000000000c R11: 0000000000000048 R12: ffff888052810000
> R13: 0000607f81c0b0f8 R14: ffff888004a09c00 R15: ffff888004a09c00
> FS:  00007fd2e4486840(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000000528e5004 CR4: 00000000001706a0
> Call Trace:
>  xlog_alloc_log+0x51f/0x5f0 [xfs]
>  xfs_log_mount+0x55/0x340 [xfs]
>  xfs_mountfs+0x4e4/0x9f0 [xfs]
>  xfs_fs_fill_super+0x4dd/0x7a0 [xfs]
>  ? suffix_kstrtoint.constprop.0+0xe0/0xe0 [xfs]
>  get_tree_bdev+0x175/0x280
>  vfs_get_tree+0x1a/0x80
>  ? capable+0x2f/0x50
>  path_mount+0x6fb/0xa90
>  __x64_sys_mount+0x103/0x140
>  do_syscall_64+0x3a/0x70
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fd2e46e8dde
> Code: 48 8b 0d b5 80 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0
> f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 82 80 0c 00 f7 d8 64 89 01 48
> 
> I'm pretty sure that's due to:
> 
> 	if (!xlog_cil_pcp_init) {
> 		int	ret;
> 
> 		ret = cpuhp_setup_state_nocalls(CPUHP_XFS_CIL_DEAD,
> 						"xfs/cil_pcp:dead", NULL,
> 						xlog_cil_pcp_dead);
> 		if (ret < 0) {
> 			xfs_warn(cil->xc_log->l_mp,
> 	"Failed to initialise CIL hotplug, error %d. XFS is non-functional.",
> 				ret);
> 
> Because we haven't set cil->xc_log yet.

And having now fixed that, I get tons of:

XFS (sda): Failed to initialise CIL hotplug, error -16. XFS is non-functional.
XFS: Assertion failed: 0, file: fs/xfs/xfs_log_cil.c, line: 1532
------------[ cut here ]------------
WARNING: CPU: 2 PID: 113983 at fs/xfs/xfs_message.c:112 assfail+0x3c/0x40 [xfs]
Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_REDIR

EBUSY??

--D

> 
> > 
> > Cheers,
> > 
> > Dave.
> > 
> > The following changes since commit d07f6ca923ea0927a1024dfccafc5b53b61cfecc:
> > 
> >   Linux 5.13-rc2 (2021-05-16 15:27:44 -0700)
> > 
> > are available in the Git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git tags/xfs-cil-scale-tag
> > 
> > for you to fetch changes up to 856d6346ad5e76f230906792b1eb4ba70c860748:
> > 
> >   xfs: expanding delayed logging design with background material (2021-06-04 12:42:29 +1000)
> > 
> > ----------------------------------------------------------------
> > xfs: CIL and log scalability improvements
> > 
> > Performance improvements are largely documented in the change logs of the
> > individual patches. Headline numbers are an increase in transaction rate from
> > 700k commits/s to 1.7M commits/s, and a reduction in fua/flush operations by
> > 2-3 orders of magnitude on metadata heavy workloads that don't use fsync.
> > 
> > Summary of series:
> > 
> > Patches         Modifications
> > -------         -------------
> > 1-7:            log write FUA/FLUSH optimisations
> > 8:              bug fix
> > 9-11:           Async CIL pushes
> > 12-25:          xlog_write() rework
> > 26-39:          CIL commit scalability
> > 
> > The log write FUA/FLUSH optimisations reduce the number of cache flushes
> > required to flush the CIL to the journal. It extends the old pre-delayed logging
> > ordering semantics required by writing individual transactions to the iclogs out
> > to cover then CIL checkpoint transactions rather than individual writes to the
> > iclogs. In doing so, we reduce the cache flush requirements to once per CIL
> > checkpoint rather than once per iclog write.
> > 
> > The async CIL pushes fix a pipeline limitation that only allowed a single CIL
> > push to be processed at a time. This was causing CIL checkpoint writing to
> > become CPU bound as only a single CIL checkpoint could be pushed at a time. The
> > checkpoint pipleine was designed to allow multiple pushes to be in flight at
> > once and use careful ordering of the commit records to ensure correct recovery
> > order, but the workqueue implementation didn't allow concurrent works to be run.
> > The concurrent works now extend out to 4 CIL checkpoints running at a time,
> > hence removing the CPU usage limiations without introducing new lock contention
> > issues.
> > 
> > The xlog_write() rework is long overdue. The code is complex, difficult to
> > understand, full of tricky, subtle corner cases and just generally really hard
> > to modify. This patchset reworks the xlog_write() API to reduce the processing
> > overhead of writing out long log vector chains, and factors the xlog_write()
> > code into a simple, compact fast path along with a clearer slow path to handle
> > the complex cases.
> > 
> > The CIL commit scalability patchset removes spinlocks from the transaction
> > commit fast path. These spinlocks are the performance limiting bottleneck in the
> > transaction commit path, so we apply a variety of different techniques to do
> > either atomic. lockless or per-cpu updates of the CIL tracking structures during
> > commits. This greatly increases the throughput of the the transaction commit
> > engine, moving the contention point to the log space tracking algorithms after
> > doubling throughput on 32-way workloads.
> > 
> > ----------------------------------------------------------------
> > Dave Chinner (39):
> >       xfs: log stripe roundoff is a property of the log
> >       xfs: separate CIL commit record IO
> >       xfs: remove xfs_blkdev_issue_flush
> >       xfs: async blkdev cache flush
> >       xfs: CIL checkpoint flushes caches unconditionally
> >       xfs: remove need_start_rec parameter from xlog_write()
> >       xfs: journal IO cache flush reductions
> >       xfs: Fix CIL throttle hang when CIL space used going backwards
> >       xfs: xfs_log_force_lsn isn't passed a LSN
> >       xfs: AIL needs asynchronous CIL forcing
> >       xfs: CIL work is serialised, not pipelined
> >       xfs: factor out the CIL transaction header building
> >       xfs: only CIL pushes require a start record
> >       xfs: embed the xlog_op_header in the unmount record
> >       xfs: embed the xlog_op_header in the commit record
> >       xfs: log tickets don't need log client id
> >       xfs: move log iovec alignment to preparation function
> >       xfs: reserve space and initialise xlog_op_header in item formatting
> >       xfs: log ticket region debug is largely useless
> >       xfs: pass lv chain length into xlog_write()
> >       xfs: introduce xlog_write_single()
> >       xfs:_introduce xlog_write_partial()
> >       xfs: xlog_write() no longer needs contwr state
> >       xfs: xlog_write() doesn't need optype anymore
> >       xfs: CIL context doesn't need to count iovecs
> >       xfs: use the CIL space used counter for emptiness checks
> >       xfs: lift init CIL reservation out of xc_cil_lock
> >       xfs: rework per-iclog header CIL reservation
> >       xfs: introduce per-cpu CIL tracking structure
> >       xfs: implement percpu cil space used calculation
> >       xfs: track CIL ticket reservation in percpu structure
> >       xfs: convert CIL busy extents to per-cpu
> >       xfs: Add order IDs to log items in CIL
> >       xfs: convert CIL to unordered per cpu lists
> >       xfs: convert log vector chain to use list heads
> >       xfs: move CIL ordering to the logvec chain
> >       xfs: avoid cil push lock if possible
> >       xfs: xlog_sync() manually adjusts grant head space
> >       xfs: expanding delayed logging design with background material
> > 
> >  Documentation/filesystems/xfs-delayed-logging-design.rst |  361 ++++++++++++++++++++++++++----
> >  fs/xfs/libxfs/xfs_log_format.h                           |    4 -
> >  fs/xfs/libxfs/xfs_types.h                                |    1 +
> >  fs/xfs/xfs_bio_io.c                                      |   35 +++
> >  fs/xfs/xfs_buf.c                                         |    2 +-
> >  fs/xfs/xfs_buf_item.c                                    |   39 ++--
> >  fs/xfs/xfs_dquot_item.c                                  |    2 +-
> >  fs/xfs/xfs_file.c                                        |   20 +-
> >  fs/xfs/xfs_inode.c                                       |   10 +-
> >  fs/xfs/xfs_inode_item.c                                  |   18 +-
> >  fs/xfs/xfs_inode_item.h                                  |    2 +-
> >  fs/xfs/xfs_linux.h                                       |    2 +
> >  fs/xfs/xfs_log.c                                         | 1015 +++++++++++++++++++++++++++++++++++++++---------------------------------------------
> >  fs/xfs/xfs_log.h                                         |   66 ++----
> >  fs/xfs/xfs_log_cil.c                                     |  822 ++++++++++++++++++++++++++++++++++++++++++++++++++------------------
> >  fs/xfs/xfs_log_priv.h                                    |  114 +++++-----
> >  fs/xfs/xfs_super.c                                       |   13 +-
> >  fs/xfs/xfs_super.h                                       |    1 -
> >  fs/xfs/xfs_sysfs.c                                       |    1 +
> >  fs/xfs/xfs_trace.c                                       |    1 +
> >  fs/xfs/xfs_trans.c                                       |   18 +-
> >  fs/xfs/xfs_trans.h                                       |    5 +-
> >  fs/xfs/xfs_trans_ail.c                                   |   11 +-
> >  fs/xfs/xfs_trans_priv.h                                  |    3 +-
> >  include/linux/cpuhotplug.h                               |    1 +
> >  25 files changed, 1603 insertions(+), 964 deletions(-)
> > 
> > -- 
> > Dave Chinner
> > david@fromorbit.com

  reply	other threads:[~2021-06-05  2:15 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-04  3:29 [GIT PULL] xfs: CIL and log scalability improvements Dave Chinner
2021-06-05  2:03 ` Darrick J. Wong
2021-06-05  2:15   ` Darrick J. Wong [this message]
2021-06-05 21:43     ` Dave Chinner
2021-06-05 22:29       ` Darrick J. Wong
2021-06-06 22:11     ` Dave Chinner
2021-06-07  0:00       ` Dave Chinner
2021-06-07  0:17         ` [PATCH 1/2] xfs: introduce CPU hotplug infrastructure Dave Chinner
2021-06-08  4:14           ` Darrick J. Wong
2021-06-07  0:18         ` [PATCH 2/2] xfs: introduce per-cpu CIL tracking structure Dave Chinner
2021-06-07 21:59         ` [GIT PULL] xfs: CIL and log scalability improvements Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210605021533.GH26380@locust \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.