linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "david@fromorbit.com" <david@fromorbit.com>
Cc: "bfoster@redhat.com" <bfoster@redhat.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"trondmy@kernel.org" <trondmy@kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"djwong@kernel.org" <djwong@kernel.org>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"willy@infradead.org" <willy@infradead.org>
Subject: Re: [PATCH] iomap: Address soft lockup in iomap_finish_ioend()
Date: Wed, 5 Jan 2022 20:45:05 +0000	[thread overview]
Message-ID: <c9d9b7850c6086b123b4add4de7b1992cb62f6ad.camel@hammerspace.com> (raw)
In-Reply-To: <0996c40657b5873dda5119344bf74556491e27b9.camel@hammerspace.com>

On Tue, 2022-01-04 at 21:09 -0500, Trond Myklebust wrote:
> On Tue, 2022-01-04 at 12:22 +1100, Dave Chinner wrote:
> > On Tue, Jan 04, 2022 at 12:04:23AM +0000, Trond Myklebust wrote:
> > > On Tue, 2022-01-04 at 09:03 +1100, Dave Chinner wrote:
> > > > On Sat, Jan 01, 2022 at 05:39:45PM +0000, Trond Myklebust
> > > > wrote:
> > > > > On Sat, 2022-01-01 at 14:55 +1100, Dave Chinner wrote:
> > > > > > As it is, if you are getting soft lockups in this location,
> > > > > > that's
> > > > > > an indication that the ioend chain that is being built by
> > > > > > XFS
> > > > > > is
> > > > > > way, way too long. IOWs, the completion latency problem is
> > > > > > caused
> > > > > > by
> > > > > > a lack of submit side ioend chain length bounding in
> > > > > > combination
> > > > > > with unbound completion side merging in xfs_end_bio - it's
> > > > > > not a
> > > > > > problem with the generic iomap code....
> > > > > > 
> > > > > > Let's try to address this in the XFS code, rather than hack
> > > > > > unnecessary band-aids over the problem in the generic
> > > > > > code...
> > > > > > 
> > > > > > Cheers,
> > > > > > 
> > > > > > Dave.
> > > > > 
> > > > > Fair enough. As long as someone is working on a solution,
> > > > > then
> > > > > I'm
> > > > > happy. Just a couple of things:
> > > > > 
> > > > > Firstly, we've verified that the cond_resched() in the bio
> > > > > loop
> > > > > does
> > > > > suffice to resolve the issue with XFS, which would tend to
> > > > > confirm
> > > > > what
> > > > > you're saying above about the underlying issue being the
> > > > > ioend
> > > > > chain
> > > > > length.
> > > > > 
> > > > > Secondly, note that we've tested this issue with a variety of
> > > > > older
> > > > > kernels, including 4.18.x, 5.1.x and 5.15.x, so please bear
> > > > > in
> > > > > mind
> > > > > that it would be useful for any fix to be backward portable
> > > > > through
> > > > > the
> > > > > stable mechanism.
> > > > 
> > > > The infrastructure hasn't changed that much, so whatever the
> > > > result
> > > > is it should be backportable.
> > > > 
> > > > As it is, is there a specific workload that triggers this
> > > > issue?
> > > > Or
> > > > a specific machine config (e.g. large memory, slow storage).
> > > > Are
> > > > there large fragmented files in use (e.g. randomly written VM
> > > > image
> > > > files)? There are a few factors that can exacerbate the ioend
> > > > chain
> > > > lengths, so it would be handy to have some idea of what is
> > > > actually
> > > > triggering this behaviour...
> > > > 
> > > > Cheers,
> > > > 
> > > > Dave.
> > > 
> > > We have different reproducers. The common feature appears to be
> > > the
> > > need for a decently fast box with fairly large memory (128GB in
> > > one
> > > case, 400GB in the other). It has been reproduced with HDs, SSDs
> > > and
> > > NVME systems.
> > > 
> > > On the 128GB box, we had it set up with 10+ disks in a JBOD
> > > configuration and were running the AJA system tests.
> > > 
> > > On the 400GB box, we were just serially creating large (> 6GB)
> > > files
> > > using fio and that was occasionally triggering the issue. However
> > > doing
> > > an strace of that workload to disk reproduced the problem faster
> > > :-
> > > ).
> > 
> > Ok, that matches up with the "lots of logically sequential dirty
> > data on a single inode in cache" vector that is required to create
> > really long bio chains on individual ioends.
> > 
> > Can you try the patch below and see if addresses the issue?
> > 
> 
> That patch does seem to fix the soft lockups.
> 

Oops... Strike that, apparently our tests just hit the following when
running on AWS with that patch.

[Wed Jan  5 20:34:46 2022] watchdog: BUG: soft lockup - CPU#4 stuck for
48s! [kworker/4:1:31315]
[Wed Jan  5 20:34:46 2022] Modules linked in: nfsv3 auth_name
bpf_preload xt_nat veth nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4
dns_resolver nfsidmap nfs fscache netfs dm_multipath nfsd auth_rpcgss
nfs_acl lockd grace sunrpc xt_MASQUERADE nf_conntrack_netlink
xt_addrtype br_netfilter bridge stp llc overlay xt_sctp
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp
nf_conntrack_ftp xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle
ip6table_security ip6table_raw iptable_nat nf_nat iptable_mangle
iptable_security iptable_raw nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
ip_set nfnetlink ip6table_filter ip6_tables iptable_filter bonding tls
ipmi_msghandler intel_rapl_msr intel_rapl_common isst_if_common nfit
libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_piix4
rapl ip_tables xfs nvme crc32c_intel ena nvme_core
[Wed Jan  5 20:34:46 2022] CPU: 4 PID: 31315 Comm: kworker/4:1 Kdump:
loaded Tainted: G        W    L    5.15.12-200.pd.17718.el7.x86_64 #1
[Wed Jan  5 20:34:46 2022] Hardware name: Amazon EC2 r5b.2xlarge/, BIOS
1.0 10/16/2017
[Wed Jan  5 20:34:46 2022] Workqueue: xfs-conv/nvme1n1 xfs_end_io [xfs]
[Wed Jan  5 20:34:46 2022] RIP:
0010:_raw_spin_unlock_irqrestore+0x1c/0x20
[Wed Jan  5 20:34:46 2022] Code: 92 cc cc cc cc cc cc cc cc cc cc cc cc
cc 0f 1f 44 00 00 c6 07 00 0f 1f 40 00 f7 c6 00 02 00 00 75 01 c3 fb 66
0f 1f 44 00 00 <c3> 0f 1f 00 0f 1f 44 00 00 8b 07 a9 ff 01 00 00 75 21
b8 00 02 00
[Wed Jan  5 20:34:46 2022] RSP: 0018:ffffb9738983fd10 EFLAGS: 00000206
[Wed Jan  5 20:34:46 2022] RAX: 0000000000000001 RBX: 0000000000000db0
RCX: fffffffffffff90f
[Wed Jan  5 20:34:46 2022] RDX: ffffffffa3808938 RSI: 0000000000000206
RDI: ffffffffa3808930
[Wed Jan  5 20:34:46 2022] RBP: 0000000000000206 R08: ffffb9738601fc80
R09: ffffb9738601fc80
[Wed Jan  5 20:34:46 2022] R10: 0000000000000000 R11: 0000000000000000
R12: ffffffffa3808930
[Wed Jan  5 20:34:46 2022] R13: ffffdda3db40dd40 R14: ffff930e1c62f538
R15: ffffdda3db40dd40
[Wed Jan  5 20:34:46 2022] FS:  0000000000000000(0000)
GS:ffff93164dd00000(0000) knlGS:0000000000000000
[Wed Jan  5 20:34:46 2022] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Wed Jan  5 20:34:46 2022] CR2: 00007ffe41f5c080 CR3: 00000005a5810001
CR4: 00000000007706e0
[Wed Jan  5 20:34:46 2022] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[Wed Jan  5 20:34:46 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[Wed Jan  5 20:34:46 2022] PKRU: 55555554
[Wed Jan  5 20:34:46 2022] Call Trace:
[Wed Jan  5 20:34:46 2022]  <TASK>
[Wed Jan  5 20:34:46 2022]  wake_up_page_bit+0x79/0xe0
[Wed Jan  5 20:34:46 2022]  end_page_writeback+0xc4/0xf0
[Wed Jan  5 20:34:46 2022]  iomap_finish_ioend+0x130/0x260
[Wed Jan  5 20:34:46 2022]  iomap_finish_ioends+0x71/0x90
[Wed Jan  5 20:34:46 2022]  xfs_end_ioend+0x5a/0x120 [xfs]
[Wed Jan  5 20:34:46 2022]  xfs_end_io+0xa1/0xc0 [xfs]
[Wed Jan  5 20:34:46 2022]  process_one_work+0x1f1/0x390
[Wed Jan  5 20:34:46 2022]  worker_thread+0x53/0x3e0
[Wed Jan  5 20:34:46 2022]  ? process_one_work+0x390/0x390
[Wed Jan  5 20:34:46 2022]  kthread+0x127/0x150
[Wed Jan  5 20:34:46 2022]  ? set_kthread_struct+0x40/0x40
[Wed Jan  5 20:34:46 2022]  ret_from_fork+0x22/0x30
[Wed Jan  5 20:34:46 2022]  </TASK>


So it was harder to hit, but we still did eventually.

-- 
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022
​
www.hammer.space


  reply	other threads:[~2022-01-05 20:45 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-30 19:35 [PATCH] iomap: Address soft lockup in iomap_finish_ioend() trondmy
2021-12-30 21:24 ` Jens Axboe
2021-12-30 22:25   ` Trond Myklebust
2021-12-30 22:27     ` Jens Axboe
2021-12-30 22:55       ` Trond Myklebust
2021-12-31  1:42 ` Matthew Wilcox
2021-12-31  6:16   ` Trond Myklebust
2022-01-01  3:55     ` Dave Chinner
2022-01-01 17:39       ` Trond Myklebust
2022-01-03 22:03         ` Dave Chinner
2022-01-04  0:04           ` Trond Myklebust
2022-01-04  1:22             ` Dave Chinner
2022-01-04  3:01               ` Trond Myklebust
2022-01-04  7:08               ` hch
2022-01-04 18:08                 ` Matthew Wilcox
2022-01-04 18:14                   ` hch
2022-01-04 19:22                     ` Darrick J. Wong
2022-01-04 21:52                       ` Dave Chinner
2022-01-04 23:12                         ` Darrick J. Wong
2022-01-05  2:10                           ` Dave Chinner
2022-01-05 13:56                             ` Brian Foster
2022-01-05 22:04                               ` Dave Chinner
2022-01-06 16:44                                 ` Brian Foster
2022-01-10  8:18                                   ` Dave Chinner
2022-01-10 17:45                                     ` Brian Foster
2022-01-10 18:11                                       ` hch
2022-01-11 14:33                                       ` Trond Myklebust
2022-01-05 13:42                           ` hch
2022-01-04 21:16                 ` Dave Chinner
2022-01-05 13:43                   ` hch
2022-01-05 22:34                     ` Dave Chinner
2022-01-05  2:09               ` Trond Myklebust
2022-01-05 20:45                 ` Trond Myklebust [this message]
2022-01-05 22:48                   ` Dave Chinner
2022-01-05 23:29                     ` Trond Myklebust
2022-01-06  0:01                     ` Darrick J. Wong
2022-01-09 23:09                       ` Dave Chinner
2022-01-06 18:36                     ` Trond Myklebust
2022-01-06 18:38                       ` Trond Myklebust
2022-01-06 20:07                       ` Brian Foster
2022-01-07  3:08                         ` Trond Myklebust
2022-01-07 15:15                           ` Brian Foster
2022-01-09 23:34                       ` Dave Chinner
2022-01-10 23:37                       ` Dave Chinner
2022-01-11  0:08                         ` Dave Chinner
2022-01-13 17:01                         ` Trond Myklebust
2022-01-17 17:24                           ` Trond Myklebust
2022-01-17 17:36                             ` Darrick J. Wong
2022-01-04 13:36         ` Brian Foster
2022-01-04 19:23           ` Darrick J. Wong
2022-01-05  2:31 ` [iomap] f5934dda54: BUG:sleeping_function_called_from_invalid_context_at_fs/iomap/buffered-io.c kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9d9b7850c6086b123b4add4de7b1992cb62f6ad.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=axboe@kernel.dk \
    --cc=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=trondmy@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).