From: Xiong Zhou <xzhou@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Xiong Zhou <xzhou@redhat.com>,
linux-next@vger.kernel.org, viro@zeniv.linux.org.uk,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: Linux-next parallel cp workload hang
Date: Wed, 18 May 2016 19:46:17 +0800 [thread overview]
Message-ID: <20160518114617.GC6551@dhcp12-144.nay.redhat.com> (raw)
In-Reply-To: <20160518095409.GX26977@dastard>
[-- Attachment #1: Type: text/plain, Size: 3796 bytes --]
Hi,
On Wed, May 18, 2016 at 07:54:09PM +1000, Dave Chinner wrote:
> On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> > Hi,
> >
> > On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote:
> > > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> > > > Hi,
> > > >
> > > > Parallel cp workload (xfstests generic/273) hangs like below.
> > > > It's reproducible with a small chance, less than 1/100 I think.
> > > >
> > > > I have hit this in the linux-next 20160504, 0506 and 0510 trees,
> > > > testing on xfs with a loop or block device. Ext4 survived several
> > > > rounds of testing.
> > > >
> > > > The linux-next 20160510 tree hangs within 500 rounds of testing,
> > > > several times. The same tree with the vfs parallel lookup patchset
> > > > reverted survived 900 rounds of testing. The reverted commits are
> > > > attached.
> > >
> > > What hardware?
> >
> > An HP prototype host.
>
> description? cpus, memory, etc? I want to have some idea of what
> hardware I need to reproduce this...
#lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 2596.918
BogoMIPS: 5208.33
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
#free -m
total used free shared buff/cache available
Mem: 31782 623 27907 9 3251 30491
Swap: 10239 0 10239
>
> xfs_info from the scratch filesystem would also be handy.
meta-data=/dev/pmem1 isize=256 agcount=4, agsize=131072 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=524288, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=2560, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
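(For anyone trying to match this setup: a scratch filesystem with the same geometry as the xfs_info above could be recreated with something like the following sketch. The device path is just the one from the report, and the exact mkfs.xfs flag set is an assumption derived from the reported values.)

```shell
# Sketch only: recreate a scratch fs matching the xfs_info output above
# (crc=0, finobt=0, isize=256, ftype=0, 4k sectors/blocks, 524288 data
# blocks, 2560 log blocks, 4 AGs). This destroys existing data on the
# device; /dev/pmem1 is simply the device named in the report.
mkfs.xfs -f -m crc=0,finobt=0 -i size=256 -n ftype=0 \
    -s size=4096 -b size=4096 -d size=524288b,agcount=4 \
    -l size=2560b /dev/pmem1
```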
>
> > > Can you reproduce this with CONFIG_XFS_DEBUG=y set? if you can, and
> > > it doesn't trigger any warnings or asserts, can you then try to
> > > reproduce it while tracing the following events:
> > >
> > > xfs_buf_lock
> > > xfs_buf_lock_done
> > > xfs_buf_trylock
> > > xfs_buf_unlock
> > >
> > > So we might be able to see if there's an unexpected buffer
> > > locking/state pattern occurring when the hang occurs?
> >
> > Yes, I've reproduced this with both CONFIG_XFS_DEBUG=y and the tracers
> > enabled. There is some trace output for a while after the hang.
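(For reference, the four tracepoints Dave listed can be enabled by hand through tracefs; this is only a sketch, and the tracefs mount point and root access are assumptions about the test host.)

```shell
# Sketch: enable the four xfs_buf tracepoints manually (needs root).
# Path assumes tracefs is mounted at /sys/kernel/debug/tracing; on newer
# kernels it may be /sys/kernel/tracing instead.
T=/sys/kernel/debug/tracing
for e in xfs_buf_lock xfs_buf_lock_done xfs_buf_trylock xfs_buf_unlock; do
    echo 1 > "$T/events/xfs/$e/enable"
done
cat "$T/trace_pipe"    # stream events as they occur
```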
>
> I'm not actually interested in the trace after the hang - I'm
> interested in what happened leading up to the hang. The output
> you've given me tells me that the directory block at that offset is
> locked, but nothing in the trace tells me what locked it.
>
> Can I suggest using trace-cmd to record the events, then when the
> test hangs kill the check process so that trace-cmd terminates and
> gathers the events. Then dump the report to a text file and attach
> that?
Sure. The trace report, dmesg, and ps axjf output after Ctrl+C are attached.
Thanks for the instructions and your patience.
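(Roughly, the recording workflow Dave describes would look like the sketch below; the check invocation and the output filename are illustrative, not the exact commands used.)

```shell
# Sketch: record the four xfs_buf events around the test run, kill the
# check process when the test hangs so trace-cmd gathers its buffers,
# then dump a plain-text report to attach.
trace-cmd record -e xfs:xfs_buf_lock -e xfs:xfs_buf_lock_done \
    -e xfs:xfs_buf_trylock -e xfs:xfs_buf_unlock \
    ./check generic/273        # interrupt this once the hang occurs
trace-cmd report > g273-trace-report.txt
```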
Xiong
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
[-- Attachment #2: g273-trace-report.tar.gz --]
[-- Type: application/gzip, Size: 244506 bytes --]
Thread overview: 9+ messages
2016-05-18 1:46 Linux-next parallel cp workload hang Xiong Zhou
2016-05-18 5:56 ` Dave Chinner
2016-05-18 8:31 ` Xiong Zhou
2016-05-18 9:54 ` Dave Chinner
2016-05-18 11:46 ` Xiong Zhou [this message]
2016-05-18 14:17 ` Dave Chinner
2016-05-18 23:02 ` Dave Chinner
2016-05-19 6:22 ` Xiong Zhou
[not found] <20160518013802.GA4679@ZX.nay.redhat.com>
2016-05-18 2:06 ` Al Viro