From: Xiong Zhou <xzhou@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Xiong Zhou <xzhou@redhat.com>,
linux-next@vger.kernel.org, viro@zeniv.linux.org.uk,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: Linux-next parallel cp workload hang
Date: Wed, 18 May 2016 19:46:17 +0800 [thread overview]
Message-ID: <20160518114617.GC6551@dhcp12-144.nay.redhat.com> (raw)
In-Reply-To: <20160518095409.GX26977@dastard>
[-- Attachment #1: Type: text/plain, Size: 3796 bytes --]
Hi,
On Wed, May 18, 2016 at 07:54:09PM +1000, Dave Chinner wrote:
> On Wed, May 18, 2016 at 04:31:50PM +0800, Xiong Zhou wrote:
> > Hi,
> >
> > On Wed, May 18, 2016 at 03:56:34PM +1000, Dave Chinner wrote:
> > > On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> > > > Hi,
> > > >
> > > > Parallel cp workload (xfstests generic/273) hangs like below.
> > > > It's reproducible with a small chance, less than 1/100 I think.
> > > >
> > > > I have hit this in the linux-next 20160504, 0506 and 0510 trees,
> > > > testing on xfs with a loop or block device. Ext4 survived several
> > > > rounds of testing.
> > > >
> > > > The linux-next 20160510 tree hangs within 500 rounds of testing,
> > > > several times. The same tree with the vfs parallel lookup patchset
> > > > reverted survived 900 rounds of testing. The reverted commits are
> > > > attached.
> > >
> > > What hardware?
> >
> > An HP prototype host.
>
> description? cpus, memory, etc? I want to have some idea of what
> hardware I need to reproduce this...
#lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 2596.918
BogoMIPS: 5208.33
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
#free -m
total used free shared buff/cache available
Mem: 31782 623 27907 9 3251 30491
Swap: 10239 0 10239
>
> xfs_info from the scratch filesystem would also be handy.
meta-data=/dev/pmem1 isize=256 agcount=4, agsize=131072 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=524288, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=2560, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
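(For anyone trying to match this setup: a scratch filesystem with the same geometry as the xfs_info above could be recreated with something like the following sketch. The device path is just the one from the report, and the exact mkfs.xfs flag set is an assumption derived from the reported values.)

```shell
# Sketch only: recreate a scratch fs matching the xfs_info output above
# (crc=0, finobt=0, isize=256, ftype=0, 4k sectors/blocks, 524288 data
# blocks, 2560 log blocks, 4 AGs). This destroys existing data on the
# device; /dev/pmem1 is simply the device named in the report.
mkfs.xfs -f -m crc=0,finobt=0 -i size=256 -n ftype=0 \
    -s size=4096 -b size=4096 -d size=524288b,agcount=4 \
    -l size=2560b /dev/pmem1
```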
>
> > > Can you reproduce this with CONFIG_XFS_DEBUG=y set? if you can, and
> > > it doesn't trigger any warnings or asserts, can you then try to
> > > reproduce it while tracing the following events:
> > >
> > > xfs_buf_lock
> > > xfs_buf_lock_done
> > > xfs_buf_trylock
> > > xfs_buf_unlock
> > >
> > > So we might be able to see if there's an unexpected buffer
> > > locking/state pattern occurring when the hang occurs?
> >
> > Yes, I've reproduced this with both CONFIG_XFS_DEBUG=y and the tracers
> > enabled. There is some trace output for a while after the hang.
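(For reference, the four tracepoints Dave listed can be enabled by hand through tracefs; this is only a sketch, and the tracefs mount point and root access are assumptions about the test host.)

```shell
# Sketch: enable the four xfs_buf tracepoints manually (needs root).
# Path assumes tracefs is mounted at /sys/kernel/debug/tracing; on newer
# kernels it may be /sys/kernel/tracing instead.
T=/sys/kernel/debug/tracing
for e in xfs_buf_lock xfs_buf_lock_done xfs_buf_trylock xfs_buf_unlock; do
    echo 1 > "$T/events/xfs/$e/enable"
done
cat "$T/trace_pipe"    # stream events as they occur
```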
>
> I'm not actually interested in the trace after the hang - I'm
> interested in what happened leading up to the hang. The output
> you've given me tells me that the directory block at that offset is
> locked, but nothing in the trace tells me what locked it.
>
> Can I suggest using trace-cmd to record the events, then when the
> test hangs kill the check process so that trace-cmd terminates and
> gathers the events. Then dump the report to a text file and attach
> that?
Sure. The trace report, dmesg, and ps axjf output after Ctrl+C are attached.
Thanks for the instructions and your patience.
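(Roughly, the recording workflow Dave describes would look like the sketch below; the check invocation and the output filename are illustrative, not the exact commands used.)

```shell
# Sketch: record the four xfs_buf events around the test run, kill the
# check process when the test hangs so trace-cmd gathers its buffers,
# then dump a plain-text report to attach.
trace-cmd record -e xfs:xfs_buf_lock -e xfs:xfs_buf_lock_done \
    -e xfs:xfs_buf_trylock -e xfs:xfs_buf_unlock \
    ./check generic/273        # interrupt this once the hang occurs
trace-cmd report > g273-trace-report.txt
```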
Xiong
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
[-- Attachment #2: g273-trace-report.tar.gz --]
[-- Type: application/gzip, Size: 244506 bytes --]
Thread overview: 9+ messages
2016-05-18 1:46 Linux-next parallel cp workload hang Xiong Zhou
2016-05-18 5:56 ` Dave Chinner
2016-05-18 8:31 ` Xiong Zhou
2016-05-18 9:54 ` Dave Chinner
2016-05-18 11:46 ` Xiong Zhou [this message]
2016-05-18 14:17 ` Dave Chinner
2016-05-18 23:02 ` Dave Chinner
2016-05-19 6:22 ` Xiong Zhou
[not found] <20160518013802.GA4679@ZX.nay.redhat.com>
2016-05-18 2:06 ` Al Viro