From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from aserp1040.oracle.com ([141.146.126.69]:38906 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S933382AbdCJOi1 (ORCPT );
	Fri, 10 Mar 2017 09:38:27 -0500
Date: Fri, 10 Mar 2017 15:38:46 +0100
From: Quentin Casasnovas 
Subject: Re: XFS race on umount
Message-ID: <20170310143846.GA7971@chrystal.oracle.com>
References: <20170310120406.GU16870@chrystal>
 <20170310140535.GB27272@bfoster.bfoster>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="45Z9DzgjV8m4Oswq"
Content-Disposition: inline
In-Reply-To: <20170310140535.GB27272@bfoster.bfoster>
Sender: linux-xfs-owner@vger.kernel.org
List-ID: 
List-Id: xfs
To: Brian Foster 
Cc: Quentin Casasnovas , linux-xfs@vger.kernel.org,
	"Darrick J. Wong" 

--45Z9DzgjV8m4Oswq
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Mar 10, 2017 at 09:05:35AM -0500, Brian Foster wrote:
> On Fri, Mar 10, 2017 at 01:04:06PM +0100, Quentin Casasnovas wrote:
> > Hi Guys,
> > 
> > We've been using XFS recently on our build system because we found that
> > it scales pretty well and we have good use for the reflink feature :)
> > 
> > I think our setup is relatively unique in that, on every one of our
> > build servers, we mount hundreds of XFS filesystems from NBD devices in
> > parallel, where our build environments are stored on qcow2 images and
> > connected with qemu-nbd, then umount them when the build is finished.
> > Those qcow2 images are stored on an NFS mount, which leads to some
> > (expected) hiccups when reading/writing blocks, where sometimes the NBD
> > layer will return errors to the block layer, which in turn will pass
> > them on to XFS.  It could be due to network contention, very high load
> > on the server, or any transient error really, and in those cases XFS
> > will normally force shut down the filesystem and wait for a umount.
> > 
> > All of this is fine and is exactly the behaviour we'd expect, though it
> > turns out that we keep hitting what I think is a race condition between
> > umount and a force shutdown from XFS itself, where I have a umount
> > process completely stuck in xfs_ail_push_all_sync():
> > 
> >  [] xfs_ail_push_all_sync+0x9e/0xe0
> >  [] xfs_unmountfs+0x67/0x150
> >  [] xfs_fs_put_super+0x20/0x70
> >  [] generic_shutdown_super+0x6a/0xf0
> >  [] kill_block_super+0x2b/0x80
> >  [] deactivate_locked_super+0x47/0x80
> >  [] deactivate_super+0x49/0x70
> >  [] cleanup_mnt+0x3e/0x90
> >  [] __cleanup_mnt+0xd/0x10
> >  [] task_work_run+0x79/0xa0
> >  [] exit_to_usermode_loop+0x4f/0x75
> >  [] syscall_return_slowpath+0x5b/0x70
> >  [] entry_SYSCALL_64_fastpath+0x96/0x98
> >  [] 0xffffffffffffffff
> > 
> > This is on a v4.10.1 kernel.  I've had a look at xfs_ail_push_all_sync()
> > and I wonder if there isn't a potential lost wake-up problem, where I
> > can't see that we re-test the condition after setting the current
> > process to TASK_UNINTERRUPTIBLE and before calling schedule() (though I
> > know nothing about XFS internals...).
> > 
> > Here's an excerpt of the relevant dmesg messages, which very likely
> > happened at the same time the umount process was started:
> > 
> >  [29961.767707] block nbd74: Other side returned error (22)
> >  [29961.837518] XFS (nbd74): metadata I/O error: block 0x6471ba0 ("xfs_trans_read_buf_map") error 5 numblks 32
> >  [29961.838172] block nbd74: Other side returned error (22)
> >  [29961.838179] block nbd74: Other side returned error (22)
> >  [29961.838184] block nbd74: Other side returned error (22)
> >  [29961.838203] block nbd74: Other side returned error (22)
> >  [29961.838208] block nbd74: Other side returned error (22)
> >  [29962.259551] XFS (nbd74): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
> >  [29962.356376] XFS (nbd74): xfs_do_force_shutdown(0x8) called from line 3454 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff813bf471
> >  [29962.503003] XFS (nbd74): Corruption of in-memory data detected.  Shutting down filesystem
> >  [29963.166314] XFS (nbd74): Please umount the filesystem and rectify the problem(s)
> > 
> > I'm pretty sure the process isn't deadlocking on the spinlock, because
> > it doesn't burn any CPU and is really out of the scheduler pool.  It
> > should be noted that when I noticed the hung umount process, I manually
> > tried to unmount the corresponding XFS mountpoint and that was fine,
> > though it obviously didn't "unhang" the stuck umount process.
> > 
> 
> I'm not parsing the last bit here.. you were able to manually unmount
> the hung unmount..?

Sorry if I'm not very clear...  There is a 'umount' process hung, with the
stack trace I've pasted above.  I then tried to run another umount on the
same mount point, and that one seems to have succeeded.  I'll need to
reproduce the hang to confirm that was really the case.

> 
> That aside, could you post a snippet of the tracepoint output
> ('trace-cmd start -e "xfs:*"; cat /sys/kernel/debug/tracing/trace_pipe')
> when the problem occurs?  Also, how about the stack of the xfsaild thread
> for that specific mount ('ps aux | grep xfsaild; cat /proc/<pid>/stack').
> 

I'll try to collect the trace output requested next time it reproduces.
In the meantime, the xfsaild stack isn't telling much, afaict:

 $ ps aux | grep umount | grep -v grep
 root     63425  0.0  0.0  18688  2552 ?  DN  Mar09    0:00  umount -n /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt
 $ sudo cat /proc/63425/stack
 [] xfs_ail_push_all_sync+0x9e/0xe0
 [] xfs_unmountfs+0x67/0x150
 [] xfs_fs_put_super+0x20/0x70
 [] generic_shutdown_super+0x6a/0xf0
 [] kill_block_super+0x2b/0x80
 [] deactivate_locked_super+0x47/0x80
 [] deactivate_super+0x49/0x70
 [] cleanup_mnt+0x3e/0x90
 [] __cleanup_mnt+0xd/0x10
 [] task_work_run+0x79/0xa0
 [] exit_to_usermode_loop+0x4f/0x75
 [] syscall_return_slowpath+0x5b/0x70
 [] entry_SYSCALL_64_fastpath+0x96/0x98
 [] 0xffffffffffffffff
 $ ps aux | grep xfsaild/nbd116 | grep -v grep
 root     15256  6.8  0.0      0     0 ?  S   Mar09  110:13  [xfsaild/nbd116]
 $ sudo cat /proc/15256/stack
 [] xfsaild+0xcc/0x5b0
 [] kthread+0xf7/0x130
 [] ret_from_fork+0x29/0x40
 [] 0xffffffffffffffff
 $ sudo umount /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt
 umount: /var/lib/chroot/mount/qcow2/i386sYA9rN-mnt: not mounted
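
To make the lost wake-up concern from my first mail a bit more concrete,
below is a minimal sketch of the pattern I have in mind.  The struct foo,
the work_pending flag and the function names are all invented for
illustration; this is not the actual xfs_ail_push_all_sync() code, just the
generic shape of the race I couldn't rule out from reading it:

  /*
   * Invented example, not XFS code: shows why the condition needs to be
   * re-tested after the task state is set and before schedule() runs.
   */
  #include <linux/compiler.h>
  #include <linux/sched.h>
  #include <linux/types.h>

  struct foo {
          bool                    work_pending;   /* condition we wait on */
          struct task_struct      *waiter;        /* waiting task, set to current elsewhere */
  };

  /* Racy: the condition is only tested before the task state is set. */
  static void foo_wait_racy(struct foo *f)
  {
          while (READ_ONCE(f->work_pending)) {
                  set_current_state(TASK_UNINTERRUPTIBLE);
                  /*
                   * If the waker cleared work_pending and called
                   * wake_up_process() between the test above and
                   * set_current_state(), that wake-up hit a task that was
                   * still TASK_RUNNING and did nothing.  Nothing re-tests
                   * the condition here, so schedule() can sleep forever.
                   */
                  schedule();
          }
          __set_current_state(TASK_RUNNING);
  }

  /* Safe: re-test the condition after setting the state, before sleeping. */
  static void foo_wait_safe(struct foo *f)
  {
          for (;;) {
                  set_current_state(TASK_UNINTERRUPTIBLE);
                  if (!READ_ONCE(f->work_pending))
                          break;
                  schedule();
          }
          __set_current_state(TASK_RUNNING);
  }

  /* Waker side: clear the condition, then wake the waiter. */
  static void foo_complete(struct foo *f)
  {
          WRITE_ONCE(f->work_pending, false);
          /* the barrier in set_current_state() pairs with this wake-up */
          wake_up_process(f->waiter);
  }

In the safe variant the waiter either sees work_pending go false when it
re-tests after set_current_state(), or the waker's wake_up_process() finds
it already in TASK_UNINTERRUPTIBLE and puts it back on the run queue, so
the wake-up can't be lost.  Whether the real AIL code actually has such a
window I can't tell; it was just the first thing that came to mind when
staring at that stack.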
Thanks for the help!

Cheers,
Quentin

--45Z9DzgjV8m4Oswq--