Date: Fri, 10 Mar 2017 13:04:06 +0100
From: Quentin Casasnovas
Subject: XFS race on umount
Message-ID: <20170310120406.GU16870@chrystal>
To: linux-xfs@vger.kernel.org
Cc: "Darrick J. Wong"

Hi Guys,

We've been using XFS recently on our build system because we found that it
scales pretty well and we have good use for the reflink feature :)

I think our setup is relatively unique: on every one of our build servers,
we mount hundreds of XFS filesystems from NBD devices in parallel. Our
build environments are stored on qcow2 images connected with qemu-nbd, and
we umount them when the build is finished.

Those qcow2 images are stored on an NFS mount, which leads to some
(expected) hiccups when reading/writing blocks: sometimes the NBD layer
will return errors to the block layer, which in turn passes them on to
XFS. This could be due to network contention, very high load on the
server, or really any transient error, and in those cases XFS will
normally force shut down the filesystem and wait for an umount.

All of this is fine and exactly the behaviour we'd expect, but it turns
out we keep hitting what I think is a race condition between umount and a
forced shutdown from XFS itself, where a umount process ends up completely
stuck in xfs_ail_push_all_sync():

 [] xfs_ail_push_all_sync+0x9e/0xe0
 [] xfs_unmountfs+0x67/0x150
 [] xfs_fs_put_super+0x20/0x70
 [] generic_shutdown_super+0x6a/0xf0
 [] kill_block_super+0x2b/0x80
 [] deactivate_locked_super+0x47/0x80
 [] deactivate_super+0x49/0x70
 [] cleanup_mnt+0x3e/0x90
 [] __cleanup_mnt+0xd/0x10
 [] task_work_run+0x79/0xa0
 [] exit_to_usermode_loop+0x4f/0x75
 [] syscall_return_slowpath+0x5b/0x70
 [] entry_SYSCALL_64_fastpath+0x96/0x98
 [] 0xffffffffffffffff

This is on a v4.10.1 kernel. I've had a look at xfs_ail_push_all_sync()
and I wonder if there isn't a potential lost wake-up problem: I can't see
where we re-test the condition after setting the current process to
TASK_UNINTERRUPTIBLE and before calling schedule() (though I know nothing
about XFS internals...).
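To illustrate what I mean, here's a minimal sketch of the generic
wait-queue idiom I'd expect to see. This is not the actual
xfs_ail_push_all_sync() code; the names (wq, done, wait_until_done,
mark_done) are made up for illustration. The point is that the condition
is re-tested after prepare_to_wait() has set the task state, so a wake-up
racing with the test can't be lost:

/*
 * Minimal sketch of the lost-wakeup-safe wait idiom -- NOT the actual
 * XFS code; 'wq' and 'done' are made-up names for illustration.
 * The condition is re-tested *after* prepare_to_wait() has set the
 * task state, so a wake_up() racing with the test simply leaves the
 * task in TASK_RUNNING and schedule() returns immediately.
 */
#include <linux/sched.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(wq);
static bool done;

static void wait_until_done(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if (done)	/* re-test after setting the task state */
			break;
		schedule();
	}
	finish_wait(&wq, &wait);
}

/* Waker side: set the condition first, then wake the waiter. */
static void mark_done(void)
{
	done = true;
	wake_up(&wq);
}

If the AIL wait loop doesn't follow that shape, it seems to me a wake-up
issued during a forced shutdown could be missed, which would match the
hang we're seeing -- but again, I may well be misreading the code.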
Here's an excerpt of the relevant dmesg messages, which very likely
appeared around the same time the umount process was started:

 [29961.767707] block nbd74: Other side returned error (22)
 [29961.837518] XFS (nbd74): metadata I/O error: block 0x6471ba0 ("xfs_trans_read_buf_map") error 5 numblks 32
 [29961.838172] block nbd74: Other side returned error (22)
 [29961.838179] block nbd74: Other side returned error (22)
 [29961.838184] block nbd74: Other side returned error (22)
 [29961.838203] block nbd74: Other side returned error (22)
 [29961.838208] block nbd74: Other side returned error (22)
 [29962.259551] XFS (nbd74): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
 [29962.356376] XFS (nbd74): xfs_do_force_shutdown(0x8) called from line 3454 of file fs/xfs/xfs_inode.c.  Return address = 0xffffffff813bf471
 [29962.503003] XFS (nbd74): Corruption of in-memory data detected.  Shutting down filesystem
 [29963.166314] XFS (nbd74): Please umount the filesystem and rectify the problem(s)

I'm pretty sure the process isn't deadlocking on the spinlock, because it
isn't burning any CPU and is genuinely off the scheduler's run queue.

It should be noted that when I noticed the hung umount process, I manually
tried to unmount the corresponding XFS mountpoint and that went fine,
though it obviously didn't "unhang" the stuck umount process.

Any help would be appreciated :)

Thanks,
Quentin