From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from relay.sgi.com (relay2.corp.sgi.com [137.38.102.29])
	by oss.sgi.com (Postfix) with ESMTP id 5B7577CA4
	for ; Wed, 3 Aug 2016 01:17:00 -0500 (CDT)
Received: from cuda.sgi.com (cuda1.sgi.com [192.48.157.11])
	by relay2.corp.sgi.com (Postfix) with ESMTP id 02FCB304032
	for ; Tue, 2 Aug 2016 23:16:56 -0700 (PDT)
Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net [150.101.137.143])
	by cuda.sgi.com with ESMTP id orlvYXDYnaZEoALv
	for ; Tue, 02 Aug 2016 23:16:50 -0700 (PDT)
Date: Wed, 3 Aug 2016 16:14:30 +1000
From: Dave Chinner
Subject: [current tot] XFS: Assertion failed: bp->b_flags & XBF_ASYNC, file: fs/xfs/xfs_buf.c, line: 118
Message-ID: <20160803061430.GS16044@dastard>
MIME-Version: 1.0
Content-Disposition: inline
List-Id: XFS Filesystem from SGI
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: xfs-bounces@oss.sgi.com
Sender: xfs-bounces@oss.sgi.com
To: xfs@oss.sgi.com
Cc: bfoster@redhat.com

Hi Brian,

It seems I can hit this fairly often on a single CPU, 1GB ram VM in
generic/306:

[12279.804308] XFS: Assertion failed: bp->b_flags & XBF_ASYNC, file: fs/xfs/xfs_buf.c, line: 118
[12279.806499] ------------[ cut here ]------------
[12279.807560] kernel BUG at fs/xfs/xfs_message.c:113!
[12279.808717] invalid opcode: 0000 [#1] PREEMPT SMP
[12279.809790] Modules linked in:
[12279.810526] CPU: 0 PID: 8181 Comm: xfs_quota Tainted: G W 4.7.0-dgc+ #864
[12279.812362] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[12279.814372] task: ffff880028ae0000 task.stack: ffff88000ec98000
[12279.815707] RIP: 0010:[] [] assfail+0x22/0x30
[12279.817477] RSP: 0018:ffff88000ec9bc28 EFLAGS: 00010282
[12279.818689] RAX: 00000000ffffffea RBX: ffff880008b51930 RCX: 0000000000000021
[12279.820318] RDX: ffff88000ec9bb50 RSI: 000000000000000a RDI: ffffffff823b0e6c
[12279.822036] RBP: ffff88000ec9bc28 R08: 0000000000000000 R09: 0000000000000000
[12279.823643] R10: 000000000000000a R11: f000000000000000 R12: ffff880008b518c0
[12279.825277] R13: ffff88003d51c6e0 R14: ffff88003d51c600 R15: 0000000000000000
[12279.826892] FS: 00007f67553c8700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[12279.828734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12279.830013] CR2: 00007f6207369008 CR3: 000000001445f000 CR4: 00000000000006f0
[12279.831621] Stack:
[12279.832074] ffff88000ec9bc60 ffffffff8150f498 ffff880039c37920 ffff880034681200
[12279.833823] ffff88003d439000 ffff880039c37a98 ffff880039c37ad8 ffff88000ec9bca8
[12279.835528] ffffffff8154d705 ffff880008b518c0 ffff880008b518c0 0000000000000000
[12279.837257] Call Trace:
[12279.837834] [] xfs_buf_rele+0x2b8/0x3b0
[12279.839078] [] xfs_qm_dqpurge+0x1d5/0x220
[12279.840367] [] ? xfs_qm_shrink_count+0x20/0x20
[12279.841765] [] xfs_qm_dquot_walk+0x100/0x170
[12279.843122] [] xfs_qm_dqpurge_all+0x52/0x70
[12279.844458] [] xfs_qm_scall_quotaoff+0x129/0x3f0
[12279.845887] [] xfs_quota_disable+0x3d/0x50
[12279.847197] [] SyS_quotactl+0x3c2/0x870
[12279.848435] [] ? SYSC_newstat+0x2f/0x40
[12279.849703] [] entry_SYSCALL_64_fastpath+0x1a/0xa4
[12279.851157] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 c7 c6 a8 03 40 82 48 89 e5 48 89 fa 31 c0 31 ff e8 4e fa ff ff <0f> 0b 66 90 66
[12279.856516] RIP [] assfail+0x22/0x30
[12279.857739] RSP
[12279.859248] ---[ end trace 496ea0918ba4a5b4 ]---

I've seen it once also from inode reclaim, and AFAICT, the reason is
the same both times: they call xfs_bwrite().

It looks to me like we get an XBF_INFLIGHT buffer, the reclaim code
then tries to flush another object on the buffer, takes a reference
to it and so prevents the XBF_INFLIGHT IO accounting from being
decremented when the IO completes and releases. It then flushes the
object to the buffer and calls xfs_bwrite() which clears XBF_ASYNC
and it writes the buffer again. Once this completes, it calls
xfs_buf_rele(), which drops the last reference and we try to release
the XBF_INFLIGHT accounting. That then assert fails because
XBF_ASYNC is not set.

It looks to me like we should just remove the assert - I forgot
about this particular corner case. Can you have a quick look and
check whether my analysis above is correct or whether I've missed
something else here?

-Dave.
-- 
Dave Chinner
david@fromorbit.com
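
For reference, the sequence analysed above can be replayed in a small
userspace toy model. This is only a sketch under stated assumptions:
every toy_* name, the TOY_* flag values and the struct layout are
hypothetical stand-ins and are not the kernel's definitions. It models
nothing beyond what the analysis says (a reference held across IO
completion, xfs_bwrite() clearing XBF_ASYNC, and the last release
dropping the in-flight accounting).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this toy model; not the kernel's. */
#define TOY_ASYNC    (1u << 0)  /* stands in for XBF_ASYNC */
#define TOY_INFLIGHT (1u << 1)  /* stands in for XBF_INFLIGHT */

struct toy_buf {
	unsigned int flags;
	int hold;      /* reference count */
	int io_count;  /* in-flight IO accounting */
};

/* Async submission: mark the buffer in flight and account it once. */
static void toy_submit_async(struct toy_buf *bp)
{
	bp->flags |= TOY_ASYNC;
	if (!(bp->flags & TOY_INFLIGHT)) {
		bp->flags |= TOY_INFLIGHT;
		bp->io_count++;
	}
}

/* Synchronous write: per the analysis, this clears the async flag. */
static void toy_bwrite(struct toy_buf *bp)
{
	bp->flags &= ~TOY_ASYNC;
}

/*
 * Drop one reference. On the last reference the in-flight accounting
 * is released. Returns true if an "async flag must be set" check at
 * that point would have fired.
 */
static bool toy_rele(struct toy_buf *bp)
{
	bool would_assert = false;

	assert(bp->hold > 0);
	if (--bp->hold == 0 && (bp->flags & TOY_INFLIGHT)) {
		would_assert = !(bp->flags & TOY_ASYNC);
		bp->flags &= ~TOY_INFLIGHT;
		bp->io_count--;
	}
	return would_assert;
}

/* Replay the sequence; true if only the final release trips the check. */
static bool toy_replay(struct toy_buf *bp)
{
	bool first, last;

	bp->flags = 0;
	bp->hold = 1;            /* reference owned by the async IO */
	bp->io_count = 0;

	toy_submit_async(bp);    /* buffer is now accounted as in flight */
	bp->hold++;              /* purge/reclaim takes its own reference */
	first = toy_rele(bp);    /* IO completes: not the last reference */
	toy_bwrite(bp);          /* sync rewrite clears the async flag */
	last = toy_rele(bp);     /* last reference: in flight, not async */

	return !first && last;
}
```

In this model toy_replay() returns true: the first release leaves the
accounting alone because a reference is still held, and the final
release finds the in-flight flag set with the async flag clear. The
accounting itself still balances (io_count returns to 0), which fits
the suggestion that the assert, not the accounting, is what is wrong.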