From: Brian Foster
Date: Mon, 6 Feb 2017 12:59:08 -0500
Subject: Re: [BUG] xfs/305 hangs 4.10-rc4 kernel
Message-ID: <20170206175908.GI57865@bfoster.bfoster>
References: <20170125063943.GF1859@eguan.usersys.redhat.com> <20170125145215.GC28388@bfoster.bfoster> <20170126032950.GM1859@eguan.usersys.redhat.com> <20170126184427.GA39683@bfoster.bfoster> <20170127025219.GR1859@eguan.usersys.redhat.com> <20170130181224.GC8737@bfoster.bfoster> <20170130215952.GA11230@bfoster.bfoster> <20170204114700.GH1859@eguan.usersys.redhat.com>
In-Reply-To: <20170204114700.GH1859@eguan.usersys.redhat.com>
To: Eryu Guan
Cc: linux-xfs@vger.kernel.org

On Sat, Feb 04, 2017 at 07:47:00PM +0800, Eryu Guan wrote:
> On Mon, Jan 30, 2017 at 04:59:52PM -0500, Brian Foster wrote:
> >
> > I reproduced an xfs_wait_buftarg() unmount hang once that looks like a
> > separate issue (it occurs after the test, long after quotaoff has
> > completed). I haven't reproduced that one again, nor the original hang,
> > in 100+ iterations so far. Care to give the following a whirl in your
> > environment? Thanks.
>
> I applied your test patch on top of the stock 4.10-rc4 kernel and hit
> the xfs/305 hang at the 82nd iteration. I've attached the dmesg and
> sysrq-w logs.
> You can log in to the same RH internal test host if that's helpful; I
> left the host running in the hung state.

Ok, that's not too surprising. It does look like we are in some kind of
livelock: xfs_quota is spinning on the dqpurge and two or three fsstress
workers are spinning on xfs_iget() retries via bulkstat. I'm going to
hard reboot this box and restart the test with some customized
tracepoints to try to get more data.

Brian

> Thanks,
> Eryu
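For reference, one way to capture tracepoint data across a reproduction run is the stock tracefs interface. The sketch below is an assumption about the setup, not the actual debugging session: it enables the whole in-tree xfs:* event group (the "customized tracepoints" mentioned above would be kernel patches and are not shown), and the tracefs path and output location are placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: record stock xfs:* tracepoint events around a
# reproducer run. Requires root and CONFIG_TRACING; the tracefs mount
# point and output path below are assumptions about the test host.
TRACEFS=/sys/kernel/debug/tracing

if [ -w "$TRACEFS/events/xfs/enable" ]; then
    echo 0 > "$TRACEFS/tracing_on"
    : > "$TRACEFS/trace"                  # clear any stale trace data
    echo 1 > "$TRACEFS/events/xfs/enable" # enable every xfs:* event
    echo 1 > "$TRACEFS/tracing_on"

    # ... run the xfs/305 reproducer here ...

    echo 0 > "$TRACEFS/tracing_on"
    cat "$TRACEFS/trace" > /tmp/xfs-305-trace.log
    STATUS_MSG="saved"
    echo "trace saved to /tmp/xfs-305-trace.log"
else
    # Not root, or tracefs unavailable; nothing to capture.
    STATUS_MSG="skipped"
    echo "xfs tracepoints not writable; skipping trace capture"
fi
```

On a host without root or tracefs the script just reports that it skipped capture, so it is safe to drop into an existing test loop.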