From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from ipmail06.adl6.internode.on.net ([150.101.137.145]:26530 "EHLO
	ipmail06.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751092AbdJAW6w (ORCPT );
	Sun, 1 Oct 2017 18:58:52 -0400
Date: Mon, 2 Oct 2017 09:58:49 +1100
From: Dave Chinner 
Subject: Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
Message-ID: <20171001225849.GH3666@dastard>
References: <20170930032857.GQ21475@dhcp12-143.nay.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170930032857.GQ21475@dhcp12-143.nay.redhat.com>
Sender: linux-xfs-owner@vger.kernel.org
List-ID: 
List-Id: xfs
To: Zorro Lang 
Cc: linux-xfs@vger.kernel.org

On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> Hi,
>
> I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> twice. But I can't reproduce it on another machine.
>
> Maybe there's some hardware-specific requirement to trigger this panic. I
> tested on a normal disk partition, but the disk is a multi-stripe RAID
> device. I didn't get the mkfs output of g/085, but I found the default
> mkfs output (mkfs.xfs -f /dev/sda3) is:
>
> meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> data     =                       bsize=1024   blocks=15720448, imaxpct=25
>          =                       sunit=512    swidth=1024 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=1024   blocks=10240, version=2
>          =                       sectsz=512   sunit=32 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

FWIW, I've come across a few of these log recovery crashes recently
while reworking mkfs.xfs. The cause has always been either a log
that is too small, or a mismatch between the log size and the log
stripe unit configuration.
The typical symptom was either a negative buffer length, like the
one in your trace:

  XFS (dm-0): Invalid block length (0xfffffed8) for buffer

or the initial head/tail block being calculated before/after the
actual log, so the resulting log offset was negative. I'm guessing
the log validity checking we've added recently isn't as robust as
it should be, but I haven't had time to dig into it yet.

I've debugged these issues far enough to point at mkfs being in the
wrong by using xfs_logprint - it runs the same head/tail recovery
code as the kernel, so it typically crashes on the same problems the
kernel does. It's much easier to debug in userspace with gdb,
though.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com