From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from aserp1040.oracle.com ([141.146.126.69]:39802 "EHLO
	aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750949AbdJBWPF (ORCPT );
	Mon, 2 Oct 2017 18:15:05 -0400
Date: Mon, 2 Oct 2017 15:15:00 -0700
From: "Darrick J. Wong"
Subject: Re: [Bug Report]: generic/085 trigger a XFS panic on kernel 4.14-rc2
Message-ID: <20171002221500.GA6503@magnolia>
References: <20170930032857.GQ21475@dhcp12-143.nay.redhat.com>
 <20171001225849.GH3666@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171001225849.GH3666@dastard>
Sender: linux-xfs-owner@vger.kernel.org
List-ID: 
List-Id: xfs
To: Dave Chinner
Cc: Zorro Lang, linux-xfs@vger.kernel.org

On Mon, Oct 02, 2017 at 09:58:49AM +1100, Dave Chinner wrote:
> On Sat, Sep 30, 2017 at 11:28:57AM +0800, Zorro Lang wrote:
> > Hi,
> > 
> > I hit a panic[1] when I ran xfstests on debug kernel v4.14-rc2
> > (with xfsprogs 4.13.1), and I can reproduce it on the same machine
> > twice. But I can't reproduce it on another machine.
> > 
> > Maybe there's some hardware-specific requirement to trigger this
> > panic. I tested on a normal disk partition, but the disk is a
> > multi-stripe RAID device. I didn't capture the mkfs output of
> > g/085, but the default mkfs output (mkfs.xfs -f /dev/sda3) is:
> > 
> > meta-data=/dev/sda3              isize=512    agcount=16, agsize=982528 blks
> >          =                       sectsz=512   attr=2, projid32bit=1
> >          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
> > data     =                       bsize=1024   blocks=15720448, imaxpct=25
> >          =                       sunit=512    swidth=1024 blks
> > naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> > log      =internal log           bsize=1024   blocks=10240, version=2
> >          =                       sectsz=512   sunit=32 blks, lazy-count=1
> > realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> FWIW, I've come across a few of these log recovery crashes recently
> when reworking mkfs.xfs. The cause has always been either a log that
> is too small or a mismatch between the log size and the log stripe
> unit configuration. The typical sign was either a negative buffer
> length like this one (XFS (dm-0): Invalid block length (0xfffffed8)
> for buffer) or the head/tail block initially being calculated
> before/after the actual log, so the log offset went negative.
> 
> I'm guessing the recent log validity checking we've added isn't as
> robust as it should be, but I haven't had time to dig into it yet.
> I've debugged the issues far enough with xfs_logprint to point at
> mkfs being wrong - it runs the same head/tail recovery code as the
> kernel, so it typically crashes on the same problems. It's much
> easier to debug in userspace with gdb, though.....

Just to pile on with everyone else: I've noticed that fuzzing logsunit
to -1 causes the mount process to spit out a bunch of recovery-related
I/O errors.  Shortly thereafter the kernel crashes too.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
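
To make the failure mode above concrete: below is a minimal,
self-contained sketch (not the actual XFS recovery code; the struct and
function names are made up for illustration) of how an unsigned block
length computed from inconsistent head/tail values wraps to exactly the
0xfffffed8 seen in the error message, and the kind of bounds check that
recovery has to apply before trusting such a length.

    /*
     * Illustrative sketch only -- NOT the XFS implementation.  It mimics
     * the class of bug discussed in the thread: when the log geometry is
     * bogus (log too small, stripe unit mismatch, logsunit fuzzed to -1),
     * an unsigned block count derived from head/tail wraps to a huge
     * value that must be rejected before any buffer read is attempted.
     */
    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include <stdbool.h>

    /* hypothetical log geometry as recovery might see it */
    struct fake_log {
            uint32_t log_blocks;    /* log size in 512-byte basic blocks */
            uint32_t head_blk;      /* block number of the log head */
            uint32_t tail_blk;      /* block number of the log tail */
    };

    /* naive length calculation: wraps when tail_blk ends up past head_blk */
    static uint32_t naive_blen(const struct fake_log *log)
    {
            return log->head_blk - log->tail_blk;   /* unsigned wraparound */
    }

    /* the kind of sanity check robust recovery needs */
    static bool blen_is_sane(const struct fake_log *log, uint32_t blen)
    {
            return blen > 0 && blen <= log->log_blocks;
    }

    int main(void)
    {
            /* corrupted geometry: the computed tail lands past the head */
            struct fake_log log = {
                    .log_blocks = 20480, .head_blk = 64, .tail_blk = 360,
            };
            uint32_t blen = naive_blen(&log);

            /* prints 0xfffffed8, matching the error message above */
            printf("computed block length: 0x%" PRIx32 "\n", blen);
            if (!blen_is_sane(&log, blen))
                    printf("invalid block length (0x%" PRIx32 ") for buffer"
                           " -- reject instead of reading\n", blen);
            return 0;
    }

The numbers in main() are chosen purely so the wrap reproduces the value
from the log message; any geometry where the tail is computed past the
head produces the same kind of "negative" length.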