From: Fengguang Wu
Subject: Re: [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003
Date: Thu, 10 Oct 2013 14:03:34 +0800
Message-ID: <20131010060334.GA17576@localhost>
In-Reply-To: <20131010042820.GA5663@dastard>
To: Dave Chinner
Cc: linux-kernel@vger.kernel.org, xfs@oss.sgi.com, Ben Myers, Dave Chinner, linux-fsdevel@vger.kernel.org, "ocfs2-devel@oss.oracle.com"

On Thu, Oct 10, 2013 at 03:28:20PM +1100, Dave Chinner wrote:
> On Thu, Oct 10, 2013 at 11:38:34AM +0800, Fengguang Wu wrote:
> > On Thu, Oct 10, 2013 at 11:33:00AM +0800, Fengguang Wu wrote:
> > > On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> > > > Dave,
> > > >
> > > > > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > > > > are shared with objects of other types. That means that the memory
> > > > > corruption problem is likely to be caused by one of the other
> > > > > filesystems that is probing the block device(s), not XFS.
> > > >
> > > > Good to know that, it would be easy to test then: just turn off every
> > > > other filesystems. I'll try it right away.
> > >
> > > Seems that we don't even need to do that. A dig through the oops
> > > database and I find stack dumps from other FS.
> > >
> > > This happens in the kernel with same kconfig and commit 3.12-rc1.
> >
> > Here is a summary of all FS with oops:
> >
> >     411 ocfs2_fill_super
> >     189 xfs_fs_fill_super
> >      86 jfs_fill_super
> >      50 isofs_fill_super
> >      33 fat_fill_super
> >      18 vfat_fill_super
> >      15 msdos_fill_super
> >      11 ext2_fill_super
> >      10 ext3_fill_super
> >       3 reiserfs_fill_super
>
> The order of probing on the original dmesg output you reported is:
>
>	ext3
>	ext2
>	fatfs
>	reiserfs
>	gfs2
>	isofs
>	ocfs2

There is effectively no particular order, because there are many
superblocks for these filesystems to scan:

        for superblocks:
                for filesystems:
                        scan super block

In the end, any filesystem may affect the others (and perhaps a later
run of itself).

> which means that no XFS filesystem was mounted in the original bug
> report, and hence that further indicates that XFS is not responsible
> for the problem and that perhaps the original bisect was not
> reliable...

This is an easily reproducible bug, and I further confirmed it in two ways:

1) turn off XFS, build 39 commits and boot them 2000+ times
   => not a single mount error

2) turn off all other filesystems, build two kernels (v3.12-rc3 and
   v3.12-rc4) and boot them
   => half of the boots oops

So it may well be that XFS is affected by an earlier run of itself.

Thanks,
Fengguang
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs