From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ingo Molnar
To: Martin-Éric Racine
Cc: "Rafael J. Wysocki", Alexander Viro, Linux Kernel Mailing List, Kernel Testers List
Subject: Re: [Bug #13941] x86 Geode issue
Date: Sun, 16 Aug 2009 23:01:34 +0200
Message-ID: <20090816210134.GA14972@elte.hu>
In-Reply-To: <20090816205706.GB3463@elte.hu>
References: <200908131654.45227.rjw@sisk.pl>
 <11fae7c70908130800q7b4a5293t5c373613d736d74@mail.gmail.com>
 <200908132034.34951.rjw@sisk.pl>
 <11fae7c70908161217p33830075p783880315a31b2e5@mail.gmail.com>
 <20090816205706.GB3463@elte.hu>

* Ingo Molnar wrote:

>
> * Martin-Éric Racine wrote:
>
> > On Thu, Aug 13, 2009 at 9:34 PM, Rafael J. Wysocki wrote:
> > > On Thursday 13 August 2009, Martin-Éric Racine wrote:
> > >> On Thu, Aug 13, 2009 at 5:54 PM, Rafael J. Wysocki wrote:
> > >> > On Thursday 13 August 2009, Martin-Éric Racine wrote:
> > >> >> 2009/8/13 Martin-Éric Racine:
> > >> >> > On Thu, Aug 13, 2009 at 12:07 PM, Ingo Molnar wrote:
> > >> >> >> * Martin-Éric Racine wrote:
> > >> >> >>> Yes, this bug is still valid.
> > >> >> >>>
> > >> >> >>> Ubuntu kernel team member Leann Ogasawara and I are slowly
> > >> >> >>> bisecting our way through the changes that took place since 2.6.30
> > >> >> >>> to find the commit that introduced this regression. Please stay
> > >> >> >>> tuned.
> > >> >> >>
> > >> >> >> hm, the only outright Geode related commit was:
> > >> >> >>
> > >> >> >>  d6c585a: x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure
> > >> >> >>
> > >> >> >> the jpg at:
> > >> >> >>
> > >> >> >>  http://launchpadlibrarian.net/28892781/00002.jpg
> > >> >> >>
> > >> >> >> is very out of focus - but what i could decypher suggests a
> > >> >> >> pagefault crash in the VFS code, in generic_delete_inode().
> > >> >>
> > >> >> This one might be a bit better:
> > >> >>
> > >> >> http://launchpadlibrarian.net/30267494/2.6.31-5.24.jpg
> > >
> > > Hmm. This looks like a sysfs oops to my untrained eye.
> >
> > The bisect I did with Leann Ogasawara has narrowed the kernel panic
> > down to the following:
> >
> > commit f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0
> > Author: Al Viro
> > Date: Mon Jun 8 19:50:45 2009 -0400
> >
> > add caching of ACLs in struct inode
> >
> > No helpers, no conversions yet.
> >
> > Signed-off-by: Al Viro
>
> Weird. If the functions do what their name suggests, i.e. if
> inode_init_always() is an always called constructor and if
> destroy_inode() is an unconditional destructor then this patch
> should have no functional effect on the VFS side.
>
> It increases the size of struct inode, so if you have some old
> module (built to an older version of fs.h) still around it might
> corrupt your inode data structure.
>
> Or the size change might trigger some dormant bug. It might move a
> critical inode right into the path of a pre-existing (but not
> visibly crash-triggering) data corruption.
>
> The possibilities on the 'weird bug' front are endless - the
> crash/oops itself should be turned into text, posted here and
> analyzed.
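
The stale-module scenario quoted above can be illustrated with a small sketch. It uses Python's ctypes purely to compute C-style layouts. OldInode and NewInode are hypothetical toy structs, not the kernel's real struct inode, and the position of the added members is an assumption made for illustration only.

```python
import ctypes


class OldInode(ctypes.Structure):
    """Toy layout that an old module was compiled against (hypothetical)."""
    _fields_ = [
        ("i_ino", ctypes.c_ulong),
        ("i_private", ctypes.c_void_p),
    ]


class NewInode(ctypes.Structure):
    """Same toy struct after two pointer members are added (hypothetical)."""
    _fields_ = [
        ("i_ino", ctypes.c_ulong),
        ("i_acl", ctypes.c_void_p),
        ("i_default_acl", ctypes.c_void_p),
        ("i_private", ctypes.c_void_p),
    ]


def stale_write_target(old_field: str) -> str:
    """Name the NewInode member that sits at the old offset of old_field."""
    offset = getattr(OldInode, old_field).offset
    for name, _ctype in NewInode._fields_:
        if getattr(NewInode, name).offset == offset:
            return name
    return "<padding or out of bounds>"


# How much the struct grew, and which new member an old-layout write would hit.
size_growth = ctypes.sizeof(NewInode) - ctypes.sizeof(OldInode)
print("struct grew by", size_growth, "bytes")
print("a write to old i_private now lands on:", stale_write_target("i_private"))
```

In this toy layout, code built against the old header that stores into i_private would overwrite i_acl instead. A module built against an older fs.h could do this kind of damage without any bug in the patch itself.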
Btw., before you invest any time into the 'weird crash' theory, i'd
suggest to double check the bisection result:

  f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0 crashes
  f19d4a8fa6f9b6ccf54df0971c97ffcaa390b7b0~1 boots fine

You can save yourself from a lot of head scratching that way - the
bisection result looks weird. (albeit plausible - a VFS crash points
to a VFS commit.)

_Maybe_ the bisection is just off a little bit (there was a
bisection mistake in the last few steps), and the real buggy commit
is one of the nearby ones:

  1cbd20d: switch xfs to generic acl caching helpers
  073aaa1: helpers for acl caching + switch to those
  06b16e9: switch shmem to inode->i_acl
  281eede: switch reiserfs to inode->i_acl
  7a77b15: switch reiserfs to usual conventions for caching ACLs
  e68888b: reiserfs: minimal fix for ACL caching
  d441b1c: switch nilfs2 to inode->i_acl
  5affd88: switch btrfs to inode->i_acl
  290c263: switch jffs2 to inode->i_acl
  05fc079: switch jfs to inode->i_acl
  d4bfe2f: switch ext4 to inode->i_acl
  6582a0e: switch ext3 to inode->i_acl
  5e78b43: switch ext2 to inode->i_acl
  f19d4a8: add caching of ACLs in struct inode
  3e63cbb: fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
  01c0319: cleanup __writeback_single_inode
  f21f622: ... and the same for vfsmount id/mount group id
  c63e09e: Make allocation of anon devices cheaper
  7e325d3: update Documentation/filesystems/Locking
  f6cc746: devpts: remove module-related code
  3b22edc: VFS: Switch init_mount_tree() to use the new create_mnt_ns() helper
  654f562: vfs: fix nd->root leak in do_filp_open()
  b5450d9: reiserfs: remove stray unlock_super in reiserfs_resize
  c912e7a: ALSA: hda - Fix support for Samsung P50 with AD1986A codec

Ingo
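
The double check suggested above comes down to two tests: the suspect commit must crash and its parent must boot. A minimal sketch of that logic follows, with several assumptions. It treats the commit list as a linear history, newest first. boots_ok is a hypothetical callback that would build and boot-test a given commit, and simulated_boot is a made-up stand-in for real test boots. The simulation says nothing about which commit is actually at fault.

```python
from typing import Callable, List, Optional

# Abbreviated commit IDs from the list above, assumed linear and newest first.
NEARBY_COMMITS: List[str] = [
    "1cbd20d", "073aaa1", "06b16e9", "281eede", "7a77b15", "e68888b",
    "d441b1c", "5affd88", "290c263", "05fc079", "d4bfe2f", "6582a0e",
    "5e78b43", "f19d4a8", "3e63cbb", "01c0319", "f21f622", "c63e09e",
    "7e325d3", "f6cc746", "3b22edc", "654f562", "b5450d9", "c912e7a",
]


def confirm_first_bad(commits: List[str], suspect: str,
                      boots_ok: Callable[[str], bool]) -> bool:
    """Return True only if the suspect crashes and its parent boots fine."""
    idx = commits.index(suspect)
    if idx + 1 >= len(commits):
        raise ValueError("parent of the suspect is not in the list")
    parent = commits[idx + 1]  # the next entry is the older commit
    return (not boots_ok(suspect)) and boots_ok(parent)


def first_crashing(commits: List[str],
                   boots_ok: Callable[[str], bool]) -> Optional[str]:
    """Walk from oldest to newest and return the first commit that crashes."""
    for commit in reversed(commits):
        if not boots_ok(commit):
            return commit
    return None


def simulated_boot(commits: List[str], first_bad: str) -> Callable[[str], bool]:
    """Made-up test oracle: only commits older than first_bad boot."""
    bad_idx = commits.index(first_bad)
    return lambda commit: commits.index(commit) > bad_idx


# Hypothetical run: pretend the bisection was off by one commit.
oracle = simulated_boot(NEARBY_COMMITS, "5e78b43")
if not confirm_first_bad(NEARBY_COMMITS, "f19d4a8", oracle):
    print("bisection not confirmed, first crashing commit in the range:",
          first_crashing(NEARBY_COMMITS, oracle))
```

With a real tree, boots_ok would check out the commit, build it, and boot it on the Geode box. The linear scan is affordable here because only a couple of dozen commits are in question.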