From: ebiederm@xmission.com (Eric W. Biederman)
References: <20111121082445.GD1625@x4.trippels.de>
  <1321866988.2552.10.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
  <20111121131531.GA1679@x4.trippels.de>
  <1321884966.10470.2.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
  <20111121153621.GA1678@x4.trippels.de>
  <1321890510.10470.11.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
  <20111121161036.GA1679@x4.trippels.de>
  <20111121163459.GA1679@x4.trippels.de>
  <20111122083630.GA1672@x4.trippels.de>
  <20111219091909.GA1614@x4>
Date: Mon, 19 Dec 2011 01:06:45 -0800
In-Reply-To: <20111219091909.GA1614@x4> (Markus Trippelsdorf's message of "Mon, 19 Dec 2011 10:19:09 +0100")
Subject: Re: WARNING: at mm/slub.c:3357, kernel BUG at mm/slub.c:3413
To: Markus Trippelsdorf
Cc: Eric Dumazet, "Alex,Shi", "linux-kernel@vger.kernel.org",
  "linux-mm@kvack.org", Christoph Lameter, Pekka Enberg, Matt Mackall,
  "netdev@vger.kernel.org", tj@kernel.org

Markus Trippelsdorf writes:

> On 2011.12.18 at 19:21 -0800, Eric W. Biederman wrote:
>> Markus Trippelsdorf writes:
>>
>> > On 2011.11.21 at 17:34 +0100, Markus Trippelsdorf wrote:
>> >> On 2011.11.21 at 17:10 +0100, Markus Trippelsdorf wrote:
>> >> > On 2011.11.21 at 16:48 +0100, Eric Dumazet wrote:
>> >> > > On Monday 21 November 2011 at 16:36 +0100, Markus Trippelsdorf wrote:
>> >> > > > On 2011.11.21 at 15:16 +0100, Eric Dumazet wrote:
>> >> > > > > On Monday 21 November 2011 at 14:15 +0100, Markus Trippelsdorf wrote:
>> >> > > > >
>> >> > > > > > I've enabled CONFIG_SLUB_DEBUG_ON and this is what happend:
>> >> > > > > >
>> >> > > > >
>> >> > > > > Thanks
>> >> > > > >
>> >> > > > > Please continue to provide more samples.
>> >> > > > >
>> >> > > > > There is something wrong somewhere, but where exactly, its hard to say.
>> >> > > >
>> >> > > > New sample. This one points to lib/idr.c:
>> >> > > >
>> >> > > > =============================================================================
>> >> > > > BUG idr_layer_cache: Poison overwritten
>> >> > > > -----------------------------------------------------------------------------
>> >> > >
>> >> > > Thanks, could you now add "CONFIG_DEBUG_PAGEALLOC=y" in your config as
>> >> > > well ?
>> >> >
>> >> > Sure. This one happend with CONFIG_DEBUG_PAGEALLOC=y:
>> >> >
>> >> > =============================================================================
>> >> > BUG task_struct: Poison overwritten
>> >> > -----------------------------------------------------------------------------
>> >>
>> >> And sometimes this one that I've reported earlier already:
>> >>
>> >> (see: http://thread.gmane.org/gmane.linux.kernel/1215023 )
>> >>
>> >> ------------[ cut here ]------------
>> >> WARNING: at fs/sysfs/sysfs.h:195 sysfs_get_inode+0x136/0x140()
>> >> Hardware name: System Product Name
>> >> Pid: 1876, comm: slabinfo Not tainted 3.2.0-rc2-00274-g6fe4c6d #72
>> >> Call Trace:
>> >> [] warn_slowpath_common+0x75/0xb0
>> >> [] warn_slowpath_null+0x15/0x20
>> >> [] sysfs_get_inode+0x136/0x140
>> >> [] sysfs_lookup+0x6f/0x110
>> >> [] d_alloc_and_lookup+0x39/0x80
>> >> [] do_lookup+0x294/0x3a0
>> >> [] ? inode_permission+0x7a/0xb0
>> >> [] do_last.isra.46+0x137/0x7f0
>> >> [] path_openat+0xc6/0x370
>> >> [] ? getname_flags+0x36/0x230
>> >> [] ? handle_mm_fault+0x192/0x290
>> >> [] do_filp_open+0x3c/0x90
>> >> [] ? alloc_fd+0xdc/0x120
>> >> [] do_sys_open+0xe7/0x1c0
>> >> [] sys_open+0x1b/0x20
>> >> [] system_call_fastpath+0x16/0x1b
>> >> ---[ end trace b1377eb8b131d37d ]---
>> >
>> > Hm, the "sysfs: use rb-tree" thing hit again during boot. Could this be
>> > the root cause of this all?
>> >
>> > I wrote down the following:
>> >
>> > RIP : rb_next
>> >
>> > Trace:
>> > sysfs_dir_pos
>> > sysfs_readdir
>> > ? sys_ioctl
>> > vfs_readdir
>> > sys_getdents
>>
>> Thanks for reporting this.
>>
>> Has this by any chance been resolved or stopped happening?
>
> Yes.
>
>> This looks for all of the world like something is stomping your sysfs
>> dirents.
>> I haven't seen anyone else complaining so this seems like the
>> problem is unique to your configuration. Which suggests that it is not
>> sysfs itself that is wrong.
>>
>> I have been through the code a time or two and I haven't seen anything
>> obviously wrong. Everything that sysfs does is protected by the
>> sysfs_mutex so the locking is very very simple.
>>
>> My best guess of why now is that the rbtree code make a sysfs dirent
>> 48 bytes larger. And so it is much more exposed to these kinds of
>> problems.
>
> Sorry, but your subsystem was just accidentally hit by a bug in the
> Radeon driver, that sometimes randomly writes 0 dwords somewhere to
> memory after a kexec boot (see the rest of this huge thread).
> It's still not fixed in mainline, because Linus refused to take the fix
> this late in the series.

Awesome.

I guess that means I am only responsible for the Radeon driver having
the opportunity to take that code path.

It is nice to see kexec being used and being well known enough I didn't
have to get involved.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org