Message-ID: <48121A37.5020504@cosmosbay.com>
Date: Fri, 25 Apr 2008 19:51:51 +0200
From: Eric Dumazet
To: Alexander van Heukelum
Cc: Randy Dunlap , lkml
Subject: Re: BUG in strnlen
References: <20080425090901.ca642c4d.randy.dunlap@oracle.com> <48121331.1030204@cosmosbay.com> <1209145675.2005.1249890339@webmail.messagingengine.com>
In-Reply-To: <1209145675.2005.1249890339@webmail.messagingengine.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Alexander van Heukelum wrote:
> On Fri, 25 Apr 2008 19:21:53 +0200, "Eric Dumazet"
> said:
>
>> Randy Dunlap wrote:
>>
>>> Hi,
>>>
>>> All of my daily testing (x86_64, 4 CPUs, 8 GB RAM)
>>> since (after) 2.6.25 is seeing this BUG:
>>> (i.e., 2.6.25 does not do this)
>>>
>>>
>>> BUG: unable to handle kernel paging request at ffffffffa00b7551
>>> IP: [] strnlen+0x15/0x1f
>>> PGD 203067 PUD 207063 PMD 27e44f067 PTE 0
>>> Oops: 0000 [1] SMP
>>> CPU 3
>>> Modules linked in: hp_ilo parport_pc lp parport tg3 cciss ehci_hcd ohci_hcd uhci_hcd [last unloaded: reiserfs]
>>>
>
> ------------------------------------------------------------------------------------------^^^^^^
>
>
>>> Pid: 20926, comm: cat Not tainted 2.6.25-git5 #1
>>> RIP: 0010:[] [] strnlen+0x15/0x1f
>>> RSP: 0018:ffff810274981cc8 EFLAGS: 00010297
>>> RAX: ffffffffa00b7551 RBX: ffff810274981d38 RCX: ffffffff80603719
>>> RDX: ffff810274981d68 RSI: fffffffffffffffe RDI: ffffffffa00b7551
>>> RBP: ffff810274981cc8 R08: 00000000ffffffff R09: 00000000000000c8
>>> R10: 0000000000000050 R11: 0000000000000246 R12: ffff8102364600cc
>>> R13: ffffffffa00b7551 R14: 0000000000000011 R15: 0000000000000010
>>> FS: 00007f956375d6f0(0000) GS:ffff81027f808980(0000) knlGS:00000000f7f7f6c0
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> CR2: ffffffffa00b7551 CR3: 00000002734d5000 CR4: 00000000000006e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process cat (pid: 20926, threadinfo ffff810274980000, task ffff81026d18ce20)
>>> Stack: ffff810274981d28 ffffffff80358d5a ffff810274981d28 0000000000000f34
>>> ffff8102364600cc ffff810236461000 ffffffff80603719 ffff81024ac14f00
>>> ffff81024ac14f00 0000000000000004 0000000000000000 0000000000000000
>>> Call Trace:
>>> [] vsnprintf+0x31b/0x592
>>> [] seq_printf+0x7e/0xa7
>>> [] ? debug_mutex_free_waiter+0x46/0x4a
>>> [] ? __down_read+0x17/0x92
>>> [] ? __mutex_lock_slowpath+0x1d8/0x1e5
>>> [] ? count_partial+0x45/0x4d
>>> [] s_show+0x7e/0xcb
>>> [] seq_read+0x10b/0x298
>>> [] proc_reg_read+0x7b/0x95
>>> [] vfs_read+0xab/0x154
>>> [] sys_read+0x47/0x6f
>>> [] tracesys+0xd5/0xda
>>>
>>>
>>> Code: 48 8d 44 11 ff 40 38 30 74 0a 48 ff c8 48 39 d0 73 f3 31 c0 c9 c3 55 48 89 f8 48 89 e5 eb 03 48 ff c0 48 ff ce 48 83 fe ff 74 05 <80> 38 00 75 ef c9 48 29 f8 c3 55 31 c0 48 89 e5 eb 13 41 38 c8
>>> RIP [] strnlen+0x15/0x1f
>>> RSP
>>> CR2: ffffffffa00b7551
>>>
>>>
>>> ---
>>>
>>>
>>>
>> My initial thoughts are:
>>
>> Fault address is 0xffffffffa00b7551 which is in module mapping space on
>> x86_64
>>
>> strnlen() is OK
>>
>> Some module created a kmem_cache (with kmem_cache_create()).
>> slub or slab kept a pointer to the cache name in their internal
>> structures.
>> Module was unloaded but forgot to destroy kmem cache before unloading.
>>
>> Fault happens while doing "cat /proc/slabinfo", when trying to
>> dereference cache name since module was unloaded and its memory unmapped.
>>
>> Next step is to find which module was unloaded ...
>>
>
> The last one was reiserfs, apparently ;).
>

Yes but reiserfs correctly destroys its cache at unload time.

Must be something else...
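
To make the suspected sequence concrete, here is a minimal userspace model of it in C. This is only a sketch and not kernel code: struct module_image, struct cache, registry, cache_create(), cache_destroy(), dangling_names() and run_scenario() are all hypothetical names made up for illustration, as is the cache name "example_cache". It assumes, as described above, that the allocator stores the caller's name pointer rather than copying the string. Instead of actually dereferencing unmapped memory, the model counts the entries a slabinfo-style walk would fault on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NCACHES 4

/* Hypothetical model of a module's memory; not a kernel structure. */
struct module_image {
    char data[32];   /* holds the cache name string */
    int mapped;      /* 1 while "loaded", 0 after "unload" */
};

/* Hypothetical model of an entry in the allocator's cache list. */
struct cache {
    const char *name;  /* stored pointer; the string is not copied */
    int in_use;
};

static struct cache registry[NCACHES];

/* Models cache creation: keeps the caller's name pointer as-is. */
static struct cache *cache_create(const char *name)
{
    for (size_t i = 0; i < NCACHES; i++) {
        if (!registry[i].in_use) {
            registry[i].name = name;
            registry[i].in_use = 1;
            return &registry[i];
        }
    }
    return NULL;
}

/* Models cache destruction: drops the entry and its name pointer. */
static void cache_destroy(struct cache *c)
{
    assert(c != NULL);
    c->name = NULL;
    c->in_use = 0;
}

/* True if p points inside the module's memory. */
static int points_into(const struct module_image *m, const char *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)m->data;

    return a >= lo && a < lo + sizeof(m->data);
}

/* Counts the entries a slabinfo-style walk would fault on: names that
 * still point into a module whose memory is no longer mapped. */
static int dangling_names(const struct module_image *m)
{
    int n = 0;

    for (size_t i = 0; i < NCACHES; i++) {
        if (registry[i].in_use && !m->mapped &&
            points_into(m, registry[i].name))
            n++;
    }
    return n;
}

/* Loads a module, creates one cache, unloads the module, and returns
 * the number of dangling cache names left behind. */
static int run_scenario(int destroy_before_unload)
{
    static struct module_image mod;
    struct cache *c;

    memset(registry, 0, sizeof(registry));
    memset(&mod, 0, sizeof(mod));
    strcpy(mod.data, "example_cache");   /* hypothetical cache name */
    mod.mapped = 1;                      /* module loaded */

    c = cache_create(mod.data);
    assert(c != NULL);
    if (destroy_before_unload)
        cache_destroy(c);                /* what a correct module does */

    mod.mapped = 0;                      /* module unloaded */
    return dangling_names(&mod);
}
```

In this model, run_scenario(0) leaves one stale name behind (the buggy module), while run_scenario(1) leaves none, which is the behaviour reiserfs is said to have.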