Re: 2.6.22.6: kernel BUG at fs/locks.c:171

* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-13  9:20 2.6.22.6: kernel BUG at fs/locks.c:171 Soeren Sonnenburg
@ 2007-09-12 23:51 ` Nick Piggin
  2007-09-14  6:02   ` Soeren Sonnenburg
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2007-09-12 23:51 UTC (permalink / raw)
  To: Soeren Sonnenburg, linux-fsdevel; +Cc: Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 6211 bytes --]

On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> Dear all,
>
> I've just seen this in dmesg on a AMD K7 / kernel 2.6.22.6 machine
> (config attached).
>
> Any ideas / which further information needed ?

Thanks for the report. Is it reproducible? It seems like the
locks_free_lock call that's oopsing is coming from __posix_lock_file.
The function itself looks fine, but the lock being freed could have
been corrupted by slab corruption or by a hardware fault.

You could try running memtest86+ overnight. Also try the following
patch with slab debugging turned on, and then try to reproduce the problem.


>
> Soeren
>
> ------------[ cut here ]------------
> kernel BUG at fs/locks.c:171!
> invalid opcode: 0000 [#1]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
> x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
> nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
> ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
> dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
> ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart
> CPU:    0
> EIP:    0060:[<c0158f59>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.22.6 #1)
> EIP is at locks_free_lock+0xb/0x3b
> eax: e1d07f9c   ebx: e1d07f80   ecx: f5f5e2f0   edx: 00000000
> esi: 00000000   edi: 00000000   ebp: 00000000   esp: da3d7f04
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process mrtg-load (pid: 19688, ti=da3d6000 task=f5e3a030 task.ti=da3d6000)
> Stack: 00000000 c015972b 00000002 c04889c8 c012b920 f5f5e290 c048541c f0ed3ca0
>        01485414 00000000 e1d07f80 00000000 f0f39f58 44ef35f1 f62fc2ac 00000000
>        00000000 f5f5e290 00000000 d23106c0 c015a891 00000000 00000007 00000004
> Call Trace:
>  [<c015972b>] __posix_lock_file+0x44e/0x47f
>  [<c012b920>] getnstimeofday+0x2b/0xaf
>  [<c015a891>] fcntl_setlk+0xff/0x1f6
>  [<c011d836>] do_setitimer+0xfa/0x226
>  [<c0156b87>] sys_fcntl64+0x74/0x85
>  [<c0103ade>] syscall_call+0x7/0xb
>  =======================
> Code: 74 1b 8b 15 30 93 48 c0 8d 43 04 89 53 04 89 42 04 a3 30 93 48 c0 c7
> 40 04 30 93 48 c0 5b 5e c3 53 89 c3 8d 40 1c 39 43 1c 74 04 <0f> 0b eb fe
> 8d 43 0c 39 43 0c 74 04 0f 0b eb fe 8d 43 04 39 43
> EIP: [<c0158f59>] locks_free_lock+0xb/0x3b SS:ESP 0068:da3d7f04
> BUG: unable to handle kernel paging request at virtual address 9ee420b0
>  printing eip:
> c014ab7d
> *pde = 00000000
> Oops: 0002 [#2]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
> x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
> nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
> ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
> dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
> ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart
> CPU:    0
> EIP:    0060:[<c014ab7d>]    Not tainted VLI
> EFLAGS: 00010082   (2.6.22.6 #1)
> EIP is at free_block+0x61/0xfb
> eax: a75b2c19   ebx: c1cf6c10   ecx: e1d070c4   edx: 9ee420ac
> esi: e1d07000   edi: dfde6960   ebp: dfde7620   esp: dfd87f44
> ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
> Process events/0 (pid: 4, ti=dfd86000 task=dfdc4a50 task.ti=dfd86000)
> Stack: 00000012 00000000 00000018 00000000 c1cf6c10 c1cf6c10 00000018 c1cf6c00
>        dfde7620 c014ac86 00000000 dfde6960 dfde7620 c0521d20 00000000 c014b869
>        00000000 00000000 dfde69e0 c0521d20 c014b827 c0125955 dfdc4b5c 8f0c99c0
> Call Trace:
>  [<c014ac86>] drain_array+0x6f/0x89
>  [<c014b869>] cache_reap+0x42/0xde
>  [<c014b827>] cache_reap+0x0/0xde
>  [<c0125955>] run_workqueue+0x6b/0xdf
>  [<c0125ec7>] worker_thread+0x0/0xbd
>  [<c0125f79>] worker_thread+0xb2/0xbd
>  [<c0128221>] autoremove_wake_function+0x0/0x35
>  [<c01280cc>] kthread+0x36/0x5a
>  [<c0128096>] kthread+0x0/0x5a
>  [<c0104607>] kernel_thread_helper+0x7/0x10
>  =======================
> Code: 8b 02 25 00 40 02 00 3d 00 40 02 00 75 03 8b 52 0c 8b 02 84 c0 78 04
> 0f 0b eb fe 8b 72 1c 8b 54 24 28 8b 46 04 8b 7c 95 4c 8b 16 <89> 42 04 89
> 10 2b 4e 0c c7 06 00 01 10 00 c7 46 04 00 02 20 00
> EIP: [<c014ab7d>] free_block+0x61/0xfb SS:ESP 0068:dfd87f44
> ------------[ cut here ]------------
> kernel BUG at fs/locks.c:171!
> invalid opcode: 0000 [#3]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
> x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
> nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x
> ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core
> dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp
> ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart
> CPU:    0
> EIP:    0060:[<c0158f59>]    Not tainted VLI
> EFLAGS: 00010287   (2.6.22.6 #1)
> EIP is at locks_free_lock+0xb/0x3b
> eax: e1d07f40   ebx: e1d07f24   ecx: dfde7620   edx: c16bebc0
> esi: 00000000   edi: 00000000   ebp: f5f5e0c4   esp: f1309efc
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process nmbd (pid: 3522, ti=f1308000 task=f12ba590 task.ti=f1308000)
> Stack: 00000000 c015972b f10b8d4c c1f0d380 02e58f5c f5f5e3a4 000007e8 00000000
>        010b8d4c f5f5e120 e1d07f24 00000001 000000a8 00000000 f5f5eca0 00000000
>        00000000 f5f5e3a4 00000000 f635a260 c015a13f 00000000 0000000e 0000000a
> Call Trace:
>  [<c015972b>] __posix_lock_file+0x44e/0x47f
>  [<c015a13f>] fcntl_setlk64+0xff/0x1f4
>  [<c0156b75>] sys_fcntl64+0x62/0x85
>  [<c0103ade>] syscall_call+0x7/0xb
>  =======================
> Code: 74 1b 8b 15 30 93 48 c0 8d 43 04 89 53 04 89 42 04 a3 30 93 48 c0 c7
> 40 04 30 93 48 c0 5b 5e c3 53 89 c3 8d 40 1c 39 43 1c 74 04 <0f> 0b eb fe
> 8d 43 0c 39 43 0c 74 04 0f 0b eb fe 8d 43 04 39 43
> EIP: [<c0158f59>] locks_free_lock+0xb/0x3b SS:ESP 0068:f1309efc

[-- Attachment #2: fs-lock-debug.patch --]
[-- Type: text/x-diff, Size: 649 bytes --]

Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c
+++ linux-2.6/fs/locks.c
@@ -147,7 +147,14 @@ static struct kmem_cache *filelock_cache
 /* Allocate an empty lock structure. */
 static struct file_lock *locks_alloc_lock(void)
 {
-	return kmem_cache_alloc(filelock_cache, GFP_KERNEL);
+	struct file_lock *fl;
+	fl = kmem_cache_alloc(filelock_cache, GFP_KERNEL);
+	if (fl) {
+		BUG_ON(waitqueue_active(&fl->fl_wait));
+		BUG_ON(!list_empty(&fl->fl_block));
+		BUG_ON(!list_empty(&fl->fl_link));
+	}
+	return fl;
 }
 
 static void locks_release_private(struct file_lock *fl)
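
The `list_empty()` checks in the patch rely on a simple invariant: a list head that is not linked into anything points back at itself. A minimal userspace sketch of that invariant follows. It assumes a hypothetical `struct demo_lock` with two embedded list heads; the names `demo_lock`, `demo_lock_init`, and `demo_lock_can_free` are illustrative only and do not come from fs/locks.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An unlinked list head points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same comparison the kernel's list_empty() makes: next == head. */
static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical object with embedded list heads (assumption, not file_lock). */
struct demo_lock {
	struct list_head link;
	struct list_head block;
};

static void demo_lock_init(struct demo_lock *l)
{
	assert(l != NULL);
	init_list_head(&l->link);
	init_list_head(&l->block);
}

/* True only if both list heads still look unlinked. */
static bool demo_lock_can_free(const struct demo_lock *l)
{
	return list_is_empty(&l->link) && list_is_empty(&l->block);
}
```

If anything overwrites one of those pointers, whether a software bug or a misbehaving device, `demo_lock_can_free()` returns false. That is the kind of condition the kernel checks turn into a BUG.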


* 2.6.22.6: kernel BUG at fs/locks.c:171
@ 2007-09-13  9:20 Soeren Sonnenburg
  2007-09-12 23:51 ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Soeren Sonnenburg @ 2007-09-13  9:20 UTC (permalink / raw)
  To: Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 5703 bytes --]

Dear all,

I've just seen this in dmesg on an AMD K7 machine running kernel 2.6.22.6
(config attached).

Any ideas? What further information is needed?

Soeren

------------[ cut here ]------------
kernel BUG at fs/locks.c:171!
invalid opcode: 0000 [#1]
Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart
CPU:    0
EIP:    0060:[<c0158f59>]    Not tainted VLI
EFLAGS: 00010206   (2.6.22.6 #1)
EIP is at locks_free_lock+0xb/0x3b
eax: e1d07f9c   ebx: e1d07f80   ecx: f5f5e2f0   edx: 00000000
esi: 00000000   edi: 00000000   ebp: 00000000   esp: da3d7f04
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process mrtg-load (pid: 19688, ti=da3d6000 task=f5e3a030 task.ti=da3d6000)
Stack: 00000000 c015972b 00000002 c04889c8 c012b920 f5f5e290 c048541c f0ed3ca0 
       01485414 00000000 e1d07f80 00000000 f0f39f58 44ef35f1 f62fc2ac 00000000 
       00000000 f5f5e290 00000000 d23106c0 c015a891 00000000 00000007 00000004 
Call Trace:
 [<c015972b>] __posix_lock_file+0x44e/0x47f
 [<c012b920>] getnstimeofday+0x2b/0xaf
 [<c015a891>] fcntl_setlk+0xff/0x1f6
 [<c011d836>] do_setitimer+0xfa/0x226
 [<c0156b87>] sys_fcntl64+0x74/0x85
 [<c0103ade>] syscall_call+0x7/0xb
 =======================
Code: 74 1b 8b 15 30 93 48 c0 8d 43 04 89 53 04 89 42 04 a3 30 93 48 c0 c7 40 04 30 93 48 c0 5b 5e c3 53 89 c3 8d 40 1c 39 43 1c 74 04 <0f> 0b eb fe 8d 43 0c 39 43 0c 74 04 0f 0b eb fe 8d 43 04 39 43 
EIP: [<c0158f59>] locks_free_lock+0xb/0x3b SS:ESP 0068:da3d7f04
BUG: unable to handle kernel paging request at virtual address 9ee420b0
 printing eip:
c014ab7d
*pde = 00000000
Oops: 0002 [#2]
Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart
CPU:    0
EIP:    0060:[<c014ab7d>]    Not tainted VLI
EFLAGS: 00010082   (2.6.22.6 #1)
EIP is at free_block+0x61/0xfb
eax: a75b2c19   ebx: c1cf6c10   ecx: e1d070c4   edx: 9ee420ac
esi: e1d07000   edi: dfde6960   ebp: dfde7620   esp: dfd87f44
ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
Process events/0 (pid: 4, ti=dfd86000 task=dfdc4a50 task.ti=dfd86000)
Stack: 00000012 00000000 00000018 00000000 c1cf6c10 c1cf6c10 00000018 c1cf6c00 
       dfde7620 c014ac86 00000000 dfde6960 dfde7620 c0521d20 00000000 c014b869 
       00000000 00000000 dfde69e0 c0521d20 c014b827 c0125955 dfdc4b5c 8f0c99c0 
Call Trace:
 [<c014ac86>] drain_array+0x6f/0x89
 [<c014b869>] cache_reap+0x42/0xde
 [<c014b827>] cache_reap+0x0/0xde
 [<c0125955>] run_workqueue+0x6b/0xdf
 [<c0125ec7>] worker_thread+0x0/0xbd
 [<c0125f79>] worker_thread+0xb2/0xbd
 [<c0128221>] autoremove_wake_function+0x0/0x35
 [<c01280cc>] kthread+0x36/0x5a
 [<c0128096>] kthread+0x0/0x5a
 [<c0104607>] kernel_thread_helper+0x7/0x10
 =======================
Code: 8b 02 25 00 40 02 00 3d 00 40 02 00 75 03 8b 52 0c 8b 02 84 c0 78 04 0f 0b eb fe 8b 72 1c 8b 54 24 28 8b 46 04 8b 7c 95 4c 8b 16 <89> 42 04 89 10 2b 4e 0c c7 06 00 01 10 00 c7 46 04 00 02 20 00 
EIP: [<c014ab7d>] free_block+0x61/0xfb SS:ESP 0068:dfd87f44
------------[ cut here ]------------
kernel BUG at fs/locks.c:171!
invalid opcode: 0000 [#3]
Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tuner tda1004x ves1820 usb_storage usblp saa7134 compat_ioctl32 budget_ci budget_core dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom via_agp ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common agpgart
CPU:    0
EIP:    0060:[<c0158f59>]    Not tainted VLI
EFLAGS: 00010287   (2.6.22.6 #1)
EIP is at locks_free_lock+0xb/0x3b
eax: e1d07f40   ebx: e1d07f24   ecx: dfde7620   edx: c16bebc0
esi: 00000000   edi: 00000000   ebp: f5f5e0c4   esp: f1309efc
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process nmbd (pid: 3522, ti=f1308000 task=f12ba590 task.ti=f1308000)
Stack: 00000000 c015972b f10b8d4c c1f0d380 02e58f5c f5f5e3a4 000007e8 00000000 
       010b8d4c f5f5e120 e1d07f24 00000001 000000a8 00000000 f5f5eca0 00000000 
       00000000 f5f5e3a4 00000000 f635a260 c015a13f 00000000 0000000e 0000000a 
Call Trace:
 [<c015972b>] __posix_lock_file+0x44e/0x47f
 [<c015a13f>] fcntl_setlk64+0xff/0x1f4
 [<c0156b75>] sys_fcntl64+0x62/0x85
 [<c0103ade>] syscall_call+0x7/0xb
 =======================
Code: 74 1b 8b 15 30 93 48 c0 8d 43 04 89 53 04 89 42 04 a3 30 93 48 c0 c7 40 04 30 93 48 c0 5b 5e c3 53 89 c3 8d 40 1c 39 43 1c 74 04 <0f> 0b eb fe 8d 43 0c 39 43 0c 74 04 0f 0b eb fe 8d 43 04 39 43 
EIP: [<c0158f59>] locks_free_lock+0xb/0x3b SS:ESP 0068:f1309efc

-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.

[-- Attachment #2: config.gz --]
[-- Type: application/x-gzip, Size: 13087 bytes --]


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-14  6:02   ` Soeren Sonnenburg
@ 2007-09-13 21:22     ` Nick Piggin
  2007-09-15  9:47       ` Soeren Sonnenburg
  2007-09-24 20:21       ` Soeren Sonnenburg
  0 siblings, 2 replies; 12+ messages in thread
From: Nick Piggin @ 2007-09-13 21:22 UTC (permalink / raw)
  To: Soeren Sonnenburg; +Cc: linux-fsdevel, Linux Kernel

On Friday 14 September 2007 16:02, Soeren Sonnenburg wrote:
> On Thu, 2007-09-13 at 09:51 +1000, Nick Piggin wrote:
> > On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> > > Dear all,
> > >
> > > I've just seen this in dmesg on a AMD K7 / kernel 2.6.22.6 machine
> > > (config attached).
> > >
> > > Any ideas / which further information needed ?
> >
> > Thanks for the report. Is it reproducible? It seems like the
> > locks_free_lock call that's oopsing is coming from __posix_lock_file.
> > The actual function looks fine, but the lock being freed could have
> > been corrupted if there was slab corruption, or a hardware corruption.
> >
> > You could: try running memtest86+ overnight. And try the following
> > patch and turn on slab debugging then try to reproduce the problem.
>
> OK so far I've run memtest86+ 1.40 from freedos for 8 hrs (v1.70 hung on
> startup) - nothing.

Thanks.

> Could this corruption be caused by a pci card/driver? I am asking as I
> am using a new dvb-t card (asus p7131) and the oops happened after 5 or
> 6 days of uptime just about a day after watching some movie (very bad
> reception/lots of errors).

It could definitely be caused by that. Slab debugging plus my earlier
patch may help to narrow it down (as may stress testing with and without
the DVB card in action).
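
As a rough illustration of why slab debugging helps here: with CONFIG_DEBUG_SLAB the allocator fills freed objects with a poison byte and later checks that the pattern is still intact, so a write into freed memory tends to be reported much closer to where it happened. The following is a simplified userspace sketch of that idea. The helpers `poison_object()` and `find_poison_damage()` are hypothetical names, and the real allocator also uses red zones and a distinct end marker, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Poison byte the kernel's slab debugging writes into freed objects. */
#define POISON_FREE 0x6b

/* Fill a freed object with the poison pattern. */
static void poison_object(unsigned char *obj, size_t size)
{
	assert(obj != NULL);
	memset(obj, POISON_FREE, size);
}

/*
 * Return the offset of the first byte that no longer holds the poison
 * value, or -1 if the whole object is still intact.
 */
static long find_poison_damage(const unsigned char *obj, size_t size)
{
	assert(obj != NULL);
	for (size_t i = 0; i < size; i++)
		if (obj[i] != POISON_FREE)
			return (long)i;
	return -1;
}
```

A non-negative result means something wrote to the object while it was supposedly free, and the offset hints at which field was hit.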


> However this machine used to have uptimes of months before the dvb card
> was in there and the kernel version upgrade (don't know which version
> that was...).
>
> Anyway I am not sure if this is reproducible, but I will keep memtest
> running today and then proceed as you said...

OK. Don't put too much effort into memtest if it hasn't caught anything
by now -- it's really only exercising your CPU and memory, so even if it
is your video hardware, it probably won't find the problem.


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-12 23:51 ` Nick Piggin
@ 2007-09-14  6:02   ` Soeren Sonnenburg
  2007-09-13 21:22     ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Soeren Sonnenburg @ 2007-09-14  6:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, Linux Kernel

On Thu, 2007-09-13 at 09:51 +1000, Nick Piggin wrote:
> On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> > Dear all,
> >
> > I've just seen this in dmesg on a AMD K7 / kernel 2.6.22.6 machine
> > (config attached).
> >
> > Any ideas / which further information needed ?
> 
> Thanks for the report. Is it reproducible? It seems like the
> locks_free_lock call that's oopsing is coming from __posix_lock_file.
> The actual function looks fine, but the lock being freed could have
> been corrupted if there was slab corruption, or a hardware corruption.
> 
> You could: try running memtest86+ overnight. And try the following
> patch and turn on slab debugging then try to reproduce the problem.

OK, so far I've run memtest86+ 1.40 from FreeDOS for 8 hours (v1.70 hung
on startup): nothing found.

Could this corruption be caused by a PCI card or its driver? I am asking
because I am using a new DVB-T card (Asus P7131), and the oops happened
after 5 or 6 days of uptime, about a day after watching a movie (very bad
reception, lots of errors).

However, this machine used to have uptimes of months before the DVB card
was installed and the kernel was upgraded (I don't know which version it
ran before).

Anyway, I am not sure whether this is reproducible, but I will keep memtest
running today and then proceed as you said...

Thanks,
Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-13 21:22     ` Nick Piggin
@ 2007-09-15  9:47       ` Soeren Sonnenburg
  2007-09-15 10:22         ` Soeren Sonnenburg
  2007-09-24 20:21       ` Soeren Sonnenburg
  1 sibling, 1 reply; 12+ messages in thread
From: Soeren Sonnenburg @ 2007-09-15  9:47 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, Linux Kernel

On Fri, 2007-09-14 at 07:22 +1000, Nick Piggin wrote:
> On Friday 14 September 2007 16:02, Soeren Sonnenburg wrote:
> > On Thu, 2007-09-13 at 09:51 +1000, Nick Piggin wrote:
> > > On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> > > > Dear all,
> > > >
> > > > I've just seen this in dmesg on a AMD K7 / kernel 2.6.22.6 machine
> > > > (config attached).
> > > >
> > > > Any ideas / which further information needed ?
> > >
> > > Thanks for the report. Is it reproducible? It seems like the
> > > locks_free_lock call that's oopsing is coming from __posix_lock_file.
> > > The actual function looks fine, but the lock being freed could have
> > > been corrupted if there was slab corruption, or a hardware corruption.
> > >
> > > You could: try running memtest86+ overnight. And try the following
> > > patch and turn on slab debugging then try to reproduce the problem.
> >
> > OK so far I've run memtest86+ 1.40 from freedos for 8 hrs (v1.70 hung on
> > startup) - nothing.
> 
> Thanks.
> 
> > Could this corruption be caused by a pci card/driver? I am asking as I
> > am using a new dvb-t card (asus p7131) and the oops happened after 5 or
> > 6 days of uptime just about a day after watching some movie (very bad
> > reception/lots of errors).
> 
> It could be caused by that, definitely. slab debugging plus my earlier
> patch may help to narrow it down. (or stress testing with / without the
> dvb card in action).
> 
> 
> > However this machine used to have uptimes of months before the dvb card
> > was in there and the kernel version upgrade (don't know which version
> > that was...).
> >
> > Anyway I am not sure if this is reproducible, but I will keep memtest
> > running today and then proceed as you said...
> 
> OK. Don't put too much effort into memtest if it hasn't caught anything
> by now -- it's really only exercising your CPU and memory, so even if it
> is your video hardware, it probably won't find the problem.

Memtest did not find anything after 16 passes, so I finally stopped it,
applied your patch, and set

CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y

and booted into the new kernel.

A few hours later the machine hung (and was rebooted by the NMI watchdog),
so I restarted with the watchdog disabled. While compiling a kernel with a
"more minimal" config I got the following (I am not sure whether it is
related or the cause; note that I don't use a swap file or partition).

I would need more guidance on what to try now...

Thanks!
Soeren

swap_dup: Bad swap file entry 28c8af9d
VM: killing process cc1
Eeek! page_mapcount(page) went negative! (-1)
  page pfn = 36233
  page->flags = 40000834
  page->count = 2
  page->mapping = c1cfed14
  vma->vm_ops = run_init_process+0x3feff000/0x14
------------[ cut here ]------------
kernel BUG at mm/rmap.c:628!
invalid opcode: 0000 [#1]
Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tda1004x tuner ves1820 usb_storage usblp budget_ci budget_core saa7134 compat_ioctl32 dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common via_agp agpgart
CPU:    0
EIP:    0060:[<c0144487>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22.6 #2)
EIP is at page_remove_rmap+0xd4/0x101
eax: 00000000   ebx: c16c4660   ecx: 00000000   edx: 00000000
esi: d4570b30   edi: d6560a78   ebp: b7400000   esp: d6265eac
ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
Process cc1 (pid: 26095, ti=d6264000 task=d67af5b0 task.ti=d6264000)
Stack: c0422e26 c1cfed14 c16c4660 b729e000 c013f5b8 36233cce 00000000 d4570b30 
       d6265f20 00000000 00000001 f4ffcb70 f483a3b8 c04f44b8 00000000 ffffffff 
       f4ffcb70 00303ff4 b7c18000 00000000 d6265f20 f4a8c510 f483a3b8 00000009 
Call Trace:
 [<c013f5b8>] unmap_vmas+0x23f/0x404
 [<c0141c09>] exit_mmap+0x5f/0xc9
 [<c011923a>] mmput+0x1b/0x5e
 [<c011cf97>] do_exit+0x1a0/0x606
 [<c01135f8>] do_page_fault+0x49c/0x518
 [<c011e340>] __do_softirq+0x35/0x75
 [<c011315c>] do_page_fault+0x0/0x518
 [<c039aada>] error_code+0x6a/0x70
 =======================
Code: c0 74 0d 8b 50 08 b8 56 2e 42 c0 e8 ac f4 fe ff 8b 46 48 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 75 2e 42 c0 e8 91 f4 fe ff <0f> 0b eb fe 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69 
EIP: [<c0144487>] page_remove_rmap+0xd4/0x101 SS:ESP 0068:d6265eac
Fixing recursive fault but reboot is needed!


-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-15  9:47       ` Soeren Sonnenburg
@ 2007-09-15 10:22         ` Soeren Sonnenburg
  2007-09-16  8:15           ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Soeren Sonnenburg @ 2007-09-15 10:22 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, Linux Kernel

On Sat, 2007-09-15 at 09:47 +0000, Soeren Sonnenburg wrote:
> On Fri, 2007-09-14 at 07:22 +1000, Nick Piggin wrote:
> > On Friday 14 September 2007 16:02, Soeren Sonnenburg wrote:
> > > On Thu, 2007-09-13 at 09:51 +1000, Nick Piggin wrote:
> > > > On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> > > > > Dear all,
> > > > >
> > > > > I've just seen this in dmesg on a AMD K7 / kernel 2.6.22.6 machine
> > > > > (config attached).
> > > > >
> > > > > Any ideas / which further information needed ?
> > > >
> > > > Thanks for the report. Is it reproducible? It seems like the
> > > > locks_free_lock call that's oopsing is coming from __posix_lock_file.
> > > > The actual function looks fine, but the lock being freed could have
> > > > been corrupted if there was slab corruption, or a hardware corruption.
> > > >
> > > > You could: try running memtest86+ overnight. And try the following
> > > > patch and turn on slab debugging then try to reproduce the problem.
> > >
> > > OK so far I've run memtest86+ 1.40 from freedos for 8 hrs (v1.70 hung on
> > > startup) - nothing.
> > 
> > Thanks.
> > 
> > > Could this corruption be caused by a pci card/driver? I am asking as I
> > > am using a new dvb-t card (asus p7131) and the oops happened after 5 or
> > > 6 days of uptime just about a day after watching some movie (very bad
> > > reception/lots of errors).
> > 
> > It could be caused by that, definitely. slab debugging plus my earlier
> > patch may help to narrow it down. (or stress testing with / without the
> > dvb card in action).
> > 
> > 
> > > However this machine used to have uptimes of months before the dvb card
> > > was in there and the kernel version upgrade (don't know which version
> > > that was...).
> > >
> > > Anyway I am not sure if this is reproducible, but I will keep memtest
> > > running today and then proceed as you said...
> > 
> > OK. Don't put too much effort into memtest if it hasn't caught anything
> > by now -- it's really only exercising your CPU and memory, so even if it
> > is your video hardware, it probably won't find the problem.
> 
> Memtest did not find anything after 16 passes so I finally stopped it
> applied your patch and used
> 
> CONFIG_DEBUG_SLAB=y
> CONFIG_DEBUG_SLAB_LEAK=y
> 
> and booted into the new kernel.
> 
> A few hours later the machine hung (due to nmi watchdog rebooted), so I
> restarted and disabled the watchdog and while compiling a kernel with a
> ``more minimal'' config I got this (not sure whether this is related/the
> cause .../ note that I don't use a swapfile/partition).
> 
> I would need more guidance on what to try now...
> 
> Thanks!
> Soeren
> 
> swap_dup: Bad swap file entry 28c8af9d
> VM: killing process cc1
> Eeek! page_mapcount(page) went negative! (-1)
>   page pfn = 36233
>   page->flags = 40000834
>   page->count = 2
>   page->mapping = c1cfed14
>   vma->vm_ops = run_init_process+0x3feff000/0x14
> ------------[ cut here ]------------
> kernel BUG at mm/rmap.c:628!
> invalid opcode: 0000 [#1]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tda1004x tuner ves1820 usb_storage usblp budget_ci budget_core saa7134 compat_ioctl32 dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom ir_kbd_i2c videodev v4l2_common v4l1_compat ir_common via_agp agpgart
> CPU:    0
> EIP:    0060:[<c0144487>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.22.6 #2)
> EIP is at page_remove_rmap+0xd4/0x101
> eax: 00000000   ebx: c16c4660   ecx: 00000000   edx: 00000000
> esi: d4570b30   edi: d6560a78   ebp: b7400000   esp: d6265eac
> ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
> Process cc1 (pid: 26095, ti=d6264000 task=d67af5b0 task.ti=d6264000)
> Stack: c0422e26 c1cfed14 c16c4660 b729e000 c013f5b8 36233cce 00000000 d4570b30 
>        d6265f20 00000000 00000001 f4ffcb70 f483a3b8 c04f44b8 00000000 ffffffff 
>        f4ffcb70 00303ff4 b7c18000 00000000 d6265f20 f4a8c510 f483a3b8 00000009 
> Call Trace:
>  [<c013f5b8>] unmap_vmas+0x23f/0x404
>  [<c0141c09>] exit_mmap+0x5f/0xc9
>  [<c011923a>] mmput+0x1b/0x5e
>  [<c011cf97>] do_exit+0x1a0/0x606
>  [<c01135f8>] do_page_fault+0x49c/0x518
>  [<c011e340>] __do_softirq+0x35/0x75
>  [<c011315c>] do_page_fault+0x0/0x518
>  [<c039aada>] error_code+0x6a/0x70
>  =======================
> Code: c0 74 0d 8b 50 08 b8 56 2e 42 c0 e8 ac f4 fe ff 8b 46 48 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 75 2e 42 c0 e8 91 f4 fe ff <0f> 0b eb fe 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69 
> EIP: [<c0144487>] page_remove_rmap+0xd4/0x101 SS:ESP 0068:d6265eac
> Fixing recursive fault but reboot is needed!

Hmm, so I rebooted and again tried to

$ make

the new kernel, which again triggered this(?) BUG:

Any ideas?
Soeren.

Eeek! page_mapcount(page) went negative! (-1)
  page pfn = 18722
  page->flags = 40000000
  page->count = 1
  page->mapping = 00000000
  vma->vm_ops = run_init_process+0x3feff000/0x14
------------[ cut here ]------------
kernel BUG at mm/rmap.c:628!
invalid opcode: 0000 [#1]
Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_t
CPU:    0
EIP:    0060:[<c0144487>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22.6 #2)
EIP is at page_remove_rmap+0xd4/0x101
eax: 00000000   ebx: c130e440   ecx: 00000000   edx: 00000000
esi: f438b510   edi: f3328ac8   ebp: c130e440   esp: f28d5eec
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process cc1 (pid: 17957, ti=f28d4000 task=f60bb0d0 task.ti=f28d4000)
Stack: c0422e26 00000000 f3328ac8 00000002 c013f185 b76b2000 f438b510 f43013b8 
       c1a7c640 18722229 b76b2000 f3328ac8 f438b510 c014021d f3328ac8 f4360b74 
       f43013f8 18722229 00100073 b76b2000 f43013b8 f4360b74 00000100 f28d5f90 
Call Trace:
 [<c013f185>] do_wp_page+0x28a/0x35c
 [<c014021d>] __handle_mm_fault+0x626/0x6a4
 [<c0113368>] do_page_fault+0x20c/0x518
 [<c011315c>] do_page_fault+0x0/0x518
 [<c039aada>] error_code+0x6a/0x70
 =======================
Code: c0 74 0d 8b 50 08 b8 56 2e 42 c0 e8 ac f4 fe ff 8b 46 48 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 75 2e 42 c0 e8 91 f4 fe ff <0f> 0b eb fe 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69 
EIP: [<c0144487>] page_remove_rmap+0xd4/0x101 SS:ESP 0068:f28d5eec
Eeek! page_mapcount(page) went negative! (-2)
  page pfn = 18722
  page->flags = 40000004
  page->count = 1
  page->mapping = 00000000
  vma->vm_ops = run_init_process+0x3feff000/0x14
------------[ cut here ]------------
kernel BUG at mm/rmap.c:628!
invalid opcode: 0000 [#2]
Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_t
CPU:    0
EIP:    0060:[<c0144487>]    Not tainted VLI
EFLAGS: 00010246   (2.6.22.6 #2)
EIP is at page_remove_rmap+0xd4/0x101
eax: 00000000   ebx: c130e440   ecx: 00000000   edx: 00000000
esi: f438b510   edi: f3328ac8   ebp: b7800000   esp: f28d5d30
ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
Process cc1 (pid: 17957, ti=f28d4000 task=f60bb0d0 task.ti=f28d4000)
Stack: c0422e26 00000000 c130e440 b76b2000 c013f5b8 18722229 00000000 f438b510 
       f28d5da4 00000000 00000001 f4360b74 f43013b8 c04f44b8 00000000 ffffffff 
       f4360b74 00173c7a b7c03000 00000000 f28d5da4 f6754cf0 f43013b8 0000000b 
Call Trace:
 [<c013f5b8>] unmap_vmas+0x23f/0x404
 [<c0141c09>] exit_mmap+0x5f/0xc9
 [<c011923a>] mmput+0x1b/0x5e
 [<c011cf97>] do_exit+0x1a0/0x606
 [<c0104db5>] die+0x188/0x190
 [<c0105123>] do_invalid_op+0x0/0x8a
 [<c01051a4>] do_invalid_op+0x81/0x8a
 [<c0144487>] page_remove_rmap+0xd4/0x101
 [<c011ae03>] wake_up_klogd+0x33/0x35
 [<c01066e5>] timer_interrupt+0x1d/0x23
 [<c013445c>] handle_IRQ_event+0x1a/0x3f
 [<c039aada>] error_code+0x6a/0x70
 [<c0144487>] page_remove_rmap+0xd4/0x101
 [<c013f185>] do_wp_page+0x28a/0x35c
 [<c014021d>] __handle_mm_fault+0x626/0x6a4
 [<c0113368>] do_page_fault+0x20c/0x518
 [<c011315c>] do_page_fault+0x0/0x518
 [<c039aada>] error_code+0x6a/0x70
 =======================
Code: c0 74 0d 8b 50 08 b8 56 2e 42 c0 e8 ac f4 fe ff 8b 46 48 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 75 2e 42 c0 e8 91 f4 fe ff <0f> 0b eb fe 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69 
EIP: [<c0144487>] page_remove_rmap+0xd4/0x101 SS:ESP 0068:f28d5d30
Fixing recursive fault but reboot is needed!
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-15 10:22         ` Soeren Sonnenburg
@ 2007-09-16  8:15           ` Nick Piggin
  2007-09-17 13:43             ` Soeren Sonnenburg
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2007-09-16  8:15 UTC (permalink / raw)
  To: Soeren Sonnenburg, linux-mm; +Cc: linux-fsdevel, Linux Kernel

On Saturday 15 September 2007 20:22, Soeren Sonnenburg wrote:
> On Sat, 2007-09-15 at 09:47 +0000, Soeren Sonnenburg wrote:

> > Memtest did not find anything after 16 passes so I finally stopped it
> > applied your patch and used
> >
> > CONFIG_DEBUG_SLAB=y
> > CONFIG_DEBUG_SLAB_LEAK=y
> >
> > and booted into the new kernel.
> >
> > A few hours later the machine hung (due to nmi watchdog rebooted), so I
> > restarted and disabled the watchdog and while compiling a kernel with a
> > ``more minimal'' config I got this (not sure whether this is related/the
> > cause .../ note that I don't use a swapfile/partition).
> >
> > I would need more guidance on what to try now...
> >
> > Thanks!
> > Soeren
> >
> > swap_dup: Bad swap file entry 28c8af9d

Hmm, this is another telltale symptom of either bad hardware
or a memory scribbling bug.


> > VM: killing process cc1
> > Eeek! page_mapcount(page) went negative! (-1)
> >   page pfn = 36233
> >   page->flags = 40000834
> >   page->count = 2
> >   page->mapping = c1cfed14
> >   vma->vm_ops = run_init_process+0x3feff000/0x14

And these are probably related (it's just gone off and started
performing VM operations on the wrong page...).

Had you been using the dvb card since rebooting when you saw
these messages come up? What happens if you remove the card
from the system?


> > ------------[ cut here ]------------
> > kernel BUG at mm/rmap.c:628!
> > invalid opcode: 0000 [#1]
> > Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> > ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> > iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables
> > x_tables b44 ohci1394 ieee1394 nf_nat_ftp nf_nat nf_conntrack_ftp
> > nf_conntrack lcd tda827x saa7134_dvb dvb_pll video_buf_dvb tda1004x tuner
> > ves1820 usb_storage usblp budget_ci budget_core saa7134 compat_ioctl32
> > dvb_ttpci dvb_core saa7146_vv video_buf saa7146 ttpci_eeprom ir_kbd_i2c
> > videodev v4l2_common v4l1_compat ir_common via_agp agpgart CPU:    0
> > EIP:    0060:[<c0144487>]    Not tainted VLI
> > EFLAGS: 00010246   (2.6.22.6 #2)
> > EIP is at page_remove_rmap+0xd4/0x101
> > eax: 00000000   ebx: c16c4660   ecx: 00000000   edx: 00000000
> > esi: d4570b30   edi: d6560a78   ebp: b7400000   esp: d6265eac
> > ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
> > Process cc1 (pid: 26095, ti=d6264000 task=d67af5b0 task.ti=d6264000)
> > Stack: c0422e26 c1cfed14 c16c4660 b729e000 c013f5b8 36233cce 00000000
> > d4570b30 d6265f20 00000000 00000001 f4ffcb70 f483a3b8 c04f44b8 00000000
> > ffffffff f4ffcb70 00303ff4 b7c18000 00000000 d6265f20 f4a8c510 f483a3b8
> > 00000009 Call Trace:
> >  [<c013f5b8>] unmap_vmas+0x23f/0x404
> >  [<c0141c09>] exit_mmap+0x5f/0xc9
> >  [<c011923a>] mmput+0x1b/0x5e
> >  [<c011cf97>] do_exit+0x1a0/0x606
> >  [<c01135f8>] do_page_fault+0x49c/0x518
> >  [<c011e340>] __do_softirq+0x35/0x75
> >  [<c011315c>] do_page_fault+0x0/0x518
> >  [<c039aada>] error_code+0x6a/0x70
> >  =======================
> > Code: c0 74 0d 8b 50 08 b8 56 2e 42 c0 e8 ac f4 fe ff 8b 46 48 85 c0 74
> > 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 75 2e 42 c0 e8 91 f4 fe ff <0f> 0b eb
> > fe 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69 EIP: [<c0144487>]
> > page_remove_rmap+0xd4/0x101 SS:ESP 0068:d6265eac Fixing recursive fault
> > but reboot is needed!
>
> Hmmhh, so now I rebooted and again tried to
>
> $ make
>
> the new kernel which again triggered this(?) BUG:
>
> Any ideas?
> Soeren.
>
> Eeek! page_mapcount(page) went negative! (-1)
>   page pfn = 18722
>   page->flags = 40000000
>   page->count = 1
>   page->mapping = 00000000
>   vma->vm_ops = run_init_process+0x3feff000/0x14
> ------------[ cut here ]------------
> kernel BUG at mm/rmap.c:628!
> invalid opcode: 0000 [#1]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_t
> CPU:    0
> EIP:    0060:[<c0144487>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.22.6 #2)
> EIP is at page_remove_rmap+0xd4/0x101
> eax: 00000000   ebx: c130e440   ecx: 00000000   edx: 00000000
> esi: f438b510   edi: f3328ac8   ebp: c130e440   esp: f28d5eec
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process cc1 (pid: 17957, ti=f28d4000 task=f60bb0d0 task.ti=f28d4000)
> Stack: c0422e26 00000000 f3328ac8 00000002 c013f185 b76b2000 f438b510
> f43013b8 c1a7c640 18722229 b76b2000 f3328ac8 f438b510 c014021d f3328ac8
> f4360b74 f43013f8 18722229 00100073 b76b2000 f43013b8 f4360b74 00000100
> f28d5f90 Call Trace:
>  [<c013f185>] do_wp_page+0x28a/0x35c
>  [<c014021d>] __handle_mm_fault+0x626/0x6a4
>  [<c0113368>] do_page_fault+0x20c/0x518
>  [<c011315c>] do_page_fault+0x0/0x518
>  [<c039aada>] error_code+0x6a/0x70
>  =======================
> Code: c0 74 0d 8b 50 08 b8 56 2e 42 c0 e8 ac f4 fe ff 8b 46 48 85 c0 74 14
> 8b 40 10 85 c0 74 0d 8b 50 2c b8 75 2e 42 c0 e8 91 f4 fe ff <0f> 0b eb fe
> 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69 EIP: [<c0144487>]
> page_remove_rmap+0xd4/0x101 SS:ESP 0068:f28d5eec Eeek! page_mapcount(page)
> went negative! (-2)
>   page pfn = 18722
>   page->flags = 40000004
>   page->count = 1
>   page->mapping = 00000000
>   vma->vm_ops = run_init_process+0x3feff000/0x14
> ------------[ cut here ]------------
> kernel BUG at mm/rmap.c:628!
> invalid opcode: 0000 [#2]
> Modules linked in: ipt_iprange ipt_REDIRECT capi kernelcapi capifs
> ipt_REJECT xt_tcpudp xt_state xt_limit ipt_LOG ipt_MASQUERADE
> iptable_mangle iptable_nat nf_conntrack_ipv4 iptable_filter ip_tables x_t
> CPU:    0
> EIP:    0060:[<c0144487>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.22.6 #2)
> EIP is at page_remove_rmap+0xd4/0x101
> eax: 00000000   ebx: c130e440   ecx: 00000000   edx: 00000000
> esi: f438b510   edi: f3328ac8   ebp: b7800000   esp: f28d5d30
> ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
> Process cc1 (pid: 17957, ti=f28d4000 task=f60bb0d0 task.ti=f28d4000)
> Stack: c0422e26 00000000 c130e440 b76b2000 c013f5b8 18722229 00000000
> f438b510 f28d5da4 00000000 00000001 f4360b74 f43013b8 c04f44b8 00000000
> ffffffff f4360b74 00173c7a b7c03000 00000000 f28d5da4 f6754cf0 f43013b8
> 0000000b Call Trace:
>  [<c013f5b8>] unmap_vmas+0x23f/0x404
>  [<c0141c09>] exit_mmap+0x5f/0xc9
>  [<c011923a>] mmput+0x1b/0x5e
>  [<c011cf97>] do_exit+0x1a0/0x606
>  [<c0104db5>] die+0x188/0x190
>  [<c0105123>] do_invalid_op+0x0/0x8a
>  [<c01051a4>] do_invalid_op+0x81/0x8a
>  [<c0144487>] page_remove_rmap+0xd4/0x101
>  [<c011ae03>] wake_up_klogd+0x33/0x35
>  [<c01066e5>] timer_interrupt+0x1d/0x23
>  [<c013445c>] handle_IRQ_event+0x1a/0x3f
>  [<c039aada>] error_code+0x6a/0x70
>  [<c0144487>] page_remove_rmap+0xd4/0x101
>  [<c013f185>] do_wp_page+0x28a/0x35c
>  [<c014021d>] __handle_mm_fault+0x626/0x6a4
>  [<c0113368>] do_page_fault+0x20c/0x518
>  [<c011315c>] do_page_fault+0x0/0x518
>  [<c039aada>] error_code+0x6a/0x70
>  =======================
> Code: c0 74 0d 8b 50 08 b8 56 2e 42 c0 e8 ac f4 fe ff 8b 46 48 85 c0 74 14
> 8b 40 10 85 c0 74 0d 8b 50 2c b8 75 2e 42 c0 e8 91 f4 fe ff <0f> 0b eb fe
> 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69 EIP: [<c0144487>]
> page_remove_rmap+0xd4/0x101 SS:ESP 0068:f28d5d30 Fixing recursive fault but
> reboot is needed!
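
A note on reading the dumps above, as a rough sketch rather than anything
authoritative: in both reports the "page pfn" value reappears as the upper
bits of one word in the stack dump (36233cce and 18722229). Assuming that
word is the page table entry being torn down, and assuming 4 KiB pages on
32-bit x86 without PAE, the frame number is the entry shifted right by 12
bits. The helper name split_pte is made up for illustration.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages (i386, non-PAE)

def split_pte(word):
    """Split a 32-bit word into (pfn, low 12 bits), treating it as a PTE."""
    return word >> PAGE_SHIFT, word & ((1 << PAGE_SHIFT) - 1)

# (pfn printed by the "Eeek!" message, candidate word from the stack dump)
for reported_pfn, stack_word in ((0x36233, 0x36233cce), (0x18722, 0x18722229)):
    pfn, low_bits = split_pte(stack_word)
    print(f"stack word {stack_word:08x}: pfn={pfn:x} low bits={low_bits:03x} "
          f"matches report: {pfn == reported_pfn}")
```

If the word really is the page table entry, the low 12 bits are its flag
bits; interpreting them is left out here because it depends on the kernel
configuration.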


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-16  8:15           ` Nick Piggin
@ 2007-09-17 13:43             ` Soeren Sonnenburg
  0 siblings, 0 replies; 12+ messages in thread
From: Soeren Sonnenburg @ 2007-09-17 13:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-mm, linux-fsdevel, Linux Kernel

On Sun, 2007-09-16 at 18:15 +1000, Nick Piggin wrote:
> On Saturday 15 September 2007 20:22, Soeren Sonnenburg wrote:
> > On Sat, 2007-09-15 at 09:47 +0000, Soeren Sonnenburg wrote:
> 
> > > Memtest did not find anything after 16 passes so I finally stopped it
> > > applied your patch and used
> > >
> > > CONFIG_DEBUG_SLAB=y
> > > CONFIG_DEBUG_SLAB_LEAK=y
> > >
> > > and booted into the new kernel.
> > >
> > > A few hours later the machine hung (due to nmi watchdog rebooted), so I
[...]
> > > swap_dup: Bad swap file entry 28c8af9d
> 
> Hmm, this is another telltale symptom of either bad hardware
> or a memory scribbling bug.

Since this morning the machine has been running with the dvb driver for
that card unloaded...

Anyway, you convinced me that the saa7134_dvb driver (driving the asus
p7131) is at fault. As the driver seems huge, I wonder a) whether there
are other config debug options that could aid in debugging and b) what
the names of the io functions are that may cause this...

Thanks a lot!
Soeren
-- 
Sometimes, there's a moment as you're waking, when you become aware of
the real world around you, but you're still dreaming.


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-09-13 21:22     ` Nick Piggin
  2007-09-15  9:47       ` Soeren Sonnenburg
@ 2007-09-24 20:21       ` Soeren Sonnenburg
  1 sibling, 0 replies; 12+ messages in thread
From: Soeren Sonnenburg @ 2007-09-24 20:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-fsdevel, Linux Kernel


On Fri, 2007-09-14 at 07:22 +1000, Nick Piggin wrote:
> On Friday 14 September 2007 16:02, Soeren Sonnenburg wrote:
> > On Thu, 2007-09-13 at 09:51 +1000, Nick Piggin wrote:
> > > On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote:
> > > > Dear all,
> > > >
> > > > I've just seen this in dmesg on a AMD K7 / kernel 2.6.22.6 machine
> > > > (config attached).
> > > >
> > > > Any ideas / which further information needed ?
> > >
> > > Thanks for the report. Is it reproduceable? It seems like the
> > > locks_free_lock call that's oopsing is coming from __posix_lock_file.
> > > The actual function looks fine, but the lock being freed could have
> > > been corrupted if there was slab corruption, or a hardware corruption.
> > >
> > > You could: try running memtest86+ overnight. And try the following
> > > patch and turn on slab debugging then try to reproduce the problem.
> >
> > OK so far I've run memtest86+ 1.40 from freedos for 8 hrs (v1.70 hung on
> > startup) - nothing.
> 
> Thanks.
> 
> > Could this corruption be caused by a pci card/driver? I am asking as I
> > am using a new dvb-t card (asus p7131) and the oops happened after 5 or
> > 6 days of uptime just about a day after watching some movie (very bad
> > reception/lots of errors).
> 
> It could be caused by that, definitely. slab debugging plus my earlier
> patch may help to narrow it down. (or stress testing with / without the
> dvb card in action).

OK, it is the dvb card. I have 1 week of uptime now without any errors.
The only change is that the dvb driver (saa7146) is not loaded.

:(
Soeren


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
@ 2007-10-09 13:09 Tomasz Chmielewski
  2007-10-09 14:12 ` Soeren Sonnenburg
  2007-10-09 14:48 ` Hugh Dickins
  0 siblings, 2 replies; 12+ messages in thread
From: Tomasz Chmielewski @ 2007-10-09 13:09 UTC (permalink / raw)
  To: LKML, nickpiggin, kernel

Soeren Sonnenburg wrote:

>> Fixing recursive fault but reboot is needed!
> 
> Hmmhh, so now I rebooted and again tried to
> 
> $ make
> 
> the new kernel which again triggered this(?) BUG:

I had a similar issue with 2.6.22.9, but as I had a proprietary nvidia 
module loaded, I didn't report it. X was not enabled, though.

At this moment, the machine was spawning quite a bit of bash / awk etc. 
processes with large variables (50 MB or so), and used memory and CPU a lot.

Normally, it's my desktop machine, and it's rarely on for more than ~12 
hours, but this time, I left it on for a couple of days.

After this happened, these bash / awk processes died. After I restarted 
the script again, I lost ssh access to the machine, and I saw no more 
entries in the syslog. The machine was pingable though, and had its 
network sockets still open (I could telnet to ssh port).
I used SysRq to reboot the machine.



Oct  3 10:14:09 tomek kernel: Eeek! page_mapcount(page) went negative! (-1)
Oct  3 10:14:09 tomek kernel:   page pfn = 13aa
Oct  3 10:14:10 tomek kernel:   page->flags = 40000400
Oct  3 10:14:10 tomek kernel:   page->count = 1
Oct  3 10:14:10 tomek kernel:   page->mapping = 00000000
Oct  3 10:14:10 tomek kernel:   vma->vm_ops = 0x0
Oct  3 10:14:10 tomek kernel: ------------[ cut here ]------------
Oct  3 10:14:10 tomek syslogd: /dev/tty12: Interrupted system call
Oct  3 10:14:10 tomek kernel: kernel BUG at mm/rmap.c:628!
Oct  3 10:14:10 tomek kernel: invalid opcode: 0000 [#1]
Oct  3 10:14:10 tomek kernel: PREEMPT
Oct  3 10:14:10 tomek kernel: Modules linked in: nvidia(P) iptable_nat nf_nat ipt_ULOG ipt_recent af_packet nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables snd_seq_dummy x_tables snd_seq_oss snd_seq_midi_event snd_seq usblp loop dm_mod video thermal sbs fan container dock battery ac floppy cpufreq_conservative cpufreq_powersave processor snd_pcm_oss snd_mixer_oss snd_via82xx snd_ac97_codec ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ehci_hcd i2c_viapro i2c_core via_rhine uhci_hcd tsdev evdev usbcore via_agp agpgart 8139cp 8139too mii sg
Oct  3 10:14:10 tomek kernel: CPU:    0
Oct  3 10:14:10 tomek kernel: EIP:    0060:[<c015a434>]    Tainted: P     VLI
Oct  3 10:14:10 tomek kernel: EFLAGS: 00010246   (2.6.22.9-3 #1)
Oct  3 10:14:10 tomek kernel: EIP is at page_remove_rmap+0xd7/0x105
Oct  3 10:14:10 tomek kernel: eax: 00000000   ebx: c1027540   ecx: e8a2e000   edx: 00000002
Oct  3 10:14:10 tomek kernel: esi: c4226f20   edi: c3ab828c   ebp: e8a2fea4   esp: e8a2fe94
Oct  3 10:14:10 tomek kernel: ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
Oct  3 10:14:10 tomek kernel: Process bash (pid: 28682, ti=e8a2e000 task=db280000 task.ti=e8a2e000)
Oct  3 10:14:10 tomek kernel: Stack: c0346d1d 00000000 c1027540 0b4a3000 e8a2ff00 c0154edc e8a2e000 28b98fff
Oct  3 10:14:10 tomek kernel:        013aaa80 00000000 c4226f20 e8a2ff18 00000001 00000000 00000000 0b800000
Oct  3 10:14:10 tomek kernel:        c21330b4 c21330b4 d896d780 c03f2200 00000000 ffffffff 28b99000 00000000
Oct  3 10:14:10 tomek kernel: Call Trace:
Oct  3 10:14:10 tomek kernel:  [<c0104d19>] show_trace_log_lvl+0x1a/0x2f
Oct  3 10:14:10 tomek kernel:  [<c0104dc9>] show_stack_log_lvl+0x9b/0xa3
Oct  3 10:14:10 tomek kernel:  [<c0104fa8>] show_registers+0x1d7/0x30c
Oct  3 10:14:10 tomek kernel:  [<c01051db>] die+0xfe/0x1d6
Oct  3 10:14:10 tomek kernel:  [<c02be0af>] do_trap+0x89/0xa2
Oct  3 10:14:10 tomek kernel:  [<c0105605>] do_invalid_op+0x88/0x92
Oct  3 10:14:10 tomek kernel:  [<c02bde8a>] error_code+0x6a/0x70
Oct  3 10:14:10 tomek kernel:  [<c0154edc>] unmap_vmas+0x236/0x425
Oct  3 10:14:10 tomek kernel:  [<c0157a49>] exit_mmap+0x68/0xf0
Oct  3 10:14:10 tomek kernel:  [<c0117553>] mmput+0x1e/0x88
Oct  3 10:14:10 tomek kernel:  [<c011aa52>] exit_mm+0xbb/0xc1
Oct  3 10:14:10 tomek kernel:  [<c011be51>] do_exit+0x1f0/0x720
Oct  3 10:14:12 tomek kernel:  [<c011c3ef>] sys_exit_group+0x0/0x11
Oct  3 10:14:12 tomek kernel:  [<c011c3fe>] sys_exit_group+0xf/0x11
Oct  3 10:14:12 tomek kernel:  [<c0103da2>] sysenter_past_esp+0x5f/0x99
Oct  3 10:14:12 tomek kernel:  =======================
Oct  3 10:14:12 tomek kernel: Code: c0 74 0d 8b 50 08 b8 4d 6d 34 c0 e8 ea 0f fe ff 8b 46 48 85 c0 74 14 8b 40 10 85 c0 74 0d 8b 50 2c b8 6c 6d 34 c0 e8 cf 0f fe ff <0f> 0b eb fe 8b 53 10 8b 03 83 e2 01 c1 e8 1e f7 da 83 c2 04 69
Oct  3 10:14:12 tomek kernel: EIP: [<c015a434>] page_remove_rmap+0xd7/0x105 SS:ESP 0068:e8a2fe94
Oct  3 10:14:12 tomek kernel: Fixing recursive fault but reboot is needed!
Oct  3 10:14:12 tomek kernel: BUG: scheduling while atomic: bash/0x00000002/28682
Oct  3 10:14:12 tomek kernel: INFO: lockdep is turned off.
Oct  3 10:14:12 tomek kernel:  [<c0104d19>] show_trace_log_lvl+0x1a/0x2f
Oct  3 10:14:12 tomek kernel:  [<c01057b1>] show_trace+0x12/0x14
Oct  3 10:14:12 tomek kernel:  [<c0105837>] dump_stack+0x15/0x17
Oct  3 10:14:12 tomek kernel:  [<c02baf0e>] __sched_text_start+0x6e/0x5d5
Oct  3 10:14:12 tomek kernel:  [<c011bd52>] do_exit+0xf1/0x720
Oct  3 10:14:12 tomek kernel:  [<c01052ab>] die+0x1ce/0x1d6
Oct  3 10:14:12 tomek kernel:  [<c02be0af>] do_trap+0x89/0xa2
Oct  3 10:14:12 tomek kernel:  [<c0105605>] do_invalid_op+0x88/0x92
Oct  3 10:14:12 tomek kernel:  [<c02bde8a>] error_code+0x6a/0x70
Oct  3 10:14:12 tomek kernel:  [<c0154edc>] unmap_vmas+0x236/0x425
Oct  3 10:14:12 tomek kernel:  [<c0157a49>] exit_mmap+0x68/0xf0
Oct  3 10:14:12 tomek kernel:  [<c0117553>] mmput+0x1e/0x88
Oct  3 10:14:12 tomek kernel:  [<c011aa52>] exit_mm+0xbb/0xc1
Oct  3 10:14:12 tomek kernel:  [<c011be51>] do_exit+0x1f0/0x720
Oct  3 10:14:12 tomek kernel:  [<c011c3ef>] sys_exit_group+0x0/0x11
Oct  3 10:14:12 tomek kernel:  [<c011c3fe>] sys_exit_group+0xf/0x11
Oct  3 10:14:12 tomek kernel:  [<c0103da2>] sysenter_past_esp+0x5f/0x99
Oct  3 10:14:12 tomek kernel:  =======================
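
For comparing this trace with the earlier reports in the thread, here is a
small sketch in Python that splits each frame into its parts. It is only
an illustration: parse_trace_line is a made-up helper, and the pattern
assumes nothing beyond the "[<address>] symbol+0xoffset/0xsize" layout
seen in the lines above.

```python
import re

# Matches the symbolised part of a call-trace line such as
# " [<c0154edc>] unmap_vmas+0x236/0x425", with or without a syslog prefix.
TRACE_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\S+?)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

def parse_trace_line(line):
    """Return (address, symbol, offset, size), or None if no frame is found."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    addr, symbol, offset, size = m.groups()
    return int(addr, 16), symbol, int(offset, 16), int(size, 16)

sample = "Oct  3 10:14:10 tomek kernel:  [<c0154edc>] unmap_vmas+0x236/0x425"
addr, symbol, offset, size = parse_trace_line(sample)
# The frame address minus its offset gives the start of the function.
print(f"{symbol}: frame address {addr:08x}, function start {addr - offset:08x}, "
      f"function size {size:#x}")
```

Grouping frames by symbol rather than by address makes it easier to see
that two differently built kernels are in the same code path, since the
addresses and offsets differ between builds.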



-- 
Tomasz Chmielewski
http://blog.wpkg.org


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-10-09 13:09 Tomasz Chmielewski
@ 2007-10-09 14:12 ` Soeren Sonnenburg
  2007-10-09 14:48 ` Hugh Dickins
  1 sibling, 0 replies; 12+ messages in thread
From: Soeren Sonnenburg @ 2007-10-09 14:12 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: LKML, nickpiggin


On Tue, 2007-10-09 at 15:09 +0200, Tomasz Chmielewski wrote:
> Soeren Sonnenburg wrote:
> 
> >> Fixing recursive fault but reboot is needed!
> > 
> > Hmmhh, so now I rebooted and again tried to
> > 
> > $ make
> > 
> > the new kernel which again triggered this(?) BUG:
> 
> I had a similar issue with 2.6.22.9, but as I had a proprietary nvidia 
> module loaded, I didn't report it. X was not enabled, though.
> 
> At this moment, the machine was spawning quite a bit of bash / awk etc. 
> processes with large variables (50 MB or so), and used memory and CPU a lot.
> 
> Normally, it's my desktop machine, and it's rarely on for more than ~12 
> hours, but this time, I left it on for a couple of days.
> 
> After this happened, these bash / awk processes died. After I restarted 
> the script again, I lost ssh access to the machine, and I saw no more 

I am afraid you are seeing some kind of hardware failure/bad driver
behavior; only the symptom is the same.

I am saying this as I have an uptime of 22 days with that very same
machine now. And all I changed was unloading the asus p7131 dvb-t driver
(saa71xx).

Soeren


* Re: 2.6.22.6: kernel BUG at fs/locks.c:171
  2007-10-09 13:09 Tomasz Chmielewski
  2007-10-09 14:12 ` Soeren Sonnenburg
@ 2007-10-09 14:48 ` Hugh Dickins
  1 sibling, 0 replies; 12+ messages in thread
From: Hugh Dickins @ 2007-10-09 14:48 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: LKML, nickpiggin, kernel

On Tue, 9 Oct 2007, Tomasz Chmielewski wrote:
> 
> I had a similar issue with 2.6.22.9, but as I had a proprietary nvidia module
> loaded, I didn't report it. X was not enabled, though.

There is indeed a strong likelihood that yours is
related to that nvidia(P): please take it to them.

Hugh

> Oct  3 10:14:09 tomek kernel: Eeek! page_mapcount(page) went negative! (-1)
> Oct  3 10:14:09 tomek kernel:   page pfn = 13aa
> Oct  3 10:14:10 tomek kernel:   page->flags = 40000400
> Oct  3 10:14:10 tomek kernel:   page->count = 1
> Oct  3 10:14:10 tomek kernel:   page->mapping = 00000000
> Oct  3 10:14:10 tomek kernel:   vma->vm_ops = 0x0
> Oct  3 10:14:10 tomek kernel: ------------[ cut here ]------------
> Oct  3 10:14:10 tomek syslogd: /dev/tty12: Interrupted system call
> Oct  3 10:14:10 tomek kernel: kernel BUG at mm/rmap.c:628!
> Oct  3 10:14:10 tomek kernel: invalid opcode: 0000 [#1]
> Oct  3 10:14:10 tomek kernel: PREEMPT
> Oct  3 10:14:10 tomek kernel: Modules linked in: nvidia(P) ...
> Oct  3 10:14:10 tomek kernel: CPU:    0
> Oct  3 10:14:10 tomek kernel: EIP:    0060:[<c015a434>]    Tainted: P     VLI
> Oct  3 10:14:10 tomek kernel: EFLAGS: 00010246   (2.6.22.9-3 #1)
> Oct  3 10:14:10 tomek kernel: EIP is at page_remove_rmap+0xd7/0x105


end of thread, other threads:[~2007-10-09 14:49 UTC | newest]

Thread overview: 12+ messages
2007-09-13  9:20 2.6.22.6: kernel BUG at fs/locks.c:171 Soeren Sonnenburg
2007-09-12 23:51 ` Nick Piggin
2007-09-14  6:02   ` Soeren Sonnenburg
2007-09-13 21:22     ` Nick Piggin
2007-09-15  9:47       ` Soeren Sonnenburg
2007-09-15 10:22         ` Soeren Sonnenburg
2007-09-16  8:15           ` Nick Piggin
2007-09-17 13:43             ` Soeren Sonnenburg
2007-09-24 20:21       ` Soeren Sonnenburg
  -- strict thread matches above, loose matches on Subject: below --
2007-10-09 13:09 Tomasz Chmielewski
2007-10-09 14:12 ` Soeren Sonnenburg
2007-10-09 14:48 ` Hugh Dickins
