public inbox for linux-kernel@vger.kernel.org
* [BUG] in 2.4.17 after 10 days uptime
@ 2002-01-01  7:18 Ed Tomlinson
From: Ed Tomlinson @ 2002-01-01  7:18 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

Hi,

I started getting these bugs after about 10 days uptime.  There is a patch set for reiserfs applied
along with a few minor patches (ide-tape, disk stats for up to hdg).  The kernel is tainted by:

oscar# taint
filename:    /lib/modules/2.4.17/kernel/net/khttpd/khttpd.o
filename:    /lib/modules/2.4.17/kernel/net/netlink/netlink_dev.o
oscar# alias | grep taint
taint='modinfo `modprobe -l` | sed -ne "/^filename/h; /^license.*none/{g;p;}"'

Hardware has been stable and went over 30 days with 2.4.16 without problems.  The system is
based on Debian unstable (and it's living up to its name with KDE apps vs. libpng3 vs. libqt2 problems).

Happy New Year!
Ed Tomlinson

[-- Attachment #2: bug.log --]
[-- Type: text/plain, Size: 7816 bytes --]

ksymoops 2.4.3 on i586 2.4.17.  Options used
     -V (default)
     -k 20011231063052.ksyms (specified)
     -l 20011230063029.modules (specified)
     -o /lib/modules/2.4.17/ (default)
     -m /boot/System.map-2.4.17 (default)

Dec 31 01:32:34 oscar kernel: kernel BUG at page_alloc.c:207!
Dec 31 01:32:34 oscar kernel: invalid operand: 0000
Dec 31 01:32:34 oscar kernel: CPU:    0
Dec 31 01:32:34 oscar kernel: EIP:    0010:[rmqueue+474/544]    Tainted: P
Dec 31 01:32:34 oscar kernel: EFLAGS: 00010286
Dec 31 01:32:34 oscar kernel: eax: 00000020   ebx: c022ddc0   ecx: c022ca60   edx: 000045d4
Dec 31 01:32:34 oscar kernel: esi: c1295540   edi: 00000000   ebp: 00000001   esp: cb1e5e54
Dec 31 01:32:34 oscar kernel: ds: 0018   es: 0018   ss: 0018
Dec 31 01:32:34 oscar kernel: Process setiathome (pid: 29402, stackpage=cb1e5000)
Dec 31 01:32:34 oscar kernel: Stack: c01f92da 000000cf c022df1c 00000100 00000000 c022dda8 00009555 00000286
Dec 31 01:32:34 oscar kernel:        00000000 c022dda8 c0127a9b 000001d2 cacf0720 00000001 cacf0720 c022df18
Dec 31 01:32:34 oscar kernel:        000001d2 ffffff00 c01277c6 00104025 c011eff8 410c7000 cacf0720 00000001
Dec 31 01:32:34 oscar kernel: Call Trace: [__alloc_pages+159/352] [_alloc_pages+22/24] [do_anonymous_page+48/164] [do_no_page+51/284] [handle_mm_fault+82/180]
Dec 31 01:32:34 oscar kernel: Code: 0f 0b 83 c4 08 90 8b 46 18 a8 80 74 19 68 d1 00 00 00 68 da
Using defaults from ksymoops -t elf32-i386 -a i386

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a   
Code;  00000002 Before first symbol
   2:   83 c4 08                  add    $0x8,%esp
Code;  00000004 Before first symbol
   5:   90                        nop    
Code;  00000006 Before first symbol
   6:   8b 46 18                  mov    0x18(%esi),%eax
Code;  00000008 Before first symbol
   9:   a8 80                     test   $0x80,%al
Code;  0000000a Before first symbol
   b:   74 19                     je     26 <_EIP+0x26> 00000026 Before first symbol
Code;  0000000c Before first symbol
   d:   68 d1 00 00 00            push   $0xd1
Code;  00000012 Before first symbol
  12:   68 da 00 00 00            push   $0xda

Dec 31 06:25:25 oscar kernel: kernel BUG at page_alloc.c:76!
Dec 31 06:25:25 oscar kernel: invalid operand: 0000
Dec 31 06:25:25 oscar kernel: CPU:    0
Dec 31 06:25:25 oscar kernel: EIP:    0010:[__free_pages_ok+52/672]    Tainted: P
Dec 31 06:25:25 oscar kernel: EFLAGS: 00010282
Dec 31 06:25:25 oscar kernel: eax: 0000001f   ebx: c1295540   ecx: c022ca60   edx: 00004b0b
Dec 31 06:25:25 oscar kernel: esi: c1295540   edi: 00000000   ebp: 00000000   esp: c183bf1c
Dec 31 06:25:25 oscar kernel: ds: 0018   es: 0018   ss: 0018
Dec 31 06:25:25 oscar kernel: Process kswapd (pid: 4, stackpage=c183b000)
Dec 31 06:25:25 oscar kernel: Stack: c01f92da 0000004c c1295540 c1295540 00000000 00000009 c012eb69 c1295540
Dec 31 06:25:25 oscar kernel:        000001d0 c129555c c0126561 c0127bc4 c129555c c0126c07 00000020 000001d0
Dec 31 06:25:25 oscar kernel:        00000020 00000006 c183a000 00000200 00001bf3 000001d0 c022dda8 c0126e17
Dec 31 06:25:25 oscar kernel: Call Trace: [try_to_release_page+81/88] [lru_cache_del+5/20] [page_cache_release+44/48] [shrink_cache+555/772] [shrink_caches+83/132]
Dec 31 06:25:25 oscar kernel: Code: 0f 0b 83 c4 08 8d b4 26 00 00 00 00 89 f0 2b 05 8c 36 28 c0

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a   
Code;  00000002 Before first symbol
   2:   83 c4 08                  add    $0x8,%esp
Code;  00000004 Before first symbol
   5:   8d b4 26 00 00 00 00      lea    0x0(%esi,1),%esi
Code;  0000000c Before first symbol
   c:   89 f0                     mov    %esi,%eax
Code;  0000000e Before first symbol
   e:   2b 05 8c 36 28 c0         sub    0xc028368c,%eax

Dec 31 10:13:29 oscar kernel: kernel BUG at page_alloc.c:76!
Dec 31 10:13:29 oscar kernel: invalid operand: 0000
Dec 31 10:13:29 oscar kernel: CPU:    0
Dec 31 10:13:29 oscar kernel: EIP:    0010:[__free_pages_ok+52/672]    Tainted: P
Dec 31 10:13:29 oscar kernel: EFLAGS: 00010286
Dec 31 10:13:29 oscar kernel: eax: 0000001f   ebx: c1295540   ecx: c022ca60   edx: 00004e61
Dec 31 10:13:29 oscar kernel: esi: c1295540   edi: c4b6fcb0   ebp: 00000000   esp: d7eb5ef0
Dec 31 10:13:29 oscar kernel: ds: 0018   es: 0018   ss: 0018
Dec 31 10:13:29 oscar kernel: Process galeon-bin (pid: 9915, stackpage=d7eb5000)
Dec 31 10:13:29 oscar kernel: Stack: c01f92da 0000004c c1295540 00022000 c4b6fcb0 4130a000 c1040000 c022ddc0
Dec 31 10:13:29 oscar kernel:        00000213 ffffffff 00007958 c0127bc4 c1295540 c0127fdd c1295540 c011df3a
Dec 31 10:13:29 oscar kernel:        c1295540 000f6000 c011e2f3 0a555025 c75a12c0 cacf0a40 40f0a000 00319000
Dec 31 10:13:29 oscar kernel: Call Trace: [page_cache_release+44/48] [free_page_and_swap_cache+49/52] [__free_pte+58/64] [zap_page_range+387/556] [exit_mmap+183/300]
Dec 31 10:13:29 oscar kernel: Code: 0f 0b 83 c4 08 8d b4 26 00 00 00 00 89 f0 2b 05 8c 36 28 c0

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   0f 0b                     ud2a   
Code;  00000002 Before first symbol
   2:   83 c4 08                  add    $0x8,%esp
Code;  00000004 Before first symbol
   5:   8d b4 26 00 00 00 00      lea    0x0(%esi,1),%esi
Code;  0000000c Before first symbol
   c:   89 f0                     mov    %esi,%eax
Code;  0000000e Before first symbol
   e:   2b 05 8c 36 28 c0         sub    0xc028368c,%eax

Jan  1 01:44:57 oscar kernel:  sda:<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
Jan  1 01:44:57 oscar kernel: e2b0729f
Jan  1 01:44:57 oscar kernel: *pde = 00000000
Jan  1 01:44:57 oscar kernel: Oops: 0000
Jan  1 01:44:57 oscar kernel: CPU:    0
Jan  1 01:44:57 oscar kernel: EIP:    0010:[softdog:__insmod_softdog_O/lib/modules/2.4.17/kernel/drivers/char/s+-593249/96]
Jan  1 01:44:57 oscar kernel: EFLAGS: 00010287
Jan  1 01:44:57 oscar kernel: eax: 00000000   ebx: c6e18900   ecx: 00000005   edx: 00001000
Jan  1 01:44:57 oscar kernel: esi: 00000000   edi: 00000000   ebp: d1d24000   esp: cfa1bebc
Jan  1 01:44:57 oscar kernel: ds: 0018   es: 0018   ss: 0018
Jan  1 01:44:57 oscar kernel: Process usb-storage-0 (pid: 8948, stackpage=cfa1b000)
Jan  1 01:44:57 oscar kernel: Stack: cecad800 00000000 df4b9e00 df4b9e00 00000009 00000005 cecad858 00000009
Jan  1 01:44:57 oscar kernel:        00086463 00800003 00000a01 00000000 00000000 02008000 0000001f 00000000
Jan  1 01:44:57 oscar kernel:        00000000 00000000 00000000 00000000 00000000 00000000 32203030 30302030
Jan  1 01:44:57 oscar kernel: Call Trace: [schedule+754/796] [softdog:__insmod_softdog_O/lib/modules/2.4.17/kernel/drivers/char/s+-604519/96] [__down_interruptible+183/196] [softdog:__insmod_softdog_O/lib/modules/2.4.17/kernel/drivers/char/s+-606259/96]
Jan  1 01:44:57 oscar kernel: Code: 8b 14 b8 85 d2 75 07 8b 43 1c 39 38 75 3f 8b 9c 24 9c 00 00

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   8b 14 b8                  mov    (%eax,%edi,4),%edx
Code;  00000002 Before first symbol
   3:   85 d2                     test   %edx,%edx
Code;  00000004 Before first symbol
   5:   75 07                     jne    e <_EIP+0xe> 0000000e Before first symbol
Code;  00000006 Before first symbol
   7:   8b 43 1c                  mov    0x1c(%ebx),%eax
Code;  0000000a Before first symbol
   a:   39 38                     cmp    %edi,(%eax)
Code;  0000000c Before first symbol
   c:   75 3f                     jne    4d <_EIP+0x4d> 0000004c Before first symbol
Code;  0000000e Before first symbol
   e:   8b 9c 24 9c 00 00 00      mov    0x9c(%esp,1),%ebx



* Re: [BUG] in 2.4.17 after 10 days uptime
@ 2002-01-01 19:56 Benjamin LaHaise
From: Benjamin LaHaise @ 2002-01-01 19:56 UTC (permalink / raw)
  To: Ed Tomlinson, Marcelo Tosatti; +Cc: linux-kernel

On Tue, Jan 01, 2002 at 02:18:01AM -0500, Ed Tomlinson wrote:
> I started getting these bugs after about 10 days uptime.  There is a patch 
> set for reiserfs applied along with a few minor patches (ide-tape, disk 
> stats for up to hdg).  The kernel is tainted by:

Expected BUG.  Here's the fix.  Marcelo, this is what we discussed previously: 
parts of the kernel that grab a temporary reference to a page will frequently 
not use page_cache_release as the page may never have been part of the page 
cache.  This shows up with the network stack in sendpage() as well as many 
other paths.  Please apply.

:r ~/patches/v2.4.17-pglru.diff
diff -urN v2.4.17/include/linux/pagemap.h v2.4.17-pglru/include/linux/pagemap.h
--- v2.4.17/include/linux/pagemap.h	Thu Dec 20 19:30:25 2001
+++ v2.4.17-pglru/include/linux/pagemap.h	Tue Jan  1 14:46:04 2002
@@ -29,7 +29,7 @@
 #define PAGE_CACHE_ALIGN(addr)	(((addr)+PAGE_CACHE_SIZE-1)&PAGE_CACHE_MASK)
 
 #define page_cache_get(x)	get_page(x)
-extern void FASTCALL(page_cache_release(struct page *));
+#define page_cache_release(x)	__free_page(x)
 
 static inline struct page *page_cache_alloc(struct address_space *x)
 {
diff -urN v2.4.17/kernel/ksyms.c v2.4.17-pglru/kernel/ksyms.c
--- v2.4.17/kernel/ksyms.c	Tue Jan  1 14:09:35 2002
+++ v2.4.17-pglru/kernel/ksyms.c	Tue Jan  1 14:46:55 2002
@@ -95,7 +95,6 @@
 EXPORT_SYMBOL(alloc_pages_node);
 EXPORT_SYMBOL(__get_free_pages);
 EXPORT_SYMBOL(get_zeroed_page);
-EXPORT_SYMBOL(page_cache_release);
 EXPORT_SYMBOL(__free_pages);
 EXPORT_SYMBOL(free_pages);
 EXPORT_SYMBOL(num_physpages);
diff -urN v2.4.17/mm/page_alloc.c v2.4.17-pglru/mm/page_alloc.c
--- v2.4.17/mm/page_alloc.c	Mon Nov 26 23:43:08 2001
+++ v2.4.17-pglru/mm/page_alloc.c	Tue Jan  1 14:44:59 2002
@@ -70,6 +70,12 @@
 	struct page *base;
 	zone_t *zone;
 
+	/* Yes, think what happens when other parts of the kernel take 
+	 * a reference to a page in order to pin it for io. -ben
+	 */
+	if (PageLRU(page))
+		lru_cache_del(page);
+
 	if (page->buffers)
 		BUG();
 	if (page->mapping)
@@ -426,15 +432,6 @@
 	return 0;
 }
 
-void page_cache_release(struct page *page)
-{
-	if (!PageReserved(page) && put_page_testzero(page)) {
-		if (PageLRU(page))
-			lru_cache_del(page);
-		__free_pages_ok(page, 0);
-	}
-}
-
 void __free_pages(struct page *page, unsigned int order)
 {
 	if (!PageReserved(page) && put_page_testzero(page))
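A minimal userspace model of what the patch changes, assuming a page can be reduced to a reference count and a flag standing in for PG_lru; struct sim_page and the sim_* helpers are hypothetical names, not kernel API. With the LRU unlink moved into __free_pages_ok(), the page ends up freed and off the LRU no matter which holder drops the last reference:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a reference count plus a flag
 * playing the role of PG_lru. Not kernel code. */
struct sim_page {
	int count;   /* references held, like page->count */
	bool lru;    /* true while the page sits on the simulated LRU list */
	bool freed;  /* true once the page is handed back to the allocator */
};

static int sim_lru_pages;  /* length of the simulated LRU list */

static void sim_lru_cache_add(struct sim_page *p)
{
	assert(!p->lru);
	p->lru = true;
	sim_lru_pages++;
}

static void sim_lru_cache_del(struct sim_page *p)
{
	assert(p->lru);
	p->lru = false;
	sim_lru_pages--;
}

/* Patched __free_pages_ok(): a page still on the LRU is unlinked here
 * instead of reaching the allocator's sanity checks with PG_lru set. */
static void sim_free_pages_ok(struct sim_page *p)
{
	if (p->lru)
		sim_lru_cache_del(p);
	p->freed = true;
}

/* With the patch, page_cache_release() is simply __free_page(), so every
 * holder drops its reference through this single function. */
static void sim_put_page(struct sim_page *p)
{
	assert(p->count > 0);
	if (--p->count == 0)
		sim_free_pages_ok(p);
}

/* One page-cache reference plus a temporary pin taken by another
 * subsystem (e.g. sendpage() pinning the page for I/O). The page cache
 * lets go first, so the pin holder drops the last reference. */
static struct sim_page sim_pin_then_release(void)
{
	struct sim_page page = { .count = 1 };
	sim_lru_cache_add(&page);
	page.count++;         /* page_cache_get() by the pinning subsystem */
	sim_put_page(&page);  /* page cache drops its reference: 2 -> 1 */
	sim_put_page(&page);  /* pin holder drops the last one: 1 -> 0 */
	return page;
}
```

In this model, sim_pin_then_release() returns a page with freed set, lru cleared and a count of zero, and leaves the simulated LRU list empty.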


* Re: [BUG] in 2.4.17 after 10 days uptime
@ 2002-01-07 18:28 Marcelo Tosatti
From: Marcelo Tosatti @ 2002-01-07 18:28 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Ed Tomlinson, linux-kernel



On Tue, 1 Jan 2002, Benjamin LaHaise wrote:

> On Tue, Jan 01, 2002 at 02:18:01AM -0500, Ed Tomlinson wrote:
> > I started getting these bugs after about 10 days uptime.  There is a patch 
> > set for reiserfs applied along with a few minor patches (ide-tape, disk 
> > stats for up to hdg).  The kernel is tainted by:
> 
> Expected BUG.  Here's the fix.  Marcelo, this is what we discussed previously: 
> parts of the kernel that grab a temporary reference to a page will frequently 
> not use page_cache_release as the page may never have been part of the page 
> cache.  This shows up with the network stack in sendpage() as well as many 
> other paths.  Please apply.

Ben, 

I suppose you're talking about the following case: 

pagecache code has LRU page

			    nonpagecache code does page_cache_get()

pagecache code does page_cache_release()

			   nonpagecache code does __free_pages_ok()
			   on LRU page: BOOM.	

Is my thinking correct ?

If so, I don't see why Ed's trace BUGs at rmqueue first: It should bug at
__free_pages_ok() PageLRU check.
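A rough userspace replay of this timeline under the unpatched 2.4.17 rules, modelled on the page_cache_release() body that the patch above removes. The old_page type and old_* helpers are hypothetical, PageReserved() handling is left out, and the boom flag only marks where the PageLRU check in __free_pages_ok() would fire. The outcome depends on which holder drops the last reference:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page under unpatched 2.4.17 rules. */
struct old_page {
	int count;  /* references held */
	bool lru;   /* stands in for PG_lru */
	bool boom;  /* set where the PageLRU check in __free_pages_ok() fires */
};

/* Unpatched __free_pages_ok(): the page must already be off the LRU. */
static void old_free_pages_ok(struct old_page *p)
{
	if (p->lru)
		p->boom = true;
}

/* Unpatched page_cache_release(): the only path that unlinks from the LRU. */
static void old_page_cache_release(struct old_page *p)
{
	if (--p->count == 0) {
		p->lru = false;  /* lru_cache_del() */
		old_free_pages_ok(p);
	}
}

/* Unpatched __free_pages(): drops the reference with no LRU handling. */
static void old_free_pages(struct old_page *p)
{
	if (--p->count == 0)
		old_free_pages_ok(p);
}

/* The page cache holds an LRU page and non-pagecache code pins it with
 * page_cache_get(). Returns true if the final free would hit the
 * PageLRU check. */
static bool old_replay(bool pagecache_releases_first)
{
	struct old_page page = { .count = 1, .lru = true };
	page.count++;                       /* nonpagecache: page_cache_get() */
	if (pagecache_releases_first) {
		old_page_cache_release(&page);  /* 2 -> 1: page stays on the LRU */
		old_free_pages(&page);          /* 1 -> 0 with PG_lru set: BOOM */
	} else {
		old_free_pages(&page);          /* 2 -> 1: nothing freed yet */
		old_page_cache_release(&page);  /* 1 -> 0: unlinked, then freed */
	}
	assert(page.count == 0);            /* both references were dropped */
	return page.boom;
}
```

Here old_replay(true) returns true (the BOOM case), while old_replay(false) returns false because page_cache_release() then unlinks the page before freeing it.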




* Re: [BUG] in 2.4.17 after 10 days uptime
@ 2002-01-08  2:24 Benjamin LaHaise
From: Benjamin LaHaise @ 2002-01-08  2:24 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Ed Tomlinson, linux-kernel

On Mon, Jan 07, 2002 at 04:28:12PM -0200, Marcelo Tosatti wrote:
> Is my thinking correct ?

Yes, that's the case I was thinking of.  sendfile() and tux are potential 
triggers of this.

> If so, I don't see why Ed's trace BUGs at rmqueue first: It should bug at
> __free_pages_ok() PageLRU check.

Hmm, as we've discussed on irc, there are some other nasty implications of 
the __free_pages code interacting with shrink_cache without this patch.  I'm 
not certain that explains it, but it could.  Ed, have you seen this oops 
again?  What kind of load is the machine under?

		-ben
-- 
Fish.


* Re: [BUG] in 2.4.17 after 10 days uptime
@ 2002-01-08  3:38 Ed Tomlinson
From: Ed Tomlinson @ 2002-01-08  3:38 UTC (permalink / raw)
  To: Benjamin LaHaise, Marcelo Tosatti; +Cc: linux-kernel

On January 7, 2002 09:24 pm, Benjamin LaHaise wrote:
> On Mon, Jan 07, 2002 at 04:28:12PM -0200, Marcelo Tosatti wrote:
> > Is my thinking correct ?
>
> Yes, that's the case I was thinking of.  sendfile() and tux are potential
> triggers of this.
>
> > If so, I don't see why Ed's trace BUGs at rmqueue first: It should bug at
> > __free_pages_ok() PageLRU check.
>
> Hmm, as we've discussed on irc, there are some other nasty implications of
> the __free_pages code interacting with shrink_cache without this patch. 
> I'm not certain that explains it, but it could.  Ed, have you seen this
> oops again?  What kind of load is the machine under?

After applying your patch I ran for another couple of days on 2.4.18-pre1 without
seeing any problems.  The system is fairly lightly loaded: it runs a caching
news server and Java apps, and acts as a masq gateway/squid cache for the rest
of the boxes here (home network).  It is also the box I use...

Ed
