public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Andrew Morton <akpm@osdl.org>, frankvm@xs4all.nl
Cc: sdake@mvista.com, liste@jordet.nu, linux-kernel@vger.kernel.org,
	sct@redhat.com
Subject: Re: [2.4] page->buffers vanished in journal_try_to_free_buffers()
Date: Sat, 19 Jun 2004 16:48:49 -0300	[thread overview]
Message-ID: <20040619194849.GA2843@logos.cnet> (raw)
In-Reply-To: <20040617200859.7fada9fe.akpm@osdl.org>

On Thu, Jun 17, 2004 at 08:08:59PM -0700, Andrew Morton wrote:
> Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
> >
> >  > tmp (page->buffers) above is null.  b_this_page is at offset 0x28 (the accessed address in the oops).  This means that
> >  > page->buffers is set to null by some other routine which results in the oops.
> >  > 
> >  > I read the page allocate code
> >  > (ext3_read_page->block_read_full_page->create_emty_buffers->create_buffers), and it appears that it is not possible to allocate a page->buffers value of zero in the allocate function.  I am having difficulty reproducing and cannot debug further, however.  Can page->buffers be set to zero somewhere else?  
> >  >Perhaps kswapd and some other thread are racing on the free?
> > 
> >  Steve, 
> > 
> >  Hum, I'm starting to believe we might have an issue here.
> > 
> >  Searching lkml archives I find other similar oopses at the same place 
> >  (trying to access 00000028, tmp->b_this_page), as you said.
> > 
 >  However I wonder what other kernel codepath could remove the page buffers
> >  under us, the page MUST be locked here. In the backtrace above the page 
> >  is locked by shrink_cache(). And with the page locked, we guarantee the VM
> >  freeing routines (shrink_cache) wont try to mess with the page.
> > 
> >  Can you reproduce the oopsen?
> > 
> >  Stephen, Andrew, do you have any idea how the buffers could have vanished
> >  under us with the page locked? That should not be possible. 
> > 
> >  I dont see how this "page->buffers = NULL" could be caused by hardware problem, 
> >  which is usually one or two bit flip.
> 
> It's a bit odd.  The page is definitely locked, and definitely had non-null
> ->buffers a few tens of instructions beforehand.
> 
> Is this an SMP machine?

Steven, did you see the NULL ->b_this_page on SMP or UP?

Stian Jordet had an SMP server, but he also was seeing oopses with v2.6:

 kernel BUG at mm/page_alloc.c:201!
 invalid operand: 0000 [#1]
 CPU:    0
 EIP:    0060:[free_pages_bulk+482/512]    Not tainted
 EIP is at free_pages_bulk+0x1e2/0x200
 eax: 00000001   ebx: c00609c8   ecx: 00000000   edx: \
                666026a5
 esi: 666026a4   edi: ffffffff   ebp: 33301352   esp: \
                c86d5d90
 Process mrtg (pid: 26804, threadinfo=c86d4000 \
                task=c9b860c0)
 Call Trace:
  [free_hot_cold_page+217/240] \
                free_hot_cold_page+0xd9/0xf0
  [do_generic_mapping_read+714/1008] \
                do_generic_mapping_read+0x2ca/0x3f0
  [file_read_actor+0/256] file_read_actor+0x0/0x100
  [__generic_file_aio_read+454/512] \
                __generic_file_aio_read+0x1c6/0x200
  [file_read_actor+0/256] file_read_actor+0x0/0x100
  [generic_file_aio_read+91/128] \
                generic_file_aio_read+0x5b/0x80
  [do_sync_read+137/192] do_sync_read+0x89/0xc0
  [do_page_fault+300/1328] do_page_fault+0x12c/0x530
  [do_brk+324/560] do_brk+0x144/0x230
  [vfs_read+184/304] vfs_read+0xb8/0x130
  [sys_read+66/112] sys_read+0x42/0x70
  [syscall_call+7/11] syscall_call+0x7/0xb
 
and different oopses on v2.4, including sync_page_buffers (also NULL+offset access): 

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000021
c0132e86

eax: 00000000   ebx: 00000009   ecx: 000001d2   edx: 00000012
esi: 00000000   edi: c17e38c0   ebp: c1047a00   esp: c86cbdb4

>>EIP; c0132e86 <sync_page_buffers+e/a4>   <=====

Trace; c0132fdc <try_to_free_buffers+c0/ec>

Code;  c0132e86 <sync_page_buffers+e/a4>
00000000 <_EIP>:
Code;  c0132e86 <sync_page_buffers+e/a4>   <=====
   0:   f6 43 18 06               testb  $0x6,0x18(%ebx)   <=====
Code;  c0132e8a <sync_page_buffers+12/a4>

and the journal_try_to_free_buffers() one:

Unable to handle kernel NULL pointer dereference at virtual address 00000028
c015e3a2
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c015e3a2>]    Not tainted
EFLAGS: 00010203

eax: 0100004d   ebx: 00000000   ecx: 000001d2   edx: 00000000

Code;  c015e3a2 <journal_try_to_free_buffers+5a/98>
00000000 <_EIP>:
Code;  c015e3a2 <journal_try_to_free_buffers+5a/98>   <=====
   0:   8b 5b 28                  mov    0x28(%ebx),%ebx   <=====
Code;  c015e3a5 <journal_try_to_free_buffers+5d/98>


He upgraded the box and stopped seeing the crashes, running 
recent v2.6. 

However, he also mentioned that his crashes started after upgrading 
from v2.4.19->2.4.22. Should search the diff between them looking for 
anything suspicious.

I can't figure out from the archived reports if this is UP or SMP only. 

Frank van Maarseveen has also seen the journal_try_to_free_buffers() NULL 
b_this_page. Frank, were you running SMP or UP when you reported the oops 
with 2.4.23? 

> One possibility is that we died on the second pass around the loop:
> page->buffers points at a buffer_head which has a NULL ->b_this_page.  But
> I cannot suggest how ->b_this_page could have been zapped.

Oh, yes, indeed. 

Maybe adding this (untested) to v2.4 mainline helps? Comments?

--- transaction.c.orig	2004-06-19 15:21:32.861148560 -0300
+++ transaction.c	2004-06-19 15:23:18.214132472 -0300
@@ -1694,6 +1694,24 @@
 	return 0;
 }
 
+void debug_page(struct page *p)
+{
+	struct buffer_head *bh;
+
+	bh = p->buffers;
+
+	printk(KERN_ERR "%s: page index:%u count:%d flags:%x\n", __FUNCTION__,
+		,p->index , atomic_read(&p->count), p->flags);
+
+	do {
+		printk(KERN_ERR "%s: bh b_next:%p blocknr:%u b_list:%u state:%x\n",
+			__FUNCTION__, bh->b_next, bh->b_blocknr, bh->b_list,
+				bh->b_state);
+		bh = bh->b_this_page;
+	} while (bh);
+}
+
+
 
 /** 
  * int journal_try_to_free_buffers() - try to free page buffers.
@@ -1752,6 +1770,11 @@
 	do {
 		struct buffer_head *p = tmp;
 
+		if (!unlikely(tmp)) {
+			debug_page(page);
+			BUG();
+		}
+			
 		tmp = tmp->b_this_page;
 		if (buffer_jbd(p))
 			if (!__journal_try_to_free_buffer(p, &locked_or_dirty))

  reply	other threads:[~2004-06-19 19:56 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-02-03 18:26 Oopses with both recent 2.4.x kernels and 2.6.x kernels Stian Jordet
2004-02-05 23:51 ` Marcelo Tosatti
2004-03-02 11:03   ` Stian Jordet
2004-03-02 12:31     ` Stian Jordet
2004-03-09 19:22       ` Marcelo Tosatti
2004-03-09 22:28         ` Stian Jordet
2004-06-14 17:07     ` Steven Dake
2004-06-14 18:26       ` Chris Shoemaker
2004-06-15 13:16       ` Marcelo Tosatti
2004-06-15 14:35         ` Stian Jordet
2004-06-15 17:56         ` Steven Dake
2004-06-17 13:16           ` [2.4] page->buffers vanished in journal_try_to_free_buffers() Marcelo Tosatti
2004-06-18  3:08             ` Andrew Morton
2004-06-19 19:48               ` Marcelo Tosatti [this message]
2004-06-19 19:50                 ` Frank van Maarseveen
2004-06-19 22:17                   ` Marcelo Tosatti
2004-06-19 22:44                     ` Frank van Maarseveen
2004-06-19 20:04                 ` Andrew Morton
2004-06-20  7:56                 ` Willy Tarreau
2004-06-21 15:06             ` Stephen C. Tweedie
2004-06-21 15:53               ` Marcelo Tosatti
2004-06-22 22:13                 ` Stephen C. Tweedie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040619194849.GA2843@logos.cnet \
    --to=marcelo.tosatti@cyclades.com \
    --cc=akpm@osdl.org \
    --cc=frankvm@xs4all.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liste@jordet.nu \
    --cc=sct@redhat.com \
    --cc=sdake@mvista.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox