From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Andrew Morton <akpm@osdl.org>, frankvm@xs4all.nl
Cc: sdake@mvista.com, liste@jordet.nu, linux-kernel@vger.kernel.org,
	sct@redhat.com
Subject: Re: [2.4] page->buffers vanished in journal_try_to_free_buffers()
Date: Sat, 19 Jun 2004 16:48:49 -0300
Message-ID: <20040619194849.GA2843@logos.cnet>
In-Reply-To: <20040617200859.7fada9fe.akpm@osdl.org>

On Thu, Jun 17, 2004 at 08:08:59PM -0700, Andrew Morton wrote:
> Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
> >
> >  > tmp (page->buffers) above is NULL.  b_this_page is at offset 0x28 (the
> >  > accessed address in the oops).  This means that page->buffers is set to
> >  > NULL by some other routine, which results in the oops.
> >  >
> >  > I read the page allocation code
> >  > (ext3_read_page->block_read_full_page->create_empty_buffers->create_buffers),
> >  > and it appears that the allocate function cannot leave page->buffers
> >  > set to zero.  I am having difficulty reproducing this and cannot debug
> >  > further, however.  Can page->buffers be set to zero somewhere else?
> >  > Perhaps kswapd and some other thread are racing on the free?
> > 
> >  Steve, 
> > 
> >  Hum, I'm starting to believe we might have an issue here.
> > 
> >  Searching lkml archives I find other similar oopses at the same place 
> >  (trying to access 00000028, tmp->b_this_page), as you said.
> > 
> >  However, I wonder what other kernel codepath could remove the page buffers
> >  under us; the page MUST be locked here. In the backtrace above the page
> >  is locked by shrink_cache(). And with the page locked, we guarantee that the VM
> >  freeing routines (shrink_cache) won't try to mess with the page.
> > 
> >  Can you reproduce the oopsen?
> > 
> >  Stephen, Andrew, do you have any idea how the buffers could have vanished
> >  under us with the page locked? That should not be possible. 
> > 
> >  I don't see how this "page->buffers = NULL" could be caused by a hardware problem,
> >  which usually shows up as a one- or two-bit flip.
> 
> It's a bit odd.  The page is definitely locked, and definitely had non-null
> ->buffers a few tens of instructions beforehand.
> 
> Is this an SMP machine?

Steven, did you see the NULL ->b_this_page on SMP or UP?

Stian Jordet had an SMP server, but he also was seeing oopses with v2.6:

 kernel BUG at mm/page_alloc.c:201!
 invalid operand: 0000 [#1]
 CPU:    0
 EIP:    0060:[free_pages_bulk+482/512]    Not tainted
 EIP is at free_pages_bulk+0x1e2/0x200
 eax: 00000001   ebx: c00609c8   ecx: 00000000   edx: 666026a5
 esi: 666026a4   edi: ffffffff   ebp: 33301352   esp: c86d5d90
 Process mrtg (pid: 26804, threadinfo=c86d4000 task=c9b860c0)
 Call Trace:
  [free_hot_cold_page+217/240] free_hot_cold_page+0xd9/0xf0
  [do_generic_mapping_read+714/1008] do_generic_mapping_read+0x2ca/0x3f0
  [file_read_actor+0/256] file_read_actor+0x0/0x100
  [__generic_file_aio_read+454/512] __generic_file_aio_read+0x1c6/0x200
  [file_read_actor+0/256] file_read_actor+0x0/0x100
  [generic_file_aio_read+91/128] generic_file_aio_read+0x5b/0x80
  [do_sync_read+137/192] do_sync_read+0x89/0xc0
  [do_page_fault+300/1328] do_page_fault+0x12c/0x530
  [do_brk+324/560] do_brk+0x144/0x230
  [vfs_read+184/304] vfs_read+0xb8/0x130
  [sys_read+66/112] sys_read+0x42/0x70
  [syscall_call+7/11] syscall_call+0x7/0xb
 
and different oopses on v2.4, including sync_page_buffers (also NULL+offset access): 

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000021
c0132e86

eax: 00000000   ebx: 00000009   ecx: 000001d2   edx: 00000012
esi: 00000000   edi: c17e38c0   ebp: c1047a00   esp: c86cbdb4

>>EIP; c0132e86 <sync_page_buffers+e/a4>   <=====

Trace; c0132fdc <try_to_free_buffers+c0/ec>

Code;  c0132e86 <sync_page_buffers+e/a4>
00000000 <_EIP>:
Code;  c0132e86 <sync_page_buffers+e/a4>   <=====
   0:   f6 43 18 06               testb  $0x6,0x18(%ebx)   <=====
Code;  c0132e8a <sync_page_buffers+12/a4>

and the journal_try_to_free_buffers() one:

Unable to handle kernel NULL pointer dereference at virtual address 00000028
c015e3a2
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c015e3a2>]    Not tainted
EFLAGS: 00010203

eax: 0100004d   ebx: 00000000   ecx: 000001d2   edx: 00000000

Code;  c015e3a2 <journal_try_to_free_buffers+5a/98>
00000000 <_EIP>:
Code;  c015e3a2 <journal_try_to_free_buffers+5a/98>   <=====
   0:   8b 5b 28                  mov    0x28(%ebx),%ebx   <=====
Code;  c015e3a5 <journal_try_to_free_buffers+5d/98>


He has since upgraded the box and, running a recent v2.6, stopped seeing
the crashes.

However, he also mentioned that his crashes started after upgrading
from v2.4.19 to v2.4.22. We should search the diff between them for
anything suspicious.

I can't figure out from the archived reports if this is UP or SMP only. 

Frank van Maarseveen has also seen the journal_try_to_free_buffers() NULL 
b_this_page. Frank, were you running SMP or UP when you reported the oops 
with 2.4.23? 

> One possibility is that we died on the second pass around the loop:
> page->buffers points at a buffer_head which has a NULL ->b_this_page.  But
> I cannot suggest how ->b_this_page could have been zapped.

Oh, yes, indeed. 

Maybe adding this (untested) to v2.4 mainline helps? Comments?

--- transaction.c.orig	2004-06-19 15:21:32.861148560 -0300
+++ transaction.c	2004-06-19 15:23:18.214132472 -0300
@@ -1694,6 +1694,24 @@
 	return 0;
 }
 
+static void debug_page(struct page *p)
+{
+	struct buffer_head *head = p->buffers;
+	struct buffer_head *bh = head;
+
+	printk(KERN_ERR "%s: page index:%lu count:%d flags:%lx\n", __FUNCTION__,
+		p->index, atomic_read(&p->count), p->flags);
+
+	/* The ring is circular when intact: stop at a NULL link or on wrap. */
+	while (bh) {
+		printk(KERN_ERR "%s: bh b_next:%p blocknr:%lu b_list:%u state:%lx\n",
+			__FUNCTION__, bh->b_next, bh->b_blocknr, bh->b_list,
+			bh->b_state);
+		bh = bh->b_this_page;
+		if (bh == head)
+			break;
+	}
+}
 
 /** 
  * int journal_try_to_free_buffers() - try to free page buffers.
@@ -1752,6 +1770,11 @@
 	do {
 		struct buffer_head *p = tmp;
 
+		if (unlikely(!tmp)) {
+			debug_page(page);
+			BUG();
+		}
+
 		tmp = tmp->b_this_page;
 		if (buffer_jbd(p))
 			if (!__journal_try_to_free_buffer(p, &locked_or_dirty))
