Kernel bug?

linux-raid.vger.kernel.org archive mirror
* Kernel bug?
@ 2009-04-04 15:07 Gabriele Tozzi
  2009-04-04 22:40 ` NeilBrown
  0 siblings, 1 reply; 3+ messages in thread
From: Gabriele Tozzi @ 2009-04-04 15:07 UTC
  To: linux-raid

Hello,

I guess I've found a kernel bug: I get an oops when rebuilding a raid1
array (/dev/md5) on an SMP system. The md5_resync process then hangs.

This is my current configuration:

transylvania:~# uname -a
Linux transylvania 2.6.29.1transylvania #1 SMP Sat Apr 4 04:08:40 CEST 2009 i686 GNU/Linux

transylvania:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
      1003968 blocks [2/2] [UU]

md2 : active raid1 sdb5[1] sda5[0]
      2008000 blocks [2/2] [UU]

md3 : active raid1 sdb6[1] sda6[0]
      1003904 blocks [2/2] [UU]

md4 : active raid1 sdb7[1] sda7[0]
      505920 blocks [2/2] [UU]

md6 : active linear sdb8[1] sda8[0]
      25719808 blocks 4k rounding

md5 : active raid1 sdd2[1] sdc2[3]
      97843776 blocks [3/1] [_U_]
      [===>.................]  recovery = 15.3% (15018752/97843776) finish=35.2min speed=39182K/sec

md0 : active raid1 sdb1[1] sda1[0]
      40064 blocks [2/2] [UU]

unused devices: <none>

When the system hangs, I receive these messages:

transylvania:~#
Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel: Oops: 0000 [#1] SMP

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:04.2/usb2/2-2/2-2:1.0/host5/target5:0:0/5:0:0:0/type

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel: Process md5_resync (pid: 6969, ti=dcbba000 task=d59c72a0 task.ti=dcbba000)

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel: Stack:

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  df3e2080 0ba9f480 00011210 df3e209c c0163dfc 00000010 00000078 00000000

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  c019f38f 00000000 0ba9f480 00000000 0ba9f480 00000000 dcbbbe78 c04a29fd

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel: Call Trace:

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c04a2753>] r1buf_pool_alloc+0x11c/0x164

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c0163dfc>] mempool_alloc+0x27/0xcb

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c019f38f>] bio_add_page+0x28/0x2e

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c04a29fd>] sync_request+0x1fd/0x5f2

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c01374ba>] __wake_up+0x29/0x39

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c04b5028>] md_do_sync+0x6d4/0xb60

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c013976d>] set_next_entity+0x29/0x51

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c013b884>] try_to_wake_up+0x127/0x130

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c04b58b0>] md_thread+0xdd/0xf4

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c0137445>] complete+0x28/0x36

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c04b57d3>] md_thread+0x0/0xf4

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c014e857>] kthread+0x38/0x5d

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c014e81f>] kthread+0x0/0x5d

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel:  [<c011d8a3>] kernel_thread_helper+0x7/0x10

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel: Code: 14 10 8b 02 89 4c 82 08 40 83 f8 0e 89 02 75 07 89 d0 e8 06 fe ff ff 89 d8 50 9d 90 8d b4 26 00 00 00 00 5b c3 55 57 56 53 89 c3 <8b> 00 f6 c4 60 74 23 f6 c4 40 74 03 8b 5b 0c 8d 43 04 f0 ff 08

Message from syslogd@localhost at Sat Apr  4 16:51:17 2009 ...
localhost kernel: EIP: [<c0168f1d>] put_page+0x6/0xdd SS:ESP 0068:dcbbbdd8

transylvania:~#

Please include my address in replies 'cause I'm not subscribed to this
list. Thank you.

Gabriele Tozzi

* Re: Kernel bug?
  2009-04-04 15:07 Kernel bug? Gabriele Tozzi
@ 2009-04-04 22:40 ` NeilBrown
  2009-04-06  4:43   ` Neil Brown
  0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2009-04-04 22:40 UTC
  To: Gabriele Tozzi; +Cc: linux-raid, Jens Axboe

On Sun, April 5, 2009 1:07 am, Gabriele Tozzi wrote:
> Hello,
>
> I guess I've found a kernel bug: I get an oops when rebuilding a raid1
> array (/dev/md5) on an SMP system. The md5_resync process then hangs.

Yes, it appears you have found a bug.  Thanks for reporting it.

It looks like an alloc_page failed in r1buf_pool_alloc and when trying
to clean up we tried to free pages that had never been allocated.

The code in raid1.c assumes that newly allocated 'bios' have their
bvec initialised to NULLs, but that apparently changed recently
with commit d3f761104b097738932afcc310fbbbbfb007ef92

I'll post a patch after the weekend.

Thanks,

NeilBrown


* Re: Kernel bug?
  2009-04-04 22:40 ` NeilBrown
@ 2009-04-06  4:43   ` Neil Brown
  0 siblings, 0 replies; 3+ messages in thread
From: Neil Brown @ 2009-04-06  4:43 UTC
  To: Gabriele Tozzi; +Cc: linux-raid, Jens Axboe

On Sunday April 5, neilb@suse.de wrote:
> On Sun, April 5, 2009 1:07 am, Gabriele Tozzi wrote:
> > Hello,
> >
> > I guess I've found a kernel bug: I get an oops when rebuilding a raid1
> > array (/dev/md5) on an SMP system. The md5_resync process then hangs.
> 
> Yes, it appears you have found a bug.  Thanks for reporting it.
> 
> It looks like an alloc_page failed in r1buf_pool_alloc and when trying
> to clean up we tried to free pages that had never been allocated.
> 
> The code in raid1.c assumes that newly allocated 'bios' have their
> bvec initialised to NULLs, but that apparently changed recently
> with commit d3f761104b097738932afcc310fbbbbfb007ef92
> 
> I'll post a patch after the weekend.

And here it is.   Applying this patch should mean that you can resync a
raid1 array safely.   I will send it upstream shortly and hopefully
it will be in 2.6.29.1

Thanks again,
NeilBrown

From 303a0e11d0ee136ad8f53f747f3c377daece763b Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 6 Apr 2009 14:40:38 +1000
Subject: [PATCH] md/raid1 - don't assume newly allocated bvecs are initialised.

Since commit d3f761104b097738932afcc310fbbbbfb007ef92
newly allocated bvecs aren't initialised to NULL, so we have
to be more careful about freeing a bio which only managed
to get a few pages allocated to it.  Otherwise the resync
process crashes.

This patch is appropriate for 2.6.29-stable.

Cc: stable@kernel.org
Cc: "Jens Axboe" <jens.axboe@oracle.com>
Reported-by: Gabriele Tozzi <gabriele@tozzi.eu>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/raid1.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b4f4bad..f2247b0 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -123,6 +123,7 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 				goto out_free_pages;
 
 			bio->bi_io_vec[i].bv_page = page;
+			bio->bi_vcnt = i+1;
 		}
 	}
 	/* If not user-requests, copy the page pointers to all bios */
@@ -138,9 +139,9 @@ static void * r1buf_pool_alloc(gfp_t gfp_flags, void *data)
 	return r1_bio;
 
 out_free_pages:
-	for (i=0; i < RESYNC_PAGES ; i++)
-		for (j=0 ; j < pi->raid_disks; j++)
-			safe_put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
+	for (j=0 ; j < pi->raid_disks; j++)
+		for (i=0; i < r1_bio->bios[j]->bi_vcnt ; i++)
+			put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
 	j = -1;
 out_free_bio:
 	while ( ++j < pi->raid_disks )
-- 
1.6.2.1

