* RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
       [not found] <20070402224745.71a25af7.akpm@linux-foundation.org>
@ 2007-04-05 16:33 ` Reuben Farrelly
  2007-04-05 20:21   ` Andrew Morton
  2007-04-11  3:31   ` Neil Brown
  0 siblings, 2 replies; 4+ messages in thread
From: Reuben Farrelly @ 2007-04-05 16:33 UTC
  To: Andrew Morton, neilb; +Cc: linux-kernel, linux-raid

Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored.
>   It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon).  It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.

Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or after
boot.

tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
      208640 blocks

md3 : active raid1 sdc3[1] sda3[0]
      20008832 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 64KB chunk

md5 : active raid1 sdc5[1] sda5[0]
      10008384 blocks [2/2] [UU]
      bitmap: 4/153 pages [16KB], 32KB chunk

md6 : active raid1 sdc6[1] sda6[0]
      10008384 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 32KB chunk

md8 : active raid1 sdc8[1] sda8[0]
      1003904 blocks [2/2] [UU]
      bitmap: 0/123 pages [0KB], 4KB chunk

md10 : active raid1 sdc10[1] sda10[0]
      119933120 blocks [2/2] [UU]
      bitmap: 1/229 pages [4KB], 256KB chunk

md2 : active raid1 sdc2[1] sda2[0]
      100004544 blocks [2/2] [UU]
      bitmap: 10/191 pages [40KB], 256KB chunk

unused devices: <none>
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : f5c2e565:5ed956c0:33b08c07:16154426
   Creation Time : Fri Feb  2 10:16:29 2007
      Raid Level : raid1
   Used Dev Size : 104320 (101.89 MiB 106.82 MB)
      Array Size : 104320 (101.89 MiB 106.82 MB)
    Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 1

     Update Time : Fri Apr  6 02:06:17 2007
           State : clean
 Internal Bitmap : present
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0
        Checksum : d3668aaa - correct
          Events : 0.368

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1

tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : f5c2e565:5ed956c0:33b08c07:16154426
   Creation Time : Fri Feb  2 10:16:29 2007
      Raid Level : raid1
   Used Dev Size : 104320 (101.89 MiB 106.82 MB)
      Array Size : 104320 (101.89 MiB 106.82 MB)
    Raid Devices : 2
   Total Devices : 2
 Preferred Minor : 1

     Update Time : Fri Apr  6 02:06:17 2007
           State : clean
 Internal Bitmap : present
  Active Devices : 2
 Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0
        Checksum : d3668acc - correct
          Events : 0.368

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ #

tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind<sdc1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
out the -mm releases so much lately.

Also, Andrew, can you please restart posting/cc'ing your -mm announcements to
the linux-kernel-announce@vger.kernel.org list?  Seems this stopped around about
2.6.20, it was handy.

.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben
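
Note on the two error messages above: the "status: -12" in dmesg and mdadm's "Cannot allocate memory" are most likely the same error seen from two sides, since errno 12 is ENOMEM on Linux. A minimal userspace sketch, assuming Linux errno numbering; status_to_text and check_linux_enomem are hypothetical helper names, not part of mdadm or the kernel:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn a kernel-style negative status (e.g. -12 from dmesg) into the
 * errno message that a userspace tool would print for the same failure.
 * Hypothetical helper, for illustration only.
 */
static const char *status_to_text(int status)
{
	return strerror(status < 0 ? -status : status);
}

/* Assumption: Linux errno numbering, where ENOMEM is 12. */
static void check_linux_enomem(void)
{
	assert(ENOMEM == 12);
}

/* Print the message for one status value taken from a log line. */
static void print_status(int status)
{
	printf("status %d: %s\n", status, status_to_text(status));
}
```

Calling print_status(-12) after check_linux_enomem() prints the libc message for ENOMEM; the exact wording depends on the C library in use.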

* Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
  2007-04-05 16:33 ` RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4 Reuben Farrelly
@ 2007-04-05 20:21   ` Andrew Morton
  2007-04-06  5:34     ` Dan Williams
  2007-04-11  3:31   ` Neil Brown
  1 sibling, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2007-04-05 20:21 UTC
  To: Reuben Farrelly; +Cc: neilb, linux-kernel, linux-raid

On Fri, 06 Apr 2007 02:33:03 +1000
Reuben Farrelly <reuben-linuxkernel@reub.net> wrote:

> Hi,
>
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> >
> > - The oops in git-net.patch has been fixed, so that tree has been restored.
> >   It is huge.
> >
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon).  It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> >
> > - Added davidel's signalfd stuff.
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
>
> md1 is the first array on the disk, and it refuses to start up on boot, or after
> boot.
>
> ...
>
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
>
> and looking at a dmesg, this is logged:
>
> md: bind<sdc1>
> md: bind<sda1>
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...
>
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
>
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK.  I assume that bitmap->chunks in bitmap_init_from_disk() has some
unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this.
Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally
broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?

> Also, Andrew, can you please restart posting/cc'ing your -mm announcements to
> the linux-kernel-announce@vger.kernel.org list?  Seems this stopped around about
> 2.6.20, it was handy.

hm.  I always Bcc linux-kernel-announce@vger.kernel.org.  I assume that its
filters didn't get updated after the s/osdl/linux-foundation/ thing.  I'll
talk to people, thanks.

> .config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4
>

* Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
  2007-04-05 20:21   ` Andrew Morton
@ 2007-04-06  5:34     ` Dan Williams
  0 siblings, 0 replies; 4+ messages in thread
From: Dan Williams @ 2007-04-06 5:34 UTC
  To: Andrew Morton; +Cc: Reuben Farrelly, neilb, linux-kernel, linux-raid

On 4/5/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 06 Apr 2007 02:33:03 +1000
> Reuben Farrelly <reuben-linuxkernel@reub.net> wrote:
>
> > Hi,
> >
> > On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > >
> > > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > >   It is huge.
> > >
> > > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > >   Kergon).  It is a quilt tree, living at
> > >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > >
> > > - Added davidel's signalfd stuff.
> >
> > Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> >
> > md1 is the first array on the disk, and it refuses to start up on boot, or after
> > boot.
> >
> > ...
> >
> > tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> > mdadm: device /dev/md1 already active - cannot assemble it
> > tornado ~ # mdadm --run /dev/md1
> > mdadm: failed to run array /dev/md1: Cannot allocate memory
> > tornado ~ #
> >
> > and looking at a dmesg, this is logged:
> >
> > md: bind<sdc1>
> > md: bind<sda1>
> > raid1: raid set md1 active with 2 out of 2 mirrors
> > md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> > md1: failed to create bitmap (-12)
> > md: pers->run() failed ...

Is this the dmesg from boot or the dmesg after running the mdadm --run command?

> >
> > tornado ~ # uname -a
> > Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> > Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> > tornado ~ #
> >
> > The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> > out the -mm releases so much lately.
>
> OK.  I assume that bitmap->chunks in bitmap_init_from_disk() has some
> unexpectedly large value.
>
> I don't _think_ there's anything in -mm which would have triggered this.
> Does mainline do the same thing?
>
> I guess it's possible that the code in git-md-accel.patch accidentally
> broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?
>

git-md-accel.patch does not touch anything in the raid1 path, but I
guess stranger things have happened.

--
Dan

* Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
  2007-04-05 16:33 ` RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4 Reuben Farrelly
  2007-04-05 20:21   ` Andrew Morton
@ 2007-04-11  3:31   ` Neil Brown
  1 sibling, 0 replies; 4+ messages in thread
From: Neil Brown @ 2007-04-11 3:31 UTC
  To: Reuben Farrelly; +Cc: Andrew Morton, stable, linux-kernel, linux-raid

On Friday April 6, reuben-linuxkernel@reub.net wrote:
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

Difference is that kzalloc(0, ) now returns NULL.  Maybe it is a
SLUB/SLAB difference?  (So maybe it did use memory it shouldn't have
before, but now it fails, which is the better behaviour).

This patch fixes the maths and should probably go in various 'stable'
kernels.  Bug is in 2.6.18, but not 2.6.16.  Patch won't work for
2.6.18 as DIV_ROUND_UP is missing, but 2.6.19 and later have it.

Thanks for the bug report.

NeilBrown

-----------------------------
Fix calculation for size of filemap_attr array in md/bitmap.

If 'num_pages' were ever 1 more than a multiple of 8 (32bit platforms)
or of 16 (64 bit platforms), filemap_attr would be allocated one
'unsigned long' shorter than required.  We need a round-up in there.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c	2007-04-11 13:24:50.000000000 +1000
+++ ./drivers/md/bitmap.c	2007-04-11 13:24:59.000000000 +1000
@@ -863,9 +863,7 @@ static int bitmap_init_from_disk(struct
 	/* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */
 	bitmap->filemap_attr = kzalloc(
-		(((num_pages*4/8)+sizeof(unsigned long)-1)
-		/sizeof(unsigned long))
-		*sizeof(unsigned long),
+		roundup( DIV_ROUND_UP(num_pages*4, 8), sizeof(unsigned long)),
 		GFP_KERNEL);
 	if (!bitmap->filemap_attr)
 		goto out;
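
The arithmetic in the patch above can be checked in userspace. The sketch below is an illustration under stated assumptions, not kernel code: DIV_ROUND_UP and roundup are re-defined locally to mirror the kernel macros of the same name, `word` is a stand-in parameter for sizeof(unsigned long) so that the 32-bit and 64-bit cases can both be tried on one machine, and old_attr_bytes, new_attr_bytes, show_shortfalls and check_attr_sizes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel macros of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define roundup(x, y)      ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Size expression removed by the patch.  num_pages*4/8 truncates, so an
 * odd page count loses its last half byte before the round-up happens.
 * `word` stands in for sizeof(unsigned long).
 */
static size_t old_attr_bytes(size_t num_pages, size_t word)
{
	return (((num_pages * 4 / 8) + word - 1) / word) * word;
}

/* Size expression added by the patch: round up to bytes, then to words. */
static size_t new_attr_bytes(size_t num_pages, size_t word)
{
	return roundup(DIV_ROUND_UP(num_pages * 4, 8), word);
}

/* Print every page count in [1, limit] where the old expression is short. */
static void show_shortfalls(size_t limit, size_t word)
{
	for (size_t n = 1; n <= limit; n++) {
		size_t o = old_attr_bytes(n, word);
		size_t w = new_attr_bytes(n, word);

		if (o != w)
			printf("word=%zu num_pages=%zu old=%zu new=%zu\n",
			       word, n, o, w);
	}
}

/* Spot checks for the cases named in the patch description. */
static void check_attr_sizes(void)
{
	assert(old_attr_bytes(1, 8) == 0);   /* zero-byte allocation request */
	assert(new_attr_bytes(1, 8) == 8);
	assert(old_attr_bytes(17, 8) == 8);  /* 16 + 1 pages, 64-bit word */
	assert(new_attr_bytes(17, 8) == 16);
	assert(old_attr_bytes(9, 4) == 4);   /* 8 + 1 pages, 32-bit word */
	assert(new_attr_bytes(9, 4) == 8);
}
```

With word = 8 the old expression yields 0 bytes for num_pages = 1, which is the kzalloc(0, ) case mentioned above. The dmesg line "read 0/1 pages" suggests, though the thread does not state it outright, that md1 had exactly one bitmap page.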