* RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
[not found] <20070402224745.71a25af7.akpm@linux-foundation.org>
@ 2007-04-05 16:33 ` Reuben Farrelly
2007-04-05 20:21 ` Andrew Morton
2007-04-11 3:31 ` Neil Brown
0 siblings, 2 replies; 4+ messages in thread
From: Reuben Farrelly @ 2007-04-05 16:33 UTC
To: Andrew Morton, neilb; +Cc: linux-kernel, linux-raid
Hi,
On 3/04/2007 3:47 PM, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored.
> It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair
> Kergon). It is a quilt tree, living at
> ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.
Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
md1 is the first array on the disk, and it refuses to start up on boot, or after
boot.
tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
208640 blocks
md3 : active raid1 sdc3[1] sda3[0]
20008832 blocks [2/2] [UU]
bitmap: 0/153 pages [0KB], 64KB chunk
md5 : active raid1 sdc5[1] sda5[0]
10008384 blocks [2/2] [UU]
bitmap: 4/153 pages [16KB], 32KB chunk
md6 : active raid1 sdc6[1] sda6[0]
10008384 blocks [2/2] [UU]
bitmap: 0/153 pages [0KB], 32KB chunk
md8 : active raid1 sdc8[1] sda8[0]
1003904 blocks [2/2] [UU]
bitmap: 0/123 pages [0KB], 4KB chunk
md10 : active raid1 sdc10[1] sda10[0]
119933120 blocks [2/2] [UU]
bitmap: 1/229 pages [4KB], 256KB chunk
md2 : active raid1 sdc2[1] sda2[0]
100004544 blocks [2/2] [UU]
bitmap: 10/191 pages [40KB], 256KB chunk
unused devices: <none>
tornado ~ #
tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : f5c2e565:5ed956c0:33b08c07:16154426
Creation Time : Fri Feb 2 10:16:29 2007
Raid Level : raid1
Used Dev Size : 104320 (101.89 MiB 106.82 MB)
Array Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Fri Apr 6 02:06:17 2007
State : clean
Internal Bitmap : present
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : d3668aaa - correct
Events : 0.368
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1
0 0 8 1 0 active sync /dev/sda1
1 1 8 33 1 active sync /dev/sdc1
tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : f5c2e565:5ed956c0:33b08c07:16154426
Creation Time : Fri Feb 2 10:16:29 2007
Raid Level : raid1
Used Dev Size : 104320 (101.89 MiB 106.82 MB)
Array Size : 104320 (101.89 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Update Time : Fri Apr 6 02:06:17 2007
State : clean
Internal Bitmap : present
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : d3668acc - correct
Events : 0.368
Number Major Minor RaidDevice State
this 1 8 33 1 active sync /dev/sdc1
0 0 8 1 0 active sync /dev/sda1
1 1 8 33 1 active sync /dev/sdc1
tornado ~ #
tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #
and looking at a dmesg, this is logged:
md: bind<sdc1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...
tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
tornado ~ #
The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
out the -mm releases so much lately.
Also, Andrew, can you please restart posting/cc'ing your -mm announcements to
the linux-kernel-announce@vger.kernel.org list? Seems this stopped around about
2.6.20, it was handy.
.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4
Thanks,
Reuben
* Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
2007-04-05 16:33 ` RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4 Reuben Farrelly
@ 2007-04-05 20:21 ` Andrew Morton
2007-04-06 5:34 ` Dan Williams
2007-04-11 3:31 ` Neil Brown
1 sibling, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2007-04-05 20:21 UTC
To: Reuben Farrelly; +Cc: neilb, linux-kernel, linux-raid
On Fri, 06 Apr 2007 02:33:03 +1000
Reuben Farrelly <reuben-linuxkernel@reub.net> wrote:
> Hi,
>
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> >
> > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > It is huge.
> >
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > Kergon). It is a quilt tree, living at
> > ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> >
> > - Added davidel's signalfd stuff.
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
>
> md1 is the first array on the disk, and it refuses to start up on boot, or after
> boot.
>
> ...
>
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
>
> and looking at a dmesg, this is logged:
>
> md: bind<sdc1>
> md: bind<sda1>
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...
>
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
>
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.
OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some
unexpectedly large value.
I don't _think_ there's anything in -mm which would have triggered this.
Does mainline do the same thing?
I guess it's possible that the code in git-md-accel.patch accidentally
broke things. Perhaps try disabling CONFIG_DMA_ENGINE?
> Also, Andrew, can you please restart posting/cc'ing your -mm announcements to
> the linux-kernel-announce@vger.kernel.org list? Seems this stopped around about
> 2.6.20, it was handy.
hm. I always Bcc linux-kernel-announce@vger.kernel.org. I assume that its
filters didn't get updated after the s/osdl/linux-foundation/ thing. I'll
talk to people, thanks.
> .config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4
>
* Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
2007-04-05 20:21 ` Andrew Morton
@ 2007-04-06 5:34 ` Dan Williams
0 siblings, 0 replies; 4+ messages in thread
From: Dan Williams @ 2007-04-06 5:34 UTC
To: Andrew Morton; +Cc: Reuben Farrelly, neilb, linux-kernel, linux-raid
On 4/5/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 06 Apr 2007 02:33:03 +1000
> Reuben Farrelly <reuben-linuxkernel@reub.net> wrote:
>
> > Hi,
> >
> > On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > >
> > > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > > It is huge.
> > >
> > > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > > Kergon). It is a quilt tree, living at
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > >
> > > - Added davidel's signalfd stuff.
> >
> > Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> >
> > md1 is the first array on the disk, and it refuses to start up on boot, or after
> > boot.
> >
> > ...
> >
> > tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> > mdadm: device /dev/md1 already active - cannot assemble it
> > tornado ~ # mdadm --run /dev/md1
> > mdadm: failed to run array /dev/md1: Cannot allocate memory
> > tornado ~ #
> >
> > and looking at a dmesg, this is logged:
> >
> > md: bind<sdc1>
> > md: bind<sda1>
> > raid1: raid set md1 active with 2 out of 2 mirrors
> > md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> > md1: failed to create bitmap (-12)
> > md: pers->run() failed ...
Is this the dmesg from boot or the dmesg after running the mdadm --run command?
> >
> > tornado ~ # uname -a
> > Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> > Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> > tornado ~ #
> >
> > The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> > out the -mm releases so much lately.
>
> OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some
> unexpectedly large value.
>
> I don't _think_ there's anything in -mm which would have triggered this.
> Does mainline do the same thing?
>
> I guess it's possible that the code in git-md-accel.patch accidentally
> broke things. Perhaps try disabling CONFIG_DMA_ENGINE?
>
git-md-accel.patch does not touch anything in the raid1 path, but I
guess stranger things have happened.
--
Dan
* Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
2007-04-05 16:33 ` RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4 Reuben Farrelly
2007-04-05 20:21 ` Andrew Morton
@ 2007-04-11 3:31 ` Neil Brown
1 sibling, 0 replies; 4+ messages in thread
From: Neil Brown @ 2007-04-11 3:31 UTC
To: Reuben Farrelly; +Cc: Andrew Morton, stable, linux-kernel, linux-raid
On Friday April 6, reuben-linuxkernel@reub.net wrote:
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
The difference is that kzalloc(0, ...) now returns NULL. Maybe it is a
SLUB/SLAB difference? (So maybe it did use memory it shouldn't have
before, but now it fails, which is the better behaviour.)
This patch fixes the maths and should probably go in various 'stable'
kernels. Bug is in 2.6.18, but not 2.6.16.
Patch won't work for 2.6.18 as DIV_ROUND_UP is missing, but 2.6.19 and
later have it.
Thanks for the bug report.
NeilBrown
-----------------------------
Fix calculation for size of filemap_attr array in md/bitmap.
If 'num_pages' were ever 1 more than a multiple of 8 (32bit platforms)
or of 16 (64 bit platforms), filemap_attr would be allocated one
'unsigned long' shorter than required. We need a round-up in there.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c 2007-04-11 13:24:50.000000000 +1000
+++ ./drivers/md/bitmap.c 2007-04-11 13:24:59.000000000 +1000
@@ -863,9 +863,7 @@ static int bitmap_init_from_disk(struct
/* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */
bitmap->filemap_attr = kzalloc(
- (((num_pages*4/8)+sizeof(unsigned long)-1)
- /sizeof(unsigned long))
- *sizeof(unsigned long),
+ roundup( DIV_ROUND_UP(num_pages*4, 8), sizeof(unsigned long)),
GFP_KERNEL);
if (!bitmap->filemap_attr)
goto out;
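
The arithmetic behind the patch can be checked in userspace. The sketch
below is an illustration only, not kernel code: the two macros are
stand-ins written to behave like the kernel's DIV_ROUND_UP and roundup,
and the function names (filemap_attr_bytes_old, filemap_attr_bytes_new,
filemap_attr_bytes_min, check_new_expression) are hypothetical helpers
introduced here to compare the removed and added size expressions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions) for the kernel macros of the same names. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define roundup(x, y)      ((((x) + ((y) - 1)) / (y)) * (y))

/* Bytes requested by the expression the patch removes.
 * num_pages*4/8 truncates, so a trailing half byte is lost. */
size_t filemap_attr_bytes_old(size_t num_pages)
{
    return (((num_pages * 4 / 8) + sizeof(unsigned long) - 1)
            / sizeof(unsigned long)) * sizeof(unsigned long);
}

/* Bytes requested by the expression the patch adds:
 * round bits up to whole bytes first, then up to whole unsigned longs. */
size_t filemap_attr_bytes_new(size_t num_pages)
{
    return roundup(DIV_ROUND_UP(num_pages * 4, 8), sizeof(unsigned long));
}

/* Minimum bytes needed to hold 4 attribute bits per page. */
size_t filemap_attr_bytes_min(size_t num_pages)
{
    return (num_pages * 4 + 7) / 8;
}

/* Sanity check: the new expression never requests less than the minimum. */
void check_new_expression(size_t max_pages)
{
    for (size_t n = 1; n <= max_pages; n++)
        assert(filemap_attr_bytes_new(n) >= filemap_attr_bytes_min(n));
}
```

With num_pages = 1, which matches the "read 0/1 pages" line in the
report, the old expression evaluates to 0 bytes while the new one
evaluates to sizeof(unsigned long). A zero-byte request is the
kzalloc(0, ...) case described above, which is consistent with the
-12 status in the log.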