* BUGS: internal bitmap during array create
@ 2006-10-11 20:11 Eli Stair
2006-10-13 1:02 ` Eli Stair
2006-10-13 2:48 ` Neil Brown
0 siblings, 2 replies; 7+ messages in thread
From: Eli Stair @ 2006-10-11 20:11 UTC
To: linux-raid mailing list
[-- Attachment #1: Type: text/plain, Size: 2287 bytes --]
After realizing my stupid error in specifying the bitmap during array
creation, I've triggered a couple of 100% repeatable bugs with this
scenario.
BUG 1)
When I create an array without a bitmap and add it after the array is
synced, all works fine with any filesystem. If I create WITH the
internal bitmap and use xfs, it chokes at mount time with:
mount: wrong fs type, bad option, bad superblock on /dev/md0,
or too many mounted file systems
xfs_check also dies with:
[root@gtmp01 GTMP]# xfs_check /dev/md0
xfs_check: unexpected XFS SB magic number 0x00000000
xfs_check: read failed: Invalid argument
xfs_check: data size check failed
/usr/sbin/xfs_check: line 28: 30580 Segmentation fault xfs_db$DBOPTS -i -p xfs_check -c "check$OPTS" $1
Strangely, whatever the underlying cause is, ext3 seems immune (at least
in brief testing) to this. I can create and mount an ext3 filesystem on
top of the array that xfs dies trying to mount.
In the case where the array is created with bitmap at build time, if I
wait until resync is completed, do a 'mdadm -Gb none' followed by 'mdadm
-Gb internal', I can then safely create the XFS filesystem and mount it.
BUG 2)
Another bitmap failure at create time: mdadm dies with an error
after creating the array, when it tries to assemble it, with an
external-file bitmap (on ext3):
[root@gtmp01 GTMP]# mdadm -C /dev/md0 -f --chunk=512 --level=10 -n14 -po2 -e1.2 -b/var/tmp/bitmap /dev/mapper/mpath*
mdadm: RUN_ARRAY failed: Cannot allocate memory
mdadm: stopped /dev/md0
The array can be manually assembled, but it does not load with the
bitmap, even when specifying it with 'mdadm -A /dev/md0 -b/var/tmp/bitmap'.
For reference, I'm running:
mdadm - v2.5.3 - 7 August 2006
mkfs.xfs version 2.8.11
kernel 2.6.18 (Opteron, x86_64, SMP)
I've attached typescripts of the sessions where I run through all of
these scenarios, as well as an strace of the "mdadm -C
-b/var/tmp/bitmap" run where it fails to assemble the array, and a file
with the superblock details of all the member devices.
Again, more than happy to help test patches and any scenarios.
Cheers,
/eli
[-- Attachment #2: mdadm-create-array-with-bitmap-ext3-success.log.gz --]
[-- Type: application/x-gzip, Size: 23083 bytes --]
[-- Attachment #3: mdadm-create-array-with-bitmap-failure.log.gz --]
[-- Type: application/x-gzip, Size: 1998 bytes --]
[-- Attachment #4: mdadm-create-array-with-bitmap-failure.superblocks.gz --]
[-- Type: application/x-gzip, Size: 1122 bytes --]
[-- Attachment #5: mdadm-create-array-with-bitmap_externalfile-fails.log.gz --]
[-- Type: application/x-gzip, Size: 1696 bytes --]
[-- Attachment #6: mdadm-create-array-with-bitmap_externalfile-fails.strace.gz --]
[-- Type: application/x-gzip, Size: 2887 bytes --]
[-- Attachment #7: mdadm-create-array-without-bitmap-success.log.gz --]
[-- Type: application/x-gzip, Size: 1873 bytes --]
* Re: BUGS: internal bitmap during array create
2006-10-11 20:11 BUGS: internal bitmap during array create Eli Stair
@ 2006-10-13 1:02 ` Eli Stair
2006-10-13 2:48 ` Neil Brown
1 sibling, 0 replies; 7+ messages in thread
From: Eli Stair @ 2006-10-13 1:02 UTC
To: Eli Stair; +Cc: linux-raid mailing list
As of NeilB's release a few minutes ago, this issue is still occurring.
Looks like the XFS superblock isn't being written properly or is
corrupted upon read:
/// xfs_repair can't validate superblock:
[root@gtmp04 ~]# xfs_repair /dev/md0
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...
/// xfs_check doesn't like superblock magic:
[root@gtmp04 ~]# xfs_check -v /dev/md0
xfs_check: unexpected XFS SB magic number 0x00000000
xfs_check: read failed: Invalid argument
xfs_check: data size check failed
Thanks!
/eli
* Re: BUGS: internal bitmap during array create
2006-10-11 20:11 BUGS: internal bitmap during array create Eli Stair
2006-10-13 1:02 ` Eli Stair
@ 2006-10-13 2:48 ` Neil Brown
2006-10-18 23:26 ` Eli Stair
1 sibling, 1 reply; 7+ messages in thread
From: Neil Brown @ 2006-10-13 2:48 UTC
To: Eli Stair; +Cc: linux-raid mailing list
On Wednesday October 11, estair@ilm.com wrote:
>
> After realizing my stupid error in specifying the bitmap during array
> creation, I've triggered a couple of 100% repeatable bugs with this
> scenario.
>
>
> BUG 1)
....
>
>
> Strangely, whatever the underlying cause is, ext3 seems immune (at least
> in brief testing) to this. I can create and mount an ext3 filesystem on
> top of the array that xfs dies trying to mount.
>
> In the case where the array is created with bitmap at build time, if I
> wait until resync is completed, do a 'mdadm -Gb none' followed by 'mdadm
> -Gb internal', I can then safely create the XFS filesystem and mount
> it.
Can you get me the output of
mdadm -X some-component-device
both after the creation with a bitmap, and after the bitmap has been
hot-removed and hot-added.
Just for good measure, include the "mdadm -E" output at the same
times.
>
> BUG 2)
>
> Another bitmap failure during create time: MDADM dies with an error
> after creating the array, when it tries to assemble it, with an
> external-file bitmap (on ext3):
>
>
> [root@gtmp01 GTMP]# mdadm -C /dev/md0 -f --chunk=512 --level=10
> -n14 -po2 -e1.2 -bESC[1P^M[root@gtmp01 GTMP]# mdadm -C /dev/md0 -f
> --chunk=512 --level=10 -n14 -po2 -e1.2 -b/var/tmp/bitmap /dev/mapper/mpath*
> mdadm: RUN_ARRAY failed: Cannot allocate memory
> mdadm: stopped /dev/md0
I thought I had fixed this in 2.5, but on reflection that might not have
fixed it for 64-bit hosts. Can you try explicitly setting the
--bitmap-chunk size such that there will be fewer than 1,000,000
chunks?
NeilBrown
* Re: BUGS: internal bitmap during array create
2006-10-13 2:48 ` Neil Brown
@ 2006-10-18 23:26 ` Eli Stair
2006-10-19 6:50 ` Neil Brown
0 siblings, 1 reply; 7+ messages in thread
From: Eli Stair @ 2006-10-18 23:26 UTC
To: Neil Brown; +Cc: linux-raid mailing list
[-- Attachment #1: Type: text/plain, Size: 3443 bytes --]
I've provided the requested info, attached as two files (typescript output):
BUG 1)
/ create-array-with-internal-bitmap.out
# this file contains the full series of creation, mkfs, examine
# and mount commands/errors
The only obvious detail I can detect is the 'Chunksize' shown in the
superblock detail. The value set when creating the array with
'-binternal' is "1MB", while after removing and re-creating the bitmap,
it is set to "128MB". After this step (and the chunksize increasing),
initial tests show this to work fine.
One thing I noted: the initial resync time is /incredibly/ shortened
when the array is created with an internal write-intent bitmap,
completing in between ONE and TEN minutes vs. the average 200 minutes
for the initial sync of a 1-2 TB array without the bitmap! I don't
understand how this can occur safely, since the raw drives can't write
fast enough to zero all sectors and sanitize the RAID blocks in that
time. Is the array then left in an unclean/dangerous underlying state?
BUG 2)
/ create-array-with-external-bitmap.out
# this file contains attempts to set the bitmap-chunk for external
# bitmap file, unsuccessfully.
This errored out no matter the values I tested. In summary, using
--bitmap-chunk=[>=128] resulted in:
mdadm: size set to 143374336K
mdadm: RUN_ARRAY failed: No space left on device
--bitmap-chunk=[<=64] resulted in:
mdadm: size set to 143374336K
mdadm: RUN_ARRAY failed: Cannot allocate memory
Cheers,
/eli
[-- Attachment #2: create-array-with-external-bitmap.out --]
[-- Type: application/octet-stream, Size: 3250 bytes --]
[-- Attachment #3: create-array-with-internal-bitmap.out --]
[-- Type: application/octet-stream, Size: 38155 bytes --]
* Re: BUGS: internal bitmap during array create
2006-10-18 23:26 ` Eli Stair
@ 2006-10-19 6:50 ` Neil Brown
2006-10-19 23:34 ` Eli Stair
0 siblings, 1 reply; 7+ messages in thread
From: Neil Brown @ 2006-10-19 6:50 UTC
To: Eli Stair; +Cc: linux-raid mailing list
On Wednesday October 18, estair@ilm.com wrote:
>
>
> I've provided the requested info, attached as two files (typescript
> output):
Thanks for persisting with this.
There is one bug in mdadm that is causing all of these problems. It
only affects the 'offset' layout with raid10.
The fix is
http://neil.brown.name/git?p=mdadm;a=commitdiff;h=702b557b1c9
and is included below.
You might like to grab the latest source from
git://neil.brown.name/mdadm
and compile that, or just apply the patch.
Thanks again,
NeilBrown
-------------------------
Fix bugs related to raid10 and the new offset layout.
Need to mask off bits above the bottom 16 when calculating the number of
copies.
### Diffstat output
./ChangeLog | 1 +
./Create.c | 2 +-
./util.c | 2 +-
3 files changed, 3 insertions(+), 2 deletions(-)
diff .prev/ChangeLog ./ChangeLog
--- .prev/ChangeLog 2006-10-19 16:38:07.000000000 +1000
+++ ./ChangeLog 2006-10-19 16:38:24.000000000 +1000
@@ -13,6 +13,7 @@ Changes Prior to this release
initramfs, but device doesn't yet exist in /dev.
- When --assemble --scan is run, if all arrays that could be found
have already been started, don't report an error.
+ - Fix a couple of bugs related to raid10 and the new 'offset' layout.
Changes Prior to 2.5.4 release
- When creating devices in /dev/md/ create matching symlinks
diff .prev/Create.c ./Create.c
--- .prev/Create.c 2006-10-19 16:38:07.000000000 +1000
+++ ./Create.c 2006-10-19 16:38:24.000000000 +1000
@@ -363,7 +363,7 @@ int Create(struct supertype *st, char *m
* which is array.size * raid_disks / ncopies;
* .. but convert to sectors.
*/
- int ncopies = (layout>>8) * (layout & 255);
+ int ncopies = ((layout>>8) & 255) * (layout & 255);
bitmapsize = (unsigned long long)size * raiddisks / ncopies * 2;
/* printf("bms=%llu as=%d rd=%d nc=%d\n", bitmapsize, size, raiddisks, ncopies);*/
} else
diff .prev/util.c ./util.c
--- .prev/util.c 2006-10-19 16:38:07.000000000 +1000
+++ ./util.c 2006-10-19 16:38:24.000000000 +1000
@@ -179,7 +179,7 @@ int enough(int level, int raid_disks, in
/* This is the tricky one - we need to check
* which actual disks are present.
*/
- copies = (layout&255)* (layout>>8);
+ copies = (layout&255)* ((layout>>8) & 255);
first=0;
do {
/* there must be one of the 'copies' form 'first' */
* Re: BUGS: internal bitmap during array create
2006-10-19 6:50 ` Neil Brown
@ 2006-10-19 23:34 ` Eli Stair
2006-10-20 0:30 ` Neil Brown
0 siblings, 1 reply; 7+ messages in thread
From: Eli Stair @ 2006-10-19 23:34 UTC
To: Neil Brown; +Cc: linux-raid mailing list
Neil, thanks a ton. This issue appears resolved completely, array
creation/assembly is clean, filesystem creation and consistency checking
is clean, and I have not generated any filesystem or md errors in
testing yet.
Cheers,
/eli