* Re: RAID5 resync question BUGREPORT!
2005-12-09 4:49 ` Neil Brown
@ 2005-11-17 1:09 ` JaniD++
2005-12-19 0:57 ` Neil Brown
0 siblings, 1 reply; 16+ messages in thread
From: JaniD++ @ 2005-11-17 1:09 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Hello,
Now I am trying the patch....
[root@st-0001 root]# mdadm -G --bitmap=/raid.bm /dev/md0
mdadm: Warning - bitmaps created on this kernel are not portable
between different architectured. Consider upgrading the Linux kernel.
mdadm: Cannot set bitmap file for /dev/md0: Cannot allocate memory
[root@st-0001 root]# free
             total       used       free     shared    buffers     cached
Mem:       2073152      75036    1998116          0          4      29304
-/+ buffers/cache:      45728    2027424
Swap:            0          0          0
[root@st-0001 root]# mdadm -X /dev/md0
mdadm: WARNING: bitmap file is not large enough for array size 2641363663419644516!
Filename : /dev/md0
Magic : a799d766
mdadm: invalid bitmap magic 0xa799d766, the bitmap file appears to be corrupted
Version : -91455910
mdadm: unknown bitmap version -91455910, either the bitmap file is corrupted or you need to upgrade your tools
[root@st-0001 root]#
And now what? :-)
Cheers,
Janos
----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Friday, December 09, 2005 5:49 AM
Subject: Re: RAID5 resync question BUGREPORT!
> On Friday December 9, djani22@dynamicweb.hu wrote:
> > Hi,
> >
> > After I got this on one of my disk nodes, I immediately sent this letter
> > and went to the hosting company to see whether there was any message on
> > the screen.
> > But unfortunately I found nothing.
> > A simple freeze.
> > No message, no ping, no Num Lock!
> >
> > The full message from the node's next reboot is here:
> > http://download.netcenter.hu/bughunt/20051209/boot.log
>
> Ahh.... Ok, I know the problem.
> I had originally only tested bitmaps for raid5 and raid6 on a
> single-processor machine. When you try it on an SMP machine you get a
> deadlock.
> The following patch - which will be in 2.6.15 - fixes the problem.
>
> Thanks for your testing.
>
> NeilBrown
>
> -------------------------------
> Fix locking problem in r5/r6
>
> bitmap_unplug actually writes data (bits) to storage, so we
> shouldn't be holding a spinlock...
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
> ./drivers/md/raid5.c | 2 ++
> ./drivers/md/raid6main.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
> --- ./drivers/md/raid5.c 2005-12-06 11:06:53.000000000 +1100
> +++ ./drivers/md/raid5.c~current~ 2005-12-06 11:07:10.000000000 +1100
> @@ -1704,7 +1704,9 @@ static void raid5d (mddev_t *mddev)
>
> if (conf->seq_flush - conf->seq_write > 0) {
> int seq = conf->seq_flush;
> + spin_unlock_irq(&conf->device_lock);
> bitmap_unplug(mddev->bitmap);
> + spin_lock_irq(&conf->device_lock);
> conf->seq_write = seq;
> activate_bit_delay(conf);
> }
>
> diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
> --- ./drivers/md/raid6main.c 2005-12-06 11:06:53.000000000 +1100
> +++ ./drivers/md/raid6main.c~current~ 2005-12-06 11:07:10.000000000 +1100
> @@ -1784,7 +1784,9 @@ static void raid6d (mddev_t *mddev)
>
> if (conf->seq_flush - conf->seq_write > 0) {
> int seq = conf->seq_flush;
> + spin_unlock_irq(&conf->device_lock);
> bitmap_unplug(mddev->bitmap);
> + spin_lock_irq(&conf->device_lock);
> conf->seq_write = seq;
> activate_bit_delay(conf);
> }
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-12-22 4:46 ` Neil Brown
@ 2005-11-23 9:38 ` JaniD++
0 siblings, 0 replies; 16+ messages in thread
From: JaniD++ @ 2005-11-23 9:38 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Thursday, December 22, 2005 5:46 AM
Subject: Re: RAID5 resync question BUGREPORT!
> On Monday December 19, djani22@dynamicweb.hu wrote:
> > ----- Original Message -----
> > From: "Neil Brown" <neilb@suse.de>
> > To: "JaniD++" <djani22@dynamicweb.hu>
> > Cc: <linux-raid@vger.kernel.org>
> > Sent: Monday, December 19, 2005 1:57 AM
> > Subject: Re: RAID5 resync question BUGREPORT!
> > >
> > > How big is your array?
> >
> > Raid Level : raid5
> > Array Size : 1953583360 (1863.08 GiB 2000.47 GB)
> > Device Size : 195358336 (186.31 GiB 200.05 GB)
> >
> >
> > > The default bitmap-chunk-size when the bitmap is in a file is 4K, this
> > > makes a very large bitmap on a large array.
>
> Hmmm The bitmap chunks are in the device space rather than the array
> space. So 4K chunks in 186GiB is 48million chunks, so 48million bits.
> 8*4096 bits per page, so 1490 pages, which is a lot, and maybe a
> waste, but you should be able to allocate 4.5Meg...
>
> But there is a table which holds pointers to these pages.
> 4 bytes per pointer (8 on a 64bit machine) so 6K or 12K for the table.
> Allocating anything bigger than 4K can be a problem, so that is
> presumably the limit you hit.
>
> The max the table size should be is 4K, which is 1024 pages (on a
> 32bit machine), which is 33 million bits. So we shouldn't allow more
> than 33million (33554432 actually) chunks.
> On you array, that would be 5.8K, so 8K chunks should be ok, unless
> you have a 64bit machine, then 16K chunks.
> Still that is wasting a lot of space.
My system is currently running on i386 (32-bit).
I can see that the 2TB array usually hits some limits. :-)
My first idea was the physical size of the variables (e.g. int: 32768,
double: 65535, etc...).
Did you check that? :-)
>
> >
> > Yes, and if i can see correctly, it makes overflow.
> >
> > > Try a larger bitmap-chunk size e.g.
> > >
> > > mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
> >
> > I think it is still uncompleted!
> >
> > [root@st-0001 /]# mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
> > mdadm: Warning - bitmaps created on this kernel are not portable
> > between different architectured. Consider upgrading the Linux kernel.
> > Segmentation fault
>
> Oh dear.... There should have been an 'oops' message in the kernel
> logs. Can you post it.
Yes, you are right!
If I think correctly, the problem is the live bitmap file on NFS. :-)
(I am a really good tester! :-D)
Dec 19 10:58:37 st-0001 kernel: md0: bitmap file is out of date (0 < 82198273) -- forcing full recovery
Dec 19 10:58:37 st-0001 kernel: md0: bitmap file is out of date, doing full recovery
Dec 19 10:58:37 st-0001 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000078
Dec 19 10:58:38 st-0001 kernel: printing eip:
Dec 19 10:58:38 st-0001 kernel: c0213524
Dec 19 10:58:38 st-0001 kernel: *pde = 00000000
Dec 19 10:58:38 st-0001 kernel: Oops: 0000 [#1]
Dec 19 10:58:38 st-0001 kernel: SMP
Dec 19 10:58:38 st-0001 kernel: Modules linked in: netconsole
Dec 19 10:58:38 st-0001 kernel: CPU: 0
Dec 19 10:58:38 st-0001 kernel: EIP: 0060:[<c0213524>] Not tainted VLI
Dec 19 10:58:38 st-0001 kernel: EFLAGS: 00010292 (2.6.14.2-NBDFIX)
Dec 19 10:58:38 st-0001 kernel: EIP is at nfs_flush_incompatible+0xf/0x8d
Dec 19 10:58:38 st-0001
Dec 19 10:58:38 st-0001 kernel: eax: 00000000 ebx: 00000f00 ecx: 00000000 edx: 00000282
Dec 19 10:58:38 st-0001 kernel: esi: 00000001 edi: c1fcaf40 ebp: f7dc7500 esp: e2281d7c
Dec 19 10:58:38 st-0001 kernel: ds: 007b es: 007b ss: 0068
Dec 19 10:58:38 st-0001 kernel: Process mdadm (pid: 30771, threadinfo=e2280000 task=f6f28540)
Dec 19 10:58:38 st-0001 kernel: Stack: 00000000 00000282 c014fd3f c1fcaf40 00000060 00000f00 00000001 c1fcaf40
Dec 19 10:58:38 st-0001 kernel:        f7dc7500 c04607e1 00000000 c1fcaf40 00000000 00001000 c1fcaf40 00000f00
Dec 19 10:58:38 st-0001 kernel:        c1fcaf40 ffaa6000 00000000 c04619a7 f7dc7500 c1fcaf40 00000001 00000000
Dec 19 10:58:38 st-0001 kernel: Call Trace:
Dec 19 10:58:38 st-0001 kernel: [<c014fd3f>] page_address+0x8e/0x94
Dec 19 10:58:38 st-0001 kernel: [<c04607e1>] write_page+0x5b/0x15d
Dec 19 10:58:38 st-0001 kernel: [<c04619a7>] bitmap_init_from_disk+0x3eb/0x4df
Dec 19 10:58:38 st-0001 kernel: [<c0462b79>] bitmap_create+0x1dc/0x2d3
Dec 19 10:58:38 st-0001 kernel: [<c045d579>] set_bitmap_file+0x68/0x19f
Dec 19 10:58:38 st-0001 kernel: [<c045e0f6>] md_ioctl+0x456/0x678
Dec 19 10:58:38 st-0001 kernel: [<c04f7640>] rpcauth_lookup_credcache+0xe3/0x1cb
Dec 19 10:58:38 st-0001 kernel: [<c04f7781>] rpcauth_lookupcred+0x59/0x95
Dec 19 10:58:38 st-0001 kernel: [<c020c240>] nfs_file_set_open_context+0x29/0x4b
Dec 19 10:58:38 st-0001 kernel: [<c03656e8>] blkdev_driver_ioctl+0x6b/0x80
Dec 19 10:58:38 st-0001 kernel: [<c0365824>] blkdev_ioctl+0x127/0x19e
Dec 19 10:58:38 st-0001 kernel: [<c016a2fb>] block_ioctl+0x2b/0x2f
Dec 19 10:58:38 st-0001 kernel: [<c01745ed>] do_ioctl+0x2d/0x81
Dec 19 10:58:38 st-0001 kernel: [<c01747c6>] vfs_ioctl+0x5a/0x1ef
Dec 19 10:58:38 st-0001 kernel: [<c01749ca>] sys_ioctl+0x6f/0x7d
Dec 19 10:58:38 st-0001 kernel: [<c0102cc3>] sysenter_past_esp+0x54/0x75
Dec 19 10:58:38 st-0001 kernel: Code: 5c 24 14 e9 bb fe ff ff 89 f8 e8 9e 5d 2f 00 89 34 24 e8 a2 f9 ff ff e9 a7 fe ff ff 55 57 56 53 83 ec 14 8b 7c 24 2c 8b 44 24 28 <8b> 40 78 89 44 24 10 8b 47 10 8b 28 8b 47 14 89 44 24 04 89 2c
Dec 19 10:59:54 st-0001 SysRq :
Dec 19 10:59:54 st-0001 Resetting
Dec 19 10:59:54 st-0001 kernel: <6>SysRq : Resetting
> > (Anyway, I think the --bitmap-chunk option should be generated
> > automatically.)
>
> Yes... I wonder what the default should be.
> Certainly not more than 33million bits. Maybe a maximum of 8 million
> (1 megabyte).
(
Generally I cannot understand why it works this way....
If I had made this, it would work in the reverse order!
I mean a hardcoded [or soft-configurable] divisor of 64K or 32K [depending on
the superblock's free space], for minimal use of space and system resources,
to rewrite it on all devices!
E.g. on my system (which usually hits limits) the full resync time on 2TB is
4 hours.
If the resync time could be only 4 hours / 32768 = 0.44 sec, that would be
really good enough! :-)
)
> > > > [root@st-0001 root]# mdadm -X /dev/md0
> > >
> > > This usage is only appropriate for arrays with internal bitmaps (I
> > > should get mdadm to check that..).
> >
> > Is there a way to check external bitmaps?
>
> mdadm -X /raid.bm
>
> i.e. eXamine the object (device or file) that has the bitmap on it.
> Actually, I don't think 'mdadm -X /dev/md0' is right even for
> 'internal' bitmaps. It should be 'mdadm -x /dev/sda1' Or whichever
> is a component device.
That sounds good.
>
> >
> > > >
> > > > And now what? :-)
> > >
> > > Either create an 'internal' bitmap, or choose a --bitmap-chunk size
> > > that is larger.
> >
> > First you said the space for the internal bitmap is only 64K.
> > My first bitmap file is ~4MB, and with the --bitmap-chunk=256 option it is
> > still 96000 bytes.
> >
> > I don't think so... :-)
>
> When using an internal bitmap, mdadm will automatically size the
> bitmap to fit. In your case I think it will choose 512k as the chunk
> size so the bitmap of 48K will fit in the space after the superblock.
Ahh..
That's what I was talking about. :-)
> >
> > I am affraid to overwrite an existing data.
> >
>
> There is no risk of that.
OK, I will trust you, and the raid! :-)
Thanks,
Janos
>
> NeilBrown
>
> (holidays are coming, so I may not be able to reply further for a
> couple of weeks)
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* RAID5 resync question
@ 2005-12-06 0:18 JaniD++
2005-12-06 0:32 ` Neil Brown
0 siblings, 1 reply; 16+ messages in thread
From: JaniD++ @ 2005-12-06 0:18 UTC (permalink / raw)
To: linux-raid
Hello, list,
Is there a way to force the raid to skip this type of resync?
Every 2s: cat /proc/mdstat                              Tue Dec 6 01:28:55 2005
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty]
md0 : active raid5 sdb1[4] sda1[10] hds1[5] hdq1[2] hdo1[3] hdm1[8] hdk1[6] hdi1[7] hde1[9] hdc1[1] hda1[0]
      1953583360 blocks level 5, 32k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]
      [=>...................]  resync =  6.0% (11758724/195358336) finish=1216.8min speed=2512K/sec
unused devices: <none>
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdb1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: adding hds1 ...
md: adding hdq1 ...
md: adding hdo1 ...
md: adding hdm1 ...
md: adding hdk1 ...
md: adding hdi1 ...
md: adding hde1 ...
md: adding hdc1 ...
md: adding hda1 ...
md: created md0
md: bind<hda1>
md: bind<hdc1>
md: bind<hde1>
md: bind<hdi1>
md: bind<hdk1>
md: bind<hdm1>
md: bind<hdo1>
md: bind<hdq1>
md: bind<hds1>
md: bind<sda1>
md: bind<sdb1>
md: running:
<sdb1><sda1><hds1><hdq1><hdo1><hdm1><hdk1><hdi1><hde1><hdc1><hda1>
md: md0: raid array is not clean -- starting background reconstruction
Thanks,
Janos
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question
2005-12-06 0:18 RAID5 resync question JaniD++
@ 2005-12-06 0:32 ` Neil Brown
2005-12-06 0:45 ` JaniD++
0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2005-12-06 0:32 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid
On Tuesday December 6, djani22@dynamicweb.hu wrote:
> Hello, list,
>
>
> Is there a way to force the raid to skip this type of resync?
Why would you want to?
The array is 'unclean', presumably due to a system crash. The parity
isn't certain to be correct so your data isn't safe against a device
failure. You *want* this resync.
If you are using 2.6.14 or later you can try turning on the
write-intent bitmap (mdadm --grow /dev/md0 --bitmap=internal).
That may impact write performance a bit (reports on how much would be
appreciated) but will make this resync-after-crash much faster.
NeilBrown
>
> Every 2s: cat /proc/mdstat                              Tue Dec 6 01:28:55 2005
>
> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty]
> md0 : active raid5 sdb1[4] sda1[10] hds1[5] hdq1[2] hdo1[3] hdm1[8] hdk1[6] hdi1[7] hde1[9] hdc1[1] hda1[0]
>       1953583360 blocks level 5, 32k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]
>       [=>...................]  resync =  6.0% (11758724/195358336) finish=1216.8min speed=2512K/sec
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question
2005-12-06 0:32 ` Neil Brown
@ 2005-12-06 0:45 ` JaniD++
2005-12-06 1:05 ` Neil Brown
0 siblings, 1 reply; 16+ messages in thread
From: JaniD++ @ 2005-12-06 0:45 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Tuesday, December 06, 2005 1:32 AM
Subject: Re: RAID5 resync question
> On Tuesday December 6, djani22@dynamicweb.hu wrote:
> > Hello, list,
> >
> >
> > Is there a way to force the raid to skip this type of resync?
>
> Why would you want to?
> The array is 'unclean', presumably due to a system crash. The parity
> isn't certain to be correct so your data isn't safe against a device
> failure. You *want* this resync.
Thanks for the warning.
Yes, you are right, the system crashed.
I know there is some chance of leaving incorrect parity information on the
array, but it may be corrected by the next write.
On my system there is very little dirty data, thanks to the vm configuration
and *very* frequent flushes.
The risk is low, but the time the resync takes is the bigger problem. :-(
If I can, I want to skip this resync.
And the same on a fresh NEW raid5 array....
(One possible way:
in this case rebuild the array with a "--force-skip-resync" option or
something similar...)
>
> If you are using 2.6.14 to later you can try turning on the
> write-intent bitmap (mdadm --grow /dev/md0 --bitmap=internal).
> That may impact write performance a bit (reports on how much would be
> appreciated) but will make this resync-after-crash much faster.
Hmm.
What does this do exactly?
Does it change the existing array's structure?
Does it need a resync? :-D
Is it safe with existing data?
What do you think about a full external "log"?
Using some checkpoints in an external file or device to resync an array?
And better handling of a half-synced array?
Cheers,
Janos
>
> NeilBrown
>
> >
> > Every 2s: cat /proc/mdstat                              Tue Dec 6 01:28:55 2005
> >
> > Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty]
> > md0 : active raid5 sdb1[4] sda1[10] hds1[5] hdq1[2] hdo1[3] hdm1[8] hdk1[6] hdi1[7] hde1[9] hdc1[1] hda1[0]
> >       1953583360 blocks level 5, 32k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]
> >       [=>...................]  resync =  6.0% (11758724/195358336) finish=1216.8min speed=2512K/sec
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question
2005-12-06 0:45 ` JaniD++
@ 2005-12-06 1:05 ` Neil Brown
2005-12-06 10:56 ` JaniD++
2005-12-08 23:00 ` RAID5 resync question BUGREPORT! JaniD++
0 siblings, 2 replies; 16+ messages in thread
From: Neil Brown @ 2005-12-06 1:05 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid
On Tuesday December 6, djani22@dynamicweb.hu wrote:
>
> ----- Original Message -----
> From: "Neil Brown" <neilb@suse.de>
> To: "JaniD++" <djani22@dynamicweb.hu>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Tuesday, December 06, 2005 1:32 AM
> Subject: Re: RAID5 resync question
>
>
> > On Tuesday December 6, djani22@dynamicweb.hu wrote:
> > > Hello, list,
> > >
> > >
> > > Is there a way to force the raid to skip this type of resync?
> >
> > Why would you want to?
> > The array is 'unclean', presumably due to a system crash. The parity
> > isn't certain to be correct so your data isn't safe against a device
> > failure. You *want* this resync.
>
> Thanks for the warning.
> Yes, you have right, the system is crashed.
>
> I know, it is some chance to leave some incorrect parity information on the
> array, but may be corrected by next write.
Or it may not be corrected by the next write. The parity-update
algorithm assumes that the parity is correct.
> On my system is very little dirty data, thanks to vm configuration and
> *very* often flushes.
> The risk is low, but the time what takes the resync is bigger problem. :-(
>
> If i can, i want to break this resync.
> And same on the fresh NEW raid5 array....
>
> (One possible way:
> in this time rebuild the array with "--force-skip-resync" option or
> something similar...)
If you have mdadm 2.2. then you can recreate the array with
'--assume-clean', and all your data should still be intact. But if
you get corruption one day, don't complain about it - it's your
choice.
>
> >
> > If you are using 2.6.14 to later you can try turning on the
> > write-intent bitmap (mdadm --grow /dev/md0 --bitmap=internal).
> > That may impact write performance a bit (reports on how much would be
> > appreciated) but will make this resync-after-crash much faster.
>
> Hmm.
> What does this exactly?
Divides the array into approximately 200,000 sections (all a power of
2 in size) and keeps track (in a bitmap) of which sections might have
inconsistent parity. If you crash, it only syncs the sections recorded in
the bitmap.
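In case it helps to picture it, here is a minimal sketch of that idea in C. It
is not the md implementation (the real code stores the bitmap in a file or in
the superblock area and batches the bit updates); every name in it is invented
purely for illustration.

/* Rough sketch of the write-intent bitmap idea -- not the md code;
 * all names here are invented for illustration. */
#include <stdio.h>
#include <string.h>

#define NCHUNKS 16            /* md uses ~200,000 chunks; 16 keeps the demo small */

static unsigned char bitmap[(NCHUNKS + 7) / 8];   /* one bit per chunk */

static void set_bit_sync(int chunk)   { bitmap[chunk / 8] |=  (1 << (chunk % 8)); /* then write the bitmap out before the data write */ }
static void clear_bit_lazy(int chunk) { bitmap[chunk / 8] &= ~(1 << (chunk % 8)); /* cleared lazily once the stripe is consistent again */ }
static int  test_bit(int chunk)       { return bitmap[chunk / 8] & (1 << (chunk % 8)); }

/* Every write first marks its chunk dirty, then does the data+parity update. */
static void array_write(int chunk)
{
    set_bit_sync(chunk);
    /* ... write data and parity blocks here ... */
    clear_bit_lazy(chunk);
}

/* After a crash, only the chunks still marked dirty need their parity rebuilt. */
static void resync_after_crash(void)
{
    for (int c = 0; c < NCHUNKS; c++)
        if (test_bit(c))
            printf("resyncing chunk %d\n", c);
}

int main(void)
{
    memset(bitmap, 0, sizeof(bitmap));
    array_write(7);           /* a completed write leaves no bit set behind */
    set_bit_sync(3);          /* pretend the machine crashed mid-write to chunk 3 */
    resync_after_crash();     /* resyncs only chunk 3 instead of the whole array */
    return 0;
}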
> Changes the existing array's structure?
In a forwards/backwards compatible way (makes use of some otherwise
un-used space).
> Need to resync? :-D
You really should let your array sync this time. Once it is synced,
add the bitmap. Then next time you have a crash, the cost will be
much smaller.
> Safe with existing data?
Yes.
>
> What do you think about full external "log"?
Too much overhead without specialised hardware.
> To use some checkpoints in ext file or device to resync an array?
> And the better handling of half-synced array?
I don't know what these mean.
NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question
2005-12-06 1:05 ` Neil Brown
@ 2005-12-06 10:56 ` JaniD++
2005-12-06 23:50 ` Neil Brown
2005-12-08 23:00 ` RAID5 resync question BUGREPORT! JaniD++
1 sibling, 1 reply; 16+ messages in thread
From: JaniD++ @ 2005-12-06 10:56 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
> > I know, it is some chance to leave some incorrect parity information on the
> > array, but may be corrected by next write.
>
> Or it may not be corrected by the next write. The parity-update
> algorithm assumes that the parity is correct.
Hmm.
If it works with a parity-update algorithm instead of a parity "rewrite"
algorithm, you are right.
But it works block-based, and if the entire block is written, the parity
becomes correct again, or not? :-)
What is the block size?
Is it equal to the chunk-size?
Thanks for the warning again!
> > (One possible way:
> > in this time rebuild the array with "--force-skip-resync" option or
> > something similar...)
>
> If you have mdadm 2.2. then you can recreate the array with
> '--assume-clean', and all your data should still be intact. But if
> you get corruption one day, don't complain about it - it's your
> choice.
Ahh, that's what I want. :-)
(But after reading this letter, it looks unnecessary in this case...)
> > What does this exactly?
>
> Divides the array into approximately 200,000 sections (all a power of
> 2 in size) and keeps track (in a bitmap) of which sections might have
> inconsistent parity. if you crash, it only syncs sections recorded in
> the bitmap.
>
> > Changes the existing array's structure?
>
> In a forwards/backwards compatible way (makes use of some otherwise
> un-used space).
What unused space?
In the raid superblock?
At the end of the drives or the end of the array?
Does it leave the raid structure unchanged except for the superblocks?
>
> > Need to resync? :-D
>
> You really should let your array sync this time. Once it is synced,
> add the bitmap. Then next time you have a crash, the cost will be
> much smaller.
This looks like a really good idea!
With this bitmap, the force-skip-resync is really unnecessary....
>
> > To use some checkpoints in ext file or device to resync an array?
> > And the better handling of half-synced array?
>
> I don't know what these mean.
(A little background:
I have written a little stat program, using the /sys/block/#/stat files, to
find performance bottlenecks.
In the stat files I can see whether a device is reading or writing, and the
time these take.)
Once, while my array was rebuilding one disk (in parallel with the normal
workload), I saw that the new drive in the array *only* writes.
What I mean by "better handling of a half-synced array" is this:
If a read request comes to a partially synced array, and the read falls in the
already-synced part, it only needs to read from the *new* device, instead of
reading all the others to calculate the data from parity.
On a working system this could speed up the rebuild process a little, and
offload the system somewhat.
Or am I on the wrong track? :-)
Cheers,
Janos
>
> NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question
2005-12-06 10:56 ` JaniD++
@ 2005-12-06 23:50 ` Neil Brown
2005-12-07 1:32 ` JaniD++
0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2005-12-06 23:50 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid
On Tuesday December 6, djani22@dynamicweb.hu wrote:
>
> > > I know, it is some chance to leave some incorrect parity information on the
> > > array, but may be corrected by next write.
> >
> > Or it may not be corrected by the next write. The parity-update
> > algorithm assumes that the parity is correct.
>
> Hmm.
> If it works with parity-update algorithm, instead of parity "rewrite
> algorithm", you have right.
It chooses read-modify-write (depending on the old parity) or
reconstruct-write (not depending on the old parity), depending on how much
pre-reading each option requires.
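A rough sketch of that choice, purely as illustration (this is not the
raid5.c code, only the pre-read accounting behind it; all names are invented):

/* Rough sketch of the rmw-vs-rcw choice -- not the raid5.c code, just the
 * pre-read accounting it is based on.  All names are invented. */
#include <stdio.h>

/* read-modify-write: read the old copies of the blocks being overwritten plus
 * the old parity, then  new_parity = old_parity ^ old_data ^ new_data.
 * This is why stale parity stays wrong: the update *assumes* old_parity was
 * already correct for the whole stripe. */
static int rmw_prereads(int data_disks, int blocks_written)
{
    (void)data_disks;
    return blocks_written + 1;            /* old data blocks + old parity */
}

/* reconstruct-write: read the data blocks that are NOT being written and
 * recompute the parity from scratch over the full stripe. */
static int rcw_prereads(int data_disks, int blocks_written)
{
    return data_disks - blocks_written;
}

int main(void)
{
    int data_disks = 10;                   /* an 11-disk raid5 has 10 data disks */
    for (int w = 1; w <= data_disks; w++) {
        int rmw = rmw_prereads(data_disks, w);
        int rcw = rcw_prereads(data_disks, w);
        printf("write %2d blocks: rmw pre-reads %2d, rcw pre-reads %2d -> %s\n",
               w, rmw, rcw, rmw < rcw ? "rmw" : "rcw");
    }
    return 0;
}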
> But it works block-based, and if the entire block is written, the parity is
> turned to de correct, or not? :-)
Not.
> What is the block size?
PAGE_SIZE (4K)
> It isequal to chunk-size?
no.
>
> > > What does this exactly?
> >
> > Divides the array into approximately 200,000 sections (all a power of
> > 2 in size) and keeps track (in a bitmap) of which sections might have
> > inconsistent parity. if you crash, it only syncs sections recorded in
> > the bitmap.
> >
> > > Changes the existing array's structure?
> >
> > In a forwards/backwards compatible way (makes use of some otherwise
> > un-used space).
>
> What unused space?
> In the raid superblock?
The raid superblock is 4k in size, placed at least 64k from the end of
the devices. Thus there is always at least 60k of dead space.
> The end of the drives or the end of the array?
end of the drives. The bitmap is stored (similar to raid1) on all
drives.
> It leaves the raid structure unchanged except the superblocks?
yes.
>
> >
> > > To use some checkpoints in ext file or device to resync an array?
> > > And the better handling of half-synced array?
> >
> > I don't know what these mean.
>
> (a little background:
> I have write a little stat program, using /sys/block/#/stat -files, to find
> performance bottlenecks.
> In the stat files i can see, if the device is reads or writes, and the
> needed times for these.)
>
> One time while my array is really rebuild one disk (paralel normal
> workload), i see, the new drive in the array *only* writes.
> i means with "better handling of half-synced array" is this:
> If read request comes to the ?% synced array, and if the read is on the
> synced half, only need to read from *new* device, instead reading all other
> to calculate data from parity.
>
> On a working system this can be a little speed up the rebuild process, and
> some offload the system.
> Or i'm on a wrong clue? :-)
Yes, it would probably be possible to get it to read from the
recovering drive once that section had been recovered. I'll put it on
my todo list.
NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question
2005-12-06 23:50 ` Neil Brown
@ 2005-12-07 1:32 ` JaniD++
0 siblings, 0 replies; 16+ messages in thread
From: JaniD++ @ 2005-12-07 1:32 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
> >
> > One time while my array is really rebuild one disk (paralel normal
> > workload), i see, the new drive in the array *only* writes.
> > i means with "better handling of half-synced array" is this:
> > If read request comes to the ?% synced array, and if the read is on the
> > synced half, only need to read from *new* device, instead reading all other
> > to calculate data from parity.
> >
> > On a working system this can be a little speed up the rebuild process, and
> > some offload the system.
> > Or i'm on a wrong clue? :-)
>
> Yes, it would probably be possible to get it to read from the
> recovering drive once that section had been recovered. I'll put it on
> my todo list.
If I can add some ideas to the world's greatest raid software, it is my
pleasure! :-)
But, Neil!
There is still something I cannot understand.
(As background: I have never read the raid5 code.
Also, I cannot program in C or C++, I can only read it a little.)
I cannot clearly understand what you said about the parity updating!
If the array is clean, the parity spaces (blocks) only need to be written (or
not?).
Why does the raid code use read-modify-write?
I think it is unnecessary to read these blocks!
Recalculating the parity block in memory is faster than
read-modify-write.
Why is the parity space a continuous area? (if it is...)
I think it only needs to be block-based, made up of many independent blocks.
This could speed up the resync, make it easy to always use checkpoints, and
more...
And if the parity data is damaged (by a system crash or similar), and that is
impossible to detect, the next write to the block would make the parity
correct again.
Cheers,
Janos
>
> NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-12-06 1:05 ` Neil Brown
2005-12-06 10:56 ` JaniD++
@ 2005-12-08 23:00 ` JaniD++
2005-12-08 23:43 ` Neil Brown
1 sibling, 1 reply; 16+ messages in thread
From: JaniD++ @ 2005-12-08 23:00 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Hello, Neil,
[root@st-0001 mdadm-2.2]# mdadm --grow /dev/md0 --bitmap=internal
mdadm: Warning - bitmaps created on this kernel are not portable
between different architectured. Consider upgrading the Linux kernel.
Dec 8 23:59:45 st-0001 kernel: md0: bitmap file is out of date (0 < 81015178) -- forcing full recovery
Dec 8 23:59:45 st-0001 kernel: md0: bitmap file is out of date, doing full recovery
Dec 8 23:59:46 st-0001 kernel: md0: bitmap initialized from disk: read 12/12 pages, set 381560 bits, status: 0
Dec 8 23:59:46 st-0001 kernel: created bitmap (187 pages) for device md0
And the system crashed.
No ping reply, no netconsole error logging, no panic, no reboot.
Thanks,
Janos
----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Tuesday, December 06, 2005 2:05 AM
Subject: Re: RAID5 resync question
> On Tuesday December 6, djani22@dynamicweb.hu wrote:
> >
> > ----- Original Message -----
> > From: "Neil Brown" <neilb@suse.de>
> > To: "JaniD++" <djani22@dynamicweb.hu>
> > Cc: <linux-raid@vger.kernel.org>
> > Sent: Tuesday, December 06, 2005 1:32 AM
> > Subject: Re: RAID5 resync question
> >
> >
> > > On Tuesday December 6, djani22@dynamicweb.hu wrote:
> > > > Hello, list,
> > > >
> > > >
> > > > Is there a way to force the raid to skip this type of resync?
> > >
> > > Why would you want to?
> > > The array is 'unclean', presumably due to a system crash. The parity
> > > isn't certain to be correct so your data isn't safe against a device
> > > failure. You *want* this resync.
> >
> > Thanks for the warning.
> > Yes, you have right, the system is crashed.
> >
> > I know, it is some chance to leave some incorrect parity information on the
> > array, but may be corrected by next write.
>
> Or it may not be corrected by the next write. The parity-update
> algorithm assumes that the parity is correct.
>
>
> > On my system is very little dirty data, thanks to vm configuration and
> > *very* often flushes.
> > The risk is low, but the time what takes the resync is bigger problem. :-(
> >
> > If i can, i want to break this resync.
> > And same on the fresh NEW raid5 array....
> >
> > (One possible way:
> > in this time rebuild the array with "--force-skip-resync" option or
> > something similar...)
>
> If you have mdadm 2.2. then you can recreate the array with
> '--assume-clean', and all your data should still be intact. But if
> you get corruption one day, don't complain about it - it's your
> choice.
>
> >
> > >
> > > If you are using 2.6.14 to later you can try turning on the
> > > write-intent bitmap (mdadm --grow /dev/md0 --bitmap=internal).
> > > That may impact write performance a bit (reports on how much would be
> > > appreciated) but will make this resync-after-crash much faster.
> >
> > Hmm.
> > What does this exactly?
>
> Divides the array into approximately 200,000 sections (all a power of
> 2 in size) and keeps track (in a bitmap) of which sections might have
> inconsistent parity. if you crash, it only syncs sections recorded in
> the bitmap.
>
> > Changes the existing array's structure?
>
> In a forwards/backwards compatible way (makes use of some otherwise
> un-used space).
>
> > Need to resync? :-D
>
> You really should let your array sync this time. Once it is synced,
> add the bitmap. Then next time you have a crash, the cost will be
> much smaller.
>
> > Safe with existing data?
>
> Yes.
>
> >
> > What do you think about full external "log"?
>
> Too much overhead without specialised hardware.
>
> > To use some checkpoints in ext file or device to resync an array?
> > And the better handling of half-synced array?
>
> I don't know what these mean.
>
> NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-12-08 23:00 ` RAID5 resync question BUGREPORT! JaniD++
@ 2005-12-08 23:43 ` Neil Brown
2005-12-09 4:03 ` JaniD++
0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2005-12-08 23:43 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid
On Friday December 9, djani22@dynamicweb.hu wrote:
> Hello, Neil,
>
> [root@st-0001 mdadm-2.2]# mdadm --grow /dev/md0 --bitmap=internal
> mdadm: Warning - bitmaps created on this kernel are not portable
> between different architectured. Consider upgrading the Linux kernel.
>
> Dec 8 23:59:45 st-0001 kernel: md0: bitmap file is out of date (0 <
> 81015178) -- forcing full recovery
> Dec 8 23:59:45 st-0001 kernel: md0: bitmap file is out of date, doing full
> recovery
> Dec 8 23:59:46 st-0001 kernel: md0: bitmap initialized from disk: read
> 12/12 pages, set 381560 bits, status: 0
> Dec 8 23:59:46 st-0001 kernel: created bitmap (187 pages) for device md0
>
> And the system is crashed.
> no ping reply, no netconsole error logging, no panic and reboot.
Hmmm, that's unfortunate :-(
Exactly what kernel were you running?
NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-12-08 23:43 ` Neil Brown
@ 2005-12-09 4:03 ` JaniD++
2005-12-09 4:49 ` Neil Brown
0 siblings, 1 reply; 16+ messages in thread
From: JaniD++ @ 2005-12-09 4:03 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Hi,
After I got this on one of my disk nodes, I immediately sent this letter and
went to the hosting company to see whether there was any message on the screen.
But unfortunately I found nothing.
A simple freeze.
No message, no ping, no Num Lock!
The full message from the node's next reboot is here:
http://download.netcenter.hu/bughunt/20051209/boot.log
Next step, I tried to restart the whole system. (The concentrator hangs too,
caused by losing the st-0001 node.)
Part of the next reboot message of the concentrator is here:
http://download.netcenter.hu/bughunt/20051209/dy-boot.log
Next step, I stopped everything, to avoid more data loss.
I tried to remove the possible bitmap from md0 of node-1 (st-0001).
The messages are here:
http://download.netcenter.hu/bughunt/20051209/mdadm.log
At this point I could not remove the broken bitmap, only deactivate the use of
it.
But on the next reboot, the node will try to use it again. :(
I have tried to change the array to use an external bitmap, but mdadm failed
to create that too.
The external bitmap file is here: (6 MB!)
http://download.netcenter.hu/bughunt/20051209/md0.bitmap
The error message is the same as for the internal bitmap creation.
I don't know exactly what caused the fs damage, but here is my "possible
list" (sorted):
1. mdadm (wrong bitmap size)
2. the kernel (wrong resync on startup)
3. the half-written data, caused by the first crash.
One question:
On a working array, is doing the bitmap creation safe and race-free?
(I mean a race between the bitmap create and the bitmap update.)
In the end my data loss was really minimal. :-)
Cheers,
Janos
----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Friday, December 09, 2005 12:43 AM
Subject: Re: RAID5 resync question BUGREPORT!
> On Friday December 9, djani22@dynamicweb.hu wrote:
> > Hello, Neil,
> >
> > [root@st-0001 mdadm-2.2]# mdadm --grow /dev/md0 --bitmap=internal
> > mdadm: Warning - bitmaps created on this kernel are not portable
> > between different architectured. Consider upgrading the Linux kernel.
> >
> > Dec 8 23:59:45 st-0001 kernel: md0: bitmap file is out of date (0 <
> > 81015178) -- forcing full recovery
> > Dec 8 23:59:45 st-0001 kernel: md0: bitmap file is out of date, doing full
> > recovery
> > Dec 8 23:59:46 st-0001 kernel: md0: bitmap initialized from disk: read
> > 12/12 pages, set 381560 bits, status: 0
> > Dec 8 23:59:46 st-0001 kernel: created bitmap (187 pages) for device md0
> >
> > And the system is crashed.
> > no ping reply, no netconsole error logging, no panic and reboot.
>
> Hmmm, that's unfortunate :-(
>
> Exactly what kernel were you running?
>
> NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-12-09 4:03 ` JaniD++
@ 2005-12-09 4:49 ` Neil Brown
2005-11-17 1:09 ` JaniD++
0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2005-12-09 4:49 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid
On Friday December 9, djani22@dynamicweb.hu wrote:
> Hi,
>
> After I got this on one of my disk nodes, I immediately sent this letter and
> went to the hosting company to see whether there was any message on the
> screen.
> But unfortunately I found nothing.
> A simple freeze.
> No message, no ping, no Num Lock!
>
> The full message from the node's next reboot is here:
> http://download.netcenter.hu/bughunt/20051209/boot.log
Ahh.... Ok, I know the problem.
I had originally only tested bitmaps for raid5 and raid6 on a
single-processor machine. When you try it on an SMP machine you get a
deadlock.
The following patch - which will be in 2.6.15 - fixes the problem.
Thanks for your testing.
NeilBrown
-------------------------------
Fix locking problem in r5/r6
bitmap_unplug actually writes data (bits) to storage, so we
shouldn't be holding a spinlock...
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 2 ++
./drivers/md/raid6main.c | 2 ++
2 files changed, 4 insertions(+)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c 2005-12-06 11:06:53.000000000 +1100
+++ ./drivers/md/raid5.c~current~ 2005-12-06 11:07:10.000000000 +1100
@@ -1704,7 +1704,9 @@ static void raid5d (mddev_t *mddev)
if (conf->seq_flush - conf->seq_write > 0) {
int seq = conf->seq_flush;
+ spin_unlock_irq(&conf->device_lock);
bitmap_unplug(mddev->bitmap);
+ spin_lock_irq(&conf->device_lock);
conf->seq_write = seq;
activate_bit_delay(conf);
}
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c 2005-12-06 11:06:53.000000000 +1100
+++ ./drivers/md/raid6main.c~current~ 2005-12-06 11:07:10.000000000 +1100
@@ -1784,7 +1784,9 @@ static void raid6d (mddev_t *mddev)
if (conf->seq_flush - conf->seq_write > 0) {
int seq = conf->seq_flush;
+ spin_unlock_irq(&conf->device_lock);
bitmap_unplug(mddev->bitmap);
+ spin_lock_irq(&conf->device_lock);
conf->seq_write = seq;
activate_bit_delay(conf);
}
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-11-17 1:09 ` JaniD++
@ 2005-12-19 0:57 ` Neil Brown
2005-12-19 10:34 ` JaniD++
0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2005-12-19 0:57 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid
On Thursday November 17, djani22@dynamicweb.hu wrote:
> Hello,
>
> Now i trying the patch....
>
> [root@st-0001 root]# mdadm -G --bitmap=/raid.bm /dev/md0
> mdadm: Warning - bitmaps created on this kernel are not portable
> between different architectured. Consider upgrading the Linux kernel.
> mdadm: Cannot set bitmap file for /dev/md0: Cannot allocate memory
How big is your array?
The default bitmap-chunk-size when the bitmap is in a file is 4K, this
makes a very large bitmap on a large array.
Try a larger bitmap-chunk size e.g.
mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
> [root@st-0001 root]# free
> total used free shared buffers cached
> Mem: 2073152 75036 1998116 0 4 29304
> -/+ buffers/cache: 45728 2027424
> Swap: 0 0 0
> [root@st-0001 root]# mdadm -X /dev/md0
This usage is only appropriate for arrays with internal bitmaps (I
should get mdadm to check that..).
>
> And now what? :-)
Either create an 'internal' bitmap, or choose a --bitmap-chunk size
that is larger.
Thanks for the report.
NeilBrown
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-12-19 0:57 ` Neil Brown
@ 2005-12-19 10:34 ` JaniD++
2005-12-22 4:46 ` Neil Brown
0 siblings, 1 reply; 16+ messages in thread
From: JaniD++ @ 2005-12-19 10:34 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
----- Original Message -----
From: "Neil Brown" <neilb@suse.de>
To: "JaniD++" <djani22@dynamicweb.hu>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, December 19, 2005 1:57 AM
Subject: Re: RAID5 resync question BUGREPORT!
> On Thursday November 17, djani22@dynamicweb.hu wrote:
> > Hello,
> >
> > Now i trying the patch....
> >
> > [root@st-0001 root]# mdadm -G --bitmap=/raid.bm /dev/md0
> > mdadm: Warning - bitmaps created on this kernel are not portable
> > between different architectured. Consider upgrading the Linux kernel.
> > mdadm: Cannot set bitmap file for /dev/md0: Cannot allocate memory
>
> How big is your array?
Raid Level : raid5
Array Size : 1953583360 (1863.08 GiB 2000.47 GB)
Device Size : 195358336 (186.31 GiB 200.05 GB)
> The default bitmap-chunk-size when the bitmap is in a file is 4K, this
> makes a very large bitmap on a large array.
Yes, and if I see it correctly, it causes an overflow.
> Try a larger bitmap-chunk size e.g.
>
> mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
I think it is still uncompleted!
[root@st-0001 /]# mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
mdadm: Warning - bitmaps created on this kernel are not portable
between different architectured. Consider upgrading the Linux kernel.
Segmentation fault
[root@st-0001 /]#
And the raid layer stopped.
(The nbd-server stopped serving, and cat /proc/mdstat hangs too.
I tried to sync, and then
echo b >/proc/sysrq-trigger
After the reboot, everything was back to normal.)
This generated one 96000-byte /raid.bm.
(Anyway, I think the --bitmap-chunk option should be generated automatically.)
> > [root@st-0001 root]# mdadm -X /dev/md0
>
> This usage is only appropriate for arrays with internal bitmaps (I
> should get mdadm to check that..).
Is there a way to check external bitmaps?
> >
> > And now what? :-)
>
> Either create an 'internal' bitmap, or choose a --bitmap-chunk size
> that is larger.
First you said the space for the internal bitmap is only 64K.
My first bitmap file is ~4MB, and with the --bitmap-chunk=256 option it is
still 96000 bytes.
I don't think so... :-)
I am afraid of overwriting existing data.
Cheers,
Janos
>
> Thanks for the report.
>
> NeilBrown
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: RAID5 resync question BUGREPORT!
2005-12-19 10:34 ` JaniD++
@ 2005-12-22 4:46 ` Neil Brown
2005-11-23 9:38 ` JaniD++
0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2005-12-22 4:46 UTC (permalink / raw)
To: JaniD++; +Cc: linux-raid
On Monday December 19, djani22@dynamicweb.hu wrote:
> ----- Original Message -----
> From: "Neil Brown" <neilb@suse.de>
> To: "JaniD++" <djani22@dynamicweb.hu>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Monday, December 19, 2005 1:57 AM
> Subject: Re: RAID5 resync question BUGREPORT!
> >
> > How big is your array?
>
> Raid Level : raid5
> Array Size : 1953583360 (1863.08 GiB 2000.47 GB)
> Device Size : 195358336 (186.31 GiB 200.05 GB)
>
>
> > The default bitmap-chunk-size when the bitmap is in a file is 4K, this
> > makes a very large bitmap on a large array.
Hmmm The bitmap chunks are in the device space rather than the array
space. So 4K chunks in 186GiB is 48million chunks, so 48million bits.
8*4096 bits per page, so 1490 pages, which is a lot, and maybe a
waste, but you should be able to allocate 4.5Meg...
But there is a table which holds pointers to these pages.
4 bytes per pointer (8 on a 64bit machine) so 6K or 12K for the table.
Allocating anything bigger than 4K can be a problem, so that is
presumably the limit you hit.
The max the table size should be is 4K, which is 1024 pages (on a
32bit machine), which is 33 million bits. So we shouldn't allow more
than 33million (33554432 actually) chunks.
On your array, that would be 5.8K, so 8K chunks should be ok, unless
you have a 64bit machine, then 16K chunks.
Still that is wasting a lot of space.
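For what it is worth, that arithmetic can be checked with a few lines of C (an
illustrative sketch only, not mdadm or kernel code):

/* Back-of-the-envelope check of the numbers above -- an illustrative sketch,
 * not code from md or mdadm. */
#include <stdio.h>

int main(void)
{
    unsigned long long device_kib = 195358336ULL;   /* 186.31 GiB device, in KiB */
    unsigned long long chunk_kib  = 4;               /* default 4K bitmap chunk   */
    unsigned long long bits       = device_kib / chunk_kib;     /* one bit per chunk */
    unsigned long long bits_per_page = 8ULL * 4096;             /* 32768 bits in a 4K page */
    unsigned long long pages      = (bits + bits_per_page - 1) / bits_per_page;
    unsigned long long table_32   = pages * 4;       /* pointer table, 32-bit kernel */
    unsigned long long table_64   = pages * 8;       /* pointer table, 64-bit kernel */

    printf("%llu bits, %llu pages, table %llu/%llu bytes\n",
           bits, pages, table_32, table_64);
    /* prints roughly: 48839584 bits, 1491 pages, table 5964/11928 bytes.
     * The pointer table no longer fits in a single 4K allocation, hence the
     * "Cannot allocate memory".  Capping the table at 4K gives 1024 pages,
     * i.e. 33554432 chunks, which on this device means a minimum chunk size
     * of about 6K, rounded up to 8K. */
    return 0;
}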
>
> Yes, and if i can see correctly, it makes overflow.
>
> > Try a larger bitmap-chunk size e.g.
> >
> > mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
>
> I think it is still uncompleted!
>
> [root@st-0001 /]# mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
> mdadm: Warning - bitmaps created on this kernel are not portable
> between different architectured. Consider upgrading the Linux kernel.
> Segmentation fault
Oh dear.... There should have been an 'oops' message in the kernel
logs. Can you post it.
> [root@st-0001 /]#
>
> And the raid layer is stopped.
> (The nbd-server stops to serving, and the cat /proc/mdstat is hangs too.
> i try to sync, and
> echo b >/proc/sysrq-trigger
> After reboot, everything is back to normal.)
>
> This generates one 96000 byte /raid.bm.
That sounds right.
>
> (Anyway i think the --bitmap-chunk option is neccessary to be automaticaly
> generated.)
Yes... I wonder what the default should be.
Certainly not more than 33million bits. Maybe a maximum of 8 million
(1 megabyte).
>
>
> > > [root@st-0001 root]# mdadm -X /dev/md0
> >
> > This usage is only appropriate for arrays with internal bitmaps (I
> > should get mdadm to check that..).
>
> Is there a way to check external bitmaps?
mdadm -X /raid.bm
i.e. eXamine the object (device or file) that has the bitmap on it.
Actually, I don't think 'mdadm -X /dev/md0' is right even for
'internal' bitmaps. It should be 'mdadm -X /dev/sda1', or whichever
is a component device.
>
> > >
> > > And now what? :-)
> >
> > Either create an 'internal' bitmap, or choose a --bitmap-chunk size
> > that is larger.
>
> First you sad, the space to the internal bitmap is only 64K.
> My first bitmap file is ~4MB, and with --bitmap-chunk=256 option still 96000
> Byte.
>
> I don't think so... :-)
When using an internal bitmap, mdadm will automatically size the
bitmap to fit. In your case I think it will choose 512k as the chunk
size so the bitmap of 48K will fit in the space after the superblock.
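As a quick check of that sizing (again only an illustrative sketch, not
mdadm's code):

/* Quick check of the internal-bitmap sizing -- a sketch, not mdadm's code. */
#include <stdio.h>

int main(void)
{
    unsigned long long device_kib = 195358336ULL;   /* 186.31 GiB device, in KiB */
    unsigned long long chunk_kib  = 512;             /* chunk size mdadm would pick */
    unsigned long long bits  = (device_kib + chunk_kib - 1) / chunk_kib;
    unsigned long long bytes = (bits + 7) / 8;

    /* prints "381560 bits = 47695 bytes", matching the "set 381560 bits"
     * kernel message earlier in the thread, and fitting comfortably in the
     * ~60K of space after the superblock. */
    printf("%llu bits = %llu bytes\n", bits, bytes);
    return 0;
}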
>
> I am affraid to overwrite an existing data.
>
There is no risk of that.
NeilBrown
(holidays are coming, so I may not be able to reply further for a
couple of weeks)
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2005-12-22 4:46 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-06 0:18 RAID5 resync question JaniD++
2005-12-06 0:32 ` Neil Brown
2005-12-06 0:45 ` JaniD++
2005-12-06 1:05 ` Neil Brown
2005-12-06 10:56 ` JaniD++
2005-12-06 23:50 ` Neil Brown
2005-12-07 1:32 ` JaniD++
2005-12-08 23:00 ` RAID5 resync question BUGREPORT! JaniD++
2005-12-08 23:43 ` Neil Brown
2005-12-09 4:03 ` JaniD++
2005-12-09 4:49 ` Neil Brown
2005-11-17 1:09 ` JaniD++
2005-12-19 0:57 ` Neil Brown
2005-12-19 10:34 ` JaniD++
2005-12-22 4:46 ` Neil Brown
2005-11-23 9:38 ` JaniD++