* bitmap chunk size
From: Darius S. Naqvi @ 2009-11-10 16:39 UTC (permalink / raw)
To: linux-raid
Is there any possibility of having a bitmap chunk size of 512 bytes?
I know that mdadm rejects anything under 4k. I fear that the
assumption of the 4k minimum is embedded fairly strongly in the code.
Can my fear be alleviated?
--
Darius S. Naqvi
dnaqvi@datagardens.com
http://www.datagardens.com
* Re: bitmap chunk size
From: Doug Ledford @ 2009-11-10 18:01 UTC (permalink / raw)
To: Darius S. Naqvi; +Cc: linux-raid
On 11/10/2009 11:39 AM, Darius S. Naqvi wrote:
> Is there any possibility of having a bitmap chunk size of 512 bytes?
> I know that mdadm rejects anything under 4k. I fear that the
> assumption of the 4k minimum is embedded fairly strongly in the code.
> Can my fear be alleviated?
>
If you're putting any normal filesystem (with a block size of 4k) on
this, then it makes absolutely no sense to have a bitmap size less than
4k as any given filesystem block is either dirty or clean, sub-block
semantics make no sense in this scenario. That said, unless you have a
specific need for this level of granularity, it is a really bad idea
(both performance-wise and space-wise) to go with anything even close
to the granularity you are requesting. I usually go with
--bitmap-chunk=32768 (which, since the value is expressed in kilobytes,
means 32 megabytes). I suspect that if you have a truly pressing need
for a 512-byte bitmap chunk, then you probably don't need a bare raid;
you need some sort of database underlying your data, or something else.
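
To put rough numbers on the space cost, here is a minimal sketch in
Python. It assumes one bit per bitmap chunk and a hypothetical 1 TiB
array; the function name and the array size are illustrative and do not
come from mdadm.

```python
def bitmap_cost(array_bytes, chunk_bytes):
    """Return (number of chunks, bitmap size in bytes), assuming 1 bit per chunk."""
    chunks = -(-array_bytes // chunk_bytes)  # ceiling division
    bitmap_bytes = -(-chunks // 8)           # 8 chunks tracked per bitmap byte
    return chunks, bitmap_bytes


TIB = 1 << 40  # hypothetical array size

for label, chunk in (("512 B", 512), ("4 KiB", 4096), ("32 MiB", 32768 * 1024)):
    chunks, size = bitmap_cost(TIB, chunk)
    print(f"{label:>7}: {chunks} chunks, {size} bytes of bitmap")
```

Under those assumptions, 512-byte chunks need a 256 MiB bitmap for a
1 TiB array, while 32 MiB chunks need only 4 KiB.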
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: bitmap chunk size
From: Darius S. Naqvi @ 2009-11-10 18:36 UTC (permalink / raw)
To: linux-raid
On Tue, 10 Nov 2009, Doug Ledford wrote:
> On 11/10/2009 11:39 AM, Darius S. Naqvi wrote:
>> Is there any possibility of having a bitmap chunk size of 512 bytes?
>> I know that mdadm rejects anything under 4k. I fear that the
>> assumption of the 4k minimum is embedded fairly strongly in the code.
>> Can my fear be alleviated?
>>
>
> If you're putting any normal filesystem (with a block size of 4k) on
> this, then it makes absolutely no sense to have a bitmap size less than
> 4k as any given filesystem block is either dirty or clean, sub-block
> semantics make no sense in this scenario. That said, unless you have a
Well, the mke2fs(8) man page says, "Valid block size values are 1024,
2048 and 4096 bytes per block. If omitted, mke2fs block-size is
heuristically determined by the file system size..."
I always thought that filesystems typically used a block size of 4k,
but apparently there is no guarantee that that is the case. Also, I'm
not sure what Windows uses as a filesystem block size. This is
important to me, because we're trying to use md raid 1's to
periodically synchronize blocks from a filesystem, and having
sub-chunk writes messes things up for us. What we want is that a
whole chunk gets written, then we fiddle with things so that a
bitmap-driven resync copies those whole chunks. The chunks were not
necessarily initialized before the write, so a sub-chunk write means
garbage data is copied in the remainder of the chunk.
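
The sub-chunk problem described above can be sketched as follows. This
is only an illustration of the arithmetic, assuming a write marks every
bitmap chunk it overlaps as dirty; the function names are hypothetical
and are not part of md.

```python
def touched_chunks(offset, length, chunk_size):
    """Indices of the bitmap chunks that a write of `length` bytes at `offset` overlaps."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


def unwritten_bytes(offset, length, chunk_size):
    """Bytes inside the overlapped chunks that the write itself did not cover."""
    chunks = touched_chunks(offset, length, chunk_size)
    return len(chunks) * chunk_size - max(length, 0)


CHUNK = 4096
for offset, length in ((1024, 1024), (4096, 4096), (3072, 2048)):
    print(offset, length,
          list(touched_chunks(offset, length, CHUNK)),
          unwritten_bytes(offset, length, CHUNK))
```

Only the aligned, chunk-sized write leaves no unwritten remainder; the
other two drag uninitialized bytes along when whole chunks are copied.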
We've seen this problem occur in practice, meaning either filesystems
are not using 4k (or multiple thereof) chunks, or for some reason,
writes are not 4k-aligned. Does anyone with more knowledge of
filesystems know about this? Perhaps we can force block size and
alignment of filesystems to make this work.
--
Darius S. Naqvi
dnaqvi@datagardens.com
http://www.datagardens.com
* Re: bitmap chunk size
From: Doug Ledford @ 2009-11-10 19:46 UTC (permalink / raw)
To: Darius S. Naqvi; +Cc: linux-raid
On 11/10/2009 01:36 PM, Darius S. Naqvi wrote:
> On Tue, 10 Nov 2009, Doug Ledford wrote:
>
>> On 11/10/2009 11:39 AM, Darius S. Naqvi wrote:
>>> Is there any possibility of having a bitmap chunk size of 512 bytes?
>>> I know that mdadm rejects anything under 4k. I fear that the
>>> assumption of the 4k minimum is embedded fairly strongly in the code.
>>> Can my fear be alleviated?
>>>
>>
>> If you're putting any normal filesystem (with a block size of 4k) on
>> this, then it makes absolutely no sense to have a bitmap size less than
>> 4k as any given filesystem block is either dirty or clean, sub-block
>> semantics make no sense in this scenario. That said, unless you have a
>
> Well, the mke2fs(8) man page says, "Valid block size values are 1024,
> 2048 and 4096 bytes per block. If omitted, mke2fs block-size is
> heuristically determined by the file system size..."
Yes, but you can always supply the -b option and tell it what you want.
I'm actually a little confused as to why you would quote this specific
part of the man page and then, at the end of the mail, ask how you can
force the block size on the filesystem. The option you just quoted is
how.
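
As a minimal sketch of passing -b explicitly, the Python helper below
builds (but does not run) an mke2fs command line. The helper name and
the /dev/md0 device are hypothetical; the accepted sizes are just the
ones listed in the man page excerpt quoted above.

```python
# Block sizes listed in the man page excerpt quoted in this thread.
VALID_BLOCK_SIZES = (1024, 2048, 4096)


def mke2fs_command(device, block_size=4096):
    """Build (without executing) an mke2fs argv with an explicit block size."""
    if block_size not in VALID_BLOCK_SIZES:
        raise ValueError(f"unsupported block size: {block_size}")
    return ["mke2fs", "-b", str(block_size), device]


# /dev/md0 is a placeholder device name used only for illustration.
print(" ".join(mke2fs_command("/dev/md0")))
```

Returning the argument list instead of running it keeps the sketch safe
to try; handing it to subprocess.run would actually create a filesystem
on the device.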
> I always thought that filesystems typically used a block size of 4k,
They do, especially if you use any of the modern distributions. I think
they all pass -b 4096 when calling mke2fs. But even if they didn't,
I think you need a pretty small filesystem before mke2fs will
voluntarily pick a block size smaller than 4k.
> but apparently there is no guarantee that that is the case. Also, I'm
> not sure what Windows uses as a filesystem block size. This is
> important to me, because we're trying to use md raid 1's to
> periodically synchronize blocks from a filesystem, and having
> sub-chunk writes messes things up for us. What we want is that a
> whole chunk gets written, then we fiddle with things so that a
> bitmap-driven resync copies those whole chunks. The chunks were not
> necessarily initialized before the write, so a sub-chunk write means
> garbage data is copied in the remainder of the chunk.
So? If this is truly a raid1, and it wasn't initialized prior to use,
and you copy a chunk that is larger than the write and has garbage at
the end, it doesn't matter: the other drive has garbage there too, so
you are just overwriting garbage with garbage. The only reason it would
ever matter that you copied garbage at the end is if you are trying to
do a partial raid1, where the members aren't really full raid1 mirror
copies, but you are using the bitmap as a sort of mask that marks what
you want copied while other parts stay untouched. Why you would be
doing something like this I don't know, but good luck getting it to
work with the md raid1 code. It simply isn't intended to be used in
that way, and even if you manage to get it to work, I suspect it would
be *VERY* fragile.
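
The "bitmap as a mask" scenario can be illustrated with a toy model in
Python. This is not the md raid1 code: the buffers, the 8-byte chunk
size, and the resync function are assumptions made up for the example.

```python
def resync(source, target, dirty, chunk_size):
    """Copy every chunk flagged in `dirty` from source to target, a whole chunk at a time."""
    for index, is_dirty in enumerate(dirty):
        if is_dirty:
            start = index * chunk_size
            target[start:start + chunk_size] = source[start:start + chunk_size]


CHUNK = 8                      # toy chunk size in bytes
source = bytearray(b"G" * 16)  # "G" stands for uninitialized garbage
target = bytearray(b"k" * 16)  # "k" stands for data the target should keep
source[2:5] = b"NEW"           # a sub-chunk write on the source
dirty = [True, False]          # that write marks only chunk 0 dirty

resync(source, target, dirty, CHUNK)
print(target.decode())
```

After the copy, chunk 0 of the target holds the new bytes plus the
source's garbage on either side, while chunk 1 is untouched. In a true
mirror that garbage is harmless; it only matters when the target's old
contents in that chunk were supposed to survive.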
> We've seen this problem occur in practice, meaning either filesystems
> are not using 4k (or multiple thereof) chunks, or for some reason,
> writes are not 4k-aligned. Does anyone with more knowledge of
> filesystems know about this? Perhaps we can force block size and
> alignment of filesystems to make this work.
>
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband