* Query regarding stripe management in linux md-raid
@ 2009-04-28 14:02 Kunal Kushwaha
2009-04-28 23:15 ` Neil Brown
0 siblings, 1 reply; 6+ messages in thread
From: Kunal Kushwaha @ 2009-04-28 14:02 UTC
To: linux-raid
Hi,
I am trying to run Linux RAID on a box with 256 MB of RAM. The kernel is
compiled with the non-swappable memory management option. I looked into
raid5.c and found that it allocates one page for each chunk. I am using 5
disks with a 64k chunk size, and my kernel itself fits within 30 MB.
1. If one page is allocated per chunk instead of a buffer of the actual
chunk size, isn't a lot of memory wasted?
2. It will restrict the number of stripes in memory (with non-swappable
memory, that number will be very small), which will badly affect I/O
performance.
Please correct me if I am wrong. Any suggestions for overcoming this
problem are most welcome.
--
Thanks in Advance,
Kunal Kushwaha
* Re: Query regarding stripe management in linux md-raid
2009-04-28 14:02 Query regarding stripe management in linux md-raid Kunal Kushwaha
@ 2009-04-28 23:15 ` Neil Brown
2009-04-29 7:27 ` Kunal Kushwaha
0 siblings, 1 reply; 6+ messages in thread
From: Neil Brown @ 2009-04-28 23:15 UTC
To: Kunal Kushwaha; +Cc: linux-raid
On Tuesday April 28, kunal.kushwaha@gmail.com wrote:
> Hi,
>
> I am trying to put Linux raid in Box with 256 MB of RAM. The kernel is
> compiled with non-swappable memory management option. I looked into
> raid5.c and found, it allocates one page for each chunk. I am using 5
> disks for 64k chunk size. considering my kernel is within 30 MB.
That isn't quite right. It is not 1 page per chunk.
raid5 maintains a stripe cache. Each entry in the cache has one page
per device, and there are 256 entries by default.
So for a 5-disk array, that is 5*256 == 1280 pages or 5MB (plus
overhead).
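The arithmetic above can be checked with a short sketch. It assumes a 4 KiB page size (which is what makes 1280 pages come to 5MB); the function name is illustrative and does not come from raid5.c, and the per-entry bookkeeping overhead is ignored.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def stripe_cache_bytes(num_devices, cache_entries=256, page_size=PAGE_SIZE):
    """Approximate page memory held by the stripe cache, excluding overhead.

    Each cache entry holds one page per device in the array.
    """
    return num_devices * cache_entries * page_size


if __name__ == "__main__":
    print(5 * 256)                                 # 1280 pages
    print(stripe_cache_bytes(5) / (1024 * 1024))   # 5.0 (MiB)
```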
>
> 1. If one page is allocated for one chunk instead of actual buffer
> size, isn't memory lot of memory is wasted?
It depends on what you mean by "wasted". The memory is used to
provide adequate performance with manageable code complexity.
Possibly the same performance could be achieved using less memory, but
I suspect the code would be much more complex and so probably more
buggy. It is a trade off.
If the 5MB eats into your 256MB too much, you can reduce it by writing
a number to /sys/block/mdX/md/stripe_cache_size.
You could probably get away with as little as '4', but I suspect that
would really kill performance. With a 64K chunk size you want at
least 16 entries, so 32 or 64 might be a suitable compromise for you.
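As a rough sketch of the tuning step described above, the following writes a new entry count to the sysfs file. The array name md0 is only a placeholder for mdX, the helper name is hypothetical, and the write needs root; the kernel may also reject values outside the range it accepts, which this sketch does not try to predict.

```python
from pathlib import Path


def set_stripe_cache_size(entries, md_name="md0", sysfs_root="/sys/block"):
    """Write a new stripe cache entry count for one md array.

    md_name ("md0") is a placeholder; substitute the real array name.
    sysfs_root is a parameter only so the function can be tried on a
    scratch directory instead of the live /sys tree.
    """
    if entries < 1:
        raise ValueError("entries must be a positive integer")
    path = Path(sysfs_root) / md_name / "md" / "stripe_cache_size"
    path.write_text(f"{entries}\n")
    return path


# Example (as root, on a machine that really has /dev/md0):
#     set_stripe_cache_size(64)
```

With 5 devices and 4 KiB pages, 64 entries would hold about 1.25MB of pages instead of 5MB.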
> 2. It will restrict the no. of stripes in memory (in case of
> non-swappable memory, no will be very less) will effect performance of
> IO badly.
The number of (page-wide) stripes in memory is fixed by the cache
size. It defaults to 256, but you can change it. A smaller size
would be likely to negatively affect performance. A larger cache can
improve performance depending on workload.
>
> Please correct me, if I am wrong, also any suggestion to overcome this
> problem is most welcomed.
Do you have an actual observed problem, or are you just trying to
discover what problems you might eventually run into?
NeilBrown
* Re: Query regarding stripe management in linux md-raid
2009-04-28 23:15 ` Neil Brown
@ 2009-04-29 7:27 ` Kunal Kushwaha
2009-05-01 21:41 ` Goswin von Brederlow
0 siblings, 1 reply; 6+ messages in thread
From: Kunal Kushwaha @ 2009-04-29 7:27 UTC
To: Neil Brown; +Cc: linux-raid
Hi Neil,
Thanks for your reply. I am trying to find out problems that I could
face later on.
On Wed, Apr 29, 2009 at 7:15 AM, Neil Brown <neilb@suse.de> wrote:
> On Tuesday April 28, kunal.kushwaha@gmail.com wrote:
>> Hi,
>>
>> I am trying to put Linux raid in Box with 256 MB of RAM. The kernel is
>> compiled with non-swappable memory management option. I looked into
>> raid5.c and found, it allocates one page for each chunk. I am using 5
>> disks for 64k chunk size. considering my kernel is within 30 MB.
>
> That isn't quite right. It is not 1 page per chunk.
> raid5 maintains a stripe cache. Each entry in the cache has one page
> per device, and there are 256 entries by default.
> So for a 5-disk array, that is 5*256 == 1280 pages or 5MB (plus
> overhead).
>
Sorry, I forgot to mention the stripe cache. Yes, we will be using
only 5MB, assuming a page size of 4k.
I have one more question about this: how can a 4k page hold 64k of
data (my chunk size is 64k)?
>>
>> 1. If one page is allocated for one chunk instead of actual buffer
>> size, isn't memory lot of memory is wasted?
>
> It depends on what you mean by "wasted". The memory is used to
> provide adequate performance with manageable code complexity.
> Possibly the same performance could be achieved using less memory, but
> I suspect the code would be much more complex and so probably more
> buggy. It is a trade off.
>
> If the 5MB eats in to you 256MB too much, you can reduce it be writing
> a number to /sys/block/mdX/md/stripe_cache_size.
> You could probably get away with as little as '4', but I suspect that
> would really kill performance. With a 64K chunk size you want at
> least 16 entries, so 32 or 64 might be a suitable compromise for you.
>
>> 2. It will restrict the no. of stripes in memory (in case of
>> non-swappable memory, no will be very less) will effect performance of
>> IO badly.
>
> The number of (page-wide) stripes in memory is fixed by the cache
> size. It defaults to 256, but you can change it. A smaller size
> would be likely to negatively affect performance. A larger cache can
> improve performance depending on workload.
>
>>
>> Please correct me, if I am wrong, also any suggestion to overcome this
>> problem is most welcomed.
>
> Do you have an actual observed problem, or are you just trying to
> discover what problems you might eventually run in to?
>
> NeilBrown
>
--
Thanks in Advance,
Kunal Kushwaha
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Query regarding stripe management in linux md-raid
2009-04-29 7:27 ` Kunal Kushwaha
@ 2009-05-01 21:41 ` Goswin von Brederlow
2009-05-02 7:18 ` Kunal Kushwaha
0 siblings, 1 reply; 6+ messages in thread
From: Goswin von Brederlow @ 2009-05-01 21:41 UTC
To: Kunal Kushwaha; +Cc: Neil Brown, linux-raid
Kunal Kushwaha <kunal.kushwaha@gmail.com> writes:
> Hi Neil,
>
> Thanks for your reply. I am trying to find out problems that I could
> face later on.
>
> On Wed, Apr 29, 2009 at 7:15 AM, Neil Brown <neilb@suse.de> wrote:
>> On Tuesday April 28, kunal.kushwaha@gmail.com wrote:
>>> Hi,
>>>
>>> I am trying to put Linux raid in Box with 256 MB of RAM. The kernel is
>>> compiled with non-swappable memory management option. I looked into
>>> raid5.c and found, it allocates one page for each chunk. I am using 5
>>> disks for 64k chunk size. considering my kernel is within 30 MB.
>>
>> That isn't quite right. It is not 1 page per chunk.
>> raid5 maintains a stripe cache. Each entry in the cache has one page
>> per device, and there are 256 entries by default.
>> So for a 5-disk array, that is 5*256 == 1280 pages or 5MB (plus
>> overhead).
>>
>
> Sorry I missed to mention about stripe cache. Yes we will be using
> only 5MB if we
> consider page size is of 4k.
>
> I have one more doubt regarding this. How a page of 4k will be able to
> store data of
> 64k( my chunk size if of 64k) ?
A chunk size of 64k in no way means that all transactions are done in
64k chunks.
What effect does the chunk size actually have? Isn't it just the amount
of sequential data before the parity is rotated in a raid5/6?
MfG
Goswin
* Re: Query regarding stripe management in linux md-raid
2009-05-01 21:41 ` Goswin von Brederlow
@ 2009-05-02 7:18 ` Kunal Kushwaha
2009-05-04 17:04 ` Goswin von Brederlow
0 siblings, 1 reply; 6+ messages in thread
From: Kunal Kushwaha @ 2009-05-02 7:18 UTC
To: Goswin von Brederlow; +Cc: Neil Brown, linux-raid
Hi Goswin,
My understanding of chunk size is that it is the minimum unit of data
used for a read/write.
The chunk size defines the amount of data written to one disk after the
stripe is split into equal chunks.
Chunk size = Stripe size / (disks in array - parity disks)
so for raid 5, chunk_size = Stripe_Size / (total_disks -1)
and for raid 6, chunk_size = Stripe_Size / (total_disks -2)
and parity is also written in terms of chunk size.
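For illustration, the relation above can be turned around to give the data held by one full stripe. This is only a sketch of the arithmetic in this message; the function name is made up and sizes are in bytes.

```python
def data_stripe_size(chunk_size, total_disks, parity_disks):
    """Data bytes in one full stripe: one chunk per non-parity disk."""
    if total_disks <= parity_disks:
        raise ValueError("need at least one data disk")
    return chunk_size * (total_disks - parity_disks)


if __name__ == "__main__":
    chunk = 64 * 1024
    print(data_stripe_size(chunk, 5, 1))  # raid 5 on 5 disks: 262144 (256k)
    print(data_stripe_size(chunk, 5, 2))  # raid 6 on 5 disks: 196608 (192k)
```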
Now consider the case where an I/O request arrives for one complete
stripe. We need a buffer for the whole stripe, but since only 1 page is
allocated per chunk, we have a 4k buffer instead of a 64k one. How is
this handled?
Thanks & Regards,
Kunal
On Sat, May 2, 2009 at 3:11 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> Kunal Kushwaha <kunal.kushwaha@gmail.com> writes:
>
>> Hi Neil,
>>
>> Thanks for your reply. I am trying to find out problems that I could
>> face later on.
>>
>> On Wed, Apr 29, 2009 at 7:15 AM, Neil Brown <neilb@suse.de> wrote:
>>> On Tuesday April 28, kunal.kushwaha@gmail.com wrote:
>>>> Hi,
>>>>
>>>> I am trying to put Linux raid in Box with 256 MB of RAM. The kernel is
>>>> compiled with non-swappable memory management option. I looked into
>>>> raid5.c and found, it allocates one page for each chunk. I am using 5
>>>> disks for 64k chunk size. considering my kernel is within 30 MB.
>>>
>>> That isn't quite right. It is not 1 page per chunk.
>>> raid5 maintains a stripe cache. Each entry in the cache has one page
>>> per device, and there are 256 entries by default.
>>> So for a 5-disk array, that is 5*256 == 1280 pages or 5MB (plus
>>> overhead).
>>>
>>
>> Sorry I missed to mention about stripe cache. Yes we will be using
>> only 5MB if we
>> consider page size is of 4k.
>>
>> I have one more doubt regarding this. How a page of 4k will be able to
>> store data of
>> 64k( my chunk size if of 64k) ?
>
> A chunk size of 64k does in no way mean all transactions are done in
> 64k chunks.
>
> What effect does chunk size actually do have? Was it just the amount
> of sequential data before the parity is rotated in a raid5/6.
>
> MfG
> Goswin
>
--
Regards,
Kunal Kushwaha
* Re: Query regarding stripe management in linux md-raid
2009-05-02 7:18 ` Kunal Kushwaha
@ 2009-05-04 17:04 ` Goswin von Brederlow
0 siblings, 0 replies; 6+ messages in thread
From: Goswin von Brederlow @ 2009-05-04 17:04 UTC
To: Kunal Kushwaha; +Cc: Goswin von Brederlow, Neil Brown, linux-raid
Kunal Kushwaha <kunal.kushwaha@gmail.com> writes:
> Hi Goswin,
>
> My understanding of chunk size is that, it is minimum unit of data
> that is used for read/write.
> Chunk size defines, amount of data written on one disk after breaking
> it in equal no of chunks.
> Chunk size = Stripe size / (disks in array - parity disks)
>
> so for raid 5, chunk_size = Stripe_Size / (total_disks -1)
> and for raid 6, chunk_size = Stripe_Size / total_disks -2)
>
> and parity is also written in terms of chunk size.
>
> Now considering a case where IO request came for complete 1 stripe, we
> need buffer, for 1 stripe, but since we allocated 1 page per chunk,
> instead of 64k buffer we have only 4k buffer, So how this is handled?
No. Reads and writes are generally handled at the page-size level, which
almost always means 4k. If you think about it, you can easily split a
chunk into 4k pieces and handle each one independently.
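A minimal sketch of that splitting, assuming 4k pages. It only illustrates how a byte range breaks into page-sized pieces; it is not how raid5.c is actually written, and the function name is made up.

```python
PAGE_SIZE = 4096  # assumption: 4k pages


def split_into_pages(offset, length, page_size=PAGE_SIZE):
    """Yield (page_index, offset_in_page, nbytes) pieces covering a byte range.

    Each piece stays inside a single page, so it can be handled on its own.
    """
    end = offset + length
    while offset < end:
        page_index, in_page = divmod(offset, page_size)
        nbytes = min(page_size - in_page, end - offset)
        yield page_index, in_page, nbytes
        offset += nbytes


if __name__ == "__main__":
    pieces = list(split_into_pages(0, 64 * 1024))
    print(len(pieces))  # 16 page-sized pieces for one 64k chunk
```

So a full 64k chunk becomes 16 independent 4k pieces, each of which fits in one page of a stripe cache entry.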
MfG
Goswin
Thread overview: 6+ messages
2009-04-28 14:02 Query regarding stripe management in linux md-raid Kunal Kushwaha
2009-04-28 23:15 ` Neil Brown
2009-04-29 7:27 ` Kunal Kushwaha
2009-05-01 21:41 ` Goswin von Brederlow
2009-05-02 7:18 ` Kunal Kushwaha
2009-05-04 17:04 ` Goswin von Brederlow