* Query regarding stripe management in linux md-raid

From: Kunal Kushwaha @ 2009-04-28 14:02 UTC
To: linux-raid

Hi,

I am trying to run Linux RAID on a box with 256 MB of RAM. The kernel
is compiled with the non-swappable memory management option, and the
kernel itself fits within 30 MB. Looking into raid5.c, I found that it
allocates one page for each chunk. I am using 5 disks with a 64k chunk
size.

1. If one page is allocated per chunk instead of a buffer of the actual
   chunk size, isn't a lot of memory wasted?
2. Won't this restrict the number of stripes in memory (with
   non-swappable memory, the number will be very small) and badly
   affect I/O performance?

Please correct me if I am wrong. Any suggestions to overcome this
problem are most welcome.

--
Thanks in Advance,
Kunal Kushwaha
* Re: Query regarding stripe management in linux md-raid

From: Neil Brown @ 2009-04-28 23:15 UTC
To: Kunal Kushwaha; +Cc: linux-raid

On Tuesday April 28, kunal.kushwaha@gmail.com wrote:
> Hi,
>
> I am trying to run Linux RAID on a box with 256 MB of RAM. The kernel
> is compiled with the non-swappable memory management option, and the
> kernel itself fits within 30 MB. Looking into raid5.c, I found that it
> allocates one page for each chunk. I am using 5 disks with a 64k chunk
> size.

That isn't quite right. It is not 1 page per chunk.
raid5 maintains a stripe cache. Each entry in the cache has one page
per device, and there are 256 entries by default.
So for a 5-disk array, that is 5*256 == 1280 pages or 5MB (plus
overhead).

> 1. If one page is allocated per chunk instead of a buffer of the actual
>    chunk size, isn't a lot of memory wasted?

It depends on what you mean by "wasted". The memory is used to
provide adequate performance with manageable code complexity.
Possibly the same performance could be achieved using less memory, but
I suspect the code would be much more complex and so probably more
buggy. It is a trade-off.

If the 5MB eats into your 256MB too much, you can reduce it by writing
a number to /sys/block/mdX/md/stripe_cache_size.
You could probably get away with as little as '4', but I suspect that
would really kill performance. With a 64K chunk size you want at
least 16 entries, so 32 or 64 might be a suitable compromise for you.

> 2. Won't this restrict the number of stripes in memory (with
>    non-swappable memory, the number will be very small) and badly
>    affect I/O performance?

The number of (page-wide) stripes in memory is fixed by the cache
size. It defaults to 256, but you can change it. A smaller size
would be likely to negatively affect performance. A larger cache can
improve performance depending on workload.

> Please correct me if I am wrong. Any suggestions to overcome this
> problem are most welcome.

Do you have an actual observed problem, or are you just trying to
discover what problems you might eventually run into?

NeilBrown
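Neil's sizing arithmetic can be checked with a few lines of Python. This is a minimal sketch, assuming a 4 KiB page and ignoring the per-entry bookkeeping he calls "overhead", so real usage is somewhat higher. The function names are hypothetical helpers for illustration, not md code; actually changing the cache size is still done by writing to /sys/block/mdX/md/stripe_cache_size as described above.

```python
PAGE_SIZE = 4096  # bytes; assumed (the usual x86 page size)


def stripe_cache_bytes(entries: int, devices: int, page_size: int = PAGE_SIZE) -> int:
    """Page buffers only: one page per member device for every cache entry."""
    return entries * devices * page_size


def entries_per_chunk(chunk_size: int, page_size: int = PAGE_SIZE) -> int:
    """How many page-wide cache entries it takes to span one chunk."""
    return chunk_size // page_size


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Candidate stripe_cache_size values mentioned in the thread.
    for entries in (4, 16, 32, 64, 256):
        used = stripe_cache_bytes(entries, devices=5)
        print(f"stripe_cache_size={entries:3d}: {used / MiB:.3f} MiB")
    print("entries spanning a 64K chunk:", entries_per_chunk(64 * 1024))
```

For the 5-disk array, the default 256 entries give 5 × 256 = 1280 pages, or 5 MiB, while 32 or 64 entries need 0.625 MiB or 1.25 MiB. A 64K chunk spans 64K / 4K = 16 page-wide entries, which lines up with the "at least 16 entries" advice.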
* Re: Query regarding stripe management in linux md-raid

From: Kunal Kushwaha @ 2009-04-29 7:27 UTC
To: Neil Brown; +Cc: linux-raid

Hi Neil,

Thanks for your reply. I am trying to find out what problems I could
face later on.

On Wed, Apr 29, 2009 at 7:15 AM, Neil Brown <neilb@suse.de> wrote:
> That isn't quite right. It is not 1 page per chunk.
> raid5 maintains a stripe cache. Each entry in the cache has one page
> per device, and there are 256 entries by default.
> So for a 5-disk array, that is 5*256 == 1280 pages or 5MB (plus
> overhead).

Sorry, I forgot to mention the stripe cache. Yes, we will be using
only 5MB, assuming a 4k page size.

I have one more doubt regarding this: how can a 4k page store 64k of
data (my chunk size is 64k)?

--
Thanks in Advance,
Kunal Kushwaha

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Query regarding stripe management in linux md-raid

From: Goswin von Brederlow @ 2009-05-01 21:41 UTC
To: Kunal Kushwaha; +Cc: Neil Brown, linux-raid

Kunal Kushwaha <kunal.kushwaha@gmail.com> writes:

> I have one more doubt regarding this: how can a 4k page store 64k of
> data (my chunk size is 64k)?

A chunk size of 64k in no way means that all transactions are done in
64k chunks.

What effect does chunk size actually have? Isn't it just the amount of
sequential data written to one disk before moving on to the next (and,
in raid5/6, the unit in which the parity is rotated)?

MfG
Goswin
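To make the role of the chunk size concrete, here is a minimal Python sketch of how a byte offset on a RAID5 array maps to a member disk. It assumes md's left-symmetric layout (its usual RAID5 default). Other layouts rotate parity differently, and the sketch ignores any data offset or superblock on the members. `raid5_map` and `Location` are hypothetical names for illustration, not md functions.

```python
from typing import NamedTuple


class Location(NamedTuple):
    disk: int         # member device holding this byte
    disk_offset: int  # byte offset within that member's data area
    parity_disk: int  # member holding parity for the same stripe


def raid5_map(offset: int, n_disks: int, chunk_size: int) -> Location:
    """Map an array byte offset to a member device (left-symmetric RAID5)."""
    data_disks = n_disks - 1
    chunk_no = offset // chunk_size       # which data chunk of the array
    stripe = chunk_no // data_disks       # row of chunks across all members
    data_idx = chunk_no % data_disks      # position of the chunk within the row
    # Parity starts on the last disk and moves one disk left per stripe;
    # data chunks start just after the parity disk and wrap around.
    parity_disk = data_disks - stripe % n_disks
    disk = (parity_disk + 1 + data_idx) % n_disks
    # Within a chunk, consecutive bytes stay on the same disk.
    disk_offset = stripe * chunk_size + offset % chunk_size
    return Location(disk, disk_offset, parity_disk)


if __name__ == "__main__":
    K = 1024
    for off in (0, 4 * K, 64 * K, 256 * K, 320 * K):
        print(f"{off // K:4d}K ->", raid5_map(off, n_disks=5, chunk_size=64 * K))
```

With 5 disks and 64K chunks, offsets 0 and 4K both land on disk 0. 64K moves on to disk 1, and 256K starts the second stripe on disk 4 because parity has moved from disk 4 to disk 3. The chunk size only decides how much sequential data stays on one disk; it says nothing about the size of individual I/Os.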
* Re: Query regarding stripe management in linux md-raid

From: Kunal Kushwaha @ 2009-05-02 7:18 UTC
To: Goswin von Brederlow; +Cc: Neil Brown, linux-raid

Hi Goswin,

My understanding is that the chunk size is the minimum unit of data
used for reads and writes. The chunk size defines the amount of data
written to one disk after the data is broken into equal-sized chunks:

  chunk_size = stripe_size / (disks_in_array - parity_disks)

so for raid5:  chunk_size = stripe_size / (total_disks - 1)
and for raid6: chunk_size = stripe_size / (total_disks - 2)

Parity is also written in units of the chunk size.

Now consider the case where an I/O request covers one complete stripe.
We need a buffer for one stripe, but since only one page is allocated
per chunk, we have a 4k buffer instead of a 64k one. So how is this
handled?

Thanks & Regards,
Kunal

--
Regards,
Kunal Kushwaha
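The stripe-width formula above can be written out as a small Python sketch. It covers only the arithmetic. As the next reply points out, the chunk is not the minimum I/O unit. The helper names and the `PARITY_DISKS` table are hypothetical, introduced for illustration.

```python
PARITY_DISKS = {"raid5": 1, "raid6": 2}  # parity chunks per stripe


def data_disks(level: str, total_disks: int) -> int:
    """Members that hold data (not parity) in each stripe."""
    n = total_disks - PARITY_DISKS[level]
    if n < 1:  # rejects configurations with no data disk at all
        raise ValueError(f"{level} needs more than {total_disks} disks")
    return n


def full_stripe_data(level: str, total_disks: int, chunk_size: int) -> int:
    """User data in one full stripe: one chunk per data disk."""
    return chunk_size * data_disks(level, total_disks)


def chunk_size_from_stripe(level: str, total_disks: int, stripe_size: int) -> int:
    """The mail's formula: chunk_size = stripe_size / (total_disks - parity_disks)."""
    return stripe_size // data_disks(level, total_disks)


if __name__ == "__main__":
    K = 1024
    for level in ("raid5", "raid6"):
        width = full_stripe_data(level, 5, 64 * K)
        print(level, "5 disks, 64K chunks ->", width // K, "K of data per stripe")
```

With 5 disks and 64k chunks, a full stripe carries 256k of data on RAID5 and 192k on RAID6. The one or two 64k parity chunks come on top of that.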
* Re: Query regarding stripe management in linux md-raid

From: Goswin von Brederlow @ 2009-05-04 17:04 UTC
To: Kunal Kushwaha; +Cc: Goswin von Brederlow, Neil Brown, linux-raid

Kunal Kushwaha <kunal.kushwaha@gmail.com> writes:

> Hi Goswin,
>
> My understanding is that the chunk size is the minimum unit of data
> used for reads and writes. The chunk size defines the amount of data
> written to one disk after the data is broken into equal-sized chunks:
>
>   chunk_size = stripe_size / (disks_in_array - parity_disks)
>
> so for raid5:  chunk_size = stripe_size / (total_disks - 1)
> and for raid6: chunk_size = stripe_size / (total_disks - 2)
>
> Parity is also written in units of the chunk size.
>
> Now consider the case where an I/O request covers one complete stripe.
> We need a buffer for one stripe, but since only one page is allocated
> per chunk, we have a 4k buffer instead of a 64k one. So how is this
> handled?

No. Reads and writes are generally handled at page-size granularity,
which almost always means 4k. If you think about it, you can easily
split a chunk into 4k pieces and handle each one independently.

MfG
Goswin
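Goswin's answer can be illustrated with a Python sketch. It cuts a request into page-sized pieces and groups them by page-wide stripe, meaning a set of pages at the same offset on every member, which is what one stripe-cache entry holds. The sketch assumes 4 KiB pages, a chunk size that is a multiple of the page size, and md's left-symmetric RAID5 layout. The function names are hypothetical, not md's internal API.

```python
from collections import defaultdict

PAGE = 4096  # assumed page size in bytes


def split_into_pages(offset: int, length: int, page: int = PAGE):
    """Yield (start, size) pieces of a request that never cross a page boundary."""
    end = offset + length
    while offset < end:
        size = min(page - offset % page, end - offset)
        yield offset, size
        offset += size


def group_by_stripe_entry(offset, length, n_disks, chunk_size, page=PAGE):
    """Group a request's pages by page-wide stripe (left-symmetric RAID5).

    Returns {page-aligned member offset: set of data disks touched}.
    Each key stands for one page-wide stripe, i.e. one cache entry
    holding a page for every member device.
    """
    data_disks = n_disks - 1
    entries = defaultdict(set)
    for start, _size in split_into_pages(offset, length, page):
        chunk_no = start // chunk_size
        stripe = chunk_no // data_disks
        parity_disk = data_disks - stripe % n_disks
        disk = (parity_disk + 1 + chunk_no % data_disks) % n_disks
        member_offset = stripe * chunk_size + start % chunk_size
        entries[member_offset - member_offset % page].add(disk)
    return dict(entries)


if __name__ == "__main__":
    K = 1024
    # One full stripe of a 5-disk RAID5 with 64K chunks: 4 data chunks = 256K.
    full = group_by_stripe_entry(0, 256 * K, n_disks=5, chunk_size=64 * K)
    print(len(full), "page-wide stripes, data disks per stripe:",
          sorted({len(d) for d in full.values()}))
```

Under these assumptions, a 256K full-stripe write becomes 16 page-wide stripes. Each has 4 data pages plus one parity page, so no 64k buffer is ever needed. The count of 16 matches Neil's suggestion of at least 16 cache entries for a 64K chunk.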
Thread overview: 6 messages
2009-04-28 14:02 Query regarding stripe management in linux md-raid Kunal Kushwaha
2009-04-28 23:15 ` Neil Brown
2009-04-29  7:27   ` Kunal Kushwaha
2009-05-01 21:41     ` Goswin von Brederlow
2009-05-02  7:18       ` Kunal Kushwaha
2009-05-04 17:04         ` Goswin von Brederlow