* stripe cache question
From: Piergiorgio Sartor
To: linux-raid
Date: 2011-02-24 21:06 UTC

Hi all,

a few posts ago it was mentioned that the unit of the stripe
cache is "pages per device", usually 4K pages.

Questions:

1) Does "device" mean the raid (md) device or a component
device (HDD)?

2) The maximum possible value seems to be 32768, which
means, in the case of one 4K page per md device, a maximum
of 128MiB of RAM.
Is this by design? Would it be possible to increase it up
to whatever is available?

Thanks,

bye,

--
piergiorgio
* Re: stripe cache question
From: NeilBrown
To: Piergiorgio Sartor
Cc: linux-raid
Date: 2011-02-25 3:51 UTC

On Thu, 24 Feb 2011 22:06:43 +0100 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi all,
>
> a few posts ago it was mentioned that the unit of the stripe
> cache is "pages per device", usually 4K pages.
>
> Questions:
>
> 1) Does "device" mean the raid (md) device or a component
> device (HDD)?

Component device.
In drivers/md/raid5.[ch] there is a 'struct stripe_head'.
It holds one page per component device (ignoring spares).
Several of these comprise the 'cache'.  The 'size' of the cache is the
number of 'struct stripe_head' and associated pages that are allocated.

> 2) The maximum possible value seems to be 32768, which
> means, in the case of one 4K page per md device, a maximum
> of 128MiB of RAM.
> Is this by design? Would it be possible to increase it up
> to whatever is available?

32768 is just an arbitrary number.  It is there in raid5.c and is easy
to change (for people comfortable with recompiling their kernels).

I wanted an upper limit because setting it too high could easily cause
your machine to run out of memory and become very sluggish - or worse.

Ideally the cache should be automatically sized based on demand and
memory size - with maybe just a tunable to select between "use as much
memory as you need - within reason" versus "use as little memory as
you can manage with".

But that requires thought and design and code and .... it just never
seemed like a priority.

NeilBrown
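To make the sizing concrete: the cache footprint is stripe_cache_size
* page size * number of component devices.  A minimal sketch of the
arithmetic follows; the helper name and the 10-disk example array are
assumptions for illustration, not anything from the kernel:

#include <stdio.h>

/* Stripe cache footprint: stripe_cache_size counts 'struct
 * stripe_head' entries, each holding one (usually 4 KiB) page per
 * component device.  Purely illustrative, not a kernel interface. */
static long long stripe_cache_bytes(long cache_size, int nr_disks,
				    long page_size)
{
	return (long long)cache_size * nr_disks * page_size;
}

int main(void)
{
	/* Assumed example: a 10-disk array with 4 KiB pages. */
	printf("default (256):   %lld MiB\n",
	       stripe_cache_bytes(256, 10, 4096) >> 20);
	printf("maximum (32768): %lld MiB\n",
	       stripe_cache_bytes(32768, 10, 4096) >> 20);
	return 0;
}

So the 128MiB figure from the original question is per component
device (32768 * 4KiB); the whole-array footprint scales with the
number of disks - 1280MiB on the assumed 10-disk array.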
* Re: stripe cache question
From: Piergiorgio Sartor
To: NeilBrown
Cc: linux-raid
Date: 2011-02-26 10:21 UTC

On Fri, Feb 25, 2011 at 02:51:25PM +1100, NeilBrown wrote:
> > 1) Does "device" mean the raid (md) device or a component
> > device (HDD)?
>
> Component device.
> In drivers/md/raid5.[ch] there is a 'struct stripe_head'.
> It holds one page per component device (ignoring spares).
> Several of these comprise the 'cache'.  The 'size' of the cache is
> the number of 'struct stripe_head' and associated pages that are
> allocated.
>
> > 2) The maximum possible value seems to be 32768, which
> > means, in the case of one 4K page per md device, a maximum
> > of 128MiB of RAM.
> > Is this by design? Would it be possible to increase it up
> > to whatever is available?
>
> 32768 is just an arbitrary number.  It is there in raid5.c and is
> easy to change (for people comfortable with recompiling their
> kernels).

Ah! I found it. Maybe, considering currently
available memory, you should think about increasing
it to, for example, 128K or 512K.

> I wanted an upper limit because setting it too high could easily
> cause your machine to run out of memory and become very sluggish -
> or worse.
>
> Ideally the cache should be automatically sized based on demand and
> memory size - with maybe just a tunable to select between "use as
> much memory as you need - within reason" versus "use as little
> memory as you can manage with".
>
> But that requires thought and design and code and .... it just never
> seemed like a priority.

You're a bit contradicting your philosophy of
"let's do the smart things in user space"... :-)

IMHO, if really necessary, it could be enough to
have this "upper limit" available in sysfs.

Then user space can decide what to do with it.

For example, at boot the amount of memory is checked
and the upper limit set.
I see a duplication here, maybe better just remove
the upper limit and let user space deal with that.

bye,

--
piergiorgio
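A sketch of the boot-time policy suggested in this message, assuming
user space were allowed to set such a cap: read MemTotal from
/proc/meminfo and derive a limit.  The 1/8-of-RAM budget and the
10-disk, 4 KiB-page array are made-up parameters, and the result
would go into the proposed (hypothetical) limit entry:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[128];
	long mem_kib = 0;

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "MemTotal: %ld kB", &mem_kib) == 1)
			break;
	fclose(f);

	/* stripes = (RAM/8 in bytes) / (disks * page_size);
	 * the 1/8 budget and 10-disk array are assumptions. */
	long long limit = (mem_kib * 1024LL / 8) / (10 * 4096);
	printf("suggested stripe_cache_size cap: %lld\n", limit);
	return 0;
}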
* Re: stripe cache question
From: NeilBrown
To: Piergiorgio Sartor
Cc: linux-raid
Date: 2011-02-27 4:43 UTC

On Sat, 26 Feb 2011 11:21:28 +0100 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> > 32768 is just an arbitrary number.  It is there in raid5.c and is
> > easy to change (for people comfortable with recompiling their
> > kernels).
>
> Ah! I found it. Maybe, considering currently
> available memory, you should think about increasing
> it to, for example, 128K or 512K.
>
> > I wanted an upper limit because setting it too high could easily
> > cause your machine to run out of memory and become very sluggish -
> > or worse.
> >
> > Ideally the cache should be automatically sized based on demand
> > and memory size - with maybe just a tunable to select between "use
> > as much memory as you need - within reason" versus "use as little
> > memory as you can manage with".
> >
> > But that requires thought and design and code and .... it just
> > never seemed like a priority.
>
> You're a bit contradicting your philosophy of
> "let's do the smart things in user space"... :-)
>
> IMHO, if really necessary, it could be enough to
> have this "upper limit" available in sysfs.
>
> Then user space can decide what to do with it.
>
> For example, at boot the amount of memory is checked
> and the upper limit set.
> I see a duplication here, maybe better just remove
> the upper limit and let user space deal with that.

Maybe....  I still feel I want some sort of built-in protection...

Maybe if I did all the allocations with "__GFP_WAIT" clear so that it
would only allocate memory that is easily available.  It wouldn't be a
hard guarantee against running out, but it might help...

Maybe you could try removing the limit and see what actually happens
when you set a ridiculously large size?

NeilBrown
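A minimal kernel-style sketch of that idea, against the gfp flags of
the era: clearing __GFP_WAIT makes the allocator fail fast instead of
blocking in reclaim.  The function name is hypothetical and this is
not the actual raid5.c code; 'struct stripe_head' would come from
drivers/md/raid5.h:

#include <linux/slab.h>

/* Hypothetical sketch: grow the stripe cache only from memory that
 * is easily available.  With __GFP_WAIT cleared, a failed allocation
 * simply means "stop growing the cache now". */
static struct stripe_head *grow_one_stripe_nowait(struct kmem_cache *slab)
{
	struct stripe_head *sh;

	sh = kmem_cache_zalloc(slab, GFP_KERNEL & ~__GFP_WAIT);
	if (!sh)
		return NULL;	/* memory is tight: do not grow further */
	/* ... the per-device pages would be allocated with the same
	 * flags, releasing sh on failure ... */
	return sh;
}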
* Re: stripe cache question
From: Piergiorgio Sartor
To: NeilBrown
Cc: linux-raid
Date: 2011-02-27 11:37 UTC

> > > Ideally the cache should be automatically sized based on demand
> > > and memory size - with maybe just a tunable to select between
> > > "use as much memory as you need - within reason" versus "use as
> > > little memory as you can manage with".
> > >
> > > But that requires thought and design and code and .... it just
> > > never seemed like a priority.
> >
> > You're a bit contradicting your philosophy of
> > "let's do the smart things in user space"... :-)
> >
> > IMHO, if really necessary, it could be enough to
> > have this "upper limit" available in sysfs.
> >
> > Then user space can decide what to do with it.
> >
> > For example, at boot the amount of memory is checked
> > and the upper limit set.
> > I see a duplication here, maybe better just remove
> > the upper limit and let user space deal with that.
>
> Maybe....  I still feel I want some sort of built-in protection...

As I wrote, I think a second sysfs entry, holding the upper limit,
could be enough.

It allows flexibility and some protection: it would require two
_coordinated_ accesses to sysfs in order to break the limit, which is
unlikely to happen by accident.

That is, at boot /sys/block/mdX/md/stripe_cache_limit would be 32768
and the "cache_size" would be 256.
Anyone wanting to play with the cache size would be able to raise it
up to 32768.
To go beyond that, the first entry has to be changed to a higher value
first (its minimum value should be the current cache_size).

This is, of course, a duplication, but it enforces a certain process
(two accesses), thus giving some degree of protection.

I guess, but you're the expert, this should be easier than other
solutions.

> Maybe if I did all the allocations with "__GFP_WAIT" clear so that
> it would only allocate memory that is easily available.  It wouldn't
> be a hard guarantee against running out, but it might help...

Again, I think you're over-designing it.

BTW, I hope that is unswappable memory, right?

> Maybe you could try removing the limit and see what actually happens
> when you set a ridiculously large size?

Yes and no.
The home PC has a RAID-10f2, the work PC has a RAID-5, but I do not
want to play with the kernel on the latter.
I guess using loop devices would not be meaningful.
As soon as I manage to build the RAID-6 NAS I could give it a try, but
this has no "schedule" right now.

bye,

--
piergiorgio
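Sketched as a sysfs store handler, the proposed two-entry scheme might
look like this.  Everything here is hypothetical: stripe_cache_limit
does not exist in the kernel, and none of the names is guaranteed to
match the actual raid5.c code (which in this era used mddev_t,
raid5_conf_t and strict_strtoul):

/* Hypothetical sketch of the two-entry scheme: raising
 * stripe_cache_size beyond stripe_cache_limit fails, so breaking the
 * cap takes two coordinated sysfs writes. */
static ssize_t
store_stripe_cache_size(mddev_t *mddev, const char *page, size_t len)
{
	raid5_conf_t *conf = mddev->private;
	unsigned long new;

	if (strict_strtoul(page, 10, &new) < 0)
		return -EINVAL;
	if (new > conf->stripe_cache_limit)	/* hypothetical field */
		return -EINVAL;			/* raise the limit first */
	/* ... shrink or grow the cache to 'new' as is done today ... */
	return len;
}

User space then carries the policy: a boot script would write a
RAM-derived value into the hypothetical stripe_cache_limit once, and
any later tuning of stripe_cache_size stays bounded by it.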
* Re: stripe cache question
From: Piergiorgio Sartor
To: NeilBrown
Cc: linux-raid
Date: 2011-03-06 20:08 UTC

> Maybe you could try removing the limit and see what actually happens
> when you set a ridiculously large size?

Actually, I tried (unintentionally) something similar.

The storage I have consists of 7 RAID-6 arrays, so I tried to increase
the stripe_cache_size to 32768 on each array.
Of course, while writing to the arrays...

The first 3 have 10, 10 and 9 HDDs, so by the third array about 3.6GiB
had been allocated out of the 4GiB the PC has.

At this point the PC was completely unresponsive, but still working,
i.e. it was still writing to the array.
Even ssh did not answer in time.
Nevertheless, it was not dead or locked, just extremely slow; in fact,
once the writing finished, the PC was again working as before.
I guess there was a lot of swapping going on...

In any case, an upper limit seems to be necessary, but it should be
consistent with the total available RAM.
It does not help, it seems, to limit arrays independently: there
should be a _global_ limit, so that the _sum_ of the caches does not
exceed it.

Hope this helps,

bye,

--
piergiorgio
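The reported figure is consistent with one page per component device;
a quick check of the arithmetic (plain user-space C, assuming 4 KiB
pages):

#include <stdio.h>

/* Re-checking the report above: three arrays of 10, 10 and 9
 * component devices, each with stripe_cache_size = 32768 and
 * 4 KiB pages. */
int main(void)
{
	int disks[] = { 10, 10, 9 };
	long long total = 0;
	int i;

	for (i = 0; i < 3; i++)
		total += 32768LL * disks[i] * 4096;

	/* Prints ~3.62 GiB, matching the "about 3.6GiB" observed
	 * on the 4 GiB machine. */
	printf("total stripe cache: %.2f GiB\n",
	       total / (double)(1LL << 30));
	return 0;
}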