linux-raid.vger.kernel.org archive mirror
* stripe_cache_size, some info
@ 2017-03-19 18:35 Gandalf Corvotempesta
  2017-03-20 15:59 ` Wols Lists
  2017-03-24  5:47 ` NeilBrown
  0 siblings, 2 replies; 7+ messages in thread
From: Gandalf Corvotempesta @ 2017-03-19 18:35 UTC (permalink / raw)
  To: linux-raid

Hi to all,
I need some info about stripe_cache_size

Is that a sort of "writeback" cache? The higher the number, the higher
the amount of data cached in RAM before writing to disks, right?

Some questions:

1) Is there any upper limit for that value? Can I set it near 1GB like
most hardware controllers?
2) Why don't I have /sys/block/md0/md/stripe_cache_size on my RAID-6?

As I would like to replace most of our HW RAID controllers with mdadm,
any suggestions on how to improve RAID-6 speed?

Modern CPUs aren't an issue; I don't think double-parity
calculation could create any bottleneck on a modern CPU.
The real advantages of a RAID controller are mainly two:

1) the writeback cache (1GB or 2GB)
2) the ability to automatically replace a disk by hotswapping it.

Any solution to this? For "2", I've tried configuring the
POLICY in mdadm.conf, but the new disk is never recognized and I always
have to add it to the array manually.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: stripe_cache_size, some info
  2017-03-19 18:35 stripe_cache_size, some info Gandalf Corvotempesta
@ 2017-03-20 15:59 ` Wols Lists
  2017-03-20 16:13   ` Gandalf Corvotempesta
  2017-03-24  5:47 ` NeilBrown
  1 sibling, 1 reply; 7+ messages in thread
From: Wols Lists @ 2017-03-20 15:59 UTC (permalink / raw)
  To: Gandalf Corvotempesta, linux-raid

On 19/03/17 18:35, Gandalf Corvotempesta wrote:
> As I would like to replace most of our HW RAID controllers with mdadm,
> any suggestions on how to improve RAID-6 speed?

Burst speed, or sustained speed? Big difference ...
> 
> Modern CPUs aren't an issue; I don't think double-parity
> calculation could create any bottleneck on a modern CPU.

Using a journal on an SSD will offload writes and give you a decent burst
speed, I suspect. You'll need to run benchmarks, but that should mean
you don't notice a slow background write speed.
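As a rough illustration of the journal idea, the sketch below only builds (and does not run) an mdadm command line that creates a RAID-6 array with its write journal on a separate, faster device. It assumes an mdadm and kernel recent enough to support `--write-journal`; the device names are hypothetical placeholders, so check mdadm(8) on your system before using anything like it.

```python
import shlex


def raid6_with_journal_cmd(md_dev, members, journal_dev):
    """Build (but do not execute) an mdadm --create argv for a RAID-6
    array whose write journal lives on a separate device (e.g. an SSD)."""
    if len(members) < 4:
        raise ValueError("RAID-6 needs at least 4 member devices")
    return [
        "mdadm", "--create", md_dev,
        "--level=6",
        f"--raid-devices={len(members)}",
        *members,
        # Journal device absorbs writes before they reach the members.
        "--write-journal", journal_dev,
    ]


# Hypothetical device names: substitute your own.
cmd = raid6_with_journal_cmd("/dev/md0", [f"/dev/sd{c}" for c in "bcdef"],
                             "/dev/nvme0n1p1")
print(shlex.join(cmd))
```

Building the argv as a list (rather than a shell string) keeps device names from being mangled by shell quoting if you later pass it to `subprocess.run`.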

> The real advantages of a RAID controller are mainly two:
> 
> 1) the writeback cache (1GB or 2GB)
> 2) the ability to automatically replace a disk by hotswapping it.
> 
> Any solution to this? For "2", I've tried configuring the
> POLICY in mdadm.conf, but the new disk is never recognized and I always
> have to add it to the array manually.

I think you have to manually add the disk to the array (group) as a
spare first.

And I would avoid that entirely if I can - put the new disk in, do a
--replace, and then remove the old one. Doing a hotswap like that will
increase the stress on the array, and increased stress means another
disk is more likely to fail.
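The add-then-replace sequence described above can be sketched as a list of mdadm invocations. This is a minimal sketch that only builds the argv lists; the device names are hypothetical, and the final `--remove` should only be run once recovery has finished (check /proc/mdstat), at which point the replaced disk is marked faulty.

```python
def replace_disk_cmds(md_dev, old_dev, new_dev):
    """Return, in order, the mdadm invocations for swapping old_dev out of
    md_dev without dropping below full redundancy. Nothing is executed."""
    return [
        # 1. Add the new disk; on a healthy array it becomes a spare.
        ["mdadm", md_dev, "--add", new_dev],
        # 2. Rebuild onto the spare while old_dev stays active in the array.
        ["mdadm", md_dev, "--replace", old_dev, "--with", new_dev],
        # 3. After recovery completes, old_dev is faulty and can be removed.
        ["mdadm", md_dev, "--remove", old_dev],
    ]


# Hypothetical devices: /dev/sdc is the ageing disk, /dev/sdg the new one.
for argv in replace_disk_cmds("/dev/md0", "/dev/sdc", "/dev/sdg"):
    print(" ".join(argv))
```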

Cheers,
Wol


* Re: stripe_cache_size, some info
  2017-03-20 15:59 ` Wols Lists
@ 2017-03-20 16:13   ` Gandalf Corvotempesta
  2017-03-20 16:22     ` Wols Lists
  0 siblings, 1 reply; 7+ messages in thread
From: Gandalf Corvotempesta @ 2017-03-20 16:13 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

2017-03-20 16:59 GMT+01:00 Wols Lists <antlists@youngman.org.uk>:
> Burst speed, or sustained speed? Big difference ...

Both :)

> And I would avoid that entirely if I can - put the new disk in, do a
> --replace, and then remove the old one. Doing a hotswap like that will
> increase the stress on the array, and increased stress means another
> disk is more likely to fail.

On newer servers, I tend to avoid using all slots because of this.
With at least one slot available, cool things can be done, like
replacing disks without compromising redundancy and so on.

But on older servers, I don't have enough slots available and the only
way to replace a disk is.... directly replace a disk :)


* Re: stripe_cache_size, some info
  2017-03-20 16:13   ` Gandalf Corvotempesta
@ 2017-03-20 16:22     ` Wols Lists
  2017-03-20 16:24       ` Gandalf Corvotempesta
  0 siblings, 1 reply; 7+ messages in thread
From: Wols Lists @ 2017-03-20 16:22 UTC (permalink / raw)
  To: Gandalf Corvotempesta; +Cc: linux-raid

On 20/03/17 16:13, Gandalf Corvotempesta wrote:
> 2017-03-20 16:59 GMT+01:00 Wols Lists <antlists@youngman.org.uk>:
>> Burst speed, or sustained speed? Big difference ...
> 
> Both :)
> 
>> And I would avoid that entirely if I can - put the new disk in, do a
>> --replace, and then remove the old one. Doing a hotswap like that will
>> increase the stress on the array, and increased stress means another
>> disk is more likely to fail.
> 
> On newer servers, I tend to avoid using all slots because of this.
> With at least one slot available, cool things can be done, like
> replacing disks without compromising redundancy and so on.
> 
> But on older servers, I don't have enough slots available and the only
> way to replace a disk is.... directly replace a disk :)
> 
Get a cheap PCI(e) SATA card! You should be able to get something for
around GBP20, and if it's only temporary who cares if drives and cables
are left all over the place so long as the data is safe while you're
updating. :-)

Cheers,
Wol


* Re: stripe_cache_size, some info
  2017-03-20 16:22     ` Wols Lists
@ 2017-03-20 16:24       ` Gandalf Corvotempesta
  0 siblings, 0 replies; 7+ messages in thread
From: Gandalf Corvotempesta @ 2017-03-20 16:24 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

2017-03-20 17:22 GMT+01:00 Wols Lists <antlists@youngman.org.uk>:
> Get a cheap PCI(e) SATA card! You should be able to get something for
> around GBP20, and if it's only temporary who cares if drives and cables
> are left all over the place so long as the data is safe while you're
> updating. :-)

This assumes:

1) that I'm using SATA on my servers
2) that I can power down to add a new card
3) that I have enough HDD slots available

but

1) I usually use SAS
2) I can't power down the server when replacing a disk
3) our older servers have all slots full.


* Re: stripe_cache_size, some info
  2017-03-19 18:35 stripe_cache_size, some info Gandalf Corvotempesta
  2017-03-20 15:59 ` Wols Lists
@ 2017-03-24  5:47 ` NeilBrown
  2017-03-24  6:00   ` Roman Mamedov
  1 sibling, 1 reply; 7+ messages in thread
From: NeilBrown @ 2017-03-24  5:47 UTC (permalink / raw)
  To: Gandalf Corvotempesta, linux-raid


On Sun, Mar 19 2017, Gandalf Corvotempesta wrote:

> Hi to all,
> I need some info about stripe_cache_size
>
> Is that a sort of "writeback" cache? The higher the number, the higher
> the amount of data cached in RAM before writing to disks, right?

No.
All of memory is potentially a write-back cache for all filesystems.
The stripe_cache is a write-through cache which holds strips (one page
per device) while reading missing blocks and computing parity.

When the array is degraded, it is also used to hold the blocks in a
strip while calculating the missing data.
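To see how large the stripe cache is and how much of it is in use at a given moment, one can read the md sysfs attributes `stripe_cache_size` and `stripe_cache_active`. The sketch below is a minimal reader under that assumption; the `sysfs_root` parameter exists only so it can be exercised against a fake directory, and it returns `None` when the attribute is absent (the situation asked about in question 2).

```python
from pathlib import Path


def read_stripe_cache(md_name, sysfs_root="/sys/block"):
    """Return (stripe_cache_size, stripe_cache_active) for an md array,
    or None if stripe_cache_size is not exposed for it."""
    md_dir = Path(sysfs_root) / md_name / "md"
    size_file = md_dir / "stripe_cache_size"
    if not size_file.exists():
        return None
    size = int(size_file.read_text().strip())
    active_file = md_dir / "stripe_cache_active"
    # stripe_cache_active counts entries currently in use, if present.
    active = int(active_file.read_text().strip()) if active_file.exists() else None
    return size, active


if __name__ == "__main__":
    print(read_stripe_cache("md0"))
```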

>
> Some questions:
>
> 1) Is there any upper limit for that value? Can I set it near 1GB like
> most hardware controllers?

It is currently limited to 32768, for no particularly good reason.
Several stripes (so several times 64 with the default chunk size) is
good.  Many stripes might help very random workloads.
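A minimal sketch of changing the value with a range check follows. The upper bound of 32768 is the one quoted above; the lower bound of 17 is an assumption on my part, so treat it as illustrative. As before, `sysfs_root` is only there to allow testing against a fake directory; on a real system the write needs root.

```python
from pathlib import Path

STRIPE_CACHE_MAX = 32768  # upper limit quoted in this thread
STRIPE_CACHE_MIN = 17     # assumed lower limit; verify on your kernel


def set_stripe_cache_size(md_name, entries, sysfs_root="/sys/block"):
    """Write a new stripe_cache_size (in entries, not bytes) for md_name."""
    if not STRIPE_CACHE_MIN <= entries <= STRIPE_CACHE_MAX:
        raise ValueError(
            f"stripe_cache_size must be in "
            f"[{STRIPE_CACHE_MIN}, {STRIPE_CACHE_MAX}], got {entries}")
    path = Path(sysfs_root) / md_name / "md" / "stripe_cache_size"
    if not path.exists():
        raise FileNotFoundError(path)
    path.write_text(f"{entries}\n")


if __name__ == "__main__":
    set_stripe_cache_size("md0", 4096)
```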


> 2) Why don't I have /sys/block/md0/md/stripe_cache_size on my RAID-6?

No idea.  It certainly should be there.


>
> As I would like to replace most of our HW RAID controllers with mdadm,
> any suggestions on how to improve RAID-6 speed?
>
> Modern CPUs aren't an issue; I don't think double-parity
> calculation could create any bottleneck on a modern CPU.
> The real advantages of a RAID controller are mainly two:
>
> 1) the writeback cache (1GB or 2GB)
> 2) the ability to automatically replace a disk by hotswapping it.
>
> Any solution to this? For "2", I've tried configuring the
> POLICY in mdadm.conf, but the new disk is never recognized and I always
> have to add it to the array manually.

What, precisely, have you tried?  Please provide the exact contents of
config files (i.e. mdadm.conf), the exact steps you took, and what you
expected to happen.

NeilBrown





* Re: stripe_cache_size, some info
  2017-03-24  5:47 ` NeilBrown
@ 2017-03-24  6:00   ` Roman Mamedov
  0 siblings, 0 replies; 7+ messages in thread
From: Roman Mamedov @ 2017-03-24  6:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Gandalf Corvotempesta, linux-raid

On Fri, 24 Mar 2017 16:47:47 +1100
NeilBrown <neilb@suse.com> wrote:

> > 1) Is there any upper limit for that value? Can I set it near 1GB like
> > most hardware controllers?
> 
> It is currently limited to 32768, for no particularly good reason.

I feel it should be clarified that's in *PAGES* (4K on x86) per *ARRAY MEMBER*.

So the calculation can get a little bit complex, it's not like you're setting
it to 32 MB with that number.

For an 8-drive array that would be 32768 * 4096 * 8 = 1073741824 bytes (1GiB).
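The arithmetic above can be wrapped in a small helper, plus its inverse for picking a value that fits a RAM budget. This is a sketch assuming 4 KiB pages (as on x86) and ignoring per-stripe bookkeeping overhead, so real usage will be somewhat higher; the function names are illustrative.

```python
PAGE_SIZE = 4096          # bytes per page on x86 (assumed default)
STRIPE_CACHE_MAX = 32768  # upper limit quoted in this thread


def stripe_cache_bytes(entries, members, page_size=PAGE_SIZE):
    """Approximate RAM used: one page per member device per cache entry."""
    return entries * page_size * members


def entries_for_budget(budget_bytes, members, page_size=PAGE_SIZE):
    """Largest stripe_cache_size that fits in budget_bytes, capped at the limit."""
    return min(budget_bytes // (page_size * members), STRIPE_CACHE_MAX)


print(stripe_cache_bytes(32768, 8))          # 1073741824
print(stripe_cache_bytes(32768, 8) / 2**30)  # 1.0
```

For example, a 256 MiB budget on a 6-drive array gives `entries_for_budget(256 * 2**20, 6)`, i.e. 10922 entries.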

-- 
With respect,
Roman


end of thread, other threads:[~2017-03-24  6:00 UTC | newest]

Thread overview: 7+ messages
2017-03-19 18:35 stripe_cache_size, some info Gandalf Corvotempesta
2017-03-20 15:59 ` Wols Lists
2017-03-20 16:13   ` Gandalf Corvotempesta
2017-03-20 16:22     ` Wols Lists
2017-03-20 16:24       ` Gandalf Corvotempesta
2017-03-24  5:47 ` NeilBrown
2017-03-24  6:00   ` Roman Mamedov
