Discuss: New default recovery config settings

* Discuss: New default recovery config settings
       [not found] <1939533999.9756941.1432935747903.JavaMail.zimbra@redhat.com>
@ 2015-05-29 21:47 ` Samuel Just
  2015-05-29 22:16   ` Milosz Tanski
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Samuel Just @ 2015-05-29 21:47 UTC (permalink / raw)
  To: ceph-devel,
	'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)

Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:

osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)

We'd like a bit of feedback first though.  Is anyone happy with the current configs?  Is anyone using something between these values and the current defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly.

Thoughts?
-Sam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
  2015-05-29 21:47 ` Discuss: New default recovery config settings Samuel Just
@ 2015-05-29 22:16   ` Milosz Tanski
       [not found]   ` <1394947829.9758745.1432936033017.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-31 14:29   ` Justin Erenkrantz
  2 siblings, 0 replies; 18+ messages in thread
From: Milosz Tanski @ 2015-05-29 22:16 UTC (permalink / raw)
  To: Samuel Just
  Cc: ceph-devel,
	'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)

On Fri, May 29, 2015 at 5:47 PM, Samuel Just <sjust@redhat.com> wrote:
> Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>
> osd_max_backfills to 1 (from 10)
> osd_recovery_max_active to 3 (from 15)
> osd_recovery_op_priority to 1 (from 10)
> osd_recovery_max_single_start to 1 (from 5)
>
> We'd like a bit of feedback first though.  Is anyone happy with the current configs?  Is anyone using something between these values and the current defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly.
>
> Thoughts?
> -Sam

Sam, I was thinking about this recently. We recently ended up hitting
a recovery storm and a scrub storm, both at a time of high client
activity. While lowering the defaults will make these kinds of
disruptions less likely to occur, it also makes recovery (rebalancing)
very slow.

What I would be happy to see is more of a QoS-style tunable along the
lines of network traffic shaping, where we can guarantee a minimum
amount of recovery "load" (I say it in quotes since there's more than
one resource involved) when the cluster is busy with client IO. Or,
vice versa, a minimum amount of client IO that's guaranteed. Then,
during periods of lower client activity, recovery (and other
background work) can proceed at full speed. Many workloads are
cyclical or seasonal (in the statistical sense of the term, e.g.
intra-day or inter-day seasonality).

QoS-style management should lead to a more dynamic system where we can
maximize available utilization, minimize disruptions, and not play
whack-a-mole with many conf knobs. I'm aware that this is much harder
to implement, but thankfully there's a lot of literature,
implementations, and practical experience out there to draw upon.
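
A minimal sketch of the minimum-guarantee idea, assuming a single
abstract "capacity" per scheduling tick. The function name, the 10%
reservation and the units are all hypothetical; this is not how any
Ceph release schedules work, only an illustration of the shape of
such a tunable:

```python
def allocate(capacity, client_demand, recovery_demand,
             min_recovery_fraction=0.1):
    """Split one tick of capacity between client IO and recovery.

    Recovery is guaranteed up to min_recovery_fraction of capacity
    (hypothetical knob); whatever clients do not use is handed back
    to recovery so it can run at full speed when the cluster is idle.
    """
    # Reserve the guaranteed slice, but never more than recovery wants.
    reserved = min(recovery_demand, capacity * min_recovery_fraction)
    # Clients get everything outside the reservation, up to their demand.
    client = min(client_demand, capacity - reserved)
    # Recovery takes the reservation plus any capacity clients left idle.
    recovery = min(recovery_demand, capacity - client)
    return client, recovery
```

Under heavy client load recovery is held to its reserved slice, and
when clients go quiet the same call gives recovery the idle capacity.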

- Milosz

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@adfin.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]   ` <1394947829.9758745.1432936033017.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-29 22:16     ` Josef Johansson
       [not found]       ` <CAOnYue-zF1driSi1oxGCQ8Vh1UcG=viT5P8AeHA3R1NCca1o6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-05-29 22:33     ` Somnath Roy
  2015-05-29 23:17     ` Gregory Farnum
  2 siblings, 1 reply; 18+ messages in thread
From: Josef Johansson @ 2015-05-29 22:16 UTC (permalink / raw)
  To: Samuel Just, ceph-devel,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)


Hi,

We did it the other way around instead: we define a period where the load
is lighter and turn backfill/recovery off and on around it. In that case
you want the backfill values to stay at what is default right now.
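
A minimal sketch of that time-window approach, assuming the
cluster-wide nobackfill and norecover OSD flags are used as the on/off
switch. The 22:00-06:00 quiet window and the function names are
hypothetical. The code only builds the "ceph osd set/unset" commands
and does not run them:

```python
from datetime import time

# Cluster-wide flags that pause backfill and recovery while set.
FLAGS = ("nobackfill", "norecover")


def in_window(now, start, end):
    """True if `now` falls inside [start, end), allowing midnight wrap."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def flag_commands(now, start=time(22, 0), end=time(6, 0)):
    """Return the ceph commands to issue for the given time of day.

    Inside the quiet window the flags are unset (recovery allowed);
    outside it they are set (recovery paused).
    """
    action = "unset" if in_window(now, start, end) else "set"
    return [["ceph", "osd", action, flag] for flag in FLAGS]
```

Run from cron or a timer, each command list could be passed to
subprocess.run() on a node with admin credentials.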

Also, someone said (I think it was Greg?) that if you have problems with
backfill, your cluster's backing store is not fast enough or is under too
much load. If 10 OSDs go down at the same time, you want those values to
be high to minimize the downtime.

/Josef

On Fri, 29 May 2015 at 23:47, Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Many people have reported that they need to lower the osd recovery config
> options to minimize the impact of recovery on client io.  We are talking
> about changing the defaults as follows:
>
> osd_max_backfills to 1 (from 10)
> osd_recovery_max_active to 3 (from 15)
> osd_recovery_op_priority to 1 (from 10)
> osd_recovery_max_single_start to 1 (from 5)
>
> We'd like a bit of feedback first though.  Is anyone happy with the
> current configs?  Is anyone using something between these values and the
> current defaults?  What kind of workload?  I'd guess that lowering
> osd_max_backfills to 1 is probably a good idea, but I wonder whether
> lowering osd_recovery_max_active and osd_recovery_max_single_start will
> cause small objects to recover unacceptably slowly.
>
> Thoughts?
> -Sam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]   ` <1394947829.9758745.1432936033017.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-29 22:16     ` Josef Johansson
@ 2015-05-29 22:33     ` Somnath Roy
  2015-05-29 23:17     ` Gregory Farnum
  2 siblings, 0 replies; 18+ messages in thread
From: Somnath Roy @ 2015-05-29 22:33 UTC (permalink / raw)
  To: Samuel Just, ceph-devel,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)

Sam,
We are seeing some good client IO results during recovery by using the following values:

osd recovery max active = 1
osd max backfills = 1
osd recovery threads = 1
osd recovery op priority = 1

It is all flash, though.  The recovery time after the failure of an entire node (~120 TB) or of a single drive (~8 TB) is also not too bad with the above settings.
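
For trying values like these on a running cluster before committing
them to ceph.conf, a minimal sketch that builds a
"ceph tell osd.* injectargs" command line. The helper name is
hypothetical, and whether a given option takes effect without an OSD
restart depends on the option and the release, so treat this as an
assumption to verify:

```python
def injectargs_command(settings, target="osd.*"):
    """Build an argv list for `ceph tell <target> injectargs '<flags>'`.

    `settings` maps option names, written with spaces or underscores
    as in ceph.conf, to values.
    """
    flags = " ".join(
        "--{} {}".format(name.replace("_", "-").replace(" ", "-"), value)
        for name, value in settings.items()
    )
    return ["ceph", "tell", target, "injectargs", flags]


cmd = injectargs_command({
    "osd max backfills": 1,
    "osd recovery max active": 1,
    "osd recovery op priority": 1,
})
print(" ".join(cmd[:4]), repr(cmd[4]))
```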

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-devel-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:ceph-devel-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Samuel Just
Sent: Friday, May 29, 2015 2:47 PM
To: ceph-devel; 'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)
Subject: Discuss: New default recovery config settings

Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:

osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)

We'd like a bit of feedback first though.  Is anyone happy with the current configs?  Is anyone using something between these values and the current defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly.

Thoughts?
-Sam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]       ` <CAOnYue-zF1driSi1oxGCQ8Vh1UcG=viT5P8AeHA3R1NCca1o6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-05-29 22:56         ` Stillwell, Bryan
  0 siblings, 0 replies; 18+ messages in thread
From: Stillwell, Bryan @ 2015-05-29 22:56 UTC (permalink / raw)
  To: Josef Johansson, Samuel Just, ceph-devel,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)

I like the idea of turning the defaults down.  During the ceph operators session at the OpenStack conference last week Warren described the behavior pretty accurately as "Ceph basically DOSes itself unless you reduce those settings."  Maybe this is more of a problem when the clusters are small?

Another idea would be a better way to push recovery traffic to an even lower priority level, perhaps by setting its ionice class to 'Idle' under the CFQ scheduler?

Bryan

From: Josef Johansson <josef86-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Friday, May 29, 2015 at 4:16 PM
To: Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, ceph-devel <ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, ceph-users <ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org>
Subject: Re: [ceph-users] Discuss: New default recovery config settings

Hi,

We did it the other way around instead: we define a period where the load is lighter and turn backfill/recovery off and on around it. In that case you want the backfill values to stay at what is default right now.

Also, someone said (I think it was Greg?) that if you have problems with backfill, your cluster's backing store is not fast enough or is under too much load.
If 10 OSDs go down at the same time, you want those values to be high to minimize the downtime.

/Josef

On Fri, 29 May 2015 at 23:47, Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:

osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)

We'd like a bit of feedback first though.  Is anyone happy with the current configs?  Is anyone using something between these values and the current defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly.

Thoughts?
-Sam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]   ` <1394947829.9758745.1432936033017.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2015-05-29 22:16     ` Josef Johansson
  2015-05-29 22:33     ` Somnath Roy
@ 2015-05-29 23:17     ` Gregory Farnum
       [not found]       ` <CAC6JEv8LyM1SRsOszDCj6tLxhZ=hvrEkinDARW8BvyQHDCz+LQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2015-06-02  1:39       ` Paul Von-Stamwitz
  2 siblings, 2 replies; 18+ messages in thread
From: Gregory Farnum @ 2015-05-29 23:17 UTC (permalink / raw)
  To: Samuel Just
  Cc: ceph-devel,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)

On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>
> osd_max_backfills to 1 (from 10)
> osd_recovery_max_active to 3 (from 15)
> osd_recovery_op_priority to 1 (from 10)
> osd_recovery_max_single_start to 1 (from 5)

I'm under the (possibly erroneous) impression that reducing the number
of max backfills doesn't actually reduce recovery speed much (but will
reduce memory use), but that dropping the op priority can. I'd rather
we make users manually adjust values which can have a material impact
on their data safety, even if most of them choose to do so.

After all, even under our worst behavior we're still doing a lot
better than a resilvering RAID array. ;)
-Greg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
  2015-05-29 21:47 ` Discuss: New default recovery config settings Samuel Just
  2015-05-29 22:16   ` Milosz Tanski
       [not found]   ` <1394947829.9758745.1432936033017.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-31 14:29   ` Justin Erenkrantz
  2 siblings, 0 replies; 18+ messages in thread
From: Justin Erenkrantz @ 2015-05-31 14:29 UTC (permalink / raw)
  To: Samuel Just
  Cc: ceph-devel,
	'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)

On Fri, May 29, 2015 at 5:47 PM, Samuel Just <sjust@redhat.com> wrote:
> Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>
> osd_max_backfills to 1 (from 10)
> osd_recovery_max_active to 3 (from 15)
> osd_recovery_op_priority to 1 (from 10)
> osd_recovery_max_single_start to 1 (from 5)
>
> We'd like a bit of feedback first though.  Is anyone happy with the current configs?  Is anyone using something between these values and the current defaults?  What kind of workload?  I'd guess that lowering osd_max_backfills to 1 is probably a good idea, but I wonder whether lowering osd_recovery_max_active and osd_recovery_max_single_start will cause small objects to recover unacceptably slowly.
>
> Thoughts?

When we enable explicit rebalancing, we set the parameters as such to
reduce the impact on client I/O:

osd_recovery_max_active = 1
osd_max_backfills = 1
osd_op_threads = 10
osd_recovery_op_priority = 1
osd_mon_report_interval_min = 30

See https://github.com/bloomberg/chef-bcpc/blob/master/cookbooks/bcpc/templates/default/ceph.conf.erb

Cheers.  -- justin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]       ` <CAC6JEv8LyM1SRsOszDCj6tLxhZ=hvrEkinDARW8BvyQHDCz+LQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-01  7:43         ` Jan Schermer
       [not found]           ` <9555849C-7C6E-485E-B60C-BB4996F96E32-SB6/BxVxTjHtwjQa/ONI9g@public.gmane.org>
  2015-06-01  8:57           ` [ceph-users] " huang jun
  0 siblings, 2 replies; 18+ messages in thread
From: Jan Schermer @ 2015-06-01  7:43 UTC (permalink / raw)
  To: Gregory Farnum
  Cc: ceph-devel, ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org

We had to disable deep scrub or the cluster would be unusable - we need to turn it back on sooner or later, though.
With minimal scrubbing and recovery settings, everything is mostly good. Turned out many issues we had were due to too few PGs - once we increased them from 4K to 16K everything sped up nicely (because the chunks are smaller), but during heavy activity we are still getting some “slow IOs”.
I believe there is an ionice knob in newer versions (we still run Dumpling), and that should do the trick no matter how much additional “load” is put on the OSDs.
Everybody’s bottleneck will be different - we run all flash so disk IO is not a problem but an OSD daemon is - no ionice setting will help with that, it just needs to be faster ;-)

Jan


> On 30 May 2015, at 01:17, Gregory Farnum <greg@gregs42.com> wrote:
> 
> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust@redhat.com> wrote:
>> Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>> 
>> osd_max_backfills to 1 (from 10)
>> osd_recovery_max_active to 3 (from 15)
>> osd_recovery_op_priority to 1 (from 10)
>> osd_recovery_max_single_start to 1 (from 5)
> 
> I'm under the (possibly erroneous) impression that reducing the number
> of max backfills doesn't actually reduce recovery speed much (but will
> reduce memory use), but that dropping the op priority can. I'd rather
> we make users manually adjust values which can have a material impact
> on their data safety, even if most of them choose to do so.
> 
> After all, even under our worst behavior we're still doing a lot
> better than a resilvering RAID array. ;)
> -Greg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]           ` <9555849C-7C6E-485E-B60C-BB4996F96E32-SB6/BxVxTjHtwjQa/ONI9g@public.gmane.org>
@ 2015-06-01  8:13             ` Lionel Bouton
  0 siblings, 0 replies; 18+ messages in thread
From: Lionel Bouton @ 2015-06-01  8:13 UTC (permalink / raw)
  To: Jan Schermer, Gregory Farnum
  Cc: ceph-devel, ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org

On 06/01/15 09:43, Jan Schermer wrote:
> We had to disable deep scrub or the cluster would be unusable - we need to turn it back on sooner or later, though.
> With minimal scrubbing and recovery settings, everything is mostly good. Turned out many issues we had were due to too few PGs - once we increased them from 4K to 16K everything sped up nicely (because the chunks are smaller), but during heavy activity we are still getting some “slow IOs”.
> I believe there is an ionice knob in newer versions (we still run Dumpling), and that should do the trick no matter how much additional “load” is put on the OSDs.
> Everybody’s bottleneck will be different - we run all flash so disk IO is not a problem but an OSD daemon is - no ionice setting will help with that, it just needs to be faster ;-)

If you are interested, I'm currently testing a Ruby script which
schedules the deep scrubs one at a time, trying simultaneously to make
them fit in a given time window, avoid successive scrubs on the same OSD,
and space the deep scrubs according to the amount of data scrubbed.  I
use it because Ceph by itself can't prevent multiple scrubs from happening
simultaneously on the network, and that can severely impact our VM performance.
I can clean it up and post it on GitHub.
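
A minimal sketch of one piece of such a scheduler, written in Python
rather than Ruby and not taken from the script described above. It
picks the next PG to deep-scrub by oldest last-deep-scrub time while
avoiding the OSDs touched by the previous scrub. The tuple layout and
function name are hypothetical, and the time-window and
data-size-spacing logic is left out:

```python
def next_pg_to_scrub(pgs, last_osds):
    """Pick the PG to deep-scrub next.

    pgs: list of (pgid, last_deep_scrub_epoch_seconds, osd_ids).
    last_osds: OSD ids used by the previously scrubbed PG.
    Prefers the stalest PG that shares no OSD with the previous scrub;
    falls back to the stalest PG overall if every PG overlaps.
    """
    ordered = sorted(pgs, key=lambda pg: pg[1])
    busy = set(last_osds)
    for pgid, _stamp, osds in ordered:
        if not busy & set(osds):
            return pgid
    return ordered[0][0] if ordered else None
```

The chosen id could then be handed to "ceph pg deep-scrub <pgid>",
one PG at a time.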

Best regards,

Lionel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ceph-users] Discuss: New default recovery config settings
  2015-06-01  7:43         ` Jan Schermer
       [not found]           ` <9555849C-7C6E-485E-B60C-BB4996F96E32-SB6/BxVxTjHtwjQa/ONI9g@public.gmane.org>
@ 2015-06-01  8:57           ` huang jun
  2015-06-01  9:01             ` Jan Schermer
  1 sibling, 1 reply; 18+ messages in thread
From: huang jun @ 2015-06-01  8:57 UTC (permalink / raw)
  To: Jan Schermer
  Cc: Gregory Farnum, Samuel Just, ceph-devel,
	ceph-users@lists.ceph.com

Hi Jan,

2015-06-01 15:43 GMT+08:00 Jan Schermer <jan@schermer.cz>:
> We had to disable deep scrub or the cluster would be unusable - we need to turn it back on sooner or later, though.
> With minimal scrubbing and recovery settings, everything is mostly good. Turned out many issues we had were due to too few PGs - once we increased them from 4K to 16K everything sped up nicely (because the chunks are smaller), but during heavy activity we are still getting some “slow IOs”.

How many PGs did you set?  We get "slow requests" many times, but
didn't relate them to the PG number.
We follow the equation below for every pool:

               (OSDs * 100)
Total PGs =  ----------------
                pool size

Our cluster has 157 OSDs and 3 pools, and we set pg_num to 8192 for every pool,
but OSD CPU utilization goes up to 300% after a restart; we think the OSDs
are loading PGs during that period.
We will try a different PG number when we get "slow requests".
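
A worked instance of the equation above, as a minimal sketch. It
assumes "pool size" means the replica count and that the result is
rounded up to the next power of two; the replica count of 3 is a
hypothetical value, since the message does not state it:

```python
def suggested_pg_count(num_osds, replica_count, pgs_per_osd=100):
    """Apply (OSDs * 100) / pool size, then round up to a power of two."""
    raw = num_osds * pgs_per_osd / replica_count
    power = 1
    while power < raw:
        power *= 2
    return power


# 157 OSDs with an assumed replica count of 3:
# 157 * 100 / 3 is about 5233, and the next power of two is 8192.
print(suggested_pg_count(157, 3))
```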

thanks!

> I believe there is an ionice knob in newer versions (we still run Dumpling), and that should do the trick no matter how much additional “load” is put on the OSDs.
> Everybody’s bottleneck will be different - we run all flash so disk IO is not a problem but an OSD daemon is - no ionice setting will help with that, it just needs to be faster ;-)
>
> Jan
>
>
>> On 30 May 2015, at 01:17, Gregory Farnum <greg@gregs42.com> wrote:
>>
>> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust@redhat.com> wrote:
>>> Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>>>
>>> osd_max_backfills to 1 (from 10)
>>> osd_recovery_max_active to 3 (from 15)
>>> osd_recovery_op_priority to 1 (from 10)
>>> osd_recovery_max_single_start to 1 (from 5)
>>
>> I'm under the (possibly erroneous) impression that reducing the number
>> of max backfills doesn't actually reduce recovery speed much (but will
>> reduce memory use), but that dropping the op priority can. I'd rather
>> we make users manually adjust values which can have a material impact
>> on their data safety, even if most of them choose to do so.
>>
>> After all, even under our worst behavior we're still doing a lot
>> better than a resilvering RAID array. ;)
>> -Greg



-- 
thanks
huangjun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [ceph-users] Discuss: New default recovery config settings
  2015-06-01  8:57           ` [ceph-users] " huang jun
@ 2015-06-01  9:01             ` Jan Schermer
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Schermer @ 2015-06-01  9:01 UTC (permalink / raw)
  To: huang jun
  Cc: Gregory Farnum, Samuel Just, ceph-devel,
	ceph-users@lists.ceph.com

Slow requests are not exactly tied to the PG number, but we were getting slow requests whenever backfills or recoveries fired up - increasing the number of PGs helped with this as the “blocks” of work are much smaller than before.

We have roughly the same number of OSDs as you but only one really important pool (“volumes”); we ended up with 16384 PGs for this one.
The number of threads increased exponentially, some latencies went down, some went up; in the end it works just as well as before, with the added benefit of better data distribution and a better-behaved cluster.
But YMMV - once you go up you can’t go down.

Jan


> On 01 Jun 2015, at 10:57, huang jun <hjwsm1989@gmail.com> wrote:
> 
> Hi Jan,
> 
> 2015-06-01 15:43 GMT+08:00 Jan Schermer <jan@schermer.cz>:
>> We had to disable deep scrub or the cluster would be unusable - we need to turn it back on sooner or later, though.
>> With minimal scrubbing and recovery settings, everything is mostly good. Turned out many issues we had were due to too few PGs - once we increased them from 4K to 16K everything sped up nicely (because the chunks are smaller), but during heavy activity we are still getting some “slow IOs”.
> 
> How many PGs did you set?  We get "slow requests" many times, but
> didn't relate them to the PG number.
> We follow the equation below for every pool:
> 
>                (OSDs * 100)
> Total PGs =  ----------------
>                 pool size
> 
> Our cluster has 157 OSDs and 3 pools, and we set pg_num to 8192 for every pool,
> but OSD CPU utilization goes up to 300% after a restart; we think the OSDs
> are loading PGs during that period.
> We will try a different PG number when we get "slow requests".
> 
> thanks!
> 
>> I believe there is an ionice knob in newer versions (we still run Dumpling), and that should do the trick no matter how much additional “load” is put on the OSDs.
>> Everybody’s bottleneck will be different - we run all flash so disk IO is not a problem but an OSD daemon is - no ionice setting will help with that, it just needs to be faster ;-)
>> 
>> Jan
>> 
>> 
>>> On 30 May 2015, at 01:17, Gregory Farnum <greg@gregs42.com> wrote:
>>> 
>>> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust@redhat.com> wrote:
>>>> Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>>>> 
>>>> osd_max_backfills to 1 (from 10)
>>>> osd_recovery_max_active to 3 (from 15)
>>>> osd_recovery_op_priority to 1 (from 10)
>>>> osd_recovery_max_single_start to 1 (from 5)
>>> 
>>> I'm under the (possibly erroneous) impression that reducing the number
>>> of max backfills doesn't actually reduce recovery speed much (but will
>>> reduce memory use), but that dropping the op priority can. I'd rather
>>> we make users manually adjust values which can have a material impact
>>> on their data safety, even if most of them choose to do so.
>>> 
>>> After all, even under our worst behavior we're still doing a lot
>>> better than a resilvering RAID array. ;)
>>> -Greg
> 
> 
> 
> -- 
> thanks
> huangjun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: Discuss: New default recovery config settings
  2015-05-29 23:17     ` Gregory Farnum
       [not found]       ` <CAC6JEv8LyM1SRsOszDCj6tLxhZ=hvrEkinDARW8BvyQHDCz+LQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-02  1:39       ` Paul Von-Stamwitz
       [not found]         ` <622F4407872BA447A16110F65453358C03DFB21C4B5F-Y+un6SQecilYCZvkXUWeucM6rOWSkUom@public.gmane.org>
  1 sibling, 1 reply; 18+ messages in thread
From: Paul Von-Stamwitz @ 2015-06-02  1:39 UTC (permalink / raw)
  To: Gregory Farnum, Samuel Just
  Cc: ceph-devel,
	'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)

On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum <greg@gregs42.com> wrote:
> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust@redhat.com> wrote:
> > Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
> >
> > osd_max_backfills to 1 (from 10)
> > osd_recovery_max_active to 3 (from 15)
> > osd_recovery_op_priority to 1 (from 10)
> > osd_recovery_max_single_start to 1 (from 5)
> 
> I'm under the (possibly erroneous) impression that reducing the number of max backfills doesn't actually reduce recovery speed much (but will reduce memory use), but that dropping the op priority can. I'd rather we make users manually adjust values which can have a material impact on their data safety, even if most of them choose to do so.
> 
> After all, even under our worst behavior we're still doing a lot better than a resilvering RAID array. ;) -Greg
> --


Greg,
When we set...

osd recovery max active = 1
osd max backfills = 1

We see rebalance times go down by more than half and client write performance increase significantly while rebalancing. We initially played with these settings to improve client IO expecting recovery time to get worse, but we got a 2-for-1. 
This was with firefly using replication, downing an entire node with lots of SAS drives. We left osd_recovery_threads, osd_recovery_op_priority, and osd_recovery_max_single_start default.

We dropped osd_recovery_max_active and osd_max_backfills together. If you're right, do you think osd_recovery_max_active=1 is the primary reason for the improvement? (Higher osd_max_backfills helps recovery time with erasure coding.)

-Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]         ` <622F4407872BA447A16110F65453358C03DFB21C4B5F-Y+un6SQecilYCZvkXUWeucM6rOWSkUom@public.gmane.org>
@ 2015-06-02  3:43           ` Gregory Farnum
       [not found]             ` <CAC6JEv-i6tOwPBDjP++EW4FxAKZ_X_prEZ8YZOpLq-RTP1ZguQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Gregory Farnum @ 2015-06-02  3:43 UTC (permalink / raw)
  To: Paul Von-Stamwitz
  Cc: ceph-devel,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)

On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz
<PVonStamwitz@us.fujitsu.com> wrote:
> On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum <greg@gregs42.com> wrote:
>> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust@redhat.com> wrote:
>> > Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>> >
>> > osd_max_backfills to 1 (from 10)
>> > osd_recovery_max_active to 3 (from 15)
>> > osd_recovery_op_priority to 1 (from 10)
>> > osd_recovery_max_single_start to 1 (from 5)
>>
>> I'm under the (possibly erroneous) impression that reducing the number of max backfills doesn't actually reduce recovery speed much (but will reduce memory use), but that dropping the op priority can. I'd rather we make users manually adjust values which can have a material impact on their data safety, even if most of them choose to do so.
>>
>> After all, even under our worst behavior we're still doing a lot better than a resilvering RAID array. ;) -Greg
>> --
>
>
> Greg,
> When we set...
>
> osd recovery max active = 1
> osd max backfills = 1
>
> We see rebalance times go down by more than half and client write performance increase significantly while rebalancing. We initially played with these settings to improve client IO expecting recovery time to get worse, but we got a 2-for-1.
> This was with firefly using replication, downing an entire node with lots of SAS drives. We left osd_recovery_threads, osd_recovery_op_priority, and osd_recovery_max_single_start default.
>
> We dropped osd_recovery_max_active and osd_max_backfills together. If you're right, do you think osd_recovery_max_active=1 is primary reason for the improvement? (higher osd_max_backfills helps recovery time with erasure coding.)

Well, recovery max active and max backfills are similar in many ways.
Both are about moving data into a new or outdated copy of the PG — the
difference is that recovery refers to our log-based recovery (where we
compare the PG logs and move over the objects which have changed)
whereas backfill requires us to incrementally move through the entire
PG's hash space and compare.
I suspect dropping down max backfills is more important than reducing
max recovery (gathering recovery metadata happens largely in memory)
but I don't really know either way.
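
To make the distinction above concrete, here is a toy Python sketch (not
Ceph code). It assumes a PG copy can be modeled as a dict of object name
to version and the PG log as a list of recently changed names. The
function names are hypothetical, and sorted object names stand in for
the ordered walk through the hash space.

```python
def log_based_recovery(primary, replica, pg_log):
    """Toy model: touch only the objects named in the PG log."""
    copied = []
    for name in pg_log:
        if name not in primary:
            # Object was deleted on the primary; drop it from the replica too.
            replica.pop(name, None)
        elif replica.get(name) != primary[name]:
            replica[name] = primary[name]
            copied.append(name)
    return copied


def backfill(primary, replica):
    """Toy model: walk every object in order and compare it."""
    copied = []
    for name in sorted(primary):  # stand-in for walking the hash space
        if replica.get(name) != primary[name]:
            replica[name] = primary[name]
            copied.append(name)
    return copied


primary = {"a": 1, "b": 2, "c": 3}
replica = {"a": 1, "b": 1}
print(log_based_recovery(primary, replica, ["b"]))  # only "b" is in the log
print(backfill(primary, replica))  # the full scan also finds the missing "c"
```

In this model the log-based pass does work proportional to the log,
while the backfill pass compares every object in the PG.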

My comment was meant to convey that I'd prefer we not reduce the
recovery op priority levels. :)
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]             ` <CAC6JEv-i6tOwPBDjP++EW4FxAKZ_X_prEZ8YZOpLq-RTP1ZguQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-06-03 22:44               ` Sage Weil
  2015-06-03 22:55                 ` Gregory Farnum
       [not found]                 ` <alpine.DEB.2.00.1506031541200.26591-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 2 replies; 18+ messages in thread
From: Sage Weil @ 2015-06-03 22:44 UTC (permalink / raw)
  To: Gregory Farnum
  Cc: ceph-devel, Paul Von-Stamwitz,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)

On Mon, 1 Jun 2015, Gregory Farnum wrote:
> On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz
> <PVonStamwitz-gkcJ3tX5bYHQFUHtdCDX3A@public.gmane.org> wrote:
> > On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum <greg-3KCAGdo1P2hBDgjK7y7TUQ@public.gmane.org> wrote:
> >> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> >> > Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
> >> >
> >> > osd_max_backfills to 1 (from 10)
> >> > osd_recovery_max_active to 3 (from 15)
> >> > osd_recovery_op_priority to 1 (from 10)
> >> > osd_recovery_max_single_start to 1 (from 5)
> >>
> >> I'm under the (possibly erroneous) impression that reducing the number of max backfills doesn't actually reduce recovery speed much (but will reduce memory use), but that dropping the op priority can. I'd rather we make users manually adjust values which can have a material impact on their data safety, even if most of them choose to do so.
> >>
> >> After all, even under our worst behavior we're still doing a lot better than a resilvering RAID array. ;) -Greg
> >> --
> >
> >
> > Greg,
> > When we set...
> >
> > osd recovery max active = 1
> > osd max backfills = 1
> >
> > We see rebalance times go down by more than half and client write performance increase significantly while rebalancing. We initially played with these settings to improve client IO expecting recovery time to get worse, but we got a 2-for-1.
> > This was with firefly using replication, downing an entire node with lots of SAS drives. We left osd_recovery_threads, osd_recovery_op_priority, and osd_recovery_max_single_start default.
> >
> > We dropped osd_recovery_max_active and osd_max_backfills together. If you're right, do you think osd_recovery_max_active=1 is primary reason for the improvement? (higher osd_max_backfills helps recovery time with erasure coding.)
> 
> Well, recovery max active and max backfills are similar in many ways.
> Both are about moving data into a new or outdated copy of the PG - the
> difference is that recovery refers to our log-based recovery (where we
> compare the PG logs and move over the objects which have changed)
> whereas backfill requires us to incrementally move through the entire
> PG's hash space and compare.
> I suspect dropping down max backfills is more important than reducing
> max recovery (gathering recovery metadata happens largely in memory)
> but I don't really know either way.
> 
> My comment was meant to convey that I'd prefer we not reduce the
> recovery op priority levels. :)

We could make a less extreme move than to 1, but IMO we have to reduce it 
one way or another.  Every major operator I've talked to does this, our PS 
folks have been recommending it for years, and I've yet to see a single 
complaint about recovery times... meanwhile we're drowning in a sea of 
complaints about the impact on clients.

How about

 osd_max_backfills to 1 (from 10)
 osd_recovery_max_active to 3 (from 15)
 osd_recovery_op_priority to 3 (from 10)
 osd_recovery_max_single_start to 1 (from 5)

(same as above, but 1/3rd the recovery op prio instead of 1/10th)
?

sage

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
  2015-06-03 22:44               ` Sage Weil
@ 2015-06-03 22:55                 ` Gregory Farnum
       [not found]                 ` <alpine.DEB.2.00.1506031541200.26591-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  1 sibling, 0 replies; 18+ messages in thread
From: Gregory Farnum @ 2015-06-03 22:55 UTC (permalink / raw)
  To: Sage Weil
  Cc: Paul Von-Stamwitz, Samuel Just, ceph-devel,
	'ceph-users@lists.ceph.com' (ceph-users@lists.ceph.com)

On Wed, Jun 3, 2015 at 3:44 PM, Sage Weil <sage@newdream.net> wrote:
> On Mon, 1 Jun 2015, Gregory Farnum wrote:
>> On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz
>> <PVonStamwitz@us.fujitsu.com> wrote:
>> > On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum <greg@gregs42.com> wrote:
>> >> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust@redhat.com> wrote:
>> >> > Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>> >> >
>> >> > osd_max_backfills to 1 (from 10)
>> >> > osd_recovery_max_active to 3 (from 15)
>> >> > osd_recovery_op_priority to 1 (from 10)
>> >> > osd_recovery_max_single_start to 1 (from 5)
>> >>
>> >> I'm under the (possibly erroneous) impression that reducing the number of max backfills doesn't actually reduce recovery speed much (but will reduce memory use), but that dropping the op priority can. I'd rather we make users manually adjust values which can have a material impact on their data safety, even if most of them choose to do so.
>> >>
>> >> After all, even under our worst behavior we're still doing a lot better than a resilvering RAID array. ;) -Greg
>> >> --
>> >
>> >
>> > Greg,
>> > When we set...
>> >
>> > osd recovery max active = 1
>> > osd max backfills = 1
>> >
>> > We see rebalance times go down by more than half and client write performance increase significantly while rebalancing. We initially played with these settings to improve client IO expecting recovery time to get worse, but we got a 2-for-1.
>> > This was with firefly using replication, downing an entire node with lots of SAS drives. We left osd_recovery_threads, osd_recovery_op_priority, and osd_recovery_max_single_start default.
>> >
>> > We dropped osd_recovery_max_active and osd_max_backfills together. If you're right, do you think osd_recovery_max_active=1 is primary reason for the improvement? (higher osd_max_backfills helps recovery time with erasure coding.)
>>
>> Well, recovery max active and max backfills are similar in many ways.
>> Both are about moving data into a new or outdated copy of the PG - the
>> difference is that recovery refers to our log-based recovery (where we
>> compare the PG logs and move over the objects which have changed)
>> whereas backfill requires us to incrementally move through the entire
>> PG's hash space and compare.
>> I suspect dropping down max backfills is more important than reducing
>> max recovery (gathering recovery metadata happens largely in memory)
>> but I don't really know either way.
>>
>> My comment was meant to convey that I'd prefer we not reduce the
>> recovery op priority levels. :)
>
> We could make a less extreme move than to 1, but IMO we have to reduce it
> one way or another.  Every major operator I've talked to does this, our PS
> folks have been recommending it for years, and I've yet to see a single
> complaint about recovery times... meanwhile we're drowning in a sea of
> complaints about the impact on clients.
>
> How about
>
>  osd_max_backfills to 1 (from 10)
>  osd_recovery_max_active to 3 (from 15)
>  osd_recovery_op_priority to 3 (from 10)
>  osd_recovery_max_single_start to 1 (from 5)
>
> (same as above, but 1/3rd the recovery op prio instead of 1/10th)
> ?

Do we actually have numbers for these changes individually? We might,
but I have a suspicion that at some point there was just a "well, you
could turn them all down" comment and that state was preferred to our
defaults.

I mean, I have no real knowledge of how changing the op priority
impacts things, but I don't think many (any?) other people do either,
so I'd rather mutate slowly and see if that works better. :)
Especially given Paul's comment that just the recovery_max and
max_backfills values made a huge positive difference without any
change to priorities.
-Greg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]                 ` <alpine.DEB.2.00.1506031541200.26591-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2015-06-04 21:01                   ` Mike Dawson
       [not found]                     ` <5570BCAE.5050509-ffsCFlcjuZBWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Dawson @ 2015-06-04 21:01 UTC (permalink / raw)
  To: Sage Weil, Gregory Farnum
  Cc: ceph-devel, Paul Von-Stamwitz,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)

With a write-heavy RBD workload, I add the following to ceph.conf:

osd_max_backfills = 2
osd_recovery_max_active = 2

If things are going well during recovery (i.e. guests happy and no slow 
requests), I will often bump both up to three:

# ceph tell osd.* injectargs '--osd-max-backfills 3 --osd-recovery-max-active 3'

If I see slow requests, I drop them down.
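
A minimal Python sketch of that adjust-as-you-go routine, under stated
assumptions: the helper names injectargs_command and next_limit are
hypothetical, the bounds of 2 and 3 are simply the values mentioned
above, and the script only prints the command rather than running it.

```python
import shlex


def injectargs_command(settings, target="osd.*"):
    """Render settings as a `ceph tell <target> injectargs '...'` command line."""
    args = " ".join(
        f"--{name.replace('_', '-')} {value}" for name, value in settings.items()
    )
    return f"ceph tell {target} injectargs {shlex.quote(args)}"


def next_limit(current, slow_requests, low=2, high=3):
    """Step the limit down on slow requests, otherwise up, within [low, high]."""
    step = -1 if slow_requests else 1
    return max(low, min(high, current + step))


# No slow requests seen, so move from 2 to 3 for both options.
limit = next_limit(2, slow_requests=False)
print(injectargs_command({
    "osd_max_backfills": limit,
    "osd_recovery_max_active": limit,
}))
```

Deciding whether slow requests are present is left to the operator here;
the sketch only covers the stepping and the command string.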

The biggest downside to setting either to 1 seems to be the long tail 
issue detailed in:

http://tracker.ceph.com/issues/9566

Thanks,
Mike Dawson


On 6/3/2015 6:44 PM, Sage Weil wrote:
> On Mon, 1 Jun 2015, Gregory Farnum wrote:
>> On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz
>> <PVonStamwitz-gkcJ3tX5bYHQFUHtdCDX3A@public.gmane.org> wrote:
>>> On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum <greg-3KCAGdo1P2hBDgjK7y7TUQ@public.gmane.org> wrote:
>>>> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>>>>> Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io.  We are talking about changing the defaults as follows:
>>>>>
>>>>> osd_max_backfills to 1 (from 10)
>>>>> osd_recovery_max_active to 3 (from 15)
>>>>> osd_recovery_op_priority to 1 (from 10)
>>>>> osd_recovery_max_single_start to 1 (from 5)
>>>>
>>>> I'm under the (possibly erroneous) impression that reducing the number of max backfills doesn't actually reduce recovery speed much (but will reduce memory use), but that dropping the op priority can. I'd rather we make users manually adjust values which can have a material impact on their data safety, even if most of them choose to do so.
>>>>
>>>> After all, even under our worst behavior we're still doing a lot better than a resilvering RAID array. ;) -Greg
>>>> --
>>>
>>>
>>> Greg,
>>> When we set...
>>>
>>> osd recovery max active = 1
>>> osd max backfills = 1
>>>
>>> We see rebalance times go down by more than half and client write performance increase significantly while rebalancing. We initially played with these settings to improve client IO expecting recovery time to get worse, but we got a 2-for-1.
>>> This was with firefly using replication, downing an entire node with lots of SAS drives. We left osd_recovery_threads, osd_recovery_op_priority, and osd_recovery_max_single_start default.
>>>
>>> We dropped osd_recovery_max_active and osd_max_backfills together. If you're right, do you think osd_recovery_max_active=1 is primary reason for the improvement? (higher osd_max_backfills helps recovery time with erasure coding.)
>>
>> Well, recovery max active and max backfills are similar in many ways.
>> Both are about moving data into a new or outdated copy of the PG - the
>> difference is that recovery refers to our log-based recovery (where we
>> compare the PG logs and move over the objects which have changed)
>> whereas backfill requires us to incrementally move through the entire
>> PG's hash space and compare.
>> I suspect dropping down max backfills is more important than reducing
>> max recovery (gathering recovery metadata happens largely in memory)
>> but I don't really know either way.
>>
>> My comment was meant to convey that I'd prefer we not reduce the
>> recovery op priority levels. :)
>
> We could make a less extreme move than to 1, but IMO we have to reduce it
> one way or another.  Every major operator I've talked to does this, our PS
> folks have been recommending it for years, and I've yet to see a single
> complaint about recovery times... meanwhile we're drowning in a sea of
> complaints about the impact on clients.
>
> How about
>
>   osd_max_backfills to 1 (from 10)
>   osd_recovery_max_active to 3 (from 15)
>   osd_recovery_op_priority to 3 (from 10)
>   osd_recovery_max_single_start to 1 (from 5)
>
> (same as above, but 1/3rd the recovery op prio instead of 1/10th)
> ?
>
> sage
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]                     ` <5570BCAE.5050509-ffsCFlcjuZBWk0Htik3J/w@public.gmane.org>
@ 2015-06-04 23:24                       ` Scottix
       [not found]                         ` <CANKFHZ_yWaUYoKt3X5RE8BewRqhM_nvfmKwJPRPcFz444q_nWA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Scottix @ 2015-06-04 23:24 UTC (permalink / raw)
  To: Mike Dawson, Sage Weil, Gregory Farnum
  Cc: ceph-devel, Paul Von-Stamwitz,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org)


From an ease-of-use standpoint, and depending on the situation you are
setting up your environment for, the idea is as follows:

It seems like it would be nice to have some easy on-demand control where
you don't have to think a whole lot other than knowing how it is going to
affect your cluster in a general sense.

The two extremes and a general limitation would be:
1. Priority on data recovery
2. Priority on client usability
3. A third might be hardware related, like a 1Gb connection

With predefined settings you can set up different levels that have sensible
settings, and maybe one that is custom for the advanced user.
Example command (Caveat: I don't fully know how your configs work):
ceph osd set priority <low|medium|high|custom>
*With priority set it would lock certain attributes
**With priority unset it would unlock certain attributes

In our use case basically after 8pm the activity goes way down. Here I can
up the priority to medium or high, then at 6 am I can adjust it back to low.

With cron I can easily schedule that or depending on the current situation
I can schedule maintenance and change the priority to fit my needs.
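
A rough Python sketch of how such levels could map onto existing
options. Everything here is an assumption for illustration: the PROFILES
table, settings_for and level_for_hour are hypothetical names, and the
numbers are just values other posters mentioned in this thread (1/1,
2/2, and the current defaults of 10/15), not recommendations.

```python
# Illustrative only: the thread does not define what each level would set.
PROFILES = {
    "low": {"osd_max_backfills": 1, "osd_recovery_max_active": 1},
    "medium": {"osd_max_backfills": 2, "osd_recovery_max_active": 2},
    "high": {"osd_max_backfills": 10, "osd_recovery_max_active": 15},
}


def settings_for(level, custom=None):
    """Return the option values for a level; "custom" uses the caller's dict."""
    if level == "custom":
        if not custom:
            raise ValueError("custom level needs explicit settings")
        return dict(custom)
    return dict(PROFILES[level])


def level_for_hour(hour, busy_start=6, busy_end=20):
    """Favor clients ("low") from 6 am to 8 pm and recovery ("high") overnight."""
    return "low" if busy_start <= hour < busy_end else "high"


for hour in (5, 6, 19, 20):
    print(hour, level_for_hour(hour), settings_for(level_for_hour(hour)))
```

A cron job could call something like level_for_hour at 6 am and 8 pm and
apply the resulting values with whatever mechanism the cluster already
uses.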



On Thu, Jun 4, 2015 at 2:01 PM Mike Dawson <mike.dawson-ffsCFlcjuZBWk0Htik3J/w@public.gmane.org> wrote:

> With a write-heavy RBD workload, I add the following to ceph.conf:
>
> osd_max_backfills = 2
> osd_recovery_max_active = 2
>
> If things are going well during recovery (i.e. guests happy and no slow
> requests), I will often bump both up to three:
>
> # ceph tell osd.* injectargs '--osd-max-backfills 3 --osd-recovery-max-active 3'
>
> If I see slow requests, I drop them down.
>
> The biggest downside to setting either to 1 seems to be the long tail
> issue detailed in:
>
> http://tracker.ceph.com/issues/9566
>
> Thanks,
> Mike Dawson
>
>
> On 6/3/2015 6:44 PM, Sage Weil wrote:
> > On Mon, 1 Jun 2015, Gregory Farnum wrote:
> >> On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz
> >> <PVonStamwitz-gkcJ3tX5bYHQFUHtdCDX3A@public.gmane.org> wrote:
> >>> On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum <greg-3KCAGdo1P2hBDgjK7y7TUQ@public.gmane.org>
> wrote:
> >>>> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> wrote:
> >>>>> Many people have reported that they need to lower the osd recovery
> config options to minimize the impact of recovery on client io.  We are
> talking about changing the defaults as follows:
> >>>>>
> >>>>> osd_max_backfills to 1 (from 10)
> >>>>> osd_recovery_max_active to 3 (from 15)
> >>>>> osd_recovery_op_priority to 1 (from 10)
> >>>>> osd_recovery_max_single_start to 1 (from 5)
> >>>>
> >>>> I'm under the (possibly erroneous) impression that reducing the
> number of max backfills doesn't actually reduce recovery speed much (but
> will reduce memory use), but that dropping the op priority can. I'd rather
> we make users manually adjust values which can have a material impact on
> their data safety, even if most of them choose to do so.
> >>>>
> >>>> After all, even under our worst behavior we're still doing a lot
> better than a resilvering RAID array. ;) -Greg
> >>>> --
> >>>
> >>>
> >>> Greg,
> >>> When we set...
> >>>
> >>> osd recovery max active = 1
> >>> osd max backfills = 1
> >>>
> >>> We see rebalance times go down by more than half and client write
> performance increase significantly while rebalancing. We initially played
> with these settings to improve client IO expecting recovery time to get
> worse, but we got a 2-for-1.
> >>> This was with firefly using replication, downing an entire node with
> lots of SAS drives. We left osd_recovery_threads, osd_recovery_op_priority,
> and osd_recovery_max_single_start default.
> >>>
> >>> We dropped osd_recovery_max_active and osd_max_backfills together. If
> you're right, do you think osd_recovery_max_active=1 is primary reason for
> the improvement? (higher osd_max_backfills helps recovery time with erasure
> coding.)
> >>
> >> Well, recovery max active and max backfills are similar in many ways.
> >> Both are about moving data into a new or outdated copy of the PG - the
> >> difference is that recovery refers to our log-based recovery (where we
> >> compare the PG logs and move over the objects which have changed)
> >> whereas backfill requires us to incrementally move through the entire
> >> PG's hash space and compare.
> >> I suspect dropping down max backfills is more important than reducing
> >> max recovery (gathering recovery metadata happens largely in memory)
> >> but I don't really know either way.
> >>
> >> My comment was meant to convey that I'd prefer we not reduce the
> >> recovery op priority levels. :)
> >
> > We could make a less extreme move than to 1, but IMO we have to reduce it
> > one way or another.  Every major operator I've talked to does this, our
> PS
> > folks have been recommending it for years, and I've yet to see a single
> > complaint about recovery times... meanwhile we're drowning in a sea of
> > complaints about the impact on clients.
> >
> > How about
> >
> >   osd_max_backfills to 1 (from 10)
> >   osd_recovery_max_active to 3 (from 15)
> >   osd_recovery_op_priority to 3 (from 10)
> >   osd_recovery_max_single_start to 1 (from 5)
> >
> > (same as above, but 1/3rd the recovery op prio instead of 1/10th)
> > ?
> >
> > sage
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Discuss: New default recovery config settings
       [not found]                         ` <CANKFHZ_yWaUYoKt3X5RE8BewRqhM_nvfmKwJPRPcFz444q_nWA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-08-23  4:09                           ` Shinobu
  0 siblings, 0 replies; 18+ messages in thread
From: Shinobu @ 2015-08-23  4:09 UTC (permalink / raw)
  To: Scottix
  Cc: Paul Von-Stamwitz,
	'ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org' (ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org),
	Mike Dawson, ceph-devel


Based on the original concept of *osd_max_backfills*, which prevents the
following situation:

"*If all of these backfills happen simultaneously, it would put
excessive load on the osd.*"

the value of "osd_max_backfills" could be important in some situations, so
we might not be able to say in general how important it is.

From my experience, a big cluster can easily become complicated. I know
some automobile manufacturers that faced performance issues, although
their ceph clusters are actually not very big.



*"dropping down max backfills is more important than reducing max recovery
(gathering recovery metadata happens largely in memory)"*

As Jan said,

"*increasing the number of PGs helped with this as the “blocks” of work are
much smaller than before.*"

The number of PGs is also one of the factors that can improve performance,
and it needs to be considered.

From the messages of Huang and Jan, we might need to assume that the total
number of PGs is not always equal to the following formula:

"*Total PGs = (OSDs * 100) / pool size*"
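
As a worked example of that formula, a short Python sketch. The function
name total_pgs and the sample cluster (30 OSDs, pool size 3) are made up
for illustration; only the formula itself comes from the quote above.

```python
def total_pgs(num_osds, pool_size, pgs_per_osd=100):
    """Total PGs = (OSDs * 100) / pool size, as quoted above."""
    return (num_osds * pgs_per_osd) / pool_size


# Hypothetical cluster: 30 OSDs with a replicated pool of size 3.
print(total_pgs(30, 3))
```

The result is only a starting point; as noted above, the PG count that
works in practice may differ from it.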

So what I like and would like to try are:

"
*What I would be happy to see is more of a QOS style tunable along the
lines of networking traffic shaping.*"
 - Milosz Tanski

"
*Another idea would be to have a better way to prioritize recovery traffic
to an*
*even lower priority level by setting the ionice value to 'idle' in the CFQ
scheduler*"
 - Bryan Stillwell

 Shinobu


On Fri, Jun 5, 2015 at 8:24 AM, Scottix <scottix-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> From an ease-of-use standpoint, and depending on the situation you are
> setting up your environment for, the idea is as follows:
>
> It seems like it would be nice to have some easy on-demand control where
> you don't have to think a whole lot other than knowing how it is going to
> affect your cluster in a general sense.
>
> The two extremes and a general limitation would be:
> 1. Priority on data recovery
> 2. Priority on client usability
> 3. A third might be hardware related, like a 1Gb connection
>
> With predefined settings you can set up different levels that have sensible
> settings, and maybe one that is custom for the advanced user.
> Example command (Caveat: I don't fully know how your configs work):
> ceph osd set priority <low|medium|high|custom>
> *With priority set it would lock certain attributes
> **With priority unset it would unlock certain attributes
>
> In our use case basically after 8pm the activity goes way down. Here I can
> up the priority to medium or high, then at 6 am I can adjust it back to low.
>
> With cron I can easily schedule that or depending on the current situation
> I can schedule maintenance and change the priority to fit my needs.
>
>
>
> On Thu, Jun 4, 2015 at 2:01 PM Mike Dawson <mike.dawson-ffsCFlcjuZBWk0Htik3J/w@public.gmane.org>
> wrote:
>
>> With a write-heavy RBD workload, I add the following to ceph.conf:
>>
>> osd_max_backfills = 2
>> osd_recovery_max_active = 2
>>
>> If things are going well during recovery (i.e. guests happy and no slow
>> requests), I will often bump both up to three:
>>
>> # ceph tell osd.* injectargs '--osd-max-backfills 3 --osd-recovery-max-active 3'
>>
>> If I see slow requests, I drop them down.
>>
>> The biggest downside to setting either to 1 seems to be the long tail
>> issue detailed in:
>>
>> http://tracker.ceph.com/issues/9566
>>
>> Thanks,
>> Mike Dawson
>>
>>
>> On 6/3/2015 6:44 PM, Sage Weil wrote:
>> > On Mon, 1 Jun 2015, Gregory Farnum wrote:
>> >> On Mon, Jun 1, 2015 at 6:39 PM, Paul Von-Stamwitz
>> >> <PVonStamwitz-gkcJ3tX5bYHQFUHtdCDX3A@public.gmane.org> wrote:
>> >>> On Fri, May 29, 2015 at 4:18 PM, Gregory Farnum <greg-3KCAGdo1P2hBDgjK7y7TUQ@public.gmane.org>
>> wrote:
>> >>>> On Fri, May 29, 2015 at 2:47 PM, Samuel Just <sjust-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> wrote:
>> >>>>> Many people have reported that they need to lower the osd recovery
>> config options to minimize the impact of recovery on client io.  We are
>> talking about changing the defaults as follows:
>> >>>>>
>> >>>>> osd_max_backfills to 1 (from 10)
>> >>>>> osd_recovery_max_active to 3 (from 15)
>> >>>>> osd_recovery_op_priority to 1 (from 10)
>> >>>>> osd_recovery_max_single_start to 1 (from 5)
>> >>>>
>> >>>> I'm under the (possibly erroneous) impression that reducing the
>> number of max backfills doesn't actually reduce recovery speed much (but
>> will reduce memory use), but that dropping the op priority can. I'd rather
>> we make users manually adjust values which can have a material impact on
>> their data safety, even if most of them choose to do so.
>> >>>>
>> >>>> After all, even under our worst behavior we're still doing a lot
>> better than a resilvering RAID array. ;) -Greg
>> >>>> --
>> >>>
>> >>>
>> >>> Greg,
>> >>> When we set...
>> >>>
>> >>> osd recovery max active = 1
>> >>> osd max backfills = 1
>> >>>
>> >>> We see rebalance times go down by more than half and client write
>> performance increase significantly while rebalancing. We initially played
>> with these settings to improve client IO expecting recovery time to get
>> worse, but we got a 2-for-1.
>> >>> This was with firefly using replication, downing an entire node with
>> lots of SAS drives. We left osd_recovery_threads, osd_recovery_op_priority,
>> and osd_recovery_max_single_start default.
>> >>>
>> >>> We dropped osd_recovery_max_active and osd_max_backfills together. If
>> you're right, do you think osd_recovery_max_active=1 is primary reason for
>> the improvement? (higher osd_max_backfills helps recovery time with erasure
>> coding.)
>> >>
>> >> Well, recovery max active and max backfills are similar in many ways.
>> >> Both are about moving data into a new or outdated copy of the PG - the
>> >> difference is that recovery refers to our log-based recovery (where we
>> >> compare the PG logs and move over the objects which have changed)
>> >> whereas backfill requires us to incrementally move through the entire
>> >> PG's hash space and compare.
>> >> I suspect dropping down max backfills is more important than reducing
>> >> max recovery (gathering recovery metadata happens largely in memory)
>> >> but I don't really know either way.
>> >>
>> >> My comment was meant to convey that I'd prefer we not reduce the
>> >> recovery op priority levels. :)
>> >
>> > We could make a less extreme move than to 1, but IMO we have to reduce
>> it
>> > one way or another.  Every major operator I've talked to does this, our
>> PS
>> > folks have been recommending it for years, and I've yet to see a single
>> > complaint about recovery times... meanwhile we're drowning in a sea of
>> > complaints about the impact on clients.
>> >
>> > How about
>> >
>> >   osd_max_backfills to 1 (from 10)
>> >   osd_recovery_max_active to 3 (from 15)
>> >   osd_recovery_op_priority to 3 (from 10)
>> >   osd_recovery_max_single_start to 1 (from 5)
>> >
>> > (same as above, but 1/3rd the recovery op prio instead of 1/10th)
>> > ?
>> >
>> > sage
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users-idqoXFIVOFJgJs9I8MT0rw@public.gmane.org
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Email:
 shinobu-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org
 skinjo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

 Life w/ Linux <http://i-shinobu.hatenablog.com/>


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-08-23  4:09 UTC | newest]

Thread overview: 18+ messages
     [not found] <1939533999.9756941.1432935747903.JavaMail.zimbra@redhat.com>
2015-05-29 21:47 ` Discuss: New default recovery config settings Samuel Just
2015-05-29 22:16   ` Milosz Tanski
     [not found]   ` <1394947829.9758745.1432936033017.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-29 22:16     ` Josef Johansson
     [not found]       ` <CAOnYue-zF1driSi1oxGCQ8Vh1UcG=viT5P8AeHA3R1NCca1o6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-05-29 22:56         ` Stillwell, Bryan
2015-05-29 22:33     ` Somnath Roy
2015-05-29 23:17     ` Gregory Farnum
     [not found]       ` <CAC6JEv8LyM1SRsOszDCj6tLxhZ=hvrEkinDARW8BvyQHDCz+LQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-01  7:43         ` Jan Schermer
     [not found]           ` <9555849C-7C6E-485E-B60C-BB4996F96E32-SB6/BxVxTjHtwjQa/ONI9g@public.gmane.org>
2015-06-01  8:13             ` Lionel Bouton
2015-06-01  8:57           ` [ceph-users] " huang jun
2015-06-01  9:01             ` Jan Schermer
2015-06-02  1:39       ` Paul Von-Stamwitz
     [not found]         ` <622F4407872BA447A16110F65453358C03DFB21C4B5F-Y+un6SQecilYCZvkXUWeucM6rOWSkUom@public.gmane.org>
2015-06-02  3:43           ` Gregory Farnum
     [not found]             ` <CAC6JEv-i6tOwPBDjP++EW4FxAKZ_X_prEZ8YZOpLq-RTP1ZguQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-06-03 22:44               ` Sage Weil
2015-06-03 22:55                 ` Gregory Farnum
     [not found]                 ` <alpine.DEB.2.00.1506031541200.26591-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
2015-06-04 21:01                   ` Mike Dawson
     [not found]                     ` <5570BCAE.5050509-ffsCFlcjuZBWk0Htik3J/w@public.gmane.org>
2015-06-04 23:24                       ` Scottix
     [not found]                         ` <CANKFHZ_yWaUYoKt3X5RE8BewRqhM_nvfmKwJPRPcFz444q_nWA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-08-23  4:09                           ` Shinobu
2015-05-31 14:29   ` Justin Erenkrantz
