slow request problem

* slow request problem
@ 2013-07-12 13:16 Stefan Priebe - Profihost AG
       [not found] ` <51E00197.7010708-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Priebe - Profihost AG @ 2013-07-12 13:16 UTC (permalink / raw)
  To: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; +Cc: ceph-users-Qp0mS5GaXlQ

Hello list,

Does anyone else here always have problems bringing an offline OSD back?
Since cuttlefish I'm seeing slow requests for the first 2-5 minutes
after bringing an OSD online again, which is long enough that the VMs
crash because they think their disk is offline...

Under bobtail I never had any problems with that.

Please HELP!

Greets,
Stefan


* Re: slow request problem
       [not found] ` <51E00197.7010708-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2013-07-14 12:15   ` Stefan Priebe
       [not found]     ` <51E29660.4040200-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Priebe @ 2013-07-14 12:15 UTC (permalink / raw)
  To: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; +Cc: ceph-users-Qp0mS5GaXlQ

Hello list,

might this be a problem due to having too many PGs? I have 370 per OSD
instead of the 33 per OSD (OSDs*100/3).
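
The rule of thumb quoted above (OSDs*100/3) can be sketched in a few lines of Python. This is only a rough illustration: the function names and the 24-OSD cluster are hypothetical, and the target of 100 PGs per OSD and the replica count of 3 are read off the formula in the message, not measured on any cluster.

```python
def recommended_total_pgs(num_osds, pgs_per_osd_target=100, replicas=3):
    """Rule of thumb from the message: OSDs * 100 / replicas."""
    return num_osds * pgs_per_osd_target // replicas


def pgs_per_osd(total_pgs, num_osds):
    """Average number of PGs per OSD, not counting replica copies."""
    return total_pgs / num_osds


# Hypothetical cluster size; the real OSD count is not given in the thread.
num_osds = 24
total = recommended_total_pgs(num_osds)
print(total, round(pgs_per_osd(total, num_osds)))
```

With these assumed numbers the guideline works out to about 33 PGs per OSD, against the 370 per OSD reported here.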

Is there any plan for PG merging?

Stefan


* Re: slow request problem
       [not found]     ` <51E29660.4040200-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2013-07-14 15:01       ` Sage Weil
       [not found]         ` <alpine.DEB.2.00.1307140753380.25223-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2013-07-14 15:01 UTC (permalink / raw)
  To: Stefan Priebe
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-Qp0mS5GaXlQ

On Sun, 14 Jul 2013, Stefan Priebe wrote:
> Hello list,
> 
> might this be a problem due to having too many PGs? I have 370 per OSD instead
> of the 33 per OSD (OSDs*100/3).

That might exacerbate it.

Can you try setting

 osd min pg log entries = 50
 osd max pg log entries = 100

across your cluster, restarting your osds, and see if that makes a 
difference?  I'm wondering if this is a problem with pg log rewrites after 
peering.  Note that adding that option and restarting isn't enough to 
trigger the trim; you have to hit the cluster with some IO too, and (if 
this is the source of your problem) the trim itself might be expensive.  
So add it, restart, do a bunch of io (to all pools/pgs if you can), and 
then see if the problem is still present?

Also note that the lower osd min pg log entries means that the osd cannot 
be down as long without requiring a backfill (50 ios per pg).  These 
probably aren't the values that we want, but I'd like to find out whether 
the pg log rewrites after peering in cuttlefish are the culprit here.

Thanks!

> Is there any plan for PG merging?

Not right now.  :(  I'll talk to Sam, though, to see how difficult it 
would be given the split approach we settled on.

Thanks!
sage



* Re: slow request problem
       [not found]         ` <alpine.DEB.2.00.1307140753380.25223-vIokxiIdD2AQNTJnQDzGJqxOck334EZe@public.gmane.org>
@ 2013-07-14 15:46           ` Stefan Priebe - Profihost AG
       [not found]             ` <10993195-49C7-44DB-B6D2-EFF484204DB3-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Priebe - Profihost AG @ 2013-07-14 15:46 UTC (permalink / raw)
  To: Sage Weil
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-Qp0mS5GaXlQ@public.gmane.org

Hi Sage,

On 14.07.2013 at 17:01, Sage Weil <sage-4GqslpFJ+cxBDgjK7y7TUQ@public.gmane.org> wrote:

> Can you try setting
> 
> osd min pg log entries = 50
> osd max pg log entries = 100

What exactly does that do? And why is a restart of all OSDs needed? Thanks!

> across your cluster, restarting your osds, and see if that makes a 
> difference?  I'm wondering if this is a problem with pg log rewrites after 
> peering.  Note that adding that option and restarting isn't enough to 
> trigger the trim; you have to hit the cluster with some IO too, and (if 
> this is the source of your problem) the trim itself might be expensive.  
> So add it, restart, do a bunch of io (to all pools/pgs if you can), and 
> then see if the problem is still present?

Will try, but I can't produce a write to every PG; it's a production cluster with KVM RBD. It does see 800-1200 IOPS, though.



* Re: slow request problem
       [not found]             ` <10993195-49C7-44DB-B6D2-EFF484204DB3-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2013-07-14 16:19               ` Sage Weil
  2013-07-14 18:26                 ` [ceph-users] " Stefan Priebe
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2013-07-14 16:19 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-Qp0mS5GaXlQ@public.gmane.org

On Sun, 14 Jul 2013, Stefan Priebe - Profihost AG wrote:
> > Can you try setting
> > 
> > osd min pg log entries = 50
> > osd max pg log entries = 100
> 
> What exactly does that do? And why is a restart of all OSDs needed? Thanks!

This limits the size of the pg log.

> 
> > across your cluster, restarting your osds, and see if that makes a 
> > difference?  I'm wondering if this is a problem with pg log rewrites after 
> > peering.  Note that adding that option and restarting isn't enough to 
> > trigger the trim; you have to hit the cluster with some IO too, and (if 
> > this is the source of your problem) the trim itself might be expensive.  
> > So add it, restart, do a bunch of io (to all pools/pgs if you can), and 
> > then see if the problem is still present?
> 
> Will try, but I can't produce a write to every PG; it's a production cluster
> with KVM RBD. It does see 800-1200 IOPS, though.

Hmm, if this is a production cluster, I would be careful, then!  Setting 
the pg logs too short can lead to backfill, which is very expensive (as 
you know).

The defaults are 3000 / 10000, so maybe try something less aggressive like 
changing min to 500?

Also, I think

 ceph osd tell \* injectargs '--osd-min-pg-log-entries 500'

should work as well.  But again, be aware that lowering the value will 
incur a trim that may in itself be a bit expensive (if this is the source 
of the problem).
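
For scripting the runtime change, the command above could be assembled as an argument list, for example in Python. This is a hedged sketch: injectargs_command is a hypothetical helper, it only builds and prints the command quoted in this message without running it, and whether the `ceph osd tell` form is accepted depends on the Ceph release in use.

```python
import shlex


def injectargs_command(option, value):
    """Build the 'ceph osd tell * injectargs' command quoted in this thread.

    The '*' is kept as a literal argument, so no shell escaping is needed
    if the list is later passed to subprocess.run() without shell=True.
    """
    return ["ceph", "osd", "tell", "*", "injectargs", f"--{option} {value}"]


cmd = injectargs_command("osd-min-pg-log-entries", 500)
# Print a shell-safe rendering for review instead of executing it.
print(shlex.join(cmd))
```

Printing first and executing only after review seems the safer order on a production cluster, given the warning about the trim cost.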

It is probably worth watching ceph pg dump | grep $some_random_pg and 
watching the 'v' column over time (say, a minute or two) to see how 
quickly pg events are being generated on your cluster. This will give you 
a sense of how much time 500 (or however many) pg log entries covers!
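
As a rough illustration of that estimate, the following Python sketch turns two samples of the 'v' column into a time window. It assumes the column has the form epoch'version (for example 812'15000), which should be checked against the actual ceph pg dump output; the function names and the sample values are made up.

```python
def parse_pg_version(v_column):
    """Return the version counter from an "epoch'version" string."""
    _epoch, _sep, version = v_column.rpartition("'")
    return int(version)


def log_coverage_seconds(v_start, v_end, interval_seconds, log_entries):
    """Estimate how many seconds `log_entries` pg log entries cover."""
    events = parse_pg_version(v_end) - parse_pg_version(v_start)
    if events <= 0:
        raise ValueError("no pg events observed in the sampling interval")
    rate = events / interval_seconds  # pg events per second for this PG
    return log_entries / rate


# Made-up samples of one PG's 'v' column, taken 60 seconds apart.
print(log_coverage_seconds("812'15000", "812'15600", 60, 500))
```

With these invented samples (600 events in 60 seconds), 500 log entries would cover about 50 seconds of writes to that PG, so an OSD that is down longer than that would need a backfill for it.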

sage




* Re: [ceph-users] slow request problem
  2013-07-14 16:19               ` Sage Weil
@ 2013-07-14 18:26                 ` Stefan Priebe
       [not found]                   ` <51E2ED49.30304-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Priebe @ 2013-07-14 18:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel@vger.kernel.org, ceph-users@ceph.com

On 14.07.2013 18:19, Sage Weil wrote:
> Hmm, if this is a production cluster, I would be careful, then!  Setting
> the pg logs too short can lead to backfill, which is very expensive (as
> you know).
>
> The defaults are 3000 / 10000, so maybe try something less aggressive like
> changing min to 500?

I've lowered the values to 500 / 1500. That seems to reduce the impact,
but it does not solve the problem.

Stefan



* Re: slow request problem
       [not found]                   ` <51E2ED49.30304-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2013-07-14 19:05                     ` Sage Weil
  2013-07-14 19:26                       ` [ceph-users] " Stefan Priebe
  0 siblings, 1 reply; 8+ messages in thread
From: Sage Weil @ 2013-07-14 19:05 UTC (permalink / raw)
  To: Stefan Priebe
  Cc: ceph-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ceph-users-Qp0mS5GaXlQ@public.gmane.org

On Sun, 14 Jul 2013, Stefan Priebe wrote:
> I've lowered the values to 500 / 1500. That seems to reduce the impact,
> but it does not solve the problem.

This suggests that the problem is the pg log rewrites that are an inherent 
part of cuttlefish.  This is replaced with improved rewrite logic in 0.66 
or so, so dumpling will be better.  I suspect that having a large number 
of pgs is exacerbating the issue for you.

We think there is still a different peering performance problem that Sam 
and paravoid have been trying to track down, but I believe in that case 
reducing the pg log sizes didn't have much effect.  (Maybe one of them can 
chime in here.)

This was unfortunately something we failed to catch before cuttlefish was 
released.  One of the main focuses right now is in creating large clusters 
and observing peering and recovery to make sure we don't repeat the same 
sort of mistake for dumpling!

sage





* Re: [ceph-users] slow request problem
  2013-07-14 19:05                     ` Sage Weil
@ 2013-07-14 19:26                       ` Stefan Priebe
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Priebe @ 2013-07-14 19:26 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel@vger.kernel.org, ceph-users@ceph.com

On 14.07.2013 21:05, Sage Weil wrote:
> This suggests that the problem is the pg log rewrites that are an inherent
> part of cuttlefish.  This is replaced with improved rewrite logic in 0.66
> or so, so dumpling will be better.  I suspect that having a large number
> of pgs is exacerbating the issue for you.
>
> We think there is still a different peering performance problem that Sam
> and paravoid have been trying to track down, but I believe in that case
> reducing the pg log sizes didn't have much effect.  (Maybe one of them can
> chime in here.)
>
> This was unfortunately something we failed to catch before cuttlefish was
> released.  One of the main focuses right now is in creating large clusters
> and observing peering and recovery to make sure we don't repeat the same
> sort of mistake for dumpling!

Thanks, Sage, for this information. Some OSD restarts went better with
the new settings, but others did not. It's hard to measure, though, since
a restart of OSD.X cannot really be compared with a restart of OSD.Y.

Do you have any recommendations for me? Wait for dumpling and hope that
nothing fails until then? Upgrade to 0.66? Or try to move all
data to a new pool with fewer PGs?

Thanks!

Greets,
Stefan

