From: Mark Nelson <mark.nelson@inktank.com>
To: Maciej Bonin <maciej.bonin@m247.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: Flapping osd / continuously reported as failed
Date: Fri, 24 Jan 2014 07:36:02 -0600
Message-ID: <52E26C42.6090806@inktank.com>
In-Reply-To: <loom.20140124T132806-603@post.gmane.org>

On 01/24/2014 06:29 AM, Maciej Bonin wrote:
> Gregory Farnum <greg@...> writes:
>
>>
>> On Mon, Aug 19, 2013 at 3:09 PM, Mostowiec Dominik
>> <Dominik.Mostowiec@...> wrote:
>>> Hi,
>>>> Yes, it definitely can, as scrubbing takes locks on the PG, which will
>>>> prevent reads or writes while the message is being processed (which
>>>> will involve the rgw index being scanned).
>>> Is it possible to tune the scrubbing config to eliminate slow requests
>>> and OSDs being marked down while a large rgw bucket index is being
>>> scrubbed?
>>
>> Unfortunately not, or we would have mentioned it before. :/ There are
>> some proposals for sharding bucket indexes that would ameliorate this
>> problem, and on Cuttlefish or Dumpling the OSD won't get marked down,
>> but it will still block incoming requests on that object (i.e., requests
>> to access the bucket) while the scrub is in progress.
>> That said, that improvement might be sufficient since you haven't
>> actually shown us how long the object scrub takes.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>
>
> Hello Guys,
>
> I just wanted to share that we had a similar problem and solved it by
> borrowing sensible kernel option defaults from a radosgw patch, IIRC.
> net.ipv4.ip_local_port_range = 1024 65535
> net.core.netdev_max_backlog = 30000
> net.core.somaxconn = 4096
> net.ipv4.tcp_max_syn_backlog = 252144
> net.ipv4.tcp_max_tw_buckets = 360000
> net.ipv4.tcp_fin_timeout = 3
> net.ipv4.tcp_max_orphans = 262144
> net.ipv4.tcp_synack_retries = 2
> net.ipv4.tcp_syn_retries = 2
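The settings above use sysctl(8) key names, while the echo commands further down write straight into /proc/sys; the two forms address the same kernel parameters, with each dot in the key becoming a slash in the path. As a rough sketch, assuming a POSIX shell (the helper names `sysctl_path` and `to_tee_commands` are hypothetical, and interface names that themselves contain dots, such as VLAN devices, are not handled), the quoted list can be turned into tee commands for review without applying anything:

```shell
#!/bin/sh
# Hypothetical helper: map a sysctl(8) key such as net.ipv4.tcp_fin_timeout
# to the /proc/sys file backing it (each dot becomes a slash).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s\n' "$1" | tr '.' '/')"
}

# Hypothetical helper: read "key = value" lines on stdin and print the
# matching "echo VALUE | sudo tee PATH" commands. Nothing is applied.
to_tee_commands() {
    while IFS='=' read -r key value; do
        # strip blanks around the key; skip empty lines
        key=$(printf '%s\n' "$key" | tr -d ' \t')
        [ -z "$key" ] && continue
        # trim leading/trailing whitespace but keep inner spaces ("1024 65535")
        value=$(printf '%s\n' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        printf 'echo "%s" | sudo tee %s\n' "$value" "$(sysctl_path "$key")"
    done
}

# Example with two of the settings quoted above
to_tee_commands <<'EOF'
net.ipv4.ip_local_port_range = 1024 65535
net.core.somaxconn = 4096
EOF
```

Review the generated lines before piping them to sh: they need root, take effect immediately, and do not survive a reboot.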

FWIW, these may not strictly help with the situation you described, but
they at least helped improve RGW performance in general on our 10GbE+
test cluster:

echo 33554432 | sudo tee /proc/sys/net/core/rmem_default
echo 33554432 | sudo tee /proc/sys/net/core/wmem_default
echo 33554432 | sudo tee /proc/sys/net/core/rmem_max
echo 33554432 | sudo tee /proc/sys/net/core/wmem_max
echo "10240 87380 33554432" | sudo tee /proc/sys/net/ipv4/tcp_rmem
echo "10240 87380 33554432" | sudo tee /proc/sys/net/ipv4/tcp_wmem
echo 250000 | sudo tee /proc/sys/net/core/netdev_max_backlog
echo 524288 | sudo tee /proc/sys/net/nf_conntrack_max
echo 1 | sudo tee /proc/sys/net/ipv4/tcp_tw_recycle
echo 1 | sudo tee /proc/sys/net/ipv4/tcp_tw_reuse
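Not every key in that list exists on every kernel: net.ipv4.tcp_tw_recycle was removed in Linux 4.12, and nf_conntrack_max only appears once the conntrack module is loaded, so it can help to read the values back after writing them. A minimal sketch, assuming a POSIX shell; `check_setting` and the `PROC_ROOT` override are hypothetical names introduced here for illustration:

```shell
#!/bin/sh
# Hypothetical read-back check for values written under /proc/sys.
# PROC_ROOT defaults to /proc/sys; it can point at a scratch directory for testing.
PROC_ROOT=${PROC_ROOT:-/proc/sys}

# check_setting REL WANT
#   prints OK / DIFF / MISSING and returns 0 / 1 / 2 respectively
check_setting() {
    rel=$1
    want=$2
    file="$PROC_ROOT/$rel"
    if [ ! -r "$file" ]; then
        # e.g. net/ipv4/tcp_tw_recycle on Linux >= 4.12
        echo "MISSING $rel"
        return 2
    fi
    # /proc prints multi-value keys such as tcp_rmem tab-separated,
    # so normalise both sides to single spaces before comparing
    have=$(tr '\t' ' ' < "$file" | tr -s ' ')
    want=$(printf '%s\n' "$want" | tr '\t' ' ' | tr -s ' ')
    if [ "$have" = "$want" ]; then
        echo "OK      $rel = $have"
        return 0
    fi
    echo "DIFF    $rel: have '$have', want '$want'"
    return 1
}

# Example: a few of the values from the echo commands above
status=0
while read -r rel want; do
    check_setting "$rel" "$want" || status=1
done <<'EOF'
net/core/rmem_max 33554432
net/ipv4/tcp_rmem 10240 87380 33554432
net/ipv4/tcp_tw_reuse 1
EOF
echo "all settings match: $([ "$status" -eq 0 ] && echo yes || echo no)"
```

Reading /proc/sys needs no root, so the check can run as an ordinary user after the tee commands have been applied.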

>
>
> Regards,
> Maciej Bonin
> Systems Engineer
> m247.com
> ISO 27001 Data Protection Classification: A - Public

