From: Wido den Hollander <wido@widodh.nl>
To: Yann ROBIN <yann.robin@youscribe.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Osd load and timeout when recovering
Date: Mon, 22 Oct 2012 14:02:36 +0200
Message-ID: <508535DC.5070506@widodh.nl>
In-Reply-To: <367CC0B0FC02EE47BF7482398BD623C53F6582FE@DB3PRD0311MB416.eurprd03.prod.outlook.com>
On 10/22/2012 12:27 PM, Yann ROBIN wrote:
> Hi,
>
> We use Ceph to store small files (lots of them) on different servers and access them through the RADOS gateway.
> Our data size is 380 GB (very small). We have two hosts with 5 OSDs each.
> We use a small config for Ceph: servers with 2 GB of RAM and 5 x 2 TB disks (one OSD on each disk).
> This is a very cheap config that allows us to keep our storage costs under control, and it's enough to get the read performance we need.
> (We use this config with MogileFS to store 150 TB of data.)
>
> This weekend we had an alert saying Ceph was down.
>
> After looking at the OSDs, we saw a very high load on them (a load of 450); some were down.
> ceph -s showed that we had down PGs, peering+down PGs, remapped PGs, etc.
>
Could you tell us a bit more?
When the load was 450, was this mainly due to disk I/O wait?
Did the machines start to swap?
Could it be that the swapping was actually causing the machines to die
even more?
Although an OSD can run with 100 MB of memory, during recovery its
memory usage can grow quite fast.
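
To make the two questions above concrete, here is a minimal Python sketch
for checking I/O wait and swap usage on an OSD host. It is only an
illustration, not something from this thread: the helper names
iowait_share and swap_used_kib are hypothetical, and it assumes a Linux
host where /proc/stat and /proc/meminfo have their usual layout.

```python
def iowait_share(stat_before, stat_after):
    """Fraction of CPU time spent in iowait between two /proc/stat snapshots.

    Both arguments are the full text of /proc/stat, read a few seconds apart.
    """
    def cpu_fields(text):
        # The aggregate line starts with the bare token "cpu" (not "cpu0", ...).
        for line in text.splitlines():
            parts = line.split()
            if parts and parts[0] == "cpu":
                return [int(value) for value in parts[1:]]
        raise ValueError("no aggregate 'cpu' line found")

    before = cpu_fields(stat_before)
    after = cpu_fields(stat_after)
    # Field order: user nice system idle iowait irq softirq steal [guest ...].
    # Guest time is already counted in user/nice, so only the first 8 are summed.
    deltas = [a - b for a, b in zip(after, before)][:8]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0


def swap_used_kib(meminfo_text):
    """Swap in use (KiB), computed as SwapTotal - SwapFree from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values["SwapTotal"] - values["SwapFree"]
```

Feeding iowait_share two reads of /proc/stat taken a few seconds apart
gives the share of CPU time spent waiting on disk over that interval; a
value of swap_used_kib that keeps growing during recovery would suggest
the OSDs are being pushed into swap.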
> So we started to see that during peering and similar activity, the load was very high.
> OSDs stopped responding, and we could see messages in the log like:
> FileStore timeout and Abort Signal
>
> So basically the cluster was under load because we were recovering... but because it was under load, recovery could not complete.
>
FileStore aborts indicate that it couldn't get the work done quickly
enough. I've seen this with btrfs, but you say you are using XFS.
You say you are storing small files. What exactly is "small"?
Wido
> We changed these parameters to get longer timeouts:
> filestore op thread suicide timeout = 360
> filestore op thread timeout = 180
> osd default notify timeout = 360
>
> The cluster was still under heavy load, and OSDs were still timing out (less often, but still).
>
> So we tested parameters to "throttle" the recovery process:
> filestore op threads = 6
> filestore queue max ops = 24
> osd recovery max active = 1
>
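
For reference, the timeout and throttle values quoted above can be
collected into one ceph.conf-style section. The Python sketch below
renders them with the standard configparser module. Putting them under
[osd] is an assumption on my part (the message does not say which
section was used), and render_osd_section is a hypothetical helper name.

```python
import configparser
import io

# Values exactly as quoted in this thread.
# Assumption: they live in the [osd] section of ceph.conf.
RECOVERY_SETTINGS = {
    "filestore op thread suicide timeout": 360,
    "filestore op thread timeout": 180,
    "osd default notify timeout": 360,
    "filestore op threads": 6,
    "filestore queue max ops": 24,
    "osd recovery max active": 1,
}


def render_osd_section(settings):
    """Return the settings as INI text under a single [osd] section."""
    parser = configparser.ConfigParser()
    parser["osd"] = {key: str(value) for key, value in settings.items()}
    buffer = io.StringIO()
    parser.write(buffer)  # writes "key = value" lines below "[osd]"
    return buffer.getvalue()
```

Calling render_osd_section(RECOVERY_SETTINGS) returns text that starts
with "[osd]" followed by one "key = value" line per setting, which makes
it easy to diff against what is actually deployed on each host.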
> Load was better, but still very high (30).
>
> We also tried putting the journal in a tmpfs with zram.
> We set noout so it wouldn't copy files to satisfy the replica count because OSDs were out.
>
> We then updated to kernel 3.5 to get the latest XFS optimizations.
>
> In the end nothing was working; we were in the same infinite death loop of recovering => load => timeout => recovering.
> So we updated from Ceph 0.48.2 to 0.53; load was better and recovery finally worked.
>
> As we don't want to be in this position again (24h downtime), I have some questions on Ceph/RADOS.
>
> 1/ Even after we switched to Ceph 0.53, the RADOS gateway was still not responding; the log was displaying "Initialization timeout".
> Is it normal that the recovery process prevents us from reading data from Ceph?
> The data is there, it is just moving; why can't we access it?
>
> 2/ When the load is very high because Ceph is moving data, is there a way to tell Ceph to go slowly?
>
>
> Thanks,
>
> --
> Yann ROBIN
> Société Publica
> www.YouScribe.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>