From: "Jim Schutt" <jaschut@sandia.gov>
To: Sage Weil <sage@inktank.com>
Cc: Gregory Farnum <greg@inktank.com>,
Joao Eduardo Luis <joao.luis@inktank.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: Trouble getting a new file system to start, for v0.59 and newer
Date: Thu, 4 Apr 2013 13:14:20 -0600
Message-ID: <515DD10C.2030602@sandia.gov>
In-Reply-To: <515CAFED.60705@sandia.gov>
On 04/03/2013 04:40 PM, Jim Schutt wrote:
> On 04/03/2013 12:25 PM, Sage Weil wrote:
>>>> Sorry, guess I forgot some of the history since this piece at least is
>>>> resolved now. I'm surprised if 30-second timeouts are causing issues
>>>> without those overloads you were seeing; have you seen this issue
>>>> without your high debugging levels and without the bad PG commits (due
>>>> to debugging)?
>>>
>>> I think so, because that's why I started with higher debugging
>>> levels.
>>>
>>> But, as it turns out, I'm just in the process of returning to my
>>> testing of next, with all my debugging back to 0. So, I'll try
>>> the default timeout of 30 seconds first. If I have trouble starting
>>> up a new file system, I'll turn up the timeout and try again, without
>>> any extra debugging. Either way, I'll let you know what happens.
>>
>> I would be curious to hear roughly what value between 30 and 300 is
>> sufficient, if you can experiment just a bit. We probably want to adjust
>> the default.
>>
>> Perhaps more importantly, we'll need to look at the performance of the pg
>> stat updates on the mon. There is a refactor due in that code that should
>> improve life, but it's slated for dumpling.
> OK, here's some results, with all debugging at 0, using current next...
>
> My testing is for 1 mon + 576 OSDs, 24/host. All my storage cluster hosts
> use 10 GbE NICs now. The mon host uses an SSD for the mon data store.
> My test procedure is to start 'ceph -w', start all the OSDs, and once
> they're all running start the mon. I report the time from starting
> the mon to all PGs active+clean.
As a sanity check, to be sure I wasn't doing something differently
now than I remember doing before, I re-ran this test for v0.57,
v0.58, and v0.59, using default 'osd mon ack timeout', default
'paxos propose interval', and no debugging:

          55,392 PGs    221,568 PGs
  v0.57   1m 07s        9m 42s
  v0.58   1m 04s        11m 44s
  v0.59   >30m          not attempted

The v0.57/v0.58 runs showed no signs of stress, e.g. no
slow op reports, etc.
The v0.59 run behaved as I previously reported, i.e.,
lots of stale peers, OSDs wrongly marked down, etc.,
before I gave up on it.
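
For anyone comparing these numbers, here is a minimal sketch (Python, not part of the test setup described above) that converts the v0.57/v0.58 timings to seconds and computes how startup time grew when the PG count was quadrupled. The names `to_seconds`, `TIMINGS`, and `summarize` are made up for illustration; only the timings and PG counts come from the table.

```python
import re

# PG counts and timings copied from the results table; v0.59 is omitted
# because it never reached active+clean within 30 minutes.
PG_COUNTS = (55_392, 221_568)
TIMINGS = {
    "v0.57": ("1m 07s", "9m 42s"),
    "v0.58": ("1m 04s", "11m 44s"),
}


def to_seconds(text):
    """Convert a duration written like '9m 42s' to whole seconds."""
    m = re.fullmatch(r"(\d+)m\s+(\d+)s", text.strip())
    if m is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(m.group(1)) * 60 + int(m.group(2))


def summarize():
    """Return {version: (small_secs, large_secs, slowdown_factor)}."""
    rows = {}
    for version, (small, large) in TIMINGS.items():
        s, l = to_seconds(small), to_seconds(large)
        rows[version] = (s, l, l / s)
    return rows


if __name__ == "__main__":
    print(f"PG count ratio: {PG_COUNTS[1] / PG_COUNTS[0]:.1f}x")
    for version, (s, l, factor) in summarize().items():
        print(f"{version}: {s}s -> {l}s ({factor:.1f}x longer)")
```

On these figures, a 4x increase in PG count took roughly 8.7x longer on v0.57 and 11x longer on v0.58, so startup time grew faster than linearly in the number of PGs even on the releases that behaved well.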
-- Jim