From: "Jim Schutt" <jaschut@sandia.gov>
To: Gregory Farnum <gregory.farnum@dreamhost.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/6] Understanding delays due to throttling under very heavy write load
Date: Fri, 3 Feb 2012 16:33:58 -0700
Message-ID: <4F2C6EE6.4050008@sandia.gov>
In-Reply-To: <3032884323297001561@unknownmsgid>

On 02/03/2012 10:06 AM, Gregory Farnum wrote:
> On Feb 3, 2012, at 8:18 AM, Jim Schutt<jaschut@sandia.gov> wrote:
>
>> On 02/02/2012 05:28 PM, Gregory Farnum wrote:
>>> On Thu, Feb 2, 2012 at 12:22 PM, Jim Schutt<jaschut@sandia.gov> wrote:
>>>> I found 0 instances of "waiting for commit" in all my OSD logs for my last
>>>> run.
>>>>
>>>> So I never waited on the journal?
>>>
>>> Looks like it. Interesting.
>>>
>>>
>>>>>> So far I'm looking at two behaviours I've noticed that seem anomalous to
>>>>>> me.
>>>>>>
>>>>>> One is that I instrumented ms_dispatch(), and I see it take
>>>>>> a half-second or more several hundred times, out of several
>>>>>> thousand messages. Is that expected?
>>>>>
>>>>>
>>>>> How did you instrument it? If you wrapped the whole function it's
>>>>> possible that those longer runs are actually chewing through several
>>>>> messages that had to get waitlisted for some reason previously.
>>>>> (That's the call to do_waiters().)
>>>>
>>>>
>>>> Yep, I wrapped the whole function, and also instrumented taking osd_lock
>>>> while I was there. About half the time that ms_dispatch() takes more than
>>>> 0.5 seconds, taking osd_lock is responsible for the delay. There's two
>>>> dispatch threads, one for ops and one for rep_ops, right? So one's
>>>> waiting on the other?
>>>
>>> There's just one main dispatcher; no split for the ops and rep_ops.
>>> The reason for that "dispatch_running" is that if there are requests
>>> waiting then the tick() function will run through them if the
>>> messenger dispatch thread is currently idle.
>>> But it is possible for the Messenger to try and dispatch, and for that
>>> to be blocked while some amount of (usually trivial) work is being
>>> done by a different thread, yes. I don't think we've ever observed it
>>> being a problem for anything other than updating OSD maps, though...
>>
>> Ah, OK.
>>
>> I guess I was confused by my log output, e.g.:
>
> D'oh. Sorry, you confused me with your reference to repops, which
> aren't special-cased or anything. But there are two messengers on the
> OSD, each with their own dispatch thread. One of those messengers is
> for clients and one is for other OSDs.
>
> And now that you point that out, I wonder if the problem is lack of
> Cond signaling in ms_dispatch. I'm on my phone right now but I believe
> there's a chunk of commented-out code (why commented instead of
> deleted? I don't know) that we want to uncomment for reasons that will
> become clear when you look at it. :)
> Try that and see how many of your problems disappear?
>
So I cherry-picked Sage's commit 7641a0e171f onto the code
I've been running (1fe75ee6419 + some debug stuff), and saw
no obvious difference in behaviour.

I also tested Sage's suggestion of separating journals and
data, by putting two journal partitions on half my disks,
and two data partitions on the other half. I made the data
partitions relatively small (~200 GiB each on a 1 TiB drive)
to minimize the effect of inner vs. outer tracks.

That didn't seem to help either.

Still looking -- Jim
Thread overview: 47+ messages
2012-02-01 15:54 [RFC PATCH 0/6] Understanding delays due to throttling under very heavy write load Jim Schutt
2012-02-01 15:54 ` [RFC PATCH 1/6] msgr: print message sequence number and tid when receiving message envelope Jim Schutt
2012-02-01 15:54 ` [RFC PATCH 2/6] common/Throttle: track sleep/wake sequences in Throttle, report them for policy throttler Jim Schutt
2012-02-01 15:54 ` [RFC PATCH 3/6] common/Throttle: throttle in FIFO order Jim Schutt
2012-02-02 17:53 ` Gregory Farnum
2012-02-02 18:31 ` Jim Schutt
2012-02-02 19:01 ` Gregory Farnum
2012-02-01 15:54 ` [RFC PATCH 4/6] common/Throttle: FIFO throttler doesn't need to signal waiters when max changes Jim Schutt
2012-02-01 15:54 ` [RFC PATCH 5/6] common/Throttle: make get() report number of waiters on entry/exit Jim Schutt
2012-02-01 15:54 ` [RFC PATCH 6/6] msg: log Message interactions with throttler Jim Schutt
2012-02-01 22:33 ` [RFC PATCH 0/6] Understanding delays due to throttling under very heavy write load Gregory Farnum
2012-02-02 15:38 ` Jim Schutt
[not found] ` <4F29CDAA.408@sandia.gov>
[not found] ` <CAF3hT9BZEP_FWS=qt8ivA++aDpPGGFzuD_PtMcvDRS2aDEN+hw@mail.gmail.com>
[not found] ` <4F2AABF5.6050803@sandia.gov>
2012-02-02 17:52 ` Gregory Farnum
2012-02-02 19:06 ` [EXTERNAL] " Jim Schutt
2012-02-02 19:15 ` Sage Weil
2012-02-02 19:33 ` Jim Schutt
2012-02-02 19:32 ` Gregory Farnum
2012-02-02 20:22 ` Jim Schutt
2012-02-02 20:31 ` Jim Schutt
2012-02-03 0:28 ` [EXTERNAL] " Gregory Farnum
2012-02-03 16:17 ` Jim Schutt
2012-02-03 17:06 ` Gregory Farnum
2012-02-03 23:33 ` Jim Schutt [this message]
[not found] ` <CAC-hyiHSNv_VgLcyVCrJ66HxTGFNBONrmmBddJk5326dLTKgkw@mail.gmail.com>
2012-02-04 0:04 ` Yehuda Sadeh Weinraub
2012-02-06 16:20 ` Jim Schutt
2012-02-06 17:22 ` Yehuda Sadeh Weinraub
2012-02-06 18:20 ` Jim Schutt
2012-02-06 18:35 ` Gregory Farnum
2012-02-09 20:53 ` Jim Schutt
2012-02-09 22:40 ` sridhar basam
2012-02-09 23:15 ` Jim Schutt
2012-02-10 0:34 ` Tommi Virtanen
2012-02-10 1:26 ` sridhar basam
2012-02-10 15:32 ` [EXTERNAL] " Jim Schutt
2012-02-10 17:13 ` sridhar basam
2012-02-10 23:09 ` Jim Schutt
2012-02-11 0:05 ` sridhar basam
2012-02-13 15:26 ` Jim Schutt
2012-02-03 17:07 ` Sage Weil
2012-02-24 15:38 ` Jim Schutt
2012-02-24 18:31 ` Tommi Virtanen
2012-02-24 18:38 ` Tommi Virtanen
2013-02-21 0:12 ` Sage Weil
2013-02-26 19:16 ` Jim Schutt
2013-02-26 19:36 ` Sage Weil
2013-02-28 19:37 ` Jim Schutt
2013-02-28 21:06 ` Sage Weil