Re: perf counters from a performance discrepancy


From: Mark Nelson <mnelson@redhat.com>
To: Gregory Farnum <gfarnum@redhat.com>, Sage Weil <sage@newdream.net>
Cc: "Deneau, Tom" <tom.deneau@amd.com>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: perf counters from a performance discrepancy
Date: Wed, 23 Sep 2015 13:42:32 -0500
Message-ID: <5602F298.7060609@redhat.com>
In-Reply-To: <CAJ4mKGavY4ZdutpjnKduQ1AcpCpHDeWST8NyOa_RT19W+DvrNQ@mail.gmail.com>



On 09/23/2015 01:25 PM, Gregory Farnum wrote:
> On Wed, Sep 23, 2015 at 11:19 AM, Sage Weil <sage@newdream.net> wrote:
>> On Wed, 23 Sep 2015, Deneau, Tom wrote:
>>> Hi all --
>>>
>>> Looking for guidance with perf counters...
>>> I am trying to see whether the perf counters can tell me anything about the following discrepancy
>>>
>>> I populate a number of 40k size objects in each of two pools, poolA and poolB.
>>> Both pools cover osds on a single node, 5 osds total.
>>>
>>>     * Config 1 (1p):
>>>        * use single rados bench client with 32 threads to do seq read of 20000 objects from poolA.
>>>
>>>     * Config 2 (2p):
>>>        * use two concurrent rados bench clients (running on same client node) with 16 threads each,
>>>             one reading 10000 objects from poolA,
>>>             one reading 10000 objects from poolB,
>>>
>>> So in both configs, we have 32 threads total and the number of objects read is the same.
>>> Note: in all cases, we drop the caches before doing the seq reads
>>>
>>> The combined bandwidth (MB/sec) for the 2 clients in config 2 is about 1/3 of the bandwidth for
>>> the single client in config 1.
>>
>> How were the objects written?  I assume the cluster is backed by spinning
>> disks?
>>
>> I wonder if this is a disk layout issue.  If the 20,000 objects are
>> written in order, they will be roughly sequential on disk, and the 32
>> thread case will read them in order.  In the 2x 10,000 case, the two
>> clients are reading two sequences of objects written at different
>> times, and the disk arms will be swinging around more.
>>
>> My guess is that if the reads were reading the objects in a random order
>> the performance would be the same... I'm not sure that rados bench does
>> that though?
>>
>> sage
>>
>>>
>>>
>>> I gathered perf counters before and after each run and looked at the difference of
>>> the before and after counters for both the 1p and 2p cases.  Here are some things I noticed
>>> that are different between the two runs.  Can someone take a look and let me know
>>> whether any of these differences are significant.  In particular, for the
>>> throttle-msgr_dispatch_throttler ones, since I don't know the detailed definitions of these fields.
>>> Note: these are the numbers for one of the 5 osds, the other osds are similar...
>>>
>>> * The field osd/loadavg is always about 3 times higher on the 2p c
>>>
>>> some latency-related counters
>>> ------------------------------
>>> osd/op_latency/sum 1p=6.24801117205061, 2p=579.722513078945
>>> osd/op_process_latency/sum 1p=3.48506945394911, 2p=42.6278494549915
>>> osd/op_r_latency/sum 1p=6.2480111719924, 2p=579.722513079003
>>> osd/op_r_process_latency/sum 1p=3.48506945399276, 2p=42.6278494550061
>
> So, yep, the individual read ops are taking much longer in the
> two-client case. Naively that's pretty odd.
>
>>>
>>>
>>> and some throttle-msgr_dispatch_throttler related counters
>>> ----------------------------------------------------------
>>> throttle-msgr_dispatch_throttler-client/get 1p=1337, 2p=1339, diff=2
>>> throttle-msgr_dispatch_throttler-client/get_sum 1p=222877, 2p=223088, diff=211
>>> throttle-msgr_dispatch_throttler-client/put 1p=1337, 2p=1339, diff=2
>>> throttle-msgr_dispatch_throttler-client/put_sum 1p=222877, 2p=223088, diff=211
>>> throttle-msgr_dispatch_throttler-hb_back_server/get 1p=58, 2p=134, diff=76
>>> throttle-msgr_dispatch_throttler-hb_back_server/get_sum 1p=2726, 2p=6298, diff=3572
>>> throttle-msgr_dispatch_throttler-hb_back_server/put 1p=58, 2p=134, diff=76
>>> throttle-msgr_dispatch_throttler-hb_back_server/put_sum 1p=2726, 2p=6298, diff=3572
>>> throttle-msgr_dispatch_throttler-hb_front_server/get 1p=58, 2p=134, diff=76
>>> throttle-msgr_dispatch_throttler-hb_front_server/get_sum 1p=2726, 2p=6298, diff=3572
>>> throttle-msgr_dispatch_throttler-hb_front_server/put 1p=58, 2p=134, diff=76
>>> throttle-msgr_dispatch_throttler-hb_front_server/put_sum 1p=2726, 2p=6298, diff=3572
>>> throttle-msgr_dispatch_throttler-hbclient/get 1p=168, 2p=252, diff=84
>>> throttle-msgr_dispatch_throttler-hbclient/get_sum 1p=7896, 2p=11844, diff=3948
>>> throttle-msgr_dispatch_throttler-hbclient/put 1p=168, 2p=252, diff=84
>>> throttle-msgr_dispatch_throttler-hbclient/put_sum 1p=7896, 2p=11844, diff=3948
>
> IIRC these are just saying how many times the dispatch throttler was
> accessed on each messenger — nothing here is surprising, you're doing
> basically the same number of messages on the client messengers, and
> the heartbeat messengers are passing more because the test takes
> longer.
>
> I'd go with Sage's idea for what is actually causing this, or try and
> look at how the latency changes over time — if you're going to two
> pools instead of one, presumably you're doubling the amount of
> metadata that needs to be read into memory during the run? Perhaps
> that's just a significant enough effect with your settings that you're
> seeing a bunch of extra directory lookups impact your throughput more
> than expected... :/
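
As a rough check on the throttle counters quoted above, dividing each
get_sum by its get count gives the mean amount taken per throttler access.
This is only a sketch: it assumes get_sum accumulates message sizes in
bytes, which the thread does not confirm, and the numbers are copied from
the quoted table for one OSD.

```python
# (get, get_sum) deltas copied from the quoted table, one OSD only.
# Assumption: get_sum is a byte total, so get_sum / get is a mean size.
counters = {
    "client":          {"1p": (1337, 222877), "2p": (1339, 223088)},
    "hb_back_server":  {"1p": (58, 2726),     "2p": (134, 6298)},
    "hb_front_server": {"1p": (58, 2726),     "2p": (134, 6298)},
    "hbclient":        {"1p": (168, 7896),    "2p": (252, 11844)},
}


def mean_per_get(get, get_sum):
    """Mean amount requested from the throttler per access."""
    return get_sum / get


for name, runs in counters.items():
    m1 = mean_per_get(*runs["1p"])
    m2 = mean_per_get(*runs["2p"])
    count_ratio = runs["2p"][0] / runs["1p"][0]
    print(f"{name:16s} mean 1p={m1:7.2f} 2p={m2:7.2f} "
          f"count ratio 2p/1p={count_ratio:.2f}")
```

The per-access mean is unchanged between the two runs on every messenger
(exactly 47 on the heartbeat ones); only the heartbeat counts grow, which
fits the explanation that the 2p test simply ran longer.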

FWIW, typically if I've seen an effect, it's been the opposite where 
multiple rados bench processes are slightly faster (maybe simply related 
to the client side implementation).  Running collectl or iostat would 
show various interval statistics that would help diagnose if this is 
related to slower accesses on the disks.  blktrace of course would give 
a more nuanced view.  Might be worth doing if there are extra metadata 
accesses.
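
Before collecting iostat or blktrace data, the latency sums quoted above
can be split into a processing part and a remainder.  This is a sketch
under two assumptions that are not stated in the thread: that the sums are
in seconds, and that op_latency minus op_process_latency approximates time
an op spends outside processing (for example, waiting in a queue).

```python
# Latency sums copied from the quoted counters (one OSD).
# Assumptions: units are seconds; op_latency - op_process_latency is
# roughly the time ops spent outside processing (e.g. queued).
sums = {
    "op_latency":         {"1p": 6.24801117205061, "2p": 579.722513078945},
    "op_process_latency": {"1p": 3.48506945394911, "2p": 42.6278494549915},
}


def ratio(counter):
    """How many times larger the 2p sum is than the 1p sum."""
    return sums[counter]["2p"] / sums[counter]["1p"]


def non_processing(run):
    """Portion of total op latency not accounted for by processing."""
    return sums["op_latency"][run] - sums["op_process_latency"][run]


for counter in sums:
    print(f"{counter:20s} 2p/1p = {ratio(counter):.1f}x")
for run in ("1p", "2p"):
    print(f"{run}: non-processing remainder = {non_processing(run):.2f}")
```

If those assumptions hold, total op latency grows far more than processing
latency does, so most of the extra time would be spent before the op is
processed rather than in the processing itself.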

A couple of other random thoughts:

- Are any of the reads coming from buffer cache?
- Readahead not working well?
- Were the pools recreated between tests?
- If not, what were the total numbers of objects and PGs for each pool?
(Is it possible that the per-PG directory hierarchy was deeper for the
2nd set of tests?)
- If the pools still exist, what does the following script say about them?

https://github.com/ceph/cbt/blob/master/tools/readpgdump.py
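
To illustrate the directory-hierarchy question above, here is a rough
estimate of how deep a PG's directory tree gets for a given object count.
It is a sketch only, and everything in it is an assumption: that the OSDs
use FileStore, that a directory splits into 16 children once it holds more
than filestore_split_multiple * abs(filestore_merge_threshold) * 16
objects, that the defaults are 2 and 10, and that objects spread evenly
over PGs.  The pg_num values are hypothetical, since the thread does not
give them.

```python
def split_threshold(split_multiple=2, merge_threshold=10):
    """Objects per directory before a split (assumed FileStore rule)."""
    return split_multiple * abs(merge_threshold) * 16


def estimated_depth(objects_in_pool, pg_num, threshold=None):
    """Estimate directory levels below a PG, assuming an even spread.

    Each split is modelled as dividing a directory's objects over 16
    subdirectories; real placement is hash-based and less uniform.
    """
    if threshold is None:
        threshold = split_threshold()
    per_dir = objects_in_pool / pg_num
    depth = 0
    while per_dir > threshold:
        per_dir /= 16
        depth += 1
    return depth


# Hypothetical pg_num values, for illustration only.
for pg_num in (8, 64):
    print(f"20000 objects, pg_num={pg_num}: "
          f"estimated depth {estimated_depth(20000, pg_num)}")
```

Under these assumptions, a pool with few PGs or leftover objects from
earlier runs could sit one level deeper than a freshly created one, which
would mean extra directory lookups per read.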


> -Greg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>