From: Andreas Bluemle <andreas.bluemle@itxperts.de>
To: ceph-devel@vger.kernel.org
Subject: SimpleMessenger dispatching: cause of performance problems?
Date: Thu, 16 Aug 2012 18:08:39 +0200
Message-ID: <502D1B07.3030101@itxperts.de>

Hi,

I have been trying to migrate a ceph cluster (ceph-0.48argonaut)
to a high speed cluster network and encounter scalability problems:
the overall performance of the ceph cluster does not scale well
with an increase in the underlying networking speed.

In short:

I believe that the dispatching from SimpleMessenger to
OSD worker queues causes that scalability issue.

Question: is it possible that this dispatching is causing performance 
problems?


In detail:

In order to find out more about this problem, I have added profiling to
the ceph code in various places; for write operations to the primary or the
secondary, timestamps are recorded for the OSD object, offset and length of
each such write request.

Timestamps record:
  - receipt time at SimpleMessenger
  - processing time at osd
  - for primary write operations: wait time until replication operation
    is acknowledged.
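
A minimal sketch of what such a per-request record could look like. It is
written in Python for brevity rather than in the C++ of the ceph code base,
and every class and field name in it is hypothetical; none of them is taken
from the ceph source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteTrace:
    """Hypothetical profiling record for one write request (times in seconds)."""
    obj: str            # OSD object name
    offset: int         # write offset within the object
    length: int         # write length in bytes
    recv_start: float   # SimpleMessenger starts reading the message header
    recv_end: float     # decoded message is put on the dispatch queue
    osd_start: float    # OSD processing starts
    osd_end: float      # local OSD processing has finished
    repl_ack: Optional[float] = None  # replica commit arrived (primary writes only)

    def receive_time(self) -> float:
        return self.recv_end - self.recv_start

    def processing_time(self) -> float:
        return self.osd_end - self.osd_start

    def replication_wait(self) -> Optional[float]:
        # Only primary writes wait for a replica; clamp at zero in case the
        # acknowledgement arrived before local processing finished.
        if self.repl_ack is None:
            return None
        return max(0.0, self.repl_ack - self.osd_end)
```

The gap between recv_end and osd_start is not stored separately; it can be
derived from the record when needed.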

What I believe is happening: dispatching requests from SimpleMessenger to
OSD worker threads seems to consume a fair amount of time. This results
in a widening gap between the receipt of successive requests and the start
of their processing by the OSD.
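
To illustrate why such a gap would widen rather than stay constant, here is
a toy model (in Python, not ceph code). It assumes a single dispatcher that
handles requests strictly in order and spends a fixed time on each one; both
the function name and that fixed cost are assumptions made for illustration,
not measurements.

```python
def dispatch_delays(arrivals, dispatch_cost):
    """Delay between receipt of each request and its hand-off to a worker.

    Assumes one dispatcher, strict FIFO order, and a fixed per-request
    dispatch cost (a simplification of the real behaviour).
    """
    delays = []
    free_at = 0.0  # time at which the dispatcher is idle again
    for t in arrivals:
        start = max(t, free_at)        # wait if the dispatcher is still busy
        free_at = start + dispatch_cost
        delays.append(free_at - t)
    return delays


arrivals = [0.0, 1.0, 2.0, 3.0, 4.0]    # one request per time unit
print(dispatch_delays(arrivals, 1.5))   # prints [1.5, 2.0, 2.5, 3.0, 3.5]
print(dispatch_delays(arrivals, 0.5))   # prints [0.5, 0.5, 0.5, 0.5, 0.5]
```

As soon as the per-request dispatch cost exceeds the interval between
arrivals, the delay grows with every request. A faster network shortens the
interval between arrivals but not the dispatch cost, which would be
consistent with the poor scaling described above.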

A primary write suffers twice from this problem: first because
the delay happens on the primary OSD and second because the replicating
OSD also suffers from the same problem - and hence causes additional delays
at the primary OSD when it waits for the commit from the replicating OSD.
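
The double penalty can be put into a rough formula. The sketch below is a
simplified model under my own assumptions: the dispatch delay is the same on
both OSDs, and the replication operation is sent the moment processing
starts on the primary. The function and parameter names are hypothetical.

```python
def primary_commit_latency(dispatch_delay, local_time, net_rtt, replica_time):
    """Time from receipt on the primary until the write can be committed.

    All arguments use the same time unit. Simplifying assumptions: identical
    dispatch delay on primary and replica, and the replication operation
    leaves the primary as soon as its OSD processing starts.
    """
    local_done = dispatch_delay + local_time
    # The replica path pays the dispatch delay a second time.
    replica_ack = dispatch_delay + net_rtt + dispatch_delay + replica_time
    return max(local_done, replica_ack)


print(primary_commit_latency(0, 5, 1, 5))  # prints 6
print(primary_commit_latency(2, 5, 1, 5))  # prints 10
```

In this model a dispatch delay of 2 units adds 4 units to the commit latency
of a primary write, because the delay lies on the critical path once on each
OSD.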

In the attached graphic, the x-axis shows the time (in seconds).
The y-axis shows the offset at which a write request happened.

The red bar represents the SimpleMessenger receive, i.e. from reading
the message header until enqueuing the completely decoded message into
the SimpleMessenger dispatch queue.

The green bar represents the time required for local processing, i.e.
dispatching to the OSD worker, writing to filesystem and journal, and
sending out the replication operation to the replicating OSD. The right
end of the green bar is the time when everything has finished locally
and a commit could happen.

The blue bar represents the time until the replicating OSD has sent a commit
back to the primary OSD and the original write request can be committed to
the client.

The green bar is interrupted by a black bar: the left end represents
the time when the request has been enqueued on the OSD worker queue. The
right end gives the time when the request is taken off the OSD worker
queue and actual OSD processing starts.

The test was a simple sequential write to a rados block device.

Reception of the write requests at the OSD is also sequential in the
graphic: a bar closer to the bottom of the graphic represents an earlier
write request.

Note that the dispatching of a later request in all cases relates to the
enqueue time at the OSD worker queue of the previous write request: the left
end of a black bar relates nicely to the beginning of a green bar above it.
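
This relation could also be checked numerically instead of by eye. The
following Python sketch counts how many consecutive request pairs show it;
the function name, the argument layout (two lists in receipt order) and the
tolerance are assumptions made for illustration.

```python
def count_serialized_pairs(worker_enqueue, green_start, tolerance):
    """Count pairs where request i+1's green bar starts when request i
    is enqueued on the OSD worker queue (left end of its black bar).

    Both lists hold one timestamp per request, in receipt order and in the
    same time unit as `tolerance`.
    """
    hits = 0
    for prev_enqueue, next_start in zip(worker_enqueue, green_start[1:]):
        if abs(next_start - prev_enqueue) <= tolerance:
            hits += 1
    return hits


# Made-up timestamps in microseconds, purely to show the call.
enqueue = [100, 250, 400]
start = [10, 102, 251]
print(count_serialized_pairs(enqueue, start, tolerance=5))  # prints 2
```

A count close to the number of request pairs would support the
interpretation that the dispatching of each request waits for the previous
request to be enqueued.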



-- 
Andreas Bluemle                     mailto:Andreas.Bluemle@itxperts.de
ITXperts GmbH                       http://www.itxperts.de
Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm


[-- Attachment #2: dispatch-detail.png --]
[-- Type: image/png, Size: 7157 bytes --]