From: "Jim Schutt" <jaschut@sandia.gov>
To: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Trouble with paxos service for large PG count
Date: Mon, 1 Apr 2013 15:14:01 -0600 [thread overview]
Message-ID: <5159F899.4090001@sandia.gov> (raw)
Hi,
I've been having trouble starting a new file system
created using the current next branch (most recently,
commit 3b5f663f11).
I believe the trouble is related to how long it takes paxos to
process a pgmap proposal.
For a configuration with 1 mon, 1 mds, and 576 osds, using
pg_bits = 3 and debug paxos = 10, if I start just the monitor,
here's what I get when paxos processes the first non-trivial
pgmap proposal:
2013-04-01 14:04:16.330735 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:04:16.358973 7ffff7fbe780 5 mon.cs31@0(leader).paxos(paxos active c 1..3) queue_proposal bl 4943990 bytes; ctx = 0x11e81f0
2013-04-01 14:04:16.359021 7ffff7fbe780 5 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued 4 4943990 bytes
2013-04-01 14:04:16.359025 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued list_proposals 1 in queue:
2013-04-01 14:04:28.096284 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) begin for 4 4943990 bytes
2013-04-01 14:04:28.164876 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) commit 4
2013-04-01 14:04:28.226789 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finishing proposal
2013-04-01 14:04:28.226885 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finish it (proposal = 0x1260f50)
2013-04-01 14:04:28.226892 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal proposal took 11.867914 to finish
2013-04-01 14:04:28.226941 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) _active
When I redo the test with pg_bits = 4 (27696 PGs total) I get this:
2013-04-01 14:25:16.993255 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:25:17.051729 7ffff7fbe780 5 mon.cs31@0(leader).paxos(paxos active c 1..3) queue_proposal bl 9887726 bytes; ctx = 0x11e81f0
2013-04-01 14:25:17.051882 7ffff7fbe780 5 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued 4 9887726 bytes
2013-04-01 14:25:17.051888 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued list_proposals 1 in queue:
2013-04-01 14:26:08.185143 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) begin for 4 9887726 bytes
2013-04-01 14:26:08.296351 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) commit 4
2013-04-01 14:26:08.408900 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finishing proposal
2013-04-01 14:26:08.408928 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finish it (proposal = 0x1260f50)
2013-04-01 14:26:08.408931 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal proposal took 51.357088 to finish
2013-04-01 14:26:08.408974 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) _active
and at pg_bits = 5 (55392 PGs total) I get this:
2013-04-01 14:36:15.385280 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:36:15.500292 7ffff7fbe780 5 mon.cs31@0(leader).paxos(paxos active c 1..3) queue_proposal bl 19775198 bytes; ctx = 0x11e81f0
2013-04-01 14:36:15.500433 7ffff7fbe780 5 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued 4 19775198 bytes
2013-04-01 14:36:15.500438 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) propose_queued list_proposals 1 in queue:
2013-04-01 14:39:43.645062 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..3) begin for 4 19775198 bytes
2013-04-01 14:39:43.875099 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos updating c 1..3) commit 4
2013-04-01 14:39:44.086097 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finishing proposal
2013-04-01 14:39:44.086128 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal finish it (proposal = 0x1260f50)
2013-04-01 14:39:44.086136 7ffff7fbe780 10 mon.cs31@0(leader).paxos(paxos active c 1..4) finish_queued_proposal proposal took 208.585710 to finish
2013-04-01 14:39:44.086156 7ffff7fbe780 10 mon.cs31@0(leader).paxosservice(pgmap) _active
For this last configuration, after collecting the above
I waited a bit, started all the OSDs, waited a bit longer,
then collected this:
2013-04-01 14:54:37.364686 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) propose_pending
2013-04-01 14:54:37.433641 7ffff328d700 5 mon.cs31@0(leader).paxos(paxos active c 1..27) queue_proposal bl 10629660 bytes; ctx = 0x11ece50
2013-04-01 14:54:37.433750 7ffff328d700 5 mon.cs31@0(leader).paxos(paxos preparing update c 1..27) propose_queued 28 10629660 bytes
2013-04-01 14:54:37.433755 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..27) propose_queued list_proposals 1 in queue:
2013-04-01 14:55:38.684532 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos preparing update c 1..27) begin for 28 10629660 bytes
2013-04-01 14:55:38.814528 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos updating c 1..27) commit 28
2013-04-01 14:55:38.937087 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal finishing proposal
2013-04-01 14:55:38.937120 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal finish it (proposal = 0x1c6e3c0)
2013-04-01 14:55:38.937124 7ffff328d700 10 mon.cs31@0(leader).paxos(paxos active c 1..28) finish_queued_proposal proposal took 61.503375 to finish
2013-04-01 14:55:38.937168 7ffff328d700 10 mon.cs31@0(leader).paxosservice(pgmap) _active
It looks like finish_queued_proposal processing time is scaling
quadratically with the proposal length for pgmaps.
FWIW, I don't believe I saw any issues of this sort for
versions 0.58 and earlier.
Please let me know if there is any other information I can
provide that will help to help fix this.
Thanks -- Jim
next reply other threads:[~2013-04-01 21:14 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-01 21:14 Jim Schutt [this message]
2013-04-02 13:41 ` Trouble with paxos service for large PG count Joao Eduardo Luis
2013-04-02 15:37 ` Joao Eduardo Luis
2013-04-02 15:42 ` Joao Eduardo Luis
2013-04-02 18:18 ` Jim Schutt
[not found] ` <CAGgG2xTdTOWvaPAK2FnxpzC1fsok50_HXXSgayEoD3J8SK6k3w@mail.gmail.com>
2013-04-02 19:16 ` Jim Schutt
2013-04-02 20:36 ` Jim Schutt
2013-04-02 23:24 ` Joao Eduardo Luis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5159F899.4090001@sandia.gov \
--to=jaschut@sandia.gov \
--cc=ceph-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.