From: "Jim Schutt" <jaschut@sandia.gov>
To: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
Subject: Re: pg balancing
Date: Wed, 5 Jun 2013 13:49:21 -0600
Message-ID: <51AF9641.60403@sandia.gov>
In-Reply-To: <alpine.DEB.2.00.1305131730380.10961@cobra.newdream.net>

Hi Sage,

On 05/13/2013 06:35 PM, Sage Weil wrote:
> Hi Jim-
>
> You mentioned the other day your concerns about the uniformity of the PG
> and data distribution. There are several ways to attack it (including
> increasing the number of PGs), but one that we haven't tested much yet is
> the 'reweight-by-utilization' function in the monitor.
>
> The idea is that there will always be some statistical variance in the
> distribution and a non-zero probability of having outlier OSDs with too
> many PG. We adjust for this by taking nodes that are substantially above
> the mean down by some adjustment factor in an automated way.
>
> ceph osd reweight-by-utilization MIN
>
> where MIN is the minimum relative utilization at which we will start
> adjusting down. It is always > 100 (100% of the mean), and defaults to
> 120. After it adjusts the reweights, you should see the result in 'ceph
> osd tree' output
>
> Have you played with this at all on your cluster? I'd be very interested
> in how well this does/does not improve things for you.

I've been experimenting with re-weighting, and have found that
it works well to redistribute data, as you expected.

Here are a few observations:

- when an OSD goes out and comes back in, its weight gets reset
  to 1. It would be nice if it could remember its old weight.

- in order to reach the data distribution uniformity I'm after, I
  need to run multiple iterations of re-weighting - each iteration
  pushes data off the most highly utilized OSDs, but some ends up
  on average OSDs and pushes them over the limit.

- as you expected, to reach the uniformity I'm after, a _lot_ of data
  needs to move. I've got some scripts I'm using to generate
  'ceph osd reweight OSD WEIGHT' commands based on PG distribution,
  and I can use these after I create a new filesystem, to get a
  suitably uniform PG distribution before there is any data to
  move. Some iteration is required here as well, and this is
  working really well for me. When I start writing data to such
  a re-weighted filesystem, the data distribution pretty closely
  mirrors the PG distribution (once you write enough data).

- re-weighting to get a more uniform data distribution works better
  if there are more PGs to work with. At 576 OSDs, I can't quite
  get things as uniform as I'd like with 64K PGs, but I can with
  128K PGs. FWIW, here's the (max PGs/OSD)/(min PGs/OSD) I've
  measured for various numbers of PGs on 576 OSDs, with no
  re-weighting:

        PGs     (max PGs/OSD)/(min PGs/OSD)
       65536              1.478
      131072              1.308
      262144              1.240
      524288              1.155
     1048576              1.105

  (BTW, your recent leveldb work enabled those 512K and 1M
  measurements. Thanks!)
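
For anyone who wants to play with the metric in the table above, here is
a minimal Python sketch. It is not one of my scripts and it is not CRUSH:
the helper names (max_min_ratio, simulate_pg_counts) are made up for
illustration, and the placement model is just "pick one OSD uniformly at
random per PG", ignoring replication. It only shows the statistical trend
that more PGs per OSD gives a tighter max/min ratio; it should not be
expected to reproduce the numbers I measured.

```python
import random


def max_min_ratio(pgs_per_osd):
    """Return (max PGs/OSD) / (min PGs/OSD) for a list of per-OSD PG counts."""
    counts = list(pgs_per_osd)
    if not counts or min(counts) == 0:
        raise ValueError("need at least one OSD and no empty OSDs")
    return max(counts) / min(counts)


def simulate_pg_counts(num_osds, num_pgs, seed=0):
    """Toy placement (an assumption, NOT CRUSH): each PG lands on one OSD
    chosen uniformly at random. Returns the number of PGs on each OSD."""
    rng = random.Random(seed)
    counts = [0] * num_osds
    for _ in range(num_pgs):
        counts[rng.randrange(num_osds)] += 1
    return counts


if __name__ == "__main__":
    # Small sizes so this runs quickly; scale up to taste.
    for num_pgs in (4096, 16384, 65536):
        ratio = max_min_ratio(simulate_pg_counts(64, num_pgs))
        print(f"{num_pgs:>8} PGs on 64 OSDs: max/min = {ratio:.3f}")
```

On a real cluster the per-OSD counts would come from the actual PG-to-OSD
mapping rather than from a simulation; only max_min_ratio would be reused.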
With 128K PGs and iterative re-weighting, I can get
(max PGs/OSD)/(min PGs/OSD) < 1.05, and after writing
enough data to consume ~33% of available storage, I
get (max OSD data use)/(min OSD data use) ~ 1.06.
OSD weights end up in the 0.85 - 1.0 range for such a
distribution.
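
The iterate-on-PG-counts idea can be sketched roughly as follows. This is
an illustration under stated assumptions, not the scripts I actually use:
placement is modeled with weighted rendezvous hashing as a deterministic
stand-in for CRUSH, the adjustment rule (scale an over-full OSD's weight
by mean/count, never above 1.0) is one plausible choice, and all function
names are hypothetical. The only real command involved is the
'ceph osd reweight OSD WEIGHT' form mentioned above, which the sketch
prints rather than runs.

```python
import math
import random


def make_draws(num_pgs, num_osds, seed=0):
    """Fixed per-(PG, OSD) values log(u), u in (0, 1]. These stand in for a
    placement hash, so placement is repeatable for a given set of weights."""
    rng = random.Random(seed)
    return [[math.log(1.0 - rng.random()) for _ in range(num_osds)]
            for _ in range(num_pgs)]


def pg_counts(draws, weights):
    """Weighted rendezvous hashing (assumed stand-in for CRUSH): each PG goes
    to the OSD with the largest log(u) / weight."""
    counts = [0] * len(weights)
    for row in draws:
        best = max(range(len(weights)), key=lambda i: row[i] / weights[i])
        counts[best] += 1
    return counts


def reweight_once(counts, weights):
    """Scale down OSDs holding more PGs than the mean; leave the rest alone."""
    mean = sum(counts) / len(counts)
    return [min(1.0, w * mean / c) if c > mean else w
            for c, w in zip(counts, weights)]


def iterate_reweight(draws, num_osds, iterations=10):
    """Repeat reweighting; return (initial ratio, best ratio, best weights)."""
    weights = [1.0] * num_osds
    counts = pg_counts(draws, weights)
    initial = max(counts) / min(counts)
    best_ratio, best_weights = initial, weights
    for _ in range(iterations):
        weights = reweight_once(counts, weights)
        counts = pg_counts(draws, weights)
        ratio = max(counts) / min(counts)
        if ratio < best_ratio:
            best_ratio, best_weights = ratio, weights
    return initial, best_ratio, best_weights


def reweight_commands(weights):
    """Emit one 'ceph osd reweight OSD WEIGHT' line per adjusted OSD."""
    return [f"ceph osd reweight {osd} {w:.4f}"
            for osd, w in enumerate(weights) if w < 1.0]


if __name__ == "__main__":
    draws = make_draws(num_pgs=4096, num_osds=32)
    initial, best, weights = iterate_reweight(draws, 32)
    print(f"max/min PGs per OSD: {initial:.3f} -> {best:.3f}")
    print("\n".join(reweight_commands(weights)))
```

Because the draws are fixed, lowering one OSD's weight only moves PGs off
that OSD, which is why several passes are needed: some of what moves lands
on OSDs that were near the mean and pushes them over it.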
So, re-weighting is definitely working for me.
-- Jim
>
> Thanks!
> sage
>
>
>