* Re: [ceph-users] CephFS test-case
From: Sage Weil @ 2013-09-06 23:22 UTC
To: Nigel Williams; +Cc: ceph-devel
[re-adding ceph-devel]
On Sat, 7 Sep 2013, Nigel Williams wrote:
> On Sat, Sep 7, 2013 at 1:27 AM, Sage Weil <sage@inktank.com> wrote:
> > It sounds like the problem is cluster B's pools have too few PGs, making
> > the data distribution get all out of whack.
>
> Agree, it was too few PGs, I have now re-adjusted and it is busy
> backfilling and evening out the data-distribution across the OSDs.
>
> My overall point is that the out-of-the-box defaults don't provide a
> stable test-deployment (whereas older versions like 0.61 did), and so
> minimally perhaps ceph-deploy needs to have a stab at choosing a
> workable value of PGs? Or alternatively the health warning could
> include a note about PGs being too low.
I agree; this is a general problem that we need to come up with a better
solution to.
One idea:
- make ceph health warn when the pg distribution looks "bad":
  - too few pgs relative to the # of osds
  - too many objects in a pool relative to the # of pgs and the # of osds
(We'll need to be a little creative to make thresholds that make sense.)
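A minimal Python sketch of what such a check could look like. The thresholds (`MIN_PGS_PER_OSD`, `MAX_OBJECT_SKEW`), the `Pool` record and `pg_distribution_warnings` are all hypothetical, not existing Ceph code or options. The object check compares each pool's objects per PG against the cluster-wide average, which is only one possible reading of the second bullet.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real values still need to be worked out.
MIN_PGS_PER_OSD = 30   # assumed minimum PG replicas per OSD
MAX_OBJECT_SKEW = 10   # assumed max ratio of a pool's objects/PG to the average


@dataclass
class Pool:
    name: str
    pg_num: int
    size: int      # replica count
    objects: int


def pg_distribution_warnings(pools, num_osds):
    """Return warning strings when the PG distribution looks 'bad'."""
    warnings = []

    # Check 1: too few PGs relative to the number of OSDs.
    # Each PG is stored `size` times, so count PG replicas per OSD.
    pg_replicas = sum(p.pg_num * p.size for p in pools)
    per_osd = pg_replicas / num_osds
    if per_osd < MIN_PGS_PER_OSD:
        warnings.append(
            f"too few PGs per OSD ({per_osd:.1f} < min {MIN_PGS_PER_OSD})")

    # Check 2: a pool holding far more objects per PG than the cluster average.
    total_pgs = sum(p.pg_num for p in pools)
    total_objects = sum(p.objects for p in pools)
    avg = total_objects / total_pgs if total_pgs else 0.0
    for p in pools:
        per_pg = p.objects / p.pg_num
        if avg > 0 and per_pg > MAX_OBJECT_SKEW * avg:
            warnings.append(
                f"pool {p.name} has {per_pg:.0f} objects per PG "
                f"vs. average {avg:.0f} (too few PGs?)")
    return warnings
```

Using a ratio against the average for the second check avoids picking an absolute objects-per-PG limit, which would depend heavily on object size.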
If we have an interactive ceph-deploy new, we can also estimate how big
the cluster will get and make a more sensible starting count. I like that
less, though, as it is potentially confusing and has more room for user
error.
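If ceph-deploy new did ask for the expected size, the starting count could come from the rule of thumb in the Ceph placement-group documentation: roughly 100 PGs per OSD, divided by the replica count, rounded up to a power of two. A minimal Python sketch assuming that rule; `suggested_pg_num` and its parameters are hypothetical and not part of ceph-deploy.

```python
def suggested_pg_num(expected_osds: int, replicas: int,
                     target_pgs_per_osd: int = 100) -> int:
    """Estimate a starting total PG count from the expected cluster size.

    Rule of thumb: (OSDs * target_pgs_per_osd) / replicas, rounded up
    to the next power of two.
    """
    if expected_osds < 1 or replicas < 1:
        raise ValueError("expected_osds and replicas must be positive")
    raw = expected_osds * target_pgs_per_osd / replicas
    pg_num = 1
    while pg_num < raw:   # round up to a power of two
        pg_num *= 2
    return pg_num
```

For the 12-OSD cluster described below and an assumed replica count of 3, the raw estimate is 400 (the value Nigel moved to), which this sketch rounds up to 512; with 2 replicas it returns 1024.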
sage
>
> > ceph osd dump | grep ^pool
> > say, and how many OSDs do you have?
>
> I assume you mean PGs, it was the default (192?) and changing it to
> 400 seems to have helped. There are 12 OSDs (4 per server, 3 servers).
>
>
* Re: [ceph-users] CephFS test-case
From: Mark Nelson @ 2013-09-06 23:29 UTC
To: Sage Weil; +Cc: Nigel Williams, ceph-devel
On 09/06/2013 06:22 PM, Sage Weil wrote:
> [re-adding ceph-devel]
>
> On Sat, 7 Sep 2013, Nigel Williams wrote:
>
>> On Sat, Sep 7, 2013 at 1:27 AM, Sage Weil <sage@inktank.com> wrote:
>>> It sounds like the problem is cluster B's pools have too few PGs, making
>>> the data distribution get all out of whack.
>>
>> Agree, it was too few PGs, I have now re-adjusted and it is busy
>> backfilling and evening out the data-distribution across the OSDs.
>>
>> My overall point is that the out-of-the-box defaults don't provide a
>> stable test-deployment (whereas older versions like 0.61 did), and so
>> minimally perhaps ceph-deploy needs to have a stab at choosing a
>> workable value of PGs? Or alternatively the health warning could
>> include a note about PGs being too low.
>
> I agree; this is a general problem that we need to come up with a better
> solution to.
>
> One idea:
>
> - make ceph health warn when the pg distribution looks "bad":
>   - too few pgs relative to the # of osds
>   - too many objects in a pool relative to the # of pgs and the # of osds
>
> (We'll need to be a little creative to make thresholds that make sense.)
>
> If we have an interactive ceph-deploy new, we can also estimate how big
> the cluster will get and make a more sensible starting count. I like that
> less, though, as it is potentially confusing and has more room for user
> error.
At one point Sam and I were discussing some kind of message that
wouldn't be a health warning, but something kind of similar to what you
are discussing here. The idea is this would be for when Ceph thinks
something is configured sub-optimally, but the issue doesn't necessarily
affect the health of the cluster (at least in so much as everything is
functioning as defined). We were concerned that people might not want
more things causing health warnings.
>
> sage
>
>
>>
>>> ceph osd dump | grep ^pool
>>> say, and how many OSDs do you have?
>>
>> I assume you mean PGs, it was the default (192?) and changing it to
>> 400 seems to have helped. There are 12 OSDs (4 per server, 3 servers).
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: [ceph-users] CephFS test-case
From: Nigel Williams @ 2013-09-07 0:39 UTC
To: Mark Nelson; +Cc: Sage Weil, ceph-devel
One way might be to have a nag system, with a global flag that can
turn nags off at the cluster level (for production deployments), but
the nags are added to the cluster-state messages on a regular basis to
remind operators that there is something to investigate.
Having an indexed list of nags would allow them to be turned off individually.
ceph --nagoff 1
Example list of nags:
1: There are too few placement groups (PGs) for the number of nodes
and OSDs, adjust with: ceph osd pool set data pg_num XXX, then:
ceph osd pool set data pgp_num XXX
2: Too few MONs are deployed for the number of nodes
3: An even number of MONs is configured, consider removing or adding
one (e.g. 4 MONs tolerate only one MON failure, the same as 3)
4: Journal size is too small for the write traffic, consider increasing to ...
...
14: /sys/block/.../queue/nr_requests is too low, adjust higher with:
echo 512 > /sys/block/sdX/queue/nr_requests, observe with iostat -x
/dev/sdX
15: Target Transaction size is too small for this CPU, adjust higher
with ceph osd set_target_transaction_size = 50
99: I am nagging too much, aren't I...
...etc...
ceph --nagon 3
Nag me about #3 as a reminder.
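A minimal Python sketch of such a registry. The nagoff/nagon commands are the ones proposed above, not existing ceph CLI flags, and `NagRegistry` with its methods is hypothetical; the point is only that a global switch plus a per-index disabled set covers both ways of silencing nags.

```python
class NagRegistry:
    """Indexed nags with a cluster-wide off switch and per-nag toggles."""

    def __init__(self):
        self.enabled = True     # global flag, e.g. False on production clusters
        self._nags = {}         # nag index -> message text
        self._disabled = set()  # indices silenced individually

    def register(self, index, message):
        self._nags[index] = message

    def nag_off(self, index):
        self._disabled.add(index)

    def nag_on(self, index):
        self._disabled.discard(index)

    def active(self, triggered):
        """Messages to append to cluster-state output for triggered indices."""
        if not self.enabled:
            return []
        return [f"{i}: {self._nags[i]}" for i in sorted(triggered)
                if i in self._nags and i not in self._disabled]


nags = NagRegistry()
nags.register(1, "too few PGs")
nags.register(3, "even number of MONs")
nags.nag_off(1)             # like the proposed "nagoff 1"
print(nags.active({1, 3}))  # only nag 3 is reported
```

Keeping the disabled set separate from the registered messages means a nag silenced with nagoff can be re-enabled later without re-registering its text.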
Definition of "nag"
v. nagged, nag·ging, nags
v.tr.
1. To annoy by constant scolding, complaining, or urging.
2. To torment persistently, as with anxiety or pain.
This subsystem could be called ceph-wife, but I might get into trouble
for that suggestion.
On Sat, Sep 7, 2013 at 9:29 AM, Mark Nelson <mark.nelson@inktank.com> wrote:
> At one point Sam and I were discussing some kind of message that wouldn't be
> a health warning, but something kind of similar to what you are discussing
> here. The idea is this would be for when Ceph thinks something is
> configured sub-optimally, but the issue doesn't necessarily affect the
> health of the cluster (at least in so much as everything is functioning as
> defined). We were concerned that people might not want more things causing
> health warnings.