* Re: [ceph-users] CephFS test-case
From: Sage Weil @ 2013-09-06 23:22 UTC
To: Nigel Williams; +Cc: ceph-devel

[re-adding ceph-devel]

On Sat, 7 Sep 2013, Nigel Williams wrote:

> On Sat, Sep 7, 2013 at 1:27 AM, Sage Weil <sage@inktank.com> wrote:
> > It sounds like the problem is cluster B's pools have too few PGs, making
> > the data distribution get all out of whack.
>
> Agree, it was too few PGs, I have now re-adjusted and it is busy
> backfilling and evening out the data-distribution across the OSDs.
>
> My overall point is that the out-of-the-box defaults don't provide a
> stable test-deployment (whereas older versions like 0.61 did), and so
> minimally perhaps ceph-deploy needs to have a stab at choosing a
> workable value of PGs? or alternatively the health warning could
> include a note about PGs being too low.

I agree; this is a general problem that we need to come up with a better
solution to.

One idea:

- make ceph health warn when the pg distribution looks "bad"
  - too few pgs relative to the # of osds
  - too many objects in a pool relative to the # of pgs and the above

(We'll need to be a little creative to make thresholds that make sense.)

If we have an interactive ceph-deploy new, we can also estimate how big
the cluster will get and make a more sensible starting count. I like that
less, though, as it is potentially confusing and has more room for user
error.

sage

> > > ceph osd dump | grep ^pool
> > say, and how many OSDs do you have?
>
> I assume you mean PGs, it was the default (192?) and changing it to
> 400 seems to have helped. There are 12 OSDs (4 per server, 3 servers).
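
[Editorial sketch, not part of the original message.] The health check Sage proposes above could look roughly like the following. This is a minimal illustration only: the `Pool` type, the `pg_warnings` function, and both threshold values are hypothetical, since the thread explicitly leaves the thresholds open, and the PGs-per-OSD figure here ignores replication.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical summary of one pool (not a real Ceph structure)."""
    name: str
    pg_num: int
    num_objects: int


# Assumed thresholds, chosen only for illustration.
MIN_PGS_PER_OSD = 20
MAX_OBJECTS_PER_PG = 10_000


def pg_warnings(pools, num_osds):
    """Return warning strings for a PG layout that looks "bad"."""
    warnings = []

    # Check 1: too few PGs relative to the number of OSDs.
    total_pgs = sum(p.pg_num for p in pools)
    if num_osds > 0 and total_pgs / num_osds < MIN_PGS_PER_OSD:
        warnings.append(
            f"too few pgs: {total_pgs} pgs across {num_osds} osds "
            f"(fewer than {MIN_PGS_PER_OSD} per osd)"
        )

    # Check 2: too many objects in a pool relative to its PG count.
    for p in pools:
        if p.pg_num > 0 and p.num_objects / p.pg_num > MAX_OBJECTS_PER_PG:
            warnings.append(
                f"pool '{p.name}': more than {MAX_OBJECTS_PER_PG} objects per pg"
            )

    return warnings


if __name__ == "__main__":
    # Figures from the thread: 12 OSDs, 192 PGs before, 400 PGs after.
    print(pg_warnings([Pool("data", 192, 0)], num_osds=12))
    print(pg_warnings([Pool("data", 400, 0)], num_osds=12))
```

With the thread's figures, 192 PGs over 12 OSDs is 16 PGs per OSD and 400 PGs is about 33, so under the assumed threshold of 20 only the first layout would be flagged.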
* Re: [ceph-users] CephFS test-case
From: Mark Nelson @ 2013-09-06 23:29 UTC
To: Sage Weil; +Cc: Nigel Williams, ceph-devel

On 09/06/2013 06:22 PM, Sage Weil wrote:
> [re-adding ceph-devel]
>
> On Sat, 7 Sep 2013, Nigel Williams wrote:
>
>> On Sat, Sep 7, 2013 at 1:27 AM, Sage Weil <sage@inktank.com> wrote:
>>> It sounds like the problem is cluster B's pools have too few PGs, making
>>> the data distribution get all out of whack.
>>
>> Agree, it was too few PGs, I have now re-adjusted and it is busy
>> backfilling and evening out the data-distribution across the OSDs.
>>
>> My overall point is that the out-of-the-box defaults don't provide a
>> stable test-deployment (whereas older versions like 0.61 did), and so
>> minimally perhaps ceph-deploy needs to have a stab at choosing a
>> workable value of PGs? or alternatively the health warning could
>> include a note about PGs being too low.
>
> I agree; this is a general problem that we need to come up with a better
> solution to.
>
> One idea:
>
> - make ceph health warn when the pg distribution looks "bad"
>   - too few pgs relative to the # of osds
>   - too many objects in a pool relative to the # of pgs and the
>     above
>
> (We'll need to be a little creative to make thresholds that make sense.)
>
> If we have an interactive ceph-deploy new, we can also estimate how big
> the cluster will get and make a more sensible starting count. I like that
> less, though, as it is potentially confusing and has more room for user
> error.

At one point Sam and I were discussing some kind of message that wouldn't
be a health warning, but something similar to what you are discussing
here. The idea is that this would be for when Ceph thinks something is
configured sub-optimally, but the issue doesn't necessarily affect the
health of the cluster (at least insofar as everything is functioning as
defined). We were concerned that people might not want more things causing
health warnings.

> sage
>
>>> ceph osd dump | grep ^pool
>>> say, and how many OSDs do you have?
>>
>> I assume you mean PGs, it was the default (192?) and changing it to
>> 400 seems to have helped. There are 12 OSDs (4 per server, 3 servers).
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [ceph-users] CephFS test-case
From: Nigel Williams @ 2013-09-07 0:39 UTC
To: Mark Nelson; +Cc: Sage Weil, ceph-devel

One way might be to have a nag system, with a global flag that can turn
nags off at the cluster level (for production deployments). Otherwise the
nags are added to the cluster-state messages on a regular basis to remind
operators that there is something to investigate. Having an indexed list
of nags would allow them to be turned off individually.

ceph --nagoff 1

List of example nags:

1: There are too few placement groups (PGs) for the number of nodes and
   OSDs, adjust with: ceph osd pool set data pg_num XXX
   and: ceph osd pool set data pgp_num XXX
2: There are too few MONs deployed for the number of nodes
3: An even number of MONs is configured, consider removing or adding one
   for better efficiency
4: Journal size is too small for the write traffic, consider increasing
   to ...
...
14: /sys/block/.../queue/nr_requests is too low, adjust higher with:
    echo 512 > /sys/block/sdb/queue/nr_requests, observe with
    iostat -x /dev/sdX
15: Target Transaction size is too small for this CPU, adjust higher with
    ceph osd set_target_transaction_size = 50
99: I am nagging too much aren't I...
...etc...

ceph --nagon 3

Nag me about #3 as a reminder.

Definition of "nag"
v. nagged, nag·ging, nags
v.tr.
1. To annoy by constant scolding, complaining, or urging.
2. To torment persistently, as with anxiety or pain.

This subsystem could be called ceph-wife, but I might get into trouble for
that suggestion.

On Sat, Sep 7, 2013 at 9:29 AM, Mark Nelson <mark.nelson@inktank.com> wrote:
> At one point Sam and I were discussing some kind of message that wouldn't be
> a health warning, but something kind of similar to what you are discussing
> here. The idea is this would be for when Ceph thinks something is
> configured sub-optimally, but the issue doesn't necessarily affect the
> health of the cluster (at least in so much as everything is functioning as
> defined). We were concerned that people might not want more things causing
> health warnings.
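
[Editorial sketch, not part of the original message.] The nag system Nigel describes above (an indexed list of nags, a cluster-wide switch, and per-nag muting) could be modelled along these lines. Everything here is hypothetical: `NagRegistry`, `nag_off`, `nag_on`, and `active` are illustrative names, and no such subsystem or `--nagoff`/`--nagon` flag is claimed to exist in Ceph.

```python
from dataclasses import dataclass, field


@dataclass
class NagRegistry:
    """Hypothetical registry of indexed nags with global and per-nag switches."""
    nags: dict                                # nag index -> message text
    enabled: bool = True                      # cluster-level flag: False silences all nags
    muted: set = field(default_factory=set)   # indices turned off individually

    def nag_off(self, nag_id):
        """Mute one nag by index (the proposed 'ceph --nagoff N')."""
        self.muted.add(nag_id)

    def nag_on(self, nag_id):
        """Re-enable one nag by index (the proposed 'ceph --nagon N')."""
        self.muted.discard(nag_id)

    def active(self, triggered):
        """Return the messages to show for the currently triggered nag indices."""
        if not self.enabled:
            return []
        return [
            f"{i}: {self.nags[i]}"
            for i in sorted(triggered)
            if i in self.nags and i not in self.muted
        ]


if __name__ == "__main__":
    registry = NagRegistry({
        1: "There are too few placement groups (PGs)",
        3: "An even number of MONs is configured",
    })
    registry.nag_off(1)
    print(registry.active({1, 3}))
```

Keeping the global flag separate from the per-nag mute set means a production cluster can silence everything without losing the record of which individual nags the operator had already dismissed.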