From: John Spray <john.spray@redhat.com>
To: Robert LeBlanc <robert@leblancnet.us>
Cc: Gregory Farnum <greg@gregs42.com>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: RFC: progress bars
Date: Thu, 28 May 2015 17:52:04 +0100
Message-ID: <556747B4.1010808@redhat.com>
In-Reply-To: <CAANLjFoGRNqGH34BsFKp54R3L5o1x8HKi+s9FQrN+HtbbQjBbg@mail.gmail.com>
On 28/05/2015 17:41, Robert LeBlanc wrote:
> Let me see if I understand this... Your idea is to have a progress bar
> that shows (active+clean + active+scrub + active+deep-scrub) / pgs and
> then estimates the time remaining?
Not quite: it's not about doing a calculation on the global PG state
counts. The code identifies specific PGs affected by specific
operations, and then watches the status of those PGs.
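As a rough sketch of that distinction (not the actual code: `PgEvent`, `TARGET_STATE`, and the PG ids below are hypothetical names, and treating "active+clean" as the completion state is an assumption), progress is computed only over the PGs an operation touched, not over the global state counts:

```python
from dataclasses import dataclass

# Assumption: a tracked PG counts as finished once it reports this state.
TARGET_STATE = "active+clean"


@dataclass(frozen=True)
class PgEvent:
    """One operation, identified by the specific PGs it affects."""
    message: str
    pg_ids: frozenset  # fixed when the operation is first detected

    def progress(self, pg_states):
        """Fraction of this event's PGs that have reached TARGET_STATE.

        pg_states maps PG id -> state string for the whole cluster;
        PGs that are not part of this event are ignored.
        """
        if not self.pg_ids:
            return 1.0
        done = sum(1 for pg in self.pg_ids
                   if pg_states.get(pg) == TARGET_STATE)
        return done / len(self.pg_ids)
```

An event created for four PGs reports 0.5 once two of them are clean, whatever state the rest of the cluster is in.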
>
> So if PGs are split, the numbers change and the progress bar goes
> backwards. Is that a big deal?
I don't see a case where the progress bars would go backwards with the
code I have so far. For operations on PGs that split, it'll just ignore
the new PGs, but you'll get a separate event tracking the creation of
the new ones. In general, progress bars going backwards isn't something
we should allow to happen (happy to hear counterexamples though, I'm
mainly speaking from intuition on that point!).
If this were extended to track operations across PG splits (it's unclear
to me that the complexity is worthwhile), then the bar still wouldn't
need to go backwards, as whatever stat was being tracked would remain
the same when summed across the newly split PGs.
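To illustrate the split argument, here is a minimal sketch under stated assumptions: `tracked_total` and `MonotonicBar` are hypothetical helpers, the per-PG stat is assumed to be something additive such as an object count, and the `children` map of split lineage is assumed to be available. The clamp in `MonotonicBar` is an extra safety net added for illustration, not something the existing code is known to do.

```python
def tracked_total(stats, tracked, children):
    """Sum a per-PG stat over the tracked PGs and any PGs split from them.

    stats:    PG id -> numeric stat (e.g. an object count)
    tracked:  PG ids the event was created for
    children: PG id -> list of PG ids created by splitting it
    """
    total = 0
    seen = set()
    stack = list(tracked)
    while stack:
        pg = stack.pop()
        if pg in seen:
            continue
        seen.add(pg)
        total += stats.get(pg, 0)
        stack.extend(children.get(pg, ()))
    return total


class MonotonicBar:
    """Reports the highest fraction seen so far, so the bar never regresses."""

    def __init__(self):
        self.value = 0.0

    def report(self, fraction):
        # Cap at 1.0, then keep the maximum of old and new values.
        self.value = max(self.value, min(1.0, fraction))
        return self.value
```

If a PG holding 100 objects splits into a parent with 60 and a child with 40, the summed stat is still 100, so a fraction derived from it does not move because of the split alone.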
> I don't think so, it might take a
> little time to recalculate how long it will take, but no big deal. I
> do like the idea of the progress bar even if it is fuzzy. I keep
> running ceph status or ceph -w to watch things and have to imagine it
> in my mind.
Right, the idea is to save the admin from having to interpret PG counts
mentally.
> It might be nice to have some other stats like client I/O
> and rebuild I/O so that I can see if recovery is impacting production
> I/O.
We already have some of these stats globally, but it would be nice to be
able to reason about what proportion of I/O is associated with specific
operations, e.g. "I have some total recovery I/O number, what proportion
of that is due to a particular drive failure?". Without looking at the
current PG stat structures, I don't know whether there is enough data in
the mon right now to estimate those numbers. This would *definitely* be
heuristic rather than exact, in any case.
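One possible shape for that heuristic, purely as a sketch: it assumes a per-PG recovery rate is available at all, which is exactly the open question above, and `recovery_share` and its inputs are hypothetical names.

```python
def recovery_share(pg_recovery_rate, event_pgs):
    """Estimate the fraction of total recovery I/O attributable to one event.

    pg_recovery_rate: PG id -> recovery rate (any consistent unit);
                      assumed to exist, which may not be true today
    event_pgs:        PG ids affected by the operation of interest
    """
    total = sum(pg_recovery_rate.values())
    if total == 0:
        return 0.0  # no recovery I/O at all, so nothing to attribute
    mine = sum(rate for pg, rate in pg_recovery_rate.items()
               if pg in event_pgs)
    return mine / total
```

A PG that is recovering for more than one reason would be counted fully against every event that includes it, which is one reason the result is only a rough guide.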
Cheers,
John