From: Loic Dachary <loic@dachary.org>
To: Sage Weil <sweil@redhat.com>
Cc: Samuel Just <sam.just@inktank.com>,
Ceph Development <ceph-devel@vger.kernel.org>
Subject: Re: Reducing backfilling/recovery long tail
Date: Fri, 12 Dec 2014 18:59:42 +0100
Message-ID: <548B2D0E.2030004@dachary.org>
In-Reply-To: <alpine.DEB.2.00.1412120812070.23559@cobra.newdream.net>
On 12/12/2014 17:12, Sage Weil wrote:
> On Fri, 12 Dec 2014, Loic Dachary wrote:
>> Hi Sam & Sage,
>>
>> In the context of http://tracker.ceph.com/issues/9566 I'm inclined to
>> think the best solution would be that the AsyncReserver choose a PG
>> instead of just picking the next one in the list when there is a free
>> slot. It would always choose a PG that must move to/from an OSD for
>> which there are more PGs waiting in the AsyncReserver than any other
>> OSD. The sort involved does not seem too expensive.
>>
>> Calculating priorities before adding the PG to the AsyncReserver seems
>> wrong because the state of the system will change significantly while
>> the PG is waiting to be processed. For instance the first PGs to be
>> added have a low priority while the next have increasing priorities when
>> they accumulate. If reservations are canceled because the OSD map
>> changed again (maybe another OSD is decommissioned before recovery of
>> the first one completes), you may end up having high priorities for PGs
>> that are no longer associated with busy OSDs. That could backfire and
>> create even more frequent long tails.
>>
>> What do you think ?
>
> That makes sense. In order to make that decision, it means that the OSDs
> need to be sharing the level of recovery work they have pending on a
> regular basis, right?
>
It may not be necessary. The local_reserver is populated with all the PGs that need to move. Say 50 of them are for osd.0 and 10 are for osd.1. The decision is made to schedule a PG for osd.0 because it has more PGs to go. This PG then tries to get a remote_reserver slot on osd.0: if osd.0 turns out to be busy already, the PG is queued there. At most osd_max_backfills PGs can be queued for a given OSD in the remote_reserver this way, because only osd_max_backfills PGs get a slot in the local_reserver.

Since the remote_reserver queue is capped by osd_max_backfills, its length does not accurately reflect the workload associated with an OSD. For this reason the priority could be adjusted when asking for the remote reservation (using the priority field that we currently have) to reflect that workload. If the workload changes while PGs are waiting in the remote_reserver queue, these PGs may end up with a sub-optimal priority. That is probably an acceptable tradeoff since it affects only osd_max_backfills PGs per OSD. In contrast, hundreds of PGs could be queued in the local_reserver, and setting a priority for them at the time they are queued could have lasting undesirable side effects.
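To make the selection rule concrete, here is a rough sketch in Python. It is not the AsyncReserver code (which is C++) and only illustrates the idea of deciding at the moment a slot frees up rather than when a PG is queued. The names pick_next_pg and drain, and the (pg, osd) pairs, are made up for the example.

```python
from collections import Counter

def pick_next_pg(waiting):
    """Pick the PG to schedule when a local slot becomes free.

    `waiting` is a list of (pg_id, osd_id) pairs in queue order
    (hypothetical representation). The choice is the first queued PG
    whose OSD currently has the most PGs waiting, so the decision
    reflects the state of the queue now, not when the PG was added.
    """
    if not waiting:
        return None
    pending = Counter(osd for _, osd in waiting)
    busiest = max(pending.values())
    for pg, osd in waiting:
        if pending[osd] == busiest:
            return pg, osd

def drain(waiting, slots):
    """Grant up to `slots` reservations, recounting before each grant."""
    waiting = list(waiting)
    granted = []
    while waiting and len(granted) < slots:
        choice = pick_next_pg(waiting)
        waiting.remove(choice)
        granted.append(choice)
    return granted

# The situation described above: 10 PGs for osd.1 queued first,
# then 50 PGs for osd.0.
waiting = [("1.%x" % i, "osd.1") for i in range(10)]
waiting += [("2.%x" % i, "osd.0") for i in range(50)]
print(pick_next_pg(waiting))
```

With a plain FIFO the first PG for osd.1 would go first; with this rule a PG for osd.0 is chosen, and osd.1 only gets a turn once osd.0 is down to the same number of waiting PGs. If reservations are canceled and the list is rebuilt, the counts are simply recomputed, so no stale priority is left behind.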
I should probably enumerate the steps of an actual situation to clarify my thinking :-)
Cheers
> sage
>
--
Loïc Dachary, Artisan Logiciel Libre