From: Robert LeBlanc <robert@leblancnet.us>
To: Dan van der Ster <dan@vanderster.com>
Cc: ceph-users <ceph-users@ceph.io>, ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: [ceph-users] Possible bug in op path?
Date: Wed, 20 May 2020 09:56:51 -0700 [thread overview]
Message-ID: <CAANLjFq+EerkocoYN0MW4VaGRHOWdNPs2HLsROLtcu374B6sMg@mail.gmail.com> (raw)
In-Reply-To: <CABZ+qqn7DwiDO1Ecynx8zY_JuEP81AKemoesSE4F8dkhLtAmLg@mail.gmail.com>
We are using high, and the people on the list who have also changed to it
have not seen the improvements that I would expect.
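For anyone who wants to verify their own cluster, this is roughly how we
check and set it with the Nautilus CLI (a sketch, assuming the OSDs take
their options from the mon config store; the queue settings are only read
at OSD startup, so the OSDs need a restart after changing them):

    # check what the OSDs are currently configured to use
    ceph config get osd osd_op_queue
    ceph config get osd osd_op_queue_cut_off

    # switch to WPQ with the high cut-off, then restart the OSDs
    ceph config set osd osd_op_queue wpq
    ceph config set osd osd_op_queue_cut_off high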
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Wed, May 20, 2020 at 1:38 AM Dan van der Ster <dan@vanderster.com> wrote:
>
> Hi Robert,
>
> Since you didn't mention -- are you using osd_op_queue_cut_off low or
> high? I know you are usually advocating high, but the default is still
> low and most users don't change this setting.
>
> Cheers, Dan
>
>
> On Wed, May 20, 2020 at 9:41 AM Robert LeBlanc <robert@leblancnet.us> wrote:
> >
> > We upgraded our Jewel cluster to Nautilus a few months ago and I've noticed
> > that op behavior has changed. This is an HDD cluster (NVMe journals and
> > NVMe CephFS metadata pool) with about 800 OSDs. When on Jewel and running
> > WPQ with the high cut-off, it was rock solid. When we had recoveries going
> > on, they barely dented the client ops, and when the client ops on the cluster
> > went down, the backfills would run as fast as the cluster could go. I could
> > have max_backfills set to 10 and the cluster performed admirably.
> > After upgrading to Nautilus the cluster struggles with any kind of recovery,
> > and if there is any significant client write load the cluster can get into
> > a death spiral. Even heavy client write bandwidth (3-4 GB/s) can cause
> > heartbeat check failures, blocked IO, and even OSDs becoming unresponsive.
> > As the person who wrote the WPQ code initially, I know that it was fair and
> > proportional to the op priority and in Jewel it worked. It's not working in
> > Nautilus. I've tweaked a lot of things trying to troubleshoot the issue and
> > setting the recovery priority to 1 or zero barely makes any difference. My
> > best estimate is that the op priority is getting lost before it reaches the
> > WPQ scheduler, so the scheduler cannot prioritize and dispatch ops correctly.
> > It's almost as if all ops are being treated the same and there is no
> > priority at all.
> > Unfortunately, I do not have the time to set up the dev/testing environment
> > to track this down and we will be moving away from Ceph. But I really like
> > Ceph and want to see it succeed. I strongly suggest that someone look into
> > this because I think it will resolve a lot of problems people have had on
> > the mailing list. I'm not sure if a bug was introduced with the other
> > queues that touches more of the op path, or if something in the op path
> > restructuring changed how things work (I know that restructuring was being
> > discussed around the time that Jewel was released). But my guess is that
> > it is somewhere between the op being created and being received into the queue.
> > I really hope that this helps in the search for this regression. I spent a
> > lot of time studying the issue to come up with WPQ and saw it work great
> > when I switched this cluster from PRIO to WPQ. I've also spent countless
> > hours studying how it's changed in Nautilus.
> >
> > Thank you,
> > Robert LeBlanc
> > ----------------
> > Robert LeBlanc
> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-leave@ceph.io
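For reference, a rough sketch of the recovery knobs mentioned in the quoted
message above, assuming the standard Nautilus option names and that "recovery
priority" refers to osd_recovery_op_priority (both can be changed at runtime):

    # cap concurrent backfills per OSD (we ran Jewel happily at 10)
    ceph config set osd osd_max_backfills 10

    # drop recovery op priority well below client ops
    # (osd_client_op_priority defaults to 63, osd_recovery_op_priority to 3)
    ceph config set osd osd_recovery_op_priority 1

Even with these dialed down, the behavior described above did not improve.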