CEPH filesystem development
From: Jeff Layton <jlayton@kernel.org>
To: Ilya Dryomov <idryomov@gmail.com>, "Yan, Zheng" <ukernel@gmail.com>
Cc: ceph-devel <ceph-devel@vger.kernel.org>,
	Patrick Donnelly <pdonnell@redhat.com>
Subject: Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
Date: Tue, 29 Sep 2020 08:48:22 -0400
Message-ID: <3224feb0441327729dc777666c33042b4ced82a8.camel@kernel.org>
In-Reply-To: <CAOi1vP_E9he3RaTHAZ3qeXGe1xgcSkEXdrCYOY7rjab4-vr=6w@mail.gmail.com>

On Tue, 2020-09-29 at 12:58 +0200, Ilya Dryomov wrote:
> On Tue, Sep 29, 2020 at 12:44 PM Yan, Zheng <ukernel@gmail.com> wrote:
> > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > not fully convinced it's the right approach.
> > > > > 
> > > > 
> > > > The root cause is that the client does not recover the session instantly
> > > > after getting rejected by the MDS. Until the session is recovered, the
> > > > client continues to return errors.
> > > 
> > > Hi Zheng,
> > > 
> > > I don't think it's about whether that happens instantly or not.
> > > In the example from [1], the first "ls" would fail even if issued
> > > minutes after the session reject message and the reconnect.  From
> > > the user's POV it is well after the automatic recovery promised by
> > > recover_session=clean.
> > > 
> > > [1] https://tracker.ceph.com/issues/47385
> > 
> > Reconnect should close all old sessions. It's likely because the
> > client didn't detect that it's blacklisted.
> 
> Sorry, I should have pasted dmesg there as well.  It _does_ detect
> blacklisting -- notice that I wrote "after the session reject message
> and the reconnect".
> 

Yep, this is pretty easy to reproduce too (as Ilya points out in the tracker).

I'm open to other ways of smoothing this over. If we end up with a small
window where errors can occur, then so be it, but I think we can
probably do better than we have now.

-- 
Jeff Layton <jlayton@kernel.org>



Thread overview: 23+ messages
2020-09-25 14:08 [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Jeff Layton
2020-09-25 14:08 ` [RFC PATCH 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
2020-09-25 14:08 ` [RFC PATCH 2/4] ceph: don't mark mount as SHUTDOWN when recovering session Jeff Layton
2020-09-29  8:20   ` Yan, Zheng
2020-09-29 12:30     ` Jeff Layton
2020-09-25 14:08 ` [RFC PATCH 3/4] ceph: remove timeout on allowing reconnect after blocklisting Jeff Layton
2020-09-25 14:08 ` [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set Jeff Layton
2020-09-29  8:31   ` Yan, Zheng
2020-09-29 12:46     ` Jeff Layton
2020-09-29 19:55   ` Jeff Layton
2020-09-29  8:28 ` [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Yan, Zheng
2020-09-29  8:54   ` Ilya Dryomov
2020-09-29 10:44     ` Yan, Zheng
2020-09-29 10:58       ` Ilya Dryomov
2020-09-29 12:48         ` Jeff Layton [this message]
2020-09-29 19:50       ` Jeff Layton
2020-09-30  8:45         ` Yan, Zheng
2020-09-30 17:55           ` Jeff Layton
2020-09-30 12:10 ` [RFC PATCH v2 " Jeff Layton
2020-09-30 12:10   ` [RFC PATCH v2 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
2020-09-30 12:10   ` [RFC PATCH v2 2/4] ceph: don't mark mount as SHUTDOWN when recovering session Jeff Layton
2020-09-30 12:10   ` [RFC PATCH v2 3/4] ceph: remove timeout on allowing reconnect after blocklisting Jeff Layton
2020-09-30 12:10   ` [RFC PATCH v2 4/4] ceph: queue MDS requests to REJECTED sessions when CLEANRECOVER is set Jeff Layton
