linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Max Kellermann <max.kellermann@ionos.com>
To: Xiubo Li <xiubli@redhat.com>
Cc: idryomov@gmail.com, jlayton@kernel.org,
	ceph-devel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] fs/ceph/mds_client: ignore responses for waiting requests
Date: Wed, 8 Mar 2023 16:17:12 +0100	[thread overview]
Message-ID: <CAKPOu+_1ee8QDkuB4TxQBaUwnHi4bRKuszWzCb-BCY44cp1aJQ@mail.gmail.com> (raw)
In-Reply-To: <c2f9e0d3-0242-1304-26ea-04f25c3cdee4@redhat.com>

On Wed, Mar 8, 2023 at 4:42 AM Xiubo Li <xiubli@redhat.com> wrote:
> How could this happen ?
>
> Since the req hasn't been submitted yet, how could it receive a reply
> normally ?

I have no idea. We have frequent problems with MDS closing the
connection (once or twice a week), and sometimes, this leads to the
WARNING problem which leaves the server hanging. This seems to be some
timing problem, but that MDS connection problem is a different
problem.
My patch just attempts to address the WARNING; not knowing much about
Ceph internals, my idea was that even if the server sends bad reply
packets, the client shouldn't panic.

> It should be a corrupted reply and it lead us to get a incorrect req,
> which hasn't been submitted yet.
>
> BTW, do you have the dump of the corrupted msg by 'ceph_msg_dump(msg)' ?

Unfortunately not - we have already scrubbed the server that had this
problem and rebooted it with a fresh image including my patch. It
seems I don't have a full copy of the kernel log anymore.

Coincidentally, the patch has prevented another kernel hang just a few
minutes ago:

 Mar 08 15:48:53 sweb1 kernel: ceph: mds0 caps stale
 Mar 08 15:49:13 sweb1 kernel: ceph: mds0 caps stale
 Mar 08 15:49:35 sweb1 kernel: ceph: mds0 caps went stale, renewing
 Mar 08 15:49:35 sweb1 kernel: ceph: mds0 caps stale
 Mar 08 15:49:35 sweb1 kernel: libceph: mds0 (1)10.41.2.11:6801 socket
error on write
 Mar 08 15:49:35 sweb1 kernel: libceph: mds0 (1)10.41.2.11:6801 session reset
 Mar 08 15:49:35 sweb1 kernel: ceph: mds0 closed our session
 Mar 08 15:49:35 sweb1 kernel: ceph: mds0 reconnect start
 Mar 08 15:49:36 sweb1 kernel: ceph: mds0 reconnect success
 Mar 08 15:49:36 sweb1 kernel: ceph:  dropping dirty+flushing Fx state
for 0000000064778286 2199046848012
 Mar 08 15:49:40 sweb1 kernel: ceph: mdsc_handle_reply on waiting
request tid 1106187
 Mar 08 15:49:53 sweb1 kernel: ceph: mds0 caps renewed

Since my patch is already in place, the kernel hasn't checked the
unexpected packet and thus hasn't dumped it....

If you need more information and have a patch with more logging, I
could easily boot those servers with your patch and post that data
next time it happens.

Max

  reply	other threads:[~2023-03-08 15:17 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-02 13:06 [PATCH] fs/ceph/mds_client: ignore responses for waiting requests Max Kellermann
2023-03-08  3:42 ` Xiubo Li
2023-03-08 15:17   ` Max Kellermann [this message]
2023-03-09  5:30     ` Xiubo Li
2023-03-09  7:30       ` Max Kellermann
2023-03-09  7:33         ` Xiubo Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKPOu+_1ee8QDkuB4TxQBaUwnHi4bRKuszWzCb-BCY44cp1aJQ@mail.gmail.com \
    --to=max.kellermann@ionos.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).