Re: [PATCH v3] ceph: blocklist the kclient when receiving corrupted snap trace

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

From: Xiubo Li <xiubli@redhat.com>
To: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org, jlayton@kernel.org,
	mchangir@redhat.com, atomlin@atomlin.com, stable@vger.kernel.org
Subject: Re: [PATCH v3] ceph: blocklist the kclient when receiving corrupted snap trace
Date: Wed, 7 Dec 2022 21:19:31 +0800	[thread overview]
Message-ID: <baa681e9-4472-bcfb-601f-132dc6658888@redhat.com> (raw)
In-Reply-To: <CAOi1vP8hkXZ7w9D5LnMViyjqVCmsKo3H2dg1QpzgHCPuNfvACQ@mail.gmail.com>


On 07/12/2022 18:59, Ilya Dryomov wrote:
> On Tue, Dec 6, 2022 at 1:59 PM <xiubli@redhat.com> wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> When received corrupted snap trace we don't know what exactly has
>> happened in MDS side. And we shouldn't continue writing to OSD,
>> which may corrupt the snapshot contents.
>>
>> Just try to blocklist this client and If fails we need to crash the
>> client instead of leaving it writeable to OSDs.
>>
>> Cc: stable@vger.kernel.org
>> URL: https://tracker.ceph.com/issues/57686
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>
>> Thanks Aaron's feedback.
>>
>> V3:
>> - Fixed ERROR: spaces required around that ':' (ctx:VxW)
>>
>> V2:
>> - Switched to WARN() to taint the Linux kernel.
>>
>>   fs/ceph/mds_client.c |  3 ++-
>>   fs/ceph/mds_client.h |  1 +
>>   fs/ceph/snap.c       | 25 +++++++++++++++++++++++++
>>   3 files changed, 28 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>> index cbbaf334b6b8..59094944af28 100644
>> --- a/fs/ceph/mds_client.c
>> +++ b/fs/ceph/mds_client.c
>> @@ -5648,7 +5648,8 @@ static void mds_peer_reset(struct ceph_connection *con)
>>          struct ceph_mds_client *mdsc = s->s_mdsc;
>>
>>          pr_warn("mds%d closed our session\n", s->s_mds);
>> -       send_mds_reconnect(mdsc, s);
>> +       if (!mdsc->no_reconnect)
>> +               send_mds_reconnect(mdsc, s);
>>   }
>>
>>   static void mds_dispatch(struct ceph_connection *con, struct ceph_msg *msg)
>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>> index 728b7d72bf76..8e8f0447c0ad 100644
>> --- a/fs/ceph/mds_client.h
>> +++ b/fs/ceph/mds_client.h
>> @@ -413,6 +413,7 @@ struct ceph_mds_client {
>>          atomic_t                num_sessions;
>>          int                     max_sessions;  /* len of sessions array */
>>          int                     stopping;      /* true if shutting down */
>> +       int                     no_reconnect;  /* true if snap trace is corrupted */
>>
>>          atomic64_t              quotarealms_count; /* # realms with quota */
>>          /*
>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
>> index c1c452afa84d..023852b7c527 100644
>> --- a/fs/ceph/snap.c
>> +++ b/fs/ceph/snap.c
>> @@ -767,8 +767,10 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>          struct ceph_snap_realm *realm;
>>          struct ceph_snap_realm *first_realm = NULL;
>>          struct ceph_snap_realm *realm_to_rebuild = NULL;
>> +       struct ceph_client *client = mdsc->fsc->client;
>>          int rebuild_snapcs;
>>          int err = -ENOMEM;
>> +       int ret;
>>          LIST_HEAD(dirty_realms);
>>
>>          lockdep_assert_held_write(&mdsc->snap_rwsem);
>> @@ -885,6 +887,29 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>          if (first_realm)
>>                  ceph_put_snap_realm(mdsc, first_realm);
>>          pr_err("%s error %d\n", __func__, err);
>> +
>> +       /*
>> +        * When receiving a corrupted snap trace we don't know what
>> +        * exactly has happened in MDS side. And we shouldn't continue
>> +        * writing to OSD, which may corrupt the snapshot contents.
>> +        *
>> +        * Just try to blocklist this kclient and if it fails we need
>> +        * to crash the kclient instead of leaving it writeable.
> Hi Xiubo,
>
> I'm not sure I understand this "let's blocklist ourselves" concept.
> If the kernel client shouldn't continue writing to OSDs in this case,
> why not just stop issuing writes -- perhaps initiating some equivalent
> of a read-only remount like many local filesystems would do on I/O
> errors (e.g. errors=remount-ro mode)?

The following patch seems working. Let me do more test to make sure 
there is not further crash.

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index c1c452afa84d..cd487f8a4cb5 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -767,6 +767,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
         struct ceph_snap_realm *realm;
         struct ceph_snap_realm *first_realm = NULL;
         struct ceph_snap_realm *realm_to_rebuild = NULL;
+       struct super_block *sb = mdsc->fsc->sb;
         int rebuild_snapcs;
         int err = -ENOMEM;
         LIST_HEAD(dirty_realms);
@@ -885,6 +886,9 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
         if (first_realm)
                 ceph_put_snap_realm(mdsc, first_realm);
         pr_err("%s error %d\n", __func__, err);
+       pr_err("Remounting filesystem read-only\n");
+       sb->s_flags |= SB_RDONLY;
+
         return err;
  }




>
> Or, perhaps, all in-memory snap contexts could somehow be invalidated
> in this case, making writes fail naturally -- on the client side,
> without actually being sent to OSDs just to be nixed by the blocklist
> hammer.
>
> But further, what makes a failure to decode a snap trace special?
> AFAIK we don't do anything close to this for any other decoding
> failure.  Wouldn't "when received corrupted XYZ we don't know what
> exactly has happened in MDS side" argument apply to pretty much all
> decoding failures?
>
>> +        *
>> +        * Then this kclient must be remounted to continue after the
>> +        * corrupted metadata fixed in the MDS side.
>> +        */
>> +       mdsc->no_reconnect = 1;
>> +       ret = ceph_monc_blocklist_add(&client->monc, &client->msgr.inst.addr);
>> +       if (ret) {
>> +               pr_err("%s blocklist of %s failed: %d", __func__,
>> +                      ceph_pr_addr(&client->msgr.inst.addr), ret);
>> +               BUG();
> ... and this is a rough equivalent of errors=panic mode.
>
> Is there a corresponding userspace client PR that can be referenced?
> This needs additional background and justification IMO.
>
> Thanks,
>
>                  Ilya
>

next prev parent reply	other threads:[~2022-12-07 13:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-06 12:59 [PATCH v3] ceph: blocklist the kclient when receiving corrupted snap trace xiubli
2022-12-07 10:59 ` Ilya Dryomov
2022-12-07 12:35   ` Xiubo Li
2022-12-07 14:28     ` Ilya Dryomov
2022-12-08  1:04       ` Xiubo Li
2022-12-09 11:28         ` Ilya Dryomov
2022-12-07 13:19   ` Xiubo Li [this message]
2022-12-07 13:30     ` Xiubo Li
2022-12-07 14:20       ` Ilya Dryomov
2022-12-08  0:36         ` Xiubo Li
2022-12-09  6:14           ` Venky Shankar
2022-12-09  6:59             ` Xiubo Li
2022-12-09  7:15               ` Venky Shankar
2022-12-09  8:07                 ` Xiubo Li

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:c1c452afa84 dfblob:cd487f8a4cb )
 OR (
bs:"Re: [PATCH v3] ceph: blocklist the kclient when receiving corrupted snap trace" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=baa681e9-4472-bcfb-601f-132dc6658888@redhat.com \
    --to=xiubli@redhat.com \
    --cc=atomlin@atomlin.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=jlayton@kernel.org \
    --cc=mchangir@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox