public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Viacheslav Dubeyko <vdubeyko@redhat.com>
To: Alex Markuze <amarkuze@redhat.com>, ceph-devel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, idryomov@gmail.com
Subject: Re: [EXTERNAL] [PATCH v3 01/11] ceph: convert inode flags to named bit positions and atomic bitops
Date: Wed, 29 Apr 2026 12:31:03 -0700	[thread overview]
Message-ID: <9bd559bf25d2e4f7faaab94eb756755f75bb4eff.camel@redhat.com> (raw)
In-Reply-To: <20260429125206.1512203-2-amarkuze@redhat.com>

On Wed, 2026-04-29 at 12:51 +0000, Alex Markuze wrote:
> Define named bit-position constants for all CEPH_I_* inode flags and
> derive the bitmask values from them.  This gives every flag a named
> _BIT constant usable with the test_bit/set_bit/clear_bit family.
> The intentionally unused bit position 1 is documented inline.
> 
> Convert all flag modifications to use atomic bitops (set_bit,
> clear_bit, test_and_clear_bit).  The previous code mixed lockless
> atomic ops on some flags (ERROR_WRITE, ODIRECT) with non-atomic
> read-modify-write (|= / &= ~) on other flags sharing the same
> unsigned long.  A concurrent non-atomic RMW can clobber an
> adjacent lockless atomic update -- for example, a lockless
> clear_bit(ERROR_WRITE) could be silently resurrected by a
> concurrent ci->i_ceph_flags |= CEPH_I_FLUSH under the spinlock.
> Using atomic bitops for all modifications eliminates this class
> of race entirely.
> 
> Flags whose only users are now the _BIT form (ERROR_WRITE,
> ASYNC_CHECK_CAPS) have their old mask defines removed to document
> that callers must use the _BIT constant with the set_bit/test_bit
> family.  ERROR_FILELOCK and SHUTDOWN retain their mask defines
> because they are still used via bitmask tests in lockless readers
> (ceph_inode_is_shutdown, reconnect_caps_cb).
> 
> Flag reads under i_ceph_lock continue to use bitmask tests where
> the tested flag is only modified under the same lock; this is safe
> because the lock serialises both the read and the write.  The
> remaining flags continue to use non-atomic bitmask operations under
> i_ceph_lock, which is correct and unchanged.
> 
> The lockless reader ceph_inode_is_shutdown() retains the READ_ONCE()
> snapshot plus bitmask test pattern -- the single atomic load into a
> local variable is correct and avoids a second memory access that
> test_bit() would require.  It now uses the named CEPH_I_SHUTDOWN
> mask constant instead of an inline BIT().
> 
> The direct assignment in ceph_finish_async_create() is converted
> from i_ceph_flags = CEPH_I_ASYNC_CREATE to set_bit().  This
> inode is I_NEW at this point -- still invisible to other threads
> and guaranteed to have zero flags from alloc_inode -- so either
> form is safe, but set_bit() keeps the conversion uniform.
> 
> The only remaining direct assignment (alloc_inode zeroing) operates
> on an inode that is not yet visible to other threads, so it is safe
> without atomic ops.
> 
> The dead precomputed flags variable in ceph_pool_perm_check() is
> removed; the check: loop re-reads flags from i_ceph_flags after
> the set_bit() calls, keeping a single source of truth.
> 
> Co-developed-by: Viacheslav Dubeyko <vdubeyko@redhat.com>
> Signed-off-by: Viacheslav Dubeyko <vdubeyko@redhat.com>
> Signed-off-by: Alex Markuze <amarkuze@redhat.com>
> ---
>  fs/ceph/addr.c       | 17 ++++++------
>  fs/ceph/caps.c       | 24 ++++++++---------
>  fs/ceph/file.c       | 13 ++++-----
>  fs/ceph/inode.c      |  4 +--
>  fs/ceph/locks.c      | 22 ++++-----------
>  fs/ceph/mds_client.c |  3 ++-
>  fs/ceph/mds_client.h |  2 +-
>  fs/ceph/snap.c       |  2 +-
>  fs/ceph/super.h      | 64 +++++++++++++++++++++++---------------------
>  fs/ceph/xattr.c      |  2 +-
>  10 files changed, 72 insertions(+), 81 deletions(-)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 2090fc78529c..35c5fdb5a448 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -2583,20 +2583,19 @@ int ceph_pool_perm_check(struct inode *inode, int need)

I have realized that we have issue here. We declare flags as int [1]:

	int ret, flags;

However, we assign ci->i_ceph_flags [2] that has unsigned long type [3]:

	spin_lock(&ci->i_ceph_lock);
	flags = ci->i_ceph_flags;
	pool = ci->i_layout.pool_id;
	spin_unlock(&ci->i_ceph_lock);

I think we need to rework the declaration of flags variable.

The rest of the patch looks good.

Thanks,
Slava.

[1] https://elixir.bootlin.com/linux/v7.0.1/source/fs/ceph/addr.c#L2544
[2] https://elixir.bootlin.com/linux/v7.0.1/source/fs/ceph/addr.c#L2564
[3] https://elixir.bootlin.com/linux/v7.0.1/source/fs/ceph/super.h#L383


  reply	other threads:[~2026-04-29 19:31 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-29 12:51 [PATCH v3 00/11] ceph: manual client session reset Alex Markuze
2026-04-29 12:51 ` [PATCH v3 01/11] ceph: convert inode flags to named bit positions and atomic bitops Alex Markuze
2026-04-29 19:31   ` Viacheslav Dubeyko [this message]
2026-04-29 12:51 ` [PATCH v3 02/11] ceph: use proper endian conversion for flock_len in reconnect Alex Markuze
2026-04-29 12:51 ` [PATCH v3 03/11] ceph: harden send_mds_reconnect and handle active-MDS peer reset Alex Markuze
2026-04-29 21:22   ` [EXTERNAL] " Viacheslav Dubeyko
2026-04-29 12:51 ` [PATCH v3 04/11] ceph: add diagnostic timeout loop to wait_caps_flush() Alex Markuze
2026-04-29 21:41   ` [EXTERNAL] " Viacheslav Dubeyko
2026-04-29 12:52 ` [PATCH v3 05/11] ceph: add client reset state machine and session teardown Alex Markuze
2026-04-29 22:29   ` [EXTERNAL] " Viacheslav Dubeyko
2026-04-29 12:52 ` [PATCH v3 06/11] ceph: add manual reset debugfs control and tracepoints Alex Markuze
2026-04-30 18:38   ` [EXTERNAL] " Viacheslav Dubeyko
2026-04-29 12:52 ` [PATCH v3 07/11] selftests: ceph: add reset consistency checker Alex Markuze
2026-04-29 12:52 ` [PATCH v3 08/11] selftests: ceph: add reset stress test Alex Markuze
2026-04-29 12:52 ` [PATCH v3 09/11] selftests: ceph: add reset corner-case tests Alex Markuze
2026-04-29 12:52 ` [PATCH v3 10/11] selftests: ceph: add validation harness Alex Markuze
2026-04-29 12:52 ` [PATCH v3 11/11] selftests: ceph: wire up Ceph reset kselftests and documentation Alex Markuze

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9bd559bf25d2e4f7faaab94eb756755f75bb4eff.camel@redhat.com \
    --to=vdubeyko@redhat.com \
    --cc=amarkuze@redhat.com \
    --cc=ceph-devel@vger.kernel.org \
    --cc=idryomov@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox