* [PATCH v9 00/12] ceph: support idmapped mounts
@ 2023-08-04 8:48 Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 01/12] fs: export mnt_idmap_get/mnt_idmap_put Alexander Mikhalitsyn
` (11 more replies)
0 siblings, 12 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Ilya Dryomov, Jeff Layton,
ceph-devel, linux-kernel
Dear friends,
This patchset was originally developed by Christian Brauner but I'll continue
to push it forward. Christian allowed me to do that :)
This feature is already actively used/tested with LXD/LXC project.
Git tree (based on https://github.com/ceph/ceph-client.git testing):
v9: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v9
current: https://github.com/mihalicyn/linux/tree/fs.idmapped.ceph
In the version 3 I've changed only two commits:
- fs: export mnt_idmap_get/mnt_idmap_put
- ceph: allow idmapped setattr inode op
and added a new one:
- ceph: pass idmap to __ceph_setattr
In the version 4 I've reworked the ("ceph: stash idmapping in mdsc request")
commit. Now we take idmap refcounter just in place where req->r_mnt_idmap
is filled. It's more safer approach and prevents possible refcounter underflow
on error paths where __register_request wasn't called but ceph_mdsc_release_request is
called.
Changelog for version 5:
- a few commits were squashed into one (as suggested by Xiubo Li)
- started passing an idmapping everywhere (if possible), so a caller
UID/GID-s will be mapped almost everywhere (as suggested by Xiubo Li)
Changelog for version 6:
- rebased on top of testing branch
- passed an idmapping in a few places (readdir, ceph_netfs_issue_op_inline)
Changelog for version 7:
- rebased on top of testing branch
- this thing now requires a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
https://github.com/ceph/ceph/pull/52575
Changelog for version 8:
- rebased on top of testing branch
- added enable_unsafe_idmap module parameter to make idmapped mounts
work with old MDS server versions
- properly handled case when old MDS used with new kernel client
Changelog for version 9:
- added "struct_len" field in struct ceph_mds_request_head as requested by Xiubo Li
I can confirm that this version passes xfstests and
tested with old MDS (without CEPHFS_FEATURE_HAS_OWNER_UIDGID)
and with recent MDS version.
Links to previous versions:
v1: https://lore.kernel.org/all/20220104140414.155198-1-brauner@kernel.org/
v2: https://lore.kernel.org/lkml/20230524153316.476973-1-aleksandr.mikhalitsyn@canonical.com/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v2
v3: https://lore.kernel.org/lkml/20230607152038.469739-1-aleksandr.mikhalitsyn@canonical.com/#t
v4: https://lore.kernel.org/lkml/20230607180958.645115-1-aleksandr.mikhalitsyn@canonical.com/#t
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v4
v5: https://lore.kernel.org/lkml/20230608154256.562906-1-aleksandr.mikhalitsyn@canonical.com/#t
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v5
v6: https://lore.kernel.org/lkml/20230609093125.252186-1-aleksandr.mikhalitsyn@canonical.com/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v6
v7: https://lore.kernel.org/all/20230726141026.307690-1-aleksandr.mikhalitsyn@canonical.com/
tree: https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph.v7
v8: https://lore.kernel.org/all/20230803135955.230449-1-aleksandr.mikhalitsyn@canonical.com/
tree: -
Kind regards,
Alex
Original description from Christian:
========================================================================
This patch series enables cephfs to support idmapped mounts, i.e. the
ability to alter ownership information on a per-mount basis.
Container managers such as LXD support sharaing data via cephfs between
the host and unprivileged containers and between unprivileged containers.
They may all use different idmappings. Idmapped mounts can be used to
create mounts with the idmapping used for the container (or a different
one specific to the use-case).
There are in fact more use-cases such as remapping ownership for
mountpoints on the host itself to grant or restrict access to different
users or to make it possible to enforce that programs running as root
will write with a non-zero {g,u}id to disk.
The patch series is simple overall and few changes are needed to cephfs.
There is one cephfs specific issue that I would like to discuss and
solve which I explain in detail in:
[PATCH 02/12] ceph: handle idmapped mounts in create_request_message()
It has to do with how to handle mds serves which have id-based access
restrictions configured. I would ask you to please take a look at the
explanation in the aforementioned patch.
The patch series passes the vfs and idmapped mount testsuite as part of
xfstests. To run it you will need a config like:
[ceph]
export FSTYP=ceph
export TEST_DIR=/mnt/test
export TEST_DEV=10.103.182.10:6789:/
export TEST_FS_MOUNT_OPTS="-o name=admin,secret=$password
and then simply call
sudo ./check -g idmapped
========================================================================
Alexander Mikhalitsyn (3):
fs: export mnt_idmap_get/mnt_idmap_put
ceph: add enable_unsafe_idmap module parameter
ceph: pass idmap to __ceph_setattr
Christian Brauner (9):
ceph: stash idmapping in mdsc request
ceph: handle idmapped mounts in create_request_message()
ceph: pass an idmapping to mknod/symlink/mkdir
ceph: allow idmapped getattr inode op
ceph: allow idmapped permission inode op
ceph: allow idmapped setattr inode op
ceph/acl: allow idmapped set_acl inode op
ceph/file: allow idmapped atomic_open inode op
ceph: allow idmapped mounts
fs/ceph/acl.c | 6 +--
fs/ceph/crypto.c | 2 +-
fs/ceph/dir.c | 3 ++
fs/ceph/file.c | 10 ++++-
fs/ceph/inode.c | 29 +++++++++------
fs/ceph/mds_client.c | 70 ++++++++++++++++++++++++++++++++---
fs/ceph/mds_client.h | 8 +++-
fs/ceph/super.c | 7 +++-
fs/ceph/super.h | 3 +-
fs/mnt_idmapping.c | 2 +
include/linux/ceph/ceph_fs.h | 5 ++-
include/linux/mnt_idmapping.h | 3 ++
12 files changed, 121 insertions(+), 27 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v9 01/12] fs: export mnt_idmap_get/mnt_idmap_put
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 02/12] ceph: stash idmapping in mdsc request Alexander Mikhalitsyn
` (10 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, Alexander Viro, Seth Forshee,
linux-kernel
These helpers are required to support idmapped mounts in the Cephfs.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
v3:
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL as Christoph Hellwig suggested
---
fs/mnt_idmapping.c | 2 ++
include/linux/mnt_idmapping.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/fs/mnt_idmapping.c b/fs/mnt_idmapping.c
index 4905665c47d0..57d1dedf3f8f 100644
--- a/fs/mnt_idmapping.c
+++ b/fs/mnt_idmapping.c
@@ -256,6 +256,7 @@ struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap)
return idmap;
}
+EXPORT_SYMBOL_GPL(mnt_idmap_get);
/**
* mnt_idmap_put - put a reference to an idmapping
@@ -271,3 +272,4 @@ void mnt_idmap_put(struct mnt_idmap *idmap)
kfree(idmap);
}
}
+EXPORT_SYMBOL_GPL(mnt_idmap_put);
diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h
index 057c89867aa2..b8da2db4ecd2 100644
--- a/include/linux/mnt_idmapping.h
+++ b/include/linux/mnt_idmapping.h
@@ -115,6 +115,9 @@ static inline bool vfsgid_eq_kgid(vfsgid_t vfsgid, kgid_t kgid)
int vfsgid_in_group_p(vfsgid_t vfsgid);
+struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap);
+void mnt_idmap_put(struct mnt_idmap *idmap);
+
vfsuid_t make_vfsuid(struct mnt_idmap *idmap,
struct user_namespace *fs_userns, kuid_t kuid);
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 02/12] ceph: stash idmapping in mdsc request
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 01/12] fs: export mnt_idmap_get/mnt_idmap_put Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message() Alexander Mikhalitsyn
` (9 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
When sending a mds request cephfs will send relevant data for the
requested operation. For creation requests the caller's fs{g,u}id is
used to set the ownership of the newly created filesystem object. For
setattr requests the caller can pass in arbitrary {g,u}id values to
which the relevant filesystem object is supposed to be changed.
If the caller is performing the relevant operation via an idmapped mount
cephfs simply needs to take the idmapping into account when it sends the
relevant mds request.
In order to support idmapped mounts for cephfs we stash the idmapping
whenever they are relevant for the operation for the duration of the
request. Since mds requests can be queued and performed asynchronously
we make sure to keep the idmapping around and release it once the
request has finished.
In follow-up patches we will use this to send correct ownership
information over the wire. This patch just adds the basic infrastructure
to keep the idmapping around. The actual conversion patches are all
fairly minimal.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v4:
- don't call mnt_idmap_get(..) in __register_request
---
fs/ceph/mds_client.c | 5 +++++
fs/ceph/mds_client.h | 1 +
2 files changed, 6 insertions(+)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 9aae39289b43..8829f55103da 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -12,6 +12,7 @@
#include <linux/bits.h>
#include <linux/ktime.h>
#include <linux/bitmap.h>
+#include <linux/mnt_idmapping.h>
#include "super.h"
#include "crypto.h"
@@ -1121,6 +1122,8 @@ void ceph_mdsc_release_request(struct kref *kref)
kfree(req->r_path1);
kfree(req->r_path2);
put_cred(req->r_cred);
+ if (req->r_mnt_idmap)
+ mnt_idmap_put(req->r_mnt_idmap);
if (req->r_pagelist)
ceph_pagelist_release(req->r_pagelist);
kfree(req->r_fscrypt_auth);
@@ -1180,6 +1183,8 @@ static void __register_request(struct ceph_mds_client *mdsc,
insert_request(&mdsc->request_tree, req);
req->r_cred = get_current_cred();
+ if (!req->r_mnt_idmap)
+ req->r_mnt_idmap = &nop_mnt_idmap;
if (mdsc->oldest_tid == 0 && req->r_op != CEPH_MDS_OP_SETFILELOCK)
mdsc->oldest_tid = req->r_tid;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 717a7399bacb..e3bbf3ba8ee8 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -300,6 +300,7 @@ struct ceph_mds_request {
int r_fmode; /* file mode, if expecting cap */
int r_request_release_offset;
const struct cred *r_cred;
+ struct mnt_idmap *r_mnt_idmap;
struct timespec64 r_stamp;
/* for choosing which mds to send this request to */
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 01/12] fs: export mnt_idmap_get/mnt_idmap_put Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 02/12] ceph: stash idmapping in mdsc request Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 14:53 ` Christian Brauner
` (2 more replies)
2023-08-04 8:48 ` [PATCH v9 04/12] ceph: add enable_unsafe_idmap module parameter Alexander Mikhalitsyn
` (8 subsequent siblings)
11 siblings, 3 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Inode operations that create a new filesystem object such as ->mknod,
->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
Instead the caller's fs{g,u}id is used for the {g,u}id of the new
filesystem object.
In order to ensure that the correct {g,u}id is used map the caller's
fs{g,u}id for creation requests. This doesn't require complex changes.
It suffices to pass in the relevant idmapping recorded in the request
message. If this request message was triggered from an inode operation
that creates filesystem objects it will have passed down the relevant
idmaping. If this is a request message that was triggered from an inode
operation that doens't need to take idmappings into account the initial
idmapping is passed down which is an identity mapping.
This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
which adds two new fields (owner_{u,g}id) to the request head structure.
So, we need to ensure that MDS supports it otherwise we need to fail
any IO that comes through an idmapped mount because we can't process it
in a proper way. MDS server without such an extension will use caller_{u,g}id
fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
values are unmapped. At the same time we can't map these fields with an
idmapping as it can break UID/GID-based permission checks logic on the
MDS side. This problem was described with a lot of details at [1], [2].
[1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
Link: https://github.com/ceph/ceph/pull/52575
Link: https://tracker.ceph.com/issues/62217
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v7:
- reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
v8:
- properly handled case when old MDS used with new kernel client
---
fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
fs/ceph/mds_client.h | 5 +++-
include/linux/ceph/ceph_fs.h | 5 +++-
3 files changed, 52 insertions(+), 5 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8829f55103da..41e4bf3811c4 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
}
}
+static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
+{
+ if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
+ return 1;
+
+ if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
+ return 2;
+
+ return CEPH_MDS_REQUEST_HEAD_VERSION;
+}
+
static struct ceph_mds_request_head_legacy *
find_legacy_request_head(void *p, u64 features)
{
@@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
{
int mds = session->s_mds;
struct ceph_mds_client *mdsc = session->s_mdsc;
+ struct ceph_client *cl = mdsc->fsc->client;
struct ceph_msg *msg;
struct ceph_mds_request_head_legacy *lhead;
const char *path1 = NULL;
@@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
void *p, *end;
int ret;
bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
- bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
+ u16 request_head_version = mds_supported_head_version(session);
ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
req->r_parent, req->r_path1, req->r_ino1.ino,
@@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
*/
if (legacy)
len = sizeof(struct ceph_mds_request_head_legacy);
- else if (old_version)
+ else if (request_head_version == 1)
len = sizeof(struct ceph_mds_request_head_old);
+ else if (request_head_version == 2)
+ len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
else
len = sizeof(struct ceph_mds_request_head);
@@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
lhead = find_legacy_request_head(msg->front.iov_base,
session->s_con.peer_features);
+ if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
+ !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
+ pr_err_ratelimited_client(cl,
+ "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+ " is not supported by MDS. Fail request with -EIO.\n");
+
+ ret = -EIO;
+ goto out_err;
+ }
+
/*
* The ceph_mds_request_head_legacy didn't contain a version field, and
* one was added when we moved the message version from 3->4.
@@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
if (legacy) {
msg->hdr.version = cpu_to_le16(3);
p = msg->front.iov_base + sizeof(*lhead);
- } else if (old_version) {
+ } else if (request_head_version == 1) {
struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
msg->hdr.version = cpu_to_le16(4);
ohead->version = cpu_to_le16(1);
p = msg->front.iov_base + sizeof(*ohead);
+ } else if (request_head_version == 2) {
+ struct ceph_mds_request_head *nhead = msg->front.iov_base;
+
+ msg->hdr.version = cpu_to_le16(6);
+ nhead->version = cpu_to_le16(2);
+
+ p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
} else {
struct ceph_mds_request_head *nhead = msg->front.iov_base;
+ kuid_t owner_fsuid;
+ kgid_t owner_fsgid;
msg->hdr.version = cpu_to_le16(6);
nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
+ nhead->struct_len = sizeof(struct ceph_mds_request_head);
+
+ owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
+ VFSUIDT_INIT(req->r_cred->fsuid));
+ owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
+ VFSGIDT_INIT(req->r_cred->fsgid));
+ nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
+ nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
p = msg->front.iov_base + sizeof(*nhead);
}
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index e3bbf3ba8ee8..8f683e8203bd 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -33,8 +33,10 @@ enum ceph_feature_type {
CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
CEPHFS_FEATURE_OP_GETVXATTR,
CEPHFS_FEATURE_32BITS_RETRY_FWD,
+ CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
+ CEPHFS_FEATURE_HAS_OWNER_UIDGID,
- CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
+ CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
};
#define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
@@ -49,6 +51,7 @@ enum ceph_feature_type {
CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
CEPHFS_FEATURE_OP_GETVXATTR, \
CEPHFS_FEATURE_32BITS_RETRY_FWD, \
+ CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
}
/*
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 5f2301ee88bc..b91699b08f26 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
union ceph_mds_request_args args;
} __attribute__ ((packed));
-#define CEPH_MDS_REQUEST_HEAD_VERSION 2
+#define CEPH_MDS_REQUEST_HEAD_VERSION 3
struct ceph_mds_request_head_old {
__le16 version; /* struct version */
@@ -530,6 +530,9 @@ struct ceph_mds_request_head {
__le32 ext_num_retry; /* new count retry attempts */
__le32 ext_num_fwd; /* new count fwd attempts */
+
+ __le32 struct_len; /* to store size of struct ceph_mds_request_head */
+ __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
} __attribute__ ((packed));
/* cap/lease release record */
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 04/12] ceph: add enable_unsafe_idmap module parameter
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (2 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message() Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 14:54 ` Christian Brauner
2023-08-04 8:48 ` [PATCH v9 05/12] ceph: pass an idmapping to mknod/symlink/mkdir Alexander Mikhalitsyn
` (7 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
This parameter is used to decide if we allow
to perform IO on idmapped mount in case when MDS lacks
support of CEPHFS_FEATURE_HAS_OWNER_UIDGID feature.
In this case we can't properly handle MDS permission
checks and if UID/GID-based restrictions are enabled
on the MDS side then IO requests which go through an
idmapped mount may fail with -EACCESS/-EPERM.
Fortunately, for most of users it's not a case and
everything should work fine. But we put work "unsafe"
in the module parameter name to warn users about
possible problems with this feature and encourage
update of cephfs MDS.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Suggested-by: Stéphane Graber <stgraber@ubuntu.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
fs/ceph/mds_client.c | 28 +++++++++++++++++++++-------
fs/ceph/mds_client.h | 2 ++
fs/ceph/super.c | 5 +++++
3 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 41e4bf3811c4..42c0afbb6376 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2949,6 +2949,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
int ret;
bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
u16 request_head_version = mds_supported_head_version(session);
+ kuid_t caller_fsuid = req->r_cred->fsuid;
+ kgid_t caller_fsgid = req->r_cred->fsgid;
ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
req->r_parent, req->r_path1, req->r_ino1.ino,
@@ -3044,12 +3046,24 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
- pr_err_ratelimited_client(cl,
- "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
- " is not supported by MDS. Fail request with -EIO.\n");
+ if (enable_unsafe_idmap) {
+ pr_warn_once_client(cl,
+ "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+ " is not supported by MDS. UID/GID-based restrictions may"
+ " not work properly.\n");
+
+ caller_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
+ VFSUIDT_INIT(req->r_cred->fsuid));
+ caller_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
+ VFSGIDT_INIT(req->r_cred->fsgid));
+ } else {
+ pr_err_ratelimited_client(cl,
+ "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+ " is not supported by MDS. Fail request with -EIO.\n");
- ret = -EIO;
- goto out_err;
+ ret = -EIO;
+ goto out_err;
+ }
}
/*
@@ -3095,9 +3109,9 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
lhead->mdsmap_epoch = cpu_to_le32(mdsc->mdsmap->m_epoch);
lhead->op = cpu_to_le32(req->r_op);
lhead->caller_uid = cpu_to_le32(from_kuid(&init_user_ns,
- req->r_cred->fsuid));
+ caller_fsuid));
lhead->caller_gid = cpu_to_le32(from_kgid(&init_user_ns,
- req->r_cred->fsgid));
+ caller_fsgid));
lhead->ino = cpu_to_le64(req->r_deleg_ino);
lhead->args = req->r_args;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 8f683e8203bd..0945ae4cf3c5 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -619,4 +619,6 @@ static inline int ceph_wait_on_async_create(struct inode *inode)
extern int ceph_wait_on_conflict_unlink(struct dentry *dentry);
extern u64 ceph_get_deleg_ino(struct ceph_mds_session *session);
extern int ceph_restore_deleg_ino(struct ceph_mds_session *session, u64 ino);
+
+extern bool enable_unsafe_idmap;
#endif
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 49fd17fbba9f..18bfdfd48cef 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1680,6 +1680,11 @@ static const struct kernel_param_ops param_ops_mount_syntax = {
module_param_cb(mount_syntax_v1, ¶m_ops_mount_syntax, &mount_support, 0444);
module_param_cb(mount_syntax_v2, ¶m_ops_mount_syntax, &mount_support, 0444);
+bool enable_unsafe_idmap = false;
+module_param(enable_unsafe_idmap, bool, 0644);
+MODULE_PARM_DESC(enable_unsafe_idmap,
+ "Allow to use idmapped mounts with MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID");
+
module_init(init_ceph);
module_exit(exit_ceph);
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 05/12] ceph: pass an idmapping to mknod/symlink/mkdir
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (3 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 04/12] ceph: add enable_unsafe_idmap module parameter Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 06/12] ceph: allow idmapped getattr inode op Alexander Mikhalitsyn
` (6 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Enable mknod/symlink/mkdir iops to handle idmapped mounts.
This is just a matter of passing down the mount's idmapping.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v4:
- call mnt_idmap_get
v7:
- don't pass idmapping for ceph_rename (no need)
---
fs/ceph/dir.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index b752ed3ccdf0..397656ae7787 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -952,6 +952,7 @@ static int ceph_mknod(struct mnt_idmap *idmap, struct inode *dir,
req->r_parent = dir;
ihold(dir);
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_mnt_idmap = mnt_idmap_get(idmap);
req->r_args.mknod.mode = cpu_to_le32(mode);
req->r_args.mknod.rdev = cpu_to_le32(rdev);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL |
@@ -1067,6 +1068,7 @@ static int ceph_symlink(struct mnt_idmap *idmap, struct inode *dir,
}
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_mnt_idmap = mnt_idmap_get(idmap);
req->r_dentry = dget(dentry);
req->r_num_caps = 2;
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL |
@@ -1146,6 +1148,7 @@ static int ceph_mkdir(struct mnt_idmap *idmap, struct inode *dir,
req->r_parent = dir;
ihold(dir);
set_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
+ req->r_mnt_idmap = mnt_idmap_get(idmap);
req->r_args.mkdir.mode = cpu_to_le32(mode);
req->r_dentry_drop = CEPH_CAP_FILE_SHARED | CEPH_CAP_AUTH_EXCL |
CEPH_CAP_XATTR_EXCL;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 06/12] ceph: allow idmapped getattr inode op
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (4 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 05/12] ceph: pass an idmapping to mknod/symlink/mkdir Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 07/12] ceph: allow idmapped permission " Alexander Mikhalitsyn
` (5 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Enable ceph_getattr() to handle idmapped mounts. This is just a matter
of passing down the mount's idmapping.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
fs/ceph/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 3ff4f57f223f..136b68ccdbef 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -3034,7 +3034,7 @@ int ceph_getattr(struct mnt_idmap *idmap, const struct path *path,
return err;
}
- generic_fillattr(&nop_mnt_idmap, inode, stat);
+ generic_fillattr(idmap, inode, stat);
stat->ino = ceph_present_inode(inode);
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 07/12] ceph: allow idmapped permission inode op
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (5 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 06/12] ceph: allow idmapped getattr inode op Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 08/12] ceph: pass idmap to __ceph_setattr Alexander Mikhalitsyn
` (4 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Enable ceph_permission() to handle idmapped mounts. This is just a
matter of passing down the mount's idmapping.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
fs/ceph/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 136b68ccdbef..9b50861bd2b5 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2977,7 +2977,7 @@ int ceph_permission(struct mnt_idmap *idmap, struct inode *inode,
err = ceph_do_getattr(inode, CEPH_CAP_AUTH_SHARED, false);
if (!err)
- err = generic_permission(&nop_mnt_idmap, inode, mask);
+ err = generic_permission(idmap, inode, mask);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 08/12] ceph: pass idmap to __ceph_setattr
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (6 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 07/12] ceph: allow idmapped permission " Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 14:55 ` Christian Brauner
2023-08-04 8:48 ` [PATCH v9 09/12] ceph: allow idmapped setattr inode op Alexander Mikhalitsyn
` (3 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
Just pass down the mount's idmapping to __ceph_setattr,
because we will need it later.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: brauner@kernel.org
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
fs/ceph/acl.c | 4 ++--
fs/ceph/crypto.c | 2 +-
fs/ceph/inode.c | 5 +++--
fs/ceph/super.h | 3 ++-
4 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 32b26deb1741..89280c168acb 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -142,7 +142,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
newattrs.ia_ctime = current_time(inode);
newattrs.ia_mode = new_mode;
newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- ret = __ceph_setattr(inode, &newattrs, NULL);
+ ret = __ceph_setattr(idmap, inode, &newattrs, NULL);
if (ret)
goto out_free;
}
@@ -153,7 +153,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
newattrs.ia_ctime = old_ctime;
newattrs.ia_mode = old_mode;
newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- __ceph_setattr(inode, &newattrs, NULL);
+ __ceph_setattr(idmap, inode, &newattrs, NULL);
}
goto out_free;
}
diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index b9071bba3b08..8cf32e7f59bf 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -112,7 +112,7 @@ static int ceph_crypt_set_context(struct inode *inode, const void *ctx, size_t l
cia.fscrypt_auth = cfa;
- ret = __ceph_setattr(inode, &attr, &cia);
+ ret = __ceph_setattr(&nop_mnt_idmap, inode, &attr, &cia);
if (ret == 0)
inode_set_flags(inode, S_ENCRYPTED, S_ENCRYPTED);
kfree(cia.fscrypt_auth);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 9b50861bd2b5..6c4cc009d819 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2466,7 +2466,8 @@ static int fill_fscrypt_truncate(struct inode *inode,
return ret;
}
-int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia)
+int __ceph_setattr(struct mnt_idmap *idmap, struct inode *inode,
+ struct iattr *attr, struct ceph_iattr *cia)
{
struct ceph_inode_info *ci = ceph_inode(inode);
unsigned int ia_valid = attr->ia_valid;
@@ -2818,7 +2819,7 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
ceph_quota_is_max_bytes_exceeded(inode, attr->ia_size))
return -EDQUOT;
- err = __ceph_setattr(inode, attr, NULL);
+ err = __ceph_setattr(idmap, inode, attr, NULL);
if (err >= 0 && (attr->ia_valid & ATTR_MODE))
err = posix_acl_chmod(&nop_mnt_idmap, dentry, attr->ia_mode);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 4e78de1be23e..e729cde7b4a0 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1101,7 +1101,8 @@ struct ceph_iattr {
struct ceph_fscrypt_auth *fscrypt_auth;
};
-extern int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *cia);
+extern int __ceph_setattr(struct mnt_idmap *idmap, struct inode *inode,
+ struct iattr *attr, struct ceph_iattr *cia);
extern int ceph_setattr(struct mnt_idmap *idmap,
struct dentry *dentry, struct iattr *attr);
extern int ceph_getattr(struct mnt_idmap *idmap,
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 09/12] ceph: allow idmapped setattr inode op
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (7 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 08/12] ceph: pass idmap to __ceph_setattr Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 10/12] ceph/acl: allow idmapped set_acl " Alexander Mikhalitsyn
` (2 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Enable __ceph_setattr() to handle idmapped mounts. This is just a matter
of passing down the mount's idmapping.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
[ adapted to b27c82e12965 ("attr: port attribute changes to new types") ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v4:
- introduced fsuid/fsgid local variables
v3:
- reworked as Christian suggested here:
https://lore.kernel.org/lkml/20230602-vorzeichen-praktikum-f17931692301@brauner/
---
fs/ceph/inode.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 6c4cc009d819..0a8cc0327f85 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2553,33 +2553,37 @@ int __ceph_setattr(struct mnt_idmap *idmap, struct inode *inode,
#endif /* CONFIG_FS_ENCRYPTION */
if (ia_valid & ATTR_UID) {
+ kuid_t fsuid = from_vfsuid(idmap, i_user_ns(inode), attr->ia_vfsuid);
+
doutc(cl, "%p %llx.%llx uid %d -> %d\n", inode,
ceph_vinop(inode),
from_kuid(&init_user_ns, inode->i_uid),
from_kuid(&init_user_ns, attr->ia_uid));
if (issued & CEPH_CAP_AUTH_EXCL) {
- inode->i_uid = attr->ia_uid;
+ inode->i_uid = fsuid;
dirtied |= CEPH_CAP_AUTH_EXCL;
} else if ((issued & CEPH_CAP_AUTH_SHARED) == 0 ||
- !uid_eq(attr->ia_uid, inode->i_uid)) {
+ !uid_eq(fsuid, inode->i_uid)) {
req->r_args.setattr.uid = cpu_to_le32(
- from_kuid(&init_user_ns, attr->ia_uid));
+ from_kuid(&init_user_ns, fsuid));
mask |= CEPH_SETATTR_UID;
release |= CEPH_CAP_AUTH_SHARED;
}
}
if (ia_valid & ATTR_GID) {
+ kgid_t fsgid = from_vfsgid(idmap, i_user_ns(inode), attr->ia_vfsgid);
+
doutc(cl, "%p %llx.%llx gid %d -> %d\n", inode,
ceph_vinop(inode),
from_kgid(&init_user_ns, inode->i_gid),
from_kgid(&init_user_ns, attr->ia_gid));
if (issued & CEPH_CAP_AUTH_EXCL) {
- inode->i_gid = attr->ia_gid;
+ inode->i_gid = fsgid;
dirtied |= CEPH_CAP_AUTH_EXCL;
} else if ((issued & CEPH_CAP_AUTH_SHARED) == 0 ||
- !gid_eq(attr->ia_gid, inode->i_gid)) {
+ !gid_eq(fsgid, inode->i_gid)) {
req->r_args.setattr.gid = cpu_to_le32(
- from_kgid(&init_user_ns, attr->ia_gid));
+ from_kgid(&init_user_ns, fsgid));
mask |= CEPH_SETATTR_GID;
release |= CEPH_CAP_AUTH_SHARED;
}
@@ -2807,7 +2811,7 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
if (err)
return err;
- err = setattr_prepare(&nop_mnt_idmap, dentry, attr);
+ err = setattr_prepare(idmap, dentry, attr);
if (err != 0)
return err;
@@ -2822,7 +2826,7 @@ int ceph_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
err = __ceph_setattr(idmap, inode, attr, NULL);
if (err >= 0 && (attr->ia_valid & ATTR_MODE))
- err = posix_acl_chmod(&nop_mnt_idmap, dentry, attr->ia_mode);
+ err = posix_acl_chmod(idmap, dentry, attr->ia_mode);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 10/12] ceph/acl: allow idmapped set_acl inode op
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (8 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 09/12] ceph: allow idmapped setattr inode op Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 11/12] ceph/file: allow idmapped atomic_open " Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 12/12] ceph: allow idmapped mounts Alexander Mikhalitsyn
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Enable ceph_set_acl() to handle idmapped mounts. This is just a matter
of passing down the mount's idmapping.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
fs/ceph/acl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/acl.c b/fs/ceph/acl.c
index 89280c168acb..ffc6a1c02388 100644
--- a/fs/ceph/acl.c
+++ b/fs/ceph/acl.c
@@ -107,7 +107,7 @@ int ceph_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
case ACL_TYPE_ACCESS:
name = XATTR_NAME_POSIX_ACL_ACCESS;
if (acl) {
- ret = posix_acl_update_mode(&nop_mnt_idmap, inode,
+ ret = posix_acl_update_mode(idmap, inode,
&new_mode, &acl);
if (ret)
goto out;
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 11/12] ceph/file: allow idmapped atomic_open inode op
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (9 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 10/12] ceph/acl: allow idmapped set_acl " Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 12/12] ceph: allow idmapped mounts Alexander Mikhalitsyn
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Enable ceph_atomic_open() to handle idmapped mounts. This is just a
matter of passing down the mount's idmapping.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
[ adapted to 5fadbd9929 ("ceph: rely on vfs for setgid stripping") ]
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v4:
- call mnt_idmap_get
---
fs/ceph/file.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 7470daafe595..f73d8b760682 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -668,7 +668,9 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
in.truncate_seq = cpu_to_le32(1);
in.truncate_size = cpu_to_le64(-1ULL);
in.xattr_version = cpu_to_le64(1);
- in.uid = cpu_to_le32(from_kuid(&init_user_ns, current_fsuid()));
+ in.uid = cpu_to_le32(from_kuid(&init_user_ns,
+ mapped_fsuid(req->r_mnt_idmap,
+ &init_user_ns)));
if (dir->i_mode & S_ISGID) {
in.gid = cpu_to_le32(from_kgid(&init_user_ns, dir->i_gid));
@@ -676,7 +678,9 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
if (S_ISDIR(mode))
mode |= S_ISGID;
} else {
- in.gid = cpu_to_le32(from_kgid(&init_user_ns, current_fsgid()));
+ in.gid = cpu_to_le32(from_kgid(&init_user_ns,
+ mapped_fsgid(req->r_mnt_idmap,
+ &init_user_ns)));
}
in.mode = cpu_to_le32((u32)mode);
@@ -743,6 +747,7 @@ static int ceph_finish_async_create(struct inode *dir, struct inode *inode,
int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
struct file *file, unsigned flags, umode_t mode)
{
+ struct mnt_idmap *idmap = file_mnt_idmap(file);
struct ceph_fs_client *fsc = ceph_sb_to_fs_client(dir->i_sb);
struct ceph_client *cl = fsc->client;
struct ceph_mds_client *mdsc = fsc->mdsc;
@@ -802,6 +807,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
mask |= CEPH_CAP_XATTR_SHARED;
req->r_args.open.mask = cpu_to_le32(mask);
req->r_parent = dir;
+ req->r_mnt_idmap = mnt_idmap_get(idmap);
ihold(dir);
if (IS_ENCRYPTED(dir)) {
set_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags);
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v9 12/12] ceph: allow idmapped mounts
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
` (10 preceding siblings ...)
2023-08-04 8:48 ` [PATCH v9 11/12] ceph/file: allow idmapped atomic_open " Alexander Mikhalitsyn
@ 2023-08-04 8:48 ` Alexander Mikhalitsyn
11 siblings, 0 replies; 25+ messages in thread
From: Alexander Mikhalitsyn @ 2023-08-04 8:48 UTC (permalink / raw)
To: xiubli
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, Alexander Mikhalitsyn, linux-kernel
From: Christian Brauner <brauner@kernel.org>
Now that we converted cephfs internally to account for idmapped mounts
allow the creation of idmapped mounts on by setting the FS_ALLOW_IDMAP
flag.
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
fs/ceph/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 18bfdfd48cef..ad6d40309ebe 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -1581,7 +1581,7 @@ static struct file_system_type ceph_fs_type = {
.name = "ceph",
.init_fs_context = ceph_init_fs_context,
.kill_sb = ceph_kill_sb,
- .fs_flags = FS_RENAME_DOES_D_MOVE,
+ .fs_flags = FS_RENAME_DOES_D_MOVE | FS_ALLOW_IDMAP,
};
MODULE_ALIAS_FS("ceph");
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-04 8:48 ` [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message() Alexander Mikhalitsyn
@ 2023-08-04 14:53 ` Christian Brauner
2023-08-05 1:55 ` kernel test robot
2023-08-07 1:24 ` Xiubo Li
2 siblings, 0 replies; 25+ messages in thread
From: Christian Brauner @ 2023-08-04 14:53 UTC (permalink / raw)
To: Alexander Mikhalitsyn
Cc: xiubli, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On Fri, Aug 04, 2023 at 10:48:49AM +0200, Alexander Mikhalitsyn wrote:
> From: Christian Brauner <brauner@kernel.org>
>
> Inode operations that create a new filesystem object such as ->mknod,
> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> filesystem object.
>
> In order to ensure that the correct {g,u}id is used map the caller's
> fs{g,u}id for creation requests. This doesn't require complex changes.
> It suffices to pass in the relevant idmapping recorded in the request
> message. If this request message was triggered from an inode operation
> that creates filesystem objects it will have passed down the relevant
> idmaping. If this is a request message that was triggered from an inode
> operation that doens't need to take idmappings into account the initial
> idmapping is passed down which is an identity mapping.
>
> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> which adds two new fields (owner_{u,g}id) to the request head structure.
> So, we need to ensure that MDS supports it otherwise we need to fail
> any IO that comes through an idmapped mount because we can't process it
> in a proper way. MDS server without such an extension will use caller_{u,g}id
> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> values are unmapped. At the same time we can't map these fields with an
> idmapping as it can break UID/GID-based permission checks logic on the
> MDS side. This problem was described with a lot of details at [1], [2].
>
> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>
> Link: https://github.com/ceph/ceph/pull/52575
> Link: https://tracker.ceph.com/issues/62217
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
I like the new extension,
Acked-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 04/12] ceph: add enable_unsafe_idmap module parameter
2023-08-04 8:48 ` [PATCH v9 04/12] ceph: add enable_unsafe_idmap module parameter Alexander Mikhalitsyn
@ 2023-08-04 14:54 ` Christian Brauner
0 siblings, 0 replies; 25+ messages in thread
From: Christian Brauner @ 2023-08-04 14:54 UTC (permalink / raw)
To: Alexander Mikhalitsyn
Cc: xiubli, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On Fri, Aug 04, 2023 at 10:48:50AM +0200, Alexander Mikhalitsyn wrote:
> This parameter is used to decide if we allow
> to perform IO on idmapped mount in case when MDS lacks
> support of CEPHFS_FEATURE_HAS_OWNER_UIDGID feature.
>
> In this case we can't properly handle MDS permission
> checks and if UID/GID-based restrictions are enabled
> on the MDS side then IO requests which go through an
> idmapped mount may fail with -EACCESS/-EPERM.
> Fortunately, for most of users it's not a case and
> everything should work fine. But we put work "unsafe"
> in the module parameter name to warn users about
> possible problems with this feature and encourage
> update of cephfs MDS.
>
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> Suggested-by: Stéphane Graber <stgraber@ubuntu.com>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
Seems good enough,
Acked-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 08/12] ceph: pass idmap to __ceph_setattr
2023-08-04 8:48 ` [PATCH v9 08/12] ceph: pass idmap to __ceph_setattr Alexander Mikhalitsyn
@ 2023-08-04 14:55 ` Christian Brauner
0 siblings, 0 replies; 25+ messages in thread
From: Christian Brauner @ 2023-08-04 14:55 UTC (permalink / raw)
To: Alexander Mikhalitsyn
Cc: xiubli, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On Fri, Aug 04, 2023 at 10:48:54AM +0200, Alexander Mikhalitsyn wrote:
> Just pass down the mount's idmapping to __ceph_setattr,
> because we will need it later.
>
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: brauner@kernel.org
Cc: Christian Brauner <brauner@kernel.org>
> Cc: ceph-devel@vger.kernel.org
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
Acked-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-04 8:48 ` [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message() Alexander Mikhalitsyn
2023-08-04 14:53 ` Christian Brauner
@ 2023-08-05 1:55 ` kernel test robot
2023-08-05 9:15 ` Aleksandr Mikhalitsyn
2023-08-07 1:24 ` Xiubo Li
2 siblings, 1 reply; 25+ messages in thread
From: kernel test robot @ 2023-08-05 1:55 UTC (permalink / raw)
To: Alexander Mikhalitsyn, xiubli
Cc: oe-kbuild-all, brauner, stgraber, linux-fsdevel, Jeff Layton,
Ilya Dryomov, ceph-devel, Alexander Mikhalitsyn, linux-kernel
Hi Alexander,
kernel test robot noticed the following build warnings:
[auto build test WARNING on ceph-client/testing]
[cannot apply to ceph-client/for-linus brauner-vfs/vfs.all linus/master vfs-idmapping/for-next v6.5-rc4 next-20230804]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Alexander-Mikhalitsyn/fs-export-mnt_idmap_get-mnt_idmap_put/20230804-165330
base: https://github.com/ceph/ceph-client.git testing
patch link: https://lore.kernel.org/r/20230804084858.126104-4-aleksandr.mikhalitsyn%40canonical.com
patch subject: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
config: um-randconfig-r091-20230730 (https://download.01.org/0day-ci/archive/20230805/202308050925.ifGg1BUH-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230805/202308050925.ifGg1BUH-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308050925.ifGg1BUH-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> fs/ceph/mds_client.c:3082:35: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __le32 [usertype] struct_len @@ got unsigned long @@
fs/ceph/mds_client.c:3082:35: sparse: expected restricted __le32 [usertype] struct_len
fs/ceph/mds_client.c:3082:35: sparse: got unsigned long
vim +3082 fs/ceph/mds_client.c
2927
2928 /*
2929 * called under mdsc->mutex
2930 */
2931 static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
2932 struct ceph_mds_request *req,
2933 bool drop_cap_releases)
2934 {
2935 int mds = session->s_mds;
2936 struct ceph_mds_client *mdsc = session->s_mdsc;
2937 struct ceph_client *cl = mdsc->fsc->client;
2938 struct ceph_msg *msg;
2939 struct ceph_mds_request_head_legacy *lhead;
2940 const char *path1 = NULL;
2941 const char *path2 = NULL;
2942 u64 ino1 = 0, ino2 = 0;
2943 int pathlen1 = 0, pathlen2 = 0;
2944 bool freepath1 = false, freepath2 = false;
2945 struct dentry *old_dentry = NULL;
2946 int len;
2947 u16 releases;
2948 void *p, *end;
2949 int ret;
2950 bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
2951 u16 request_head_version = mds_supported_head_version(session);
2952
2953 ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
2954 req->r_parent, req->r_path1, req->r_ino1.ino,
2955 &path1, &pathlen1, &ino1, &freepath1,
2956 test_bit(CEPH_MDS_R_PARENT_LOCKED,
2957 &req->r_req_flags));
2958 if (ret < 0) {
2959 msg = ERR_PTR(ret);
2960 goto out;
2961 }
2962
2963 /* If r_old_dentry is set, then assume that its parent is locked */
2964 if (req->r_old_dentry &&
2965 !(req->r_old_dentry->d_flags & DCACHE_DISCONNECTED))
2966 old_dentry = req->r_old_dentry;
2967 ret = set_request_path_attr(mdsc, NULL, old_dentry,
2968 req->r_old_dentry_dir,
2969 req->r_path2, req->r_ino2.ino,
2970 &path2, &pathlen2, &ino2, &freepath2, true);
2971 if (ret < 0) {
2972 msg = ERR_PTR(ret);
2973 goto out_free1;
2974 }
2975
2976 req->r_altname = get_fscrypt_altname(req, &req->r_altname_len);
2977 if (IS_ERR(req->r_altname)) {
2978 msg = ERR_CAST(req->r_altname);
2979 req->r_altname = NULL;
2980 goto out_free2;
2981 }
2982
2983 /*
2984 * For old cephs without supporting the 32bit retry/fwd feature
2985 * it will copy the raw memories directly when decoding the
2986 * requests. While new cephs will decode the head depending the
2987 * version member, so we need to make sure it will be compatible
2988 * with them both.
2989 */
2990 if (legacy)
2991 len = sizeof(struct ceph_mds_request_head_legacy);
2992 else if (request_head_version == 1)
2993 len = sizeof(struct ceph_mds_request_head_old);
2994 else if (request_head_version == 2)
2995 len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
2996 else
2997 len = sizeof(struct ceph_mds_request_head);
2998
2999 /* filepaths */
3000 len += 2 * (1 + sizeof(u32) + sizeof(u64));
3001 len += pathlen1 + pathlen2;
3002
3003 /* cap releases */
3004 len += sizeof(struct ceph_mds_request_release) *
3005 (!!req->r_inode_drop + !!req->r_dentry_drop +
3006 !!req->r_old_inode_drop + !!req->r_old_dentry_drop);
3007
3008 if (req->r_dentry_drop)
3009 len += pathlen1;
3010 if (req->r_old_dentry_drop)
3011 len += pathlen2;
3012
3013 /* MClientRequest tail */
3014
3015 /* req->r_stamp */
3016 len += sizeof(struct ceph_timespec);
3017
3018 /* gid list */
3019 len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
3020
3021 /* alternate name */
3022 len += sizeof(u32) + req->r_altname_len;
3023
3024 /* fscrypt_auth */
3025 len += sizeof(u32); // fscrypt_auth
3026 if (req->r_fscrypt_auth)
3027 len += ceph_fscrypt_auth_len(req->r_fscrypt_auth);
3028
3029 /* fscrypt_file */
3030 len += sizeof(u32);
3031 if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags))
3032 len += sizeof(__le64);
3033
3034 msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false);
3035 if (!msg) {
3036 msg = ERR_PTR(-ENOMEM);
3037 goto out_free2;
3038 }
3039
3040 msg->hdr.tid = cpu_to_le64(req->r_tid);
3041
3042 lhead = find_legacy_request_head(msg->front.iov_base,
3043 session->s_con.peer_features);
3044
3045 if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
3046 !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
3047 pr_err_ratelimited_client(cl,
3048 "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
3049 " is not supported by MDS. Fail request with -EIO.\n");
3050
3051 ret = -EIO;
3052 goto out_err;
3053 }
3054
3055 /*
3056 * The ceph_mds_request_head_legacy didn't contain a version field, and
3057 * one was added when we moved the message version from 3->4.
3058 */
3059 if (legacy) {
3060 msg->hdr.version = cpu_to_le16(3);
3061 p = msg->front.iov_base + sizeof(*lhead);
3062 } else if (request_head_version == 1) {
3063 struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
3064
3065 msg->hdr.version = cpu_to_le16(4);
3066 ohead->version = cpu_to_le16(1);
3067 p = msg->front.iov_base + sizeof(*ohead);
3068 } else if (request_head_version == 2) {
3069 struct ceph_mds_request_head *nhead = msg->front.iov_base;
3070
3071 msg->hdr.version = cpu_to_le16(6);
3072 nhead->version = cpu_to_le16(2);
3073
3074 p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
3075 } else {
3076 struct ceph_mds_request_head *nhead = msg->front.iov_base;
3077 kuid_t owner_fsuid;
3078 kgid_t owner_fsgid;
3079
3080 msg->hdr.version = cpu_to_le16(6);
3081 nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> 3082 nhead->struct_len = sizeof(struct ceph_mds_request_head);
3083
3084 owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
3085 VFSUIDT_INIT(req->r_cred->fsuid));
3086 owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
3087 VFSGIDT_INIT(req->r_cred->fsgid));
3088 nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
3089 nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
3090 p = msg->front.iov_base + sizeof(*nhead);
3091 }
3092
3093 end = msg->front.iov_base + msg->front.iov_len;
3094
3095 lhead->mdsmap_epoch = cpu_to_le32(mdsc->mdsmap->m_epoch);
3096 lhead->op = cpu_to_le32(req->r_op);
3097 lhead->caller_uid = cpu_to_le32(from_kuid(&init_user_ns,
3098 req->r_cred->fsuid));
3099 lhead->caller_gid = cpu_to_le32(from_kgid(&init_user_ns,
3100 req->r_cred->fsgid));
3101 lhead->ino = cpu_to_le64(req->r_deleg_ino);
3102 lhead->args = req->r_args;
3103
3104 ceph_encode_filepath(&p, end, ino1, path1);
3105 ceph_encode_filepath(&p, end, ino2, path2);
3106
3107 /* make note of release offset, in case we need to replay */
3108 req->r_request_release_offset = p - msg->front.iov_base;
3109
3110 /* cap releases */
3111 releases = 0;
3112 if (req->r_inode_drop)
3113 releases += ceph_encode_inode_release(&p,
3114 req->r_inode ? req->r_inode : d_inode(req->r_dentry),
3115 mds, req->r_inode_drop, req->r_inode_unless,
3116 req->r_op == CEPH_MDS_OP_READDIR);
3117 if (req->r_dentry_drop) {
3118 ret = ceph_encode_dentry_release(&p, req->r_dentry,
3119 req->r_parent, mds, req->r_dentry_drop,
3120 req->r_dentry_unless);
3121 if (ret < 0)
3122 goto out_err;
3123 releases += ret;
3124 }
3125 if (req->r_old_dentry_drop) {
3126 ret = ceph_encode_dentry_release(&p, req->r_old_dentry,
3127 req->r_old_dentry_dir, mds,
3128 req->r_old_dentry_drop,
3129 req->r_old_dentry_unless);
3130 if (ret < 0)
3131 goto out_err;
3132 releases += ret;
3133 }
3134 if (req->r_old_inode_drop)
3135 releases += ceph_encode_inode_release(&p,
3136 d_inode(req->r_old_dentry),
3137 mds, req->r_old_inode_drop, req->r_old_inode_unless, 0);
3138
3139 if (drop_cap_releases) {
3140 releases = 0;
3141 p = msg->front.iov_base + req->r_request_release_offset;
3142 }
3143
3144 lhead->num_releases = cpu_to_le16(releases);
3145
3146 encode_mclientrequest_tail(&p, req);
3147
3148 if (WARN_ON_ONCE(p > end)) {
3149 ceph_msg_put(msg);
3150 msg = ERR_PTR(-ERANGE);
3151 goto out_free2;
3152 }
3153
3154 msg->front.iov_len = p - msg->front.iov_base;
3155 msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
3156
3157 if (req->r_pagelist) {
3158 struct ceph_pagelist *pagelist = req->r_pagelist;
3159 ceph_msg_data_add_pagelist(msg, pagelist);
3160 msg->hdr.data_len = cpu_to_le32(pagelist->length);
3161 } else {
3162 msg->hdr.data_len = 0;
3163 }
3164
3165 msg->hdr.data_off = cpu_to_le16(0);
3166
3167 out_free2:
3168 if (freepath2)
3169 ceph_mdsc_free_path((char *)path2, pathlen2);
3170 out_free1:
3171 if (freepath1)
3172 ceph_mdsc_free_path((char *)path1, pathlen1);
3173 out:
3174 return msg;
3175 out_err:
3176 ceph_msg_put(msg);
3177 msg = ERR_PTR(ret);
3178 goto out_free2;
3179 }
3180
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-05 1:55 ` kernel test robot
@ 2023-08-05 9:15 ` Aleksandr Mikhalitsyn
0 siblings, 0 replies; 25+ messages in thread
From: Aleksandr Mikhalitsyn @ 2023-08-05 9:15 UTC (permalink / raw)
To: kernel test robot
Cc: xiubli, oe-kbuild-all, brauner, stgraber, linux-fsdevel,
Jeff Layton, Ilya Dryomov, ceph-devel, linux-kernel
On Sat, Aug 5, 2023 at 3:56 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Alexander,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on ceph-client/testing]
> [cannot apply to ceph-client/for-linus brauner-vfs/vfs.all linus/master vfs-idmapping/for-next v6.5-rc4 next-20230804]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Alexander-Mikhalitsyn/fs-export-mnt_idmap_get-mnt_idmap_put/20230804-165330
> base: https://github.com/ceph/ceph-client.git testing
> patch link: https://lore.kernel.org/r/20230804084858.126104-4-aleksandr.mikhalitsyn%40canonical.com
> patch subject: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
> config: um-randconfig-r091-20230730 (https://download.01.org/0day-ci/archive/20230805/202308050925.ifGg1BUH-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce: (https://download.01.org/0day-ci/archive/20230805/202308050925.ifGg1BUH-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202308050925.ifGg1BUH-lkp@intel.com/
>
> sparse warnings: (new ones prefixed by >>)
> >> fs/ceph/mds_client.c:3082:35: sparse: sparse: incorrect type in assignment (different base types) @@ expected restricted __le32 [usertype] struct_len @@ got unsigned long @@
> fs/ceph/mds_client.c:3082:35: sparse: expected restricted __le32 [usertype] struct_len
> fs/ceph/mds_client.c:3082:35: sparse: got unsigned long
>
> vim +3082 fs/ceph/mds_client.c
>
> 2927
> 2928 /*
> 2929 * called under mdsc->mutex
> 2930 */
> 2931 static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> 2932 struct ceph_mds_request *req,
> 2933 bool drop_cap_releases)
> 2934 {
> 2935 int mds = session->s_mds;
> 2936 struct ceph_mds_client *mdsc = session->s_mdsc;
> 2937 struct ceph_client *cl = mdsc->fsc->client;
> 2938 struct ceph_msg *msg;
> 2939 struct ceph_mds_request_head_legacy *lhead;
> 2940 const char *path1 = NULL;
> 2941 const char *path2 = NULL;
> 2942 u64 ino1 = 0, ino2 = 0;
> 2943 int pathlen1 = 0, pathlen2 = 0;
> 2944 bool freepath1 = false, freepath2 = false;
> 2945 struct dentry *old_dentry = NULL;
> 2946 int len;
> 2947 u16 releases;
> 2948 void *p, *end;
> 2949 int ret;
> 2950 bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> 2951 u16 request_head_version = mds_supported_head_version(session);
> 2952
> 2953 ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> 2954 req->r_parent, req->r_path1, req->r_ino1.ino,
> 2955 &path1, &pathlen1, &ino1, &freepath1,
> 2956 test_bit(CEPH_MDS_R_PARENT_LOCKED,
> 2957 &req->r_req_flags));
> 2958 if (ret < 0) {
> 2959 msg = ERR_PTR(ret);
> 2960 goto out;
> 2961 }
> 2962
> 2963 /* If r_old_dentry is set, then assume that its parent is locked */
> 2964 if (req->r_old_dentry &&
> 2965 !(req->r_old_dentry->d_flags & DCACHE_DISCONNECTED))
> 2966 old_dentry = req->r_old_dentry;
> 2967 ret = set_request_path_attr(mdsc, NULL, old_dentry,
> 2968 req->r_old_dentry_dir,
> 2969 req->r_path2, req->r_ino2.ino,
> 2970 &path2, &pathlen2, &ino2, &freepath2, true);
> 2971 if (ret < 0) {
> 2972 msg = ERR_PTR(ret);
> 2973 goto out_free1;
> 2974 }
> 2975
> 2976 req->r_altname = get_fscrypt_altname(req, &req->r_altname_len);
> 2977 if (IS_ERR(req->r_altname)) {
> 2978 msg = ERR_CAST(req->r_altname);
> 2979 req->r_altname = NULL;
> 2980 goto out_free2;
> 2981 }
> 2982
> 2983 /*
> 2984 * For old cephs without supporting the 32bit retry/fwd feature
> 2985 * it will copy the raw memories directly when decoding the
> 2986 * requests. While new cephs will decode the head depending the
> 2987 * version member, so we need to make sure it will be compatible
> 2988 * with them both.
> 2989 */
> 2990 if (legacy)
> 2991 len = sizeof(struct ceph_mds_request_head_legacy);
> 2992 else if (request_head_version == 1)
> 2993 len = sizeof(struct ceph_mds_request_head_old);
> 2994 else if (request_head_version == 2)
> 2995 len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> 2996 else
> 2997 len = sizeof(struct ceph_mds_request_head);
> 2998
> 2999 /* filepaths */
> 3000 len += 2 * (1 + sizeof(u32) + sizeof(u64));
> 3001 len += pathlen1 + pathlen2;
> 3002
> 3003 /* cap releases */
> 3004 len += sizeof(struct ceph_mds_request_release) *
> 3005 (!!req->r_inode_drop + !!req->r_dentry_drop +
> 3006 !!req->r_old_inode_drop + !!req->r_old_dentry_drop);
> 3007
> 3008 if (req->r_dentry_drop)
> 3009 len += pathlen1;
> 3010 if (req->r_old_dentry_drop)
> 3011 len += pathlen2;
> 3012
> 3013 /* MClientRequest tail */
> 3014
> 3015 /* req->r_stamp */
> 3016 len += sizeof(struct ceph_timespec);
> 3017
> 3018 /* gid list */
> 3019 len += sizeof(u32) + (sizeof(u64) * req->r_cred->group_info->ngroups);
> 3020
> 3021 /* alternate name */
> 3022 len += sizeof(u32) + req->r_altname_len;
> 3023
> 3024 /* fscrypt_auth */
> 3025 len += sizeof(u32); // fscrypt_auth
> 3026 if (req->r_fscrypt_auth)
> 3027 len += ceph_fscrypt_auth_len(req->r_fscrypt_auth);
> 3028
> 3029 /* fscrypt_file */
> 3030 len += sizeof(u32);
> 3031 if (test_bit(CEPH_MDS_R_FSCRYPT_FILE, &req->r_req_flags))
> 3032 len += sizeof(__le64);
> 3033
> 3034 msg = ceph_msg_new2(CEPH_MSG_CLIENT_REQUEST, len, 1, GFP_NOFS, false);
> 3035 if (!msg) {
> 3036 msg = ERR_PTR(-ENOMEM);
> 3037 goto out_free2;
> 3038 }
> 3039
> 3040 msg->hdr.tid = cpu_to_le64(req->r_tid);
> 3041
> 3042 lhead = find_legacy_request_head(msg->front.iov_base,
> 3043 session->s_con.peer_features);
> 3044
> 3045 if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> 3046 !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> 3047 pr_err_ratelimited_client(cl,
> 3048 "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> 3049 " is not supported by MDS. Fail request with -EIO.\n");
> 3050
> 3051 ret = -EIO;
> 3052 goto out_err;
> 3053 }
> 3054
> 3055 /*
> 3056 * The ceph_mds_request_head_legacy didn't contain a version field, and
> 3057 * one was added when we moved the message version from 3->4.
> 3058 */
> 3059 if (legacy) {
> 3060 msg->hdr.version = cpu_to_le16(3);
> 3061 p = msg->front.iov_base + sizeof(*lhead);
> 3062 } else if (request_head_version == 1) {
> 3063 struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> 3064
> 3065 msg->hdr.version = cpu_to_le16(4);
> 3066 ohead->version = cpu_to_le16(1);
> 3067 p = msg->front.iov_base + sizeof(*ohead);
> 3068 } else if (request_head_version == 2) {
> 3069 struct ceph_mds_request_head *nhead = msg->front.iov_base;
> 3070
> 3071 msg->hdr.version = cpu_to_le16(6);
> 3072 nhead->version = cpu_to_le16(2);
> 3073
> 3074 p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> 3075 } else {
> 3076 struct ceph_mds_request_head *nhead = msg->front.iov_base;
> 3077 kuid_t owner_fsuid;
> 3078 kgid_t owner_fsgid;
> 3079
> 3080 msg->hdr.version = cpu_to_le16(6);
> 3081 nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> > 3082 nhead->struct_len = sizeof(struct ceph_mds_request_head);
should be
nhead->struct_len = cpu_to_le32(sizeof(struct ceph_mds_request_head));
Fixed and pushed https://github.com/mihalicyn/linux/commits/fs.idmapped.ceph
> 3083
> 3084 owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> 3085 VFSUIDT_INIT(req->r_cred->fsuid));
> 3086 owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> 3087 VFSGIDT_INIT(req->r_cred->fsgid));
> 3088 nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> 3089 nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> 3090 p = msg->front.iov_base + sizeof(*nhead);
> 3091 }
> 3092
> 3093 end = msg->front.iov_base + msg->front.iov_len;
> 3094
> 3095 lhead->mdsmap_epoch = cpu_to_le32(mdsc->mdsmap->m_epoch);
> 3096 lhead->op = cpu_to_le32(req->r_op);
> 3097 lhead->caller_uid = cpu_to_le32(from_kuid(&init_user_ns,
> 3098 req->r_cred->fsuid));
> 3099 lhead->caller_gid = cpu_to_le32(from_kgid(&init_user_ns,
> 3100 req->r_cred->fsgid));
> 3101 lhead->ino = cpu_to_le64(req->r_deleg_ino);
> 3102 lhead->args = req->r_args;
> 3103
> 3104 ceph_encode_filepath(&p, end, ino1, path1);
> 3105 ceph_encode_filepath(&p, end, ino2, path2);
> 3106
> 3107 /* make note of release offset, in case we need to replay */
> 3108 req->r_request_release_offset = p - msg->front.iov_base;
> 3109
> 3110 /* cap releases */
> 3111 releases = 0;
> 3112 if (req->r_inode_drop)
> 3113 releases += ceph_encode_inode_release(&p,
> 3114 req->r_inode ? req->r_inode : d_inode(req->r_dentry),
> 3115 mds, req->r_inode_drop, req->r_inode_unless,
> 3116 req->r_op == CEPH_MDS_OP_READDIR);
> 3117 if (req->r_dentry_drop) {
> 3118 ret = ceph_encode_dentry_release(&p, req->r_dentry,
> 3119 req->r_parent, mds, req->r_dentry_drop,
> 3120 req->r_dentry_unless);
> 3121 if (ret < 0)
> 3122 goto out_err;
> 3123 releases += ret;
> 3124 }
> 3125 if (req->r_old_dentry_drop) {
> 3126 ret = ceph_encode_dentry_release(&p, req->r_old_dentry,
> 3127 req->r_old_dentry_dir, mds,
> 3128 req->r_old_dentry_drop,
> 3129 req->r_old_dentry_unless);
> 3130 if (ret < 0)
> 3131 goto out_err;
> 3132 releases += ret;
> 3133 }
> 3134 if (req->r_old_inode_drop)
> 3135 releases += ceph_encode_inode_release(&p,
> 3136 d_inode(req->r_old_dentry),
> 3137 mds, req->r_old_inode_drop, req->r_old_inode_unless, 0);
> 3138
> 3139 if (drop_cap_releases) {
> 3140 releases = 0;
> 3141 p = msg->front.iov_base + req->r_request_release_offset;
> 3142 }
> 3143
> 3144 lhead->num_releases = cpu_to_le16(releases);
> 3145
> 3146 encode_mclientrequest_tail(&p, req);
> 3147
> 3148 if (WARN_ON_ONCE(p > end)) {
> 3149 ceph_msg_put(msg);
> 3150 msg = ERR_PTR(-ERANGE);
> 3151 goto out_free2;
> 3152 }
> 3153
> 3154 msg->front.iov_len = p - msg->front.iov_base;
> 3155 msg->hdr.front_len = cpu_to_le32(msg->front.iov_len);
> 3156
> 3157 if (req->r_pagelist) {
> 3158 struct ceph_pagelist *pagelist = req->r_pagelist;
> 3159 ceph_msg_data_add_pagelist(msg, pagelist);
> 3160 msg->hdr.data_len = cpu_to_le32(pagelist->length);
> 3161 } else {
> 3162 msg->hdr.data_len = 0;
> 3163 }
> 3164
> 3165 msg->hdr.data_off = cpu_to_le16(0);
> 3166
> 3167 out_free2:
> 3168 if (freepath2)
> 3169 ceph_mdsc_free_path((char *)path2, pathlen2);
> 3170 out_free1:
> 3171 if (freepath1)
> 3172 ceph_mdsc_free_path((char *)path1, pathlen1);
> 3173 out:
> 3174 return msg;
> 3175 out_err:
> 3176 ceph_msg_put(msg);
> 3177 msg = ERR_PTR(ret);
> 3178 goto out_free2;
> 3179 }
> 3180
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-04 8:48 ` [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message() Alexander Mikhalitsyn
2023-08-04 14:53 ` Christian Brauner
2023-08-05 1:55 ` kernel test robot
@ 2023-08-07 1:24 ` Xiubo Li
2023-08-07 6:51 ` Aleksandr Mikhalitsyn
2 siblings, 1 reply; 25+ messages in thread
From: Xiubo Li @ 2023-08-07 1:24 UTC (permalink / raw)
To: Alexander Mikhalitsyn
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> From: Christian Brauner <brauner@kernel.org>
>
> Inode operations that create a new filesystem object such as ->mknod,
> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> filesystem object.
>
> In order to ensure that the correct {g,u}id is used map the caller's
> fs{g,u}id for creation requests. This doesn't require complex changes.
> It suffices to pass in the relevant idmapping recorded in the request
> message. If this request message was triggered from an inode operation
> that creates filesystem objects it will have passed down the relevant
> idmaping. If this is a request message that was triggered from an inode
> operation that doens't need to take idmappings into account the initial
> idmapping is passed down which is an identity mapping.
>
> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> which adds two new fields (owner_{u,g}id) to the request head structure.
> So, we need to ensure that MDS supports it otherwise we need to fail
> any IO that comes through an idmapped mount because we can't process it
> in a proper way. MDS server without such an extension will use caller_{u,g}id
> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> values are unmapped. At the same time we can't map these fields with an
> idmapping as it can break UID/GID-based permission checks logic on the
> MDS side. This problem was described with a lot of details at [1], [2].
>
> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>
> Link: https://github.com/ceph/ceph/pull/52575
> Link: https://tracker.ceph.com/issues/62217
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
> v7:
> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> v8:
> - properly handled case when old MDS used with new kernel client
> ---
> fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
> fs/ceph/mds_client.h | 5 +++-
> include/linux/ceph/ceph_fs.h | 5 +++-
> 3 files changed, 52 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 8829f55103da..41e4bf3811c4 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> }
> }
>
> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> +{
> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> + return 1;
> +
> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> + return 2;
> +
> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> +}
> +
> static struct ceph_mds_request_head_legacy *
> find_legacy_request_head(void *p, u64 features)
> {
> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> {
> int mds = session->s_mds;
> struct ceph_mds_client *mdsc = session->s_mdsc;
> + struct ceph_client *cl = mdsc->fsc->client;
> struct ceph_msg *msg;
> struct ceph_mds_request_head_legacy *lhead;
> const char *path1 = NULL;
> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> void *p, *end;
> int ret;
> bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> + u16 request_head_version = mds_supported_head_version(session);
>
> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> req->r_parent, req->r_path1, req->r_ino1.ino,
> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> */
> if (legacy)
> len = sizeof(struct ceph_mds_request_head_legacy);
> - else if (old_version)
> + else if (request_head_version == 1)
> len = sizeof(struct ceph_mds_request_head_old);
> + else if (request_head_version == 2)
> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> else
> len = sizeof(struct ceph_mds_request_head);
>
> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> lhead = find_legacy_request_head(msg->front.iov_base,
> session->s_con.peer_features);
>
> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> + pr_err_ratelimited_client(cl,
> + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> + " is not supported by MDS. Fail request with -EIO.\n");
> +
> + ret = -EIO;
> + goto out_err;
> + }
> +
> /*
> * The ceph_mds_request_head_legacy didn't contain a version field, and
> * one was added when we moved the message version from 3->4.
> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> if (legacy) {
> msg->hdr.version = cpu_to_le16(3);
> p = msg->front.iov_base + sizeof(*lhead);
> - } else if (old_version) {
> + } else if (request_head_version == 1) {
> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>
> msg->hdr.version = cpu_to_le16(4);
> ohead->version = cpu_to_le16(1);
> p = msg->front.iov_base + sizeof(*ohead);
> + } else if (request_head_version == 2) {
> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> +
> + msg->hdr.version = cpu_to_le16(6);
> + nhead->version = cpu_to_le16(2);
> +
> + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> } else {
> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> + kuid_t owner_fsuid;
> + kgid_t owner_fsgid;
>
> msg->hdr.version = cpu_to_le16(6);
> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> + nhead->struct_len = sizeof(struct ceph_mds_request_head);
> +
> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> + VFSUIDT_INIT(req->r_cred->fsuid));
> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> + VFSGIDT_INIT(req->r_cred->fsgid));
> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> p = msg->front.iov_base + sizeof(*nhead);
> }
>
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index e3bbf3ba8ee8..8f683e8203bd 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> CEPHFS_FEATURE_OP_GETVXATTR,
> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>
> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> };
>
> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> CEPHFS_FEATURE_OP_GETVXATTR, \
> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> }
>
> /*
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 5f2301ee88bc..b91699b08f26 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> union ceph_mds_request_args args;
> } __attribute__ ((packed));
>
> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>
> struct ceph_mds_request_head_old {
> __le16 version; /* struct version */
> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
>
> __le32 ext_num_retry; /* new count retry attempts */
> __le32 ext_num_fwd; /* new count fwd attempts */
> +
> + __le32 struct_len; /* to store size of struct ceph_mds_request_head */
> + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
Let's also initialize them to -1 for all the other requests as we do in
your PR.
Thanks
- Xiubo
> } __attribute__ ((packed));
>
> /* cap/lease release record */
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-07 1:24 ` Xiubo Li
@ 2023-08-07 6:51 ` Aleksandr Mikhalitsyn
2023-08-07 10:28 ` Xiubo Li
0 siblings, 1 reply; 25+ messages in thread
From: Aleksandr Mikhalitsyn @ 2023-08-07 6:51 UTC (permalink / raw)
To: Xiubo Li
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> > From: Christian Brauner <brauner@kernel.org>
> >
> > Inode operations that create a new filesystem object such as ->mknod,
> > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> > Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> > filesystem object.
> >
> > In order to ensure that the correct {g,u}id is used map the caller's
> > fs{g,u}id for creation requests. This doesn't require complex changes.
> > It suffices to pass in the relevant idmapping recorded in the request
> > message. If this request message was triggered from an inode operation
> > that creates filesystem objects it will have passed down the relevant
> > idmaping. If this is a request message that was triggered from an inode
> > operation that doens't need to take idmappings into account the initial
> > idmapping is passed down which is an identity mapping.
> >
> > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> > which adds two new fields (owner_{u,g}id) to the request head structure.
> > So, we need to ensure that MDS supports it otherwise we need to fail
> > any IO that comes through an idmapped mount because we can't process it
> > in a proper way. MDS server without such an extension will use caller_{u,g}id
> > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> > values are unmapped. At the same time we can't map these fields with an
> > idmapping as it can break UID/GID-based permission checks logic on the
> > MDS side. This problem was described with a lot of details at [1], [2].
> >
> > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >
> > Link: https://github.com/ceph/ceph/pull/52575
> > Link: https://tracker.ceph.com/issues/62217
> > Cc: Xiubo Li <xiubli@redhat.com>
> > Cc: Jeff Layton <jlayton@kernel.org>
> > Cc: Ilya Dryomov <idryomov@gmail.com>
> > Cc: ceph-devel@vger.kernel.org
> > Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > ---
> > v7:
> > - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> > v8:
> > - properly handled case when old MDS used with new kernel client
> > ---
> > fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
> > fs/ceph/mds_client.h | 5 +++-
> > include/linux/ceph/ceph_fs.h | 5 +++-
> > 3 files changed, 52 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index 8829f55103da..41e4bf3811c4 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> > }
> > }
> >
> > +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> > +{
> > + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> > + return 1;
> > +
> > + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> > + return 2;
> > +
> > + return CEPH_MDS_REQUEST_HEAD_VERSION;
> > +}
> > +
> > static struct ceph_mds_request_head_legacy *
> > find_legacy_request_head(void *p, u64 features)
> > {
> > @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > {
> > int mds = session->s_mds;
> > struct ceph_mds_client *mdsc = session->s_mdsc;
> > + struct ceph_client *cl = mdsc->fsc->client;
> > struct ceph_msg *msg;
> > struct ceph_mds_request_head_legacy *lhead;
> > const char *path1 = NULL;
> > @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > void *p, *end;
> > int ret;
> > bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> > - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> > + u16 request_head_version = mds_supported_head_version(session);
> >
> > ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> > req->r_parent, req->r_path1, req->r_ino1.ino,
> > @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > */
> > if (legacy)
> > len = sizeof(struct ceph_mds_request_head_legacy);
> > - else if (old_version)
> > + else if (request_head_version == 1)
> > len = sizeof(struct ceph_mds_request_head_old);
> > + else if (request_head_version == 2)
> > + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > else
> > len = sizeof(struct ceph_mds_request_head);
> >
> > @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > lhead = find_legacy_request_head(msg->front.iov_base,
> > session->s_con.peer_features);
> >
> > + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> > + pr_err_ratelimited_client(cl,
> > + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > + " is not supported by MDS. Fail request with -EIO.\n");
> > +
> > + ret = -EIO;
> > + goto out_err;
> > + }
> > +
> > /*
> > * The ceph_mds_request_head_legacy didn't contain a version field, and
> > * one was added when we moved the message version from 3->4.
> > @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > if (legacy) {
> > msg->hdr.version = cpu_to_le16(3);
> > p = msg->front.iov_base + sizeof(*lhead);
> > - } else if (old_version) {
> > + } else if (request_head_version == 1) {
> > struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >
> > msg->hdr.version = cpu_to_le16(4);
> > ohead->version = cpu_to_le16(1);
> > p = msg->front.iov_base + sizeof(*ohead);
> > + } else if (request_head_version == 2) {
> > + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > +
> > + msg->hdr.version = cpu_to_le16(6);
> > + nhead->version = cpu_to_le16(2);
> > +
> > + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > } else {
> > struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > + kuid_t owner_fsuid;
> > + kgid_t owner_fsgid;
> >
> > msg->hdr.version = cpu_to_le16(6);
> > nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> > + nhead->struct_len = sizeof(struct ceph_mds_request_head);
> > +
> > + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> > + VFSUIDT_INIT(req->r_cred->fsuid));
> > + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> > + VFSGIDT_INIT(req->r_cred->fsgid));
> > + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> > + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> > p = msg->front.iov_base + sizeof(*nhead);
> > }
> >
> > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > index e3bbf3ba8ee8..8f683e8203bd 100644
> > --- a/fs/ceph/mds_client.h
> > +++ b/fs/ceph/mds_client.h
> > @@ -33,8 +33,10 @@ enum ceph_feature_type {
> > CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> > CEPHFS_FEATURE_OP_GETVXATTR,
> > CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> > + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >
> > - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > };
> >
> > #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> > @@ -49,6 +51,7 @@ enum ceph_feature_type {
> > CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> > CEPHFS_FEATURE_OP_GETVXATTR, \
> > CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> > + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> > }
> >
> > /*
> > diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> > index 5f2301ee88bc..b91699b08f26 100644
> > --- a/include/linux/ceph/ceph_fs.h
> > +++ b/include/linux/ceph/ceph_fs.h
> > @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> > union ceph_mds_request_args args;
> > } __attribute__ ((packed));
> >
> > -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> > +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> >
> > struct ceph_mds_request_head_old {
> > __le16 version; /* struct version */
> > @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >
> > __le32 ext_num_retry; /* new count retry attempts */
> > __le32 ext_num_fwd; /* new count fwd attempts */
> > +
> > + __le32 struct_len; /* to store size of struct ceph_mds_request_head */
> > + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
>
> Let's also initialize them to -1 for all the other requests as we do in
> your PR.
They are always initialized already. As you can see from the code we
don't have any extra conditions
on filling these fields. We always fill them with an appropriate
UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
if mount idmapped then it's idmapped caller_uid/gid.
Kind regards,
Alex
>
> Thanks
>
> - Xiubo
>
>
>
> > } __attribute__ ((packed));
> >
> > /* cap/lease release record */
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-07 6:51 ` Aleksandr Mikhalitsyn
@ 2023-08-07 10:28 ` Xiubo Li
2023-08-07 10:34 ` Aleksandr Mikhalitsyn
2023-08-07 11:45 ` Aleksandr Mikhalitsyn
0 siblings, 2 replies; 25+ messages in thread
From: Xiubo Li @ 2023-08-07 10:28 UTC (permalink / raw)
To: Aleksandr Mikhalitsyn
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
>>> From: Christian Brauner <brauner@kernel.org>
>>>
>>> Inode operations that create a new filesystem object such as ->mknod,
>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>>> filesystem object.
>>>
>>> In order to ensure that the correct {g,u}id is used map the caller's
>>> fs{g,u}id for creation requests. This doesn't require complex changes.
>>> It suffices to pass in the relevant idmapping recorded in the request
>>> message. If this request message was triggered from an inode operation
>>> that creates filesystem objects it will have passed down the relevant
>>> idmaping. If this is a request message that was triggered from an inode
>>> operation that doens't need to take idmappings into account the initial
>>> idmapping is passed down which is an identity mapping.
>>>
>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
>>> which adds two new fields (owner_{u,g}id) to the request head structure.
>>> So, we need to ensure that MDS supports it otherwise we need to fail
>>> any IO that comes through an idmapped mount because we can't process it
>>> in a proper way. MDS server without such an extension will use caller_{u,g}id
>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
>>> values are unmapped. At the same time we can't map these fields with an
>>> idmapping as it can break UID/GID-based permission checks logic on the
>>> MDS side. This problem was described with a lot of details at [1], [2].
>>>
>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>>>
>>> Link: https://github.com/ceph/ceph/pull/52575
>>> Link: https://tracker.ceph.com/issues/62217
>>> Cc: Xiubo Li <xiubli@redhat.com>
>>> Cc: Jeff Layton <jlayton@kernel.org>
>>> Cc: Ilya Dryomov <idryomov@gmail.com>
>>> Cc: ceph-devel@vger.kernel.org
>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>> ---
>>> v7:
>>> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
>>> v8:
>>> - properly handled case when old MDS used with new kernel client
>>> ---
>>> fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
>>> fs/ceph/mds_client.h | 5 +++-
>>> include/linux/ceph/ceph_fs.h | 5 +++-
>>> 3 files changed, 52 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>> index 8829f55103da..41e4bf3811c4 100644
>>> --- a/fs/ceph/mds_client.c
>>> +++ b/fs/ceph/mds_client.c
>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
>>> }
>>> }
>>>
>>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
>>> +{
>>> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
>>> + return 1;
>>> +
>>> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
>>> + return 2;
>>> +
>>> + return CEPH_MDS_REQUEST_HEAD_VERSION;
>>> +}
>>> +
>>> static struct ceph_mds_request_head_legacy *
>>> find_legacy_request_head(void *p, u64 features)
>>> {
>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>> {
>>> int mds = session->s_mds;
>>> struct ceph_mds_client *mdsc = session->s_mdsc;
>>> + struct ceph_client *cl = mdsc->fsc->client;
>>> struct ceph_msg *msg;
>>> struct ceph_mds_request_head_legacy *lhead;
>>> const char *path1 = NULL;
>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>> void *p, *end;
>>> int ret;
>>> bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
>>> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
>>> + u16 request_head_version = mds_supported_head_version(session);
>>>
>>> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>>> req->r_parent, req->r_path1, req->r_ino1.ino,
>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>> */
>>> if (legacy)
>>> len = sizeof(struct ceph_mds_request_head_legacy);
>>> - else if (old_version)
>>> + else if (request_head_version == 1)
>>> len = sizeof(struct ceph_mds_request_head_old);
>>> + else if (request_head_version == 2)
>>> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>> else
>>> len = sizeof(struct ceph_mds_request_head);
>>>
>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>> lhead = find_legacy_request_head(msg->front.iov_base,
>>> session->s_con.peer_features);
>>>
>>> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
>>> + pr_err_ratelimited_client(cl,
>>> + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>> + " is not supported by MDS. Fail request with -EIO.\n");
>>> +
>>> + ret = -EIO;
>>> + goto out_err;
>>> + }
>>> +
>>> /*
>>> * The ceph_mds_request_head_legacy didn't contain a version field, and
>>> * one was added when we moved the message version from 3->4.
>>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>> if (legacy) {
>>> msg->hdr.version = cpu_to_le16(3);
>>> p = msg->front.iov_base + sizeof(*lhead);
>>> - } else if (old_version) {
>>> + } else if (request_head_version == 1) {
>>> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>>>
>>> msg->hdr.version = cpu_to_le16(4);
>>> ohead->version = cpu_to_le16(1);
>>> p = msg->front.iov_base + sizeof(*ohead);
>>> + } else if (request_head_version == 2) {
>>> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>> +
>>> + msg->hdr.version = cpu_to_le16(6);
>>> + nhead->version = cpu_to_le16(2);
>>> +
>>> + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>> } else {
>>> struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>> + kuid_t owner_fsuid;
>>> + kgid_t owner_fsgid;
>>>
>>> msg->hdr.version = cpu_to_le16(6);
>>> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>>> + nhead->struct_len = sizeof(struct ceph_mds_request_head);
>>> +
>>> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
>>> + VFSUIDT_INIT(req->r_cred->fsuid));
>>> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
>>> + VFSGIDT_INIT(req->r_cred->fsgid));
>>> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
>>> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>>> p = msg->front.iov_base + sizeof(*nhead);
>>> }
>>>
>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>> index e3bbf3ba8ee8..8f683e8203bd 100644
>>> --- a/fs/ceph/mds_client.h
>>> +++ b/fs/ceph/mds_client.h
>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>> CEPHFS_FEATURE_OP_GETVXATTR,
>>> CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>
>>> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>> };
>>>
>>> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
>>> CEPHFS_FEATURE_OP_GETVXATTR, \
>>> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
>>> }
>>>
>>> /*
>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>>> index 5f2301ee88bc..b91699b08f26 100644
>>> --- a/include/linux/ceph/ceph_fs.h
>>> +++ b/include/linux/ceph/ceph_fs.h
>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>>> union ceph_mds_request_args args;
>>> } __attribute__ ((packed));
>>>
>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>>>
>>> struct ceph_mds_request_head_old {
>>> __le16 version; /* struct version */
>>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
>>>
>>> __le32 ext_num_retry; /* new count retry attempts */
>>> __le32 ext_num_fwd; /* new count fwd attempts */
>>> +
>>> + __le32 struct_len; /* to store size of struct ceph_mds_request_head */
>>> + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
>> Let's also initialize them to -1 for all the other requests as we do in
>> your PR.
> They are always initialized already. As you can see from the code we
> don't have any extra conditions
> on filling these fields. We always fill them with an appropriate
> UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
> if mount idmapped then it's idmapped caller_uid/gid.
Then in kclient all the request will always initialized the
'owner_{uid/gid}' with 'caller_{uid/gid}'. While in userspace libcephfs
it will only set them for 'create/mknod/mkdir/symlink` instead.
I'd prefer to make them consistent, which is what I am still focusing
on, to make them easier to read and comparing when troubles hooting.
Thanks
- Xiubo
> Kind regards,
> Alex
>
>> Thanks
>>
>> - Xiubo
>>
>>
>>
>>> } __attribute__ ((packed));
>>>
>>> /* cap/lease release record */
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-07 10:28 ` Xiubo Li
@ 2023-08-07 10:34 ` Aleksandr Mikhalitsyn
2023-08-07 11:21 ` Xiubo Li
2023-08-07 11:45 ` Aleksandr Mikhalitsyn
1 sibling, 1 reply; 25+ messages in thread
From: Aleksandr Mikhalitsyn @ 2023-08-07 10:34 UTC (permalink / raw)
To: Xiubo Li
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> > On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> >>> From: Christian Brauner <brauner@kernel.org>
> >>>
> >>> Inode operations that create a new filesystem object such as ->mknod,
> >>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> >>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> >>> filesystem object.
> >>>
> >>> In order to ensure that the correct {g,u}id is used map the caller's
> >>> fs{g,u}id for creation requests. This doesn't require complex changes.
> >>> It suffices to pass in the relevant idmapping recorded in the request
> >>> message. If this request message was triggered from an inode operation
> >>> that creates filesystem objects it will have passed down the relevant
> >>> idmaping. If this is a request message that was triggered from an inode
> >>> operation that doens't need to take idmappings into account the initial
> >>> idmapping is passed down which is an identity mapping.
> >>>
> >>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> >>> which adds two new fields (owner_{u,g}id) to the request head structure.
> >>> So, we need to ensure that MDS supports it otherwise we need to fail
> >>> any IO that comes through an idmapped mount because we can't process it
> >>> in a proper way. MDS server without such an extension will use caller_{u,g}id
> >>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> >>> values are unmapped. At the same time we can't map these fields with an
> >>> idmapping as it can break UID/GID-based permission checks logic on the
> >>> MDS side. This problem was described with a lot of details at [1], [2].
> >>>
> >>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> >>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >>>
> >>> Link: https://github.com/ceph/ceph/pull/52575
> >>> Link: https://tracker.ceph.com/issues/62217
> >>> Cc: Xiubo Li <xiubli@redhat.com>
> >>> Cc: Jeff Layton <jlayton@kernel.org>
> >>> Cc: Ilya Dryomov <idryomov@gmail.com>
> >>> Cc: ceph-devel@vger.kernel.org
> >>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> Signed-off-by: Christian Brauner <brauner@kernel.org>
> >>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> ---
> >>> v7:
> >>> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> >>> v8:
> >>> - properly handled case when old MDS used with new kernel client
> >>> ---
> >>> fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
> >>> fs/ceph/mds_client.h | 5 +++-
> >>> include/linux/ceph/ceph_fs.h | 5 +++-
> >>> 3 files changed, 52 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >>> index 8829f55103da..41e4bf3811c4 100644
> >>> --- a/fs/ceph/mds_client.c
> >>> +++ b/fs/ceph/mds_client.c
> >>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> >>> }
> >>> }
> >>>
> >>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> >>> +{
> >>> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> >>> + return 1;
> >>> +
> >>> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> >>> + return 2;
> >>> +
> >>> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> >>> +}
> >>> +
> >>> static struct ceph_mds_request_head_legacy *
> >>> find_legacy_request_head(void *p, u64 features)
> >>> {
> >>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> {
> >>> int mds = session->s_mds;
> >>> struct ceph_mds_client *mdsc = session->s_mdsc;
> >>> + struct ceph_client *cl = mdsc->fsc->client;
> >>> struct ceph_msg *msg;
> >>> struct ceph_mds_request_head_legacy *lhead;
> >>> const char *path1 = NULL;
> >>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> void *p, *end;
> >>> int ret;
> >>> bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> >>> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> >>> + u16 request_head_version = mds_supported_head_version(session);
> >>>
> >>> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >>> req->r_parent, req->r_path1, req->r_ino1.ino,
> >>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> */
> >>> if (legacy)
> >>> len = sizeof(struct ceph_mds_request_head_legacy);
> >>> - else if (old_version)
> >>> + else if (request_head_version == 1)
> >>> len = sizeof(struct ceph_mds_request_head_old);
> >>> + else if (request_head_version == 2)
> >>> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>> else
> >>> len = sizeof(struct ceph_mds_request_head);
> >>>
> >>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> lhead = find_legacy_request_head(msg->front.iov_base,
> >>> session->s_con.peer_features);
> >>>
> >>> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> >>> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> >>> + pr_err_ratelimited_client(cl,
> >>> + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> >>> + " is not supported by MDS. Fail request with -EIO.\n");
> >>> +
> >>> + ret = -EIO;
> >>> + goto out_err;
> >>> + }
> >>> +
> >>> /*
> >>> * The ceph_mds_request_head_legacy didn't contain a version field, and
> >>> * one was added when we moved the message version from 3->4.
> >>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> if (legacy) {
> >>> msg->hdr.version = cpu_to_le16(3);
> >>> p = msg->front.iov_base + sizeof(*lhead);
> >>> - } else if (old_version) {
> >>> + } else if (request_head_version == 1) {
> >>> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >>>
> >>> msg->hdr.version = cpu_to_le16(4);
> >>> ohead->version = cpu_to_le16(1);
> >>> p = msg->front.iov_base + sizeof(*ohead);
> >>> + } else if (request_head_version == 2) {
> >>> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> +
> >>> + msg->hdr.version = cpu_to_le16(6);
> >>> + nhead->version = cpu_to_le16(2);
> >>> +
> >>> + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>> } else {
> >>> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> + kuid_t owner_fsuid;
> >>> + kgid_t owner_fsgid;
> >>>
> >>> msg->hdr.version = cpu_to_le16(6);
> >>> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> >>> + nhead->struct_len = sizeof(struct ceph_mds_request_head);
> >>> +
> >>> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> >>> + VFSUIDT_INIT(req->r_cred->fsuid));
> >>> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> >>> + VFSGIDT_INIT(req->r_cred->fsgid));
> >>> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> >>> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> >>> p = msg->front.iov_base + sizeof(*nhead);
> >>> }
> >>>
> >>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >>> index e3bbf3ba8ee8..8f683e8203bd 100644
> >>> --- a/fs/ceph/mds_client.h
> >>> +++ b/fs/ceph/mds_client.h
> >>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >>> CEPHFS_FEATURE_OP_GETVXATTR,
> >>> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> >>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>
> >>> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>> };
> >>>
> >>> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> >>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> >>> CEPHFS_FEATURE_OP_GETVXATTR, \
> >>> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> >>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> >>> }
> >>>
> >>> /*
> >>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> >>> index 5f2301ee88bc..b91699b08f26 100644
> >>> --- a/include/linux/ceph/ceph_fs.h
> >>> +++ b/include/linux/ceph/ceph_fs.h
> >>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >>> union ceph_mds_request_args args;
> >>> } __attribute__ ((packed));
> >>>
> >>> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> >>> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> >>>
> >>> struct ceph_mds_request_head_old {
> >>> __le16 version; /* struct version */
> >>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >>>
> >>> __le32 ext_num_retry; /* new count retry attempts */
> >>> __le32 ext_num_fwd; /* new count fwd attempts */
> >>> +
> >>> + __le32 struct_len; /* to store size of struct ceph_mds_request_head */
> >>> + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
> >> Let's also initialize them to -1 for all the other requests as we do in
> >> your PR.
> > They are always initialized already. As you can see from the code we
> > don't have any extra conditions
> > on filling these fields. We always fill them with an appropriate
> > UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
> > if mount idmapped then it's idmapped caller_uid/gid.
>
> Then in kclient all the request will always initialized the
> 'owner_{uid/gid}' with 'caller_{uid/gid}'. While in userspace libcephfs
> it will only set them for 'create/mknod/mkdir/symlink` instead.
>
> I'd prefer to make them consistent, which is what I am still focusing
> on, to make them easier to read and comparing when troubles hooting.
Dear Xiubo,
Sure, I will do it.
Couldn't you please review all the rest of the patches before I fix
this particular thing?
It will allow me to fix and send -v10 with all required fixes
incorporated in it.
Kind regards,
Alex
>
> Thanks
>
> - Xiubo
>
> > Kind regards,
> > Alex
> >
> >> Thanks
> >>
> >> - Xiubo
> >>
> >>
> >>
> >>> } __attribute__ ((packed));
> >>>
> >>> /* cap/lease release record */
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-07 10:34 ` Aleksandr Mikhalitsyn
@ 2023-08-07 11:21 ` Xiubo Li
2023-08-07 11:28 ` Aleksandr Mikhalitsyn
0 siblings, 1 reply; 25+ messages in thread
From: Xiubo Li @ 2023-08-07 11:21 UTC (permalink / raw)
To: Aleksandr Mikhalitsyn
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On 8/7/23 18:34, Aleksandr Mikhalitsyn wrote:
> On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
>>> On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
>>>> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
>>>>> From: Christian Brauner <brauner@kernel.org>
>>>>>
>>>>> Inode operations that create a new filesystem object such as ->mknod,
>>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>>>>> filesystem object.
>>>>>
>>>>> In order to ensure that the correct {g,u}id is used map the caller's
>>>>> fs{g,u}id for creation requests. This doesn't require complex changes.
>>>>> It suffices to pass in the relevant idmapping recorded in the request
>>>>> message. If this request message was triggered from an inode operation
>>>>> that creates filesystem objects it will have passed down the relevant
>>>>> idmaping. If this is a request message that was triggered from an inode
>>>>> operation that doens't need to take idmappings into account the initial
>>>>> idmapping is passed down which is an identity mapping.
>>>>>
>>>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
>>>>> which adds two new fields (owner_{u,g}id) to the request head structure.
>>>>> So, we need to ensure that MDS supports it otherwise we need to fail
>>>>> any IO that comes through an idmapped mount because we can't process it
>>>>> in a proper way. MDS server without such an extension will use caller_{u,g}id
>>>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
>>>>> values are unmapped. At the same time we can't map these fields with an
>>>>> idmapping as it can break UID/GID-based permission checks logic on the
>>>>> MDS side. This problem was described with a lot of details at [1], [2].
>>>>>
>>>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>>>>>
>>>>> Link: https://github.com/ceph/ceph/pull/52575
>>>>> Link: https://tracker.ceph.com/issues/62217
>>>>> Cc: Xiubo Li <xiubli@redhat.com>
>>>>> Cc: Jeff Layton <jlayton@kernel.org>
>>>>> Cc: Ilya Dryomov <idryomov@gmail.com>
>>>>> Cc: ceph-devel@vger.kernel.org
>>>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
>>>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>>> ---
>>>>> v7:
>>>>> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
>>>>> v8:
>>>>> - properly handled case when old MDS used with new kernel client
>>>>> ---
>>>>> fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
>>>>> fs/ceph/mds_client.h | 5 +++-
>>>>> include/linux/ceph/ceph_fs.h | 5 +++-
>>>>> 3 files changed, 52 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>> index 8829f55103da..41e4bf3811c4 100644
>>>>> --- a/fs/ceph/mds_client.c
>>>>> +++ b/fs/ceph/mds_client.c
>>>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
>>>>> }
>>>>> }
>>>>>
>>>>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
>>>>> +{
>>>>> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
>>>>> + return 1;
>>>>> +
>>>>> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
>>>>> + return 2;
>>>>> +
>>>>> + return CEPH_MDS_REQUEST_HEAD_VERSION;
>>>>> +}
>>>>> +
>>>>> static struct ceph_mds_request_head_legacy *
>>>>> find_legacy_request_head(void *p, u64 features)
>>>>> {
>>>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>> {
>>>>> int mds = session->s_mds;
>>>>> struct ceph_mds_client *mdsc = session->s_mdsc;
>>>>> + struct ceph_client *cl = mdsc->fsc->client;
>>>>> struct ceph_msg *msg;
>>>>> struct ceph_mds_request_head_legacy *lhead;
>>>>> const char *path1 = NULL;
>>>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>> void *p, *end;
>>>>> int ret;
>>>>> bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
>>>>> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
>>>>> + u16 request_head_version = mds_supported_head_version(session);
>>>>>
>>>>> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>>>>> req->r_parent, req->r_path1, req->r_ino1.ino,
>>>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>> */
>>>>> if (legacy)
>>>>> len = sizeof(struct ceph_mds_request_head_legacy);
>>>>> - else if (old_version)
>>>>> + else if (request_head_version == 1)
>>>>> len = sizeof(struct ceph_mds_request_head_old);
>>>>> + else if (request_head_version == 2)
>>>>> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>>> else
>>>>> len = sizeof(struct ceph_mds_request_head);
>>>>>
>>>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>> lhead = find_legacy_request_head(msg->front.iov_base,
>>>>> session->s_con.peer_features);
>>>>>
>>>>> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>>>> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
>>>>> + pr_err_ratelimited_client(cl,
>>>>> + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>>>> + " is not supported by MDS. Fail request with -EIO.\n");
>>>>> +
>>>>> + ret = -EIO;
>>>>> + goto out_err;
>>>>> + }
>>>>> +
>>>>> /*
>>>>> * The ceph_mds_request_head_legacy didn't contain a version field, and
>>>>> * one was added when we moved the message version from 3->4.
>>>>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>> if (legacy) {
>>>>> msg->hdr.version = cpu_to_le16(3);
>>>>> p = msg->front.iov_base + sizeof(*lhead);
>>>>> - } else if (old_version) {
>>>>> + } else if (request_head_version == 1) {
>>>>> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>>>>>
>>>>> msg->hdr.version = cpu_to_le16(4);
>>>>> ohead->version = cpu_to_le16(1);
>>>>> p = msg->front.iov_base + sizeof(*ohead);
>>>>> + } else if (request_head_version == 2) {
>>>>> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>>> +
>>>>> + msg->hdr.version = cpu_to_le16(6);
>>>>> + nhead->version = cpu_to_le16(2);
>>>>> +
>>>>> + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>>> } else {
>>>>> struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>>> + kuid_t owner_fsuid;
>>>>> + kgid_t owner_fsgid;
>>>>>
>>>>> msg->hdr.version = cpu_to_le16(6);
>>>>> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>>>>> + nhead->struct_len = sizeof(struct ceph_mds_request_head);
>>>>> +
>>>>> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
>>>>> + VFSUIDT_INIT(req->r_cred->fsuid));
>>>>> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
>>>>> + VFSGIDT_INIT(req->r_cred->fsgid));
>>>>> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
>>>>> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>>>>> p = msg->front.iov_base + sizeof(*nhead);
>>>>> }
>>>>>
>>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>>> index e3bbf3ba8ee8..8f683e8203bd 100644
>>>>> --- a/fs/ceph/mds_client.h
>>>>> +++ b/fs/ceph/mds_client.h
>>>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>>>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>>>> CEPHFS_FEATURE_OP_GETVXATTR,
>>>>> CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>>> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
>>>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>>>
>>>>> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>>> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>>> };
>>>>>
>>>>> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
>>>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>>>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
>>>>> CEPHFS_FEATURE_OP_GETVXATTR, \
>>>>> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
>>>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
>>>>> }
>>>>>
>>>>> /*
>>>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>>>>> index 5f2301ee88bc..b91699b08f26 100644
>>>>> --- a/include/linux/ceph/ceph_fs.h
>>>>> +++ b/include/linux/ceph/ceph_fs.h
>>>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>>>>> union ceph_mds_request_args args;
>>>>> } __attribute__ ((packed));
>>>>>
>>>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
>>>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>>>>>
>>>>> struct ceph_mds_request_head_old {
>>>>> __le16 version; /* struct version */
>>>>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
>>>>>
>>>>> __le32 ext_num_retry; /* new count retry attempts */
>>>>> __le32 ext_num_fwd; /* new count fwd attempts */
>>>>> +
>>>>> + __le32 struct_len; /* to store size of struct ceph_mds_request_head */
>>>>> + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
>>>> Let's also initialize them to -1 for all the other requests as we do in
>>>> your PR.
>>> They are always initialized already. As you can see from the code we
>>> don't have any extra conditions
>>> on filling these fields. We always fill them with an appropriate
>>> UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
>>> if mount idmapped then it's idmapped caller_uid/gid.
>> Then in kclient all the request will always initialized the
>> 'owner_{uid/gid}' with 'caller_{uid/gid}'. While in userspace libcephfs
>> it will only set them for 'create/mknod/mkdir/symlink` instead.
>>
>> I'd prefer to make them consistent, which is what I am still focusing
>> on, to make them easier to read and comparing when troubles hooting.
> Dear Xiubo,
>
> Sure, I will do it.
>
> Couldn't you please review all the rest of the patches before I fix
> this particular thing?
> It will allow me to fix and send -v10 with all required fixes
> incorporated in it.
I have gone through them all and they LGTM.
Thanks
- Xiubo
> Kind regards,
> Alex
>
>> Thanks
>>
>> - Xiubo
>>
>>> Kind regards,
>>> Alex
>>>
>>>> Thanks
>>>>
>>>> - Xiubo
>>>>
>>>>
>>>>
>>>>> } __attribute__ ((packed));
>>>>>
>>>>> /* cap/lease release record */
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-07 11:21 ` Xiubo Li
@ 2023-08-07 11:28 ` Aleksandr Mikhalitsyn
0 siblings, 0 replies; 25+ messages in thread
From: Aleksandr Mikhalitsyn @ 2023-08-07 11:28 UTC (permalink / raw)
To: Xiubo Li
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On Mon, Aug 7, 2023 at 1:21 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/7/23 18:34, Aleksandr Mikhalitsyn wrote:
> > On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> >>> On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>>> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> >>>>> From: Christian Brauner <brauner@kernel.org>
> >>>>>
> >>>>> Inode operations that create a new filesystem object such as ->mknod,
> >>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> >>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> >>>>> filesystem object.
> >>>>>
> >>>>> In order to ensure that the correct {g,u}id is used map the caller's
> >>>>> fs{g,u}id for creation requests. This doesn't require complex changes.
> >>>>> It suffices to pass in the relevant idmapping recorded in the request
> >>>>> message. If this request message was triggered from an inode operation
> >>>>> that creates filesystem objects it will have passed down the relevant
> >>>>> idmaping. If this is a request message that was triggered from an inode
> >>>>> operation that doens't need to take idmappings into account the initial
> >>>>> idmapping is passed down which is an identity mapping.
> >>>>>
> >>>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> >>>>> which adds two new fields (owner_{u,g}id) to the request head structure.
> >>>>> So, we need to ensure that MDS supports it otherwise we need to fail
> >>>>> any IO that comes through an idmapped mount because we can't process it
> >>>>> in a proper way. MDS server without such an extension will use caller_{u,g}id
> >>>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> >>>>> values are unmapped. At the same time we can't map these fields with an
> >>>>> idmapping as it can break UID/GID-based permission checks logic on the
> >>>>> MDS side. This problem was described with a lot of details at [1], [2].
> >>>>>
> >>>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> >>>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >>>>>
> >>>>> Link: https://github.com/ceph/ceph/pull/52575
> >>>>> Link: https://tracker.ceph.com/issues/62217
> >>>>> Cc: Xiubo Li <xiubli@redhat.com>
> >>>>> Cc: Jeff Layton <jlayton@kernel.org>
> >>>>> Cc: Ilya Dryomov <idryomov@gmail.com>
> >>>>> Cc: ceph-devel@vger.kernel.org
> >>>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
> >>>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>>>> ---
> >>>>> v7:
> >>>>> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> >>>>> v8:
> >>>>> - properly handled case when old MDS used with new kernel client
> >>>>> ---
> >>>>> fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
> >>>>> fs/ceph/mds_client.h | 5 +++-
> >>>>> include/linux/ceph/ceph_fs.h | 5 +++-
> >>>>> 3 files changed, 52 insertions(+), 5 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >>>>> index 8829f55103da..41e4bf3811c4 100644
> >>>>> --- a/fs/ceph/mds_client.c
> >>>>> +++ b/fs/ceph/mds_client.c
> >>>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> >>>>> }
> >>>>> }
> >>>>>
> >>>>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> >>>>> +{
> >>>>> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> >>>>> + return 1;
> >>>>> +
> >>>>> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> >>>>> + return 2;
> >>>>> +
> >>>>> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> >>>>> +}
> >>>>> +
> >>>>> static struct ceph_mds_request_head_legacy *
> >>>>> find_legacy_request_head(void *p, u64 features)
> >>>>> {
> >>>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>> {
> >>>>> int mds = session->s_mds;
> >>>>> struct ceph_mds_client *mdsc = session->s_mdsc;
> >>>>> + struct ceph_client *cl = mdsc->fsc->client;
> >>>>> struct ceph_msg *msg;
> >>>>> struct ceph_mds_request_head_legacy *lhead;
> >>>>> const char *path1 = NULL;
> >>>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>> void *p, *end;
> >>>>> int ret;
> >>>>> bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> >>>>> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> >>>>> + u16 request_head_version = mds_supported_head_version(session);
> >>>>>
> >>>>> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >>>>> req->r_parent, req->r_path1, req->r_ino1.ino,
> >>>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>> */
> >>>>> if (legacy)
> >>>>> len = sizeof(struct ceph_mds_request_head_legacy);
> >>>>> - else if (old_version)
> >>>>> + else if (request_head_version == 1)
> >>>>> len = sizeof(struct ceph_mds_request_head_old);
> >>>>> + else if (request_head_version == 2)
> >>>>> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>>> else
> >>>>> len = sizeof(struct ceph_mds_request_head);
> >>>>>
> >>>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>> lhead = find_legacy_request_head(msg->front.iov_base,
> >>>>> session->s_con.peer_features);
> >>>>>
> >>>>> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> >>>>> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> >>>>> + pr_err_ratelimited_client(cl,
> >>>>> + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> >>>>> + " is not supported by MDS. Fail request with -EIO.\n");
> >>>>> +
> >>>>> + ret = -EIO;
> >>>>> + goto out_err;
> >>>>> + }
> >>>>> +
> >>>>> /*
> >>>>> * The ceph_mds_request_head_legacy didn't contain a version field, and
> >>>>> * one was added when we moved the message version from 3->4.
> >>>>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>>>> if (legacy) {
> >>>>> msg->hdr.version = cpu_to_le16(3);
> >>>>> p = msg->front.iov_base + sizeof(*lhead);
> >>>>> - } else if (old_version) {
> >>>>> + } else if (request_head_version == 1) {
> >>>>> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >>>>>
> >>>>> msg->hdr.version = cpu_to_le16(4);
> >>>>> ohead->version = cpu_to_le16(1);
> >>>>> p = msg->front.iov_base + sizeof(*ohead);
> >>>>> + } else if (request_head_version == 2) {
> >>>>> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>>>> +
> >>>>> + msg->hdr.version = cpu_to_le16(6);
> >>>>> + nhead->version = cpu_to_le16(2);
> >>>>> +
> >>>>> + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>>>> } else {
> >>>>> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>>>> + kuid_t owner_fsuid;
> >>>>> + kgid_t owner_fsgid;
> >>>>>
> >>>>> msg->hdr.version = cpu_to_le16(6);
> >>>>> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> >>>>> + nhead->struct_len = sizeof(struct ceph_mds_request_head);
> >>>>> +
> >>>>> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> >>>>> + VFSUIDT_INIT(req->r_cred->fsuid));
> >>>>> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> >>>>> + VFSGIDT_INIT(req->r_cred->fsgid));
> >>>>> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> >>>>> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> >>>>> p = msg->front.iov_base + sizeof(*nhead);
> >>>>> }
> >>>>>
> >>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >>>>> index e3bbf3ba8ee8..8f683e8203bd 100644
> >>>>> --- a/fs/ceph/mds_client.h
> >>>>> +++ b/fs/ceph/mds_client.h
> >>>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >>>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >>>>> CEPHFS_FEATURE_OP_GETVXATTR,
> >>>>> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>>>> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> >>>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>>>
> >>>>> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>>>> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>>> };
> >>>>>
> >>>>> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> >>>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >>>>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> >>>>> CEPHFS_FEATURE_OP_GETVXATTR, \
> >>>>> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> >>>>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> >>>>> }
> >>>>>
> >>>>> /*
> >>>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> >>>>> index 5f2301ee88bc..b91699b08f26 100644
> >>>>> --- a/include/linux/ceph/ceph_fs.h
> >>>>> +++ b/include/linux/ceph/ceph_fs.h
> >>>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >>>>> union ceph_mds_request_args args;
> >>>>> } __attribute__ ((packed));
> >>>>>
> >>>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> >>>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> >>>>>
> >>>>> struct ceph_mds_request_head_old {
> >>>>> __le16 version; /* struct version */
> >>>>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >>>>>
> >>>>> __le32 ext_num_retry; /* new count retry attempts */
> >>>>> __le32 ext_num_fwd; /* new count fwd attempts */
> >>>>> +
> >>>>> + __le32 struct_len; /* to store size of struct ceph_mds_request_head */
> >>>>> + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
> >>>> Let's also initialize them to -1 for all the other requests as we do in
> >>>> your PR.
> >>> They are always initialized already. As you can see from the code we
> >>> don't have any extra conditions
> >>> on filling these fields. We always fill them with an appropriate
> >>> UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
> >>> if mount idmapped then it's idmapped caller_uid/gid.
> >> Then in kclient all the request will always initialized the
> >> 'owner_{uid/gid}' with 'caller_{uid/gid}'. While in userspace libcephfs
> >> it will only set them for 'create/mknod/mkdir/symlink` instead.
> >>
> >> I'd prefer to make them consistent, which is what I am still focusing
> >> on, to make them easier to read and comparing when troubles hooting.
> > Dear Xiubo,
> >
> > Sure, I will do it.
> >
> > Couldn't you please review all the rest of the patches before I fix
> > this particular thing?
> > It will allow me to fix and send -v10 with all required fixes
> > incorporated in it.
>
> I have gone through them all and they LGTM.
Thanks!
Kind regards,
Alex
>
> Thanks
>
> - Xiubo
>
>
> > Kind regards,
> > Alex
> >
> >> Thanks
> >>
> >> - Xiubo
> >>
> >>> Kind regards,
> >>> Alex
> >>>
> >>>> Thanks
> >>>>
> >>>> - Xiubo
> >>>>
> >>>>
> >>>>
> >>>>> } __attribute__ ((packed));
> >>>>>
> >>>>> /* cap/lease release record */
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message()
2023-08-07 10:28 ` Xiubo Li
2023-08-07 10:34 ` Aleksandr Mikhalitsyn
@ 2023-08-07 11:45 ` Aleksandr Mikhalitsyn
1 sibling, 0 replies; 25+ messages in thread
From: Aleksandr Mikhalitsyn @ 2023-08-07 11:45 UTC (permalink / raw)
To: Xiubo Li
Cc: brauner, stgraber, linux-fsdevel, Jeff Layton, Ilya Dryomov,
ceph-devel, linux-kernel
On Mon, Aug 7, 2023 at 12:28 PM Xiubo Li <xiubli@redhat.com> wrote:
>
>
> On 8/7/23 14:51, Aleksandr Mikhalitsyn wrote:
> > On Mon, Aug 7, 2023 at 3:25 AM Xiubo Li <xiubli@redhat.com> wrote:
> >>
> >> On 8/4/23 16:48, Alexander Mikhalitsyn wrote:
> >>> From: Christian Brauner <brauner@kernel.org>
> >>>
> >>> Inode operations that create a new filesystem object such as ->mknod,
> >>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> >>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> >>> filesystem object.
> >>>
> >>> In order to ensure that the correct {g,u}id is used map the caller's
> >>> fs{g,u}id for creation requests. This doesn't require complex changes.
> >>> It suffices to pass in the relevant idmapping recorded in the request
> >>> message. If this request message was triggered from an inode operation
> >>> that creates filesystem objects it will have passed down the relevant
> >>> idmaping. If this is a request message that was triggered from an inode
> >>> operation that doens't need to take idmappings into account the initial
> >>> idmapping is passed down which is an identity mapping.
> >>>
> >>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> >>> which adds two new fields (owner_{u,g}id) to the request head structure.
> >>> So, we need to ensure that MDS supports it otherwise we need to fail
> >>> any IO that comes through an idmapped mount because we can't process it
> >>> in a proper way. MDS server without such an extension will use caller_{u,g}id
> >>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> >>> values are unmapped. At the same time we can't map these fields with an
> >>> idmapping as it can break UID/GID-based permission checks logic on the
> >>> MDS side. This problem was described with a lot of details at [1], [2].
> >>>
> >>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> >>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> >>>
> >>> Link: https://github.com/ceph/ceph/pull/52575
> >>> Link: https://tracker.ceph.com/issues/62217
> >>> Cc: Xiubo Li <xiubli@redhat.com>
> >>> Cc: Jeff Layton <jlayton@kernel.org>
> >>> Cc: Ilya Dryomov <idryomov@gmail.com>
> >>> Cc: ceph-devel@vger.kernel.org
> >>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> Signed-off-by: Christian Brauner <brauner@kernel.org>
> >>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> >>> ---
> >>> v7:
> >>> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> >>> v8:
> >>> - properly handled case when old MDS used with new kernel client
> >>> ---
> >>> fs/ceph/mds_client.c | 47 +++++++++++++++++++++++++++++++++---
> >>> fs/ceph/mds_client.h | 5 +++-
> >>> include/linux/ceph/ceph_fs.h | 5 +++-
> >>> 3 files changed, 52 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >>> index 8829f55103da..41e4bf3811c4 100644
> >>> --- a/fs/ceph/mds_client.c
> >>> +++ b/fs/ceph/mds_client.c
> >>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
> >>> }
> >>> }
> >>>
> >>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> >>> +{
> >>> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> >>> + return 1;
> >>> +
> >>> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> >>> + return 2;
> >>> +
> >>> + return CEPH_MDS_REQUEST_HEAD_VERSION;
> >>> +}
> >>> +
> >>> static struct ceph_mds_request_head_legacy *
> >>> find_legacy_request_head(void *p, u64 features)
> >>> {
> >>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> {
> >>> int mds = session->s_mds;
> >>> struct ceph_mds_client *mdsc = session->s_mdsc;
> >>> + struct ceph_client *cl = mdsc->fsc->client;
> >>> struct ceph_msg *msg;
> >>> struct ceph_mds_request_head_legacy *lhead;
> >>> const char *path1 = NULL;
> >>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> void *p, *end;
> >>> int ret;
> >>> bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> >>> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> >>> + u16 request_head_version = mds_supported_head_version(session);
> >>>
> >>> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> >>> req->r_parent, req->r_path1, req->r_ino1.ino,
> >>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> */
> >>> if (legacy)
> >>> len = sizeof(struct ceph_mds_request_head_legacy);
> >>> - else if (old_version)
> >>> + else if (request_head_version == 1)
> >>> len = sizeof(struct ceph_mds_request_head_old);
> >>> + else if (request_head_version == 2)
> >>> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>> else
> >>> len = sizeof(struct ceph_mds_request_head);
> >>>
> >>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> lhead = find_legacy_request_head(msg->front.iov_base,
> >>> session->s_con.peer_features);
> >>>
> >>> + if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> >>> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> >>> + pr_err_ratelimited_client(cl,
> >>> + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> >>> + " is not supported by MDS. Fail request with -EIO.\n");
> >>> +
> >>> + ret = -EIO;
> >>> + goto out_err;
> >>> + }
> >>> +
> >>> /*
> >>> * The ceph_mds_request_head_legacy didn't contain a version field, and
> >>> * one was added when we moved the message version from 3->4.
> >>> @@ -3035,17 +3059,34 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> >>> if (legacy) {
> >>> msg->hdr.version = cpu_to_le16(3);
> >>> p = msg->front.iov_base + sizeof(*lhead);
> >>> - } else if (old_version) {
> >>> + } else if (request_head_version == 1) {
> >>> struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> >>>
> >>> msg->hdr.version = cpu_to_le16(4);
> >>> ohead->version = cpu_to_le16(1);
> >>> p = msg->front.iov_base + sizeof(*ohead);
> >>> + } else if (request_head_version == 2) {
> >>> + struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> +
> >>> + msg->hdr.version = cpu_to_le16(6);
> >>> + nhead->version = cpu_to_le16(2);
> >>> +
> >>> + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> >>> } else {
> >>> struct ceph_mds_request_head *nhead = msg->front.iov_base;
> >>> + kuid_t owner_fsuid;
> >>> + kgid_t owner_fsgid;
> >>>
> >>> msg->hdr.version = cpu_to_le16(6);
> >>> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> >>> + nhead->struct_len = sizeof(struct ceph_mds_request_head);
> >>> +
> >>> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> >>> + VFSUIDT_INIT(req->r_cred->fsuid));
> >>> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> >>> + VFSGIDT_INIT(req->r_cred->fsgid));
> >>> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> >>> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> >>> p = msg->front.iov_base + sizeof(*nhead);
> >>> }
> >>>
> >>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >>> index e3bbf3ba8ee8..8f683e8203bd 100644
> >>> --- a/fs/ceph/mds_client.h
> >>> +++ b/fs/ceph/mds_client.h
> >>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> >>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> >>> CEPHFS_FEATURE_OP_GETVXATTR,
> >>> CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> >>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>>
> >>> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> >>> + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> >>> };
> >>>
> >>> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> >>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> >>> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> >>> CEPHFS_FEATURE_OP_GETVXATTR, \
> >>> CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> >>> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> >>> }
> >>>
> >>> /*
> >>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> >>> index 5f2301ee88bc..b91699b08f26 100644
> >>> --- a/include/linux/ceph/ceph_fs.h
> >>> +++ b/include/linux/ceph/ceph_fs.h
> >>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> >>> union ceph_mds_request_args args;
> >>> } __attribute__ ((packed));
> >>>
> >>> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> >>> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> >>>
> >>> struct ceph_mds_request_head_old {
> >>> __le16 version; /* struct version */
> >>> @@ -530,6 +530,9 @@ struct ceph_mds_request_head {
> >>>
> >>> __le32 ext_num_retry; /* new count retry attempts */
> >>> __le32 ext_num_fwd; /* new count fwd attempts */
> >>> +
> >>> + __le32 struct_len; /* to store size of struct ceph_mds_request_head */
> >>> + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */
> >> Let's also initialize them to -1 for all the other requests as we do in
> >> your PR.
> > They are always initialized already. As you can see from the code we
> > don't have any extra conditions
> > on filling these fields. We always fill them with an appropriate
> > UID/GID. If mount is not idmapped then it's just == caller_uid/gid,
> > if mount idmapped then it's idmapped caller_uid/gid.
>
> Then in kclient all the request will always initialized the
> 'owner_{uid/gid}' with 'caller_{uid/gid}'. While in userspace libcephfs
> it will only set them for 'create/mknod/mkdir/symlink` instead.
>
> I'd prefer to make them consistent, which is what I am still focusing
> on, to make them easier to read and comparing when troubles hooting.
Have fixed:
https://github.com/mihalicyn/linux/commit/5a5b590ca5aa9ec81d68ff60d77ea54fc86bf33a
Also have added appropriate checks in mkdir/atomic_open:
https://github.com/mihalicyn/linux/commit/bc1fa68f7143a58af8c181bbfab64edf0397dca5
https://github.com/mihalicyn/linux/commit/30e21387063710a10cdca15a5d840fcf8e1e6ccd
Will send v10 soon.
Kind regards,
Alex
>
> Thanks
>
> - Xiubo
>
> > Kind regards,
> > Alex
> >
> >> Thanks
> >>
> >> - Xiubo
> >>
> >>
> >>
> >>> } __attribute__ ((packed));
> >>>
> >>> /* cap/lease release record */
>
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2023-08-07 11:46 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-04 8:48 [PATCH v9 00/12] ceph: support idmapped mounts Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 01/12] fs: export mnt_idmap_get/mnt_idmap_put Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 02/12] ceph: stash idmapping in mdsc request Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 03/12] ceph: handle idmapped mounts in create_request_message() Alexander Mikhalitsyn
2023-08-04 14:53 ` Christian Brauner
2023-08-05 1:55 ` kernel test robot
2023-08-05 9:15 ` Aleksandr Mikhalitsyn
2023-08-07 1:24 ` Xiubo Li
2023-08-07 6:51 ` Aleksandr Mikhalitsyn
2023-08-07 10:28 ` Xiubo Li
2023-08-07 10:34 ` Aleksandr Mikhalitsyn
2023-08-07 11:21 ` Xiubo Li
2023-08-07 11:28 ` Aleksandr Mikhalitsyn
2023-08-07 11:45 ` Aleksandr Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 04/12] ceph: add enable_unsafe_idmap module parameter Alexander Mikhalitsyn
2023-08-04 14:54 ` Christian Brauner
2023-08-04 8:48 ` [PATCH v9 05/12] ceph: pass an idmapping to mknod/symlink/mkdir Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 06/12] ceph: allow idmapped getattr inode op Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 07/12] ceph: allow idmapped permission " Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 08/12] ceph: pass idmap to __ceph_setattr Alexander Mikhalitsyn
2023-08-04 14:55 ` Christian Brauner
2023-08-04 8:48 ` [PATCH v9 09/12] ceph: allow idmapped setattr inode op Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 10/12] ceph/acl: allow idmapped set_acl " Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 11/12] ceph/file: allow idmapped atomic_open " Alexander Mikhalitsyn
2023-08-04 8:48 ` [PATCH v9 12/12] ceph: allow idmapped mounts Alexander Mikhalitsyn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).