From: Luis Henriques <lhenriques@suse.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] ceph: allow object copies across different filesystems in the same cluster
Date: Fri, 06 Sep 2019 17:26:51 +0100
Message-ID: <87sgp9o0fo.fsf@suse.com>
In-Reply-To: <30b09cb015563913d073c488c8de8ba0cceedd7b.camel@kernel.org> (Jeff Layton's message of "Fri, 06 Sep 2019 12:18:10 -0400")
"Jeff Layton" <jlayton@kernel.org> writes:
> On Fri, 2019-09-06 at 14:57 +0100, Luis Henriques wrote:
>> OSDs are able to perform object copies across different pools. Thus,
>> there's no need to prevent copy_file_range from doing remote copies if the
>> source and destination superblocks are different. Only return -EXDEV if
>> they have different fsid (the cluster ID).
>>
>> Signed-off-by: Luis Henriques <lhenriques@suse.com>
>> ---
>> fs/ceph/file.c | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> Hi!
>>
>> I've finally managed to run some tests using multiple filesystems, both
>> within a single cluster and also using two different clusters. The
>> behaviour of copy_file_range (with this patch, of course) was what I
>> expected:
>>
>> - Object copies work fine across different filesystems within the same
>> cluster (even with pools in different PGs);
>> - -EXDEV is returned if the fsid is different
>>
>> (OT: I wonder why the cluster ID is named 'fsid'; historical reasons?
>> Because this is actually what's in ceph.conf fsid in "[global]"
>> section. Anyway...)
>>
>> So, what's missing right now is (I always mention this when I have the
>> opportunity!) to merge https://github.com/ceph/ceph/pull/25374 :-)
>> And add the corresponding support for the new flag to the kernel
>> client, of course.
>>
>> Cheers,
>> --
>> Luis
>>
>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>> index 685a03cc4b77..88d116893c2b 100644
>> --- a/fs/ceph/file.c
>> +++ b/fs/ceph/file.c
>> @@ -1904,6 +1904,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>  	struct ceph_inode_info *src_ci = ceph_inode(src_inode);
>>  	struct ceph_inode_info *dst_ci = ceph_inode(dst_inode);
>>  	struct ceph_cap_flush *prealloc_cf;
>> +	struct ceph_fs_client *src_fsc = ceph_inode_to_client(src_inode);
>>  	struct ceph_object_locator src_oloc, dst_oloc;
>>  	struct ceph_object_id src_oid, dst_oid;
>>  	loff_t endoff = 0, size;
>> @@ -1915,8 +1916,22 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>
>>  	if (src_inode == dst_inode)
>>  		return -EINVAL;
>> -	if (src_inode->i_sb != dst_inode->i_sb)
>> -		return -EXDEV;
>> +	if (src_inode->i_sb != dst_inode->i_sb) {
>> +		struct ceph_fs_client *dst_fsc = ceph_inode_to_client(dst_inode);
>> +
>> +		if (!src_fsc->client->have_fsid || !dst_fsc->client->have_fsid) {
>> +			dout("No fsid in a fs client\n");
>> +			return -EXDEV;
>> +		}
>
> In what situation is there no fsid? Old cluster version?
>
> If there is no fsid, can we take that to indicate that there is only a
> single filesystem possible in the cluster and that we should attempt the
> copy anyway?

TBH I'm not sure if 'have_fsid' can ever be 'false' in this call. It is
set to 'true' when handling the monmap, and it's never changed back to
'false'. Since I don't think copy_file_range will be invoked *before*
we get the monmap, it should be safe to drop this check. Maybe it could
be replaced by a WARN_ON()?
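
Something along these lines, perhaps. This is only a rough userspace
sketch to illustrate the idea, not the actual patch: 'struct mock_client'
and 'mock_check_same_cluster()' are made-up names, the fprintf() merely
stands in for WARN_ON(), and returning -EXDEV when the warning fires is an
assumption on my side (attempting the copy anyway would be the other
option).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the client state; not the kernel struct. */
struct mock_client {
	bool have_fsid;         /* set once the monmap has been handled */
	unsigned char fsid[16]; /* cluster ID */
};

/* Returns 0 if a remote object copy may be attempted, -EXDEV otherwise. */
static int mock_check_same_cluster(const struct mock_client *src,
				   const struct mock_client *dst)
{
	/*
	 * Stand-in for WARN_ON(): a missing fsid is not expected here, so
	 * complain loudly instead of failing silently. Refusing the copy
	 * in that case is an assumption of this sketch.
	 */
	if (!src->have_fsid || !dst->have_fsid) {
		fprintf(stderr, "warning: fs client without an fsid\n");
		return -EXDEV;
	}

	/* A different fsid means a different cluster: no remote copy. */
	if (memcmp(src->fsid, dst->fsid, sizeof(src->fsid)) != 0)
		return -EXDEV;

	return 0;
}
```

With two clients that share an fsid the helper returns 0; with different
fsids, or with one client lacking an fsid, it returns -EXDEV.
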
Cheers,
--
Luis
>
>> +		if (ceph_fsid_compare(&src_fsc->client->fsid,
>> +				      &dst_fsc->client->fsid)) {
>> +			dout("Copying object across different clusters:");
>> +			dout(" src fsid: %*ph\n dst fsid: %*ph\n",
>> +			     16, &src_fsc->client->fsid,
>> +			     16, &dst_fsc->client->fsid);
>> +			return -EXDEV;
>> +		}
>> +	}
>>  	if (ceph_snap(dst_inode) != CEPH_NOSNAP)
>>  		return -EROFS;
>>
>> @@ -1928,7 +1943,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>  	 * efficient).
>>  	 */
>>
>> -	if (ceph_test_mount_opt(ceph_inode_to_client(src_inode), NOCOPYFROM))
>> +	if (ceph_test_mount_opt(src_fsc, NOCOPYFROM))
>>  		return -EOPNOTSUPP;
>>
>>  	if ((src_ci->i_layout.stripe_unit != dst_ci->i_layout.stripe_unit) ||
>> @@ -2044,7 +2059,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
>>  				dst_ci->i_vino.ino, dst_objnum);
>>  		/* Do an object remote copy */
>>  		err = ceph_osdc_copy_from(
>> -			&ceph_inode_to_client(src_inode)->client->osdc,
>> +			&src_fsc->client->osdc,
>>  			src_ci->i_vino.snap, 0,
>>  			&src_oid, &src_oloc,
>>  			CEPH_OSD_OP_FLAG_FADVISE_SEQUENTIAL |