From: "Luís Henriques" <lhenriques@suse.de>
To: Jeff Layton <jlayton@kernel.org>
Cc: Xiubo Li <xiubli@redhat.com>,
idryomov@gmail.com, vshankar@redhat.com, khiremat@redhat.com,
pdonnell@redhat.com, ceph-devel@vger.kernel.org
Subject: Re: [PATCH v2 4/4] ceph: add truncate size handling support for fscrypt
Date: Wed, 27 Oct 2021 16:06:31 +0100 [thread overview]
Message-ID: <YXlq95FB8sU493dr@suse.de> (raw)
In-Reply-To: <cb4ddb7a862dbb0b5f44c4c4a131adfc8c3f008c.camel@kernel.org>
On Wed, Oct 27, 2021 at 08:17:55AM -0400, Jeff Layton wrote:
> On Wed, 2021-10-27 at 13:12 +0800, Xiubo Li wrote:
> > On 10/26/21 4:01 AM, Jeff Layton wrote:
> > > On Wed, 2021-10-20 at 21:28 +0800, xiubli@redhat.com wrote:
> > > > From: Xiubo Li <xiubli@redhat.com>
> > > >
> > > > This will transfer the encrypted last block contents to the MDS
> > > > along with the truncate request only when new size is smaller and
> > > > not aligned to the fscrypt BLOCK size.
> > > >
> > > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > > ---
> > > > fs/ceph/caps.c | 9 +--
> > > > fs/ceph/inode.c | 210 ++++++++++++++++++++++++++++++++++++++++++------
> > > > 2 files changed, 190 insertions(+), 29 deletions(-)
> > > >
> > > > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > > > index 4e2a588465c5..1a36f0870d89 100644
> > > > --- a/fs/ceph/caps.c
> > > > +++ b/fs/ceph/caps.c
> > ...
> > > > +fill_last_block:
> > > > + pagelist = ceph_pagelist_alloc(GFP_KERNEL);
> > > > + if (!pagelist)
> > > > + return -ENOMEM;
> > > > +
> > > > + /* Insert the header first */
> > > > + header.ver = 1;
> > > > + header.compat = 1;
> > > > + /* sizeof(file_offset) + sizeof(block_size) + blen */
> > > > + header.data_len = cpu_to_le32(8 + 8 + CEPH_FSCRYPT_BLOCK_SIZE);
> > > > + header.file_offset = cpu_to_le64(orig_pos);
> > > > + if (fill_header_only) {
> > > > + header.file_offset = cpu_to_le64(0);
> > > > + header.block_size = cpu_to_le64(0);
> > > > + } else {
> > > > + header.file_offset = cpu_to_le64(orig_pos);
> > > > + header.block_size = cpu_to_le64(CEPH_FSCRYPT_BLOCK_SIZE);
> > > > + }
> > > > + ret = ceph_pagelist_append(pagelist, &header, sizeof(header));
> > > > + if (ret)
> > > > + goto out;
> > > >
> > > >
> > > Note that you're doing a read/modify/write cycle, and you must ensure
> > > that the object remains consistent between the read and write or you may
> > > end up with data corruption. This means that you probably need to
> > > transmit an object version as part of the write. See this patch in the
> > > stack:
> > >
> > > libceph: add CEPH_OSD_OP_ASSERT_VER support
> > >
> > > That op tells the OSD to stop processing the request if the version is
> > > wrong.
> > >
> > > You'll want to grab the "ver" from the __ceph_sync_read call, and then
> > > send that along with the updated last block. Then, when the MDS is
> > > truncating, it can use a CEPH_OSD_OP_ASSERT_VER op with that version to
> > > ensure that the object hasn't changed when doing it. If the assertion
> > > trips, then the MDS should send back EAGAIN or something similar to the
> > > client to tell it to retry.
> > >
> > > It's also possible (though I've never used it) to make an OSD request
> > > assert that the contents of an extent haven't changed, so you could
> > > instead send along the old contents along with the new ones, etc.
> > >
> > > That may end up being more efficient if the object is getting hammered
> > > with I/O in other fscrypt blocks within the same object. It may be worth
> > > exploring that avenue as well.
> >
> > Hi Jeff,
> >
> > One questions about this:
> >
> > Should we consider that the FSCRYPT BLOCK will cross two different Rados
> > objects ? As default the Rados object size is 4MB.
> >
> > In case the FSCRYPT BLOCK size is 4K, when the object size is 3K or 5K ?
> >
> > Or the object size should always be multiple of FSCRYPT BLOCK size ?
> >
>
> The danger here is that it's very hard to ensure atomicity in RADOS
> across two different objects. If your crypto blocks can span objects,
> then you can end up with torn writes, and a torn write on a crypto block
> turns it into garbage.
>
> So, I think we want to forbid:
>
> 1/ custom file layouts on encrypted files, to ensure that we don't end
> up with weird object sizes. Luis' patch from August does this, but I
> think we might want the MDS to also vet this.
>
> 2/ block sizes larger than the object size
I believe that object sizes have a minimum of 65k, defined by
CEPH_MIN_STRIPE_UNIT. So, maybe I'm oversimplifying, but I think we just
need to ensure (a build-time check?) that
CEPH_FSCRYPT_BLOCK_SIZE <= CEPH_MIN_STRIPE_UNIT
Cheers,
--
Luís
> 3/ non-power-of-two crypto block sizes (so no 3k or 5k blocks, but you
> could do 1k, 2k, 4k, 8k, etc...)
>
> ...with that we should be able to ensure that they never span objects.
> Eventually we may be able to relax some of these constraints, but I
> don't think most users will have a problem with these constraints.
>
> --
> Jeff Layton <jlayton@kernel.org>
>
next prev parent reply other threads:[~2021-10-27 15:06 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-20 13:28 [PATCH v2 0/4] ceph: size handling for the fscrypt xiubli
2021-10-20 13:28 ` [PATCH v2 1/4] ceph: add __ceph_get_caps helper support xiubli
2021-10-20 13:28 ` [PATCH v2 2/4] ceph: add __ceph_sync_read " xiubli
2021-10-20 13:28 ` [PATCH v2 3/4] ceph: return the real size readed when hit EOF xiubli
2021-10-25 19:05 ` Jeff Layton
2021-10-26 3:12 ` Xiubo Li
2021-10-20 13:28 ` [PATCH v2 4/4] ceph: add truncate size handling support for fscrypt xiubli
2021-10-25 20:01 ` Jeff Layton
2021-10-26 3:41 ` Xiubo Li
2021-10-27 23:23 ` Xiubo Li
2021-10-27 5:12 ` Xiubo Li
2021-10-27 12:17 ` Jeff Layton
2021-10-27 13:57 ` Xiubo Li
2021-10-27 15:06 ` Luís Henriques [this message]
2021-10-27 23:08 ` Xiubo Li
2021-10-20 15:32 ` [PATCH v2 0/4] ceph: size handling for the fscrypt Jeff Layton
2021-10-25 20:13 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YXlq95FB8sU493dr@suse.de \
--to=lhenriques@suse.de \
--cc=ceph-devel@vger.kernel.org \
--cc=idryomov@gmail.com \
--cc=jlayton@kernel.org \
--cc=khiremat@redhat.com \
--cc=pdonnell@redhat.com \
--cc=vshankar@redhat.com \
--cc=xiubli@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.