From: Luis Henriques <lhenriques@suse.de>
To: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>,
ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
Patrick Donnelly <pdonnell@redhat.com>
Subject: Re: [RFC PATCH] ceph: fix cross quota realms renames with new truncated files
Date: Thu, 12 Nov 2020 10:40:56 +0000
Message-ID: <87v9eadfif.fsf@suse.de>
In-Reply-To: <925dda9b15044c8a19ac2017d4b135209e1f6184.camel@kernel.org> (Jeff Layton's message of "Wed, 11 Nov 2020 18:51:36 -0500")

Jeff Layton <jlayton@kernel.org> writes:
> On Wed, 2020-11-11 at 18:28 +0000, Luis Henriques wrote:
>> Jeff Layton <jlayton@kernel.org> writes:
>>
>> > On Wed, 2020-11-11 at 15:39 +0000, Luis Henriques wrote:
>> > > When doing a rename across quota realms, there's a corner case that isn't
>> > > handled correctly. Here's a testcase:
>> > >
>> > > mkdir files limit
>> > > truncate files/file -s 10G
>> > > setfattr limit -n ceph.quota.max_bytes -v 1000000
>> > > mv files limit/
>> > >
>> > > The above will succeed because ftruncate(2) won't result in an immediate
>> > > notification of the MDSs with the new file size, and thus the quota realms
>> > > stats won't be updated.
>> > >
>> > > This patch forces a sync with the MDS every time there's an ATTR_SIZE that
>> > > sets a new i_size, even if we have Fx caps.
>> > >
>> > > Cc: stable@vger.kernel.org
>> > > Fixes: dffdcd71458e ("ceph: allow rename operation under different quota realms")
>> > > URL: https://tracker.ceph.com/issues/36593
>> > > Signed-off-by: Luis Henriques <lhenriques@suse.de>
>> > > ---
>> > > fs/ceph/inode.c | 11 ++---------
>> > > 1 file changed, 2 insertions(+), 9 deletions(-)
>> > >
>> > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>> > > index 526faf4778ce..30e3f240ac96 100644
>> > > --- a/fs/ceph/inode.c
>> > > +++ b/fs/ceph/inode.c
>> > > @@ -2136,15 +2136,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
>> > >          if (ia_valid & ATTR_SIZE) {
>> > >                  dout("setattr %p size %lld -> %lld\n", inode,
>> > >                       inode->i_size, attr->ia_size);
>> > > -                if ((issued & CEPH_CAP_FILE_EXCL) &&
>> > > -                    attr->ia_size > inode->i_size) {
>> > > -                        i_size_write(inode, attr->ia_size);
>> > > -                        inode->i_blocks = calc_inode_blocks(attr->ia_size);
>> > > -                        ci->i_reported_size = attr->ia_size;
>> > > -                        dirtied |= CEPH_CAP_FILE_EXCL;
>> > > -                        ia_valid |= ATTR_MTIME;
>> > > -                } else if ((issued & CEPH_CAP_FILE_SHARED) == 0 ||
>> > > -                           attr->ia_size != inode->i_size) {
>> > > +                if ((issued & (CEPH_CAP_FILE_EXCL|CEPH_CAP_FILE_SHARED)) ||
>> > > +                    (attr->ia_size != inode->i_size)) {
>> > >                          req->r_args.setattr.size = cpu_to_le64(attr->ia_size);
>> > >                          req->r_args.setattr.old_size =
>> > >                                  cpu_to_le64(inode->i_size);
>> >
>> > Hmm...this makes truncates more expensive when we have caps. I'd rather
>> > not do that if we can help it.
>>
>> Yeah, as I mentioned in the tracker, there's indeed a performance impact
>> with this fix. That's what made me add the RFC in the subject ;-)
>>
>> > What about instead having the client mimic a fsync when there is a
>> > rename across quota realms? If we can't tell that reliably then we could
>> > also just do an effective fsync ahead of any cross-directory rename?
>>
>> Ok, thanks for the suggestion. That may actually work, although it will
>> make the rename more expensive of course. I'll test that tomorrow and
>> eventually follow-up with a patch.
>>
>
> Patrick pointed out to me on IRC that since you're moving the parent
> directory of the truncated file, flushing the caps on the directory
> won't really help. You'd need to walk the entire subtree and try to
> flush every dirty inode, or basically do a syncfs() prior to renaming
> the directory across quotarealms.
>
> I think we probably will need to revert the change to allow cross-
> quotarealm renames of directories and make those return EXDEV again.
> Anything else sounds like it's probably going to be too expensive.

Hmm... that sounds a bit drastic and it would make the kernel client
behave differently from the fuse client -- from what I could understand
the fuse client does the sync ATTR_SIZE and thus doesn't have this issue.

Obviously, I agree with you that the performance penalty is too high for
such a common operation. But maybe renames across quotarealms aren't that
common and paying the penalty of doing a full ceph_flush_dirty_caps() is
acceptable for such cases?

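To make the trade-off concrete, here is a minimal userspace sketch of the
problem and of the flush-before-rename idea. It is only a model, not kernel
code: every name in it (model_realm, model_file, model_cross_realm_rename,
and so on) is made up for illustration, and it assumes the rename check only
sees sizes that have already been reported to the MDS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names are kernel or MDS interfaces. */
struct model_realm {
	uint64_t max_bytes;	/* stands in for ceph.quota.max_bytes, 0 = no limit */
	uint64_t mds_rbytes;	/* bytes the MDS already accounts to this realm */
};

struct model_file {
	uint64_t local_size;	/* size as seen on the client */
	uint64_t reported_size;	/* size last reported to the MDS */
};

/* Size-extending truncate done under Fx caps: only the local size changes. */
static void model_truncate_buffered(struct model_file *f, uint64_t size)
{
	f->local_size = size;
}

/* Models flushing dirty caps for every file in the subtree being moved. */
static void model_flush_subtree(struct model_file *files, size_t n)
{
	assert(files != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		files[i].reported_size = files[i].local_size;
}

/* Subtree size as the MDS-side stats would show it (reported sizes only). */
static uint64_t model_mds_subtree_bytes(const struct model_file *files, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += files[i].reported_size;
	return total;
}

/*
 * Cross-realm rename check. With flush_first == false the check uses
 * possibly stale stats; with flush_first == true it pays for a full
 * flush of the subtree before looking at them.
 */
static bool model_cross_realm_rename(struct model_file *files, size_t n,
				     const struct model_realm *dst,
				     bool flush_first)
{
	if (flush_first)
		model_flush_subtree(files, n);
	if (dst->max_bytes == 0)
		return true;
	return dst->mds_rbytes + model_mds_subtree_bytes(files, n) <=
	       dst->max_bytes;
}
```

With one file truncated to 10G and a destination limit of 1000000 bytes (the
testcase from the patch description), the model allows the rename when
flush_first is false and rejects it when flush_first is true. The cost is
that the flush loop touches every file in the subtree, which is the expense
being discussed above.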
Cheers,
--
Luis