Subject: Re: [RFC PATCH 2/2] ceph: test basic ceph.quota.max_bytes quota
References: <20190402103428.21435-1-lhenriques@suse.com> <20190402103428.21435-3-lhenriques@suse.com> <20190402210931.GV23020@dastard> <87d0m3e81f.fsf@suse.com> <874l7fdy5s.fsf@suse.com> <20190403214708.GA26298@dastard> <87tvfecbv5.fsf@suse.com> <20190412011559.GE1695@dread.disaster.area>
From: "Yan, Zheng"
Message-ID: <740207e9-b4ef-e4b4-4097-9ece2ac189a7@redhat.com>
Date: Fri, 12 Apr 2019 11:37:55 +0800
In-Reply-To: <20190412011559.GE1695@dread.disaster.area>
To: Dave Chinner, Luis Henriques
Cc: Nikolay Borisov, fstests@vger.kernel.org, ceph-devel@vger.kernel.org

On 4/12/19 9:15 AM, Dave Chinner wrote:
> On Thu, Apr 04, 2019 at 11:18:22AM +0100, Luis Henriques wrote:
>> Dave Chinner writes:
>>
>>> On Wed, Apr 03, 2019 at 02:19:11PM +0100, Luis Henriques wrote:
>>>> Nikolay Borisov writes:
>>>>> On 3.04.19 г. 12:45 ч., Luis Henriques wrote:
>>>>>> Dave Chinner writes:
>>>>>>> Makes no sense to me. xfs_io does a write() loop internally with
>>>>>>> this pwrite command of 4kB writes - the default buffer size. If you
>>>>>>> want xfs_io to loop doing 1MB sized pwrite() calls, then all you
>>>>>>> need is this:
>>>>>>>
>>>>>>>     $XFS_IO_PROG -f -c "pwrite -w -B 1m 0 ${size}m" $file | _filter_xfs_io
>>>>>>>
>>>>>>
>>>>>> Thank you for your review, Dave. I'll make sure the next revision of
>>>>>> these tests will include all your comments implemented... except for
>>>>>> this one.
>>>>>>
>>>>>> The reason I'm using a loop for writing a file is due to the nature of
>>>>>> the (very!) loose definition of quotas in CephFS. Basically, clients
>>>>>> will likely write some amount of data over the configured limit because
>>>>>> the servers they are communicating with to write the data (the OSDs)
>>>>>> have no idea about the concept of quotas (or files even); the filesystem
>>>>>> view in the cluster is managed at a different level, with the help of
>>>>>> the MDS and the client itself.
>>>>>>
>>>>>> So, the loop in this function is simply to allow the metadata associated
>>>>>> with the file to be updated while we're writing the file. If I use a
>>>>>
>>>>> But the metadata will be modified while writing the file even with a
>>>>> single invocation of xfs_io.
>>>>
>>>> No, that's not true. It would be too expensive to keep the metadata
>>>> server updated while writing to a file. So, making sure there's
>>>> actually an open/close to the file (plus the fsync in pwrite) helps
>>>> making sure the metadata is flushed into the MDS.
>>>
>>> /me sighs.
>>>
>>> So you want:
>>>
>>> loop until ${size}MB written:
>>>     write 1MB
>>>     fsync
>>>       -> flush data to server
>>>       -> flush metadata to server
>>>
>>> i.e. this one liner:
>>>
>>> xfs_io -f -c "pwrite -D -B 1m 0 ${size}m" /path/to/file
>>
>> Unfortunately, that doesn't do what I want either :-/
>> (and I guess you meant '-b 1m', not '-B 1m', right?)
>
> Yes. But I definitely did mean "-D" so that RWF_DSYNC was used with
> each 1MB write.
>
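
For anyone following along, here is a rough sketch of the two write
patterns being compared; the file path, size, and helper usage below are
illustrative only and not taken from the actual patch:

    # Pattern used by the test under review: one xfs_io run per 1MB
    # chunk, so every chunk gets its own open/close (and therefore a
    # chance for the client to release caps and report the new file
    # size to the MDS).
    file=/mnt/cephfs/quota_file        # illustrative path
    size=20                            # MB to write, illustrative
    for ((i = 0; i < size; i++)); do
            $XFS_IO_PROG -f -c "pwrite -w ${i}m 1m" $file >/dev/null
    done

    # Dave's suggestion: a single invocation, one open/close, with each
    # 1MB buffer written under RWF_DSYNC.
    $XFS_IO_PROG -f -c "pwrite -D -b 1m 0 ${size}m" $file >/dev/null
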
>> [ Zheng: please feel free to correct me if I'm saying something really
>> stupid below. ]
>>
>> So, one of the key things in my loop is the open/close operations. When
>> a file is closed in cephfs the capabilities (that's ceph jargon for what
>> sort of operations a client is allowed to perform on an inode) will
>> likely be released and that's when the metadata server will get the
>> updated file size. Before that, the client is allowed to modify the
>> file size if it has acquired the capabilities for doing so.
>
> So you are saying that O_DSYNC writes on ceph do not force file
> size metadata changes to the metadata server to be made stable?
>
>> OTOH, a pwrite operation will eventually get the -EDQUOT even with the
>> one-liner above because the client itself will realize it has exceeded a
>> certain threshold set by the MDS and will eventually update the server
>> with the new file size.
>
> Sure, but if the client crashes without having sent the updated file
> size to the server as part of an extending O_DSYNC write, then how
> is it recovered when the client reconnects to the server and
> accesses the file again?

For a DSYNC write, the client has already written the data to the object
store. If the client crashes, the MDS puts the file into a 'recovering'
state and probes the file size by checking the object store. Access to
the file is blocked during recovery.

Regards
Yan, Zheng

>
>> However that won't happen at a deterministic
>> file size. For example, if quota is 10m and we're writing 20m, we may
>> get the error after writing 15m.
>>
>> Does this make sense?
>
> Only makes sense to me if O_DSYNC is ignored by the ceph client...
>
>> So, I guess I *could* use your one-liner in the test, but I would need
>> to slightly change the test logic -- I would need to write enough data
>> to the file to make sure I would get the -EDQUOT but I wouldn't be able
>> to actually check the file size as it will not be constant.
>>
>>> Fundamentally, if you find yourself writing a loop around xfs_io to
>>> break up a sequential IO stream into individual chunks, then you are
>>> most likely doing something xfs_io can already do. And if xfs_io
>>> cannot do it, then the right thing to do is to modify xfs_io to be
>>> able to do it and then use xfs_io....
>>
>> Got it! But I guess it wouldn't make sense to change xfs_io for this
>> specific scenario where I want several open-write-close cycles.
>
> That's how individual NFS client writes appear to the filesystem under
> the NFS server. I've previously considered adding an option in
> xfs_io to mimic this open-write-close loop per buffer so it's easy
> to exercise such behaviours, but never actually required it to
> reproduce the problems I was chasing. So it's definitely something
> that xfs_io /could/ do if necessary.
>
> Cheers,
>
> Dave.
>
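
To make the test-logic change mentioned above concrete (checking only for
-EDQUOT instead of a fixed final file size), here is a rough sketch; the
quota size, file path, and error matching are illustrative only and not
taken from the actual patch:

    # Quota on the test directory is assumed to be 10MB; write well past
    # it in a single xfs_io invocation and only check that the write
    # eventually fails with EDQUOT, since the exact file size at failure
    # is not deterministic on CephFS.
    limit=10                              # MB, illustrative
    file=$SCRATCH_MNT/quota_file          # illustrative path
    $XFS_IO_PROG -f -c "pwrite -D -b 1m 0 $((limit * 2))m" $file 2>&1 | \
            grep -q "Disk quota exceeded" || echo "write did not hit EDQUOT"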