From: "Yan, Zheng" <zyan@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Luis Henriques <lhenriques@suse.com>,
Nikolay Borisov <nborisov@suse.com>,
fstests@vger.kernel.org, ceph-devel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] ceph: test basic ceph.quota.max_bytes quota
Date: Mon, 15 Apr 2019 10:16:18 +0800
Message-ID: <0cbc6885-93ae-ca79-184e-cdc56681202c@redhat.com>
In-Reply-To: <20190414221535.GF1695@dread.disaster.area>
On 4/15/19 6:15 AM, Dave Chinner wrote:
> On Fri, Apr 12, 2019 at 11:37:55AM +0800, Yan, Zheng wrote:
>> On 4/12/19 9:15 AM, Dave Chinner wrote:
>>> On Thu, Apr 04, 2019 at 11:18:22AM +0100, Luis Henriques wrote:
>>>> Dave Chinner <david@fromorbit.com> writes:
>>>>
>>>>> On Wed, Apr 03, 2019 at 02:19:11PM +0100, Luis Henriques wrote:
>>>>>> Nikolay Borisov <nborisov@suse.com> writes:
>>>>>>> On 3.04.19 12:45, Luis Henriques wrote:
>>>>>>>> Dave Chinner <david@fromorbit.com> writes:
>>>>>>>>> Makes no sense to me. xfs_io does a write() loop internally with
>>>>>>>>> this pwrite command of 4kB writes - the default buffer size. If you
>>>>>>>>> want xfs_io to loop doing 1MB sized pwrite() calls, then all you
>>>>>>>>> need is this:
>>>>>>>>>
>>>>>>>>> $XFS_IO_PROG -f -c "pwrite -w -B 1m 0 ${size}m" $file | _filter_xfs_io
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you for your review, Dave. I'll make sure the next revision of
>>>>>>>> these tests will include all your comments implemented... except for
>>>>>>>> this one.
>>>>>>>>
>>>>>>>> The reason I'm using a loop for writing a file is due to the nature of
>>>>>>>> the (very!) loose definition of quotas in CephFS. Basically, clients
>>>>>>>> will likely write some amount of data over the configured limit because
>>>>>>>> the servers they are communicating with to write the data (the OSDs)
>>>>>>>> have no idea about the concept of quotas (or files even); the filesystem
>>>>>>>> view in the cluster is managed at a different level, with the help of
>>>>>>>> the MDS and the client itself.
>>>>>>>>
>>>>>>>> So, the loop in this function is simply to allow the metadata associated
>>>>>>>> with the file to be updated while we're writing the file. If I use a
>>>>>>>
>>>>>>> But the metadata will be modified while writing the file even with a
>>>>>>> single invocation of xfs_io.
>>>>>>
>>>>>> No, that's not true. It would be too expensive to keep the metadata
>>>>>> server updated while writing to a file. So making sure there's
>>>>>> actually an open/close of the file (plus the fsync in pwrite) helps
>>>>>> ensure the metadata gets flushed to the MDS.
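>>>>>>
>>>>>> Roughly, the loop I have in mind looks like this (just a sketch; the
>>>>>> offsets and sizes are illustrative). Each xfs_io invocation is a
>>>>>> separate open/write/fdatasync/close of the file:
>>>>>>
>>>>>>   for ((i = 0; i < size; i++)); do
>>>>>>           $XFS_IO_PROG -f -c "pwrite -w $((i * 1048576)) 1m" $file \
>>>>>>                   | _filter_xfs_io
>>>>>>   done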
>>>>>
>>>>> /me sighs.
>>>>>
>>>>> So you want:
>>>>>
>>>>> loop until ${size}MB written:
>>>>>     write 1MB
>>>>>     fsync
>>>>>         -> flush data to server
>>>>>         -> flush metadata to server
>>>>>
>>>>> i.e. this one liner:
>>>>>
>>>>> xfs_io -f -c "pwrite -D -B 1m 0 ${size}m" /path/to/file
>>>>
>>>> Unfortunately, that doesn't do what I want either :-/
>>>> (and I guess you meant '-b 1m', not '-B 1m', right?)
>>>
>>> Yes. But I definitely did mean "-D" so that RWF_DSYNC was used with
>>> each 1MB write.
>>>
>>>> [ Zheng: please feel free to correct me if I'm saying something really
>>>> stupid below. ]
>>>>
>>>> So, one of the key things in my loop is the open/close operations. When
>>>> a file is closed in cephfs the capabilities (that's ceph jargon for what
>>>> sort of operations a client is allowed to perform on an inode) will
>>>> likely be released and that's when the metadata server will get the
>>>> updated file size. Before that, the client is allowed to modify the
>>>> file size if it has acquired the capabilities for doing so.
>>>
>>> So you are saying that O_DSYNC writes on ceph do not force file
>>> size metadata changes to the metadata server to be made stable?
>>>
>>>> OTOH, a pwrite operation will eventually get -EDQUOT even with the
>>>> one-liner above, because the client itself will realize it has exceeded
>>>> a certain threshold set by the MDS and will then update the server with
>>>> the new file size.
>>>
>>> Sure, but if the client crashes without having sent the updated file
>>> size to the server as part of an extending O_DSYNC write, then how
>>> is it recovered when the client reconnects to the server and
>>> accesses the file again?
>>
>>
>> For a DSYNC write, the client has already written the data to the object
>> store. If the client crashes, the MDS will set the file to a 'recovering'
>> state and probe the file size by checking the object store. Access to the
>> file is blocked during recovery.
>
> IOWs, ceph allows data integrity writes to the object store even
> though those writes breach limits on that object store? i.e.
> ceph quota essentially ignores O_SYNC/O_DSYNC metadata requirements?
>
The current cephfs quota implementation checks the quota (it compares i_size
against the quota setting) at the very beginning of ceph_write_iter(). It has
nothing to do with O_SYNC or O_DSYNC.
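
For reference, the limit being checked here is the ceph.quota.max_bytes
xattr set on a directory. A minimal sketch of setting it (the mount point
and size below are only examples):

  # cap the subtree under quotadir at 10MB
  setfattr -n ceph.quota.max_bytes -v 10485760 /mnt/cephfs/quotadir
  # read the limit back
  getfattr -n ceph.quota.max_bytes /mnt/cephfs/quotadir
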
Regards
Yan, Zheng
> FWIW, quotas normally have soft and hard limits - soft limits can be
> breached with a warning and a time limit to return under the soft
> limit, but the quota hard limit should /never/ be breached by users.
>
> I guess that's the way of the world these days - fast and loose
> because everyone demands fast before correct....
>
> Cheers,
>
> Dave.
>