public inbox for linux-nfs@vger.kernel.org
From: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
To: Trond Myklebust <trondmy@gmail.com>,
	Benjamin Coddington <bcodding@redhat.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>,
	linux-nfs@vger.kernel.org,
	Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>,
	watanabe.hiroyuki@lab.ntt.co.jp
Subject: Re: [PATCH] pNFS: Avoid read-modify-write for page-aligned full page write
Date: Tue, 12 Feb 2019 13:34:28 +0900	[thread overview]
Message-ID: <bea45df0-0bf1-048d-e842-7fd7eb0f2b34@lab.ntt.co.jp> (raw)
In-Reply-To: <17e929a2eea3d2c33dcd3d2c9b8d8a932568be47.camel@hammerspace.com>

On 2019/02/08 23:58, Trond Myklebust wrote:
> On Fri, 2019-02-08 at 16:54 +0900, 伊藤和夫 wrote:
>> On 2019/02/07 22:37, Benjamin Coddington wrote:
>>> On 7 Feb 2019, at 3:12, Kazuo Ito wrote:
>>> [snipped]
>>>> @@ -299,8 +305,10 @@ static int nfs_want_read_modify_write(struct file *file, struct page *page,
>>>>       unsigned int end = offset + len;
>>>>
>>>>       if (pnfs_ld_read_whole_page(file->f_mapping->host)) {
>>>> -        if (!PageUptodate(page))
>>>> -            return 1;
>>>> +        if (!PageUptodate(page)) {
>>>> +            if (pglen && (end < pglen || offset))
>>>> +                return 1;
>>>> +        }
>>>>           return 0;
>>>>       }
>>>
>>> This looks right.  I think that a static inline bool
>>> nfs_write_covers_page, or full_page_write or similar might make sense
>>> here, as we do the same test just below, and it would make the code
>>> easier to understand at a glance.
>>>
>>> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
>>>
>>> Ben
>>
>> As per Ben's comment, I moved the full-page-write check into a
>> separate static helper, nfs_full_page_write(), which both the
>> block-oriented and the non-block-oriented paths now call.
>>
>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> index 29553fdba8af..458c77ccf274 100644
>> --- a/fs/nfs/file.c
>> +++ b/fs/nfs/file.c
>> @@ -276,6 +276,12 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync);
>>     * then a modify/write/read cycle when writing to a page in the
>>     * page cache.
>>     *
>> + * Some pNFS layout drivers can only read/write at a certain block
>> + * granularity, as block devices do, and therefore we must perform
>> + * read/modify/write whenever a page hasn't been read yet and the data
>> + * to be written there is not aligned to a block boundary and/or
>> + * smaller than the block size.
>> + *
>>     * The modify/write/read cycle may occur if a page is read before
>>     * being completely filled by the writer.  In this situation, the
>>     * page must be completely written to stable storage on the server
>> @@ -291,15 +297,23 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync);
>>     * and that the new data won't completely replace the old data in
>>     * that range of the file.
>>     */
>> -static int nfs_want_read_modify_write(struct file *file, struct page *page,
>> -			loff_t pos, unsigned len)
>> +static bool nfs_full_page_write(struct page *page, loff_t pos, unsigned len)
>>    {
>>    	unsigned int pglen = nfs_page_length(page);
>>    	unsigned int offset = pos & (PAGE_SIZE - 1);
>>    	unsigned int end = offset + len;
>>
>> +	if (pglen && ((end < pglen) || offset))
>> +		return false;
>> +	return true;
>> +}
>> +
>> +static int nfs_want_read_modify_write(struct file *file, struct page *page,
>> +			loff_t pos, unsigned len)
>> +{
>>    	if (pnfs_ld_read_whole_page(file->f_mapping->host)) {
>> -		if (!PageUptodate(page))
>> +		if (!PageUptodate(page) &&
>> +		    !nfs_full_page_write(page, pos, len))
>>    			return 1;
>>    		return 0;
>>    	}
>> @@ -307,8 +321,7 @@ static int nfs_want_read_modify_write(struct file *file, struct page *page,
>>    	if ((file->f_mode & FMODE_READ) &&	/* open for read? */
>>    	    !PageUptodate(page) &&		/* Uptodate? */
>>    	    !PagePrivate(page) &&		/* i/o request already? */
>> -	    pglen &&				/* valid bytes of file? */
>> -	    (end < pglen || offset))		/* replace all valid bytes? */
>> +	    !nfs_full_page_write(page, pos, len))
>>    		return 1;
>>    	return 0;
>>    }
> 
> How about adding a separate
> 
> if (PageUptodate(page) || nfs_full_page_write(page, pos, len))
>      return 0;
> 
> before the check for pNFS?
>
> That means we won't have to duplicate those for the pNFS block and
> ordinary case, and it improves code clarity.

Yes, that is much better.

> BTW: Why doesn't the pNFS case check for PagePrivate(page)? That looks
> like a bug which would cause the existing write to get corrupted.
> If so, we should move that check too into the common code.

It's been that way since the pnfs_ld_read_whole_page(file->f_mapping->host)
check was added there. As you pointed out, it shouldn't try to initiate
a read when there's an outstanding write.
So I'll update the patch with these changes, including the check for
ongoing I/O, and post new test results in a couple of days.
--
kazuo ito (ito_kazuo_g3@iecl.ntt.co.jp)
NTT OSS Center


Thread overview: 5+ messages
2019-02-07  8:12 [PATCH] pNFS: Avoid read-modify-write for page-aligned full page write Kazuo Ito
2019-02-07 13:37 ` Benjamin Coddington
2019-02-08  7:54   ` 伊藤和夫
2019-02-08 14:58     ` Trond Myklebust
2019-02-12  4:34       ` Kazuo Ito [this message]
