From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH 1/2] NFS: Clean up nfs_update_request()
Date: Wed, 18 Jun 2008 16:40:14 -0400
Message-ID: <485972AE.9030405@oracle.com>
References: <20080616192348.20513.72423.stgit@localhost.localdomain>	 <20080616192348.20513.3284.stgit@localhost.localdomain>	 <B9BACB96-E0DB-4C6D-B55D-A50CC6F94474@oracle.com> <1213739807.7288.86.camel@localhost>
Reply-To: chuck.lever@oracle.com
Mime-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------070004020300090003060700"
Cc: linux-nfs@vger.kernel.org
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Return-path: <linux-nfs-owner@vger.kernel.org>
Received: from agminet01.oracle.com ([141.146.126.228]:55303 "EHLO
	agminet01.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753350AbYFRUkm (ORCPT
	<rfc822;linux-nfs@vger.kernel.org>); Wed, 18 Jun 2008 16:40:42 -0400
In-Reply-To: <1213739807.7288.86.camel@localhost>
Sender: linux-nfs-owner@vger.kernel.org
List-ID: <linux-nfs.vger.kernel.org>

This is a multi-part message in MIME format.
--------------070004020300090003060700
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Trond Myklebust wrote:
> On Tue, 2008-06-17 at 17:35 -0400, Chuck Lever wrote:
>> Hi Trond-
>>
>> On Jun 16, 2008, at 3:23 PM, Trond Myklebust wrote:
>>> Simplify the loop in nfs_update_request by moving into a separate  
>>> function
>>> the code that attempts to update an existing cached NFS write.
>>>
>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>> ---
>>>
>>> fs/nfs/write.c |  105 +++++++++++++++++++++++++++ 
>>> +----------------------------
>>> 1 files changed, 53 insertions(+), 52 deletions(-)
>>>
>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>> index 8f12157..a675dc9 100644
>>> --- a/fs/nfs/write.c
>>> +++ b/fs/nfs/write.c
>>> @@ -34,9 +34,8 @@
>>> /*
>>>  * Local function declarations
>>>  */
>>> -static struct nfs_page * nfs_update_request(struct nfs_open_context*,
>>> -					    struct page *,
>>> -					    unsigned int, unsigned int);
>>> +static struct nfs_page * nfs_setup_write_request(struct  
>>> nfs_open_context *ctx,
>>> +		struct page *page, unsigned int offset, unsigned int bytes);
>>> static void nfs_pageio_init_write(struct nfs_pageio_descriptor *desc,
>>> 				  struct inode *inode, int ioflags);
>>> static void nfs_redirty_request(struct nfs_page *req);
>>> @@ -178,7 +177,7 @@ static int nfs_writepage_setup(struct  
>>> nfs_open_context *ctx, struct page *page,
>>> 	int ret;
>>>
>>> 	for (;;) {
>>> -		req = nfs_update_request(ctx, page, offset, count);
>>> +		req = nfs_setup_write_request(ctx, page, offset, count);
>>> 		if (!IS_ERR(req))
>>> 			break;
>>> 		ret = PTR_ERR(req);
>>> @@ -566,6 +565,41 @@ static inline int nfs_scan_commit(struct inode  
>>> *inode, struct list_head *dst, pg
>>> }
>>> #endif
>>>
>>> +static int nfs_try_to_update_request(struct nfs_page *req,
>>> +		unsigned int offset,
>>> +		unsigned int end)
>>> +{
>>> +	unsigned int rqend;
>>> +
>>> +	rqend = req->wb_offset + req->wb_bytes;
>>> +
>>> +	/*
>>> +	 * If the page doesn't match, or the offsets
>>> +	 * are non-contiguous, tell the caller to
>>> +	 * wait on the conflicting request.
>>> +	 */
>>> +	if (req->wb_page == NULL
>>> +	    || !nfs_dirty_request(req)
>>> +	    || offset > rqend
>>> +	    || end < req->wb_offset)
>>> +		return -EBUSY;
>> I don't see how this is equivalent to what it replaces in  
>> nfs_update_request:
>>
>>> -	if (req->wb_context != ctx
>>> -	    || req->wb_page != page
>>
>> What happened to the context check, for example?
> 
> That is a deliberate change.
> 
> We hold the page lock across nfs_vm_page_mkwrite and/or
> nfs_write_begin()/nfs_write_end(), so nobody else can add a new nfs_page
> to page->private during that time.
> 
> For that reason, the check that is done in nfs_flush_incompatible()
> together with the check for req->wb_page==NULL (done in
> try_to_update_page() while still holding the inode->i_lock) means that
> the above two checks are redundant.
> 
> ...actually, come to think of it, the check for req->wb_page == NULL is
> redundant too, since nfs_clear_request() is called at the end of
> nfs_inode_remove_request(), and hence nfs_page_find_request_locked()
> would fail to find anything.
> 
>>> +
>>> +	if (!nfs_set_page_tag_locked(req))
>>> +		return -EAGAIN;
>> This -EAGAIN (and the resulting logic in nfs_setup_write_request) is  
>> headache-inducing.  Is there a more straightforward way to break this  
>> up?
> 
> No. If the page is being written out, then we _must_ release all locks,
> wait on the request, and try again. This is not a change compared to the
> previous incarnation.

Right, understood... but I think the logic added here is harder to 
understand and thus more prone to break when something changes than the 
previous incarnation was.

>>> +
>>> +	/* Okay, the request matches. Update the region */
>>> +	if (offset < req->wb_offset) {
>>> +		req->wb_offset = offset;
>>> +		req->wb_pgbase = offset;
>>> +	}
>>> +	if (end > rqend)
>>> +		req->wb_bytes = end - req->wb_offset;
>>> +	else
>>> +		req->wb_bytes = rqend - req->wb_offset;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> /*
>>>  * Try to update any existing write request, or create one if there  
>>> is none.
>>>  * In order to match, the request's credentials must match those of
>>> @@ -573,16 +607,15 @@ static inline int nfs_scan_commit(struct inode  
>>> *inode, struct list_head *dst, pg
>>>  *
>>>  * Note: Should always be called with the Page Lock held!
>>>  */
>>> -static struct nfs_page * nfs_update_request(struct  
>>> nfs_open_context* ctx,
>>> +static struct nfs_page * nfs_setup_write_request(struct  
>>> nfs_open_context* ctx,
>>> 		struct page *page, unsigned int offset, unsigned int bytes)
>>> {
>>> -	struct address_space *mapping = page->mapping;
>>> -	struct inode *inode = mapping->host;
>>> +	struct inode *inode = page->mapping->host;
>>> 	struct nfs_page		*req, *new = NULL;
>>> -	pgoff_t		rqend, end;
>>> +	unsigned int end;
>>> +	int error;
>>>
>>> 	end = offset + bytes;
>>> -
>>> 	for (;;) {
>>> 		/* Loop over all inode entries and see if we find
>>> 		 * A request for the page we wish to update
>>> @@ -590,26 +623,16 @@ static struct nfs_page *  
>>> nfs_update_request(struct nfs_open_context* ctx,
>>> 		spin_lock(&inode->i_lock);
>>> 		req = nfs_page_find_request_locked(page);
>>> 		if (req) {
>>> -			if (!nfs_set_page_tag_locked(req)) {
>>> -				int error;
>>> -
>>> -				spin_unlock(&inode->i_lock);
>>> +			error = nfs_try_to_update_request(req, offset, end);
>>> +			spin_unlock(&inode->i_lock);
>>> +			if (error == 0)
>>> +				break;
>>> +			if (error == -EAGAIN)
>>> 				error = nfs_wait_on_request(req);
>>> -				nfs_release_request(req);
>>> -				if (error < 0) {
>>> -					if (new) {
>>> -						radix_tree_preload_end();
>>> -						nfs_release_request(new);
>>> -					}
>>> -					return ERR_PTR(error);
>>> -				}
>>> +			nfs_release_request(req);
>>> +			if (error == 0)
>>> 				continue;
>>> -			}
>>> -			spin_unlock(&inode->i_lock);
>>> -			if (new) {
>>> -				radix_tree_preload_end();
>>> -				nfs_release_request(new);
>>> -			}
>>> +			req = ERR_PTR(error);
>>> 			break;
>>> 		}
>>>
>>> @@ -632,32 +655,10 @@ static struct nfs_page *  
>>> nfs_update_request(struct nfs_open_context* ctx,
>>> 		}
>>> 	}
>>>
>>> -	/* We have a request for our page.
>>> -	 * If the creds don't match, or the
>>> -	 * page addresses don't match,
>>> -	 * tell the caller to wait on the conflicting
>>> -	 * request.
>>> -	 */
>>> -	rqend = req->wb_offset + req->wb_bytes;
>>> -	if (req->wb_context != ctx
>>> -	    || req->wb_page != page
>>> -	    || !nfs_dirty_request(req)
>>> -	    || offset > rqend || end < req->wb_offset) {
>>> -		nfs_clear_page_tag_locked(req);
>>> -		return ERR_PTR(-EBUSY);
>>> +	if (new) {
>>> +		radix_tree_preload_end();
>>> +		nfs_release_request(new);
>>> 	}
>>> -
>>> -	/* Okay, the request matches. Update the region */
>>> -	if (offset < req->wb_offset) {
>>> -		req->wb_offset = offset;
>>> -		req->wb_pgbase = offset;
>>> -		req->wb_bytes = max(end, rqend) - req->wb_offset;
>>> -		goto out;
>>> -	}
>>> -
>>> -	if (end > rqend)
>>> -		req->wb_bytes = end - req->wb_offset;
>>> -
>>> out:
>> There's only one "goto out;" left in nfs_setup_write_request().  There  
>> are other return sites in nfs_setup_write_request() that simply  
>> "return req;" -- for what purpose is this one goto still here?
> 
> It could be replaced by a single 'return req'.

--------------070004020300090003060700
Content-Type: text/x-vcard; charset=utf-8;
 name="chuck_lever.vcf"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="chuck_lever.vcf"

begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard


--------------070004020300090003060700--