[Ocfs2-devel] [PATCH 1/1] OCFS2: anti stale inode for nfs (V4)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Wengang Wang <wen.gang.wang@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 1/1] OCFS2: anti stale inode for nfs (V4)
Date: Fri, 27 Feb 2009 14:48:14 +0800	[thread overview]
Message-ID: <49A78CAE.2060807@oracle.com> (raw)
In-Reply-To: <20090225165828.GC26476@ca-server1.us.oracle.com>



Joel Becker wrote:
> On Wed, Feb 25, 2009 at 09:57:48PM +0800, Wengang Wang wrote:
>> Joel Becker wrote:
>>> On Wed, Feb 25, 2009 at 11:44:00AM +0800, Wengang Wang wrote:
>>>>>> +	status = ocfs2_test_inode_bit(osb, blkno, &set);
>>>>>> +	if (status < 0) {
>>>>>> +		if (status == -EINVAL) {
>>>>>> +			/* meta block never be re-allocated as data block.
>>>>>> +			 * nfsd gives us wrong blkno */
>>>>>> +			status = -EEXIST;
>>>>>> +		} else {
>>>>>> +			mlog(ML_ERROR, "test inode bit failed %d\n", status);
>>>>>> +		}
>>>>>> +		goto unlock_nfs_sync;
>>>>>> +	}
>>>>> 	This looks very nice, but it should return -ESTALE instead
>>>>> of -EEXIST; 
>>>> why? I think it's an EEXIST.
>>>> if you say nfs don't know how to deal with the EEXIST, I agree to change 
>>>> it.
>>>> it most likely does not exist, and nfs knows how to handle
>>>>> that error.
>>> 	EEXIST means "the file exists", which it clearly does not.
>>> I think you mean ENOENT.  
>> Sorry, yes I meant ENOENT.
>> also, the following EEXIST is wrong too(and also in some existing codes).
>>
>> +	inode_alloc_inode =
>> +		ocfs2_get_system_file_inode(osb, INODE_ALLOC_SYSTEM_INODE,
>> +					    suballoc_slot);
>> +	if (!inode_alloc_inode) {
>> +		/* the error code could be inaccurate, but we are not able to
>> +		 * get the correct one. */
>> +		status = -EEXIST;
>>
>> here, we don't have the accurate reason. I suggest we return PTR_ERR(xxx) 
>> instead of NULL for ocfs2_get_system_file_inode() ( 
>> _ocfs2_get_system_file_inode() actually) and change code where 
>> ocfs2_get_system_file_inode() is called.
> 
> 	Yeah, I noticed that we don't have an error from
> ocfs2_get_system_file_inode().  But we don't have any need to fix that
> right now.  That can be a later patch if we care.
> 

Ok.

>>> But what is really happening here is nfs asks
>>> "I used to have file XXX, is it still there?"  The correct response
>>> seems to me to be "Um, not anymore - it's stale".
>> yes, your case is the "normal" stale case.
>> checking where EINVAL is returned, it's either 
>> (!OCFS2_IS_VALID_DINODE(inode_fe)) or alloc slot out of range.
>> in the two cases, I prefer nfs passed a wrong fh(blkno) instead of a stale 
>> one, that's nfs does a buggy request.
>> well, returning ESTALE is also OK. I just think ENOENT is more accurate.
> 
> 	I agree with you that ENOENT is more accurate, but right now
> we're playing nice with nfsd.  Actually, return -ENOENT from the test
> function, but in ocfs2_get_dentry() change it to -ESTALE for nfsd.
> 
Ok.

>> if my understand is correct:),
>> yes, when there could be a new group added to a chain. and the chain list 
>> changed(pointers). and there could be a chain 
>> relink(ocfs2_relink_block_group()).
>> but I don't understand why it affects the group in question.
>> the group in question is gotten by reading the block which is numbered of 
>> inode blkno minus alloc bit. that doesn't involve any pointer.
>> so, could you tell me more detail?
> 
> 	I agree it probably doesn't affect this direct block group.  At
> least, unless we change the code to do some walking or whatever.  But we
> want this function to be right no matter the behavior.
> 
Ok.

>>> 	Also, it is a correctness issue.  Your way may happen to work
>>> for the stale inode checking, but that's by sheer luck.
>> until I got clear your idea about the groups, I persist :P.
>>
>>>  Putting the
>>> access of the group inside the lock is definitively correct.
>> Yes, agree.
>> I just want a better performance when we are correct.
> 
> 	We will never sacrifice correctness for performance.  You don't
> access data outside of locks unless you have some really, really good
> reason.  This won't hurt any performance - it's a standard operation we
> do all the time: "take lock, do something with group descriptor, drop
> lock".  If it was a performance problem it would be a problem
> everywhere, and we'd solve it in a correct fashion.
> 	Secondly, if some later code decides to use this function, they
> certainly won't expect it to be doing unsafe things.  And they won't be
> inside a very specific nfs code path.
Yeah, as a common function it shouldn't be nfs specific.

thanks,
wengang.

next prev parent reply	other threads:[~2009-02-27  6:48 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-20  9:23 [Ocfs2-devel] [PATCH 1/1] OCFS2: anti stale inode for nfs (V4) wengang wang
2009-02-25  2:08 ` Joel Becker
2009-02-25  3:44   ` Wengang Wang
2009-02-25 11:27     ` Joel Becker
2009-02-25 13:57       ` Wengang Wang
2009-02-25 16:58         ` Joel Becker
2009-02-27  6:48           ` Wengang Wang [this message]
2009-02-25  2:10 ` Joel Becker
2009-02-25  2:36 ` Tao Ma
2009-02-25  3:46   ` Wengang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49A78CAE.2060807@oracle.com \
    --to=wen.gang.wang@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.