From: Chuck Lever <chuck.lever@oracle.com>
To: Wim Colgate <Wim.Colgate@xensource.com>
Cc: nfs@lists.sourceforge.net
Subject: Re: NFS_UNSTABLE vs. FILE and DATA sync.
Date: Mon, 06 Aug 2007 15:42:15 -0400
Message-ID: <46B77997.6000804@oracle.com>
In-Reply-To: <A9AD3C3BCF83FD4182B7D4D99E37FAD836BF1D@exchrdm.ad.xensource.com>

Wim Colgate wrote:
> The linux kernel I was using is 2.6.18-8.
>
> To be fair, I was not trying to force NFS_FILE_SYNC; to make a long
> story short, I started with O_DIRECT (please don't cache data). I moved
> to add O_SYNC (don't return until my data is written safely). And when I
> couldn't explain why I was missing some data (discrepancy between client
> and server), I started investigating what was happening under the hood.

In fact, O_DIRECT also guarantees that the data is on the server's disk
before the write() call returns. In some older versions of the client,
O_SYNC forced the direct I/O engine to use NFS_FILE_SYNC writes for
everything. I don't think that logic is there any more.

But what you describe above is a bug. A network dump would be the next
step to understand the true interaction between the client and the
server during a server reboot.

There were some bugs in the client's direct I/O engine where server
reboot recovery might result in data loss. Trond fixed a couple of bugs
in this area around 2.6.19 or 2.6.20. It would be interesting if you
tested a later kernel, just for behavioral comparison.
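
For reference, the kind of write being discussed can be sketched roughly
as follows. This is only an illustration of opening with O_DIRECT and
O_SYNC and checking every return value. The helper name
write_block_direct_sync and the WRITE_SIZE constant are hypothetical,
not taken from the test program in question, and a successful return
here does not by itself prove what the NFS client put on the wire.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define WRITE_SIZE 4096         /* write size mentioned in this thread */

/*
 * Hypothetical helper: write one WRITE_SIZE block at offset 0 of `path`
 * through a descriptor opened with O_DIRECT | O_SYNC.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_block_direct_sync(const char *path, const void *data, size_t len)
{
    void *buf = NULL;
    ssize_t n;
    int fd, rc;

    assert(path != NULL && data != NULL);
    if (len > WRITE_SIZE)
        return -EINVAL;

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
    if (fd < 0)
        return -errno;

    /* O_DIRECT generally wants an aligned buffer and transfer length. */
    rc = posix_memalign(&buf, WRITE_SIZE, WRITE_SIZE);
    if (rc != 0) {
        close(fd);
        return -rc;
    }
    memset(buf, 0, WRITE_SIZE);
    memcpy(buf, data, len);

    n = write(fd, buf, WRITE_SIZE);
    if (n < 0)
        rc = -errno;            /* e.g. an I/O error on a soft mount */
    else if (n != WRITE_SIZE)
        rc = -EIO;              /* treat a short write as a failure */

    free(buf);
    if (close(fd) < 0 && rc == 0)
        rc = -errno;
    return rc;
}
```

A caller that sees 0 from this helper but later finds the block missing
on the server has the discrepancy described above, which is why a
network capture is the useful next step.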
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Monday, August 06, 2007 12:16 PM
> To: Wim Colgate
> Cc: nfs@lists.sourceforge.net
> Subject: Re: [NFS] NFS_UNSTABLE vs. FILE and DATA sync.
>
> Wim Colgate wrote:
>> Specifically I am trying to inject errors by manually (but politely)
>> bringing the NFS server down then up, then down (rinse and repeat ...)
>> while doing IO from a linux client. As mentioned the open file is
>> O_DIRECT and O_SYNC -- which I thought should mean either the data hits
>> the server's storage or I should get an error; and I'm more than happy
>> to deal with an IO error.
>>
>> I'm confident the writes are less than wsize (4096 bytes to be precise).
>>
>> Is there a 100% guaranteed method to get the behavior I thought O_DIRECT
>> and O_SYNC was providing?
>
> What behavior did you expect O_DIRECT + O_SYNC to provide? O_DIRECT
> means "don't cache data" and O_SYNC means "make sure the data is flushed
> to the server's disk before each write() system call returns."
> Technically, you don't need NFS_FILE_SYNC writes to do either of those.
>
> Which kernel are you testing? The client's use of NFS_FILE_SYNC writes
> changed over time.
>
>> -----Original Message-----
>> From: Peter Staubach [mailto:staubach@redhat.com]
>> Sent: Monday, August 06, 2007 10:33 AM
>> To: chuck.lever@oracle.com
>> Cc: Wim Colgate; nfs@lists.sourceforge.net
>> Subject: Re: [NFS] NFS_UNSTABLE vs. FILE and DATA sync.
>>
>> Chuck Lever wrote:
>>> Wim Colgate wrote:
>>>> If I have a soft mount, and open a file with O_DIRECT and O_SYNC,
>>>> should I ever expect a callback (nfs_writeback_done) with a
>>>> successful task->tk_status (i.e >= 0) with the committed state
>>>> (resp->verf->committed) set to NFS_UNSTABLE?
>>> Yes, this can happen if the server decides to return NFS_UNSTABLE.
>>> Rare, but possible.
>>>
>>>> A secondary question: if the above is expected, does this occur
>>>> because someone is caching the write and is there a mechanism to
>>>> disable this effect?
>>> Servers can return NFS_UNSTABLE to any WRITE request, so I can't think
>>> of a way this might be disabled.
>> Actually, it would be a protocol error for a server to return
>> a commitment level less than was requested by the client. The
>> server can return a greater commitment level, but not less than.
>>
>> ps
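
The rule Peter states above can be written down as a small check. This
is a sketch only: the enum is declared locally with the stable_how
values from the NFSv3 specification (RFC 1813), and the helper names
commitment_is_valid and needs_commit are hypothetical, not functions
from the Linux client.

```c
#include <assert.h>
#include <stdbool.h>

/* stable_how values as defined for NFSv3 WRITE in RFC 1813. */
enum stable_how {
    NFS_UNSTABLE  = 0,
    NFS_DATA_SYNC = 1,
    NFS_FILE_SYNC = 2
};

/* The comparison below relies on the levels being ordered by strength. */
static_assert(NFS_UNSTABLE < NFS_DATA_SYNC && NFS_DATA_SYNC < NFS_FILE_SYNC,
              "commitment levels must be ordered");

/*
 * Hypothetical helper: true if the commitment level in a WRITE reply is
 * acceptable for the level the client requested. The server may commit
 * more strongly than asked, but never less.
 */
bool commitment_is_valid(enum stable_how requested, enum stable_how committed)
{
    return committed >= requested;
}

/*
 * Hypothetical helper: data acknowledged as NFS_UNSTABLE is not yet
 * guaranteed to be on stable storage, so a COMMIT is still required.
 */
bool needs_commit(enum stable_how committed)
{
    return committed == NFS_UNSTABLE;
}
```

With this check, a reply of NFS_UNSTABLE to a request made with
NFS_FILE_SYNC is flagged as a protocol violation, while NFS_FILE_SYNC
in reply to an NFS_UNSTABLE request is accepted.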