From: Chuck Lever <chuck.lever@oracle.com>
To: Wim Colgate <Wim.Colgate@xensource.com>
Cc: nfs@lists.sourceforge.net
Subject: Re: NFS_UNSTABLE vs. FILE and DATA sync.
Date: Mon, 06 Aug 2007 15:42:15 -0400
Message-ID: <46B77997.6000804@oracle.com>
In-Reply-To: <A9AD3C3BCF83FD4182B7D4D99E37FAD836BF1D@exchrdm.ad.xensource.com>
Wim Colgate wrote:
> The linux kernel I was using is 2.6.18-8.
>
> To be fair, I was not trying to force NFS_FILE_SYNC; to make a long
> story short, I started with O_DIRECT (please don't cache data). I moved
> to add O_SYNC (don't return until my data is written safely). And when I
> couldn't explain why I was missing some data (discrepancy between client
> and server), I started investigating what was happening under the hood.
In fact, O_DIRECT also guarantees that the data is on the server's disk
before the write() call returns. In some older versions of the client,
O_SYNC forced the direct I/O engine to use NFS_FILE_SYNC writes for
everything. I don't think that logic is there any more.

But what you describe above is a bug. A network dump would be the next
step to understand the true interaction between the client and the
server during a server reboot.

There were some bugs in the client's direct I/O engine where server
reboot recovery might result in data loss. Trond fixed a couple of bugs
in this area around 2.6.19 or 2.6.20. It would be interesting if you tested
a later kernel, just for behavioral comparison.
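
For reference, a minimal sketch of the access pattern under discussion: one
4096-byte write through a descriptor opened with O_DIRECT and O_SYNC, with
every failure reported to the caller. The helper name
write_block_direct_sync, the BLOCK_SIZE constant, and the zero padding are
assumptions made for illustration; this is not the code from Wim's test.

```c
#define _GNU_SOURCE           /* exposes O_DIRECT on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096       /* write size mentioned in the thread */

/*
 * Hypothetical helper: write one zero-padded block at offset 0.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_block_direct_sync(const char *path, const void *data,
                                   size_t len)
{
    void *buf = NULL;
    ssize_t n;
    int fd, err;

    if (len > BLOCK_SIZE)
        return -EINVAL;

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
    if (fd < 0)
        return -errno;

    /* O_DIRECT generally requires an aligned buffer and transfer size. */
    err = posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE);
    if (err != 0) {
        close(fd);
        return -err;
    }
    memset(buf, 0, BLOCK_SIZE);
    memcpy(buf, data, len);

    n = write(fd, buf, BLOCK_SIZE);
    if (n < 0)
        err = -errno;
    else if (n != BLOCK_SIZE)
        err = -EIO;           /* treat a short write as an error */
    else
        err = 0;

    free(buf);
    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```

A caller would treat any nonzero return as a failed write and either retry
or report it. On a soft mount the failure may surface as EIO once the
client gives up retrying.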
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Monday, August 06, 2007 12:16 PM
> To: Wim Colgate
> Cc: nfs@lists.sourceforge.net
> Subject: Re: [NFS] NFS_UNSTABLE vs. FILE and DATA sync.
>
> Wim Colgate wrote:
>> Specifically I am trying to inject errors by manually (but politely)
>> bringing the NFS server down then up, then down (rinse and repeat ...)
>> while doing IO from a linux client. As mentioned the open file is
>> O_DIRECT and O_SYNC -- which I thought should mean either the data hits
>> the server's storage or I should get an error; and I'm more than happy
>> to deal with an IO error.
>>
>> I'm confident the writes are less than wsize (4096 bytes to be precise).
>>
>> Is there a 100% guaranteed method to get the behavior I thought O_DIRECT
>> and O_SYNC was providing?
>
> What behavior did you expect O_DIRECT + O_SYNC to provide? O_DIRECT
> means "don't cache data" and O_SYNC means "make sure the data is flushed
> to the server's disk before each write() system call returns."
> Technically, you don't need NFS_FILE_SYNC writes to do either of those.
>
> Which kernel are you testing? The client's use of NFS_FILE_SYNC writes
> changed over time.
>
>> -----Original Message-----
>> From: Peter Staubach [mailto:staubach@redhat.com]
>> Sent: Monday, August 06, 2007 10:33 AM
>> To: chuck.lever@oracle.com
>> Cc: Wim Colgate; nfs@lists.sourceforge.net
>> Subject: Re: [NFS] NFS_UNSTABLE vs. FILE and DATA sync.
>>
>> Chuck Lever wrote:
>>> Wim Colgate wrote:
>>>> If I have a soft mount, and open a file with O_DIRECT and O_SYNC,
>>>> should I ever expect a callback (nfs_writeback_done) with a
>>>> successful task->tk_status (i.e. >= 0) with the committed state
>>>> (resp->verf->committed) set to NFS_UNSTABLE?
>>> Yes, this can happen if the server decides to return NFS_UNSTABLE.
>>> Rare, but possible.
>>>
>>>> A secondary question: if the above is expected, does this occur
>>>> because someone is caching the write and is there a mechanism to
>>>> disable this effect?
>>> Servers can return NFS_UNSTABLE to any WRITE request, so I can't think
>>> of a way this might be disabled.
>> Actually, it would be a protocol error for a server to return
>> a commitment level less than was requested by the client. The
>> server can return a greater commitment level, but not less than.
>>
>> ps
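
The rule Peter describes can be written down as a small check. This is a
sketch only: the numeric values follow the stable_how enumeration in the
NFSv3 specification (RFC 1813), while the helper names commitment_is_valid
and needs_commit are hypothetical and do not come from the Linux client.

```c
#include <stdbool.h>

/* Commitment levels for WRITE, ordered from weakest to strongest. */
enum stable_how {
    NFS_UNSTABLE  = 0,   /* server may still hold the data in volatile cache */
    NFS_DATA_SYNC = 1,   /* file data is on stable storage */
    NFS_FILE_SYNC = 2    /* file data and metadata are on stable storage */
};

/*
 * Hypothetical helper: a reply is acceptable only if the server committed
 * at least as strongly as the client asked. A stronger level is allowed.
 */
static bool commitment_is_valid(enum stable_how requested,
                                enum stable_how committed)
{
    return committed >= requested;
}

/*
 * Hypothetical helper: data acknowledged as NFS_UNSTABLE is not safe until
 * a later COMMIT succeeds, so the client must keep it around until then.
 */
static bool needs_commit(enum stable_how committed)
{
    return committed == NFS_UNSTABLE;
}
```

Under this reading, an NFS_UNSTABLE reply to a write that requested
NFS_FILE_SYNC fails the first check, while the same reply to an
NFS_UNSTABLE request is valid and simply requires a follow-up COMMIT.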