linux-nfs.vger.kernel.org archive mirror
From: Benny Halevy <bhalevy@panasas.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Daniel.Muntz@emc.com, andros@netapp.com, sjoshi@bluearc.com,
	linux-nfs@vger.kernel.org, NFSv4 <nfsv4@ietf.org>
Subject: Re: 4.1 client - LAYOUTCOMMIT & close
Date: Wed, 07 Jul 2010 15:05:20 +0300
Message-ID: <4C346D80.8010405@panasas.com>
In-Reply-To: <1278448834.16176.5.camel@heimdal.trondhjem.org>

On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote: 
>> The COMMIT to the DS, ttbomk, commits data on the DS.  I see it as
>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
>> point, so even if the non-clustered server does not want to update
>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
>> execute whatever synchronization mechanism the implementer wishes to put
>> in the control protocol.
> 
> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> pNFS servers to break the rule that any visible change to the data must
> be atomically accompanied with a change attribute update.
> 

Trond, I'm not sure where the rule you mention is specified.

See sections 12.5.4 and, in particular, 12.5.4.1 ("LAYOUTCOMMIT and
change/time_modify"):

   For some layout protocols, the storage device is able to notify the
   metadata server of the occurrence of an I/O; as a result, the change
   and time_modify attributes may be updated at the metadata server.
   For a metadata server that is capable of monitoring updates to the
   change and time_modify attributes, LAYOUTCOMMIT processing is not
   required to update the change attribute.  In this case, the metadata
   server must ensure that no further update to the data has occurred
   since the last update of the attributes; file-based protocols may
   have enough information to make this determination or may update the
   change attribute upon each file modification.  This also applies for
   the time_modify attribute.  If the server implementation is able to
   determine that the file has not been modified since the last
   time_modify update, the server need not update time_modify at
   LAYOUTCOMMIT.  At LAYOUTCOMMIT completion, the updated attributes
   should be visible if that file was modified since the latest previous
   LAYOUTCOMMIT or LAYOUTGET.
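To make the two cases in this passage concrete, here is a minimal Python model of the metadata server side. It is a sketch, not code from any server: the class MdsAttrs, its methods, and the integer logical clock standing in for time_modify are all hypothetical. An MDS that receives per-I/O notifications keeps change/time_modify current and can treat LAYOUTCOMMIT as a no-op for those attributes; an MDS that cannot observe data-server writes has to bump them at LAYOUTCOMMIT.

```python
from dataclasses import dataclass


@dataclass
class MdsAttrs:
    """Hypothetical per-file attribute state on a pNFS metadata server."""
    change: int = 0
    time_modify: int = 0          # logical clock, standing in for a timestamp
    monitors_ds_io: bool = False  # storage devices report each I/O to the MDS

    def ds_io_notification(self, now: int) -> None:
        """A storage device reports a write (only some layout types allow this)."""
        if not self.monitors_ds_io:
            raise RuntimeError("this MDS does not receive I/O notifications")
        # Attributes are updated as the data changes, so they never lag.
        self.change += 1
        self.time_modify = now

    def layoutcommit(self, now: int) -> bool:
        """Process LAYOUTCOMMIT; return True if change/time_modify were bumped."""
        if self.monitors_ds_io:
            # Every write is already reflected; no update is needed here.
            return False
        # The MDS cannot rule out a modification since its last attribute
        # update, so it makes the change visible now.
        self.change += 1
        self.time_modify = now
        return True


monitoring = MdsAttrs(monitors_ds_io=True)
monitoring.ds_io_notification(now=1)
committed = monitoring.layoutcommit(now=2)  # False: attributes already current

plain = MdsAttrs()
bumped = plain.layoutcommit(now=3)          # True: change 0 -> 1, time_modify 3
```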

> As I see it, if your server allows one client to read data that may have
> been modified by another client that holds a WRITE layout for that range
> then (since that is a visible data change) it should provide a change
> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> sent.

The requirement for the server in WRITE's implementation section
is quite weak: "It is assumed that the act of writing data to a file will
cause the time_modified and change attributes of the file to be updated."

The difference here is that for pNFS the written data is not guaranteed
to be visible until LAYOUTCOMMIT.  In a broader sense, assuming the clients
cache dirty data in a write-behind cache, application-written data may be
visible to other processes on the same host but not to other hosts until
fsync() or close(); close-to-open semantics are the only thing the client
guarantees, right?  Issuing LAYOUTCOMMIT on fsync() and close() ensures the
data is committed to stable storage and is visible to all other clients in
the cluster.
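A minimal Python sketch of the client-side policy described above: write-behind to the data servers, then COMMIT and LAYOUTCOMMIT on fsync() and close(). The PnfsFile class and the operation strings are hypothetical; the model captures the ordering only and is not the Linux client's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PnfsFile:
    """Hypothetical model of one open file on a pNFS client."""
    ops: List[str] = field(default_factory=list)  # RPCs "sent", in order
    uncommitted: bool = False                     # UNSTABLE data on the DSes
    last_write_offset: Optional[int] = None       # since the last LAYOUTCOMMIT

    def write(self, offset: int, length: int) -> None:
        if length <= 0:
            return
        # Write-behind: the data goes straight to a data server, unstable.
        self.ops.append(f"WRITE(ds, off={offset}, len={length}, UNSTABLE)")
        self.uncommitted = True
        end = offset + length - 1
        if self.last_write_offset is None or end > self.last_write_offset:
            self.last_write_offset = end

    def fsync(self) -> None:
        # Make the data stable on the DSes first...
        if self.uncommitted:
            self.ops.append("COMMIT(ds)")
            self.uncommitted = False
        # ...then tell the MDS, so other clients can see the new data.
        if self.last_write_offset is not None:
            self.ops.append(
                f"LAYOUTCOMMIT(mds, last_write_offset={self.last_write_offset})")
            self.last_write_offset = None

    def close(self) -> None:
        # Close-to-open: everything written must be visible before CLOSE.
        self.fsync()
        self.ops.append("CLOSE(mds)")


f = PnfsFile()
f.write(0, 4096)
f.write(4096, 4096)
f.close()
# f.ops ends with COMMIT(ds), LAYOUTCOMMIT(..., last_write_offset=8191), CLOSE(mds)
```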

Benny

> If your MDS is incapable of determining whether or not data has changed
> on the DSes, then it should probably recall the WRITE layout if someone
> tries to read data that may have been modified. Said server also needs a
> strategy for determining if a data change occurred if the client that
> held the WRITE layout died before it could send the LAYOUTCOMMIT.
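The two server strategies described in the paragraph above can be sketched as follows, assuming an MDS with no visibility into data-server writes. NaiveMds, its methods, and the policy of bumping the change attribute when a layout holder's lease expires are illustrative assumptions; the recorded recalls stand in for CB_LAYOUTRECALL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def overlaps(a: Range, b: Range) -> bool:
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class NaiveMds:
    """Hypothetical MDS that cannot see writes performed on the data servers."""
    change: int = 0
    write_layouts: Dict[str, List[Range]] = field(default_factory=dict)
    recalls_sent: List[Tuple[str, Range]] = field(default_factory=list)

    def grant_write_layout(self, client: str, rng: Range) -> None:
        self.write_layouts.setdefault(client, []).append(rng)

    def before_read(self, reader: str, rng: Range) -> List[str]:
        """Recall other clients' WRITE layouts that overlap the read range.

        Returns the clients that were sent a recall.
        """
        recalled = []
        for holder, ranges in self.write_layouts.items():
            if holder == reader:
                continue
            for held in ranges:
                if overlaps(held, rng):
                    self.recalls_sent.append((holder, held))
                    recalled.append(holder)
                    break
        return recalled

    def on_lease_expired(self, client: str) -> None:
        """Holder died, possibly before LAYOUTCOMMIT: assume data changed."""
        if self.write_layouts.pop(client, None):
            self.change += 1


mds = NaiveMds()
mds.grant_write_layout("client-A", (0, 1 << 20))
hit = mds.before_read("client-B", (4096, 8192))         # ["client-A"]
miss = mds.before_read("client-B", (2 << 20, 3 << 20))  # []
mds.on_lease_expired("client-A")                        # change becomes 1
```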
> 
> Cheers
>    Trond
> 
>> -Dan
>>
>>> -----Original Message-----
>>> From: Andy Adamson [mailto:andros@netapp.com] 
>>> Sent: Tuesday, July 06, 2010 6:38 AM
>>> To: Muntz, Daniel
>>> Cc: sjoshi@bluearc.com; linux-nfs@vger.kernel.org; bhalevy@panasas.com
>>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>>>
>>>
>>> On Jul 2, 2010, at 5:46 PM, <Daniel.Muntz@emc.com> wrote:
>>>
>>>> By "extremely lame server" I assume you mean any pNFS server that
>>>> doesn't have a cluster FS on the back end.
>>>
>>> No, I mean a pNFS file layout type server that depends upon the 'hint'
>>> of file size given by LAYOUTCOMMIT. This does not mean that the file
>>> system has to be a cluster FS.
>>>
>>> If COMMIT through MDS is set, the MDS to DS protocol (be it a cluster
>>> FS or not) ensures the data is "committed" on the DSs. LAYOUTCOMMIT is
>>> not needed.
>>>
>>> If COMMITs are sent to the DSs (or FILE_SYNC writes are used), then the
>>> MDS to DS protocol (be it a cluster FS or not) should kick off a
>>> back-end DS to MDS communication to update the file size on the MDS.
>>>
>>> What I consider an 'extremely lame pNFS file layout server' is one
>>> that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
>>> to communicate the committed data size to the MDS.
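This distinction hinges on the NFL4_UFLG_COMMIT_THRU_MDS bit of the file layout's nfl_util field (RFC 5661). The helper below is a hypothetical sketch of which operations a client might send in each case; the function name and the layoutcommit_after_ds_commit parameter are assumptions, the latter used to contrast the position above with the view that the client should always send LAYOUTCOMMIT.

```python
from typing import List

# File layout nfl_util flag defined in RFC 5661.
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002


def commit_plan(nfl_util: int, wrote_file_sync: bool,
                layoutcommit_after_ds_commit: bool) -> List[str]:
    """Return the operations a client might send to make DS writes durable.

    layoutcommit_after_ds_commit=False models "the back end updates the MDS";
    True models "the client always sends LAYOUTCOMMIT" (hypothetical knob).
    """
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        # COMMIT goes to the MDS, whose back-end protocol commits the DSes.
        return [] if wrote_file_sync else ["COMMIT(mds)"]
    # FILE_SYNC writes are already stable; UNSTABLE ones need a DS COMMIT.
    ops = [] if wrote_file_sync else ["COMMIT(ds)"]
    if layoutcommit_after_ds_commit:
        ops.append("LAYOUTCOMMIT(mds)")
    return ops


thru_mds = commit_plan(NFL4_UFLG_COMMIT_THRU_MDS, wrote_file_sync=False,
                       layoutcommit_after_ds_commit=True)  # ["COMMIT(mds)"]
to_ds = commit_plan(0, wrote_file_sync=False,
                    layoutcommit_after_ds_commit=True)  # ["COMMIT(ds)", "LAYOUTCOMMIT(mds)"]
```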
>>>
>>> -->Andy
>>>
>>>
>>>> So while this might work well for NetApp (as long as NetApp never
>>>> ships a non-clustered pNFS), it might break others, or at least
>>>> severely impact their performance. For example, will the Solaris pNFS
>>>> server work correctly without LAYOUTCOMMIT? IMHO, the client MUST
>>>> issue the appropriate LAYOUTCOMMIT, but the server is free to handle
>>>> it as a no-op if the server implementation does not need to utilize
>>>> the payload.
>>>>
>>>>  -Dan
>>>>
>>>>> -----Original Message-----
>>>>> From: linux-nfs-owner@vger.kernel.org
>>>>> [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Andy Adamson
>>>>> Sent: Friday, July 02, 2010 8:41 AM
>>>>> To: Sandeep Joshi
>>>>> Cc: linux-nfs@vger.kernel.org; bhalevy@panasas.com
>>>>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>>>>>
>>>>>
>>>>> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
>>>>>
>>>>> Hi Sandeep
>>>>>
>>>>>>
>>>>>> In certain cases, I don't see layoutcommit on a file at all even
>>>>>> after doing many writes.
>>>>>
>>>>> FYI:
>>>>>
>>>>> You should not be paying attention to layoutcommits: they have no
>>>>> value for the file layout type.
>>>>>
>>>>> From RFC 5661:
>>>>>
>>>>> "The LAYOUTCOMMIT operation commits changes in the layout represented
>>>>> by the current filehandle, client ID (derived from the session ID in
>>>>> the preceding SEQUENCE operation), byte-range, and stateid."
>>>>>
>>>>> For the block layout type, this sentence has meaning in that there is
>>>>> a layoutupdate4 payload that enumerates the blocks that have changed
>>>>> state from being 'handed out' to being 'written'.
>>>>>
>>>>> The file layout type has no layoutupdate4 payload, and the layout does
>>>>> not change due to writes, and thus the LAYOUTCOMMIT call is useless.
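To illustrate the block layout case, here is a simplified Python model of extents moving from 'handed out' to 'written' through a LAYOUTCOMMIT payload. The state names echo the block layout's PNFS_BLOCK_INVALID_DATA and PNFS_BLOCK_READ_WRITE_DATA extent states (RFC 5663), but the Extent class, the helper functions, and matching extents by offset are simplifying assumptions. A file layout LAYOUTCOMMIT carries no such list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class ExtentState(Enum):
    # Simplified subset of the block layout extent states.
    INVALID_DATA = "handed out"  # allocated to the client, contents not valid
    READ_WRITE_DATA = "written"  # contents valid, readable by anyone


@dataclass
class Extent:
    offset: int  # file offset in bytes
    length: int
    state: ExtentState


def build_layoutupdate(extents: List[Extent], written: Set[int]) -> List[Extent]:
    """Client: report handed-out extents (keyed by offset) it has written."""
    return [e for e in extents
            if e.state is ExtentState.INVALID_DATA and e.offset in written]


def apply_layoutupdate(server: Dict[int, Extent], update: List[Extent]) -> None:
    """MDS: extents listed in the LAYOUTCOMMIT payload now hold valid data."""
    for e in update:
        server[e.offset].state = ExtentState.READ_WRITE_DATA


server = {0: Extent(0, 4096, ExtentState.INVALID_DATA),
          4096: Extent(4096, 4096, ExtentState.INVALID_DATA)}
update = build_layoutupdate(list(server.values()), written={0})
apply_layoutupdate(server, update)
# server[0] is now READ_WRITE_DATA; the unwritten extent stays INVALID_DATA
```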
>>>>>
>>>>> The only field in the LAYOUTCOMMIT4args that might possibly be useful
>>>>> is the loca_last_write_offset, which tells the server what the client
>>>>> thinks is the EOF of the file after WRITE. It is an extremely lame
>>>>> server (file layout type server) that depends upon clients for this
>>>>> info.
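A sketch of how loca_last_write_offset could be computed by a client and consumed by a server that does rely on it. It assumes the server grows, but never shrinks, the file to last_write_offset + 1, and it models newoffset4 with no_newoffset = FALSE as None; both function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def last_write_offset(writes: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Client: highest byte offset written, or None if nothing was written.

    None stands for newoffset4 with no_newoffset = FALSE.
    """
    ends = [off + length - 1 for off, length in writes if length > 0]
    return max(ends) if ends else None


def size_after_layoutcommit(current_size: int, reported: Optional[int]) -> int:
    """MDS: grow, never shrink, the file to cover the reported offset."""
    if reported is None:
        return current_size
    return max(current_size, reported + 1)


off = last_write_offset([(0, 4096), (8192, 100)])  # 8291
grown = size_after_layoutcommit(4096, off)         # 8292
unchanged = size_after_layoutcommit(10000, off)    # 10000
```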
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Client side operations:
>>>>>>
>>>>>> open
>>>>>> write(s)
>>>>>> close
>>>>>>
>>>>>>
>>>>>> On server side (observed operations):
>>>>>>
>>>>>> open
>>>>>> layoutget's
>>>>>> close
>>>>>>
>>>>>>
>>>>>> But I do not see a layoutcommit at all. In terms of data written by
>>>>>> the client, it is about 4-5MB.
>>>>>>
>>>>>> When does the client issue a layoutcommit?
>>>>>
>>>>> The latest Linux client sends a layoutcommit when the VFS makes a
>>>>> super_operations.write_inode call, which happens when the metadata of
>>>>> an inode needs updating. We are seriously considering removing the
>>>>> layoutcommit call from the file layout client.
>>>>>
>>>>> -->Andy
>>>>>
>>>>>>
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Sandeep
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
