linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benny Halevy <bhalevy@panasas.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "William A. (Andy) Adamson" <androsadamson@gmail.com>,
	Fred Isaman <iisaman@netapp.com>,
	NFS list <linux-nfs@vger.kernel.org>,
	David Black <Black_David@emc.com>, "Rees, James" <rees@umich.edu>
Subject: Re: [PATCH 11/12] NFSv4.1: layoutcommit
Date: Fri, 25 Mar 2011 11:39:15 +0200	[thread overview]
Message-ID: <4D8C62C3.6080901@panasas.com> (raw)
In-Reply-To: <1300986948.31106.16.camel@lade.trondhjem.org>

On 2011-03-24 19:15, Trond Myklebust wrote:

> On Thu, 2011-03-24 at 18:58 +0200, Benny Halevy wrote:
>> On 2011-03-24 18:48, Trond Myklebust wrote:
>>> On Thu, 2011-03-24 at 18:37 +0200, Benny Halevy wrote:
>>>> On 2011-03-24 15:57, William A. (Andy) Adamson wrote:
>>>>>>> Only whole file layout support means that there is only one IOMODE_RW layout
>>>>>>> segment.
>>>>>>>
>>>>>>> Signed-off-by: Andy Adamson <andros@netapp.com>
>>>>>>> Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
>>>>>>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
>>>>>>> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
>>>>>>> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
>>>>>>> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
>>>>>>> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
>>>>>>> Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
>>>>>>> Tested-by: Boaz Harrosh <bharrosh@panasas.com>
>>>>>>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>>>>>> The code in this patch is new and different enough from the one I/we
>>>>>> signed-off originally that they don't make sense here.
>>>>> Hi Benny
>>>>>
>>>>> OK with me
>>>>>
>>>>>>> +             /* references matched in nfs4_layoutcommit_release */
>>>>>>> +             wdata->lseg->pls_lc_cred =
>>>>>>> +                     get_rpccred(wdata->args.context->state->owner->so_cred);
>>>>>>> +             mark_inode_dirty_sync(wdata->inode);
>>>>>>> +             dprintk("%s: Set layoutcommit for inode %lu ",
>>>>>>> +                     __func__, wdata->inode->i_ino);
>>>>>>> +     }
>>>>>>> +     if (end_pos > wdata->lseg->pls_end_pos)
>>>>>>> +             wdata->lseg->pls_end_pos = end_pos;
>>>>>> The end_pos is essentially per inode, why maintain it per lseg?
>>>>>> How do you see this working with multiple lsegs in mind?
>>>>> The end-pos is per lseg, not per inode - each layoutcommit applies to
>>>>> a range of WRITES for a layoutsegment over the LAYOUTCOMMIT range.
>>>>>
>>>>> From Section 18.42.3
>>>>> .  The byte-range being committed is
>>>>>    specified through the byte-range (loca_offset and loca_length).  This
>>>>>    byte-range MUST overlap with one or more existing layouts previously
>>>>>    granted via LAYOUTGET
>>>>>
>>>>>
>>>>>    Also, loca_last_write_offset MUST overlap the range
>>>>>    described by loca_offset and loca_length.
>>>>>
>>>>> For the multiple lseg case: if the lsegs are merged, bookeeping
>>>>> end_pos per lseg just works. If a layoutdriver does not use merged
>>>>> lsegs, then there is a bit of work to do to walk the list of lsegs and
>>>>> determine the final end_pos for a given LAYOUTCOMMIT.  If there are
>>>>> multiple non-contiguous lsegs, each used for WRITEs then multiple
>>>>> LAYOUTCOMMITs will need to be sent, otherwise the LAYOUTCOMMIT
>>>>> byte-range will not overlap as required.
>>>>>
>>>> For the current layout types I believe that the LAYOUTCOMMIT can "merge"
>>>> multiple layout segments into a single LAYOUTCOMMIT, with a byte range
>>>> covering all segments and a last_byte_written offset which is just the maximum.
>>>> Future layout types may need this method though...
>>> Is that safe?
>>>
>>> What if I'm doing blocks and have written layout segment 1 & 3, but not
>>> layout segment 2? I don't want to have the MDS commit layout segment 2,
>>> and make the (lack of) data there visible to future readers.
>>>
>> I'm not the real expert on pnfs-blocks but my interpretation of rfc5663 is that the
>> list of extents in pnfs_block_layoutupdate4 may be sparse (or holey if you'd like).
>> Note that the client may have written just parts of the layout it got in one layout segment.
>> In this case too, you don't want to send multiple LAYOUTCOMMITs for each contiguous
>> area...
> Sure, but my understanding was that RFC5663 supports copy on write files
> and that the actual copying of the block may need to be done by the
> client (see section 2.3.4).
>
> If the new block is still uninitialised when you call LAYOUTCOMMIT, then
> that will corrupt the file if the client then dies before it finishes
> processing segment 2, since the previously valid data blocks are being
> replaced by uninitialised ones (which I presume will convert that
> section into a pre-allocated hole???).
>
The extents that LAYOUTCOMMITted MUST NOT be PNFS_BLOCK_INVALID_DATA,
as noted in 2.3.2:

   The bex_state
   field of each extent in the blu_commit_list MUST be set to
   PNFS_BLOCK_READ_WRITE_DATA.

Benny



  reply	other threads:[~2011-03-25  9:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-24 13:57 [PATCH 11/12] NFSv4.1: layoutcommit William A. (Andy) Adamson
2011-03-24 16:37 ` Benny Halevy
2011-03-24 16:48   ` Trond Myklebust
2011-03-24 16:54     ` Fred Isaman
2011-03-24 16:58     ` Benny Halevy
2011-03-24 17:15       ` Trond Myklebust
2011-03-25  9:39         ` Benny Halevy [this message]
  -- strict thread matches above, loose matches on Subject: below --
2011-03-24 14:45 William A. (Andy) Adamson
2011-03-24 17:06 ` Benny Halevy
2011-03-23 13:27 [PATCH 00/12] NFSv4.1 pnfs wave5 submission (try 2) Fred Isaman
2011-03-23 13:27 ` [PATCH 11/12] NFSv4.1: layoutcommit Fred Isaman
2011-03-23 20:33   ` Benny Halevy
2011-03-23 21:00   ` Christoph Hellwig
2011-03-23 21:15     ` Trond Myklebust
2011-03-23 21:21       ` Christoph Hellwig
2011-03-23 21:26         ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D8C62C3.6080901@panasas.com \
    --to=bhalevy@panasas.com \
    --cc=Black_David@emc.com \
    --cc=Trond.Myklebust@netapp.com \
    --cc=androsadamson@gmail.com \
    --cc=iisaman@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=rees@umich.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).