From: Benny Halevy <bhalevy@panasas.com>
To: "William A. (Andy) Adamson" <androsadamson@gmail.com>
Cc: Fred Isaman <iisaman@netapp.com>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
NFS list <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 11/12] NFSv4.1: layoutcommit
Date: Thu, 24 Mar 2011 18:37:03 +0200 [thread overview]
Message-ID: <4D8B732F.8020404@panasas.com> (raw)
In-Reply-To: <AANLkTinBuLxDq5zrxC=-0fS_md5CXTQAS_POsKA9issP@mail.gmail.com>
On 2011-03-24 15:57, William A. (Andy) Adamson wrote:
>>> Only whole file layout support means that there is only one IOMODE_RW layout
>>> segment.
>>>
>>> Signed-off-by: Andy Adamson <andros@netapp.com>
>>> Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
>>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
>>> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
>>> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
>>> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
>>> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
>>> Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
>>> Tested-by: Boaz Harrosh <bharrosh@panasas.com>
>>> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
>>
>> The code in this patch is new and different enough from the one I/we
>> signed-off originally that they don't make sense here.
>
> Hi Benny
>
> OK with me
>
>>>
>>> + /* references matched in nfs4_layoutcommit_release */
>>> + wdata->lseg->pls_lc_cred =
>>> + get_rpccred(wdata->args.context->state->owner->so_cred);
>>> + mark_inode_dirty_sync(wdata->inode);
>>> + dprintk("%s: Set layoutcommit for inode %lu ",
>>> + __func__, wdata->inode->i_ino);
>>> + }
>>> + if (end_pos > wdata->lseg->pls_end_pos)
>>> + wdata->lseg->pls_end_pos = end_pos;
>>
>> The end_pos is essentially per inode, why maintain it per lseg?
>> How do you see this working with multiple lsegs in mind?
>
> The end-pos is per lseg, not per inode - each layoutcommit applies to
> a range of WRITES for a layoutsegment over the LAYOUTCOMMIT range.
>
> From Section 18.42.3
> . The byte-range being committed is
> specified through the byte-range (loca_offset and loca_length). This
> byte-range MUST overlap with one or more existing layouts previously
> granted via LAYOUTGET
>
>
> Also, loca_last_write_offset MUST overlap the range
> described by loca_offset and loca_length.
>
> For the multiple lseg case: if the lsegs are merged, bookeeping
> end_pos per lseg just works. If a layoutdriver does not use merged
> lsegs, then there is a bit of work to do to walk the list of lsegs and
> determine the final end_pos for a given LAYOUTCOMMIT. If there are
> multiple non-contiguous lsegs, each used for WRITEs then multiple
> LAYOUTCOMMITs will need to be sent, otherwise the LAYOUTCOMMIT
> byte-range will not overlap as required.
>
For the current layout types I believe that the LAYOUTCOMMIT can "merge"
multiple layout segments into a single LAYOUTCOMMIT, with a byte range
covering all segments and a last_byte_written offset which is just the maximum.
Future layout types may need this method though...
Benny
>>> +pnfs_layoutcommit_inode(struct inode *inode, int sync)
>>
>> "bool sync" makes more sense
>
>>> +{
>>> + struct nfs4_layoutcommit_data *data;
>>> + struct nfs_inode *nfsi = NFS_I(inode);
>>> + struct pnfs_layout_segment *lseg;
>>> + struct rpc_cred *cred;
>>> + loff_t end_pos;
>>> + int status = 0;
>>> +
>>> + dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
>>> +
>>> + /* Note kzalloc ensures data->res.seq_res.sr_slot == NULL */
>>> + data = kzalloc(sizeof(*data), GFP_NOFS);
>>> + spin_lock(&inode->i_lock);
>>> +
>>> + if (!test_and_clear_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags)) {
>>
>> previously (i.e. in the linux-pnfs tree :) this function is called only
>> if layoutcommit_needed(), now I worry may waste a kzalloc too frequently.
>> I suggest testing (and not clearing) NFS_INO_LAYOUTCOMMIT before doing
>> the allocation to prevent that.
>
> Agreed.
>
>>> + end_pos = lseg->pls_end_pos;
>>> + cred = lseg->pls_lc_cred;
>>> + lseg->pls_end_pos = 0;
>>> + lseg->pls_lc_cred = NULL;
>>> +
>>> + if (!data) {
>>
>> eh?
>> why not test this before test_and_clear_bit(NFS_INO_LAYOUTCOMMIT ?
>
> Because we should clear the LAYOUTCOMMIT needed information from the inode.
> The LAYOUTCOMMIT for the file layout is an optimization. If the client
> can't alloc the required buffer, the compound just won't be sent.
>
> -->Andy
next prev parent reply other threads:[~2011-03-24 16:37 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-24 13:57 [PATCH 11/12] NFSv4.1: layoutcommit William A. (Andy) Adamson
2011-03-24 16:37 ` Benny Halevy [this message]
2011-03-24 16:48 ` Trond Myklebust
2011-03-24 16:54 ` Fred Isaman
2011-03-24 16:58 ` Benny Halevy
2011-03-24 17:15 ` Trond Myklebust
2011-03-25 9:39 ` Benny Halevy
-- strict thread matches above, loose matches on Subject: below --
2011-03-24 14:45 William A. (Andy) Adamson
2011-03-24 17:06 ` Benny Halevy
2011-03-23 13:27 [PATCH 00/12] NFSv4.1 pnfs wave5 submission (try 2) Fred Isaman
2011-03-23 13:27 ` [PATCH 11/12] NFSv4.1: layoutcommit Fred Isaman
2011-03-23 20:33 ` Benny Halevy
2011-03-23 21:00 ` Christoph Hellwig
2011-03-23 21:15 ` Trond Myklebust
2011-03-23 21:21 ` Christoph Hellwig
2011-03-23 21:26 ` Trond Myklebust
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D8B732F.8020404@panasas.com \
--to=bhalevy@panasas.com \
--cc=Trond.Myklebust@netapp.com \
--cc=androsadamson@gmail.com \
--cc=iisaman@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.