From: Benny Halevy <bhalevy@panasas.com>
To: Fred Isaman <iisaman@netapp.com>
Cc: linux-nfs@vger.kernel.org, Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: Re: [PATCH 08/12] NFSv4.1: add generic layer hooks for pnfs COMMIT
Date: Wed, 23 Mar 2011 22:39:04 +0200 [thread overview]
Message-ID: <4D8A5A68.7060302@panasas.com> (raw)
In-Reply-To: <AANLkTi=_9nmyg4j1LqC61QBAGD5q0y0ndBMgzNLN_0Py@mail.gmail.com>
On 2011-03-23 22:30, Fred Isaman wrote:
> 2011/3/23 Benny Halevy <bhalevy@panasas.com>:
>> On 2011-03-23 15:27, Fred Isaman wrote:
>>>
>>> We create three major hooks for the pnfs code.
>>>
>>> pnfs_mark_request_commit() is called during writeback_done from
>>> nfs_mark_request_commit, which gives the driver an opportunity to
>>> claim it wants control over commiting a particular req.
>>>
>>> pnfs_choose_commit_list() is called from nfs_scan_list
>>> to choose which list a given req should be added to, based on
>>> where we intend to send it for COMMIT. It is up to the driver
>>> to have preallocated list headers for each destination it may need.
>>>
>>> pnfs_commit_list() is how the driver actually takes control, it is
>>> used instead of nfs_commit_list().
>>>
>>> In order to pass information between the above functions, we create
>>> a union in nfs_page to hold a lseg (which is possible because the req is
>>> not on any list while in transition), and add some flags to indicate
>>> if we need to use the pnfs code.
>>>
>>> Signed-off-by: Fred Isaman<iisaman@netapp.com>
>>> ---
>>> fs/nfs/pagelist.c | 5 ++-
>>> fs/nfs/pnfs.h | 73
>>> ++++++++++++++++++++++++++++++++++++++++++++++
>>> fs/nfs/write.c | 41 ++++++++++++++++---------
>>> include/linux/nfs_fs.h | 1 +
>>> include/linux/nfs_page.h | 6 +++-
>>> 5 files changed, 108 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>> index fd85618..87a593c 100644
>>> --- a/fs/nfs/pagelist.c
>>> +++ b/fs/nfs/pagelist.c
>>> @@ -398,6 +398,7 @@ int nfs_scan_list(struct nfs_inode *nfsi,
>>> pgoff_t idx_end;
>>> int found, i;
>>> int res;
>>> + struct list_head *list;
>>>
>>> res = 0;
>>> if (npages == 0)
>>> @@ -418,10 +419,10 @@ int nfs_scan_list(struct nfs_inode *nfsi,
>>> idx_start = req->wb_index + 1;
>>> if (nfs_set_page_tag_locked(req)) {
>>> kref_get(&req->wb_kref);
>>> - nfs_list_remove_request(req);
>>> radix_tree_tag_clear(&nfsi->nfs_page_tree,
>>> req->wb_index, tag);
>>> - nfs_list_add_request(req, dst);
>>> + list = pnfs_choose_commit_list(req, dst);
>>> + nfs_list_add_request(req, list);
>>> res++;
>>> if (res == INT_MAX)
>>> goto out;
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index 6380b94..5370f1b 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -74,6 +74,13 @@ struct pnfs_layoutdriver_type {
>>> /* test for nfs page cache coalescing */
>>> int (*pg_test)(struct nfs_pageio_descriptor *, struct nfs_page *,
>>> struct nfs_page *);
>>>
>>> + /* Returns true if layoutdriver wants to divert this request to
>>> + * driver's commit routine.
>>> + */
>>> + bool (*mark_pnfs_commit)(struct pnfs_layout_segment *lseg);
>>> + struct list_head * (*choose_commit_list) (struct nfs_page *req);
>>> + int (*commit_pagelist)(struct inode *inode, struct list_head
>>> *mds_pages, int how);
>>> +
>>> /*
>>> * Return PNFS_ATTEMPTED to indicate the layout code has attempted
>>> * I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
>>> @@ -169,6 +176,51 @@ static inline int pnfs_enabled_sb(struct nfs_server
>>> *nfss)
>>> return nfss->pnfs_curr_ld != NULL;
>>> }
>>>
>>> +static inline void
>>> +pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment
>>> *lseg)
>>> +{
>>> + if (lseg) {
>>> + struct pnfs_layoutdriver_type *ld;
>>> +
>>> + ld =
>>> NFS_SERVER(req->wb_page->mapping->host)->pnfs_curr_ld;
>>> + if (ld->mark_pnfs_commit&& ld->mark_pnfs_commit(lseg)) {
>>
>> nit: space before and after "&&"
>>
>
> I'm ignoring the formatting nits due to mailer mishandling, per your
> subsequent email.
>
>>> + set_bit(PG_PNFS_COMMIT,&req->wb_flags);
>>
>> nit: space after comma
>>
>>> + req->wb_commit_lseg = get_lseg(lseg);
>>
>> What are the atomicity requirements of the PG_PNFS_COMMIT bit in wb_flags
>> and the validity of wb_commit_lseg?
>>
>
> It is handled under the PG_BUSY bit_lock.
>
OK
>>> + }
>>> + }
>>> +}
>>> +
>>> +static inline int
>>> +pnfs_commit_list(struct inode *inode, struct list_head *mds_pages, int
>>> how)
>>> +{
>>> + if (!test_and_clear_bit(NFS_INO_PNFS_COMMIT,&NFS_I(inode)->flags))
>>> + return PNFS_NOT_ATTEMPTED;
>>> + return NFS_SERVER(inode)->pnfs_curr_ld->commit_pagelist(inode,
>>> mds_pages, how);
>>
>> Again, I don't get the concurrency control here...
>> NFS_INO_PNFS_COMMIT is set in pnfs_choose_commit_list under i_lock
>> but then pnfs_commit_list is called outside the i_lock so what guarantees
>> that NFS_INO_PNFS_COMMIT is still valid with respect to the output of
>> pnfs_choose_commit_list?
>>
>
>
> They are both done under the NFS_INO_COMMIT bit lock.
>
>>> +}
>>> +
>>> +static inline struct list_head *
>>> +pnfs_choose_commit_list(struct nfs_page *req, struct list_head *mds)
>>> +{
>>> + struct list_head *rv;
>>> +
>>> + if (test_and_clear_bit(PG_PNFS_COMMIT,&req->wb_flags)) {
>>
>> nit: space after comma
>>
>>> + struct inode *inode =
>>> req->wb_commit_lseg->pls_layout->plh_inode;
>>> +
>>> + set_bit(NFS_INO_PNFS_COMMIT,&NFS_I(inode)->flags);
>>
>> nit: ditto
>>
>>> + rv =
>>> NFS_SERVER(inode)->pnfs_curr_ld->choose_commit_list(req);
>>> + /* matched by ref taken when PG_PNFS_COMMIT is set */
>>> + put_lseg(req->wb_commit_lseg);
>>
>> Since wb_commit_lseg and wb_list are in a union, how about
>> INIT_LIST_HEAD(&req->wb_list);
>
> I actually had that there. Trond pointed out it was unnecessary and
> asked that it be removed.
>
Seems fragile to me. I hope Trond's around to fix it when it breaks ;-)
>>
>>> + } else
>>> + rv = mds;
>>> + return rv;
>>> +}
>>> +
>>> +static inline void pnfs_clear_request_commit(struct nfs_page *req)
>>> +{
>>> + if (test_and_clear_bit(PG_PNFS_COMMIT,&req->wb_flags))
>>> + put_lseg(req->wb_commit_lseg);
>>
>> ditto
>
> I see your ditto, and raise you one :)
>
:)
>>
>> Benny
>>
>>> +}
>>> +
>>> #else /* CONFIG_NFS_V4_1 */
>>>
>>> static inline void pnfs_destroy_all_layouts(struct nfs_client *clp)
>>> @@ -252,6 +304,27 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor
>>> *pgio, struct inode *ino)
>>> pgio->pg_test = NULL;
>>> }
>>>
>>> +static inline void
>>> +pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment
>>> *lseg)
>>> +{
>>> +}
>>> +
>>> +static inline int
>>> +pnfs_commit_list(struct inode *inode, struct list_head *mds_pages, int
>>> how)
>>> +{
>>> + return PNFS_NOT_ATTEMPTED;
>>> +}
>>> +
>>> +static inline struct list_head *
>>> +pnfs_choose_commit_list(struct nfs_page *req, struct list_head *mds)
>>> +{
>>> + return mds;
>>> +}
>>> +
>>> +static inline void pnfs_clear_request_commit(struct nfs_page *req)
>>> +{
>>> +}
>>> +
>>> #endif /* CONFIG_NFS_V4_1 */
>>>
>>> #endif /* FS_NFS_PNFS_H */
>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>> index f5f005e..6927a18 100644
>>> --- a/fs/nfs/write.c
>>> +++ b/fs/nfs/write.c
>>> @@ -441,7 +441,7 @@ nfs_mark_request_dirty(struct nfs_page *req)
>>> * Add a request to the inode's commit list.
>>> */
>>> static void
>>> -nfs_mark_request_commit(struct nfs_page *req)
>>> +nfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment
>>> *lseg)
>>> {
>>> struct inode *inode = req->wb_context->path.dentry->d_inode;
>>> struct nfs_inode *nfsi = NFS_I(inode);
>>> @@ -453,6 +453,7 @@ nfs_mark_request_commit(struct nfs_page *req)
>>> NFS_PAGE_TAG_COMMIT);
>>> nfsi->ncommit++;
>>> spin_unlock(&inode->i_lock);
>>> + pnfs_mark_request_commit(req, lseg);
>>
>> Why do this out of the i_lock?
>>
>
> Both the req and lseg are held in memory by references, and any
> serialization needs are met by the req's PG_BUSY bit lock.
>
> Fred
>
>>> inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
>>> inc_bdi_stat(req->wb_page->mapping->backing_dev_info,
>>> BDI_RECLAIMABLE);
>>> __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
>>> @@ -481,10 +482,11 @@ int nfs_write_need_commit(struct nfs_write_data
>>> *data)
>>> }
>>>
>>> static inline
>>> -int nfs_reschedule_unstable_write(struct nfs_page *req)
>>> +int nfs_reschedule_unstable_write(struct nfs_page *req,
>>> + struct nfs_write_data *data)
>>> {
>>> if (test_and_clear_bit(PG_NEED_COMMIT,&req->wb_flags)) {
>>> - nfs_mark_request_commit(req);
>>> + nfs_mark_request_commit(req, data->lseg);
>>> return 1;
>>> }
>>> if (test_and_clear_bit(PG_NEED_RESCHED,&req->wb_flags)) {
>>> @@ -495,7 +497,7 @@ int nfs_reschedule_unstable_write(struct nfs_page
>>> *req)
>>> }
>>> #else
>>> static inline void
>>> -nfs_mark_request_commit(struct nfs_page *req)
>>> +nfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment
>>> *lseg)
>>> {
>>> }
>>>
>>> @@ -512,7 +514,8 @@ int nfs_write_need_commit(struct nfs_write_data *data)
>>> }
>>>
>>> static inline
>>> -int nfs_reschedule_unstable_write(struct nfs_page *req)
>>> +int nfs_reschedule_unstable_write(struct nfs_page *req,
>>> + struct nfs_write_data *data)
>>> {
>>> return 0;
>>> }
>>> @@ -615,9 +618,11 @@ static struct nfs_page
>>> *nfs_try_to_update_request(struct inode *inode,
>>> }
>>>
>>> if (nfs_clear_request_commit(req)&&
>>> - radix_tree_tag_clear(&NFS_I(inode)->nfs_page_tree,
>>> - req->wb_index, NFS_PAGE_TAG_COMMIT) !=
>>> NULL)
>>> + radix_tree_tag_clear(&NFS_I(inode)->nfs_page_tree,
>>> + req->wb_index, NFS_PAGE_TAG_COMMIT) !=
>>> NULL) {
>>> NFS_I(inode)->ncommit--;
>>> + pnfs_clear_request_commit(req);
>>> + }
>>>
>>> /* Okay, the request matches. Update the region */
>>> if (offset< req->wb_offset) {
>>> @@ -765,11 +770,12 @@ int nfs_updatepage(struct file *file, struct page
>>> *page,
>>> return status;
>>> }
>>>
>>> -static void nfs_writepage_release(struct nfs_page *req)
>>> +static void nfs_writepage_release(struct nfs_page *req,
>>> + struct nfs_write_data *data)
>>> {
>>> struct page *page = req->wb_page;
>>>
>>> - if (PageError(req->wb_page) ||
>>> !nfs_reschedule_unstable_write(req))
>>> + if (PageError(req->wb_page) || !nfs_reschedule_unstable_write(req,
>>> data))
>>> nfs_inode_remove_request(req);
>>> nfs_clear_page_tag_locked(req);
>>> nfs_end_page_writeback(page);
>>> @@ -1087,7 +1093,7 @@ static void nfs_writeback_release_partial(void
>>> *calldata)
>>>
>>> out:
>>> if (atomic_dec_and_test(&req->wb_complete))
>>> - nfs_writepage_release(req);
>>> + nfs_writepage_release(req, data);
>>> nfs_writedata_release(calldata);
>>> }
>>>
>>> @@ -1154,7 +1160,7 @@ static void nfs_writeback_release_full(void
>>> *calldata)
>>>
>>> if (nfs_write_need_commit(data)) {
>>> memcpy(&req->wb_verf,&data->verf,
>>> sizeof(req->wb_verf));
>>> - nfs_mark_request_commit(req);
>>> + nfs_mark_request_commit(req, data->lseg);
>>> dprintk(" marked for commit\n");
>>> goto next;
>>> }
>>> @@ -1357,14 +1363,15 @@ static void nfs_init_commit(struct nfs_write_data
>>> *data,
>>> nfs_fattr_init(&data->fattr);
>>> }
>>>
>>> -static void nfs_retry_commit(struct list_head *page_list)
>>> +static void nfs_retry_commit(struct list_head *page_list,
>>> + struct pnfs_layout_segment *lseg)
>>> {
>>> struct nfs_page *req;
>>>
>>> while (!list_empty(page_list)) {
>>> req = nfs_list_entry(page_list->next);
>>> nfs_list_remove_request(req);
>>> - nfs_mark_request_commit(req);
>>> + nfs_mark_request_commit(req, lseg);
>>> dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
>>> dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
>>> BDI_RECLAIMABLE);
>>> @@ -1389,7 +1396,7 @@ nfs_commit_list(struct inode *inode, struct
>>> list_head *head, int how)
>>> nfs_init_commit(data, head);
>>> return nfs_initiate_commit(data, NFS_CLIENT(inode), data->mds_ops,
>>> how);
>>> out_bad:
>>> - nfs_retry_commit(head);
>>> + nfs_retry_commit(head, NULL);
>>> nfs_commit_clear_lock(NFS_I(inode));
>>> return -ENOMEM;
>>> }
>>> @@ -1477,7 +1484,11 @@ int nfs_commit_inode(struct inode *inode, int how)
>>> res = nfs_scan_commit(inode,&head, 0, 0);
>>> spin_unlock(&inode->i_lock);
>>> if (res) {
>>> - int error = nfs_commit_list(inode,&head, how);
>>> + int error;
>>> +
>>> + error = pnfs_commit_list(inode,&head, how);
>>> + if (error == PNFS_NOT_ATTEMPTED)
>>> + error = nfs_commit_list(inode,&head, how);
>>> if (error< 0)
>>> return error;
>>> if (!may_wait)
>>> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
>>> index cb2add4..f64ea14 100644
>>> --- a/include/linux/nfs_fs.h
>>> +++ b/include/linux/nfs_fs.h
>>> @@ -221,6 +221,7 @@ struct nfs_inode {
>>> #define NFS_INO_FSCACHE (5) /* inode can be
>>> cached by FS-Cache */
>>> #define NFS_INO_FSCACHE_LOCK (6) /* FS-Cache cookie
>>> management lock */
>>> #define NFS_INO_COMMIT (7) /* inode is
>>> committing unstable writes */
>>> +#define NFS_INO_PNFS_COMMIT (8) /* use pnfs code for
>>> commit */
>>
>> nit: alignment
>>
>>>
>>> static inline struct nfs_inode *NFS_I(const struct inode *inode)
>>> {
>>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>>> index 92d54c8..8023e4e 100644
>>> --- a/include/linux/nfs_page.h
>>> +++ b/include/linux/nfs_page.h
>>> @@ -33,11 +33,15 @@ enum {
>>> PG_CLEAN,
>>> PG_NEED_COMMIT,
>>> PG_NEED_RESCHED,
>>> + PG_PNFS_COMMIT,
>>> };
>>>
>>> struct nfs_inode;
>>> struct nfs_page {
>>> - struct list_head wb_list; /* Defines state of page:
>>> */
>>> + union {
>>> + struct list_head wb_list; /* Defines state
>>> of page: */
>>> + struct pnfs_layout_segment *wb_commit_lseg; /* Used when
>>> PG_PNFS_COMMIT set */
>>> + };
>>> struct page *wb_page; /* page to read in/write
>>> out */
>>> struct nfs_open_context *wb_context; /* File state context info
>>> */
>>> struct nfs_lock_context *wb_lock_context; /* lock context
>>> info */
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
next prev parent reply other threads:[~2011-03-23 20:38 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-23 13:27 [PATCH 00/12] NFSv4.1 pnfs wave5 submission (try 2) Fred Isaman
2011-03-23 13:27 ` [PATCH 01/12] NFSv4.1: don't send COMMIT to ds for data sync writes Fred Isaman
2011-03-23 13:27 ` [PATCH 02/12] NFSv4.1: rearrange nfs_commit_rpcsetup Fred Isaman
2011-03-23 13:27 ` [PATCH 03/12] NFSv4.1: add callback to nfs4_commit_done Fred Isaman
2011-03-23 13:27 ` [PATCH 04/12] NFSv4.1: pull error handling out of nfs_commit_list Fred Isaman
2011-03-23 13:27 ` [PATCH 05/12] NFSv4.1: pull out code from nfs_commit_release Fred Isaman
2011-03-23 13:27 ` [PATCH 06/12] NFSv4.1: shift filelayout_free_lseg Fred Isaman
2011-03-23 13:27 ` [PATCH 07/12] NFSv4.1: alloc and free commit_buckets Fred Isaman
2011-03-23 13:27 ` [PATCH 08/12] NFSv4.1: add generic layer hooks for pnfs COMMIT Fred Isaman
2011-03-23 19:42 ` Benny Halevy
2011-03-23 19:45 ` Benny Halevy
2011-03-23 20:30 ` Fred Isaman
2011-03-23 20:39 ` Benny Halevy [this message]
2011-03-23 20:52 ` Trond Myklebust
2011-03-23 13:27 ` [PATCH 09/12] NFSv4.1: remove GETATTR from ds commits Fred Isaman
2011-03-23 19:51 ` Benny Halevy
2011-03-23 20:32 ` Fred Isaman
2011-03-23 20:35 ` Trond Myklebust
2011-03-23 13:27 ` [PATCH 10/12] NFSv4.1: filelayout driver specific code for COMMIT Fred Isaman
2011-03-23 13:27 ` [PATCH 11/12] NFSv4.1: layoutcommit Fred Isaman
2011-03-23 20:33 ` Benny Halevy
2011-03-23 21:00 ` Christoph Hellwig
2011-03-23 21:15 ` Trond Myklebust
2011-03-23 21:21 ` Christoph Hellwig
2011-03-23 21:26 ` Trond Myklebust
2011-03-23 13:27 ` [PATCH 12/12] NFSv4.1 remove temp code that prevented ds commits Fred Isaman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D8A5A68.7060302@panasas.com \
--to=bhalevy@panasas.com \
--cc=Trond.Myklebust@netapp.com \
--cc=iisaman@netapp.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).