From: Benny Halevy <bhalevy@panasas.com>
To: Fred Isaman <iisaman@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>,
Boaz Harrosh <bharrosh@panasas.com>,
trond@netapp.com, linux-nfs@vger.kernel.org
Subject: Re: [PATCH] NFS: filelayout should use nfs_generic_pg_test
Date: Wed, 01 Jun 2011 21:56:25 +0300
Message-ID: <4DE68B59.3030402@panasas.com>
In-Reply-To: <BANLkTi=Bwuuar76phVyEguB1Amqn4Q70Fw@mail.gmail.com>
On 2011-06-01 19:01, Fred Isaman wrote:
> On Wed, Jun 1, 2011 at 10:51 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>> On 2011-06-01 17:44, Weston Andros Adamson wrote:
>>>
>>> On Jun 1, 2011, at 1:47 AM, Boaz Harrosh wrote:
>>>
>>>> On 06/01/2011 06:18 AM, Weston Andros Adamson wrote:
>>>>> Use nfs_generic_pg_test instead of pnfs_generic_pg_test.
>>>>>
>>>>> This fixes the BUG at fs/nfs/write.c:941 introduced by
>>>>> 89a58e32d9105c01022a757fb32ddc3b51bf0025.
>>>>>
>>>>> I was able to trigger this BUG reliably using pynfs in pnfs mode,
>>>>> by using dd(1) to write many small blocks.
>>>>>
>>>>> Signed-off-by: Weston Andros Adamson <dros@netapp.com>
>>>>> ---
>>>>> Fix proposed by Trond.
>>>>>
>>>>> Benny- Does this make sense?
>>>>>
>>>>> fs/nfs/nfs4filelayout.c | 2 +-
>>>>> fs/nfs/pagelist.c | 5 ++++-
>>>>> include/linux/nfs_page.h | 3 ++-
>>>>> 3 files changed, 7 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>>>>> index 4269088..1c3bb72 100644
>>>>> --- a/fs/nfs/nfs4filelayout.c
>>>>> +++ b/fs/nfs/nfs4filelayout.c
>>>>> @@ -661,7 +661,7 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>>>>> u64 p_stripe, r_stripe;
>>>>> u32 stripe_unit;
>>>>>
>>>>> - if (!pnfs_generic_pg_test(pgio, prev, req))
>>>>> + if (!nfs_generic_pg_test(pgio, prev, req))
>>>>> return 0;
>>>>>
>>>>
>>>> pnfs_generic_pg_test is the one that gets the layout.
>>>>
>>>> What you've done is revert to MDS IO
>>>>
>>>> Boaz
>>>
>>> Ah, you're right - I didn't even notice that! I usually confirm client -> DS communication with tcpdump. I was working for too long yesterday :)
>>>
>>> Patch: recalled. Discussion about a real fix: started.
>>>
>>> -dros
>>
>> I think the following should work:
>>
>> Benny
>>
>> git diff --stat -p -M
>> fs/nfs/nfs4filelayout.c | 10 ++++++++++
>> 1 files changed, 10 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
>> index 4269088..9f1d445 100644
>> --- a/fs/nfs/nfs4filelayout.c
>> +++ b/fs/nfs/nfs4filelayout.c
>> @@ -661,6 +661,16 @@ filelayout_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
>> u64 p_stripe, r_stripe;
>> u32 stripe_unit;
>>
>> + /*
>> + * FIXME: ideally we should be able to coalesce all requests
>> + * that are not block boundary aligned, but currently this
>> + * is problematic for the case of bsize < PAGE_CACHE_SIZE,
>> + * since nfs_flush_multi and nfs_pagein_multi assume you
>> + * can have only one struct nfs_page.
>> + */
>> + if (pgio->pg_bsize < PAGE_SIZE)
>> + return 0;
>> +
>> if (!pnfs_generic_pg_test(pgio, prev, req))
>> return 0;
>>
>
> Note this moves a test that was once part of the plain nfs code into
> the file layout driver. Why don't other drivers need this test?
True. Note that I said it would work, not that it's the right fix. :-/
This just tells us which change exposed the issue...

Boaz moved this check to the NFS-only path, assuming that pg_bsize,
which holds the MDS's wsize/rsize, is irrelevant for coalescing requests
when striping over pNFS.

I'm still not convinced that nfs_flush_multi cannot use desc->pg_lseg
if it exists, but at the same time the code does not seem to do the
right thing for pNFS coalescing in nfs_pageio_init_write and
nfs_pageio_do_add_request.

For pNFS, we need to ignore wsize, meaning we first need to try
to coalesce the pages and only then decide whether to go the
nfs_flush_multi or the nfs_flush_one way, based on the coalesced length.

Benny
>
> Fred
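
The "coalesce first, then pick the flush path" idea in the reply above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct toy_desc`, `toy_add_request`, `toy_choose_path` and the `max_io` field are hypothetical names, and the rule used to pick the path (split only when the coalesced length exceeds the per-RPC limit) is one plausible reading of the proposal rather than what the NFS client actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum flush_path { FLUSH_ONE, FLUSH_MULTI };

struct toy_desc {
	size_t count;   /* bytes coalesced so far */
	size_t nreqs;   /* number of page requests coalesced */
	size_t max_io;  /* assumed per-RPC limit of the target server */
};

/*
 * Step 1: try to coalesce. The first request is always accepted, even
 * if it is larger than max_io; later requests are refused once the
 * total would exceed the limit, so the caller has to flush first.
 * Returns 1 if the request was added, 0 otherwise.
 */
static int toy_add_request(struct toy_desc *d, size_t len)
{
	assert(d != NULL && len > 0);
	if (d->nreqs > 0 && d->count + len > d->max_io)
		return 0;
	d->count += len;
	d->nreqs++;
	return 1;
}

/*
 * Step 2: decide only after coalescing, based on the coalesced length.
 * Assumed rule: split into several RPCs (the "multi" path) only when
 * the queued data does not fit into a single RPC.
 */
static enum flush_path toy_choose_path(const struct toy_desc *d)
{
	return d->count > d->max_io ? FLUSH_MULTI : FLUSH_ONE;
}
```

In this model, a lone 4096-byte request against a 1024-byte limit selects `FLUSH_MULTI`, while four 4096-byte requests against a 65536-byte limit coalesce to 16384 bytes and select `FLUSH_ONE`. The point of the sketch is the ordering: the path is chosen from the coalesced length, not fixed up front from the MDS wsize.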