From: Benny Halevy <bhalevy@tonian.com>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Peng Tao <bergwolf@gmail.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>,
Boaz Harrosh <bharrosh@panasas.com>,
"Isaman, Fred" <Fred.Isaman@netapp.com>,
Idan Keidar <idank@tonian.com>, Lev Solomonov <solo@tonian.com>
Subject: Re: [PATCH] NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
Date: Wed, 15 Aug 2012 14:50:11 +0300 [thread overview]
Message-ID: <502B8CF3.6020309@tonian.com> (raw)
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA93A3B3C@SACEXCMBX04-PRD.hq.netapp.com>
On 2012-08-14 17:53, Myklebust, Trond wrote:
> On Tue, 2012-08-14 at 22:30 +0800, Peng Tao wrote:
>> On Tue, Aug 14, 2012 at 9:45 PM, Myklebust, Trond
>> <Trond.Myklebust@netapp.com> wrote:
>>> On Tue, 2012-08-14 at 10:48 +0300, Benny Halevy wrote:
>>>> On 2012-08-09 18:39, Myklebust, Trond wrote:
>>>>> On Thu, 2012-08-09 at 23:01 +0800, Peng Tao wrote:
>>>>>> On Thu, Aug 9, 2012 at 10:36 PM, Myklebust, Trond
>>>>>> <Trond.Myklebust@netapp.com> wrote:
>>>>>>> On Thu, 2012-08-09 at 22:30 +0800, Peng Tao wrote:
>>>>>>>> On Thu, Aug 9, 2012 at 4:21 AM, Trond Myklebust
>>>>>>>> <Trond.Myklebust@netapp.com> wrote:
>>>>>>>>> Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence
>>>>>>>>> disconnected data server) we've been sending layoutreturn calls
>>>>>>>>> while there is potentially still outstanding I/O to the data
>>>>>>>>> servers. The reason we do this is to avoid races between replayed
>>>>>>>>> writes to the MDS and the original writes to the DS.
>>>>>>>>>
>>>>>>>>> When this happens, the BUG_ON() in nfs4_layoutreturn_done can
>>>>>>>>> be triggered because it assumes that we would never call
>>>>>>>>> layoutreturn without knowing that all I/O to the DS is
>>>>>>>>> finished. The fix is to remove the BUG_ON() now that the
>>>>>>>>> assumptions behind the test are obsolete.
>>>>>>>>>
>>>>>>>> Isn't MDS supposed to recall the layout if races are possible between
>>>>>>>> outstanding write-to-DS and write-through-MDS?
>>>>>>>
>>>>>>> Where do you read that in RFC5661?
>>>>>>>
>>>>>> That's my (maybe mis-)understanding of how server works... But looking
>>>>>> at rfc5661 section 18.44.3. layoutreturn implementation.
>>>>>> "
>>>>>> After this call,
>>>>>> the client MUST NOT use the returned layout(s) and the associated
>>>>>> storage protocol to access the file data.
>>>>>> "
>>>>>> And given commit 0a57cdac3f, client is using the layout even after
>>>>>> layoutreturn, which IMHO is a violation of rfc5661.
>>>>>
>>>>> No. It is using the layoutreturn to tell the MDS to fence off I/O to a
>>>>> data server that is not responding. It isn't attempting to use the
>>>>> layout after the layoutreturn: the whole point is that we are attempting
>>>>> write-through-MDS after the attempt to write through the DS timed out.
>>>>>
>>>>
>>>> I hear you, but this use case is valid after a time out / disconnect
>>>> (which will translate to PNFS_OSD_ERR_UNREACHABLE for the objects layout)
>>>> In other cases, I/Os to the DS might obviously be in flight and the BUG_ON
>>>> indicates that.
>>>>
>>>> IMO, the right way to implement that is to initially mark the lsegs invalid
>>>> and increment plh_block_lgets, as we do today in _pnfs_return_layout
>>>> but actually send the layout return only when the last segment is dereferenced.
>>>
>>> This is what we do for object and block layout types, so your
>>> objects-specific objection is unfounded.
>>>
>> object layout is also doing layout return on IO error (commit
>> fe0fe83585f8). And it doesn't take care of draining concurrent
>> in-flight IO. I guess that's why Boaz saw the same BUG_ON.
>
> Yes. I did notice that code when I was looking into this. However that's
> Boaz's own patch, and it _only_ applies to the objects layout type. I
> assumed that he had tested it when I applied it...
>
> One way to fix that would be to keep a count of "outstanding
> read/writes" in the layout, so that when the error occurs, and we want
> to fall back to MDS, we just increment plh_block_lgets, invalidate the
> layout, and then let the outstanding read/writes fall to zero before
> sending the layoutreturn.
Sounds reasonable to me too.
> If the objects layout wants to do that, then I have no objection. As
> I've said multiple times, though, I'm not convinced we want to do that
> for the files layout.
>
I just fear that removing the BUG_ON will prevent us from detecting cases
where a LAYOUTRETURN is sent while there are layout segments in use
in the error free or non-timeout case.
Benny
prev parent reply other threads:[~2012-08-15 11:50 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-08 20:21 [PATCH] NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done Trond Myklebust
2012-08-09 14:30 ` Peng Tao
2012-08-09 14:36 ` Myklebust, Trond
2012-08-09 15:01 ` Peng Tao
2012-08-09 15:39 ` Myklebust, Trond
2012-08-09 16:22 ` Peng Tao
2012-08-09 16:29 ` Myklebust, Trond
2012-08-09 16:40 ` Peng Tao
2012-08-09 17:06 ` Peng Tao
2012-08-12 17:36 ` Boaz Harrosh
2012-08-13 16:26 ` Myklebust, Trond
2012-08-13 23:39 ` Boaz Harrosh
2012-08-14 0:16 ` Myklebust, Trond
2012-08-14 0:28 ` Boaz Harrosh
2012-08-14 0:49 ` Myklebust, Trond
[not found] ` <1344875167.7706.31.camel@lade.trondhjem.org>
2012-08-13 16:58 ` Myklebust, Trond
2012-08-14 7:48 ` Benny Halevy
2012-08-14 13:45 ` Myklebust, Trond
2012-08-14 14:30 ` Peng Tao
2012-08-14 14:53 ` Myklebust, Trond
2012-08-15 11:50 ` Benny Halevy [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=502B8CF3.6020309@tonian.com \
--to=bhalevy@tonian.com \
--cc=Fred.Isaman@netapp.com \
--cc=Trond.Myklebust@netapp.com \
--cc=bergwolf@gmail.com \
--cc=bharrosh@panasas.com \
--cc=idank@tonian.com \
--cc=linux-nfs@vger.kernel.org \
--cc=solo@tonian.com \
--cc=tigran.mkrtchyan@desy.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.