From: Benny Halevy <bhalevy@tonian.com>
To: tao.peng@emc.com
Cc: Trond.Myklebust@netapp.com, linux-nfs@vger.kernel.org,
honey@citi.umich.edu, rees@umich.edu
Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
Date: Wed, 21 Sep 2011 14:27:48 +0300 [thread overview]
Message-ID: <4E79CA34.3060602@tonian.com> (raw)
In-Reply-To: <F19688880B763E40B28B2B462677FBF805C4952A34@MX09A.corp.emc.com>
On 2011-09-21 14:10, tao.peng@emc.com wrote:
>> -----Original Message-----
>> From: Benny Halevy [mailto:bhalevy@tonian.com]
>> Sent: Wednesday, September 21, 2011 6:57 PM
>> To: Peng, Tao
>> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
>> honey@citi.umich.edu; rees@umich.edu
>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>
>> On 2011-09-21 13:23, tao.peng@emc.com wrote:
>>> Hi, Benny,
>>>
>>>> -----Original Message-----
>>>> From: linux-nfs-owner@vger.kernel.org
>> [mailto:linux-nfs-owner@vger.kernel.org]
>>>> On Behalf Of Benny Halevy
>>>> Sent: Wednesday, September 21, 2011 3:05 PM
>>>> To: Peng, Tao
>>>> Cc: Trond.Myklebust@netapp.com; linux-nfs@vger.kernel.org;
>>>> honey@citi.umich.edu; rees@umich.edu
>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>
>>>> On 2011-09-21 08:16, tao.peng@emc.com wrote:
>>>>>> -----Original Message-----
>>>>>> From: Myklebust, Trond [mailto:Trond.Myklebust@netapp.com]
>>>>>> Sent: Wednesday, September 21, 2011 12:20 PM
>>>>>> To: Peng, Tao
>>>>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org; honey@citi.umich.edu;
>>>>>> rees@umich.edu
>>>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: tao.peng@emc.com [mailto:tao.peng@emc.com]
>>>>>>> Sent: Tuesday, September 20, 2011 10:44 PM
>>>>>>> To: Myklebust, Trond
>>>>>>> Cc: bhalevy@tonian.com; linux-nfs@vger.kernel.org;
>>>>>> honey@citi.umich.edu;
>>>>>>> rees@umich.edu
>>>>>>> Subject: RE: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: linux-nfs-owner@vger.kernel.org
>>>>>>>> [mailto:linux-nfs-owner@vger.kernel.org]
>>>>>>>> On Behalf Of Jim Rees
>>>>>>>> Sent: Wednesday, September 21, 2011 8:29 AM
>>>>>>>> To: Trond Myklebust
>>>>>>>> Cc: Benny Halevy; linux-nfs@vger.kernel.org; peter honeyman
>>>>>>>> Subject: Re: [PATCH 2/3] pnfs: introduce pnfs private workqueue
>>>>>>>>
>>>>>>>> Trond Myklebust wrote:
>>>>>>>>
>>>>>>>> On Mon, 2011-09-19 at 23:18 -0400, Jim Rees wrote:
>>>>>>>> > From: Peng Tao <bergwolf@gmail.com>
>>>>>>>> >
>>>>>>>> > For layoutdriver io done functions, default workqueue is not a
>>>>>> good
>>>>>>> place as
>>>>>>>> > the code is executed in IO path. So add a pnfs private workqueue
>>>>>> to
>>>>>>> handle
>>>>>>>> > them.
>>>>>>>> >
>>>>>>>> > Also change block and object layout code to make use of this
>>>>>> private
>>>>>>>> > workqueue.
>>>>>>>> >
>>>>>>>>
>>>>>>>> Wait, what????
>>>>>>>>
>>>>>>>> Why is the I/O path (i.e. the nfsiod queue) inappropriate for
>>>>>>>> layoutdriver io_done functions?
>>>>>>> The current code uses the system_wq to handle io_done functions. If
>>>>>> any
>>>>>>> function allocates memory in system_wq and causes dirty writeback in
>>>>>> nfs
>>>>>>> path, the io_done function can never be called because it is after the
>>>>>> original
>>>>>>> function in the system_wq. And you said that the rpciod/nfsiod is not
>>>>>> the
>>>>>>> ideal place because of memory allocation. So I suggested pnfs private
>>>>>>> workqueue and Benny agreed on it.
>>>>>>
>>>>>> You appear to be optimizing for a corner case. Why?
>>>>> Current code uses system_wq for io_done and it can cause deadlock. So this is
>>>> more than just an optimization.
>>>>>
>>>>>>
>>>>>> IOW: why do we need a whole new workqueue for something which is
>>>>>> supposed to be extremely rare? Why can't we just use standard keventd or
>>>>>> allocate a perfectly normal thread (e.g. the state recovery thread)?
>>>>> Unlike nfsiod and rpciod, keventd (aka, system_wq) is not created with
>>>> WQ_MEM_RECLAIM, indicating that it should not be used for IO path which can
>> be
>>>> invoked during memory reclaim. And I agree that a normal kthread can solve it.
>>>>>
>>>>> However, the blocklayout write end_io function still needs a context where it
>> can
>>>> allocate memory to convert extent state (for every IO that touches new extents).
>> If
>>>> you look into the patch, we are not just using the pnfs private workqueue to
>> handle
>>>> pnfs_ld_read/write_done, but also calling mark_extents_written() inside the
>>>> workqueue.
>>>>
>>>> Can this memory be preallocated before the I/O for the common case?
>>> It is possible to preallocate memory for the common case. But it doesn't remove
>> the need for workqueue. Without workqueue, we will need to put a bunch of extent
>> manipulation code into bio->end_io, which is executed in bottom half and we
>> always need minimize its execution.
>>>
>>> Unless we do following:
>>> 1. preallocate memory for extent state convertion
>>> 2. use nfsiod/rpciod to handle bl_write_cleanup
>>> 3. for pnfs error case, create a kthread to recollapse and resend to MDS
>>>
>>> not sure if it worth the complexity though...
>>
>> We do want to avoid memory allocation on the completion path to ensure
>> completion under memory pressure and graceful handling of low memory
>> conditions
>> so I think this approach is worth while.
>>
>> Having a lightweight handler on the bottom half done path is ok too.
>>
>> Doing the heavy lifting for the (assumingly rare) error case in a workqueue/
>> kthread makes a sense, just it mustn't block. Could you use the nfs state manager
>> for that?
> I don't quite understand. How do you use nfs state manager to do other tasks?
You need to keep a list of things to do hanging off of the nfs client structure
and set a bit in cl_state telling the state manager it has work to do
and wake it up. It then needs to go over the list of, say nfs_inodes
and call into the layout driver to handle the errors.
Benny
>
> Thanks,
> Tao
>
>>
>> Benny
>>
>>>
>>> Cheers,
>>> Tao
>>>
>>>>
>>>> Benny
>>>>
>>>>> If we don't create a pnfs private workqueue for pnfs_ld_read/write_done (i.e.,
>> by
>>>> just creating a normal kthread in error case), we would still have to create a
>>>> workqueue inside blocklayout driver to handle write end_io. And if blocklayout
>> has a
>>>> private workqueue, it no longer needs the normal kthread for read/write_done
>> error
>>>> handling, which leaves the kthread only specific to object layout (if it doesn't
>> need a
>>>> workqueue like blocklayout does).
>>>>>
>>>>> So I think a generic pnfs workqueue is better because it simplifies things and
>> solve
>>>> both problems.
>>>>>
>>>>> Best,
>>>>> Tao
>>>>>
>>>>>>
>>>>>> Trond
>>>>>
next prev parent reply other threads:[~2011-09-21 11:27 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-20 3:18 [PATCH 0/3] replacement for "introduce pnfs private workqueue" Jim Rees
2011-09-20 3:18 ` [PATCH 1/3] pnfsblock: add missing rpc_put_mount and path_put Jim Rees
2011-09-20 3:18 ` [PATCH 2/3] pnfs: introduce pnfs private workqueue Jim Rees
2011-09-20 22:41 ` Trond Myklebust
2011-09-21 0:29 ` Jim Rees
2011-09-21 2:44 ` tao.peng
2011-09-21 4:20 ` Myklebust, Trond
2011-09-21 5:16 ` tao.peng
2011-09-21 7:04 ` Benny Halevy
2011-09-21 10:23 ` tao.peng
2011-09-21 10:38 ` Boaz Harrosh
2011-09-21 11:04 ` tao.peng
2011-09-21 10:56 ` Benny Halevy
2011-09-21 11:10 ` tao.peng
2011-09-21 11:27 ` Benny Halevy [this message]
2011-09-21 11:42 ` Boaz Harrosh
2011-09-21 11:50 ` Benny Halevy
2011-09-21 13:56 ` Boaz Harrosh
2011-09-21 15:45 ` Peng Tao
2011-09-21 16:03 ` Trond Myklebust
2011-09-22 3:30 ` tao.peng
2011-09-22 7:17 ` Boaz Harrosh
2011-09-21 4:22 ` Myklebust, Trond
2011-09-20 3:18 ` [PATCH 3/3] SQUASHME: pnfs: simplify and clean up pnfsiod workqueue Jim Rees
2011-09-21 11:52 ` [PATCH 0/3] replacement for "introduce pnfs private workqueue" Benny Halevy
2011-09-21 12:32 ` Jim Rees
-- strict thread matches above, loose matches on Subject: below --
2011-09-10 17:41 [PATCH 0/3] pnfs private workqueue, and two cleanups Jim Rees
2011-09-10 17:41 ` [PATCH 2/3] pNFS: introduce pnfs private workqueue Jim Rees
2011-09-11 14:51 ` Benny Halevy
2011-09-11 15:15 ` Benny Halevy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E79CA34.3060602@tonian.com \
--to=bhalevy@tonian.com \
--cc=Trond.Myklebust@netapp.com \
--cc=honey@citi.umich.edu \
--cc=linux-nfs@vger.kernel.org \
--cc=rees@umich.edu \
--cc=tao.peng@emc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).