From: Leon Romanovsky <leon@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Lang Cheng <chenglang@huawei.com>,
dledford@redhat.com, davem@davemloft.net, salil.mehta@huawei.com,
yisen.zhuang@huawei.com, linuxarm@huawei.com,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
Saeed Mahameed <saeedm@mellanox.com>,
bhaktipriya96@gmail.com, tj@kernel.org,
Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Subject: Re: [RFC rdma-next] RDMA/core: Add attribute WQ_MEM_RECLAIM to workqueue "infiniband"
Date: Wed, 19 Feb 2020 13:07:00 +0200 [thread overview]
Message-ID: <20200219110700.GK15239@unreal> (raw)
In-Reply-To: <1155d15f-4188-e5cd-3e4a-6e0c52e9b1eb@huawei.com>
On Wed, Feb 19, 2020 at 03:40:59PM +0800, Yunsheng Lin wrote:
> +cc Bhaktipriya, Tejun and Jeff
>
> On 2020/2/19 14:45, Leon Romanovsky wrote:
> > On Wed, Feb 19, 2020 at 09:13:23AM +0800, Yunsheng Lin wrote:
> >> On 2020/2/18 23:31, Jason Gunthorpe wrote:
> >>> On Tue, Feb 18, 2020 at 11:35:35AM +0800, Lang Cheng wrote:
> >>>> The hns3 driver sets "hclge_service_task" workqueue with
> >>>> WQ_MEM_RECLAIM flag in order to guarantee forward progress
> >>>> under memory pressure.
> >>>
> >>> Don't do that. WQ_MEM_RECLAIM is only to be used by things interlinked
> >>> with reclaimed processing.
> >>>
> >>> Work on queues marked with WQ_MEM_RECLAIM can't use GFP_KERNEL
> >>> allocations, can't do certain kinds of sleeps, can't hold certain
> >>> kinds of locks, etc.
>
> By the way, what kind of sleeps and locks can not be done in the work
> queued to wq marked with WQ_MEM_RECLAIM?
I didn't see this knowledge documented, but I would assume that
everything that can block memory reclaim progress should not be
in such workqueue.
>
> >>
> >> From mlx5 driver, it seems that there is GFP_KERNEL allocations
> >> on wq marked with WQ_MEM_RECLAIM too:
> >>
> >> mlx5e_tx_timeout_work() -> mlx5e_safe_reopen_channels() ->
> >> mlx5e_safe_switch_channels() -> mlx5e_open_channels()
> >>
> >> kcalloc() is called with GFP_KERNEL in mlx5e_open_channels(),
> >> and mlx5e_tx_timeout_work() is queued with priv->wq, which is
> >> allocated with WQ_MEM_RECLAIM flags. see:
> >>
> >> mlx5e_netdev_init() -> create_singlethread_workqueue()
> >
> > There are two reasons for that, first mlx5 driver was written far before
> > WQ_MEM_RECLAIM usage was clarified, second mlx5 has bugs.
> >
> >>
> >>
> >> From the comment in kernel/workqueue.c, the work queued with
> >> wq with WQ_MEM_RECLAIM flag set seems to be executed without
> >> blocking under some rare case. I still not quite understand
> >> the comment, and I can not find any doc that point out the
> >> GFP_KERNEL allocations can not be done in wq with WQ_MEM_RECLAIM
> >> yet. Is there any doc that mentions that GFP_KERNEL allocations
> >> can not be done in wq with WQ_MEM_RECLAIM?
> >
> > It is whole purpose of WQ_MEM_RECLAIM flag - allow progress in case of
> > memory pressure. Allocation memory while we are under memory pressure
> > is an invitation for a disaster.
>
> Ok, make sense.
>
> >
> >>
> >>
> >> /**
> >> * rescuer_thread - the rescuer thread function
> >> * @__rescuer: self
> >> *
> >> * Workqueue rescuer thread function. There's one rescuer for each
> >> * workqueue which has WQ_MEM_RECLAIM set.
> >> *
> >> * Regular work processing on a pool may block trying to create a new
> >> * worker which uses GFP_KERNEL allocation which has slight chance of
> >> * developing into deadlock if some works currently on the same queue
> >> * need to be processed to satisfy the GFP_KERNEL allocation. This is
> >> * the problem rescuer solves.
> >> *
> >> * When such condition is possible, the pool summons rescuers of all
> >> * workqueues which have works queued on the pool and let them process
> >> * those works so that forward progress can be guaranteed.
> >> *
> >> * This should happen rarely.
> >> *
> >> * Return: 0
> >> */
> >>
> >>
> >> The below is the reason we add the sets "hclge_service_task" workqueue
> >> with WQ_MEM_RECLAIM through analysing why other ethernet drivers has
> >> allocated wq with WQ_MEM_RECLAIM flag, I may be wrong about that:
> >
> > Many drivers are developed using copy/paste technique, so it is wrong
> > to assume that "other ethernet drivers" did the right thing.
> >
> >>
> >> hns3 ethernet driver may be used as the low level transport of a
> >> network file system, memory reclaim data path may depend on the
> >> worker in hns3 driver to bring back the ethernet link so that it flush
> >> the some cache to network based disk.
> >
> > Unlikely that this "network file system" dependency on ethernet link is correct.
>
> Ok, I may be wrong about the above usecase.
> but the below commit explicitly state that network devices may be used in
> memory reclaim path.
>
> 0a38c17a21a0 ("fm10k: Remove create_workqueue"):
>
> fm10k: Remove create_workqueue
>
> alloc_workqueue replaces deprecated create_workqueue().
>
> A dedicated workqueue has been used since the workitem (viz
> fm10k_service_task, which manages and runs other subtasks) is involved in
> normal device operation and requires forward progress under memory
> pressure.
>
> create_workqueue has been replaced with alloc_workqueue with max_active
> as 0 since there is no need for throttling the number of active work
> items.
>
> Since network devices may be used in memory reclaim path,
> WQ_MEM_RECLAIM has been set to guarantee forward progress.
>
> flush_workqueue is unnecessary since destroy_workqueue() itself calls
> drain_workqueue() which flushes repeatedly till the workqueue
> becomes empty. Hence the call to flush_workqueue() has been dropped.
>
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>
> So:
> 1. Maybe the above commit log is misleading, and network device driver's
> wq does not need the WQ_MEM_RECLAIM flag, then maybe document what can
> not be done in the work queued to wq marked with WQ_MEM_RECLAIM, and
> remove the WQ_MEM_RECLAIM flag for the wq of network device driver.
I wouldn't truly count on what is written in commit messages of patch
series which globally replaced create_workqueue() interface.
>
>
> 2. If the network device driver's wq does need the WQ_MEM_RECLAIM flag, then
> hns3 may have tow problems here: WQ_MEM_RECLAIM wq flushing !WQ_MEM_RECLAIM
> wq problem and GFP_KERNEL allocations in the work queued to WQ_MEM_RECLAIM wq.
You are proposing to put WQ_MEM_RECLAIM in infiniband queue and not to
special queue inside of the driver.
>
> >
> > Thanks
> >
> >>
> >>>
> >>> Jason
> >>>
> >>>
> >>
> >
> > .
> >
>
next prev parent reply other threads:[~2020-02-19 11:07 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-18 3:35 [RFC rdma-next] RDMA/core: Add attribute WQ_MEM_RECLAIM to workqueue "infiniband" Lang Cheng
2020-02-18 15:31 ` Jason Gunthorpe
2020-02-19 1:13 ` Yunsheng Lin
2020-02-19 6:45 ` Leon Romanovsky
2020-02-19 7:40 ` Yunsheng Lin
2020-02-19 11:07 ` Leon Romanovsky [this message]
2020-02-20 1:16 ` Yunsheng Lin
2020-02-19 13:04 ` Jason Gunthorpe
2020-02-20 1:06 ` Yunsheng Lin
2020-02-20 17:46 ` Alexander Duyck
2020-02-21 1:44 ` Yunsheng Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200219110700.GK15239@unreal \
--to=leon@kernel.org \
--cc=bhaktipriya96@gmail.com \
--cc=chenglang@huawei.com \
--cc=davem@davemloft.net \
--cc=dledford@redhat.com \
--cc=jeffrey.t.kirsher@intel.com \
--cc=jgg@ziepe.ca \
--cc=linux-rdma@vger.kernel.org \
--cc=linuxarm@huawei.com \
--cc=linyunsheng@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=saeedm@mellanox.com \
--cc=salil.mehta@huawei.com \
--cc=tj@kernel.org \
--cc=yisen.zhuang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.