From: Lang Cheng <chenglang@huawei.com>
To: <dledford@redhat.com>, <jgg@ziepe.ca>, <leon@kernel.org>,
<davem@davemloft.net>, <salil.mehta@huawei.com>,
<yisen.zhuang@huawei.com>
Cc: <linuxarm@huawei.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>
Subject: [RFC rdma-next] RDMA/core: Add attribute WQ_MEM_RECLAIM to workqueue "infiniband"
Date: Tue, 18 Feb 2020 11:35:35 +0800
Message-ID: <1581996935-46507-1-git-send-email-chenglang@huawei.com>
The hns3 driver allocates the "hclge_service_task" workqueue with
the WQ_MEM_RECLAIM flag in order to guarantee forward progress
under memory pressure. When the hns3 ethernet driver performs a
reset because of a tx timeout or RAS error, hclge_service_task
unregisters the ib_device before telling the firmware to perform
the hardware reset, in order to disable access to the ib_device.
ib_unregister_device() then calls ib_cache_cleanup_one() to
flush the "infiniband" workqueue, which does not have
WQ_MEM_RECLAIM set, and the WARNING below is triggered:
[11246.200168] hns3 0000:bd:00.1: Reset done, hclge driver initialization finished.
[11246.209979] hns3 0000:bd:00.1 eth7: net open
[11246.227608] ------------[ cut here ]------------
[11246.237370] workqueue: WQ_MEM_RECLAIM hclge:hclge_service_task [hclge] is flushing !WQ_MEM_RECLAIM infiniband:0x0
[11246.237391] WARNING: CPU: 50 PID: 2279 at ./kernel/workqueue.c:2605 check_flush_dependency+0xcc/0x140
[11246.260412] Modules linked in: hclgevf hns_roce_hw_v2 rdma_test(O) hns3 xt_CHECKSUM iptable_mangle xt_conntrack ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter vfio_iommu_type1 vfio_pci vfio_virqfd vfio ib_isert iscsi_target_mod ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi aes_ce_blk crypto_simd cryptd aes_ce_cipher sunrpc nls_iso8859_1 crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce joydev input_leds hid_generic usbkbd usbmouse sbsa_gwdt usbhid usb_storage hid ses hclge hisi_zip hisi_hpre hisi_sec2 hnae3 hisi_qm ahci hisi_trng_v2 evbug uacce rng_core gpio_dwapb autofs4 hisi_sas_v3_hw megaraid_sas hisi_sas_main libsas scsi_transport_sas [last unloaded: hns_roce_hw_v2]
[11246.325742] CPU: 50 PID: 2279 Comm: kworker/50:0 Kdump: loaded Tainted: G O 5.4.0-rc4+ #1
[11246.335181] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V3.B140.01 12/18/2019
[11246.344802] Workqueue: hclge hclge_service_task [hclge]
[11246.350007] pstate: 60c00009 (nZCv daif +PAN +UAO)
[11246.354779] pc : check_flush_dependency+0xcc/0x140
[11246.359549] lr : check_flush_dependency+0xcc/0x140
[11246.364317] sp : ffff800268a73990
[11246.367618] x29: ffff800268a73990 x28: 0000000000000001
[11246.372907] x27: ffffcbe4f5868000 x26: ffffcbe4f5541000
[11246.378196] x25: 00000000000000b8 x24: ffff002fdd0ff868
[11246.383483] x23: ffff002fdd0ff800 x22: ffff2027401ba600
[11246.388770] x21: 0000000000000000 x20: ffff002fdd0ff800
[11246.394059] x19: ffff202719293b00 x18: ffffcbe4f5541948
[11246.399347] x17: 000000006f8ad8dd x16: 0000000000000002
[11246.404634] x15: ffff8002e8a734f7 x14: 6c66207369205d65
[11246.409922] x13: 676c63685b206b73 x12: 61745f6563697672
[11246.415208] x11: 65735f65676c6368 x10: 3a65676c6368204d
[11246.420494] x9 : 49414c4345525f4d x8 : 6e6162696e69666e
[11246.425782] x7 : 69204d49414c4345 x6 : ffffcbe4f5765145
[11246.431068] x5 : 0000000000000000 x4 : 0000000000000000
[11246.436355] x3 : 0000000000000030 x2 : 00000000ffffffff
[11246.441642] x1 : 3349eb1ac5310100 x0 : 0000000000000000
[11246.446928] Call trace:
[11246.449363] check_flush_dependency+0xcc/0x140
[11246.453785] flush_workqueue+0x110/0x410
[11246.457691] ib_cache_cleanup_one+0x54/0x468
[11246.461943] __ib_unregister_device+0x70/0xa8
[11246.466279] ib_unregister_device+0x2c/0x40
[11246.470455] hns_roce_exit+0x34/0x198 [hns_roce_hw_v2]
[11246.475571] __hns_roce_hw_v2_uninit_instance.isra.56+0x3c/0x58 [hns_roce_hw_v2]
[11246.482934] hns_roce_hw_v2_reset_notify+0xd8/0x210 [hns_roce_hw_v2]
[11246.489261] hclge_notify_roce_client+0x84/0xe0 [hclge]
[11246.494464] hclge_reset_rebuild+0x60/0x730 [hclge]
[11246.499320] hclge_reset_service_task+0x400/0x5a0 [hclge]
[11246.504695] hclge_service_task+0x54/0x698 [hclge]
[11246.509464] process_one_work+0x15c/0x458
[11246.513454] worker_thread+0x144/0x520
[11246.517186] kthread+0xfc/0x128
[11246.520314] ret_from_fork+0x10/0x18
[11246.523873] ---[ end trace eb980723699c2585 ]---
[11246.528710] hns3 0000:bd:00.2: Func clear success after reset.
[11246.528747] hns3 0000:bd:00.0: Func clear success after reset.
[11246.907710] hns3 0000:bd:00.1 eth7: link up
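The condition behind this warning can be sketched in userspace. What follows is a simplified model, not the kernel's check_flush_dependency(): the model_* names and flag values are made up for illustration, and the separate PF_MEMALLOC warning in the real function is left out. It only shows that the warning depends on the flags of the flushing and the flushed workqueue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; the kernel's real values differ. */
enum {
	MODEL_WQ_MEM_RECLAIM = 1u << 0,
	MODEL_WQ_LEGACY      = 1u << 1,
};

/* Minimal stand-in for a workqueue: a name and its allocation flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when flushing @target from a work item running on @flusher
 * would trip the dependency warning. @flusher is NULL when the caller is
 * not running as a workqueue worker.
 */
static bool model_flush_would_warn(const struct model_wq *flusher,
				   const struct model_wq *target)
{
	/* A target allocated with the reclaim flag is always safe to flush. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* Callers outside of a workqueue worker are not checked here. */
	if (!flusher)
		return false;

	/* Warn only for a non-legacy flusher that has the reclaim flag. */
	return (flusher->flags & (MODEL_WQ_MEM_RECLAIM | MODEL_WQ_LEGACY)) ==
	       MODEL_WQ_MEM_RECLAIM;
}

/* The combinations relevant to this report, as illustrative instances. */
static const struct model_wq model_hclge = { "hclge", MODEL_WQ_MEM_RECLAIM };
static const struct model_wq model_hclge_plain = { "hclge", 0 };
static const struct model_wq model_ib_before = { "infiniband", 0 };
static const struct model_wq model_ib_after = { "infiniband", MODEL_WQ_MEM_RECLAIM };
```

In this model, model_flush_would_warn(&model_hclge, &model_ib_before) is true, which corresponds to the trace above. It is false once either the target carries the reclaim flag (model_ib_after) or the flusher does not (model_hclge_plain).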
There may be three ways to avoid the above warning:
1. Allocate the "hclge_service_task" workqueue without the
WQ_MEM_RECLAIM flag, which may cause a deadlock
when the hns3 driver is used as the low-level transport of
a network file system.
2. Do not unregister the ib_device during the reset process; perhaps
only disable access to the ib_device using disable_device(),
as rdma_dev_change_netns() does.
3. Allocate the "infiniband" workqueue with the WQ_MEM_RECLAIM flag.
This patch allocates the "infiniband" workqueue with the WQ_MEM_RECLAIM
flag to avoid the warning.
Fixes: 0ea68902256e ("net: hns3: allocate WQ with WQ_MEM_RECLAIM flag")
Signed-off-by: Lang Cheng <chenglang@huawei.com>
---
drivers/infiniband/core/device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 84dd74f..595548a 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2707,7 +2707,7 @@ static int __init ib_core_init(void)
 {
 	int ret;
 
-	ib_wq = alloc_workqueue("infiniband", 0, 0);
+	ib_wq = alloc_workqueue("infiniband", WQ_MEM_RECLAIM, 0);
 	if (!ib_wq)
 		return -ENOMEM;
 
--
2.8.1
Thread overview: 11+ messages
2020-02-18 3:35 Lang Cheng [this message]
2020-02-18 15:31 ` [RFC rdma-next] RDMA/core: Add attribute WQ_MEM_RECLAIM to workqueue "infiniband" Jason Gunthorpe
2020-02-19 1:13 ` Yunsheng Lin
2020-02-19 6:45 ` Leon Romanovsky
2020-02-19 7:40 ` Yunsheng Lin
2020-02-19 11:07 ` Leon Romanovsky
2020-02-20 1:16 ` Yunsheng Lin
2020-02-19 13:04 ` Jason Gunthorpe
2020-02-20 1:06 ` Yunsheng Lin
2020-02-20 17:46 ` Alexander Duyck
2020-02-21 1:44 ` Yunsheng Lin