From: Wang Yugui <wangyugui@e16-tech.com>
To: Maor Gottlieb <maorg@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
linux-rdma <linux-rdma@vger.kernel.org>
Subject: Re: a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20
Date: Wed, 26 Feb 2020 08:44:39 +0800
Message-ID: <20200226084438.9265.409509F4@e16-tech.com>
In-Reply-To: <bc3ce212-9fde-0489-e7ac-8cb8be55c015@mellanox.com>
[-- Attachment #1: Type: text/plain, Size: 9562 bytes --]
Hi, Maor, Leon
Kernel 5.4.21 plus the two patches below now boots successfully,
without the NULL pointer dereference. The nfs4/rdma mount succeeds too.
#RDMA-core-Fix-use-of-logical-OR-in-get_new_pps.patch
#RDMA-core-fix-null.patch (the patch from Maor saved as git-am format)
My MCX354A has 2 ports; port 1 is set to InfiniBand and port 2 is set
to Ethernet.
# ibstat
CA 'mlx4_0'
	CA type: MT4099
	Number of ports: 2
	Firmware version: 2.42.5000
	Hardware version: 1
	Node GUID: 0xe41d2d03007b4080
	System image GUID: 0xe41d2d03007b4083
	Port 1:
		State: Down
		Physical state: Polling
		Rate: 10
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x02594868
		Port GUID: 0xe41d2d03007b4081
		Link layer: InfiniBand
	Port 2:
		State: Down
		Physical state: Disabled
		Rate: 40
		Base lid: 0
		LMC: 0
		SM lid: 0
		Capability mask: 0x00010000
		Port GUID: 0xe61d2dfffe7b4082
		Link layer: Ethernet
# mlxup
Querying Mellanox devices firmware ...
Device #1:
----------
  Device Type:      ConnectX3
  Part Number:      01T7NW
  Description:      ConnectX-3 VPI adapter; dual-port QSFP; FDR IB (56Gb/s) and 40GbE;PCIe3.0 x8 8GT/s; Dell PowerEdge
  PSID:             DEL1090001019
  PCI Device Name:  0000:84:00.0
  Port1 GUID:       e41d2d03007b4081
  Port2 MAC:        e41d2d7b4082
  Versions:         Current        Available
     FW             2.42.5000      N/A
     PXE            3.4.0752       N/A
  Status:           No matching image found
My server is a Dell PowerEdge T630 with several other NICs installed.
# rxe_cfg
Name        Link  Driver   Speed   NMTU  IPv4_addr      RDEV  RMTU
em1         yes   igb              1500  192.168.2.63
em2         no    igb              1500
p1p1        no    bnx2x    10GigE  9000  10.0.0.63
p1p2        no    bnx2x    10GigE  9000  10.0.1.63
p6p2        no    mlx4_en          9000  10.40.1.63
virbr0      no    bridge           1500  192.168.122.1
virbr0-nic  no    tun              1500
Best Regards
王玉贵
2020/02/26
> On 2/20/2020 6:26 PM, Wang Yugui wrote:
> > Hi, Leon, Chuck
> >
> > It is still broken even with the hotfix(https://patchwork.kernel.org/patch/11387567/) for 5.4.21.
>
> Hi Wang,
>
> How can I reproduce it ?
>
> Can you please try with the below diff?
>
> diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
> index b9a36ea244d4..2d5608315dc8 100644
> --- a/drivers/infiniband/core/security.c
> +++ b/drivers/infiniband/core/security.c
> @@ -340,11 +340,15 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
>  		return NULL;
>
>  	if (qp_attr_mask & IB_QP_PORT)
> -		new_pps->main.port_num =
> -			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
> +		new_pps->main.port_num = qp_attr->port_num;
> +	else if (qp_pps)
> +		new_pps->main.port_num = qp_pps->main.port_num;
> +
>  	if (qp_attr_mask & IB_QP_PKEY_INDEX)
> -		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
> -						      qp_attr->pkey_index;
> +		new_pps->main.pkey_index = qp_attr->pkey_index;
> +	else if (qp_pps)
> +		new_pps->main.pkey_index = qp_pps->main.pkey_index;
> +
>  	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
>  		new_pps->main.state = IB_PORT_PKEY_VALID;
>
> >
> > the call stack is almost the same.
> >
> > Feb 20 23:49:53 T630 kernel: Call Trace:
> > Feb 20 23:49:53 T630 kernel: port_pkey_list_insert+0x30/0x1a0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x219/0x230
> > Feb 20 23:49:53 T630 kernel: ib_security_modify_qp+0x244/0x3b0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: _ib_modify_qp+0x1c0/0x3c0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: ? dma_pool_free+0x24/0xc0
> > Feb 20 23:49:53 T630 kernel: ipoib_init_qp+0x77/0x190 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: ? __mlx4_ib_query_pkey+0xe7/0x110 [mlx4_ib]
> > Feb 20 23:49:53 T630 kernel: ? ib_find_pkey+0x98/0xe0 [ib_core]
> > Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open_default+0x1a/0x180 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: ipoib_ib_dev_open+0x66/0xa0 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: ipoib_open+0x44/0x110 [ib_ipoib]
> > Feb 20 23:49:53 T630 kernel: __dev_open+0xcd/0x160
> > Feb 20 23:49:53 T630 kernel: __dev_change_flags+0x1ad/0x220
> > Feb 20 23:49:53 T630 kernel: ? __dev_notify_flags+0x95/0xf0
> > Feb 20 23:49:53 T630 kernel: dev_change_flags+0x21/0x60
> > Feb 20 23:49:53 T630 kernel: do_setlink+0x320/0xf00
> > Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
> > Feb 20 23:49:53 T630 kernel: ? xas_load+0x8/0x80
> > Feb 20 23:49:53 T630 kernel: ? __update_load_avg_cfs_rq+0x1d5/0x2c0
> > Feb 20 23:49:53 T630 kernel: ? cpumask_next+0x17/0x20
> > Feb 20 23:49:53 T630 kernel: ? __snmp6_fill_stats64.isra.56+0x6b/0x110
> > Feb 20 23:49:53 T630 kernel: ? __nla_validate_parse+0x51/0x840
> > Feb 20 23:49:53 T630 kernel: __rtnl_newlink+0x53d/0x890
> > Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> > Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> > Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> > Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> > Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
> > Feb 20 23:49:53 T630 kernel: ? __nla_reserve+0x38/0x50
> > Feb 20 23:49:53 T630 kernel: ? __nla_put+0xc/0x20
> > Feb 20 23:49:53 T630 kernel: ? nla_put+0x2f/0x40
> > Feb 20 23:49:53 T630 kernel: ? rt6_fill_node+0x2d4/0x850
> > Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
> > Feb 20 23:49:53 T630 kernel: ? kmem_cache_alloc_trace+0x1c9/0x230
> > Feb 20 23:49:53 T630 kernel: rtnl_newlink+0x43/0x60
> > Feb 20 23:49:53 T630 kernel: rtnetlink_rcv_msg+0x2b1/0x360
> > Feb 20 23:49:53 T630 kernel: ? __kmalloc_node_track_caller+0x241/0x300
> > Feb 20 23:49:53 T630 kernel: ? _cond_resched+0x15/0x30
> > Feb 20 23:49:53 T630 kernel: ? rtnl_calcit.isra.32+0x110/0x110
> > Feb 20 23:49:53 T630 kernel: netlink_rcv_skb+0x49/0x110
> > Feb 20 23:49:53 T630 kernel: netlink_unicast+0x191/0x220
> > Feb 20 23:49:53 T630 kernel: netlink_sendmsg+0x21d/0x3f0
> > Feb 20 23:49:53 T630 kernel: sock_sendmsg+0x5b/0x60
> > Feb 20 23:49:53 T630 kernel: ____sys_sendmsg+0x1eb/0x260
> > Feb 20 23:49:53 T630 kernel: ? copy_msghdr_from_user+0xdb/0x160
> > Feb 20 23:49:53 T630 kernel: ___sys_sendmsg+0x7c/0xc0
> > Feb 20 23:49:53 T630 kernel: ? do_filp_open+0xa7/0x100
> > Feb 20 23:49:53 T630 kernel: ? netdev_run_todo+0x5e/0x290
> > Feb 20 23:49:53 T630 kernel: ? list_lru_add+0xb7/0x1d0
> > Feb 20 23:49:53 T630 kernel: __sys_sendmsg+0x57/0xa0
> > Feb 20 23:49:53 T630 kernel: do_syscall_64+0x5b/0x180
> > Feb 20 23:49:53 T630 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> >
> > This card have 2 port, and port 1 is set as InfiniBand, port 2
> > is set as Ethernet.
> >
> > # ibstat
> > CA 'mlx4_0'
> > CA type: MT4099
> > Number of ports: 2
> > Firmware version: 2.42.5000
> > Hardware version: 1
> > Node GUID: 0xe41d2d03007b4080
> > System image GUID: 0xe41d2d03007b4083
> > Port 1:
> > State: Down
> > Physical state: Polling
> > Rate: 10
> > Base lid: 0
> > LMC: 0
> > SM lid: 0
> > Capability mask: 0x02594868
> > Port GUID: 0xe41d2d03007b4081
> > Link layer: InfiniBand
> > Port 2:
> > State: Down
> > Physical state: Disabled
> > Rate: 40
> > Base lid: 0
> > LMC: 0
> > SM lid: 0
> > Capability mask: 0x00010000
> > Port GUID: 0xe61d2dfffe7b4082
> > Link layer: Ethernet
> >
> >
> > Best Regards
> > 王玉贵
> > 2020/02/21
> >
> >> On Thu, Feb 20, 2020 at 08:57:29AM -0500, Chuck Lever wrote:
> >>> Hello!
> >>>
> >>> Thanks for your bug report.
> >>>
> >>>
> >>>> On Feb 19, 2020, at 10:22 PM, Wang Yugui <wangyugui@e16-tech.com> wrote:
> >>>>
> >>>> Hi, chuck.lever
> >>>>
> >>>> a bug(BUG: kernel NULL pointer dereference) of ib or mlx happened in 5.4.21 but not in 5.4.20.
> >>>>
> >>>> maybe some relationship to xprtrdma-fix-dma-scatter-gather-list-mapping-imbalance.patch
> >>> I don't see an obvious connection to fix-dma-scatter-gather-list-mapping-imbalance.
> >>> The backtrace below is through IPoIB code paths. Those have nothing to do with
> >>> NFS/RDMA, which is the only ULP code that is changed by my commit.
> >>>
> >>>
> >>>> maybe the info is useful.
> >>> I'm copying linux-rdma for a bigger set of eyeballs.
> >>>
> >>> My knee-jerk recommendation is that if you have a reliable reproducer, try "git bisect"
> >>> between .20 and .21 to nail down a specific commit where the BUG starts to occur.
> >> No need to bisect, it is me who broke.
> >> The fix is already accepted, but not yet merged.
> >> https://patchwork.kernel.org/patch/11387567/
> >>
> >> Thanks
> > --------------------------------------
> > 北京京垓科技有限公司
> > 王玉贵 wangyugui@e16-tech.com
> > Phone: +86-136-71123776
> >
--------------------------------------
北京京垓科技有限公司
王玉贵 wangyugui@e16-tech.com
Phone: +86-136-71123776
[-- Attachment #2: RDMA-core-fix-null.patch --]
[-- Type: application/octet-stream, Size: 1218 bytes --]
From d4078b7c5e9782b2ca3d6c6035f4abb995c4dab7 Mon Sep 17 00:00:00 2001
From: maorg@mellanox.com
Date: Wed, 26 Feb 2020 07:58:29 +0800
Subject: [PATCH] RDMA-core-fix-NULL
---
drivers/infiniband/core/security.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 2b4d803..9e27ca1 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -340,11 +340,15 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 		return NULL;

 	if (qp_attr_mask & IB_QP_PORT)
-		new_pps->main.port_num =
-			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+		new_pps->main.port_num = qp_attr->port_num;
+	else if (qp_pps)
+		new_pps->main.port_num = qp_pps->main.port_num;
+
 	if (qp_attr_mask & IB_QP_PKEY_INDEX)
-		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
-						      qp_attr->pkey_index;
+		new_pps->main.pkey_index = qp_attr->pkey_index;
+	else if (qp_pps)
+		new_pps->main.pkey_index = qp_pps->main.pkey_index;
+
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;

--
2.24.1
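
The behavioural change in the patch above can be sketched outside the kernel. This is a minimal illustration, not the kernel code: the `demo_*` types, the `DEMO_QP_PORT` value, and the `fallback` parameter are hypothetical stand-ins, and only the port number is modelled (the pkey index follows the same pattern in the diff). It shows that the pre-patch ternary returns the previously stored port even when the caller's mask asks for a new one, while the post-patch form prefers the requested value. Whether that stale value is what leads to the crash in port_pkey_list_insert is an assumption this sketch does not verify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value for illustration; not copied from kernel headers. */
enum { DEMO_QP_PORT = 1 << 5 };

/* Hypothetical stand-ins for the old per-QP settings and the new request. */
struct demo_pps  { unsigned int port_num; };
struct demo_attr { unsigned int port_num; };

/* Pre-patch shape: if old settings exist, they win even though the mask
 * says the caller is supplying a new port number. */
unsigned int port_before(int mask, const struct demo_attr *attr,
			 const struct demo_pps *old, unsigned int fallback)
{
	if (mask & DEMO_QP_PORT)
		return old ? old->port_num : attr->port_num;
	return fallback;
}

/* Post-patch shape: a requested port always wins; the old settings are
 * used only when the mask does not carry a port. */
unsigned int port_after(int mask, const struct demo_attr *attr,
			const struct demo_pps *old, unsigned int fallback)
{
	if (mask & DEMO_QP_PORT)
		return attr->port_num;
	else if (old)
		return old->port_num;
	return fallback;
}

/* Small self-check: old settings hold port 0, the caller requests port 1. */
void demo_port_self_check(void)
{
	struct demo_pps old = { .port_num = 0 };
	struct demo_attr attr = { .port_num = 1 };

	assert(port_before(DEMO_QP_PORT, &attr, &old, 0) == 0); /* request ignored */
	assert(port_after(DEMO_QP_PORT, &attr, &old, 0) == 1);  /* request honoured */
	assert(port_after(0, &attr, &old, 7) == 0);             /* falls back to old */
	assert(port_after(0, &attr, NULL, 7) == 7);             /* nothing to copy */
}
```

Calling `demo_port_self_check()` from a test program exercises the four cases; the function names exist only in this sketch.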
[-- Attachment #3: RDMA-core-Fix-use-of-logical-OR-in-get_new_pps.patch --]
[-- Type: application/octet-stream, Size: 5511 bytes --]
From patchwork Mon Feb 17 20:43:18 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nathan Chancellor <natechancellor@gmail.com>
X-Patchwork-Id: 11387567
X-Patchwork-Delegate: jgg@ziepe.ca
From: Nathan Chancellor <natechancellor@gmail.com>
To: Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, clang-built-linux@googlegroups.com,
Nathan Chancellor <natechancellor@gmail.com>
Subject: [PATCH] RDMA/core: Fix use of logical OR in get_new_pps
Date: Mon, 17 Feb 2020 13:43:18 -0700
Message-Id: <20200217204318.13609-1-natechancellor@gmail.com>
X-Mailer: git-send-email 2.25.1
Clang warns:
../drivers/infiniband/core/security.c:351:41: warning: converting the
enum constant to a boolean [-Wint-in-bool-context]
if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
^
1 warning generated.
A bitwise OR should have been used instead.
Fixes: 1dd017882e01 ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list")
Link: https://github.com/ClangBuiltLinux/linux/issues/889
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/security.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 2b4d80393bd0..b9a36ea244d4 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -348,7 +348,7 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;

-	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
+	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) && qp_pps) {
 		new_pps->main.port_num = qp_pps->main.port_num;
 		new_pps->main.pkey_index = qp_pps->main.pkey_index;
 		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
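
The difference between the two operators in the one-line fix above can be shown with a small standalone sketch. The flag values and function names here are assumptions made for illustration, not taken from the kernel headers. With logical OR, two non-zero flags collapse to the value 1, so the test only inspects bit 0 of the mask; with bitwise OR the two flag bits are combined as intended.

```c
#include <assert.h>

/* Illustrative flag values only; assumed for this sketch. */
enum {
	DEMO_QP_PKEY_INDEX = 1 << 4,
	DEMO_QP_PORT       = 1 << 5,
};

/* Buggy form: (a || b) is 1 whenever either flag is non-zero,
 * so this really tests bit 0 of mask. */
int neither_set_logical(int mask, int a, int b)
{
	return !(mask & (a || b));
}

/* Fixed form: (a | b) is the union of both flag bits. */
int neither_set_bitwise(int mask, int a, int b)
{
	return !(mask & (a | b));
}

/* Self-check: a mask that carries only the port flag. */
void demo_or_self_check(void)
{
	int mask = DEMO_QP_PORT;

	/* Logical OR wrongly reports that neither flag is set. */
	assert(neither_set_logical(mask, DEMO_QP_PKEY_INDEX, DEMO_QP_PORT) == 1);
	/* Bitwise OR correctly sees the port flag. */
	assert(neither_set_bitwise(mask, DEMO_QP_PKEY_INDEX, DEMO_QP_PORT) == 0);
	/* With an empty mask both forms agree. */
	assert(neither_set_bitwise(0, DEMO_QP_PKEY_INDEX, DEMO_QP_PORT) == 1);
}
```

The flags are passed as parameters rather than written as constants so that the buggy form compiles without the very warning the patch addresses.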