From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: "Michael J. Ruhl" <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH] RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag
Date: Fri, 20 Oct 2017 10:37:24 +0300
Message-ID: <20171020073724.GY2106@mtr-leonro.local>
In-Reply-To: <20171019213859.26124.37851.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
On Thu, Oct 19, 2017 at 05:40:59PM -0400, Michael J. Ruhl wrote:
> I was playing with the ibacm service and discovered an issue
> the other day.
>
> If no provider library is present (I removed libacmp.so, and the
> provider keyword in the opts.cfg file is libacmp), when a resolve
> request is posted, the kernel will crash with the following Oops:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: (null)
> PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0
> Oops: 0010 [#1] SMP
> Modules linked in: rpcrdma ib_isert iscsi_target_mod
> target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_u
> ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod
> dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_si
> glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core mei_me
> lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si ipmi_devi
> ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd gra
> sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci
> pps_core drm dca libata i2c_algo_bit i2c_core
> CPU: 54 PID: 9841 Comm: ibacm Tainted: G I 4.14.0-rc2+ #6
> Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
> task: ffff880855f42d00 task.stack: ffffc900246b4000
> RIP: 0010: (null)
> RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246
> RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX: 0000000000000000
> RDX: 0000000000001100 RSI: ffff881058bb1320 RDI: ffff881056362000
> RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09: 0000000000001100
> R10: ffff8810573a5000 R11: 0000000000000000 R12: ffff881056362000
> R13: 0000000000000ec0 R14: ffff881058bb1320 R15: 0000000000000ec0
> FS: 00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000001056f5d003 CR4: 00000000001606e0
> Call Trace:
> ? netlink_dump+0x12c/0x290
> __netlink_dump_start+0x186/0x1f0
> rdma_nl_rcv_msg+0x193/0x1b0 [ib_core]
> rdma_nl_rcv+0xdc/0x130 [ib_core]
> netlink_unicast+0x181/0x240
> netlink_sendmsg+0x2c2/0x3b0
> sock_sendmsg+0x38/0x50
> SYSC_sendto+0x102/0x190
> ? __audit_syscall_entry+0xaf/0x100
> ? syscall_trace_enter+0x1d0/0x2b0
> ? __audit_syscall_exit+0x209/0x290
> SyS_sendto+0xe/0x10
> do_syscall_64+0x67/0x1b0
> entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x7fe0b9db2a63
> RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fe0b9db2a63
> RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI: 000000000000000d
> RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09: 000000000000000c
> R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffc55edc280
> R13: 000000000260b400 R14: 000000000000000d R15: 0000000000000001
> Code: Bad RIP value.
> RIP: (null) RSP: ffffc900246b7bc8
> CR2: 0000000000000000
> ---[ end trace 8d67abcfd10ec209 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel Offset: disabled
> ---[ end Kernel panic - not syncing: Fatal exception
> ------------[ cut here ]------------
>
> The issue is that in rdma_nl_rcv_msg(), the check
> 'if (flags & NLM_F_DUMP)' is not completely correct.
>
> NLM_F_DUMP is two bits NLM_F_ROOT | NLM_F_MATCH.
>
> ibacm sends a RDMA_NL_LS response with the RDMA_NL_LS_F_ERR bit set
> if an error occurs in the service (like no provider being available,
> or ACM_STATUS_ENODATA, etc.).
>
> NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR.
>
> The current code thinks that it sees a NLM_F_DUMP flag and incorrectly calls
> the .dump() callback.
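
The collision described above can be reproduced in a few lines of user-space C. This is only a sketch: the EX_* macros are local copies of the values quoted in the report (NLM_F_MATCH == 0x200 is taken from linux/netlink.h), and the two helper functions are hypothetical names, not code from netlink.c or from the patch. It shows why an "any bit set" test fires on an LS error response while an "all bits set" test does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the flag values so the sketch builds without kernel headers. */
#define EX_NLM_F_ROOT        0x100u
#define EX_NLM_F_MATCH       0x200u
#define EX_NLM_F_DUMP        (EX_NLM_F_ROOT | EX_NLM_F_MATCH)
#define EX_RDMA_NL_LS_F_ERR  0x100u

/* The overlap that the report points out. */
static_assert(EX_RDMA_NL_LS_F_ERR == EX_NLM_F_ROOT,
              "LS error flag shares its bit with NLM_F_ROOT");

/* Mirrors 'if (flags & NLM_F_DUMP)': true when either dump bit is set. */
static inline bool ex_dump_any_bit(uint16_t flags)
{
    return (flags & EX_NLM_F_DUMP) != 0;
}

/* Stricter test: true only when both NLM_F_ROOT and NLM_F_MATCH are set. */
static inline bool ex_dump_all_bits(uint16_t flags)
{
    return (flags & EX_NLM_F_DUMP) == EX_NLM_F_DUMP;
}
```

With flags == EX_RDMA_NL_LS_F_ERR, ex_dump_any_bit() returns true and ex_dump_all_bits() returns false. The stricter test is shown only to illustrate the overlap; it is not necessarily the right fix for rdma_nl_rcv_msg().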
Hi Michael,

Thanks for the report and for the excellent analysis. You are right that
RDMA_NL_LS_F_ERR has the same value as NLM_F_ROOT, and that is bad, but
I don't think it is the final root cause.

In case of errors, the LS was supposed to send an NLMSG_ERROR message and
not overload the general nlmsg_flags, which is awful. However, I don't know
whether it is feasible to fix the current implementation without breaking
the UAPI contract.

In the meantime, can we implement dummy dumpit functions for the LS,
which reuse ib_nl_is_good_ip_resp?

I prefer this solution over yours because it doesn't mix LS specifics with
the general decision function, and it leaves the LS anomalies in the
LS-relevant code.

Also, returning 0 in response to a message with the NLM_F_DUMP flag when
there is no dumpit function is wrong. The user should be made aware that
something was wrong with the request.

Thanks
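
To make the last two points concrete, here is a rough user-space model, not kernel code. Every name in it (ex_msg, ex_handler, ex_stub_dump, ex_start_dump) is hypothetical, and the choice of -EINVAL is an assumption. It shows a client that registers a stub dump callback, plus a dispatcher that reports an error rather than 0 (or a jump through NULL) when no callback exists.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a netlink message and a per-op handler entry. */
struct ex_msg {
    unsigned int flags;
};

typedef int (*ex_dump_fn)(const struct ex_msg *msg);

struct ex_handler {
    ex_dump_fn dump;   /* may be NULL if the client never registered one */
};

/* Stub a client could register so a dump request always gets an answer. */
static inline int ex_stub_dump(const struct ex_msg *msg)
{
    (void)msg;
    return -EINVAL;    /* assumption: report the request as invalid */
}

/* Dispatcher: never call through NULL, never silently return success. */
static inline int ex_start_dump(const struct ex_handler *h,
                                const struct ex_msg *msg)
{
    if (h == NULL || h->dump == NULL)
        return -EINVAL;
    return h->dump(msg);
}
```

Either path gives the sender a negative return value, so the failed request is visible to the user instead of crashing or appearing to succeed.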
>
> The included patch is an attempt to fix this issue. This patch fixes the
> issue that I am seeing, but I am not sure how to test the messages for
> RDMA_NL_RDMA_CM or RDMA_NL_IWCM (or any message that uses the
> NLM_F_DUMP bits).
>
> If anyone has some knowledge of these services, any extra testing would
> be welcomed.
>
> If the patch has no issues or comments, I will formally re-submit it
> (through my usual channel Denny).
>
> Thanks,
>
> Mike
>
>
> ---
>
> Michael J. Ruhl (1):
> RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag
>
>
> drivers/infiniband/core/netlink.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> --
>