public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Karcher <jaka@linux.ibm.com>
To: "D. Wythe" <alibuda@linux.alibaba.com>,
	kgraul@linux.ibm.com, wenjia@linux.ibm.com
Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH net-next 00/10] net/smc: optimize the parallelism of SMC-R connections
Date: Tue, 16 Aug 2022 11:35:15 +0200	[thread overview]
Message-ID: <2182efbc-99f8-17ba-d344-95a467536b05@linux.ibm.com> (raw)
In-Reply-To: <cover.1660152975.git.alibuda@linux.alibaba.com>



On 10.08.2022 19:47, D. Wythe wrote:
> From: "D. Wythe" <alibuda@linux.alibaba.com>
> 
> This patch set attempts to optimize the parallelism of SMC-R connections,
> mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> occur after thoses optimization.
>

Thank you again for your submission!
Let me give you a quick update from our side:
We tested your patches on top of the net-next kernel on our s390 
systems. They did crash our systems. After verifying our environment we 
pulled console logs and now we can tell that there is indeed a problem 
with your patches regarding SMC-D. So please do not integrate this 
change as of right now. I'm going to do more in depth reviews of your 
patches but i need some time for them so here is a quick a description 
of the problem:

It is a SMC-D problem, that occurs while building up the connection. In 
smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For the 
SMC-R path you do grab the pointer, for SMC-D that never happens. Still 
you are using this refernce for SMC-D => Crash. This problem can be 
reproduced using the SMC-D path. Here is an example console output:

[  779.516382] Unable to handle kernel pointer dereference in virtual 
kernel address space
[  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
[  779.516391] Fault in home space mode while using kernel ASCE.
[  779.516395] AS:0000000069628007 R3:00000000ffbf0007 
S:00000000ffbef800 P:000000000000003d
[  779.516431] Oops: 0004 ilc:2 [#1] SMP
[  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib 
ib_uverbs mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv
6 nf_defrag_ipv4 ip_set nf_tables n
[  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 
5.19.0-13940-g22a46254655a #3
[  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)

[  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
[  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c 
(smc_conn_create+0x174/0x968 [smc])
[  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 
PM:0 RI:0 EA:3
[  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 
0000000000000001 0000000000000000
[  779.522773]            000000008a4128a0 000003ff803f21aa 
000000008e30d640 0000000086d72000
[  779.522776]            0000000086d72000 000000008a412803 
000000008a412800 000000008e30d650
[  779.522779]            0000000080934200 0000000000000000 
000003ff803cb954 00000380002dfa88
[  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg 
%r1,232(%r15)
[  779.522789]            000003ff803da894: a7180000            lhi 
%r1,0
[  779.522789]           #000003ff803da898: 582003ac            l 
%r2,940
[  779.522789]           >000003ff803da89c: ba123020            cs 
%r1,%r2,32(%r3)
[  779.522789]            000003ff803da8a0: ec1603be007e        cij 
%r1,0,6,000003ff803db01c

[  779.522789]            000003ff803da8a6: 4110b002            la 
%r1,2(%r11)
[  779.522789]            000003ff803da8aa: e310f0f00024        stg 
%r1,240(%r15)
[  779.522789]            000003ff803da8b0: e310f0c00004        lg 
%r1,192(%r15)
[  779.522870] Call Trace:
[  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
[  779.522884]  [<000003ff803cb954>] 
smc_find_ism_v2_device_serv+0x1b4/0x300 [smc]
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP 
stop from CPU 01.
01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP 
stop from CPU 00.
[  779.522894]  [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 [smc]


I'm going to send the review for the first patch right away (which is 
the one causing the crash), so far I'm done with it. The others are 
going to follow. Maybe you can look over the problem and come up with a 
solution, otherwise we are going to decide if we want to look into it as 
soon as I'm done with the reviews. Thank you for your patience.

  parent reply	other threads:[~2022-08-16 10:17 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-10 17:47 [PATCH net-next 00/10] net/smc: optimize the parallelism of SMC-R connections D. Wythe
2022-08-10 17:47 ` [PATCH net-next 01/10] net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending D. Wythe
2022-08-11  3:41   ` kernel test robot
2022-08-11 11:51   ` kernel test robot
2022-08-16  9:43   ` Jan Karcher
2022-08-16 12:47     ` Tony Lu
2022-08-16 12:52   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 02/10] net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending D. Wythe
2022-08-16  7:58   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 03/10] net/smc: allow confirm/delete rkey response deliver multiplex D. Wythe
2022-08-16  8:17   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 04/10] net/smc: make SMC_LLC_FLOW_RKEY run concurrently D. Wythe
2022-08-10 17:47 ` [PATCH net-next 05/10] net/smc: llc_conf_mutex refactor, replace it with rw_semaphore D. Wythe
2022-08-10 17:47 ` [PATCH net-next 06/10] net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() D. Wythe
2022-08-10 17:47 ` [PATCH net-next 07/10] net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() D. Wythe
2022-08-16  8:24   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 08/10] net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore D. Wythe
2022-08-16  8:37   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 09/10] net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link() D. Wythe
2022-08-16  8:28   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 10/10] net/smc: fix application data exception D. Wythe
2022-08-11  3:28 ` [PATCH net-next 00/10] net/smc: optimize the parallelism of SMC-R connections Jakub Kicinski
2022-08-11  5:13   ` Tony Lu
2022-08-11 12:31 ` Karsten Graul
2022-08-16  9:35 ` Jan Karcher [this message]
2022-08-16 12:40   ` Tony Lu
2022-08-17  4:55   ` D. Wythe
2022-08-17 16:52     ` Jan Karcher
2022-08-18 13:06       ` D. Wythe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2182efbc-99f8-17ba-d344-95a467536b05@linux.ibm.com \
    --to=jaka@linux.ibm.com \
    --cc=alibuda@linux.alibaba.com \
    --cc=davem@davemloft.net \
    --cc=kgraul@linux.ibm.com \
    --cc=kuba@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=wenjia@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox