public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: Tony Lu <tonylu@linux.alibaba.com>
To: Jan Karcher <jaka@linux.ibm.com>
Cc: "D. Wythe" <alibuda@linux.alibaba.com>,
	kgraul@linux.ibm.com, wenjia@linux.ibm.com, kuba@kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org,
	linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH net-next 00/10] net/smc: optimize the parallelism of SMC-R connections
Date: Tue, 16 Aug 2022 20:40:50 +0800	[thread overview]
Message-ID: <YvuQUu/hbIZgJdTG@TonyMac-Alibaba> (raw)
In-Reply-To: <2182efbc-99f8-17ba-d344-95a467536b05@linux.ibm.com>

On Tue, Aug 16, 2022 at 11:35:15AM +0200, Jan Karcher wrote:
> 
> 
> On 10.08.2022 19:47, D. Wythe wrote:
> > From: "D. Wythe" <alibuda@linux.alibaba.com>
> > 
> > This patch set attempts to optimize the parallelism of SMC-R connections,
> > mainly to reduce unnecessary blocking on locks, and to fix exceptions that
> > occur after thoses optimization.
> > 
> 
> Thank you again for your submission!
> Let me give you a quick update from our side:
> We tested your patches on top of the net-next kernel on our s390 systems.
> They did crash our systems. After verifying our environment we pulled
> console logs and now we can tell that there is indeed a problem with your
> patches regarding SMC-D. So please do not integrate this change as of right
> now. I'm going to do more in depth reviews of your patches but i need some
> time for them so here is a quick a description of the problem:
> 
> It is a SMC-D problem, that occurs while building up the connection. In
> smc_conn_create you set struct smc_lnk_cluster *lnkc = NULL. For the SMC-R
> path you do grab the pointer, for SMC-D that never happens. Still you are
> using this refernce for SMC-D => Crash. This problem can be reproduced using
> the SMC-D path. Here is an example console output:

Got it.

> 
> [  779.516382] Unable to handle kernel pointer dereference in virtual kernel
> address space
> [  779.516389] Failing address: 0000000000000000 TEID: 0000000000000483
> [  779.516391] Fault in home space mode while using kernel ASCE.
> [  779.516395] AS:0000000069628007 R3:00000000ffbf0007 S:00000000ffbef800
> P:000000000000003d
> [  779.516431] Oops: 0004 ilc:2 [#1] SMP
> [  779.516436] Modules linked in: tcp_diag inet_diag ism mlx5_ib ib_uverbs
> mlx5_core smc_diag smc ib_core nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
> nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv
> 6 nf_defrag_ipv4 ip_set nf_tables n
> [  779.516470] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted
> 5.19.0-13940-g22a46254655a #3
> [  779.516476] Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
> 
> [  779.522738] Workqueue: smc_hs_wq smc_listen_work [smc]
> [  779.522755] Krnl PSW : 0704c00180000000 000003ff803da89c
> (smc_conn_create+0x174/0x968 [smc])
> [  779.522766]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0
> RI:0 EA:3
> [  779.522770] Krnl GPRS: 0000000000000002 0000000000000000 0000000000000001
> 0000000000000000
> [  779.522773]            000000008a4128a0 000003ff803f21aa 000000008e30d640
> 0000000086d72000
> [  779.522776]            0000000086d72000 000000008a412803 000000008a412800
> 000000008e30d650
> [  779.522779]            0000000080934200 0000000000000000 000003ff803cb954
> 00000380002dfa88
> [  779.522789] Krnl Code: 000003ff803da88e: e310f0e80024        stg
> %r1,232(%r15)
> [  779.522789]            000003ff803da894: a7180000            lhi %r1,0
> [  779.522789]           #000003ff803da898: 582003ac            l %r2,940
> [  779.522789]           >000003ff803da89c: ba123020            cs
> %r1,%r2,32(%r3)
> [  779.522789]            000003ff803da8a0: ec1603be007e        cij
> %r1,0,6,000003ff803db01c
> 
> [  779.522789]            000003ff803da8a6: 4110b002            la
> %r1,2(%r11)
> [  779.522789]            000003ff803da8aa: e310f0f00024        stg
> %r1,240(%r15)
> [  779.522789]            000003ff803da8b0: e310f0c00004        lg
> %r1,192(%r15)
> [  779.522870] Call Trace:
> [  779.522873]  [<000003ff803da89c>] smc_conn_create+0x174/0x968 [smc]
> [  779.522884]  [<000003ff803cb954>] smc_find_ism_v2_device_serv+0x1b4/0x300
> [smc]
> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop
> from CPU 01.
> 01: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop
> from CPU 00.
> [  779.522894]  [<000003ff803cbace>] smc_listen_find_device+0x2e/0x370 [smc]
> 
> 
> I'm going to send the review for the first patch right away (which is the
> one causing the crash), so far I'm done with it. The others are going to
> follow. Maybe you can look over the problem and come up with a solution,
> otherwise we are going to decide if we want to look into it as soon as I'm
> done with the reviews. Thank you for your patience.

Thanks for pointing this issue. We will fix this soon in v2.

Tony Lu

  reply	other threads:[~2022-08-16 12:40 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-10 17:47 [PATCH net-next 00/10] net/smc: optimize the parallelism of SMC-R connections D. Wythe
2022-08-10 17:47 ` [PATCH net-next 01/10] net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending D. Wythe
2022-08-11  3:41   ` kernel test robot
2022-08-11 11:51   ` kernel test robot
2022-08-16  9:43   ` Jan Karcher
2022-08-16 12:47     ` Tony Lu
2022-08-16 12:52   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 02/10] net/smc: fix SMC_CLC_DECL_ERR_REGRMB without smc_server_lgr_pending D. Wythe
2022-08-16  7:58   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 03/10] net/smc: allow confirm/delete rkey response deliver multiplex D. Wythe
2022-08-16  8:17   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 04/10] net/smc: make SMC_LLC_FLOW_RKEY run concurrently D. Wythe
2022-08-10 17:47 ` [PATCH net-next 05/10] net/smc: llc_conf_mutex refactor, replace it with rw_semaphore D. Wythe
2022-08-10 17:47 ` [PATCH net-next 06/10] net/smc: use read semaphores to reduce unnecessary blocking in smc_buf_create() & smcr_buf_unuse() D. Wythe
2022-08-10 17:47 ` [PATCH net-next 07/10] net/smc: reduce unnecessary blocking in smcr_lgr_reg_rmbs() D. Wythe
2022-08-16  8:24   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 08/10] net/smc: replace mutex rmbs_lock and sndbufs_lock with rw_semaphore D. Wythe
2022-08-16  8:37   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 09/10] net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link() D. Wythe
2022-08-16  8:28   ` Tony Lu
2022-08-10 17:47 ` [PATCH net-next 10/10] net/smc: fix application data exception D. Wythe
2022-08-11  3:28 ` [PATCH net-next 00/10] net/smc: optimize the parallelism of SMC-R connections Jakub Kicinski
2022-08-11  5:13   ` Tony Lu
2022-08-11 12:31 ` Karsten Graul
2022-08-16  9:35 ` Jan Karcher
2022-08-16 12:40   ` Tony Lu [this message]
2022-08-17  4:55   ` D. Wythe
2022-08-17 16:52     ` Jan Karcher
2022-08-18 13:06       ` D. Wythe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YvuQUu/hbIZgJdTG@TonyMac-Alibaba \
    --to=tonylu@linux.alibaba.com \
    --cc=alibuda@linux.alibaba.com \
    --cc=davem@davemloft.net \
    --cc=jaka@linux.ibm.com \
    --cc=kgraul@linux.ibm.com \
    --cc=kuba@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=wenjia@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox