From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 489D0ECAAD8 for ; Fri, 16 Sep 2022 05:17:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229940AbiIPFRD (ORCPT ); Fri, 16 Sep 2022 01:17:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229935AbiIPFRC (ORCPT ); Fri, 16 Sep 2022 01:17:02 -0400 Received: from out30-45.freemail.mail.aliyun.com (out30-45.freemail.mail.aliyun.com [115.124.30.45]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B3577B784; Thu, 15 Sep 2022 22:16:59 -0700 (PDT) X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R991e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018046060;MF=alibuda@linux.alibaba.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---0VPvZOCg_1663305415; Received: from 30.221.149.4(mailfrom:alibuda@linux.alibaba.com fp:SMTPD_---0VPvZOCg_1663305415) by smtp.aliyun-inc.com; Fri, 16 Sep 2022 13:16:57 +0800 Message-ID: <852b0d7b-090e-b24d-7950-b1332b1dcd6c@linux.alibaba.com> Date: Fri, 16 Sep 2022 13:16:54 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Subject: Re: [PATCH net-next v2 01/10] net/smc: remove locks smc_client_lgr_pending and smc_server_lgr_pending Content-Language: en-US To: Jan Karcher , kgraul@linux.ibm.com, wenjia@linux.ibm.com Cc: kuba@kernel.org, davem@davemloft.net, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-rdma@vger.kernel.org References: <688d165fe630989665e5091a28a5b1238123fbdc.1661407821.git.alibuda@linux.alibaba.com> <377011d2-7f9d-bf7f-8366-c579bf42c396@linux.alibaba.com> <1767b6e4-0053-728b-9722-add68da13781@linux.ibm.com> From: "D. Wythe" In-Reply-To: <1767b6e4-0053-728b-9722-add68da13781@linux.ibm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-s390@vger.kernel.org Hi, Jan Thanks a lot for your test, your information is very important! And feel sorry to reply late, this issues is quite hidden but stupid. Here's the problem: We use ini->is_smcd to determine whether the lock needs to be removed in our patch, however, this value is only available after smc_listen_find_device be invoked, Before that, the value is always false (kzalloc). Unfortunately, we had used it before. In other words, it is equivalent that we removed the SMC-D lock but did not do SMC-D link cluster, which will lead to a large number of critical area issues, including abnormal downgrade to SMC-R, residual connection status, etc. Considering that, it seems that we have to do the link cluster for SMC-D too, It won't take much time, I expect to send a new version next week. Thanks D. Wythe On 9/7/22 4:10 PM, Jan Karcher wrote: > > > On 02.09.2022 13:25, D. Wythe wrote: >> >> >> On 8/31/22 11:04 PM, Jan Karcher wrote: >>> >>> >>> On 26.08.2022 11:51, D. Wythe wrote: >>>> From: "D. Wythe" >>>> >>>> This patch attempts to remove locks named smc_client_lgr_pending and >>>> smc_server_lgr_pending, which aim to serialize the creation of link >>>> group. However, once link group existed already, those locks are >>>> meaningless, worse still, they make incoming connections have to be >>>> queued one after the other. >>>> >>>> Now, the creation of link group is no longer generated by competition, >>>> but allocated through following strategy. >>>> >>>> 1. Try to find a suitable link group, if successd, current connection >>>> is considered as NON first contact connection. ends. >>>> >>>> 2. Check the number of connections currently waiting for a suitable >>>> link group to be created, if it is not less that the number of link >>>> groups to be created multiplied by (SMC_RMBS_PER_LGR_MAX - 1), then >>>> increase the number of link groups to be created, current connection >>>> is considered as the first contact connection. ends. >>>> >>>> 3. Increase the number of connections currently waiting, and wait >>>> for woken up. >>>> >>>> 4. Decrease the number of connections currently waiting, goto 1. >>>> >>>> We wake up the connection that was put to sleep in stage 3 through >>>> the SMC link state change event. Once the link moves out of the >>>> SMC_LNK_ACTIVATING state, decrease the number of link groups to >>>> be created, and then wake up at most (SMC_RMBS_PER_LGR_MAX - 1) >>>> connections. >>>> >>>> In the iplementation, we introduce the concept of lnk cluster, which is >>>> a collection of links with the same characteristics (see >>>> smcr_lnk_cluster_cmpfn() with more details), which makes it possible to >>>> wake up efficiently in the scenario of N v.s 1. >>>> >>>> Signed-off-by: D. Wythe >>> >>> Hello D., >>> >>> thanks for the v2 and the patience. >>> I got to testing and as with v1 I want to share our findings with you. If you need more information or want us to look deeper into the findings please let us know. >>> >>> Regarding SMC-R test-suite: >>> We see a refcount error during one of our stress tests. This lets us believe that the smc_link_cluster_put() to smc_link_cluster_hold() ratio is not right anymore. >>> The patch provided by yacan does fix this issue but we did not verify if it is the right way to balance the hold and put calls. >>> >>> [root@t8345011 ~]# journalctl --dmesg | tail -100 >>> Aug 31 16:17:36 t8345011.lnxne.boe smc-tests: test_smcapp_50x_ifdown started >>> Aug 31 16:17:46 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000100 net 1 link removed: id 00000101, peerid 00000101, ibdev mlx5_0, ibport 1 >>> Aug 31 16:17:46 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000100 net 1 state changed: SINGLE, pnetid NET25 >>> Aug 31 16:17:46 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000100 net 1 link added: id 00000103, peerid 00000103, ibdev mlx5_0, ibport 1 >>> Aug 31 16:17:46 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000100 net 1 state changed: ASYMMETRIC_PEER, pnetid NET25 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000100 net 1 link added: id 00000104, peerid 00000104, ibdev mlx5_0, ibport 1 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000100 net 1 state changed: SYMMETRIC, pnetid NET25 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: ------------[ cut here ]------------ >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: refcount_t: underflow; use-after-free. >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: WARNING: CPU: 1 PID: 150 at lib/refcount.c:87 refcount_dec_not_one+0x88/0xa8 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Modules linked in: smc_diag tcp_diag inet_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_ib ism smc ib_uverbs ib_core vfio_ccw mdev s390_trng vfio_iommu_type1 vfio sch_fq_codel configfs ip_tables x_tables ghash_s390 prng chacha_s390 libchacha aes_s390 mlx5_core des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common pkey zcrypt rng_core autofs4 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: CPU: 1 PID: 150 Comm: kworker/1:2 Not tainted 6.0.0-rc2-00493-g91ecd751199f #8 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Hardware name: IBM 8561 T01 701 (z/VM 7.2.0) >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Workqueue: events smc_llc_add_link_work [smc] >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Krnl PSW : 0704c00180000000 000000005b31f32c (refcount_dec_not_one+0x8c/0xa8) >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Krnl GPRS: 00000000ffffffea 0000000000000027 0000000000000026 000000005c3151e0 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:            00000000fee80000 0000038000000001 000000008e0e9a00 000000008de79c24 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:            0000038000000000 000003ff803f05ac 0000000095038360 000000008de79c00 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:            00000000828ca100 0000000095038360 000000005b31f328 0000038000943b50 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Krnl Code: 000000005b31f31c: c02000466122        larl        %r2,000000005bbeb560 >>> 000000005b31f322: c0e500232e53        brasl        %r14,000000005b784fc8 >>> #000000005b31f328: af000000                mc        0,0 >>> >000000005b31f32c: a7280001                lhi        %r2,1 >>> 000000005b31f330: ebeff0a00004        lmg        %r14,%r15,160(%r15) >>> 000000005b31f336: ec223fbf0055        risbg        %r2,%r2,63,191,0 >>> 000000005b31f33c: 07fe                bcr        15,%r14 >>> 000000005b31f33e: 47000700                bc        0,1792 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Call Trace: >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000000005b31f32c>] refcount_dec_not_one+0x8c/0xa8 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: ([<000000005b31f328>] refcount_dec_not_one+0x88/0xa8) >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000003ff803ef16a>] smcr_link_cluster_on_link_state.part.0+0x1ba/0x440 [smc] >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000003ff803f05ac>] smcr_link_clear+0x5c/0x1b0 [smc] >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000003ff803fadf4>] smc_llc_add_link_work+0x43c/0x470 [smc] >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000000005ac1f0e2>] process_one_work+0x1fa/0x478 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000000005ac1f88c>] worker_thread+0x64/0x468 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000000005ac28580>] kthread+0x108/0x110 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000000005abaf2dc>] __ret_from_fork+0x3c/0x58 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000000005b7a4d6a>] ret_from_fork+0xa/0x40 >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: Last Breaking-Event-Address: >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel:  [<000000005b785028>] __warn_printk+0x60/0x68 >> >> Thank you for your test, I need to think about it, please give me some time. >> >> >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: ---[ end trace 0000000000000000 ]--- >>> Aug 31 16:17:55 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000100 net 1 link removed: id 00000103, peerid 00000103, ibdev mlx5_0, ibport 1 >>> [root@t8345011 ~]# >>> >>> >>> >>> Regarding SMC-D test-suite: >>> For SMC-D we also see errors during another stress test. While we expect connections to fall back to TCP due to the limit of parallel connections your patch introduces TCP fallbacks with a new reason. >>> >>> [root@t8345011 ~]# journalctl --dmesg | tail -10 >>> Aug 31 16:30:07 t8345011.lnxne.boe smc-tests: test_oob7_send_multi_urg_at_start started >>> Aug 31 16:30:16 t8345011.lnxne.boe smc-tests: test_oob8_ignore_some_urg_data started >>> Aug 31 16:30:30 t8345011.lnxne.boe smc-tests: test_smc_tool_second started >>> Aug 31 16:30:34 t8345011.lnxne.boe smc-tests: test_tshark started >>> Aug 31 16:30:34 t8345011.lnxne.boe smc-tests: test_smcapp_torture_test started >>> Aug 31 16:30:49 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000401, peerid 00000401, ibdev mlx5_0, ibport 1 >>> Aug 31 16:30:49 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000400 net 1 state changed: SINGLE, pnetid NET25 >>> Aug 31 16:30:49 t8345011.lnxne.boe kernel: TCP: request_sock_TCP: Possible SYN flooding on port 51897. Sending cookies.  Check SNMP counters. >>> Aug 31 16:30:49 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000400 net 1 link added: id 00000402, peerid 00000402, ibdev mlx5_1, ibport 1 >>> Aug 31 16:30:49 t8345011.lnxne.boe kernel: smc: SMC-R lg 00000400 net 1 state changed: SYMMETRIC, pnetid NET25 >>> >>> ^ >>> I am wondering why we see SMC-R dmesgs even if we communicate with SMC-D. Gotta verify that. Can be an error on our side. >> >> This is very weird, is there no such SMC-R dmesgs before apply my PATCH? >> >> I am not sure if there is logic to downgrade SMC-D to SMC-R, maybe it's has related to 0x03010000. >> I need to check the code, the reason will be sent out as soon as possible >> >> >>> [root@t8345011 ~]# >>> [root@t8345011 ~]# smcss >>> ACTIVE         00000 0067005 10.25.45.10:48096       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0067001 10.25.45.10:48060       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0066999 10.25.45.10:48054       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0068762 10.25.45.10:48046       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0066997 10.25.45.10:48044       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0068760 10.25.45.10:48036       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0066995 10.25.45.10:48026       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0068758 10.25.45.10:48024       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0066993 10.25.45.10:48022       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0068756 10.25.45.10:48006       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0066991 10.25.45.10:47998       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0068754 10.25.45.10:47984       10.25.45.11:51897     0000 SMCD >>> ACTIVE         00000 0067124 10.25.45.11:51897       10.25.45.10:48314     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0067121 10.25.45.11:51897       10.25.45.10:48302     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0067120 10.25.45.11:51897       10.25.45.10:48284     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0067114 10.25.45.11:51897       10.25.45.10:48282     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0067115 10.25.45.11:51897       10.25.45.10:48254     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0067111 10.25.45.11:51897       10.25.45.10:48250     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0066415 10.25.45.11:51897       10.25.45.10:48242     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0067113 10.25.45.11:51897       10.25.45.10:48230     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0066409 10.25.45.11:51897       10.25.45.10:48202     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0066413 10.25.45.11:51897       10.25.45.10:48214     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0066414 10.25.45.11:51897       10.25.45.10:48204     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0066397 10.25.45.11:51897       10.25.45.10:48120     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0066399 10.25.45.11:51897       10.25.45.10:48084     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0066396 10.25.45.11:51897       10.25.45.10:48078     0000 TCP 0x05000000/0x030d0000 >>> ACTIVE         00000 0062632 10.25.45.11:51897       10.25.45.10:43120     0000 TCP 0x03010000 >>> ACTIVE         00000 0062631 10.25.45.11:51897       10.25.45.10:43134     0000 TCP 0x03010000 >>> ACTIVE         00000 0062626 10.25.45.11:51897       10.25.45.10:43106     0000 TCP 0x03010000 >>> ACTIVE         00000 0062625 10.25.45.11:51897       10.25.45.10:43138     0000 TCP 0x03010000 >>> ACTIVE         00000 0062621 10.25.45.11:51897       10.25.45.10:43160     0000 TCP 0x03010000 >>> ACTIVE         00000 0061580 10.25.45.11:51897       10.25.45.10:42820     0000 TCP 0x03010000 >>> ACTIVE         00000 0061558 10.25.45.11:51897       10.25.45.10:42792     0000 TCP 0x03010000 >>> ACTIVE         00000 0061549 10.25.45.11:51897       10.25.45.10:42816     0000 TCP 0x03010000 >>> ACTIVE         00000 0061548 10.25.45.11:51897       10.25.45.10:42764     0000 TCP 0x03010000 >>> ACTIVE         00000 0061544 10.25.45.11:51897       10.25.45.10:42804     0000 TCP 0x03010000 >>> ACTIVE         00000 0061543 10.25.45.11:51897       10.25.45.10:42856     0000 TCP 0x03010000 >>> ACTIVE         00000 0061542 10.25.45.11:51897       10.25.45.10:42756     0000 TCP 0x03010000 >>> ACTIVE         00000 0062554 10.25.45.11:51897       10.25.45.10:42852     0000 TCP 0x03010000 >>> ACTIVE         00000 0062553 10.25.45.11:51897       10.25.45.10:42844     0000 TCP 0x03010000 >>> ACTIVE         00000 0062549 10.25.45.11:51897       10.25.45.10:42836     0000 TCP 0x03010000 >>> >>> ^ >>> Here SMCD and 0x05000000/0x030d0000 are expected. But: >>>    [353] smcss confirmed connection of type SMCD >>>    [353] Error: Found TCP fallback due to unexpected reasons: 0x03010000 >> sysctl -w net.ipv4.tcp_syncookies=0 >> >> Can you retry your test after set above configure? When TCP detects a potential flooding attack, >> it will starts syn-cookies to verify traffic. In this case, SMC can't work, and then triggering a fallback with >> error code 0x03010000. >> >> This doesn't seem to be the problem that my PATCH can cause, but my PATCH removes the lock in >> the handshake phase, which may speed up the frequency of your test initiating connections, >> But I can't be sure ... >> >> >>> We also exeperience that the lsmod count stays above 2 even after the testcase finished and takes quite a while before it goes down again (we send a kill signal at the end of our testcase). >> >>> >>> During test (which is fine) >>> >>> [root@t8345011 ~]# lsmod | grep smc >>> smc_diag               16384  0 >>> smc                   225280  2981 ism,smc_diag >>> ib_core               413696  3 smc,ib_uverbs,mlx5_ib >>> >>> Count > 2 even after tests finish! >>> >>> [root@t8345011 ~]# lsmod | grep smc >>> smc_diag               16384  0 >>> smc                   225280  40 ism,smc_diag >>> ib_core               413696  3 smc,ib_uverbs,mlx5_ib >> >>> Let us know if you need any more information. >>> Thanks, Jan >> >> >> This usually means that there are still connections that are not really destroyed, >> can you try this and to see if there are any remaining connections? >> >> smcd linkgroup; #or smcr, it depends, if any, can you show us the connection state (smcss -r or -d) >> >> ps aux | grep D; # check if there is work thread hungs, if any, please show us the /proc/$PID/stack. >> >> >> D. Wythe >> Thanks. >> > > Thank you for the tip with the syncookies. We disabled them on both systems and here is the new output which should be not as confusing. > > For your understanding of the output: > t8345010 is the system driving the tests and is acting as the client in this testcase. > t8345011 is the pair system and the server of this test. > What we are doing is we spawn a lot of connection between the two systems to see what is happening if there is stress (in terms of connection handling) on the system. > > We see the following: > - The driver falls back to SMCR in many occasions. This should not be. Also note the missmatch of numbers of connections handled. There were no other connections beside the test. > >   T8345010 >   > SMC-D Connections Summary >   >   Total connections handled          1012 >   > SMC-R Connections Summary >   >   Total connections handled          1512 > >   T8345011 >   > SMC-D Connections Summary >   >   Total connections handled          1190 >   > SMC-R Connections Summary >   >   Total connections handled          1513 > > > - Linkgroups for the SMCD & SMCR connections are being build up. > >   T8345011 >   > [root@t8345011 ~]# smcd linkgroup >   > LG-ID    VLAN  #Conns  PNET-ID >   > 00000300    0      37  NET25 >   > [root@t8345011 ~]# smcr linkgroup >   > LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID >   > 00000400 SERV     SYM         0       0  NET25 >   > [ 5 more LG 0500-0900] > > > - Linkgroups for the SMCD & SMCR connections are being build down once the clients finish. > - ALL SMCR linkgoups are being cleared completely as expected. They still reside empty for a while which is fine. > - The SMCD linkgroups are NOT cleared all the way. A few connections stay in there (See output above). > - If we perform smcss on the server side those connections are listed there as ACTIVE while the smcss list on the client side is empty. > >   T8345011 >   > [root@t8345011 ~]# smcss >   > State          UID   Inode   Local Address           Peer Address          Intf Mode >   > ACTIVE         00000 0100758 10.25.45.11:40237 10.25.45.10:55790       0000 SMCD >   > [ 36 more ACTIVE connections ] > > > - The remaing ACTIVE connections on the server are reflected in the smcd linkgroup #Conns aswell. > - On the client the lsmod count for the smc module is 39 also reflecting the leftover connections. > >   T8345010 >   > [root@t8345010 tela-kernel]# lsmod |grep smc >   > smc                   225280  39 ism,smc_diag > > > - On the server the lsmod count for the smc module is 79. > >   T8345011 >   > [root@t8345011 ~]# lsmod | grep smc >   > smc                   225280  79 ism,smc_diag > > > - The most important smc_dbg outputs are provided and are showing that the client is pretty clean and the server is still handling ghost connections. > >   T8345011 >   > [root@t8345011 ~]# smc_dbg >   > State          UID   Inode   Local Address           Peer Address          Intf Mode GID              Token            Peer-GID Peer-Token       Linkid >   > ACTIVE         00000 0100758 10.25.45.11:40237 10.25.45.10:55790       0000 SMCD 120014a12e488561 0000890fd0000000 3e0014a32e488561 00008a0bd0000000 00000300 >   > State          UID   Inode   Local Address           Peer Address          Intf Mode Shutd Token    Sndbuf   Rcvbuf   Peerbuf rxprod-Cursor rxcons-Cursor rxFlags txprod-Cursor txcons-Cursor txFlags txprep-Cursor txsent-Cursor txfin-Cursor >   > ACTIVE         00000 0100758 10.25.45.11:40237 10.25.45.10:55790       0000 SMCD  <->  00001611 00004000 0000ffe0 0000ffe0 0000:00000000 0000:00000000 00:00   0000:00000000 0000:00000000 00:00   0000:00000000 0000:00000000 0000:00000000 > > > - Via netstat we see that the server is in a CLOSE_WAIT state for the connections and the client in a FIN_WAIT2 > >   T8345010 >   > [root@t8345010 tela-kernel]# netstat -nta >   > Proto Recv-Q Send-Q Local Address           Foreign Address State >   > tcp        0      0 10.25.45.10:55790       10.25.45.11:40237 FIN_WAIT2 >   T8345011 >   > [root@t8345011 ~]# netstat -nta | grep "40237" >   > tcp        1      0 10.25.45.11:40237       10.25.45.10:55790 CLOSE_WAIT > > > While I'm pretty new to the mailing list we had a discussion about how to provide the log data in a reasonable way. > To prevent too much information we decided to go for the short output on top. If that is not enough for you shot me a message and i can send you the full output outside the mailing list. > If you have any ideas on how to provide larger output in a reasonable way feel free to share your oppinion. > > I hope the new output helps you locating the error. > Feel free to contact us in case you have questions. > - Jan