* Question regarding CDC NCM and VNC performance issue @ 2023-12-04 18:37 Hiago De Franco 2023-12-05 9:50 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-04 18:37 UTC (permalink / raw) To: linux-usb Cc: Oliver Neukum, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Hiago De Franco From: Hiago De Franco <hiago.franco@toradex.com> Hi, I hope this email finds you well. I would like to ask a question regarding the usage of CDC NCM and VCN on an ARM device running the Linux kernel, connected to a host PC, also running Linux, via USB. I have successfully set up a CDC NCM connection, where my device acts as a DHCP server, and the connection is established without any issues. Now, I have a VNC server running on the device. However, when I connect to the VNC server, the performance is extremely slow and it also freezes, making it unusable. Interestingly, when I simultaneously launch iperf3 using the same NCM connection, the VNC performance improves significantly. The VNC remains smooth as long as iperf3 is active. When iperf3 finishes, the VNC returns to its frozen or slow state. I've tested the same setup with Windows 11, and in that case, the VNC operates normally without any issues. I've also compared the network traffic of both Windows and Linux connections using Wireshark but haven't identified anything relevant. Has anyone encountered a similar issue before? Could this be related to the size or quantity of transmitted packages? Any insights or assistance on this matter would be greatly appreciated. Thanks, Hiago ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-04 18:37 Question regarding CDC NCM and VNC performance issue Hiago De Franco @ 2023-12-05 9:50 ` Oliver Neukum 2023-12-05 12:22 ` Hiago De Franco 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-05 9:50 UTC (permalink / raw) To: Hiago De Franco, linux-usb Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Hiago De Franco On 04.12.23 19:37, Hiago De Franco wrote: > > Has anyone encountered a similar issue before? Could this be related to > the size or quantity of transmitted packages? At first thought my gut feeling is that the packet bonding is killing your performance. What does a simple ping do? Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-05 9:50 ` Oliver Neukum @ 2023-12-05 12:22 ` Hiago De Franco 2023-12-05 12:38 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-05 12:22 UTC (permalink / raw) To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb, pabeni From: Hiago De Franco <hiago.franco@toradex.com> Hi Oliver, > At first thought my gut feeling is that the packet bonding is killing > your performance. What does a simple ping do? Thanks for the reply. Pinging the device works with or without the VNC connected. $ ping 192.168.11.2 PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=0.027 ms 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=0.092 ms 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=0.072 ms ... 64 bytes from 192.168.11.2: icmp_seq=37 ttl=64 time=0.065 ms 64 bytes from 192.168.11.2: icmp_seq=38 ttl=64 time=0.070 ms 64 bytes from 192.168.11.2: icmp_seq=39 ttl=64 time=0.090 ms ^C --- 192.168.11.2 ping statistics --- 39 packets transmitted, 39 received, 0% packet loss, time 38910ms rtt min/avg/max/mdev = 0.027/0.076/0.105/0.015 ms During the ping, nothing changes, VNC continues to be extremely slow. I switched from NCM to ECM for testing purposes. On ECM, everything works seamlessly without any issues. It appears that the problem may be specific to the combination of NCM and the Linux host machine, as the setup works fine on Windows. I'm curious if there are issues with NCM and small USB packet lengths. The VNC application tends to send numerous small packets (1514 bytes). Regards, Hiago ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-05 12:22 ` Hiago De Franco @ 2023-12-05 12:38 ` Oliver Neukum 2023-12-05 12:55 ` Hiago De Franco 2023-12-05 20:14 ` Hiago De Franco 0 siblings, 2 replies; 29+ messages in thread From: Oliver Neukum @ 2023-12-05 12:38 UTC (permalink / raw) To: Hiago De Franco, oneukum Cc: davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 05.12.23 13:22, Hiago De Franco wrote: Hi, > During the ping, nothing changes, VNC continues to be extremely slow. I was wondering whether you ping itself has bad delays. > I switched from NCM to ECM for testing purposes. On ECM, everything works > seamlessly without any issues. It appears that the problem may be specific > to the combination of NCM and the Linux host machine, as the setup works > fine on Windows. > > I'm curious if there are issues with NCM and small USB packet lengths. > The VNC application tends to send numerous small packets (1514 bytes). 1514 is a standard ethernet packet. What is wNtbOutMaxDatagrams on your device? Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-05 12:38 ` Oliver Neukum @ 2023-12-05 12:55 ` Hiago De Franco 2023-12-05 20:14 ` Hiago De Franco 1 sibling, 0 replies; 29+ messages in thread From: Hiago De Franco @ 2023-12-05 12:55 UTC (permalink / raw) To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb, pabeni From: Hiago De Franco <hiago.franco@toradex.com> Hi Oliver, In my last reply, I sent the ping command with the wrong target. This is the correct one: $ ping 192.168.11.1 PING 192.168.11.1 (192.168.11.1) 56(84) bytes of data. ... 64 bytes from 192.168.11.1: icmp_seq=23 ttl=64 time=2.23 ms 64 bytes from 192.168.11.1: icmp_seq=24 ttl=64 time=2.27 ms 64 bytes from 192.168.11.1: icmp_seq=25 ttl=64 time=2.14 ms 64 bytes from 192.168.11.1: icmp_seq=26 ttl=64 time=2.24 ms 64 bytes from 192.168.11.1: icmp_seq=27 ttl=64 time=2.26 ms 64 bytes from 192.168.11.1: icmp_seq=28 ttl=64 time=2.14 ms 64 bytes from 192.168.11.1: icmp_seq=29 ttl=64 time=3106 ms 64 bytes from 192.168.11.1: icmp_seq=30 ttl=64 time=2097 ms 64 bytes from 192.168.11.1: icmp_seq=31 ttl=64 time=1073 ms 64 bytes from 192.168.11.1: icmp_seq=32 ttl=64 time=48.7 ms 64 bytes from 192.168.11.1: icmp_seq=33 ttl=64 time=2791 ms 64 bytes from 192.168.11.1: icmp_seq=34 ttl=64 time=1781 ms 64 bytes from 192.168.11.1: icmp_seq=35 ttl=64 time=757 ms 64 bytes from 192.168.11.1: icmp_seq=36 ttl=64 time=6.48 ms 64 bytes from 192.168.11.1: icmp_seq=37 ttl=64 time=5828 ms 64 bytes from 192.168.11.1: icmp_seq=38 ttl=64 time=4821 ms 64 bytes from 192.168.11.1: icmp_seq=39 ttl=64 time=3793 ms 64 bytes from 192.168.11.1: icmp_seq=40 ttl=64 time=2773 ms 64 bytes from 192.168.11.1: icmp_seq=41 ttl=64 time=1745 ms 64 bytes from 192.168.11.1: icmp_seq=42 ttl=64 time=725 ms 64 bytes from 192.168.11.1: icmp_seq=43 ttl=64 time=6.11 ms 64 bytes from 192.168.11.1: icmp_seq=44 ttl=64 time=2467 ms 64 bytes from 192.168.11.1: icmp_seq=45 ttl=64 time=1457 ms 64 bytes from 192.168.11.1: icmp_seq=46 ttl=64 time=437 ms 64 bytes from 192.168.11.1: icmp_seq=47 ttl=64 time=8.34 ms 64 bytes from 192.168.11.1: icmp_seq=48 ttl=64 time=2403 ms 64 bytes from 192.168.11.1: icmp_seq=49 ttl=64 time=1397 ms 64 bytes from 192.168.11.1: icmp_seq=50 ttl=64 time=373 ms 64 bytes from 192.168.11.1: icmp_seq=51 ttl=64 time=3212 ms 64 bytes from 192.168.11.1: icmp_seq=52 ttl=64 time=2197 ms 64 bytes from 192.168.11.1: icmp_seq=53 ttl=64 time=1169 ms 64 bytes from 192.168.11.1: icmp_seq=54 ttl=64 time=145 ms 64 bytes from 192.168.11.1: icmp_seq=55 ttl=64 time=7.27 ms 64 bytes from 192.168.11.1: icmp_seq=56 ttl=64 time=2303 ms 64 bytes from 192.168.11.1: icmp_seq=57 ttl=64 time=1301 ms 64 bytes from 192.168.11.1: icmp_seq=58 ttl=64 time=273 ms 64 bytes from 192.168.11.1: icmp_seq=59 ttl=64 time=2.01 ms 64 bytes from 192.168.11.1: icmp_seq=60 ttl=64 time=2335 ms 64 bytes from 192.168.11.1: icmp_seq=61 ttl=64 time=1333 ms 64 bytes from 192.168.11.1: icmp_seq=62 ttl=64 time=305 ms 64 bytes from 192.168.11.1: icmp_seq=63 ttl=64 time=2.14 ms 64 bytes from 192.168.11.1: icmp_seq=64 ttl=64 time=2.26 ms 64 bytes from 192.168.11.1: icmp_seq=65 ttl=64 time=2.25 ms 64 bytes from 192.168.11.1: icmp_seq=66 ttl=64 time=2.19 ms 64 bytes from 192.168.11.1: icmp_seq=67 ttl=64 time=2.20 ms 64 bytes from 192.168.11.1: icmp_seq=68 ttl=64 time=2.24 ms 64 bytes from 192.168.11.1: icmp_seq=69 ttl=64 time=2.21 ms 64 bytes from 192.168.11.1: icmp_seq=70 ttl=64 time=2.11 ms 64 bytes from 192.168.11.1: icmp_seq=71 ttl=64 time=2.19 ms 64 bytes from 192.168.11.1: icmp_seq=72 ttl=64 time=2.19 ms ^C --- 192.168.11.1 ping statistics --- 72 packets transmitted, 72 received, 0% packet loss, time 71445ms rtt min/avg/max/mdev = 2.011/757.798/5827.620/1266.170 ms, pipe 6 So everything is going well, around 2.1 ms, until I connect to VNC. When this happens, the ping spikes to 1000~2000 ms. Then when I close the connection, it gets back to ~2 ms. I can also see some "TCP Retransmission" on Wireshark. On 05.12.23 13:38, Oliver Neukum wrote: > 1514 is a standard ethernet packet. What is wNtbOutMaxDatagrams on your > device? I couldn't find this file on my device. On my Host Linux machine it reports 0. Regards, Hiago ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-05 12:38 ` Oliver Neukum 2023-12-05 12:55 ` Hiago De Franco @ 2023-12-05 20:14 ` Hiago De Franco 2023-12-06 12:21 ` Oliver Neukum 1 sibling, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-05 20:14 UTC (permalink / raw) To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb, pabeni From: Hiago De Franco <hiago.franco@toradex.com> Hi Oliver, Following up on your initial email, I found that adjusting the NTB size successfully resolved the problem on both Linux and Windows. The VNC client is now operating with speed and stability after dividing the package size by 8. I know that altering the NTB size is not the correct solution, please consider the attached patch as an explanation, but the assumption that the issue may be related to an accumulation of small packages, causing delays in fulfilling the 16KB requirement, seems related. It appears that the proper fix involves the `hrtimer_start` function on line 1084 of `f_ncm.c`. However, I'm not sure what I could do to fix it. Thanks and regards, Hiago. drivers/usb/gadget/function/f_ncm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index cc0ed29a4adc..138982da794a 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -88,8 +88,8 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f) * If the host can group frames, allow it to do that, 16K is selected, * because it's used by default by the current linux host driver */ -#define NTB_DEFAULT_IN_SIZE 16384 -#define NTB_OUT_SIZE 16384 +#define NTB_DEFAULT_IN_SIZE 2048 +#define NTB_OUT_SIZE 2048 /* Allocation for storing the NDP, 32 should suffice for a * 16k packet. This allows a maximum of 32 * 507 Byte packets to -- 2.42.0 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-05 20:14 ` Hiago De Franco @ 2023-12-06 12:21 ` Oliver Neukum 2023-12-06 16:23 ` Hiago De Franco 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-06 12:21 UTC (permalink / raw) To: Hiago De Franco; +Cc: davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 05.12.23 21:14, Hiago De Franco wrote: > From: Hiago De Franco <hiago.franco@toradex.com> > > Hi Oliver, > > Following up on your initial email, I found that adjusting the NTB size > successfully resolved the problem on both Linux and Windows. The VNC > client is now operating with speed and stability after dividing the > package size by 8. OK, that indicates that your performance is bad because your VNC client was only partially filling the buffers, which caused most of your packets to be transmitted only upon a timeout. At least this is my working hypothesis. > I know that altering the NTB size is not the correct solution, please > consider the attached patch as an explanation, but the assumption that the > issue may be related to an accumulation of small packages, causing delays > in fulfilling the 16KB requirement, seems related. Before we proceed, however, your patch raises a fundamental question: Are we talking about gadget or host mode here? Could you please describe your setup in detail? Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-06 12:21 ` Oliver Neukum @ 2023-12-06 16:23 ` Hiago De Franco 2023-12-07 9:41 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-06 16:23 UTC (permalink / raw) To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb, pabeni From: Hiago De Franco <hiago.franco@toradex.com> Hi Oliver, On 06.12.23 13:21, Oliver Neukum wrote: > Before we proceed, however, your patch raises a fundamental question: > Are we talking about gadget or host mode here? > Could you please describe your setup in detail? Sure, I'm running a Apalis iMX6QD 512MB, which is basically an ARM 32-bits device with NXP iMX6QD. This device is running Linux with wayland, weston and the neatvnc server, so I can access my app using a vnc client on my host PC (also running Linux, Debian 12). The device is connected to my host pc using a USB cable. The gadget is my ARM device, which I'm configuring using the libusbgx to create a new gadget called 'gncm'. The host, in this case, is my Debian PC. My device is automatically configured to use the ip 192.168.11.1 and give to my PC, using the udhcpd from busybox, the ip 192.168.11.2. Everything works, when I connect the ARM device to my PC with the USB cable, my Debian automatically detects as a new interface and get the 192.168.11.2 ip: 39: enx4224347674f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 42:24:34:76:74:f3 brd ff:ff:ff:ff:ff:ff inet 192.168.11.2/24 brd 192.168.11.255 scope global dynamic noprefixroute enx4224347674f3 valid_lft 863996sec preferred_lft 863996sec inet6 fe80::8959:d7da:f1fc:774e/64 scope link noprefixroute valid_lft forever preferred_lft forever While on the device: / # 3: usb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 7a:c5:38:46:21:e5 brd ff:ff:ff:ff:ff:ff inet 192.168.11.1/24 scope global usb0 valid_lft forever preferred_lft forever / # cat /sys/class/udc/ci_hdrc.0/function gncm / # ls /sys/kernel/config/usb_gadget/gncm/ UDC bMaxPacketSize0 functions os_desc bDeviceClass bcdDevice idProduct strings bDeviceProtocol bcdUSB idVendor bDeviceSubClass configs max_speed Finally, I open a VNC client on my host PC and connect to 192.168.11.1, where the issue begins with it being very slow/frozen. After applying the patch on the device (Apalis iMX6), now VNC runs smoothly. Now I will also try to debug the driver on the PC host side. Thanks and regards, Hiago ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-06 16:23 ` Hiago De Franco @ 2023-12-07 9:41 ` Oliver Neukum 2023-12-07 10:03 ` Francesco Dolcini 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-07 9:41 UTC (permalink / raw) To: Hiago De Franco, oneukum, Maciej Żenczykowski Cc: davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 06.12.23 17:23, Hiago De Franco wrote: > From: Hiago De Franco <hiago.franco@toradex.com> > Good morning, > Hi Oliver, > > On 06.12.23 13:21, Oliver Neukum wrote: >> Before we proceed, however, your patch raises a fundamental question: >> Are we talking about gadget or host mode here? >> Could you please describe your setup in detail? > > Sure, I'm running a Apalis iMX6QD 512MB, which is basically an ARM 32-bits > device with NXP iMX6QD. This device is running Linux with wayland, weston > and the neatvnc server, so I can access my app using a vnc client on my > host PC (also running Linux, Debian 12). OK, you are using Linux on _both_ sides. Interesting, far from obvious, though. (Putting Maciej into CC) > Finally, I open a VNC client on my host PC and connect to 192.168.11.1, > where the issue begins with it being very slow/frozen. After applying the > patch on the device (Apalis iMX6), now VNC runs smoothly. For debugging purposes your first patch modifies the result of USB_CDC_GET_NTB_PARAMETERS. Thereby both the host and the gadget change their behavior. Hence we cannot tell which side has a problem with its buffering at the old buffer size. Can you tell us which side in your test case produces many small packets? Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in f_ncm.c by an order of magnitude. Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-07 9:41 ` Oliver Neukum @ 2023-12-07 10:03 ` Francesco Dolcini 2023-12-07 11:07 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Francesco Dolcini @ 2023-12-07 10:03 UTC (permalink / raw) To: Oliver Neukum Cc: Hiago De Franco, Maciej Żenczykowski, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni Hello Oliver and Maciej On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > On 06.12.23 17:23, Hiago De Franco wrote: > > From: Hiago De Franco <hiago.franco@toradex.com> > > On 06.12.23 13:21, Oliver Neukum wrote: > > > Before we proceed, however, your patch raises a fundamental question: > > > Are we talking about gadget or host mode here? > > > Could you please describe your setup in detail? > > > > Sure, I'm running a Apalis iMX6QD 512MB, which is basically an ARM 32-bits > > device with NXP iMX6QD. This device is running Linux with wayland, weston > > and the neatvnc server, so I can access my app using a vnc client on my > > host PC (also running Linux, Debian 12). > > OK, you are using Linux on _both_ sides. Interesting, far from obvious, though. > (Putting Maciej into CC) One more data point. If the USB host side is Windows and not Linux it works fine. Francesco ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-07 10:03 ` Francesco Dolcini @ 2023-12-07 11:07 ` Oliver Neukum 2023-12-07 11:38 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-07 11:07 UTC (permalink / raw) To: Francesco Dolcini, Oliver Neukum Cc: Hiago De Franco, Maciej Żenczykowski, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 07.12.23 11:03, Francesco Dolcini wrote: > Hello Oliver and Maciej > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: >> OK, you are using Linux on _both_ sides. Interesting, far from obvious, though. >> (Putting Maciej into CC) > > One more data point. If the USB host side is Windows and not Linux it > works fine. That suggests, but does not prove that the issue is on the host side. Could you post the result of "ethtool -S" after a test run? We should get statistics on the reasons for transmissions that way. Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-07 11:07 ` Oliver Neukum @ 2023-12-07 11:38 ` Maciej Żenczykowski 2023-12-07 18:57 ` Hiago De Franco 0 siblings, 1 reply; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-07 11:38 UTC (permalink / raw) To: Oliver Neukum Cc: Francesco Dolcini, Hiago De Franco, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Thu, Dec 7, 2023 at 12:07 PM Oliver Neukum <oneukum@suse.com> wrote: > On 07.12.23 11:03, Francesco Dolcini wrote: > > Hello Oliver and Maciej > > > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > > >> OK, you are using Linux on _both_ sides. Interesting, far from obvious, though. > >> (Putting Maciej into CC) > > > > One more data point. If the USB host side is Windows and not Linux it > > works fine. > > That suggests, but does not prove that the issue is on the host side. > Could you post the result of "ethtool -S" after a test run? We should > get statistics on the reasons for transmissions that way. An every 1s (the default) ping is too rare to be of help I'd assume... Try ping with various intervals (-i). All the way down to a flood ping (-f). Most likely -i 0.01 would be enough to make things work better... Also which specific versions of the kernel are involved on both sides of the connection. There was a pretty recent fix related to packet aggregation recently that could be either the fix or the cause. "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call" Though I doubt it - I believe that was specific to how windows packs things. Also Krishna Kurapati has a (afaik still not merged) patch "usb: gadget: ncm: Add support to configure wMaxSegmentSize" that could be of use - though again, doubt it. Another thing that comes to mind, is that perhaps the device in question does not have sufficiently high res timers? There might be something in the kernel boot log / dmesg about hrtimer resolution... Maybe this just needs to be configurable... Or pick a smaller value with broken hrtimer (if that's the issue), or just disable aggregation if lowres hrtimers... etc... #define TX_TIMEOUT_NSECS 300000 300 us is too small to be noticeable by VNC imho, so I think something *must* be misbehaving. Perhaps timer resolution is bad and this 300us ends up being much larger...??? I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. There could potentially be an issue with suspend, though I really doubt it. Another idea would be to add a gadget setting to disable tx aggregation entirely... (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...) Have you tried reducing from 8000 to 4000 or 3500 or 3000? Maybe there's some funkiness with page sizes?? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-07 11:38 ` Maciej Żenczykowski @ 2023-12-07 18:57 ` Hiago De Franco 2023-12-07 19:37 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-07 18:57 UTC (permalink / raw) To: Oliver Neukum, Maciej Żenczykowski Cc: Francesco Dolcini, Hiago De Franco, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni Hello Oliver and Maciej, On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > Can you tell us which side in your test case produces many small packets? > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in > f_ncm.c by an order of magnitude. The device side, Apalis iMX6, is the one producing the small packages and sending to my host PC. The VNC server is running on iMX6 device and, the client, on my host PC. I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing changed, the behaviour is the same. On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: > That suggests, but does not prove that the issue is on the host side. > Could you post the result of "ethtool -S" after a test run? We should > get statistics on the reasons for transmissions that way. Sure, for some reason I couldn't run ethtool on the iMX6 device: / # ethtool -S usb0 no stats available So I ran everything on my Debian host PC. First, without any changes on the device's kernel, this is the result (where the VNC is really slow/frozen): $ sudo ethtool -S enx3e5dcdead75e NIC statistics: tx_reason_ntb_full: 0 tx_reason_ndp_full: 0 tx_reason_timeout: 222 tx_reason_max_datagram: 0 tx_overhead: 42387 tx_ntbs: 222 rx_overhead: 38375 rx_ntbs: 529 Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to 8192. The performance improved a bit, and this is the result: $ sudo ethtool -S enx42ff68c1000a NIC statistics: tx_reason_ntb_full: 0 tx_reason_ndp_full: 0 tx_reason_timeout: 321 tx_reason_max_datagram: 0 tx_overhead: 61617 tx_ntbs: 321 rx_overhead: 59341 rx_ntbs: 759 Finally, I changed from 8192 to 4096, and the perfomance was better: $ sudo ethtool -S enx3a601e306de1 NIC statistics: tx_reason_ntb_full: 0 tx_reason_ndp_full: 0 tx_reason_timeout: 56067 tx_reason_max_datagram: 0 tx_overhead: 83630876 tx_ntbs: 56064 rx_overhead: 25437595 rx_ntbs: 847920 At 4096 I can use the VNC with my app, click on buttons and see the mouse moving smoothly. Please note the device name changes because we're using random MAC addresses. 'ethtool' was running on my Debian host PC. I tested for 1min30s and then got the statics with ethtool for all 3 tests. On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote: > > An every 1s (the default) ping is too rare to be of help I'd assume... > Try ping with various intervals (-i). All the way down to a flood ping (-f). > Most likely -i 0.01 would be enough to make things work better... This one is interesting, backing to the default value of 16384, I launched the VNC client which now is back to be slow/frozen. In parallel, I started the ping command. The first one with 1 second results the following (I'm pingging my host PC, using my device's terminal): / # ping 192.168.11.2 -i 1 PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms ^C --- 192.168.11.2 ping statistics --- 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5 Ping is really slow and lost almost all packets. Next test, I decreased to 0.1s: / # ping 192.168.11.2 -i 0.1 PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. ... --- 192.168.11.2 ping statistics --- 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26 While ping is running in parallel, VNC has a better performance, I can see my mouse running and click on some buttons. As soon as ping stops, VNC is slow/frozen again. Also we have less packet loss. Next test, decreased to 0.01: / # ping 192.168.11.2 -i 0.01 PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. ... --- 192.168.11.2 ping statistics --- 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28 And finally, the flood: / # ping 192.168.11.2 -f PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. ... --- 192.168.11.2 ping statistics --- 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma 12.413/7.802 ms While the flood is happening in parallel, the VNC runs very smoothly, and, again, as soon as it stops, it's back to slow/frozen. I believe here the ping command is helping to fullfil the buffer, that's why running it on parallel makes the VNC work... > Also which specific versions of the kernel are involved on both sides > of the connection. Device iMX6 is running Linux kernel v6.1.65, while my host PC is running Linux kernel v6.5.0. > There was a pretty recent fix related to packet aggregation recently > that could be either the fix or the cause. > "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call" > Though I doubt it - I believe that was specific to how windows packs things. > > Also Krishna Kurapati has a (afaik still not merged) patch "usb: > gadget: ncm: Add support to configure wMaxSegmentSize" > that could be of use - though again, doubt it. I could also try to apply these patches and check how it goes. Thanks for the information. > Another thing that comes to mind, is that perhaps the device in > question does not have sufficiently high res timers? > There might be something in the kernel boot log / dmesg about hrtimer > resolution... > Maybe this just needs to be configurable... Or pick a smaller value > with broken hrtimer (if that's the issue), > or just disable aggregation if lowres hrtimers... etc... > > #define TX_TIMEOUT_NSECS 300000 > 300 us is too small to be noticeable by VNC imho, so I think something > *must* be misbehaving. > Perhaps timer resolution is bad and this 300us ends up being much larger...??? This is what I got from dmesg inside the iMX6 device: / # dmesg | grep timer [ 0.000000] Switching to timer-based delay loop, resolution 333ns [ 0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 637086815595 ns [ 0.001545] Calibrating delay loop (skipped), value calculated using timer frequency.. 6.00 BogoMIPS (lpj=30000) [ 0.203469] clocksource: Switched to clocksource mxc_timer1 > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME > instead of CLOCK_MONOTONIC. > There could potentially be an issue with suspend, though I really doubt it. Also tested this, but it didn't change anything, VNC is still slow/frozen. > Another idea would be to add a gadget setting to disable tx > aggregation entirely... > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...) > > Have you tried reducing from 8000 to 4000 or 3500 or 3000? > Maybe there's some funkiness with page sizes?? Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE? If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I reduced the value, it got better and better, VNC running smoothly. I will keep testing to see if I get something. Thanks and regards, Hiago. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-07 18:57 ` Hiago De Franco @ 2023-12-07 19:37 ` Maciej Żenczykowski 2023-12-11 20:28 ` Hiago De Franco 0 siblings, 1 reply; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-07 19:37 UTC (permalink / raw) To: Hiago De Franco Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > Hello Oliver and Maciej, > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > > Can you tell us which side in your test case produces many small packets? > > > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in > > f_ncm.c by an order of magnitude. > > The device side, Apalis iMX6, is the one producing the small packages > and sending to my host PC. The VNC server is running on iMX6 device and, > the client, on my host PC. > > I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing > changed, the behaviour is the same. > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: > > That suggests, but does not prove that the issue is on the host side. > > Could you post the result of "ethtool -S" after a test run? We should > > get statistics on the reasons for transmissions that way. > > Sure, for some reason I couldn't run ethtool on the iMX6 device: > > / # ethtool -S usb0 > no stats available > > So I ran everything on my Debian host PC. First, without any changes on > the device's kernel, this is the result (where the VNC is really > slow/frozen): > > $ sudo ethtool -S enx3e5dcdead75e > NIC statistics: > tx_reason_ntb_full: 0 > tx_reason_ndp_full: 0 > tx_reason_timeout: 222 > tx_reason_max_datagram: 0 > tx_overhead: 42387 > tx_ntbs: 222 > rx_overhead: 38375 > rx_ntbs: 529 > > Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to > 8192. The performance improved a bit, and this is the result: > > $ sudo ethtool -S enx42ff68c1000a > NIC statistics: > tx_reason_ntb_full: 0 > tx_reason_ndp_full: 0 > tx_reason_timeout: 321 > tx_reason_max_datagram: 0 > tx_overhead: 61617 > tx_ntbs: 321 > rx_overhead: 59341 > rx_ntbs: 759 > > Finally, I changed from 8192 to 4096, and the perfomance was > better: > > $ sudo ethtool -S enx3a601e306de1 > NIC statistics: > tx_reason_ntb_full: 0 > tx_reason_ndp_full: 0 > tx_reason_timeout: 56067 > tx_reason_max_datagram: 0 > tx_overhead: 83630876 > tx_ntbs: 56064 > rx_overhead: 25437595 > rx_ntbs: 847920 > > At 4096 I can use the VNC with my app, click on buttons and see the mouse > moving smoothly. Please note the device name changes because we're using > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested > for 1min30s and then got the statics with ethtool for all 3 tests. > > On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote: > > > > An every 1s (the default) ping is too rare to be of help I'd assume... > > Try ping with various intervals (-i). All the way down to a flood ping (-f). > > Most likely -i 0.01 would be enough to make things work better... > > This one is interesting, backing to the default value of 16384, I > launched the VNC client which now is back to be slow/frozen. In > parallel, I started the ping command. The first one with 1 second > results the following (I'm pingging my host PC, using my device's > terminal): > > / # ping 192.168.11.2 -i 1 > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms > 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms > 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms > 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms > 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms > ^C > --- 192.168.11.2 ping statistics --- > 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms > rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5 > > Ping is really slow and lost almost all packets. It is *slow*. But I don't think you're actually losing packets. They simply haven't arrived *yet*. Notice that the packets are sent every second, but they arrive after ~5/~4/~3/~2/~1 s. This suggests they all got delivered in one burst. You are missing 21 packets, but it's all the *final* 21 packets. You'd need to run for a long long time (maybe 5 minutes or even more) and see what happens then... to see if the packets are indeed lost, or just not yet delivered... > Next test, I decreased > to 0.1s: > > / # ping 192.168.11.2 -i 0.1 > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > ... > --- 192.168.11.2 ping statistics --- > 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms > rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26 > > While ping is running in parallel, VNC has a better performance, I can > see my mouse running and click on some buttons. As soon as ping stops, > VNC is slow/frozen again. Also we have less packet loss. Again - this packet loss may be just at the tail. You'd need to run it for much much longer to see if packet loss is high or not. It may just be that you always lose the last ~7 packets or so... > > Next test, decreased to 0.01: > > / # ping 192.168.11.2 -i 0.01 > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > ... > --- 192.168.11.2 ping statistics --- > 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms > rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28 > > And finally, the flood: > > / # ping 192.168.11.2 -f > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > ... > --- 192.168.11.2 ping statistics --- > 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms > rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma > 12.413/7.802 ms > > While the flood is happening in parallel, the VNC runs very smoothly, > and, again, as soon as it stops, it's back to slow/frozen. > > I believe here the ping command is helping to fullfil the buffer, that's > why running it on parallel makes the VNC work... > > > Also which specific versions of the kernel are involved on both sides > > of the connection. > > Device iMX6 is running Linux kernel v6.1.65, while my host PC is running > Linux kernel v6.5.0. > > > There was a pretty recent fix related to packet aggregation recently > > that could be either the fix or the cause. > > "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call" > > Though I doubt it - I believe that was specific to how windows packs things. > > > > Also Krishna Kurapati has a (afaik still not merged) patch "usb: > > gadget: ncm: Add support to configure wMaxSegmentSize" > > that could be of use - though again, doubt it. > > I could also try to apply these patches and check how it goes. Thanks > for the information. > > > Another thing that comes to mind, is that perhaps the device in > > question does not have sufficiently high res timers? > > There might be something in the kernel boot log / dmesg about hrtimer > > resolution... > > Maybe this just needs to be configurable... Or pick a smaller value > > with broken hrtimer (if that's the issue), > > or just disable aggregation if lowres hrtimers... etc... > > > > #define TX_TIMEOUT_NSECS 300000 > > 300 us is too small to be noticeable by VNC imho, so I think something > > *must* be misbehaving. > > Perhaps timer resolution is bad and this 300us ends up being much larger...??? > > This is what I got from dmesg inside the iMX6 device: > > / # dmesg | grep timer > [ 0.000000] Switching to timer-based delay loop, resolution 333ns > [ 0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: > 0xffffffff, max_idle_ns: 637086815595 ns > [ 0.001545] Calibrating delay loop (skipped), value calculated using > timer frequency.. 6.00 BogoMIPS (lpj=30000) > [ 0.203469] clocksource: Switched to clocksource mxc_timer1 > > > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME > > instead of CLOCK_MONOTONIC. > > There could potentially be an issue with suspend, though I really doubt it. > > Also tested this, but it didn't change anything, VNC is still > slow/frozen. > > > Another idea would be to add a gadget setting to disable tx > > aggregation entirely... > > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...) > > > > Have you tried reducing from 8000 to 4000 or 3500 or 3000? > > Maybe there's some funkiness with page sizes?? > > Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE? I'm not sure ;-) > If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I > reduced the value, it got better and better, VNC running smoothly. > > I will keep testing to see if I get something. > > Thanks and regards, > Hiago. I think we need to figure out whether the problem is host->device or device->host. Or perhaps you do already know this?? Maybe this was mentioned already earlier in the thread and I missed it. Try reducing only in or only out... and changing only one side of the connection. Maybe change TX_MAX_NUM_DPE to 1 (or other values) Maybe change if (ncm->skb_tx_data && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE || (ncm->skb_tx_data->len + div + rem + skb->len + ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len)) > max_size)) { to just if (ncm->skb_tx_data) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-07 19:37 ` Maciej Żenczykowski @ 2023-12-11 20:28 ` Hiago De Franco 2023-12-11 20:44 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-11 20:28 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote: > On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > Hello Oliver and Maciej, > > > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > > > Can you tell us which side in your test case produces many small packets? > > > > > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in > > > f_ncm.c by an order of magnitude. > > > > The device side, Apalis iMX6, is the one producing the small packages > > and sending to my host PC. The VNC server is running on iMX6 device and, > > the client, on my host PC. > > > > I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing > > changed, the behaviour is the same. > > > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: > > > That suggests, but does not prove that the issue is on the host side. > > > Could you post the result of "ethtool -S" after a test run? We should > > > get statistics on the reasons for transmissions that way. > > > > Sure, for some reason I couldn't run ethtool on the iMX6 device: > > > > / # ethtool -S usb0 > > no stats available > > > > So I ran everything on my Debian host PC. First, without any changes on > > the device's kernel, this is the result (where the VNC is really > > slow/frozen): > > > > $ sudo ethtool -S enx3e5dcdead75e > > NIC statistics: > > tx_reason_ntb_full: 0 > > tx_reason_ndp_full: 0 > > tx_reason_timeout: 222 > > tx_reason_max_datagram: 0 > > tx_overhead: 42387 > > tx_ntbs: 222 > > rx_overhead: 38375 > > rx_ntbs: 529 > > > > Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to > > 8192. The performance improved a bit, and this is the result: > > > > $ sudo ethtool -S enx42ff68c1000a > > NIC statistics: > > tx_reason_ntb_full: 0 > > tx_reason_ndp_full: 0 > > tx_reason_timeout: 321 > > tx_reason_max_datagram: 0 > > tx_overhead: 61617 > > tx_ntbs: 321 > > rx_overhead: 59341 > > rx_ntbs: 759 > > > > Finally, I changed from 8192 to 4096, and the perfomance was > > better: > > > > $ sudo ethtool -S enx3a601e306de1 > > NIC statistics: > > tx_reason_ntb_full: 0 > > tx_reason_ndp_full: 0 > > tx_reason_timeout: 56067 > > tx_reason_max_datagram: 0 > > tx_overhead: 83630876 > > tx_ntbs: 56064 > > rx_overhead: 25437595 > > rx_ntbs: 847920 > > > > At 4096 I can use the VNC with my app, click on buttons and see the mouse > > moving smoothly. Please note the device name changes because we're using > > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested > > for 1min30s and then got the statics with ethtool for all 3 tests. > > > > On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote: > > > > > > An every 1s (the default) ping is too rare to be of help I'd assume... > > > Try ping with various intervals (-i). All the way down to a flood ping (-f). > > > Most likely -i 0.01 would be enough to make things work better... > > > > This one is interesting, backing to the default value of 16384, I > > launched the VNC client which now is back to be slow/frozen. In > > parallel, I started the ping command. The first one with 1 second > > results the following (I'm pingging my host PC, using my device's > > terminal): > > > > / # ping 192.168.11.2 -i 1 > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms > > 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms > > 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms > > 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms > > 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms > > ^C > > --- 192.168.11.2 ping statistics --- > > 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms > > rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5 > > > > Ping is really slow and lost almost all packets. > > It is *slow*. But I don't think you're actually losing packets. > They simply haven't arrived *yet*. > Notice that the packets are sent every second, but they arrive after > ~5/~4/~3/~2/~1 s. > This suggests they all got delivered in one burst. > > You are missing 21 packets, but it's all the *final* 21 packets. > You'd need to run for a long long time (maybe 5 minutes or even more) > and see what happens then... > to see if the packets are indeed lost, or just not yet delivered... > Great, thanks for the explanation. Indeed, I ran the tests again, but now I waited 6 minutes for each test, and these are the results: -i 1 --- 192.168.11.2 ping statistics --- 349 packets transmitted, 349 received, 0% packet loss, time 361724ms rtt min/avg/max/mdev = 1.597/10751.276/27960.848/12974.107 ms, pipe 27 -i 0.1 --- 192.168.11.2 ping statistics --- 3301 packets transmitted, 3294 received, 0.212057% packet loss, time 361342ms rtt min/avg/max/mdev = 0.959/1131.714/3081.504/1385.261 ms, pipe 29 -i 0.01 --- 192.168.11.2 ping statistics --- 20718 packets transmitted, 20718 received, 0% packet loss, time 366465ms rtt min/avg/max/mdev = 0.568/178.764/587.956/234.702 ms, pipe 29 -flood --- 192.168.11.2 ping statistics --- 29173 packets transmitted, 29149 received, 0.0822679% packet loss, time 362065ms rtt min/avg/max/mdev = 0.518/127.472/583.971/204.277 ms, pipe 29, ipg/ewma 12.411/21.570 ms So yes, we can assume 0% of packet loss. Furthermore, I can see the bursts happening on my terminal, the ping command doesn't do anything and suddenly all the pings arrive at the same time, over and over again in a loop. > > Next test, I decreased > > to 0.1s: > > > > / # ping 192.168.11.2 -i 0.1 > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > ... > > --- 192.168.11.2 ping statistics --- > > 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms > > rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26 > > > > While ping is running in parallel, VNC has a better performance, I can > > see my mouse running and click on some buttons. As soon as ping stops, > > VNC is slow/frozen again. Also we have less packet loss. > > Again - this packet loss may be just at the tail. > You'd need to run it for much much longer to see if packet loss is high or not. > It may just be that you always lose the last ~7 packets or so... > > > > > Next test, decreased to 0.01: > > > > / # ping 192.168.11.2 -i 0.01 > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > ... > > --- 192.168.11.2 ping statistics --- > > 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms > > rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28 > > > > And finally, the flood: > > > > / # ping 192.168.11.2 -f > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > ... > > --- 192.168.11.2 ping statistics --- > > 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms > > rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma > > 12.413/7.802 ms > > > > While the flood is happening in parallel, the VNC runs very smoothly, > > and, again, as soon as it stops, it's back to slow/frozen. > > > > I believe here the ping command is helping to fullfil the buffer, that's > > why running it on parallel makes the VNC work... > > > > > Also which specific versions of the kernel are involved on both sides > > > of the connection. > > > > Device iMX6 is running Linux kernel v6.1.65, while my host PC is running > > Linux kernel v6.5.0. > > > > > There was a pretty recent fix related to packet aggregation recently > > > that could be either the fix or the cause. > > > "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call" > > > Though I doubt it - I believe that was specific to how windows packs things. > > > > > > Also Krishna Kurapati has a (afaik still not merged) patch "usb: > > > gadget: ncm: Add support to configure wMaxSegmentSize" > > > that could be of use - though again, doubt it. > > > > I could also try to apply these patches and check how it goes. Thanks > > for the information. > > > > > Another thing that comes to mind, is that perhaps the device in > > > question does not have sufficiently high res timers? > > > There might be something in the kernel boot log / dmesg about hrtimer > > > resolution... > > > Maybe this just needs to be configurable... Or pick a smaller value > > > with broken hrtimer (if that's the issue), > > > or just disable aggregation if lowres hrtimers... etc... > > > > > > #define TX_TIMEOUT_NSECS 300000 > > > 300 us is too small to be noticeable by VNC imho, so I think something > > > *must* be misbehaving. > > > Perhaps timer resolution is bad and this 300us ends up being much larger...??? > > > > This is what I got from dmesg inside the iMX6 device: > > > > / # dmesg | grep timer > > [ 0.000000] Switching to timer-based delay loop, resolution 333ns > > [ 0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: > > 0xffffffff, max_idle_ns: 637086815595 ns > > [ 0.001545] Calibrating delay loop (skipped), value calculated using > > timer frequency.. 6.00 BogoMIPS (lpj=30000) > > [ 0.203469] clocksource: Switched to clocksource mxc_timer1 > > > > > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME > > > instead of CLOCK_MONOTONIC. > > > There could potentially be an issue with suspend, though I really doubt it. > > > > Also tested this, but it didn't change anything, VNC is still > > slow/frozen. > > > > > Another idea would be to add a gadget setting to disable tx > > > aggregation entirely... > > > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...) > > > > > > Have you tried reducing from 8000 to 4000 or 3500 or 3000? > > > Maybe there's some funkiness with page sizes?? > > > > Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE? > > I'm not sure ;-) > > > If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I > > reduced the value, it got better and better, VNC running smoothly. > > > > I will keep testing to see if I get something. > > > > Thanks and regards, > > Hiago. > > I think we need to figure out whether the problem is host->device or > device->host. > Or perhaps you do already know this?? > Maybe this was mentioned already earlier in the thread and I missed it. > Actually I still don't know yet. > Try reducing only in or only out... and changing only one side of the > connection. I did this test and it was very interesting, thanks for the reccomendation. It looks like only NTB_DEFAULT_IN_SIZE is relevant for this issue. I changed NTB_DEFAULT_IN_SIZE from 16384 to 2048 on the _device_ side (Apalis iMX6) and left the NTB_OUT_SIZE with the default value of 16384, and now I can use my VNC client on the host side (move the mouse, click some buttons...). Very fast and responsive I would say. Does it mean the issue is on the HOST side? > Maybe change TX_MAX_NUM_DPE to 1 (or other values) > > Maybe change > > if (ncm->skb_tx_data > && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE > || (ncm->skb_tx_data->len + > div + rem + skb->len + > ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len)) > > max_size)) { > > to just if (ncm->skb_tx_data) Changing the 'if' statement to only 'if (ncm->skb_tx_data)' also made it work! VNC running smoothly on my VNC client. Again, I applied this patch to the device's kernel, Apalis iMX6. I will put some print debugging here to see if I can figure out something else. Thanks a lot, Hiago. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-11 20:28 ` Hiago De Franco @ 2023-12-11 20:44 ` Maciej Żenczykowski 2023-12-12 9:48 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-11 20:44 UTC (permalink / raw) To: Hiago De Franco Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote: > > On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > > > Hello Oliver and Maciej, > > > > > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > > > > Can you tell us which side in your test case produces many small packets? > > > > > > > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in > > > > f_ncm.c by an order of magnitude. > > > > > > The device side, Apalis iMX6, is the one producing the small packages > > > and sending to my host PC. The VNC server is running on iMX6 device and, > > > the client, on my host PC. > > > > > > I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing > > > changed, the behaviour is the same. > > > > > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: > > > > That suggests, but does not prove that the issue is on the host side. > > > > Could you post the result of "ethtool -S" after a test run? We should > > > > get statistics on the reasons for transmissions that way. > > > > > > Sure, for some reason I couldn't run ethtool on the iMX6 device: > > > > > > / # ethtool -S usb0 > > > no stats available > > > > > > So I ran everything on my Debian host PC. First, without any changes on > > > the device's kernel, this is the result (where the VNC is really > > > slow/frozen): > > > > > > $ sudo ethtool -S enx3e5dcdead75e > > > NIC statistics: > > > tx_reason_ntb_full: 0 > > > tx_reason_ndp_full: 0 > > > tx_reason_timeout: 222 > > > tx_reason_max_datagram: 0 > > > tx_overhead: 42387 > > > tx_ntbs: 222 > > > rx_overhead: 38375 > > > rx_ntbs: 529 > > > > > > Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to > > > 8192. The performance improved a bit, and this is the result: > > > > > > $ sudo ethtool -S enx42ff68c1000a > > > NIC statistics: > > > tx_reason_ntb_full: 0 > > > tx_reason_ndp_full: 0 > > > tx_reason_timeout: 321 > > > tx_reason_max_datagram: 0 > > > tx_overhead: 61617 > > > tx_ntbs: 321 > > > rx_overhead: 59341 > > > rx_ntbs: 759 > > > > > > Finally, I changed from 8192 to 4096, and the perfomance was > > > better: > > > > > > $ sudo ethtool -S enx3a601e306de1 > > > NIC statistics: > > > tx_reason_ntb_full: 0 > > > tx_reason_ndp_full: 0 > > > tx_reason_timeout: 56067 > > > tx_reason_max_datagram: 0 > > > tx_overhead: 83630876 > > > tx_ntbs: 56064 > > > rx_overhead: 25437595 > > > rx_ntbs: 847920 > > > > > > At 4096 I can use the VNC with my app, click on buttons and see the mouse > > > moving smoothly. Please note the device name changes because we're using > > > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested > > > for 1min30s and then got the statics with ethtool for all 3 tests. > > > > > > On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote: > > > > > > > > An every 1s (the default) ping is too rare to be of help I'd assume... > > > > Try ping with various intervals (-i). All the way down to a flood ping (-f). > > > > Most likely -i 0.01 would be enough to make things work better... > > > > > > This one is interesting, backing to the default value of 16384, I > > > launched the VNC client which now is back to be slow/frozen. In > > > parallel, I started the ping command. The first one with 1 second > > > results the following (I'm pingging my host PC, using my device's > > > terminal): > > > > > > / # ping 192.168.11.2 -i 1 > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms > > > 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms > > > 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms > > > 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms > > > 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms > > > ^C > > > --- 192.168.11.2 ping statistics --- > > > 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms > > > rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5 > > > > > > Ping is really slow and lost almost all packets. > > > > It is *slow*. But I don't think you're actually losing packets. > > They simply haven't arrived *yet*. > > Notice that the packets are sent every second, but they arrive after > > ~5/~4/~3/~2/~1 s. > > This suggests they all got delivered in one burst. > > > > You are missing 21 packets, but it's all the *final* 21 packets. > > You'd need to run for a long long time (maybe 5 minutes or even more) > > and see what happens then... > > to see if the packets are indeed lost, or just not yet delivered... > > > > Great, thanks for the explanation. Indeed, I ran the tests again, but > now I waited 6 minutes for each test, and these are the results: > > -i 1 > --- 192.168.11.2 ping statistics --- > 349 packets transmitted, 349 received, 0% packet loss, time 361724ms > rtt min/avg/max/mdev = 1.597/10751.276/27960.848/12974.107 ms, pipe 27 > > -i 0.1 > --- 192.168.11.2 ping statistics --- > 3301 packets transmitted, 3294 received, 0.212057% packet loss, time 361342ms > rtt min/avg/max/mdev = 0.959/1131.714/3081.504/1385.261 ms, pipe 29 > > -i 0.01 > --- 192.168.11.2 ping statistics --- > 20718 packets transmitted, 20718 received, 0% packet loss, time 366465ms > rtt min/avg/max/mdev = 0.568/178.764/587.956/234.702 ms, pipe 29 > > -flood > --- 192.168.11.2 ping statistics --- > 29173 packets transmitted, 29149 received, 0.0822679% packet loss, time 362065ms > rtt min/avg/max/mdev = 0.518/127.472/583.971/204.277 ms, pipe 29, ipg/ewma 12.411/21.570 ms > > So yes, we can assume 0% of packet loss. Furthermore, I can see the > bursts happening on my terminal, the ping command doesn't do anything > and suddenly all the pings arrive at the same time, over and over again > in a loop. Ok, this is great to know. No loss, just very very high delay. > > > Next test, I decreased > > > to 0.1s: > > > > > > / # ping 192.168.11.2 -i 0.1 > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > ... > > > --- 192.168.11.2 ping statistics --- > > > 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms > > > rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26 > > > > > > While ping is running in parallel, VNC has a better performance, I can > > > see my mouse running and click on some buttons. As soon as ping stops, > > > VNC is slow/frozen again. Also we have less packet loss. > > > > Again - this packet loss may be just at the tail. > > You'd need to run it for much much longer to see if packet loss is high or not. > > It may just be that you always lose the last ~7 packets or so... > > > > > > > > Next test, decreased to 0.01: > > > > > > / # ping 192.168.11.2 -i 0.01 > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > ... > > > --- 192.168.11.2 ping statistics --- > > > 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms > > > rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28 > > > > > > And finally, the flood: > > > > > > / # ping 192.168.11.2 -f > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > ... > > > --- 192.168.11.2 ping statistics --- > > > 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms > > > rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma > > > 12.413/7.802 ms > > > > > > While the flood is happening in parallel, the VNC runs very smoothly, > > > and, again, as soon as it stops, it's back to slow/frozen. > > > > > > I believe here the ping command is helping to fullfil the buffer, that's > > > why running it on parallel makes the VNC work... > > > > > > > Also which specific versions of the kernel are involved on both sides > > > > of the connection. > > > > > > Device iMX6 is running Linux kernel v6.1.65, while my host PC is running > > > Linux kernel v6.5.0. > > > > > > > There was a pretty recent fix related to packet aggregation recently > > > > that could be either the fix or the cause. > > > > "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call" > > > > Though I doubt it - I believe that was specific to how windows packs things. > > > > > > > > Also Krishna Kurapati has a (afaik still not merged) patch "usb: > > > > gadget: ncm: Add support to configure wMaxSegmentSize" > > > > that could be of use - though again, doubt it. > > > > > > I could also try to apply these patches and check how it goes. Thanks > > > for the information. > > > > > > > Another thing that comes to mind, is that perhaps the device in > > > > question does not have sufficiently high res timers? > > > > There might be something in the kernel boot log / dmesg about hrtimer > > > > resolution... > > > > Maybe this just needs to be configurable... Or pick a smaller value > > > > with broken hrtimer (if that's the issue), > > > > or just disable aggregation if lowres hrtimers... etc... > > > > > > > > #define TX_TIMEOUT_NSECS 300000 > > > > 300 us is too small to be noticeable by VNC imho, so I think something > > > > *must* be misbehaving. > > > > Perhaps timer resolution is bad and this 300us ends up being much larger...??? > > > > > > This is what I got from dmesg inside the iMX6 device: > > > > > > / # dmesg | grep timer > > > [ 0.000000] Switching to timer-based delay loop, resolution 333ns > > > [ 0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: > > > 0xffffffff, max_idle_ns: 637086815595 ns > > > [ 0.001545] Calibrating delay loop (skipped), value calculated using > > > timer frequency.. 6.00 BogoMIPS (lpj=30000) > > > [ 0.203469] clocksource: Switched to clocksource mxc_timer1 > > > > > > > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME > > > > instead of CLOCK_MONOTONIC. > > > > There could potentially be an issue with suspend, though I really doubt it. > > > > > > Also tested this, but it didn't change anything, VNC is still > > > slow/frozen. > > > > > > > Another idea would be to add a gadget setting to disable tx > > > > aggregation entirely... > > > > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...) > > > > > > > > Have you tried reducing from 8000 to 4000 or 3500 or 3000? > > > > Maybe there's some funkiness with page sizes?? > > > > > > Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE? > > > > I'm not sure ;-) > > > > > If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I > > > reduced the value, it got better and better, VNC running smoothly. > > > > > > I will keep testing to see if I get something. > > > > > > Thanks and regards, > > > Hiago. > > > > I think we need to figure out whether the problem is host->device or > > device->host. > > Or perhaps you do already know this?? > > Maybe this was mentioned already earlier in the thread and I missed it. > > > > Actually I still don't know yet. Heh. > > > Try reducing only in or only out... and changing only one side of the > > connection. > > I did this test and it was very interesting, thanks for the > reccomendation. > It looks like only NTB_DEFAULT_IN_SIZE is relevant for > this issue. I changed NTB_DEFAULT_IN_SIZE from 16384 to 2048 on the > _device_ side (Apalis iMX6) and left the NTB_OUT_SIZE with the default > value of 16384, and now I can use my VNC client on the host side (move the > mouse, click some buttons...). Very fast and responsive I would say. > > Does it mean the issue is on the HOST side? Well... that's what I would assume based on that, but... I think what we've really found is a bug: unsigned max_size = ncm->port.fixed_in_len; is I think using fixed_in_len instead of using out_len. *OR* This is the gadget/device side and IN/OUT are always from the point of view of the host. ie. IN to the host, so transmitted by the gadget/device. It's bound to be one or the other. And while initially I thought the former, I think it's likely the latter. I vaguelly recall having run into something with in/out being host-relative somewhere before. > > Maybe change TX_MAX_NUM_DPE to 1 (or other values) > > > > Maybe change > > > > if (ncm->skb_tx_data > > && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE > > || (ncm->skb_tx_data->len + > > div + rem + skb->len + > > ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len)) > > > max_size)) { > > > > to just if (ncm->skb_tx_data) > > Changing the 'if' statement to only 'if (ncm->skb_tx_data)' also made it > work! VNC running smoothly on my VNC client. Again, I applied this patch > to the device's kernel, Apalis iMX6. Here you modified only the device's tx - basically turning off buffering/aggregation. (and likely poorly at that since - I think - this approach still ends up buffering one packet, because it seems to send the previous packet when it gets the next one) ...and in general it seems more likely something is busted on the device, then on the host... > I will put some print debugging here to see if I can figure out > something else. Everything is pointing towards the timer not firing... Perhaps try instrumenting the timer callback to see if it ever runs. ncm_tx_timeout <-- add a printk in here See if the timer registration functions aren't returning non-caught errors. hrtimer_try_to_cancel - returns an int - it's not checked. - maybe this always claims there's no timer to cancel? but afaict: hrtimer_start - does not return anything hrtimer_init - also does not appear to return anything... ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-11 20:44 ` Maciej Żenczykowski @ 2023-12-12 9:48 ` Oliver Neukum 2023-12-12 20:32 ` Hiago De Franco 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-12 9:48 UTC (permalink / raw) To: Maciej Żenczykowski, Hiago De Franco Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 11.12.23 21:44, Maciej Żenczykowski wrote: > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: >> >> On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote: >>> On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote: Hi, >>>> On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: >>>>> That suggests, but does not prove that the issue is on the host side. >>>>> Could you post the result of "ethtool -S" after a test run? We should >>>>> get statistics on the reasons for transmissions that way. >>>> Finally, I changed from 8192 to 4096, and the perfomance was >>>> better: >>>> >>>> $ sudo ethtool -S enx3a601e306de1 >>>> NIC statistics: >>>> tx_reason_ntb_full: 0 >>>> tx_reason_ndp_full: 0 >>>> tx_reason_timeout: 56067 This has grown two orders of magnitude. >>>> tx_reason_max_datagram: 0 >>>> tx_overhead: 83630876 >>>> tx_ntbs: 56064 >>>> rx_overhead: 25437595 >>>> rx_ntbs: 847920 >>>> >>>> At 4096 I can use the VNC with my app, click on buttons and see the mouse >>>> moving smoothly. Please note the device name changes because we're using >>>> random MAC addresses. 'ethtool' was running on my Debian host PC. I tested >>>> for 1min30s and then got the statics with ethtool for all 3 tests. As you are testing for a constant time, the increase in transmissions due to timeouts also decreases latency by two orders of magnitude. Though this does not ultimately tell us which side is responsible. >>>> While the flood is happening in parallel, the VNC runs very smoothly, >>>> and, again, as soon as it stops, it's back to slow/frozen. >>>> >>>> I believe here the ping command is helping to fullfil the buffer, that's >>>> why running it on parallel makes the VNC work... Indeed. You can confirm this by running "ethool -S" before and after the ping. During the ping tx_reason_timeout should stagnate and probably tx_reason_ndp_full will go up. >>>>> #define TX_TIMEOUT_NSECS 300000 >>>>> 300 us is too small to be noticeable by VNC imho, so I think something >>>>> *must* be misbehaving. >>>>> Perhaps timer resolution is bad and this 300us ends up being much larger...??? Now that you mention it and have taken a closer look I suspect this piece of code: } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { /* wait for more frames */ /* push variables */ ctx->tx_curr_skb = skb_out; /* set the pending count */ if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; goto exit_no_skb; Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? It is defined in include/linux/usb/cdc_ncm.h as 3 currently This applies to the host side. Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-12 9:48 ` Oliver Neukum @ 2023-12-12 20:32 ` Hiago De Franco 2023-12-12 21:21 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-12 20:32 UTC (permalink / raw) To: Maciej Żenczykowski, Oliver Neukum Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni Hi Maciej and Oliver, On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote: > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote: > > > On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > > > > > Hello Oliver and Maciej, > > > > > > > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote: > > > > > Can you tell us which side in your test case produces many small packets? > > > > > > > > > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in > > > > > f_ncm.c by an order of magnitude. > > > > > > > > The device side, Apalis iMX6, is the one producing the small packages > > > > and sending to my host PC. The VNC server is running on iMX6 device and, > > > > the client, on my host PC. > > > > > > > > I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing > > > > changed, the behaviour is the same. > > > > > > > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: > > > > > That suggests, but does not prove that the issue is on the host side. > > > > > Could you post the result of "ethtool -S" after a test run? We should > > > > > get statistics on the reasons for transmissions that way. > > > > > > > > Sure, for some reason I couldn't run ethtool on the iMX6 device: > > > > > > > > / # ethtool -S usb0 > > > > no stats available > > > > > > > > So I ran everything on my Debian host PC. First, without any changes on > > > > the device's kernel, this is the result (where the VNC is really > > > > slow/frozen): > > > > > > > > $ sudo ethtool -S enx3e5dcdead75e > > > > NIC statistics: > > > > tx_reason_ntb_full: 0 > > > > tx_reason_ndp_full: 0 > > > > tx_reason_timeout: 222 > > > > tx_reason_max_datagram: 0 > > > > tx_overhead: 42387 > > > > tx_ntbs: 222 > > > > rx_overhead: 38375 > > > > rx_ntbs: 529 > > > > > > > > Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to > > > > 8192. The performance improved a bit, and this is the result: > > > > > > > > $ sudo ethtool -S enx42ff68c1000a > > > > NIC statistics: > > > > tx_reason_ntb_full: 0 > > > > tx_reason_ndp_full: 0 > > > > tx_reason_timeout: 321 > > > > tx_reason_max_datagram: 0 > > > > tx_overhead: 61617 > > > > tx_ntbs: 321 > > > > rx_overhead: 59341 > > > > rx_ntbs: 759 > > > > > > > > Finally, I changed from 8192 to 4096, and the perfomance was > > > > better: > > > > > > > > $ sudo ethtool -S enx3a601e306de1 > > > > NIC statistics: > > > > tx_reason_ntb_full: 0 > > > > tx_reason_ndp_full: 0 > > > > tx_reason_timeout: 56067 > > > > tx_reason_max_datagram: 0 > > > > tx_overhead: 83630876 > > > > tx_ntbs: 56064 > > > > rx_overhead: 25437595 > > > > rx_ntbs: 847920 > > > > > > > > At 4096 I can use the VNC with my app, click on buttons and see the mouse > > > > moving smoothly. Please note the device name changes because we're using > > > > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested > > > > for 1min30s and then got the statics with ethtool for all 3 tests. > > > > > > > > On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote: > > > > > > > > > > An every 1s (the default) ping is too rare to be of help I'd assume... > > > > > Try ping with various intervals (-i). All the way down to a flood ping (-f). > > > > > Most likely -i 0.01 would be enough to make things work better... > > > > > > > > This one is interesting, backing to the default value of 16384, I > > > > launched the VNC client which now is back to be slow/frozen. In > > > > parallel, I started the ping command. The first one with 1 second > > > > results the following (I'm pingging my host PC, using my device's > > > > terminal): > > > > > > > > / # ping 192.168.11.2 -i 1 > > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > > 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms > > > > 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms > > > > 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms > > > > 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms > > > > 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms > > > > ^C > > > > --- 192.168.11.2 ping statistics --- > > > > 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms > > > > rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5 > > > > > > > > Ping is really slow and lost almost all packets. > > > > > > It is *slow*. But I don't think you're actually losing packets. > > > They simply haven't arrived *yet*. > > > Notice that the packets are sent every second, but they arrive after > > > ~5/~4/~3/~2/~1 s. > > > This suggests they all got delivered in one burst. > > > > > > You are missing 21 packets, but it's all the *final* 21 packets. > > > You'd need to run for a long long time (maybe 5 minutes or even more) > > > and see what happens then... > > > to see if the packets are indeed lost, or just not yet delivered... > > > > > > > Great, thanks for the explanation. Indeed, I ran the tests again, but > > now I waited 6 minutes for each test, and these are the results: > > > > -i 1 > > --- 192.168.11.2 ping statistics --- > > 349 packets transmitted, 349 received, 0% packet loss, time 361724ms > > rtt min/avg/max/mdev = 1.597/10751.276/27960.848/12974.107 ms, pipe 27 > > > > -i 0.1 > > --- 192.168.11.2 ping statistics --- > > 3301 packets transmitted, 3294 received, 0.212057% packet loss, time 361342ms > > rtt min/avg/max/mdev = 0.959/1131.714/3081.504/1385.261 ms, pipe 29 > > > > -i 0.01 > > --- 192.168.11.2 ping statistics --- > > 20718 packets transmitted, 20718 received, 0% packet loss, time 366465ms > > rtt min/avg/max/mdev = 0.568/178.764/587.956/234.702 ms, pipe 29 > > > > -flood > > --- 192.168.11.2 ping statistics --- > > 29173 packets transmitted, 29149 received, 0.0822679% packet loss, time 362065ms > > rtt min/avg/max/mdev = 0.518/127.472/583.971/204.277 ms, pipe 29, ipg/ewma 12.411/21.570 ms > > > > So yes, we can assume 0% of packet loss. Furthermore, I can see the > > bursts happening on my terminal, the ping command doesn't do anything > > and suddenly all the pings arrive at the same time, over and over again > > in a loop. > > Ok, this is great to know. No loss, just very very high delay. > > > > > Next test, I decreased > > > > to 0.1s: > > > > > > > > / # ping 192.168.11.2 -i 0.1 > > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > > ... > > > > --- 192.168.11.2 ping statistics --- > > > > 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms > > > > rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26 > > > > > > > > While ping is running in parallel, VNC has a better performance, I can > > > > see my mouse running and click on some buttons. As soon as ping stops, > > > > VNC is slow/frozen again. Also we have less packet loss. > > > > > > Again - this packet loss may be just at the tail. > > > You'd need to run it for much much longer to see if packet loss is high or not. > > > It may just be that you always lose the last ~7 packets or so... > > > > > > > > > > > Next test, decreased to 0.01: > > > > > > > > / # ping 192.168.11.2 -i 0.01 > > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > > ... > > > > --- 192.168.11.2 ping statistics --- > > > > 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms > > > > rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28 > > > > > > > > And finally, the flood: > > > > > > > > / # ping 192.168.11.2 -f > > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data. > > > > ... > > > > --- 192.168.11.2 ping statistics --- > > > > 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms > > > > rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma > > > > 12.413/7.802 ms > > > > > > > > While the flood is happening in parallel, the VNC runs very smoothly, > > > > and, again, as soon as it stops, it's back to slow/frozen. > > > > > > > > I believe here the ping command is helping to fullfil the buffer, that's > > > > why running it on parallel makes the VNC work... > > > > > > > > > Also which specific versions of the kernel are involved on both sides > > > > > of the connection. > > > > > > > > Device iMX6 is running Linux kernel v6.1.65, while my host PC is running > > > > Linux kernel v6.5.0. > > > > > > > > > There was a pretty recent fix related to packet aggregation recently > > > > > that could be either the fix or the cause. > > > > > "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call" > > > > > Though I doubt it - I believe that was specific to how windows packs things. > > > > > > > > > > Also Krishna Kurapati has a (afaik still not merged) patch "usb: > > > > > gadget: ncm: Add support to configure wMaxSegmentSize" > > > > > that could be of use - though again, doubt it. > > > > > > > > I could also try to apply these patches and check how it goes. Thanks > > > > for the information. > > > > > > > > > Another thing that comes to mind, is that perhaps the device in > > > > > question does not have sufficiently high res timers? > > > > > There might be something in the kernel boot log / dmesg about hrtimer > > > > > resolution... > > > > > Maybe this just needs to be configurable... Or pick a smaller value > > > > > with broken hrtimer (if that's the issue), > > > > > or just disable aggregation if lowres hrtimers... etc... > > > > > > > > > > #define TX_TIMEOUT_NSECS 300000 > > > > > 300 us is too small to be noticeable by VNC imho, so I think something > > > > > *must* be misbehaving. > > > > > Perhaps timer resolution is bad and this 300us ends up being much larger...??? > > > > > > > > This is what I got from dmesg inside the iMX6 device: > > > > > > > > / # dmesg | grep timer > > > > [ 0.000000] Switching to timer-based delay loop, resolution 333ns > > > > [ 0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles: > > > > 0xffffffff, max_idle_ns: 637086815595 ns > > > > [ 0.001545] Calibrating delay loop (skipped), value calculated using > > > > timer frequency.. 6.00 BogoMIPS (lpj=30000) > > > > [ 0.203469] clocksource: Switched to clocksource mxc_timer1 > > > > > > > > > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME > > > > > instead of CLOCK_MONOTONIC. > > > > > There could potentially be an issue with suspend, though I really doubt it. > > > > > > > > Also tested this, but it didn't change anything, VNC is still > > > > slow/frozen. > > > > > > > > > Another idea would be to add a gadget setting to disable tx > > > > > aggregation entirely... > > > > > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...) > > > > > > > > > > Have you tried reducing from 8000 to 4000 or 3500 or 3000? > > > > > Maybe there's some funkiness with page sizes?? > > > > > > > > Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE? > > > > > > I'm not sure ;-) > > > > > > > If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I > > > > reduced the value, it got better and better, VNC running smoothly. > > > > > > > > I will keep testing to see if I get something. > > > > > > > > Thanks and regards, > > > > Hiago. > > > > > > I think we need to figure out whether the problem is host->device or > > > device->host. > > > Or perhaps you do already know this?? > > > Maybe this was mentioned already earlier in the thread and I missed it. > > > > > > > Actually I still don't know yet. > > Heh. > > > > > > Try reducing only in or only out... and changing only one side of the > > > connection. > > > > I did this test and it was very interesting, thanks for the > > reccomendation. > > It looks like only NTB_DEFAULT_IN_SIZE is relevant for > > this issue. I changed NTB_DEFAULT_IN_SIZE from 16384 to 2048 on the > > _device_ side (Apalis iMX6) and left the NTB_OUT_SIZE with the default > > value of 16384, and now I can use my VNC client on the host side (move the > > mouse, click some buttons...). Very fast and responsive I would say. > > > > Does it mean the issue is on the HOST side? > > Well... that's what I would assume based on that, but... > > I think what we've really found is a bug: > unsigned max_size = ncm->port.fixed_in_len; > is I think using fixed_in_len instead of using out_len. > > *OR* > > This is the gadget/device side and IN/OUT are always from the point of > view of the host. > ie. IN to the host, so transmitted by the gadget/device. > > It's bound to be one or the other. And while initially I thought the former, > I think it's likely the latter. > I vaguelly recall having run into something with in/out being > host-relative somewhere before. > > > > Maybe change TX_MAX_NUM_DPE to 1 (or other values) > > > > > > Maybe change > > > > > > if (ncm->skb_tx_data > > > && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE > > > || (ncm->skb_tx_data->len + > > > div + rem + skb->len + > > > ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len)) > > > > max_size)) { > > > > > > to just if (ncm->skb_tx_data) > > > > Changing the 'if' statement to only 'if (ncm->skb_tx_data)' also made it > > work! VNC running smoothly on my VNC client. Again, I applied this patch > > to the device's kernel, Apalis iMX6. > > Here you modified only the device's tx - basically turning off > buffering/aggregation. > (and likely poorly at that since - I think - this approach still ends > up buffering one packet, > because it seems to send the previous packet when it gets the next one) > > ...and in general it seems more likely something is busted on the > device, then on the host... > > > I will put some print debugging here to see if I can figure out > > something else. > > Everything is pointing towards the timer not firing... > Perhaps try instrumenting the timer callback to see if it ever runs. > ncm_tx_timeout <-- add a printk in here > See if the timer registration functions aren't returning non-caught errors. > hrtimer_try_to_cancel - returns an int - it's not checked. - maybe > this always claims there's no timer to cancel? > but afaict: > hrtimer_start - does not return anything > hrtimer_init - also does not appear to return anything... I was thinking about the same, the timer is not firing at all. I put the printk's as suggested, and the ncm_tx_is being triggered. Basically, when I first turn on my device, the try_to_cancel function is always returning -1, which means "when the timer is currently executing the callback function and cannot be stopped". When I launch my VNC client, a see the following pattern: [ 94.926136] hfranco: hrtimer_start called. [ 94.926259] hfranco: try_to_cancel 1 [ 94.926268] hfranco: hrtimer_start called. [ 94.926391] hfranco: try_to_cancel 1 The hrtimer is called and then try_to_cancel returns 1, which means it was canceled "when the timer was active" (following the function comment). So I guess the hrtimer is somehow working as expected (at least being called). I also calculated the time between the consecutive messages of timer started and trying to cancel the timer, and this is what I got: 94.926259 - 94.926136 = 0.000123 seconds 94.926391 - 94.926268 = 0.000123 seconds 94.927548 - 94.926401 = 0.001147 seconds 94.927818 - 94.927585 = 0.000233 seconds 94.928338 - 94.927836 = 0.000502 seconds 94.928590 - 94.928360 = 0.000230 seconds 94.929330 - 94.928608 = 0.000722 seconds 94.929609 - 94.929363 = 0.000246 seconds I'm not sure how accurate this is, but we can clearly sometimes see it gets more than 300us. On Tue, Dec 12, 2023 at 10:48:01AM +0100, Oliver Neukum wrote: > On 11.12.23 21:44, Maciej Żenczykowski wrote: > > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > > > On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote: > > > > On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > Hi, > > > > > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote: > > > > > > That suggests, but does not prove that the issue is on the host side. > > > > > > Could you post the result of "ethtool -S" after a test run? We should > > > > > > get statistics on the reasons for transmissions that way. > > > > > > Finally, I changed from 8192 to 4096, and the perfomance was > > > > > better: > > > > > > > > > > $ sudo ethtool -S enx3a601e306de1 > > > > > NIC statistics: > > > > > tx_reason_ntb_full: 0 > > > > > tx_reason_ndp_full: 0 > > > > > tx_reason_timeout: 56067 > > This has grown two orders of magnitude. > > > > > > tx_reason_max_datagram: 0 > > > > > tx_overhead: 83630876 > > > > > tx_ntbs: 56064 > > > > > rx_overhead: 25437595 > > > > > rx_ntbs: 847920 > > > > > > > > > > At 4096 I can use the VNC with my app, click on buttons and see the mouse > > > > > moving smoothly. Please note the device name changes because we're using > > > > > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested > > > > > for 1min30s and then got the statics with ethtool for all 3 tests. > > As you are testing for a constant time, the increase in transmissions > due to timeouts also decreases latency by two orders of magnitude. > Though this does not ultimately tell us which side is responsible. > > > > > While the flood is happening in parallel, the VNC runs very smoothly, > > > > > and, again, as soon as it stops, it's back to slow/frozen. > > > > > > > > > > I believe here the ping command is helping to fullfil the buffer, that's > > > > > why running it on parallel makes the VNC work... > > Indeed. You can confirm this by running "ethool -S" before and after the ping. > During the ping tx_reason_timeout should stagnate and probably tx_reason_ndp_full > will go up. > > > > > > > #define TX_TIMEOUT_NSECS 300000 > > > > > > 300 us is too small to be noticeable by VNC imho, so I think something > > > > > > *must* be misbehaving. > > > > > > Perhaps timer resolution is bad and this 300us ends up being much larger...??? > > Now that you mention it and have taken a closer look I suspect this piece of code: > > } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { > /* wait for more frames */ > /* push variables */ > ctx->tx_curr_skb = skb_out; > /* set the pending count */ > if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) > ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; > goto exit_no_skb; > > Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? > It is defined in include/linux/usb/cdc_ncm.h as 3 currently > This applies to the host side. On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT? Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and finally both at the same time, but it didn't help. I've also put some printks as following: ctx->tx_curr_frame_num = n; printk("hfranco: tx_curr_frame_num = %d", n); if (n == 0) { printk("hfranco: n == 0"); /* wait for more frames */ /* push variables */ ctx->tx_curr_skb = skb_out; goto exit_no_skb; } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams); printk("hfranco: timer_interval = %d", ctx->timer_interval); printk("hfranco: n inside else if = %d", n); /* wait for more frames */ /* push variables */ ctx->tx_curr_skb = skb_out; /* set the pending count */ if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; goto exit_no_skb; } else { printk("hfranco: n inside else = %d", n); if (n == ctx->tx_max_datagrams) ctx->tx_reason_max_datagram++; /* count reason for transmitting */ I ran it on my host PC, compiled it as module for my Debian dekstop, and this is the dmesg: [ 9663.478807] hfranco: tx_curr_frame_num = 1 [ 9663.478816] hfranco: tx_max_datagrams = 40 [ 9663.478818] hfranco: timer_interval = 400000 [ 9663.478820] hfranco: n inside else if = 1 [ 9663.478822] hfranco: timer started [ 9663.479645] hfranco: tx_curr_frame_num = 1 [ 9663.479652] hfranco: n inside else = 1 And then it basically repeats. Looks like 'n' never passes the 1 value. By tweaking the flags mentioned before, 'n' got a value of 4, but that was the maximum value. I was wondering, why do you think this code looks suspicious? I basically just inserted some printks on the tx side, I will see if I can get something from the rx as well. > > Regards > Oliver Thanks a lot and regards, Hiago. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-12 20:32 ` Hiago De Franco @ 2023-12-12 21:21 ` Oliver Neukum 2023-12-13 1:11 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-12 21:21 UTC (permalink / raw) To: Hiago De Franco, Maciej Żenczykowski, Oliver Neukum Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 12.12.23 21:32, Hiago De Franco wrote: Hi, > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote: >> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: >> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? >> It is defined in include/linux/usb/cdc_ncm.h as 3 currently >> This applies to the host side. > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT? Yes, I meant that. Sorry. > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and > finally both at the same time, but it didn't help. > > I've also put some printks as following: > > ctx->tx_curr_frame_num = n; > printk("hfranco: tx_curr_frame_num = %d", n); > > if (n == 0) { > printk("hfranco: n == 0"); > /* wait for more frames */ > /* push variables */ > ctx->tx_curr_skb = skb_out; > goto exit_no_skb; > > } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { > printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams); > printk("hfranco: timer_interval = %d", ctx->timer_interval); > printk("hfranco: n inside else if = %d", n); > /* wait for more frames */ > /* push variables */ > ctx->tx_curr_skb = skb_out; > /* set the pending count */ > if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) > ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; > goto exit_no_skb; > > } else { > printk("hfranco: n inside else = %d", n); > if (n == ctx->tx_max_datagrams) > ctx->tx_reason_max_datagram++; /* count reason for transmitting */ > > I ran it on my host PC, compiled it as module for my Debian dekstop, and > this is the dmesg: > > [ 9663.478807] hfranco: tx_curr_frame_num = 1 > [ 9663.478816] hfranco: tx_max_datagrams = 40 > [ 9663.478818] hfranco: timer_interval = 400000 > [ 9663.478820] hfranco: n inside else if = 1 > [ 9663.478822] hfranco: timer started > [ 9663.479645] hfranco: tx_curr_frame_num = 1 > [ 9663.479652] hfranco: n inside else = 1 > > And then it basically repeats. Looks like 'n' never passes the 1 value. > By tweaking the flags mentioned before, 'n' got a value of 4, but that > was the maximum value. I was wondering, why do you think this code looks > suspicious? I basically just inserted some printks on the tx side, I > will see if I can get something from the rx as well. If we look at the statistics you initially gathered, we can see that all transmissions on the host side happen because the timeout elapses. That, however, does _not_ tell us that the host is to blame. We could look at two possible scenarios A - the gadget is bundling up the packets with too much delay and the host just answers to the megatransmissions with one packet and the delay on the host is inconsequential B - the timer on the host runs for too long or sometimes not at all. If that were the case that code I pointed out would be most likely to blame Could I suggest we try to localize the issue? Can you ping the host from the device? Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-12 21:21 ` Oliver Neukum @ 2023-12-13 1:11 ` Maciej Żenczykowski 2023-12-15 21:20 ` Hiago De Franco 0 siblings, 1 reply; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-13 1:11 UTC (permalink / raw) To: Oliver Neukum Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Tue, Dec 12, 2023 at 1:21 PM Oliver Neukum <oneukum@suse.com> wrote: > > On 12.12.23 21:32, Hiago De Franco wrote: > > Hi, > > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote: > >> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > >> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? > >> It is defined in include/linux/usb/cdc_ncm.h as 3 currently > >> This applies to the host side. > > > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT? > > Yes, I meant that. Sorry. > > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and > > finally both at the same time, but it didn't help. > > > > I've also put some printks as following: > > > > ctx->tx_curr_frame_num = n; > > printk("hfranco: tx_curr_frame_num = %d", n); > > > > if (n == 0) { > > printk("hfranco: n == 0"); > > /* wait for more frames */ > > /* push variables */ > > ctx->tx_curr_skb = skb_out; > > goto exit_no_skb; > > > > } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { > > printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams); > > printk("hfranco: timer_interval = %d", ctx->timer_interval); > > printk("hfranco: n inside else if = %d", n); > > /* wait for more frames */ > > /* push variables */ > > ctx->tx_curr_skb = skb_out; > > /* set the pending count */ > > if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) > > ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; > > goto exit_no_skb; > > > > } else { > > printk("hfranco: n inside else = %d", n); > > if (n == ctx->tx_max_datagrams) > > ctx->tx_reason_max_datagram++; /* count reason for transmitting */ > > > > I ran it on my host PC, compiled it as module for my Debian dekstop, and Shouldn't you be doing this on the gadget side? I thought we were thinking it was the gadget transmit timer having issues. > > this is the dmesg: > > > > [ 9663.478807] hfranco: tx_curr_frame_num = 1 > > [ 9663.478816] hfranco: tx_max_datagrams = 40 > > [ 9663.478818] hfranco: timer_interval = 400000 > > [ 9663.478820] hfranco: n inside else if = 1 > > [ 9663.478822] hfranco: timer started > > [ 9663.479645] hfranco: tx_curr_frame_num = 1 > > [ 9663.479652] hfranco: n inside else = 1 > > > > And then it basically repeats. Looks like 'n' never passes the 1 value. > > By tweaking the flags mentioned before, 'n' got a value of 4, but that > > was the maximum value. I was wondering, why do you think this code looks > > suspicious? I basically just inserted some printks on the tx side, I > > will see if I can get something from the rx as well. > > If we look at the statistics you initially gathered, we can see that all transmissions > on the host side happen because the timeout elapses. That, however, does > _not_ tell us that the host is to blame. We could look at two possible scenarios > > A - the gadget is bundling up the packets with too much delay and the host > just answers to the megatransmissions with one packet and the delay on the host > is inconsequential > > B - the timer on the host runs for too long or sometimes not at all. If that were > the case that code I pointed out would be most likely to blame > > Could I suggest we try to localize the issue? Can you ping the host from the device? > > Regards > Oliver -- Maciej Żenczykowski, Kernel Networking Developer @ Google ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-13 1:11 ` Maciej Żenczykowski @ 2023-12-15 21:20 ` Hiago De Franco 2023-12-15 21:44 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-15 21:20 UTC (permalink / raw) To: Oliver Neukum, Maciej Żenczykowski Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni Hi Oliver and Maciej, Sorry for the delay on my message. I took some time to set up some debugging tools, hoping to find something more interesting. I couldn't find too much I guess, but I will describe the results here. First, I set up ftrace to track 'package_for_tx' function. My idea would be that I would see the delays between two consecutives calls of this function being more than 300us, this would say the code wasn't respecting the hrtimer. The trace log looks something like that: # tracer: function # # entries-in-buffer/entries-written: 468/468 #P:2 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | tezi-237 [001] dns.. 101.030667: package_for_tx <-ncm_wrap_ntb <idle>-0 [000] d.s.. 101.032771: package_for_tx <-ncm_wrap_ntb weston-228 [001] d.s.. 101.039430: package_for_tx <-ncm_wrap_ntb tezi-237 [000] d.s.. 101.041634: package_for_tx <-ncm_wrap_ntb <idle>-0 [001] d.s.. 101.042326: package_for_tx <-ncm_wrap_ntb <idle>-0 [001] d.s.. 101.055665: package_for_tx <-ncm_wrap_ntb weston-231 [001] d.s.. 101.057779: package_for_tx <-ncm_wrap_ntb <idle>-0 [001] d.s.. 101.068630: package_for_tx <-ncm_wrap_ntb weston-231 [001] d.s.. 101.085738: package_for_tx <-ncm_wrap_ntb tezi-237 [000] D.s.. 101.088611: package_for_tx <-ncm_wrap_ntb So I created a simple script to calculate the delta time between two consecutive calls and give me at the end how many of them are above 300us. This is the result: Total timestamps above 300 microseconds: 440 of 742 Percentage of timestamps above 300 microseconds: 59.30% Then I decided to lower down the TX_MAX_NUM_DPE from 32 to 0. 0 and 4 made the VNC runs smooth. I've also tested 8 and 16, but these two didn't help. Again, by setting the number to 0, this is the result: Total timestamps above 300 microseconds: 3149 of 65420 Percentage of timestamps above 300 microseconds: 4.81% Of course we also have more packages, as the VNC is working and it's not frozen. It's interesting how fast this is now, but as discussed with a coleague, we don't know if this tells much. There are more latencies envolved, for example from the scheduler, so probably the result is not showing me anything valuable. My second test was trying to figure out how TX_MAX_NUM_DPE was impacting my issue, so I added two printks to check the number it was getting and when it was being zeroed. /* Delay for the transmit to wait before sending an unfilled NTB frame. */ #define TX_TIMEOUT_NSECS 300000 @@ -964,6 +964,7 @@ static struct sk_buff *package_for_tx(struct f_ncm *ncm) /* Set the final NDP wLength */ new_len = opts->ndp_size + (ncm->ndp_dgram_count * dgram_idx_len); + printk("hfranco: package_for_tx called, setting ndp_dram_count to 0"); ncm->ndp_dgram_count = 0; /* Increment from start to wLength */ ntb_iter = (void *) ncm->skb_tx_ndp->data; @@ -1027,6 +1028,8 @@ static struct sk_buff *ncm_wrap_ntb(struct gether *port, * ready for new data. * NOTE: Assume maximum align for speed of calculation. */ + + printk("hfranco: ncm->ndp_dgram_count = %d", ncm->ndp_dgram_count); if (ncm->skb_tx_data && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE || (ncm->skb_tx_data->len + This is the result: [ 6613.034304] hfranco: package_for_tx called, setting ndp_dram_count to 0 [ 6613.034783] hfranco: ncm->ndp_dgram_count = 0 [ 6613.034812] hfranco: ncm->ndp_dgram_count = 2 [ 6613.034823] hfranco: ncm->ndp_dgram_count = 3 [ 6613.034835] hfranco: ncm->ndp_dgram_count = 4 [ 6613.034844] hfranco: ncm->ndp_dgram_count = 5 [ 6613.034855] hfranco: ncm->ndp_dgram_count = 6 [ 6613.034865] hfranco: ncm->ndp_dgram_count = 7 [ 6613.034874] hfranco: ncm->ndp_dgram_count = 8 [ 6613.034884] hfranco: ncm->ndp_dgram_count = 9 [ 6613.034895] hfranco: ncm->ndp_dgram_count = 10 [ 6613.034905] hfranco: ncm->ndp_dgram_count = 11 [ 6613.034914] hfranco: package_for_tx called, setting ndp_dram_count to 0 [ 6613.035266] hfranco: ncm->ndp_dgram_count = 2 [ 6613.035281] hfranco: ncm->ndp_dgram_count = 3 [ 6613.035292] hfranco: ncm->ndp_dgram_count = 4 [ 6613.035301] hfranco: ncm->ndp_dgram_count = 5 [ 6613.035310] hfranco: ncm->ndp_dgram_count = 6 [ 6613.035320] hfranco: ncm->ndp_dgram_count = 7 [ 6613.035329] hfranco: ncm->ndp_dgram_count = 8 [ 6613.035339] hfranco: ncm->ndp_dgram_count = 9 [ 6613.035350] hfranco: ncm->ndp_dgram_count = 10 [ 6613.035360] hfranco: ncm->ndp_dgram_count = 11 [ 6613.035366] hfranco: package_for_tx called, setting ndp_dram_count to 0 So, in general the value never passes 10 or 11, from all the test I done with the VNC. As said earlier, by putting 4 made the VNC work: [ 17.333014] hfranco: package_for_tx called, setting ndp_dram_count to 0 [ 17.333065] hfranco: ncm->ndp_dgram_count = 2 [ 17.333079] hfranco: ncm->ndp_dgram_count = 3 [ 17.333105] hfranco: ncm->ndp_dgram_count = 4 [ 17.333114] hfranco: package_for_tx called, setting ndp_dram_count to 0 [ 17.333161] hfranco: ncm->ndp_dgram_count = 2 [ 17.333173] hfranco: ncm->ndp_dgram_count = 3 [ 17.333184] hfranco: ncm->ndp_dgram_count = 4 [ 17.333191] hfranco: package_for_tx called, setting ndp_dram_count to 0 [ 17.333235] hfranco: ncm->ndp_dgram_count = 2 [ 17.333259] hfranco: ncm->ndp_dgram_count = 3 [ 17.333271] hfranco: ncm->ndp_dgram_count = 4 On Tue, Dec 12, 2023 at 10:21:13PM +0100, Oliver Neukum wrote: > On 12.12.23 21:32, Hiago De Franco wrote: > > Hi, > > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote: > > > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? > > > It is defined in include/linux/usb/cdc_ncm.h as 3 currently > > > This applies to the host side. > > > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT? > > Yes, I meant that. Sorry. > > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and > > finally both at the same time, but it didn't help. > > > > I've also put some printks as following: > > > > ctx->tx_curr_frame_num = n; > > printk("hfranco: tx_curr_frame_num = %d", n); > > > > if (n == 0) { > > printk("hfranco: n == 0"); > > /* wait for more frames */ > > /* push variables */ > > ctx->tx_curr_skb = skb_out; > > goto exit_no_skb; > > > > } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { > > printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams); > > printk("hfranco: timer_interval = %d", ctx->timer_interval); > > printk("hfranco: n inside else if = %d", n); > > /* wait for more frames */ > > /* push variables */ > > ctx->tx_curr_skb = skb_out; > > /* set the pending count */ > > if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) > > ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; > > goto exit_no_skb; > > > > } else { > > printk("hfranco: n inside else = %d", n); > > if (n == ctx->tx_max_datagrams) > > ctx->tx_reason_max_datagram++; /* count reason for transmitting */ > > > > I ran it on my host PC, compiled it as module for my Debian dekstop, and > > this is the dmesg: > > > > [ 9663.478807] hfranco: tx_curr_frame_num = 1 > > [ 9663.478816] hfranco: tx_max_datagrams = 40 > > [ 9663.478818] hfranco: timer_interval = 400000 > > [ 9663.478820] hfranco: n inside else if = 1 > > [ 9663.478822] hfranco: timer started > > [ 9663.479645] hfranco: tx_curr_frame_num = 1 > > [ 9663.479652] hfranco: n inside else = 1 > > > > And then it basically repeats. Looks like 'n' never passes the 1 value. > > By tweaking the flags mentioned before, 'n' got a value of 4, but that > > was the maximum value. I was wondering, why do you think this code looks > > suspicious? I basically just inserted some printks on the tx side, I > > will see if I can get something from the rx as well. > > If we look at the statistics you initially gathered, we can see that all transmissions > on the host side happen because the timeout elapses. That, however, does > _not_ tell us that the host is to blame. We could look at two possible scenarios > > A - the gadget is bundling up the packets with too much delay and the host > just answers to the megatransmissions with one packet and the delay on the host > is inconsequential > > B - the timer on the host runs for too long or sometimes not at all. If that were > the case that code I pointed out would be most likely to blame > > Could I suggest we try to localize the issue? Can you ping the host from the device? Lastly, about Oliver's question, I tried to send an UDP package using netcat from host to device and then from device to host. I was sending one single packet in loop with 1 second interval first, then tried with 0.5 second and 0.1 second. My hope was, with this test, detect who was failing, host or device. For 1 second, it pretty much works on both directions. As example, this is the output of my host PC (sendind packages with the device -> "echo 'hi' | nc -u 192.168.11.2 5201"): $ sudo tcpdump -i enx9627798c3fa3 tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on enx9627798c3fa3, link-type EN10MB (Ethernet), snapshot length 262144 bytes 17:47:51.865967 IP apalis-imx6-10692086.local.54882 > hdebian.5201: UDP, length 3 17:47:51.866051 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 17:47:52.145987 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? 1.11.168.192.in-addr.arpa. (134) 17:47:52.146020 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [0q] (12) 17:47:52.198689 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0*- [0q] 1/0/0 (Cache flush) PTR apalis-imx6-10692086.local. (77) 17:47:52.882532 IP apalis-imx6-10692086.local.37745 > hdebian.5201: UDP, length 3 17:47:52.882634 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 17:47:53.903508 IP apalis-imx6-10692086.local.38147 > hdebian.5201: UDP, length 3 17:47:53.903585 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 17:47:54.920174 IP apalis-imx6-10692086.local.60718 > hdebian.5201: UDP, length 3 17:47:54.920265 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 17:47:55.940687 IP apalis-imx6-10692086.local.38729 > hdebian.5201: UDP, length 3 17:47:55.940743 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 17:47:56.913135 ARP, Request who-has apalis-imx6-10692086.local tell hdebian, length 28 17:47:56.915052 ARP, Reply apalis-imx6-10692086.local is-at ba:6d:d0:0a:95:6b (oui Unknown), length 28 17:47:56.956927 IP apalis-imx6-10692086.local.54649 > hdebian.5201: UDP, length 3 17:47:56.956966 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 17:47:57.980879 IP apalis-imx6-10692086.local.53366 > hdebian.5201: UDP, length 3 17:47:57.980984 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 When I lower down the time between two packages from 1 second to 0.5 and 0.1, it suddenly freezes. It happens on both directions, host to device and device to host. As example, this is the host send UDP packets to the device with 0.1s interval: / # tcpdump -i usb0 tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on usb0, link-type EN10MB (Ethernet), snapshot length 262144 bytes 18:39:53.662594 IP 192.168.11.2.54802 > 192.168.11.1.5201: UDP, length 3 18:39:53.662716 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 18:39:53.768678 IP 192.168.11.2.49080 > 192.168.11.1.5201: UDP, length 3 18:39:53.768767 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 18:39:53.875305 IP 192.168.11.2.56938 > 192.168.11.1.5201: UDP, length 3 18:39:53.875376 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 18:39:53.982238 IP 192.168.11.2.40177 > 192.168.11.1.5201: UDP, length 3 18:39:53.982354 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 18:39:54.089279 IP 192.168.11.2.43284 > 192.168.11.1.5201: UDP, length 3 18:39:54.089378 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 18:39:54.196027 IP 192.168.11.2.41990 > 192.168.11.1.5201: UDP, length 3 18:39:54.196105 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 18:39:54.303793 IP 192.168.11.2.42362 > 192.168.11.1.5201: UDP, length 3 18:39:58.664608 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28 18:39:58.666781 ARP, Reply 192.168.11.2 is-at ea:2a:bc:40:41:30 (oui Unknown), length 28 18:39:58.669531 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28 18:39:58.669582 ARP, Reply 192.168.11.1 is-at 1e:be:d4:00:51:0e (oui Unknown), length 28 It stops after this ARP request. Sometimes it send packages again, but after a few seconds it freezes. So, at the end, I couldn't figure out where the issue is... > > Regards > Oliver On Tue, Dec 12, 2023 at 05:11:49PM -0800, Maciej Żenczykowski wrote: > On Tue, Dec 12, 2023 at 1:21 PM Oliver Neukum <oneukum@suse.com> wrote: > > > > On 12.12.23 21:32, Hiago De Franco wrote: > > > > Hi, > > > > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote: > > >> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > >> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? > > >> It is defined in include/linux/usb/cdc_ncm.h as 3 currently > > >> This applies to the host side. > > > > > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you > > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT? > > > > Yes, I meant that. Sorry. > > > > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and > > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the > > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and > > > finally both at the same time, but it didn't help. > > > > > > I've also put some printks as following: > > > > > > ctx->tx_curr_frame_num = n; > > > printk("hfranco: tx_curr_frame_num = %d", n); > > > > > > if (n == 0) { > > > printk("hfranco: n == 0"); > > > /* wait for more frames */ > > > /* push variables */ > > > ctx->tx_curr_skb = skb_out; > > > goto exit_no_skb; > > > > > > } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { > > > printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams); > > > printk("hfranco: timer_interval = %d", ctx->timer_interval); > > > printk("hfranco: n inside else if = %d", n); > > > /* wait for more frames */ > > > /* push variables */ > > > ctx->tx_curr_skb = skb_out; > > > /* set the pending count */ > > > if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) > > > ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; > > > goto exit_no_skb; > > > > > > } else { > > > printk("hfranco: n inside else = %d", n); > > > if (n == ctx->tx_max_datagrams) > > > ctx->tx_reason_max_datagram++; /* count reason for transmitting */ > > > > > > I ran it on my host PC, compiled it as module for my Debian dekstop, and > > Shouldn't you be doing this on the gadget side? > I thought we were thinking it was the gadget transmit timer having issues. Because I still don't know where the issue is, I was testing both sides. > > > > this is the dmesg: > > > > > > [ 9663.478807] hfranco: tx_curr_frame_num = 1 > > > [ 9663.478816] hfranco: tx_max_datagrams = 40 > > > [ 9663.478818] hfranco: timer_interval = 400000 > > > [ 9663.478820] hfranco: n inside else if = 1 > > > [ 9663.478822] hfranco: timer started > > > [ 9663.479645] hfranco: tx_curr_frame_num = 1 > > > [ 9663.479652] hfranco: n inside else = 1 > > > > > > And then it basically repeats. Looks like 'n' never passes the 1 value. > > > By tweaking the flags mentioned before, 'n' got a value of 4, but that > > > was the maximum value. I was wondering, why do you think this code looks > > > suspicious? I basically just inserted some printks on the tx side, I > > > will see if I can get something from the rx as well. > > > > If we look at the statistics you initially gathered, we can see that all transmissions > > on the host side happen because the timeout elapses. That, however, does > > _not_ tell us that the host is to blame. We could look at two possible scenarios > > > > A - the gadget is bundling up the packets with too much delay and the host > > just answers to the megatransmissions with one packet and the delay on the host > > is inconsequential > > > > B - the timer on the host runs for too long or sometimes not at all. If that were > > the case that code I pointed out would be most likely to blame > > > > Could I suggest we try to localize the issue? Can you ping the host from the device? > > > > Regards > > Oliver > > -- > Maciej Żenczykowski, Kernel Networking Developer @ Google Sorry if I couldn't provide enough information, I'm kind of learning trought the process. The TX_MAX_NUM_DPE is something I'm not fully convinced, but again I will keep debugging. Best regards, Hiago. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-15 21:20 ` Hiago De Franco @ 2023-12-15 21:44 ` Maciej Żenczykowski 2023-12-18 14:34 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-15 21:44 UTC (permalink / raw) To: Hiago De Franco Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > Hi Oliver and Maciej, > > Sorry for the delay on my message. I took some time to set up some > debugging tools, hoping to find something more interesting. I couldn't > find too much I guess, but I will describe the results here. No worries. > First, I set up ftrace to track 'package_for_tx' function. My idea would > be that I would see the delays between two consecutives calls of this > function being more than 300us, this would say the code wasn't > respecting the hrtimer. The trace log looks something like that: Here's my recommendation. On the gadget side. Print something when the timer is armed. Print something when the timer fires. Include the timestamp. Then try to send something (a single packet) that won't trigger a reply. For example run a script in a screen session that sends 1 unsolicited arp reply packet every 10 seconds. This is rare enough, it should obviously get the timer to fire. Again, wait 5 minutes, so you've had 30+ packets sent. See if it does even fire... You can run tcpdump on the receiver to see what you receive (and the timestamps on that) > > # tracer: function > # > # entries-in-buffer/entries-written: 468/468 #P:2 > # > # _-----=> irqs-off/BH-disabled > # / _----=> need-resched > # | / _---=> hardirq/softirq > # || / _--=> preempt-depth > # ||| / _-=> migrate-disable > # |||| / delay > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > # | | | ||||| | | > tezi-237 [001] dns.. 101.030667: package_for_tx <-ncm_wrap_ntb > <idle>-0 [000] d.s.. 101.032771: package_for_tx <-ncm_wrap_ntb > weston-228 [001] d.s.. 101.039430: package_for_tx <-ncm_wrap_ntb > tezi-237 [000] d.s.. 101.041634: package_for_tx <-ncm_wrap_ntb > <idle>-0 [001] d.s.. 101.042326: package_for_tx <-ncm_wrap_ntb > <idle>-0 [001] d.s.. 101.055665: package_for_tx <-ncm_wrap_ntb > weston-231 [001] d.s.. 101.057779: package_for_tx <-ncm_wrap_ntb > <idle>-0 [001] d.s.. 101.068630: package_for_tx <-ncm_wrap_ntb > weston-231 [001] d.s.. 101.085738: package_for_tx <-ncm_wrap_ntb > tezi-237 [000] D.s.. 101.088611: package_for_tx <-ncm_wrap_ntb > > So I created a simple script to calculate the delta time between two > consecutive calls and give me at the end how many of them are above > 300us. This is the result: > > Total timestamps above 300 microseconds: 440 of 742 > Percentage of timestamps above 300 microseconds: 59.30% > > Then I decided to lower down the TX_MAX_NUM_DPE from 32 to 0. > 0 and 4 made the VNC runs smooth. I've also tested 8 and 16, but these > two didn't help. Again, by setting the number to 0, this is the result: > > Total timestamps above 300 microseconds: 3149 of 65420 > Percentage of timestamps above 300 microseconds: 4.81% > > Of course we also have more packages, as the VNC is working and it's not > frozen. It's interesting how fast this is now, but as discussed with a > coleague, we don't know if this tells much. There are more latencies > envolved, for example from the scheduler, so probably the result is not > showing me anything valuable. > > My second test was trying to figure out how TX_MAX_NUM_DPE was > impacting my issue, so I added two printks to check the number it was > getting and when it was being zeroed. > > /* Delay for the transmit to wait before sending an unfilled NTB frame. */ > #define TX_TIMEOUT_NSECS 300000 > @@ -964,6 +964,7 @@ static struct sk_buff *package_for_tx(struct f_ncm *ncm) > /* Set the final NDP wLength */ > new_len = opts->ndp_size + > (ncm->ndp_dgram_count * dgram_idx_len); > + printk("hfranco: package_for_tx called, setting ndp_dram_count to 0"); > ncm->ndp_dgram_count = 0; > /* Increment from start to wLength */ > ntb_iter = (void *) ncm->skb_tx_ndp->data; > @@ -1027,6 +1028,8 @@ static struct sk_buff *ncm_wrap_ntb(struct gether *port, > * ready for new data. > * NOTE: Assume maximum align for speed of calculation. > */ > + > + printk("hfranco: ncm->ndp_dgram_count = %d", ncm->ndp_dgram_count); > if (ncm->skb_tx_data > && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE > || (ncm->skb_tx_data->len + > > This is the result: > > [ 6613.034304] hfranco: package_for_tx called, setting ndp_dram_count to 0 > [ 6613.034783] hfranco: ncm->ndp_dgram_count = 0 > [ 6613.034812] hfranco: ncm->ndp_dgram_count = 2 > [ 6613.034823] hfranco: ncm->ndp_dgram_count = 3 > [ 6613.034835] hfranco: ncm->ndp_dgram_count = 4 > [ 6613.034844] hfranco: ncm->ndp_dgram_count = 5 > [ 6613.034855] hfranco: ncm->ndp_dgram_count = 6 > [ 6613.034865] hfranco: ncm->ndp_dgram_count = 7 > [ 6613.034874] hfranco: ncm->ndp_dgram_count = 8 > [ 6613.034884] hfranco: ncm->ndp_dgram_count = 9 > [ 6613.034895] hfranco: ncm->ndp_dgram_count = 10 > [ 6613.034905] hfranco: ncm->ndp_dgram_count = 11 > [ 6613.034914] hfranco: package_for_tx called, setting ndp_dram_count to 0 > [ 6613.035266] hfranco: ncm->ndp_dgram_count = 2 > [ 6613.035281] hfranco: ncm->ndp_dgram_count = 3 > [ 6613.035292] hfranco: ncm->ndp_dgram_count = 4 > [ 6613.035301] hfranco: ncm->ndp_dgram_count = 5 > [ 6613.035310] hfranco: ncm->ndp_dgram_count = 6 > [ 6613.035320] hfranco: ncm->ndp_dgram_count = 7 > [ 6613.035329] hfranco: ncm->ndp_dgram_count = 8 > [ 6613.035339] hfranco: ncm->ndp_dgram_count = 9 > [ 6613.035350] hfranco: ncm->ndp_dgram_count = 10 > [ 6613.035360] hfranco: ncm->ndp_dgram_count = 11 > [ 6613.035366] hfranco: package_for_tx called, setting ndp_dram_count to 0 > > So, in general the value never passes 10 or 11, from all the test I done > with the VNC. As said earlier, by putting 4 made the VNC work: > > [ 17.333014] hfranco: package_for_tx called, setting ndp_dram_count to 0 > [ 17.333065] hfranco: ncm->ndp_dgram_count = 2 > [ 17.333079] hfranco: ncm->ndp_dgram_count = 3 > [ 17.333105] hfranco: ncm->ndp_dgram_count = 4 > [ 17.333114] hfranco: package_for_tx called, setting ndp_dram_count to 0 > [ 17.333161] hfranco: ncm->ndp_dgram_count = 2 > [ 17.333173] hfranco: ncm->ndp_dgram_count = 3 > [ 17.333184] hfranco: ncm->ndp_dgram_count = 4 > [ 17.333191] hfranco: package_for_tx called, setting ndp_dram_count to 0 > [ 17.333235] hfranco: ncm->ndp_dgram_count = 2 > [ 17.333259] hfranco: ncm->ndp_dgram_count = 3 > [ 17.333271] hfranco: ncm->ndp_dgram_count = 4 > > On Tue, Dec 12, 2023 at 10:21:13PM +0100, Oliver Neukum wrote: > > On 12.12.23 21:32, Hiago De Franco wrote: > > > > Hi, > > > > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote: > > > > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > > > Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? > > > > It is defined in include/linux/usb/cdc_ncm.h as 3 currently > > > > This applies to the host side. > > > > > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you > > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT? > > > > Yes, I meant that. Sorry. > > > > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and > > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the > > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and > > > finally both at the same time, but it didn't help. > > > > > > I've also put some printks as following: > > > > > > ctx->tx_curr_frame_num = n; > > > printk("hfranco: tx_curr_frame_num = %d", n); > > > > > > if (n == 0) { > > > printk("hfranco: n == 0"); > > > /* wait for more frames */ > > > /* push variables */ > > > ctx->tx_curr_skb = skb_out; > > > goto exit_no_skb; > > > > > > } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { > > > printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams); > > > printk("hfranco: timer_interval = %d", ctx->timer_interval); > > > printk("hfranco: n inside else if = %d", n); > > > /* wait for more frames */ > > > /* push variables */ > > > ctx->tx_curr_skb = skb_out; > > > /* set the pending count */ > > > if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) > > > ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; > > > goto exit_no_skb; > > > > > > } else { > > > printk("hfranco: n inside else = %d", n); > > > if (n == ctx->tx_max_datagrams) > > > ctx->tx_reason_max_datagram++; /* count reason for transmitting */ > > > > > > I ran it on my host PC, compiled it as module for my Debian dekstop, and > > > this is the dmesg: > > > > > > [ 9663.478807] hfranco: tx_curr_frame_num = 1 > > > [ 9663.478816] hfranco: tx_max_datagrams = 40 > > > [ 9663.478818] hfranco: timer_interval = 400000 > > > [ 9663.478820] hfranco: n inside else if = 1 > > > [ 9663.478822] hfranco: timer started > > > [ 9663.479645] hfranco: tx_curr_frame_num = 1 > > > [ 9663.479652] hfranco: n inside else = 1 > > > > > > And then it basically repeats. Looks like 'n' never passes the 1 value. > > > By tweaking the flags mentioned before, 'n' got a value of 4, but that > > > was the maximum value. I was wondering, why do you think this code looks > > > suspicious? I basically just inserted some printks on the tx side, I > > > will see if I can get something from the rx as well. > > > > If we look at the statistics you initially gathered, we can see that all transmissions > > on the host side happen because the timeout elapses. That, however, does > > _not_ tell us that the host is to blame. We could look at two possible scenarios > > > > A - the gadget is bundling up the packets with too much delay and the host > > just answers to the megatransmissions with one packet and the delay on the host > > is inconsequential > > > > B - the timer on the host runs for too long or sometimes not at all. If that were > > the case that code I pointed out would be most likely to blame > > > > Could I suggest we try to localize the issue? Can you ping the host from the device? > > Lastly, about Oliver's question, I tried to send an UDP package using > netcat from host to device and then from device to host. I was sending > one single packet in loop with 1 second interval first, then tried with > 0.5 second and 0.1 second. My hope was, with this test, detect who was > failing, host or device. > > For 1 second, it pretty much works on both directions. As example, this > is the output of my host PC (sendind packages with the device -> "echo > 'hi' | nc -u 192.168.11.2 5201"): > > $ sudo tcpdump -i enx9627798c3fa3 > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on enx9627798c3fa3, link-type EN10MB (Ethernet), snapshot length 262144 bytes > 17:47:51.865967 IP apalis-imx6-10692086.local.54882 > hdebian.5201: UDP, length 3 > 17:47:51.866051 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 > 17:47:52.145987 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? 1.11.168.192.in-addr.arpa. (134) > 17:47:52.146020 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [0q] (12) > 17:47:52.198689 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0*- [0q] 1/0/0 (Cache flush) PTR apalis-imx6-10692086.local. (77) > 17:47:52.882532 IP apalis-imx6-10692086.local.37745 > hdebian.5201: UDP, length 3 > 17:47:52.882634 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 > 17:47:53.903508 IP apalis-imx6-10692086.local.38147 > hdebian.5201: UDP, length 3 > 17:47:53.903585 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 > 17:47:54.920174 IP apalis-imx6-10692086.local.60718 > hdebian.5201: UDP, length 3 > 17:47:54.920265 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 > 17:47:55.940687 IP apalis-imx6-10692086.local.38729 > hdebian.5201: UDP, length 3 > 17:47:55.940743 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 > 17:47:56.913135 ARP, Request who-has apalis-imx6-10692086.local tell hdebian, length 28 > 17:47:56.915052 ARP, Reply apalis-imx6-10692086.local is-at ba:6d:d0:0a:95:6b (oui Unknown), length 28 > 17:47:56.956927 IP apalis-imx6-10692086.local.54649 > hdebian.5201: UDP, length 3 > 17:47:56.956966 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 > 17:47:57.980879 IP apalis-imx6-10692086.local.53366 > hdebian.5201: UDP, length 3 > 17:47:57.980984 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39 note that you're getting bidirectional transfer due to the icmp errors. if you want to test unidirectionally, you need to send something that won't get a reply. for example an unsolicited arp reply (with arping)... or a udp packet to a port drop'ed by a firewall. and of course make sure you don't run it in such a way that you get ssh traffic causing bidirectional traffic. (ie. run it (and tcpdump) under screen) [this is assuming you can't run it directly on the device without getting in via ssh, maybe you can via serial port or whatever - though do note that serial wakeups could potentially also 'trigger' / 'fix' your hrtimer, so probably best to run things silently under screen with no direct output via ssh/serial/console. > When I lower down the time between two packages from 1 second to 0.5 and > 0.1, it suddenly freezes. It happens on both directions, host to device > and device to host. As example, this is the host send UDP packets to the > device with 0.1s interval: > > / # tcpdump -i usb0 > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on usb0, link-type EN10MB (Ethernet), snapshot length 262144 bytes > 18:39:53.662594 IP 192.168.11.2.54802 > 192.168.11.1.5201: UDP, length 3 > 18:39:53.662716 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 18:39:53.768678 IP 192.168.11.2.49080 > 192.168.11.1.5201: UDP, length 3 > 18:39:53.768767 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 18:39:53.875305 IP 192.168.11.2.56938 > 192.168.11.1.5201: UDP, length 3 > 18:39:53.875376 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 18:39:53.982238 IP 192.168.11.2.40177 > 192.168.11.1.5201: UDP, length 3 > 18:39:53.982354 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 18:39:54.089279 IP 192.168.11.2.43284 > 192.168.11.1.5201: UDP, length 3 > 18:39:54.089378 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 18:39:54.196027 IP 192.168.11.2.41990 > 192.168.11.1.5201: UDP, length 3 > 18:39:54.196105 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 18:39:54.303793 IP 192.168.11.2.42362 > 192.168.11.1.5201: UDP, length 3 > 18:39:58.664608 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28 > 18:39:58.666781 ARP, Reply 192.168.11.2 is-at ea:2a:bc:40:41:30 (oui Unknown), length 28 > 18:39:58.669531 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28 > 18:39:58.669582 ARP, Reply 192.168.11.1 is-at 1e:be:d4:00:51:0e (oui Unknown), length 28 > > It stops after this ARP request. Sometimes it send packages again, but > after a few seconds it freezes. So, at the end, I couldn't figure out > where the issue is... > > > > > Regards > > Oliver > > On Tue, Dec 12, 2023 at 05:11:49PM -0800, Maciej Żenczykowski wrote: > > On Tue, Dec 12, 2023 at 1:21 PM Oliver Neukum <oneukum@suse.com> wrote: > > > > > > On 12.12.23 21:32, Hiago De Franco wrote: > > > > > > Hi, > > > > > > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote: > > > >> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > > > >> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1? > > > >> It is defined in include/linux/usb/cdc_ncm.h as 3 currently > > > >> This applies to the host side. > > > > > > > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you > > > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT? > > > > > > Yes, I meant that. Sorry. > > > > > > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and > > > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the > > > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and > > > > finally both at the same time, but it didn't help. > > > > > > > > I've also put some printks as following: > > > > > > > > ctx->tx_curr_frame_num = n; > > > > printk("hfranco: tx_curr_frame_num = %d", n); > > > > > > > > if (n == 0) { > > > > printk("hfranco: n == 0"); > > > > /* wait for more frames */ > > > > /* push variables */ > > > > ctx->tx_curr_skb = skb_out; > > > > goto exit_no_skb; > > > > > > > > } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) { > > > > printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams); > > > > printk("hfranco: timer_interval = %d", ctx->timer_interval); > > > > printk("hfranco: n inside else if = %d", n); > > > > /* wait for more frames */ > > > > /* push variables */ > > > > ctx->tx_curr_skb = skb_out; > > > > /* set the pending count */ > > > > if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT) > > > > ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT; > > > > goto exit_no_skb; > > > > > > > > } else { > > > > printk("hfranco: n inside else = %d", n); > > > > if (n == ctx->tx_max_datagrams) > > > > ctx->tx_reason_max_datagram++; /* count reason for transmitting */ > > > > > > > > I ran it on my host PC, compiled it as module for my Debian dekstop, and > > > > Shouldn't you be doing this on the gadget side? > > I thought we were thinking it was the gadget transmit timer having issues. > > Because I still don't know where the issue is, I was testing both sides. > > > > > > > this is the dmesg: > > > > > > > > [ 9663.478807] hfranco: tx_curr_frame_num = 1 > > > > [ 9663.478816] hfranco: tx_max_datagrams = 40 > > > > [ 9663.478818] hfranco: timer_interval = 400000 > > > > [ 9663.478820] hfranco: n inside else if = 1 > > > > [ 9663.478822] hfranco: timer started > > > > [ 9663.479645] hfranco: tx_curr_frame_num = 1 > > > > [ 9663.479652] hfranco: n inside else = 1 > > > > > > > > And then it basically repeats. Looks like 'n' never passes the 1 value. > > > > By tweaking the flags mentioned before, 'n' got a value of 4, but that > > > > was the maximum value. I was wondering, why do you think this code looks > > > > suspicious? I basically just inserted some printks on the tx side, I > > > > will see if I can get something from the rx as well. > > > > > > If we look at the statistics you initially gathered, we can see that all transmissions > > > on the host side happen because the timeout elapses. That, however, does > > > _not_ tell us that the host is to blame. We could look at two possible scenarios > > > > > > A - the gadget is bundling up the packets with too much delay and the host > > > just answers to the megatransmissions with one packet and the delay on the host > > > is inconsequential > > > > > > B - the timer on the host runs for too long or sometimes not at all. If that were > > > the case that code I pointed out would be most likely to blame > > > > > > Could I suggest we try to localize the issue? Can you ping the host from the device? > > > > > > Regards > > > Oliver > > > > -- > > Maciej Żenczykowski, Kernel Networking Developer @ Google > > Sorry if I couldn't provide enough information, I'm kind of learning > trought the process. The TX_MAX_NUM_DPE is something I'm not fully > convinced, but again I will keep debugging. > > Best regards, > Hiago. -- Maciej Żenczykowski, Kernel Networking Developer @ Google ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-15 21:44 ` Maciej Żenczykowski @ 2023-12-18 14:34 ` Oliver Neukum 2023-12-18 20:59 ` Hiago De Franco 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-18 14:34 UTC (permalink / raw) To: Maciej Żenczykowski, Hiago De Franco Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 15.12.23 22:44, Maciej Żenczykowski wrote: > On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote: Hi, > Here's my recommendation. > On the gadget side. > Print something when the timer is armed. > Print something when the timer fires. > Include the timestamp. > > Then try to send something (a single packet) that won't trigger a reply. You are implicitly raising a question. Does this bug strike if a single package is queued? The failure case seems to be 10 or 11 packages. Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-18 14:34 ` Oliver Neukum @ 2023-12-18 20:59 ` Hiago De Franco 2023-12-18 23:00 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Hiago De Franco @ 2023-12-18 20:59 UTC (permalink / raw) To: Maciej Żenczykowski, Oliver Neukum Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni Hi, On Fri, Dec 15, 2023 at 01:44:29PM -0800, Maciej Żenczykowski wrote: > Here's my recommendation. > On the gadget side. > Print something when the timer is armed. > Print something when the timer fires. > Include the timestamp. > > Then try to send something (a single packet) that won't trigger a reply. > > For example run a script in a screen session that sends 1 unsolicited > arp reply packet every 10 seconds. > This is rare enough, it should obviously get the timer to fire. > Again, wait 5 minutes, so you've had 30+ packets sent. > See if it does even fire... > You can run tcpdump on the receiver to see what you receive (and the > timestamps on that) Thanks for the recommendation, I did what you suggested for 5 minutes adding printk's for all 'hrtimer_*' function calls inside f_ncm.c. Basically we only get 'hrtimer_start' and 'hrtimer_try_to_cancel' calls, which makes sense. Here is the result, doing ARP pings from the device to the host PC: [ 20.427879] hfranco: hrtimer_try_to_cancel [ 21.270355] hfranco: hrtimer_start [ 21.270732] hfranco: hrtimer_try_to_cancel [ 31.695892] hfranco: hrtimer_start [ 31.696215] hfranco: hrtimer_try_to_cancel [ 36.324646] hfranco: hrtimer_start [ 36.324989] hfranco: hrtimer_try_to_cancel [ 37.168880] hfranco: hrtimer_start [ 37.169216] hfranco: hrtimer_try_to_cancel [ 60.359510] hfranco: hrtimer_start [ 60.359836] hfranco: hrtimer_try_to_cancel [ 68.188814] hfranco: hrtimer_start [ 68.189141] hfranco: hrtimer_try_to_cancel [ 69.035732] hfranco: hrtimer_start [ 69.036062] hfranco: hrtimer_try_to_cancel [ 90.413786] hfranco: hrtimer_start [ 90.414116] hfranco: hrtimer_try_to_cancel [ 100.380712] hfranco: hrtimer_start [ 100.381048] hfranco: hrtimer_try_to_cancel [ 110.349906] hfranco: hrtimer_start [ 110.350247] hfranco: hrtimer_try_to_cancel [ 120.321527] hfranco: hrtimer_start [ 120.321872] hfranco: hrtimer_try_to_cancel [ 120.325531] hfranco: hrtimer_start [ 120.325869] hfranco: hrtimer_try_to_cancel [ 130.295101] hfranco: hrtimer_start [ 130.295428] hfranco: hrtimer_try_to_cancel [ 131.992642] hfranco: hrtimer_start [ 131.992984] hfranco: hrtimer_try_to_cancel [ 132.839585] hfranco: hrtimer_start [ 132.839927] hfranco: hrtimer_try_to_cancel [ 140.270782] hfranco: hrtimer_start [ 140.271112] hfranco: hrtimer_try_to_cancel [ 150.248507] hfranco: hrtimer_start [ 150.248848] hfranco: hrtimer_try_to_cancel [ 160.228126] hfranco: hrtimer_start [ 160.228453] hfranco: hrtimer_try_to_cancel [ 170.209459] hfranco: hrtimer_start [ 170.209822] hfranco: hrtimer_try_to_cancel [ 180.192015] hfranco: hrtimer_start [ 180.192351] hfranco: hrtimer_try_to_cancel [ 190.176082] hfranco: hrtimer_start [ 190.176423] hfranco: hrtimer_try_to_cancel [ 200.161272] hfranco: hrtimer_start [ 200.161600] hfranco: hrtimer_try_to_cancel [ 210.147622] hfranco: hrtimer_start [ 210.147949] hfranco: hrtimer_try_to_cancel [ 220.135257] hfranco: hrtimer_start [ 220.135582] hfranco: hrtimer_try_to_cancel [ 230.123934] hfranco: hrtimer_start [ 230.124274] hfranco: hrtimer_try_to_cancel [ 237.879789] hfranco: hrtimer_start [ 237.880133] hfranco: hrtimer_try_to_cancel [ 240.113459] hfranco: hrtimer_start [ 240.113794] hfranco: hrtimer_try_to_cancel [ 250.103968] hfranco: hrtimer_start [ 250.104309] hfranco: hrtimer_try_to_cancel [ 259.800618] hfranco: hrtimer_start [ 259.800946] hfranco: hrtimer_try_to_cancel [ 260.095486] hfranco: hrtimer_start [ 260.095828] hfranco: hrtimer_try_to_cancel [ 260.646435] hfranco: hrtimer_start [ 260.646776] hfranco: hrtimer_try_to_cancel [ 270.087384] hfranco: hrtimer_start [ 270.087711] hfranco: hrtimer_try_to_cancel [ 280.080143] hfranco: hrtimer_start [ 280.080468] hfranco: hrtimer_try_to_cancel [ 290.073603] hfranco: hrtimer_start [ 290.073945] hfranco: hrtimer_try_to_cancel [ 300.067513] hfranco: hrtimer_start [ 300.067852] hfranco: hrtimer_try_to_cancel [ 310.061945] hfranco: hrtimer_start [ 310.062297] hfranco: hrtimer_try_to_cancel [ 320.056667] hfranco: hrtimer_start [ 320.057008] hfranco: hrtimer_try_to_cancel [ 330.051900] hfranco: hrtimer_start [ 330.052255] hfranco: hrtimer_try_to_cancel [ 340.047353] hfranco: hrtimer_start [ 340.047678] hfranco: hrtimer_try_to_cancel [ 350.044862] hfranco: hrtimer_start [ 350.045221] hfranco: hrtimer_try_to_cancel [ 360.043011] hfranco: hrtimer_start [ 360.043352] hfranco: hrtimer_try_to_cancel [ 370.041268] hfranco: hrtimer_start [ 370.041611] hfranco: hrtimer_try_to_cancel [ 380.039556] hfranco: hrtimer_start [ 380.039891] hfranco: hrtimer_try_to_cancel [ 390.037826] hfranco: hrtimer_start [ 390.038150] hfranco: hrtimer_try_to_cancel [ 400.036353] hfranco: hrtimer_start / # arping -U -I usb0 -i 10 -s 192.168.11.1 192.168.11.2 ARPING 192.168.11.2 from 192.168.11.1 usb0 Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.504ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.459ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.612ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.676ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.509ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.696ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 3.008ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.446ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.776ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.529ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.669ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.523ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.574ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.872ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.654ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.639ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.576ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.584ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.560ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.393ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.627ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.799ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.980ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.661ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.707ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.523ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.711ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.704ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.566ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.678ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.592ms Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.654ms ^CSent 32 probes (1 broadcast(s)) Received 32 response(s) $ sudo tcpdump -i enx620542df6fc2 tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on enx620542df6fc2, link-type EN10MB (Ethernet), snapshot length 262144 bytes 17:02:32.599638 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8 17:02:35.425325 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16 17:02:42.810738 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) 17:02:42.865794 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181) 17:02:42.865832 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12) 17:02:43.253971 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) 17:02:43.895157 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181) 17:02:43.895196 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12) 17:02:44.100862 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) 17:02:44.961717 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) 17:02:46.145083 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181) 17:02:46.145128 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12) 17:03:05.477178 ARP, Request who-has hdebian (Broadcast) tell apalis-imx6-10692086.local, length 28 17:03:05.477217 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:03:15.443291 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:03:15.443309 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:03:24.580306 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8 17:03:25.411804 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:03:25.411821 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:03:35.382697 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:03:35.382717 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:03:35.386582 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16 17:03:45.355482 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:03:45.355504 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:03:46.811643 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) 17:03:47.052930 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) 17:03:47.899743 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) 17:03:48.962107 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) 17:03:55.330407 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:03:55.330423 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:04:05.307308 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:04:05.307324 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:04:15.286180 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:04:15.286209 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:04:25.266785 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:04:25.266816 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:04:35.248474 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:04:35.248489 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:04:45.231873 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:04:45.231902 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:04:55.216288 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:04:55.216315 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:05:05.201924 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:05:05.201951 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:05:09.438461 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8 17:05:15.188755 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:05:15.188799 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:05:25.176620 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:05:25.176648 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:05:32.931912 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16 17:05:35.165375 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:05:35.165399 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:05:45.155068 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:05:45.155082 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:05:54.813550 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) 17:05:54.851012 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) 17:05:55.145920 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:05:55.145943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:05:55.696822 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) 17:05:56.963151 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) 17:06:05.136948 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:06:05.137104 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:06:15.128897 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:06:15.128918 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:06:25.121662 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:06:25.121682 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:06:35.114912 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:06:35.114943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:06:45.108576 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:06:45.108599 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:06:55.102468 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:06:55.102501 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:07:05.096969 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:07:05.096994 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:07:15.091619 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:07:15.091647 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:07:25.088359 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:07:25.088385 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:07:35.085718 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:07:35.085753 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:07:45.083263 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:07:45.083291 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:07:55.080759 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:07:55.080782 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:08:05.078303 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:08:05.078344 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 17:08:15.075990 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 17:08:15.076012 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 ^C 88 packets captured 88 packets received by filter 0 packets dropped by kernel Everything basically worked, without any freezing/slowness (please note that during the test I wasn't using the VNC client, only the arping). I also did it using screen with ssh/serial console disconnected. Although I passed -U flag to arping, I still get replies. Is this expected? > note that you're getting bidirectional transfer due to the icmp errors. > > if you want to test unidirectionally, you need to send something that > won't get a reply. > for example an unsolicited arp reply (with arping)... or a udp packet > to a port drop'ed by a firewall. > > and of course make sure you don't run it in such a way that you get > ssh traffic causing bidirectional traffic. > (ie. run it (and tcpdump) under screen) > [this is assuming you can't run it directly on the device without > getting in via ssh, > maybe you can via serial port or whatever - though do note that serial > wakeups could potentially also 'trigger' / 'fix' your hrtimer, > so probably best to run things silently under screen with no direct > output via ssh/serial/console. I also tested by sending UDP packets with netcat, every 1 second: ``` while true do echo 'hi' | nc -u 192.168.11.2 5201 sleep 1 done ``` I also manually added the ARP entry to the table using the 'arp' command, expecting I would get rid off ARP calls/replies, but it didn't. Even though the ARP table is correct, I still see ARP calls. Also, I can't get rid of ICMP packets, so at the end I couldn't truly test the one-way direction communication. For example, this is a snippet from tcpdump: 20:22:00.559786 IP 192.168.11.2.50323 > 192.168.11.1.5201: UDP, length 3 20:22:00.559857 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 20:22:01.563804 IP 192.168.11.2.44521 > 192.168.11.1.5201: UDP, length 3 20:22:01.563903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 20:22:02.554294 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28 20:22:02.556414 ARP, Reply 192.168.11.2 is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 20:22:02.569192 IP 192.168.11.2.56428 > 192.168.11.1.5201: UDP, length 3 20:22:02.569281 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 20:22:03.575838 IP 192.168.11.2.50178 > 192.168.11.1.5201: UDP, length 3 20:22:03.575903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 20:22:03.717633 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28 20:22:03.717680 ARP, Reply 192.168.11.1 is-at be:8e:0a:23:00:70 (oui Unknown), length 28 20:22:04.582394 IP 192.168.11.2.41700 > 192.168.11.1.5201: UDP, length 3 20:22:04.582473 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 20:22:05.587417 IP 192.168.11.2.57901 > 192.168.11.1.5201: UDP, length 3 The same happens on both directions, host -> device and device -> host. I couldn't see any slowness or freezing. On Mon, Dec 18, 2023 at 03:34:36PM +0100, Oliver Neukum wrote: > > > On 15.12.23 22:44, Maciej Żenczykowski wrote: > > On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > Hi, > > > Here's my recommendation. > > On the gadget side. > > Print something when the timer is armed. > > Print something when the timer fires. > > Include the timestamp. > > > > Then try to send something (a single packet) that won't trigger a reply. > > You are implicitly raising a question. > Does this bug strike if a single package is queued? The failure > case seems to be 10 or 11 packages. I think I will keep the same test but try to increase the number of packages per second I'm sending, maybe one packet / 100 ms or something like that. > > Regards > Oliver > Best Regards, Hiago. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-18 20:59 ` Hiago De Franco @ 2023-12-18 23:00 ` Maciej Żenczykowski 2023-12-19 8:45 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-18 23:00 UTC (permalink / raw) To: Hiago De Franco Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Mon, Dec 18, 2023 at 1:00 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > Hi, > > On Fri, Dec 15, 2023 at 01:44:29PM -0800, Maciej Żenczykowski wrote: > > Here's my recommendation. > > On the gadget side. > > Print something when the timer is armed. > > Print something when the timer fires. > > Include the timestamp. > > > > Then try to send something (a single packet) that won't trigger a reply. > > > > For example run a script in a screen session that sends 1 unsolicited > > arp reply packet every 10 seconds. > > This is rare enough, it should obviously get the timer to fire. > > Again, wait 5 minutes, so you've had 30+ packets sent. > > See if it does even fire... > > You can run tcpdump on the receiver to see what you receive (and the > > timestamps on that) > > Thanks for the recommendation, I did what you suggested for 5 minutes > adding printk's for all 'hrtimer_*' function calls inside f_ncm.c. > Basically we only get 'hrtimer_start' and 'hrtimer_try_to_cancel' calls, > which makes sense. Here is the result, doing ARP pings from the device > to the host PC: > > [ 20.427879] hfranco: hrtimer_try_to_cancel > [ 21.270355] hfranco: hrtimer_start > [ 21.270732] hfranco: hrtimer_try_to_cancel > [ 31.695892] hfranco: hrtimer_start > [ 31.696215] hfranco: hrtimer_try_to_cancel > [ 36.324646] hfranco: hrtimer_start > [ 36.324989] hfranco: hrtimer_try_to_cancel > [ 37.168880] hfranco: hrtimer_start > [ 37.169216] hfranco: hrtimer_try_to_cancel > [ 60.359510] hfranco: hrtimer_start > [ 60.359836] hfranco: hrtimer_try_to_cancel > [ 68.188814] hfranco: hrtimer_start > [ 68.189141] hfranco: hrtimer_try_to_cancel > [ 69.035732] hfranco: hrtimer_start > [ 69.036062] hfranco: hrtimer_try_to_cancel > [ 90.413786] hfranco: hrtimer_start > [ 90.414116] hfranco: hrtimer_try_to_cancel > [ 100.380712] hfranco: hrtimer_start > [ 100.381048] hfranco: hrtimer_try_to_cancel > [ 110.349906] hfranco: hrtimer_start > [ 110.350247] hfranco: hrtimer_try_to_cancel > [ 120.321527] hfranco: hrtimer_start > [ 120.321872] hfranco: hrtimer_try_to_cancel > [ 120.325531] hfranco: hrtimer_start > [ 120.325869] hfranco: hrtimer_try_to_cancel > [ 130.295101] hfranco: hrtimer_start > [ 130.295428] hfranco: hrtimer_try_to_cancel > [ 131.992642] hfranco: hrtimer_start > [ 131.992984] hfranco: hrtimer_try_to_cancel > [ 132.839585] hfranco: hrtimer_start > [ 132.839927] hfranco: hrtimer_try_to_cancel > [ 140.270782] hfranco: hrtimer_start > [ 140.271112] hfranco: hrtimer_try_to_cancel > [ 150.248507] hfranco: hrtimer_start > [ 150.248848] hfranco: hrtimer_try_to_cancel > [ 160.228126] hfranco: hrtimer_start > [ 160.228453] hfranco: hrtimer_try_to_cancel > [ 170.209459] hfranco: hrtimer_start > [ 170.209822] hfranco: hrtimer_try_to_cancel > [ 180.192015] hfranco: hrtimer_start > [ 180.192351] hfranco: hrtimer_try_to_cancel > [ 190.176082] hfranco: hrtimer_start > [ 190.176423] hfranco: hrtimer_try_to_cancel > [ 200.161272] hfranco: hrtimer_start > [ 200.161600] hfranco: hrtimer_try_to_cancel > [ 210.147622] hfranco: hrtimer_start > [ 210.147949] hfranco: hrtimer_try_to_cancel > [ 220.135257] hfranco: hrtimer_start > [ 220.135582] hfranco: hrtimer_try_to_cancel > [ 230.123934] hfranco: hrtimer_start > [ 230.124274] hfranco: hrtimer_try_to_cancel > [ 237.879789] hfranco: hrtimer_start > [ 237.880133] hfranco: hrtimer_try_to_cancel > [ 240.113459] hfranco: hrtimer_start > [ 240.113794] hfranco: hrtimer_try_to_cancel > [ 250.103968] hfranco: hrtimer_start > [ 250.104309] hfranco: hrtimer_try_to_cancel > [ 259.800618] hfranco: hrtimer_start > [ 259.800946] hfranco: hrtimer_try_to_cancel > [ 260.095486] hfranco: hrtimer_start > [ 260.095828] hfranco: hrtimer_try_to_cancel > [ 260.646435] hfranco: hrtimer_start > [ 260.646776] hfranco: hrtimer_try_to_cancel > [ 270.087384] hfranco: hrtimer_start > [ 270.087711] hfranco: hrtimer_try_to_cancel > [ 280.080143] hfranco: hrtimer_start > [ 280.080468] hfranco: hrtimer_try_to_cancel > [ 290.073603] hfranco: hrtimer_start > [ 290.073945] hfranco: hrtimer_try_to_cancel > [ 300.067513] hfranco: hrtimer_start > [ 300.067852] hfranco: hrtimer_try_to_cancel > [ 310.061945] hfranco: hrtimer_start > [ 310.062297] hfranco: hrtimer_try_to_cancel > [ 320.056667] hfranco: hrtimer_start > [ 320.057008] hfranco: hrtimer_try_to_cancel > [ 330.051900] hfranco: hrtimer_start > [ 330.052255] hfranco: hrtimer_try_to_cancel > [ 340.047353] hfranco: hrtimer_start > [ 340.047678] hfranco: hrtimer_try_to_cancel > [ 350.044862] hfranco: hrtimer_start > [ 350.045221] hfranco: hrtimer_try_to_cancel > [ 360.043011] hfranco: hrtimer_start > [ 360.043352] hfranco: hrtimer_try_to_cancel > [ 370.041268] hfranco: hrtimer_start > [ 370.041611] hfranco: hrtimer_try_to_cancel > [ 380.039556] hfranco: hrtimer_start > [ 380.039891] hfranco: hrtimer_try_to_cancel > [ 390.037826] hfranco: hrtimer_start > [ 390.038150] hfranco: hrtimer_try_to_cancel > [ 400.036353] hfranco: hrtimer_start > > / # arping -U -I usb0 -i 10 -s 192.168.11.1 192.168.11.2 > ARPING 192.168.11.2 from 192.168.11.1 usb0 > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.504ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.459ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.612ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.676ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.509ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.696ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 3.008ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.446ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.776ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.529ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.669ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.523ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.574ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.872ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.654ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.639ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.576ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.584ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.560ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.393ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.627ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.799ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.980ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.661ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.707ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.523ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.711ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.704ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.566ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.678ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.592ms > Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2] 2.654ms > ^CSent 32 probes (1 broadcast(s)) > Received 32 response(s) > > $ sudo tcpdump -i enx620542df6fc2 > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode > listening on enx620542df6fc2, link-type EN10MB (Ethernet), snapshot length 262144 bytes > 17:02:32.599638 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8 > 17:02:35.425325 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16 > 17:02:42.810738 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) > 17:02:42.865794 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181) > 17:02:42.865832 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12) > 17:02:43.253971 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) > 17:02:43.895157 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181) > 17:02:43.895196 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12) > 17:02:44.100862 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) > 17:02:44.961717 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) > 17:02:46.145083 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181) > 17:02:46.145128 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12) > 17:03:05.477178 ARP, Request who-has hdebian (Broadcast) tell apalis-imx6-10692086.local, length 28 > 17:03:05.477217 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:03:15.443291 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:03:15.443309 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:03:24.580306 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8 > 17:03:25.411804 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:03:25.411821 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:03:35.382697 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:03:35.382717 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:03:35.386582 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16 > 17:03:45.355482 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:03:45.355504 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:03:46.811643 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) > 17:03:47.052930 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) > 17:03:47.899743 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) > 17:03:48.962107 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) > 17:03:55.330407 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:03:55.330423 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:04:05.307308 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:04:05.307324 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:04:15.286180 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:04:15.286209 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:04:25.266785 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:04:25.266816 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:04:35.248474 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:04:35.248489 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:04:45.231873 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:04:45.231902 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:04:55.216288 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:04:55.216315 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:05:05.201924 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:05:05.201951 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:05:09.438461 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8 > 17:05:15.188755 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:05:15.188799 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:05:25.176620 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:05:25.176648 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:05:32.931912 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16 > 17:05:35.165375 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:05:35.165399 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:05:45.155068 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:05:45.155082 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:05:54.813550 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) > 17:05:54.851012 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) > 17:05:55.145920 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:05:55.145943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:05:55.696822 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71) > 17:05:56.963151 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45) > 17:06:05.136948 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:06:05.137104 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:06:15.128897 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:06:15.128918 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:06:25.121662 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:06:25.121682 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:06:35.114912 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:06:35.114943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:06:45.108576 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:06:45.108599 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:06:55.102468 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:06:55.102501 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:07:05.096969 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:07:05.096994 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:07:15.091619 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:07:15.091647 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:07:25.088359 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:07:25.088385 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:07:35.085718 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:07:35.085753 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:07:45.083263 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:07:45.083291 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:07:55.080759 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:07:55.080782 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:08:05.078303 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:08:05.078344 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 17:08:15.075990 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28 > 17:08:15.076012 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > ^C > 88 packets captured > 88 packets received by filter > 0 packets dropped by kernel > > Everything basically worked, without any freezing/slowness (please note > that during the test I wasn't using the VNC client, only the arping). I > also did it using screen with ssh/serial console disconnected. > > Although I passed -U flag to arping, I still get replies. Is this > expected? No. this defeats the purpose of the (since it's not unidirectional). Perhaps -A would work better. You could also probably pick an ip as the destination that isn't the device on the other side of the cable, so that it *doesn't* respond. The following seems to work for me: while true; do sudo arping -c 1 -i $DEV -b 1.2.3.4; sleep 10; done since nothing responds to 1.2.3.4 on my lan. But Oliver's point about it possibly needing 10 or 11 queued packets is also interesting... though that would be very very weird... > > > note that you're getting bidirectional transfer due to the icmp errors. > > > > if you want to test unidirectionally, you need to send something that > > won't get a reply. > > for example an unsolicited arp reply (with arping)... or a udp packet > > to a port drop'ed by a firewall. > > > > and of course make sure you don't run it in such a way that you get > > ssh traffic causing bidirectional traffic. > > (ie. run it (and tcpdump) under screen) > > [this is assuming you can't run it directly on the device without > > getting in via ssh, > > maybe you can via serial port or whatever - though do note that serial > > wakeups could potentially also 'trigger' / 'fix' your hrtimer, > > so probably best to run things silently under screen with no direct > > output via ssh/serial/console. > > I also tested by sending UDP packets with netcat, every 1 second: > > ``` > while true > do > echo 'hi' | nc -u 192.168.11.2 5201 > sleep 1 > done > ``` > > I also manually added the ARP entry to the table using the 'arp' > command, expecting I would get rid off ARP calls/replies, but it didn't. > Even though the ARP table is correct, I still see ARP calls. Also, I > can't get rid of ICMP packets, so at the end I couldn't truly test the > one-way direction communication. For example, this is a snippet from > tcpdump: > > 20:22:00.559786 IP 192.168.11.2.50323 > 192.168.11.1.5201: UDP, length 3 > 20:22:00.559857 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 20:22:01.563804 IP 192.168.11.2.44521 > 192.168.11.1.5201: UDP, length 3 > 20:22:01.563903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 20:22:02.554294 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28 > 20:22:02.556414 ARP, Reply 192.168.11.2 is-at 62:05:42:df:6f:c2 (oui Unknown), length 28 > 20:22:02.569192 IP 192.168.11.2.56428 > 192.168.11.1.5201: UDP, length 3 > 20:22:02.569281 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 20:22:03.575838 IP 192.168.11.2.50178 > 192.168.11.1.5201: UDP, length 3 > 20:22:03.575903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 20:22:03.717633 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28 > 20:22:03.717680 ARP, Reply 192.168.11.1 is-at be:8e:0a:23:00:70 (oui Unknown), length 28 > 20:22:04.582394 IP 192.168.11.2.41700 > 192.168.11.1.5201: UDP, length 3 > 20:22:04.582473 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39 > 20:22:05.587417 IP 192.168.11.2.57901 > 192.168.11.1.5201: UDP, length 3 > > The same happens on both directions, host -> device and device -> host. > I couldn't see any slowness or freezing. > > On Mon, Dec 18, 2023 at 03:34:36PM +0100, Oliver Neukum wrote: > > > > > > On 15.12.23 22:44, Maciej Żenczykowski wrote: > > > On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > > > Hi, > > > > > Here's my recommendation. > > > On the gadget side. > > > Print something when the timer is armed. > > > Print something when the timer fires. > > > Include the timestamp. > > > > > > Then try to send something (a single packet) that won't trigger a reply. > > > > You are implicitly raising a question. > > Does this bug strike if a single package is queued? The failure > > case seems to be 10 or 11 packages. > > I think I will keep the same test but try to increase the number of > packages per second I'm sending, maybe one packet / 100 ms or something > like that. > > > > > Regards > > Oliver > > > > Best Regards, > Hiago. -- Maciej Żenczykowski, Kernel Networking Developer @ Google ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-18 23:00 ` Maciej Żenczykowski @ 2023-12-19 8:45 ` Oliver Neukum 2023-12-19 12:19 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-19 8:45 UTC (permalink / raw) To: Maciej Żenczykowski, Hiago De Franco Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 19.12.23 00:00, Maciej Żenczykowski wrote: > On Mon, Dec 18, 2023 at 1:00 PM Hiago De Franco <hiagofranco@gmail.com> wrote: Hi, >> Everything basically worked, without any freezing/slowness (please note >> that during the test I wasn't using the VNC client, only the arping). I >> also did it using screen with ssh/serial console disconnected. >> >> Although I passed -U flag to arping, I still get replies. Is this >> expected? > > No. this defeats the purpose of the (since it's not unidirectional). > Perhaps -A would work better. > You could also probably pick an ip as the destination that isn't the > device on the other side of the cable, so that it *doesn't* respond. > > The following seems to work for me: > while true; do sudo arping -c 1 -i $DEV -b 1.2.3.4; sleep 10; done > since nothing responds to 1.2.3.4 on my lan. > > But Oliver's point about it possibly needing 10 or 11 queued packets > is also interesting... > though that would be very very weird... Probably not 10 or 11 but more than 1. Sorry for being unclear. Let me explain. The timer should not run for an idle interface, should it? Thus we have three situations A - first network packet to be transmitted: queue it and start a timer B - further network packet to be transmitted: queue it and modify the timer C - enough network packets: transmit It seems to me that Hiago has just shown that A works. Lowering the threshold also solves the issue, so C works. Case B seems to be the issue, yet the test setup tests case A. That is my issue. Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-19 8:45 ` Oliver Neukum @ 2023-12-19 12:19 ` Maciej Żenczykowski 2023-12-19 13:24 ` Oliver Neukum 0 siblings, 1 reply; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-19 12:19 UTC (permalink / raw) To: Oliver Neukum Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Tue, Dec 19, 2023 at 12:45 AM Oliver Neukum <oneukum@suse.com> wrote: > > > > On 19.12.23 00:00, Maciej Żenczykowski wrote: > > On Mon, Dec 18, 2023 at 1:00 PM Hiago De Franco <hiagofranco@gmail.com> wrote: > > Hi, > > >> Everything basically worked, without any freezing/slowness (please note > >> that during the test I wasn't using the VNC client, only the arping). I > >> also did it using screen with ssh/serial console disconnected. > >> > >> Although I passed -U flag to arping, I still get replies. Is this > >> expected? > > > > No. this defeats the purpose of the (since it's not unidirectional). > > Perhaps -A would work better. > > You could also probably pick an ip as the destination that isn't the > > device on the other side of the cable, so that it *doesn't* respond. > > > > The following seems to work for me: > > while true; do sudo arping -c 1 -i $DEV -b 1.2.3.4; sleep 10; done > > since nothing responds to 1.2.3.4 on my lan. > > > > But Oliver's point about it possibly needing 10 or 11 queued packets > > is also interesting... > > though that would be very very weird... > > Probably not 10 or 11 but more than 1. Sorry for being unclear. Let me > explain. > > The timer should not run for an idle interface, should it? > Thus we have three situations > > A - first network packet to be transmitted: > queue it and start a timer > > B - further network packet to be transmitted: > queue it and modify the timer > > C - enough network packets: > transmit > > It seems to me that Hiago has just shown that A works. Lowering the threshold > also solves the issue, so C works. Case B seems to be the issue, yet the test > setup tests case A. That is my issue. Perhaps. I've been looking at the gadget tx code, and assuming hrtimers work, I don't see how it could possibly screw up... the hrtimer arm/cancel are clearly at the same spot in the code where we allocate/unqueue the ncm->skb_tx_data (I have found a potential use-after-free-on-error bug and a stupidity that I'll send fixes for, but that doesn't appear related to this thread) But the host side driver seems more complex/confusing. However, I've not really ever looked at it previously though, so it might just be that... Maybe there's some case where we end up with no timer and queued packets?? Haven't found it yet though... > > Regards > Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-19 12:19 ` Maciej Żenczykowski @ 2023-12-19 13:24 ` Oliver Neukum 2023-12-19 16:25 ` Maciej Żenczykowski 0 siblings, 1 reply; 29+ messages in thread From: Oliver Neukum @ 2023-12-19 13:24 UTC (permalink / raw) To: Maciej Żenczykowski, Oliver Neukum Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On 19.12.23 13:19, Maciej Żenczykowski wrote: > On Tue, Dec 19, 2023 at 12:45 AM Oliver Neukum <oneukum@suse.com> wrote: >> >> >> >> On 19.12.23 00:00, Maciej Żenczykowski wrote: > Perhaps. I've been looking at the gadget tx code, and assuming hrtimers work, > I don't see how it could possibly screw up... the hrtimer arm/cancel > are clearly > at the same spot in the code where we allocate/unqueue the ncm->skb_tx_data > > (I have found a potential use-after-free-on-error bug and a stupidity > that I'll send fixes for, > but that doesn't appear related to this thread) Yes. Could there be a logic bug in the algorithm? Maybe it is necessary to also consider the time the first packet was queued and to transmit if that is too long in the past? > But the host side driver seems more complex/confusing. > However, I've not really ever looked at it previously though, so it > might just be that... How can Hiago determine that? If you do a ping from the gadget to the host, tcpdump should show the timing of the echo requests, shouldn't it? If they arrive simultaneously we at least know which side the issue is on. Regards Oliver ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Question regarding CDC NCM and VNC performance issue 2023-12-19 13:24 ` Oliver Neukum @ 2023-12-19 16:25 ` Maciej Żenczykowski 0 siblings, 0 replies; 29+ messages in thread From: Maciej Żenczykowski @ 2023-12-19 16:25 UTC (permalink / raw) To: Oliver Neukum Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb, pabeni On Tue, Dec 19, 2023 at 5:24 AM Oliver Neukum <oneukum@suse.com> wrote: > > On 19.12.23 13:19, Maciej Żenczykowski wrote: > > On Tue, Dec 19, 2023 at 12:45 AM Oliver Neukum <oneukum@suse.com> wrote: > >> > >> > >> > >> On 19.12.23 00:00, Maciej Żenczykowski wrote: > > > Perhaps. I've been looking at the gadget tx code, and assuming hrtimers work, > > I don't see how it could possibly screw up... the hrtimer arm/cancel > > are clearly > > at the same spot in the code where we allocate/unqueue the ncm->skb_tx_data > > > > (I have found a potential use-after-free-on-error bug and a stupidity > > that I'll send fixes for, > > but that doesn't appear related to this thread) > > Yes. Could there be a logic bug in the algorithm? Perhaps? But, as I said... I'm not seeing it (on the gadget side). > Maybe it is necessary to also consider the time the first packet > was queued and to transmit if that is too long in the past? The gadget side does not (any more) adjust the timer (on later packets)... Which is part of what makes it so simple logic wise... This sort of bug would likely be easier to make on the host side cdc_ncm driver... I'm still not following it's logic... > > > But the host side driver seems more complex/confusing. > > However, I've not really ever looked at it previously though, so it > > might just be that... > > How can Hiago determine that? If you do a ping from the gadget > to the host, tcpdump should show the timing of the echo requests, > shouldn't it? If they arrive simultaneously we at least know which > side the issue is on. I guess you could ping on one side, and run tcpdump on the *other*. Then look at the timestamps on the tcpdump vs ping cli reported... If tcpdump shows ping requests coming in at equal intervals... then the ping sender side + receiver is OK, and the problem must be the return path. > > Regards > Oliver > -- Maciej Żenczykowski, Kernel Networking Developer @ Google ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2023-12-19 16:25 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-12-04 18:37 Question regarding CDC NCM and VNC performance issue Hiago De Franco 2023-12-05 9:50 ` Oliver Neukum 2023-12-05 12:22 ` Hiago De Franco 2023-12-05 12:38 ` Oliver Neukum 2023-12-05 12:55 ` Hiago De Franco 2023-12-05 20:14 ` Hiago De Franco 2023-12-06 12:21 ` Oliver Neukum 2023-12-06 16:23 ` Hiago De Franco 2023-12-07 9:41 ` Oliver Neukum 2023-12-07 10:03 ` Francesco Dolcini 2023-12-07 11:07 ` Oliver Neukum 2023-12-07 11:38 ` Maciej Żenczykowski 2023-12-07 18:57 ` Hiago De Franco 2023-12-07 19:37 ` Maciej Żenczykowski 2023-12-11 20:28 ` Hiago De Franco 2023-12-11 20:44 ` Maciej Żenczykowski 2023-12-12 9:48 ` Oliver Neukum 2023-12-12 20:32 ` Hiago De Franco 2023-12-12 21:21 ` Oliver Neukum 2023-12-13 1:11 ` Maciej Żenczykowski 2023-12-15 21:20 ` Hiago De Franco 2023-12-15 21:44 ` Maciej Żenczykowski 2023-12-18 14:34 ` Oliver Neukum 2023-12-18 20:59 ` Hiago De Franco 2023-12-18 23:00 ` Maciej Żenczykowski 2023-12-19 8:45 ` Oliver Neukum 2023-12-19 12:19 ` Maciej Żenczykowski 2023-12-19 13:24 ` Oliver Neukum 2023-12-19 16:25 ` Maciej Żenczykowski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox