Question regarding CDC NCM and VNC performance issue

Linux USB
 help / color / mirror / Atom feed

* Question regarding CDC NCM and VNC performance issue
@ 2023-12-04 18:37 Hiago De Franco
  2023-12-05  9:50 ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-04 18:37 UTC (permalink / raw)
  To: linux-usb
  Cc: Oliver Neukum, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Hiago De Franco

From: Hiago De Franco <hiago.franco@toradex.com>

Hi,

I hope this email finds you well. I would like to ask a question
regarding the usage of CDC NCM and VCN on an ARM device running the Linux
kernel, connected to a host PC, also running Linux, via USB.

I have successfully set up a CDC NCM connection, where my device acts as
a DHCP server, and the connection is established without any issues.

Now, I have a VNC server running on the device. However, when I connect to
the VNC server, the performance is extremely slow and it also freezes,
making it unusable. Interestingly, when I simultaneously launch iperf3
using the same NCM connection, the VNC performance improves significantly.
The VNC remains smooth as long as iperf3 is active. When iperf3 finishes,
the VNC returns to its frozen or slow state.

I've tested the same setup with Windows 11, and in that case, the VNC
operates normally without any issues. I've also compared the network
traffic of both Windows and Linux connections using Wireshark but haven't
identified anything relevant.

Has anyone encountered a similar issue before? Could this be related to
the size or quantity of transmitted packages? 

Any insights or assistance on this matter would be greatly appreciated.

Thanks,
Hiago

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-04 18:37 Question regarding CDC NCM and VNC performance issue Hiago De Franco
@ 2023-12-05  9:50 ` Oliver Neukum
  2023-12-05 12:22   ` Hiago De Franco
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-05  9:50 UTC (permalink / raw)
  To: Hiago De Franco, linux-usb
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hiago De Franco

On 04.12.23 19:37, Hiago De Franco wrote:

> 
> Has anyone encountered a similar issue before? Could this be related to
> the size or quantity of transmitted packages?

At first thought my gut feeling is that the packet bonding is killing
your performance. What does a simple ping do?

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-05  9:50 ` Oliver Neukum
@ 2023-12-05 12:22   ` Hiago De Franco
  2023-12-05 12:38     ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-05 12:22 UTC (permalink / raw)
  To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb,
	pabeni

From: Hiago De Franco <hiago.franco@toradex.com>

Hi Oliver,

> At first thought my gut feeling is that the packet bonding is killing
> your performance. What does a simple ping do?

Thanks for the reply.

Pinging the device works with or without the VNC connected.

$ ping 192.168.11.2
PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=0.092 ms
64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=0.072 ms
...
64 bytes from 192.168.11.2: icmp_seq=37 ttl=64 time=0.065 ms
64 bytes from 192.168.11.2: icmp_seq=38 ttl=64 time=0.070 ms
64 bytes from 192.168.11.2: icmp_seq=39 ttl=64 time=0.090 ms
^C
--- 192.168.11.2 ping statistics ---
39 packets transmitted, 39 received, 0% packet loss, time 38910ms
rtt min/avg/max/mdev = 0.027/0.076/0.105/0.015 ms

During the ping, nothing changes, VNC continues to be extremely slow.

I switched from NCM to ECM for testing purposes. On ECM, everything works
seamlessly without any issues. It appears that the problem may be specific
to the combination of NCM and the Linux host machine, as the setup works
fine on Windows.

I'm curious if there are issues with NCM and small USB packet lengths.
The VNC application tends to send numerous small packets (1514 bytes).

Regards,
Hiago

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-05 12:22   ` Hiago De Franco
@ 2023-12-05 12:38     ` Oliver Neukum
  2023-12-05 12:55       ` Hiago De Franco
  2023-12-05 20:14       ` Hiago De Franco
  0 siblings, 2 replies; 29+ messages in thread
From: Oliver Neukum @ 2023-12-05 12:38 UTC (permalink / raw)
  To: Hiago De Franco, oneukum
  Cc: davem, edumazet, hiago.franco, kuba, linux-usb, pabeni



On 05.12.23 13:22, Hiago De Franco wrote:

Hi,

> During the ping, nothing changes, VNC continues to be extremely slow.

I was wondering whether you ping itself has bad delays.

> I switched from NCM to ECM for testing purposes. On ECM, everything works
> seamlessly without any issues. It appears that the problem may be specific
> to the combination of NCM and the Linux host machine, as the setup works
> fine on Windows.
> 
> I'm curious if there are issues with NCM and small USB packet lengths.
> The VNC application tends to send numerous small packets (1514 bytes).

1514 is a standard ethernet packet. What is wNtbOutMaxDatagrams on your
device?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-05 12:38     ` Oliver Neukum
@ 2023-12-05 12:55       ` Hiago De Franco
  2023-12-05 20:14       ` Hiago De Franco
  1 sibling, 0 replies; 29+ messages in thread
From: Hiago De Franco @ 2023-12-05 12:55 UTC (permalink / raw)
  To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb,
	pabeni

From: Hiago De Franco <hiago.franco@toradex.com>

Hi Oliver,

In my last reply, I sent the ping command with the wrong target. This is
the correct one:

$ ping 192.168.11.1
PING 192.168.11.1 (192.168.11.1) 56(84) bytes of data.
...
64 bytes from 192.168.11.1: icmp_seq=23 ttl=64 time=2.23 ms
64 bytes from 192.168.11.1: icmp_seq=24 ttl=64 time=2.27 ms
64 bytes from 192.168.11.1: icmp_seq=25 ttl=64 time=2.14 ms
64 bytes from 192.168.11.1: icmp_seq=26 ttl=64 time=2.24 ms
64 bytes from 192.168.11.1: icmp_seq=27 ttl=64 time=2.26 ms
64 bytes from 192.168.11.1: icmp_seq=28 ttl=64 time=2.14 ms
64 bytes from 192.168.11.1: icmp_seq=29 ttl=64 time=3106 ms
64 bytes from 192.168.11.1: icmp_seq=30 ttl=64 time=2097 ms
64 bytes from 192.168.11.1: icmp_seq=31 ttl=64 time=1073 ms
64 bytes from 192.168.11.1: icmp_seq=32 ttl=64 time=48.7 ms
64 bytes from 192.168.11.1: icmp_seq=33 ttl=64 time=2791 ms
64 bytes from 192.168.11.1: icmp_seq=34 ttl=64 time=1781 ms
64 bytes from 192.168.11.1: icmp_seq=35 ttl=64 time=757 ms
64 bytes from 192.168.11.1: icmp_seq=36 ttl=64 time=6.48 ms
64 bytes from 192.168.11.1: icmp_seq=37 ttl=64 time=5828 ms
64 bytes from 192.168.11.1: icmp_seq=38 ttl=64 time=4821 ms
64 bytes from 192.168.11.1: icmp_seq=39 ttl=64 time=3793 ms
64 bytes from 192.168.11.1: icmp_seq=40 ttl=64 time=2773 ms
64 bytes from 192.168.11.1: icmp_seq=41 ttl=64 time=1745 ms
64 bytes from 192.168.11.1: icmp_seq=42 ttl=64 time=725 ms
64 bytes from 192.168.11.1: icmp_seq=43 ttl=64 time=6.11 ms
64 bytes from 192.168.11.1: icmp_seq=44 ttl=64 time=2467 ms
64 bytes from 192.168.11.1: icmp_seq=45 ttl=64 time=1457 ms
64 bytes from 192.168.11.1: icmp_seq=46 ttl=64 time=437 ms
64 bytes from 192.168.11.1: icmp_seq=47 ttl=64 time=8.34 ms
64 bytes from 192.168.11.1: icmp_seq=48 ttl=64 time=2403 ms
64 bytes from 192.168.11.1: icmp_seq=49 ttl=64 time=1397 ms
64 bytes from 192.168.11.1: icmp_seq=50 ttl=64 time=373 ms
64 bytes from 192.168.11.1: icmp_seq=51 ttl=64 time=3212 ms
64 bytes from 192.168.11.1: icmp_seq=52 ttl=64 time=2197 ms
64 bytes from 192.168.11.1: icmp_seq=53 ttl=64 time=1169 ms
64 bytes from 192.168.11.1: icmp_seq=54 ttl=64 time=145 ms
64 bytes from 192.168.11.1: icmp_seq=55 ttl=64 time=7.27 ms
64 bytes from 192.168.11.1: icmp_seq=56 ttl=64 time=2303 ms
64 bytes from 192.168.11.1: icmp_seq=57 ttl=64 time=1301 ms
64 bytes from 192.168.11.1: icmp_seq=58 ttl=64 time=273 ms
64 bytes from 192.168.11.1: icmp_seq=59 ttl=64 time=2.01 ms
64 bytes from 192.168.11.1: icmp_seq=60 ttl=64 time=2335 ms
64 bytes from 192.168.11.1: icmp_seq=61 ttl=64 time=1333 ms
64 bytes from 192.168.11.1: icmp_seq=62 ttl=64 time=305 ms
64 bytes from 192.168.11.1: icmp_seq=63 ttl=64 time=2.14 ms
64 bytes from 192.168.11.1: icmp_seq=64 ttl=64 time=2.26 ms
64 bytes from 192.168.11.1: icmp_seq=65 ttl=64 time=2.25 ms
64 bytes from 192.168.11.1: icmp_seq=66 ttl=64 time=2.19 ms
64 bytes from 192.168.11.1: icmp_seq=67 ttl=64 time=2.20 ms
64 bytes from 192.168.11.1: icmp_seq=68 ttl=64 time=2.24 ms
64 bytes from 192.168.11.1: icmp_seq=69 ttl=64 time=2.21 ms
64 bytes from 192.168.11.1: icmp_seq=70 ttl=64 time=2.11 ms
64 bytes from 192.168.11.1: icmp_seq=71 ttl=64 time=2.19 ms
64 bytes from 192.168.11.1: icmp_seq=72 ttl=64 time=2.19 ms
^C
--- 192.168.11.1 ping statistics ---
72 packets transmitted, 72 received, 0% packet loss, time 71445ms
rtt min/avg/max/mdev = 2.011/757.798/5827.620/1266.170 ms, pipe 6

So everything is going well, around 2.1 ms, until I connect to VNC. When
this happens, the ping spikes to 1000~2000 ms. Then when I close the
connection, it gets back to ~2 ms. I can also see some "TCP
Retransmission" on Wireshark.

On 05.12.23 13:38, Oliver Neukum wrote:

> 1514 is a standard ethernet packet. What is wNtbOutMaxDatagrams on your
> device?

I couldn't find this file on my device. On my Host Linux machine it
reports 0.

Regards,
Hiago

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-05 12:38     ` Oliver Neukum
  2023-12-05 12:55       ` Hiago De Franco
@ 2023-12-05 20:14       ` Hiago De Franco
  2023-12-06 12:21         ` Oliver Neukum
  1 sibling, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-05 20:14 UTC (permalink / raw)
  To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb,
	pabeni

From: Hiago De Franco <hiago.franco@toradex.com>

Hi Oliver,

Following up on your initial email, I found that adjusting the NTB size
successfully resolved the problem on both Linux and Windows. The VNC
client is now operating with speed and stability after dividing the
package size by 8.

I know that altering the NTB size is not the correct solution, please
consider the attached patch as an explanation, but the assumption that the
issue may be related to an accumulation of small packages, causing delays
in fulfilling the 16KB requirement, seems related.

It appears that the proper fix involves the `hrtimer_start` function on
line 1084 of `f_ncm.c`. However, I'm not sure what I could do to fix it.

Thanks and regards,
Hiago.

 drivers/usb/gadget/function/f_ncm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index cc0ed29a4adc..138982da794a 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -88,8 +88,8 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f)
  * If the host can group frames, allow it to do that, 16K is selected,
  * because it's used by default by the current linux host driver
  */
-#define NTB_DEFAULT_IN_SIZE	16384
-#define NTB_OUT_SIZE		16384
+#define NTB_DEFAULT_IN_SIZE	2048
+#define NTB_OUT_SIZE		2048

 /* Allocation for storing the NDP, 32 should suffice for a
  * 16k packet. This allows a maximum of 32 * 507 Byte packets to
-- 
2.42.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-05 20:14       ` Hiago De Franco
@ 2023-12-06 12:21         ` Oliver Neukum
  2023-12-06 16:23           ` Hiago De Franco
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-06 12:21 UTC (permalink / raw)
  To: Hiago De Franco; +Cc: davem, edumazet, hiago.franco, kuba, linux-usb, pabeni



On 05.12.23 21:14, Hiago De Franco wrote:
> From: Hiago De Franco <hiago.franco@toradex.com>
> 
> Hi Oliver,
> 
> Following up on your initial email, I found that adjusting the NTB size
> successfully resolved the problem on both Linux and Windows. The VNC
> client is now operating with speed and stability after dividing the
> package size by 8.

OK, that indicates that your performance is bad because your VNC client
was only partially filling the buffers, which caused most of your packets
to be transmitted only upon a timeout.

At least this is my working hypothesis.

> I know that altering the NTB size is not the correct solution, please
> consider the attached patch as an explanation, but the assumption that the
> issue may be related to an accumulation of small packages, causing delays
> in fulfilling the 16KB requirement, seems related.

Before we proceed, however, your patch raises a fundamental question:
Are we talking about gadget or host mode here?
Could you please describe your setup in detail?

	Regards
		Oliver



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-06 12:21         ` Oliver Neukum
@ 2023-12-06 16:23           ` Hiago De Franco
  2023-12-07  9:41             ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-06 16:23 UTC (permalink / raw)
  To: oneukum; +Cc: davem, edumazet, hiago.franco, hiagofranco, kuba, linux-usb,
	pabeni

From: Hiago De Franco <hiago.franco@toradex.com>

Hi Oliver,

On 06.12.23 13:21, Oliver Neukum wrote:
> Before we proceed, however, your patch raises a fundamental question:
> Are we talking about gadget or host mode here?
> Could you please describe your setup in detail?

Sure, I'm running a Apalis iMX6QD 512MB, which is basically an ARM 32-bits
device with NXP iMX6QD. This device is running Linux with wayland, weston
and the neatvnc server, so I can access my app using a vnc client on my
host PC (also running Linux, Debian 12).

The device is connected to my host pc using a USB cable. The gadget is my
ARM device, which I'm configuring using the libusbgx to create a new
gadget called 'gncm'. The host, in this case, is my Debian PC. My device
is automatically configured to use the ip 192.168.11.1 and give to my PC,
using the udhcpd from busybox, the ip 192.168.11.2.

Everything works, when I connect the ARM device to my PC with the USB
cable, my Debian automatically detects as a new interface and get the
192.168.11.2 ip:

39: enx4224347674f3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
fq_codel state UP group default qlen 1000
    link/ether 42:24:34:76:74:f3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.2/24 brd 192.168.11.255 scope global dynamic
    noprefixroute enx4224347674f3
       valid_lft 863996sec preferred_lft 863996sec
    inet6 fe80::8959:d7da:f1fc:774e/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

While on the device:

/ #
3: usb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen
1000
    link/ether 7a:c5:38:46:21:e5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.1/24 scope global usb0
       valid_lft forever preferred_lft forever

/ # cat /sys/class/udc/ci_hdrc.0/function
gncm
/ # ls /sys/kernel/config/usb_gadget/gncm/
UDC              bMaxPacketSize0  functions        os_desc
bDeviceClass     bcdDevice        idProduct        strings
bDeviceProtocol  bcdUSB           idVendor
bDeviceSubClass  configs          max_speed

Finally, I open a VNC client on my host PC and connect to 192.168.11.1,
where the issue begins with it being very slow/frozen. After applying the
patch on the device (Apalis iMX6), now VNC runs smoothly.

Now I will also try to debug the driver on the PC host side.

Thanks and regards,
Hiago

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-06 16:23           ` Hiago De Franco
@ 2023-12-07  9:41             ` Oliver Neukum
  2023-12-07 10:03               ` Francesco Dolcini
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-07  9:41 UTC (permalink / raw)
  To: Hiago De Franco, oneukum, Maciej Żenczykowski
  Cc: davem, edumazet, hiago.franco, kuba, linux-usb, pabeni



On 06.12.23 17:23, Hiago De Franco wrote:
> From: Hiago De Franco <hiago.franco@toradex.com>
> 

Good morning,

> Hi Oliver,
> 
> On 06.12.23 13:21, Oliver Neukum wrote:
>> Before we proceed, however, your patch raises a fundamental question:
>> Are we talking about gadget or host mode here?
>> Could you please describe your setup in detail?
> 
> Sure, I'm running a Apalis iMX6QD 512MB, which is basically an ARM 32-bits
> device with NXP iMX6QD. This device is running Linux with wayland, weston
> and the neatvnc server, so I can access my app using a vnc client on my
> host PC (also running Linux, Debian 12).

OK, you are using Linux on _both_ sides. Interesting, far from obvious, though.
(Putting Maciej into CC)  
> Finally, I open a VNC client on my host PC and connect to 192.168.11.1,
> where the issue begins with it being very slow/frozen. After applying the
> patch on the device (Apalis iMX6), now VNC runs smoothly.

For debugging purposes your first patch modifies the result of
USB_CDC_GET_NTB_PARAMETERS. Thereby both the host and the gadget change
their behavior. Hence we cannot tell which side has a problem with its
buffering at the old buffer size.

Can you tell us which side in your test case produces many small packets?

Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in
f_ncm.c by an order of magnitude.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-07  9:41             ` Oliver Neukum
@ 2023-12-07 10:03               ` Francesco Dolcini
  2023-12-07 11:07                 ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Francesco Dolcini @ 2023-12-07 10:03 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hiago De Franco, Maciej Żenczykowski, davem, edumazet,
	hiago.franco, kuba, linux-usb, pabeni

Hello Oliver and Maciej

On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:
> On 06.12.23 17:23, Hiago De Franco wrote:
> > From: Hiago De Franco <hiago.franco@toradex.com>
> > On 06.12.23 13:21, Oliver Neukum wrote:
> > > Before we proceed, however, your patch raises a fundamental question:
> > > Are we talking about gadget or host mode here?
> > > Could you please describe your setup in detail?
> > 
> > Sure, I'm running a Apalis iMX6QD 512MB, which is basically an ARM 32-bits
> > device with NXP iMX6QD. This device is running Linux with wayland, weston
> > and the neatvnc server, so I can access my app using a vnc client on my
> > host PC (also running Linux, Debian 12).
> 
> OK, you are using Linux on _both_ sides. Interesting, far from obvious, though.
> (Putting Maciej into CC)

One more data point. If the USB host side is Windows and not Linux it
works fine.

Francesco


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-07 10:03               ` Francesco Dolcini
@ 2023-12-07 11:07                 ` Oliver Neukum
  2023-12-07 11:38                   ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-07 11:07 UTC (permalink / raw)
  To: Francesco Dolcini, Oliver Neukum
  Cc: Hiago De Franco, Maciej Żenczykowski, davem, edumazet,
	hiago.franco, kuba, linux-usb, pabeni



On 07.12.23 11:03, Francesco Dolcini wrote:
> Hello Oliver and Maciej
> 
> On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:

>> OK, you are using Linux on _both_ sides. Interesting, far from obvious, though.
>> (Putting Maciej into CC)
> 
> One more data point. If the USB host side is Windows and not Linux it
> works fine.

That suggests, but does not prove that the issue is on the host side.
Could you post the result of "ethtool -S" after a test run? We should
get statistics on the reasons for transmissions that way.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-07 11:07                 ` Oliver Neukum
@ 2023-12-07 11:38                   ` Maciej Żenczykowski
  2023-12-07 18:57                     ` Hiago De Franco
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-07 11:38 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Francesco Dolcini, Hiago De Franco, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Thu, Dec 7, 2023 at 12:07 PM Oliver Neukum <oneukum@suse.com> wrote:
> On 07.12.23 11:03, Francesco Dolcini wrote:
> > Hello Oliver and Maciej
> >
> > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:
>
> >> OK, you are using Linux on _both_ sides. Interesting, far from obvious, though.
> >> (Putting Maciej into CC)
> >
> > One more data point. If the USB host side is Windows and not Linux it
> > works fine.
>
> That suggests, but does not prove that the issue is on the host side.
> Could you post the result of "ethtool -S" after a test run? We should
> get statistics on the reasons for transmissions that way.

An every 1s (the default) ping is too rare to be of help I'd assume...
Try ping with various intervals (-i).  All the way down to a flood ping (-f).
Most likely -i 0.01 would be enough to make things work better...

Also which specific versions of the kernel are involved on both sides
of the connection.

There was a pretty recent fix related to packet aggregation recently
that could be either the fix or the cause.
  "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call"
Though I doubt it - I believe that was specific to how windows packs things.

Also Krishna Kurapati has a (afaik still not merged) patch "usb:
gadget: ncm: Add support to configure wMaxSegmentSize"
that could be of use - though again, doubt it.

Another thing that comes to mind, is that perhaps the device in
question does not have sufficiently high res timers?
There might be something in the kernel boot log / dmesg about hrtimer
resolution...
Maybe this just needs to be configurable...  Or pick a smaller value
with broken hrtimer (if that's the issue),
or just disable aggregation if lowres hrtimers... etc...

#define TX_TIMEOUT_NSECS 300000
300 us is too small to be noticeable by VNC imho, so I think something
*must* be misbehaving.
Perhaps timer resolution is bad and this 300us ends up being much larger...???

I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME
instead of CLOCK_MONOTONIC.
There could potentially be an issue with suspend, though I really doubt it.

Another idea would be to add a gadget setting to disable tx
aggregation entirely...
(note that reducing from 8000 to 2000 doesn't actually turn off aggregation...)

Have you tried reducing from 8000 to 4000 or 3500 or 3000?
Maybe there's some funkiness with page sizes??

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-07 11:38                   ` Maciej Żenczykowski
@ 2023-12-07 18:57                     ` Hiago De Franco
  2023-12-07 19:37                       ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-07 18:57 UTC (permalink / raw)
  To: Oliver Neukum, Maciej Żenczykowski
  Cc: Francesco Dolcini, Hiago De Franco, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

Hello Oliver and Maciej,

On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:
> Can you tell us which side in your test case produces many small packets?
> 
> Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in
> f_ncm.c by an order of magnitude.

The device side, Apalis iMX6, is the one producing the small packages
and sending to my host PC. The VNC server is running on iMX6 device and,
the client, on my host PC.

I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing
changed, the behaviour is the same.

On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote:
> That suggests, but does not prove that the issue is on the host side.
> Could you post the result of "ethtool -S" after a test run? We should
> get statistics on the reasons for transmissions that way.

Sure, for some reason I couldn't run ethtool on the iMX6 device:

/ # ethtool -S usb0
no stats available

So I ran everything on my Debian host PC. First, without any changes on
the device's kernel, this is the result (where the VNC is really
slow/frozen):

$ sudo ethtool -S enx3e5dcdead75e
NIC statistics:
     tx_reason_ntb_full: 0
     tx_reason_ndp_full: 0
     tx_reason_timeout: 222
     tx_reason_max_datagram: 0
     tx_overhead: 42387
     tx_ntbs: 222
     rx_overhead: 38375
     rx_ntbs: 529

Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to
8192. The performance improved a bit, and this is the result:

$ sudo ethtool -S enx42ff68c1000a
NIC statistics:
     tx_reason_ntb_full: 0
     tx_reason_ndp_full: 0
     tx_reason_timeout: 321
     tx_reason_max_datagram: 0
     tx_overhead: 61617
     tx_ntbs: 321
     rx_overhead: 59341
     rx_ntbs: 759

Finally, I changed from 8192 to 4096, and the perfomance was 
better:

$ sudo ethtool -S enx3a601e306de1
NIC statistics:
     tx_reason_ntb_full: 0
     tx_reason_ndp_full: 0
     tx_reason_timeout: 56067
     tx_reason_max_datagram: 0
     tx_overhead: 83630876
     tx_ntbs: 56064
     rx_overhead: 25437595
     rx_ntbs: 847920

At 4096 I can use the VNC with my app, click on buttons and see the mouse
moving smoothly. Please note the device name changes because we're using
random MAC addresses. 'ethtool' was running on my Debian host PC. I tested
for 1min30s and then got the statics with ethtool for all 3 tests.

On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote:
> 
> An every 1s (the default) ping is too rare to be of help I'd assume...
> Try ping with various intervals (-i).  All the way down to a flood ping (-f).
> Most likely -i 0.01 would be enough to make things work better...

This one is interesting, backing to the default value of 16384, I
launched the VNC client which now is back to be slow/frozen. In
parallel, I started the ping command. The first one with 1 second
results the following (I'm pingging my host PC, using my device's
terminal):

/ # ping 192.168.11.2 -i 1
PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms
64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms
64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms
64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms
64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms
^C
--- 192.168.11.2 ping statistics ---
26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms
rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5

Ping is really slow and lost almost all packets. Next test, I decreased
to 0.1s:

/ # ping 192.168.11.2 -i 0.1
PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
...
--- 192.168.11.2 ping statistics ---
129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms
rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26

While ping is running in parallel, VNC has a better performance, I can
see my mouse running and click on some buttons. As soon as ping stops,
VNC is slow/frozen again. Also we have less packet loss.

Next test, decreased to 0.01:

/ # ping 192.168.11.2 -i 0.01
PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
...
--- 192.168.11.2 ping statistics ---
584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms
rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28

And finally, the flood:

/ # ping 192.168.11.2 -f
PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
...
--- 192.168.11.2 ping statistics ---
1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms
rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma
12.413/7.802 ms

While the flood is happening in parallel, the VNC runs very smoothly,
and, again, as soon as it stops, it's back to slow/frozen.

I believe here the ping command is helping to fullfil the buffer, that's
why running it on parallel makes the VNC work...

> Also which specific versions of the kernel are involved on both sides
> of the connection.

Device iMX6 is running Linux kernel v6.1.65, while my host PC is running
Linux kernel v6.5.0.

> There was a pretty recent fix related to packet aggregation recently
> that could be either the fix or the cause.
>   "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call"
> Though I doubt it - I believe that was specific to how windows packs things.
> 
> Also Krishna Kurapati has a (afaik still not merged) patch "usb:
> gadget: ncm: Add support to configure wMaxSegmentSize"
> that could be of use - though again, doubt it.

I could also try to apply these patches and check how it goes. Thanks
for the information.

> Another thing that comes to mind, is that perhaps the device in
> question does not have sufficiently high res timers?
> There might be something in the kernel boot log / dmesg about hrtimer
> resolution...
> Maybe this just needs to be configurable...  Or pick a smaller value
> with broken hrtimer (if that's the issue),
> or just disable aggregation if lowres hrtimers... etc...
> 
> #define TX_TIMEOUT_NSECS 300000
> 300 us is too small to be noticeable by VNC imho, so I think something
> *must* be misbehaving.
> Perhaps timer resolution is bad and this 300us ends up being much larger...???

This is what I got from dmesg inside the iMX6 device:

/ # dmesg | grep timer
[    0.000000] Switching to timer-based delay loop, resolution 333ns
[    0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 637086815595 ns
[    0.001545] Calibrating delay loop (skipped), value calculated using
timer frequency.. 6.00 BogoMIPS (lpj=30000)
[    0.203469] clocksource: Switched to clocksource mxc_timer1

> I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME
> instead of CLOCK_MONOTONIC.
> There could potentially be an issue with suspend, though I really doubt it.

Also tested this, but it didn't change anything, VNC is still
slow/frozen.

> Another idea would be to add a gadget setting to disable tx
> aggregation entirely...
> (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...)
> 
> Have you tried reducing from 8000 to 4000 or 3500 or 3000?
> Maybe there's some funkiness with page sizes??

Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE?

If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I
reduced the value, it got better and better, VNC running smoothly.

I will keep testing to see if I get something.

Thanks and regards,
Hiago.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-07 18:57                     ` Hiago De Franco
@ 2023-12-07 19:37                       ` Maciej Żenczykowski
  2023-12-11 20:28                         ` Hiago De Franco
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-07 19:37 UTC (permalink / raw)
  To: Hiago De Franco
  Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
>
> Hello Oliver and Maciej,
>
> On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:
> > Can you tell us which side in your test case produces many small packets?
> >
> > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in
> > f_ncm.c by an order of magnitude.
>
> The device side, Apalis iMX6, is the one producing the small packages
> and sending to my host PC. The VNC server is running on iMX6 device and,
> the client, on my host PC.
>
> I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing
> changed, the behaviour is the same.
>
> On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote:
> > That suggests, but does not prove that the issue is on the host side.
> > Could you post the result of "ethtool -S" after a test run? We should
> > get statistics on the reasons for transmissions that way.
>
> Sure, for some reason I couldn't run ethtool on the iMX6 device:
>
> / # ethtool -S usb0
> no stats available
>
> So I ran everything on my Debian host PC. First, without any changes on
> the device's kernel, this is the result (where the VNC is really
> slow/frozen):
>
> $ sudo ethtool -S enx3e5dcdead75e
> NIC statistics:
>      tx_reason_ntb_full: 0
>      tx_reason_ndp_full: 0
>      tx_reason_timeout: 222
>      tx_reason_max_datagram: 0
>      tx_overhead: 42387
>      tx_ntbs: 222
>      rx_overhead: 38375
>      rx_ntbs: 529
>
> Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to
> 8192. The performance improved a bit, and this is the result:
>
> $ sudo ethtool -S enx42ff68c1000a
> NIC statistics:
>      tx_reason_ntb_full: 0
>      tx_reason_ndp_full: 0
>      tx_reason_timeout: 321
>      tx_reason_max_datagram: 0
>      tx_overhead: 61617
>      tx_ntbs: 321
>      rx_overhead: 59341
>      rx_ntbs: 759
>
> Finally, I changed from 8192 to 4096, and the perfomance was
> better:
>
> $ sudo ethtool -S enx3a601e306de1
> NIC statistics:
>      tx_reason_ntb_full: 0
>      tx_reason_ndp_full: 0
>      tx_reason_timeout: 56067
>      tx_reason_max_datagram: 0
>      tx_overhead: 83630876
>      tx_ntbs: 56064
>      rx_overhead: 25437595
>      rx_ntbs: 847920
>
> At 4096 I can use the VNC with my app, click on buttons and see the mouse
> moving smoothly. Please note the device name changes because we're using
> random MAC addresses. 'ethtool' was running on my Debian host PC. I tested
> for 1min30s and then got the statics with ethtool for all 3 tests.
>
> On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote:
> >
> > An every 1s (the default) ping is too rare to be of help I'd assume...
> > Try ping with various intervals (-i).  All the way down to a flood ping (-f).
> > Most likely -i 0.01 would be enough to make things work better...
>
> This one is interesting, backing to the default value of 16384, I
> launched the VNC client which now is back to be slow/frozen. In
> parallel, I started the ping command. The first one with 1 second
> results the following (I'm pingging my host PC, using my device's
> terminal):
>
> / # ping 192.168.11.2 -i 1
> PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms
> 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms
> 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms
> 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms
> 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms
> ^C
> --- 192.168.11.2 ping statistics ---
> 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms
> rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5
>
> Ping is really slow and lost almost all packets.

It is *slow*.  But I don't think you're actually losing packets.
They simply haven't arrived *yet*.
Notice that the packets are sent every second, but they arrive after
~5/~4/~3/~2/~1 s.
This suggests they all got delivered in one burst.

You are missing 21 packets, but it's all the *final* 21 packets.
You'd need to run for a long long time (maybe 5 minutes or even more)
and see what happens then...
to see if the packets are indeed lost, or just not yet delivered...

> Next test, I decreased
> to 0.1s:
>
> / # ping 192.168.11.2 -i 0.1
> PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> ...
> --- 192.168.11.2 ping statistics ---
> 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms
> rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26
>
> While ping is running in parallel, VNC has a better performance, I can
> see my mouse running and click on some buttons. As soon as ping stops,
> VNC is slow/frozen again. Also we have less packet loss.

Again - this packet loss may be just at the tail.
You'd need to run it for much much longer to see if packet loss is high or not.
It may just be that you always lose the last ~7 packets or so...

>
> Next test, decreased to 0.01:
>
> / # ping 192.168.11.2 -i 0.01
> PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> ...
> --- 192.168.11.2 ping statistics ---
> 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms
> rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28
>
> And finally, the flood:
>
> / # ping 192.168.11.2 -f
> PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> ...
> --- 192.168.11.2 ping statistics ---
> 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms
> rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma
> 12.413/7.802 ms
>
> While the flood is happening in parallel, the VNC runs very smoothly,
> and, again, as soon as it stops, it's back to slow/frozen.
>
> I believe here the ping command is helping to fullfil the buffer, that's
> why running it on parallel makes the VNC work...
>
> > Also which specific versions of the kernel are involved on both sides
> > of the connection.
>
> Device iMX6 is running Linux kernel v6.1.65, while my host PC is running
> Linux kernel v6.5.0.
>
> > There was a pretty recent fix related to packet aggregation recently
> > that could be either the fix or the cause.
> >   "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call"
> > Though I doubt it - I believe that was specific to how windows packs things.
> >
> > Also Krishna Kurapati has a (afaik still not merged) patch "usb:
> > gadget: ncm: Add support to configure wMaxSegmentSize"
> > that could be of use - though again, doubt it.
>
> I could also try to apply these patches and check how it goes. Thanks
> for the information.
>
> > Another thing that comes to mind, is that perhaps the device in
> > question does not have sufficiently high res timers?
> > There might be something in the kernel boot log / dmesg about hrtimer
> > resolution...
> > Maybe this just needs to be configurable...  Or pick a smaller value
> > with broken hrtimer (if that's the issue),
> > or just disable aggregation if lowres hrtimers... etc...
> >
> > #define TX_TIMEOUT_NSECS 300000
> > 300 us is too small to be noticeable by VNC imho, so I think something
> > *must* be misbehaving.
> > Perhaps timer resolution is bad and this 300us ends up being much larger...???
>
> This is what I got from dmesg inside the iMX6 device:
>
> / # dmesg | grep timer
> [    0.000000] Switching to timer-based delay loop, resolution 333ns
> [    0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 637086815595 ns
> [    0.001545] Calibrating delay loop (skipped), value calculated using
> timer frequency.. 6.00 BogoMIPS (lpj=30000)
> [    0.203469] clocksource: Switched to clocksource mxc_timer1
>
> > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME
> > instead of CLOCK_MONOTONIC.
> > There could potentially be an issue with suspend, though I really doubt it.
>
> Also tested this, but it didn't change anything, VNC is still
> slow/frozen.
>
> > Another idea would be to add a gadget setting to disable tx
> > aggregation entirely...
> > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...)
> >
> > Have you tried reducing from 8000 to 4000 or 3500 or 3000?
> > Maybe there's some funkiness with page sizes??
>
> Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE?

I'm not sure ;-)

> If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I
> reduced the value, it got better and better, VNC running smoothly.
>
> I will keep testing to see if I get something.
>
> Thanks and regards,
> Hiago.

I think we need to figure out whether the problem is host->device or
device->host.
Or perhaps you do already know this??
Maybe this was mentioned already earlier in the thread and I missed it.

Try reducing only in or only out... and changing only one side of the
connection.
Maybe change TX_MAX_NUM_DPE to 1 (or other values)

Maybe change

if (ncm->skb_tx_data
&& (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE
|| (ncm->skb_tx_data->len +
div + rem + skb->len +
ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len))
> max_size)) {

to just if (ncm->skb_tx_data)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-07 19:37                       ` Maciej Żenczykowski
@ 2023-12-11 20:28                         ` Hiago De Franco
  2023-12-11 20:44                           ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-11 20:28 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote:
> On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> >
> > Hello Oliver and Maciej,
> >
> > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:
> > > Can you tell us which side in your test case produces many small packets?
> > >
> > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in
> > > f_ncm.c by an order of magnitude.
> >
> > The device side, Apalis iMX6, is the one producing the small packages
> > and sending to my host PC. The VNC server is running on iMX6 device and,
> > the client, on my host PC.
> >
> > I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing
> > changed, the behaviour is the same.
> >
> > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote:
> > > That suggests, but does not prove that the issue is on the host side.
> > > Could you post the result of "ethtool -S" after a test run? We should
> > > get statistics on the reasons for transmissions that way.
> >
> > Sure, for some reason I couldn't run ethtool on the iMX6 device:
> >
> > / # ethtool -S usb0
> > no stats available
> >
> > So I ran everything on my Debian host PC. First, without any changes on
> > the device's kernel, this is the result (where the VNC is really
> > slow/frozen):
> >
> > $ sudo ethtool -S enx3e5dcdead75e
> > NIC statistics:
> >      tx_reason_ntb_full: 0
> >      tx_reason_ndp_full: 0
> >      tx_reason_timeout: 222
> >      tx_reason_max_datagram: 0
> >      tx_overhead: 42387
> >      tx_ntbs: 222
> >      rx_overhead: 38375
> >      rx_ntbs: 529
> >
> > Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to
> > 8192. The performance improved a bit, and this is the result:
> >
> > $ sudo ethtool -S enx42ff68c1000a
> > NIC statistics:
> >      tx_reason_ntb_full: 0
> >      tx_reason_ndp_full: 0
> >      tx_reason_timeout: 321
> >      tx_reason_max_datagram: 0
> >      tx_overhead: 61617
> >      tx_ntbs: 321
> >      rx_overhead: 59341
> >      rx_ntbs: 759
> >
> > Finally, I changed from 8192 to 4096, and the perfomance was
> > better:
> >
> > $ sudo ethtool -S enx3a601e306de1
> > NIC statistics:
> >      tx_reason_ntb_full: 0
> >      tx_reason_ndp_full: 0
> >      tx_reason_timeout: 56067
> >      tx_reason_max_datagram: 0
> >      tx_overhead: 83630876
> >      tx_ntbs: 56064
> >      rx_overhead: 25437595
> >      rx_ntbs: 847920
> >
> > At 4096 I can use the VNC with my app, click on buttons and see the mouse
> > moving smoothly. Please note the device name changes because we're using
> > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested
> > for 1min30s and then got the statics with ethtool for all 3 tests.
> >
> > On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote:
> > >
> > > An every 1s (the default) ping is too rare to be of help I'd assume...
> > > Try ping with various intervals (-i).  All the way down to a flood ping (-f).
> > > Most likely -i 0.01 would be enough to make things work better...
> >
> > This one is interesting, backing to the default value of 16384, I
> > launched the VNC client which now is back to be slow/frozen. In
> > parallel, I started the ping command. The first one with 1 second
> > results the following (I'm pingging my host PC, using my device's
> > terminal):
> >
> > / # ping 192.168.11.2 -i 1
> > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms
> > 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms
> > 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms
> > 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms
> > 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms
> > ^C
> > --- 192.168.11.2 ping statistics ---
> > 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms
> > rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5
> >
> > Ping is really slow and lost almost all packets.
> 
> It is *slow*.  But I don't think you're actually losing packets.
> They simply haven't arrived *yet*.
> Notice that the packets are sent every second, but they arrive after
> ~5/~4/~3/~2/~1 s.
> This suggests they all got delivered in one burst.
> 
> You are missing 21 packets, but it's all the *final* 21 packets.
> You'd need to run for a long long time (maybe 5 minutes or even more)
> and see what happens then...
> to see if the packets are indeed lost, or just not yet delivered...
>

Great, thanks for the explanation. Indeed, I ran the tests again, but
now I waited 6 minutes for each test, and these are the results:

-i 1
--- 192.168.11.2 ping statistics ---
349 packets transmitted, 349 received, 0% packet loss, time 361724ms
rtt min/avg/max/mdev = 1.597/10751.276/27960.848/12974.107 ms, pipe 27

-i 0.1
--- 192.168.11.2 ping statistics ---
3301 packets transmitted, 3294 received, 0.212057% packet loss, time 361342ms
rtt min/avg/max/mdev = 0.959/1131.714/3081.504/1385.261 ms, pipe 29

-i 0.01
--- 192.168.11.2 ping statistics ---
20718 packets transmitted, 20718 received, 0% packet loss, time 366465ms
rtt min/avg/max/mdev = 0.568/178.764/587.956/234.702 ms, pipe 29

-flood
--- 192.168.11.2 ping statistics ---
29173 packets transmitted, 29149 received, 0.0822679% packet loss, time 362065ms
rtt min/avg/max/mdev = 0.518/127.472/583.971/204.277 ms, pipe 29, ipg/ewma 12.411/21.570 ms

So yes, we can assume 0% of packet loss. Furthermore, I can see the
bursts happening on my terminal, the ping command doesn't do anything
and suddenly all the pings arrive at the same time, over and over again
in a loop.

> > Next test, I decreased
> > to 0.1s:
> >
> > / # ping 192.168.11.2 -i 0.1
> > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > ...
> > --- 192.168.11.2 ping statistics ---
> > 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms
> > rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26
> >
> > While ping is running in parallel, VNC has a better performance, I can
> > see my mouse running and click on some buttons. As soon as ping stops,
> > VNC is slow/frozen again. Also we have less packet loss.
> 
> Again - this packet loss may be just at the tail.
> You'd need to run it for much much longer to see if packet loss is high or not.
> It may just be that you always lose the last ~7 packets or so...
> 
> >
> > Next test, decreased to 0.01:
> >
> > / # ping 192.168.11.2 -i 0.01
> > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > ...
> > --- 192.168.11.2 ping statistics ---
> > 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms
> > rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28
> >
> > And finally, the flood:
> >
> > / # ping 192.168.11.2 -f
> > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > ...
> > --- 192.168.11.2 ping statistics ---
> > 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms
> > rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma
> > 12.413/7.802 ms
> >
> > While the flood is happening in parallel, the VNC runs very smoothly,
> > and, again, as soon as it stops, it's back to slow/frozen.
> >
> > I believe here the ping command is helping to fullfil the buffer, that's
> > why running it on parallel makes the VNC work...
> >
> > > Also which specific versions of the kernel are involved on both sides
> > > of the connection.
> >
> > Device iMX6 is running Linux kernel v6.1.65, while my host PC is running
> > Linux kernel v6.5.0.
> >
> > > There was a pretty recent fix related to packet aggregation recently
> > > that could be either the fix or the cause.
> > >   "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call"
> > > Though I doubt it - I believe that was specific to how windows packs things.
> > >
> > > Also Krishna Kurapati has a (afaik still not merged) patch "usb:
> > > gadget: ncm: Add support to configure wMaxSegmentSize"
> > > that could be of use - though again, doubt it.
> >
> > I could also try to apply these patches and check how it goes. Thanks
> > for the information.
> >
> > > Another thing that comes to mind, is that perhaps the device in
> > > question does not have sufficiently high res timers?
> > > There might be something in the kernel boot log / dmesg about hrtimer
> > > resolution...
> > > Maybe this just needs to be configurable...  Or pick a smaller value
> > > with broken hrtimer (if that's the issue),
> > > or just disable aggregation if lowres hrtimers... etc...
> > >
> > > #define TX_TIMEOUT_NSECS 300000
> > > 300 us is too small to be noticeable by VNC imho, so I think something
> > > *must* be misbehaving.
> > > Perhaps timer resolution is bad and this 300us ends up being much larger...???
> >
> > This is what I got from dmesg inside the iMX6 device:
> >
> > / # dmesg | grep timer
> > [    0.000000] Switching to timer-based delay loop, resolution 333ns
> > [    0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles:
> > 0xffffffff, max_idle_ns: 637086815595 ns
> > [    0.001545] Calibrating delay loop (skipped), value calculated using
> > timer frequency.. 6.00 BogoMIPS (lpj=30000)
> > [    0.203469] clocksource: Switched to clocksource mxc_timer1
> >
> > > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME
> > > instead of CLOCK_MONOTONIC.
> > > There could potentially be an issue with suspend, though I really doubt it.
> >
> > Also tested this, but it didn't change anything, VNC is still
> > slow/frozen.
> >
> > > Another idea would be to add a gadget setting to disable tx
> > > aggregation entirely...
> > > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...)
> > >
> > > Have you tried reducing from 8000 to 4000 or 3500 or 3000?
> > > Maybe there's some funkiness with page sizes??
> >
> > Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE?
> 
> I'm not sure ;-)
> 
> > If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I
> > reduced the value, it got better and better, VNC running smoothly.
> >
> > I will keep testing to see if I get something.
> >
> > Thanks and regards,
> > Hiago.
> 
> I think we need to figure out whether the problem is host->device or
> device->host.
> Or perhaps you do already know this??
> Maybe this was mentioned already earlier in the thread and I missed it.
> 

Actually I still don't know yet.

> Try reducing only in or only out... and changing only one side of the
> connection.

I did this test and it was very interesting, thanks for the
reccomendation. 
It looks like only NTB_DEFAULT_IN_SIZE is relevant for
this issue. I changed NTB_DEFAULT_IN_SIZE from 16384 to 2048 on the
_device_ side (Apalis iMX6) and left the NTB_OUT_SIZE with the default
value of 16384, and now I can use my VNC client on the host side (move the
mouse, click some buttons...). Very fast and responsive I would say.

Does it mean the issue is on the HOST side?

> Maybe change TX_MAX_NUM_DPE to 1 (or other values)
> 
> Maybe change
> 
> if (ncm->skb_tx_data
> && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE
> || (ncm->skb_tx_data->len +
> div + rem + skb->len +
> ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len))
> > max_size)) {
> 
> to just if (ncm->skb_tx_data)

Changing the 'if' statement to only 'if (ncm->skb_tx_data)' also made it
work! VNC running smoothly on my VNC client. Again, I applied this patch
to the device's kernel, Apalis iMX6.

I will put some print debugging here to see if I can figure out
something else.

Thanks a lot,
Hiago.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-11 20:28                         ` Hiago De Franco
@ 2023-12-11 20:44                           ` Maciej Żenczykowski
  2023-12-12  9:48                             ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-11 20:44 UTC (permalink / raw)
  To: Hiago De Franco
  Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
>
> On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote:
> > On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> > >
> > > Hello Oliver and Maciej,
> > >
> > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:
> > > > Can you tell us which side in your test case produces many small packets?
> > > >
> > > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in
> > > > f_ncm.c by an order of magnitude.
> > >
> > > The device side, Apalis iMX6, is the one producing the small packages
> > > and sending to my host PC. The VNC server is running on iMX6 device and,
> > > the client, on my host PC.
> > >
> > > I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing
> > > changed, the behaviour is the same.
> > >
> > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote:
> > > > That suggests, but does not prove that the issue is on the host side.
> > > > Could you post the result of "ethtool -S" after a test run? We should
> > > > get statistics on the reasons for transmissions that way.
> > >
> > > Sure, for some reason I couldn't run ethtool on the iMX6 device:
> > >
> > > / # ethtool -S usb0
> > > no stats available
> > >
> > > So I ran everything on my Debian host PC. First, without any changes on
> > > the device's kernel, this is the result (where the VNC is really
> > > slow/frozen):
> > >
> > > $ sudo ethtool -S enx3e5dcdead75e
> > > NIC statistics:
> > >      tx_reason_ntb_full: 0
> > >      tx_reason_ndp_full: 0
> > >      tx_reason_timeout: 222
> > >      tx_reason_max_datagram: 0
> > >      tx_overhead: 42387
> > >      tx_ntbs: 222
> > >      rx_overhead: 38375
> > >      rx_ntbs: 529
> > >
> > > Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to
> > > 8192. The performance improved a bit, and this is the result:
> > >
> > > $ sudo ethtool -S enx42ff68c1000a
> > > NIC statistics:
> > >      tx_reason_ntb_full: 0
> > >      tx_reason_ndp_full: 0
> > >      tx_reason_timeout: 321
> > >      tx_reason_max_datagram: 0
> > >      tx_overhead: 61617
> > >      tx_ntbs: 321
> > >      rx_overhead: 59341
> > >      rx_ntbs: 759
> > >
> > > Finally, I changed from 8192 to 4096, and the perfomance was
> > > better:
> > >
> > > $ sudo ethtool -S enx3a601e306de1
> > > NIC statistics:
> > >      tx_reason_ntb_full: 0
> > >      tx_reason_ndp_full: 0
> > >      tx_reason_timeout: 56067
> > >      tx_reason_max_datagram: 0
> > >      tx_overhead: 83630876
> > >      tx_ntbs: 56064
> > >      rx_overhead: 25437595
> > >      rx_ntbs: 847920
> > >
> > > At 4096 I can use the VNC with my app, click on buttons and see the mouse
> > > moving smoothly. Please note the device name changes because we're using
> > > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested
> > > for 1min30s and then got the statics with ethtool for all 3 tests.
> > >
> > > On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote:
> > > >
> > > > An every 1s (the default) ping is too rare to be of help I'd assume...
> > > > Try ping with various intervals (-i).  All the way down to a flood ping (-f).
> > > > Most likely -i 0.01 would be enough to make things work better...
> > >
> > > This one is interesting, backing to the default value of 16384, I
> > > launched the VNC client which now is back to be slow/frozen. In
> > > parallel, I started the ping command. The first one with 1 second
> > > results the following (I'm pingging my host PC, using my device's
> > > terminal):
> > >
> > > / # ping 192.168.11.2 -i 1
> > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms
> > > 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms
> > > 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms
> > > 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms
> > > 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms
> > > ^C
> > > --- 192.168.11.2 ping statistics ---
> > > 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms
> > > rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5
> > >
> > > Ping is really slow and lost almost all packets.
> >
> > It is *slow*.  But I don't think you're actually losing packets.
> > They simply haven't arrived *yet*.
> > Notice that the packets are sent every second, but they arrive after
> > ~5/~4/~3/~2/~1 s.
> > This suggests they all got delivered in one burst.
> >
> > You are missing 21 packets, but it's all the *final* 21 packets.
> > You'd need to run for a long long time (maybe 5 minutes or even more)
> > and see what happens then...
> > to see if the packets are indeed lost, or just not yet delivered...
> >
>
> Great, thanks for the explanation. Indeed, I ran the tests again, but
> now I waited 6 minutes for each test, and these are the results:
>
> -i 1
> --- 192.168.11.2 ping statistics ---
> 349 packets transmitted, 349 received, 0% packet loss, time 361724ms
> rtt min/avg/max/mdev = 1.597/10751.276/27960.848/12974.107 ms, pipe 27
>
> -i 0.1
> --- 192.168.11.2 ping statistics ---
> 3301 packets transmitted, 3294 received, 0.212057% packet loss, time 361342ms
> rtt min/avg/max/mdev = 0.959/1131.714/3081.504/1385.261 ms, pipe 29
>
> -i 0.01
> --- 192.168.11.2 ping statistics ---
> 20718 packets transmitted, 20718 received, 0% packet loss, time 366465ms
> rtt min/avg/max/mdev = 0.568/178.764/587.956/234.702 ms, pipe 29
>
> -flood
> --- 192.168.11.2 ping statistics ---
> 29173 packets transmitted, 29149 received, 0.0822679% packet loss, time 362065ms
> rtt min/avg/max/mdev = 0.518/127.472/583.971/204.277 ms, pipe 29, ipg/ewma 12.411/21.570 ms
>
> So yes, we can assume 0% of packet loss. Furthermore, I can see the
> bursts happening on my terminal, the ping command doesn't do anything
> and suddenly all the pings arrive at the same time, over and over again
> in a loop.

Ok, this is great to know.  No loss, just very very high delay.

> > > Next test, I decreased
> > > to 0.1s:
> > >
> > > / # ping 192.168.11.2 -i 0.1
> > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > ...
> > > --- 192.168.11.2 ping statistics ---
> > > 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms
> > > rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26
> > >
> > > While ping is running in parallel, VNC has a better performance, I can
> > > see my mouse running and click on some buttons. As soon as ping stops,
> > > VNC is slow/frozen again. Also we have less packet loss.
> >
> > Again - this packet loss may be just at the tail.
> > You'd need to run it for much much longer to see if packet loss is high or not.
> > It may just be that you always lose the last ~7 packets or so...
> >
> > >
> > > Next test, decreased to 0.01:
> > >
> > > / # ping 192.168.11.2 -i 0.01
> > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > ...
> > > --- 192.168.11.2 ping statistics ---
> > > 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms
> > > rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28
> > >
> > > And finally, the flood:
> > >
> > > / # ping 192.168.11.2 -f
> > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > ...
> > > --- 192.168.11.2 ping statistics ---
> > > 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms
> > > rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma
> > > 12.413/7.802 ms
> > >
> > > While the flood is happening in parallel, the VNC runs very smoothly,
> > > and, again, as soon as it stops, it's back to slow/frozen.
> > >
> > > I believe here the ping command is helping to fullfil the buffer, that's
> > > why running it on parallel makes the VNC work...
> > >
> > > > Also which specific versions of the kernel are involved on both sides
> > > > of the connection.
> > >
> > > Device iMX6 is running Linux kernel v6.1.65, while my host PC is running
> > > Linux kernel v6.5.0.
> > >
> > > > There was a pretty recent fix related to packet aggregation recently
> > > > that could be either the fix or the cause.
> > > >   "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call"
> > > > Though I doubt it - I believe that was specific to how windows packs things.
> > > >
> > > > Also Krishna Kurapati has a (afaik still not merged) patch "usb:
> > > > gadget: ncm: Add support to configure wMaxSegmentSize"
> > > > that could be of use - though again, doubt it.
> > >
> > > I could also try to apply these patches and check how it goes. Thanks
> > > for the information.
> > >
> > > > Another thing that comes to mind, is that perhaps the device in
> > > > question does not have sufficiently high res timers?
> > > > There might be something in the kernel boot log / dmesg about hrtimer
> > > > resolution...
> > > > Maybe this just needs to be configurable...  Or pick a smaller value
> > > > with broken hrtimer (if that's the issue),
> > > > or just disable aggregation if lowres hrtimers... etc...
> > > >
> > > > #define TX_TIMEOUT_NSECS 300000
> > > > 300 us is too small to be noticeable by VNC imho, so I think something
> > > > *must* be misbehaving.
> > > > Perhaps timer resolution is bad and this 300us ends up being much larger...???
> > >
> > > This is what I got from dmesg inside the iMX6 device:
> > >
> > > / # dmesg | grep timer
> > > [    0.000000] Switching to timer-based delay loop, resolution 333ns
> > > [    0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles:
> > > 0xffffffff, max_idle_ns: 637086815595 ns
> > > [    0.001545] Calibrating delay loop (skipped), value calculated using
> > > timer frequency.. 6.00 BogoMIPS (lpj=30000)
> > > [    0.203469] clocksource: Switched to clocksource mxc_timer1
> > >
> > > > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME
> > > > instead of CLOCK_MONOTONIC.
> > > > There could potentially be an issue with suspend, though I really doubt it.
> > >
> > > Also tested this, but it didn't change anything, VNC is still
> > > slow/frozen.
> > >
> > > > Another idea would be to add a gadget setting to disable tx
> > > > aggregation entirely...
> > > > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...)
> > > >
> > > > Have you tried reducing from 8000 to 4000 or 3500 or 3000?
> > > > Maybe there's some funkiness with page sizes??
> > >
> > > Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE?
> >
> > I'm not sure ;-)
> >
> > > If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I
> > > reduced the value, it got better and better, VNC running smoothly.
> > >
> > > I will keep testing to see if I get something.
> > >
> > > Thanks and regards,
> > > Hiago.
> >
> > I think we need to figure out whether the problem is host->device or
> > device->host.
> > Or perhaps you do already know this??
> > Maybe this was mentioned already earlier in the thread and I missed it.
> >
>
> Actually I still don't know yet.

Heh.

>
> > Try reducing only in or only out... and changing only one side of the
> > connection.
>
> I did this test and it was very interesting, thanks for the
> reccomendation.
> It looks like only NTB_DEFAULT_IN_SIZE is relevant for
> this issue. I changed NTB_DEFAULT_IN_SIZE from 16384 to 2048 on the
> _device_ side (Apalis iMX6) and left the NTB_OUT_SIZE with the default
> value of 16384, and now I can use my VNC client on the host side (move the
> mouse, click some buttons...). Very fast and responsive I would say.
>
> Does it mean the issue is on the HOST side?

Well... that's what I would assume based on that, but...

I think what we've really found is a bug:
  unsigned max_size = ncm->port.fixed_in_len;
is I think using fixed_in_len instead of using out_len.

*OR*

This is the gadget/device side and IN/OUT are always from the point of
view of the host.
ie. IN to the host, so transmitted by the gadget/device.

It's bound to be one or the other.  And while initially I thought the former,
I think it's likely the latter.
I vaguelly recall having run into something with in/out being
host-relative somewhere before.

> > Maybe change TX_MAX_NUM_DPE to 1 (or other values)
> >
> > Maybe change
> >
> > if (ncm->skb_tx_data
> > && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE
> > || (ncm->skb_tx_data->len +
> > div + rem + skb->len +
> > ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len))
> > > max_size)) {
> >
> > to just if (ncm->skb_tx_data)
>
> Changing the 'if' statement to only 'if (ncm->skb_tx_data)' also made it
> work! VNC running smoothly on my VNC client. Again, I applied this patch
> to the device's kernel, Apalis iMX6.

Here you modified only the device's tx - basically turning off
buffering/aggregation.
(and likely poorly at that since - I think - this approach still ends
up buffering one packet,
because it seems to send the previous packet when it gets the next one)

...and in general it seems more likely something is busted on the
device, then on the host...

> I will put some print debugging here to see if I can figure out
> something else.

Everything is pointing towards the timer not firing...
Perhaps try instrumenting the timer callback to see if it ever runs.
  ncm_tx_timeout <-- add a printk in here
See if the timer registration functions aren't returning non-caught errors.
  hrtimer_try_to_cancel - returns an int - it's not checked. - maybe
this always claims there's no timer to cancel?
but afaict:
  hrtimer_start - does not return anything
  hrtimer_init - also does not appear to return anything...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-11 20:44                           ` Maciej Żenczykowski
@ 2023-12-12  9:48                             ` Oliver Neukum
  2023-12-12 20:32                               ` Hiago De Franco
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-12  9:48 UTC (permalink / raw)
  To: Maciej Żenczykowski, Hiago De Franco
  Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On 11.12.23 21:44, Maciej Żenczykowski wrote:
> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
>>
>> On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote:
>>> On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote:

Hi,

>>>> On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote:
>>>>> That suggests, but does not prove that the issue is on the host side.
>>>>> Could you post the result of "ethtool -S" after a test run? We should
>>>>> get statistics on the reasons for transmissions that way.

>>>> Finally, I changed from 8192 to 4096, and the perfomance was
>>>> better:
>>>>
>>>> $ sudo ethtool -S enx3a601e306de1
>>>> NIC statistics:
>>>>       tx_reason_ntb_full: 0
>>>>       tx_reason_ndp_full: 0
>>>>       tx_reason_timeout: 56067

This has grown two orders of magnitude.

>>>>       tx_reason_max_datagram: 0
>>>>       tx_overhead: 83630876
>>>>       tx_ntbs: 56064
>>>>       rx_overhead: 25437595
>>>>       rx_ntbs: 847920
>>>>
>>>> At 4096 I can use the VNC with my app, click on buttons and see the mouse
>>>> moving smoothly. Please note the device name changes because we're using
>>>> random MAC addresses. 'ethtool' was running on my Debian host PC. I tested
>>>> for 1min30s and then got the statics with ethtool for all 3 tests.

As you are testing for a constant time, the increase in transmissions
due to timeouts also decreases latency by two orders of magnitude.
Though this does not ultimately tell us which side is responsible.
>>>> While the flood is happening in parallel, the VNC runs very smoothly,
>>>> and, again, as soon as it stops, it's back to slow/frozen.
>>>>
>>>> I believe here the ping command is helping to fullfil the buffer, that's
>>>> why running it on parallel makes the VNC work...

Indeed. You can confirm this by running "ethool -S" before and after the ping.
During the ping tx_reason_timeout should stagnate and probably tx_reason_ndp_full
will go up.

>>>>> #define TX_TIMEOUT_NSECS 300000
>>>>> 300 us is too small to be noticeable by VNC imho, so I think something
>>>>> *must* be misbehaving.
>>>>> Perhaps timer resolution is bad and this 300us ends up being much larger...???

Now that you mention it and have taken a closer look I suspect this piece of code:

         } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
                 /* wait for more frames */
                 /* push variables */
                 ctx->tx_curr_skb = skb_out;
                 /* set the pending count */
                 if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
                         ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
                 goto exit_no_skb;

Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
It is defined in include/linux/usb/cdc_ncm.h as 3 currently
This applies to the host side.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-12  9:48                             ` Oliver Neukum
@ 2023-12-12 20:32                               ` Hiago De Franco
  2023-12-12 21:21                                 ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-12 20:32 UTC (permalink / raw)
  To: Maciej Żenczykowski, Oliver Neukum
  Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb,
	pabeni

Hi Maciej and Oliver,

On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote:
> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> >
> > On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote:
> > > On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> > > >
> > > > Hello Oliver and Maciej,
> > > >
> > > > On Thu, Dec 07, 2023 at 10:41:51AM +0100, Oliver Neukum wrote:
> > > > > Can you tell us which side in your test case produces many small packets?
> > > > >
> > > > > Furthermore, for testing purposes, could you decrease TX_TIMEOUT_NSECS in
> > > > > f_ncm.c by an order of magnitude.
> > > >
> > > > The device side, Apalis iMX6, is the one producing the small packages
> > > > and sending to my host PC. The VNC server is running on iMX6 device and,
> > > > the client, on my host PC.
> > > >
> > > > I've decreased TX_TIMEOUT_NSECS from 300000 to 30000, but nothing
> > > > changed, the behaviour is the same.
> > > >
> > > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote:
> > > > > That suggests, but does not prove that the issue is on the host side.
> > > > > Could you post the result of "ethtool -S" after a test run? We should
> > > > > get statistics on the reasons for transmissions that way.
> > > >
> > > > Sure, for some reason I couldn't run ethtool on the iMX6 device:
> > > >
> > > > / # ethtool -S usb0
> > > > no stats available
> > > >
> > > > So I ran everything on my Debian host PC. First, without any changes on
> > > > the device's kernel, this is the result (where the VNC is really
> > > > slow/frozen):
> > > >
> > > > $ sudo ethtool -S enx3e5dcdead75e
> > > > NIC statistics:
> > > >      tx_reason_ntb_full: 0
> > > >      tx_reason_ndp_full: 0
> > > >      tx_reason_timeout: 222
> > > >      tx_reason_max_datagram: 0
> > > >      tx_overhead: 42387
> > > >      tx_ntbs: 222
> > > >      rx_overhead: 38375
> > > >      rx_ntbs: 529
> > > >
> > > > Next, I decreased NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE from 16384 to
> > > > 8192. The performance improved a bit, and this is the result:
> > > >
> > > > $ sudo ethtool -S enx42ff68c1000a
> > > > NIC statistics:
> > > >      tx_reason_ntb_full: 0
> > > >      tx_reason_ndp_full: 0
> > > >      tx_reason_timeout: 321
> > > >      tx_reason_max_datagram: 0
> > > >      tx_overhead: 61617
> > > >      tx_ntbs: 321
> > > >      rx_overhead: 59341
> > > >      rx_ntbs: 759
> > > >
> > > > Finally, I changed from 8192 to 4096, and the perfomance was
> > > > better:
> > > >
> > > > $ sudo ethtool -S enx3a601e306de1
> > > > NIC statistics:
> > > >      tx_reason_ntb_full: 0
> > > >      tx_reason_ndp_full: 0
> > > >      tx_reason_timeout: 56067
> > > >      tx_reason_max_datagram: 0
> > > >      tx_overhead: 83630876
> > > >      tx_ntbs: 56064
> > > >      rx_overhead: 25437595
> > > >      rx_ntbs: 847920
> > > >
> > > > At 4096 I can use the VNC with my app, click on buttons and see the mouse
> > > > moving smoothly. Please note the device name changes because we're using
> > > > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested
> > > > for 1min30s and then got the statics with ethtool for all 3 tests.
> > > >
> > > > On Thu, Dec 07, 2023 at 12:38:18PM +0100, Maciej Żenczykowski wrote:
> > > > >
> > > > > An every 1s (the default) ping is too rare to be of help I'd assume...
> > > > > Try ping with various intervals (-i).  All the way down to a flood ping (-f).
> > > > > Most likely -i 0.01 would be enough to make things work better...
> > > >
> > > > This one is interesting, backing to the default value of 16384, I
> > > > launched the VNC client which now is back to be slow/frozen. In
> > > > parallel, I started the ping command. The first one with 1 second
> > > > results the following (I'm pingging my host PC, using my device's
> > > > terminal):
> > > >
> > > > / # ping 192.168.11.2 -i 1
> > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > > 64 bytes from 192.168.11.2: icmp_seq=1 ttl=64 time=5070 ms
> > > > 64 bytes from 192.168.11.2: icmp_seq=2 ttl=64 time=4003 ms
> > > > 64 bytes from 192.168.11.2: icmp_seq=3 ttl=64 time=2963 ms
> > > > 64 bytes from 192.168.11.2: icmp_seq=4 ttl=64 time=1923 ms
> > > > 64 bytes from 192.168.11.2: icmp_seq=5 ttl=64 time=883 ms
> > > > ^C
> > > > --- 192.168.11.2 ping statistics ---
> > > > 26 packets transmitted, 5 received, 80.7692% packet loss, time 25950ms
> > > > rtt min/avg/max/mdev = 882.886/2968.250/5069.878/1478.425 ms, pipe 5
> > > >
> > > > Ping is really slow and lost almost all packets.
> > >
> > > It is *slow*.  But I don't think you're actually losing packets.
> > > They simply haven't arrived *yet*.
> > > Notice that the packets are sent every second, but they arrive after
> > > ~5/~4/~3/~2/~1 s.
> > > This suggests they all got delivered in one burst.
> > >
> > > You are missing 21 packets, but it's all the *final* 21 packets.
> > > You'd need to run for a long long time (maybe 5 minutes or even more)
> > > and see what happens then...
> > > to see if the packets are indeed lost, or just not yet delivered...
> > >
> >
> > Great, thanks for the explanation. Indeed, I ran the tests again, but
> > now I waited 6 minutes for each test, and these are the results:
> >
> > -i 1
> > --- 192.168.11.2 ping statistics ---
> > 349 packets transmitted, 349 received, 0% packet loss, time 361724ms
> > rtt min/avg/max/mdev = 1.597/10751.276/27960.848/12974.107 ms, pipe 27
> >
> > -i 0.1
> > --- 192.168.11.2 ping statistics ---
> > 3301 packets transmitted, 3294 received, 0.212057% packet loss, time 361342ms
> > rtt min/avg/max/mdev = 0.959/1131.714/3081.504/1385.261 ms, pipe 29
> >
> > -i 0.01
> > --- 192.168.11.2 ping statistics ---
> > 20718 packets transmitted, 20718 received, 0% packet loss, time 366465ms
> > rtt min/avg/max/mdev = 0.568/178.764/587.956/234.702 ms, pipe 29
> >
> > -flood
> > --- 192.168.11.2 ping statistics ---
> > 29173 packets transmitted, 29149 received, 0.0822679% packet loss, time 362065ms
> > rtt min/avg/max/mdev = 0.518/127.472/583.971/204.277 ms, pipe 29, ipg/ewma 12.411/21.570 ms
> >
> > So yes, we can assume 0% of packet loss. Furthermore, I can see the
> > bursts happening on my terminal, the ping command doesn't do anything
> > and suddenly all the pings arrive at the same time, over and over again
> > in a loop.
> 
> Ok, this is great to know.  No loss, just very very high delay.
> 
> > > > Next test, I decreased
> > > > to 0.1s:
> > > >
> > > > / # ping 192.168.11.2 -i 0.1
> > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > > ...
> > > > --- 192.168.11.2 ping statistics ---
> > > > 129 packets transmitted, 122 received, 5.42636% packet loss, time 13971ms
> > > > rtt min/avg/max/mdev = 1.752/999.274/2751.767/799.248 ms, pipe 26
> > > >
> > > > While ping is running in parallel, VNC has a better performance, I can
> > > > see my mouse running and click on some buttons. As soon as ping stops,
> > > > VNC is slow/frozen again. Also we have less packet loss.
> > >
> > > Again - this packet loss may be just at the tail.
> > > You'd need to run it for much much longer to see if packet loss is high or not.
> > > It may just be that you always lose the last ~7 packets or so...
> > >
> > > >
> > > > Next test, decreased to 0.01:
> > > >
> > > > / # ping 192.168.11.2 -i 0.01
> > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > > ...
> > > > --- 192.168.11.2 ping statistics ---
> > > > 584 packets transmitted, 572 received, 2.05479% packet loss, time 10106ms
> > > > rtt min/avg/max/mdev = 1.565/171.031/559.872/165.475 ms, pipe 28
> > > >
> > > > And finally, the flood:
> > > >
> > > > / # ping 192.168.11.2 -f
> > > > PING 192.168.11.2 (192.168.11.2) 56(84) bytes of data.
> > > > ...
> > > > --- 192.168.11.2 ping statistics ---
> > > > 1314 packets transmitted, 1311 received, 0.22831% packet loss, time 16299ms
> > > > rtt min/avg/max/mdev = 0.729/131.710/548.971/163.098 ms, pipe 28, ipg/ewma
> > > > 12.413/7.802 ms
> > > >
> > > > While the flood is happening in parallel, the VNC runs very smoothly,
> > > > and, again, as soon as it stops, it's back to slow/frozen.
> > > >
> > > > I believe here the ping command is helping to fullfil the buffer, that's
> > > > why running it on parallel makes the VNC work...
> > > >
> > > > > Also which specific versions of the kernel are involved on both sides
> > > > > of the connection.
> > > >
> > > > Device iMX6 is running Linux kernel v6.1.65, while my host PC is running
> > > > Linux kernel v6.5.0.
> > > >
> > > > > There was a pretty recent fix related to packet aggregation recently
> > > > > that could be either the fix or the cause.
> > > > >   "usb: gadget: ncm: Handle decoding of multiple NTB's in unwrap call"
> > > > > Though I doubt it - I believe that was specific to how windows packs things.
> > > > >
> > > > > Also Krishna Kurapati has a (afaik still not merged) patch "usb:
> > > > > gadget: ncm: Add support to configure wMaxSegmentSize"
> > > > > that could be of use - though again, doubt it.
> > > >
> > > > I could also try to apply these patches and check how it goes. Thanks
> > > > for the information.
> > > >
> > > > > Another thing that comes to mind, is that perhaps the device in
> > > > > question does not have sufficiently high res timers?
> > > > > There might be something in the kernel boot log / dmesg about hrtimer
> > > > > resolution...
> > > > > Maybe this just needs to be configurable...  Or pick a smaller value
> > > > > with broken hrtimer (if that's the issue),
> > > > > or just disable aggregation if lowres hrtimers... etc...
> > > > >
> > > > > #define TX_TIMEOUT_NSECS 300000
> > > > > 300 us is too small to be noticeable by VNC imho, so I think something
> > > > > *must* be misbehaving.
> > > > > Perhaps timer resolution is bad and this 300us ends up being much larger...???
> > > >
> > > > This is what I got from dmesg inside the iMX6 device:
> > > >
> > > > / # dmesg | grep timer
> > > > [    0.000000] Switching to timer-based delay loop, resolution 333ns
> > > > [    0.000019] clocksource: mxc_timer1: mask: 0xffffffff max_cycles:
> > > > 0xffffffff, max_idle_ns: 637086815595 ns
> > > > [    0.001545] Calibrating delay loop (skipped), value calculated using
> > > > timer frequency.. 6.00 BogoMIPS (lpj=30000)
> > > > [    0.203469] clocksource: Switched to clocksource mxc_timer1
> > > >
> > > > > I wonder if the hrtimer_init() call should be with CLOCK_BOOTTIME
> > > > > instead of CLOCK_MONOTONIC.
> > > > > There could potentially be an issue with suspend, though I really doubt it.
> > > >
> > > > Also tested this, but it didn't change anything, VNC is still
> > > > slow/frozen.
> > > >
> > > > > Another idea would be to add a gadget setting to disable tx
> > > > > aggregation entirely...
> > > > > (note that reducing from 8000 to 2000 doesn't actually turn off aggregation...)
> > > > >
> > > > > Have you tried reducing from 8000 to 4000 or 3500 or 3000?
> > > > > Maybe there's some funkiness with page sizes??
> > > >
> > > > Sorry, do you mean reducing the both NTB_DEFAULT_IN_SIZE and NTB_OUT_SIZE?
> > >
> > > I'm not sure ;-)
> > >
> > > > If so, I tried to reduce from 16384 to 8192, 4096 and 2048. Everytime I
> > > > reduced the value, it got better and better, VNC running smoothly.
> > > >
> > > > I will keep testing to see if I get something.
> > > >
> > > > Thanks and regards,
> > > > Hiago.
> > >
> > > I think we need to figure out whether the problem is host->device or
> > > device->host.
> > > Or perhaps you do already know this??
> > > Maybe this was mentioned already earlier in the thread and I missed it.
> > >
> >
> > Actually I still don't know yet.
> 
> Heh.
> 
> >
> > > Try reducing only in or only out... and changing only one side of the
> > > connection.
> >
> > I did this test and it was very interesting, thanks for the
> > reccomendation.
> > It looks like only NTB_DEFAULT_IN_SIZE is relevant for
> > this issue. I changed NTB_DEFAULT_IN_SIZE from 16384 to 2048 on the
> > _device_ side (Apalis iMX6) and left the NTB_OUT_SIZE with the default
> > value of 16384, and now I can use my VNC client on the host side (move the
> > mouse, click some buttons...). Very fast and responsive I would say.
> >
> > Does it mean the issue is on the HOST side?
> 
> Well... that's what I would assume based on that, but...
> 
> I think what we've really found is a bug:
>   unsigned max_size = ncm->port.fixed_in_len;
> is I think using fixed_in_len instead of using out_len.
> 
> *OR*
> 
> This is the gadget/device side and IN/OUT are always from the point of
> view of the host.
> ie. IN to the host, so transmitted by the gadget/device.
> 
> It's bound to be one or the other.  And while initially I thought the former,
> I think it's likely the latter.
> I vaguelly recall having run into something with in/out being
> host-relative somewhere before.
> 
> > > Maybe change TX_MAX_NUM_DPE to 1 (or other values)
> > >
> > > Maybe change
> > >
> > > if (ncm->skb_tx_data
> > > && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE
> > > || (ncm->skb_tx_data->len +
> > > div + rem + skb->len +
> > > ncm->skb_tx_ndp->len + ndp_align + (2 * dgram_idx_len))
> > > > max_size)) {
> > >
> > > to just if (ncm->skb_tx_data)
> >
> > Changing the 'if' statement to only 'if (ncm->skb_tx_data)' also made it
> > work! VNC running smoothly on my VNC client. Again, I applied this patch
> > to the device's kernel, Apalis iMX6.
> 
> Here you modified only the device's tx - basically turning off
> buffering/aggregation.
> (and likely poorly at that since - I think - this approach still ends
> up buffering one packet,
> because it seems to send the previous packet when it gets the next one)
> 
> ...and in general it seems more likely something is busted on the
> device, then on the host...
> 
> > I will put some print debugging here to see if I can figure out
> > something else.
> 
> Everything is pointing towards the timer not firing...
> Perhaps try instrumenting the timer callback to see if it ever runs.
>   ncm_tx_timeout <-- add a printk in here
> See if the timer registration functions aren't returning non-caught errors.
>   hrtimer_try_to_cancel - returns an int - it's not checked. - maybe
> this always claims there's no timer to cancel?
> but afaict:
>   hrtimer_start - does not return anything
>   hrtimer_init - also does not appear to return anything...

I was thinking about the same, the timer is not firing at all. I put the
printk's as suggested, and the ncm_tx_is being triggered. Basically,
when I first turn on my device, the try_to_cancel function is always
returning -1, which means "when the timer is currently executing the
callback function and cannot be stopped".

When I launch my VNC client, a see the following pattern:
[   94.926136] hfranco: hrtimer_start called.
[   94.926259] hfranco: try_to_cancel 1
[   94.926268] hfranco: hrtimer_start called.
[   94.926391] hfranco: try_to_cancel 1

The hrtimer is called and then try_to_cancel returns 1, which means it
was canceled "when the timer was active" (following the function
comment).
So I guess the hrtimer is somehow working as expected (at least being
called).
I also calculated the time between the consecutive messages of timer
started and trying to cancel the timer, and this is what I got:

94.926259 - 94.926136 = 0.000123 seconds
94.926391 - 94.926268 = 0.000123 seconds
94.927548 - 94.926401 = 0.001147 seconds
94.927818 - 94.927585 = 0.000233 seconds
94.928338 - 94.927836 = 0.000502 seconds
94.928590 - 94.928360 = 0.000230 seconds
94.929330 - 94.928608 = 0.000722 seconds
94.929609 - 94.929363 = 0.000246 seconds

I'm not sure how accurate this is, but we can clearly sometimes see it
gets more than 300us.

On Tue, Dec 12, 2023 at 10:48:01AM +0100, Oliver Neukum wrote:
> On 11.12.23 21:44, Maciej Żenczykowski wrote:
> > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> > > 
> > > On Thu, Dec 07, 2023 at 08:37:09PM +0100, Maciej Żenczykowski wrote:
> > > > On Thu, Dec 7, 2023 at 7:57 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> 
> Hi,
> 
> > > > > On Thu, Dec 07, 2023 at 12:07:25PM +0100, Oliver Neukum wrote:
> > > > > > That suggests, but does not prove that the issue is on the host side.
> > > > > > Could you post the result of "ethtool -S" after a test run? We should
> > > > > > get statistics on the reasons for transmissions that way.
> 
> > > > > Finally, I changed from 8192 to 4096, and the perfomance was
> > > > > better:
> > > > > 
> > > > > $ sudo ethtool -S enx3a601e306de1
> > > > > NIC statistics:
> > > > >       tx_reason_ntb_full: 0
> > > > >       tx_reason_ndp_full: 0
> > > > >       tx_reason_timeout: 56067
> 
> This has grown two orders of magnitude.
> 
> > > > >       tx_reason_max_datagram: 0
> > > > >       tx_overhead: 83630876
> > > > >       tx_ntbs: 56064
> > > > >       rx_overhead: 25437595
> > > > >       rx_ntbs: 847920
> > > > > 
> > > > > At 4096 I can use the VNC with my app, click on buttons and see the mouse
> > > > > moving smoothly. Please note the device name changes because we're using
> > > > > random MAC addresses. 'ethtool' was running on my Debian host PC. I tested
> > > > > for 1min30s and then got the statics with ethtool for all 3 tests.
> 
> As you are testing for a constant time, the increase in transmissions
> due to timeouts also decreases latency by two orders of magnitude.
> Though this does not ultimately tell us which side is responsible.
> > > > > While the flood is happening in parallel, the VNC runs very smoothly,
> > > > > and, again, as soon as it stops, it's back to slow/frozen.
> > > > > 
> > > > > I believe here the ping command is helping to fullfil the buffer, that's
> > > > > why running it on parallel makes the VNC work...
> 
> Indeed. You can confirm this by running "ethool -S" before and after the ping.
> During the ping tx_reason_timeout should stagnate and probably tx_reason_ndp_full
> will go up.
> 
> > > > > > #define TX_TIMEOUT_NSECS 300000
> > > > > > 300 us is too small to be noticeable by VNC imho, so I think something
> > > > > > *must* be misbehaving.
> > > > > > Perhaps timer resolution is bad and this 300us ends up being much larger...???
> 
> Now that you mention it and have taken a closer look I suspect this piece of code:
> 
>         } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
>                 /* wait for more frames */
>                 /* push variables */
>                 ctx->tx_curr_skb = skb_out;
>                 /* set the pending count */
>                 if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
>                         ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
>                 goto exit_no_skb;
> 
> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
> It is defined in include/linux/usb/cdc_ncm.h as 3 currently
> This applies to the host side.

On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you
mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT?

Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and
CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the
CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and
finally both at the same time, but it didn't help.

I've also put some printks as following:

	ctx->tx_curr_frame_num = n;
	printk("hfranco: tx_curr_frame_num = %d", n);

	if (n == 0) {
		printk("hfranco: n == 0");
		/* wait for more frames */
		/* push variables */
		ctx->tx_curr_skb = skb_out;
		goto exit_no_skb;

	} else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
		printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams);
		printk("hfranco: timer_interval = %d", ctx->timer_interval);
		printk("hfranco: n inside else if = %d", n);
		/* wait for more frames */
		/* push variables */
		ctx->tx_curr_skb = skb_out;
		/* set the pending count */
		if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
			ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
		goto exit_no_skb;

	} else {
		printk("hfranco: n inside else = %d", n);
		if (n == ctx->tx_max_datagrams)
			ctx->tx_reason_max_datagram++;	/* count reason for transmitting */

I ran it on my host PC, compiled it as module for my Debian dekstop, and
this is the dmesg:

[ 9663.478807] hfranco: tx_curr_frame_num = 1
[ 9663.478816] hfranco: tx_max_datagrams = 40
[ 9663.478818] hfranco: timer_interval = 400000
[ 9663.478820] hfranco: n inside else if = 1
[ 9663.478822] hfranco: timer started
[ 9663.479645] hfranco: tx_curr_frame_num = 1
[ 9663.479652] hfranco: n inside else = 1

And then it basically repeats. Looks like 'n' never passes the 1 value.
By tweaking the flags mentioned before, 'n' got a value of 4, but that
was the maximum value. I was wondering, why do you think this code looks
suspicious? I basically just inserted some printks on the tx side, I
will see if I can get something from the rx as well.

> 
> 	Regards
> 		Oliver

Thanks a lot and regards,
Hiago.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-12 20:32                               ` Hiago De Franco
@ 2023-12-12 21:21                                 ` Oliver Neukum
  2023-12-13  1:11                                   ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-12 21:21 UTC (permalink / raw)
  To: Hiago De Franco, Maciej Żenczykowski, Oliver Neukum
  Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb,
	pabeni

On 12.12.23 21:32, Hiago De Franco wrote:

Hi,

> On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote:
>> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:

>> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
>> It is defined in include/linux/usb/cdc_ncm.h as 3 currently
>> This applies to the host side.
> 
> On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you
> mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT?

Yes, I meant that. Sorry.

> Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and
> CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the
> CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and
> finally both at the same time, but it didn't help.
> 
> I've also put some printks as following:
> 
> 	ctx->tx_curr_frame_num = n;
> 	printk("hfranco: tx_curr_frame_num = %d", n);
> 
> 	if (n == 0) {
> 		printk("hfranco: n == 0");
> 		/* wait for more frames */
> 		/* push variables */
> 		ctx->tx_curr_skb = skb_out;
> 		goto exit_no_skb;
> 
> 	} else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
> 		printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams);
> 		printk("hfranco: timer_interval = %d", ctx->timer_interval);
> 		printk("hfranco: n inside else if = %d", n);
> 		/* wait for more frames */
> 		/* push variables */
> 		ctx->tx_curr_skb = skb_out;
> 		/* set the pending count */
> 		if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
> 			ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
> 		goto exit_no_skb;
> 
> 	} else {
> 		printk("hfranco: n inside else = %d", n);
> 		if (n == ctx->tx_max_datagrams)
> 			ctx->tx_reason_max_datagram++;	/* count reason for transmitting */
> 
> I ran it on my host PC, compiled it as module for my Debian dekstop, and
> this is the dmesg:
> 
> [ 9663.478807] hfranco: tx_curr_frame_num = 1
> [ 9663.478816] hfranco: tx_max_datagrams = 40
> [ 9663.478818] hfranco: timer_interval = 400000
> [ 9663.478820] hfranco: n inside else if = 1
> [ 9663.478822] hfranco: timer started
> [ 9663.479645] hfranco: tx_curr_frame_num = 1
> [ 9663.479652] hfranco: n inside else = 1
> 
> And then it basically repeats. Looks like 'n' never passes the 1 value.
> By tweaking the flags mentioned before, 'n' got a value of 4, but that
> was the maximum value. I was wondering, why do you think this code looks
> suspicious? I basically just inserted some printks on the tx side, I
> will see if I can get something from the rx as well.

If we look at the statistics you initially gathered, we can see that all transmissions
on the host side happen because the timeout elapses. That, however, does
_not_ tell us that the host is to blame. We could look at two possible scenarios

A - the gadget is bundling up the packets with too much delay and the host
just answers to the megatransmissions with one packet and the delay on the host
is inconsequential

B - the timer on the host runs for too long or sometimes not at all. If that were
the case that code I pointed out would be most likely to blame

Could I suggest we try to localize the issue? Can you ping the host from the device?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-12 21:21                                 ` Oliver Neukum
@ 2023-12-13  1:11                                   ` Maciej Żenczykowski
  2023-12-15 21:20                                     ` Hiago De Franco
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-13  1:11 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Tue, Dec 12, 2023 at 1:21 PM Oliver Neukum <oneukum@suse.com> wrote:
>
> On 12.12.23 21:32, Hiago De Franco wrote:
>
> Hi,
>
> > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote:
> >> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
>
> >> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
> >> It is defined in include/linux/usb/cdc_ncm.h as 3 currently
> >> This applies to the host side.
> >
> > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you
> > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT?
>
> Yes, I meant that. Sorry.
>
> > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and
> > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the
> > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and
> > finally both at the same time, but it didn't help.
> >
> > I've also put some printks as following:
> >
> >       ctx->tx_curr_frame_num = n;
> >       printk("hfranco: tx_curr_frame_num = %d", n);
> >
> >       if (n == 0) {
> >               printk("hfranco: n == 0");
> >               /* wait for more frames */
> >               /* push variables */
> >               ctx->tx_curr_skb = skb_out;
> >               goto exit_no_skb;
> >
> >       } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
> >               printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams);
> >               printk("hfranco: timer_interval = %d", ctx->timer_interval);
> >               printk("hfranco: n inside else if = %d", n);
> >               /* wait for more frames */
> >               /* push variables */
> >               ctx->tx_curr_skb = skb_out;
> >               /* set the pending count */
> >               if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
> >                       ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
> >               goto exit_no_skb;
> >
> >       } else {
> >               printk("hfranco: n inside else = %d", n);
> >               if (n == ctx->tx_max_datagrams)
> >                       ctx->tx_reason_max_datagram++;  /* count reason for transmitting */
> >
> > I ran it on my host PC, compiled it as module for my Debian dekstop, and

Shouldn't you be doing this on the gadget side?
I thought we were thinking it was the gadget transmit timer having issues.

> > this is the dmesg:
> >
> > [ 9663.478807] hfranco: tx_curr_frame_num = 1
> > [ 9663.478816] hfranco: tx_max_datagrams = 40
> > [ 9663.478818] hfranco: timer_interval = 400000
> > [ 9663.478820] hfranco: n inside else if = 1
> > [ 9663.478822] hfranco: timer started
> > [ 9663.479645] hfranco: tx_curr_frame_num = 1
> > [ 9663.479652] hfranco: n inside else = 1
> >
> > And then it basically repeats. Looks like 'n' never passes the 1 value.
> > By tweaking the flags mentioned before, 'n' got a value of 4, but that
> > was the maximum value. I was wondering, why do you think this code looks
> > suspicious? I basically just inserted some printks on the tx side, I
> > will see if I can get something from the rx as well.
>
> If we look at the statistics you initially gathered, we can see that all transmissions
> on the host side happen because the timeout elapses. That, however, does
> _not_ tell us that the host is to blame. We could look at two possible scenarios
>
> A - the gadget is bundling up the packets with too much delay and the host
> just answers to the megatransmissions with one packet and the delay on the host
> is inconsequential
>
> B - the timer on the host runs for too long or sometimes not at all. If that were
> the case that code I pointed out would be most likely to blame
>
> Could I suggest we try to localize the issue? Can you ping the host from the device?
>
>         Regards
>                 Oliver

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-13  1:11                                   ` Maciej Żenczykowski
@ 2023-12-15 21:20                                     ` Hiago De Franco
  2023-12-15 21:44                                       ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-15 21:20 UTC (permalink / raw)
  To: Oliver Neukum, Maciej Żenczykowski
  Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb,
	pabeni

Hi Oliver and Maciej,

Sorry for the delay on my message. I took some time to set up some
debugging tools, hoping to find something more interesting. I couldn't
find too much I guess, but I will describe the results here.

First, I set up ftrace to track 'package_for_tx' function. My idea would
be that I would see the delays between two consecutives calls of this
function being more than 300us, this would say the code wasn't
respecting the hrtimer. The trace log looks something like that:

# tracer: function
#
# entries-in-buffer/entries-written: 468/468   #P:2
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
            tezi-237     [001] dns..   101.030667: package_for_tx <-ncm_wrap_ntb
          <idle>-0       [000] d.s..   101.032771: package_for_tx <-ncm_wrap_ntb
          weston-228     [001] d.s..   101.039430: package_for_tx <-ncm_wrap_ntb
            tezi-237     [000] d.s..   101.041634: package_for_tx <-ncm_wrap_ntb
          <idle>-0       [001] d.s..   101.042326: package_for_tx <-ncm_wrap_ntb
          <idle>-0       [001] d.s..   101.055665: package_for_tx <-ncm_wrap_ntb
          weston-231     [001] d.s..   101.057779: package_for_tx <-ncm_wrap_ntb
          <idle>-0       [001] d.s..   101.068630: package_for_tx <-ncm_wrap_ntb
          weston-231     [001] d.s..   101.085738: package_for_tx <-ncm_wrap_ntb
            tezi-237     [000] D.s..   101.088611: package_for_tx <-ncm_wrap_ntb

So I created a simple script to calculate the delta time between two
consecutive calls and give me at the end how many of them are above
300us. This is the result:

Total timestamps above 300 microseconds: 440 of 742
Percentage of timestamps above 300 microseconds: 59.30%

Then I decided to lower down the TX_MAX_NUM_DPE from 32 to 0.
0 and 4 made the VNC runs smooth. I've also tested 8 and 16, but these
two didn't help. Again, by setting the number to 0, this is the result:

Total timestamps above 300 microseconds: 3149 of 65420
Percentage of timestamps above 300 microseconds: 4.81%

Of course we also have more packages, as the VNC is working and it's not
frozen. It's interesting how fast this is now, but as discussed with a
coleague, we don't know if this tells much. There are more latencies
envolved, for example from the scheduler, so probably the result is not
showing me anything valuable.

My second test was trying to figure out how TX_MAX_NUM_DPE was
impacting my issue, so I added two printks to check the number it was
getting and when it was being zeroed.

/* Delay for the transmit to wait before sending an unfilled NTB frame. */
 #define TX_TIMEOUT_NSECS       300000
@@ -964,6 +964,7 @@ static struct sk_buff *package_for_tx(struct f_ncm *ncm)
        /* Set the final NDP wLength */
        new_len = opts->ndp_size +
                        (ncm->ndp_dgram_count * dgram_idx_len);
+       printk("hfranco: package_for_tx called, setting ndp_dram_count to 0");
        ncm->ndp_dgram_count = 0;
        /* Increment from start to wLength */
        ntb_iter = (void *) ncm->skb_tx_ndp->data;
@@ -1027,6 +1028,8 @@ static struct sk_buff *ncm_wrap_ntb(struct gether *port,
                 * ready for new data.
                 * NOTE: Assume maximum align for speed of calculation.
                 */
+
+               printk("hfranco: ncm->ndp_dgram_count = %d", ncm->ndp_dgram_count);
                if (ncm->skb_tx_data
                    && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE
                    || (ncm->skb_tx_data->len +

This is the result:

[ 6613.034304] hfranco: package_for_tx called, setting ndp_dram_count to 0
[ 6613.034783] hfranco: ncm->ndp_dgram_count = 0
[ 6613.034812] hfranco: ncm->ndp_dgram_count = 2
[ 6613.034823] hfranco: ncm->ndp_dgram_count = 3
[ 6613.034835] hfranco: ncm->ndp_dgram_count = 4
[ 6613.034844] hfranco: ncm->ndp_dgram_count = 5
[ 6613.034855] hfranco: ncm->ndp_dgram_count = 6
[ 6613.034865] hfranco: ncm->ndp_dgram_count = 7
[ 6613.034874] hfranco: ncm->ndp_dgram_count = 8
[ 6613.034884] hfranco: ncm->ndp_dgram_count = 9
[ 6613.034895] hfranco: ncm->ndp_dgram_count = 10
[ 6613.034905] hfranco: ncm->ndp_dgram_count = 11
[ 6613.034914] hfranco: package_for_tx called, setting ndp_dram_count to 0
[ 6613.035266] hfranco: ncm->ndp_dgram_count = 2
[ 6613.035281] hfranco: ncm->ndp_dgram_count = 3
[ 6613.035292] hfranco: ncm->ndp_dgram_count = 4
[ 6613.035301] hfranco: ncm->ndp_dgram_count = 5
[ 6613.035310] hfranco: ncm->ndp_dgram_count = 6
[ 6613.035320] hfranco: ncm->ndp_dgram_count = 7
[ 6613.035329] hfranco: ncm->ndp_dgram_count = 8
[ 6613.035339] hfranco: ncm->ndp_dgram_count = 9
[ 6613.035350] hfranco: ncm->ndp_dgram_count = 10
[ 6613.035360] hfranco: ncm->ndp_dgram_count = 11
[ 6613.035366] hfranco: package_for_tx called, setting ndp_dram_count to 0

So, in general the value never passes 10 or 11, from all the test I done
with the VNC. As said earlier, by putting 4 made the VNC work:

[   17.333014] hfranco: package_for_tx called, setting ndp_dram_count to 0
[   17.333065] hfranco: ncm->ndp_dgram_count = 2
[   17.333079] hfranco: ncm->ndp_dgram_count = 3
[   17.333105] hfranco: ncm->ndp_dgram_count = 4
[   17.333114] hfranco: package_for_tx called, setting ndp_dram_count to 0
[   17.333161] hfranco: ncm->ndp_dgram_count = 2
[   17.333173] hfranco: ncm->ndp_dgram_count = 3
[   17.333184] hfranco: ncm->ndp_dgram_count = 4
[   17.333191] hfranco: package_for_tx called, setting ndp_dram_count to 0
[   17.333235] hfranco: ncm->ndp_dgram_count = 2
[   17.333259] hfranco: ncm->ndp_dgram_count = 3
[   17.333271] hfranco: ncm->ndp_dgram_count = 4

On Tue, Dec 12, 2023 at 10:21:13PM +0100, Oliver Neukum wrote:
> On 12.12.23 21:32, Hiago De Franco wrote:
> 
> Hi,
> 
> > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote:
> > > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> 
> > > Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
> > > It is defined in include/linux/usb/cdc_ncm.h as 3 currently
> > > This applies to the host side.
> > 
> > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you
> > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT?
> 
> Yes, I meant that. Sorry.
> 
> > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and
> > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the
> > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and
> > finally both at the same time, but it didn't help.
> > 
> > I've also put some printks as following:
> > 
> > 	ctx->tx_curr_frame_num = n;
> > 	printk("hfranco: tx_curr_frame_num = %d", n);
> > 
> > 	if (n == 0) {
> > 		printk("hfranco: n == 0");
> > 		/* wait for more frames */
> > 		/* push variables */
> > 		ctx->tx_curr_skb = skb_out;
> > 		goto exit_no_skb;
> > 
> > 	} else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
> > 		printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams);
> > 		printk("hfranco: timer_interval = %d", ctx->timer_interval);
> > 		printk("hfranco: n inside else if = %d", n);
> > 		/* wait for more frames */
> > 		/* push variables */
> > 		ctx->tx_curr_skb = skb_out;
> > 		/* set the pending count */
> > 		if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
> > 			ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
> > 		goto exit_no_skb;
> > 
> > 	} else {
> > 		printk("hfranco: n inside else = %d", n);
> > 		if (n == ctx->tx_max_datagrams)
> > 			ctx->tx_reason_max_datagram++;	/* count reason for transmitting */
> > 
> > I ran it on my host PC, compiled it as module for my Debian dekstop, and
> > this is the dmesg:
> > 
> > [ 9663.478807] hfranco: tx_curr_frame_num = 1
> > [ 9663.478816] hfranco: tx_max_datagrams = 40
> > [ 9663.478818] hfranco: timer_interval = 400000
> > [ 9663.478820] hfranco: n inside else if = 1
> > [ 9663.478822] hfranco: timer started
> > [ 9663.479645] hfranco: tx_curr_frame_num = 1
> > [ 9663.479652] hfranco: n inside else = 1
> > 
> > And then it basically repeats. Looks like 'n' never passes the 1 value.
> > By tweaking the flags mentioned before, 'n' got a value of 4, but that
> > was the maximum value. I was wondering, why do you think this code looks
> > suspicious? I basically just inserted some printks on the tx side, I
> > will see if I can get something from the rx as well.
> 
> If we look at the statistics you initially gathered, we can see that all transmissions
> on the host side happen because the timeout elapses. That, however, does
> _not_ tell us that the host is to blame. We could look at two possible scenarios
> 
> A - the gadget is bundling up the packets with too much delay and the host
> just answers to the megatransmissions with one packet and the delay on the host
> is inconsequential
> 
> B - the timer on the host runs for too long or sometimes not at all. If that were
> the case that code I pointed out would be most likely to blame
> 
> Could I suggest we try to localize the issue? Can you ping the host from the device?

Lastly, about Oliver's question, I tried to send an UDP package using
netcat from host to device and then from device to host. I was sending
one single packet in loop with 1 second interval first, then tried with
0.5 second and 0.1 second. My hope was, with this test, detect who was
failing, host or device.

For 1 second, it pretty much works on both directions. As example, this
is the output of my host PC (sendind packages with the device -> "echo
'hi' | nc -u 192.168.11.2 5201"):

$ sudo tcpdump -i enx9627798c3fa3
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enx9627798c3fa3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:47:51.865967 IP apalis-imx6-10692086.local.54882 > hdebian.5201: UDP, length 3
17:47:51.866051 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
17:47:52.145987 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? 1.11.168.192.in-addr.arpa. (134)
17:47:52.146020 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [0q] (12)
17:47:52.198689 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0*- [0q] 1/0/0 (Cache flush) PTR apalis-imx6-10692086.local. (77)
17:47:52.882532 IP apalis-imx6-10692086.local.37745 > hdebian.5201: UDP, length 3
17:47:52.882634 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
17:47:53.903508 IP apalis-imx6-10692086.local.38147 > hdebian.5201: UDP, length 3
17:47:53.903585 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
17:47:54.920174 IP apalis-imx6-10692086.local.60718 > hdebian.5201: UDP, length 3
17:47:54.920265 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
17:47:55.940687 IP apalis-imx6-10692086.local.38729 > hdebian.5201: UDP, length 3
17:47:55.940743 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
17:47:56.913135 ARP, Request who-has apalis-imx6-10692086.local tell hdebian, length 28
17:47:56.915052 ARP, Reply apalis-imx6-10692086.local is-at ba:6d:d0:0a:95:6b (oui Unknown), length 28
17:47:56.956927 IP apalis-imx6-10692086.local.54649 > hdebian.5201: UDP, length 3
17:47:56.956966 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
17:47:57.980879 IP apalis-imx6-10692086.local.53366 > hdebian.5201: UDP, length 3
17:47:57.980984 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39

When I lower down the time between two packages from 1 second to 0.5 and
0.1, it suddenly freezes. It happens on both directions, host to device
and device to host. As example, this is the host send UDP packets to the
device with 0.1s interval:

/ # tcpdump -i usb0
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on usb0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:39:53.662594 IP 192.168.11.2.54802 > 192.168.11.1.5201: UDP, length 3
18:39:53.662716 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
18:39:53.768678 IP 192.168.11.2.49080 > 192.168.11.1.5201: UDP, length 3
18:39:53.768767 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
18:39:53.875305 IP 192.168.11.2.56938 > 192.168.11.1.5201: UDP, length 3
18:39:53.875376 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
18:39:53.982238 IP 192.168.11.2.40177 > 192.168.11.1.5201: UDP, length 3
18:39:53.982354 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
18:39:54.089279 IP 192.168.11.2.43284 > 192.168.11.1.5201: UDP, length 3
18:39:54.089378 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
18:39:54.196027 IP 192.168.11.2.41990 > 192.168.11.1.5201: UDP, length 3
18:39:54.196105 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
18:39:54.303793 IP 192.168.11.2.42362 > 192.168.11.1.5201: UDP, length 3
18:39:58.664608 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28
18:39:58.666781 ARP, Reply 192.168.11.2 is-at ea:2a:bc:40:41:30 (oui Unknown), length 28
18:39:58.669531 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28
18:39:58.669582 ARP, Reply 192.168.11.1 is-at 1e:be:d4:00:51:0e (oui Unknown), length 28

It stops after this ARP request. Sometimes it send packages again, but
after a few seconds it freezes. So, at the end, I couldn't figure out
where the issue is...

> 
> 	Regards
> 		Oliver

On Tue, Dec 12, 2023 at 05:11:49PM -0800, Maciej Żenczykowski wrote:
> On Tue, Dec 12, 2023 at 1:21 PM Oliver Neukum <oneukum@suse.com> wrote:
> >
> > On 12.12.23 21:32, Hiago De Franco wrote:
> >
> > Hi,
> >
> > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote:
> > >> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> >
> > >> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
> > >> It is defined in include/linux/usb/cdc_ncm.h as 3 currently
> > >> This applies to the host side.
> > >
> > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you
> > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT?
> >
> > Yes, I meant that. Sorry.
> >
> > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and
> > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the
> > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and
> > > finally both at the same time, but it didn't help.
> > >
> > > I've also put some printks as following:
> > >
> > >       ctx->tx_curr_frame_num = n;
> > >       printk("hfranco: tx_curr_frame_num = %d", n);
> > >
> > >       if (n == 0) {
> > >               printk("hfranco: n == 0");
> > >               /* wait for more frames */
> > >               /* push variables */
> > >               ctx->tx_curr_skb = skb_out;
> > >               goto exit_no_skb;
> > >
> > >       } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
> > >               printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams);
> > >               printk("hfranco: timer_interval = %d", ctx->timer_interval);
> > >               printk("hfranco: n inside else if = %d", n);
> > >               /* wait for more frames */
> > >               /* push variables */
> > >               ctx->tx_curr_skb = skb_out;
> > >               /* set the pending count */
> > >               if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
> > >                       ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
> > >               goto exit_no_skb;
> > >
> > >       } else {
> > >               printk("hfranco: n inside else = %d", n);
> > >               if (n == ctx->tx_max_datagrams)
> > >                       ctx->tx_reason_max_datagram++;  /* count reason for transmitting */
> > >
> > > I ran it on my host PC, compiled it as module for my Debian dekstop, and
> 
> Shouldn't you be doing this on the gadget side?
> I thought we were thinking it was the gadget transmit timer having issues.

Because I still don't know where the issue is, I was testing both sides.

> 
> > > this is the dmesg:
> > >
> > > [ 9663.478807] hfranco: tx_curr_frame_num = 1
> > > [ 9663.478816] hfranco: tx_max_datagrams = 40
> > > [ 9663.478818] hfranco: timer_interval = 400000
> > > [ 9663.478820] hfranco: n inside else if = 1
> > > [ 9663.478822] hfranco: timer started
> > > [ 9663.479645] hfranco: tx_curr_frame_num = 1
> > > [ 9663.479652] hfranco: n inside else = 1
> > >
> > > And then it basically repeats. Looks like 'n' never passes the 1 value.
> > > By tweaking the flags mentioned before, 'n' got a value of 4, but that
> > > was the maximum value. I was wondering, why do you think this code looks
> > > suspicious? I basically just inserted some printks on the tx side, I
> > > will see if I can get something from the rx as well.
> >
> > If we look at the statistics you initially gathered, we can see that all transmissions
> > on the host side happen because the timeout elapses. That, however, does
> > _not_ tell us that the host is to blame. We could look at two possible scenarios
> >
> > A - the gadget is bundling up the packets with too much delay and the host
> > just answers to the megatransmissions with one packet and the delay on the host
> > is inconsequential
> >
> > B - the timer on the host runs for too long or sometimes not at all. If that were
> > the case that code I pointed out would be most likely to blame
> >
> > Could I suggest we try to localize the issue? Can you ping the host from the device?
> >
> >         Regards
> >                 Oliver
> 
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google

Sorry if I couldn't provide enough information, I'm kind of learning
trought the process. The TX_MAX_NUM_DPE is something I'm not fully
convinced, but again I will keep debugging.

Best regards,
Hiago.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-15 21:20                                     ` Hiago De Franco
@ 2023-12-15 21:44                                       ` Maciej Żenczykowski
  2023-12-18 14:34                                         ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-15 21:44 UTC (permalink / raw)
  To: Hiago De Franco
  Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
>
> Hi Oliver and Maciej,
>
> Sorry for the delay on my message. I took some time to set up some
> debugging tools, hoping to find something more interesting. I couldn't
> find too much I guess, but I will describe the results here.

No worries.

> First, I set up ftrace to track 'package_for_tx' function. My idea would
> be that I would see the delays between two consecutives calls of this
> function being more than 300us, this would say the code wasn't
> respecting the hrtimer. The trace log looks something like that:

Here's my recommendation.
On the gadget side.
Print something when the timer is armed.
Print something when the timer fires.
Include the timestamp.

Then try to send something (a single packet) that won't trigger a reply.

For example run a script in a screen session that sends 1 unsolicited
arp reply packet every 10 seconds.
This is rare enough, it should obviously get the timer to fire.
Again, wait 5 minutes, so you've had 30+ packets sent.
See if it does even fire...
You can run tcpdump on the receiver to see what you receive (and the
timestamps on that)


>
> # tracer: function
> #
> # entries-in-buffer/entries-written: 468/468   #P:2
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>             tezi-237     [001] dns..   101.030667: package_for_tx <-ncm_wrap_ntb
>           <idle>-0       [000] d.s..   101.032771: package_for_tx <-ncm_wrap_ntb
>           weston-228     [001] d.s..   101.039430: package_for_tx <-ncm_wrap_ntb
>             tezi-237     [000] d.s..   101.041634: package_for_tx <-ncm_wrap_ntb
>           <idle>-0       [001] d.s..   101.042326: package_for_tx <-ncm_wrap_ntb
>           <idle>-0       [001] d.s..   101.055665: package_for_tx <-ncm_wrap_ntb
>           weston-231     [001] d.s..   101.057779: package_for_tx <-ncm_wrap_ntb
>           <idle>-0       [001] d.s..   101.068630: package_for_tx <-ncm_wrap_ntb
>           weston-231     [001] d.s..   101.085738: package_for_tx <-ncm_wrap_ntb
>             tezi-237     [000] D.s..   101.088611: package_for_tx <-ncm_wrap_ntb
>
> So I created a simple script to calculate the delta time between two
> consecutive calls and give me at the end how many of them are above
> 300us. This is the result:
>
> Total timestamps above 300 microseconds: 440 of 742
> Percentage of timestamps above 300 microseconds: 59.30%
>
> Then I decided to lower down the TX_MAX_NUM_DPE from 32 to 0.
> 0 and 4 made the VNC runs smooth. I've also tested 8 and 16, but these
> two didn't help. Again, by setting the number to 0, this is the result:
>
> Total timestamps above 300 microseconds: 3149 of 65420
> Percentage of timestamps above 300 microseconds: 4.81%
>
> Of course we also have more packages, as the VNC is working and it's not
> frozen. It's interesting how fast this is now, but as discussed with a
> coleague, we don't know if this tells much. There are more latencies
> envolved, for example from the scheduler, so probably the result is not
> showing me anything valuable.
>
> My second test was trying to figure out how TX_MAX_NUM_DPE was
> impacting my issue, so I added two printks to check the number it was
> getting and when it was being zeroed.
>
> /* Delay for the transmit to wait before sending an unfilled NTB frame. */
>  #define TX_TIMEOUT_NSECS       300000
> @@ -964,6 +964,7 @@ static struct sk_buff *package_for_tx(struct f_ncm *ncm)
>         /* Set the final NDP wLength */
>         new_len = opts->ndp_size +
>                         (ncm->ndp_dgram_count * dgram_idx_len);
> +       printk("hfranco: package_for_tx called, setting ndp_dram_count to 0");
>         ncm->ndp_dgram_count = 0;
>         /* Increment from start to wLength */
>         ntb_iter = (void *) ncm->skb_tx_ndp->data;
> @@ -1027,6 +1028,8 @@ static struct sk_buff *ncm_wrap_ntb(struct gether *port,
>                  * ready for new data.
>                  * NOTE: Assume maximum align for speed of calculation.
>                  */
> +
> +               printk("hfranco: ncm->ndp_dgram_count = %d", ncm->ndp_dgram_count);
>                 if (ncm->skb_tx_data
>                     && (ncm->ndp_dgram_count >= TX_MAX_NUM_DPE
>                     || (ncm->skb_tx_data->len +
>
> This is the result:
>
> [ 6613.034304] hfranco: package_for_tx called, setting ndp_dram_count to 0
> [ 6613.034783] hfranco: ncm->ndp_dgram_count = 0
> [ 6613.034812] hfranco: ncm->ndp_dgram_count = 2
> [ 6613.034823] hfranco: ncm->ndp_dgram_count = 3
> [ 6613.034835] hfranco: ncm->ndp_dgram_count = 4
> [ 6613.034844] hfranco: ncm->ndp_dgram_count = 5
> [ 6613.034855] hfranco: ncm->ndp_dgram_count = 6
> [ 6613.034865] hfranco: ncm->ndp_dgram_count = 7
> [ 6613.034874] hfranco: ncm->ndp_dgram_count = 8
> [ 6613.034884] hfranco: ncm->ndp_dgram_count = 9
> [ 6613.034895] hfranco: ncm->ndp_dgram_count = 10
> [ 6613.034905] hfranco: ncm->ndp_dgram_count = 11
> [ 6613.034914] hfranco: package_for_tx called, setting ndp_dram_count to 0
> [ 6613.035266] hfranco: ncm->ndp_dgram_count = 2
> [ 6613.035281] hfranco: ncm->ndp_dgram_count = 3
> [ 6613.035292] hfranco: ncm->ndp_dgram_count = 4
> [ 6613.035301] hfranco: ncm->ndp_dgram_count = 5
> [ 6613.035310] hfranco: ncm->ndp_dgram_count = 6
> [ 6613.035320] hfranco: ncm->ndp_dgram_count = 7
> [ 6613.035329] hfranco: ncm->ndp_dgram_count = 8
> [ 6613.035339] hfranco: ncm->ndp_dgram_count = 9
> [ 6613.035350] hfranco: ncm->ndp_dgram_count = 10
> [ 6613.035360] hfranco: ncm->ndp_dgram_count = 11
> [ 6613.035366] hfranco: package_for_tx called, setting ndp_dram_count to 0
>
> So, in general the value never passes 10 or 11, from all the test I done
> with the VNC. As said earlier, by putting 4 made the VNC work:
>
> [   17.333014] hfranco: package_for_tx called, setting ndp_dram_count to 0
> [   17.333065] hfranco: ncm->ndp_dgram_count = 2
> [   17.333079] hfranco: ncm->ndp_dgram_count = 3
> [   17.333105] hfranco: ncm->ndp_dgram_count = 4
> [   17.333114] hfranco: package_for_tx called, setting ndp_dram_count to 0
> [   17.333161] hfranco: ncm->ndp_dgram_count = 2
> [   17.333173] hfranco: ncm->ndp_dgram_count = 3
> [   17.333184] hfranco: ncm->ndp_dgram_count = 4
> [   17.333191] hfranco: package_for_tx called, setting ndp_dram_count to 0
> [   17.333235] hfranco: ncm->ndp_dgram_count = 2
> [   17.333259] hfranco: ncm->ndp_dgram_count = 3
> [   17.333271] hfranco: ncm->ndp_dgram_count = 4
>
> On Tue, Dec 12, 2023 at 10:21:13PM +0100, Oliver Neukum wrote:
> > On 12.12.23 21:32, Hiago De Franco wrote:
> >
> > Hi,
> >
> > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote:
> > > > On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> >
> > > > Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
> > > > It is defined in include/linux/usb/cdc_ncm.h as 3 currently
> > > > This applies to the host side.
> > >
> > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you
> > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT?
> >
> > Yes, I meant that. Sorry.
> >
> > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and
> > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the
> > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and
> > > finally both at the same time, but it didn't help.
> > >
> > > I've also put some printks as following:
> > >
> > >     ctx->tx_curr_frame_num = n;
> > >     printk("hfranco: tx_curr_frame_num = %d", n);
> > >
> > >     if (n == 0) {
> > >             printk("hfranco: n == 0");
> > >             /* wait for more frames */
> > >             /* push variables */
> > >             ctx->tx_curr_skb = skb_out;
> > >             goto exit_no_skb;
> > >
> > >     } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
> > >             printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams);
> > >             printk("hfranco: timer_interval = %d", ctx->timer_interval);
> > >             printk("hfranco: n inside else if = %d", n);
> > >             /* wait for more frames */
> > >             /* push variables */
> > >             ctx->tx_curr_skb = skb_out;
> > >             /* set the pending count */
> > >             if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
> > >                     ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
> > >             goto exit_no_skb;
> > >
> > >     } else {
> > >             printk("hfranco: n inside else = %d", n);
> > >             if (n == ctx->tx_max_datagrams)
> > >                     ctx->tx_reason_max_datagram++;  /* count reason for transmitting */
> > >
> > > I ran it on my host PC, compiled it as module for my Debian dekstop, and
> > > this is the dmesg:
> > >
> > > [ 9663.478807] hfranco: tx_curr_frame_num = 1
> > > [ 9663.478816] hfranco: tx_max_datagrams = 40
> > > [ 9663.478818] hfranco: timer_interval = 400000
> > > [ 9663.478820] hfranco: n inside else if = 1
> > > [ 9663.478822] hfranco: timer started
> > > [ 9663.479645] hfranco: tx_curr_frame_num = 1
> > > [ 9663.479652] hfranco: n inside else = 1
> > >
> > > And then it basically repeats. Looks like 'n' never passes the 1 value.
> > > By tweaking the flags mentioned before, 'n' got a value of 4, but that
> > > was the maximum value. I was wondering, why do you think this code looks
> > > suspicious? I basically just inserted some printks on the tx side, I
> > > will see if I can get something from the rx as well.
> >
> > If we look at the statistics you initially gathered, we can see that all transmissions
> > on the host side happen because the timeout elapses. That, however, does
> > _not_ tell us that the host is to blame. We could look at two possible scenarios
> >
> > A - the gadget is bundling up the packets with too much delay and the host
> > just answers to the megatransmissions with one packet and the delay on the host
> > is inconsequential
> >
> > B - the timer on the host runs for too long or sometimes not at all. If that were
> > the case that code I pointed out would be most likely to blame
> >
> > Could I suggest we try to localize the issue? Can you ping the host from the device?
>
> Lastly, about Oliver's question, I tried to send an UDP package using
> netcat from host to device and then from device to host. I was sending
> one single packet in loop with 1 second interval first, then tried with
> 0.5 second and 0.1 second. My hope was, with this test, detect who was
> failing, host or device.
>
> For 1 second, it pretty much works on both directions. As example, this
> is the output of my host PC (sendind packages with the device -> "echo
> 'hi' | nc -u 192.168.11.2 5201"):
>
> $ sudo tcpdump -i enx9627798c3fa3
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on enx9627798c3fa3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 17:47:51.865967 IP apalis-imx6-10692086.local.54882 > hdebian.5201: UDP, length 3
> 17:47:51.866051 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
> 17:47:52.145987 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? 1.11.168.192.in-addr.arpa. (134)
> 17:47:52.146020 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [0q] (12)
> 17:47:52.198689 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0*- [0q] 1/0/0 (Cache flush) PTR apalis-imx6-10692086.local. (77)
> 17:47:52.882532 IP apalis-imx6-10692086.local.37745 > hdebian.5201: UDP, length 3
> 17:47:52.882634 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
> 17:47:53.903508 IP apalis-imx6-10692086.local.38147 > hdebian.5201: UDP, length 3
> 17:47:53.903585 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
> 17:47:54.920174 IP apalis-imx6-10692086.local.60718 > hdebian.5201: UDP, length 3
> 17:47:54.920265 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
> 17:47:55.940687 IP apalis-imx6-10692086.local.38729 > hdebian.5201: UDP, length 3
> 17:47:55.940743 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
> 17:47:56.913135 ARP, Request who-has apalis-imx6-10692086.local tell hdebian, length 28
> 17:47:56.915052 ARP, Reply apalis-imx6-10692086.local is-at ba:6d:d0:0a:95:6b (oui Unknown), length 28
> 17:47:56.956927 IP apalis-imx6-10692086.local.54649 > hdebian.5201: UDP, length 3
> 17:47:56.956966 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39
> 17:47:57.980879 IP apalis-imx6-10692086.local.53366 > hdebian.5201: UDP, length 3
> 17:47:57.980984 IP hdebian > apalis-imx6-10692086.local: ICMP hdebian udp port 5201 unreachable, length 39

note that you're getting bidirectional transfer due to the icmp errors.

if you want to test unidirectionally, you need to send something that
won't get a reply.
for example an unsolicited arp reply (with arping)... or a udp packet
to a port drop'ed by a firewall.

and of course make sure you don't run it in such a way that you get
ssh traffic causing bidirectional traffic.
(ie. run it (and tcpdump) under screen)
[this is assuming you can't run it directly on the device without
getting in via ssh,
maybe you can via serial port or whatever - though do note that serial
wakeups could potentially also 'trigger' / 'fix' your hrtimer,
so probably best to run things silently under screen with no direct
output via ssh/serial/console.

> When I lower down the time between two packages from 1 second to 0.5 and
> 0.1, it suddenly freezes. It happens on both directions, host to device
> and device to host. As example, this is the host send UDP packets to the
> device with 0.1s interval:
>
> / # tcpdump -i usb0
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on usb0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 18:39:53.662594 IP 192.168.11.2.54802 > 192.168.11.1.5201: UDP, length 3
> 18:39:53.662716 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 18:39:53.768678 IP 192.168.11.2.49080 > 192.168.11.1.5201: UDP, length 3
> 18:39:53.768767 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 18:39:53.875305 IP 192.168.11.2.56938 > 192.168.11.1.5201: UDP, length 3
> 18:39:53.875376 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 18:39:53.982238 IP 192.168.11.2.40177 > 192.168.11.1.5201: UDP, length 3
> 18:39:53.982354 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 18:39:54.089279 IP 192.168.11.2.43284 > 192.168.11.1.5201: UDP, length 3
> 18:39:54.089378 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 18:39:54.196027 IP 192.168.11.2.41990 > 192.168.11.1.5201: UDP, length 3
> 18:39:54.196105 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 18:39:54.303793 IP 192.168.11.2.42362 > 192.168.11.1.5201: UDP, length 3
> 18:39:58.664608 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28
> 18:39:58.666781 ARP, Reply 192.168.11.2 is-at ea:2a:bc:40:41:30 (oui Unknown), length 28
> 18:39:58.669531 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28
> 18:39:58.669582 ARP, Reply 192.168.11.1 is-at 1e:be:d4:00:51:0e (oui Unknown), length 28
>
> It stops after this ARP request. Sometimes it send packages again, but
> after a few seconds it freezes. So, at the end, I couldn't figure out
> where the issue is...
>
> >
> >       Regards
> >               Oliver
>
> On Tue, Dec 12, 2023 at 05:11:49PM -0800, Maciej Żenczykowski wrote:
> > On Tue, Dec 12, 2023 at 1:21 PM Oliver Neukum <oneukum@suse.com> wrote:
> > >
> > > On 12.12.23 21:32, Hiago De Franco wrote:
> > >
> > > Hi,
> > >
> > > > On Mon, Dec 11, 2023 at 12:44:42PM -0800, Maciej Żenczykowski wrote:
> > > >> On Mon, Dec 11, 2023 at 12:29 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> > >
> > > >> Hiago, could you try lowering CDC_NCM_TIMER_PENDING_CNT, if need be all the way to 1?
> > > >> It is defined in include/linux/usb/cdc_ncm.h as 3 currently
> > > >> This applies to the host side.
> > > >
> > > > On my side CDC_NCM_TIMER_PENDING_CNT is set to 2 by default, did you
> > > > mean CDC_NCM_RESTART_TIMER_DATAGRAM_CNT?
> > >
> > > Yes, I meant that. Sorry.
> > >
> > > > Despite of that, I tried to lower both CDC_NCM_TIMER_PENDING_CNT and
> > > > CDC_NCM_RESTART_TIMER_DATAGRAM_CNT all the way down to 1, first the
> > > > CDC_NCM_TIMER_PENDING_CNT, then CDC_NCM_RESTART_TIMER_DATAGRAM_CNT and
> > > > finally both at the same time, but it didn't help.
> > > >
> > > > I've also put some printks as following:
> > > >
> > > >       ctx->tx_curr_frame_num = n;
> > > >       printk("hfranco: tx_curr_frame_num = %d", n);
> > > >
> > > >       if (n == 0) {
> > > >               printk("hfranco: n == 0");
> > > >               /* wait for more frames */
> > > >               /* push variables */
> > > >               ctx->tx_curr_skb = skb_out;
> > > >               goto exit_no_skb;
> > > >
> > > >       } else if ((n < ctx->tx_max_datagrams) && (ready2send == 0) && (ctx->timer_interval > 0)) {
> > > >               printk("hfranco: tx_max_datagrams = %d", ctx->tx_max_datagrams);
> > > >               printk("hfranco: timer_interval = %d", ctx->timer_interval);
> > > >               printk("hfranco: n inside else if = %d", n);
> > > >               /* wait for more frames */
> > > >               /* push variables */
> > > >               ctx->tx_curr_skb = skb_out;
> > > >               /* set the pending count */
> > > >               if (n < CDC_NCM_RESTART_TIMER_DATAGRAM_CNT)
> > > >                       ctx->tx_timer_pending = CDC_NCM_TIMER_PENDING_CNT;
> > > >               goto exit_no_skb;
> > > >
> > > >       } else {
> > > >               printk("hfranco: n inside else = %d", n);
> > > >               if (n == ctx->tx_max_datagrams)
> > > >                       ctx->tx_reason_max_datagram++;  /* count reason for transmitting */
> > > >
> > > > I ran it on my host PC, compiled it as module for my Debian dekstop, and
> >
> > Shouldn't you be doing this on the gadget side?
> > I thought we were thinking it was the gadget transmit timer having issues.
>
> Because I still don't know where the issue is, I was testing both sides.
>
> >
> > > > this is the dmesg:
> > > >
> > > > [ 9663.478807] hfranco: tx_curr_frame_num = 1
> > > > [ 9663.478816] hfranco: tx_max_datagrams = 40
> > > > [ 9663.478818] hfranco: timer_interval = 400000
> > > > [ 9663.478820] hfranco: n inside else if = 1
> > > > [ 9663.478822] hfranco: timer started
> > > > [ 9663.479645] hfranco: tx_curr_frame_num = 1
> > > > [ 9663.479652] hfranco: n inside else = 1
> > > >
> > > > And then it basically repeats. Looks like 'n' never passes the 1 value.
> > > > By tweaking the flags mentioned before, 'n' got a value of 4, but that
> > > > was the maximum value. I was wondering, why do you think this code looks
> > > > suspicious? I basically just inserted some printks on the tx side, I
> > > > will see if I can get something from the rx as well.
> > >
> > > If we look at the statistics you initially gathered, we can see that all transmissions
> > > on the host side happen because the timeout elapses. That, however, does
> > > _not_ tell us that the host is to blame. We could look at two possible scenarios
> > >
> > > A - the gadget is bundling up the packets with too much delay and the host
> > > just answers to the megatransmissions with one packet and the delay on the host
> > > is inconsequential
> > >
> > > B - the timer on the host runs for too long or sometimes not at all. If that were
> > > the case that code I pointed out would be most likely to blame
> > >
> > > Could I suggest we try to localize the issue? Can you ping the host from the device?
> > >
> > >         Regards
> > >                 Oliver
> >
> > --
> > Maciej Żenczykowski, Kernel Networking Developer @ Google
>
> Sorry if I couldn't provide enough information, I'm kind of learning
> trought the process. The TX_MAX_NUM_DPE is something I'm not fully
> convinced, but again I will keep debugging.
>
> Best regards,
> Hiago.

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-15 21:44                                       ` Maciej Żenczykowski
@ 2023-12-18 14:34                                         ` Oliver Neukum
  2023-12-18 20:59                                           ` Hiago De Franco
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-18 14:34 UTC (permalink / raw)
  To: Maciej Żenczykowski, Hiago De Franco
  Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni



On 15.12.23 22:44, Maciej Żenczykowski wrote:
> On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote:

Hi,

> Here's my recommendation.
> On the gadget side.
> Print something when the timer is armed.
> Print something when the timer fires.
> Include the timestamp.
> 
> Then try to send something (a single packet) that won't trigger a reply.

You are implicitly raising a question.
Does this bug strike if a single package is queued? The failure
case seems to be 10 or 11 packages.

	Regards
		Oliver


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-18 14:34                                         ` Oliver Neukum
@ 2023-12-18 20:59                                           ` Hiago De Franco
  2023-12-18 23:00                                             ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Hiago De Franco @ 2023-12-18 20:59 UTC (permalink / raw)
  To: Maciej Żenczykowski, Oliver Neukum
  Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb,
	pabeni

Hi,

On Fri, Dec 15, 2023 at 01:44:29PM -0800, Maciej Żenczykowski wrote:
> Here's my recommendation.
> On the gadget side.
> Print something when the timer is armed.
> Print something when the timer fires.
> Include the timestamp.
> 
> Then try to send something (a single packet) that won't trigger a reply.
> 
> For example run a script in a screen session that sends 1 unsolicited
> arp reply packet every 10 seconds.
> This is rare enough, it should obviously get the timer to fire.
> Again, wait 5 minutes, so you've had 30+ packets sent.
> See if it does even fire...
> You can run tcpdump on the receiver to see what you receive (and the
> timestamps on that)

Thanks for the recommendation, I did what you suggested for 5 minutes
adding printk's for all 'hrtimer_*' function calls inside f_ncm.c.
Basically we only get 'hrtimer_start' and 'hrtimer_try_to_cancel' calls,
which makes sense. Here is the result, doing ARP pings from the device
to the host PC:

[   20.427879] hfranco: hrtimer_try_to_cancel
[   21.270355] hfranco: hrtimer_start
[   21.270732] hfranco: hrtimer_try_to_cancel
[   31.695892] hfranco: hrtimer_start
[   31.696215] hfranco: hrtimer_try_to_cancel
[   36.324646] hfranco: hrtimer_start
[   36.324989] hfranco: hrtimer_try_to_cancel
[   37.168880] hfranco: hrtimer_start
[   37.169216] hfranco: hrtimer_try_to_cancel
[   60.359510] hfranco: hrtimer_start
[   60.359836] hfranco: hrtimer_try_to_cancel
[   68.188814] hfranco: hrtimer_start
[   68.189141] hfranco: hrtimer_try_to_cancel
[   69.035732] hfranco: hrtimer_start
[   69.036062] hfranco: hrtimer_try_to_cancel
[   90.413786] hfranco: hrtimer_start
[   90.414116] hfranco: hrtimer_try_to_cancel
[  100.380712] hfranco: hrtimer_start
[  100.381048] hfranco: hrtimer_try_to_cancel
[  110.349906] hfranco: hrtimer_start
[  110.350247] hfranco: hrtimer_try_to_cancel
[  120.321527] hfranco: hrtimer_start
[  120.321872] hfranco: hrtimer_try_to_cancel
[  120.325531] hfranco: hrtimer_start
[  120.325869] hfranco: hrtimer_try_to_cancel
[  130.295101] hfranco: hrtimer_start
[  130.295428] hfranco: hrtimer_try_to_cancel
[  131.992642] hfranco: hrtimer_start
[  131.992984] hfranco: hrtimer_try_to_cancel
[  132.839585] hfranco: hrtimer_start
[  132.839927] hfranco: hrtimer_try_to_cancel
[  140.270782] hfranco: hrtimer_start
[  140.271112] hfranco: hrtimer_try_to_cancel
[  150.248507] hfranco: hrtimer_start
[  150.248848] hfranco: hrtimer_try_to_cancel
[  160.228126] hfranco: hrtimer_start
[  160.228453] hfranco: hrtimer_try_to_cancel
[  170.209459] hfranco: hrtimer_start
[  170.209822] hfranco: hrtimer_try_to_cancel
[  180.192015] hfranco: hrtimer_start
[  180.192351] hfranco: hrtimer_try_to_cancel
[  190.176082] hfranco: hrtimer_start
[  190.176423] hfranco: hrtimer_try_to_cancel
[  200.161272] hfranco: hrtimer_start
[  200.161600] hfranco: hrtimer_try_to_cancel
[  210.147622] hfranco: hrtimer_start
[  210.147949] hfranco: hrtimer_try_to_cancel
[  220.135257] hfranco: hrtimer_start
[  220.135582] hfranco: hrtimer_try_to_cancel
[  230.123934] hfranco: hrtimer_start
[  230.124274] hfranco: hrtimer_try_to_cancel
[  237.879789] hfranco: hrtimer_start
[  237.880133] hfranco: hrtimer_try_to_cancel
[  240.113459] hfranco: hrtimer_start
[  240.113794] hfranco: hrtimer_try_to_cancel
[  250.103968] hfranco: hrtimer_start
[  250.104309] hfranco: hrtimer_try_to_cancel
[  259.800618] hfranco: hrtimer_start
[  259.800946] hfranco: hrtimer_try_to_cancel
[  260.095486] hfranco: hrtimer_start
[  260.095828] hfranco: hrtimer_try_to_cancel
[  260.646435] hfranco: hrtimer_start
[  260.646776] hfranco: hrtimer_try_to_cancel
[  270.087384] hfranco: hrtimer_start
[  270.087711] hfranco: hrtimer_try_to_cancel
[  280.080143] hfranco: hrtimer_start
[  280.080468] hfranco: hrtimer_try_to_cancel
[  290.073603] hfranco: hrtimer_start
[  290.073945] hfranco: hrtimer_try_to_cancel
[  300.067513] hfranco: hrtimer_start
[  300.067852] hfranco: hrtimer_try_to_cancel
[  310.061945] hfranco: hrtimer_start
[  310.062297] hfranco: hrtimer_try_to_cancel
[  320.056667] hfranco: hrtimer_start
[  320.057008] hfranco: hrtimer_try_to_cancel
[  330.051900] hfranco: hrtimer_start
[  330.052255] hfranco: hrtimer_try_to_cancel
[  340.047353] hfranco: hrtimer_start
[  340.047678] hfranco: hrtimer_try_to_cancel
[  350.044862] hfranco: hrtimer_start
[  350.045221] hfranco: hrtimer_try_to_cancel
[  360.043011] hfranco: hrtimer_start
[  360.043352] hfranco: hrtimer_try_to_cancel
[  370.041268] hfranco: hrtimer_start
[  370.041611] hfranco: hrtimer_try_to_cancel
[  380.039556] hfranco: hrtimer_start
[  380.039891] hfranco: hrtimer_try_to_cancel
[  390.037826] hfranco: hrtimer_start
[  390.038150] hfranco: hrtimer_try_to_cancel
[  400.036353] hfranco: hrtimer_start

/ # arping -U -I usb0 -i 10 -s 192.168.11.1 192.168.11.2
ARPING 192.168.11.2 from 192.168.11.1 usb0
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.504ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.459ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.612ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.676ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.509ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.696ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  3.008ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.446ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.776ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.529ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.669ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.523ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.574ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.872ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.654ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.639ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.576ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.584ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.560ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.393ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.627ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.799ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.980ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.661ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.707ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.523ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.711ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.704ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.566ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.678ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.592ms
Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.654ms
^CSent 32 probes (1 broadcast(s))
Received 32 response(s)

$ sudo tcpdump -i enx620542df6fc2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enx620542df6fc2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:02:32.599638 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8
17:02:35.425325 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16
17:02:42.810738 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
17:02:42.865794 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181)
17:02:42.865832 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12)
17:02:43.253971 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
17:02:43.895157 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181)
17:02:43.895196 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12)
17:02:44.100862 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
17:02:44.961717 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
17:02:46.145083 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181)
17:02:46.145128 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12)
17:03:05.477178 ARP, Request who-has hdebian (Broadcast) tell apalis-imx6-10692086.local, length 28
17:03:05.477217 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:03:15.443291 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:03:15.443309 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:03:24.580306 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8
17:03:25.411804 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:03:25.411821 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:03:35.382697 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:03:35.382717 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:03:35.386582 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16
17:03:45.355482 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:03:45.355504 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:03:46.811643 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
17:03:47.052930 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
17:03:47.899743 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
17:03:48.962107 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
17:03:55.330407 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:03:55.330423 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:04:05.307308 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:04:05.307324 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:04:15.286180 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:04:15.286209 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:04:25.266785 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:04:25.266816 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:04:35.248474 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:04:35.248489 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:04:45.231873 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:04:45.231902 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:04:55.216288 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:04:55.216315 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:05:05.201924 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:05:05.201951 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:05:09.438461 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8
17:05:15.188755 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:05:15.188799 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:05:25.176620 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:05:25.176648 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:05:32.931912 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16
17:05:35.165375 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:05:35.165399 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:05:45.155068 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:05:45.155082 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:05:54.813550 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
17:05:54.851012 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
17:05:55.145920 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:05:55.145943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:05:55.696822 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
17:05:56.963151 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
17:06:05.136948 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:06:05.137104 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:06:15.128897 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:06:15.128918 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:06:25.121662 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:06:25.121682 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:06:35.114912 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:06:35.114943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:06:45.108576 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:06:45.108599 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:06:55.102468 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:06:55.102501 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:07:05.096969 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:07:05.096994 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:07:15.091619 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:07:15.091647 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:07:25.088359 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:07:25.088385 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:07:35.085718 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:07:35.085753 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:07:45.083263 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:07:45.083291 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:07:55.080759 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:07:55.080782 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:08:05.078303 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:08:05.078344 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
17:08:15.075990 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
17:08:15.076012 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
^C
88 packets captured
88 packets received by filter
0 packets dropped by kernel

Everything basically worked, without any freezing/slowness (please note
that during the test I wasn't using the VNC client, only the arping). I
also did it using screen with ssh/serial console disconnected.

Although I passed -U flag to arping, I still get replies. Is this
expected?

> note that you're getting bidirectional transfer due to the icmp errors.
> 
> if you want to test unidirectionally, you need to send something that
> won't get a reply.
> for example an unsolicited arp reply (with arping)... or a udp packet
> to a port drop'ed by a firewall.
> 
> and of course make sure you don't run it in such a way that you get
> ssh traffic causing bidirectional traffic.
> (ie. run it (and tcpdump) under screen)
> [this is assuming you can't run it directly on the device without
> getting in via ssh,
> maybe you can via serial port or whatever - though do note that serial
> wakeups could potentially also 'trigger' / 'fix' your hrtimer,
> so probably best to run things silently under screen with no direct
> output via ssh/serial/console.

I also tested by sending UDP packets with netcat, every 1 second:

```
while true
do
        echo 'hi' | nc -u 192.168.11.2 5201
        sleep 1
done
```

I also manually added the ARP entry to the table using the 'arp'
command, expecting I would get rid off ARP calls/replies, but it didn't.
Even though the ARP table is correct, I still see ARP calls. Also, I
can't get rid of ICMP packets, so at the end I couldn't truly test the
one-way direction communication. For example, this is a snippet from
tcpdump:

20:22:00.559786 IP 192.168.11.2.50323 > 192.168.11.1.5201: UDP, length 3
20:22:00.559857 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
20:22:01.563804 IP 192.168.11.2.44521 > 192.168.11.1.5201: UDP, length 3
20:22:01.563903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
20:22:02.554294 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28
20:22:02.556414 ARP, Reply 192.168.11.2 is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
20:22:02.569192 IP 192.168.11.2.56428 > 192.168.11.1.5201: UDP, length 3
20:22:02.569281 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
20:22:03.575838 IP 192.168.11.2.50178 > 192.168.11.1.5201: UDP, length 3
20:22:03.575903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
20:22:03.717633 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28
20:22:03.717680 ARP, Reply 192.168.11.1 is-at be:8e:0a:23:00:70 (oui Unknown), length 28
20:22:04.582394 IP 192.168.11.2.41700 > 192.168.11.1.5201: UDP, length 3
20:22:04.582473 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
20:22:05.587417 IP 192.168.11.2.57901 > 192.168.11.1.5201: UDP, length 3

The same happens on both directions, host -> device and device -> host.
I couldn't see any slowness or freezing.

On Mon, Dec 18, 2023 at 03:34:36PM +0100, Oliver Neukum wrote:
> 
> 
> On 15.12.23 22:44, Maciej Żenczykowski wrote:
> > On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> 
> Hi,
> 
> > Here's my recommendation.
> > On the gadget side.
> > Print something when the timer is armed.
> > Print something when the timer fires.
> > Include the timestamp.
> > 
> > Then try to send something (a single packet) that won't trigger a reply.
> 
> You are implicitly raising a question.
> Does this bug strike if a single package is queued? The failure
> case seems to be 10 or 11 packages.

I think I will keep the same test but try to increase the number of
packages per second I'm sending, maybe one packet / 100 ms or something
like that.

> 
> 	Regards
> 		Oliver
>

Best Regards,
Hiago.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-18 20:59                                           ` Hiago De Franco
@ 2023-12-18 23:00                                             ` Maciej Żenczykowski
  2023-12-19  8:45                                               ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-18 23:00 UTC (permalink / raw)
  To: Hiago De Franco
  Cc: Oliver Neukum, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Mon, Dec 18, 2023 at 1:00 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
>
> Hi,
>
> On Fri, Dec 15, 2023 at 01:44:29PM -0800, Maciej Żenczykowski wrote:
> > Here's my recommendation.
> > On the gadget side.
> > Print something when the timer is armed.
> > Print something when the timer fires.
> > Include the timestamp.
> >
> > Then try to send something (a single packet) that won't trigger a reply.
> >
> > For example run a script in a screen session that sends 1 unsolicited
> > arp reply packet every 10 seconds.
> > This is rare enough, it should obviously get the timer to fire.
> > Again, wait 5 minutes, so you've had 30+ packets sent.
> > See if it does even fire...
> > You can run tcpdump on the receiver to see what you receive (and the
> > timestamps on that)
>
> Thanks for the recommendation, I did what you suggested for 5 minutes
> adding printk's for all 'hrtimer_*' function calls inside f_ncm.c.
> Basically we only get 'hrtimer_start' and 'hrtimer_try_to_cancel' calls,
> which makes sense. Here is the result, doing ARP pings from the device
> to the host PC:
>
> [   20.427879] hfranco: hrtimer_try_to_cancel
> [   21.270355] hfranco: hrtimer_start
> [   21.270732] hfranco: hrtimer_try_to_cancel
> [   31.695892] hfranco: hrtimer_start
> [   31.696215] hfranco: hrtimer_try_to_cancel
> [   36.324646] hfranco: hrtimer_start
> [   36.324989] hfranco: hrtimer_try_to_cancel
> [   37.168880] hfranco: hrtimer_start
> [   37.169216] hfranco: hrtimer_try_to_cancel
> [   60.359510] hfranco: hrtimer_start
> [   60.359836] hfranco: hrtimer_try_to_cancel
> [   68.188814] hfranco: hrtimer_start
> [   68.189141] hfranco: hrtimer_try_to_cancel
> [   69.035732] hfranco: hrtimer_start
> [   69.036062] hfranco: hrtimer_try_to_cancel
> [   90.413786] hfranco: hrtimer_start
> [   90.414116] hfranco: hrtimer_try_to_cancel
> [  100.380712] hfranco: hrtimer_start
> [  100.381048] hfranco: hrtimer_try_to_cancel
> [  110.349906] hfranco: hrtimer_start
> [  110.350247] hfranco: hrtimer_try_to_cancel
> [  120.321527] hfranco: hrtimer_start
> [  120.321872] hfranco: hrtimer_try_to_cancel
> [  120.325531] hfranco: hrtimer_start
> [  120.325869] hfranco: hrtimer_try_to_cancel
> [  130.295101] hfranco: hrtimer_start
> [  130.295428] hfranco: hrtimer_try_to_cancel
> [  131.992642] hfranco: hrtimer_start
> [  131.992984] hfranco: hrtimer_try_to_cancel
> [  132.839585] hfranco: hrtimer_start
> [  132.839927] hfranco: hrtimer_try_to_cancel
> [  140.270782] hfranco: hrtimer_start
> [  140.271112] hfranco: hrtimer_try_to_cancel
> [  150.248507] hfranco: hrtimer_start
> [  150.248848] hfranco: hrtimer_try_to_cancel
> [  160.228126] hfranco: hrtimer_start
> [  160.228453] hfranco: hrtimer_try_to_cancel
> [  170.209459] hfranco: hrtimer_start
> [  170.209822] hfranco: hrtimer_try_to_cancel
> [  180.192015] hfranco: hrtimer_start
> [  180.192351] hfranco: hrtimer_try_to_cancel
> [  190.176082] hfranco: hrtimer_start
> [  190.176423] hfranco: hrtimer_try_to_cancel
> [  200.161272] hfranco: hrtimer_start
> [  200.161600] hfranco: hrtimer_try_to_cancel
> [  210.147622] hfranco: hrtimer_start
> [  210.147949] hfranco: hrtimer_try_to_cancel
> [  220.135257] hfranco: hrtimer_start
> [  220.135582] hfranco: hrtimer_try_to_cancel
> [  230.123934] hfranco: hrtimer_start
> [  230.124274] hfranco: hrtimer_try_to_cancel
> [  237.879789] hfranco: hrtimer_start
> [  237.880133] hfranco: hrtimer_try_to_cancel
> [  240.113459] hfranco: hrtimer_start
> [  240.113794] hfranco: hrtimer_try_to_cancel
> [  250.103968] hfranco: hrtimer_start
> [  250.104309] hfranco: hrtimer_try_to_cancel
> [  259.800618] hfranco: hrtimer_start
> [  259.800946] hfranco: hrtimer_try_to_cancel
> [  260.095486] hfranco: hrtimer_start
> [  260.095828] hfranco: hrtimer_try_to_cancel
> [  260.646435] hfranco: hrtimer_start
> [  260.646776] hfranco: hrtimer_try_to_cancel
> [  270.087384] hfranco: hrtimer_start
> [  270.087711] hfranco: hrtimer_try_to_cancel
> [  280.080143] hfranco: hrtimer_start
> [  280.080468] hfranco: hrtimer_try_to_cancel
> [  290.073603] hfranco: hrtimer_start
> [  290.073945] hfranco: hrtimer_try_to_cancel
> [  300.067513] hfranco: hrtimer_start
> [  300.067852] hfranco: hrtimer_try_to_cancel
> [  310.061945] hfranco: hrtimer_start
> [  310.062297] hfranco: hrtimer_try_to_cancel
> [  320.056667] hfranco: hrtimer_start
> [  320.057008] hfranco: hrtimer_try_to_cancel
> [  330.051900] hfranco: hrtimer_start
> [  330.052255] hfranco: hrtimer_try_to_cancel
> [  340.047353] hfranco: hrtimer_start
> [  340.047678] hfranco: hrtimer_try_to_cancel
> [  350.044862] hfranco: hrtimer_start
> [  350.045221] hfranco: hrtimer_try_to_cancel
> [  360.043011] hfranco: hrtimer_start
> [  360.043352] hfranco: hrtimer_try_to_cancel
> [  370.041268] hfranco: hrtimer_start
> [  370.041611] hfranco: hrtimer_try_to_cancel
> [  380.039556] hfranco: hrtimer_start
> [  380.039891] hfranco: hrtimer_try_to_cancel
> [  390.037826] hfranco: hrtimer_start
> [  390.038150] hfranco: hrtimer_try_to_cancel
> [  400.036353] hfranco: hrtimer_start
>
> / # arping -U -I usb0 -i 10 -s 192.168.11.1 192.168.11.2
> ARPING 192.168.11.2 from 192.168.11.1 usb0
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.504ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.459ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.612ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.676ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.509ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.696ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  3.008ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.446ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.776ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.529ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.669ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.523ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.574ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.872ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.654ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.639ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.576ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.584ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.560ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.393ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.627ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.799ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.980ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.661ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.707ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.523ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.711ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.704ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.566ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.678ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.592ms
> Unicast reply from 192.168.11.2 [62:05:42:DF:6F:C2]  2.654ms
> ^CSent 32 probes (1 broadcast(s))
> Received 32 response(s)
>
> $ sudo tcpdump -i enx620542df6fc2
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on enx620542df6fc2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 17:02:32.599638 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8
> 17:02:35.425325 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16
> 17:02:42.810738 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
> 17:02:42.865794 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181)
> 17:02:42.865832 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12)
> 17:02:43.253971 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
> 17:02:43.895157 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181)
> 17:02:43.895196 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12)
> 17:02:44.100862 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
> 17:02:44.961717 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
> 17:02:46.145083 IP6 hdebian.mdns > ff02::fb.mdns: 0 [b2&3=0x200] [2a] PTR (QM)? b.f.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.f.f.ip6.arpa. (181)
> 17:02:46.145128 IP6 hdebian.mdns > ff02::fb.mdns: 0 [0q] (12)
> 17:03:05.477178 ARP, Request who-has hdebian (Broadcast) tell apalis-imx6-10692086.local, length 28
> 17:03:05.477217 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:03:15.443291 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:03:15.443309 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:03:24.580306 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8
> 17:03:25.411804 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:03:25.411821 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:03:35.382697 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:03:35.382717 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:03:35.386582 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16
> 17:03:45.355482 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:03:45.355504 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:03:46.811643 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
> 17:03:47.052930 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
> 17:03:47.899743 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
> 17:03:48.962107 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
> 17:03:55.330407 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:03:55.330423 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:04:05.307308 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:04:05.307324 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:04:15.286180 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:04:15.286209 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:04:25.266785 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:04:25.266816 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:04:35.248474 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:04:35.248489 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:04:45.231873 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:04:45.231902 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:04:55.216288 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:04:55.216315 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:05:05.201924 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:05:05.201951 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:05:09.438461 IP6 hdebian > ip6-allrouters: ICMP6, router solicitation, length 8
> 17:05:15.188755 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:05:15.188799 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:05:25.176620 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:05:25.176648 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:05:32.931912 IP6 apalis-imx6-10692086.local > ip6-allrouters: ICMP6, router solicitation, length 16
> 17:05:35.165375 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:05:35.165399 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:05:45.155068 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:05:45.155082 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:05:54.813550 IP6 hdebian.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
> 17:05:54.851012 IP apalis-imx6-10692086.local.mdns > mdns.mcast.net.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
> 17:05:55.145920 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:05:55.145943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:05:55.696822 IP6 apalis-imx6-10692086.local.mdns > ff02::fb.mdns: 0 [1a] PTR (QM)? _tezi._tcp.local. (71)
> 17:05:56.963151 IP hdebian.mdns > mdns.mcast.net.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
> 17:06:05.136948 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:06:05.137104 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:06:15.128897 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:06:15.128918 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:06:25.121662 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:06:25.121682 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:06:35.114912 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:06:35.114943 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:06:45.108576 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:06:45.108599 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:06:55.102468 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:06:55.102501 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:07:05.096969 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:07:05.096994 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:07:15.091619 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:07:15.091647 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:07:25.088359 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:07:25.088385 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:07:35.085718 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:07:35.085753 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:07:45.083263 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:07:45.083291 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:07:55.080759 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:07:55.080782 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:08:05.078303 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:08:05.078344 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 17:08:15.075990 ARP, Request who-has hdebian (62:05:42:df:6f:c2 (oui Unknown)) tell apalis-imx6-10692086.local, length 28
> 17:08:15.076012 ARP, Reply hdebian is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> ^C
> 88 packets captured
> 88 packets received by filter
> 0 packets dropped by kernel
>
> Everything basically worked, without any freezing/slowness (please note
> that during the test I wasn't using the VNC client, only the arping). I
> also did it using screen with ssh/serial console disconnected.
>
> Although I passed -U flag to arping, I still get replies. Is this
> expected?

No. this defeats the purpose of the (since it's not unidirectional).
Perhaps -A would work better.
You could also probably pick an ip as the destination that isn't the
device on the other side of the cable, so that it *doesn't* respond.

The following seems to work for me:
  while true; do sudo arping -c 1 -i $DEV -b 1.2.3.4; sleep 10; done
since nothing responds to 1.2.3.4 on my lan.

But Oliver's point about it possibly needing 10 or 11 queued packets
is also interesting...
though that would be very very weird...

>
> > note that you're getting bidirectional transfer due to the icmp errors.
> >
> > if you want to test unidirectionally, you need to send something that
> > won't get a reply.
> > for example an unsolicited arp reply (with arping)... or a udp packet
> > to a port drop'ed by a firewall.
> >
> > and of course make sure you don't run it in such a way that you get
> > ssh traffic causing bidirectional traffic.
> > (ie. run it (and tcpdump) under screen)
> > [this is assuming you can't run it directly on the device without
> > getting in via ssh,
> > maybe you can via serial port or whatever - though do note that serial
> > wakeups could potentially also 'trigger' / 'fix' your hrtimer,
> > so probably best to run things silently under screen with no direct
> > output via ssh/serial/console.
>
> I also tested by sending UDP packets with netcat, every 1 second:
>
> ```
> while true
> do
>         echo 'hi' | nc -u 192.168.11.2 5201
>         sleep 1
> done
> ```
>
> I also manually added the ARP entry to the table using the 'arp'
> command, expecting I would get rid off ARP calls/replies, but it didn't.
> Even though the ARP table is correct, I still see ARP calls. Also, I
> can't get rid of ICMP packets, so at the end I couldn't truly test the
> one-way direction communication. For example, this is a snippet from
> tcpdump:
>
> 20:22:00.559786 IP 192.168.11.2.50323 > 192.168.11.1.5201: UDP, length 3
> 20:22:00.559857 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 20:22:01.563804 IP 192.168.11.2.44521 > 192.168.11.1.5201: UDP, length 3
> 20:22:01.563903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 20:22:02.554294 ARP, Request who-has 192.168.11.2 tell 192.168.11.1, length 28
> 20:22:02.556414 ARP, Reply 192.168.11.2 is-at 62:05:42:df:6f:c2 (oui Unknown), length 28
> 20:22:02.569192 IP 192.168.11.2.56428 > 192.168.11.1.5201: UDP, length 3
> 20:22:02.569281 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 20:22:03.575838 IP 192.168.11.2.50178 > 192.168.11.1.5201: UDP, length 3
> 20:22:03.575903 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 20:22:03.717633 ARP, Request who-has 192.168.11.1 tell 192.168.11.2, length 28
> 20:22:03.717680 ARP, Reply 192.168.11.1 is-at be:8e:0a:23:00:70 (oui Unknown), length 28
> 20:22:04.582394 IP 192.168.11.2.41700 > 192.168.11.1.5201: UDP, length 3
> 20:22:04.582473 IP 192.168.11.1 > 192.168.11.2: ICMP 192.168.11.1 udp port 5201 unreachable, length 39
> 20:22:05.587417 IP 192.168.11.2.57901 > 192.168.11.1.5201: UDP, length 3
>
> The same happens on both directions, host -> device and device -> host.
> I couldn't see any slowness or freezing.
>
> On Mon, Dec 18, 2023 at 03:34:36PM +0100, Oliver Neukum wrote:
> >
> >
> > On 15.12.23 22:44, Maciej Żenczykowski wrote:
> > > On Fri, Dec 15, 2023 at 1:20 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
> >
> > Hi,
> >
> > > Here's my recommendation.
> > > On the gadget side.
> > > Print something when the timer is armed.
> > > Print something when the timer fires.
> > > Include the timestamp.
> > >
> > > Then try to send something (a single packet) that won't trigger a reply.
> >
> > You are implicitly raising a question.
> > Does this bug strike if a single package is queued? The failure
> > case seems to be 10 or 11 packages.
>
> I think I will keep the same test but try to increase the number of
> packages per second I'm sending, maybe one packet / 100 ms or something
> like that.
>
> >
> >       Regards
> >               Oliver
> >
>
> Best Regards,
> Hiago.

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-18 23:00                                             ` Maciej Żenczykowski
@ 2023-12-19  8:45                                               ` Oliver Neukum
  2023-12-19 12:19                                                 ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-19  8:45 UTC (permalink / raw)
  To: Maciej Żenczykowski, Hiago De Franco
  Cc: Francesco Dolcini, davem, edumazet, hiago.franco, kuba, linux-usb,
	pabeni



On 19.12.23 00:00, Maciej Żenczykowski wrote:
> On Mon, Dec 18, 2023 at 1:00 PM Hiago De Franco <hiagofranco@gmail.com> wrote:

Hi,

>> Everything basically worked, without any freezing/slowness (please note
>> that during the test I wasn't using the VNC client, only the arping). I
>> also did it using screen with ssh/serial console disconnected.
>>
>> Although I passed -U flag to arping, I still get replies. Is this
>> expected?
> 
> No. this defeats the purpose of the (since it's not unidirectional).
> Perhaps -A would work better.
> You could also probably pick an ip as the destination that isn't the
> device on the other side of the cable, so that it *doesn't* respond.
> 
> The following seems to work for me:
>    while true; do sudo arping -c 1 -i $DEV -b 1.2.3.4; sleep 10; done
> since nothing responds to 1.2.3.4 on my lan.
> 
> But Oliver's point about it possibly needing 10 or 11 queued packets
> is also interesting...
> though that would be very very weird...

Probably not 10 or 11 but more than 1. Sorry for being unclear. Let me
explain.

The timer should not run for an idle interface, should it?
Thus we have three situations

A - first network packet to be transmitted:
queue it and start a timer

B - further network packet to be transmitted:
queue it and modify the timer

C - enough network packets:
transmit

It seems to me that Hiago has just shown that A works. Lowering the threshold
also solves the issue, so C works. Case B seems to be the issue, yet the test
setup tests case A. That is my issue.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-19  8:45                                               ` Oliver Neukum
@ 2023-12-19 12:19                                                 ` Maciej Żenczykowski
  2023-12-19 13:24                                                   ` Oliver Neukum
  0 siblings, 1 reply; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-19 12:19 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Tue, Dec 19, 2023 at 12:45 AM Oliver Neukum <oneukum@suse.com> wrote:
>
>
>
> On 19.12.23 00:00, Maciej Żenczykowski wrote:
> > On Mon, Dec 18, 2023 at 1:00 PM Hiago De Franco <hiagofranco@gmail.com> wrote:
>
> Hi,
>
> >> Everything basically worked, without any freezing/slowness (please note
> >> that during the test I wasn't using the VNC client, only the arping). I
> >> also did it using screen with ssh/serial console disconnected.
> >>
> >> Although I passed -U flag to arping, I still get replies. Is this
> >> expected?
> >
> > No. this defeats the purpose of the (since it's not unidirectional).
> > Perhaps -A would work better.
> > You could also probably pick an ip as the destination that isn't the
> > device on the other side of the cable, so that it *doesn't* respond.
> >
> > The following seems to work for me:
> >    while true; do sudo arping -c 1 -i $DEV -b 1.2.3.4; sleep 10; done
> > since nothing responds to 1.2.3.4 on my lan.
> >
> > But Oliver's point about it possibly needing 10 or 11 queued packets
> > is also interesting...
> > though that would be very very weird...
>
> Probably not 10 or 11 but more than 1. Sorry for being unclear. Let me
> explain.
>
> The timer should not run for an idle interface, should it?
> Thus we have three situations
>
> A - first network packet to be transmitted:
> queue it and start a timer
>
> B - further network packet to be transmitted:
> queue it and modify the timer
>
> C - enough network packets:
> transmit
>
> It seems to me that Hiago has just shown that A works. Lowering the threshold
> also solves the issue, so C works. Case B seems to be the issue, yet the test
> setup tests case A. That is my issue.

Perhaps.  I've been looking at the gadget tx code, and assuming hrtimers work,
I don't see how it could possibly screw up...  the hrtimer arm/cancel
are clearly
at the same spot in the code where we allocate/unqueue the ncm->skb_tx_data

(I have found a potential use-after-free-on-error bug and a stupidity
that I'll send fixes for,
but that doesn't appear related to this thread)

But the host side driver seems more complex/confusing.
However, I've not really ever looked at it previously though, so it
might just be that...

Maybe there's some case where we end up with no timer and queued packets??
Haven't found it yet though...

>
>         Regards
>                 Oliver

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-19 12:19                                                 ` Maciej Żenczykowski
@ 2023-12-19 13:24                                                   ` Oliver Neukum
  2023-12-19 16:25                                                     ` Maciej Żenczykowski
  0 siblings, 1 reply; 29+ messages in thread
From: Oliver Neukum @ 2023-12-19 13:24 UTC (permalink / raw)
  To: Maciej Żenczykowski, Oliver Neukum
  Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On 19.12.23 13:19, Maciej Żenczykowski wrote:
> On Tue, Dec 19, 2023 at 12:45 AM Oliver Neukum <oneukum@suse.com> wrote:
>>
>>
>>
>> On 19.12.23 00:00, Maciej Żenczykowski wrote:

> Perhaps.  I've been looking at the gadget tx code, and assuming hrtimers work,
> I don't see how it could possibly screw up...  the hrtimer arm/cancel
> are clearly
> at the same spot in the code where we allocate/unqueue the ncm->skb_tx_data
> 
> (I have found a potential use-after-free-on-error bug and a stupidity
> that I'll send fixes for,
> but that doesn't appear related to this thread)

Yes. Could there be a logic bug in the algorithm?
Maybe it is necessary to also consider the time the first packet
was queued and to transmit if that is too long in the past?

> But the host side driver seems more complex/confusing.
> However, I've not really ever looked at it previously though, so it
> might just be that...

How can Hiago determine that? If you do a ping from the gadget
to the host, tcpdump should show the timing of the echo requests,
shouldn't it? If they arrive simultaneously we at least know which
side the issue is on.

	Regards
		Oliver
  

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Question regarding CDC NCM and VNC performance issue
  2023-12-19 13:24                                                   ` Oliver Neukum
@ 2023-12-19 16:25                                                     ` Maciej Żenczykowski
  0 siblings, 0 replies; 29+ messages in thread
From: Maciej Żenczykowski @ 2023-12-19 16:25 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hiago De Franco, Francesco Dolcini, davem, edumazet, hiago.franco,
	kuba, linux-usb, pabeni

On Tue, Dec 19, 2023 at 5:24 AM Oliver Neukum <oneukum@suse.com> wrote:
>
> On 19.12.23 13:19, Maciej Żenczykowski wrote:
> > On Tue, Dec 19, 2023 at 12:45 AM Oliver Neukum <oneukum@suse.com> wrote:
> >>
> >>
> >>
> >> On 19.12.23 00:00, Maciej Żenczykowski wrote:
>
> > Perhaps.  I've been looking at the gadget tx code, and assuming hrtimers work,
> > I don't see how it could possibly screw up...  the hrtimer arm/cancel
> > are clearly
> > at the same spot in the code where we allocate/unqueue the ncm->skb_tx_data
> >
> > (I have found a potential use-after-free-on-error bug and a stupidity
> > that I'll send fixes for,
> > but that doesn't appear related to this thread)
>
> Yes. Could there be a logic bug in the algorithm?

Perhaps?  But, as I said... I'm not seeing it (on the gadget side).

> Maybe it is necessary to also consider the time the first packet
> was queued and to transmit if that is too long in the past?

The gadget side does not (any more) adjust the timer (on later packets)...
Which is part of what makes it so simple logic wise...

This sort of bug would likely be easier to make on the host side
cdc_ncm driver...
I'm still not following it's logic...

>
> > But the host side driver seems more complex/confusing.
> > However, I've not really ever looked at it previously though, so it
> > might just be that...
>
> How can Hiago determine that? If you do a ping from the gadget
> to the host, tcpdump should show the timing of the echo requests,
> shouldn't it? If they arrive simultaneously we at least know which
> side the issue is on.

I guess you could ping on one side, and run tcpdump on the *other*.
Then look at the timestamps on the tcpdump vs ping cli reported...

If tcpdump shows ping requests coming in at equal intervals...
then the ping sender side + receiver is OK, and the problem must be
the return path.

>
>         Regards
>                 Oliver
>

--
Maciej Żenczykowski, Kernel Networking Developer @ Google

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-12-19 16:25 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-04 18:37 Question regarding CDC NCM and VNC performance issue Hiago De Franco
2023-12-05  9:50 ` Oliver Neukum
2023-12-05 12:22   ` Hiago De Franco
2023-12-05 12:38     ` Oliver Neukum
2023-12-05 12:55       ` Hiago De Franco
2023-12-05 20:14       ` Hiago De Franco
2023-12-06 12:21         ` Oliver Neukum
2023-12-06 16:23           ` Hiago De Franco
2023-12-07  9:41             ` Oliver Neukum
2023-12-07 10:03               ` Francesco Dolcini
2023-12-07 11:07                 ` Oliver Neukum
2023-12-07 11:38                   ` Maciej Żenczykowski
2023-12-07 18:57                     ` Hiago De Franco
2023-12-07 19:37                       ` Maciej Żenczykowski
2023-12-11 20:28                         ` Hiago De Franco
2023-12-11 20:44                           ` Maciej Żenczykowski
2023-12-12  9:48                             ` Oliver Neukum
2023-12-12 20:32                               ` Hiago De Franco
2023-12-12 21:21                                 ` Oliver Neukum
2023-12-13  1:11                                   ` Maciej Żenczykowski
2023-12-15 21:20                                     ` Hiago De Franco
2023-12-15 21:44                                       ` Maciej Żenczykowski
2023-12-18 14:34                                         ` Oliver Neukum
2023-12-18 20:59                                           ` Hiago De Franco
2023-12-18 23:00                                             ` Maciej Żenczykowski
2023-12-19  8:45                                               ` Oliver Neukum
2023-12-19 12:19                                                 ` Maciej Żenczykowski
2023-12-19 13:24                                                   ` Oliver Neukum
2023-12-19 16:25                                                     ` Maciej Żenczykowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox