* [RFC net-next] tcp: allow larger TSO to be built under overload
From: Jakub Kicinski @ 2022-03-08  3:03 UTC
To: edumazet
Cc: netdev, willemb, ncardwell, ycheng, Jakub Kicinski

We observed Tx-heavy workloads causing softirq overload because
with increased load, and therefore latency, the pacing rates fall,
pushing TCP to generate smaller and smaller TSO packets.

It seems reasonable to allow larger packets to be built when the
system is under stress. TCP already uses the

	this_cpu_ksoftirqd() == current

condition as an indication of overload for TSQ scheduling.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Sending as an RFC because it seems reasonable, but really I haven't
run any large scale testing yet. Bumping tcp_min_tso_segs to prevent
overloads is okay, but it seems like we can do better, since we only
need coarser pacing once disaster strikes?

The downsides are that users may have already increased the value to
what's needed during overload, or applied the same logic in
out-of-tree CA algo implementations (only BBR implements
ca_ops->min_tso_segs() upstream).
---
 net/ipv4/tcp_output.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2319531267c6..815ef4ffc39d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1967,7 +1967,13 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 * This preserves ACK clocking and is consistent
 	 * with tcp_tso_should_defer() heuristic.
 	 */
-	segs = max_t(u32, bytes / mss_now, min_tso_segs);
+	segs = bytes / mss_now;
+	if (segs < min_tso_segs) {
+		segs = min_tso_segs;
+		/* Allow larger packets under stress */
+		if (this_cpu_ksoftirqd() == current)
+			segs *= 2;
+	}

 	return segs;
 }
--
2.34.1
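For intuition, the arithmetic of the proposed change can be sketched in
userspace; here "overloaded" stands in for the this_cpu_ksoftirqd() ==
current test, and the pacing rate, shift, and MSS values are
illustrative assumptions, not measurements:

	#include <stdio.h>
	#include <stdint.h>

	/* Mirror of the proposed tcp_tso_autosize() sizing logic. */
	static uint32_t tso_autosize(uint64_t pacing_rate, uint32_t pacing_shift,
				     uint32_t mss_now, uint32_t min_tso_segs,
				     int overloaded)
	{
		/* Bytes we may burst per pacing interval (~1 ms at shift 10) */
		uint64_t bytes = pacing_rate >> pacing_shift;
		uint32_t segs = bytes / mss_now;

		if (segs < min_tso_segs) {
			segs = min_tso_segs;
			/* Allow larger packets under stress */
			if (overloaded)
				segs *= 2;
		}
		return segs;
	}

	int main(void)
	{
		/* 10 Mbit/s pacing, default pacing shift of 10, 1448-byte MSS */
		uint64_t rate = 10ULL * 1000 * 1000 / 8;

		printf("calm:       %u segs\n", tso_autosize(rate, 10, 1448, 2, 0));
		printf("overloaded: %u segs\n", tso_autosize(rate, 10, 1448, 2, 1));
		return 0;
	}

At this assumed rate the per-interval budget is under one MSS, so the
calm case returns the floor of 2 segments and the overloaded case
doubles it to 4.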
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Eric Dumazet @ 2022-03-08  3:50 UTC
To: Jakub Kicinski
Cc: netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

On Mon, Mar 7, 2022 at 7:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> We observed Tx-heavy workloads causing softirq overload because
> with increased load, and therefore latency, the pacing rates fall,
> pushing TCP to generate smaller and smaller TSO packets.

Yes, we saw this behavior, but came up with something more generic that
also helps the common case. Cooking larger TSO is really a function of
the radius (distance between peers).

> It seems reasonable to allow larger packets to be built when the
> system is under stress. TCP already uses the
>
> 	this_cpu_ksoftirqd() == current
>
> condition as an indication of overload for TSQ scheduling.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> Sending as an RFC because it seems reasonable, but really I haven't
> run any large scale testing yet. Bumping tcp_min_tso_segs to prevent
> overloads is okay, but it seems like we can do better, since we only
> need coarser pacing once disaster strikes?
>
> The downsides are that users may have already increased the value to
> what's needed during overload, or applied the same logic in
> out-of-tree CA algo implementations (only BBR implements
> ca_ops->min_tso_segs() upstream).

Unfortunately this would make packetdrill flaky, thus break our tests.

Also, I would guess the pacing decreases because CWND is small anyway,
or RTT increases? What CC are you using?

The issue I see here is that bi-modal behavior will cause all kinds of
artifacts.

BBR2 has something to give an extra allowance based on min_rtt.

I think we should adopt this for all CC, because it is not bi-modal,
and it would even allow full-size TSO packets for hosts in the same
rack.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2319531267c6830b633768dea7f0b40a46633ee1..02ec5866a05ffc2920ead95e9a65cc1ba77459c7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1956,20 +1956,34 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
 static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 			    int min_tso_segs)
 {
-	u32 bytes, segs;
+	/* Use min_rtt to help adapt TSO burst size, with smaller min_rtt
+	 * resulting in bigger TSO bursts. By default we cut the RTT-based
+	 * allowance in half for every 2^9 usec (aka 512 us) of RTT, so that
+	 * the RTT-based allowance is below 1500 bytes after
+	 * 6 * ~500 usec = 3 ms.
+	 */
+	const u32 rtt_shift = 9;
+	unsigned long bytes;
+	u32 r;
+
+	bytes = sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift);
+	/* Budget a TSO/GSO burst size allowance based on min_rtt. For every
+	 * K = 2^rtt_shift microseconds of min_rtt, halve the burst.
+	 * The min_rtt-based burst allowance is: 64 KBytes / 2^(min_rtt/K)
+	 */
+	r = tcp_min_rtt(tcp_sk(sk)) >> rtt_shift;
+	if (r < BITS_PER_TYPE(u32))
+		bytes += GSO_MAX_SIZE >> r;
+
+	bytes = min_t(unsigned long, bytes, sk->sk_gso_max_size);

-	bytes = min_t(unsigned long,
-		      sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift),
-		      sk->sk_gso_max_size);
 	/* Goal is to send at least one packet per ms,
 	 * not one big TSO packet every 100 ms.
 	 * This preserves ACK clocking and is consistent
 	 * with tcp_tso_should_defer() heuristic.
 	 */
-	segs = max_t(u32, bytes / mss_now, min_tso_segs);
-
-	return segs;
+	return max_t(u32, bytes / mss_now, min_tso_segs);
 }

 /* Return the number of segments we want in the skb we are transmitting.
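For intuition about the decay in that diff, a minimal userspace sketch
of just the min_rtt-based allowance (GSO_MAX_SIZE is 64 KB, matching
the kernel definition at the time; the sample RTTs are made up):

	#include <stdio.h>
	#include <stdint.h>

	#define GSO_MAX_SIZE	65536u
	#define RTT_SHIFT	9	/* halve the allowance per 512 us of min_rtt */

	/* Extra byte budget a flow gets on top of its pacing-rate budget. */
	static uint32_t rtt_allowance(uint32_t min_rtt_us)
	{
		uint32_t r = min_rtt_us >> RTT_SHIFT;

		return r < 32 ? GSO_MAX_SIZE >> r : 0;
	}

	int main(void)
	{
		uint32_t rtts[] = { 20, 200, 600, 1100, 2100, 3100, 10000 };

		for (unsigned int i = 0; i < sizeof(rtts) / sizeof(rtts[0]); i++)
			printf("min_rtt %5u us -> +%5u bytes\n",
			       rtts[i], rtt_allowance(rtts[i]));
		return 0;
	}

A same-rack flow (tens of microseconds of min_rtt) gets the full 64 KB
bonus, while past 3 ms the bonus drops below a single 1500-byte MSS,
exactly as the comment in the diff states.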
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Jakub Kicinski @ 2022-03-08  4:29 UTC
To: Eric Dumazet
Cc: netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

On Mon, 7 Mar 2022 19:50:10 -0800 Eric Dumazet wrote:
> On Mon, Mar 7, 2022 at 7:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > We observed Tx-heavy workloads causing softirq overload because
> > with increased load, and therefore latency, the pacing rates fall,
> > pushing TCP to generate smaller and smaller TSO packets.
>
> Yes, we saw this behavior, but came up with something more generic that
> also helps the common case. Cooking larger TSO is really a function of
> the radius (distance between peers).

Excellent, I was hoping you'd have a better fix :)

> > It seems reasonable to allow larger packets to be built when the
> > system is under stress. TCP already uses the
> >
> > 	this_cpu_ksoftirqd() == current
> >
> > condition as an indication of overload for TSQ scheduling.
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > ---
> > Sending as an RFC because it seems reasonable, but really I haven't
> > run any large scale testing yet. Bumping tcp_min_tso_segs to prevent
> > overloads is okay, but it seems like we can do better, since we only
> > need coarser pacing once disaster strikes?
> >
> > The downsides are that users may have already increased the value to
> > what's needed during overload, or applied the same logic in
> > out-of-tree CA algo implementations (only BBR implements
> > ca_ops->min_tso_segs() upstream).
>
> Unfortunately this would make packetdrill flaky, thus break our tests.
>
> Also, I would guess the pacing decreases because CWND is small anyway,
> or RTT increases?

Both increase - CWND can go up to the 256-512 bucket (in a histogram),
but latency gets insane as the machine tries to pump out 2 kB segments,
doing a lot of splitting, and barely services the ACKs from the Rx ring.
With an Rx ring of a few thousand packets, latency crosses 250 ms,
in-building. I've seen srtt_us > 1M.

> What CC are you using?

A mix of CUBIC and DCTCP for this application, primarily DCTCP.

> The issue I see here is that bi-modal behavior will cause all kinds of
> artifacts.
>
> BBR2 has something to give an extra allowance based on min_rtt.
>
> I think we should adopt this for all CC, because it is not bi-modal,
> and it would even allow full-size TSO packets for hosts in the same
> rack.

Using min_rtt makes perfect sense in the case I saw.

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 2319531267c6830b633768dea7f0b40a46633ee1..02ec5866a05ffc2920ead95e9a65cc1ba77459c7 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1956,20 +1956,34 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
>  static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
>  			    int min_tso_segs)
>  {
> -	u32 bytes, segs;
> +	/* Use min_rtt to help adapt TSO burst size, with smaller min_rtt
> +	 * resulting in bigger TSO bursts. By default we cut the RTT-based
> +	 * allowance in half for every 2^9 usec (aka 512 us) of RTT, so that
> +	 * the RTT-based allowance is below 1500 bytes after
> +	 * 6 * ~500 usec = 3 ms.
> +	 */
> +	const u32 rtt_shift = 9;
> +	unsigned long bytes;
> +	u32 r;
> +
> +	bytes = sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift);
> +	/* Budget a TSO/GSO burst size allowance based on min_rtt. For every
> +	 * K = 2^rtt_shift microseconds of min_rtt, halve the burst.
> +	 * The min_rtt-based burst allowance is: 64 KBytes / 2^(min_rtt/K)
> +	 */
> +	r = tcp_min_rtt(tcp_sk(sk)) >> rtt_shift;
> +	if (r < BITS_PER_TYPE(u32))
> +		bytes += GSO_MAX_SIZE >> r;
> +
> +	bytes = min_t(unsigned long, bytes, sk->sk_gso_max_size);
>
> -	bytes = min_t(unsigned long,
> -		      sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift),
> -		      sk->sk_gso_max_size);
>  	/* Goal is to send at least one packet per ms,
>  	 * not one big TSO packet every 100 ms.
>  	 * This preserves ACK clocking and is consistent
>  	 * with tcp_tso_should_defer() heuristic.
>  	 */
> -	segs = max_t(u32, bytes / mss_now, min_tso_segs);
> -
> -	return segs;
> +	return max_t(u32, bytes / mss_now, min_tso_segs);
>  }
>
>  /* Return the number of segments we want in the skb we are transmitting.
* RE: [RFC net-next] tcp: allow larger TSO to be built under overload
From: David Laight @ 2022-03-08  9:07 UTC
To: 'Eric Dumazet', Jakub Kicinski
Cc: netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet
> Sent: 08 March 2022 03:50
...
> 	/* Goal is to send at least one packet per ms,
> 	 * not one big TSO packet every 100 ms.
> 	 * This preserves ACK clocking and is consistent
> 	 * with tcp_tso_should_defer() heuristic.
> 	 */
> -	segs = max_t(u32, bytes / mss_now, min_tso_segs);
> -
> -	return segs;
> +	return max_t(u32, bytes / mss_now, min_tso_segs);
>  }

Which is the common side of that max_t()?
If it is min_tso_segs, it might be worth avoiding the divide by coding
it as:

	return bytes > mss_now * min_tso_segs ? bytes / mss_now : min_tso_segs;

	David
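To see why the suggested form is equivalent: when
bytes <= mss_now * min_tso_segs, the quotient is at most min_tso_segs,
so max_t() would return min_tso_segs anyway and the divide can be
skipped. A standalone sketch checking that claim (it assumes the
mss_now * min_tso_segs product fits in 64 bits; 1448 and 2 are just
sample values):

	#include <assert.h>
	#include <stdint.h>

	/* Original form: always divides, then takes the max. */
	static uint32_t with_div(uint64_t bytes, uint32_t mss, uint32_t min_segs)
	{
		uint32_t segs = bytes / mss;

		return segs > min_segs ? segs : min_segs;
	}

	/* Suggested form: divides only when the result can exceed min_segs. */
	static uint32_t branch_first(uint64_t bytes, uint32_t mss, uint32_t min_segs)
	{
		return bytes > (uint64_t)mss * min_segs ? bytes / mss : min_segs;
	}

	int main(void)
	{
		for (uint64_t bytes = 0; bytes < 1 << 20; bytes += 307)
			assert(with_div(bytes, 1448, 2) ==
			       branch_first(bytes, 1448, 2));
		return 0;
	}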
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Eric Dumazet @ 2022-03-08 19:53 UTC
To: David Laight
Cc: Jakub Kicinski, netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

On Tue, Mar 8, 2022 at 1:08 AM David Laight <David.Laight@aculab.com> wrote:
>
> Which is the common side of that max_t()?
> If it is min_tso_segs, it might be worth avoiding the divide by coding
> it as:
>
> 	return bytes > mss_now * min_tso_segs ? bytes / mss_now : min_tso_segs;

I think the common case is when the divide must happen.
Not sure if this really matters with current cpus.

Jakub, Neal: I am going to send a patch for net-next.

In conjunction with BIG TCP, this gives a considerable boost in
performance.

Before:

otrv5:/home/google/edumazet# nstat -n; ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000; nstat | egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96005
TcpInSegs                       15649381           0.0
TcpOutSegs                      58659574           0.0  # Average of 3.74 4K segments per TSO packet
TcpExtTCPDelivered              58655240           0.0
TcpExtTCPDeliveredCE            21                 0.0

After:

otrv5:/home/google/edumazet# nstat -n; ./super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000; nstat | egrep "TcpInSegs|TcpOutSegs|TcpRetransSegs|Delivered"
  96046
TcpInSegs                       1445864            0.0
TcpOutSegs                      58885065           0.0  # Average of 40.72 4K segments per TSO packet
TcpExtTCPDelivered              58880873           0.0
TcpExtTCPDeliveredCE            28                 0.0

-> 1,445,864 ACK packets instead of 15,649,381

And about 25% of cpu cycles saved, according to perf stat:

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

          66,895.00 msec task-clock                #     2.886 CPUs utilized
          1,312,687      context-switches          # 19623.389 M/sec
              5,645      cpu-migrations            #    84.387 M/sec
            942,412      page-faults               # 14088.139 M/sec
    203,672,224,410      cycles                    # 3044700.936 GHz                  (83.40%)
     18,933,350,691      stalled-cycles-frontend   #     9.30% frontend cycles idle   (83.46%)
    138,500,001,318      stalled-cycles-backend    #    68.00% backend cycles idle    (83.38%)
     53,694,300,814      instructions              #     0.26  insn per cycle
                                                   #     2.58  stalled cycles per insn (83.30%)
      9,100,155,390      branches                  # 136038439.770 M/sec              (83.26%)
        152,331,123      branch-misses             #     1.67% of all branches        (83.47%)

       23.180309488 seconds time elapsed

-->

 Performance counter stats for './super_netperf 600 -H otrv6 -l 20 -- -K dctcp -q 20000000':

          48,964.30 msec task-clock                #     2.103 CPUs utilized
            184,903      context-switches          #  3776.305 M/sec
              3,057      cpu-migrations            #    62.434 M/sec
            940,615      page-faults               # 19210.338 M/sec
    152,390,738,065      cycles                    # 3112301.652 GHz                  (83.61%)
     11,603,675,527      stalled-cycles-frontend   #     7.61% frontend cycles idle   (83.49%)
    120,240,493,440      stalled-cycles-backend    #    78.90% backend cycles idle    (83.30%)
     37,106,498,492      instructions              #     0.24  insn per cycle
                                                   #     3.24  stalled cycles per insn (83.47%)
      5,968,256,846      branches                  # 121890712.483 M/sec              (83.25%)
         88,743,145      branch-misses             #     1.49% of all branches        (83.24%)

       23.284583305 seconds time elapsed
* RE: [RFC net-next] tcp: allow larger TSO to be built under overload
From: David Laight @ 2022-03-08 22:12 UTC
To: 'Eric Dumazet'
Cc: Jakub Kicinski, netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet
> Sent: 08 March 2022 19:54
..
> > Which is the common side of that max_t()?
> > If it is min_tso_segs, it might be worth avoiding the divide by
> > coding it as:
> >
> > 	return bytes > mss_now * min_tso_segs ? bytes / mss_now : min_tso_segs;
>
> I think the common case is when the divide must happen.
> Not sure if this really matters with current cpus.

The last document I looked at still quoted considerable latency for
integer divide on x86-64.
If you get a cmov, all the instructions will just get queued waiting
for the divide to complete, but a branch could easily get mispredicted.
That is likely to hit ppc - which I don't think has a cmov?

OTOH, if the divide is in the ?: arm, nothing probably depends on it
for a while - so the latency won't matter.

The latest figures I have are for SkylakeX:

	           u-ops               latency  1/throughput
	DIV r8     10 10  p0 p1 p5 p6  23       6
	DIV r16    10 10  p0 p1 p5 p6  23       6
	DIV r32    10 10  p0 p1 p5 p6  26       6
	DIV r64    36 36  p0 p1 p5 p6  35-88    21-83
	IDIV r8    11 11  p0 p1 p5 p6  24       6
	IDIV r16   10 10  p0 p1 p5 p6  23       6
	IDIV r32   10 10  p0 p1 p5 p6  26       6
	IDIV r64   57 57  p0 p1 p5 p6  42-95    24-90

Broadwell is a bit slower.
Note that the 64-bit divide is really horrid.
I think this one will be 32-bit - so 'only' 26 clocks latency.

AMD Ryzen is a lot better for 64-bit divides:

	               u-ops  latency  1/throughput
	DIV r8/m8      1      13-16    13-16
	DIV r16/m16    2      14-21    14-21
	DIV r32/m32    2      14-30    14-30
	DIV r64/m64    2      14-46    14-45
	IDIV r8/m8     1      13-16    13-16
	IDIV r16/m16   2      13-21    14-22
	IDIV r32/m32   2      14-30    14-30
	IDIV r64/m64   2      14-47    14-45

But there is less pipelining for the 32-bit ones.

Quite how those tables affect real code is another matter - but they
are guidelines about what is possible (if you can get the u-ops
executed on the right ports).

	David
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Eric Dumazet @ 2022-03-08 22:26 UTC
To: David Laight
Cc: Jakub Kicinski, netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

On Tue, Mar 8, 2022 at 2:12 PM David Laight <David.Laight@aculab.com> wrote:
>
> The last document I looked at still quoted considerable latency for
> integer divide on x86-64.
> If you get a cmov, all the instructions will just get queued waiting
> for the divide to complete, but a branch could easily get mispredicted.
> That is likely to hit ppc - which I don't think has a cmov?
...
> Quite how those tables affect real code is another matter - but they
> are guidelines about what is possible (if you can get the u-ops
> executed on the right ports).

Thanks. I think I will make sure that we use the 32-bit divide then,
because the compiler might not be smart enough to detect that both
operands are < ~0U.
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Eric Dumazet @ 2022-03-08 22:42 UTC
To: David Laight
Cc: Jakub Kicinski, netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

On Tue, Mar 8, 2022 at 2:26 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Thanks. I think I will make sure that we use the 32-bit divide then,
> because the compiler might not be smart enough to detect that both
> operands are < ~0U.

BTW, it seems the compiler (clang for me) is smart enough. With

	bytes = min_t(unsigned long, bytes, sk->sk_gso_max_size);

	return max_t(u32, bytes / mss_now, min_tso_segs);

the compiler uses the divide-by-32-bit operation (div %ecx).

If you remove the min_t() clamping and only keep

	return max_t(u32, bytes / mss_now, min_tso_segs);

then clang makes a special case for bytes >= (1UL << 32):

	790d: 48 89 c2     mov    %rax,%rdx
	7910: 48 c1 ea 20  shr    $0x20,%rdx
	7914: 74 07        je     791d <tcp_tso_autosize+0x4d>
	7916: 31 d2        xor    %edx,%edx
	7918: 48 f7 f1     div    %rcx        # More expensive divide
	791b: eb 04        jmp    7921 <tcp_tso_autosize+0x51>
	791d: 31 d2        xor    %edx,%edx
	791f: f7 f1        div    %ecx
	7921: 44 39 c0     cmp    %r8d,%eax
	7924: 44 0f 47 c0  cmova  %eax,%r8d
	7928: 44 89 c0     mov    %r8d,%eax
	792b: 5d           pop    %rbp
	792c: c3           ret
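A standalone version of that experiment, suitable for a compiler
explorer; the parameter names are local stand-ins for the kernel
fields, not kernel code:

	#include <stdint.h>

	uint32_t autosize(unsigned long bytes, uint32_t mss_now,
			  uint32_t min_tso_segs, uint32_t gso_max_size)
	{
		/* Clamp first: after this, the value of bytes fits in 32 bits,
		 * so the compiler can prove the range and emit the cheaper
		 * 32-bit div form. */
		if (bytes > gso_max_size)
			bytes = gso_max_size;

		uint32_t segs = bytes / mss_now;

		return segs > min_tso_segs ? segs : min_tso_segs;
	}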
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Jakub Kicinski @ 2022-03-09  0:18 UTC
To: Eric Dumazet
Cc: David Laight, netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

On Tue, 8 Mar 2022 11:53:38 -0800 Eric Dumazet wrote:
> Jakub, Neal: I am going to send a patch for net-next.
>
> In conjunction with BIG TCP, this gives a considerable boost in
> performance.

SGTM! Could the change cause increased burstiness, or is that unlikely?
I'm asking to gauge the risk / figure out an appropriate roll-out plan.
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Eric Dumazet @ 2022-03-09  1:09 UTC
To: Jakub Kicinski
Cc: David Laight, netdev, Willem de Bruijn, Neal Cardwell, Yuchung Cheng

On Tue, Mar 8, 2022 at 4:18 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> SGTM! Could the change cause increased burstiness, or is that unlikely?
> I'm asking to gauge the risk / figure out an appropriate roll-out plan.

I guess we can make the shift factor a sysctl, defaulting to 9.
If you want to disable the feature, set the sysctl to a small value
like 0 or 1.
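Numerically, a small shift makes the halvings exhaust the allowance for
any realistic min_rtt, which is what turns the feature off. A sketch of
just that arithmetic (the sysctl plumbing itself is not shown, and the
64 KB ceiling mirrors GSO_MAX_SIZE):

	#include <stdio.h>

	/* allowance = 64 KB, halved once per 2^shift usec of min_rtt */
	static unsigned int allowance(unsigned int min_rtt_us, unsigned int shift)
	{
		unsigned int r = min_rtt_us >> shift;

		return r < 32 ? 65536u >> r : 0;
	}

	int main(void)
	{
		printf("shift 9, 500 us: +%u bytes\n", allowance(500, 9)); /* 65536 */
		printf("shift 1, 500 us: +%u bytes\n", allowance(500, 1)); /* 0 */
		printf("shift 0,  20 us: +%u bytes\n", allowance(20, 0));  /* 0 */
		return 0;
	}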
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Neal Cardwell @ 2022-03-09 16:42 UTC
To: Jakub Kicinski
Cc: Eric Dumazet, David Laight, netdev, Willem de Bruijn, Yuchung Cheng

On Tue, Mar 8, 2022 at 7:18 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 8 Mar 2022 11:53:38 -0800 Eric Dumazet wrote:
> > Jakub, Neal: I am going to send a patch for net-next.
> >
> > In conjunction with BIG TCP, this gives a considerable boost in
> > performance.
>
> SGTM! Could the change cause increased burstiness, or is that unlikely?
> I'm asking to gauge the risk / figure out an appropriate roll-out plan.

In theory it could cause increased burstiness in some scenarios, but in
practice we have used this min_rtt-based TSO autosizing component in
production for about two years, where we see improvements in load tests
and no problems seen in production.

neal
* Re: [RFC net-next] tcp: allow larger TSO to be built under overload
From: Jakub Kicinski @ 2022-03-09 16:54 UTC
To: Neal Cardwell
Cc: Eric Dumazet, David Laight, netdev, Willem de Bruijn, Yuchung Cheng

On Wed, 9 Mar 2022 11:42:24 -0500 Neal Cardwell wrote:
> > SGTM! Could the change cause increased burstiness, or is that unlikely?
> > I'm asking to gauge the risk / figure out an appropriate roll-out plan.
>
> In theory it could cause increased burstiness in some scenarios, but in
> practice we have used this min_rtt-based TSO autosizing component in
> production for about two years, where we see improvements in load tests
> and no problems seen in production.

Perfect, that sounds safe to put in the next point release here, then.
Thank you!