From: Martin KaFai Lau <martin.lau@linux.dev>
To: Miao Xu <miaxu@meta.com>
Cc: Eric Dumazet <edumazet@google.com>,
"David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
David Ahern <dsahern@kernel.org>, Martin Lau <kafai@meta.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH 2/2] [PATCH net-next,2/2] Add test for the use of new args in cong_control
Date: Thu, 25 Apr 2024 12:09:57 -0700 [thread overview]
Message-ID: <6e134365-b793-49f9-82dd-d148bdd16c85@linux.dev> (raw)
In-Reply-To: <20240424203713.4003974-2-miaxu@meta.com>
On 4/24/24 1:37 PM, Miao Xu wrote:
> +#define USEC_PER_SEC 1000000UL
> +#define TCP_PACING_SS_RATIO (200)
> +#define TCP_PACING_CA_RATIO (120)
> +#define TCP_REORDERING (12)
> +#define likely(x) (__builtin_expect(!!(x), 1))
> +
> +static __always_inline __u64 div64_u64(__u64 dividend, __u64 divisor)
> +{
> + return dividend / divisor;
> +}
> +
> +static void tcp_update_pacing_rate(struct sock *sk)
> +{
> + const struct tcp_sock *tp = tcp_sk(sk);
> + __u64 rate;
> +
> + /* set sk_pacing_rate to 200 % of current rate (mss * cwnd / srtt) */
> + rate = (__u64)tp->mss_cache * ((USEC_PER_SEC / 100) << 3);
> +
> + /* current rate is (cwnd * mss) / srtt
> + * In Slow Start [1], set sk_pacing_rate to 200 % the current rate.
> + * In Congestion Avoidance phase, set it to 120 % the current rate.
> + *
> + * [1] : Normal Slow Start condition is (tp->snd_cwnd < tp->snd_ssthresh)
> + * If snd_cwnd >= (tp->snd_ssthresh / 2), we are approaching
> + * end of slow start and should slow down.
> + */
> + if (tp->snd_cwnd < tp->snd_ssthresh / 2)
> + rate *= TCP_PACING_SS_RATIO;
> + else
> + rate *= TCP_PACING_CA_RATIO;
> +
> + rate *= max(tp->snd_cwnd, tp->packets_out);
> +
> + if (likely(tp->srtt_us))
> + rate = div64_u64(rate, (__u64)tp->srtt_us);
> +
> + sk->sk_pacing_rate = min(rate, (__u64)sk->sk_max_pacing_rate);
> +}
> +
> +static __always_inline void tcp_cwnd_reduction(
> + struct sock *sk,
> + int newly_acked_sacked,
> + int newly_lost,
> + int flag) {
> + struct tcp_sock *tp = tcp_sk(sk);
> + int sndcnt = 0;
> + __u32 pkts_in_flight = tp->packets_out - (tp->sacked_out + tp->lost_out) + tp->retrans_out;
> + int delta = tp->snd_ssthresh - pkts_in_flight;
> +
> + if (newly_acked_sacked <= 0 || !tp->prior_cwnd)
> + return;
> +
> + __u32 prr_delivered = tp->prr_delivered + newly_acked_sacked;
> +
> + if (delta < 0) {
> + __u64 dividend =
> + (__u64)tp->snd_ssthresh * prr_delivered + tp->prior_cwnd - 1;
> + sndcnt = (__u32)div64_u64(dividend, (__u64)tp->prior_cwnd) - tp->prr_out;
> + } else {
> + sndcnt = max(prr_delivered - tp->prr_out, newly_acked_sacked);
> + if (flag & FLAG_SND_UNA_ADVANCED && !newly_lost)
> + sndcnt++;
> + sndcnt = min(delta, sndcnt);
> + }
> + /* Force a fast retransmit upon entering fast recovery */
> + sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
> + tp->snd_cwnd = pkts_in_flight + sndcnt;
> +}
> +
> +/* Decide wheather to run the increase function of congestion control. */
> +static __always_inline bool tcp_may_raise_cwnd(
> + const struct sock *sk,
> + const int flag) {
> + if (tcp_sk(sk)->reordering > TCP_REORDERING)
> + return flag & FLAG_FORWARD_PROGRESS;
> +
> + return flag & FLAG_DATA_ACKED;
> +}
> +
> +void BPF_STRUCT_OPS(bpf_cubic_cong_control, struct sock *sk, __u32 ack, int flag,
> + const struct rate_sample *rs)
> +{
> + struct tcp_sock *tp = tcp_sk(sk);
> +
> + if (((1<<TCP_CA_CWR) | (1<<TCP_CA_Recovery)) &
> + (1 << inet_csk(sk)->icsk_ca_state)) {
> + /* Reduce cwnd if state mandates */
> + tcp_cwnd_reduction(sk, rs->acked_sacked, rs->losses, flag);
> +
> + if (!before(tp->snd_una, tp->high_seq)) {
> + /* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
> + if (tp->snd_ssthresh < TCP_INFINITE_SSTHRESH &&
> + inet_csk(sk)->icsk_ca_state == TCP_CA_CWR) {
> + tp->snd_cwnd = tp->snd_ssthresh;
> + tp->snd_cwnd_stamp = tcp_jiffies32;
> + }
> + // __cwnd_event(sk, CA_EVENT_COMPLETE_CWR);
> + }
> + } else if (tcp_may_raise_cwnd(sk, flag)) {
> + /* Advance cwnd if state allows */
> + cubictcp_cong_avoid(sk, ack, rs->acked_sacked);
> + tp->snd_cwnd_stamp = tcp_jiffies32;
> + }
> +
> + tcp_update_pacing_rate(sk);
It will be useful to highlight what you want to do differently from the kernel's
tcp_cong_control()+tcp_cong_avoid() here. or it is something that I missed from
the above example?
next prev parent reply other threads:[~2024-04-25 19:10 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-24 20:37 [PATCH 1/2] [PATCH net-next,1/2] Add new args for cong_control in tcp_congestion_ops Miao Xu
2024-04-24 20:37 ` [PATCH 2/2] [PATCH net-next,2/2] Add test for the use of new args in cong_control Miao Xu
2024-04-25 19:09 ` Martin KaFai Lau [this message]
2024-04-27 13:06 ` [PATCH 1/2] [PATCH net-next,1/2] Add new args for cong_control in tcp_congestion_ops Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6e134365-b793-49f9-82dd-d148bdd16c85@linux.dev \
--to=martin.lau@linux.dev \
--cc=bpf@vger.kernel.org \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=kafai@meta.com \
--cc=kuba@kernel.org \
--cc=miaxu@meta.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox