From: Daniel Borkmann <dborkman@redhat.com>
To: davem@davemloft.net
Cc: hagen@jauu.net, lars@netapp.com, eric.dumazet@gmail.com,
fontana@sharpeleven.org, hannes@stressinduktion.org,
stephen@networkplumber.org, glenn.judd@morganstanley.com,
dborkman@redhat.com, fw@strlen.de, netdev@vger.kernel.org
Subject: [PATCH net-next v3 4/5] net: tcp: more detailed ACK events and events for CE marked packets
Date: Fri, 26 Sep 2014 22:37:35 +0200
Message-ID: <1411763856-14230-5-git-send-email-dborkman@redhat.com>
In-Reply-To: <1411763856-14230-1-git-send-email-dborkman@redhat.com>
From: Florian Westphal <fw@strlen.de>

DataCenter TCP (DCTCP) determines cwnd growth based on ECN information
and ACK properties, e.g. an ACK that updates the window is treated
differently than a DUPACK.

DCTCP also needs to know whether an ACK was a delayed ACK. Furthermore,
it implements a CE state machine that keeps track of CE markings of
incoming packets.

Therefore, extend the congestion control framework to provide these
event types, so that DCTCP can be properly implemented as a normal
congestion algorithm module outside of the core stack.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
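Note (illustration only, not part of the patch): a minimal userspace
sketch of how a congestion control module might consume the four new
events delivered through tcp_ca_event(). The demo_* names, the state
struct and the transition counter are assumptions made up for this
example; they are not the DCTCP module from 5/5 and the enum values do
not match the kernel's enum tcp_ca_event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the events added by this patch (hypothetical
 * names and values, chosen only so the sketch builds on its own). */
enum demo_ca_event {
	DEMO_EVENT_ECN_NO_CE,		/* ECT set, but not CE marked */
	DEMO_EVENT_ECN_IS_CE,		/* received CE marked IP packet */
	DEMO_EVENT_DELAYED_ACK,		/* delayed ACK is being scheduled */
	DEMO_EVENT_NON_DELAYED_ACK,	/* ACK is being sent immediately */
};

/* Hypothetical per-connection state a module could keep. */
struct demo_ca_state {
	bool ce_state;		/* last reported packet was CE marked */
	bool delayed_ack;	/* last ACK event was a delayed ACK */
	uint32_t ce_flips;	/* number of CE <-> non-CE transitions */
};

/* Event handler in the spirit of a cwnd_event-style callback. */
static void demo_cwnd_event(struct demo_ca_state *ca, enum demo_ca_event ev)
{
	assert(ca != NULL);

	switch (ev) {
	case DEMO_EVENT_ECN_IS_CE:
		if (!ca->ce_state)
			ca->ce_flips++;	/* non-CE -> CE transition */
		ca->ce_state = true;
		break;
	case DEMO_EVENT_ECN_NO_CE:
		if (ca->ce_state)
			ca->ce_flips++;	/* CE -> non-CE transition */
		ca->ce_state = false;
		break;
	case DEMO_EVENT_DELAYED_ACK:
		ca->delayed_ack = true;
		break;
	case DEMO_EVENT_NON_DELAYED_ACK:
		ca->delayed_ack = false;
		break;
	}
}
```

The sketch only records state. What a real module does on a CE
transition (e.g. forcing out an ACK) is left out on purpose.
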
include/net/tcp.h | 9 ++++++++-
net/ipv4/tcp_input.c | 22 ++++++++++++++++++----
net/ipv4/tcp_output.c | 4 ++++
3 files changed, 30 insertions(+), 5 deletions(-)
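
Note (illustration only, not part of the patch): the tcp_ack() hunk in
this patch composes the flags handed to tcp_in_ack_event() differently
on the fast and slow paths. The userspace sketch below mirrors that
logic. The flag values are the ones this patch defines; the demo_*
helpers, CA_ACK_ALL and the boolean parameters are hypothetical
stand-ins for the kernel's internal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by this patch in enum tcp_ca_ack_event_flags. */
enum {
	CA_ACK_SLOWPATH   = (1 << 0),	/* In slow path processing */
	CA_ACK_WIN_UPDATE = (1 << 1),	/* ACK updated window */
	CA_ACK_ECE        = (1 << 2),	/* ECE bit is set on ack */
};

/* Hypothetical mask of all known flags, used for sanity checking. */
#define CA_ACK_ALL (CA_ACK_SLOWPATH | CA_ACK_WIN_UPDATE | CA_ACK_ECE)

/* Build the flag word the way tcp_ack() does after this patch:
 * the fast path always reports a window update and nothing else,
 * the slow path starts from CA_ACK_SLOWPATH and ORs in the rest. */
static uint32_t demo_ack_event_flags(bool fast_path, bool ece, bool win_update)
{
	uint32_t ack_ev_flags;

	if (fast_path)
		return CA_ACK_WIN_UPDATE;

	ack_ev_flags = CA_ACK_SLOWPATH;
	if (ece)
		ack_ev_flags |= CA_ACK_ECE;
	if (win_update)
		ack_ev_flags |= CA_ACK_WIN_UPDATE;
	return ack_ev_flags;
}

/* Hypothetical consumer-side check: did this ACK echo congestion? */
static bool demo_ack_has_ece(uint32_t flags)
{
	assert((flags & ~(uint32_t)CA_ACK_ALL) == 0);	/* no unknown bits */
	return (flags & CA_ACK_ECE) != 0;
}
```

A module can thus tell a window-updating ACK from other ACKs by testing
CA_ACK_WIN_UPDATE, and an ECN echo by testing CA_ACK_ECE.
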
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e71884a..382714e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -744,10 +744,17 @@ enum tcp_ca_event {
CA_EVENT_CWND_RESTART, /* congestion window restart */
CA_EVENT_COMPLETE_CWR, /* end of congestion recovery */
CA_EVENT_LOSS, /* loss timeout */
+ CA_EVENT_ECN_NO_CE, /* ECT set, but not CE marked */
+ CA_EVENT_ECN_IS_CE, /* received CE marked IP packet */
+ CA_EVENT_DELAYED_ACK, /* Delayed ack is sent */
+ CA_EVENT_NON_DELAYED_ACK,
};
+/* Information about inbound ACK, passed to cong_ops->in_ack_event() */
enum tcp_ca_ack_event_flags {
- CA_ACK_SLOWPATH = (1 << 0),
+ CA_ACK_SLOWPATH = (1 << 0), /* In slow path processing */
+ CA_ACK_WIN_UPDATE = (1 << 1), /* ACK updated window */
+ CA_ACK_ECE = (1 << 2), /* ECE bit is set on ack */
};
/*
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e28f80a..45c779a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -233,14 +233,21 @@ static inline void TCP_ECN_check_ce(struct tcp_sock *tp, const struct sk_buff *s
tcp_enter_quickack_mode((struct sock *)tp);
break;
case INET_ECN_CE:
+ if (tcp_ca_needs_ecn((struct sock *)tp))
+ tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_IS_CE);
+
if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
/* Better not delay acks, sender can have a very low cwnd */
tcp_enter_quickack_mode((struct sock *)tp);
tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
}
- /* fallinto */
+ tp->ecn_flags |= TCP_ECN_SEEN;
+ break;
default:
+ if (tcp_ca_needs_ecn((struct sock *)tp))
+ tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_NO_CE);
tp->ecn_flags |= TCP_ECN_SEEN;
+ break;
}
}
@@ -3429,10 +3436,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tp->snd_una = ack;
flag |= FLAG_WIN_UPDATE;
- tcp_in_ack_event(sk, 0);
+ tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPACKS);
} else {
+ u32 ack_ev_flags = CA_ACK_SLOWPATH;
+
if (ack_seq != TCP_SKB_CB(skb)->end_seq)
flag |= FLAG_DATA;
else
@@ -3444,10 +3453,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
&sack_rtt_us);
- if (TCP_ECN_rcv_ecn_echo(tp, tcp_hdr(skb)))
+ if (TCP_ECN_rcv_ecn_echo(tp, tcp_hdr(skb))) {
flag |= FLAG_ECE;
+ ack_ev_flags |= CA_ACK_ECE;
+ }
+
+ if (flag & FLAG_WIN_UPDATE)
+ ack_ev_flags |= CA_ACK_WIN_UPDATE;
- tcp_in_ack_event(sk, CA_ACK_SLOWPATH);
+ tcp_in_ack_event(sk, ack_ev_flags);
}
/* We passed data and got it acked, remove any soft error
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a6ff21f..4ec4ea6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3122,6 +3122,8 @@ void tcp_send_delayed_ack(struct sock *sk)
int ato = icsk->icsk_ack.ato;
unsigned long timeout;
+ tcp_ca_event(sk, CA_EVENT_DELAYED_ACK);
+
if (ato > TCP_DELACK_MIN) {
const struct tcp_sock *tp = tcp_sk(sk);
int max_ato = HZ / 2;
@@ -3178,6 +3180,8 @@ void tcp_send_ack(struct sock *sk)
if (sk->sk_state == TCP_CLOSE)
return;
+ tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);
+
/* We are not putting this on the write queue, so
* tcp_transmit_skb() will set the ownership to this
* sock.
--
1.7.11.7
Thread overview:
2014-09-26 20:37 [PATCH net-next v3 0/5] net: tcp: DCTCP congestion control algorithm Daniel Borkmann
2014-09-26 20:37 ` [PATCH net-next v3 1/5] net: tcp: assign tcp cong_ops when tcp sk is created Daniel Borkmann
2014-09-26 20:37 ` [PATCH net-next v3 2/5] net: tcp: add flag for ca to indicate that ECN is required Daniel Borkmann
2014-09-26 20:37 ` [PATCH net-next v3 3/5] net: tcp: split ack slow/fast events from cwnd_event Daniel Borkmann
2014-09-26 20:37 ` Daniel Borkmann [this message]
2014-09-26 20:37 ` [PATCH net-next v3 5/5] net: tcp: add DCTCP congestion control algorithm Daniel Borkmann
2014-09-29 4:25 ` [PATCH net-next v3 0/5] net: tcp: " David Miller