From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org
Cc: edumazet@google.com, brakmo@fb.com
Subject: [PATCH RFC net-next 0/3] Extensions to allow asynchronous TCP_INFO notifications based on congestion parameters
Date: Mon, 22 Oct 2018 08:23:57 -0700 [thread overview]
Message-ID: <cover.1540220847.git.sowmini.varadhan@oracle.com> (raw)
Problem statement:
We would like to monitor some subset of TCP sockets in user-space,
(the monitoring application would define 4-tuples it wants to monitor)
using TCP_INFO stats to analyze reported problems. The idea is to
use those stats to see where the bottlenecks are likely to be ("is it
application-limited?" or "is there evidence of BufferBloat in the
path?" etc)
Today we can do this by periodically polling for tcp_info, but this
could be made more efficient if the kernel would asynchronously
notify the application via tcp_info when some "interesting"
thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc)
are reached. And to make this effective, it is better if
we could apply the threshold check *before* constructing the
tcp_info netlink notification, so that we don't waste resources
constructing notifications that will be discarded by the filter.
One idea, implemented in this patchset, is to extend the tcp_call_bpf()
infra so that the BPF kernel module (the sock_ops filter/callback)
can examine the values in the sock_ops, apply any thresholds it wants,
and return some new status ("BPF_TCP_INFO_NOTIFY"). Use this status in
the tcp stack to queue up a tcp_info notification (similar to
sock_diag_broadcast_destroy() today..)
Patch 1 in this set refactors the existing sock_diag code so that
the functions can be reused for notifications from other states than CLOSE.
Patch 2 provides a minor extension to tcp_call_bpf() so that it
will queue a tcp_info_notification if the BPF callout returns
BPF_TCP_INFO_NOTIFY
Patch 3, provided strictly as a demonstration/PoC to aid in reviewing
this proposal, shows a simple sample/bpf example where we trigger the
tcp_info notification for an iperf connection if the number of
retransmits exceeds 16.
Sowmini Varadhan (3):
sock_diag: Refactor inet_sock_diag_destroy code
tcp: BPF_TCP_INFO_NOTIFY support
bpf: Added a sample for tcp_info_notify callback
include/linux/sock_diag.h | 18 +++++++---
include/net/tcp.h | 15 +++++++-
include/uapi/linux/bpf.h | 4 ++
include/uapi/linux/sock_diag.h | 2 +
net/core/sock.c | 4 +-
net/core/sock_diag.c | 11 +++---
samples/bpf/Makefile | 1 +
samples/bpf/tcp_notify_kern.c | 73 ++++++++++++++++++++++++++++++++++++++++
8 files changed, 114 insertions(+), 14 deletions(-)
create mode 100644 samples/bpf/tcp_notify_kern.c
next reply other threads:[~2018-10-22 23:48 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-22 15:23 Sowmini Varadhan [this message]
2018-10-22 15:23 ` [PATCH RFC net-next 1/3] sock_diag: Refactor inet_sock_diag_destroy code Sowmini Varadhan
2018-10-22 15:23 ` [PATCH RFC net-next 2/3] tcp: BPF_TCP_INFO_NOTIFY support Sowmini Varadhan
2018-10-22 15:24 ` [PATCH RFC net-next 3/3] bpf: Added a sample for tcp_info_notify callback Sowmini Varadhan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1540220847.git.sowmini.varadhan@oracle.com \
--to=sowmini.varadhan@oracle.com \
--cc=brakmo@fb.com \
--cc=edumazet@google.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).