From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: sowmini.varadhan@oracle.com, netdev@vger.kernel.org
Cc: edumazet@google.com, brakmo@fb.com
Subject: [PATCH RFC net-next 0/3] Extensions to allow asynchronous TCP_INFO notifications based on congestion parameters
Date: Mon, 22 Oct 2018 08:23:57 -0700
Message-ID: <cover.1540220847.git.sowmini.varadhan@oracle.com>

Problem statement:

We would like to monitor a subset of TCP sockets from user space
(the monitoring application would define the 4-tuples it wants to
monitor), using TCP_INFO stats to analyze reported problems. The idea
is to use those stats to see where the bottlenecks are likely to be
("is it application-limited?", "is there evidence of bufferbloat in
the path?", etc.).

Today we can do this by periodically polling for tcp_info, but it
would be more efficient if the kernel asynchronously notified the
application with a tcp_info message when some "interesting"
threshold (e.g., "RTT variance > X" or "total_retrans > Y") is
reached. To make this effective, the threshold check should be
applied *before* constructing the tcp_info netlink notification, so
that we don't waste resources constructing notifications that will be
discarded by the filter.

One idea, implemented in this patchset, is to extend the tcp_call_bpf()
infra so that the BPF program (the sock_ops filter/callback) can
examine the values in the sock_ops, apply any thresholds it wants,
and return a new status ("BPF_TCP_INFO_NOTIFY"). The TCP stack then
uses this status to queue up a tcp_info notification (similar to what
sock_diag_broadcast_destroy() does today).
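
The control flow above can be modelled in plain user-space C as a
rough sketch. Nothing here is kernel code or taken from the patches:
CB_TCP_INFO_NOTIFY is a stand-in for BPF_TCP_INFO_NOTIFY (whose real
value is defined by patch 2 and not reproduced here), and struct
conn_stats, run_hook(), example_cb() and the thresholds in it are
hypothetical. The point is only that the callback runs first and the
notification is built solely when the callback asks for it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in statuses; the numeric values are placeholders, not the
 * values used by the patchset. */
enum cb_status { CB_NONE = 0, CB_TCP_INFO_NOTIFY = 1 };

/* Hypothetical subset of the values a sock_ops callback could inspect. */
struct conn_stats {
	uint32_t total_retrans;
	uint32_t rttvar_us;
};

typedef enum cb_status (*threshold_cb)(const struct conn_stats *);

/* Counts notifications actually built, i.e. the costly step. */
struct notify_queue {
	size_t built;
};

static void build_and_queue(struct notify_queue *q,
			    const struct conn_stats *s)
{
	(void)s; /* a real implementation would serialize tcp_info here */
	q->built++;
}

/* Model of the hook: filter first, build only on request. */
void run_hook(struct notify_queue *q, threshold_cb cb,
	      const struct conn_stats *s)
{
	if (cb && cb(s) == CB_TCP_INFO_NOTIFY)
		build_and_queue(q, s);
}

/* Example callback with made-up thresholds X = 50000 us, Y = 8. */
enum cb_status example_cb(const struct conn_stats *s)
{
	if (s->total_retrans > 8 || s->rttvar_us > 50000)
		return CB_TCP_INFO_NOTIFY;
	return CB_NONE;
}
```

With this shape, a connection below the thresholds costs one callback
invocation and no notification is constructed for it.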

Patch 1 in this set refactors the existing sock_diag code so that
the functions can be reused for notifications from states other than
CLOSE.

Patch 2 provides a minor extension to tcp_call_bpf() so that it
will queue a tcp_info notification if the BPF callout returns
BPF_TCP_INFO_NOTIFY.

Patch 3, provided strictly as a demonstration/PoC to aid in reviewing
this proposal, adds a simple samples/bpf example that triggers the
tcp_info notification for an iperf connection if the number of
retransmits exceeds 16.
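
The decision made by the patch 3 sample might look roughly like the
predicate below, written as ordinary C so it can be read without the
BPF headers. This is an illustration under assumptions, not the
contents of tcp_notify_kern.c: struct conn_view and should_notify()
are hypothetical, and matching on port 5001 (iperf's usual default)
is a guess at how an "iperf connection" could be identified. Only the
"more than 16 retransmits" threshold comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the cover letter: notify once retransmits exceed 16. */
#define RETRANS_THRESHOLD 16u
/* Assumption: identify iperf by its usual default port. */
#define IPERF_PORT 5001u

/* Hypothetical view of one connection; ports in host byte order. */
struct conn_view {
	uint16_t local_port;
	uint16_t remote_port;
	uint32_t total_retrans;
};

/* True when the connection looks like iperf and has crossed the
 * retransmit threshold (strictly greater than 16). */
bool should_notify(const struct conn_view *c)
{
	bool is_iperf = c->local_port == IPERF_PORT ||
			c->remote_port == IPERF_PORT;

	return is_iperf && c->total_retrans > RETRANS_THRESHOLD;
}
```

Note the strict comparison: a connection with exactly 16 retransmits
does not trigger a notification, while one with 17 does.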

Sowmini Varadhan (3):
  sock_diag: Refactor inet_sock_diag_destroy code
  tcp: BPF_TCP_INFO_NOTIFY support
  bpf: Added a sample for tcp_info_notify callback

 include/linux/sock_diag.h      | 18 +++++++---
 include/net/tcp.h              | 15 +++++++-
 include/uapi/linux/bpf.h       |  4 ++
 include/uapi/linux/sock_diag.h |  2 +
 net/core/sock.c                |  4 +-
 net/core/sock_diag.c           | 11 +++---
 samples/bpf/Makefile           |  1 +
 samples/bpf/tcp_notify_kern.c  | 73 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 114 insertions(+), 14 deletions(-)
 create mode 100644 samples/bpf/tcp_notify_kern.c