netdev.vger.kernel.org archive mirror
* PATCH net-next v3 00/15
@ 2017-06-20  3:00 Lawrence Brakmo
  2017-06-20  3:00 ` [PATCH net-next v3 01/15] bpf: BPF support for sock_ops Lawrence Brakmo
                   ` (14 more replies)
  0 siblings, 15 replies; 28+ messages in thread
From: Lawrence Brakmo @ 2017-06-20  3:00 UTC
  To: netdev
  Cc: Kernel Team, Blake Matheny, Alexei Starovoitov, Daniel Borkmann,
	David Ahern

This patch set creates a new BPF program type, BPF_PROG_TYPE_SOCK_OPS,
and a corresponding struct that allows BPF programs of this type to
access some of the socket's fields (such as IP addresses, ports, etc.)
and to set connection parameters such as buffer sizes, initial window,
SYN/SYN-ACK RTOs, etc.

Unlike current BPF program types that expect to be called at a particular
place in the network stack code, SOCK_OPS programs can be called at
different places and use an "op" field to indicate the context. There
are currently two types of operations: those whose effect is through
their return value and those whose effect is through the new
bpf_setsockopt BPF helper function.

Example operations of the first type are:
  BPF_SOCK_OPS_TIMEOUT_INIT
  BPF_SOCK_OPS_RWND_INIT
  BPF_SOCK_OPS_NEEDS_ECN

Example operations of the second type are:
  BPF_SOCK_OPS_TCP_CONNECT_CB
  BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB
  BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
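
To make the two kinds of operations concrete, here is a minimal user-space
sketch of a handler that dispatches on the "op" field. It is not the kernel
API: every name prefixed with mock_/MOCK_ is hypothetical, the enum values
are placeholders, and the port number, the returned values and the use of
-1 for "keep the default" are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the op codes listed above; the numeric
 * values are placeholders, not the ones in include/uapi/linux/bpf.h. */
enum mock_op {
	MOCK_OP_TIMEOUT_INIT,
	MOCK_OP_RWND_INIT,
	MOCK_OP_NEEDS_ECN,
	MOCK_OP_TCP_CONNECT_CB,
	MOCK_OP_ACTIVE_ESTABLISHED_CB,
	MOCK_OP_PASSIVE_ESTABLISHED_CB,
};

/* Hypothetical context: the op plus a few socket fields. */
struct mock_sock_ops {
	enum mock_op op;
	unsigned short local_port;
	unsigned short remote_port;
	int sndbuf;	/* written by mock_setsockopt() */
};

/* Stand-in for a setsockopt-style helper: records the requested size. */
int mock_setsockopt(struct mock_sock_ops *ctx, int sndbuf)
{
	ctx->sndbuf = sndbuf;
	return 0;
}

/* Returns -1 for "no opinion, keep the default" (an assumption). */
int mock_sock_ops_prog(struct mock_sock_ops *ctx)
{
	assert(ctx != NULL);

	/* Only act on flows to or from port 55601 (an arbitrary choice). */
	if (ctx->local_port != 55601 && ctx->remote_port != 55601)
		return -1;

	switch (ctx->op) {
	/* First type: the return value is the parameter. */
	case MOCK_OP_TIMEOUT_INIT:
		return 10;	/* initial SYN RTO, units assumed */
	case MOCK_OP_RWND_INIT:
		return 40;	/* initial receive window, value arbitrary */
	case MOCK_OP_NEEDS_ECN:
		return 1;
	/* Second type: the effect comes from calling the helper. */
	case MOCK_OP_TCP_CONNECT_CB:
	case MOCK_OP_ACTIVE_ESTABLISHED_CB:
	case MOCK_OP_PASSIVE_ESTABLISHED_CB:
		return mock_setsockopt(ctx, 1500000);
	}
	return -1;
}
```

A caller would fill in a struct mock_sock_ops with the op and the ports and
then inspect either the return value (first type) or the sndbuf field
(second type).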

Current operations are only called during connection establishment, so
there should not be any BPF overhead after connection establishment. The
main idea is to use connection information from both hosts, such as IP
addresses and ports, to allow setting of per-connection parameters to
optimize the connection's performance.

Although there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (e.g. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.
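
As an illustration of the "10% of the flows" idea, the sketch below hashes
a flow's addresses and ports and places the flow in an experiment bucket.
This is plain user-space C and only one possible approach: struct flow_key,
mix32(), in_experiment() and pick_init_cwnd() are hypothetical names, the
mixer is the 32-bit finalizer from MurmurHash3 used purely for
illustration, and the cwnd values 40 and 10 are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; a real program would read these fields
 * from its context struct. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* 32-bit finalizer from MurmurHash3, used here only as a mixer. */
uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x85ebca6bu;
	x ^= x >> 13;
	x *= 0xc2b2ae35u;
	x ^= x >> 16;
	return x;
}

/* Returns 1 if the flow falls in the experimental bucket covering
 * roughly `percent` percent of flows, 0 otherwise. */
int in_experiment(const struct flow_key *k, unsigned int percent)
{
	uint32_t h;

	assert(percent <= 100);
	h = mix32(k->saddr);
	h = mix32(h ^ k->daddr);
	h = mix32(h ^ (((uint32_t)k->sport << 16) | k->dport));
	return (h % 100) < percent;
}

/* Pick an initial cwnd: 40 packets for the experiment group,
 * 10 for everyone else (both values arbitrary). */
int pick_init_cwnd(const struct flow_key *k)
{
	return in_experiment(k, 10) ? 40 : 10;
}
```

Because the decision is a pure function of the flow key, the same flow
always lands in the same group, which keeps the two populations stable
while results are compared.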

Currently there is functionality to load one global BPF program of this
type but I plan to add support for loading per cgroup socket ops BPF
programs in the near future. When that is done, the global program could
be called when a cgroup has no program associated with it.
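
The planned fallback could look roughly like the following user-space
sketch. Nothing here is from the patches: struct mock_cgroup, global_prog,
run_sock_ops() and the two example programs are hypothetical, and
returning -1 when no program exists is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical program signature: takes an op, returns a value. */
typedef int (*sock_ops_fn)(int op);

struct mock_cgroup {
	sock_ops_fn prog;	/* NULL when no program is attached */
};

sock_ops_fn global_prog;	/* NULL when no global program is loaded */

/* Run the cgroup's program if it has one, else the global program.
 * Returns -1 (assumed to mean "use defaults") when neither exists. */
int run_sock_ops(const struct mock_cgroup *cg, int op)
{
	assert(cg != NULL);
	if (cg->prog)
		return cg->prog(op);
	if (global_prog)
		return global_prog(op);
	return -1;
}

/* Two example programs that return different arbitrary values. */
int global_example(int op)
{
	(void)op;
	return 20;
}

int cgroup_example(int op)
{
	(void)op;
	return 40;
}
```

With global_prog set to global_example, a cgroup without a program gets 20
and a cgroup with cgroup_example attached gets 40.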

One question is whether I should add this functionality into David Ahern's
BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf type. Whereas the
current cgroup_sock type expects to be called only once during a connection's
lifetime, the new sock_ops type could be called multiple times. My preference
is to define a new bpf attach type, BPF_CGROUP_SOCK_OPS, to attach
BPF_PROG_TYPE_SOCK_OPS to cgroups.

This patch set also includes sample BPF programs to demonstrate the different
features.

v2: Formatting changes, rebased to latest net-next

v3: Fixed build issues, changed socket_ops to sock_ops throughout,
    fixed formatting issues, removed the syscall to load sock_ops
    program and added functionality to use existing bpf attach and
    bpf detach system calls, removed reader/writer locks in
    sock_bpfops.c (used when saving sock_ops global program)

Consists of the following patches:


 include/linux/bpf.h           |   6 ++
 include/linux/bpf_types.h     |   1 +
 include/linux/filter.h        |  10 ++
 include/net/tcp.h             |  60 ++++++++++-
 include/uapi/linux/bpf.h      |  66 +++++++++++-
 kernel/bpf/syscall.c          |  62 +++++++++---
 net/core/Makefile             |   3 +-
 net/core/filter.c             | 271 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/core/sock_bpfops.c        |  65 ++++++++++++
 net/ipv4/tcp.c                |   2 +-
 net/ipv4/tcp_cong.c           |  32 ++++--
 net/ipv4/tcp_fastopen.c       |   1 +
 net/ipv4/tcp_input.c          |  10 +-
 net/ipv4/tcp_minisocks.c      |   9 +-
 net/ipv4/tcp_output.c         |  18 +++-
 samples/bpf/Makefile          |   9 ++
 samples/bpf/bpf_helpers.h     |   3 +
 samples/bpf/bpf_load.c        |  13 ++-
 samples/bpf/tcp_bpf.c         |  86 ++++++++++++++++
 samples/bpf/tcp_bufs_kern.c   |  76 ++++++++++++++
 samples/bpf/tcp_clamp_kern.c  |  93 +++++++++++++++++
 samples/bpf/tcp_cong_kern.c   |  73 ++++++++++++++
 samples/bpf/tcp_iw_kern.c     |  78 +++++++++++++++
 samples/bpf/tcp_rwnd_kern.c   |  60 +++++++++++
 samples/bpf/tcp_synrto_kern.c |  59 +++++++++++
 25 files changed, 1126 insertions(+), 40 deletions(-)


Thread overview: 28+ messages
2017-06-20  3:00 PATCH net-next v3 00/15 Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 01/15] bpf: BPF support for sock_ops Lawrence Brakmo
2017-06-22 22:41   ` Daniel Borkmann
2017-06-22 22:58     ` Lawrence Brakmo
2017-06-22 23:19       ` Daniel Borkmann
2017-06-22 23:57         ` Lawrence Brakmo
2017-06-23 21:15           ` Daniel Borkmann
2017-06-28 17:45             ` Lawrence Brakmo
2017-06-29  9:47               ` Daniel Borkmann
2017-06-20  3:00 ` [PATCH net-next v3 02/15] bpf: program to load sock_ops BPF programs Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 03/15] bpf: Support for per connection SYN/SYN-ACK RTOs Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 04/15] bpf: Sample bpf program to set " Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 05/15] bpf: Support for setting initial receive window Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 06/15] bpf: Sample bpf program to set initial window Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 07/15] bpf: Add setsockopt helper function to bpf Lawrence Brakmo
2017-06-20 21:25   ` Craig Gallek
2017-06-21 16:51     ` Lawrence Brakmo
2017-06-21 17:13       ` Craig Gallek
2017-06-21 23:55         ` Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 08/15] bpf: Add TCP connection BPF callbacks Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 09/15] bpf: Sample BPF program to set buffer sizes Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 10/15] bpf: Add support for changing congestion control Lawrence Brakmo
2017-06-20  8:40   ` kbuild test robot
2017-06-20  3:00 ` [PATCH net-next v3 11/15] bpf: Sample BPF program to set " Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 12/15] bpf: Adds support for setting initial cwnd Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 13/15] bpf: Sample BPF program to set " Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 14/15] bpf: Adds support for setting sndcwnd clamp Lawrence Brakmo
2017-06-20  3:00 ` [PATCH net-next v3 15/15] bpf: Sample bpf program to set " Lawrence Brakmo
