* [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc
@ 2024-05-17 17:40 Matthieu Baerts (NGI0)
2024-05-17 17:40 ` [PATCH mptcp-next 1/3] doc: mptcp: add missing 'available_schedulers' entry Matthieu Baerts (NGI0)
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-05-17 17:40 UTC (permalink / raw)
To: mptcp; +Cc: Matthieu Baerts (NGI0)
The most interesting bit is in the last patch: a new 'mptcp' page in
'networking' documentation.
The first patch is a fix (missing sysctl entry), and the second one
reorders the sysctl entries.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (3):
doc: mptcp: add missing 'available_schedulers' entry
doc: mptcp: alphabetical order
doc: new 'mptcp' page in 'networking'
Documentation/networking/index.rst | 1 +
Documentation/networking/mptcp-sysctl.rst | 74 +++++++-------
Documentation/networking/mptcp.rst | 155 ++++++++++++++++++++++++++++++
MAINTAINERS | 2 +-
4 files changed, 196 insertions(+), 36 deletions(-)
---
base-commit: 2ccd59d4df6aef3f553caf11e00af66300448e63
change-id: 20240517-mptcp-doc-315f909a12c4
Best regards,
--
Matthieu Baerts (NGI0) <matttbe@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH mptcp-next 1/3] doc: mptcp: add missing 'available_schedulers' entry
2024-05-17 17:40 [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc Matthieu Baerts (NGI0)
@ 2024-05-17 17:40 ` Matthieu Baerts (NGI0)
2024-05-17 20:34 ` Mat Martineau
2024-05-17 17:40 ` [PATCH mptcp-next 2/3] doc: mptcp: alphabetical order Matthieu Baerts (NGI0)
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-05-17 17:40 UTC (permalink / raw)
To: mptcp; +Cc: Matthieu Baerts (NGI0)

This sysctl knob has been added recently, but the documentation has not
been updated.

This knob is used to show the available schedulers choices that are
registered, similar to 'net.ipv4.tcp_available_congestion_control'.

Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 Documentation/networking/mptcp-sysctl.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 69975ce25a02..102a45e7bfa8 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -93,3 +93,7 @@ scheduler - STRING
 	sysctl.
 
 	Default: "default"
+
+available_schedulers - STRING
+	Shows the available schedulers choices that are registered. More packet
+	schedulers may be available, but not loaded.

-- 
2.43.0

^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH mptcp-next 1/3] doc: mptcp: add missing 'available_schedulers' entry
2024-05-17 17:40 ` [PATCH mptcp-next 1/3] doc: mptcp: add missing 'available_schedulers' entry Matthieu Baerts (NGI0)
@ 2024-05-17 20:34 ` Mat Martineau
0 siblings, 0 replies; 10+ messages in thread
From: Mat Martineau @ 2024-05-17 20:34 UTC (permalink / raw)
To: Matthieu Baerts (NGI0); +Cc: mptcp

On Fri, 17 May 2024, Matthieu Baerts (NGI0) wrote:

> This sysctl knob has been added recently, but the documentation has not
> been updated.
>
> This knob is used to show the available schedulers choices that are
> registered, similar to 'net.ipv4.tcp_available_congestion_control'.
>
> Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers")
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  Documentation/networking/mptcp-sysctl.rst | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
> index 69975ce25a02..102a45e7bfa8 100644
> --- a/Documentation/networking/mptcp-sysctl.rst
> +++ b/Documentation/networking/mptcp-sysctl.rst
> @@ -93,3 +93,7 @@ scheduler - STRING
>  	sysctl.
>
>  	Default: "default"
> +
> +available_schedulers - STRING
> +	Shows the available schedulers choices that are registered. More packet
> +	schedulers may be available, but not loaded.
>
> --

Reviewed-by: Mat Martineau <martineau@kernel.org>

^ permalink raw reply [flat|nested] 10+ messages in thread
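Editorial note on the knob documented in patch 1/3: the sketch below shows one way a program could read the list of registered schedulers, in the same spirit as reading 'net.ipv4.tcp_available_congestion_control'. It assumes the sysctl is exposed as /proc/sys/net/mptcp/available_schedulers (following the "/proc/sys/net/mptcp/*" heading of mptcp-sysctl.rst) and that the names are separated by spaces or tabs; the helper name read_sysctl_words() is hypothetical and not part of any kernel or libc API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the first line of a sysctl file into 'buf'
 * and count the whitespace-separated names it contains.
 * Returns the number of names, or -1 if the file cannot be read.
 * Assumption: names are separated by spaces or tabs.
 */
int read_sysctl_words(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	int count = 0;
	int in_word = 0;

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Drop the trailing newline, if any. */
	buf[strcspn(buf, "\n")] = '\0';

	for (const char *p = buf; *p; p++) {
		if (*p == ' ' || *p == '\t') {
			in_word = 0;
		} else if (!in_word) {
			in_word = 1;
			count++;
		}
	}
	return count;
}
```

A caller would pass "/proc/sys/net/mptcp/available_schedulers" as 'path' and a small buffer, then print 'buf' on success. As the patch notes, only registered schedulers are listed, so a scheduler that is available but not loaded will not appear.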
* [PATCH mptcp-next 2/3] doc: mptcp: alphabetical order
2024-05-17 17:40 [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc Matthieu Baerts (NGI0)
2024-05-17 17:40 ` [PATCH mptcp-next 1/3] doc: mptcp: add missing 'available_schedulers' entry Matthieu Baerts (NGI0)
@ 2024-05-17 17:40 ` Matthieu Baerts (NGI0)
2024-05-17 20:37 ` Mat Martineau
2024-05-17 17:40 ` [PATCH mptcp-next 3/3] doc: new 'mptcp' page in 'networking' Matthieu Baerts (NGI0)
2024-05-17 18:28 ` [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc MPTCP CI
3 siblings, 1 reply; 10+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-05-17 17:40 UTC (permalink / raw)
To: mptcp; +Cc: Matthieu Baerts (NGI0)

Similar to what is done in other 'sysctl' pages.

Also, by not putting new entries at the end, this can help to reduce
conflicts in case of backports.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 Documentation/networking/mptcp-sysctl.rst | 78 +++++++++++++++----------------
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 102a45e7bfa8..fd514bba8c43 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -7,14 +7,6 @@ MPTCP Sysfs variables
 /proc/sys/net/mptcp/* Variables
 ===============================
 
-enabled - BOOLEAN
-	Control whether MPTCP sockets can be created.
-
-	MPTCP sockets can be created if the value is 1. This is a
-	per-namespace sysctl.
-
-	Default: 1 (enabled)
-
 add_addr_timeout - INTEGER (seconds)
 	Set the timeout after which an ADD_ADDR control message will be
 	resent to an MPTCP peer that has not acknowledged a previous
@@ -25,25 +17,6 @@ add_addr_timeout - INTEGER (seconds)
 
 	Default: 120
 
-close_timeout - INTEGER (seconds)
-	Set the make-after-break timeout: in absence of any close or
-	shutdown syscall, MPTCP sockets will maintain the status
-	unchanged for such time, after the last subflow removal, before
-	moving to TCP_CLOSE.
-
-	The default value matches TCP_TIMEWAIT_LEN. This is a per-namespace
-	sysctl.
-
-	Default: 60
-
-checksum_enabled - BOOLEAN
-	Control whether DSS checksum can be enabled.
-
-	DSS checksum can be enabled if the value is nonzero. This is a
-	per-namespace sysctl.
-
-	Default: 0
-
 allow_join_initial_addr_port - BOOLEAN
 	Allow peers to send join requests to the IP address and port number used
 	by the initial subflow if the value is 1. This controls a flag that is
@@ -57,6 +30,37 @@ allow_join_initial_addr_port - BOOLEAN
 
 	Default: 1
 
+available_schedulers - STRING
+	Shows the available schedulers choices that are registered. More packet
+	schedulers may be available, but not loaded.
+
+checksum_enabled - BOOLEAN
+	Control whether DSS checksum can be enabled.
+
+	DSS checksum can be enabled if the value is nonzero. This is a
+	per-namespace sysctl.
+
+	Default: 0
+
+close_timeout - INTEGER (seconds)
+	Set the make-after-break timeout: in absence of any close or
+	shutdown syscall, MPTCP sockets will maintain the status
+	unchanged for such time, after the last subflow removal, before
+	moving to TCP_CLOSE.
+
+	The default value matches TCP_TIMEWAIT_LEN. This is a per-namespace
+	sysctl.
+
+	Default: 60
+
+enabled - BOOLEAN
+	Control whether MPTCP sockets can be created.
+
+	MPTCP sockets can be created if the value is 1. This is a
+	per-namespace sysctl.
+
+	Default: 1 (enabled)
+
 pm_type - INTEGER
 	Set the default path manager type to use for each new MPTCP
 	socket. In-kernel path management will control subflow
@@ -74,6 +78,14 @@ pm_type - INTEGER
 
 	Default: 0
 
+scheduler - STRING
+	Select the scheduler of your choice.
+
+	Support for selection of different schedulers. This is a per-namespace
+	sysctl.
+
+	Default: "default"
+
 stale_loss_cnt - INTEGER
 	The number of MPTCP-level retransmission intervals with no traffic and
 	pending outstanding data on a given subflow required to declare it stale.
@@ -85,15 +97,3 @@ stale_loss_cnt - INTEGER
 	This is a per-namespace sysctl.
 
 	Default: 4
-
-scheduler - STRING
-	Select the scheduler of your choice.
-
-	Support for selection of different schedulers. This is a per-namespace
-	sysctl.
-
-	Default: "default"
-
-available_schedulers - STRING
-	Shows the available schedulers choices that are registered. More packet
-	schedulers may be available, but not loaded.

-- 
2.43.0

^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH mptcp-next 2/3] doc: mptcp: alphabetical order
2024-05-17 17:40 ` [PATCH mptcp-next 2/3] doc: mptcp: alphabetical order Matthieu Baerts (NGI0)
@ 2024-05-17 20:37 ` Mat Martineau
2024-05-18 15:46 ` Matthieu Baerts
0 siblings, 1 reply; 10+ messages in thread
From: Mat Martineau @ 2024-05-17 20:37 UTC (permalink / raw)
To: Matthieu Baerts (NGI0); +Cc: mptcp

On Fri, 17 May 2024, Matthieu Baerts (NGI0) wrote:

> Similar to what is done in other 'sysctl' pages.
>
> Also, by not putting new entries at the end, this can help to reduce
> conflicts in case of backports.
>

Putting these in order makes sense to me from a readability perspective.
This does replace one backporting problem with another, but we don't
change the information here too often.

Reviewed-by: Mat Martineau <martineau@kernel.org>

> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  Documentation/networking/mptcp-sysctl.rst | 78 +++++++++++++++----------------
>  1 file changed, 39 insertions(+), 39 deletions(-)
>
> diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
> index 102a45e7bfa8..fd514bba8c43 100644
> --- a/Documentation/networking/mptcp-sysctl.rst
> +++ b/Documentation/networking/mptcp-sysctl.rst
> @@ -7,14 +7,6 @@ MPTCP Sysfs variables
>  /proc/sys/net/mptcp/* Variables
>  ===============================
>
> -enabled - BOOLEAN
> -	Control whether MPTCP sockets can be created.
> -
> -	MPTCP sockets can be created if the value is 1. This is a
> -	per-namespace sysctl.
> -
> -	Default: 1 (enabled)
> -
>  add_addr_timeout - INTEGER (seconds)
>  	Set the timeout after which an ADD_ADDR control message will be
>  	resent to an MPTCP peer that has not acknowledged a previous
> @@ -25,25 +17,6 @@ add_addr_timeout - INTEGER (seconds)
>
>  	Default: 120
>
> -close_timeout - INTEGER (seconds)
> -	Set the make-after-break timeout: in absence of any close or
> -	shutdown syscall, MPTCP sockets will maintain the status
> -	unchanged for such time, after the last subflow removal, before
> -	moving to TCP_CLOSE.
> -
> -	The default value matches TCP_TIMEWAIT_LEN. This is a per-namespace
> -	sysctl.
> -
> -	Default: 60
> -
> -checksum_enabled - BOOLEAN
> -	Control whether DSS checksum can be enabled.
> -
> -	DSS checksum can be enabled if the value is nonzero. This is a
> -	per-namespace sysctl.
> -
> -	Default: 0
> -
>  allow_join_initial_addr_port - BOOLEAN
>  	Allow peers to send join requests to the IP address and port number used
>  	by the initial subflow if the value is 1. This controls a flag that is
> @@ -57,6 +30,37 @@ allow_join_initial_addr_port - BOOLEAN
>
>  	Default: 1
>
> +available_schedulers - STRING
> +	Shows the available schedulers choices that are registered. More packet
> +	schedulers may be available, but not loaded.
> +
> +checksum_enabled - BOOLEAN
> +	Control whether DSS checksum can be enabled.
> +
> +	DSS checksum can be enabled if the value is nonzero. This is a
> +	per-namespace sysctl.
> +
> +	Default: 0
> +
> +close_timeout - INTEGER (seconds)
> +	Set the make-after-break timeout: in absence of any close or
> +	shutdown syscall, MPTCP sockets will maintain the status
> +	unchanged for such time, after the last subflow removal, before
> +	moving to TCP_CLOSE.
> +
> +	The default value matches TCP_TIMEWAIT_LEN. This is a per-namespace
> +	sysctl.
> +
> +	Default: 60
> +
> +enabled - BOOLEAN
> +	Control whether MPTCP sockets can be created.
> +
> +	MPTCP sockets can be created if the value is 1. This is a
> +	per-namespace sysctl.
> +
> +	Default: 1 (enabled)
> +
>  pm_type - INTEGER
>  	Set the default path manager type to use for each new MPTCP
>  	socket. In-kernel path management will control subflow
> @@ -74,6 +78,14 @@ pm_type - INTEGER
>
>  	Default: 0
>
> +scheduler - STRING
> +	Select the scheduler of your choice.
> +
> +	Support for selection of different schedulers. This is a per-namespace
> +	sysctl.
> +
> +	Default: "default"
> +
>  stale_loss_cnt - INTEGER
>  	The number of MPTCP-level retransmission intervals with no traffic and
>  	pending outstanding data on a given subflow required to declare it stale.
> @@ -85,15 +97,3 @@ stale_loss_cnt - INTEGER
>  	This is a per-namespace sysctl.
>
>  	Default: 4
> -
> -scheduler - STRING
> -	Select the scheduler of your choice.
> -
> -	Support for selection of different schedulers. This is a per-namespace
> -	sysctl.
> -
> -	Default: "default"
> -
> -available_schedulers - STRING
> -	Shows the available schedulers choices that are registered. More packet
> -	schedulers may be available, but not loaded.
>
> --
> 2.43.0
>
>
>

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH mptcp-next 2/3] doc: mptcp: alphabetical order
2024-05-17 20:37 ` Mat Martineau
@ 2024-05-18 15:46 ` Matthieu Baerts
0 siblings, 0 replies; 10+ messages in thread
From: Matthieu Baerts @ 2024-05-18 15:46 UTC (permalink / raw)
To: Mat Martineau; +Cc: mptcp

On 17/05/2024 22:37, Mat Martineau wrote:
> On Fri, 17 May 2024, Matthieu Baerts (NGI0) wrote:
>
>> Similar to what is done in other 'sysctl' pages.
>>
>> Also, by not putting new entries at the end, this can help to reduce
>> conflicts in case of backports.
>>
>
> Putting these in order makes sense to me from a readability perspective.
> This does replace one backporting problem with another, but we don't
> change the information here too often.

Thank you, good point, I will update the commit message in the v2 to
make this clearer :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH mptcp-next 3/3] doc: new 'mptcp' page in 'networking'
2024-05-17 17:40 [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc Matthieu Baerts (NGI0)
2024-05-17 17:40 ` [PATCH mptcp-next 1/3] doc: mptcp: add missing 'available_schedulers' entry Matthieu Baerts (NGI0)
2024-05-17 17:40 ` [PATCH mptcp-next 2/3] doc: mptcp: alphabetical order Matthieu Baerts (NGI0)
@ 2024-05-17 17:40 ` Matthieu Baerts (NGI0)
2024-05-17 20:43 ` Mat Martineau
2024-05-17 18:28 ` [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc MPTCP CI
3 siblings, 1 reply; 10+ messages in thread
From: Matthieu Baerts (NGI0) @ 2024-05-17 17:40 UTC (permalink / raw)
To: mptcp; +Cc: Matthieu Baerts (NGI0)

A global documentation about MPTCP was missing since its introduction in
v5.6.

Most of what is there comes from our recently updated mptcp.dev website,
with additional links to resources from the kernel documentation.

This is a first version, mainly targeting app developers and users.

Link: https://www.mptcp.dev
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 Documentation/networking/index.rst |   1 +
 Documentation/networking/mptcp.rst | 155 +++++++++++++++++++++++++++++++++++++
 MAINTAINERS                        |   2 +-
 3 files changed, 157 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 7664c0bfe461..a6443851a142 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -72,6 +72,7 @@ Contents:
    mac80211-injection
    mctp
    mpls-sysctl
+   mptcp
    mptcp-sysctl
    multiqueue
    multi-pf-netdev
diff --git a/Documentation/networking/mptcp.rst b/Documentation/networking/mptcp.rst
new file mode 100644
index 000000000000..cb04a553d010
--- /dev/null
+++ b/Documentation/networking/mptcp.rst
@@ -0,0 +1,155 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Multipath TCP (MPTCP)
+=====================
+
+Introduction
+============
+
+Multipath TCP or MPTCP is an extension to the standard TCP and is described in
+`RFC 8684 (MPTCPv1) <https://www.rfc-editor.org/rfc/rfc8684.html>`_. It allows a
+device to make use of multiple interfaces at once to send and receive TCP
+packets over a single MPTCP connection. MPTCP can aggregate the bandwidth of
+multiple interfaces or prefer the one with the lowest latency, it also allows a
+fail-over if one path is down, and the traffic is seamlessly reinjected on other
+paths.
+
+For more details about Multipath TCP in the Linux kernel, please see the
+official website: `mptcp.dev <https://www.mptcp.dev>`.
+
+
+Use cases
+=========
+
+Thanks to MPTCP, being able to use multiple paths in parallel or simultaneously
+brings new use-cases, compared to TCP:
+
+- Seamless handovers: switching from one path to another while preserving
+  established connections, e.g. to be used in mobility use-cases, like on
+  smartphones.
+- Best network selection: using the "best" available path depending on some
+  conditions, e.g. latency, losses, cost, bandwidth, etc.
+- Network aggregation: using multiple paths at the same time to have a higher
+  throughput, e.g. to combine fixed and mobile networks to send files faster.
+
+
+Concepts
+========
+
+Technically, when a new socket is created with the ``IPPROTO_MPTCP`` protocol
+(Linux-specific), a *subflow* (or *path*) is created. This *subflow* consists of
+a regular TCP connection that is used to transmit data through one interface.
+Additional *subflows* can be negotiated later between the hosts. For the remote
+host to be able to detect the use of MPTCP, a new field is added to the TCP
+*option* field of the underlying TCP *subflow*. This field contains, amongst
+other things, a ``MP_CAPABLE`` option that tells the other host to use MPTCP if
+it is supported. If the remote host or any middlebox in between does not support
+it, the returned ``SYN+ACK`` packet will not contain MPTCP options in the TCP
+*option* field. In that case, the connection will be "downgraded" to plain TCP,
+and it will continue with a single path.
+
+This behavior is made possible by two internal components: the path manager, and
+the packet scheduler.
+
+Path Manager
+------------
+
+The Path Manager is in charge of *subflows*, from creation to deletion, and also
+address announcements. Typically, it is the client side that initiates subflows,
+and the server side that announces additional addresses via the ``ADD_ADDR`` and
+``REMOVE_ADDR`` options.
+
+Path managers are controlled by the ``net.mptcp.pm_type`` sysctl knob -- see
+mptcp-sysctl.rst. There are two types: the in-kernel one (type ``0``) where the
+same rules are applied for all the connections (see: ``ip mptcp``) ; and the
+userspace one (type ``1``), controlled by a userspace daemon (i.e. `mptcpd
+<https://mptcpd.mptcp.dev/>`_) where different rules can be applied for each
+connection. The path managers can be controlled via a Netlink API, see
+netlink_spec/mptcp_pm.rst.
+
+To be able to use multiple IP addresses on a host to create multiple *subflows*
+(paths), the default in-kernel MPTCP path-manager needs to know which IP
+addresses can be used. This can be configured with ``ip mptcp endpoint`` for
+example.
+
+Packet Scheduler
+----------------
+
+The Packet Scheduler is in charge of selecting which available *subflow(s)* to
+use to send the next data packet. It can decide to maximize the use of the
+available bandwidth, only to pick the path with the lower latency, or any other
+policy depending on the configuration.
+
+Packet schedulers are controlled by the ``net.mptcp.scheduler`` sysctl knob --
+see mptcp-sysctl.rst.
+
+
+Sockets API
+===========
+
+Creating MPTCP sockets
+----------------------
+
+On Linux, MPTCP can be used by selecting MPTCP instead of TCP when creating the
+``socket``:
+
+.. code-block:: C
+
+    int sd = socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
+
+Note that ``IPPROTO_MPTCP`` is defined as ``262``.
+
+If MPTCP is not supported, ``errno`` will be set to:
+
+- ``EINVAL``: (*Invalid argument*): MPTCP is not available, on kernels < 5.6.
+- ``EPROTONOSUPPORT`` (*Protocol not supported*): MPTCP has not been compiled,
+  on kernels >= v5.6.
+- ``ENOPROTOOPT`` (*Protocol not available*): MPTCP has been disabled using
+  ``net.mptcp.enabled`` sysctl knob, see mptcp-sysctl.rst.
+
+MPTCP is then opt-in: applications need to explicitly request it. Note that
+applications can be forced to use MPTCP with different techniques, e.g.
+``LD_PRELOAD`` (see ``mptcpize``), eBPF (see ``mptcpify``), SystemTAP,
+``GODEBUG`` (``GODEBUG=multipathtcp=1``), etc.
+
+Switching to ``IPPROTO_MPTCP`` instead of ``IPPROTO_TCP`` should be as
+transparent as possible for the userspace applications.
+
+Socket options
+--------------
+
+MPTCP supports most socket options handled by TCP. It is possible some less
+common ones are not supported, but contributions are welcomed.
+
+Generally, the same value is propagated to all subflows, including the ones
+created later. eBPF can be used to set different values per subflows.
+
+There are some MPTCP specific socket options at the ``SOL_MPTCP`` (284) level to
+retrieve info. They fill the ``optval`` buffer of the ``getsockopt()`` system
+call:
+
+- ``MPTCP_INFO``: Uses ``struct mptcp_info``.
+- ``MPTCP_TCPINFO``: Uses ``struct mptcp_subflow_data``, followed by an array of
+  ``struct tcp_info``.
+- ``MPTCP_SUBFLOW_ADDRS``: Uses ``struct mptcp_subflow_data``, followed by an
+  array of ``mptcp_subflow_addrs``.
+- ``MPTCP_FULL_INFO``: Uses ``struct mptcp_full_info``, with one pointer to an
+  array of ``struct mptcp_subflow_info`` (including the
+  ``struct mptcp_subflow_addrs``), and one pointer to an array of
+  ``struct tcp_info``, followed by the content of ``struct mptcp_info``.
+
+Note that at the TCP level, ``TCP_IS_MPTCP`` socket option can be used to know
+if MPTCP is still being used: the value will be set to 1 if it is.
+
+
+Design choices
+==============
+
+A new socket type has been added for MPTCP for the userspace-facing socket. The
+kernel is in charge of creating subflow sockets: they are TCP sockets where the
+behavior is modified using TCP-ULP.
+
+MPTCP listen sockets will create "plain" *accepted* TCP sockets if the
+connection request from the client didn't ask for MPTCP, making the performance
+impact minimal when MPTCP is enabled by default.
diff --git a/MAINTAINERS b/MAINTAINERS
index 50892cdafb25..4edd8a3742f0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15573,7 +15573,7 @@ B:	https://github.com/multipath-tcp/mptcp_net-next/issues
 T:	git https://github.com/multipath-tcp/mptcp_net-next.git export-net
 T:	git https://github.com/multipath-tcp/mptcp_net-next.git export
 F:	Documentation/netlink/specs/mptcp_pm.yaml
-F:	Documentation/networking/mptcp-sysctl.rst
+F:	Documentation/networking/mptcp*.rst
 F:	include/net/mptcp.h
 F:	include/trace/events/mptcp.h
 F:	include/uapi/linux/mptcp*.h

-- 
2.43.0

^ permalink raw reply related [flat|nested] 10+ messages in thread
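Editorial note on the "Creating MPTCP sockets" section of patch 3/3: the sketch below illustrates how an application might request MPTCP and fall back to plain TCP when the kernel reports one of the three errno values listed in the patch. The helper name open_stream_socket() is hypothetical; the fallback define of IPPROTO_MPTCP uses the value 262 quoted in the patch, for C libraries whose headers do not provide it yet. This is one possible policy under those assumptions, not the approach mandated by the documentation.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* value quoted in the documentation patch */
#endif

/*
 * Hypothetical helper: try to create an MPTCP socket and fall back to
 * plain TCP if the kernel says MPTCP is unavailable.
 * On success, returns the socket and sets *used_mptcp to 1 (MPTCP) or
 * 0 (TCP fallback). Returns -1 on any other error.
 */
int open_stream_socket(int family, int *used_mptcp)
{
	int sd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

	if (sd >= 0) {
		*used_mptcp = 1;
		return sd;
	}

	/* Only the three errno values listed in the patch trigger the fallback. */
	if (errno != EINVAL && errno != EPROTONOSUPPORT && errno != ENOPROTOOPT)
		return -1;

	*used_mptcp = 0;
	return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

A caller would use it as `int mptcp; int sd = open_stream_socket(AF_INET, &mptcp);` and then proceed with the usual connect() or bind()/listen() calls. Note that even when the socket is created with MPTCP, the connection can still be downgraded to plain TCP later if the peer or a middlebox does not support it, as the "Concepts" section explains.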
* Re: [PATCH mptcp-next 3/3] doc: new 'mptcp' page in 'networking'
2024-05-17 17:40 ` [PATCH mptcp-next 3/3] doc: new 'mptcp' page in 'networking' Matthieu Baerts (NGI0)
@ 2024-05-17 20:43 ` Mat Martineau
2024-05-18 15:50 ` Matthieu Baerts
0 siblings, 1 reply; 10+ messages in thread
From: Mat Martineau @ 2024-05-17 20:43 UTC (permalink / raw)
To: Matthieu Baerts (NGI0); +Cc: mptcp

On Fri, 17 May 2024, Matthieu Baerts (NGI0) wrote:

> A global documentation about MPTCP was missing since its introduction in
> v5.6.
>
> Most of what is there comes from our recently updated mptcp.dev website,
> with additional links to resources from the kernel documentation.
>
> This is a first version, mainly targeting app developers and users.
>
> Link: https://www.mptcp.dev
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  Documentation/networking/index.rst |   1 +
>  Documentation/networking/mptcp.rst | 155 +++++++++++++++++++++++++++++++++++++
>  MAINTAINERS                        |   2 +-
>  3 files changed, 157 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index 7664c0bfe461..a6443851a142 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -72,6 +72,7 @@ Contents:
>     mac80211-injection
>     mctp
>     mpls-sysctl
> +   mptcp
>     mptcp-sysctl
>     multiqueue
>     multi-pf-netdev
> diff --git a/Documentation/networking/mptcp.rst b/Documentation/networking/mptcp.rst
> new file mode 100644
> index 000000000000..cb04a553d010
> --- /dev/null
> +++ b/Documentation/networking/mptcp.rst
> @@ -0,0 +1,155 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Multipath TCP (MPTCP)
> +=====================
> +
> +Introduction
> +============
> +
> +Multipath TCP or MPTCP is an extension to the standard TCP and is described in
> +`RFC 8684 (MPTCPv1) <https://www.rfc-editor.org/rfc/rfc8684.html>`_. It allows a
> +device to make use of multiple interfaces at once to send and receive TCP
> +packets over a single MPTCP connection. MPTCP can aggregate the bandwidth of
> +multiple interfaces or prefer the one with the lowest latency, it also allows a
> +fail-over if one path is down, and the traffic is seamlessly reinjected on other
> +paths.
> +
> +For more details about Multipath TCP in the Linux kernel, please see the
> +official website: `mptcp.dev <https://www.mptcp.dev>`.
> +
> +
> +Use cases
> +=========
> +
> +Thanks to MPTCP, being able to use multiple paths in parallel or simultaneously
> +brings new use-cases, compared to TCP:
> +
> +- Seamless handovers: switching from one path to another while preserving
> +  established connections, e.g. to be used in mobility use-cases, like on
> +  smartphones.
> +- Best network selection: using the "best" available path depending on some
> +  conditions, e.g. latency, losses, cost, bandwidth, etc.
> +- Network aggregation: using multiple paths at the same time to have a higher
> +  throughput, e.g. to combine fixed and mobile networks to send files faster.
> +
> +
> +Concepts
> +========
> +
> +Technically, when a new socket is created with the ``IPPROTO_MPTCP`` protocol
> +(Linux-specific), a *subflow* (or *path*) is created. This *subflow* consists of
> +a regular TCP connection that is used to transmit data through one interface.
> +Additional *subflows* can be negotiated later between the hosts. For the remote
> +host to be able to detect the use of MPTCP, a new field is added to the TCP
> +*option* field of the underlying TCP *subflow*. This field contains, amongst
> +other things, a ``MP_CAPABLE`` option that tells the other host to use MPTCP if
> +it is supported. If the remote host or any middlebox in between does not support
> +it, the returned ``SYN+ACK`` packet will not contain MPTCP options in the TCP
> +*option* field. In that case, the connection will be "downgraded" to plain TCP,
> +and it will continue with a single path.
> +
> +This behavior is made possible by two internal components: the path manager, and
> +the packet scheduler.
> +
> +Path Manager
> +------------
> +
> +The Path Manager is in charge of *subflows*, from creation to deletion, and also
> +address announcements. Typically, it is the client side that initiates subflows,
> +and the server side that announces additional addresses via the ``ADD_ADDR`` and
> +``REMOVE_ADDR`` options.
> +
> +Path managers are controlled by the ``net.mptcp.pm_type`` sysctl knob -- see
> +mptcp-sysctl.rst. There are two types: the in-kernel one (type ``0``) where the
> +same rules are applied for all the connections (see: ``ip mptcp``) ; and the
> +userspace one (type ``1``), controlled by a userspace daemon (i.e. `mptcpd
> +<https://mptcpd.mptcp.dev/>`_) where different rules can be applied for each
> +connection. The path managers can be controlled via a Netlink API, see
> +netlink_spec/mptcp_pm.rst.
> +
> +To be able to use multiple IP addresses on a host to create multiple *subflows*
> +(paths), the default in-kernel MPTCP path-manager needs to know which IP
> +addresses can be used. This can be configured with ``ip mptcp endpoint`` for
> +example.
> +
> +Packet Scheduler
> +----------------
> +
> +The Packet Scheduler is in charge of selecting which available *subflow(s)* to
> +use to send the next data packet. It can decide to maximize the use of the
> +available bandwidth, only to pick the path with the lower latency, or any other
> +policy depending on the configuration.
> +
> +Packet schedulers are controlled by the ``net.mptcp.scheduler`` sysctl knob --
> +see mptcp-sysctl.rst.
> +
> +
> +Sockets API
> +===========
> +
> +Creating MPTCP sockets
> +----------------------
> +
> +On Linux, MPTCP can be used by selecting MPTCP instead of TCP when creating the
> +``socket``:
> +
> +.. code-block:: C
> +
> +    int sd = socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
> +
> +Note that ``IPPROTO_MPTCP`` is defined as ``262``.
> +
> +If MPTCP is not supported, ``errno`` will be set to:
> +
> +- ``EINVAL``: (*Invalid argument*): MPTCP is not available, on kernels < 5.6.
> +- ``EPROTONOSUPPORT`` (*Protocol not supported*): MPTCP has not been compiled,
> +  on kernels >= v5.6.
> +- ``ENOPROTOOPT`` (*Protocol not available*): MPTCP has been disabled using
> +  ``net.mptcp.enabled`` sysctl knob, see mptcp-sysctl.rst.
> +
> +MPTCP is then opt-in: applications need to explicitly request it. Note that
> +applications can be forced to use MPTCP with different techniques, e.g.
> +``LD_PRELOAD`` (see ``mptcpize``), eBPF (see ``mptcpify``), SystemTAP,
> +``GODEBUG`` (``GODEBUG=multipathtcp=1``), etc.
> +
> +Switching to ``IPPROTO_MPTCP`` instead of ``IPPROTO_TCP`` should be as
> +transparent as possible for the userspace applications.
> +

Hi Matthieu -

Looks like most of the above is verbatim from mptcp.dev - LGTM

> +Socket options
> +--------------
> +
> +MPTCP supports most socket options handled by TCP. It is possible some less
> +common ones are not supported, but contributions are welcomed.

"common options are not supported, but contributions are welcome."

> +
> +Generally, the same value is propagated to all subflows, including the ones
> +created later. eBPF can be used to set different values per subflows.

Same edit here as the mptcp.dev PR:

"created after the calls to `setsockopt()`. eBPF can be used to set
different values per subflow."

> +
> +There are some MPTCP specific socket options at the ``SOL_MPTCP`` (284) level to
> +retrieve info. They fill the ``optval`` buffer of the ``getsockopt()`` system
> +call:
> +
> +- ``MPTCP_INFO``: Uses ``struct mptcp_info``.
> +- ``MPTCP_TCPINFO``: Uses ``struct mptcp_subflow_data``, followed by an array of
> +  ``struct tcp_info``.
> +- ``MPTCP_SUBFLOW_ADDRS``: Uses ``struct mptcp_subflow_data``, followed by an
> +  array of ``mptcp_subflow_addrs``.
> +- ``MPTCP_FULL_INFO``: Uses ``struct mptcp_full_info``, with one pointer to an
> +  array of ``struct mptcp_subflow_info`` (including the
> +  ``struct mptcp_subflow_addrs``), and one pointer to an array of
> +  ``struct tcp_info``, followed by the content of ``struct mptcp_info``.
> +
> +Note that at the TCP level, ``TCP_IS_MPTCP`` socket option can be used to know
> +if MPTCP is still being used: the value will be set to 1 if it is.

s/still/currently/

> +
> +
> +Design choices
> +==============
> +
> +A new socket type has been added for MPTCP for the userspace-facing socket. The
> +kernel is in charge of creating subflow sockets: they are TCP sockets where the
> +behavior is modified using TCP-ULP.
> +
> +MPTCP listen sockets will create "plain" *accepted* TCP sockets if the
> +connection request from the client didn't ask for MPTCP, making the performance
> +impact minimal when MPTCP is enabled by default.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 50892cdafb25..4edd8a3742f0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15573,7 +15573,7 @@ B:	https://github.com/multipath-tcp/mptcp_net-next/issues
>  T:	git https://github.com/multipath-tcp/mptcp_net-next.git export-net
>  T:	git https://github.com/multipath-tcp/mptcp_net-next.git export
>  F:	Documentation/netlink/specs/mptcp_pm.yaml
> -F:	Documentation/networking/mptcp-sysctl.rst
> +F:	Documentation/networking/mptcp*.rst
>  F:	include/net/mptcp.h
>  F:	include/trace/events/mptcp.h
>  F:	include/uapi/linux/mptcp*.h
>

Thanks for adding this document!

Reviewed-by: Mat Martineau <martineau@kernel.org>

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH mptcp-next 3/3] doc: new 'mptcp' page in 'networking'
  2024-05-17 20:43 ` Mat Martineau
@ 2024-05-18 15:50 ` Matthieu Baerts
  0 siblings, 0 replies; 10+ messages in thread
From: Matthieu Baerts @ 2024-05-18 15:50 UTC (permalink / raw)
To: Mat Martineau; +Cc: mptcp

Hi Mat,

On 17/05/2024 22:43, Mat Martineau wrote:
> On Fri, 17 May 2024, Matthieu Baerts (NGI0) wrote:

(...)

> Looks like most of the above is verbatim from mptcp.dev - LGTM

Thank you for the code review!

>
>> +Socket options
>> +--------------
>> +
>> +MPTCP supports most socket options handled by TCP. It is possible
>> some less
>> +common ones are not supported, but contributions are welcomed.
>
> "common options are not supported, but contributions are welcome."
>
>> +
>> +Generally, the same value is propagated to all subflows, including
>> the ones
>> +created later. eBPF can be used to set different values per subflows.
>
> Same edit here as the mptcp.dev PR:
>
> "created after the calls to `setsockopt()`. eBPF can be used to set
> different values per subflow."

(sorry for the duplicated error :) )

I just applied this modification, but using two pairs of backquotes
around 'setsockopt()', because the RST format is used here.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
* Re: [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc
  2024-05-17 17:40 [PATCH mptcp-next 0/3] doc: introduce MPTCP global doc Matthieu Baerts (NGI0)
  ` (2 preceding siblings ...)
  2024-05-17 17:40 ` [PATCH mptcp-next 3/3] doc: new 'mptcp' page in 'networking' Matthieu Baerts (NGI0)
@ 2024-05-17 18:28 ` MPTCP CI
  3 siblings, 0 replies; 10+ messages in thread
From: MPTCP CI @ 2024-05-17 18:28 UTC (permalink / raw)
To: Matthieu Baerts; +Cc: mptcp

Hi Matthieu,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9132273070

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8d1e5a53dec8
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=854078

If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker

Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)