netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg
@ 2016-11-02 15:02 Willem de Bruijn
  2016-11-02 15:02 ` [PATCH net-next 1/3] ipv4: add IP_RECVFRAGSIZE cmsg Willem de Bruijn
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Willem de Bruijn @ 2016-11-02 15:02 UTC (permalink / raw)
  To: netdev; +Cc: jdorfman, eric.dumazet, davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

On IP datagrams and raw sockets, when packets arrive fragmented,
expose the largest received fragment size through a new cmsg.

Protocols implemented on top of these sockets may use this, for
instance, to inform peers to lower MSS on platforms that silently
allow send calls to exceed PMTU and cause fragmentation.

Willem de Bruijn (3):
  ipv4: add IP_RECVFRAGSIZE cmsg
  ipv6: add IPV6_RECVFRAGSIZE cmsg
  ipv6: on reassembly, record frag_max_size

 include/linux/ipv6.h     |  5 +++--
 include/net/inet_sock.h  |  1 +
 include/uapi/linux/in.h  |  1 +
 include/uapi/linux/in6.h |  1 +
 net/ipv4/ip_sockglue.c   | 26 ++++++++++++++++++++++++++
 net/ipv6/datagram.c      |  5 +++++
 net/ipv6/ipv6_sockglue.c |  8 ++++++++
 net/ipv6/reassembly.c    |  7 ++++++-
 8 files changed, 51 insertions(+), 3 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next 1/3] ipv4: add IP_RECVFRAGSIZE cmsg
  2016-11-02 15:02 [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg Willem de Bruijn
@ 2016-11-02 15:02 ` Willem de Bruijn
  2016-11-02 15:52   ` Eric Dumazet
  2016-11-02 15:02 ` [PATCH net-next 2/3] ipv6: add IPV6_RECVFRAGSIZE cmsg Willem de Bruijn
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2016-11-02 15:02 UTC (permalink / raw)
  To: netdev; +Cc: jdorfman, eric.dumazet, davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

The IP stack records the largest fragment of a reassembled packet
in IPCB(skb)->frag_max_size. When reading a datagram or raw packet
that arrived fragmented, expose the value to allow applications to
estimate receive path MTU.

Tested:
  Sent data over a veth pair whose source end has a small MTU.
  Sent data using netcat, received using a dedicated process.

  Verified that the cmsg IP_RECVFRAGSIZE is returned only when
  data arrives fragmented, and in that case matches the veth MTU.

    ip link add veth0 type veth peer name veth1

    ip netns add from
    ip netns add to

    ip link set dev veth1 netns to
    ip netns exec to ip addr add dev veth1 192.168.10.1/24
    ip netns exec to ip link set dev veth1 up

    ip link set dev veth0 netns from
    ip netns exec from ip addr add dev veth0 192.168.10.2/24
    ip netns exec from ip link set dev veth0 up
    ip netns exec from ip link set dev veth0 mtu 1300
    ip netns exec from ethtool -K veth0 ufo off

    dd if=/dev/zero bs=1 count=1400 2>/dev/null > payload

    ip netns exec to ./recv_cmsg_recvfragsize -4 -u -p 6000 &
    ip netns exec from nc -q 1 -u 192.168.10.1 6000 < payload

  using github.com/wdebruij/kerneltools/blob/master/tests/recvfragsize.c

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/net/inet_sock.h |  1 +
 include/uapi/linux/in.h |  1 +
 net/ipv4/ip_sockglue.c  | 26 ++++++++++++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 236a810..c9cff97 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -228,6 +228,7 @@ struct inet_sock {
 #define IP_CMSG_PASSSEC		BIT(5)
 #define IP_CMSG_ORIGDSTADDR	BIT(6)
 #define IP_CMSG_CHECKSUM	BIT(7)
+#define IP_CMSG_RECVFRAGSIZE	BIT(8)
 
 /**
  * sk_to_full_sk - Access to a full socket
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index eaf9491..4e557f4 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -117,6 +117,7 @@ struct in_addr {
 #define IP_NODEFRAG     22
 #define IP_CHECKSUM	23
 #define IP_BIND_ADDRESS_NO_PORT	24
+#define IP_RECVFRAGSIZE	25
 
 /* IP_MTU_DISCOVER values */
 #define IP_PMTUDISC_DONT		0	/* Never send DF frames */
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b8a2d63..ecbaae2 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -97,6 +97,17 @@ static void ip_cmsg_recv_retopts(struct msghdr *msg, struct sk_buff *skb)
 	put_cmsg(msg, SOL_IP, IP_RETOPTS, opt->optlen, opt->__data);
 }
 
+static void ip_cmsg_recv_fragsize(struct msghdr *msg, struct sk_buff *skb)
+{
+	int val;
+
+	if (IPCB(skb)->frag_max_size == 0)
+		return;
+
+	val = IPCB(skb)->frag_max_size;
+	put_cmsg(msg, SOL_IP, IP_RECVFRAGSIZE, sizeof(val), &val);
+}
+
 static void ip_cmsg_recv_checksum(struct msghdr *msg, struct sk_buff *skb,
 				  int tlen, int offset)
 {
@@ -218,6 +229,9 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sk_buff *skb,
 
 	if (flags & IP_CMSG_CHECKSUM)
 		ip_cmsg_recv_checksum(msg, skb, tlen, offset);
+
+	if (flags & IP_CMSG_RECVFRAGSIZE)
+		ip_cmsg_recv_fragsize(msg, skb);
 }
 EXPORT_SYMBOL(ip_cmsg_recv_offset);
 
@@ -614,6 +628,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 	case IP_MULTICAST_LOOP:
 	case IP_RECVORIGDSTADDR:
 	case IP_CHECKSUM:
+	case IP_RECVFRAGSIZE:
 		if (optlen >= sizeof(int)) {
 			if (get_user(val, (int __user *) optval))
 				return -EFAULT;
@@ -726,6 +741,14 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 			}
 		}
 		break;
+	case IP_RECVFRAGSIZE:
+		if (sk->sk_type != SOCK_RAW && sk->sk_type != SOCK_DGRAM)
+			goto e_inval;
+		if (val)
+			inet->cmsg_flags |= IP_CMSG_RECVFRAGSIZE;
+		else
+			inet->cmsg_flags &= ~IP_CMSG_RECVFRAGSIZE;
+		break;
 	case IP_TOS:	/* This sets both TOS and Precedence */
 		if (sk->sk_type == SOCK_STREAM) {
 			val &= ~INET_ECN_MASK;
@@ -1357,6 +1380,9 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	case IP_CHECKSUM:
 		val = (inet->cmsg_flags & IP_CMSG_CHECKSUM) != 0;
 		break;
+	case IP_RECVFRAGSIZE:
+		val = (inet->cmsg_flags & IP_CMSG_RECVFRAGSIZE) != 0;
+		break;
 	case IP_TOS:
 		val = inet->tos;
 		break;
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 8+ messages in thread
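
For readers who want to try the IPv4 option from userspace, here is a
minimal sketch of the receive side, assuming a UDP or raw socket on a
kernel that carries patch 1/3. The helper names enable_ip_recvfragsize()
and ip_fragsize_from_cmsg() are hypothetical and are not taken from the
test program linked in the patch. The fallback value 25 for
IP_RECVFRAGSIZE is copied from the include/uapi/linux/in.h hunk above.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that predate the option (value from the patch). */
#ifndef IP_RECVFRAGSIZE
#define IP_RECVFRAGSIZE 25
#endif
#ifndef SOL_IP
#define SOL_IP IPPROTO_IP
#endif

/* Hypothetical helper: ask the kernel to attach the cmsg on receive. */
static int enable_ip_recvfragsize(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_IP, IP_RECVFRAGSIZE, &on, sizeof(on));
}

/*
 * Hypothetical helper: walk the control messages of a msghdr filled in
 * by recvmsg() and return the reported largest fragment size, or 0 if
 * no IP_RECVFRAGSIZE cmsg is present (the datagram was not fragmented).
 */
static int ip_fragsize_from_cmsg(struct msghdr *msg)
{
	struct cmsghdr *cm;
	int val;

	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == SOL_IP &&
		    cm->cmsg_type == IP_RECVFRAGSIZE &&
		    cm->cmsg_len >= CMSG_LEN(sizeof(val))) {
			memcpy(&val, CMSG_DATA(cm), sizeof(val));
			return val;
		}
	}
	return 0;
}
```

A caller would set msg_control and msg_controllen to a buffer of at
least CMSG_SPACE(sizeof(int)) bytes before recvmsg(), then pass the same
msghdr to ip_fragsize_from_cmsg(). Per the setsockopt hunk above, the
option is rejected with EINVAL on sockets that are neither SOCK_RAW nor
SOCK_DGRAM.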

* [PATCH net-next 2/3] ipv6: add IPV6_RECVFRAGSIZE cmsg
  2016-11-02 15:02 [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg Willem de Bruijn
  2016-11-02 15:02 ` [PATCH net-next 1/3] ipv4: add IP_RECVFRAGSIZE cmsg Willem de Bruijn
@ 2016-11-02 15:02 ` Willem de Bruijn
  2016-11-02 15:53   ` Eric Dumazet
  2016-11-02 15:02 ` [PATCH net-next 3/3] ipv6: on reassembly, record frag_max_size Willem de Bruijn
  2016-11-03 19:41 ` [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg David Miller
  3 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2016-11-02 15:02 UTC (permalink / raw)
  To: netdev; +Cc: jdorfman, eric.dumazet, davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

When reading a datagram or raw packet that arrived fragmented, expose
the maximum fragment size if recorded to allow applications to
estimate receive path MTU.

At this point, the field is recorded only when IPv6 connection
tracking is enabled. A follow-up patch records it in the IPv6
input path as well.

Tested using the test for IP_RECVFRAGSIZE plus

  ip netns exec to ip addr add dev veth1 fc07::1/64
  ip netns exec from ip addr add dev veth0 fc07::2/64

  ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
  ip netns exec from nc -q 1 -u fc07::1 6000 < payload

Tested both with and without connection tracking, which was enabled
using

  ip6tables -A INPUT -m state --state NEW -p udp -j LOG

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/ipv6.h     | 5 +++--
 include/uapi/linux/in6.h | 1 +
 net/ipv6/datagram.c      | 5 +++++
 net/ipv6/ipv6_sockglue.c | 8 ++++++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ca1ad9e..1afb6e8 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -229,8 +229,9 @@ struct ipv6_pinfo {
                                 rxflow:1,
 				rxtclass:1,
 				rxpmtu:1,
-				rxorigdstaddr:1;
-				/* 2 bits hole */
+				rxorigdstaddr:1,
+				recvfragsize:1;
+				/* 1 bits hole */
 		} bits;
 		__u16		all;
 	} rxopt;
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index b39ea4f..46444f8 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -283,6 +283,7 @@ struct in6_flowlabel_req {
 #define IPV6_RECVORIGDSTADDR    IPV6_ORIGDSTADDR
 #define IPV6_TRANSPARENT        75
 #define IPV6_UNICAST_IF         76
+#define IPV6_RECVFRAGSIZE	77
 
 /*
  * Multicast Routing:
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 37874e2..620c79a 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -715,6 +715,11 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg,
 			put_cmsg(msg, SOL_IPV6, IPV6_ORIGDSTADDR, sizeof(sin6), &sin6);
 		}
 	}
+	if (np->rxopt.bits.recvfragsize && opt->frag_max_size) {
+		int val = opt->frag_max_size;
+
+		put_cmsg(msg, SOL_IPV6, IPV6_RECVFRAGSIZE, sizeof(val), &val);
+	}
 }
 
 void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 636ec56..6c12678 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -868,6 +868,10 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		np->autoflowlabel = valbool;
 		retv = 0;
 		break;
+	case IPV6_RECVFRAGSIZE:
+		np->rxopt.bits.recvfragsize = valbool;
+		retv = 0;
+		break;
 	}
 
 	release_sock(sk);
@@ -1310,6 +1314,10 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		val = np->autoflowlabel;
 		break;
 
+	case IPV6_RECVFRAGSIZE:
+		val = np->rxopt.bits.recvfragsize;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 8+ messages in thread
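
With both options in place, an application that handles IPv4 and IPv6
sockets has to pick the right (level, optname) pair per address family.
A minimal sketch of that selection follows. The helper names
recvfragsize_sockopt() and set_recvfragsize() are hypothetical. The
fallback values 25 and 77 are copied from the uapi hunks in patches 1/3
and 2/3.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that predate the options (values from the patches). */
#ifndef IP_RECVFRAGSIZE
#define IP_RECVFRAGSIZE 25
#endif
#ifndef IPV6_RECVFRAGSIZE
#define IPV6_RECVFRAGSIZE 77
#endif

/* Hypothetical helper: map an address family to the sockopt level and name. */
static int recvfragsize_sockopt(int family, int *level, int *optname)
{
	switch (family) {
	case AF_INET:
		*level = IPPROTO_IP;	/* same value as SOL_IP */
		*optname = IP_RECVFRAGSIZE;
		return 0;
	case AF_INET6:
		*level = IPPROTO_IPV6;	/* same value as SOL_IPV6 */
		*optname = IPV6_RECVFRAGSIZE;
		return 0;
	default:
		errno = EAFNOSUPPORT;
		return -1;
	}
}

/* Hypothetical helper: turn the cmsg on (on != 0) or off (on == 0). */
static int set_recvfragsize(int fd, int family, int on)
{
	int level, optname;

	if (recvfragsize_sockopt(family, &level, &optname))
		return -1;
	return setsockopt(fd, level, optname, &on, sizeof(on));
}
```

On the receive side the IPv6 cmsg arrives with level SOL_IPV6 and type
IPV6_RECVFRAGSIZE and carries an int, as in the net/ipv6/datagram.c
hunk above.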

* [PATCH net-next 3/3] ipv6: on reassembly, record frag_max_size
  2016-11-02 15:02 [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg Willem de Bruijn
  2016-11-02 15:02 ` [PATCH net-next 1/3] ipv4: add IP_RECVFRAGSIZE cmsg Willem de Bruijn
  2016-11-02 15:02 ` [PATCH net-next 2/3] ipv6: add IPV6_RECVFRAGSIZE cmsg Willem de Bruijn
@ 2016-11-02 15:02 ` Willem de Bruijn
  2016-11-02 15:55   ` Eric Dumazet
  2016-11-03 19:41 ` [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg David Miller
  3 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2016-11-02 15:02 UTC (permalink / raw)
  To: netdev; +Cc: jdorfman, eric.dumazet, davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

IP6CB and IPCB have a frag_max_size field. In IPv6 this field is
filled in when packets are reassembled by the connection tracking
code. Also fill in when reassembling in the input path, to expose
it through cmsg IPV6_RECVFRAGSIZE in all cases.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv6/reassembly.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 3815e85..e1da5b8 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -211,7 +211,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 {
 	struct sk_buff *prev, *next;
 	struct net_device *dev;
-	int offset, end;
+	int offset, end, fragsize;
 	struct net *net = dev_net(skb_dst(skb)->dev);
 	u8 ecn;
 
@@ -336,6 +336,10 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 	fq->ecn |= ecn;
 	add_frag_mem_limit(fq->q.net, skb->truesize);
 
+	fragsize = -skb_network_offset(skb) + skb->len;
+	if (fragsize > fq->q.max_size)
+		fq->q.max_size = fragsize;
+
 	/* The first fragment.
 	 * nhoffset is obtained from the first fragment, of course.
 	 */
@@ -495,6 +499,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 	ipv6_change_dsfield(ipv6_hdr(head), 0xff, ecn);
 	IP6CB(head)->nhoff = nhoff;
 	IP6CB(head)->flags |= IP6SKB_FRAGMENTED;
+	IP6CB(head)->frag_max_size = fq->q.max_size;
 
 	/* Yes, and fold redundant checksum back. 8) */
 	skb_postpush_rcsum(head, skb_network_header(head),
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related	[flat|nested] 8+ messages in thread
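
The bookkeeping added by patch 3/3 can be modelled in userspace as a
running maximum over the fragments of one queue. The sketch below is a
toy model, not kernel code: struct frag_queue_model and both functions
are hypothetical stand-ins, and the network_offset and len parameters
mirror skb_network_offset(skb) and skb->len. The offset is negative
whenever the data pointer has already been advanced past the network
header, which is why the patch negates it.

```c
#include <stddef.h>

/* Toy stand-in for the fragment queue; only the field used here. */
struct frag_queue_model {
	int max_size;
};

/*
 * Mirrors the hunk in ip6_frag_queue(): the fragment size is counted
 * from the start of the network header to the end of the packet, and
 * the queue keeps the largest value seen so far.
 */
static void model_frag_queue(struct frag_queue_model *fq,
			     int network_offset, int len)
{
	int fragsize = -network_offset + len;

	if (fragsize > fq->max_size)
		fq->max_size = fragsize;
}

/* Mirrors the hunk in ip6_frag_reasm(): the value stored in IP6CB(head). */
static int model_frag_reasm(const struct frag_queue_model *fq)
{
	return fq->max_size;
}
```

As an illustration with assumed numbers (not taken from the thread): two
fragments with 48 bytes of headers before the data pointer and 1248 and
160 bytes of payload give sizes 1296 and 208, so the reassembled packet
reports 1296.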

* Re: [PATCH net-next 1/3] ipv4: add IP_RECVFRAGSIZE cmsg
  2016-11-02 15:02 ` [PATCH net-next 1/3] ipv4: add IP_RECVFRAGSIZE cmsg Willem de Bruijn
@ 2016-11-02 15:52   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-11-02 15:52 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, jdorfman, davem, Willem de Bruijn

On Wed, 2016-11-02 at 11:02 -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> The IP stack records the largest fragment of a reassembled packet
> in IPCB(skb)->frag_max_size. When reading a datagram or raw packet
> that arrived fragmented, expose the value to allow applications to
> estimate receive path MTU.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 2/3] ipv6: add IPV6_RECVFRAGSIZE cmsg
  2016-11-02 15:02 ` [PATCH net-next 2/3] ipv6: add IPV6_RECVFRAGSIZE cmsg Willem de Bruijn
@ 2016-11-02 15:53   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-11-02 15:53 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, jdorfman, davem, Willem de Bruijn

On Wed, 2016-11-02 at 11:02 -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> When reading a datagram or raw packet that arrived fragmented, expose
> the maximum fragment size if recorded to allow applications to
> estimate receive path MTU.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 3/3] ipv6: on reassembly, record frag_max_size
  2016-11-02 15:02 ` [PATCH net-next 3/3] ipv6: on reassembly, record frag_max_size Willem de Bruijn
@ 2016-11-02 15:55   ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-11-02 15:55 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, jdorfman, davem, Willem de Bruijn

On Wed, 2016-11-02 at 11:02 -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> IP6CB and IPCB have a frag_max_size field. In IPv6 this field is
> filled in when packets are reassembled by the connection tracking
> code. Also fill in when reassembling in the input path, to expose
> it through cmsg IPV6_RECVFRAGSIZE in all cases.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg
  2016-11-02 15:02 [PATCH net-next 0/3] ip: add RECVFRAGSIZE cmsg Willem de Bruijn
                   ` (2 preceding siblings ...)
  2016-11-02 15:02 ` [PATCH net-next 3/3] ipv6: on reassembly, record frag_max_size Willem de Bruijn
@ 2016-11-03 19:41 ` David Miller
  3 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2016-11-03 19:41 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: netdev, jdorfman, eric.dumazet, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Wed,  2 Nov 2016 11:02:15 -0400

> On IP datagrams and raw sockets, when packets arrive fragmented,
> expose the largest received fragment size through a new cmsg.
> 
> Protocols implemented on top of these sockets may use this, for
> instance, to inform peers to lower MSS on platforms that silently
> allow send calls to exceed PMTU and cause fragmentation.

Looks good, series applied, thanks Willem.

^ permalink raw reply	[flat|nested] 8+ messages in thread
