* [PATCH net 0/3] yet another new mtu discovery mode
@ 2014-02-26 0:20 Hannes Frederic Sowa
From: Hannes Frederic Sowa @ 2014-02-26 0:20 UTC (permalink / raw)
To: netdev
Hi!
After my proposal to weaken the IP_PMTUDISC_INTERFACE mode so that it
produces fragments if the packet size exceeds the outgoing interface MTU
was rightfully rejected, I am now in the unfortunate position of having to
propose yet another IP_MTU_DISCOVER mode (I don't like doing that at all,
especially because I argued so strongly in favor of the IP_PMTUDISC_INTERFACE
mode :( ).
Currently, Unbound and BIND use more or less fire-and-forget logic in
the UDP output path and do not propagate errors back. To use
IP_PMTUDISC_INTERFACE correctly, I would need access to the
unserialized DNS data during the first send attempt: check for EMSGSIZE
and, if the syscall failed with EMSGSIZE, alter the DNS data by removing
everything but the question section, set TC=1, and remember the new,
smaller outgoing UDP packet size globally. In most cases the unserialized
DNS data is no longer available at the point where the send takes place.
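To illustrate the steps listed above, here is a minimal sketch of such a retry path. It operates directly on the wire-format message purely for illustration, assuming the reply is still held in a writable buffer. `dns_truncate_to_question` and `send_dns_reply` are hypothetical names, not code from Unbound or BIND. The offsets (QDCOUNT in bytes 4-5, TC as bit 0x02 of byte 2) follow the standard DNS header layout. Remembering the reduced size globally is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: rewrite a wire-format DNS message in place so that it
 * keeps only the 12-byte header and the question section, sets TC=1 and
 * clears ANCOUNT, NSCOUNT and ARCOUNT. Returns the new length, or 0 if the
 * question section cannot be parsed. */
size_t dns_truncate_to_question(uint8_t *msg, size_t len)
{
    if (len < 12)
        return 0;

    unsigned qdcount = ((unsigned)msg[4] << 8) | msg[5];
    size_t off = 12;

    for (unsigned q = 0; q < qdcount; q++) {
        /* Walk QNAME labels up to the root label or a compression pointer. */
        for (;;) {
            if (off >= len)
                return 0;
            uint8_t lab = msg[off];
            if (lab == 0) {                 /* root label ends the name */
                off += 1;
                break;
            }
            if ((lab & 0xC0) == 0xC0) {     /* 2-byte compression pointer */
                off += 2;
                break;
            }
            if (lab & 0xC0)                 /* reserved label type */
                return 0;
            off += 1 + (size_t)lab;         /* length byte + label bytes */
        }
        off += 4;                           /* QTYPE + QCLASS */
        if (off > len)
            return 0;
    }

    msg[2] |= 0x02;                         /* set the TC bit */
    for (size_t i = 6; i < 12; i++)         /* zero ANCOUNT, NSCOUNT, ARCOUNT */
        msg[i] = 0;
    return off;
}

/* Hypothetical send path: try the full reply first; if the kernel refuses it
 * with EMSGSIZE, resend a truncated TC=1 reply instead. */
ssize_t send_dns_reply(int fd, uint8_t *msg, size_t len,
                       const struct sockaddr *to, socklen_t tolen)
{
    ssize_t n = sendto(fd, msg, len, 0, to, tolen);
    if (n < 0 && errno == EMSGSIZE) {
        size_t small = dns_truncate_to_question(msg, len);
        if (small == 0)
            return -1;
        n = sendto(fd, msg, small, 0, to, tolen);
    }
    return n;
}
```

The sketch shows why the fire-and-forget design is the obstacle: the rewrite has to happen exactly where the send error is observed.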
That said, I propose this slightly weaker version of
IP_PMTUDISC_INTERFACE, which allows fragments to be sent if the packet
size exceeds the outgoing interface MTU. With it, I can simply change
IP_PMTUDISC_DONT, which is currently used by DNS software, to this new
option, and we finally get fragmentation avoidance in DNS. This option
seems much easier to support and should find users quickly.
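From the application side the switch is a single setsockopt() call. A minimal sketch, assuming the value 5 proposed in this series for IP_PMTUDISC_OMIT (defined locally in case the system headers lack it) and falling back to IP_PMTUDISC_DONT when an older kernel rejects the value with EINVAL, as the range check in do_ip_setsockopt does. `set_dns_pmtu_mode` is a hypothetical helper name:

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_PMTUDISC_OMIT
#define IP_PMTUDISC_OMIT 5   /* value proposed in this patch series */
#endif

/* Prefer IP_PMTUDISC_OMIT; fall back to IP_PMTUDISC_DONT on kernels whose
 * IP_MTU_DISCOVER range check rejects the new value with EINVAL.
 * Returns the mode that was set, or -1 on any other failure. */
int set_dns_pmtu_mode(int fd)
{
    int val = IP_PMTUDISC_OMIT;
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val)) == 0)
        return val;
    if (errno != EINVAL)
        return -1;

    val = IP_PMTUDISC_DONT;
    if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val)) == 0)
        return val;
    return -1;
}
```

Software that already sets IP_PMTUDISC_DONT only needs to change the constant, which is the drop-in property the cover letter relies on.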
I am proposing this for the net branch because I consider the logic
of IP_PMTUDISC_INTERFACE flawed and want to fix it with this new
option. Hopefully IP_PMTUDISC_INTERFACE still serves a purpose for
someone out there. Thanks to the groundwork done for IP_PMTUDISC_INTERFACE,
the changes are small and, luckily, need no additional space in the
sock structs.
The first patch ensures that we cut the packet into pieces of the
interface MTU size, not the path MTU size, in the IP_PMTUDISC_INTERFACE case.
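The fragment size then follows from whichever MTU is chosen. The IPv4 output path (see the __ip_append_data context in patch 2) rounds fragment payloads down to a multiple of 8 bytes via `maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen`. This small sketch, assuming a 20-byte IPv4 header without options and using hypothetical helper names, computes that value and the resulting fragment count:

```c
/* Largest fragment, IP header included, built for a given MTU: the payload
 * part of every non-final fragment must be a multiple of 8 bytes, so it is
 * rounded down (same formula as in __ip_append_data). */
unsigned maxfraglen(unsigned mtu, unsigned fragheaderlen)
{
    return ((mtu - fragheaderlen) & ~7u) + fragheaderlen;
}

/* Hypothetical helper: number of fragments needed for `payload` bytes that
 * follow a header of `hdr` bytes (the last fragment may be shorter). */
unsigned nfrags(unsigned payload, unsigned mtu, unsigned hdr)
{
    unsigned per_frag = maxfraglen(mtu, hdr) - hdr;
    return (payload + per_frag - 1) / per_frag;
}
```

For example, `maxfraglen(1400, 20)` is 1396, so a 3000-byte payload needs three fragments at an MTU of 1400 (1376 + 1376 + 248 bytes).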
Thanks, and very sorry for bloating the kernel API,
Hannes
Included patches:
ipv4: use ip_skb_dst_mtu to determine mtu in ip_fragment
ipv4: yet another new IP_MTU_DISCOVER option
ipv6: yet another new IPV6_MTU_DISCOVER option
Diffstat:
include/net/ip.h | 9 ++++++++-
include/net/ip6_route.h | 9 ++++++++-
include/uapi/linux/in.h | 4 ++++
include/uapi/linux/in6.h | 4 ++++
net/ipv4/ip_output.c | 12 ++++--------
net/ipv4/ip_sockglue.c | 2 +-
net/ipv6/ip6_output.c | 9 +++++----
net/ipv6/ipv6_sockglue.c | 2 +-
8 files changed, 35 insertions(+), 16 deletions(-)
* [PATCH net 1/3] ipv4: use ip_skb_dst_mtu to determine mtu in ip_fragment
From: Hannes Frederic Sowa @ 2014-02-26 0:20 UTC (permalink / raw)
To: netdev
ip_skb_dst_mtu mostly falls back to ip_dst_mtu_maybe_forward, in
particular if no socket is attached to the skb (the forwarding case), and
it is what ip_finish_output already uses to determine the MTU when it
checks whether to branch to ip_fragment. Thus use the same function to
determine the MTU in ip_fragment, too.
This is important for the introduction of IP_PMTUDISC_OMIT, where we
want packets to be cut into pieces of the outgoing interface MTU size.
IPv6 already does this correctly.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
net/ipv4/ip_output.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 73c6b63..7cf80d2 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -446,7 +446,6 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
__be16 not_last_frag;
struct rtable *rt = skb_rtable(skb);
int err = 0;
- bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
dev = rt->dst.dev;
@@ -456,7 +455,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
iph = ip_hdr(skb);
- mtu = ip_dst_mtu_maybe_forward(&rt->dst, forwarding);
+ mtu = ip_skb_dst_mtu(skb);
if (unlikely(((iph->frag_off & htons(IP_DF)) && !skb->local_df) ||
(IPCB(skb)->frag_max_size &&
IPCB(skb)->frag_max_size > mtu))) {
--
1.8.5.3
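A simplified userspace model of the MTU choice described in this patch, assuming ip_skb_dst_mtu uses the route's MTU (via ip_dst_mtu_maybe_forward) when there is no socket or the socket's mode satisfies ip_sk_use_pmtu, and the outgoing device MTU otherwise. The struct and function names are hypothetical, and the forwarding details of ip_dst_mtu_maybe_forward are collapsed into a single `path_mtu` field:

```c
#include <stdbool.h>

/* Mode values as in include/uapi/linux/in.h; PMTU_OMIT (5) is the value
 * proposed for IP_PMTUDISC_OMIT in this series. */
enum pmtu_mode { PMTU_DONT = 0, PMTU_WANT = 1, PMTU_DO = 2, PMTU_PROBE = 3,
                 PMTU_INTERFACE = 4, PMTU_OMIT = 5 };

/* Hypothetical, simplified stand-in for the skb/socket/route state. */
struct tx_ctx {
    bool has_socket;      /* skb->sk != NULL */
    enum pmtu_mode mode;  /* socket's pmtudisc value, if has_socket */
    unsigned path_mtu;    /* simplified result of ip_dst_mtu_maybe_forward */
    unsigned dev_mtu;     /* MTU of the outgoing interface */
};

/* Models ip_sk_use_pmtu(): only modes below IP_PMTUDISC_PROBE use the PMTU. */
static bool use_pmtu(enum pmtu_mode m) { return m < PMTU_PROBE; }

/* Models the choice made by ip_skb_dst_mtu(); with this patch ip_fragment
 * and ip_finish_output both see the same value. */
unsigned model_skb_dst_mtu(const struct tx_ctx *c)
{
    if (!c->has_socket || use_pmtu(c->mode))
        return c->path_mtu;
    return c->dev_mtu;
}
```

Before the patch, ip_fragment always took the `path_mtu` branch, so a socket that ignores path MTU could still have its packets cut to the smaller cached value.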
* [PATCH net 2/3] ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT
From: Hannes Frederic Sowa @ 2014-02-26 0:20 UTC (permalink / raw)
To: netdev; +Cc: Florian Weimer
IP_PMTUDISC_INTERFACE has a design error: because it does not allow
fragments to be generated if the interface MTU is exceeded, it is very
hard to use in already deployed name server software, which is exactly
the software I introduced this option for.
This patch adds yet another IP_MTU_DISCOVER option, which neither honors
any path MTU information nor accepts new ICMP notifications destined for
the socket it is enabled on. Unlike IP_PMTUDISC_INTERFACE, however, it
allows outgoing fragmentation if the packet size exceeds the outgoing
interface MTU.
As such, this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which most name server software currently uses,
making the adoption of this option smooth and easy.
The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.
Fixes: 482fc6094afad5 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/net/ip.h | 9 ++++++++-
include/uapi/linux/in.h | 4 ++++
net/ipv4/ip_output.c | 9 +++------
net/ipv4/ip_sockglue.c | 2 +-
4 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index 23be0fd..c060efe 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -266,7 +266,8 @@ int ip_dont_fragment(struct sock *sk, struct dst_entry *dst)
static inline bool ip_sk_accept_pmtu(const struct sock *sk)
{
- return inet_sk(sk)->pmtudisc != IP_PMTUDISC_INTERFACE;
+ return inet_sk(sk)->pmtudisc != IP_PMTUDISC_INTERFACE &&
+ inet_sk(sk)->pmtudisc != IP_PMTUDISC_OMIT;
}
static inline bool ip_sk_use_pmtu(const struct sock *sk)
@@ -274,6 +275,12 @@ static inline bool ip_sk_use_pmtu(const struct sock *sk)
return inet_sk(sk)->pmtudisc < IP_PMTUDISC_PROBE;
}
+static inline bool ip_sk_local_df(const struct sock *sk)
+{
+ return inet_sk(sk)->pmtudisc < IP_PMTUDISC_DO ||
+ inet_sk(sk)->pmtudisc == IP_PMTUDISC_OMIT;
+}
+
static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
bool forwarding)
{
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index 393c5de..c33a65e 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -120,6 +120,10 @@ struct in_addr {
* this socket to prevent accepting spoofed ones.
*/
#define IP_PMTUDISC_INTERFACE 4
+/* weaker version of IP_PMTUDISC_INTERFACE, which allows packets to get
+ * fragmented if they exceed the interface mtu
+ */
+#define IP_PMTUDISC_OMIT 5
#define IP_MULTICAST_IF 32
#define IP_MULTICAST_TTL 33
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7cf80d2..1a0755f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -821,8 +821,7 @@ static int __ip_append_data(struct sock *sk,
fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
- maxnonfragsize = (inet->pmtudisc >= IP_PMTUDISC_DO) ?
- mtu : 0xFFFF;
+ maxnonfragsize = ip_sk_local_df(sk) ? 0xFFFF : mtu;
if (cork->length + length > maxnonfragsize - fragheaderlen) {
ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
@@ -1145,8 +1144,7 @@ ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
- maxnonfragsize = (inet->pmtudisc >= IP_PMTUDISC_DO) ?
- mtu : 0xFFFF;
+ maxnonfragsize = ip_sk_local_df(sk) ? 0xFFFF : mtu;
if (cork->length + size > maxnonfragsize - fragheaderlen) {
ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport,
@@ -1307,8 +1305,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
* to fragment the frame generated here. No matter, what transforms
* how transforms change size of the packet, it will come out.
*/
- if (inet->pmtudisc < IP_PMTUDISC_DO)
- skb->local_df = 1;
+ skb->local_df = ip_sk_local_df(sk);
/* DF bit is set when we want to see DF on outgoing frames.
* If local_df is set too, we still allow to fragment this frame
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 580dd96..9b98f74 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -626,7 +626,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
inet->nodefrag = val ? 1 : 0;
break;
case IP_MTU_DISCOVER:
- if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_INTERFACE)
+ if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_OMIT)
goto e_inval;
inet->pmtudisc = val;
break;
--
1.8.5.3
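The socket predicates after this patch can be tabulated with a small userspace model. It copies the logic of ip_sk_accept_pmtu, ip_sk_use_pmtu, the new ip_sk_local_df and the new maxnonfragsize computation from the diff above; the enum and function names are hypothetical stand-ins, not kernel API:

```c
#include <stdbool.h>

/* Values as in include/uapi/linux/in.h; MODE_OMIT (5) is the value this
 * patch proposes for IP_PMTUDISC_OMIT. */
enum pmtu_mode { MODE_DONT, MODE_WANT, MODE_DO, MODE_PROBE,
                 MODE_INTERFACE, MODE_OMIT };

/* Mirrors ip_sk_accept_pmtu(): accept incoming ICMP path MTU updates? */
bool accept_pmtu(enum pmtu_mode m)
{
    return m != MODE_INTERFACE && m != MODE_OMIT;
}

/* Mirrors ip_sk_use_pmtu(): use the cached path MTU on output? */
bool use_pmtu(enum pmtu_mode m) { return m < MODE_PROBE; }

/* Mirrors ip_sk_local_df(): may the stack fragment locally? */
bool local_df(enum pmtu_mode m) { return m < MODE_DO || m == MODE_OMIT; }

/* Mirrors maxnonfragsize in __ip_append_data(): the size limit above
 * which the send fails with EMSGSIZE. */
unsigned maxnonfragsize(enum pmtu_mode m, unsigned mtu)
{
    return local_df(m) ? 0xFFFF : mtu;
}
```

In this model IP_PMTUDISC_OMIT behaves like IP_PMTUDISC_INTERFACE for incoming ICMP updates and cached path MTUs, and like IP_PMTUDISC_DONT for local fragmentation.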
* [PATCH net 3/3] ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT
From: Hannes Frederic Sowa @ 2014-02-26 0:20 UTC (permalink / raw)
To: netdev; +Cc: Florian Weimer
This option has the same semantics as the recently introduced IPv4
option IP_PMTUDISC_OMIT. It does not honor the path MTU discovered by
the host but, in contrast to IPV6_PMTUDISC_INTERFACE, allows fragments
to be generated if the packet size exceeds the MTU of the outgoing
interface.
Fixes: 93b36cf3425b9b ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
include/net/ip6_route.h | 9 ++++++++-
include/uapi/linux/in6.h | 4 ++++
net/ipv6/ip6_output.c | 9 +++++----
net/ipv6/ipv6_sockglue.c | 2 +-
4 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 017badb..00e3f12 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -171,7 +171,14 @@ static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
static inline bool ip6_sk_accept_pmtu(const struct sock *sk)
{
- return inet6_sk(sk)->pmtudisc != IPV6_PMTUDISC_INTERFACE;
+ return inet6_sk(sk)->pmtudisc != IPV6_PMTUDISC_INTERFACE &&
+ inet6_sk(sk)->pmtudisc != IPV6_PMTUDISC_OMIT;
+}
+
+static inline bool ip6_sk_local_df(const struct sock *sk)
+{
+ return inet6_sk(sk)->pmtudisc < IPV6_PMTUDISC_DO ||
+ inet6_sk(sk)->pmtudisc == IPV6_PMTUDISC_OMIT;
}
static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt)
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index e9a1d2d..0d8e0f0 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -185,6 +185,10 @@ struct in6_flowlabel_req {
* also see comments on IP_PMTUDISC_INTERFACE
*/
#define IPV6_PMTUDISC_INTERFACE 4
+/* weaker version of IPV6_PMTUDISC_INTERFACE, which allows packets to
+ * get fragmented if they exceed the interface mtu
+ */
+#define IPV6_PMTUDISC_OMIT 5
/* Flowlabel */
#define IPV6_FLOWLABEL_MGR 32
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 16f91a2..2bc1070 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1231,8 +1231,10 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
sizeof(struct frag_hdr) : 0) +
rt->rt6i_nfheader_len;
- maxnonfragsize = (np->pmtudisc >= IPV6_PMTUDISC_DO) ?
- mtu : sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
+ if (ip6_sk_local_df(sk))
+ maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
+ else
+ maxnonfragsize = mtu;
/* dontfrag active */
if ((cork->length + length > mtu - headersize) && dontfrag &&
@@ -1540,8 +1542,7 @@ int ip6_push_pending_frames(struct sock *sk)
}
/* Allow local fragmentation. */
- if (np->pmtudisc < IPV6_PMTUDISC_DO)
- skb->local_df = 1;
+ skb->local_df = ip6_sk_local_df(sk);
*final_dst = fl6->daddr;
__skb_pull(skb, skb_network_header_len(skb));
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 0a00f44..edb58af 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -722,7 +722,7 @@ done:
case IPV6_MTU_DISCOVER:
if (optlen < sizeof(int))
goto e_inval;
- if (val < IPV6_PMTUDISC_DONT || val > IPV6_PMTUDISC_INTERFACE)
+ if (val < IPV6_PMTUDISC_DONT || val > IPV6_PMTUDISC_OMIT)
goto e_inval;
np->pmtudisc = val;
retv = 0;
--
1.8.5.3
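For IPv6 the new maxnonfragsize logic in ip6_append_data differs from IPv4 only in the ceiling used when local fragmentation is allowed: sizeof(struct ipv6hdr) + IPV6_MAXPLEN instead of 0xFFFF. A hypothetical model, assuming the 40-byte fixed IPv6 header and IPV6_MAXPLEN = 65535 (the largest non-jumbogram payload); the names are stand-ins, not kernel API:

```c
#include <stdbool.h>

/* Values as in include/uapi/linux/in6.h; V6_OMIT (5) is the proposed
 * IPV6_PMTUDISC_OMIT. */
enum pmtu6_mode { V6_DONT, V6_WANT, V6_DO, V6_PROBE, V6_INTERFACE, V6_OMIT };

#define V6_HDR_LEN 40u     /* sizeof(struct ipv6hdr) */
#define V6_MAXPLEN 65535u  /* IPV6_MAXPLEN */

/* Mirrors ip6_sk_local_df(): may the stack fragment locally? */
bool ip6_local_df(enum pmtu6_mode m) { return m < V6_DO || m == V6_OMIT; }

/* Mirrors the new maxnonfragsize selection in ip6_append_data(). */
unsigned ip6_maxnonfragsize(enum pmtu6_mode m, unsigned mtu)
{
    return ip6_local_df(m) ? V6_HDR_LEN + V6_MAXPLEN : mtu;
}
```

With IPV6_PMTUDISC_OMIT a datagram larger than the interface MTU is therefore fragmented locally instead of failing, while IPV6_PMTUDISC_INTERFACE keeps the interface MTU as a hard limit.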
* Re: [PATCH net 0/3] yet another new mtu discovery mode
From: David Miller @ 2014-02-26 4:17 UTC (permalink / raw)
To: hannes; +Cc: netdev
This is a new feature and therefore definitely not appropriate for
'net', it's 'net-next' material only.
* Re: [PATCH net 0/3] yet another new mtu discovery mode
From: Hannes Frederic Sowa @ 2014-02-26 11:02 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On Tue, Feb 25, 2014 at 11:17:26PM -0500, David Miller wrote:
>
> This is a new feature and therefore definitely not appropriate for
> 'net', it's 'net-next' material only.
You are correct.
As these patches are still in patchwork, I assume I don't need to repost them
with changed patch tags and that you will handle them as appropriate.
Thanks a lot,
Hannes