[PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
@ 2013-10-26 20:11 Hannes Frederic Sowa
  2013-10-29  4:08 ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-26 20:11 UTC (permalink / raw)
  To: netdev; +Cc: fweimer

Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/route.h     | 16 ++++++++++++----
 include/uapi/linux/in.h |  5 +++++
 net/dccp/ipv4.c         |  1 +
 net/ipv4/ip_output.c    |  8 ++++----
 net/ipv4/ip_sockglue.c  |  2 +-
 net/ipv4/route.c        |  4 ++++
 net/ipv4/tcp_ipv4.c     |  1 +
 7 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index dd4ae00..f68c167 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -313,12 +313,20 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 	return hoplimit;
 }
 
-static inline int ip_skb_dst_mtu(struct sk_buff *skb)
+static inline bool ip_sk_accept_pmtu(const struct sock *sk)
 {
-	struct inet_sock *inet = skb->sk ? inet_sk(skb->sk) : NULL;
+	return inet_sk(sk)->pmtudisc != IP_PMTUDISC_INTERFACE;
+}
 
-	return (inet && inet->pmtudisc == IP_PMTUDISC_PROBE) ?
-	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
+static inline bool ip_sk_use_pmtu(const struct sock *sk)
+{
+	return inet_sk(sk)->pmtudisc < IP_PMTUDISC_PROBE;
+}
+
+static inline int ip_skb_dst_mtu(const struct sk_buff *skb)
+{
+	return (!skb->sk || ip_sk_use_pmtu(skb->sk)) ?
+	       dst_mtu(skb_dst(skb)) : skb_dst(skb)->dev->mtu;
 }
 
 #endif	/* _ROUTE_H */
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index f9e8e49..393c5de 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -115,6 +115,11 @@ struct in_addr {
 #define IP_PMTUDISC_WANT		1	/* Use per route hints	*/
 #define IP_PMTUDISC_DO			2	/* Always DF		*/
 #define IP_PMTUDISC_PROBE		3       /* Ignore dst pmtu      */
+/* Always use interface mtu (ignores dst pmtu) but don't set DF flag.
+ * Also incoming ICMP frag_needed notifications will be ignored on
+ * this socket to prevent accepting spoofed ones.
+ */
+#define IP_PMTUDISC_INTERFACE		4
 
 #define IP_MULTICAST_IF			32
 #define IP_MULTICAST_TTL 		33
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 720c362..d9f65fc 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -174,6 +174,7 @@ static inline void dccp_do_pmtu_discovery(struct sock *sk,
 	mtu = dst_mtu(dst);
 
 	if (inet->pmtudisc != IP_PMTUDISC_DONT &&
+	    ip_sk_accept_pmtu(sk) &&
 	    inet_csk(sk)->icsk_pmtu_cookie > mtu) {
 		dccp_sync_mss(sk, mtu);
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 27dc1b3..88c270a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1037,7 +1037,6 @@ error:
 static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 			 struct ipcm_cookie *ipc, struct rtable **rtp)
 {
-	struct inet_sock *inet = inet_sk(sk);
 	struct ip_options_rcu *opt;
 	struct rtable *rt;
 
@@ -1063,8 +1062,8 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	 * We steal reference to this route, caller should not release it
 	 */
 	*rtp = NULL;
-	cork->fragsize = inet->pmtudisc == IP_PMTUDISC_PROBE ?
-			 rt->dst.dev->mtu : dst_mtu(&rt->dst);
+	cork->fragsize = ip_sk_use_pmtu(sk) ?
+			 dst_mtu(&rt->dst) : rt->dst.dev->mtu;
 	cork->dst = &rt->dst;
 	cork->length = 0;
 	cork->ttl = ipc->ttl;
@@ -1315,7 +1314,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	/* DF bit is set when we want to see DF on outgoing frames.
 	 * If local_df is set too, we still allow to fragment this frame
 	 * locally. */
-	if (inet->pmtudisc >= IP_PMTUDISC_DO ||
+	if (inet->pmtudisc == IP_PMTUDISC_DO ||
+	    inet->pmtudisc == IP_PMTUDISC_PROBE ||
 	    (skb->len <= dst_mtu(&rt->dst) &&
 	     ip_dont_fragment(sk, &rt->dst)))
 		df = htons(IP_DF);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 0626f2c..3f85826 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -627,7 +627,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 		inet->nodefrag = val ? 1 : 0;
 		break;
 	case IP_MTU_DISCOVER:
-		if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_PROBE)
+		if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_INTERFACE)
 			goto e_inval;
 		inet->pmtudisc = val;
 		break;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d2d3253..f428935 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1036,6 +1036,10 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 	bool new = false;
 
 	bh_lock_sock(sk);
+
+	if (!ip_sk_accept_pmtu(sk))
+		goto out;
+
 	rt = (struct rtable *) __sk_dst_get(sk);
 
 	if (sock_owned_by_user(sk) || !rt) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 300ab2c..14bba8a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -288,6 +288,7 @@ static void tcp_v4_mtu_reduced(struct sock *sk)
 	mtu = dst_mtu(dst);
 
 	if (inet->pmtudisc != IP_PMTUDISC_DONT &&
+	    ip_sk_accept_pmtu(sk) &&
 	    inet_csk(sk)->icsk_pmtu_cookie > mtu) {
 		tcp_sync_mss(sk, mtu);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-26 20:11 [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE Hannes Frederic Sowa
@ 2013-10-29  4:08 ` David Miller
  2013-10-29  7:36   ` Florian Weimer
  2013-10-29 12:04   ` Hannes Frederic Sowa
  0 siblings, 2 replies; 17+ messages in thread
From: David Miller @ 2013-10-29  4:08 UTC (permalink / raw)
  To: hannes; +Cc: netdev, fweimer

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Sat, 26 Oct 2013 22:11:58 +0200

> Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
> their sockets won't accept and install new path mtu information and they
> will always use the interface mtu for outgoing packets. It is guaranteed
> that the packet is not fragmented locally. But we won't set the DF-Flag
> on the outgoing frames.
> 
> Florian Weimer had the idea to use this flag to ensure DNS servers are
> never generating outgoing fragments. They may well be fragmented on the
> path, but the server never stores or usees path mtu values, which could
> well be forged in an attack.
> 
> (The root of the problem with path MTU discovery is that there is
> no reliable way to authenticate ICMP Fragmentation Needed But DF Set
> messages because they are sent from intermediate routers with their
> source addresses, and the IMCP payload will not always contain sufficient
> information to identify a flow.)

I do not like this reasoning.  You have several more acceptable paths to take
to resolve this problem:

1) "I don't trust path MTU information at all"

   Just turn it off globally, end of story.  It has the same effect as your
   new per-application mode.

2) "I don't trust path MTU information unless the full socket ID is available
    in the ICMP packets quoted headers"

   Then simply implement a policy as such and submit it to me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-29  4:08 ` David Miller
@ 2013-10-29  7:36   ` Florian Weimer
  2013-10-29 19:38     ` David Miller
  2013-10-29 12:04   ` Hannes Frederic Sowa
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2013-10-29  7:36 UTC (permalink / raw)
  To: David Miller, hannes; +Cc: netdev

On 10/29/2013 05:08 AM, David Miller wrote:

> I do not like this reasoning.  You have several more acceptable paths to take
> to resolve this problem:
>
> 1) "I don't trust path MTU information at all"
>
>     Just turn it off globally, end of story.  It has the same effect as your
>     new per-application mode.

We can't push this as a security update.  We could tell everyone running 
DNS servers to reconfigure their systems in this way, but I always 
consider this a bit of a cop-out.

A new knob to turn IP_PMTUDISC_DONT into something that behaves like 
IP_PMTUDISC_INTERFACE would be more conservative and easier to deploy, I 
think.

> 2) "I don't trust path MTU information unless the full socket ID is available
>      in the ICMP packets quoted headers"
>
>     Then simply implement a policy as such and submit it to me.

There are IP protocols where these bits aren't readily available and 
where we don't want the kernel (outside the Netfilter code) to be aware 
of the payload structure.  Netfilter isn't a solution because it 
requires state and doesn't work well with request-response UDP protocols 
like DNS (even before source port randomization).

You could make the path MTU dependent on the protocol (which would even 
be the correct solution from a technical point of view) and use 
validation for TCP and UDP, but that's a fairly invasive change for such 
relatively minor functionality.

-- 
Florian Weimer / Red Hat Product Security Team

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-29  4:08 ` David Miller
  2013-10-29  7:36   ` Florian Weimer
@ 2013-10-29 12:04   ` Hannes Frederic Sowa
  2013-10-29 12:50     ` Hannes Frederic Sowa
  2013-10-30 20:07     ` Hannes Frederic Sowa
  1 sibling, 2 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-29 12:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, fweimer

On Tue, Oct 29, 2013 at 12:08:44AM -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Sat, 26 Oct 2013 22:11:58 +0200
> 
> > Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
> > their sockets won't accept and install new path mtu information and they
> > will always use the interface mtu for outgoing packets. It is guaranteed
> > that the packet is not fragmented locally. But we won't set the DF-Flag
> > on the outgoing frames.
> > 
> > Florian Weimer had the idea to use this flag to ensure DNS servers are
> > never generating outgoing fragments. They may well be fragmented on the
> > path, but the server never stores or usees path mtu values, which could
> > well be forged in an attack.
> > 
> > (The root of the problem with path MTU discovery is that there is
> > no reliable way to authenticate ICMP Fragmentation Needed But DF Set
> > messages because they are sent from intermediate routers with their
> > source addresses, and the IMCP payload will not always contain sufficient
> > information to identify a flow.)
> 
> I do not like this reasoning.  You have several more acceptable paths to take
> to resolve this problem:

So, I hope I can convince you to merge this patch. ;-)

I really tried hard to find alternatives or even a way to enable
the protection automatically given that at least unbound does apply
IP_PMTUDISC_DONT to its sockets already. These are the reasons why I
came up with the new IP_PMTUDISC_INTERFACE value:

> 1) "I don't trust path MTU information at all"
> 
>    Just turn it off globally, end of story.  It has the same effect as your
>    new per-application mode.

I think you are referring to ipv4/ip_no_pmtu_disc?

This setting actually worsens the situation:

0.000 `echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc`
0.500 `ethtool -K tun0 ufo off`

/* check current mtu with connected socket */
1.000 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 3
1.000 connect(3, ..., ...) = 0
1.000 getsockopt(3, IPPROTO_IP, IP_MTU, [1500], [4]) = 0
1.000 close(3) = 0

/* inject frag_needed to unconnected socket */
2.000 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 3
2.000 setsockopt(3, IPPROTO_IP, IP_MTU_DISCOVER, [IP_PMTUDISC_DONT], 4) = 0
2.000 sendto(3, ..., 1000, 0, ..., ...) = 1000
2.000 > udp (1000)
/* this simulates a forged icmp packet which tries to lower the mtu */
2.100 < [udp (1000)] icmp unreachable frag_needed mtu 1400

2.5 sendto(3, ..., 1000, 0, ..., ...) = 1000
/* first fragment splitted at 522 bytes */
2.5 > udp(522)
/* didn't find a way to match for the second fragment - so error out here
 * comment out this check to check for the mtu a IP_PMTUDISC_DONT socket now has
 *
 * it really is used, otherwise we would not have seen this fragment
 */

3.000 close(3) = 0

/* verify mtu again */
3.000 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 3
3.000 connect(3, ..., ...) = 0
/* we actually have 522 as the minimal mtu - so error out here */
3.000 getsockopt(3, IPPROTO_IP, IP_MTU, [1400], [4]) = 0
3.000 close(3) = 0

5.000 `echo 0 > /proc/sys/net/ipv4/ip_no_pmtu_disc`

Given this packetdrill snippet we see it is actually very
counterproductive to tell people to globally disable pmtu discovery as
it will certainly generate more fragments.

The reason for this is, that __ip_rt_update_pmtu is called with zero mtu
and will get clamped to min_pmtu. It is possible to tell people to also
increase min_pmtu but I don't want to take the risk of breaking other
applications on a DNS server given how the current semantics are.

If we receive a pmtu event on a TCP or DCCP socket we actually
check for the pmtudisc setting and won't apply it if it is set to
IP_PMTUDISC_DONT. This is not checked on UDP and RAW sockets and it was
very hard to think about the consequences to introduce such checks now
(at least for me).

Adding a new sysctl would be a viable option but it is harder to use
by DNS server implementations and could be fiddled with by users or
scripts etc.  I dislike a globel (or even per namespace) setting. Also
Fedora e.g. installs an unbound server on Desktops if one installs
the dnssec-trigger package.  Maybe we will see more DNS servers on
workstations where we don't want to globally disable pmtu.

> 2) "I don't trust path MTU information unless the full socket ID is available
>     in the ICMP packets quoted headers"
> 
>    Then simply implement a policy as such and submit it to me.

We don't trust the MTU information in any case for that socket.

It seems easy to inject wrong path mtu information on a socket which will
be installed as a globel fnhe for 600 seconds. This is a very viable
target if an attacker is able to do cache poisoning afterwards. There
is not much protection even if we match the full socket ID on an
UDP socket. As this socket is more exposed to attacks we drop the mtu
information there. Of course, other ports could also install wrong pmtu
information causing packets being fragmented so we always use the path
mtu and don't fragment locally.  Dropping the pmtu information on these
sockets just makes it more symmetrical and causes less stress on the fnhe
hash tables if people do these attacks (they could well be running these
attacks in parallel). If the DNS port is the only one reachable externally
we won't do any pmtu processing on that server from untrusted peers, which is
a nice side effect.

All that said, I hope dns servers provide a setting to apply this option
on sockets and fail to start up if it could not be applied. So an
administrator is not in some sense of false hope that this protection
is enabled when running on an older kernel. This could only be done if we
provide a new value for IP_MTU_DISCOVER.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-29 12:04   ` Hannes Frederic Sowa
@ 2013-10-29 12:50     ` Hannes Frederic Sowa
  2013-10-30 20:07     ` Hannes Frederic Sowa
  1 sibling, 0 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-29 12:50 UTC (permalink / raw)
  To: David Miller, netdev, fweimer

On Tue, Oct 29, 2013 at 01:04:25PM +0100, Hannes Frederic Sowa wrote:
> The reason for this is, that __ip_rt_update_pmtu is called with zero mtu
> and will get clamped to min_pmtu. It is possible to tell people to also
> increase min_pmtu but I don't want to take the risk of breaking other
> applications on a DNS server given how the current semantics are.

It is actually difficult to select a reasonable mtu here given more than two
interfaces with different mtus.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-29  7:36   ` Florian Weimer
@ 2013-10-29 19:38     ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2013-10-29 19:38 UTC (permalink / raw)
  To: fweimer; +Cc: hannes, netdev

From: Florian Weimer <fweimer@redhat.com>
Date: Tue, 29 Oct 2013 08:36:35 +0100

> You could make the path MTU dependent on the protocol (which would
> even be the correct solution from a technical point of view) and use
> validation for TCP and UDP, but that's a fairly invasive change for
> such relatively minor functionality.

The code pretty much already does this, we cannot even find the
exact route without being able to look up the socket right now.
We need the full flow info.

There is a backup path, but you could make the new (non-default)
behavior be to never trust that path for protocols like UDP and
TCP.

I don't buy any of your arguments against my suggestions so far.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-29 12:04   ` Hannes Frederic Sowa
  2013-10-29 12:50     ` Hannes Frederic Sowa
@ 2013-10-30 20:07     ` Hannes Frederic Sowa
  2013-10-30 21:36       ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-30 20:07 UTC (permalink / raw)
  To: David Miller, netdev, fweimer

Hi!

On Tue, Oct 29, 2013 at 01:04:25PM +0100, Hannes Frederic Sowa wrote:
> I really tried hard to find alternatives or even a way to enable
> the protection automatically given that at least unbound does apply
> IP_PMTUDISC_DONT to its sockets already. These are the reasons why I
> came up with the new IP_PMTUDISC_INTERFACE value:

Sorry to bother you but I would really love to hear your feedback on my
reasoning so I can try to come up with a solution you would be happy with.

Thank you,

  Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-30 20:07     ` Hannes Frederic Sowa
@ 2013-10-30 21:36       ` David Miller
  2013-10-30 22:58         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2013-10-30 21:36 UTC (permalink / raw)
  To: hannes; +Cc: netdev, fweimer

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 30 Oct 2013 21:07:25 +0100

> On Tue, Oct 29, 2013 at 01:04:25PM +0100, Hannes Frederic Sowa wrote:
>> I really tried hard to find alternatives or even a way to enable
>> the protection automatically given that at least unbound does apply
>> IP_PMTUDISC_DONT to its sockets already. These are the reasons why I
>> came up with the new IP_PMTUDISC_INTERFACE value:
> 
> Sorry to bother you but I would really love to hear your feedback on my
> reasoning so I can try to come up with a solution you would be happy with.

All I've read is that administrators cannot be relied upon to
configure their systems properly for the requirements they have.

And that is a non-argument for adding this new socket option as far as
I'm concerned.

"I strongly do not trust path MTU information" has a scope as small as
a route or an interface, it doesn't go down to the socket or
application level at all.

Please stop pretending that it does.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-30 21:36       ` David Miller
@ 2013-10-30 22:58         ` Hannes Frederic Sowa
  2013-10-31  4:29           ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-30 22:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, fweimer

On Wed, Oct 30, 2013 at 05:36:08PM -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 30 Oct 2013 21:07:25 +0100
> 
> > On Tue, Oct 29, 2013 at 01:04:25PM +0100, Hannes Frederic Sowa wrote:
> >> I really tried hard to find alternatives or even a way to enable
> >> the protection automatically given that at least unbound does apply
> >> IP_PMTUDISC_DONT to its sockets already. These are the reasons why I
> >> came up with the new IP_PMTUDISC_INTERFACE value:
> > 
> > Sorry to bother you but I would really love to hear your feedback on my
> > reasoning so I can try to come up with a solution you would be happy with.

Thanks for the answer but I still tend to disagree:

> All I've read is that administrators cannot be relied upon to
> configure their systems properly for the requirements they have.
> 
> And that is a non-argument for adding this new socket option as far as
> I'm concerned.

First of, it is pretty hard to do it correct. Before writing this patch I
thought that ip_no_pmtu_disc does not honour pmtu updates at all. But as
soon as one fragmentation needed ICMP enters the box the mtu is reduced
to pmin_mtu. This was very counterintuitive for me (I don't know if this is
actually intended). Also we honour per application settings for TCP and DCCP
sockets, but I totally understand if you take this as a non-argument.

But that is not that important. Documentation can fix this.

> "I strongly do not trust path MTU information" has a scope as small as
> a route or an interface, it doesn't go down to the socket or
> application level at all.

I still disagree here. We can prevent generating UDP fragments if we
lock the mtu on the route, I agree and somehow missed that before (even
though I already used that myself once).

DNS resolver can fallback to TCP for querying where we can honour the
path MTU because it won't do any harm and ensures connectivity.

Also for TCP the socket is matched on the whole 4-tuple and we may
fallback to a 2-tuple lookup on unconnected UDP sockets. The new socket
option would let an application programmer choose to do path mtu discovery
because it knows it will only use a connected socket. On unconnected
sockets one can specify IP_PMTUDISC_INTERFACE to suppress the path MTU
updates and always use the interface MTU without the DF-bit set.

I really am open to suggestions. The socket option could be renamed or
we can transform IP_PMTUDISC_DONT to what I think it actually should
do. Per route or interface settings to enable the acceptance of path
MTU updates per protocol would be ok, too, but more complex.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-30 22:58         ` Hannes Frederic Sowa
@ 2013-10-31  4:29           ` David Miller
  2013-10-31  9:42             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2013-10-31  4:29 UTC (permalink / raw)
  To: hannes; +Cc: netdev, fweimer

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 30 Oct 2013 23:58:51 +0100

> DNS resolver can fallback to TCP for querying where we can honour the
> path MTU because it won't do any harm and ensures connectivity.
> 
> Also for TCP the socket is matched on the whole 4-tuple and we may
> fallback to a 2-tuple lookup on unconnected UDP sockets. The new socket
> option would let an application programmer choose to do path mtu discovery
> because it knows it will only use a connected socket. On unconnected
> sockets one can specify IP_PMTUDISC_INTERFACE to suppress the path MTU
> updates and always use the interface MTU without the DF-bit set.

This still sounds like a route scope policy with two binary states,
one for connected sockets and one for unconnected ones.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-31  4:29           ` David Miller
@ 2013-10-31  9:42             ` Hannes Frederic Sowa
  2013-11-04 23:25               ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-31  9:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, fweimer

On Thu, Oct 31, 2013 at 12:29:11AM -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 30 Oct 2013 23:58:51 +0100
> 
> > DNS resolver can fallback to TCP for querying where we can honour the
> > path MTU because it won't do any harm and ensures connectivity.
> > 
> > Also for TCP the socket is matched on the whole 4-tuple and we may
> > fallback to a 2-tuple lookup on unconnected UDP sockets. The new socket
> > option would let an application programmer choose to do path mtu discovery
> > because it knows it will only use a connected socket. On unconnected
> > sockets one can specify IP_PMTUDISC_INTERFACE to suppress the path MTU
> > updates and always use the interface MTU without the DF-bit set.
> 
> This still sounds like a route scope policy with two binary states,
> one for connected sockets and one for unconnected ones.

Orthogonally a user may want to decide per protocol. Path MTU discovery
may happen on protocols where I have protection against fragment poisoning
because of the TCP 3-WHS and want to drop path mtu discovery information
on simple UDP request-response protocols even if they are used in
connected state (this is the specific case I want to protect in DNS).

Do we accept path MTU information if we just send a spoofed TCP SYN
request and then just fire a spoofed ICMP path mtu packet afterwards? I
guess (I really have not checked), yes. If the UDP stack would use this
information we again generate fragments and are vulnerable to UDP based
cache poisoning attacks. So when can we really say we can match and
trust the full socket ID?

Please note, dropping the path MTU on those sockets is merely an
optimization.  All I care about is that we don't fragment packets locally
and have the possibility to not set the DF-bit (so IP_PMTUDISC_PROBE
without DF). I could very well remove the ip_sk_accept_pmtu logic from
the patch.

I still think this patch is the reasonable way to solve this problem. If
you still think it should be on a per-route basis I will look down on
how this can be achieved but would focus on multiplexing the MTU per
protocol somehow.

Please let me know!

Thank you,

  Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-10-31  9:42             ` Hannes Frederic Sowa
@ 2013-11-04 23:25               ` Hannes Frederic Sowa
  2013-11-05  0:52                 ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-11-04 23:25 UTC (permalink / raw)
  To: David Miller, netdev, fweimer

Hi David!

On Thu, Oct 31, 2013 at 10:42:26AM +0100, Hannes Frederic Sowa wrote:
> On Thu, Oct 31, 2013 at 12:29:11AM -0400, David Miller wrote:
> > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Date: Wed, 30 Oct 2013 23:58:51 +0100
> > 
> > > DNS resolver can fallback to TCP for querying where we can honour the
> > > path MTU because it won't do any harm and ensures connectivity.
> > > 
> > > Also for TCP the socket is matched on the whole 4-tuple and we may
> > > fallback to a 2-tuple lookup on unconnected UDP sockets. The new socket
> > > option would let an application programmer choose to do path mtu discovery
> > > because it knows it will only use a connected socket. On unconnected
> > > sockets one can specify IP_PMTUDISC_INTERFACE to suppress the path MTU
> > > updates and always use the interface MTU without the DF-bit set.
> > 
> > This still sounds like a route scope policy with two binary states,
> > one for connected sockets and one for unconnected ones.
> 
> Orthogonally a user may want to decide per protocol. Path MTU discovery
> may happen on protocols where I have protection against fragment poisoning
> because of the TCP 3-WHS and want to drop path mtu discovery information
> on simple UDP request-response protocols even if they are used in
> connected state (this is the specific case I want to protect in DNS).
> 
> Do we accept path MTU information if we just send a spoofed TCP SYN
> request and then just fire a spoofed ICMP path mtu packet afterwards? I
> guess (I really have not checked), yes. If the UDP stack would use this
> information we again generate fragments and are vulnerable to UDP based
> cache poisoning attacks. So when can we really say we can match and
> trust the full socket ID?
> 
> Please note, dropping the path MTU on those sockets is merely an
> optimization.  All I care about is that we don't fragment packets locally
> and have the possibility to not set the DF-bit (so IP_PMTUDISC_PROBE
> without DF). I could very well remove the ip_sk_accept_pmtu logic from
> the patch.
> 
> I still think this patch is the reasonable way to solve this problem. If
> you still think it should be on a per-route basis I will look down on
> how this can be achieved but would focus on multiplexing the MTU per
> protocol somehow.

Sorry for being so pushy about this patch! :|

I did a proof of concept of a per-route setting to suppress local
datagram fragmentation. It does work and would solve the problem. I did
not find a clean way to deal with sockets already using IP_MTU_DISCOVER,
so I just overwrite the setting and fallback to the safe mode and don't
allow local fragmentation (but I don't touch the DF-flag). This could have
implications to applications which already set a policy for their sockets
and expect a certain behaviour (e.g. traceroute in some circumstances). I
don't like the approach, but I would like to see this problem solved. If
you would accept this patch I would retest and do a clean submission
(with the iproute2 bits) then. UI currently looks like this:

# ip r change default via 10.0.0.254 dgram_dont_local_frag 1

After all, I still think this should be a per-application/per-socket
setting because the application must deal with the fact that no local
fragmentation is possible so it has to be more sensitive to EMSGSIZE
errors. DNS has to negotiate the UDP buffer size with its peers already
(with the help of EDNS), so there is already logic to deal with the
size of fragments and so it should not be done transparently IMHO.

With the per-route setting DNS implementations can work around this by
querying the IP_MTU or handle the socket errors but I don't know about
other UDP applications on the same host. Everything is ok as long as
the DNS server is the only application on that system.

This is not the same as e.g. TCP_QUICKACK which can transparently be
dealt with by applications and thus makes sense to be a per-route
setting. Applications will see different behaviour when using
dgram_dont_frag. This really is no tuning parameter.

I also tried per-protocol path MTUs and abandoned the patch earlier
today. It got too complex and had problems with how to emulate a common
mtu for old iproutes.

Also, I would be happy to support this patch and the per-socket approach.

I would love to hear your feedback.

diff --git a/include/net/route.h b/include/net/route.h
index dd4ae00..3f6b04b 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -313,12 +313,23 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 	return hoplimit;
 }
 
+static inline bool ip_dgram_local_fragment(const struct dst_entry *dst,
+				      const struct sock *sk)
+{
+	if (sk->sk_protocol == IPPROTO_UDP &&
+	    dst_metric(dst, RTAX_DGRAM_DONT_LOCAL_FRAG))
+		return false;
+	return true;
+}
+
 static inline int ip_skb_dst_mtu(struct sk_buff *skb)
 {
 	struct inet_sock *inet = skb->sk ? inet_sk(skb->sk) : NULL;
+	struct dst_entry *dst = skb_dst(skb);
 
-	return (inet && inet->pmtudisc == IP_PMTUDISC_PROBE) ?
-	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
+	return (inet && (inet->pmtudisc == IP_PMTUDISC_PROBE ||
+			 !ip_dgram_local_fragment(dst, skb->sk))) ?
+	       dst->dev->mtu : dst_mtu(dst);
 }
 
 #endif	/* _ROUTE_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index eb0f1a5..2e450f3 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -388,6 +388,8 @@ enum {
 #define RTAX_INITRWND RTAX_INITRWND
 	RTAX_QUICKACK,
 #define RTAX_QUICKACK RTAX_QUICKACK
+	RTAX_DGRAM_DONT_LOCAL_FRAG,
+#define RTAX_DGRAM_DONT_LOCAL_FRAG RTAX_DGRAM_DONT_LOCAL_FRAG
 	__RTAX_MAX
 };
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 51be64e..661c618 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -823,7 +823,8 @@ static int __ip_append_data(struct sock *sk,
 
 	fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
 	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
-	maxnonfragsize = (inet->pmtudisc >= IP_PMTUDISC_DO) ?
+	maxnonfragsize = ((inet->pmtudisc >= IP_PMTUDISC_DO) ||
+			  !ip_dgram_local_fragment(&rt->dst, sk)) ?
 			 mtu : 0xFFFF;
 
 	if (cork->length + length > maxnonfragsize - fragheaderlen) {
@@ -1063,7 +1064,8 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	 * We steal reference to this route, caller should not release it
 	 */
 	*rtp = NULL;
-	cork->fragsize = inet->pmtudisc == IP_PMTUDISC_PROBE ?
+	cork->fragsize = ((inet->pmtudisc == IP_PMTUDISC_PROBE) ||
+			  !ip_dgram_local_fragment(&rt->dst, sk)) ?
 			 rt->dst.dev->mtu : dst_mtu(&rt->dst);
 	cork->dst = &rt->dst;
 	cork->length = 0;
@@ -1148,8 +1150,9 @@ ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
 
 	fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
 	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen;
-	maxnonfragsize = (inet->pmtudisc >= IP_PMTUDISC_DO) ?
-			 mtu : 0xFFFF;
+	maxnonfragsize = ((inet->pmtudisc >= IP_PMTUDISC_DO) ||
+			  !ip_dgram_local_fragment(&rt->dst, sk)) ?
+		         mtu : 0xFFFF;
 
 	if (cork->length + size > maxnonfragsize - fragheaderlen) {
 		ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, mtu);
@@ -1309,7 +1312,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	 * to fragment the frame generated here. No matter, what transforms
 	 * how transforms change size of the packet, it will come out.
 	 */
-	if (inet->pmtudisc < IP_PMTUDISC_DO)
+	if (inet->pmtudisc < IP_PMTUDISC_DO &&
+	    ip_dgram_local_fragment(&rt->dst, sk))
 		skb->local_df = 1;
 
 	/* DF bit is set when we want to see DF on outgoing frames.

Thank you,

  Hannes

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-11-04 23:25               ` Hannes Frederic Sowa
@ 2013-11-05  0:52                 ` David Miller
  2013-11-05  0:58                   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2013-11-05  0:52 UTC (permalink / raw)
  To: hannes; +Cc: netdev, fweimer

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue, 5 Nov 2013 00:25:41 +0100

> Sorry for being so pushy about this patch! :|

I haven't ruled out applying your original patch, could you please
just repost it so that it gets queued up anew in patchwork?

I'll get to resolving this, please be patient.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-11-05  0:52                 ` David Miller
@ 2013-11-05  0:58                   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-11-05  0:58 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, fweimer

On Mon, Nov 04, 2013 at 07:52:30PM -0500, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Tue, 5 Nov 2013 00:25:41 +0100
> 
> > Sorry for being so pushy about this patch! :|
> 
> I haven't ruled out applying your original patch, could you please
> just repost it so that it gets queued up anew in patchwork?

Ok, will do, thanks!

> I'll get to resolving this, please be patient.

Sorry, I did not want to rush you.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
@ 2013-11-05  1:24 Hannes Frederic Sowa
  2013-11-06  2:57 ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-11-05  1:24 UTC (permalink / raw)
  To: netdev; +Cc: davem, fweimer

Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/net/route.h     | 16 ++++++++++++----
 include/uapi/linux/in.h |  5 +++++
 net/dccp/ipv4.c         |  1 +
 net/ipv4/ip_output.c    |  8 ++++----
 net/ipv4/ip_sockglue.c  |  2 +-
 net/ipv4/route.c        |  4 ++++
 net/ipv4/tcp_ipv4.c     |  1 +
 7 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index dd4ae00..f68c167 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -313,12 +313,20 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 	return hoplimit;
 }
 
-static inline int ip_skb_dst_mtu(struct sk_buff *skb)
+static inline bool ip_sk_accept_pmtu(const struct sock *sk)
 {
-	struct inet_sock *inet = skb->sk ? inet_sk(skb->sk) : NULL;
+	return inet_sk(sk)->pmtudisc != IP_PMTUDISC_INTERFACE;
+}
 
-	return (inet && inet->pmtudisc == IP_PMTUDISC_PROBE) ?
-	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
+static inline bool ip_sk_use_pmtu(const struct sock *sk)
+{
+	return inet_sk(sk)->pmtudisc < IP_PMTUDISC_PROBE;
+}
+
+static inline int ip_skb_dst_mtu(const struct sk_buff *skb)
+{
+	return (!skb->sk || ip_sk_use_pmtu(skb->sk)) ?
+	       dst_mtu(skb_dst(skb)) : skb_dst(skb)->dev->mtu;
 }
 
 #endif	/* _ROUTE_H */
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index f9e8e49..393c5de 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -115,6 +115,11 @@ struct in_addr {
 #define IP_PMTUDISC_WANT		1	/* Use per route hints	*/
 #define IP_PMTUDISC_DO			2	/* Always DF		*/
 #define IP_PMTUDISC_PROBE		3       /* Ignore dst pmtu      */
+/* Always use interface mtu (ignores dst pmtu) but don't set DF flag.
+ * Also incoming ICMP frag_needed notifications will be ignored on
+ * this socket to prevent accepting spoofed ones.
+ */
+#define IP_PMTUDISC_INTERFACE		4
 
 #define IP_MULTICAST_IF			32
 #define IP_MULTICAST_TTL 		33
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 720c362..d9f65fc 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -174,6 +174,7 @@ static inline void dccp_do_pmtu_discovery(struct sock *sk,
 	mtu = dst_mtu(dst);
 
 	if (inet->pmtudisc != IP_PMTUDISC_DONT &&
+	    ip_sk_accept_pmtu(sk) &&
 	    inet_csk(sk)->icsk_pmtu_cookie > mtu) {
 		dccp_sync_mss(sk, mtu);
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 51be64e..9124027 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1037,7 +1037,6 @@ error:
 static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 			 struct ipcm_cookie *ipc, struct rtable **rtp)
 {
-	struct inet_sock *inet = inet_sk(sk);
 	struct ip_options_rcu *opt;
 	struct rtable *rt;
 
@@ -1063,8 +1062,8 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	 * We steal reference to this route, caller should not release it
 	 */
 	*rtp = NULL;
-	cork->fragsize = inet->pmtudisc == IP_PMTUDISC_PROBE ?
-			 rt->dst.dev->mtu : dst_mtu(&rt->dst);
+	cork->fragsize = ip_sk_use_pmtu(sk) ?
+			 dst_mtu(&rt->dst) : rt->dst.dev->mtu;
 	cork->dst = &rt->dst;
 	cork->length = 0;
 	cork->ttl = ipc->ttl;
@@ -1315,7 +1314,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	/* DF bit is set when we want to see DF on outgoing frames.
 	 * If local_df is set too, we still allow to fragment this frame
 	 * locally. */
-	if (inet->pmtudisc >= IP_PMTUDISC_DO ||
+	if (inet->pmtudisc == IP_PMTUDISC_DO ||
+	    inet->pmtudisc == IP_PMTUDISC_PROBE ||
 	    (skb->len <= dst_mtu(&rt->dst) &&
 	     ip_dont_fragment(sk, &rt->dst)))
 		df = htons(IP_DF);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 0626f2c..3f85826 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -627,7 +627,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 		inet->nodefrag = val ? 1 : 0;
 		break;
 	case IP_MTU_DISCOVER:
-		if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_PROBE)
+		if (val < IP_PMTUDISC_DONT || val > IP_PMTUDISC_INTERFACE)
 			goto e_inval;
 		inet->pmtudisc = val;
 		break;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d2d3253..f428935 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1036,6 +1036,10 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 	bool new = false;
 
 	bh_lock_sock(sk);
+
+	if (!ip_sk_accept_pmtu(sk))
+		goto out;
+
 	rt = (struct rtable *) __sk_dst_get(sk);
 
 	if (sock_owned_by_user(sk) || !rt) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 300ab2c..14bba8a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -288,6 +288,7 @@ static void tcp_v4_mtu_reduced(struct sock *sk)
 	mtu = dst_mtu(dst);
 
 	if (inet->pmtudisc != IP_PMTUDISC_DONT &&
+	    ip_sk_accept_pmtu(sk) &&
 	    inet_csk(sk)->icsk_pmtu_cookie > mtu) {
 		tcp_sync_mss(sk, mtu);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-11-05  1:24 Hannes Frederic Sowa
@ 2013-11-06  2:57 ` David Miller
  2013-11-06  3:11   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2013-11-06  2:57 UTC (permalink / raw)
  To: hannes; +Cc: netdev, fweimer

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue, 5 Nov 2013 02:24:17 +0100

> Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
> their sockets won't accept and install new path mtu information and they
> will always use the interface mtu for outgoing packets. It is guaranteed
> that the packet is not fragmented locally. But we won't set the DF-Flag
> on the outgoing frames.
> 
> Florian Weimer had the idea to use this flag to ensure DNS servers are
> never generating outgoing fragments. They may well be fragmented on the
> path, but the server never stores or usees path mtu values, which could
> well be forged in an attack.
> 
> (The root of the problem with path MTU discovery is that there is
> no reliable way to authenticate ICMP Fragmentation Needed But DF Set
> messages because they are sent from intermediate routers with their
> source addresses, and the IMCP payload will not always contain sufficient
> information to identify a flow.)
> 
> Recent research in the DNS community showed that it is possible to
> implement an attack where DNS cache poisoning is feasible by spoofing
> fragments. This work was done by Amir Herzberg and Haya Shulman:
> <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>
> 
> This issue was previously discussed among the DNS community, e.g.
> <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
> without leading to fixes.
> 
> This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
> regarding local fragmentation with UFO/CORK" for the enforcement of the
> non-fragmentable checks. If other users than ip_append_page/data should
> use this semantic too, we have to add a new flag to IPCB(skb)->flags to
> suppress local fragmentation and check for this in ip_finish_output.
> 
> Many thanks to Florian Weimer for the idea and feedback while implementing
> this patch.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Suggested-by: Florian Weimer <fweimer@redhat.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Ok I changed my mind and decided to apply this, thanks.

BTW, what about IPV6?  Even if ipv6 doesn't need this facility, for
whatever reason, these IP_PMTUDISC_* values are shared with ipv6
(via the IPV6_MTU_DISCOVER socket option) so we should at least make
sure that we do something reasonable if the new value happens to get
passed in.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
  2013-11-06  2:57 ` David Miller
@ 2013-11-06  3:11   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 17+ messages in thread
From: Hannes Frederic Sowa @ 2013-11-06  3:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, fweimer

On Tue, Nov 05, 2013 at 09:57:50PM -0500, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Tue, 5 Nov 2013 02:24:17 +0100
> 
> > Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
> > their sockets won't accept and install new path mtu information and they
> > will always use the interface mtu for outgoing packets. It is guaranteed
> > that the packet is not fragmented locally. But we won't set the DF-Flag
> > on the outgoing frames.
> > 
> > Florian Weimer had the idea to use this flag to ensure DNS servers are
> > never generating outgoing fragments. They may well be fragmented on the
> > path, but the server never stores or usees path mtu values, which could
> > well be forged in an attack.
> > 
> > (The root of the problem with path MTU discovery is that there is
> > no reliable way to authenticate ICMP Fragmentation Needed But DF Set
> > messages because they are sent from intermediate routers with their
> > source addresses, and the IMCP payload will not always contain sufficient
> > information to identify a flow.)
> > 
> > Recent research in the DNS community showed that it is possible to
> > implement an attack where DNS cache poisoning is feasible by spoofing
> > fragments. This work was done by Amir Herzberg and Haya Shulman:
> > <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>
> > 
> > This issue was previously discussed among the DNS community, e.g.
> > <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
> > without leading to fixes.
> > 
> > This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
> > regarding local fragmentation with UFO/CORK" for the enforcement of the
> > non-fragmentable checks. If other users than ip_append_page/data should
> > use this semantic too, we have to add a new flag to IPCB(skb)->flags to
> > suppress local fragmentation and check for this in ip_finish_output.
> > 
> > Many thanks to Florian Weimer for the idea and feedback while implementing
> > this patch.
> > 
> > Cc: David S. Miller <davem@davemloft.net>
> > Suggested-by: Florian Weimer <fweimer@redhat.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> 
> Ok I changed my mind and decided to apply this, thanks.
> 
> BTW, what about IPV6?  Even if ipv6 doesn't need this facility, for
> whatever reason, these IP_PMTUDISC_* values are shared with ipv6
> (via the IPV6_MTU_DISCOVER socket option) so we should at least make
> sure that we do something reasonable if the new value happens to get
> passed in.

Of course I will look after IPv6. ;)

IPPROTO_IP setsockopt level does change the value of inet_sk->pmtudisc.
The IPPROTO_IPV6 would change the pmtudisc in ipv6_pinfo. So we currently
don't accept this option on IPv6 sockets (-EINVAL) and when applied with
IPPROTO_IP level it will change the sockets behaviour for IPv4 mapped
sockets only. So nothing will break with this change.

It should be easy to do and I plan to send the changes this week. pmtudisc
in ipv6_pinfo has only 2 bits so I will have to increase the socket size
a bit.

Thank you,

  Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-11-06  3:11 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-26 20:11 [PATCH net-next] ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE Hannes Frederic Sowa
2013-10-29  4:08 ` David Miller
2013-10-29  7:36   ` Florian Weimer
2013-10-29 19:38     ` David Miller
2013-10-29 12:04   ` Hannes Frederic Sowa
2013-10-29 12:50     ` Hannes Frederic Sowa
2013-10-30 20:07     ` Hannes Frederic Sowa
2013-10-30 21:36       ` David Miller
2013-10-30 22:58         ` Hannes Frederic Sowa
2013-10-31  4:29           ` David Miller
2013-10-31  9:42             ` Hannes Frederic Sowa
2013-11-04 23:25               ` Hannes Frederic Sowa
2013-11-05  0:52                 ` David Miller
2013-11-05  0:58                   ` Hannes Frederic Sowa
  -- strict thread matches above, loose matches on Subject: below --
2013-11-05  1:24 Hannes Frederic Sowa
2013-11-06  2:57 ` David Miller
2013-11-06  3:11   ` Hannes Frederic Sowa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).