From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steffen Klassert
Subject: Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
Date: Thu, 20 Dec 2012 08:34:46 +0100
Message-ID: <20121220073445.GM18940@secunet.com>
References: <50D1BCC0.2000208@oktetlabs.ru>
 <1355924119.2676.6.camel@bwh-desktop.uk.solarflarecom.com>
 <50D1CECE.7090706@oktetlabs.ru>
 <1355945864.2676.21.camel@bwh-desktop.uk.solarflarecom.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "Yurij M. Plotnikov", netdev@vger.kernel.org, "Alexandra N. Kossovsky"
To: Ben Hutchings
Return-path:
Received: from a.mx.secunet.com ([195.81.216.161]:40914 "EHLO a.mx.secunet.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750843Ab2LTHes
 (ORCPT ); Thu, 20 Dec 2012 02:34:48 -0500
Content-Disposition: inline
In-Reply-To: <1355945864.2676.21.camel@bwh-desktop.uk.solarflarecom.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote:
> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
> > On 12/19/12 17:35, Ben Hutchings wrote:
> > > On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
> > >
> > >> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
> > >> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
> > >> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
> > >> same and packet is always sent with "Don't Fragment" bit in case of
> > >> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
> > >>
> > > You could try reverting:
> > >
> > > commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
> > > Author: Steffen Klassert
> > > Date: Mon Oct 8 00:56:54 2012 +0000
> > >
> > >     ipv4: Don't report stale pmtu values to userspace
> > >
> > >     We report cached pmtu values even if they are already expired.
> > >     Change this to not report these values after they are expired
> > >     and fix a race in the expire time calculation, as suggested by
> > >     Eric Dumazet.
> > >
> > > Still, PMTU information is not supposed to expire for 10 minutes...
> > >
> > >
> > With reverted commit there is no such problem on 3.7.1: IP_MTU is
> > updated and DF is set only for the first packet in case of
> > IP_PMTUDISC_WANT.
> [...]
>
> So it looks like something is going wrong with the expiry calculation
> here.
>
> This change shouldn't affect the PMTU actually used by the kernel, but
> could affect Onload since that relies on netlink route updates to keep
> in synch. You didn't say you were using Onload, but if you are then we
> should not bother netdev with this until we can demonstrate a problem
> that involves only the kernel stack.
>

I'm really surprised that this change can have such an effect; it changes
nothing in the kernel's pmtu handling.

When looking at the code, I found that we may report an mtu value from
a stale dst_entry when we query the mtu value with the IP_MTU socket
option. But a subsequent send() should update the socket's cached
dst_entry, so at most one packet should be affected.

Does the patch below change anything?

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 3c9d208..1049ce0 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	{
 		struct dst_entry *dst;
 		val = 0;
-		dst = sk_dst_get(sk);
+		dst = sk_dst_check(sk, 0);
 		if (dst) {
 			val = dst_mtu(dst);
 			dst_release(dst);
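
For anyone who wants to check the behaviour from userspace, the following
is a minimal sketch, not the reporter's actual test program. It shows how
the IP_MTU_DISCOVER and IP_MTU socket options discussed in this thread
are set and queried on a SOCK_DGRAM socket. The helper names
(set_pmtu_mode, get_pmtu_mode, get_path_mtu, probe_path_mtu) are made up
for illustration, and the sketch assumes Linux with glibc-style headers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: select the PMTU discovery mode for a socket,
 * e.g. IP_PMTUDISC_WANT or IP_PMTUDISC_DO. Returns 0 or -1. */
int set_pmtu_mode(int fd, int mode)
{
	return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}

/* Hypothetical helper: read back the current PMTU discovery mode,
 * or -1 on error. */
int get_pmtu_mode(int fd)
{
	int mode = -1;
	socklen_t len = sizeof(mode);

	if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) < 0)
		return -1;
	assert(len == sizeof(mode));
	return mode;
}

/* Hypothetical helper: query the path MTU the kernel reports for a
 * connected socket. IP_MTU is only meaningful after connect(); this
 * returns -1 if the getsockopt() call fails. */
int get_path_mtu(int fd)
{
	int mtu = 0;
	socklen_t len = sizeof(mtu);

	if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
		return -1;
	return mtu;
}

/* Hypothetical helper: open a UDP socket, set the discovery mode,
 * connect to ip:port and return the MTU reported at that moment.
 * No datagram is sent, so this only reads what is already cached. */
int probe_path_mtu(const char *ip, unsigned short port, int mode)
{
	struct sockaddr_in peer;
	int fd, mtu;

	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	if (set_pmtu_mode(fd, mode) < 0 ||
	    connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
		close(fd);
		return -1;
	}

	mtu = get_path_mtu(fd);
	close(fd);
	return mtu;
}
```

To observe the reported symptom one would keep the socket open instead of
closing it, send a datagram larger than the path MTU, and call
get_path_mtu() again to see whether the value was updated.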