netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PMTU discovery is broken on kernel 3.7.1 for UDP sockets
@ 2012-12-19 13:10 Yurij M. Plotnikov
  2012-12-19 13:35 ` Ben Hutchings
  0 siblings, 1 reply; 24+ messages in thread
From: Yurij M. Plotnikov @ 2012-12-19 13:10 UTC (permalink / raw)
  To: netdev, Ben Hutchings, Alexandra N. Kossovsky

On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket 
option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT 
values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the 
same and packet is always sent with "Don't Fragment" bit in case of 
IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.

This can be reproduced with 3 hosts configuration. Let it be the hosts: 
host_A, host_B and host _C. host_A via interface eth1 connected with 
host_B via intefaces eth1. Let host_C via interface eth1 connected with 
host_B via interface eth2. Also Lets address 10.0.1.1/24 is assigned to 
eth1 on host_A; 10.0.1.2/24 is assigned to eth1 on host_B; 10.0.2.1/24 
is assigned to eth2 on host_B; 10.0.2.2/24 is assigned to eth1 on 
host_C. Also there are two routes: "10.0.2.2 via 10.0.1.2 dev eth1" on 
host_A and "10.0.1.1 via 10.0.2.1 dev eth1" on host_C. Also forwarding 
is on on host_B. So we have the following picture:

host_A-eth1(10.0.1.1)<-->(10.0.1.2)eth1-host_B-eth2(10.0.2.1)<-->(10.0.2.2)eth1-host_C

MTU is equal to 1500 on all involved interfaces. Then we make the 
followign steps:

on host_A:
1. socket(SOCK_DGRAM) -> 6
2. bind(6, 10.0.1.1:25630) -> 0
on host_C:
3. socket(SOCK_DGRAM) -> 5
4. bind(5, 10.0.2.2:25631) -> 0
on host_A:
5. connect(6, 10.0.2.2:25631) -> 0
on host_C:
6. connect(5, 10.0.1.1:25630) -> 0
on host_A
7. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500
8. getsockopt(6,IP_MTU_DISCOVER) -> 0 // Returns that default value is 
IP_PMTUDISC_WANT

On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750. 
Wait for a while.

9. send(6, lenght=1400) -> 1400 // the packet is sent with "Don't 
Fragment" bit, tcpdump on eth1 on host_B shows it
10. sleep(5);
11. send(6, length=1400) -> -1 with EMSGSIZE
12. sleep(5);
13. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 once again. So 
value is not updated.
14. send(6, lenght=1400) -> 1400 // the packet one again is sent with 
"Don't Fragment" bit, tcpdump on eth1 on host_B shows it

So "Don't Fragment" bit is always set for the packets in case when value 
of IP_MTU_DISCOVER is IP_PMTUDISC_WANT. If at step 8 we change 
IP_MTU_DISCOVER value from IP_PMTUDISC_WANT to IP_PMTUDISC_DO we have 
the same picture. The value of IP_MTU socket options is still 1500 at 
step 13 in this case.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-19 13:10 PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov
@ 2012-12-19 13:35 ` Ben Hutchings
  2012-12-19 14:27   ` Yurij M. Plotnikov
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Hutchings @ 2012-12-19 13:35 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: netdev, Alexandra N. Kossovsky

On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket 
> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT 
> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the 
> same and packet is always sent with "Don't Fragment" bit in case of 
> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.

You could try reverting:

commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
Author: Steffen Klassert <steffen.klassert@secunet.com>
Date:   Mon Oct 8 00:56:54 2012 +0000

    ipv4: Don't report stale pmtu values to userspace
    
    We report cached pmtu values even if they are already expired.
    Change this to not report these values after they are expired
    and fix a race in the expire time calculation, as suggested by
    Eric Dumazet.

Still, PMTU information is not supposed to expire for 10 minutes...

[...]
> On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750. 
> Wait for a while.
> 
> 9. send(6, lenght=1400) -> 1400 // the packet is sent with "Don't 
> Fragment" bit, tcpdump on eth1 on host_B shows it
> 10. sleep(5);
> 11. send(6, length=1400) -> -1 with EMSGSIZE
> 12. sleep(5);
> 13. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 once again. So 
> value is not updated.
[...]

What if you read this option immediately before the sleep(5)?

Ben.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-19 13:35 ` Ben Hutchings
@ 2012-12-19 14:27   ` Yurij M. Plotnikov
  2012-12-19 19:37     ` Ben Hutchings
  0 siblings, 1 reply; 24+ messages in thread
From: Yurij M. Plotnikov @ 2012-12-19 14:27 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, Alexandra N. Kossovsky

On 12/19/12 17:35, Ben Hutchings wrote:
> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
>    
>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
>> same and packet is always sent with "Don't Fragment" bit in case of
>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
>>      
> You could try reverting:
>
> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
> Author: Steffen Klassert<steffen.klassert@secunet.com>
> Date:   Mon Oct 8 00:56:54 2012 +0000
>
>      ipv4: Don't report stale pmtu values to userspace
>
>      We report cached pmtu values even if they are already expired.
>      Change this to not report these values after they are expired
>      and fix a race in the expire time calculation, as suggested by
>      Eric Dumazet.
>
> Still, PMTU information is not supposed to expire for 10 minutes...
>
>    
With reverted commit there is no such problem on 3.7.1: IP_MTU is 
updated and DF is set only for the first packet in case of 
IP_PMTUDISC_WANT.
> [...]
>    
>> On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750.
>> Wait for a while.
>>
>> 9. send(6, lenght=1400) ->  1400 // the packet is sent with "Don't
>> Fragment" bit, tcpdump on eth1 on host_B shows it
>> 10. sleep(5);
>> 11. send(6, length=1400) ->  -1 with EMSGSIZE
>> 12. sleep(5);
>> 13. getsockopt(6,IP_MTU) ->  0 // Returns that MTU is 1500 once again. So
>> value is not updated.
>>      
> [...]
>
> What if you read this option immediately before the sleep(5)?
>    
It still returns that MTU is 1500.

Yurij.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-19 14:27   ` Yurij M. Plotnikov
@ 2012-12-19 19:37     ` Ben Hutchings
  2012-12-20  7:14       ` Yurij M. Plotnikov
  2012-12-20  7:34       ` Steffen Klassert
  0 siblings, 2 replies; 24+ messages in thread
From: Ben Hutchings @ 2012-12-19 19:37 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: netdev, Alexandra N. Kossovsky

On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
> On 12/19/12 17:35, Ben Hutchings wrote:
> > On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
> >    
> >> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
> >> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
> >> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
> >> same and packet is always sent with "Don't Fragment" bit in case of
> >> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
> >>      
> > You could try reverting:
> >
> > commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
> > Author: Steffen Klassert<steffen.klassert@secunet.com>
> > Date:   Mon Oct 8 00:56:54 2012 +0000
> >
> >      ipv4: Don't report stale pmtu values to userspace
> >
> >      We report cached pmtu values even if they are already expired.
> >      Change this to not report these values after they are expired
> >      and fix a race in the expire time calculation, as suggested by
> >      Eric Dumazet.
> >
> > Still, PMTU information is not supposed to expire for 10 minutes...
> >
> >    
> With reverted commit there is no such problem on 3.7.1: IP_MTU is 
> updated and DF is set only for the first packet in case of 
> IP_PMTUDISC_WANT.
[...]

So it looks like something is going wrong with the expiry calculation
here.

This change shouldn't affect the PMTU actually used by the kernel, but
could affect Onload since that relies on netlink route updates to keep
in synch.  You didn't say you were using Onload, but if you are then we
should not bother netdev with this until we can demonstrate a problem
that involves only the kernel stack.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-19 19:37     ` Ben Hutchings
@ 2012-12-20  7:14       ` Yurij M. Plotnikov
  2012-12-20  7:34       ` Steffen Klassert
  1 sibling, 0 replies; 24+ messages in thread
From: Yurij M. Plotnikov @ 2012-12-20  7:14 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, Alexandra N. Kossovsky

On 12/19/12 23:37, Ben Hutchings wrote:
> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
>    
>> On 12/19/12 17:35, Ben Hutchings wrote:
>>      
>>> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
>>>
>>>        
>>>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
>>>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
>>>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
>>>> same and packet is always sent with "Don't Fragment" bit in case of
>>>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
>>>>
>>>>          
>>> You could try reverting:
>>>
>>> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
>>> Author: Steffen Klassert<steffen.klassert@secunet.com>
>>> Date:   Mon Oct 8 00:56:54 2012 +0000
>>>
>>>       ipv4: Don't report stale pmtu values to userspace
>>>
>>>       We report cached pmtu values even if they are already expired.
>>>       Change this to not report these values after they are expired
>>>       and fix a race in the expire time calculation, as suggested by
>>>       Eric Dumazet.
>>>
>>> Still, PMTU information is not supposed to expire for 10 minutes...
>>>
>>>
>>>        
>> With reverted commit there is no such problem on 3.7.1: IP_MTU is
>> updated and DF is set only for the first packet in case of
>> IP_PMTUDISC_WANT.
>>      
> [...]
>
> So it looks like something is going wrong with the expiry calculation
> here.
>
> This change shouldn't affect the PMTU actually used by the kernel, but
> could affect Onload since that relies on netlink route updates to keep
> in synch.  You didn't say you were using Onload, but if you are then we
> should not bother netdev with this until we can demonstrate a problem
> that involves only the kernel stack.
>
>    
The results were obtained on pure Linux kernel without using Onload.

Yurij.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-19 19:37     ` Ben Hutchings
  2012-12-20  7:14       ` Yurij M. Plotnikov
@ 2012-12-20  7:34       ` Steffen Klassert
  2012-12-20 11:22         ` Yurij M. Plotnikov
  1 sibling, 1 reply; 24+ messages in thread
From: Steffen Klassert @ 2012-12-20  7:34 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Yurij M. Plotnikov, netdev, Alexandra N. Kossovsky

On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote:
> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
> > On 12/19/12 17:35, Ben Hutchings wrote:
> > > On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
> > >    
> > >> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
> > >> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
> > >> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
> > >> same and packet is always sent with "Don't Fragment" bit in case of
> > >> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
> > >>      
> > > You could try reverting:
> > >
> > > commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
> > > Author: Steffen Klassert<steffen.klassert@secunet.com>
> > > Date:   Mon Oct 8 00:56:54 2012 +0000
> > >
> > >      ipv4: Don't report stale pmtu values to userspace
> > >
> > >      We report cached pmtu values even if they are already expired.
> > >      Change this to not report these values after they are expired
> > >      and fix a race in the expire time calculation, as suggested by
> > >      Eric Dumazet.
> > >
> > > Still, PMTU information is not supposed to expire for 10 minutes...
> > >
> > >    
> > With reverted commit there is no such problem on 3.7.1: IP_MTU is 
> > updated and DF is set only for the first packet in case of 
> > IP_PMTUDISC_WANT.
> [...]
> 
> So it looks like something is going wrong with the expiry calculation
> here.
> 
> This change shouldn't affect the PMTU actually used by the kernel, but
> could affect Onload since that relies on netlink route updates to keep
> in synch.  You didn't say you were using Onload, but if you are then we
> should not bother netdev with this until we can demonstrate a problem
> that involves only the kernel stack.
> 

I'm really surprised that this change can have such an effect,
it changes nothing at the kernels pmtu handling. When looking
at the code, I found that we may report a mtu value from a stale
dst_entry when we query the mtu value with the IP_MTU socket
option. But a subsequent send() should update the socket cached
dst_entry, so at most one packet should be affected.

Does the patch below change anything?


diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 3c9d208..1049ce0 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	{
 		struct dst_entry *dst;
 		val = 0;
-		dst = sk_dst_get(sk);
+		dst = sk_dst_check(sk, 0);
 		if (dst) {
 			val = dst_mtu(dst);
 			dst_release(dst);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-20  7:34       ` Steffen Klassert
@ 2012-12-20 11:22         ` Yurij M. Plotnikov
  2012-12-20 12:35           ` Steffen Klassert
  0 siblings, 1 reply; 24+ messages in thread
From: Yurij M. Plotnikov @ 2012-12-20 11:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On 12/20/12 11:34, Steffen Klassert wrote:
> On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote:
>    
>> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote:
>>      
>>> On 12/19/12 17:35, Ben Hutchings wrote:
>>>        
>>>> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote:
>>>>
>>>>          
>>>>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket
>>>>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT
>>>>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the
>>>>> same and packet is always sent with "Don't Fragment" bit in case of
>>>>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated.
>>>>>
>>>>>            
>>>> You could try reverting:
>>>>
>>>> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6
>>>> Author: Steffen Klassert<steffen.klassert@secunet.com>
>>>> Date:   Mon Oct 8 00:56:54 2012 +0000
>>>>
>>>>       ipv4: Don't report stale pmtu values to userspace
>>>>
>>>>       We report cached pmtu values even if they are already expired.
>>>>       Change this to not report these values after they are expired
>>>>       and fix a race in the expire time calculation, as suggested by
>>>>       Eric Dumazet.
>>>>
>>>> Still, PMTU information is not supposed to expire for 10 minutes...
>>>>
>>>>
>>>>          
>>> With reverted commit there is no such problem on 3.7.1: IP_MTU is
>>> updated and DF is set only for the first packet in case of
>>> IP_PMTUDISC_WANT.
>>>        
>> [...]
>>
>> So it looks like something is going wrong with the expiry calculation
>> here.
>>
>> This change shouldn't affect the PMTU actually used by the kernel, but
>> could affect Onload since that relies on netlink route updates to keep
>> in synch.  You didn't say you were using Onload, but if you are then we
>> should not bother netdev with this until we can demonstrate a problem
>> that involves only the kernel stack.
>>
>>      
> I'm really surprised that this change can have such an effect,
> it changes nothing at the kernels pmtu handling. When looking
> at the code, I found that we may report a mtu value from a stale
> dst_entry when we query the mtu value with the IP_MTU socket
> option. But a subsequent send() should update the socket cached
> dst_entry, so at most one packet should be affected.
>
> Does the patch below change anything?
>
>
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 3c9d208..1049ce0 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
>   	{
>   		struct dst_entry *dst;
>   		val = 0;
> -		dst = sk_dst_get(sk);
> +		dst = sk_dst_check(sk, 0);
>   		if (dst) {
>   			val = dst_mtu(dst);
>   			dst_release(dst);
>    
With this patch kernel 3.7.1 works perfect. All described problems are 
fixed.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-20 11:22         ` Yurij M. Plotnikov
@ 2012-12-20 12:35           ` Steffen Klassert
  2012-12-21 10:22             ` Steffen Klassert
  0 siblings, 1 reply; 24+ messages in thread
From: Steffen Klassert @ 2012-12-20 12:35 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote:
> On 12/20/12 11:34, Steffen Klassert wrote:
> >
> >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> >index 3c9d208..1049ce0 100644
> >--- a/net/ipv4/ip_sockglue.c
> >+++ b/net/ipv4/ip_sockglue.c
> >@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
> >  	{
> >  		struct dst_entry *dst;
> >  		val = 0;
> >-		dst = sk_dst_get(sk);
> >+		dst = sk_dst_check(sk, 0);
> >  		if (dst) {
> >  			val = dst_mtu(dst);
> >  			dst_release(dst);
> With this patch kernel 3.7.1 works perfect. All described problems
> are fixed.

Thanks for testing!

I'm not sure if we can't use this as a fix. I think with this patch it
could happen that we return -ENOTCONN instead of a pmtu value on a
connected socket. Perhaps it is better to update the cached dst_entry in
ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some
investigation.

Anyway, it is still odd that reverting my other patch 'fixes'
this issue too.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-20 12:35           ` Steffen Klassert
@ 2012-12-21 10:22             ` Steffen Klassert
  2013-01-14  8:26               ` Yurij M. Plotnikov
  0 siblings, 1 reply; 24+ messages in thread
From: Steffen Klassert @ 2012-12-21 10:22 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On Thu, Dec 20, 2012 at 01:35:35PM +0100, Steffen Klassert wrote:
> On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote:
> > On 12/20/12 11:34, Steffen Klassert wrote:
> > >
> > >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> > >index 3c9d208..1049ce0 100644
> > >--- a/net/ipv4/ip_sockglue.c
> > >+++ b/net/ipv4/ip_sockglue.c
> > >@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
> > >  	{
> > >  		struct dst_entry *dst;
> > >  		val = 0;
> > >-		dst = sk_dst_get(sk);
> > >+		dst = sk_dst_check(sk, 0);
> > >  		if (dst) {
> > >  			val = dst_mtu(dst);
> > >  			dst_release(dst);
> > With this patch kernel 3.7.1 works perfect. All described problems
> > are fixed.
> 
> Thanks for testing!
> 
> I'm not sure if we can't use this as a fix. I think with this patch it
> could happen that we return -ENOTCONN instead of a pmtu value on a
> connected socket. Perhaps it is better to update the cached dst_entry in
> ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some
> investigation.
> 

It turned out that updating the cached dst_entry in ipv4_sk_update_pmtu()
is not trivial. We need to implement proper socket locking and we need
socket release calback functions for all protocols that use
ipv4_sk_update_pmtu(), similar to tcp.

Today is my last office day for this year, so we probably have to defer
a solution to the next year.

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2012-12-21 10:22             ` Steffen Klassert
@ 2013-01-14  8:26               ` Yurij M. Plotnikov
  2013-01-14 12:52                 ` Steffen Klassert
  2013-01-18  8:11                 ` Steffen Klassert
  0 siblings, 2 replies; 24+ messages in thread
From: Yurij M. Plotnikov @ 2013-01-14  8:26 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On 12/21/12 14:22, Steffen Klassert wrote:
> On Thu, Dec 20, 2012 at 01:35:35PM +0100, Steffen Klassert wrote:
>    
>> On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote:
>>      
>>> On 12/20/12 11:34, Steffen Klassert wrote:
>>>        
>>>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>>>> index 3c9d208..1049ce0 100644
>>>> --- a/net/ipv4/ip_sockglue.c
>>>> +++ b/net/ipv4/ip_sockglue.c
>>>> @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
>>>>   	{
>>>>   		struct dst_entry *dst;
>>>>   		val = 0;
>>>> -		dst = sk_dst_get(sk);
>>>> +		dst = sk_dst_check(sk, 0);
>>>>   		if (dst) {
>>>>   			val = dst_mtu(dst);
>>>>   			dst_release(dst);
>>>>          
>>> With this patch kernel 3.7.1 works perfect. All described problems
>>> are fixed.
>>>        
>> Thanks for testing!
>>
>> I'm not sure if we can't use this as a fix. I think with this patch it
>> could happen that we return -ENOTCONN instead of a pmtu value on a
>> connected socket. Perhaps it is better to update the cached dst_entry in
>> ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some
>> investigation.
>>
>>      
> It turned out that updating the cached dst_entry in ipv4_sk_update_pmtu()
> is not trivial. We need to implement proper socket locking and we need
> socket release calback functions for all protocols that use
> ipv4_sk_update_pmtu(), similar to tcp.
>
> Today is my last office day for this year, so we probably have to defer
> a solution to the next year.
>
> Thanks.
>    
Hi Steffen,

Could you, please, tell me is there any news about this bug?

Thanks in advance,
Yurij.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2013-01-14  8:26               ` Yurij M. Plotnikov
@ 2013-01-14 12:52                 ` Steffen Klassert
  2013-01-18  8:11                 ` Steffen Klassert
  1 sibling, 0 replies; 24+ messages in thread
From: Steffen Klassert @ 2013-01-14 12:52 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On Mon, Jan 14, 2013 at 12:26:59PM +0400, Yurij M. Plotnikov wrote:
> 
> Could you, please, tell me is there any news about this bug?
> 

Currently I'm testing some patches.
I'll contact you  when I have something that works,
probably later this week.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2013-01-14  8:26               ` Yurij M. Plotnikov
  2013-01-14 12:52                 ` Steffen Klassert
@ 2013-01-18  8:11                 ` Steffen Klassert
  2013-01-18  8:14                   ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert
                                     ` (3 more replies)
  1 sibling, 4 replies; 24+ messages in thread
From: Steffen Klassert @ 2013-01-18  8:11 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On Mon, Jan 14, 2013 at 12:26:59PM +0400, Yurij M. Plotnikov wrote:
> 
> Could you, please, tell me is there any news about this bug?
> 

Ok, I tried to reconstruct your testcase and I can confirm this bug.
It happens only on the first pmtu event on a given route. It works
if I run this test again. Actually it works sometimes even on the
first pmtu event, so it took me quite a while to understand what's ging on.

This is because standart routes (no pmtu or redirect events happened on
that route) are per cpu, so each cpu has it's own struct rtable. This means
that we do not invalidate the socket cached route if the NET_RX_SOFTIRQ
(which handles the pmtu event) is not served by the same cpu that the
sending socket uses. Exceptional routes (routes where a pmtu or redirect
event occured) are not per cpu, so there is no problem.

I'll send three patches in reply to this mail. Applying patch one and two
should fix the bug you reported. If you use IPsec you need patch three too.
The patches are marked as RFC, I'll do proper patch submission if you can
confirm that they fix this bug.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible
  2013-01-18  8:11                 ` Steffen Klassert
@ 2013-01-18  8:14                   ` Steffen Klassert
  2013-01-18 19:38                     ` David Miller
  2013-01-19  0:54                     ` Julian Anastasov
  2013-01-18  8:15                   ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert
                                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Steffen Klassert @ 2013-01-18  8:14 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

The route lookup in ipv4_sk_update_pmtu() might return a route
different from the route we cached at the socket. This is because
standart routes are per cpu, so each cpu has it's own struct rtable.
This means that we do not invalidate the socket cached route if the
NET_RX_SOFTIRQ is not served by the same cpu that the sending socket
uses. As a result, the cached route reused until we disconnect.

With this patch we invalidate the socket cached route if possible.
If the socket is owened by the user, we can't update the cached
route directly. A followup patch will implement socket release
callback functions for datagram sockets to handle this case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |   40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 259cbee..e59541b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -965,7 +965,7 @@ void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu,
 }
 EXPORT_SYMBOL_GPL(ipv4_update_pmtu);
 
-void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
+static void __ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 {
 	const struct iphdr *iph = (const struct iphdr *) skb->data;
 	struct flowi4 fl4;
@@ -978,6 +978,44 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
 		ip_rt_put(rt);
 	}
 }
+
+void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
+{
+	const struct iphdr *iph = (const struct iphdr *) skb->data;
+	struct flowi4 fl4;
+	struct rtable *rt = (struct rtable *) __sk_dst_get(sk);
+	struct dst_entry *dst;
+
+	bh_lock_sock(sk);
+	if (sock_owned_by_user(sk) || !rt) {
+		__ipv4_sk_update_pmtu(skb, sk, mtu);
+		goto out;
+	}
+
+	__build_flow_key(&fl4, sk, iph, 0, 0, 0, 0, 0);
+
+	if (!__sk_dst_check(sk, 0)) {
+		rt = ip_route_output_flow(sock_net(sk), &fl4, sk);
+		if (IS_ERR(rt))
+			goto out;
+	}
+
+	__ip_rt_update_pmtu((struct rtable *) rt->dst.path, &fl4, mtu);
+
+	dst = dst_check(&rt->dst, 0);
+	if (!dst) {
+		rt = ip_route_output_flow(sock_net(sk), &fl4, sk);
+		if (IS_ERR(rt))
+			goto out;
+
+		dst = &rt->dst;
+	}
+
+	 __sk_dst_set(sk, dst);
+
+out:
+	bh_unlock_sock(sk);
+}
 EXPORT_SYMBOL_GPL(ipv4_sk_update_pmtu);
 
 void ipv4_redirect(struct sk_buff *skb, struct net *net,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets
  2013-01-18  8:11                 ` Steffen Klassert
  2013-01-18  8:14                   ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert
@ 2013-01-18  8:15                   ` Steffen Klassert
  2013-01-18 19:39                     ` David Miller
  2013-01-18  8:16                   ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert
  2013-01-21 11:31                   ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov
  3 siblings, 1 reply; 24+ messages in thread
From: Steffen Klassert @ 2013-01-18  8:15 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

This implements a socket release callback function to check
if the socket cached route got invalid during the time
we owned the socket. The function is used from udp, raw
and ping sockets.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/ip.h    |    2 ++
 net/ipv4/datagram.c |   25 +++++++++++++++++++++++++
 net/ipv4/ping.c     |    1 +
 net/ipv4/raw.c      |    1 +
 net/ipv4/udp.c      |    1 +
 5 files changed, 30 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 0707fb9..a68f838 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -143,6 +143,8 @@ static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4)
 extern int		ip4_datagram_connect(struct sock *sk, 
 					     struct sockaddr *uaddr, int addr_len);
 
+extern void ip4_datagram_release_cb(struct sock *sk);
+
 struct ip_reply_arg {
 	struct kvec iov[1];   
 	int	    flags;
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 424fafb..b28e863 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -85,3 +85,28 @@ out:
 	return err;
 }
 EXPORT_SYMBOL(ip4_datagram_connect);
+
+void ip4_datagram_release_cb(struct sock *sk)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+	const struct ip_options_rcu *inet_opt;
+	__be32 daddr = inet->inet_daddr;
+	struct flowi4 fl4;
+	struct rtable *rt;
+
+	if (! __sk_dst_get(sk) || __sk_dst_check(sk, 0))
+		return;
+
+	rcu_read_lock();
+	inet_opt = rcu_dereference(inet->inet_opt);
+	if (inet_opt && inet_opt->opt.srr)
+		daddr = inet_opt->opt.faddr;
+	rt = ip_route_output_ports(sock_net(sk), &fl4, sk, daddr,
+				   inet->inet_saddr, inet->inet_dport,
+				   inet->inet_sport, sk->sk_protocol,
+				   RT_CONN_FLAGS(sk), sk->sk_bound_dev_if);
+	if (!IS_ERR(rt))
+		__sk_dst_set(sk, &rt->dst);
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(ip4_datagram_release_cb);
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 8f3d054..6f9c072 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -738,6 +738,7 @@ struct proto ping_prot = {
 	.recvmsg =	ping_recvmsg,
 	.bind =		ping_bind,
 	.backlog_rcv =	ping_queue_rcv_skb,
+	.release_cb =	ip4_datagram_release_cb,
 	.hash =		ping_v4_hash,
 	.unhash =	ping_v4_unhash,
 	.get_port =	ping_v4_get_port,
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 73d1e4d..6f08991 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -894,6 +894,7 @@ struct proto raw_prot = {
 	.recvmsg	   = raw_recvmsg,
 	.bind		   = raw_bind,
 	.backlog_rcv	   = raw_rcv_skb,
+	.release_cb	   = ip4_datagram_release_cb,
 	.hash		   = raw_hash_sk,
 	.unhash		   = raw_unhash_sk,
 	.obj_size	   = sizeof(struct raw_sock),
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 79c8dbe..1f4d405 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1952,6 +1952,7 @@ struct proto udp_prot = {
 	.recvmsg	   = udp_recvmsg,
 	.sendpage	   = udp_sendpage,
 	.backlog_rcv	   = __udp_queue_rcv_skb,
+	.release_cb	   = ip4_datagram_release_cb,
 	.hash		   = udp_lib_hash,
 	.unhash		   = udp_lib_unhash,
 	.rehash		   = udp_v4_rehash,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events
  2013-01-18  8:11                 ` Steffen Klassert
  2013-01-18  8:14                   ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert
  2013-01-18  8:15                   ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert
@ 2013-01-18  8:16                   ` Steffen Klassert
  2013-01-18 19:39                     ` David Miller
  2013-01-21 11:31                   ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov
  3 siblings, 1 reply; 24+ messages in thread
From: Steffen Klassert @ 2013-01-18  8:16 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On IPsec pmtu events we can't access the transport headers of
the original packet, so we can't find the socket that sent
the packet. The only chance to notify the socket about the
pmtu change is to force a relookup for all routes. This
patch implenents this for the IPsec protocols.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ah4.c    |    7 +++++--
 net/ipv4/esp4.c   |    7 +++++--
 net/ipv4/ipcomp.c |    7 +++++--
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index a0d8392..612ecc9 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -413,9 +413,12 @@ static void ah4_err(struct sk_buff *skb, u32 info)
 	if (!x)
 		return;
 
-	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) {
+		atomic_inc(&flow_cache_genid);
+		rt_genid_bump(net);
+
 		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_AH, 0);
-	else
+	} else
 		ipv4_redirect(skb, net, 0, 0, IPPROTO_AH, 0);
 	xfrm_state_put(x);
 }
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index b61e9de..2d32ae5 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -499,9 +499,12 @@ static void esp4_err(struct sk_buff *skb, u32 info)
 	if (!x)
 		return;
 
-	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) {
+		atomic_inc(&flow_cache_genid);
+		rt_genid_bump(net);
+
 		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_ESP, 0);
-	else
+	} else
 		ipv4_redirect(skb, net, 0, 0, IPPROTO_ESP, 0);
 	xfrm_state_put(x);
 }
diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
index d3ab47e..9a46dae 100644
--- a/net/ipv4/ipcomp.c
+++ b/net/ipv4/ipcomp.c
@@ -47,9 +47,12 @@ static void ipcomp4_err(struct sk_buff *skb, u32 info)
 	if (!x)
 		return;
 
-	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+	if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) {
+		atomic_inc(&flow_cache_genid);
+		rt_genid_bump(net);
+
 		ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_COMP, 0);
-	else
+	} else
 		ipv4_redirect(skb, net, 0, 0, IPPROTO_COMP, 0);
 	xfrm_state_put(x);
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible
  2013-01-18  8:14                   ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert
@ 2013-01-18 19:38                     ` David Miller
  2013-01-19  0:54                     ` Julian Anastasov
  1 sibling, 0 replies; 24+ messages in thread
From: David Miller @ 2013-01-18 19:38 UTC (permalink / raw)
  To: steffen.klassert; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Fri, 18 Jan 2013 09:14:20 +0100

> The route lookup in ipv4_sk_update_pmtu() might return a route
> different from the route we cached at the socket. This is because
> standart routes are per cpu, so each cpu has it's own struct rtable.
> This means that we do not invalidate the socket cached route if the
> NET_RX_SOFTIRQ is not served by the same cpu that the sending socket
> uses. As a result, the cached route reused until we disconnect.
> 
> With this patch we invalidate the socket cached route if possible.
> If the socket is owened by the user, we can't update the cached
> route directly. A followup patch will implement socket release
> callback functions for datagram sockets to handle this case.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

This looks fine.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets
  2013-01-18  8:15                   ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert
@ 2013-01-18 19:39                     ` David Miller
  0 siblings, 0 replies; 24+ messages in thread
From: David Miller @ 2013-01-18 19:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Fri, 18 Jan 2013 09:15:14 +0100

> This implements a socket release callback function to check
> if the socket cached route got invalid during the time
> we owned the socket. The function is used from udp, raw
> and ping sockets.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Ok.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events
  2013-01-18  8:16                   ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert
@ 2013-01-18 19:39                     ` David Miller
  2013-01-21  6:48                       ` Steffen Klassert
  2013-01-21 12:04                       ` Steffen Klassert
  0 siblings, 2 replies; 24+ messages in thread
From: David Miller @ 2013-01-18 19:39 UTC (permalink / raw)
  To: steffen.klassert; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky

From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Fri, 18 Jan 2013 09:16:01 +0100

> On IPsec pmtu events we can't access the transport headers of
> the original packet, so we can't find the socket that sent
> the packet. The only chance to notify the socket about the
> pmtu change is to force a relookup for all routes. This
> patch implenents this for the IPsec protocols.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Expensive, but I really can't come up with a better idea.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible
  2013-01-18  8:14                   ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert
  2013-01-18 19:38                     ` David Miller
@ 2013-01-19  0:54                     ` Julian Anastasov
  2013-01-21  6:43                       ` Steffen Klassert
  1 sibling, 1 reply; 24+ messages in thread
From: Julian Anastasov @ 2013-01-19  0:54 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Yurij M. Plotnikov, Ben Hutchings, netdev, Alexandra N. Kossovsky


	Hello,

On Fri, 18 Jan 2013, Steffen Klassert wrote:

> The route lookup in ipv4_sk_update_pmtu() might return a route
> different from the route we cached at the socket. This is because
> standart routes are per cpu, so each cpu has it's own struct rtable.
> This means that we do not invalidate the socket cached route if the
> NET_RX_SOFTIRQ is not served by the same cpu that the sending socket
> uses. As a result, the cached route reused until we disconnect.
> 
> With this patch we invalidate the socket cached route if possible.
> If the socket is owened by the user, we can't update the cached
> route directly. A followup patch will implement socket release
> callback functions for datagram sockets to handle this case.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>  net/ipv4/route.c |   40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 259cbee..e59541b 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -965,7 +965,7 @@ void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu,
>  }
>  EXPORT_SYMBOL_GPL(ipv4_update_pmtu);
>  
> -void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
> +static void __ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
>  {
>  	const struct iphdr *iph = (const struct iphdr *) skb->data;
>  	struct flowi4 fl4;
> @@ -978,6 +978,44 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
>  		ip_rt_put(rt);
>  	}
>  }
> +
> +void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
> +{
> +	const struct iphdr *iph = (const struct iphdr *) skb->data;
> +	struct flowi4 fl4;
> +	struct rtable *rt = (struct rtable *) __sk_dst_get(sk);

	I guess __sk_dst_get should be called after
bh_lock_sock ?

> +	struct dst_entry *dst;
> +
> +	bh_lock_sock(sk);
> +	if (sock_owned_by_user(sk) || !rt) {
> +		__ipv4_sk_update_pmtu(skb, sk, mtu);
> +		goto out;
> +	}

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible
  2013-01-19  0:54                     ` Julian Anastasov
@ 2013-01-21  6:43                       ` Steffen Klassert
  0 siblings, 0 replies; 24+ messages in thread
From: Steffen Klassert @ 2013-01-21  6:43 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Yurij M. Plotnikov, Ben Hutchings, netdev, Alexandra N. Kossovsky

On Sat, Jan 19, 2013 at 02:54:04AM +0200, Julian Anastasov wrote:
> > +
> > +void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu)
> > +{
> > +	const struct iphdr *iph = (const struct iphdr *) skb->data;
> > +	struct flowi4 fl4;
> > +	struct rtable *rt = (struct rtable *) __sk_dst_get(sk);
> 
> 	I guess __sk_dst_get should be called after
> bh_lock_sock ?

Yes, absolutely. I'll fix it before I submit the patch.

Thanks for your review!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events
  2013-01-18 19:39                     ` David Miller
@ 2013-01-21  6:48                       ` Steffen Klassert
  2013-01-21 12:04                       ` Steffen Klassert
  1 sibling, 0 replies; 24+ messages in thread
From: Steffen Klassert @ 2013-01-21  6:48 UTC (permalink / raw)
  To: David Miller; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky

On Fri, Jan 18, 2013 at 02:39:23PM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Fri, 18 Jan 2013 09:16:01 +0100
> 
> > On IPsec pmtu events we can't access the transport headers of
> > the original packet, so we can't find the socket that sent
> > the packet. The only chance to notify the socket about the
> > pmtu change is to force a relookup for all routes. This
> > patch implenents this for the IPsec protocols.
> > 
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> Expensive, but I really can't come up with a better idea.

Yes, it's a last resort fix. But pmtu events are rare, so
it should not harm too much.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2013-01-18  8:11                 ` Steffen Klassert
                                     ` (2 preceding siblings ...)
  2013-01-18  8:16                   ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert
@ 2013-01-21 11:31                   ` Yurij M. Plotnikov
  2013-01-21 11:38                     ` Steffen Klassert
  3 siblings, 1 reply; 24+ messages in thread
From: Yurij M. Plotnikov @ 2013-01-21 11:31 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On 01/18/13 12:11, Steffen Klassert wrote:
> On Mon, Jan 14, 2013 at 12:26:59PM +0400, Yurij M. Plotnikov wrote:
>    
>> Could you, please, tell me is there any news about this bug?
>>
>>      
> Ok, I tried to reconstruct your testcase and I can confirm this bug.
> It happens only on the first pmtu event on a given route. It works
> if I run this test again. Actually it works sometimes even on the
> first pmtu event, so it took me quite a while to understand what's ging on.
>
> This is because standart routes (no pmtu or redirect events happened on
> that route) are per cpu, so each cpu has it's own struct rtable. This means
> that we do not invalidate the socket cached route if the NET_RX_SOFTIRQ
> (which handles the pmtu event) is not served by the same cpu that the
> sending socket uses. Exceptional routes (routes where a pmtu or redirect
> event occured) are not per cpu, so there is no problem.
>
> I'll send three patches in reply to this mail. Applying patch one and two
> should fix the bug you reported. If you use IPsec you need patch three too.
> The patches are marked as RFC, I'll do proper patch submission if you can
> confirm that they fix this bug.
>    
Sorry for long delay. I've just checked patches 1+2 and they fix the bug 
for me.

Yurij.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets
  2013-01-21 11:31                   ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov
@ 2013-01-21 11:38                     ` Steffen Klassert
  0 siblings, 0 replies; 24+ messages in thread
From: Steffen Klassert @ 2013-01-21 11:38 UTC (permalink / raw)
  To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky

On Mon, Jan 21, 2013 at 03:31:20PM +0400, Yurij M. Plotnikov wrote:
> On 01/18/13 12:11, Steffen Klassert wrote:
> >
> >I'll send three patches in reply to this mail. Applying patch one and two
> >should fix the bug you reported. If you use IPsec you need patch three too.
> >The patches are marked as RFC, I'll do proper patch submission if you can
> >confirm that they fix this bug.
> Sorry for long delay. I've just checked patches 1+2 and they fix the
> bug for me.
> 

Thanks a lot for reporting and testing!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events
  2013-01-18 19:39                     ` David Miller
  2013-01-21  6:48                       ` Steffen Klassert
@ 2013-01-21 12:04                       ` Steffen Klassert
  1 sibling, 0 replies; 24+ messages in thread
From: Steffen Klassert @ 2013-01-21 12:04 UTC (permalink / raw)
  To: David Miller; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky

On Fri, Jan 18, 2013 at 02:39:23PM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Fri, 18 Jan 2013 09:16:01 +0100
> 
> > On IPsec pmtu events we can't access the transport headers of
> > the original packet, so we can't find the socket that sent
> > the packet. The only chance to notify the socket about the
> > pmtu change is to force a relookup for all routes. This
> > patch implenents this for the IPsec protocols.
> > 
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> Expensive, but I really can't come up with a better idea.

I've just applied this to the ipsec tree.
I'll send a pull request tomorrow.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2013-01-21 12:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-19 13:10 PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov
2012-12-19 13:35 ` Ben Hutchings
2012-12-19 14:27   ` Yurij M. Plotnikov
2012-12-19 19:37     ` Ben Hutchings
2012-12-20  7:14       ` Yurij M. Plotnikov
2012-12-20  7:34       ` Steffen Klassert
2012-12-20 11:22         ` Yurij M. Plotnikov
2012-12-20 12:35           ` Steffen Klassert
2012-12-21 10:22             ` Steffen Klassert
2013-01-14  8:26               ` Yurij M. Plotnikov
2013-01-14 12:52                 ` Steffen Klassert
2013-01-18  8:11                 ` Steffen Klassert
2013-01-18  8:14                   ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert
2013-01-18 19:38                     ` David Miller
2013-01-19  0:54                     ` Julian Anastasov
2013-01-21  6:43                       ` Steffen Klassert
2013-01-18  8:15                   ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert
2013-01-18 19:39                     ` David Miller
2013-01-18  8:16                   ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert
2013-01-18 19:39                     ` David Miller
2013-01-21  6:48                       ` Steffen Klassert
2013-01-21 12:04                       ` Steffen Klassert
2013-01-21 11:31                   ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov
2013-01-21 11:38                     ` Steffen Klassert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).