* PMTU discovery is broken on kernel 3.7.1 for UDP sockets @ 2012-12-19 13:10 Yurij M. Plotnikov 2012-12-19 13:35 ` Ben Hutchings 0 siblings, 1 reply; 24+ messages in thread From: Yurij M. Plotnikov @ 2012-12-19 13:10 UTC (permalink / raw) To: netdev, Ben Hutchings, Alexandra N. Kossovsky On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the same and packet is always sent with "Don't Fragment" bit in case of IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated. This can be reproduced with 3 hosts configuration. Let it be the hosts: host_A, host_B and host _C. host_A via interface eth1 connected with host_B via intefaces eth1. Let host_C via interface eth1 connected with host_B via interface eth2. Also Lets address 10.0.1.1/24 is assigned to eth1 on host_A; 10.0.1.2/24 is assigned to eth1 on host_B; 10.0.2.1/24 is assigned to eth2 on host_B; 10.0.2.2/24 is assigned to eth1 on host_C. Also there are two routes: "10.0.2.2 via 10.0.1.2 dev eth1" on host_A and "10.0.1.1 via 10.0.2.1 dev eth1" on host_C. Also forwarding is on on host_B. So we have the following picture: host_A-eth1(10.0.1.1)<-->(10.0.1.2)eth1-host_B-eth2(10.0.2.1)<-->(10.0.2.2)eth1-host_C MTU is equal to 1500 on all involved interfaces. Then we make the followign steps: on host_A: 1. socket(SOCK_DGRAM) -> 6 2. bind(6, 10.0.1.1:25630) -> 0 on host_C: 3. socket(SOCK_DGRAM) -> 5 4. bind(5, 10.0.2.2:25631) -> 0 on host_A: 5. connect(6, 10.0.2.2:25631) -> 0 on host_C: 6. connect(5, 10.0.1.1:25630) -> 0 on host_A 7. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 8. getsockopt(6,IP_MTU_DISCOVER) -> 0 // Returns that default value is IP_PMTUDISC_WANT On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750. Wait for a while. 9. send(6, lenght=1400) -> 1400 // the packet is sent with "Don't Fragment" bit, tcpdump on eth1 on host_B shows it 10. sleep(5); 11. send(6, length=1400) -> -1 with EMSGSIZE 12. sleep(5); 13. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 once again. So value is not updated. 14. send(6, lenght=1400) -> 1400 // the packet one again is sent with "Don't Fragment" bit, tcpdump on eth1 on host_B shows it So "Don't Fragment" bit is always set for the packets in case when value of IP_MTU_DISCOVER is IP_PMTUDISC_WANT. If at step 8 we change IP_MTU_DISCOVER value from IP_PMTUDISC_WANT to IP_PMTUDISC_DO we have the same picture. The value of IP_MTU socket options is still 1500 at step 13 in this case. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-19 13:10 PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov @ 2012-12-19 13:35 ` Ben Hutchings 2012-12-19 14:27 ` Yurij M. Plotnikov 0 siblings, 1 reply; 24+ messages in thread From: Ben Hutchings @ 2012-12-19 13:35 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: netdev, Alexandra N. Kossovsky On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote: > On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket > option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT > values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the > same and packet is always sent with "Don't Fragment" bit in case of > IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated. You could try reverting: commit ee9a8f7ab2edf801b8b514c310455c94acc232f6 Author: Steffen Klassert <steffen.klassert@secunet.com> Date: Mon Oct 8 00:56:54 2012 +0000 ipv4: Don't report stale pmtu values to userspace We report cached pmtu values even if they are already expired. Change this to not report these values after they are expired and fix a race in the expire time calculation, as suggested by Eric Dumazet. Still, PMTU information is not supposed to expire for 10 minutes... [...] > On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750. > Wait for a while. > > 9. send(6, lenght=1400) -> 1400 // the packet is sent with "Don't > Fragment" bit, tcpdump on eth1 on host_B shows it > 10. sleep(5); > 11. send(6, length=1400) -> -1 with EMSGSIZE > 12. sleep(5); > 13. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 once again. So > value is not updated. [...] What if you read this option immediately before the sleep(5)? Ben. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-19 13:35 ` Ben Hutchings @ 2012-12-19 14:27 ` Yurij M. Plotnikov 2012-12-19 19:37 ` Ben Hutchings 0 siblings, 1 reply; 24+ messages in thread From: Yurij M. Plotnikov @ 2012-12-19 14:27 UTC (permalink / raw) To: Ben Hutchings; +Cc: netdev, Alexandra N. Kossovsky On 12/19/12 17:35, Ben Hutchings wrote: > On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote: > >> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket >> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT >> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the >> same and packet is always sent with "Don't Fragment" bit in case of >> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated. >> > You could try reverting: > > commit ee9a8f7ab2edf801b8b514c310455c94acc232f6 > Author: Steffen Klassert<steffen.klassert@secunet.com> > Date: Mon Oct 8 00:56:54 2012 +0000 > > ipv4: Don't report stale pmtu values to userspace > > We report cached pmtu values even if they are already expired. > Change this to not report these values after they are expired > and fix a race in the expire time calculation, as suggested by > Eric Dumazet. > > Still, PMTU information is not supposed to expire for 10 minutes... > > With reverted commit there is no such problem on 3.7.1: IP_MTU is updated and DF is set only for the first packet in case of IP_PMTUDISC_WANT. > [...] > >> On eth2 on host_B and on eth1 on host_C change MTU from 1500 to 750. >> Wait for a while. >> >> 9. send(6, lenght=1400) -> 1400 // the packet is sent with "Don't >> Fragment" bit, tcpdump on eth1 on host_B shows it >> 10. sleep(5); >> 11. send(6, length=1400) -> -1 with EMSGSIZE >> 12. sleep(5); >> 13. getsockopt(6,IP_MTU) -> 0 // Returns that MTU is 1500 once again. So >> value is not updated. >> > [...] > > What if you read this option immediately before the sleep(5)? > It still returns that MTU is 1500. Yurij. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-19 14:27 ` Yurij M. Plotnikov @ 2012-12-19 19:37 ` Ben Hutchings 2012-12-20 7:14 ` Yurij M. Plotnikov 2012-12-20 7:34 ` Steffen Klassert 0 siblings, 2 replies; 24+ messages in thread From: Ben Hutchings @ 2012-12-19 19:37 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: netdev, Alexandra N. Kossovsky On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote: > On 12/19/12 17:35, Ben Hutchings wrote: > > On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote: > > > >> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket > >> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT > >> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the > >> same and packet is always sent with "Don't Fragment" bit in case of > >> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated. > >> > > You could try reverting: > > > > commit ee9a8f7ab2edf801b8b514c310455c94acc232f6 > > Author: Steffen Klassert<steffen.klassert@secunet.com> > > Date: Mon Oct 8 00:56:54 2012 +0000 > > > > ipv4: Don't report stale pmtu values to userspace > > > > We report cached pmtu values even if they are already expired. > > Change this to not report these values after they are expired > > and fix a race in the expire time calculation, as suggested by > > Eric Dumazet. > > > > Still, PMTU information is not supposed to expire for 10 minutes... > > > > > With reverted commit there is no such problem on 3.7.1: IP_MTU is > updated and DF is set only for the first packet in case of > IP_PMTUDISC_WANT. [...] So it looks like something is going wrong with the expiry calculation here. This change shouldn't affect the PMTU actually used by the kernel, but could affect Onload since that relies on netlink route updates to keep in synch. You didn't say you were using Onload, but if you are then we should not bother netdev with this until we can demonstrate a problem that involves only the kernel stack. Ben. -- Ben Hutchings, Staff Engineer, Solarflare Not speaking for my employer; that's the marketing department's job. They asked us to note that Solarflare product names are trademarked. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-19 19:37 ` Ben Hutchings @ 2012-12-20 7:14 ` Yurij M. Plotnikov 2012-12-20 7:34 ` Steffen Klassert 1 sibling, 0 replies; 24+ messages in thread From: Yurij M. Plotnikov @ 2012-12-20 7:14 UTC (permalink / raw) To: Ben Hutchings; +Cc: netdev, Alexandra N. Kossovsky On 12/19/12 23:37, Ben Hutchings wrote: > On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote: > >> On 12/19/12 17:35, Ben Hutchings wrote: >> >>> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote: >>> >>> >>>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket >>>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT >>>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the >>>> same and packet is always sent with "Don't Fragment" bit in case of >>>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated. >>>> >>>> >>> You could try reverting: >>> >>> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6 >>> Author: Steffen Klassert<steffen.klassert@secunet.com> >>> Date: Mon Oct 8 00:56:54 2012 +0000 >>> >>> ipv4: Don't report stale pmtu values to userspace >>> >>> We report cached pmtu values even if they are already expired. >>> Change this to not report these values after they are expired >>> and fix a race in the expire time calculation, as suggested by >>> Eric Dumazet. >>> >>> Still, PMTU information is not supposed to expire for 10 minutes... >>> >>> >>> >> With reverted commit there is no such problem on 3.7.1: IP_MTU is >> updated and DF is set only for the first packet in case of >> IP_PMTUDISC_WANT. >> > [...] > > So it looks like something is going wrong with the expiry calculation > here. > > This change shouldn't affect the PMTU actually used by the kernel, but > could affect Onload since that relies on netlink route updates to keep > in synch. You didn't say you were using Onload, but if you are then we > should not bother netdev with this until we can demonstrate a problem > that involves only the kernel stack. > > The results were obtained on pure Linux kernel without using Onload. Yurij. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-19 19:37 ` Ben Hutchings 2012-12-20 7:14 ` Yurij M. Plotnikov @ 2012-12-20 7:34 ` Steffen Klassert 2012-12-20 11:22 ` Yurij M. Plotnikov 1 sibling, 1 reply; 24+ messages in thread From: Steffen Klassert @ 2012-12-20 7:34 UTC (permalink / raw) To: Ben Hutchings; +Cc: Yurij M. Plotnikov, netdev, Alexandra N. Kossovsky On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote: > On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote: > > On 12/19/12 17:35, Ben Hutchings wrote: > > > On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote: > > > > > >> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket > > >> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT > > >> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the > > >> same and packet is always sent with "Don't Fragment" bit in case of > > >> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated. > > >> > > > You could try reverting: > > > > > > commit ee9a8f7ab2edf801b8b514c310455c94acc232f6 > > > Author: Steffen Klassert<steffen.klassert@secunet.com> > > > Date: Mon Oct 8 00:56:54 2012 +0000 > > > > > > ipv4: Don't report stale pmtu values to userspace > > > > > > We report cached pmtu values even if they are already expired. > > > Change this to not report these values after they are expired > > > and fix a race in the expire time calculation, as suggested by > > > Eric Dumazet. > > > > > > Still, PMTU information is not supposed to expire for 10 minutes... > > > > > > > > With reverted commit there is no such problem on 3.7.1: IP_MTU is > > updated and DF is set only for the first packet in case of > > IP_PMTUDISC_WANT. > [...] > > So it looks like something is going wrong with the expiry calculation > here. > > This change shouldn't affect the PMTU actually used by the kernel, but > could affect Onload since that relies on netlink route updates to keep > in synch. You didn't say you were using Onload, but if you are then we > should not bother netdev with this until we can demonstrate a problem > that involves only the kernel stack. > I'm really surprised that this change can have such an effect, it changes nothing at the kernels pmtu handling. When looking at the code, I found that we may report a mtu value from a stale dst_entry when we query the mtu value with the IP_MTU socket option. But a subsequent send() should update the socket cached dst_entry, so at most one packet should be affected. Does the patch below change anything? diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 3c9d208..1049ce0 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname, { struct dst_entry *dst; val = 0; - dst = sk_dst_get(sk); + dst = sk_dst_check(sk, 0); if (dst) { val = dst_mtu(dst); dst_release(dst); ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-20 7:34 ` Steffen Klassert @ 2012-12-20 11:22 ` Yurij M. Plotnikov 2012-12-20 12:35 ` Steffen Klassert 0 siblings, 1 reply; 24+ messages in thread From: Yurij M. Plotnikov @ 2012-12-20 11:22 UTC (permalink / raw) To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On 12/20/12 11:34, Steffen Klassert wrote: > On Wed, Dec 19, 2012 at 07:37:44PM +0000, Ben Hutchings wrote: > >> On Wed, 2012-12-19 at 18:27 +0400, Yurij M. Plotnikov wrote: >> >>> On 12/19/12 17:35, Ben Hutchings wrote: >>> >>>> On Wed, 2012-12-19 at 17:10 +0400, Yurij M. Plotnikov wrote: >>>> >>>> >>>>> On kernel 3.7.1 I get strange behaviour of IP_MTU_DISCOVER socket >>>>> option. The behaviour in case of IP_PMTUDISC_DO and IP_PMTUDISC_WANT >>>>> values of IP_MTU_DISCOVER socket option on SOCK_DGRAM socket are the >>>>> same and packet is always sent with "Don't Fragment" bit in case of >>>>> IP_PMTUDISC_WANT. Also, the value of IP_MTU socket option is not updated. >>>>> >>>>> >>>> You could try reverting: >>>> >>>> commit ee9a8f7ab2edf801b8b514c310455c94acc232f6 >>>> Author: Steffen Klassert<steffen.klassert@secunet.com> >>>> Date: Mon Oct 8 00:56:54 2012 +0000 >>>> >>>> ipv4: Don't report stale pmtu values to userspace >>>> >>>> We report cached pmtu values even if they are already expired. >>>> Change this to not report these values after they are expired >>>> and fix a race in the expire time calculation, as suggested by >>>> Eric Dumazet. >>>> >>>> Still, PMTU information is not supposed to expire for 10 minutes... >>>> >>>> >>>> >>> With reverted commit there is no such problem on 3.7.1: IP_MTU is >>> updated and DF is set only for the first packet in case of >>> IP_PMTUDISC_WANT. >>> >> [...] >> >> So it looks like something is going wrong with the expiry calculation >> here. >> >> This change shouldn't affect the PMTU actually used by the kernel, but >> could affect Onload since that relies on netlink route updates to keep >> in synch. You didn't say you were using Onload, but if you are then we >> should not bother netdev with this until we can demonstrate a problem >> that involves only the kernel stack. >> >> > I'm really surprised that this change can have such an effect, > it changes nothing at the kernels pmtu handling. When looking > at the code, I found that we may report a mtu value from a stale > dst_entry when we query the mtu value with the IP_MTU socket > option. But a subsequent send() should update the socket cached > dst_entry, so at most one packet should be affected. > > Does the patch below change anything? > > > diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c > index 3c9d208..1049ce0 100644 > --- a/net/ipv4/ip_sockglue.c > +++ b/net/ipv4/ip_sockglue.c > @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname, > { > struct dst_entry *dst; > val = 0; > - dst = sk_dst_get(sk); > + dst = sk_dst_check(sk, 0); > if (dst) { > val = dst_mtu(dst); > dst_release(dst); > With this patch kernel 3.7.1 works perfect. All described problems are fixed. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-20 11:22 ` Yurij M. Plotnikov @ 2012-12-20 12:35 ` Steffen Klassert 2012-12-21 10:22 ` Steffen Klassert 0 siblings, 1 reply; 24+ messages in thread From: Steffen Klassert @ 2012-12-20 12:35 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote: > On 12/20/12 11:34, Steffen Klassert wrote: > > > >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c > >index 3c9d208..1049ce0 100644 > >--- a/net/ipv4/ip_sockglue.c > >+++ b/net/ipv4/ip_sockglue.c > >@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname, > > { > > struct dst_entry *dst; > > val = 0; > >- dst = sk_dst_get(sk); > >+ dst = sk_dst_check(sk, 0); > > if (dst) { > > val = dst_mtu(dst); > > dst_release(dst); > With this patch kernel 3.7.1 works perfect. All described problems > are fixed. Thanks for testing! I'm not sure if we can't use this as a fix. I think with this patch it could happen that we return -ENOTCONN instead of a pmtu value on a connected socket. Perhaps it is better to update the cached dst_entry in ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some investigation. Anyway, it is still odd that reverting my other patch 'fixes' this issue too. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-20 12:35 ` Steffen Klassert @ 2012-12-21 10:22 ` Steffen Klassert 2013-01-14 8:26 ` Yurij M. Plotnikov 0 siblings, 1 reply; 24+ messages in thread From: Steffen Klassert @ 2012-12-21 10:22 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On Thu, Dec 20, 2012 at 01:35:35PM +0100, Steffen Klassert wrote: > On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote: > > On 12/20/12 11:34, Steffen Klassert wrote: > > > > > >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c > > >index 3c9d208..1049ce0 100644 > > >--- a/net/ipv4/ip_sockglue.c > > >+++ b/net/ipv4/ip_sockglue.c > > >@@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname, > > > { > > > struct dst_entry *dst; > > > val = 0; > > >- dst = sk_dst_get(sk); > > >+ dst = sk_dst_check(sk, 0); > > > if (dst) { > > > val = dst_mtu(dst); > > > dst_release(dst); > > With this patch kernel 3.7.1 works perfect. All described problems > > are fixed. > > Thanks for testing! > > I'm not sure if we can't use this as a fix. I think with this patch it > could happen that we return -ENOTCONN instead of a pmtu value on a > connected socket. Perhaps it is better to update the cached dst_entry in > ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some > investigation. > It turned out that updating the cached dst_entry in ipv4_sk_update_pmtu() is not trivial. We need to implement proper socket locking and we need socket release calback functions for all protocols that use ipv4_sk_update_pmtu(), similar to tcp. Today is my last office day for this year, so we probably have to defer a solution to the next year. Thanks. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2012-12-21 10:22 ` Steffen Klassert @ 2013-01-14 8:26 ` Yurij M. Plotnikov 2013-01-14 12:52 ` Steffen Klassert 2013-01-18 8:11 ` Steffen Klassert 0 siblings, 2 replies; 24+ messages in thread From: Yurij M. Plotnikov @ 2013-01-14 8:26 UTC (permalink / raw) To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On 12/21/12 14:22, Steffen Klassert wrote: > On Thu, Dec 20, 2012 at 01:35:35PM +0100, Steffen Klassert wrote: > >> On Thu, Dec 20, 2012 at 03:22:13PM +0400, Yurij M. Plotnikov wrote: >> >>> On 12/20/12 11:34, Steffen Klassert wrote: >>> >>>> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c >>>> index 3c9d208..1049ce0 100644 >>>> --- a/net/ipv4/ip_sockglue.c >>>> +++ b/net/ipv4/ip_sockglue.c >>>> @@ -1198,7 +1198,7 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname, >>>> { >>>> struct dst_entry *dst; >>>> val = 0; >>>> - dst = sk_dst_get(sk); >>>> + dst = sk_dst_check(sk, 0); >>>> if (dst) { >>>> val = dst_mtu(dst); >>>> dst_release(dst); >>>> >>> With this patch kernel 3.7.1 works perfect. All described problems >>> are fixed. >>> >> Thanks for testing! >> >> I'm not sure if we can't use this as a fix. I think with this patch it >> could happen that we return -ENOTCONN instead of a pmtu value on a >> connected socket. Perhaps it is better to update the cached dst_entry in >> ipv4_sk_update_pmtu() when we receive the -EMSGSIZE. I'll do some >> investigation. >> >> > It turned out that updating the cached dst_entry in ipv4_sk_update_pmtu() > is not trivial. We need to implement proper socket locking and we need > socket release calback functions for all protocols that use > ipv4_sk_update_pmtu(), similar to tcp. > > Today is my last office day for this year, so we probably have to defer > a solution to the next year. > > Thanks. > Hi Steffen, Could you, please, tell me is there any news about this bug? Thanks in advance, Yurij. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2013-01-14 8:26 ` Yurij M. Plotnikov @ 2013-01-14 12:52 ` Steffen Klassert 2013-01-18 8:11 ` Steffen Klassert 1 sibling, 0 replies; 24+ messages in thread From: Steffen Klassert @ 2013-01-14 12:52 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On Mon, Jan 14, 2013 at 12:26:59PM +0400, Yurij M. Plotnikov wrote: > > Could you, please, tell me is there any news about this bug? > Currently I'm testing some patches. I'll contact you when I have something that works, probably later this week. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2013-01-14 8:26 ` Yurij M. Plotnikov 2013-01-14 12:52 ` Steffen Klassert @ 2013-01-18 8:11 ` Steffen Klassert 2013-01-18 8:14 ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert ` (3 more replies) 1 sibling, 4 replies; 24+ messages in thread From: Steffen Klassert @ 2013-01-18 8:11 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On Mon, Jan 14, 2013 at 12:26:59PM +0400, Yurij M. Plotnikov wrote: > > Could you, please, tell me is there any news about this bug? > Ok, I tried to reconstruct your testcase and I can confirm this bug. It happens only on the first pmtu event on a given route. It works if I run this test again. Actually it works sometimes even on the first pmtu event, so it took me quite a while to understand what's ging on. This is because standart routes (no pmtu or redirect events happened on that route) are per cpu, so each cpu has it's own struct rtable. This means that we do not invalidate the socket cached route if the NET_RX_SOFTIRQ (which handles the pmtu event) is not served by the same cpu that the sending socket uses. Exceptional routes (routes where a pmtu or redirect event occured) are not per cpu, so there is no problem. I'll send three patches in reply to this mail. Applying patch one and two should fix the bug you reported. If you use IPsec you need patch three too. The patches are marked as RFC, I'll do proper patch submission if you can confirm that they fix this bug. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible 2013-01-18 8:11 ` Steffen Klassert @ 2013-01-18 8:14 ` Steffen Klassert 2013-01-18 19:38 ` David Miller 2013-01-19 0:54 ` Julian Anastasov 2013-01-18 8:15 ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert ` (2 subsequent siblings) 3 siblings, 2 replies; 24+ messages in thread From: Steffen Klassert @ 2013-01-18 8:14 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky The route lookup in ipv4_sk_update_pmtu() might return a route different from the route we cached at the socket. This is because standart routes are per cpu, so each cpu has it's own struct rtable. This means that we do not invalidate the socket cached route if the NET_RX_SOFTIRQ is not served by the same cpu that the sending socket uses. As a result, the cached route reused until we disconnect. With this patch we invalidate the socket cached route if possible. If the socket is owened by the user, we can't update the cached route directly. A followup patch will implement socket release callback functions for datagram sockets to handle this case. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> --- net/ipv4/route.c | 40 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 39 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 259cbee..e59541b 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -965,7 +965,7 @@ void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu, } EXPORT_SYMBOL_GPL(ipv4_update_pmtu); -void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) +static void __ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) { const struct iphdr *iph = (const struct iphdr *) skb->data; struct flowi4 fl4; @@ -978,6 +978,44 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) ip_rt_put(rt); } } + +void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) +{ + const struct iphdr *iph = (const struct iphdr *) skb->data; + struct flowi4 fl4; + struct rtable *rt = (struct rtable *) __sk_dst_get(sk); + struct dst_entry *dst; + + bh_lock_sock(sk); + if (sock_owned_by_user(sk) || !rt) { + __ipv4_sk_update_pmtu(skb, sk, mtu); + goto out; + } + + __build_flow_key(&fl4, sk, iph, 0, 0, 0, 0, 0); + + if (!__sk_dst_check(sk, 0)) { + rt = ip_route_output_flow(sock_net(sk), &fl4, sk); + if (IS_ERR(rt)) + goto out; + } + + __ip_rt_update_pmtu((struct rtable *) rt->dst.path, &fl4, mtu); + + dst = dst_check(&rt->dst, 0); + if (!dst) { + rt = ip_route_output_flow(sock_net(sk), &fl4, sk); + if (IS_ERR(rt)) + goto out; + + dst = &rt->dst; + } + + __sk_dst_set(sk, dst); + +out: + bh_unlock_sock(sk); +} EXPORT_SYMBOL_GPL(ipv4_sk_update_pmtu); void ipv4_redirect(struct sk_buff *skb, struct net *net, -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible 2013-01-18 8:14 ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert @ 2013-01-18 19:38 ` David Miller 2013-01-19 0:54 ` Julian Anastasov 1 sibling, 0 replies; 24+ messages in thread From: David Miller @ 2013-01-18 19:38 UTC (permalink / raw) To: steffen.klassert; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky From: Steffen Klassert <steffen.klassert@secunet.com> Date: Fri, 18 Jan 2013 09:14:20 +0100 > The route lookup in ipv4_sk_update_pmtu() might return a route > different from the route we cached at the socket. This is because > standart routes are per cpu, so each cpu has it's own struct rtable. > This means that we do not invalidate the socket cached route if the > NET_RX_SOFTIRQ is not served by the same cpu that the sending socket > uses. As a result, the cached route reused until we disconnect. > > With this patch we invalidate the socket cached route if possible. > If the socket is owened by the user, we can't update the cached > route directly. A followup patch will implement socket release > callback functions for datagram sockets to handle this case. > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> This looks fine. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible 2013-01-18 8:14 ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert 2013-01-18 19:38 ` David Miller @ 2013-01-19 0:54 ` Julian Anastasov 2013-01-21 6:43 ` Steffen Klassert 1 sibling, 1 reply; 24+ messages in thread From: Julian Anastasov @ 2013-01-19 0:54 UTC (permalink / raw) To: Steffen Klassert Cc: Yurij M. Plotnikov, Ben Hutchings, netdev, Alexandra N. Kossovsky Hello, On Fri, 18 Jan 2013, Steffen Klassert wrote: > The route lookup in ipv4_sk_update_pmtu() might return a route > different from the route we cached at the socket. This is because > standart routes are per cpu, so each cpu has it's own struct rtable. > This means that we do not invalidate the socket cached route if the > NET_RX_SOFTIRQ is not served by the same cpu that the sending socket > uses. As a result, the cached route reused until we disconnect. > > With this patch we invalidate the socket cached route if possible. > If the socket is owened by the user, we can't update the cached > route directly. A followup patch will implement socket release > callback functions for datagram sockets to handle this case. > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> > --- > net/ipv4/route.c | 40 +++++++++++++++++++++++++++++++++++++++- > 1 file changed, 39 insertions(+), 1 deletion(-) > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index 259cbee..e59541b 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -965,7 +965,7 @@ void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu, > } > EXPORT_SYMBOL_GPL(ipv4_update_pmtu); > > -void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) > +static void __ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) > { > const struct iphdr *iph = (const struct iphdr *) skb->data; > struct flowi4 fl4; > @@ -978,6 +978,44 @@ void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) > ip_rt_put(rt); > } > } > + > +void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) > +{ > + const struct iphdr *iph = (const struct iphdr *) skb->data; > + struct flowi4 fl4; > + struct rtable *rt = (struct rtable *) __sk_dst_get(sk); I guess __sk_dst_get should be called after bh_lock_sock ? > + struct dst_entry *dst; > + > + bh_lock_sock(sk); > + if (sock_owned_by_user(sk) || !rt) { > + __ipv4_sk_update_pmtu(skb, sk, mtu); > + goto out; > + } Regards -- Julian Anastasov <ja@ssi.bg> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible 2013-01-19 0:54 ` Julian Anastasov @ 2013-01-21 6:43 ` Steffen Klassert 0 siblings, 0 replies; 24+ messages in thread From: Steffen Klassert @ 2013-01-21 6:43 UTC (permalink / raw) To: Julian Anastasov Cc: Yurij M. Plotnikov, Ben Hutchings, netdev, Alexandra N. Kossovsky On Sat, Jan 19, 2013 at 02:54:04AM +0200, Julian Anastasov wrote: > > + > > +void ipv4_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, u32 mtu) > > +{ > > + const struct iphdr *iph = (const struct iphdr *) skb->data; > > + struct flowi4 fl4; > > + struct rtable *rt = (struct rtable *) __sk_dst_get(sk); > > I guess __sk_dst_get should be called after > bh_lock_sock ? Yes, absolutely. I'll fix it before I submit the patch. Thanks for your review! ^ permalink raw reply [flat|nested] 24+ messages in thread
* [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets 2013-01-18 8:11 ` Steffen Klassert 2013-01-18 8:14 ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert @ 2013-01-18 8:15 ` Steffen Klassert 2013-01-18 19:39 ` David Miller 2013-01-18 8:16 ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert 2013-01-21 11:31 ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov 3 siblings, 1 reply; 24+ messages in thread From: Steffen Klassert @ 2013-01-18 8:15 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky This implements a socket release callback function to check if the socket cached route got invalid during the time we owned the socket. The function is used from udp, raw and ping sockets. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> --- include/net/ip.h | 2 ++ net/ipv4/datagram.c | 25 +++++++++++++++++++++++++ net/ipv4/ping.c | 1 + net/ipv4/raw.c | 1 + net/ipv4/udp.c | 1 + 5 files changed, 30 insertions(+) diff --git a/include/net/ip.h b/include/net/ip.h index 0707fb9..a68f838 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -143,6 +143,8 @@ static inline struct sk_buff *ip_finish_skb(struct sock *sk, struct flowi4 *fl4) extern int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len); +extern void ip4_datagram_release_cb(struct sock *sk); + struct ip_reply_arg { struct kvec iov[1]; int flags; diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c index 424fafb..b28e863 100644 --- a/net/ipv4/datagram.c +++ b/net/ipv4/datagram.c @@ -85,3 +85,28 @@ out: return err; } EXPORT_SYMBOL(ip4_datagram_connect); + +void ip4_datagram_release_cb(struct sock *sk) +{ + const struct inet_sock *inet = inet_sk(sk); + const struct ip_options_rcu *inet_opt; + __be32 daddr = inet->inet_daddr; + struct flowi4 fl4; + struct rtable *rt; + + if (! __sk_dst_get(sk) || __sk_dst_check(sk, 0)) + return; + + rcu_read_lock(); + inet_opt = rcu_dereference(inet->inet_opt); + if (inet_opt && inet_opt->opt.srr) + daddr = inet_opt->opt.faddr; + rt = ip_route_output_ports(sock_net(sk), &fl4, sk, daddr, + inet->inet_saddr, inet->inet_dport, + inet->inet_sport, sk->sk_protocol, + RT_CONN_FLAGS(sk), sk->sk_bound_dev_if); + if (!IS_ERR(rt)) + __sk_dst_set(sk, &rt->dst); + rcu_read_unlock(); +} +EXPORT_SYMBOL_GPL(ip4_datagram_release_cb); diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 8f3d054..6f9c072 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -738,6 +738,7 @@ struct proto ping_prot = { .recvmsg = ping_recvmsg, .bind = ping_bind, .backlog_rcv = ping_queue_rcv_skb, + .release_cb = ip4_datagram_release_cb, .hash = ping_v4_hash, .unhash = ping_v4_unhash, .get_port = ping_v4_get_port, diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 73d1e4d..6f08991 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -894,6 +894,7 @@ struct proto raw_prot = { .recvmsg = raw_recvmsg, .bind = raw_bind, .backlog_rcv = raw_rcv_skb, + .release_cb = ip4_datagram_release_cb, .hash = raw_hash_sk, .unhash = raw_unhash_sk, .obj_size = sizeof(struct raw_sock), diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 79c8dbe..1f4d405 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1952,6 +1952,7 @@ struct proto udp_prot = { .recvmsg = udp_recvmsg, .sendpage = udp_sendpage, .backlog_rcv = __udp_queue_rcv_skb, + .release_cb = ip4_datagram_release_cb, .hash = udp_lib_hash, .unhash = udp_lib_unhash, .rehash = udp_v4_rehash, -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets 2013-01-18 8:15 ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert @ 2013-01-18 19:39 ` David Miller 0 siblings, 0 replies; 24+ messages in thread From: David Miller @ 2013-01-18 19:39 UTC (permalink / raw) To: steffen.klassert; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky From: Steffen Klassert <steffen.klassert@secunet.com> Date: Fri, 18 Jan 2013 09:15:14 +0100 > This implements a socket release callback function to check > if the socket cached route got invalid during the time > we owned the socket. The function is used from udp, raw > and ping sockets. > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Ok. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events 2013-01-18 8:11 ` Steffen Klassert 2013-01-18 8:14 ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert 2013-01-18 8:15 ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert @ 2013-01-18 8:16 ` Steffen Klassert 2013-01-18 19:39 ` David Miller 2013-01-21 11:31 ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov 3 siblings, 1 reply; 24+ messages in thread From: Steffen Klassert @ 2013-01-18 8:16 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On IPsec pmtu events we can't access the transport headers of the original packet, so we can't find the socket that sent the packet. The only chance to notify the socket about the pmtu change is to force a relookup for all routes. This patch implenents this for the IPsec protocols. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> --- net/ipv4/ah4.c | 7 +++++-- net/ipv4/esp4.c | 7 +++++-- net/ipv4/ipcomp.c | 7 +++++-- 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c index a0d8392..612ecc9 100644 --- a/net/ipv4/ah4.c +++ b/net/ipv4/ah4.c @@ -413,9 +413,12 @@ static void ah4_err(struct sk_buff *skb, u32 info) if (!x) return; - if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) + if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) { + atomic_inc(&flow_cache_genid); + rt_genid_bump(net); + ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_AH, 0); - else + } else ipv4_redirect(skb, net, 0, 0, IPPROTO_AH, 0); xfrm_state_put(x); } diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index b61e9de..2d32ae5 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -499,9 +499,12 @@ static void esp4_err(struct sk_buff *skb, u32 info) if (!x) return; - if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) + if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) { + atomic_inc(&flow_cache_genid); + rt_genid_bump(net); + ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_ESP, 0); - else + } else ipv4_redirect(skb, net, 0, 0, IPPROTO_ESP, 0); xfrm_state_put(x); } diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c index d3ab47e..9a46dae 100644 --- a/net/ipv4/ipcomp.c +++ b/net/ipv4/ipcomp.c @@ -47,9 +47,12 @@ static void ipcomp4_err(struct sk_buff *skb, u32 info) if (!x) return; - if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) + if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH) { + atomic_inc(&flow_cache_genid); + rt_genid_bump(net); + ipv4_update_pmtu(skb, net, info, 0, 0, IPPROTO_COMP, 0); - else + } else ipv4_redirect(skb, net, 0, 0, IPPROTO_COMP, 0); xfrm_state_put(x); } -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events 2013-01-18 8:16 ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert @ 2013-01-18 19:39 ` David Miller 2013-01-21 6:48 ` Steffen Klassert 2013-01-21 12:04 ` Steffen Klassert 0 siblings, 2 replies; 24+ messages in thread From: David Miller @ 2013-01-18 19:39 UTC (permalink / raw) To: steffen.klassert; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky From: Steffen Klassert <steffen.klassert@secunet.com> Date: Fri, 18 Jan 2013 09:16:01 +0100 > On IPsec pmtu events we can't access the transport headers of > the original packet, so we can't find the socket that sent > the packet. The only chance to notify the socket about the > pmtu change is to force a relookup for all routes. This > patch implenents this for the IPsec protocols. > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Expensive, but I really can't come up with a better idea. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events 2013-01-18 19:39 ` David Miller @ 2013-01-21 6:48 ` Steffen Klassert 2013-01-21 12:04 ` Steffen Klassert 1 sibling, 0 replies; 24+ messages in thread From: Steffen Klassert @ 2013-01-21 6:48 UTC (permalink / raw) To: David Miller; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky On Fri, Jan 18, 2013 at 02:39:23PM -0500, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Fri, 18 Jan 2013 09:16:01 +0100 > > > On IPsec pmtu events we can't access the transport headers of > > the original packet, so we can't find the socket that sent > > the packet. The only chance to notify the socket about the > > pmtu change is to force a relookup for all routes. This > > patch implenents this for the IPsec protocols. > > > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> > > Expensive, but I really can't come up with a better idea. Yes, it's a last resort fix. But pmtu events are rare, so it should not harm too much. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events 2013-01-18 19:39 ` David Miller 2013-01-21 6:48 ` Steffen Klassert @ 2013-01-21 12:04 ` Steffen Klassert 1 sibling, 0 replies; 24+ messages in thread From: Steffen Klassert @ 2013-01-21 12:04 UTC (permalink / raw) To: David Miller; +Cc: Yurij.Plotnikov, bhutchings, netdev, Alexandra.Kossovsky On Fri, Jan 18, 2013 at 02:39:23PM -0500, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Fri, 18 Jan 2013 09:16:01 +0100 > > > On IPsec pmtu events we can't access the transport headers of > > the original packet, so we can't find the socket that sent > > the packet. The only chance to notify the socket about the > > pmtu change is to force a relookup for all routes. This > > patch implenents this for the IPsec protocols. > > > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> > > Expensive, but I really can't come up with a better idea. I've just applied this to the ipsec tree. I'll send a pull request tomorrow. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2013-01-18 8:11 ` Steffen Klassert ` (2 preceding siblings ...) 2013-01-18 8:16 ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert @ 2013-01-21 11:31 ` Yurij M. Plotnikov 2013-01-21 11:38 ` Steffen Klassert 3 siblings, 1 reply; 24+ messages in thread From: Yurij M. Plotnikov @ 2013-01-21 11:31 UTC (permalink / raw) To: Steffen Klassert; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On 01/18/13 12:11, Steffen Klassert wrote: > On Mon, Jan 14, 2013 at 12:26:59PM +0400, Yurij M. Plotnikov wrote: > >> Could you, please, tell me is there any news about this bug? >> >> > Ok, I tried to reconstruct your testcase and I can confirm this bug. > It happens only on the first pmtu event on a given route. It works > if I run this test again. Actually it works sometimes even on the > first pmtu event, so it took me quite a while to understand what's ging on. > > This is because standart routes (no pmtu or redirect events happened on > that route) are per cpu, so each cpu has it's own struct rtable. This means > that we do not invalidate the socket cached route if the NET_RX_SOFTIRQ > (which handles the pmtu event) is not served by the same cpu that the > sending socket uses. Exceptional routes (routes where a pmtu or redirect > event occured) are not per cpu, so there is no problem. > > I'll send three patches in reply to this mail. Applying patch one and two > should fix the bug you reported. If you use IPsec you need patch three too. > The patches are marked as RFC, I'll do proper patch submission if you can > confirm that they fix this bug. > Sorry for long delay. I've just checked patches 1+2 and they fix the bug for me. Yurij. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: PMTU discovery is broken on kernel 3.7.1 for UDP sockets 2013-01-21 11:31 ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov @ 2013-01-21 11:38 ` Steffen Klassert 0 siblings, 0 replies; 24+ messages in thread From: Steffen Klassert @ 2013-01-21 11:38 UTC (permalink / raw) To: Yurij M. Plotnikov; +Cc: Ben Hutchings, netdev, Alexandra N. Kossovsky On Mon, Jan 21, 2013 at 03:31:20PM +0400, Yurij M. Plotnikov wrote: > On 01/18/13 12:11, Steffen Klassert wrote: > > > >I'll send three patches in reply to this mail. Applying patch one and two > >should fix the bug you reported. If you use IPsec you need patch three too. > >The patches are marked as RFC, I'll do proper patch submission if you can > >confirm that they fix this bug. > Sorry for long delay. I've just checked patches 1+2 and they fix the > bug for me. > Thanks a lot for reporting and testing! ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2013-01-21 12:04 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-19 13:10 PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov 2012-12-19 13:35 ` Ben Hutchings 2012-12-19 14:27 ` Yurij M. Plotnikov 2012-12-19 19:37 ` Ben Hutchings 2012-12-20 7:14 ` Yurij M. Plotnikov 2012-12-20 7:34 ` Steffen Klassert 2012-12-20 11:22 ` Yurij M. Plotnikov 2012-12-20 12:35 ` Steffen Klassert 2012-12-21 10:22 ` Steffen Klassert 2013-01-14 8:26 ` Yurij M. Plotnikov 2013-01-14 12:52 ` Steffen Klassert 2013-01-18 8:11 ` Steffen Klassert 2013-01-18 8:14 ` [RFC PATCH 1/3] ipv4: Invalidate the socket cached route on pmtu events if possible Steffen Klassert 2013-01-18 19:38 ` David Miller 2013-01-19 0:54 ` Julian Anastasov 2013-01-21 6:43 ` Steffen Klassert 2013-01-18 8:15 ` [RFC PATCH 2/3] ipv4: Add a socket release callback for datagram sockets Steffen Klassert 2013-01-18 19:39 ` David Miller 2013-01-18 8:16 ` [RFC PATCH 3/3] xfrm4: Invalidate all ipv4 routes on IPsec pmtu events Steffen Klassert 2013-01-18 19:39 ` David Miller 2013-01-21 6:48 ` Steffen Klassert 2013-01-21 12:04 ` Steffen Klassert 2013-01-21 11:31 ` PMTU discovery is broken on kernel 3.7.1 for UDP sockets Yurij M. Plotnikov 2013-01-21 11:38 ` Steffen Klassert
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).