net/arp: ARP cache aging failed.

Netdev List
 help / color / mirror / Atom feed

* net/arp: ARP cache aging failed.
@ 2016-11-23  1:16 yuehaibing
  2016-11-23  8:33 ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: yuehaibing @ 2016-11-23  1:16 UTC (permalink / raw)
  To: davem; +Cc: netdev


 Hi,

	I've encountered a arp cache aging failed bug in 4.9 kernel.The topo is as follow:


		HOST1			   --------		  -----
	  --------------------IP1 --------| Switch |--------IP2-| HOST2 |
		 |			   --------		  -----
 	  ------Bonging----                  |
	   |		   |		     IP3
 	 MAC1            MAC2		   | HOST3 |


	HOST1 have a bonding interface which including two NICs

	IP1:192.168.1.100/24
	IP2:192.168.1.200/24
	IP2:192.168.1.300/24

	There are large numbers of TCP transaction between HOST2 and HOST3.


	The Host2 can ping HOST1 normally.However,It cannot ping after HOST1 bonding interface deactived a working NIC.
	

	on HOST2 ,use fowllow command:

		watch "ip -s neigh show|grep 192.168.1.100"
	
	I noticed the old HOST1 arp cache aging counter is gradually  increased ,then reset before it reached 30.This process is repeated,
thus the arp cache holding REACHABLE status.The new HOST1 MAC arp cache cannot been renewed ,and thus ping cannot sendto the correct HOST1 MAC.

	Then I found n->confirmed is freshed  in dst_neigh_output while dst->pending_confirm is set to 1.

include/net/dst.h

static inline void dst_confirm(struct dst_entry *dst)
{
	dst->pending_confirm = 1;
}

static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
                                   struct sk_buff *skb)
{
        const struct hh_cache *hh;

        if (dst->pending_confirm) {
                unsigned long now = jiffies;

                dst->pending_confirm = 0;
                /* avoid dirtying neighbour */
                if (n->confirmed != now)
                        n->confirmed = now;
        }
	.......
}

	dst_confirm can be called by tcp_ack in net/ipv4/tcp_input.c.

/* This routine deals with incoming acks, but not outgoing ones. */
static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
{
	.....
	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
		struct dst_entry *dst = __sk_dst_get(sk);
		if (dst)
			dst_confirm(dst);
	}
	.....
}
	
	As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm.
	
So dst_neigh_output may wrongly freshed  n->confirmed which stands for HOST1,however HOST1'MAC had been changed.

	The possibility of this occurred Significantly increases ,when ping and TCP transaction are set the same processor affinity on the HOST2.

	It seems that the issue is brought in commit 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh confirmation.").

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-23  1:16 net/arp: ARP cache aging failed yuehaibing
@ 2016-11-23  8:33 ` Julian Anastasov
  2016-11-23 12:05   ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2016-11-23  8:33 UTC (permalink / raw)
  To: yuehaibing; +Cc: davem, netdev


	Hello,

On Wed, 23 Nov 2016, yuehaibing wrote:

> 	As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm.
> 	
> So dst_neigh_output may wrongly freshed  n->confirmed which stands for HOST1,however HOST1'MAC had been changed.
> 
> 	The possibility of this occurred Significantly increases ,when ping and TCP transaction are set the same processor affinity on the HOST2.
> 
> 	It seems that the issue is brought in commit 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh confirmation.").

	Bad news. Problem is not in delayed confirmation but
in the mechanism to use same dst for different neighbours on
LAN. We don't have a dst->neighbour reference anymore.

	For IPv4 this is related to rt->rt_uses_gateway but
also to DST_NOCACHE. In the other cases we can not call
dst_confirm, may be we should lookup the neigh entry instead.
But we need a way to reduce such lookups on every packet,
for example, by remembering in struct sock and checking if
some bits of jiffies (at least 4-5) are changed from
previous lookup.

Regards

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-23  8:33 ` Julian Anastasov
@ 2016-11-23 12:05   ` Eric Dumazet
  2016-11-23 14:37     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2016-11-23 12:05 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: yuehaibing, davem, netdev

On Wed, 2016-11-23 at 10:33 +0200, Julian Anastasov wrote:
> 	Hello,
> 
> On Wed, 23 Nov 2016, yuehaibing wrote:
> 
> > 	As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm.
> > 	
> > So dst_neigh_output may wrongly freshed  n->confirmed which stands for HOST1,however HOST1'MAC had been changed.
> > 
> > 	The possibility of this occurred Significantly increases ,when ping and TCP transaction are set the same processor affinity on the HOST2.
> > 
> > 	It seems that the issue is brought in commit 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh confirmation.").
> 
> 	Bad news. Problem is not in delayed confirmation but
> in the mechanism to use same dst for different neighbours on
> LAN. We don't have a dst->neighbour reference anymore.
> 
> 	For IPv4 this is related to rt->rt_uses_gateway but
> also to DST_NOCACHE. In the other cases we can not call
> dst_confirm, may be we should lookup the neigh entry instead.
> But we need a way to reduce such lookups on every packet,
> for example, by remembering in struct sock and checking if
> some bits of jiffies (at least 4-5) are changed from
> previous lookup.


I thought bonding would keep the MAC address 'alive'.

If TCP packets are confirmed, this means the old MAC address is still
valid, what am I missing here ?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-23 12:05   ` Eric Dumazet
@ 2016-11-23 14:37     ` Hannes Frederic Sowa
  2016-11-23 23:49       ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Frederic Sowa @ 2016-11-23 14:37 UTC (permalink / raw)
  To: Eric Dumazet, Julian Anastasov; +Cc: yuehaibing, davem, netdev

On 23.11.2016 13:05, Eric Dumazet wrote:
> On Wed, 2016-11-23 at 10:33 +0200, Julian Anastasov wrote:
>> 	Hello,
>>
>> On Wed, 23 Nov 2016, yuehaibing wrote:
>>
>>> 	As to my topo,HOST1 and HOST3 share one route on HOST2, tcp connection between HOST2 and HOST3 may call tcp_ack to set dst->pending_confirm.
>>> 	
>>> So dst_neigh_output may wrongly freshed  n->confirmed which stands for HOST1,however HOST1'MAC had been changed.
>>>
>>> 	The possibility of this occurred Significantly increases ,when ping and TCP transaction are set the same processor affinity on the HOST2.
>>>
>>> 	It seems that the issue is brought in commit 5110effee8fde2edfacac9cd12a9960ab2dc39ea ("net: Do delayed neigh confirmation.").
>>
>> 	Bad news. Problem is not in delayed confirmation but
>> in the mechanism to use same dst for different neighbours on
>> LAN. We don't have a dst->neighbour reference anymore.
>>
>> 	For IPv4 this is related to rt->rt_uses_gateway but
>> also to DST_NOCACHE. In the other cases we can not call
>> dst_confirm, may be we should lookup the neigh entry instead.
>> But we need a way to reduce such lookups on every packet,
>> for example, by remembering in struct sock and checking if
>> some bits of jiffies (at least 4-5) are changed from
>> previous lookup.
> 
> 
> I thought bonding would keep the MAC address 'alive'.

I wonder about this, too.

> If TCP packets are confirmed, this means the old MAC address is still
> valid, what am I missing here ?

Irregardless about the question if bonding should keep the MAC address
alive, a MAC address can certainly change below a TCP connection.

dst_entry is 1:n to neigh_entry and as such we can end up confirming an
aging neighbor while sending a reply with dst->pending_confirm set while
the confirming packet actually came from a different neighbor.

I agree with Julian, pending_confirm became useless in this way.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-23 14:37     ` Hannes Frederic Sowa
@ 2016-11-23 23:49       ` Eric Dumazet
  2016-11-24  7:51         ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2016-11-23 23:49 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Julian Anastasov, yuehaibing, davem, netdev

On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote:

> Irregardless about the question if bonding should keep the MAC address
> alive, a MAC address can certainly change below a TCP connection.

Of course ;)

> 
> dst_entry is 1:n to neigh_entry and as such we can end up confirming an
> aging neighbor while sending a reply with dst->pending_confirm set while
> the confirming packet actually came from a different neighbor.
> 
> I agree with Julian, pending_confirm became useless in this way.

Let's kill it then ;)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-23 23:49       ` Eric Dumazet
@ 2016-11-24  7:51         ` Julian Anastasov
  2016-11-24  9:06           ` YueHaibing
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2016-11-24  7:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Hannes Frederic Sowa, yuehaibing, davem, netdev


	Hello,

On Wed, 23 Nov 2016, Eric Dumazet wrote:

> On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote:
> 
> > Irregardless about the question if bonding should keep the MAC address
> > alive, a MAC address can certainly change below a TCP connection.
> 
> Of course ;)
> 
> > 
> > dst_entry is 1:n to neigh_entry and as such we can end up confirming an
> > aging neighbor while sending a reply with dst->pending_confirm set while
> > the confirming packet actually came from a different neighbor.
> > 
> > I agree with Julian, pending_confirm became useless in this way.
> 
> Let's kill it then ;)

	It works for traffic via gateway. I now see that
we can even avoid write in dst_confirm:

	if (!dst->pending_confirm)
		dst->pending_confirm = 1;

	because it is called by non-dup TCP ACKs.

	But for traffic to hosts on LAN we need different solution,
i.e. for cached dsts with rt_gateway = 0 (last entry below).

rt_uses_gateway rt_gateway DST_NOCACHE Description
====================================================================
1               nh_gw      ANY         Traffic via gateway
0               LAN_host   1           FLOWI_FLAG_KNOWN_NH (nexthop
                                       set by IPVS, hdrincl, xt_TEE)
0               0          0           1 dst for many subnet hosts

Regards

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-24  7:51         ` Julian Anastasov
@ 2016-11-24  9:06           ` YueHaibing
  2016-11-24 12:28             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 16+ messages in thread
From: YueHaibing @ 2016-11-24  9:06 UTC (permalink / raw)
  To: Julian Anastasov, Eric Dumazet; +Cc: Hannes Frederic Sowa, davem, netdev

On 2016/11/24 15:51, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 23 Nov 2016, Eric Dumazet wrote:
> 
>> On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote:
>>
>>> Irregardless about the question if bonding should keep the MAC address
>>> alive, a MAC address can certainly change below a TCP connection.
>>
>> Of course ;)
>>

I configured bonding fail_over_mac=1 ,so the bonding MAC always be the MAC
address of the currently active slave.

>>>
>>> dst_entry is 1:n to neigh_entry and as such we can end up confirming an
>>> aging neighbor while sending a reply with dst->pending_confirm set while
>>> the confirming packet actually came from a different neighbor.
>>>
>>> I agree with Julian, pending_confirm became useless in this way.
>>
>> Let's kill it then ;)
> 
> 	It works for traffic via gateway. I now see that
> we can even avoid write in dst_confirm:
> 
> 	if (!dst->pending_confirm)
> 		dst->pending_confirm = 1;
> 
> 	because it is called by non-dup TCP ACKs.
> 
> 	But for traffic to hosts on LAN we need different solution,
> i.e. for cached dsts with rt_gateway = 0 (last entry below).
> 
> rt_uses_gateway rt_gateway DST_NOCACHE Description
> ====================================================================
> 1               nh_gw      ANY         Traffic via gateway
> 0               LAN_host   1           FLOWI_FLAG_KNOWN_NH (nexthop
>                                        set by IPVS, hdrincl, xt_TEE)
> 0               0          0           1 dst for many subnet hosts
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 
> .
> 

As above,Is there a plan to fix the problem ? Should we just not call dst_confirm
when in the case rt->rt_uses_gateway/DST_NOCACHE?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-24  9:06           ` YueHaibing
@ 2016-11-24 12:28             ` Hannes Frederic Sowa
  2016-11-25  8:18               ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Frederic Sowa @ 2016-11-24 12:28 UTC (permalink / raw)
  To: YueHaibing, Julian Anastasov, Eric Dumazet; +Cc: davem, netdev

On 24.11.2016 10:06, YueHaibing wrote:
> On 2016/11/24 15:51, Julian Anastasov wrote:
>>
>> 	Hello,
>>
>> On Wed, 23 Nov 2016, Eric Dumazet wrote:
>>
>>> On Wed, 2016-11-23 at 15:37 +0100, Hannes Frederic Sowa wrote:
>>>
>>>> Irregardless about the question if bonding should keep the MAC address
>>>> alive, a MAC address can certainly change below a TCP connection.
>>>
>>> Of course ;)
>>>
> 
> I configured bonding fail_over_mac=1 ,so the bonding MAC always be the MAC
> address of the currently active slave.
> 
>>>>
>>>> dst_entry is 1:n to neigh_entry and as such we can end up confirming an
>>>> aging neighbor while sending a reply with dst->pending_confirm set while
>>>> the confirming packet actually came from a different neighbor.
>>>>
>>>> I agree with Julian, pending_confirm became useless in this way.
>>>
>>> Let's kill it then ;)
>>
>> 	It works for traffic via gateway. I now see that
>> we can even avoid write in dst_confirm:
>>
>> 	if (!dst->pending_confirm)
>> 		dst->pending_confirm = 1;
>>
>> 	because it is called by non-dup TCP ACKs.
>>
>> 	But for traffic to hosts on LAN we need different solution,
>> i.e. for cached dsts with rt_gateway = 0 (last entry below).
>>
>> rt_uses_gateway rt_gateway DST_NOCACHE Description
>> ====================================================================
>> 1               nh_gw      ANY         Traffic via gateway
>> 0               LAN_host   1           FLOWI_FLAG_KNOWN_NH (nexthop
>>                                        set by IPVS, hdrincl, xt_TEE)
>> 0               0          0           1 dst for many subnet hosts
>>
>> Regards
>>
>> --
>> Julian Anastasov <ja@ssi.bg>
>>
>> .
>>
> 
> As above,Is there a plan to fix the problem ? Should we just not call dst_confirm
> when in the case rt->rt_uses_gateway/DST_NOCACHE?

I think some people are thinking about it already (me also ;) ).

But it is not easy to come up with a solution. First of all, we need to
look up the L2 address again in the neighbor cache and confirm the
appropriate neighbor. Secondly we should only do that for packets which
we can actually confirm (that means passing the TCP recv tests or some
other kind of confirmation besides simply spamming the box etc). Also it
needs to be fast.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-24 12:28             ` Hannes Frederic Sowa
@ 2016-11-25  8:18               ` Julian Anastasov
  2016-11-25 13:53                 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2016-11-25  8:18 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: YueHaibing, Eric Dumazet, davem, netdev


	Hello,

On Thu, 24 Nov 2016, Hannes Frederic Sowa wrote:

> I think some people are thinking about it already (me also ;) ).
> 
> But it is not easy to come up with a solution. First of all, we need to
> look up the L2 address again in the neighbor cache and confirm the
> appropriate neighbor. Secondly we should only do that for packets which
> we can actually confirm (that means passing the TCP recv tests or some
> other kind of confirmation besides simply spamming the box etc). Also it
> needs to be fast.

	Another option would be to add similar bit to
struct sock (sk_dst_pending_confirm), may be ip_finish_output2
can propagate it to dst_neigh_output via new arg and then to
reset it. Or to change n->confirmed via some new inline sock
function instead of changing dst_neigh_output. At this place 
skb->sk is optional, may be we need (skb->sk && dst ==
skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes
we should clear this flag.

Regards

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-25  8:18               ` Julian Anastasov
@ 2016-11-25 13:53                 ` Hannes Frederic Sowa
  2016-11-25 20:40                   ` Julian Anastasov
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Frederic Sowa @ 2016-11-25 13:53 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: YueHaibing, Eric Dumazet, davem, netdev

On 25.11.2016 09:18, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Thu, 24 Nov 2016, Hannes Frederic Sowa wrote:
> 
>> I think some people are thinking about it already (me also ;) ).
>>
>> But it is not easy to come up with a solution. First of all, we need to
>> look up the L2 address again in the neighbor cache and confirm the
>> appropriate neighbor. Secondly we should only do that for packets which
>> we can actually confirm (that means passing the TCP recv tests or some
>> other kind of confirmation besides simply spamming the box etc). Also it
>> needs to be fast.
> 
> 	Another option would be to add similar bit to
> struct sock (sk_dst_pending_confirm), may be ip_finish_output2
> can propagate it to dst_neigh_output via new arg and then to
> reset it.

I don't understand how this can help? Maybe I understood it wrong?

> Or to change n->confirmed via some new inline sock
> function instead of changing dst_neigh_output. At this place 
> skb->sk is optional, may be we need (skb->sk && dst ==
> skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes
> we should clear this flag.

In "(skb->sk && dst == skb->sk->sk_dst_cache)" where does dst come from?

I don't see a possibility besides using mac_header or inner_mac_header
to look up the incoming MAC address and confirm that one in the neighbor
cache so far (we could try to optimize this case for rt_gateway though).

Bye,
Hannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-25 13:53                 ` Hannes Frederic Sowa
@ 2016-11-25 20:40                   ` Julian Anastasov
  2016-12-14 10:54                     ` YueHaibing
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2016-11-25 20:40 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: YueHaibing, Eric Dumazet, David S. Miller, netdev


	Hello,

On Fri, 25 Nov 2016, Hannes Frederic Sowa wrote:

> On 25.11.2016 09:18, Julian Anastasov wrote:
> > 
> > 	Another option would be to add similar bit to
> > struct sock (sk_dst_pending_confirm), may be ip_finish_output2
> > can propagate it to dst_neigh_output via new arg and then to
> > reset it.
> 
> I don't understand how this can help? Maybe I understood it wrong?

	The idea is, the indication from highest level (TCP) to
be propagated to lowest level (neigh).

> > Or to change n->confirmed via some new inline sock
> > function instead of changing dst_neigh_output. At this place 
> > skb->sk is optional, may be we need (skb->sk && dst ==
> > skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes
> > we should clear this flag.
> 
> In "(skb->sk && dst == skb->sk->sk_dst_cache)" where does dst come from?

	This is in case we do not want to propagate indication
from TCP to tunnels, see below...

> I don't see a possibility besides using mac_header or inner_mac_header
> to look up the incoming MAC address and confirm that one in the neighbor
> cache so far (we could try to optimize this case for rt_gateway though).

	My idea is as follows:

- TCP instead of calling dst_confirm should call new func
dst_confirm_sk(sk) where sk->sk_dst_pending_confirm is set,
not dst->pending_confirm.

- ip_finish_output2: use skb->sk (TCP) to check for
sk_dst_pending_confirm and update n->confirmed in some
new inline func

	Correct me if I'm wrong but here is how I see the
situation:

- TCP caches output dst in socket, for example, dst1,
sets it as skb->dst

- if xfrm tunnels are used then dst1 can point to some tunnel,
i.e. it is not in all cases the same skb->dst, as seen by 
ip_finish_output2

- netfilter can DNAT and assign different skb->dst

- as result, before now, dst1->pending_confirm is set but
it is not used properly because ip_finish_output2 uses
the final skb->dst which can be different or with
rt_gateway = 0

- considering that tunnels do not use dst_confirm,
n->confirmed is not changed every time. There are some
interesting cases:

1. both dst1 and the final skb->dst point to same
dst with rt_gateway = 0: n->confirmed is updated but
as wee see it can be for wrong neigh.

2. dst1 is different from skb->dst, so n->confirmed is
not changed. This can happen on DNAT or when using
tunnel to secure gateway.

- in ip_finish_output2 we have skb->sk (original TCP sk) and
sk arg (equal to skb->sk or second option is sock of some UDP
tunnel, etc). The idea is to use skb->sk->sk_dst_pending_confirm,
i.e. highest level sk because the sk arg, if different, does not
have such flag set (tunnels do not call dst_confirm).

- ip_finish_output2 should call new func dst_neigh_confirm_sk
(which has nothing related to dst, hm... better name is needed):

 	if (!IS_ERR(neigh)) {
-		int res = dst_neigh_output(dst, neigh, skb);
+		int res;
+
+		dst_neigh_confirm_sk(skb->sk, neigh);
+		res = dst_neigh_output(dst, neigh, skb);

	which should do something like this:

	if (sk && sk->sk_dst_pending_confirm) {
		unsigned long now = jiffies;

		sk->sk_dst_pending_confirm = 0;
		/* avoid dirtying neighbour */
		if (n->confirmed != now)
			n->confirmed = now;
	}

	Additional dst == skb->sk->sk_dst_cache check will
not propagate the indication on DNAT/tunnel. For me,
it is better without such check.

	So, the idea is to move TCP and other similar
users to the new dst_confirm_sk() method. If other
dst_confirm() users are left, they should be checked
if dsts with rt_gateway = 0 can be wrongly used.

Regards

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-11-25 20:40                   ` Julian Anastasov
@ 2016-12-14 10:54                     ` YueHaibing
  2016-12-14 12:56                       ` [PATCH] arp: do neigh confirm based on sk arg kbuild test robot
                                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: YueHaibing @ 2016-12-14 10:54 UTC (permalink / raw)
  To: Julian Anastasov, Hannes Frederic Sowa
  Cc: Eric Dumazet, David S. Miller, netdev

On 2016/11/26 4:40, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Fri, 25 Nov 2016, Hannes Frederic Sowa wrote:
> 
>> On 25.11.2016 09:18, Julian Anastasov wrote:
>>>
>>> 	Another option would be to add similar bit to
>>> struct sock (sk_dst_pending_confirm), may be ip_finish_output2
>>> can propagate it to dst_neigh_output via new arg and then to
>>> reset it.
>>
>> I don't understand how this can help? Maybe I understood it wrong?
> 
> 	The idea is, the indication from highest level (TCP) to
> be propagated to lowest level (neigh).
> 
>>> Or to change n->confirmed via some new inline sock
>>> function instead of changing dst_neigh_output. At this place 
>>> skb->sk is optional, may be we need (skb->sk && dst ==
>>> skb->sk->sk_dst_cache) check. Also, when sk_dst_cache changes
>>> we should clear this flag.
>>
>> In "(skb->sk && dst == skb->sk->sk_dst_cache)" where does dst come from?
> 
> 	This is in case we do not want to propagate indication
> from TCP to tunnels, see below...
> 
>> I don't see a possibility besides using mac_header or inner_mac_header
>> to look up the incoming MAC address and confirm that one in the neighbor
>> cache so far (we could try to optimize this case for rt_gateway though).
> 
> 	My idea is as follows:
> 
> - TCP instead of calling dst_confirm should call new func
> dst_confirm_sk(sk) where sk->sk_dst_pending_confirm is set,
> not dst->pending_confirm.
> 
> - ip_finish_output2: use skb->sk (TCP) to check for
> sk_dst_pending_confirm and update n->confirmed in some
> new inline func
> 
> 	Correct me if I'm wrong but here is how I see the
> situation:
> 
> - TCP caches output dst in socket, for example, dst1,
> sets it as skb->dst
> 
> - if xfrm tunnels are used then dst1 can point to some tunnel,
> i.e. it is not in all cases the same skb->dst, as seen by 
> ip_finish_output2
> 
> - netfilter can DNAT and assign different skb->dst
> 
> - as result, before now, dst1->pending_confirm is set but
> it is not used properly because ip_finish_output2 uses
> the final skb->dst which can be different or with
> rt_gateway = 0
> 
> - considering that tunnels do not use dst_confirm,
> n->confirmed is not changed every time. There are some
> interesting cases:
> 
> 1. both dst1 and the final skb->dst point to same
> dst with rt_gateway = 0: n->confirmed is updated but
> as wee see it can be for wrong neigh.
> 
> 2. dst1 is different from skb->dst, so n->confirmed is
> not changed. This can happen on DNAT or when using
> tunnel to secure gateway.
> 
> - in ip_finish_output2 we have skb->sk (original TCP sk) and
> sk arg (equal to skb->sk or second option is sock of some UDP
> tunnel, etc). The idea is to use skb->sk->sk_dst_pending_confirm,
> i.e. highest level sk because the sk arg, if different, does not
> have such flag set (tunnels do not call dst_confirm).
> 
> - ip_finish_output2 should call new func dst_neigh_confirm_sk
> (which has nothing related to dst, hm... better name is needed):
> 
>  	if (!IS_ERR(neigh)) {
> -		int res = dst_neigh_output(dst, neigh, skb);
> +		int res;
> +
> +		dst_neigh_confirm_sk(skb->sk, neigh);
> +		res = dst_neigh_output(dst, neigh, skb);
> 
> 	which should do something like this:
> 
> 	if (sk && sk->sk_dst_pending_confirm) {
> 		unsigned long now = jiffies;
> 
> 		sk->sk_dst_pending_confirm = 0;
> 		/* avoid dirtying neighbour */
> 		if (n->confirmed != now)
> 			n->confirmed = now;
> 	}
> 
> 	Additional dst == skb->sk->sk_dst_cache check will
> not propagate the indication on DNAT/tunnel. For me,
> it is better without such check.
> 
> 	So, the idea is to move TCP and other similar
> users to the new dst_confirm_sk() method. If other
> dst_confirm() users are left, they should be checked
> if dsts with rt_gateway = 0 can be wrongly used.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 
> .
> 

Sorry for so late.

Based on your ideas, I make a patch and test it for a while.

It works for me.

>From dc033fe4bac8931590e15774a298f04ccea8ed4c Mon Sep 17 00:00:00 2001
From: YueHabing <yuehaibing@huawei.com>
Date: Wed, 14 Dec 2016 02:43:51 -0500
Subject: [PATCH] arp: do neigh confirm based on sk arg

Signed-off-by: YueHabing <yuehaibing@huawei.com>
---
 include/net/sock.h     | 18 ++++++++++++++++++
 net/ipv4/ip_output.c   |  5 ++++-
 net/ipv4/ping.c        |  2 +-
 net/ipv4/raw.c         |  2 +-
 net/ipv4/tcp_input.c   |  8 ++------
 net/ipv4/tcp_metrics.c |  5 +++--
 net/ipv4/udp.c         |  2 +-
 net/ipv6/raw.c         |  2 +-
 net/ipv6/route.c       |  6 +++---
 net/ipv6/udp.c         |  2 +-
 10 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 92b2697..ece8dfa 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -434,6 +434,7 @@ struct sock {
 	struct sk_buff		*sk_send_head;
 	__s32			sk_peek_off;
 	int			sk_write_pending;
+	unsigned short          sk_dst_pending_confirm;
 #ifdef CONFIG_SECURITY
 	void			*sk_security;
 #endif
@@ -2263,6 +2264,23 @@ static inline void sk_state_store(struct sock *sk, int newstate)
 	smp_store_release(&sk->sk_state, newstate);
 }

+static inline void dst_confirm_sk(struct sock *sk)
+{
+        sk->sk_dst_pending_confirm = 1;
+}
+
+static inline void dst_neigh_confirm_sk(struct sock *sk, struct neighbour *n)
+{
+        if (sk && sk->sk_dst_pending_confirm) {
+                unsigned long now = jiffies;
+
+                sk->sk_dst_pending_confirm = 0;
+                /* avoid dirtying neighbour */
+                if (n->confirmed != now)
+                        n->confirmed = now;
+        }
+}
+
 void sock_enable_timestamp(struct sock *sk, int flag);
 int sock_get_timestamp(struct sock *, struct timeval __user *);
 int sock_get_timestampns(struct sock *, struct timespec __user *);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 877bdb0..7c9b640 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -221,7 +221,10 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
 	if (!IS_ERR(neigh)) {
-		int res = dst_neigh_output(dst, neigh, skb);
+		int res;
+
+		dst_neigh_confirm_sk(skb->sk, neigh);
+		res = dst_neigh_output(dst, neigh, skb);

 		rcu_read_unlock_bh();
 		return res;
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 96b8e2b..bcff1c6 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -847,7 +847,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;

 do_confirm:
-	dst_confirm(&rt->dst);
+	dst_confirm_sk(sk);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ecbe5a7..e01424d 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -664,7 +664,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return len;

 do_confirm:
-	dst_confirm(&rt->dst);
+	dst_confirm_sk(sk);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c71d49c..92b2060 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3702,9 +3702,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 		tcp_process_tlp_ack(sk, ack, flag);

 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
-		struct dst_entry *dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
+		dst_confirm_sk(sk);
 	}

 	if (icsk->icsk_pending == ICSK_TIME_RETRANS)
@@ -6049,9 +6047,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		tcp_set_state(sk, TCP_FIN_WAIT2);
 		sk->sk_shutdown |= SEND_SHUTDOWN;

-		dst = __sk_dst_get(sk);
-		if (dst)
-			dst_confirm(dst);
+		dst_confirm_sk(sk);

 		if (!sock_flag(sk, SOCK_DEAD)) {
 			/* Wake up lingering close() */
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index bf1f3b2..375ff08 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -379,7 +379,8 @@ void tcp_update_metrics(struct sock *sk)
 		return;

 	if (dst->flags & DST_HOST)
-		dst_confirm(dst);
+		dst_confirm_sk(sk);
+		

 	rcu_read_lock();
 	if (icsk->icsk_backoff || !tp->srtt_us) {
@@ -496,7 +497,7 @@ void tcp_init_metrics(struct sock *sk)
 	if (!dst)
 		goto reset;

-	dst_confirm(dst);
+	dst_confirm_sk(sk);

 	rcu_read_lock();
 	tm = tcp_get_metrics(sk, dst, true);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5bab6c3..f7852d3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1111,7 +1111,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;

 do_confirm:
-	dst_confirm(&rt->dst);
+	dst_confirm_sk(sk);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 054a1d8..bbb09a4 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -927,7 +927,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	txopt_put(opt_to_free);
 	return err < 0 ? err : len;
 do_confirm:
-	dst_confirm(dst);
+	dst_confirm_sk(sk);
 	if (!(msg->msg_flags & MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1b57e11..8df4671 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1356,7 +1356,7 @@ static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
 		(rt->rt6i_flags & RTF_PCPU || rt->rt6i_node);
 }

-static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
+static void __ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
 				 const struct ipv6hdr *iph, u32 mtu)
 {
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
@@ -1367,7 +1367,7 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 	if (dst_metric_locked(dst, RTAX_MTU))
 		return;

-	dst_confirm(dst);
+	dst_confirm_sk(sk);
 	mtu = max_t(u32, mtu, IPV6_MIN_MTU);
 	if (mtu >= dst_mtu(dst))
 		return;
@@ -2248,7 +2248,7 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
 	 * Look, redirects are sent only in response to data packets,
 	 * so that this nexthop apparently is reachable. --ANK
 	 */
-	dst_confirm(&rt->dst);
+	dst_confirm_sk(sk);

 	neigh = __neigh_lookup(&nd_tbl, &msg->target, skb->dev, 1);
 	if (!neigh)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e4a8000..6057101 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1309,7 +1309,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	return err;

 do_confirm:
-	dst_confirm(dst);
+	dst_confirm_sk(sk);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] arp: do neigh confirm based on sk arg
  2016-12-14 10:54                     ` YueHaibing
@ 2016-12-14 12:56                       ` kbuild test robot
  2016-12-14 13:37                       ` kbuild test robot
  2016-12-14 20:15                       ` net/arp: ARP cache aging failed Julian Anastasov
  2 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-12-14 12:56 UTC (permalink / raw)
  To: YueHaibing
  Cc: kbuild-all, Julian Anastasov, Hannes Frederic Sowa, Eric Dumazet,
	David S. Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 2997 bytes --]

Hi YueHaibing,

[auto build test WARNING on v4.9-rc8]
[cannot apply to net/master net-next/master sparc-next/master next-20161214]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/YueHaibing/arp-do-neigh-confirm-based-on-sk-arg/20161214-191755
config: openrisc-or1ksim_defconfig (attached as .config)
compiler: or32-linux-gcc (GCC) 4.5.1-or32-1.0rc1
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=openrisc 

All warnings (new ones prefixed by >>):

   net/ipv4/tcp_input.c: In function 'tcp_rcv_state_process':
>> net/ipv4/tcp_input.c:6003:21: warning: unused variable 'dst'

vim +/dst +6003 net/ipv4/tcp_input.c

^1da177e4 Linus Torvalds 2005-04-16  5987  			 */
168a8f580 Jerry Chu      2012-08-31  5988  			tcp_rearm_rto(sk);
168a8f580 Jerry Chu      2012-08-31  5989  		} else
^1da177e4 Linus Torvalds 2005-04-16  5990  			tcp_init_metrics(sk);
^1da177e4 Linus Torvalds 2005-04-16  5991  
c0402760f Yuchung Cheng  2016-09-19  5992  		if (!inet_csk(sk)->icsk_ca_ops->cong_control)
02cf4ebd8 Neal Cardwell  2013-10-21  5993  			tcp_update_pacing_rate(sk);
02cf4ebd8 Neal Cardwell  2013-10-21  5994  
61eb90035 Joe Perches    2013-05-24  5995  		/* Prevent spurious tcp_cwnd_restart() on first data packet */
^1da177e4 Linus Torvalds 2005-04-16  5996  		tp->lsndtime = tcp_time_stamp;
^1da177e4 Linus Torvalds 2005-04-16  5997  
^1da177e4 Linus Torvalds 2005-04-16  5998  		tcp_initialize_rcv_mss(sk);
^1da177e4 Linus Torvalds 2005-04-16  5999  		tcp_fast_path_on(tp);
^1da177e4 Linus Torvalds 2005-04-16  6000  		break;
^1da177e4 Linus Torvalds 2005-04-16  6001  
c48b22daa Joe Perches    2013-05-24  6002  	case TCP_FIN_WAIT1: {
c48b22daa Joe Perches    2013-05-24 @6003  		struct dst_entry *dst;
c48b22daa Joe Perches    2013-05-24  6004  		int tmo;
c48b22daa Joe Perches    2013-05-24  6005  
168a8f580 Jerry Chu      2012-08-31  6006  		/* If we enter the TCP_FIN_WAIT1 state and we are a
168a8f580 Jerry Chu      2012-08-31  6007  		 * Fast Open socket and this is the first acceptable
168a8f580 Jerry Chu      2012-08-31  6008  		 * ACK we have received, this would have acknowledged
168a8f580 Jerry Chu      2012-08-31  6009  		 * our SYNACK so stop the SYNACK timer.
168a8f580 Jerry Chu      2012-08-31  6010  		 */
00db41243 Ian Morris     2015-04-03  6011  		if (req) {

:::::: The code at line 6003 was first introduced by commit
:::::: c48b22daa6062fff9eded311b4d6974c29b40487 tcp: Remove 2 indentation levels in tcp_rcv_state_process

:::::: TO: Joe Perches <joe@perches.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7277 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] arp: do neigh confirm based on sk arg
  2016-12-14 10:54                     ` YueHaibing
  2016-12-14 12:56                       ` [PATCH] arp: do neigh confirm based on sk arg kbuild test robot
@ 2016-12-14 13:37                       ` kbuild test robot
  2016-12-14 20:15                       ` net/arp: ARP cache aging failed Julian Anastasov
  2 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2016-12-14 13:37 UTC (permalink / raw)
  To: YueHaibing
  Cc: kbuild-all, Julian Anastasov, Hannes Frederic Sowa, Eric Dumazet,
	David S. Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 2920 bytes --]

Hi YueHaibing,

[auto build test WARNING on v4.9-rc8]
[cannot apply to net/master net-next/master sparc-next/master next-20161214]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/YueHaibing/arp-do-neigh-confirm-based-on-sk-arg/20161214-191755
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

>> include/net/sock.h:452: warning: No description found for parameter 'sk_dst_pending_confirm'

vim +/sk_dst_pending_confirm +452 include/net/sock.h

^1da177e4 Linus Torvalds  2005-04-16  436  	int			sk_write_pending;
4ea59a6cc YueHaibing      2016-12-14  437  	unsigned short          sk_dst_pending_confirm;
d5f642384 Alexey Dobriyan 2008-11-04  438  #ifdef CONFIG_SECURITY
^1da177e4 Linus Torvalds  2005-04-16  439  	void			*sk_security;
d5f642384 Alexey Dobriyan 2008-11-04  440  #endif
2a56a1fec Tejun Heo       2015-12-07  441  	struct sock_cgroup_data	sk_cgrp_data;
baac50bbc Johannes Weiner 2016-01-14  442  	struct mem_cgroup	*sk_memcg;
^1da177e4 Linus Torvalds  2005-04-16  443  	void			(*sk_state_change)(struct sock *sk);
676d23690 David S. Miller 2014-04-11  444  	void			(*sk_data_ready)(struct sock *sk);
^1da177e4 Linus Torvalds  2005-04-16  445  	void			(*sk_write_space)(struct sock *sk);
^1da177e4 Linus Torvalds  2005-04-16  446  	void			(*sk_error_report)(struct sock *sk);
^1da177e4 Linus Torvalds  2005-04-16  447  	int			(*sk_backlog_rcv)(struct sock *sk,
^1da177e4 Linus Torvalds  2005-04-16  448  						  struct sk_buff *skb);
^1da177e4 Linus Torvalds  2005-04-16  449  	void                    (*sk_destruct)(struct sock *sk);
ef456144d Craig Gallek    2016-01-04  450  	struct sock_reuseport __rcu	*sk_reuseport_cb;
a4298e452 Eric Dumazet    2016-04-01  451  	struct rcu_head		sk_rcu;
^1da177e4 Linus Torvalds  2005-04-16 @452  };
^1da177e4 Linus Torvalds  2005-04-16  453  
559835ea7 Pravin B Shelar 2013-09-24  454  #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
559835ea7 Pravin B Shelar 2013-09-24  455  
559835ea7 Pravin B Shelar 2013-09-24  456  #define rcu_dereference_sk_user_data(sk)	rcu_dereference(__sk_user_data((sk)))
559835ea7 Pravin B Shelar 2013-09-24  457  #define rcu_assign_sk_user_data(sk, ptr)	rcu_assign_pointer(__sk_user_data((sk)), ptr)
559835ea7 Pravin B Shelar 2013-09-24  458  
4a17fd522 Pavel Emelyanov 2012-04-19  459  /*
4a17fd522 Pavel Emelyanov 2012-04-19  460   * SK_CAN_REUSE and SK_NO_REUSE on a socket mean that the socket is OK

:::::: The code at line 452 was first introduced by commit
:::::: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:::::: TO: Linus Torvalds <torvalds@ppc970.osdl.org>
:::::: CC: Linus Torvalds <torvalds@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 6425 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-12-14 10:54                     ` YueHaibing
  2016-12-14 12:56                       ` [PATCH] arp: do neigh confirm based on sk arg kbuild test robot
  2016-12-14 13:37                       ` kbuild test robot
@ 2016-12-14 20:15                       ` Julian Anastasov
  2016-12-15  2:58                         ` YueHaibing
  2 siblings, 1 reply; 16+ messages in thread
From: Julian Anastasov @ 2016-12-14 20:15 UTC (permalink / raw)
  To: YueHaibing; +Cc: Hannes Frederic Sowa, Eric Dumazet, David S. Miller, netdev


	Hello,

On Wed, 14 Dec 2016, YueHaibing wrote:

> On 2016/11/26 4:40, Julian Anastasov wrote:
> > 
> > 	So, the idea is to move TCP and other similar
> > users to the new dst_confirm_sk() method. If other
> > dst_confirm() users are left, they should be checked
> > if dsts with rt_gateway = 0 can be wrongly used.
> 
> Sorry for so late.

	In fact, I'm late too because I almost finished
my changes, the only remaining part is the cxgb files...

> Based on your ideas, I make a patch and test it for a while.

	The problem is that it is valid only for TCP.
Also, this flag should be reset sometimes, eg. when sk dst
changes...

> It works for me.
> 
> @@ -847,7 +847,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  	return err;
> 
>  do_confirm:
> -	dst_confirm(&rt->dst);
> +	dst_confirm_sk(sk);

	MSG_CONFIRM from sendmsg needs special treatment. The
problem is that UDP sending does not lock the socket, so I also
added a skb flag to handle this situation in ip*_append_data.
We do not want threaded application firing at different
destinations to confirm the wrong neighbour. MSG_PROBE is
another issue, the XFRM dst chaining, etc...

	I hope, I'll be ready this weekend with few patches
that change all dst_confirm users... There I'll explain
everything in detail.

Regards

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: net/arp: ARP cache aging failed.
  2016-12-14 20:15                       ` net/arp: ARP cache aging failed Julian Anastasov
@ 2016-12-15  2:58                         ` YueHaibing
  0 siblings, 0 replies; 16+ messages in thread
From: YueHaibing @ 2016-12-15  2:58 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Hannes Frederic Sowa, Eric Dumazet, David S. Miller, netdev


On 2016/12/15 4:15, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 14 Dec 2016, YueHaibing wrote:
> 
>> On 2016/11/26 4:40, Julian Anastasov wrote:
>>>
>>> 	So, the idea is to move TCP and other similar
>>> users to the new dst_confirm_sk() method. If other
>>> dst_confirm() users are left, they should be checked
>>> if dsts with rt_gateway = 0 can be wrongly used.
>>
>> Sorry for so late.
> 
> 	In fact, I'm late too because I almost finished
> my changes, the only remaining part is the cxgb files...
> 
>> Based on your ideas, I make a patch and test it for a while.
> 
> 	The problem is that it is valid only for TCP.
> Also, this flag should be reset sometimes, eg. when sk dst
> changes...
> 
>> It works for me.
>>
>> @@ -847,7 +847,7 @@ static int ping_v4_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>  	return err;
>>
>>  do_confirm:
>> -	dst_confirm(&rt->dst);
>> +	dst_confirm_sk(sk);
> 
> 	MSG_CONFIRM from sendmsg needs special treatment. The
> problem is that UDP sending does not lock the socket, so I also
> added a skb flag to handle this situation in ip*_append_data.
> We do not want threaded application firing at different
> destinations to confirm the wrong neighbour. MSG_PROBE is
> another issue, the XFRM dst chaining, etc...
> 
> 	I hope, I'll be ready this weekend with few patches
> that change all dst_confirm users... There I'll explain
> everything in detail.
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
> 
> .
> 

Great, I'll be glad to do a testing.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-12-15  3:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-23  1:16 net/arp: ARP cache aging failed yuehaibing
2016-11-23  8:33 ` Julian Anastasov
2016-11-23 12:05   ` Eric Dumazet
2016-11-23 14:37     ` Hannes Frederic Sowa
2016-11-23 23:49       ` Eric Dumazet
2016-11-24  7:51         ` Julian Anastasov
2016-11-24  9:06           ` YueHaibing
2016-11-24 12:28             ` Hannes Frederic Sowa
2016-11-25  8:18               ` Julian Anastasov
2016-11-25 13:53                 ` Hannes Frederic Sowa
2016-11-25 20:40                   ` Julian Anastasov
2016-12-14 10:54                     ` YueHaibing
2016-12-14 12:56                       ` [PATCH] arp: do neigh confirm based on sk arg kbuild test robot
2016-12-14 13:37                       ` kbuild test robot
2016-12-14 20:15                       ` net/arp: ARP cache aging failed Julian Anastasov
2016-12-15  2:58                         ` YueHaibing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox