netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [FYI] xfrm: Don't lookup sk_policy for timewait sockets
@ 2015-04-09  8:09 Sebastian Poehn
  2015-04-09  9:07 ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Poehn @ 2015-04-09  8:09 UTC (permalink / raw)
  To: netdev

We are running a couple of thousand machines with 3.8 and 3.12. On very few systems
(something below 10) we encounter panics in xfrm code. The main characteristic seams
to be the usage of TPROXY.

Attached patch is only a workaround, as problems may also happen in other code portions
(actually on even fewer systems this happens).

For timewait sockets the memory region of sk_policy does not belong
to us anymore. So there may be someone else using it and we may panic
because of corrupted pointers.

xfrm_sk_policy_lookup+0x38/0x66
xfrm_lookup+0x93/0x48f
nf_nat_packet+0x92/0xa4 [nf_nat]
_decode_session4+0xd9/0x294
nf_xfrm_me_harder+0x50/0xc5 [nf_nat]
nf_nat_ipv4_out+0xad/0xc4 [iptable_nat]
nf_iterate+0x42/0x7d
ip_finish_output2+0x2b1/0x2b1
nf_hook_slow+0x22f/0x2c9
ip_finish_output2+0x2b1/0x2b1
ip_finish_output2+0x2b1/0x2b1
__xfrm_route_forward+0x7a/0x97
ip_finish_output2+0x2b1/0x2b1
NF_HOOK_COND+0x3f/0x54
ip_output+0x5a/0x5e
__netif_receive_skb+0x4b2/0x514
process_backlog+0xee/0x1c5
net_rx_action+0xa7/0x1fe

Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
---
 net/xfrm/xfrm_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 9c6b1ab..e9a74fa 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2072,7 +2072,7 @@ restart:
 	xdst = NULL;
 	route = NULL;
 
-	if (sk && sk->sk_policy[XFRM_POLICY_OUT]) {
+	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[XFRM_POLICY_OUT]) {
 		num_pols = 1;
 		pols[0] = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl);
 		err = xfrm_expand_policies(fl, family, pols,
@@ -2349,7 +2349,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
 	}
 
 	pol = NULL;
-	if (sk && sk->sk_policy[dir]) {
+	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[dir]) {
 		pol = xfrm_sk_policy_lookup(sk, dir, &fl);
 		if (IS_ERR(pol)) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09  8:09 [FYI] xfrm: Don't lookup sk_policy for timewait sockets Sebastian Poehn
@ 2015-04-09  9:07 ` Eric Dumazet
  2015-04-09  9:24   ` Sebastian Poehn
  2015-04-09 18:37   ` David Miller
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2015-04-09  9:07 UTC (permalink / raw)
  To: Sebastian Poehn; +Cc: netdev

On Thu, 2015-04-09 at 10:09 +0200, Sebastian Poehn wrote:
> We are running a couple of thousand machines with 3.8 and 3.12. On very few systems
> (something below 10) we encounter panics in xfrm code. The main characteristic seams
> to be the usage of TPROXY.
> 
> Attached patch is only a workaround, as problems may also happen in other code portions
> (actually on even fewer systems this happens).
> 
> For timewait sockets the memory region of sk_policy does not belong
> to us anymore. So there may be someone else using it and we may panic
> because of corrupted pointers.
> 
> xfrm_sk_policy_lookup+0x38/0x66
> xfrm_lookup+0x93/0x48f
> nf_nat_packet+0x92/0xa4 [nf_nat]
> _decode_session4+0xd9/0x294
> nf_xfrm_me_harder+0x50/0xc5 [nf_nat]
> nf_nat_ipv4_out+0xad/0xc4 [iptable_nat]
> nf_iterate+0x42/0x7d
> ip_finish_output2+0x2b1/0x2b1
> nf_hook_slow+0x22f/0x2c9
> ip_finish_output2+0x2b1/0x2b1
> ip_finish_output2+0x2b1/0x2b1
> __xfrm_route_forward+0x7a/0x97
> ip_finish_output2+0x2b1/0x2b1
> NF_HOOK_COND+0x3f/0x54
> ip_output+0x5a/0x5e
> __netif_receive_skb+0x4b2/0x514
> process_backlog+0xee/0x1c5
> net_rx_action+0xa7/0x1fe
> 
> Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
> ---
>  net/xfrm/xfrm_policy.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 9c6b1ab..e9a74fa 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2072,7 +2072,7 @@ restart:
>  	xdst = NULL;
>  	route = NULL;
>  
> -	if (sk && sk->sk_policy[XFRM_POLICY_OUT]) {
> +	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[XFRM_POLICY_OUT]) {
>  		num_pols = 1;
>  		pols[0] = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl);
>  		err = xfrm_expand_policies(fl, family, pols,
> @@ -2349,7 +2349,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
>  	}
>  
>  	pol = NULL;
> -	if (sk && sk->sk_policy[dir]) {
> +	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[dir]) {
>  		pol = xfrm_sk_policy_lookup(sk, dir, &fl);
>  		if (IS_ERR(pol)) {
>  			XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);

Hmm.. interesting.

TCP stack never sends packets attached to a socket in timewait state.

On IPv4, the ACK packets sent on behalf of TIME_WAIT follow this path :

tcp_v4_timewait_ack()
 tcp_v4_send_ack()
  ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk))

The per cpu socket used to attach these skb are not in TCP_TIME_WAIT
state.

TPROXY can handle a socket in TCP_TIME_WAIT state only in input path.

So I am a bit confused. Could you elaborate ?

In any case, you probably should use sk_fullsock() new helper, as I
presume you'll have a similar issue with TCP_NEW_SYN_RECV pseudo sockets
when I am done with tcp listener refactoring.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09  9:07 ` Eric Dumazet
@ 2015-04-09  9:24   ` Sebastian Poehn
  2015-04-09 18:37   ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: Sebastian Poehn @ 2015-04-09  9:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sebastian Poehn, netdev

On Thu, 2015-04-09 at 02:07 -0700, Eric Dumazet wrote:
> On Thu, 2015-04-09 at 10:09 +0200, Sebastian Poehn wrote:
> > We are running a couple of thousand machines with 3.8 and 3.12. On very few systems
> > (something below 10) we encounter panics in xfrm code. The main characteristic seams
> > to be the usage of TPROXY.
> > 
> > Attached patch is only a workaround, as problems may also happen in other code portions
> > (actually on even fewer systems this happens).
> > 
> > For timewait sockets the memory region of sk_policy does not belong
> > to us anymore. So there may be someone else using it and we may panic
> > because of corrupted pointers.
> > 
> > xfrm_sk_policy_lookup+0x38/0x66
> > xfrm_lookup+0x93/0x48f
> > nf_nat_packet+0x92/0xa4 [nf_nat]
> > _decode_session4+0xd9/0x294
> > nf_xfrm_me_harder+0x50/0xc5 [nf_nat]
> > nf_nat_ipv4_out+0xad/0xc4 [iptable_nat]
> > nf_iterate+0x42/0x7d
> > ip_finish_output2+0x2b1/0x2b1
> > nf_hook_slow+0x22f/0x2c9
> > ip_finish_output2+0x2b1/0x2b1
> > ip_finish_output2+0x2b1/0x2b1
> > __xfrm_route_forward+0x7a/0x97
> > ip_finish_output2+0x2b1/0x2b1
> > NF_HOOK_COND+0x3f/0x54
> > ip_output+0x5a/0x5e
> > __netif_receive_skb+0x4b2/0x514
> > process_backlog+0xee/0x1c5
> > net_rx_action+0xa7/0x1fe
> > 
> > Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
> > ---
> >  net/xfrm/xfrm_policy.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> > index 9c6b1ab..e9a74fa 100644
> > --- a/net/xfrm/xfrm_policy.c
> > +++ b/net/xfrm/xfrm_policy.c
> > @@ -2072,7 +2072,7 @@ restart:
> >  	xdst = NULL;
> >  	route = NULL;
> >  
> > -	if (sk && sk->sk_policy[XFRM_POLICY_OUT]) {
> > +	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[XFRM_POLICY_OUT]) {
> >  		num_pols = 1;
> >  		pols[0] = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl);
> >  		err = xfrm_expand_policies(fl, family, pols,
> > @@ -2349,7 +2349,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
> >  	}
> >  
> >  	pol = NULL;
> > -	if (sk && sk->sk_policy[dir]) {
> > +	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[dir]) {
> >  		pol = xfrm_sk_policy_lookup(sk, dir, &fl);
> >  		if (IS_ERR(pol)) {
> >  			XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
> 
> Hmm.. interesting.
> 
> TCP stack never sends packets attached to a socket in timewait state.
> 
> On IPv4, the ACK packets sent on behalf of TIME_WAIT follow this path :
> 
> tcp_v4_timewait_ack()
>  tcp_v4_send_ack()
>   ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk))
> 
> The per cpu socket used to attach these skb are not in TCP_TIME_WAIT
> state.
> 
> TPROXY can handle a socket in TCP_TIME_WAIT state only in input path.
> 
> So I am a bit confused. Could you elaborate ?

The setup is a usual transparent proxy one. So TPROXY intercepts the
first connection packet and later on we use socket match to direct to
the local IP_TRANSPARENT socket. So sorry for the confusion - xt_socket
was meant.

The strange thing I noticed is that tw_transparent was 0 (we don't use
xt_socket --transparent). But then I wonder how the tuple can match.

I will collect some more information and post them.
> 
> In any case, you probably should use sk_fullsock() new helper, as I
> presume you'll have a similar issue with TCP_NEW_SYN_RECV pseudo sockets
> when I am done with tcp listener refactoring.

I already saw and like sk_fullsock.
> 
> Thanks.
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09  9:07 ` Eric Dumazet
  2015-04-09  9:24   ` Sebastian Poehn
@ 2015-04-09 18:37   ` David Miller
  2015-04-09 19:14     ` Florian Westphal
  2015-04-09 19:21     ` Eric Dumazet
  1 sibling, 2 replies; 15+ messages in thread
From: David Miller @ 2015-04-09 18:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sebastian.poehn, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 09 Apr 2015 02:07:41 -0700

> TCP stack never sends packets attached to a socket in timewait state.

TPROXY assigns timewait sockets to skb->sk, that's the bug.

In net/netfilter/xt_TPROXY.c:

	tproxy_tg4()
 ...
		sk = nf_tproxy_get_sock_v4(dev_net(skb->dev), iph->protocol,
				   iph->saddr, iph->daddr,
				   hp->source, hp->dest,
				   skb->dev, NFT_LOOKUP_ESTABLISHED);
	/* NOTE: assign_sock consumes our sk reference */
	if (sk && tproxy_sk_is_transparent(sk)) {
		/* This should be in a separate target, but we don't do multiple
		   targets on the same rule yet */
		skb->mark = (skb->mark & ~mark_mask) ^ mark_value;

		pr_debug("redirecting: proto %hhu %pI4:%hu -> %pI4:%hu, mark: %x\n",
			 iph->protocol, &iph->daddr, ntohs(hp->dest),
			 &laddr, ntohs(lport), skb->mark);

		nf_tproxy_assign_sock(skb, sk);
 ...
 /* assign a socket to the skb -- consumes sk */
static void
nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
{
	skb_orphan(skb);
	skb->sk = sk;
	skb->destructor = sock_edemux;
}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09 18:37   ` David Miller
@ 2015-04-09 19:14     ` Florian Westphal
  2015-04-09 21:07       ` David Miller
  2015-04-09 19:21     ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2015-04-09 19:14 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, sebastian.poehn, netdev

David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 09 Apr 2015 02:07:41 -0700
> 
> > TCP stack never sends packets attached to a socket in timewait state.
> 
> TPROXY assigns timewait sockets to skb->sk, that's the bug.

I re-introduced this in fd158d79d33d3c under the assumption
that the input path handles skb->sk timewait sockets correctly
after all the early demux changes, afaics tcp edemux can also
assign skb->sk timewait sockets.

Also, reporter mentions 3.8 as affected which should not assign
tw sockets to skb->sk.

Even more strange, the reporters backtrace seems to indicate
crash at end of forward path.

Sebastian, can you disable tw assignment via TPROXY in 3.12 just
to see if it makes a difference?

[ not doing the assignment is safe provided you still set tproxy mark
  on the skb; policy routing will ensure local delivery ].

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09 18:37   ` David Miller
  2015-04-09 19:14     ` Florian Westphal
@ 2015-04-09 19:21     ` Eric Dumazet
  2015-04-09 19:25       ` Florian Westphal
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2015-04-09 19:21 UTC (permalink / raw)
  To: David Miller; +Cc: sebastian.poehn, netdev

On Thu, 2015-04-09 at 14:37 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 09 Apr 2015 02:07:41 -0700
> 
> > TCP stack never sends packets attached to a socket in timewait state.
> 
> TPROXY assigns timewait sockets to skb->sk, that's the bug.
> 
> In net/netfilter/xt_TPROXY.c:
> 
> 	tproxy_tg4()
>  ...
> 		sk = nf_tproxy_get_sock_v4(dev_net(skb->dev), iph->protocol,
> 				   iph->saddr, iph->daddr,
> 				   hp->source, hp->dest,
> 				   skb->dev, NFT_LOOKUP_ESTABLISHED);
> 	/* NOTE: assign_sock consumes our sk reference */
> 	if (sk && tproxy_sk_is_transparent(sk)) {
> 		/* This should be in a separate target, but we don't do multiple
> 		   targets on the same rule yet */
> 		skb->mark = (skb->mark & ~mark_mask) ^ mark_value;
> 
> 		pr_debug("redirecting: proto %hhu %pI4:%hu -> %pI4:%hu, mark: %x\n",
> 			 iph->protocol, &iph->daddr, ntohs(hp->dest),
> 			 &laddr, ntohs(lport), skb->mark);
> 
> 		nf_tproxy_assign_sock(skb, sk);
>  ...
>  /* assign a socket to the skb -- consumes sk */
> static void
> nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
> {
> 	skb_orphan(skb);
> 	skb->sk = sk;
> 	skb->destructor = sock_edemux;
> }


Right, but stack trace shown by Sebastian seems to be an input frame,
and we transmit a frame. This is the part I do not understand, yet.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09 19:21     ` Eric Dumazet
@ 2015-04-09 19:25       ` Florian Westphal
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2015-04-09 19:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, sebastian.poehn, netdev

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2015-04-09 at 14:37 -0400, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Thu, 09 Apr 2015 02:07:41 -0700
> > 
> > > TCP stack never sends packets attached to a socket in timewait state.
> > 
> > TPROXY assigns timewait sockets to skb->sk, that's the bug.
> > 
> > In net/netfilter/xt_TPROXY.c:
> > 
> > 	tproxy_tg4()
> >  ...
> > 		sk = nf_tproxy_get_sock_v4(dev_net(skb->dev), iph->protocol,
> > 				   iph->saddr, iph->daddr,
> > 				   hp->source, hp->dest,
> > 				   skb->dev, NFT_LOOKUP_ESTABLISHED);
> > 	/* NOTE: assign_sock consumes our sk reference */
> > 	if (sk && tproxy_sk_is_transparent(sk)) {
> > 		/* This should be in a separate target, but we don't do multiple
> > 		   targets on the same rule yet */
> > 		skb->mark = (skb->mark & ~mark_mask) ^ mark_value;
> > 
> > 		pr_debug("redirecting: proto %hhu %pI4:%hu -> %pI4:%hu, mark: %x\n",
> > 			 iph->protocol, &iph->daddr, ntohs(hp->dest),
> > 			 &laddr, ntohs(lport), skb->mark);
> > 
> > 		nf_tproxy_assign_sock(skb, sk);
> >  ...
> >  /* assign a socket to the skb -- consumes sk */
> > static void
> > nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
> > {
> > 	skb_orphan(skb);
> > 	skb->sk = sk;
> > 	skb->destructor = sock_edemux;
> > }
> 
> 
> Right, but stack trace shown by Sebastian seems to be an input frame,
> and we transmit a frame. This is the part I do not understand, yet.

Maybe policy routing is wedged so skb is erronously entering forward
path?

After all TPROXY depends on correct marking of skb + ip rule magic
to redirect skb to localhost.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09 19:14     ` Florian Westphal
@ 2015-04-09 21:07       ` David Miller
  2015-04-09 21:21         ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2015-04-09 21:07 UTC (permalink / raw)
  To: fw; +Cc: eric.dumazet, sebastian.poehn, netdev

From: Florian Westphal <fw@strlen.de>
Date: Thu, 9 Apr 2015 21:14:41 +0200

> I re-introduced this in fd158d79d33d3c under the assumption
> that the input path handles skb->sk timewait sockets correctly
> after all the early demux changes, afaics tcp edemux can also
> assign skb->sk timewait sockets.
> 
> Also, reporter mentions 3.8 as affected which should not assign
> tw sockets to skb->sk.
> 
> Even more strange, the reporters backtrace seems to indicate
> crash at end of forward path.
> 
> Sebastian, can you disable tw assignment via TPROXY in 3.12 just
> to see if it makes a difference?
> 
> [ not doing the assignment is safe provided you still set tproxy mark
>   on the skb; policy routing will ensure local delivery ].

My assumption in my analysis is that TPROXY writes the socket to
skb->sk, and it is also being forwarded.  And yes this is based
upon his backtrace.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09 21:07       ` David Miller
@ 2015-04-09 21:21         ` Florian Westphal
  2015-04-10 11:14           ` Sebastian Poehn
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2015-04-09 21:21 UTC (permalink / raw)
  To: David Miller; +Cc: fw, eric.dumazet, sebastian.poehn, netdev

David Miller <davem@davemloft.net> wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Thu, 9 Apr 2015 21:14:41 +0200
> 
> > I re-introduced this in fd158d79d33d3c under the assumption
> > that the input path handles skb->sk timewait sockets correctly
> > after all the early demux changes, afaics tcp edemux can also
> > assign skb->sk timewait sockets.
> > 
> > Also, reporter mentions 3.8 as affected which should not assign
> > tw sockets to skb->sk.
> > 
> > Even more strange, the reporters backtrace seems to indicate
> > crash at end of forward path.
> > 
> > Sebastian, can you disable tw assignment via TPROXY in 3.12 just
> > to see if it makes a difference?
> > 
> > [ not doing the assignment is safe provided you still set tproxy mark
> >   on the skb; policy routing will ensure local delivery ].
> 
> My assumption in my analysis is that TPROXY writes the socket to
> skb->sk, and it is also being forwarded.  And yes this is based
> upon his backtrace.

Right.  However, I think we might have to make more changes than just tproxy.

If we have sockets bound to non-local addresses then why would tcp edemux
not cause same issue?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-09 21:21         ` Florian Westphal
@ 2015-04-10 11:14           ` Sebastian Poehn
  2015-04-13  8:04             ` Sebastian Poehn
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Poehn @ 2015-04-10 11:14 UTC (permalink / raw)
  To: Florian Westphal; +Cc: David Miller, eric.dumazet, sebastian.poehn, netdev

On Thu, 2015-04-09 at 23:21 +0200, Florian Westphal wrote:
> David Miller <davem@davemloft.net> wrote:
> > From: Florian Westphal <fw@strlen.de>
> > Date: Thu, 9 Apr 2015 21:14:41 +0200
> > 
> > > I re-introduced this in fd158d79d33d3c under the assumption
> > > that the input path handles skb->sk timewait sockets correctly
> > > after all the early demux changes, afaics tcp edemux can also
> > > assign skb->sk timewait sockets.
> > > 
> > > Also, reporter mentions 3.8 as affected which should not assign
> > > tw sockets to skb->sk.
> > > 
> > > Even more strange, the reporters backtrace seems to indicate
> > > crash at end of forward path.
> > > 
> > > Sebastian, can you disable tw assignment via TPROXY in 3.12 just
> > > to see if it makes a difference?
> > > 
> > > [ not doing the assignment is safe provided you still set tproxy mark
> > >   on the skb; policy routing will ensure local delivery ].
> > 
> > My assumption in my analysis is that TPROXY writes the socket to
> > skb->sk, and it is also being forwarded.  And yes this is based
> > upon his backtrace.
> 
> Right.  However, I think we might have to make more changes than just tproxy.
> 
> If we have sockets bound to non-local addresses then why would tcp edemux
> not cause same issue?

Thanks for all the helpful inputs. I will try to provide some more information
and try a bit around with TPROXY not assigning tw sockets.

I will provide you with an update the next days.

Unfortunately this is a very rare event and (yet) impossible to reproduce in-house.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-10 11:14           ` Sebastian Poehn
@ 2015-04-13  8:04             ` Sebastian Poehn
  2015-04-13 15:09               ` Sebastian Poehn
  0 siblings, 1 reply; 15+ messages in thread
From: Sebastian Poehn @ 2015-04-13  8:04 UTC (permalink / raw)
  To: Pöhn, Sebastian; +Cc: Eric Dumazet, netdev, David Miller, Florian Westphal

Am 10.04.2015 13:14 schrieb "Sebastian Poehn" <sebastian.poehn@gmail.com>:
>
> On Thu, 2015-04-09 at 23:21 +0200, Florian Westphal wrote:
> > David Miller <davem@davemloft.net> wrote:
> > > From: Florian Westphal <fw@strlen.de>
> > > Date: Thu, 9 Apr 2015 21:14:41 +0200
> > >
> > > > I re-introduced this in fd158d79d33d3c under the assumption
> > > > that the input path handles skb->sk timewait sockets correctly
> > > > after all the early demux changes, afaics tcp edemux can also
> > > > assign skb->sk timewait sockets.
> > > >
> > > > Also, reporter mentions 3.8 as affected which should not assign
> > > > tw sockets to skb->sk.
> > > >
> > > > Even more strange, the reporters backtrace seems to indicate
> > > > crash at end of forward path.
> > > >
> > > > Sebastian, can you disable tw assignment via TPROXY in 3.12 just
> > > > to see if it makes a difference?
> > > >
> > > > [ not doing the assignment is safe provided you still set tproxy mark
> > > >   on the skb; policy routing will ensure local delivery ].
> > >
> > > My assumption in my analysis is that TPROXY writes the socket to
> > > skb->sk, and it is also being forwarded.  And yes this is based
> > > upon his backtrace.
> >
> > Right.  However, I think we might have to make more changes than just tproxy.
> >
> > If we have sockets bound to non-local addresses then why would tcp edemux
> > not cause same issue?
>
> Thanks for all the helpful inputs. I will try to provide some more information
> and try a bit around with TPROXY not assigning tw sockets.
>
> I will provide you with an update the next days.
>
> Unfortunately this is a very rare event and (yet) impossible to reproduce in-house.
>
>

Played around with sending crafted packets to a transparent tw socket.

For SYN tproxy does the re-lookup of the listening socket, which is fine. But for
packets without SYN is assigns the tw socket. However this is not an issue as the
fw mark is set, policy routing processes frame, so it becomes input and finally is
dropped in TCP receive path. But if I remove the policy routing rule the frame
enters the forwarding path.

Unfortunately this did not trigger the panic but this may be just by chance.

However I can't explain what is wrong with the ip rule maybe setup related.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-13  8:04             ` Sebastian Poehn
@ 2015-04-13 15:09               ` Sebastian Poehn
  2015-04-13 15:39                 ` Eric Dumazet
  2015-04-13 16:04                 ` Florian Westphal
  0 siblings, 2 replies; 15+ messages in thread
From: Sebastian Poehn @ 2015-04-13 15:09 UTC (permalink / raw)
  To: Sebastian Poehn; +Cc: Eric Dumazet, netdev, David Miller, Florian Westphal

On Mon, 2015-04-13 at 10:04 +0200, Sebastian Poehn wrote:
> 
> Played around with sending crafted packets to a transparent tw socket.
> 
> For SYN tproxy does the re-lookup of the listening socket, which is fine. But for
> packets without SYN is assigns the tw socket. However this is not an issue as the
> fw mark is set, policy routing processes frame, so it becomes input and finally is
> dropped in TCP receive path. But if I remove the policy routing rule the frame
> enters the forwarding path.
> 
> Unfortunately this did not trigger the panic but this may be just by chance.
> 
> However I can't explain what is wrong with the ip rule maybe setup related.
> 
First of all: This issue will only happen if there is something screwed up with 
policy routing. We don't use any 'exotic' policy to match the TPROXY traffic nor
is there anything that could damage the mark.

ip rule add from all fwmark 0x1/0x1 lookup X

However it happens - maybe a race with configuration.

I found TPROXY behavior correct:
1) For SYN on tw socket it assigns listening socket
2) Otherwise tw socket is assigned with is required for protocol conformity

Principally the problem is that TPROXY cannot ensure that policy routing is
working correctly. Florian suggested me to clean skb->sk in ip_forward. I even think
dropping the frame is fine. Not sure if this is suited for mainline.

diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 939992c..2fc3b3e 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -82,6 +82,10 @@ int ip_forward(struct sk_buff *skb)
 	if (skb->pkt_type != PACKET_HOST)
 		goto drop;
 
+	/* this should happen neither */
+	if (unlikely(skb->sk))
+		goto drop;
+
 	if (skb_warn_if_lro(skb))
 		goto drop;
 		
Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com> 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-13 15:09               ` Sebastian Poehn
@ 2015-04-13 15:39                 ` Eric Dumazet
  2015-04-13 17:25                   ` David Miller
  2015-04-13 16:04                 ` Florian Westphal
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2015-04-13 15:39 UTC (permalink / raw)
  To: Sebastian Poehn; +Cc: netdev, David Miller, Florian Westphal

On Mon, 2015-04-13 at 17:09 +0200, Sebastian Poehn wrote:

> Principally the problem is that TPROXY cannot ensure that policy routing is
> working correctly. Florian suggested me to clean skb->sk in ip_forward. I even think
> dropping the frame is fine. Not sure if this is suited for mainline.

No objections from me, this looks very reasonable.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-13 15:09               ` Sebastian Poehn
  2015-04-13 15:39                 ` Eric Dumazet
@ 2015-04-13 16:04                 ` Florian Westphal
  1 sibling, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2015-04-13 16:04 UTC (permalink / raw)
  To: Sebastian Poehn; +Cc: Eric Dumazet, netdev, David Miller, Florian Westphal

Sebastian Poehn <sebastian.poehn@gmail.com> wrote:
> On Mon, 2015-04-13 at 10:04 +0200, Sebastian Poehn wrote:
> > 
> > Played around with sending crafted packets to a transparent tw socket.
> > 
> > For SYN tproxy does the re-lookup of the listening socket, which is fine. But for
> > packets without SYN is assigns the tw socket. However this is not an issue as the
> > fw mark is set, policy routing processes frame, so it becomes input and finally is
> > dropped in TCP receive path. But if I remove the policy routing rule the frame
> > enters the forwarding path.
> > 
> > Unfortunately this did not trigger the panic but this may be just by chance.
> > 
> > However I can't explain what is wrong with the ip rule maybe setup related.
> > 
> First of all: This issue will only happen if there is something screwed up with 
> policy routing. We don't use any 'exotic' policy to match the TPROXY traffic nor
> is there anything that could damage the mark.
> 
> ip rule add from all fwmark 0x1/0x1 lookup X
> 
> However it happens - maybe a race with configuration.
> 
> I found TPROXY behavior correct:
> 1) For SYN on tw socket it assigns listening socket
> 2) Otherwise tw socket is assigned with is required for protocol conformity
> 
> Principally the problem is that TPROXY cannot ensure that policy routing is
> working correctly. Florian suggested me to clean skb->sk in ip_forward. I even think
> dropping the frame is fine. Not sure if this is suited for mainline.

I agree, drop is preferable.  I also think this should go in mainline,
kernel shouldn't be prone to oopses just because someone flushed ip rules at wrong
moment.

Thanks Sebastian.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
  2015-04-13 15:39                 ` Eric Dumazet
@ 2015-04-13 17:25                   ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2015-04-13 17:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: sebastian.poehn, netdev, fw

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 13 Apr 2015 08:39:04 -0700

> On Mon, 2015-04-13 at 17:09 +0200, Sebastian Poehn wrote:
> 
>> Principally the problem is that TPROXY cannot ensure that policy routing is
>> working correctly. Florian suggested me to clean skb->sk in ip_forward. I even think
>> dropping the frame is fine. Not sure if this is suited for mainline.
> 
> No objections from me, this looks very reasonable.

Someone please submit this to me formally.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-04-13 17:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-09  8:09 [FYI] xfrm: Don't lookup sk_policy for timewait sockets Sebastian Poehn
2015-04-09  9:07 ` Eric Dumazet
2015-04-09  9:24   ` Sebastian Poehn
2015-04-09 18:37   ` David Miller
2015-04-09 19:14     ` Florian Westphal
2015-04-09 21:07       ` David Miller
2015-04-09 21:21         ` Florian Westphal
2015-04-10 11:14           ` Sebastian Poehn
2015-04-13  8:04             ` Sebastian Poehn
2015-04-13 15:09               ` Sebastian Poehn
2015-04-13 15:39                 ` Eric Dumazet
2015-04-13 17:25                   ` David Miller
2015-04-13 16:04                 ` Florian Westphal
2015-04-09 19:21     ` Eric Dumazet
2015-04-09 19:25       ` Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).