All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Poehn <sebastian.poehn@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Poehn <sebastian.poehn@gmail.com>, netdev@vger.kernel.org
Subject: Re: [FYI] xfrm: Don't lookup sk_policy for timewait sockets
Date: Thu, 09 Apr 2015 11:24:28 +0200	[thread overview]
Message-ID: <1428571468.6875.17.camel@googlemail.com> (raw)
In-Reply-To: <1428570461.25985.240.camel@edumazet-glaptop2.roam.corp.google.com>

On Thu, 2015-04-09 at 02:07 -0700, Eric Dumazet wrote:
> On Thu, 2015-04-09 at 10:09 +0200, Sebastian Poehn wrote:
> > We are running a couple of thousand machines with 3.8 and 3.12. On very few systems
> > (something below 10) we encounter panics in xfrm code. The main characteristic seams
> > to be the usage of TPROXY.
> > 
> > Attached patch is only a workaround, as problems may also happen in other code portions
> > (actually on even fewer systems this happens).
> > 
> > For timewait sockets the memory region of sk_policy does not belong
> > to us anymore. So there may be someone else using it and we may panic
> > because of corrupted pointers.
> > 
> > xfrm_sk_policy_lookup+0x38/0x66
> > xfrm_lookup+0x93/0x48f
> > nf_nat_packet+0x92/0xa4 [nf_nat]
> > _decode_session4+0xd9/0x294
> > nf_xfrm_me_harder+0x50/0xc5 [nf_nat]
> > nf_nat_ipv4_out+0xad/0xc4 [iptable_nat]
> > nf_iterate+0x42/0x7d
> > ip_finish_output2+0x2b1/0x2b1
> > nf_hook_slow+0x22f/0x2c9
> > ip_finish_output2+0x2b1/0x2b1
> > ip_finish_output2+0x2b1/0x2b1
> > __xfrm_route_forward+0x7a/0x97
> > ip_finish_output2+0x2b1/0x2b1
> > NF_HOOK_COND+0x3f/0x54
> > ip_output+0x5a/0x5e
> > __netif_receive_skb+0x4b2/0x514
> > process_backlog+0xee/0x1c5
> > net_rx_action+0xa7/0x1fe
> > 
> > Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com>
> > ---
> >  net/xfrm/xfrm_policy.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> > index 9c6b1ab..e9a74fa 100644
> > --- a/net/xfrm/xfrm_policy.c
> > +++ b/net/xfrm/xfrm_policy.c
> > @@ -2072,7 +2072,7 @@ restart:
> >  	xdst = NULL;
> >  	route = NULL;
> >  
> > -	if (sk && sk->sk_policy[XFRM_POLICY_OUT]) {
> > +	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[XFRM_POLICY_OUT]) {
> >  		num_pols = 1;
> >  		pols[0] = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl);
> >  		err = xfrm_expand_policies(fl, family, pols,
> > @@ -2349,7 +2349,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
> >  	}
> >  
> >  	pol = NULL;
> > -	if (sk && sk->sk_policy[dir]) {
> > +	if (sk && sk->sk_state != TCP_TIME_WAIT && sk->sk_policy[dir]) {
> >  		pol = xfrm_sk_policy_lookup(sk, dir, &fl);
> >  		if (IS_ERR(pol)) {
> >  			XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
> 
> Hmm.. interesting.
> 
> TCP stack never sends packets attached to a socket in timewait state.
> 
> On IPv4, the ACK packets sent on behalf of TIME_WAIT follow this path :
> 
> tcp_v4_timewait_ack()
>  tcp_v4_send_ack()
>   ip_send_unicast_reply(*this_cpu_ptr(net->ipv4.tcp_sk))
> 
> The per cpu socket used to attach these skb are not in TCP_TIME_WAIT
> state.
> 
> TPROXY can handle a socket in TCP_TIME_WAIT state only in input path.
> 
> So I am a bit confused. Could you elaborate ?

The setup is a usual transparent proxy one. So TPROXY intercepts the
first connection packet and later on we use socket match to direct to
the local IP_TRANSPARENT socket. So sorry for the confusion - xt_socket
was meant.

The strange thing I noticed is that tw_transparent was 0 (we don't use
xt_socket --transparent). But then I wonder how the tuple can match.

I will collect some more information and post them.
> 
> In any case, you probably should use sk_fullsock() new helper, as I
> presume you'll have a similar issue with TCP_NEW_SYN_RECV pseudo sockets
> when I am done with tcp listener refactoring.

I already saw and like sk_fullsock.
> 
> Thanks.
> 
> 

  reply	other threads:[~2015-04-09  9:24 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-09  8:09 [FYI] xfrm: Don't lookup sk_policy for timewait sockets Sebastian Poehn
2015-04-09  9:07 ` Eric Dumazet
2015-04-09  9:24   ` Sebastian Poehn [this message]
2015-04-09 18:37   ` David Miller
2015-04-09 19:14     ` Florian Westphal
2015-04-09 21:07       ` David Miller
2015-04-09 21:21         ` Florian Westphal
2015-04-10 11:14           ` Sebastian Poehn
2015-04-13  8:04             ` Sebastian Poehn
2015-04-13 15:09               ` Sebastian Poehn
2015-04-13 15:39                 ` Eric Dumazet
2015-04-13 17:25                   ` David Miller
2015-04-13 16:04                 ` Florian Westphal
2015-04-09 19:21     ` Eric Dumazet
2015-04-09 19:25       ` Florian Westphal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1428571468.6875.17.camel@googlemail.com \
    --to=sebastian.poehn@gmail.com \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.