From mboxrd@z Thu Jan 1 00:00:00 1970
From: Patrick McHardy
Subject: Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
Date: Wed, 10 Jan 2007 13:19:39 +0100
Message-ID: <45A4D9DB.4000809@trash.net>
References: <20070103163357.14635.37754.stgit@nienna.balabit>
 <20070103163427.14635.49596.stgit@nienna.balabit>
 <45A48BD6.5010507@trash.net>
 <200701101117.47576@nienna>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Cc: netfilter-devel@lists.netfilter.org, netdev@vger.kernel.org, Balazs Scheidler
Return-path:
To: KOVACS Krisztian
In-Reply-To: <200701101117.47576@nienna>
Sender: netdev-owner@vger.kernel.org
List-Id: netfilter-devel.vger.kernel.org

KOVACS Krisztian wrote:
> On Wednesday 10 January 2007 07:46, Patrick McHardy wrote:
>
>>>+       if (sk) {
>>>+               sock_hold(sk);
>>>+               skb->sk = sk;
>>
>>This looks racy, the socket could be closed between the lookup and
>>the actual use. Why do you need the socket lookup at all, can't
>>you just divert all packets selected by iptables?
>
>
> Yes, it's racy, but I think this is true for the "regular" socket
> lookup, too. Take UDP for example: __udp4_lib_rcv() does the socket
> lookup, gets a reference to the socket, and then calls
> udp_queue_rcv_skb() to queue the skb. As far as I can see there's
> nothing there which prevents the socket from being closed between
> these calls. sk_common_release() even documents this behaviour:
>
> [...]
>         if (sk->sk_prot->destroy)
>                 sk->sk_prot->destroy(sk);
>
>         /*
>          * Observation: when sock_common_release is called, processes have
>          * no access to socket. But net still has.
>          * Step one, detach it from networking:
>          *
>          * A. Remove from hash tables.
>          */
>
>         sk->sk_prot->unhash(sk);
>
>         /*
>          * In this point socket cannot receive new packets, but it is possible
>          * that some packets are in flight because some CPU runs receiver and
>          * did hash table lookup before we unhashed socket. They will achieve
>          * receive queue and will be purged by socket destructor.
>          *
>          * Also we still have packets pending on receive queue and probably,
>          * our own packets waiting in device queues. sock_destroy will drain
>          * receive queue, but transmitted packets will delay socket destruction
>          * until the last reference will be released.
>          */
> [...]
>
> Of course it's true that doing early lookups and storing that reference
> in the skb widens the window considerably, but I think this race is
> already handled. Or is there anything I don't see?

You're right, it seems to be handled properly (except I think there
is a race between sk_common_release calling xfrm_sk_free_policy and
e.g. udp calling __xfrm_policy_check, will look into that). It
probably shouldn't be cached anyway, with nf_queue for example the
window could be _really_ large.
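
For readers following the thread, the behaviour described in the quoted
comment can be modelled in user space. The sketch below is only an
illustration, not kernel code: struct toy_sock and the toy_* helpers are
hypothetical names, the counter is a plain int rather than an atomic, and
there is no real hash table or locking. It shows that a receiver which
looked the socket up before it was unhashed keeps the socket alive, and
that a packet queued late is purged when the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct sock; not the kernel type. */
struct toy_sock {
	int refcnt;       /* references held by the owner and in-flight users */
	bool hashed;      /* still reachable through toy_lookup()? */
	int queued;       /* packets sitting on the receive queue */
	int *purged_out;  /* where the destructor reports how many it purged */
};

/* One-slot "hash table", enough to model lookup and unhash. */
static struct toy_sock *toy_table;

static struct toy_sock *toy_sock_create(int *purged_out)
{
	struct toy_sock *sk = calloc(1, sizeof(*sk));

	if (!sk)
		return NULL;
	sk->refcnt = 1;   /* reference owned by the creating process */
	sk->hashed = true;
	sk->purged_out = purged_out;
	toy_table = sk;
	return sk;
}

/* Drop one reference; the last put runs the "destructor". */
static void toy_sock_put(struct toy_sock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0) {
		*sk->purged_out = sk->queued;  /* purge the receive queue */
		free(sk);
	}
}

/* Receive-path lookup: only hashed sockets are found; takes a reference. */
static struct toy_sock *toy_lookup(void)
{
	struct toy_sock *sk = toy_table;

	if (sk && sk->hashed) {
		sk->refcnt++;
		return sk;
	}
	return NULL;
}

/* Close path: unhash first, then drop the owner's reference. */
static void toy_release(struct toy_sock *sk)
{
	sk->hashed = false;
	toy_table = NULL;
	toy_sock_put(sk);
}

/* Returns the number of packets purged by the destructor, or < 0 on error. */
int toy_demo(void)
{
	int purged = -1;
	struct toy_sock *owner = toy_sock_create(&purged);
	struct toy_sock *rx;

	if (!owner)
		return -1;
	rx = toy_lookup();          /* receiver looked up before the close */
	toy_release(owner);         /* socket closed and unhashed */
	if (toy_lookup() != NULL)   /* new lookups must now fail */
		return -2;
	if (purged != -1)           /* still alive: receiver holds a reference */
		return -3;
	rx->queued++;               /* in-flight packet reaches the queue */
	toy_sock_put(rx);           /* last reference: destructor purges it */
	return purged;
}
```

The longer the receiver sits between toy_lookup() and toy_sock_put(),
the longer the closed socket stays allocated, which is the concern with
caching the reference in the skb across something like nf_queue.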