From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751996AbaKDBtW (ORCPT ); Mon, 3 Nov 2014 20:49:22 -0500 Received: from mail-pd0-f194.google.com ([209.85.192.194]:64764 "EHLO mail-pd0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751046AbaKDBtU (ORCPT ); Mon, 3 Nov 2014 20:49:20 -0500 From: "billbonaparte" To: Cc: , "Pablo Neira Ayuso" , "Patrick McHardy" , , , "Changli Gao" , "Jesper Dangaard Brouer" , "Andrey Vagin" Subject: Re: netfilter: nf_conntrack: there maybe a bug in __nf_conntrack_confirm, when it race against get_next_corpse Date: Tue, 4 Nov 2014 09:48:32 +0800 Message-ID: <012601cff7d1$7ce2d620$76a88260$@gmail.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0127_01CFF814.8B094A70" X-Mailer: Microsoft Outlook 14.0 Thread-Index: Ac/30PYSucSRULmNTf2aYVpAYaLsJg== Content-Language: zh-cn Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a multipart message in MIME format. ------=_NextPart_000_0127_01CFF814.8B094A70 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit (sorry to send this e-mail again, last mail is rejected by server due to non-acceptable content) Florian Westphal [mailto:fw@strlen.de] wrote: >Correct. This is broken since the central spin lock removal, since >nf_conntrack_lock no longer protects both get_next_corpse and >conntrack_confirm. > >Please send a patch, moving dying check after removal of conntrack from >the percpu list, Since unconfirmed conntrack is stored in unconfirmed-list which is per-cpu list and protected by per-cpu spin-lock, we can remove it from uncomfirmed-list and insert it into ct-hash-table separately. that is to say, we can remove it from uncomfirmed-list without holding corresponding hash-lock, then check if it is dying. if it is dying, we add it to the dying-list, then quit __nf_conntrack_confirm. we do this to follow the rules that the conntrack must alternatively at unconfirmed-list or dying-list when it is abort to be destroyed. >> 2. operation on ct->status should be atomic, because it race aginst >> get_next_corpse. > >Alternatively we could also get rid of the unconfirmed list handling in get_next_corpse, >it looks to me as if its simply not worth the trouble to also caring >about unconfirmed lists. yes, I think so. if there is a race at operating ct->status, there will be in alternative case: 1) IPS_DYING bit which set in get_next_corpse override other bits (e.g. IPS_SRC_NAT_DONE_BIT), or 2) other bits (e.g. IPS_SRC_NAT_DONE_BIT) which set in nf_nat_setup_info override IPS_DYING bit. but, any case seems to be okay. ------=_NextPart_000_0127_01CFF814.8B094A70 Content-Type: application/octet-stream; name="fix_conntrack_confirm_race.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="fix_conntrack_confirm_race.patch" >>From c454ca5a96f5b6f815fe29cc2c91c92d719d7b95 Mon Sep 17 00:00:00 2001=0A= From: bill bonaparte =0A= Date: Mon, 3 Nov 2014 17:13:51 +0800=0A= Subject: [PATCH] netfilter: nf_conntrack: fix a race in=0A= __nf_conntrack_confirm against nf_ct_get_next_corpse=0A= =0A= After we remove central spinlock nf_conntrack_lock, we get the race = against nf_ct_get_next_corpse again,=0A= to get rid of this race, we should remove the conntrack from the = unconfirmed-list (which is per-cpu list) firstly,=0A= then check if it is dying.=0A= ---=0A= net/netfilter/nf_conntrack_core.c | 28 ++++++++++++++++------------=0A= 1 files changed, 16 insertions(+), 12 deletions(-)=0A= =0A= diff --git a/net/netfilter/nf_conntrack_core.c = b/net/netfilter/nf_conntrack_core.c=0A= index 5016a69..5d54a18 100644=0A= --- a/net/netfilter/nf_conntrack_core.c=0A= +++ b/net/netfilter/nf_conntrack_core.c=0A= @@ -589,6 +589,22 @@ __nf_conntrack_confirm(struct sk_buff *skb)=0A= =0A= zone =3D nf_ct_zone(ct);=0A= local_bh_disable();=0A= + =0A= + /* We have to check the DYING flag after unlink the conntrack=0A= + to prevent a race against nf_ct_get_next_corpse() possibly =0A= + called from user context, else we insert an already 'dead' hash, =0A= + blocking further use of that particular connection -JM */=0A= + nf_ct_del_from_dying_or_unconfirmed_list(ct);=0A= + =0A= + if (unlikely(nf_ct_is_dying(ct))) {=0A= + /* let's follow the rules that the conntrack must=0A= + alternatively at the unconfirmed-list or dying-list=0A= + when it is abort to be destoryed =0A= + */=0A= + nf_ct_add_to_dying_list(ct);=0A= + local_bh_enable();=0A= + return NF_ACCEPT;=0A= + }=0A= =0A= do {=0A= sequence =3D read_seqcount_begin(&net->ct.generation);=0A= @@ -611,16 +627,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)=0A= */=0A= NF_CT_ASSERT(!nf_ct_is_confirmed(ct));=0A= pr_debug("Confirming conntrack %p\n", ct);=0A= - /* We have to check the DYING flag inside the lock to prevent=0A= - a race against nf_ct_get_next_corpse() possibly called from=0A= - user context, else we insert an already 'dead' hash, blocking=0A= - further use of that particular connection -JM */=0A= -=0A= - if (unlikely(nf_ct_is_dying(ct))) {=0A= - nf_conntrack_double_unlock(hash, reply_hash);=0A= - local_bh_enable();=0A= - return NF_ACCEPT;=0A= - }=0A= =0A= /* See if there's one in the list already, including reverse:=0A= NAT could have grabbed it without realizing, since we're=0A= @@ -636,8 +642,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)=0A= zone =3D=3D nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))=0A= goto out;=0A= =0A= - nf_ct_del_from_dying_or_unconfirmed_list(ct);=0A= -=0A= /* Timer relative to confirmation time, not original=0A= setting time, otherwise we'd get timer wrap in=0A= weird delay cases. */=0A= -- =0A= 1.7.5.4=0A= =0A= ------=_NextPart_000_0127_01CFF814.8B094A70--