From: Olaf Kirch
Subject: [PATCH] Prevent crash on ip_conntrack removal
Date: Wed, 18 Aug 2004 11:13:52 +0200
Message-ID: <20040818091352.GB6507@suse.de>
To: netdev@oss.sgi.com
Cc: netfilter-devel@lists.netfilter.org

Hi,

here's a patch that keeps us from crashing on removal of ip_conntrack.
This problem came up during IBM's testing of SLES. I'm not sure whether
this issue has been submitted already.

Problem description courtesy of David Stevens:

It appears that conntrack, when loaded, queues fragments for reassembly
pre-routing (i.e. while skb->dst is still NULL) and hands the fully
reassembled packet to the pre-routing code, which sets skb->dst before
using it. IP without conntrack queues fragments and reassembles them
post-routing, so in that case skb->dst is set for all fragments and for
the reassembled packet.

In the failure scenario, conntrack has queued some of the fragments
(with skb->dst == NULL, notably the offset=0 first fragment), and the
conntrack module is then removed. A fragment arriving afterwards is
queued and the whole packet is reassembled post-routing, but the first
fragment still has skb->dst == NULL, so the kernel crashes when it
dereferences it.

To fix this, the patch below simply drops such skbs. A different fix
would be to change the conntrack module to flush out all unassembled
fragments when it is unloaded; an alternative patch doing that is
attached as well (this one is completely untested).

Cheers
Olaf
-- 
Olaf Kirch   |  The Hardware Gods hate me.
okir@suse.de |  ---------------+

[Attachment: netfilter-unload-crash]

Index: v2.6.8/net/ipv4/ip_input.c
===================================================================
--- v2.6.8.orig/net/ipv4/ip_input.c
+++ v2.6.8/net/ipv4/ip_input.c
@@ -177,6 +177,13 @@ int ip_call_ra_chain(struct sk_buff *skb
 					read_unlock(&ip_ra_lock);
 					return 1;
 				}
+				/* When ip_conntrack gets unloaded, we may be
+				 * left with fragment chains where the first
+				 * fragment has skb->dst = NULL. */
+				if (skb->dst == NULL) {
+					kfree_skb(skb);
+					return 1;
+				}
 			}
 			if (last) {
 				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
@@ -277,6 +284,13 @@ int ip_local_deliver(struct sk_buff *skb
 		skb = ip_defrag(skb);
 		if (!skb)
 			return 0;
+		/* When ip_conntrack gets unloaded, we may be
+		 * left with fragment chains where the first
+		 * fragment has skb->dst = NULL. */
+		if (skb->dst == NULL) {
+			kfree_skb(skb);
+			return 0;
+		}
 	}
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,

[Attachment: conntrack-flush-fragments]

Alternative fix for the crash on conntrack unload. Simply flush all
fragment queues when unloading conntrack_standalone so that there are
no partially assembled fragments left with skb->dst == NULL.
Index: v2.6.8/include/net/ip.h
===================================================================
--- v2.6.8.orig/include/net/ip.h
+++ v2.6.8/include/net/ip.h
@@ -255,6 +255,7 @@ extern int ip_call_ra_chain(struct sk_bu
  */
 struct sk_buff *ip_defrag(struct sk_buff *skb);
+extern void ipfrag_flush(void);
 extern int ip_frag_nqueues;
 extern atomic_t ip_frag_mem;
Index: v2.6.8/net/ipv4/ip_fragment.c
===================================================================
--- v2.6.8.orig/net/ipv4/ip_fragment.c
+++ v2.6.8/net/ipv4/ip_fragment.c
@@ -239,13 +239,13 @@ static void ipq_kill(struct ipq *ipq)
 /* Memory limiting on fragments. Evictor trashes the oldest
  * fragment queue until we are back under the low threshold.
  */
-static void ip_evictor(void)
+static void __ip_evictor(int threshold)
 {
 	struct ipq *qp;
 	struct list_head *tmp;
 
 	for(;;) {
-		if (atomic_read(&ip_frag_mem) <= sysctl_ipfrag_low_thresh)
+		if (atomic_read(&ip_frag_mem) <= threshold)
 			return;
 		read_lock(&ipfrag_lock);
 		if (list_empty(&ipq_lru_list)) {
@@ -267,6 +267,11 @@ static void ip_evictor(void)
 	}
 }
 
+static inline void ip_evictor(void)
+{
+	__ip_evictor(sysctl_ipfrag_low_thresh);
+}
+
 /*
  * Oops, a fragment queue timed out. Kill it and send an ICMP reply.
  */
@@ -677,4 +682,10 @@ void ipfrag_init(void)
 	add_timer(&ipfrag_secret_timer);
 }
 
+void ipfrag_flush(void)
+{
+	__ip_evictor(0);
+}
+
 EXPORT_SYMBOL(ip_defrag);
+EXPORT_SYMBOL(ipfrag_flush);
Index: v2.6.8/net/ipv4/netfilter/ip_conntrack_standalone.c
===================================================================
--- v2.6.8.orig/net/ipv4/netfilter/ip_conntrack_standalone.c
+++ v2.6.8/net/ipv4/netfilter/ip_conntrack_standalone.c
@@ -562,6 +562,8 @@ static int init_or_cleanup(int init)
 	nf_unregister_hook(&ip_conntrack_defrag_local_out_ops);
 cleanup_defragops:
 	nf_unregister_hook(&ip_conntrack_defrag_ops);
+	/* Frag queues may hold fragments with skb->dst == NULL */
+	ipfrag_flush();
 cleanup_proc:
 	proc_net_remove("ip_conntrack");
 cleanup_init: