* [PATCH] igmp: fix ip_mc_sf_allow race
@ 2009-12-30 14:23 Flavio Leitner
2010-01-04 5:54 ` David Miller
0 siblings, 1 reply; 16+ messages in thread
From: Flavio Leitner @ 2009-12-30 14:23 UTC (permalink / raw)
To: netdev; +Cc: Flavio Leitner
Almost all igmp functions accessing inet->mc_list are protected by
rtnl_lock(), but there is one exception, which is ip_mc_sf_allow(),
so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
removing an entry while ip_mc_sf_allow is running, causing a crash.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
net/ipv4/igmp.c | 56 +++++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 76c0840..d1e2a0c 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr)
iml->next = inet->mc_list;
iml->sflist = NULL;
iml->sfmode = MCAST_EXCLUDE;
- inet->mc_list = iml;
+ rcu_assign_pointer(inet->mc_list, iml);
ip_mc_inc_group(in_dev, addr);
err = 0;
done:
@@ -1854,11 +1854,12 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
(void) ip_mc_leave_src(sk, iml, in_dev);
- *imlp = iml->next;
+ rcu_assign_pointer(*imlp, iml->next);
if (in_dev)
ip_mc_dec_group(in_dev, group);
rtnl_unlock();
+ synchronize_rcu();
sock_kfree_s(sk, iml, sizeof(*iml));
return 0;
}
@@ -2209,30 +2210,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif)
struct ip_mc_socklist *pmc;
struct ip_sf_socklist *psl;
int i;
+ int ret;
+ ret = 1;
if (!ipv4_is_multicast(loc_addr))
- return 1;
+ goto out;
- for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
+ rcu_read_lock();
+ for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) {
if (pmc->multi.imr_multiaddr.s_addr == loc_addr &&
pmc->multi.imr_ifindex == dif)
break;
}
+ ret = inet->mc_all;
if (!pmc)
- return inet->mc_all;
+ goto unlock;
psl = pmc->sflist;
+ ret = (pmc->sfmode == MCAST_EXCLUDE);
if (!psl)
- return pmc->sfmode == MCAST_EXCLUDE;
+ goto unlock;
for (i=0; i<psl->sl_count; i++) {
if (psl->sl_addr[i] == rmt_addr)
break;
}
+ ret = 0;
if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count)
- return 0;
+ goto unlock;
if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count)
- return 0;
- return 1;
+ goto unlock;
+ ret = 1;
+unlock:
+ rcu_read_unlock();
+out:
+ return ret;
}
/*
@@ -2243,15 +2254,21 @@ void ip_mc_drop_socket(struct sock *sk)
{
struct inet_sock *inet = inet_sk(sk);
struct ip_mc_socklist *iml;
+ struct ip_mc_socklist *rmlist;
struct net *net = sock_net(sk);
- if (inet->mc_list == NULL)
+ rcu_read_lock();
+ if (rcu_dereference(inet->mc_list) == NULL) {
+ rcu_read_unlock();
return;
+ }
+ rcu_read_unlock();
rtnl_lock();
+ rmlist = NULL;
while ((iml = inet->mc_list) != NULL) {
struct in_device *in_dev;
- inet->mc_list = iml->next;
+ rcu_assign_pointer(inet->mc_list, iml->next);
in_dev = inetdev_by_index(net, iml->multi.imr_ifindex);
(void) ip_mc_leave_src(sk, iml, in_dev);
@@ -2259,9 +2276,24 @@ void ip_mc_drop_socket(struct sock *sk)
ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
in_dev_put(in_dev);
}
- sock_kfree_s(sk, iml, sizeof(*iml));
+ if (rmlist == NULL) {
+ iml->next = NULL;
+ rmlist = iml;
+ }
+ else {
+ iml->next = rmlist->next;
+ rmlist->next = iml;
+ }
}
rtnl_unlock();
+
+ synchronize_rcu();
+
+ while (rmlist != NULL) {
+ iml = rmlist;
+ rmlist = iml->next;
+ sock_kfree_s(sk, iml, sizeof(*iml));
+ }
}
int ip_check_mc(struct in_device *in_dev, __be32 mc_addr, __be32 src_addr, u16 proto)
--
1.6.5.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race
2009-12-30 14:23 [PATCH] igmp: fix ip_mc_sf_allow race Flavio Leitner
@ 2010-01-04 5:54 ` David Miller
2010-01-04 11:29 ` Flavio Leitner
0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2010-01-04 5:54 UTC (permalink / raw)
To: fleitner; +Cc: netdev
From: Flavio Leitner <fleitner@redhat.com>
Date: Wed, 30 Dec 2009 12:23:25 -0200
> Almost all igmp functions accessing inet->mc_list are protected by
> rtnl_lock(), but there is one exception, which is ip_mc_sf_allow(),
> so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
> removing an entry while ip_mc_sf_allow is running, causing a crash.
>
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Have you triggered this in practice or is this due purely
to code inspection?
That new synchronize_rcu() is very expensive and will decrease
the rate at which groups can be joined and left, _especially_
on high cpu count machines.
I do not think it is therefore a suitable solution to this
race, if it does in fact exist.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race
2010-01-04 5:54 ` David Miller
@ 2010-01-04 11:29 ` Flavio Leitner
2010-01-04 13:07 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Flavio Leitner @ 2010-01-04 11:29 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On Sun, Jan 03, 2010 at 09:54:41PM -0800, David Miller wrote:
> From: Flavio Leitner <fleitner@redhat.com>
> Date: Wed, 30 Dec 2009 12:23:25 -0200
>
> > Almost all igmp functions accessing inet->mc_list are protected by
> > rtnl_lock(), but there is one exception, which is ip_mc_sf_allow(),
> > so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
> > removing an entry while ip_mc_sf_allow is running, causing a crash.
> >
> > Signed-off-by: Flavio Leitner <fleitner@redhat.com>
>
> Have you triggered this in practice or is this due purely
> to code inspection?
I had to modify the code to reproduce it, introducing a delay in
ip_mc_sf_allow(), but a customer is able to reproduce it when
avahi-daemon runs at boot time.
CPU: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz stepping 05
BUG: unable to handle kernel paging request at virtual address 005e0005
printing eip:
c05f2194
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/pci0000:7f/0000:7f:06.0/irq
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp l2cap
bluetooth sunrpc 8021q ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh
video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac
parport_pc lp parport sg e1000(U) pcspkr dm_raid45 dm_message
dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod
hfcldd(FU) sd_mod scsi_mod hfcldd_conf(U) hraslog_link(U) ext3 jbd
uhci_hcd ohci_hcd ehci_hcd
CPU: 0
EIP: 0060:[<c05f2194>] Tainted: GF VLI
EFLAGS: 00210202 (2.6.18-128.el5PAE #1)
EIP is at ip_mc_sf_allow+0x20/0x79
eax: 005e0001 ebx: f6cb9100 ecx: 00000008 edx: fb0000e0
esi: 5acb10ac edi: f63414e9 ebp: f7ae3200 esp: c0732ea4
ds: 007b es: 007b ss: 0068
Process xlinpack_xeon32 (pid: 5194, ti=c0732000 task=cfd2daa0
task.ti=f5d30000)
Stack: f6cb9108 f6cb9100 c05ea1eb 00000008 d03e9034 5acb10ac fb0000e0 e9140000
00000008 e91414e9 f7ae3200 c06ab4a8 00000000 00000000 c05ce1d5 f7ae3200
00000000 f7ae3200 d03e9020 c05ce042 f7ae3200 c07d6988 c06ab560 00000008
Call Trace:
[<c05ea1eb>] udp_rcv+0x1f4/0x514
[<c05ce1d5>] ip_local_deliver+0x159/0x204
[<c05ce042>] ip_rcv+0x46f/0x4a9
[<c05b397d>] netif_receive_skb+0x30c/0x330
[<f8924be0>] e1000_clean_rx_irq+0xf0/0x3e0 [e1000]
[<f8924af0>] e1000_clean_rx_irq+0x0/0x3e0 [e1000]
[<f8922bd4>] e1000_clean+0xf4/0x340 [e1000]
[<c04074d6>] do_IRQ+0xb5/0xc3
[<c05b52d4>] net_rx_action+0x92/0x175
[<c042900f>] __do_softirq+0x87/0x114
[<c04073d7>] do_softirq+0x52/0x9c
[<c04059d7>] apic_timer_interrupt+0x1f/0x24
=======================
Code: 81 c4 8c 00 00 00 5b 5e 5f 5d c3 56 89 ce 53 89 c3 8b 4c 24 0c 89
d0 25 f0 00 00 00
3d e0 00 00 00 75 59 8b 83 84 01 00 00 eb 0c <39> 50 04 75 05 39 48 0c
74 08 8b 00 85 c0 75 f0 eb 3f 8b 50 14
EIP: [<c05f2194>] ip_mc_sf_allow+0x20/0x79 SS:ESP 0068:c0732ea4
<0>Kernel panic - not syncing: Fatal exception in interrupt
--- snip ---
> That new synchronize_rcu() is very expensive and will decrease
> the rate at which groups can be joined and left, _especially_
> on high cpu count machines.
Well, I tried using read-write locking but then the packet reception
was slower while another task was playing with multicasting groups.
Then, I tried using call_rcu() to avoid the problem you are saying,
but when you stop the reproducer, sk_free() will warn printing
"optmem leakage.." because the rcu callback didn't run yet.
> I do not think it is therefore a suitable solution to this
> race, if it does in fact exist.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Flavio
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race
2010-01-04 11:29 ` Flavio Leitner
@ 2010-01-04 13:07 ` Eric Dumazet
2010-01-04 18:51 ` Flavio Leitner
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-01-04 13:07 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, netdev
On 04/01/2010 12:29, Flavio Leitner wrote:
> Then, I tried using call_rcu() to avoid the problem you are saying,
> but when you stop the reproducer, sk_free() will warn printing
> "optmem leakage.." because the rcu callback didn't run yet.
>
>
This is probably because your call_rcu() callback was trying to call sock_kfree_s() ?
rtnl_unlock();
call_rcu(&iml->lock, callback_func)
callback_func()
{
sock_kfree_s(sk, iml, sizeof(*iml));
}
Take a look at sock_kfree_s() definition :
void sock_kfree_s(struct sock *sk, void *mem, int size)
{
kfree(mem);
atomic_sub(size, &sk->sk_omem_alloc);
}
You can certainly try :
rtnl_unlock();
atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
call_rcu(&iml->rcu, kfree);
(immediate sk_omem_alloc handling, but deferred kfree())
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race
2010-01-04 13:07 ` Eric Dumazet
@ 2010-01-04 18:51 ` Flavio Leitner
2010-01-04 19:53 ` Eric Dumazet
2010-01-05 0:06 ` David Stevens
0 siblings, 2 replies; 16+ messages in thread
From: Flavio Leitner @ 2010-01-04 18:51 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
On Mon, Jan 04, 2010 at 02:07:03PM +0100, Eric Dumazet wrote:
> On 04/01/2010 12:29, Flavio Leitner wrote:
>
> > Then, I tried using call_rcu() to avoid the problem you are saying,
> > but when you stop the reproducer, sk_free() will warn printing
> > "optmem leakage.." because the rcu callback didn't run yet.
> >
> >
>
> This is probably because your call_rcu() callback was trying to call sock_kfree_s() ?
yes, correct.
>
> rtnl_unlock();
> call_rcu(&iml->lock, callback_func)
>
> callback_func()
> {
> sock_kfree_s(sk, iml, sizeof(*iml));
> }
>
>
>
> Take a look at sock_kfree_s() definition :
>
> void sock_kfree_s(struct sock *sk, void *mem, int size)
> {
> kfree(mem);
> atomic_sub(size, &sk->sk_omem_alloc);
> }
>
>
> You can certainly try :
>
> rtnl_unlock();
> atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
> call_rcu(&iml->rcu, kfree);
>
> (immediate sk_omem_alloc handling, but deferred kfree())
Ok, below is the new version using call_rcu(). I'm still running my
tests here, so I'm planning to resubmit it later if this version is
okay with you.
thanks!
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 724c27e..9cccd16 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -170,6 +170,7 @@ struct ip_mc_socklist {
struct ip_mreqn multi;
unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */
struct ip_sf_socklist *sflist;
+ struct rcu_head rcu;
};
struct ip_sf_list {
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 76c0840..ed154db 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr)
iml->next = inet->mc_list;
iml->sflist = NULL;
iml->sfmode = MCAST_EXCLUDE;
- inet->mc_list = iml;
+ rcu_assign_pointer(inet->mc_list, iml);
ip_mc_inc_group(in_dev, addr);
err = 0;
done:
@@ -1825,6 +1825,17 @@ static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml,
return err;
}
+
+void ip_mc_socklist_reclaim(struct rcu_head *rp)
+{
+ struct ip_mc_socklist *iml;
+
+ iml = container_of(rp, struct ip_mc_socklist, rcu);
+ /* sk_omem_alloc should have been decreased by the caller*/
+ kfree(iml);
+}
+
+
/*
* Ask a socket to leave a group.
*/
@@ -1854,12 +1865,15 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
(void) ip_mc_leave_src(sk, iml, in_dev);
- *imlp = iml->next;
+ rcu_assign_pointer(*imlp, iml->next);
if (in_dev)
ip_mc_dec_group(in_dev, group);
+
rtnl_unlock();
- sock_kfree_s(sk, iml, sizeof(*iml));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
+ call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
return 0;
}
if (!in_dev)
@@ -2209,30 +2223,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif)
struct ip_mc_socklist *pmc;
struct ip_sf_socklist *psl;
int i;
+ int ret;
+ ret = 1;
if (!ipv4_is_multicast(loc_addr))
- return 1;
+ goto out;
- for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
+ rcu_read_lock();
+ for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) {
if (pmc->multi.imr_multiaddr.s_addr == loc_addr &&
pmc->multi.imr_ifindex == dif)
break;
}
+ ret = inet->mc_all;
if (!pmc)
- return inet->mc_all;
+ goto unlock;
psl = pmc->sflist;
+ ret = (pmc->sfmode == MCAST_EXCLUDE);
if (!psl)
- return pmc->sfmode == MCAST_EXCLUDE;
+ goto unlock;
for (i=0; i<psl->sl_count; i++) {
if (psl->sl_addr[i] == rmt_addr)
break;
}
+ ret = 0;
if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count)
- return 0;
+ goto unlock;
if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count)
- return 0;
- return 1;
+ goto unlock;
+ ret = 1;
+unlock:
+ rcu_read_unlock();
+out:
+ return ret;
}
/*
@@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk)
struct ip_mc_socklist *iml;
struct net *net = sock_net(sk);
- if (inet->mc_list == NULL)
+ rcu_read_lock();
+ if (rcu_dereference(inet->mc_list) == NULL) {
+ rcu_read_unlock();
return;
+ }
+ rcu_read_unlock();
rtnl_lock();
- while ((iml = inet->mc_list) != NULL) {
+ while ((iml = rcu_dereference(inet->mc_list)) != NULL) {
struct in_device *in_dev;
- inet->mc_list = iml->next;
+ rcu_assign_pointer(inet->mc_list, iml->next);
in_dev = inetdev_by_index(net, iml->multi.imr_ifindex);
(void) ip_mc_leave_src(sk, iml, in_dev);
@@ -2259,7 +2287,9 @@ void ip_mc_drop_socket(struct sock *sk)
ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
in_dev_put(in_dev);
}
- sock_kfree_s(sk, iml, sizeof(*iml));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
+ call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
}
rtnl_unlock();
}
--
1.6.2.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race
2010-01-04 18:51 ` Flavio Leitner
@ 2010-01-04 19:53 ` Eric Dumazet
2010-01-05 0:06 ` David Stevens
1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2010-01-04 19:53 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, netdev
On 04/01/2010 19:51, Flavio Leitner wrote:
> On Mon, Jan 04, 2010 at 02:07:03PM +0100, Eric Dumazet wrote:
>> On 04/01/2010 12:29, Flavio Leitner wrote:
>>
>>> Then, I tried using call_rcu() to avoid the problem you are saying,
>>> but when you stop the reproducer, sk_free() will warn printing
>>> "optmem leakage.." because the rcu callback didn't run yet.
>>>
>>>
>>
>> This is probably because your call_rcu() callback was trying to call sock_kfree_s() ?
>
> yes, correct.
>
>>
>> rtnl_unlock();
>> call_rcu(&iml->lock, callback_func)
>>
>> callback_func()
>> {
>> sock_kfree_s(sk, iml, sizeof(*iml));
>> }
>>
>>
>>
>> Take a look at sock_kfree_s() definition :
>>
>> void sock_kfree_s(struct sock *sk, void *mem, int size)
>> {
>> kfree(mem);
>> atomic_sub(size, &sk->sk_omem_alloc);
>> }
>>
>>
>> You can certainly try :
>>
>> rtnl_unlock();
>> atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
>> call_rcu(&iml->rcu, kfree);
>>
>> (immediate sk_omem_alloc handling, but deferred kfree())
>
> Ok, below is the new version using call_rcu(). I'm still running my
> tests here, so I'm planning to resubmit it later if this version is
> okay with you.
It seems fine, but please make ip_mc_socklist_reclaim() static :
> +
> +void ip_mc_socklist_reclaim(struct rcu_head *rp)
> +{
> + struct ip_mc_socklist *iml;
> +
> + iml = container_of(rp, struct ip_mc_socklist, rcu);
> + /* sk_omem_alloc should have been decreased by the caller*/
> + kfree(iml);
> +}
> +
> +
Thanks
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race
2010-01-04 18:51 ` Flavio Leitner
2010-01-04 19:53 ` Eric Dumazet
@ 2010-01-05 0:06 ` David Stevens
2010-01-05 6:55 ` Eric Dumazet
1 sibling, 1 reply; 16+ messages in thread
From: David Stevens @ 2010-01-05 0:06 UTC (permalink / raw)
To: Flavio Leitner; +Cc: David Miller, Eric Dumazet, netdev, netdev-owner
For readability, instead of:
ret = X;
if (condition)
goto out;
I prefer:
if (condition) {
ret = X;
goto out;
}
which makes it clear that ret is for the return and not
some random state change unrelated to the condition.
Otherwise, looks ok to me.
+-DLS
Acked-by: David L Stevens <dlstevens@us.ibm.com>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race
2010-01-05 0:06 ` David Stevens
@ 2010-01-05 6:55 ` Eric Dumazet
2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2010-01-05 6:55 UTC (permalink / raw)
To: David Stevens; +Cc: Flavio Leitner, David Miller, netdev, netdev-owner
On 05/01/2010 01:06, David Stevens wrote:
> For readability, instead of:
>
> ret = X;
> if (condition)
> goto out;
>
> I prefer:
>
> if (condition) {
> ret = X;
> goto out;
> }
>
> which makes it clear that ret is for the return and not
> some random state change unrelated to the condition.
>
Linus's argument for this common Linux coding style is that the generated code is smaller:
mov $-6,%eax
cmp condition
je out
versus
cmp condition
jne .ok
mov $-6,%eax
jmp out
.ok:
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] igmp: fix ip_mc_sf_allow race [v3]
2010-01-05 6:55 ` Eric Dumazet
@ 2010-01-05 20:52 ` Flavio Leitner
2010-01-05 22:36 ` Eric Dumazet
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Flavio Leitner @ 2010-01-05 20:52 UTC (permalink / raw)
To: netdev; +Cc: David Miller, David Stevens, Eric Dumazet, Flavio Leitner
Almost all igmp functions accessing inet->mc_list are protected by
rtnl_lock(), but there is one exception, which is ip_mc_sf_allow(),
so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
removing an entry while ip_mc_sf_allow is running, causing a crash.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
include/linux/igmp.h | 1 +
net/ipv4/igmp.c | 58 +++++++++++++++++++++++++++++++++++++------------
2 files changed, 45 insertions(+), 14 deletions(-)
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 724c27e..9cccd16 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -170,6 +170,7 @@ struct ip_mc_socklist {
struct ip_mreqn multi;
unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */
struct ip_sf_socklist *sflist;
+ struct rcu_head rcu;
};
struct ip_sf_list {
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 76c0840..61ff685 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr)
iml->next = inet->mc_list;
iml->sflist = NULL;
iml->sfmode = MCAST_EXCLUDE;
- inet->mc_list = iml;
+ rcu_assign_pointer(inet->mc_list, iml);
ip_mc_inc_group(in_dev, addr);
err = 0;
done:
@@ -1825,6 +1825,17 @@ static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml,
return err;
}
+
+static void ip_mc_socklist_reclaim(struct rcu_head *rp)
+{
+ struct ip_mc_socklist *iml;
+
+ iml = container_of(rp, struct ip_mc_socklist, rcu);
+ /* sk_omem_alloc should have been decreased by the caller*/
+ kfree(iml);
+}
+
+
/*
* Ask a socket to leave a group.
*/
@@ -1854,12 +1865,15 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
(void) ip_mc_leave_src(sk, iml, in_dev);
- *imlp = iml->next;
+ rcu_assign_pointer(*imlp, iml->next);
if (in_dev)
ip_mc_dec_group(in_dev, group);
+
rtnl_unlock();
- sock_kfree_s(sk, iml, sizeof(*iml));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
+ call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
return 0;
}
if (!in_dev)
@@ -2209,30 +2223,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif)
struct ip_mc_socklist *pmc;
struct ip_sf_socklist *psl;
int i;
+ int ret;
+ ret = 1;
if (!ipv4_is_multicast(loc_addr))
- return 1;
+ goto out;
- for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
+ rcu_read_lock();
+ for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) {
if (pmc->multi.imr_multiaddr.s_addr == loc_addr &&
pmc->multi.imr_ifindex == dif)
break;
}
+ ret = inet->mc_all;
if (!pmc)
- return inet->mc_all;
+ goto unlock;
psl = pmc->sflist;
+ ret = (pmc->sfmode == MCAST_EXCLUDE);
if (!psl)
- return pmc->sfmode == MCAST_EXCLUDE;
+ goto unlock;
for (i=0; i<psl->sl_count; i++) {
if (psl->sl_addr[i] == rmt_addr)
break;
}
+ ret = 0;
if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count)
- return 0;
+ goto unlock;
if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count)
- return 0;
- return 1;
+ goto unlock;
+ ret = 1;
+unlock:
+ rcu_read_unlock();
+out:
+ return ret;
}
/*
@@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk)
struct ip_mc_socklist *iml;
struct net *net = sock_net(sk);
- if (inet->mc_list == NULL)
+ rcu_read_lock();
+ if (rcu_dereference(inet->mc_list) == NULL) {
+ rcu_read_unlock();
return;
+ }
+ rcu_read_unlock();
rtnl_lock();
- while ((iml = inet->mc_list) != NULL) {
+ while ((iml = rcu_dereference(inet->mc_list)) != NULL) {
struct in_device *in_dev;
- inet->mc_list = iml->next;
+ rcu_assign_pointer(inet->mc_list, iml->next);
in_dev = inetdev_by_index(net, iml->multi.imr_ifindex);
(void) ip_mc_leave_src(sk, iml, in_dev);
@@ -2259,7 +2287,9 @@ void ip_mc_drop_socket(struct sock *sk)
ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
in_dev_put(in_dev);
}
- sock_kfree_s(sk, iml, sizeof(*iml));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
+ call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
}
rtnl_unlock();
}
--
1.6.2.3
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3]
2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner
@ 2010-01-05 22:36 ` Eric Dumazet
2010-01-05 23:03 ` Stephen Hemminger
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2010-01-05 22:36 UTC (permalink / raw)
To: Flavio Leitner; +Cc: netdev, David Miller, David Stevens
On 05/01/2010 21:52, Flavio Leitner wrote:
> Almost all igmp functions accessing inet->mc_list are protected by
> rtnl_lock(), but there is one exception which is ip_mc_sf_allow(),
> so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
> remove an entry while ip_mc_sf_allow is running causing a crash.
>
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Small notes: in ip_mc_drop_socket()
// rcu_read_lock()/unlock() seems not really needed here, we only want to avoid
// the full rtnl_lock() in case this socket has a NULL mc_list.
rcu_read_lock();
if (rcu_dereference(inet->mc_list) == NULL) {
rcu_read_unlock();
return;
}
rcu_read_unlock();
rtnl_lock();
while ((iml = rcu_dereference(inet->mc_list)) != NULL) {
rcu_dereference() is not really needed here, since you own RTNL
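A minimal sketch of that simplification (illustrative only; the per-entry
cleanup calls are elided):

	rtnl_lock();
	/* All writers hold RTNL, so a plain read of inet->mc_list is
	 * sufficient here; rcu_assign_pointer() is still used so the
	 * lockless reader in ip_mc_sf_allow() sees a consistent list.
	 */
	while ((iml = inet->mc_list) != NULL) {
		rcu_assign_pointer(inet->mc_list, iml->next);
		/* ... ip_mc_leave_src(), ip_mc_dec_group(), free iml ... */
	}
	rtnl_unlock();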
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3]
2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner
2010-01-05 22:36 ` Eric Dumazet
@ 2010-01-05 23:03 ` Stephen Hemminger
2010-01-06 16:40 ` Paul E. McKenney
2010-01-28 16:13 ` [PATCH] igmp: fix ip_mc_sf_allow race [v5] Flavio Leitner
3 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2010-01-05 23:03 UTC (permalink / raw)
To: Flavio Leitner
Cc: netdev, David Miller, David Stevens, Eric Dumazet, Flavio Leitner
On Tue, 5 Jan 2010 18:52:22 -0200
Flavio Leitner <fleitner@redhat.com> wrote:
> @@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk)
> struct ip_mc_socklist *iml;
> struct net *net = sock_net(sk);
>
> - if (inet->mc_list == NULL)
> + rcu_read_lock();
> + if (rcu_dereference(inet->mc_list) == NULL) {
> + rcu_read_unlock();
> return;
> + }
> + rcu_read_unlock();
>
> rtnl_lock();
> - while ((iml = inet->mc_list) != NULL) {
All this would be cleaner if mc_list was using list_head and the
existing list_head_rcu stuff.
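For illustration, a rough sketch of that direction, assuming inet->mc_list
became a struct list_head and ip_mc_socklist grew a 'list' member (names
here are hypothetical, not an actual conversion):

	/* reader side (ip_mc_sf_allow) */
	rcu_read_lock();
	list_for_each_entry_rcu(pmc, &inet->mc_list, list) {
		if (pmc->multi.imr_multiaddr.s_addr == loc_addr &&
		    pmc->multi.imr_ifindex == dif) {
			/* evaluate pmc->sfmode / pmc->sflist here */
			break;
		}
	}
	rcu_read_unlock();

	/* writer side, under rtnl_lock() */
	list_del_rcu(&iml->list);
	call_rcu(&iml->rcu, ip_mc_socklist_reclaim);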
--
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3]
2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner
2010-01-05 22:36 ` Eric Dumazet
2010-01-05 23:03 ` Stephen Hemminger
@ 2010-01-06 16:40 ` Paul E. McKenney
2010-01-06 17:10 ` Stephen Hemminger
2010-01-28 16:13 ` [PATCH] igmp: fix ip_mc_sf_allow race [v5] Flavio Leitner
3 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2010-01-06 16:40 UTC (permalink / raw)
To: Flavio Leitner; +Cc: netdev, David Miller, David Stevens, Eric Dumazet
On Tue, Jan 05, 2010 at 06:52:22PM -0200, Flavio Leitner wrote:
> Almost all igmp functions accessing inet->mc_list are protected by
> rtnl_lock(), but there is one exception, which is ip_mc_sf_allow(),
> so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
> removing an entry while ip_mc_sf_allow is running, causing a crash.
Looks like a good start from an RCU perspective, though I don't claim
to understand networking locking design. A couple of questions below.
Thanx, Paul
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> ---
> include/linux/igmp.h | 1 +
> net/ipv4/igmp.c | 58 +++++++++++++++++++++++++++++++++++++------------
> 2 files changed, 45 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> index 724c27e..9cccd16 100644
> --- a/include/linux/igmp.h
> +++ b/include/linux/igmp.h
> @@ -170,6 +170,7 @@ struct ip_mc_socklist {
> struct ip_mreqn multi;
> unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */
> struct ip_sf_socklist *sflist;
> + struct rcu_head rcu;
> };
>
> struct ip_sf_list {
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index 76c0840..61ff685 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr)
> iml->next = inet->mc_list;
> iml->sflist = NULL;
> iml->sfmode = MCAST_EXCLUDE;
> - inet->mc_list = iml;
> + rcu_assign_pointer(inet->mc_list, iml);
> ip_mc_inc_group(in_dev, addr);
> err = 0;
> done:
> @@ -1825,6 +1825,17 @@ static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml,
> return err;
> }
>
> +
> +static void ip_mc_socklist_reclaim(struct rcu_head *rp)
> +{
> + struct ip_mc_socklist *iml;
> +
> + iml = container_of(rp, struct ip_mc_socklist, rcu);
> + /* sk_omem_alloc should have been decreased by the caller*/
> + kfree(iml);
> +}
> +
> +
> /*
> * Ask a socket to leave a group.
> */
> @@ -1854,12 +1865,15 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
>
> (void) ip_mc_leave_src(sk, iml, in_dev);
Suppose some other CPU invokes ip_mc_sf_allow() at this point. Will that
CPU be OK with the current state of the structure pointed to by iml?
If not, some of the above code might need to be deferred to follow the
grace period.
> - *imlp = iml->next;
> + rcu_assign_pointer(*imlp, iml->next);
>
> if (in_dev)
> ip_mc_dec_group(in_dev, group);
> +
> rtnl_unlock();
> - sock_kfree_s(sk, iml, sizeof(*iml));
> + /* decrease mem now to avoid the memleak warning */
> + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
> + call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
> return 0;
> }
> if (!in_dev)
> @@ -2209,30 +2223,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif)
> struct ip_mc_socklist *pmc;
> struct ip_sf_socklist *psl;
> int i;
> + int ret;
>
> + ret = 1;
> if (!ipv4_is_multicast(loc_addr))
> - return 1;
> + goto out;
>
> - for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
> + rcu_read_lock();
> + for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) {
> if (pmc->multi.imr_multiaddr.s_addr == loc_addr &&
> pmc->multi.imr_ifindex == dif)
> break;
> }
> + ret = inet->mc_all;
> if (!pmc)
> - return inet->mc_all;
> + goto unlock;
> psl = pmc->sflist;
> + ret = (pmc->sfmode == MCAST_EXCLUDE);
> if (!psl)
> - return pmc->sfmode == MCAST_EXCLUDE;
> + goto unlock;
>
> for (i=0; i<psl->sl_count; i++) {
> if (psl->sl_addr[i] == rmt_addr)
> break;
> }
> + ret = 0;
> if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count)
> - return 0;
> + goto unlock;
> if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count)
> - return 0;
> - return 1;
> + goto unlock;
> + ret = 1;
> +unlock:
> + rcu_read_unlock();
> +out:
> + return ret;
> }
>
> /*
> @@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk)
> struct ip_mc_socklist *iml;
> struct net *net = sock_net(sk);
>
> - if (inet->mc_list == NULL)
> + rcu_read_lock();
> + if (rcu_dereference(inet->mc_list) == NULL) {
> + rcu_read_unlock();
> return;
> + }
> + rcu_read_unlock();
I don't understand what rcu_read_lock() is protecting here. The
test is still unstable -- just after finding inet->mc_list non-NULL,
ip_mc_leave_group() might cause it to become NULL.
Is there a need to protect sock_net(sk)? (I don't believe so, but then
again, I don't claim to understand locking in Linux networking.)
If there is no need, it should be possible to drop the rcu_read_lock(),
rcu_read_unlock(), and rcu_dereference() above. (You might want them
for documentation purposes, as they aren't hurting anything, just
wondering what the intent is.)
> rtnl_lock();
> - while ((iml = inet->mc_list) != NULL) {
> + while ((iml = rcu_dereference(inet->mc_list)) != NULL) {
> struct in_device *in_dev;
> - inet->mc_list = iml->next;
> + rcu_assign_pointer(inet->mc_list, iml->next);
>
> in_dev = inetdev_by_index(net, iml->multi.imr_ifindex);
> (void) ip_mc_leave_src(sk, iml, in_dev);
> @@ -2259,7 +2287,9 @@ void ip_mc_drop_socket(struct sock *sk)
> ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
> in_dev_put(in_dev);
> }
> - sock_kfree_s(sk, iml, sizeof(*iml));
> + /* decrease mem now to avoid the memleak warning */
> + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
> + call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
> }
> rtnl_unlock();
> }
> --
> 1.6.2.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3]
2010-01-06 16:40 ` Paul E. McKenney
@ 2010-01-06 17:10 ` Stephen Hemminger
2010-01-06 18:50 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Stephen Hemminger @ 2010-01-06 17:10 UTC (permalink / raw)
To: paulmck; +Cc: Flavio Leitner, netdev, David Miller, David Stevens, Eric Dumazet
On Wed, 6 Jan 2010 08:40:27 -0800
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > - if (inet->mc_list == NULL)
> > + rcu_read_lock();
> > + if (rcu_dereference(inet->mc_list) == NULL) {
> > + rcu_read_unlock();
> > return;
> > + }
> > + rcu_read_unlock();
>
> I don't understand what rcu_read_lock() is protecting here. The
> test is still unstable -- just after finding inet->mc_list non-NULL,
> ip_mc_leave_group() might cause it to become NULL.
>
> Is there a need to protect sock_net(sk)? (I don't believe so, but then
> again, I don't claim to understand locking in Linux networking.)
> If there is no need, it should be possible to drop the rcu_read_lock(),
> rcu_read_unlock(), and rcu_dereference() above. (You might want them
> for documentation purposes, as they aren't hurting anything, just
> wondering what the intent is.)
I think code is trying to avoid looking at mc_list if no multicast
addresses. But it is an unsafe check.
If mc_list was just converted to list_head this would all be clearer
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3]
2010-01-06 17:10 ` Stephen Hemminger
@ 2010-01-06 18:50 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2010-01-06 18:50 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Flavio Leitner, netdev, David Miller, David Stevens, Eric Dumazet
On Wed, Jan 06, 2010 at 09:10:07AM -0800, Stephen Hemminger wrote:
> On Wed, 6 Jan 2010 08:40:27 -0800
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > > - if (inet->mc_list == NULL)
> > > + rcu_read_lock();
> > > + if (rcu_dereference(inet->mc_list) == NULL) {
> > > + rcu_read_unlock();
> > > return;
> > > + }
> > > + rcu_read_unlock();
> >
> > I don't understand what rcu_read_lock() is protecting here. The
> > test is still unstable -- just after finding inet->mc_list non-NULL,
> > ip_mc_leave_group() might cause it to become NULL.
> >
> > Is there a need to protect sock_net(sk)? (I don't believe so, but then
> > again, I don't claim to understand locking in Linux networking.)
> > If there is no need, it should be possible to drop the rcu_read_lock(),
> > rcu_read_unlock(), and rcu_dereference() above. (You might want them
> > for documentation purposes, as they aren't hurting anything, just
> > wondering what the intent is.)
>
> I think code is trying to avoid looking at mc_list if no multicast
> addresses. But it is an unsafe check.
Fair enough! Might be worth a comment saying that the rcu_read_lock(),
rcu_read_unlock()s, and rcu_dereference() are just for show.
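Something along these lines, for example (wording is only a suggestion):

	/* Racy but harmless: this only avoids taking rtnl_lock() when the
	 * socket never joined any group; the locked loop below copes with
	 * a concurrent change to mc_list.
	 */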
> If mc_list was just converted to list_head this would all be clearer
Agreed! ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] igmp: fix ip_mc_sf_allow race [v5]
2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner
` (2 preceding siblings ...)
2010-01-06 16:40 ` Paul E. McKenney
@ 2010-01-28 16:13 ` Flavio Leitner
2010-02-02 15:32 ` David Miller
3 siblings, 1 reply; 16+ messages in thread
From: Flavio Leitner @ 2010-01-28 16:13 UTC (permalink / raw)
To: netdev; +Cc: David Miller, David Stevens, Eric Dumazet
Hello,
This is another version considering the suggestions. Thanks!
I didn't convert to list_head yet because a lot more code needs
to be changed so I'd rather leave that to another patch.
8<---------
Almost all igmp functions accessing inet->mc_list are protected by
rtnl_lock(), but there is one exception, which is ip_mc_sf_allow(),
so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
removing an entry while ip_mc_sf_allow is running, causing a crash.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
include/linux/igmp.h | 2 +
net/ipv4/igmp.c | 83 +++++++++++++++++++++++++++++++++++++------------
2 files changed, 64 insertions(+), 21 deletions(-)
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 724c27e..93fc244 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -153,6 +153,7 @@ extern int sysctl_igmp_max_msf;
struct ip_sf_socklist {
unsigned int sl_max;
unsigned int sl_count;
+ struct rcu_head rcu;
__be32 sl_addr[0];
};
@@ -170,6 +171,7 @@ struct ip_mc_socklist {
struct ip_mreqn multi;
unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */
struct ip_sf_socklist *sflist;
+ struct rcu_head rcu;
};
struct ip_sf_list {
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 76c0840..54003a5 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr)
iml->next = inet->mc_list;
iml->sflist = NULL;
iml->sfmode = MCAST_EXCLUDE;
- inet->mc_list = iml;
+ rcu_assign_pointer(inet->mc_list, iml);
ip_mc_inc_group(in_dev, addr);
err = 0;
done:
@@ -1807,24 +1807,46 @@ done:
return err;
}
+static void ip_sf_socklist_reclaim(struct rcu_head *rp)
+{
+ struct ip_sf_socklist *psf;
+
+ psf = container_of(rp, struct ip_sf_socklist, rcu);
+ /* sk_omem_alloc should have been decreased by the caller*/
+ kfree(psf);
+}
+
static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml,
struct in_device *in_dev)
{
+ struct ip_sf_socklist *psf = iml->sflist;
int err;
- if (iml->sflist == NULL) {
+ if (psf == NULL) {
/* any-source empty exclude case */
return ip_mc_del_src(in_dev, &iml->multi.imr_multiaddr.s_addr,
iml->sfmode, 0, NULL, 0);
}
err = ip_mc_del_src(in_dev, &iml->multi.imr_multiaddr.s_addr,
- iml->sfmode, iml->sflist->sl_count,
- iml->sflist->sl_addr, 0);
- sock_kfree_s(sk, iml->sflist, IP_SFLSIZE(iml->sflist->sl_max));
- iml->sflist = NULL;
+ iml->sfmode, psf->sl_count, psf->sl_addr, 0);
+ rcu_assign_pointer(iml->sflist, NULL);
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(IP_SFLSIZE(psf->sl_max), &sk->sk_omem_alloc);
+ call_rcu(&psf->rcu, ip_sf_socklist_reclaim);
return err;
}
+
+static void ip_mc_socklist_reclaim(struct rcu_head *rp)
+{
+ struct ip_mc_socklist *iml;
+
+ iml = container_of(rp, struct ip_mc_socklist, rcu);
+ /* sk_omem_alloc should have been decreased by the caller*/
+ kfree(iml);
+}
+
+
/*
* Ask a socket to leave a group.
*/
@@ -1854,12 +1876,14 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
(void) ip_mc_leave_src(sk, iml, in_dev);
- *imlp = iml->next;
+ rcu_assign_pointer(*imlp, iml->next);
if (in_dev)
ip_mc_dec_group(in_dev, group);
rtnl_unlock();
- sock_kfree_s(sk, iml, sizeof(*iml));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
+ call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
return 0;
}
if (!in_dev)
@@ -1974,9 +1998,12 @@ int ip_mc_source(int add, int omode, struct sock *sk, struct
if (psl) {
for (i=0; i<psl->sl_count; i++)
newpsl->sl_addr[i] = psl->sl_addr[i];
- sock_kfree_s(sk, psl, IP_SFLSIZE(psl->sl_max));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(IP_SFLSIZE(psl->sl_max), &sk->sk_omem_alloc);
+ call_rcu(&psl->rcu, ip_sf_socklist_reclaim);
}
- pmc->sflist = psl = newpsl;
+ rcu_assign_pointer(pmc->sflist, newpsl);
+ psl = newpsl;
}
rv = 1; /* > 0 for insert logic below if sl_count is 0 */
for (i=0; i<psl->sl_count; i++) {
@@ -2072,11 +2099,13 @@ int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf, int ifindex)
if (psl) {
(void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode,
psl->sl_count, psl->sl_addr, 0);
- sock_kfree_s(sk, psl, IP_SFLSIZE(psl->sl_max));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(IP_SFLSIZE(psl->sl_max), &sk->sk_omem_alloc);
+ call_rcu(&psl->rcu, ip_sf_socklist_reclaim);
} else
(void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode,
0, NULL, 0);
- pmc->sflist = newpsl;
+ rcu_assign_pointer(pmc->sflist, newpsl);
pmc->sfmode = msf->imsf_fmode;
err = 0;
done:
@@ -2209,30 +2238,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif)
struct ip_mc_socklist *pmc;
struct ip_sf_socklist *psl;
int i;
+ int ret;
+ ret = 1;
if (!ipv4_is_multicast(loc_addr))
- return 1;
+ goto out;
- for (pmc=inet->mc_list; pmc; pmc=pmc->next) {
+ rcu_read_lock();
+ for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) {
if (pmc->multi.imr_multiaddr.s_addr == loc_addr &&
pmc->multi.imr_ifindex == dif)
break;
}
+ ret = inet->mc_all;
if (!pmc)
- return inet->mc_all;
+ goto unlock;
psl = pmc->sflist;
+ ret = (pmc->sfmode == MCAST_EXCLUDE);
if (!psl)
- return pmc->sfmode == MCAST_EXCLUDE;
+ goto unlock;
for (i=0; i<psl->sl_count; i++) {
if (psl->sl_addr[i] == rmt_addr)
break;
}
+ ret = 0;
if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count)
- return 0;
+ goto unlock;
if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count)
- return 0;
- return 1;
+ goto unlock;
+ ret = 1;
+unlock:
+ rcu_read_unlock();
+out:
+ return ret;
}
/*
@@ -2251,7 +2290,7 @@ void ip_mc_drop_socket(struct sock *sk)
rtnl_lock();
while ((iml = inet->mc_list) != NULL) {
struct in_device *in_dev;
- inet->mc_list = iml->next;
+ rcu_assign_pointer(inet->mc_list, iml->next);
in_dev = inetdev_by_index(net, iml->multi.imr_ifindex);
(void) ip_mc_leave_src(sk, iml, in_dev);
@@ -2259,7 +2298,9 @@ void ip_mc_drop_socket(struct sock *sk)
ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
in_dev_put(in_dev);
}
- sock_kfree_s(sk, iml, sizeof(*iml));
+ /* decrease mem now to avoid the memleak warning */
+ atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
+ call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
}
rtnl_unlock();
}
--
1.6.5.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v5]
2010-01-28 16:13 ` [PATCH] igmp: fix ip_mc_sf_allow race [v5] Flavio Leitner
@ 2010-02-02 15:32 ` David Miller
0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2010-02-02 15:32 UTC (permalink / raw)
To: fbl; +Cc: netdev, dlstevens, eric.dumazet
From: Flavio Leitner <fbl@sysclose.org>
Date: Thu, 28 Jan 2010 14:13:38 -0200
> Hello,
>
> This is another version considering the suggestions. Thanks!
> I didn't convert to list_head yet because a lot more code needs
> to be changed so I'd rather leave that to another patch.
> 8<---------
>
> Almost all igmp functions accessing inet->mc_list are protected by
> rtnl_lock(), but there is one exception, which is ip_mc_sf_allow(),
> so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group
> removing an entry while ip_mc_sf_allow is running, causing a crash.
>
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Applied to net-next-2.6, thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
End of thread.
Thread overview: 16+ messages
2009-12-30 14:23 [PATCH] igmp: fix ip_mc_sf_allow race Flavio Leitner
2010-01-04 5:54 ` David Miller
2010-01-04 11:29 ` Flavio Leitner
2010-01-04 13:07 ` Eric Dumazet
2010-01-04 18:51 ` Flavio Leitner
2010-01-04 19:53 ` Eric Dumazet
2010-01-05 0:06 ` David Stevens
2010-01-05 6:55 ` Eric Dumazet
2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner
2010-01-05 22:36 ` Eric Dumazet
2010-01-05 23:03 ` Stephen Hemminger
2010-01-06 16:40 ` Paul E. McKenney
2010-01-06 17:10 ` Stephen Hemminger
2010-01-06 18:50 ` Paul E. McKenney
2010-01-28 16:13 ` [PATCH] igmp: fix ip_mc_sf_allow race [v5] Flavio Leitner
2010-02-02 15:32 ` David Miller