* [PATCH] igmp: fix ip_mc_sf_allow race @ 2009-12-30 14:23 Flavio Leitner 2010-01-04 5:54 ` David Miller 0 siblings, 1 reply; 16+ messages in thread From: Flavio Leitner @ 2009-12-30 14:23 UTC (permalink / raw) To: netdev; +Cc: Flavio Leitner Almost all igmp functions accessing inet->mc_list are protected by rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group remove an entry while ip_mc_sf_allow is running causing a crash. Signed-off-by: Flavio Leitner <fleitner@redhat.com> --- net/ipv4/igmp.c | 56 +++++++++++++++++++++++++++++++++++++++++++----------- 1 files changed, 44 insertions(+), 12 deletions(-) diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 76c0840..d1e2a0c 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr) iml->next = inet->mc_list; iml->sflist = NULL; iml->sfmode = MCAST_EXCLUDE; - inet->mc_list = iml; + rcu_assign_pointer(inet->mc_list, iml); ip_mc_inc_group(in_dev, addr); err = 0; done: @@ -1854,11 +1854,12 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) (void) ip_mc_leave_src(sk, iml, in_dev); - *imlp = iml->next; + rcu_assign_pointer(*imlp, iml->next); if (in_dev) ip_mc_dec_group(in_dev, group); rtnl_unlock(); + synchronize_rcu(); sock_kfree_s(sk, iml, sizeof(*iml)); return 0; } @@ -2209,30 +2210,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif) struct ip_mc_socklist *pmc; struct ip_sf_socklist *psl; int i; + int ret; + ret = 1; if (!ipv4_is_multicast(loc_addr)) - return 1; + goto out; - for (pmc=inet->mc_list; pmc; pmc=pmc->next) { + rcu_read_lock(); + for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) { if (pmc->multi.imr_multiaddr.s_addr == loc_addr && pmc->multi.imr_ifindex == dif) break; } + ret = inet->mc_all; if (!pmc) - return inet->mc_all; + goto unlock; psl = pmc->sflist; + ret = (pmc->sfmode == MCAST_EXCLUDE); if (!psl) - return pmc->sfmode == MCAST_EXCLUDE; + goto unlock; for (i=0; i<psl->sl_count; i++) { if (psl->sl_addr[i] == rmt_addr) break; } + ret = 0; if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count) - return 0; + goto unlock; if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count) - return 0; - return 1; + goto unlock; + ret = 1; +unlock: + rcu_read_unlock(); +out: + return ret; } /* @@ -2243,15 +2254,21 @@ void ip_mc_drop_socket(struct sock *sk) { struct inet_sock *inet = inet_sk(sk); struct ip_mc_socklist *iml; + struct ip_mc_socklist *rmlist; struct net *net = sock_net(sk); - if (inet->mc_list == NULL) + rcu_read_lock(); + if (rcu_dereference(inet->mc_list) == NULL) { + rcu_read_unlock(); return; + } + rcu_read_unlock(); rtnl_lock(); + rmlist = NULL; while ((iml = inet->mc_list) != NULL) { struct in_device *in_dev; - inet->mc_list = iml->next; + rcu_assign_pointer(inet->mc_list, iml->next); in_dev = inetdev_by_index(net, iml->multi.imr_ifindex); (void) ip_mc_leave_src(sk, iml, in_dev); @@ -2259,9 +2276,24 @@ void ip_mc_drop_socket(struct sock *sk) ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr); in_dev_put(in_dev); } - sock_kfree_s(sk, iml, sizeof(*iml)); + if (rmlist == NULL) { + iml->next = NULL; + rmlist = iml; + } + else { + iml->next = rmlist->next; + rmlist->next = iml; + } } rtnl_unlock(); + + synchronize_rcu(); + + while (rmlist != NULL) { + iml = rmlist; + rmlist = iml->next; + sock_kfree_s(sk, iml, sizeof(*iml)); + } } int ip_check_mc(struct in_device *in_dev, __be32 mc_addr, __be32 
src_addr, u16 proto) -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race 2009-12-30 14:23 [PATCH] igmp: fix ip_mc_sf_allow race Flavio Leitner @ 2010-01-04 5:54 ` David Miller 2010-01-04 11:29 ` Flavio Leitner 0 siblings, 1 reply; 16+ messages in thread From: David Miller @ 2010-01-04 5:54 UTC (permalink / raw) To: fleitner; +Cc: netdev From: Flavio Leitner <fleitner@redhat.com> Date: Wed, 30 Dec 2009 12:23:25 -0200 > Almost all igmp functions accessing inet->mc_list are protected by > rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), > so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group > remove an entry while ip_mc_sf_allow is running causing a crash. > > Signed-off-by: Flavio Leitner <fleitner@redhat.com> Have you triggered this in practice or is this due purely to code inspection? That new synchronize_rcu() is very expensive and will decrease the rate at which groups can be joined and left, _especially_ on high cpu count machines. I do not think it is therefore a suitable problem to this race, if it does in fact exist. ^ permalink raw reply [flat|nested] 16+ messages in thread
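For readers following the locking argument: the cost David points at comes from where the grace-period wait happens, not from RCU itself. A minimal sketch of the two update-side patterns (the struct and function names here are illustrative, not taken from the patch):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct item {
	struct item *next;
	struct rcu_head rcu;
};

/* Synchronous removal: every caller blocks until a full RCU grace period
 * has elapsed, so back-to-back removals are serialized behind that wait. */
static void remove_sync(struct item **prevp, struct item *it)
{
	rcu_assign_pointer(*prevp, it->next);	/* unlink under the writer lock */
	synchronize_rcu();			/* sleep until all current readers finish */
	kfree(it);
}

static void item_reclaim(struct rcu_head *rp)
{
	kfree(container_of(rp, struct item, rcu));
}

/* Deferred removal: the caller only queues a callback and returns at once;
 * the actual kfree() runs later, after the grace period has elapsed. */
static void remove_deferred(struct item **prevp, struct item *it)
{
	rcu_assign_pointer(*prevp, it->next);
	call_rcu(&it->rcu, item_reclaim);
}

This is why the thread below moves from synchronize_rcu() to call_rcu(): the unlink still happens under rtnl_lock(), but the grace-period wait no longer sits in the join/leave path.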
* Re: [PATCH] igmp: fix ip_mc_sf_allow race 2010-01-04 5:54 ` David Miller @ 2010-01-04 11:29 ` Flavio Leitner 2010-01-04 13:07 ` Eric Dumazet 0 siblings, 1 reply; 16+ messages in thread From: Flavio Leitner @ 2010-01-04 11:29 UTC (permalink / raw) To: David Miller; +Cc: netdev On Sun, Jan 03, 2010 at 09:54:41PM -0800, David Miller wrote: > From: Flavio Leitner <fleitner@redhat.com> > Date: Wed, 30 Dec 2009 12:23:25 -0200 > > > Almost all igmp functions accessing inet->mc_list are protected by > > rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), > > so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group > > remove an entry while ip_mc_sf_allow is running causing a crash. > > > > Signed-off-by: Flavio Leitner <fleitner@redhat.com> > > Have you triggered this in practice or is this due purely > to code inspection? I had to modify the code to reproduce introducing a delay in ip_mc_sf_allow(), but a customer is able to reproduce it when avahi-daemon runs at boot time. CPU: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz stepping 05 BUG: unable to handle kernel paging request at virtual address 005e0005 printing eip: c05f2194 *pde = 00000000 Oops: 0000 [#1] SMP last sysfs file: /devices/pci0000:7f/0000:7f:06.0/irq Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp l2cap bluetooth sunrpc 8021q ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core but ton battery asus_acpi ac parport_pc lp parport sg e1000(U) pcspkr dm_raid45 dm_message dm_ region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod hfcldd(FU) sd_mod scs i_mod hfcldd_conf(U) hraslog_link(U) ext3 jbd uhci_hcd ohci_hcd ehci_hcd CPU: 0 EIP: 0060:[<c05f2194>] Tainted: GF VLI EFLAGS: 00210202 (2.6.18-128.el5PAE #1) EIP is at ip_mc_sf_allow+0x20/0x79 eax: 005e0001 ebx: f6cb9100 ecx: 00000008 edx: fb0000e0 esi: 5acb10ac edi: f63414e9 ebp: f7ae3200 esp: c0732ea4 ds: 007b es: 007b ss: 0068 Process xlinpack_xeon32 (pid: 5194, ti=c0732000 task=cfd2daa0 task.ti=f5d30000) Stack: f6cb9108 f6cb9100 c05ea1eb 00000008 d03e9034 5acb10ac fb0000e0 e9140000 00000008 e91414e9 f7ae3200 c06ab4a8 00000000 00000000 c05ce1d5 f7ae3200 00000000 f7ae3200 d03e9020 c05ce042 f7ae3200 c07d6988 c06ab560 00000008 Call Trace: [<c05ea1eb>] udp_rcv+0x1f4/0x514 [<c05ce1d5>] ip_local_deliver+0x159/0x204 [<c05ce042>] ip_rcv+0x46f/0x4a9 [<c05b397d>] netif_receive_skb+0x30c/0x330 [<f8924be0>] e1000_clean_rx_irq+0xf0/0x3e0 [e1000] [<f8924af0>] e1000_clean_rx_irq+0x0/0x3e0 [e1000] [<f8922bd4>] e1000_clean+0xf4/0x340 [e1000] [<c04074d6>] do_IRQ+0xb5/0xc3 [<c05b52d4>] net_rx_action+0x92/0x175 [<c042900f>] __do_softirq+0x87/0x114 [<c04073d7>] do_softirq+0x52/0x9c [<c04059d7>] apic_timer_interrupt+0x1f/0x24 ======================= Code: 81 c4 8c 00 00 00 5b 5e 5f 5d c3 56 89 ce 53 89 c3 8b 4c 24 0c 89 d0 25 f0 00 00 00 3d e0 00 00 00 75 59 8b 83 84 01 00 00 eb 0c <39> 50 04 75 05 39 48 0c 74 08 8b 00 85 c0 75 f0 eb 3f 8b 50 14 EIP: [<c05f2194>] ip_mc_sf_allow+0x20/0x79 SS:ESP 0068:c0732ea4 <0>Kernel panic - not syncing: Fatal exception in interrupt --- snip --- > That new synchronize_rcu() is very expensive and will decrease > the rate at which groups can be joined and left, _especially_ > on high cpu count machines. Well, I tried using read-write locking but then the packet reception was slower while another task was playing with multicasting groups. Then, I tried using call_rcu() to avoid the problem you are saying, but when you stop the reproducer, sk_free() will warn printing "optmem leakage.." 
because the rcu callback didn't run yet. > I do not think it is therefore a suitable problem to this > race, if it does in fact exist. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Flavio ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race 2010-01-04 11:29 ` Flavio Leitner @ 2010-01-04 13:07 ` Eric Dumazet 2010-01-04 18:51 ` Flavio Leitner 0 siblings, 1 reply; 16+ messages in thread From: Eric Dumazet @ 2010-01-04 13:07 UTC (permalink / raw) To: Flavio Leitner; +Cc: David Miller, netdev Le 04/01/2010 12:29, Flavio Leitner a écrit : > Then, I tried using call_rcu() to avoid the problem you are saying, > but when you stop the reproducer, sk_free() will warn printing > "optmem leakage.." because the rcu callback didn't run yet. > > This is probably because your call_rcu() callback was trying to call sock_kfree_s() ? rtnl_unlock(); call_rcu(&iml->lock, callback_func) callback_func() { sock_kfree_s(sk, iml, sizeof(*iml)); } Take a look at sock_kfree_s() definition : void sock_kfree_s(struct sock *sk, void *mem, int size) { kfree(mem); atomic_sub(size, &sk->sk_omem_alloc); } You can certainly try : rtnl_unlock(); atomic_sub(sizeof(*iml), sk->sk_omem_alloc); call_rcu(&iml->rcu, kfree); (immediate sk_omem_alloc handling, but deferred kfree()) ^ permalink raw reply [flat|nested] 16+ messages in thread
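A sketch of the split Eric describes, assuming struct ip_mc_socklist has gained the struct rcu_head member from the earlier patch. One detail his shorthand glosses over: kfree() does not match the call_rcu() callback signature (which takes a struct rcu_head *), and the rcu_head is not the first member of the structure, so a small wrapper using container_of() is needed in practice — which is what the revised patch below ends up doing:

#include <linux/igmp.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <net/sock.h>

/* Deferred part: runs once no RCU reader can still be walking the
 * mc_list through this entry. */
static void iml_reclaim(struct rcu_head *rp)
{
	struct ip_mc_socklist *iml = container_of(rp, struct ip_mc_socklist, rcu);

	kfree(iml);
}

/* Immediate part: give the memory back to the socket's option-memory
 * accounting right away, so sk_free() does not warn about "optmem leakage",
 * then let RCU free the entry itself later. */
static void iml_free_deferred(struct sock *sk, struct ip_mc_socklist *iml)
{
	atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
	call_rcu(&iml->rcu, iml_reclaim);
}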
* Re: [PATCH] igmp: fix ip_mc_sf_allow race 2010-01-04 13:07 ` Eric Dumazet @ 2010-01-04 18:51 ` Flavio Leitner 2010-01-04 19:53 ` Eric Dumazet 2010-01-05 0:06 ` David Stevens 0 siblings, 2 replies; 16+ messages in thread From: Flavio Leitner @ 2010-01-04 18:51 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev On Mon, Jan 04, 2010 at 02:07:03PM +0100, Eric Dumazet wrote: > Le 04/01/2010 12:29, Flavio Leitner a écrit : > > > Then, I tried using call_rcu() to avoid the problem you are saying, > > but when you stop the reproducer, sk_free() will warn printing > > "optmem leakage.." because the rcu callback didn't run yet. > > > > > > This is probably because your call_rcu() callback was trying to call sock_kfree_s() ? yes, correct. > > rtnl_unlock(); > call_rcu(&iml->lock, callback_func) > > callback_func() > { > sock_kfree_s(sk, iml, sizeof(*iml)); > } > > > > Take a look at sock_kfree_s() definition : > > void sock_kfree_s(struct sock *sk, void *mem, int size) > { > kfree(mem); > atomic_sub(size, &sk->sk_omem_alloc); > } > > > You can certainly try : > > rtnl_unlock(); > atomic_sub(sizeof(*iml), sk->sk_omem_alloc); > call_rcu(&iml->rcu, kfree); > > (immediate sk_omem_alloc handling, but deferred kfree()) Ok, below is the new version using call_rcu(). I'm still running my tests here, so I'm planning to resubmit it later if this version is okay with you. thanks! diff --git a/include/linux/igmp.h b/include/linux/igmp.h index 724c27e..9cccd16 100644 --- a/include/linux/igmp.h +++ b/include/linux/igmp.h @@ -170,6 +170,7 @@ struct ip_mc_socklist { struct ip_mreqn multi; unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */ struct ip_sf_socklist *sflist; + struct rcu_head rcu; }; struct ip_sf_list { diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 76c0840..ed154db 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr) iml->next = inet->mc_list; iml->sflist = NULL; iml->sfmode = MCAST_EXCLUDE; - inet->mc_list = iml; + rcu_assign_pointer(inet->mc_list, iml); ip_mc_inc_group(in_dev, addr); err = 0; done: @@ -1825,6 +1825,17 @@ static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml, return err; } + +void ip_mc_socklist_reclaim(struct rcu_head *rp) +{ + struct ip_mc_socklist *iml; + + iml = container_of(rp, struct ip_mc_socklist, rcu); + /* sk_omem_alloc should have been decreased by the caller*/ + kfree(iml); +} + + /* * Ask a socket to leave a group. 
*/ @@ -1854,12 +1865,15 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) (void) ip_mc_leave_src(sk, iml, in_dev); - *imlp = iml->next; + rcu_assign_pointer(*imlp, iml->next); if (in_dev) ip_mc_dec_group(in_dev, group); + rtnl_unlock(); - sock_kfree_s(sk, iml, sizeof(*iml)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); return 0; } if (!in_dev) @@ -2209,30 +2223,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif) struct ip_mc_socklist *pmc; struct ip_sf_socklist *psl; int i; + int ret; + ret = 1; if (!ipv4_is_multicast(loc_addr)) - return 1; + goto out; - for (pmc=inet->mc_list; pmc; pmc=pmc->next) { + rcu_read_lock(); + for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) { if (pmc->multi.imr_multiaddr.s_addr == loc_addr && pmc->multi.imr_ifindex == dif) break; } + ret = inet->mc_all; if (!pmc) - return inet->mc_all; + goto unlock; psl = pmc->sflist; + ret = (pmc->sfmode == MCAST_EXCLUDE); if (!psl) - return pmc->sfmode == MCAST_EXCLUDE; + goto unlock; for (i=0; i<psl->sl_count; i++) { if (psl->sl_addr[i] == rmt_addr) break; } + ret = 0; if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count) - return 0; + goto unlock; if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count) - return 0; - return 1; + goto unlock; + ret = 1; +unlock: + rcu_read_unlock(); +out: + return ret; } /* @@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk) struct ip_mc_socklist *iml; struct net *net = sock_net(sk); - if (inet->mc_list == NULL) + rcu_read_lock(); + if (rcu_dereference(inet->mc_list) == NULL) { + rcu_read_unlock(); return; + } + rcu_read_unlock(); rtnl_lock(); - while ((iml = inet->mc_list) != NULL) { + while ((iml = rcu_dereference(inet->mc_list)) != NULL) { struct in_device *in_dev; - inet->mc_list = iml->next; + rcu_assign_pointer(inet->mc_list, iml->next); in_dev = inetdev_by_index(net, iml->multi.imr_ifindex); (void) ip_mc_leave_src(sk, iml, in_dev); @@ -2259,7 +2287,9 @@ void ip_mc_drop_socket(struct sock *sk) ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr); in_dev_put(in_dev); } - sock_kfree_s(sk, iml, sizeof(*iml)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); } rtnl_unlock(); } -- 1.6.2.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race 2010-01-04 18:51 ` Flavio Leitner @ 2010-01-04 19:53 ` Eric Dumazet 2010-01-05 0:06 ` David Stevens 1 sibling, 0 replies; 16+ messages in thread From: Eric Dumazet @ 2010-01-04 19:53 UTC (permalink / raw) To: Flavio Leitner; +Cc: David Miller, netdev Le 04/01/2010 19:51, Flavio Leitner a écrit : > On Mon, Jan 04, 2010 at 02:07:03PM +0100, Eric Dumazet wrote: >> Le 04/01/2010 12:29, Flavio Leitner a écrit : >> >>> Then, I tried using call_rcu() to avoid the problem you are saying, >>> but when you stop the reproducer, sk_free() will warn printing >>> "optmem leakage.." because the rcu callback didn't run yet. >>> >>> >> >> This is probably because your call_rcu() callback was trying to call sock_kfree_s() ? > > yes, correct. > >> >> rtnl_unlock(); >> call_rcu(&iml->lock, callback_func) >> >> callback_func() >> { >> sock_kfree_s(sk, iml, sizeof(*iml)); >> } >> >> >> >> Take a look at sock_kfree_s() definition : >> >> void sock_kfree_s(struct sock *sk, void *mem, int size) >> { >> kfree(mem); >> atomic_sub(size, &sk->sk_omem_alloc); >> } >> >> >> You can certainly try : >> >> rtnl_unlock(); >> atomic_sub(sizeof(*iml), sk->sk_omem_alloc); >> call_rcu(&iml->rcu, kfree); >> >> (immediate sk_omem_alloc handling, but deferred kfree()) > > Ok, below is the new version using call_rcu(). I'm still running my > tests here, so I'm planning to resubmit it later if this version is > okay with you. It seems fine, but please make ip_mc_socklist_reclaim() static : > + > +void ip_mc_socklist_reclaim(struct rcu_head *rp) > +{ > + struct ip_mc_socklist *iml; > + > + iml = container_of(rp, struct ip_mc_socklist, rcu); > + /* sk_omem_alloc should have been decreased by the caller*/ > + kfree(iml); > +} > + > + Thanks ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race 2010-01-04 18:51 ` Flavio Leitner 2010-01-04 19:53 ` Eric Dumazet @ 2010-01-05 0:06 ` David Stevens 2010-01-05 6:55 ` Eric Dumazet 1 sibling, 1 reply; 16+ messages in thread From: David Stevens @ 2010-01-05 0:06 UTC (permalink / raw) To: Flavio Leitner; +Cc: David Miller, Eric Dumazet, netdev, netdev-owner For readability, instead of: ret = X; if (condition) goto out; I prefer: if (condition) { ret = X; goto out; } which makes it clear that ret is for the return and not some random state change unrelated to the condition. Otherwise, looks ok to me. +-DLS Acked-by: David L Stevens <dlstevens@us.ibm.com> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race 2010-01-05 0:06 ` David Stevens @ 2010-01-05 6:55 ` Eric Dumazet 2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner 0 siblings, 1 reply; 16+ messages in thread From: Eric Dumazet @ 2010-01-05 6:55 UTC (permalink / raw) To: David Stevens; +Cc: Flavio Leitner, David Miller, netdev, netdev-owner Le 05/01/2010 01:06, David Stevens a écrit : > For readability, instead of: > > ret = X; > if (condition) > goto out; > > I prefer: > > if (condition) { > ret = X; > goto out; > } > > which makes it clear that ret is for the return and not > some random state change unrelated to the condition. > Linus' argument for this common Linux coding style is that the generated code is smaller: mov $-6,%eax cmp condition je out versus cmp condition jne .ok mov $-6,%eax jmp out .ok: ^ permalink raw reply [flat|nested] 16+ messages in thread
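In C terms, the two styles under discussion look like this (the -6 error value is just a stand-in; with current compilers and optimization enabled the generated code for both forms is often identical, so the difference is largely historical and stylistic):

/* Style used in the patch: preload the return value, the branch only jumps. */
int check_preload(int cond)
{
	int ret;

	ret = -6;
	if (cond)
		goto out;
	/* more checks would normally follow here */
	ret = 0;
out:
	return ret;
}

/* Style David Stevens prefers: the assignment lives inside the branch,
 * making it obvious which condition produces which return value. */
int check_in_branch(int cond)
{
	int ret = 0;

	if (cond) {
		ret = -6;
		goto out;
	}
	/* more checks would normally follow here */
out:
	return ret;
}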
* [PATCH] igmp: fix ip_mc_sf_allow race [v3] 2010-01-05 6:55 ` Eric Dumazet @ 2010-01-05 20:52 ` Flavio Leitner 2010-01-05 22:36 ` Eric Dumazet ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: Flavio Leitner @ 2010-01-05 20:52 UTC (permalink / raw) To: netdev; +Cc: David Miller, David Stevens, Eric Dumazet, Flavio Leitner Almost all igmp functions accessing inet->mc_list are protected by rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group remove an entry while ip_mc_sf_allow is running causing a crash. Signed-off-by: Flavio Leitner <fleitner@redhat.com> --- include/linux/igmp.h | 1 + net/ipv4/igmp.c | 58 +++++++++++++++++++++++++++++++++++++------------ 2 files changed, 45 insertions(+), 14 deletions(-) diff --git a/include/linux/igmp.h b/include/linux/igmp.h index 724c27e..9cccd16 100644 --- a/include/linux/igmp.h +++ b/include/linux/igmp.h @@ -170,6 +170,7 @@ struct ip_mc_socklist { struct ip_mreqn multi; unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */ struct ip_sf_socklist *sflist; + struct rcu_head rcu; }; struct ip_sf_list { diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 76c0840..61ff685 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr) iml->next = inet->mc_list; iml->sflist = NULL; iml->sfmode = MCAST_EXCLUDE; - inet->mc_list = iml; + rcu_assign_pointer(inet->mc_list, iml); ip_mc_inc_group(in_dev, addr); err = 0; done: @@ -1825,6 +1825,17 @@ static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml, return err; } + +static void ip_mc_socklist_reclaim(struct rcu_head *rp) +{ + struct ip_mc_socklist *iml; + + iml = container_of(rp, struct ip_mc_socklist, rcu); + /* sk_omem_alloc should have been decreased by the caller*/ + kfree(iml); +} + + /* * Ask a socket to leave a group. 
*/ @@ -1854,12 +1865,15 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) (void) ip_mc_leave_src(sk, iml, in_dev); - *imlp = iml->next; + rcu_assign_pointer(*imlp, iml->next); if (in_dev) ip_mc_dec_group(in_dev, group); + rtnl_unlock(); - sock_kfree_s(sk, iml, sizeof(*iml)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); return 0; } if (!in_dev) @@ -2209,30 +2223,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif) struct ip_mc_socklist *pmc; struct ip_sf_socklist *psl; int i; + int ret; + ret = 1; if (!ipv4_is_multicast(loc_addr)) - return 1; + goto out; - for (pmc=inet->mc_list; pmc; pmc=pmc->next) { + rcu_read_lock(); + for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) { if (pmc->multi.imr_multiaddr.s_addr == loc_addr && pmc->multi.imr_ifindex == dif) break; } + ret = inet->mc_all; if (!pmc) - return inet->mc_all; + goto unlock; psl = pmc->sflist; + ret = (pmc->sfmode == MCAST_EXCLUDE); if (!psl) - return pmc->sfmode == MCAST_EXCLUDE; + goto unlock; for (i=0; i<psl->sl_count; i++) { if (psl->sl_addr[i] == rmt_addr) break; } + ret = 0; if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count) - return 0; + goto unlock; if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count) - return 0; - return 1; + goto unlock; + ret = 1; +unlock: + rcu_read_unlock(); +out: + return ret; } /* @@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk) struct ip_mc_socklist *iml; struct net *net = sock_net(sk); - if (inet->mc_list == NULL) + rcu_read_lock(); + if (rcu_dereference(inet->mc_list) == NULL) { + rcu_read_unlock(); return; + } + rcu_read_unlock(); rtnl_lock(); - while ((iml = inet->mc_list) != NULL) { + while ((iml = rcu_dereference(inet->mc_list)) != NULL) { struct in_device *in_dev; - inet->mc_list = iml->next; + rcu_assign_pointer(inet->mc_list, iml->next); in_dev = inetdev_by_index(net, iml->multi.imr_ifindex); (void) ip_mc_leave_src(sk, iml, in_dev); @@ -2259,7 +2287,9 @@ void ip_mc_drop_socket(struct sock *sk) ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr); in_dev_put(in_dev); } - sock_kfree_s(sk, iml, sizeof(*iml)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); } rtnl_unlock(); } -- 1.6.2.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3] 2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner @ 2010-01-05 22:36 ` Eric Dumazet 2010-01-05 23:03 ` Stephen Hemminger ` (2 subsequent siblings) 3 siblings, 0 replies; 16+ messages in thread From: Eric Dumazet @ 2010-01-05 22:36 UTC (permalink / raw) To: Flavio Leitner; +Cc: netdev, David Miller, David Stevens Le 05/01/2010 21:52, Flavio Leitner a écrit : > Almost all igmp functions accessing inet->mc_list are protected by > rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), > so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group > remove an entry while ip_mc_sf_allow is running causing a crash. > > Signed-off-by: Flavio Leitner <fleitner@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Small notes : in ip_mc_drop_socket() // rcu_read_lock()/unlock() seems not really needed here, we only want to avoid //the fill rtnlçlock() in case this socket have a NULL mc_list. rcu_read_lock(); if (rcu_dereference(inet->mc_list) == NULL) { rcu_read_unlock(); return; } rcu_read_unlock(); rtnl_lock(); while ((iml = rcu_dereference(inet->mc_list)) != NULL) { rcu_dereference() is not really needed here, since you own RTNL ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3] 2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner 2010-01-05 22:36 ` Eric Dumazet @ 2010-01-05 23:03 ` Stephen Hemminger 2010-01-06 16:40 ` Paul E. McKenney 2010-01-28 16:13 ` [PATCH] igmp: fix ip_mc_sf_allow race [v5] Flavio Leitner 3 siblings, 0 replies; 16+ messages in thread From: Stephen Hemminger @ 2010-01-05 23:03 UTC (permalink / raw) To: Flavio Leitner Cc: netdev, David Miller, David Stevens, Eric Dumazet, Flavio Leitner On Tue, 5 Jan 2010 18:52:22 -0200 Flavio Leitner <fleitner@redhat.com> wrote: > @@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk) > struct ip_mc_socklist *iml; > struct net *net = sock_net(sk); > > - if (inet->mc_list == NULL) > + rcu_read_lock(); > + if (rcu_dereference(inet->mc_list) == NULL) { > + rcu_read_unlock(); > return; > + } > + rcu_read_unlock(); > > rtnl_lock(); > - while ((iml = i All this would be cleaner if mc_list was using list_head and the existing list_head_rcu stuff. -- ^ permalink raw reply [flat|nested] 16+ messages in thread
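For context, a rough sketch of the conversion Stephen suggests — hypothetical, since (as noted with v5 below) it was deliberately left for a later patch — using the stock RCU list helpers instead of the open-coded next pointers:

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/types.h>

/* Hypothetical entry: mc_list kept as a doubly linked list_head instead of
 * a hand-rolled singly linked list of ip_mc_socklist entries. */
struct mc_entry {
	struct list_head node;
	__be32 multiaddr;
	struct rcu_head rcu;
};

static LIST_HEAD(mc_list);	/* would really live in struct inet_sock */

/* Writer side, still under rtnl_lock(): the _rcu list helpers publish and
 * unlink entries with the required memory barriers. */
static void mc_add(struct mc_entry *e)
{
	list_add_rcu(&e->node, &mc_list);
}

static void mc_del(struct mc_entry *e)
{
	list_del_rcu(&e->node);
	/* the entry itself would still be freed via call_rcu(), as in the patch */
}

/* Reader side, e.g. ip_mc_sf_allow(): caller holds rcu_read_lock(); the
 * helper hides the rcu_dereference() chaining done by hand in the patch. */
static struct mc_entry *mc_find(__be32 addr)
{
	struct mc_entry *e;

	list_for_each_entry_rcu(e, &mc_list, node)
		if (e->multiaddr == addr)
			return e;
	return NULL;
}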
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3] 2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner 2010-01-05 22:36 ` Eric Dumazet 2010-01-05 23:03 ` Stephen Hemminger @ 2010-01-06 16:40 ` Paul E. McKenney 2010-01-06 17:10 ` Stephen Hemminger 2010-01-28 16:13 ` [PATCH] igmp: fix ip_mc_sf_allow race [v5] Flavio Leitner 3 siblings, 1 reply; 16+ messages in thread From: Paul E. McKenney @ 2010-01-06 16:40 UTC (permalink / raw) To: Flavio Leitner; +Cc: netdev, David Miller, David Stevens, Eric Dumazet On Tue, Jan 05, 2010 at 06:52:22PM -0200, Flavio Leitner wrote: > Almost all igmp functions accessing inet->mc_list are protected by > rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), > so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group > remove an entry while ip_mc_sf_allow is running causing a crash. Looks like a good start from an RCU perspective, though I don't claim to understand networking locking design. A couple of questions below. Thanx, Paul > Signed-off-by: Flavio Leitner <fleitner@redhat.com> > --- > include/linux/igmp.h | 1 + > net/ipv4/igmp.c | 58 +++++++++++++++++++++++++++++++++++++------------ > 2 files changed, 45 insertions(+), 14 deletions(-) > > diff --git a/include/linux/igmp.h b/include/linux/igmp.h > index 724c27e..9cccd16 100644 > --- a/include/linux/igmp.h > +++ b/include/linux/igmp.h > @@ -170,6 +170,7 @@ struct ip_mc_socklist { > struct ip_mreqn multi; > unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */ > struct ip_sf_socklist *sflist; > + struct rcu_head rcu; > }; > > struct ip_sf_list { > diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c > index 76c0840..61ff685 100644 > --- a/net/ipv4/igmp.c > +++ b/net/ipv4/igmp.c > @@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr) > iml->next = inet->mc_list; > iml->sflist = NULL; > iml->sfmode = MCAST_EXCLUDE; > - inet->mc_list = iml; > + rcu_assign_pointer(inet->mc_list, iml); > ip_mc_inc_group(in_dev, addr); > err = 0; > done: > @@ -1825,6 +1825,17 @@ static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml, > return err; > } > > + > +static void ip_mc_socklist_reclaim(struct rcu_head *rp) > +{ > + struct ip_mc_socklist *iml; > + > + iml = container_of(rp, struct ip_mc_socklist, rcu); > + /* sk_omem_alloc should have been decreased by the caller*/ > + kfree(iml); > +} > + > + > /* > * Ask a socket to leave a group. > */ > @@ -1854,12 +1865,15 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) > > (void) ip_mc_leave_src(sk, iml, in_dev); Suppose some other CPU invokes ip_mc_sf_allow() at this point. Will that CPU be OK with the current state of the structure pointed to by iml? If not, some of the above code might need to be deferred to follow the grace period. 
> - *imlp = iml->next; > + rcu_assign_pointer(*imlp, iml->next); > > if (in_dev) > ip_mc_dec_group(in_dev, group); > + > rtnl_unlock(); > - sock_kfree_s(sk, iml, sizeof(*iml)); > + /* decrease mem now to avoid the memleak warning */ > + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); > + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); > return 0; > } > if (!in_dev) > @@ -2209,30 +2223,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif) > struct ip_mc_socklist *pmc; > struct ip_sf_socklist *psl; > int i; > + int ret; > > + ret = 1; > if (!ipv4_is_multicast(loc_addr)) > - return 1; > + goto out; > > - for (pmc=inet->mc_list; pmc; pmc=pmc->next) { > + rcu_read_lock(); > + for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) { > if (pmc->multi.imr_multiaddr.s_addr == loc_addr && > pmc->multi.imr_ifindex == dif) > break; > } > + ret = inet->mc_all; > if (!pmc) > - return inet->mc_all; > + goto unlock; > psl = pmc->sflist; > + ret = (pmc->sfmode == MCAST_EXCLUDE); > if (!psl) > - return pmc->sfmode == MCAST_EXCLUDE; > + goto unlock; > > for (i=0; i<psl->sl_count; i++) { > if (psl->sl_addr[i] == rmt_addr) > break; > } > + ret = 0; > if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count) > - return 0; > + goto unlock; > if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count) > - return 0; > - return 1; > + goto unlock; > + ret = 1; > +unlock: > + rcu_read_unlock(); > +out: > + return ret; > } > > /* > @@ -2245,13 +2269,17 @@ void ip_mc_drop_socket(struct sock *sk) > struct ip_mc_socklist *iml; > struct net *net = sock_net(sk); > > - if (inet->mc_list == NULL) > + rcu_read_lock(); > + if (rcu_dereference(inet->mc_list) == NULL) { > + rcu_read_unlock(); > return; > + } > + rcu_read_unlock(); I don't understand what rcu_read_lock() is protecting here. The test is still unstable -- just after finding inet->mc_list non-NULL, ip_mc_leave_group() might cause it to become NULL. Is there a need to protect sock_net(sk)? (I don't believe so, but then again, I don't claim to understand locking in Linux networking.) If there is no need, it should be possible to drop the rcu_read_lock(), rcu_read_unlock(), and rcu_dereference() above. (You might want them for documentation purposes, as they aren't hurting anything, just wondering what the intent is.) > rtnl_lock(); > - while ((iml = inet->mc_list) != NULL) { > + while ((iml = rcu_dereference(inet->mc_list)) != NULL) { > struct in_device *in_dev; > - inet->mc_list = iml->next; > + rcu_assign_pointer(inet->mc_list, iml->next); > > in_dev = inetdev_by_index(net, iml->multi.imr_ifindex); > (void) ip_mc_leave_src(sk, iml, in_dev); > @@ -2259,7 +2287,9 @@ void ip_mc_drop_socket(struct sock *sk) > ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr); > in_dev_put(in_dev); > } > - sock_kfree_s(sk, iml, sizeof(*iml)); > + /* decrease mem now to avoid the memleak warning */ > + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); > + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); > } > rtnl_unlock(); > } > -- > 1.6.2.3 > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3] 2010-01-06 16:40 ` Paul E. McKenney @ 2010-01-06 17:10 ` Stephen Hemminger 2010-01-06 18:50 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Stephen Hemminger @ 2010-01-06 17:10 UTC (permalink / raw) To: paulmck; +Cc: Flavio Leitner, netdev, David Miller, David Stevens, Eric Dumazet On Wed, 6 Jan 2010 08:40:27 -0800 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > - if (inet->mc_list == NULL) > > + rcu_read_lock(); > > + if (rcu_dereference(inet->mc_list) == NULL) { > > + rcu_read_unlock(); > > return; > > + } > > + rcu_read_unlock(); > > I don't understand what rcu_read_lock() is protecting here. The > test is still unstable -- just after finding inet->mc_list non-NULL, > ip_mc_leave_group() might cause it to become NULL. > > Is there a need to protect sock_net(sk)? (I don't believe so, but then > again, I don't claim to understand locking in Linux networking.) > If there is no need, it should be possible to drop the rcu_read_lock(), > rcu_read_unlock(), and rcu_dereference() above. (You might want them > for documentation purposes, as they aren't hurting anything, just > wondering what the intent is.) I think code is trying to avoid looking at mc_list if no multicast addresses. But it is an unsafe check. If mc_list was just converted to list_head this would all be clearer ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v3] 2010-01-06 17:10 ` Stephen Hemminger @ 2010-01-06 18:50 ` Paul E. McKenney 0 siblings, 0 replies; 16+ messages in thread From: Paul E. McKenney @ 2010-01-06 18:50 UTC (permalink / raw) To: Stephen Hemminger Cc: Flavio Leitner, netdev, David Miller, David Stevens, Eric Dumazet On Wed, Jan 06, 2010 at 09:10:07AM -0800, Stephen Hemminger wrote: > On Wed, 6 Jan 2010 08:40:27 -0800 > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > > - if (inet->mc_list == NULL) > > > + rcu_read_lock(); > > > + if (rcu_dereference(inet->mc_list) == NULL) { > > > + rcu_read_unlock(); > > > return; > > > + } > > > + rcu_read_unlock(); > > > > I don't understand what rcu_read_lock() is protecting here. The > > test is still unstable -- just after finding inet->mc_list non-NULL, > > ip_mc_leave_group() might cause it to become NULL. > > > > Is there a need to protect sock_net(sk)? (I don't believe so, but then > > again, I don't claim to understand locking in Linux networking.) > > If there is no need, it should be possible to drop the rcu_read_lock(), > > rcu_read_unlock(), and rcu_dereference() above. (You might want them > > for documentation purposes, as they aren't hurting anything, just > > wondering what the intent is.) > > I think code is trying to avoid looking at mc_list if no multicast > addresses. But it is an unsafe check. Fair enough! Might be worth a comment saying that the rcu_read_lock(), rcu_read_unlock()s, and rcu_dereference() are just for show. > If mc_list was just converted to list_head this would all be clearer Agreed! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] igmp: fix ip_mc_sf_allow race [v5] 2010-01-05 20:52 ` [PATCH] igmp: fix ip_mc_sf_allow race [v3] Flavio Leitner ` (2 preceding siblings ...) 2010-01-06 16:40 ` Paul E. McKenney @ 2010-01-28 16:13 ` Flavio Leitner 2010-02-02 15:32 ` David Miller 3 siblings, 1 reply; 16+ messages in thread From: Flavio Leitner @ 2010-01-28 16:13 UTC (permalink / raw) To: netdev; +Cc: David Miller, David Stevens, Eric Dumazet Hello, This is another version considering the suggestions. Thanks! I didn't convert to list_head yet because a lot more code needs to be changed so I'd rather leave that to another patch. 8<--------- Almost all igmp functions accessing inet->mc_list are protected by rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group remove an entry while ip_mc_sf_allow is running causing a crash. Signed-off-by: Flavio Leitner <fleitner@redhat.com> --- include/linux/igmp.h | 2 + net/ipv4/igmp.c | 83 +++++++++++++++++++++++++++++++++++++------------ 2 files changed, 64 insertions(+), 21 deletions(-) diff --git a/include/linux/igmp.h b/include/linux/igmp.h index 724c27e..93fc244 100644 --- a/include/linux/igmp.h +++ b/include/linux/igmp.h @@ -153,6 +153,7 @@ extern int sysctl_igmp_max_msf; struct ip_sf_socklist { unsigned int sl_max; unsigned int sl_count; + struct rcu_head rcu; __be32 sl_addr[0]; }; @@ -170,6 +171,7 @@ struct ip_mc_socklist { struct ip_mreqn multi; unsigned int sfmode; /* MCAST_{INCLUDE,EXCLUDE} */ struct ip_sf_socklist *sflist; + struct rcu_head rcu; }; struct ip_sf_list { diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 76c0840..54003a5 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -1799,7 +1799,7 @@ int ip_mc_join_group(struct sock *sk , struct ip_mreqn *imr) iml->next = inet->mc_list; iml->sflist = NULL; iml->sfmode = MCAST_EXCLUDE; - inet->mc_list = iml; + rcu_assign_pointer(inet->mc_list, iml); ip_mc_inc_group(in_dev, addr); err = 0; done: @@ -1807,24 +1807,46 @@ done: return err; } +static void ip_sf_socklist_reclaim(struct rcu_head *rp) +{ + struct ip_sf_socklist *psf; + + psf = container_of(rp, struct ip_sf_socklist, rcu); + /* sk_omem_alloc should have been decreased by the caller*/ + kfree(psf); +} + static int ip_mc_leave_src(struct sock *sk, struct ip_mc_socklist *iml, struct in_device *in_dev) { + struct ip_sf_socklist *psf = iml->sflist; int err; - if (iml->sflist == NULL) { + if (psf == NULL) { /* any-source empty exclude case */ return ip_mc_del_src(in_dev, &iml->multi.imr_multiaddr.s_addr, iml->sfmode, 0, NULL, 0); } err = ip_mc_del_src(in_dev, &iml->multi.imr_multiaddr.s_addr, - iml->sfmode, iml->sflist->sl_count, - iml->sflist->sl_addr, 0); - sock_kfree_s(sk, iml->sflist, IP_SFLSIZE(iml->sflist->sl_max)); - iml->sflist = NULL; + iml->sfmode, psf->sl_count, psf->sl_addr, 0); + rcu_assign_pointer(iml->sflist, NULL); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(IP_SFLSIZE(psf->sl_max), &sk->sk_omem_alloc); + call_rcu(&psf->rcu, ip_sf_socklist_reclaim); return err; } + +static void ip_mc_socklist_reclaim(struct rcu_head *rp) +{ + struct ip_mc_socklist *iml; + + iml = container_of(rp, struct ip_mc_socklist, rcu); + /* sk_omem_alloc should have been decreased by the caller*/ + kfree(iml); +} + + /* * Ask a socket to leave a group. 
*/ @@ -1854,12 +1876,14 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr) (void) ip_mc_leave_src(sk, iml, in_dev); - *imlp = iml->next; + rcu_assign_pointer(*imlp, iml->next); if (in_dev) ip_mc_dec_group(in_dev, group); rtnl_unlock(); - sock_kfree_s(sk, iml, sizeof(*iml)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); return 0; } if (!in_dev) @@ -1974,9 +1998,12 @@ int ip_mc_source(int add, int omode, struct sock *sk, struct if (psl) { for (i=0; i<psl->sl_count; i++) newpsl->sl_addr[i] = psl->sl_addr[i]; - sock_kfree_s(sk, psl, IP_SFLSIZE(psl->sl_max)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(IP_SFLSIZE(psl->sl_max), &sk->sk_omem_alloc); + call_rcu(&psl->rcu, ip_sf_socklist_reclaim); } - pmc->sflist = psl = newpsl; + rcu_assign_pointer(pmc->sflist, newpsl); + psl = newpsl; } rv = 1; /* > 0 for insert logic below if sl_count is 0 */ for (i=0; i<psl->sl_count; i++) { @@ -2072,11 +2099,13 @@ int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf, int ifindex) if (psl) { (void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode, psl->sl_count, psl->sl_addr, 0); - sock_kfree_s(sk, psl, IP_SFLSIZE(psl->sl_max)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(IP_SFLSIZE(psl->sl_max), &sk->sk_omem_alloc); + call_rcu(&psl->rcu, ip_sf_socklist_reclaim); } else (void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode, 0, NULL, 0); - pmc->sflist = newpsl; + rcu_assign_pointer(pmc->sflist, newpsl); pmc->sfmode = msf->imsf_fmode; err = 0; done: @@ -2209,30 +2238,40 @@ int ip_mc_sf_allow(struct sock *sk, __be32 loc_addr, __be32 rmt_addr, int dif) struct ip_mc_socklist *pmc; struct ip_sf_socklist *psl; int i; + int ret; + ret = 1; if (!ipv4_is_multicast(loc_addr)) - return 1; + goto out; - for (pmc=inet->mc_list; pmc; pmc=pmc->next) { + rcu_read_lock(); + for (pmc=rcu_dereference(inet->mc_list); pmc; pmc=rcu_dereference(pmc->next)) { if (pmc->multi.imr_multiaddr.s_addr == loc_addr && pmc->multi.imr_ifindex == dif) break; } + ret = inet->mc_all; if (!pmc) - return inet->mc_all; + goto unlock; psl = pmc->sflist; + ret = (pmc->sfmode == MCAST_EXCLUDE); if (!psl) - return pmc->sfmode == MCAST_EXCLUDE; + goto unlock; for (i=0; i<psl->sl_count; i++) { if (psl->sl_addr[i] == rmt_addr) break; } + ret = 0; if (pmc->sfmode == MCAST_INCLUDE && i >= psl->sl_count) - return 0; + goto unlock; if (pmc->sfmode == MCAST_EXCLUDE && i < psl->sl_count) - return 0; - return 1; + goto unlock; + ret = 1; +unlock: + rcu_read_unlock(); +out: + return ret; } /* @@ -2251,7 +2290,7 @@ void ip_mc_drop_socket(struct sock *sk) rtnl_lock(); while ((iml = inet->mc_list) != NULL) { struct in_device *in_dev; - inet->mc_list = iml->next; + rcu_assign_pointer(inet->mc_list, iml->next); in_dev = inetdev_by_index(net, iml->multi.imr_ifindex); (void) ip_mc_leave_src(sk, iml, in_dev); @@ -2259,7 +2298,9 @@ void ip_mc_drop_socket(struct sock *sk) ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr); in_dev_put(in_dev); } - sock_kfree_s(sk, iml, sizeof(*iml)); + /* decrease mem now to avoid the memleak warning */ + atomic_sub(sizeof(*iml), &sk->sk_omem_alloc); + call_rcu(&iml->rcu, ip_mc_socklist_reclaim); } rtnl_unlock(); } -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] igmp: fix ip_mc_sf_allow race [v5] 2010-01-28 16:13 ` [PATCH] igmp: fix ip_mc_sf_allow race [v5] Flavio Leitner @ 2010-02-02 15:32 ` David Miller 0 siblings, 0 replies; 16+ messages in thread From: David Miller @ 2010-02-02 15:32 UTC (permalink / raw) To: fbl; +Cc: netdev, dlstevens, eric.dumazet From: Flavio Leitner <fbl@sysclose.org> Date: Thu, 28 Jan 2010 14:13:38 -0200 > Hello, > > This is another version considering the suggestions. Thanks! > I didn't convert to list_head yet because a lot more code needs > to be changed so I'd rather leave that to another patch. > 8<--------- > > Almost all igmp functions accessing inet->mc_list are protected by > rtnl_lock(), but there is one exception which is ip_mc_sf_allow(), > so there is a chance of either ip_mc_drop_socket or ip_mc_leave_group > remove an entry while ip_mc_sf_allow is running causing a crash. > > Signed-off-by: Flavio Leitner <fleitner@redhat.com> Applied to net-next-2.6, thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread