* [BUG/Q] can_pernet_exit() leaves devices on dead net
@ 2018-03-01 15:53 Kirill Tkhai
2018-03-05 13:59 ` Oliver Hartkopp
0 siblings, 1 reply; 7+ messages in thread
From: Kirill Tkhai @ 2018-03-01 15:53 UTC (permalink / raw)
To: mkl, socketcan, davem, linux-can, netdev, dev
Hi,
I'm converting/reviewing pernet_operations either they allow several net namespaces
to be created/destroyed in parallel or not. Please, see the details in my recent
patches in net-next.git, if your are interested.
There is a strange place in can_pernet_ops pernet subsys, I found:
static void can_pernet_exit(struct net *net)
{
...
rcu_read_lock();
for_each_netdev_rcu(net, dev) {
if (dev->type == ARPHRD_CAN && dev->ml_priv) {
struct can_dev_rcv_lists *d = dev->ml_priv;
BUG_ON(d->entries);
kfree(d);
dev->ml_priv = NULL;
}
}
rcu_read_unlock()
...
}
This code clears dev->ml_priv from can devices, and it looks strange.
Since can_pernet_ops is pernet subsys, it's executed after default_device_exit()
from default_device_ops pernet device, as devices exit methods are executed first
(see net/core/net_namespace.c).
There are no NETIF_F_NETNS_LOCAL devices among can devices, though there is
check of can_link_ops in safe_candev_priv(). I haven't found can devices may
have net_device::rtnl_link_ops. But the code seems want to allow them. Anyway,
it's wrong in any case:
1)If there are can devices, which may be skipped by default_device_exit(),
can_pernet_exit() must use rtnl_lock() instead of rcu_read_lock(), and
it must move such devices to init_net. See wifi cfg80211_pernet_exit() for example.
2)If there are no such the devices, the code between rcu_read_lock() and rcu_read_unlock()
is useless, and must be deleted, as it never works and confuses a reader.
Thanks,
Kirill
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
2018-03-01 15:53 [BUG/Q] can_pernet_exit() leaves devices on dead net Kirill Tkhai
@ 2018-03-05 13:59 ` Oliver Hartkopp
2018-03-05 15:22 ` Kirill Tkhai
0 siblings, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2018-03-05 13:59 UTC (permalink / raw)
To: Kirill Tkhai, mkl, davem, linux-can, netdev, dev
Hi Kirill,
On 03/01/2018 04:53 PM, Kirill Tkhai wrote:
> I'm converting/reviewing pernet_operations either they allow several net namespaces
> to be created/destroyed in parallel or not. Please, see the details in my recent
> patches in net-next.git, if your are interested.
Thanks for your effort to review all these different sites!
> There is a strange place in can_pernet_ops pernet subsys, I found:
>
> static void can_pernet_exit(struct net *net)
> {
> ...
> rcu_read_lock();
> for_each_netdev_rcu(net, dev) {
> if (dev->type == ARPHRD_CAN && dev->ml_priv) {
> struct can_dev_rcv_lists *d = dev->ml_priv;
>
> BUG_ON(d->entries);
> kfree(d);
> dev->ml_priv = NULL;
> }
> }
> rcu_read_unlock()
> ...
> }
>
> This code clears dev->ml_priv from can devices, and it looks strange.
To give some more background about these 'struct can_dev_rcv_lists':
The receive lists are managed by the AF_CAN framework in linux/net/can for
each CAN network device. When the per-net modules like can-raw, can-bcm or
can-gw are removed (or if there are no more open sockets or the netdevices are
removed) the CAN filters are removed too.
Finally - when can.ko is removed - the filters should be cleared (that's why
the BUG() statement checks the emptiness) and then the empty can_dev_rcv_lists
structure is free'd.
> Since can_pernet_ops is pernet subsys, it's executed after default_device_exit()
> from default_device_ops pernet device, as devices exit methods are executed first
> (see net/core/net_namespace.c).
Hm - a device exit fires the NETDEV_UNREGISTER notifier which removes the
user-generated filters (e.g. in raw_notifier() in net/can/raw.c).
Finally the can_dev_rcv_lists structure is free'd in af_can.c.
Marc Kleine-Budde recently proposed a patch to create the can_dev_rcv_lists at
netdevice creation time (-> the space is allocated by alloc_netdev() and
removed by free_netdev()). This would remove the handling (allocate & free) of
ml_priv by af_can.c. Would this rework fix the described issue?
> There are no NETIF_F_NETNS_LOCAL devices among can devices, though there is
> check of can_link_ops in safe_candev_priv(). I haven't found can devices may
> have net_device::rtnl_link_ops. But the code seems want to allow them.
We use rtnl_link_ops to create and remove virtual CAN interfaces (vcan.c and
vxcan.c) and to alter MTU values and bitrates for real CAN interfaces.
See:
https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.txt#L1001
https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.txt#L1041
> Anyway,
> it's wrong in any case:
>
> 1)If there are can devices, which may be skipped by default_device_exit(),
> can_pernet_exit() must use rtnl_lock() instead of rcu_read_lock(), and
> it must move such devices to init_net. See wifi cfg80211_pernet_exit() for example.
>
> 2)If there are no such the devices, the code between rcu_read_lock() and rcu_read_unlock()
> is useless, and must be deleted, as it never works and confuses a reader.
The latter would create a memory leak. Maybe the suggested change from Marc
would solve the entire problem then?
Thanks & best regards,
Oliver
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
2018-03-05 13:59 ` Oliver Hartkopp
@ 2018-03-05 15:22 ` Kirill Tkhai
2018-03-06 10:26 ` Oliver Hartkopp
0 siblings, 1 reply; 7+ messages in thread
From: Kirill Tkhai @ 2018-03-05 15:22 UTC (permalink / raw)
To: Oliver Hartkopp, mkl, davem, linux-can, netdev, dev
Hi, Oliver,
thanks for your reply.
On 05.03.2018 16:59, Oliver Hartkopp wrote:
> Hi Kirill,
>
> On 03/01/2018 04:53 PM, Kirill Tkhai wrote:
>
>> I'm converting/reviewing pernet_operations either they allow several net namespaces
>> to be created/destroyed in parallel or not. Please, see the details in my recent
>> patches in net-next.git, if your are interested.
>
> Thanks for your effort to review all these different sites!
>
>> There is a strange place in can_pernet_ops pernet subsys, I found:
>>
>> static void can_pernet_exit(struct net *net)
>> {
>> ...
>> rcu_read_lock();
>> for_each_netdev_rcu(net, dev) {
>> if (dev->type == ARPHRD_CAN && dev->ml_priv) {
>> struct can_dev_rcv_lists *d = dev->ml_priv;
>>
>> BUG_ON(d->entries);
>> kfree(d);
>> dev->ml_priv = NULL;
>> }
>> }
>> rcu_read_unlock()
>> ...
>> }
>>
>> This code clears dev->ml_priv from can devices, and it looks strange.
>
> To give some more background about these 'struct can_dev_rcv_lists':
>
> The receive lists are managed by the AF_CAN framework in linux/net/can for
> each CAN network device. When the per-net modules like can-raw, can-bcm or
> can-gw are removed (or if there are no more open sockets or the netdevices are
> removed) the CAN filters are removed too.
>
> Finally - when can.ko is removed - the filters should be cleared (that's why
> the BUG() statement checks the emptiness) and then the empty can_dev_rcv_lists
> structure is free'd.
Thanks for the explanation, and module unloading should be nice. Just to clarify,
I worry not about rules, but about netdevices.
unshare -n ip link add type vcan
This command creates net ns, adds vcan there and exits. Then net ns is destroyed.
Since vcan has rtnl_link_ops, it unregistered by default_device_exit_batch().
Real can devices are moved to init_net in default_device_exit(), as they don't
have rtnl_link_ops set.
So, for_each_netdev_rcu() cycle in can_pernet_exit() should be useless (there are
no can devices in the list of net's devices). This looks so for me, please say
what devices are there if my assumption is wrong.
>> Since can_pernet_ops is pernet subsys, it's executed after default_device_exit()
>> from default_device_ops pernet device, as devices exit methods are executed first
>> (see net/core/net_namespace.c).
>
> Hm - a device exit fires the NETDEV_UNREGISTER notifier which removes the
> user-generated filters (e.g. in raw_notifier() in net/can/raw.c).
> Finally the can_dev_rcv_lists structure is free'd in af_can.c.
>
> Marc Kleine-Budde recently proposed a patch to create the can_dev_rcv_lists at
> netdevice creation time (-> the space is allocated by alloc_netdev() and
> removed by free_netdev()). This would remove the handling (allocate & free) of
> ml_priv by af_can.c. Would this rework fix the described issue?
Could you please give me a link to the patches? I can't find them in patchwork.
>> There are no NETIF_F_NETNS_LOCAL devices among can devices, though there is
>> check of can_link_ops in safe_candev_priv(). I haven't found can devices may
>> have net_device::rtnl_link_ops. But the code seems want to allow them.
>
> We use rtnl_link_ops to create and remove virtual CAN interfaces (vcan.c and
> vxcan.c) and to alter MTU values and bitrates for real CAN interfaces.
>
> See:
>
> https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.txt#L1001
>
> https://elixir.bootlin.com/linux/latest/source/Documentation/networking/can.txt#L1041
>
>> Anyway,
>> it's wrong in any case:
>>
>> 1)If there are can devices, which may be skipped by default_device_exit(),
>> can_pernet_exit() must use rtnl_lock() instead of rcu_read_lock(), and
>> it must move such devices to init_net. See wifi cfg80211_pernet_exit() for example.
>>
>> 2)If there are no such the devices, the code between rcu_read_lock() and rcu_read_unlock()
>> is useless, and must be deleted, as it never works and confuses a reader.
>
> The latter would create a memory leak. Maybe the suggested change from Marc
> would solve the entire problem then?
Kirill
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
2018-03-05 15:22 ` Kirill Tkhai
@ 2018-03-06 10:26 ` Oliver Hartkopp
2018-03-16 9:19 ` Kirill Tkhai
0 siblings, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2018-03-06 10:26 UTC (permalink / raw)
To: Kirill Tkhai, mkl, linux-can, netdev, dev
- DaveM
Hi Kirill,
On 03/05/2018 04:22 PM, Kirill Tkhai wrote:
> Thanks for the explanation, and module unloading should be nice. Just to clarify,
> I worry not about rules, but about netdevices.
>
> unshare -n ip link add type vcan
>
> This command creates net ns, adds vcan there and exits. Then net ns is destroyed.
> Since vcan has rtnl_link_ops, it unregistered by default_device_exit_batch().
> Real can devices are moved to init_net in default_device_exit(), as they don't
> have rtnl_link_ops set.
In fact most of the real CAN drivers have rtnl_link_ops:
https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev.c#L1162
https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev.c#L1225
Just slcan.c which is something like slip.c is missing these feature.
AFAIK real CAN netdevices are created in init_net at system startup, and you
can move them to a netns later.
When we already have rtnl_link_ops in the real CAN drivers - what happens to
them when the namespace is destroyed? Are they still moved to init_net, or do
we need to add some default handler in the current rtnl_link_ops setup?
> So, for_each_netdev_rcu() cycle in can_pernet_exit() should be useless (there are
> no can devices in the list of net's devices). This looks so for me, please say
> what devices are there if my assumption is wrong.
See above?
>>> Since can_pernet_ops is pernet subsys, it's executed after default_device_exit()
>>> from default_device_ops pernet device, as devices exit methods are executed first
>>> (see net/core/net_namespace.c).
>>
>> Hm - a device exit fires the NETDEV_UNREGISTER notifier which removes the
>> user-generated filters (e.g. in raw_notifier() in net/can/raw.c).
>> Finally the can_dev_rcv_lists structure is free'd in af_can.c.
>>
>> Marc Kleine-Budde recently proposed a patch to create the can_dev_rcv_lists at
>> netdevice creation time (-> the space is allocated by alloc_netdev() and
>> removed by free_netdev()). This would remove the handling (allocate & free) of
>> ml_priv by af_can.c. Would this rework fix the described issue?
>
> Could you please give me a link to the patches? I can't find them in patchwork.
There was a patchset of 14 patches from Marc where some of the refactoring &
renaming already made it into mainline - but the patches to move the
can_dev_rcv_lists data structure into the network device space have not been
pushed:
https://marc.info/?l=linux-can&m=150169588319315&w=2
https://marc.info/?l=linux-can&r=1&b=201708&w=2
This patch & documentation describes Marc's proposed idea best:
https://marc.info/?l=linux-can&m=150169589619340&w=2
Best regards,
Oliver
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
2018-03-06 10:26 ` Oliver Hartkopp
@ 2018-03-16 9:19 ` Kirill Tkhai
2018-04-02 15:28 ` Oliver Hartkopp
0 siblings, 1 reply; 7+ messages in thread
From: Kirill Tkhai @ 2018-03-16 9:19 UTC (permalink / raw)
To: Oliver Hartkopp, mkl, linux-can, netdev, dev
On 06.03.2018 13:26, Oliver Hartkopp wrote:
> - DaveM
>
> Hi Kirill,
>
> On 03/05/2018 04:22 PM, Kirill Tkhai wrote:
>
>> Thanks for the explanation, and module unloading should be nice. Just to clarify,
>> I worry not about rules, but about netdevices.
>>
>> unshare -n ip link add type vcan
>>
>> This command creates net ns, adds vcan there and exits. Then net ns is destroyed.
>> Since vcan has rtnl_link_ops, it unregistered by default_device_exit_batch().
>> Real can devices are moved to init_net in default_device_exit(), as they don't
>> have rtnl_link_ops set.
>
> In fact most of the real CAN drivers have rtnl_link_ops:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev.c#L1162
> https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev.c#L1225
>
> Just slcan.c which is something like slip.c is missing these feature.
>
> AFAIK real CAN netdevices are created in init_net at system startup, and you
> can move them to a netns later.
>
> When we already have rtnl_link_ops in the real CAN drivers - what happens to
> them when the namespace is destroyed? Are they still moved to init_net, or do
> we need to add some default handler in the current rtnl_link_ops setup?
They are unregistered in default_device_exit_batch():
list_for_each_entry(net, net_list, exit_list) {
for_each_netdev_reverse(net, dev) {
if (dev->rtnl_link_ops && dev->rtnl_link_ops->dellink)
dev->rtnl_link_ops->dellink(dev, &dev_kill_list);
else
unregister_netdevice_queue(dev, &dev_kill_list);
}
}
unregister_netdevice_many(&dev_kill_list);
So that, my question in the start of the topic is about that. If we unregister them,
what devices we're going to meet in can_pernet_exit()?
Also, can devices not having rtnl_link_ops are moved into init_net in default_device_exit():
for_each_netdev_safe(net, dev, aux) {
if (dev->features & NETIF_F_NETNS_LOCAL)
continue;
/* Leave virtual devices for the generic cleanup */
if (dev->rtnl_link_ops)
continue;
err = dev_change_net_namespace(dev, &init_net, fb_name);
}
>> So, for_each_netdev_rcu() cycle in can_pernet_exit() should be useless (there are
>> no can devices in the list of net's devices). This looks so for me, please say
>> what devices are there if my assumption is wrong.
>
> See above?
>
>>>> Since can_pernet_ops is pernet subsys, it's executed after default_device_exit()
>>>> from default_device_ops pernet device, as devices exit methods are executed first
>>>> (see net/core/net_namespace.c).
>>>
>>> Hm - a device exit fires the NETDEV_UNREGISTER notifier which removes the
>>> user-generated filters (e.g. in raw_notifier() in net/can/raw.c).
>>> Finally the can_dev_rcv_lists structure is free'd in af_can.c.
>>>
>>> Marc Kleine-Budde recently proposed a patch to create the can_dev_rcv_lists at
>>> netdevice creation time (-> the space is allocated by alloc_netdev() and
>>> removed by free_netdev()). This would remove the handling (allocate & free) of
>>> ml_priv by af_can.c. Would this rework fix the described issue?
>>
>> Could you please give me a link to the patches? I can't find them in patchwork.
>
> There was a patchset of 14 patches from Marc where some of the refactoring &
> renaming already made it into mainline - but the patches to move the
> can_dev_rcv_lists data structure into the network device space have not been
> pushed:
>
> https://marc.info/?l=linux-can&m=150169588319315&w=2
> https://marc.info/?l=linux-can&r=1&b=201708&w=2
>
> This patch & documentation describes Marc's proposed idea best:
> https://marc.info/?l=linux-can&m=150169589619340&w=2
Yes, this one looks good:
https://marc.info/?l=linux-can&m=150169589119335&w=2
Regards,
Kirill
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
2018-03-16 9:19 ` Kirill Tkhai
@ 2018-04-02 15:28 ` Oliver Hartkopp
2018-04-02 15:36 ` Kirill Tkhai
0 siblings, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2018-04-02 15:28 UTC (permalink / raw)
To: Kirill Tkhai, mkl; +Cc: linux-can, netdev, dev
Hi Kirill, Marc,
I checked the code once more and added some debug output to the other
parts in CAN notifier code.
In fact the code pointed to by both of you seems to be obsolete as I
only wanted to be 'really sure' that no leftovers of the CAN filters at
module unloading.
> Yes, this one looks good:
> https://marc.info/?l=linux-can&m=150169589119335&w=2
>
> Regards,
> Kirill
>
I was obviously too cautious ;-)
All tests I made resulted in the function iterating through all the CAN
netdevices doing exactly nothing.
I'm fine with removing that stuff - but I'm not sure whether it's worth
to push that patch to stable 4.12+ or even before 4.12 (without
namespace support - and removing rcu_barrier() too).
Any opinions?
Best regards,
Oliver
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG/Q] can_pernet_exit() leaves devices on dead net
2018-04-02 15:28 ` Oliver Hartkopp
@ 2018-04-02 15:36 ` Kirill Tkhai
0 siblings, 0 replies; 7+ messages in thread
From: Kirill Tkhai @ 2018-04-02 15:36 UTC (permalink / raw)
To: Oliver Hartkopp, mkl; +Cc: linux-can, netdev, dev
Hi, Oliver,
On 02.04.2018 18:28, Oliver Hartkopp wrote:
> Hi Kirill, Marc,
>
> I checked the code once more and added some debug output to the other parts in CAN notifier code.
>
> In fact the code pointed to by both of you seems to be obsolete as I only wanted to be 'really sure' that no leftovers of the CAN filters at module unloading.
>
>
>> Yes, this one looks good:
>> https://marc.info/?l=linux-can&m=150169589119335&w=2
>>
>> Regards,
>> Kirill
>>
>
> I was obviously too cautious ;-)
>
> All tests I made resulted in the function iterating through all the CAN netdevices doing exactly nothing.
>
> I'm fine with removing that stuff - but I'm not sure whether it's worth to push that patch to stable 4.12+ or even before 4.12 (without namespace support - and removing rcu_barrier() too).
>
> Any opinions?
I think the same -- it's not need for stable as there is just iteration over empty list, i.e., noop.
Kirill
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-04-02 15:36 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-01 15:53 [BUG/Q] can_pernet_exit() leaves devices on dead net Kirill Tkhai
2018-03-05 13:59 ` Oliver Hartkopp
2018-03-05 15:22 ` Kirill Tkhai
2018-03-06 10:26 ` Oliver Hartkopp
2018-03-16 9:19 ` Kirill Tkhai
2018-04-02 15:28 ` Oliver Hartkopp
2018-04-02 15:36 ` Kirill Tkhai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox