* [PATCH 0/3] ipvlan: fix ipv6 autoconfiguration via RA
@ 2015-05-14 13:56 Konstantin Khlebnikov
2015-05-14 13:56 ` [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked Konstantin Khlebnikov
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-14 13:56 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Mahesh Bandewar, Jiri Benc, Hannes Frederic Sowa
This patchset fixes ipvlan and its interaction with ipv6 RA. Now ipvlan l2
ports get dev_id and construct unique ipv6 addresses using one mac address.
---
Konstantin Khlebnikov (3):
ipv6: make inet6addr_chain blocking and always call with rtnl locked
ipvlan: grab rcu_read_lock on xmit path
ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses
Documentation/networking/ipvlan.txt | 12 +++++++++++-
drivers/net/ipvlan/ipvlan.h | 1 +
drivers/net/ipvlan/ipvlan_main.c | 24 ++++++++++++++++++++++++
net/ipv6/addrconf.c | 7 ++++---
net/ipv6/addrconf_core.c | 8 ++++----
5 files changed, 44 insertions(+), 8 deletions(-)
--
Konstantin
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked
2015-05-14 13:56 [PATCH 0/3] ipvlan: fix ipv6 autoconfiguration via RA Konstantin Khlebnikov
@ 2015-05-14 13:56 ` Konstantin Khlebnikov
2015-05-16 21:22 ` David Miller
2015-05-14 13:56 ` [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path Konstantin Khlebnikov
2015-05-14 13:56 ` [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses Konstantin Khlebnikov
2 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-14 13:56 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Mahesh Bandewar, Jiri Benc, Hannes Frederic Sowa
Unlike to inetaddr_chain inet6addr_chain is atomic and called from bh
context without rtnl when ipv6 receives router advertisement packet.
Several drivers don't know about that: ipvlan thinks that it has rtnl
here, ocrdma locks mutex inside callback. Probably there is more.
This patch makes it blocking and calls from first stage of DAD work.
Looks like this is completely safe and rtnl already locked here.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
Splash in ipvlan when ipv6 addrconf got RA:
[ 44.673784] RTNL: assertion failed at drivers/net/ipvlan/ipvlan_core.c (114)
[ 44.674761] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.1.0-rc3+ #69
[ 44.674763] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[ 44.674765] ffff880216273000 ffff88021613b7e8 ffffffff818c6054 ffff88021fc10c10
[ 44.674767] 0000000000000000 ffff88021613b818 ffffffff815544e8 0000000000000000
[ 44.674769] ffff8800db5d0000 ffff880216270000 ffff8802162707c0 ffff88021613b848
[ 44.674771] Call Trace:
[ 44.674787] [<ffffffff818c6054>] dump_stack+0x45/0x57
[ 44.674800] [<ffffffff815544e8>] ipvlan_addr_busy+0x98/0xa0
[ 44.674802] [<ffffffff81555755>] ipvlan_addr6_event+0x115/0x200
[ 44.674810] [<ffffffff810733fd>] notifier_call_chain+0x4d/0x70
[ 44.674812] [<ffffffff81073445>] atomic_notifier_call_chain+0x15/0x20
[ 44.674829] [<ffffffff817dbb16>] inet6addr_notifier_call_chain+0x16/0x20
[ 44.674836] [<ffffffff817a46c0>] ipv6_add_addr+0xa0/0x420
[ 44.674838] [<ffffffff817aa058>] addrconf_prefix_rcv+0x568/0x7d0
[ 44.674842] [<ffffffff817b7bad>] ndisc_rcv+0x8ad/0xf40
[ 44.674845] [<ffffffff817bec10>] icmpv6_rcv+0x430/0x870
[ 44.674847] [<ffffffff817dbd56>] ? ipv6_skip_exthdr+0x46/0x170
[ 44.674850] [<ffffffff818cfe79>] ? _raw_read_unlock_bh+0x19/0x20
[ 44.674852] [<ffffffff817c33f0>] ? ipv6_chk_mcast_addr+0x110/0x130
[ 44.674854] [<ffffffff817a1ffb>] ip6_input_finish+0x11b/0x3e0
[ 44.674855] [<ffffffff817a281a>] ip6_input+0x6a/0x80
[ 44.674857] [<ffffffff817c330a>] ? ipv6_chk_mcast_addr+0x2a/0x130
[ 44.674859] [<ffffffff817a1ee0>] ? ip6_rcv_finish+0xa0/0xa0
[ 44.674860] [<ffffffff817a28c0>] ip6_mc_input+0x90/0xb0
[ 44.674861] [<ffffffff817a1edd>] ip6_rcv_finish+0x9d/0xa0
[ 44.674863] [<ffffffff817a2602>] ipv6_rcv+0x342/0x4f0
[ 44.674864] [<ffffffff817a1e40>] ? ip6_make_skb+0x1a0/0x1a0
[ 44.674873] [<ffffffff816dc876>] __netif_receive_skb_core+0x216/0x900
[ 44.674877] [<ffffffff8155ce00>] ? virtnet_receive+0x170/0x750
[ 44.674879] [<ffffffff816dcf81>] __netif_receive_skb+0x21/0x70
[ 44.674880] [<ffffffff816dd06d>] process_backlog+0x9d/0x140
[ 44.674882] [<ffffffff816dd896>] net_rx_action+0x146/0x330
[ 44.674884] [<ffffffff81089baa>] ? pick_next_task_fair+0x47a/0x490
[ 44.674887] [<ffffffff81059e37>] __do_softirq+0xe7/0x2e0
[ 44.674888] [<ffffffff8105a04b>] run_ksoftirqd+0x1b/0x60
[ 44.674890] [<ffffffff810758c6>] smpboot_thread_fn+0x116/0x170
[ 44.674891] [<ffffffff810757b0>] ? sort_range+0x20/0x20
[ 44.674893] [<ffffffff81072924>] kthread+0xc4/0xe0
[ 44.674895] [<ffffffff81000303>] ? do_one_initcall+0xb3/0x1c0
[ 44.674897] [<ffffffff81072860>] ? flush_kthread_worker+0x90/0x90
[ 44.674899] [<ffffffff818d07e2>] ret_from_fork+0x42/0x70
[ 44.674901] [<ffffffff81072860>] ? flush_kthread_worker+0x90/0x90
---
net/ipv6/addrconf.c | 7 ++++---
net/ipv6/addrconf_core.c | 8 ++++----
2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 37b70e82bff8..61512e10ec92 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -910,9 +910,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
out2:
rcu_read_unlock_bh();
- if (likely(err == 0))
- inet6addr_notifier_call_chain(NETDEV_UP, ifa);
- else {
+ if (err) {
kfree(ifa);
ifa = ERR_PTR(err);
}
@@ -2735,6 +2733,7 @@ static void add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
spin_lock_bh(&ifp->lock);
ifp->flags &= ~IFA_F_TENTATIVE;
spin_unlock_bh(&ifp->lock);
+ inet6addr_notifier_call_chain(NETDEV_UP, ifp);
ipv6_ifa_notify(RTM_NEWADDR, ifp);
in6_ifa_put(ifp);
}
@@ -3415,6 +3414,8 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
struct inet6_dev *idev = ifp->idev;
struct net_device *dev = idev->dev;
+ inet6addr_notifier_call_chain(NETDEV_UP, ifp);
+
addrconf_join_solict(dev, &ifp->addr);
prandom_seed((__force u32) ifp->addr.s6_addr32[3]);
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index d873ceea86e6..6997bd7946d3 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -87,23 +87,23 @@ int __ipv6_addr_type(const struct in6_addr *addr)
}
EXPORT_SYMBOL(__ipv6_addr_type);
-static ATOMIC_NOTIFIER_HEAD(inet6addr_chain);
+static BLOCKING_NOTIFIER_HEAD(inet6addr_chain);
int register_inet6addr_notifier(struct notifier_block *nb)
{
- return atomic_notifier_chain_register(&inet6addr_chain, nb);
+ return blocking_notifier_chain_register(&inet6addr_chain, nb);
}
EXPORT_SYMBOL(register_inet6addr_notifier);
int unregister_inet6addr_notifier(struct notifier_block *nb)
{
- return atomic_notifier_chain_unregister(&inet6addr_chain, nb);
+ return blocking_notifier_chain_unregister(&inet6addr_chain, nb);
}
EXPORT_SYMBOL(unregister_inet6addr_notifier);
int inet6addr_notifier_call_chain(unsigned long val, void *v)
{
- return atomic_notifier_call_chain(&inet6addr_chain, val, v);
+ return blocking_notifier_call_chain(&inet6addr_chain, val, v);
}
EXPORT_SYMBOL(inet6addr_notifier_call_chain);
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path
2015-05-14 13:56 [PATCH 0/3] ipvlan: fix ipv6 autoconfiguration via RA Konstantin Khlebnikov
2015-05-14 13:56 ` [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked Konstantin Khlebnikov
@ 2015-05-14 13:56 ` Konstantin Khlebnikov
2015-05-19 23:33 ` Mahesh Bandewar
2015-05-14 13:56 ` [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses Konstantin Khlebnikov
2 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-14 13:56 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Mahesh Bandewar, Jiri Benc, Hannes Frederic Sowa
ipvlan_start_xmit() is called with rcu_read_lock_bh() while its internal
structures requre normal rcu_read_lock().
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
[ 802.945151] ===============================
[ 802.945160] [ INFO: suspicious RCU usage. ]
[ 802.945164] 4.1.0-rc3+ #71 Not tainted
[ 802.945165] -------------------------------
[ 802.945167] drivers/net/ipvlan/ipvlan.h:103 suspicious rcu_dereference_check() usage!
[ 802.945168]
[ 802.945168] other info that might help us debug this:
[ 802.945168]
[ 802.945170]
[ 802.945170] rcu_scheduler_active = 1, debug_locks = 1
[ 802.945173] 3 locks held by ping6/3813:
[ 802.945174] #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff81880f72>] rawv6_sendmsg+0x512/0xbb0
[ 802.945197] #1: (rcu_read_lock_bh){......}, at: [<ffffffff8185e577>] ip6_finish_output2+0x57/0x790
[ 802.945205] #2: (rcu_read_lock_bh){......}, at: [<ffffffff8177a68b>] __dev_queue_xmit+0x4b/0x930
[ 802.945218]
[ 802.945218] stack backtrace:
[ 802.945221] CPU: 2 PID: 3813 Comm: ping6 Not tainted 4.1.0-rc3+ #71
[ 802.945222] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[ 802.945224] 0000000000000001 ffff8800db7b7888 ffffffff819de6f8 0000000000000007
[ 802.945226] ffff8800db738000 ffff8800db7b78b8 ffffffff810a7f92 ffff880214fc8c00
[ 802.945229] ffff88021595b000 ffff88021595a000 ffff880214adf800 ffff8800db7b7968
[ 802.945232] Call Trace:
[ 802.945248] [<ffffffff819de6f8>] dump_stack+0x4c/0x65
[ 802.945253] [<ffffffff810a7f92>] lockdep_rcu_suspicious+0xe2/0x130
[ 802.945265] [<ffffffff815e24ac>] ipvlan_queue_xmit+0x17c/0x5a0
[ 802.945268] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
[ 802.945271] [<ffffffff815e300c>] ipvlan_start_xmit+0x1c/0x50
[ 802.945272] [<ffffffff8177a117>] dev_hard_start_xmit+0x2f7/0x820
[ 802.945274] [<ffffffff817797b6>] ? netif_skb_features+0xf6/0x1d0
[ 802.945276] [<ffffffff81779b84>] ? validate_xmit_skb.isra.99.part.100+0x24/0x2c0
[ 802.945278] [<ffffffff8177ad04>] __dev_queue_xmit+0x6c4/0x930
[ 802.945280] [<ffffffff8177a68b>] ? __dev_queue_xmit+0x4b/0x930
[ 802.945281] [<ffffffff810a963a>] ? mark_held_locks+0x6a/0x90
[ 802.945283] [<ffffffff8177af8e>] dev_queue_xmit_sk+0xe/0x10
[ 802.945285] [<ffffffff8185e824>] ip6_finish_output2+0x304/0x790
[ 802.945287] [<ffffffff81860dde>] ? ip6_finish_output+0x9e/0x1e0
[ 802.945288] [<ffffffff81860dde>] ip6_finish_output+0x9e/0x1e0
[ 802.945290] [<ffffffff81860fdb>] ip6_output+0xbb/0x180
[ 802.945302] [<ffffffff818a7d0a>] ? ip6_find_1stfragopt+0x9a/0xa0
[ 802.945304] [<ffffffff81860d40>] ? ip6_fragment+0xe80/0xe80
[ 802.945306] [<ffffffff818a805c>] ip6_local_out_sk+0x2c/0x70
[ 802.945308] [<ffffffff818a80b0>] ip6_local_out+0x10/0x20
[ 802.945309] [<ffffffff81861571>] ip6_send_skb+0x31/0xd0
[ 802.945311] [<ffffffff81861644>] ip6_push_pending_frames+0x34/0x40
[ 802.945313] [<ffffffff81881368>] rawv6_sendmsg+0x908/0xbb0
[ 802.945328] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
[ 802.945340] [<ffffffff81817e9e>] inet_sendmsg+0x10e/0x1f0
[ 802.945343] [<ffffffff81817d90>] ? inet_recvmsg+0x200/0x200
[ 802.945351] [<ffffffff81758275>] sock_sendmsg+0x45/0x50
[ 802.945354] [<ffffffff8175a719>] SYSC_sendto+0xd9/0x110
[ 802.945357] [<ffffffff8175ac99>] SyS_sendto+0x9/0x10
[ 802.945362] [<ffffffff819ebfae>] system_call_fastpath+0x12/0x76
---
drivers/net/ipvlan/ipvlan_main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 77b92a0fe557..0cafd3e6f02d 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -173,6 +173,7 @@ static int ipvlan_stop(struct net_device *dev)
return 0;
}
+/* called with rcu_read_lock_bh. */
static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
@@ -180,6 +181,7 @@ static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
int skblen = skb->len;
int ret;
+ rcu_read_lock();
ret = ipvlan_queue_xmit(skb, dev);
if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
struct ipvl_pcpu_stats *pcptr;
@@ -193,6 +195,8 @@ static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
} else {
this_cpu_inc(ipvlan->pcpu_stats->tx_drps);
}
+ rcu_read_unlock();
+
return ret;
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses
2015-05-14 13:56 [PATCH 0/3] ipvlan: fix ipv6 autoconfiguration via RA Konstantin Khlebnikov
2015-05-14 13:56 ` [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked Konstantin Khlebnikov
2015-05-14 13:56 ` [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path Konstantin Khlebnikov
@ 2015-05-14 13:56 ` Konstantin Khlebnikov
2015-05-19 23:59 ` Mahesh Bandewar
2 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-14 13:56 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Mahesh Bandewar, Jiri Benc, Hannes Frederic Sowa
All ipvlan ports use one MAC address, this way ipv6 RA tries to assign
one ipv6 address to all of them. This patch assigns unique dev_id to each
ipvlan port. This field is used instead of common FF-FE in Modified EUI-64.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
Documentation/networking/ipvlan.txt | 12 +++++++++++-
drivers/net/ipvlan/ipvlan.h | 1 +
drivers/net/ipvlan/ipvlan_main.c | 20 ++++++++++++++++++++
3 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
index cf996394e466..cb0b777bce58 100644
--- a/Documentation/networking/ipvlan.txt
+++ b/Documentation/networking/ipvlan.txt
@@ -24,7 +24,7 @@ using IProute2/ip utility.
ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 }
- e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
+ e.g. ip link add link eth0 ipvl0 type ipvlan mode l2
4. Operating modes:
@@ -41,6 +41,15 @@ slave device and packets are switched and queued to the master device to send
out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
as well.
+ In L2 mode slave devices receive Router Advertisements from the network
+and perform autoconfiguration as well as master device. Each port has unique
+16-bit device id which is used for filling octets 4-5 of Modified EUI-64.
+That gives 65533 addresses (FF-FE used by master, FF-FF/00-00 reserved/not used).
+
+ Also lower half of IPv6 address could be set as interface token:
+
+ ip token set ::aaaa:bbbb:cccc:dddd dev ipvl0
+
4.2 L3 mode:
In this mode TX processing upto L3 happens on the stack instance attached
to the slave device and packets are switched to the stack instance of the
@@ -105,3 +114,4 @@ namespace where L2 on the slave could be changed / misused.
(4) ip -4 addr add 127.0.0.1 dev lo
(5) ip -4 addr add $IPADDR dev ipvl1
(6) ip -4 route add default via $ROUTER dev ipvl1
+
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 54549a6223dd..1ebab84e7a0e 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -95,6 +95,7 @@ struct ipvl_port {
struct rcu_head rcu;
int count;
u16 mode;
+ struct ida ida;
};
static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 0cafd3e6f02d..dee0e8441150 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -53,6 +53,7 @@ static int ipvlan_port_create(struct net_device *dev)
INIT_LIST_HEAD(&port->ipvlans);
for (idx = 0; idx < IPVLAN_HASH_SIZE; idx++)
INIT_HLIST_HEAD(&port->hlhead[idx]);
+ ida_init(&port->ida);
err = netdev_rx_handler_register(dev, ipvlan_handle_frame, port);
if (err)
@@ -72,6 +73,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
dev->priv_flags &= ~IFF_IPVLAN_MASTER;
netdev_rx_handler_unregister(dev);
+ ida_destroy(&port->ida);
kfree_rcu(port, rcu);
}
@@ -484,6 +486,18 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
*/
memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN);
+ if (port->mode == IPVLAN_MODE_L2) {
+ /*
+ * IPv6 addrconf uses it to produce unique addresses,
+ * see function addrconf_ifid_eui48.
+ */
+ err = ida_simple_get(&port->ida, 1, 0xFFFE, GFP_KERNEL);
+ if (err > 0)
+ dev->dev_id = err;
+ else if (err != -ENOSPC)
+ goto ipvlan_destroy_port;
+ }
+
dev->priv_flags |= IFF_IPVLAN_SLAVE;
port->count += 1;
@@ -518,6 +532,12 @@ static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
list_del(&addr->anode);
}
}
+
+ if (dev->dev_id) {
+ ida_simple_remove(&ipvlan->port->ida, dev->dev_id);
+ dev->dev_id = 0;
+ }
+
list_del_rcu(&ipvlan->pnode);
unregister_netdevice_queue(dev, head);
netdev_upper_dev_unlink(ipvlan->phy_dev, dev);
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked
2015-05-14 13:56 ` [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked Konstantin Khlebnikov
@ 2015-05-16 21:22 ` David Miller
2015-05-18 10:13 ` Konstantin Khlebnikov
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2015-05-16 21:22 UTC (permalink / raw)
To: khlebnikov; +Cc: netdev, maheshb, jbenc, hannes
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Date: Thu, 14 May 2015 16:56:18 +0300
> Unlike to inetaddr_chain inet6addr_chain is atomic and called from bh
> context without rtnl when ipv6 receives router advertisement packet.
>
> Several drivers don't know about that: ipvlan thinks that it has rtnl
> here, ocrdma locks mutex inside callback. Probably there is more.
>
> This patch makes it blocking and calls from first stage of DAD work.
> Looks like this is completely safe and rtnl already locked here.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
I don't see how you can make the inet6addr_chain blocking when it is
invoked from software interrupt context.
You also cannot try to defer these operations to a workqueue or
similar to get into a blockable context, because various ipv6
testsuites depend upon the addressing state change happening
when we process the packet that triggers that change.
Instead, I think you have to make the users of inet6addr_chain
aware of the context in which they execute.
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked
2015-05-16 21:22 ` David Miller
@ 2015-05-18 10:13 ` Konstantin Khlebnikov
0 siblings, 0 replies; 12+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-18 10:13 UTC (permalink / raw)
To: David Miller; +Cc: netdev, maheshb, jbenc, hannes
On 17.05.2015 00:22, David Miller wrote:
> From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Date: Thu, 14 May 2015 16:56:18 +0300
>
>> Unlike to inetaddr_chain inet6addr_chain is atomic and called from bh
>> context without rtnl when ipv6 receives router advertisement packet.
>>
>> Several drivers don't know about that: ipvlan thinks that it has rtnl
>> here, ocrdma locks mutex inside callback. Probably there is more.
>>
>> This patch makes it blocking and calls from first stage of DAD work.
>> Looks like this is completely safe and rtnl already locked here.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> I don't see how you can make the inet6addr_chain blocking when it is
> invoked from software interrupt context.
>
> You also cannot try to defer these operations to a workqueue or
> similar to get into a blockable context, because various ipv6
> testsuites depend upon the addressing state change happening
> when we process the packet that triggers that change.
>
> Instead, I think you have to make the users of inet6addr_chain
> aware of the context in which they execute.
>
> Thanks.
>
I've defer only calls of inet6addr_notifier_call_chain.
Ipv6 addresses still appears right in interrupt context.
Ordering with netlink events RTM_NEWADDR/RTM_DELADDR stays the same:
inet6addr_notifier_call_chain(NETDEV_UP, ifp);
ipv6_ifa_notify(RTM_NEWADDR, ifp);
...
ipv6_ifa_notify(RTM_DELADDR, ifp);
inet6addr_notifier_call_chain(NETDEV_DOWN, ifp);
As I see ipv6 always send RTM_NEWADDR from dad-work even for
IFA_F_OPTIMISTIC addresses
The only visible change is ordering with event RTM_NEWPREFIX.
And another problem which my patch fixes. at this path:
addrconf_prefix_rcv -> ipv6_add_addr -> inet6addr_notifier_call_chain
inet6addr_notifier_call_chain called without any locks.
Theoretically NETDEV_DOWN event could be delivered before NETDEV_UP
if somebody removes that half-baked address right in that moment.
--
Konstantin
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path
2015-05-14 13:56 ` [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path Konstantin Khlebnikov
@ 2015-05-19 23:33 ` Mahesh Bandewar
2015-05-21 9:51 ` Konstantin Khlebnikov
0 siblings, 1 reply; 12+ messages in thread
From: Mahesh Bandewar @ 2015-05-19 23:33 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: linux-netdev, David S. Miller, Jiri Benc, Hannes Frederic Sowa
On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
>
> ipvlan_start_xmit() is called with rcu_read_lock_bh() while its internal
> structures requre normal rcu_read_lock().
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> ---
>
> [ 802.945151] ===============================
> [ 802.945160] [ INFO: suspicious RCU usage. ]
> [ 802.945164] 4.1.0-rc3+ #71 Not tainted
> [ 802.945165] -------------------------------
> [ 802.945167] drivers/net/ipvlan/ipvlan.h:103 suspicious rcu_dereference_check() usage!
> [ 802.945168]
> [ 802.945168] other info that might help us debug this:
> [ 802.945168]
> [ 802.945170]
> [ 802.945170] rcu_scheduler_active = 1, debug_locks = 1
> [ 802.945173] 3 locks held by ping6/3813:
> [ 802.945174] #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff81880f72>] rawv6_sendmsg+0x512/0xbb0
> [ 802.945197] #1: (rcu_read_lock_bh){......}, at: [<ffffffff8185e577>] ip6_finish_output2+0x57/0x790
> [ 802.945205] #2: (rcu_read_lock_bh){......}, at: [<ffffffff8177a68b>] __dev_queue_xmit+0x4b/0x930
> [ 802.945218]
> [ 802.945218] stack backtrace:
> [ 802.945221] CPU: 2 PID: 3813 Comm: ping6 Not tainted 4.1.0-rc3+ #71
> [ 802.945222] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
> [ 802.945224] 0000000000000001 ffff8800db7b7888 ffffffff819de6f8 0000000000000007
> [ 802.945226] ffff8800db738000 ffff8800db7b78b8 ffffffff810a7f92 ffff880214fc8c00
> [ 802.945229] ffff88021595b000 ffff88021595a000 ffff880214adf800 ffff8800db7b7968
> [ 802.945232] Call Trace:
> [ 802.945248] [<ffffffff819de6f8>] dump_stack+0x4c/0x65
> [ 802.945253] [<ffffffff810a7f92>] lockdep_rcu_suspicious+0xe2/0x130
> [ 802.945265] [<ffffffff815e24ac>] ipvlan_queue_xmit+0x17c/0x5a0
> [ 802.945268] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
> [ 802.945271] [<ffffffff815e300c>] ipvlan_start_xmit+0x1c/0x50
> [ 802.945272] [<ffffffff8177a117>] dev_hard_start_xmit+0x2f7/0x820
> [ 802.945274] [<ffffffff817797b6>] ? netif_skb_features+0xf6/0x1d0
> [ 802.945276] [<ffffffff81779b84>] ? validate_xmit_skb.isra.99.part.100+0x24/0x2c0
> [ 802.945278] [<ffffffff8177ad04>] __dev_queue_xmit+0x6c4/0x930
> [ 802.945280] [<ffffffff8177a68b>] ? __dev_queue_xmit+0x4b/0x930
> [ 802.945281] [<ffffffff810a963a>] ? mark_held_locks+0x6a/0x90
> [ 802.945283] [<ffffffff8177af8e>] dev_queue_xmit_sk+0xe/0x10
> [ 802.945285] [<ffffffff8185e824>] ip6_finish_output2+0x304/0x790
> [ 802.945287] [<ffffffff81860dde>] ? ip6_finish_output+0x9e/0x1e0
> [ 802.945288] [<ffffffff81860dde>] ip6_finish_output+0x9e/0x1e0
> [ 802.945290] [<ffffffff81860fdb>] ip6_output+0xbb/0x180
> [ 802.945302] [<ffffffff818a7d0a>] ? ip6_find_1stfragopt+0x9a/0xa0
> [ 802.945304] [<ffffffff81860d40>] ? ip6_fragment+0xe80/0xe80
> [ 802.945306] [<ffffffff818a805c>] ip6_local_out_sk+0x2c/0x70
> [ 802.945308] [<ffffffff818a80b0>] ip6_local_out+0x10/0x20
> [ 802.945309] [<ffffffff81861571>] ip6_send_skb+0x31/0xd0
> [ 802.945311] [<ffffffff81861644>] ip6_push_pending_frames+0x34/0x40
> [ 802.945313] [<ffffffff81881368>] rawv6_sendmsg+0x908/0xbb0
> [ 802.945328] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
> [ 802.945340] [<ffffffff81817e9e>] inet_sendmsg+0x10e/0x1f0
> [ 802.945343] [<ffffffff81817d90>] ? inet_recvmsg+0x200/0x200
> [ 802.945351] [<ffffffff81758275>] sock_sendmsg+0x45/0x50
> [ 802.945354] [<ffffffff8175a719>] SYSC_sendto+0xd9/0x110
> [ 802.945357] [<ffffffff8175ac99>] SyS_sendto+0x9/0x10
> [ 802.945362] [<ffffffff819ebfae>] system_call_fastpath+0x12/0x76
> ---
> drivers/net/ipvlan/ipvlan_main.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 77b92a0fe557..0cafd3e6f02d 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -173,6 +173,7 @@ static int ipvlan_stop(struct net_device *dev)
> return 0;
> }
>
> +/* called with rcu_read_lock_bh. */
> static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
> struct net_device *dev)
> {
> @@ -180,6 +181,7 @@ static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
> int skblen = skb->len;
> int ret;
>
> + rcu_read_lock();
I don't believe this is correct. The correct way would be the way it
is fixed in the following patch -
https://patchwork.ozlabs.org/patch/471481/
> ret = ipvlan_queue_xmit(skb, dev);
> if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
> struct ipvl_pcpu_stats *pcptr;
> @@ -193,6 +195,8 @@ static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
> } else {
> this_cpu_inc(ipvlan->pcpu_stats->tx_drps);
> }
> + rcu_read_unlock();
> +
> return ret;
> }
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses
2015-05-14 13:56 ` [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses Konstantin Khlebnikov
@ 2015-05-19 23:59 ` Mahesh Bandewar
2015-05-21 11:38 ` Konstantin Khlebnikov
0 siblings, 1 reply; 12+ messages in thread
From: Mahesh Bandewar @ 2015-05-19 23:59 UTC (permalink / raw)
To: Konstantin Khlebnikov
Cc: linux-netdev, David S. Miller, Jiri Benc, Hannes Frederic Sowa
On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> All ipvlan ports use one MAC address, this way ipv6 RA tries to assign
> one ipv6 address to all of them. This patch assigns unique dev_id to each
> ipvlan port. This field is used instead of common FF-FE in Modified EUI-64.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
> Documentation/networking/ipvlan.txt | 12 +++++++++++-
> drivers/net/ipvlan/ipvlan.h | 1 +
> drivers/net/ipvlan/ipvlan_main.c | 20 ++++++++++++++++++++
> 3 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
> index cf996394e466..cb0b777bce58 100644
> --- a/Documentation/networking/ipvlan.txt
> +++ b/Documentation/networking/ipvlan.txt
> @@ -24,7 +24,7 @@ using IProute2/ip utility.
>
> ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 }
>
> - e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
> + e.g. ip link add link eth0 ipvl0 type ipvlan mode l2
>
>
> 4. Operating modes:
> @@ -41,6 +41,15 @@ slave device and packets are switched and queued to the master device to send
> out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
> as well.
>
> + In L2 mode slave devices receive Router Advertisements from the network
> +and perform autoconfiguration as well as master device. Each port has unique
> +16-bit device id which is used for filling octets 4-5 of Modified EUI-64.
> +That gives 65533 addresses (FF-FE used by master, FF-FF/00-00 reserved/not used).
> +
This is nice, thanks for fixing this! However how is "unique" id
guaranteed? Especially when multiple virtual drivers are stacked? Not
necessarily all of them may use the dev_id, but to avoid any possible
collision, shouldn't the device hierarchy (especially lower_dev) be
traversed before settling on the initial value?
> + Also lower half of IPv6 address could be set as interface token:
> +
> + ip token set ::aaaa:bbbb:cccc:dddd dev ipvl0
> +
> 4.2 L3 mode:
> In this mode TX processing upto L3 happens on the stack instance attached
> to the slave device and packets are switched to the stack instance of the
> @@ -105,3 +114,4 @@ namespace where L2 on the slave could be changed / misused.
> (4) ip -4 addr add 127.0.0.1 dev lo
> (5) ip -4 addr add $IPADDR dev ipvl1
> (6) ip -4 route add default via $ROUTER dev ipvl1
> +
> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> index 54549a6223dd..1ebab84e7a0e 100644
> --- a/drivers/net/ipvlan/ipvlan.h
> +++ b/drivers/net/ipvlan/ipvlan.h
> @@ -95,6 +95,7 @@ struct ipvl_port {
> struct rcu_head rcu;
> int count;
> u16 mode;
> + struct ida ida;
> };
>
> static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 0cafd3e6f02d..dee0e8441150 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -53,6 +53,7 @@ static int ipvlan_port_create(struct net_device *dev)
> INIT_LIST_HEAD(&port->ipvlans);
> for (idx = 0; idx < IPVLAN_HASH_SIZE; idx++)
> INIT_HLIST_HEAD(&port->hlhead[idx]);
> + ida_init(&port->ida);
>
> err = netdev_rx_handler_register(dev, ipvlan_handle_frame, port);
> if (err)
> @@ -72,6 +73,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
>
> dev->priv_flags &= ~IFF_IPVLAN_MASTER;
> netdev_rx_handler_unregister(dev);
> + ida_destroy(&port->ida);
> kfree_rcu(port, rcu);
> }
>
> @@ -484,6 +486,18 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> */
> memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN);
>
> + if (port->mode == IPVLAN_MODE_L2) {
> + /*
> + * IPv6 addrconf uses it to produce unique addresses,
> + * see function addrconf_ifid_eui48.
> + */
> + err = ida_simple_get(&port->ida, 1, 0xFFFE, GFP_KERNEL);
> + if (err > 0)
> + dev->dev_id = err;
> + else if (err != -ENOSPC)
> + goto ipvlan_destroy_port;
> + }
> +
> dev->priv_flags |= IFF_IPVLAN_SLAVE;
>
> port->count += 1;
> @@ -518,6 +532,12 @@ static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
> list_del(&addr->anode);
> }
> }
> +
> + if (dev->dev_id) {
> + ida_simple_remove(&ipvlan->port->ida, dev->dev_id);
> + dev->dev_id = 0;
> + }
> +
> list_del_rcu(&ipvlan->pnode);
> unregister_netdevice_queue(dev, head);
> netdev_upper_dev_unlink(ipvlan->phy_dev, dev);
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path
2015-05-19 23:33 ` Mahesh Bandewar
@ 2015-05-21 9:51 ` Konstantin Khlebnikov
2015-05-21 11:47 ` Hannes Frederic Sowa
0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-21 9:51 UTC (permalink / raw)
To: Mahesh Bandewar
Cc: linux-netdev, David S. Miller, Jiri Benc, Hannes Frederic Sowa
On 20.05.2015 02:33, Mahesh Bandewar wrote:
> On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru> wrote:
>>
>> ipvlan_start_xmit() is called with rcu_read_lock_bh() while its internal
>> structures requre normal rcu_read_lock().
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>
>> ---
>>
>> [ 802.945151] ===============================
>> [ 802.945160] [ INFO: suspicious RCU usage. ]
>> [ 802.945164] 4.1.0-rc3+ #71 Not tainted
>> [ 802.945165] -------------------------------
>> [ 802.945167] drivers/net/ipvlan/ipvlan.h:103 suspicious rcu_dereference_check() usage!
>> [ 802.945168]
>> [ 802.945168] other info that might help us debug this:
>> [ 802.945168]
>> [ 802.945170]
>> [ 802.945170] rcu_scheduler_active = 1, debug_locks = 1
>> [ 802.945173] 3 locks held by ping6/3813:
>> [ 802.945174] #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff81880f72>] rawv6_sendmsg+0x512/0xbb0
>> [ 802.945197] #1: (rcu_read_lock_bh){......}, at: [<ffffffff8185e577>] ip6_finish_output2+0x57/0x790
>> [ 802.945205] #2: (rcu_read_lock_bh){......}, at: [<ffffffff8177a68b>] __dev_queue_xmit+0x4b/0x930
>> [ 802.945218]
>> [ 802.945218] stack backtrace:
>> [ 802.945221] CPU: 2 PID: 3813 Comm: ping6 Not tainted 4.1.0-rc3+ #71
>> [ 802.945222] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
>> [ 802.945224] 0000000000000001 ffff8800db7b7888 ffffffff819de6f8 0000000000000007
>> [ 802.945226] ffff8800db738000 ffff8800db7b78b8 ffffffff810a7f92 ffff880214fc8c00
>> [ 802.945229] ffff88021595b000 ffff88021595a000 ffff880214adf800 ffff8800db7b7968
>> [ 802.945232] Call Trace:
>> [ 802.945248] [<ffffffff819de6f8>] dump_stack+0x4c/0x65
>> [ 802.945253] [<ffffffff810a7f92>] lockdep_rcu_suspicious+0xe2/0x130
>> [ 802.945265] [<ffffffff815e24ac>] ipvlan_queue_xmit+0x17c/0x5a0
>> [ 802.945268] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
>> [ 802.945271] [<ffffffff815e300c>] ipvlan_start_xmit+0x1c/0x50
>> [ 802.945272] [<ffffffff8177a117>] dev_hard_start_xmit+0x2f7/0x820
>> [ 802.945274] [<ffffffff817797b6>] ? netif_skb_features+0xf6/0x1d0
>> [ 802.945276] [<ffffffff81779b84>] ? validate_xmit_skb.isra.99.part.100+0x24/0x2c0
>> [ 802.945278] [<ffffffff8177ad04>] __dev_queue_xmit+0x6c4/0x930
>> [ 802.945280] [<ffffffff8177a68b>] ? __dev_queue_xmit+0x4b/0x930
>> [ 802.945281] [<ffffffff810a963a>] ? mark_held_locks+0x6a/0x90
>> [ 802.945283] [<ffffffff8177af8e>] dev_queue_xmit_sk+0xe/0x10
>> [ 802.945285] [<ffffffff8185e824>] ip6_finish_output2+0x304/0x790
>> [ 802.945287] [<ffffffff81860dde>] ? ip6_finish_output+0x9e/0x1e0
>> [ 802.945288] [<ffffffff81860dde>] ip6_finish_output+0x9e/0x1e0
>> [ 802.945290] [<ffffffff81860fdb>] ip6_output+0xbb/0x180
>> [ 802.945302] [<ffffffff818a7d0a>] ? ip6_find_1stfragopt+0x9a/0xa0
>> [ 802.945304] [<ffffffff81860d40>] ? ip6_fragment+0xe80/0xe80
>> [ 802.945306] [<ffffffff818a805c>] ip6_local_out_sk+0x2c/0x70
>> [ 802.945308] [<ffffffff818a80b0>] ip6_local_out+0x10/0x20
>> [ 802.945309] [<ffffffff81861571>] ip6_send_skb+0x31/0xd0
>> [ 802.945311] [<ffffffff81861644>] ip6_push_pending_frames+0x34/0x40
>> [ 802.945313] [<ffffffff81881368>] rawv6_sendmsg+0x908/0xbb0
>> [ 802.945328] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
>> [ 802.945340] [<ffffffff81817e9e>] inet_sendmsg+0x10e/0x1f0
>> [ 802.945343] [<ffffffff81817d90>] ? inet_recvmsg+0x200/0x200
>> [ 802.945351] [<ffffffff81758275>] sock_sendmsg+0x45/0x50
>> [ 802.945354] [<ffffffff8175a719>] SYSC_sendto+0xd9/0x110
>> [ 802.945357] [<ffffffff8175ac99>] SyS_sendto+0x9/0x10
>> [ 802.945362] [<ffffffff819ebfae>] system_call_fastpath+0x12/0x76
>> ---
>> drivers/net/ipvlan/ipvlan_main.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> index 77b92a0fe557..0cafd3e6f02d 100644
>> --- a/drivers/net/ipvlan/ipvlan_main.c
>> +++ b/drivers/net/ipvlan/ipvlan_main.c
>> @@ -173,6 +173,7 @@ static int ipvlan_stop(struct net_device *dev)
>> return 0;
>> }
>>
>> +/* called with rcu_read_lock_bh. */
>> static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
>> struct net_device *dev)
>> {
>> @@ -180,6 +181,7 @@ static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
>> int skblen = skb->len;
>> int ret;
>>
>> + rcu_read_lock();
>
> I don't believe this is correct. The correct way would be the way it
> is fixed in the following patch -
> https://patchwork.ozlabs.org/patch/471481/
I'm not sure, that might just plug the warning without fixing problem.
The rest code uses call_rcu()/synchronize_rcu(). Probably relation
between quiescent states of rcu and rcu_bh isn't that easy.
>
>> ret = ipvlan_queue_xmit(skb, dev);
>> if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
>> struct ipvl_pcpu_stats *pcptr;
>> @@ -193,6 +195,8 @@ static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
>> } else {
>> this_cpu_inc(ipvlan->pcpu_stats->tx_drps);
>> }
>> + rcu_read_unlock();
>> +
>> return ret;
>> }
>>
>>
--
Konstantin
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses
2015-05-19 23:59 ` Mahesh Bandewar
@ 2015-05-21 11:38 ` Konstantin Khlebnikov
2015-05-21 12:09 ` Hannes Frederic Sowa
0 siblings, 1 reply; 12+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-21 11:38 UTC (permalink / raw)
To: Mahesh Bandewar
Cc: linux-netdev, David S. Miller, Jiri Benc, Hannes Frederic Sowa
On 20.05.2015 02:59, Mahesh Bandewar wrote:
> On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov
> <khlebnikov@yandex-team.ru> wrote:
>> All ipvlan ports use one MAC address, this way ipv6 RA tries to assign
>> one ipv6 address to all of them. This patch assigns unique dev_id to each
>> ipvlan port. This field is used instead of common FF-FE in Modified EUI-64.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>> Documentation/networking/ipvlan.txt | 12 +++++++++++-
>> drivers/net/ipvlan/ipvlan.h | 1 +
>> drivers/net/ipvlan/ipvlan_main.c | 20 ++++++++++++++++++++
>> 3 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
>> index cf996394e466..cb0b777bce58 100644
>> --- a/Documentation/networking/ipvlan.txt
>> +++ b/Documentation/networking/ipvlan.txt
>> @@ -24,7 +24,7 @@ using IProute2/ip utility.
>>
>> ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 }
>>
>> - e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
>> + e.g. ip link add link eth0 ipvl0 type ipvlan mode l2
>>
>>
>> 4. Operating modes:
>> @@ -41,6 +41,15 @@ slave device and packets are switched and queued to the master device to send
>> out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
>> as well.
>>
>> + In L2 mode slave devices receive Router Advertisements from the network
>> +and perform autoconfiguration as well as master device. Each port has unique
>> +16-bit device id which is used for filling octets 4-5 of Modified EUI-64.
>> +That gives 65533 addresses (FF-FE used by master, FF-FF/00-00 reserved/not used).
>> +
> This is nice, thanks for fixing this! However how is "unique" id
> guaranteed? Especially when multiple virtual drivers are stacked? Not
> necessarily all of them may use the dev_id, but to avoid any possible
> collision, shouldn't the device hierarchy (especially lower_dev) be
> traversed before settling on the initial value?
Well, uniqueness isn't guaranteed but that should work in most cases.
ipv6 anyway checks for duplicate addresses after configuration.
As I see creation of ipvlan on ipvlan just creates slave at original
master device, so this will work as expected. And ipvlan cannot share
physical device with bonding/bridge/macvlan, so I don't see how to
stack more than one layer of ipvlans accidentally.
>
>> + Also lower half of IPv6 address could be set as interface token:
>> +
>> + ip token set ::aaaa:bbbb:cccc:dddd dev ipvl0
>> +
>> 4.2 L3 mode:
>> In this mode TX processing upto L3 happens on the stack instance attached
>> to the slave device and packets are switched to the stack instance of the
>> @@ -105,3 +114,4 @@ namespace where L2 on the slave could be changed / misused.
>> (4) ip -4 addr add 127.0.0.1 dev lo
>> (5) ip -4 addr add $IPADDR dev ipvl1
>> (6) ip -4 route add default via $ROUTER dev ipvl1
>> +
>> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
>> index 54549a6223dd..1ebab84e7a0e 100644
>> --- a/drivers/net/ipvlan/ipvlan.h
>> +++ b/drivers/net/ipvlan/ipvlan.h
>> @@ -95,6 +95,7 @@ struct ipvl_port {
>> struct rcu_head rcu;
>> int count;
>> u16 mode;
>> + struct ida ida;
>> };
>>
>> static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
>> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> index 0cafd3e6f02d..dee0e8441150 100644
>> --- a/drivers/net/ipvlan/ipvlan_main.c
>> +++ b/drivers/net/ipvlan/ipvlan_main.c
>> @@ -53,6 +53,7 @@ static int ipvlan_port_create(struct net_device *dev)
>> INIT_LIST_HEAD(&port->ipvlans);
>> for (idx = 0; idx < IPVLAN_HASH_SIZE; idx++)
>> INIT_HLIST_HEAD(&port->hlhead[idx]);
>> + ida_init(&port->ida);
>>
>> err = netdev_rx_handler_register(dev, ipvlan_handle_frame, port);
>> if (err)
>> @@ -72,6 +73,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
>>
>> dev->priv_flags &= ~IFF_IPVLAN_MASTER;
>> netdev_rx_handler_unregister(dev);
>> + ida_destroy(&port->ida);
>> kfree_rcu(port, rcu);
>> }
>>
>> @@ -484,6 +486,18 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>> */
>> memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN);
>>
>> + if (port->mode == IPVLAN_MODE_L2) {
>> + /*
>> + * IPv6 addrconf uses it to produce unique addresses,
>> + * see function addrconf_ifid_eui48.
>> + */
>> + err = ida_simple_get(&port->ida, 1, 0xFFFE, GFP_KERNEL);
>> + if (err > 0)
>> + dev->dev_id = err;
>> + else if (err != -ENOSPC)
>> + goto ipvlan_destroy_port;
>> + }
>> +
>> dev->priv_flags |= IFF_IPVLAN_SLAVE;
>>
>> port->count += 1;
>> @@ -518,6 +532,12 @@ static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
>> list_del(&addr->anode);
>> }
>> }
>> +
>> + if (dev->dev_id) {
>> + ida_simple_remove(&ipvlan->port->ida, dev->dev_id);
>> + dev->dev_id = 0;
>> + }
>> +
>> list_del_rcu(&ipvlan->pnode);
>> unregister_netdevice_queue(dev, head);
>> netdev_upper_dev_unlink(ipvlan->phy_dev, dev);
>>
--
Konstantin
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path
2015-05-21 9:51 ` Konstantin Khlebnikov
@ 2015-05-21 11:47 ` Hannes Frederic Sowa
0 siblings, 0 replies; 12+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-21 11:47 UTC (permalink / raw)
To: Konstantin Khlebnikov, Mahesh Bandewar
Cc: linux-netdev, David S. Miller, Jiri Benc
On Thu, May 21, 2015, at 11:51, Konstantin Khlebnikov wrote:
> On 20.05.2015 02:33, Mahesh Bandewar wrote:
> > On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov
> > <khlebnikov@yandex-team.ru> wrote:
> >>
> >> ipvlan_start_xmit() is called with rcu_read_lock_bh() while its internal
> >> structures requre normal rcu_read_lock().
> >>
> >> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> >>
> >> ---
> >>
> >> [ 802.945151] ===============================
> >> [ 802.945160] [ INFO: suspicious RCU usage. ]
> >> [ 802.945164] 4.1.0-rc3+ #71 Not tainted
> >> [ 802.945165] -------------------------------
> >> [ 802.945167] drivers/net/ipvlan/ipvlan.h:103 suspicious rcu_dereference_check() usage!
> >> [ 802.945168]
> >> [ 802.945168] other info that might help us debug this:
> >> [ 802.945168]
> >> [ 802.945170]
> >> [ 802.945170] rcu_scheduler_active = 1, debug_locks = 1
> >> [ 802.945173] 3 locks held by ping6/3813:
> >> [ 802.945174] #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff81880f72>] rawv6_sendmsg+0x512/0xbb0
> >> [ 802.945197] #1: (rcu_read_lock_bh){......}, at: [<ffffffff8185e577>] ip6_finish_output2+0x57/0x790
> >> [ 802.945205] #2: (rcu_read_lock_bh){......}, at: [<ffffffff8177a68b>] __dev_queue_xmit+0x4b/0x930
> >> [ 802.945218]
> >> [ 802.945218] stack backtrace:
> >> [ 802.945221] CPU: 2 PID: 3813 Comm: ping6 Not tainted 4.1.0-rc3+ #71
> >> [ 802.945222] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
> >> [ 802.945224] 0000000000000001 ffff8800db7b7888 ffffffff819de6f8 0000000000000007
> >> [ 802.945226] ffff8800db738000 ffff8800db7b78b8 ffffffff810a7f92 ffff880214fc8c00
> >> [ 802.945229] ffff88021595b000 ffff88021595a000 ffff880214adf800 ffff8800db7b7968
> >> [ 802.945232] Call Trace:
> >> [ 802.945248] [<ffffffff819de6f8>] dump_stack+0x4c/0x65
> >> [ 802.945253] [<ffffffff810a7f92>] lockdep_rcu_suspicious+0xe2/0x130
> >> [ 802.945265] [<ffffffff815e24ac>] ipvlan_queue_xmit+0x17c/0x5a0
> >> [ 802.945268] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
> >> [ 802.945271] [<ffffffff815e300c>] ipvlan_start_xmit+0x1c/0x50
> >> [ 802.945272] [<ffffffff8177a117>] dev_hard_start_xmit+0x2f7/0x820
> >> [ 802.945274] [<ffffffff817797b6>] ? netif_skb_features+0xf6/0x1d0
> >> [ 802.945276] [<ffffffff81779b84>] ? validate_xmit_skb.isra.99.part.100+0x24/0x2c0
> >> [ 802.945278] [<ffffffff8177ad04>] __dev_queue_xmit+0x6c4/0x930
> >> [ 802.945280] [<ffffffff8177a68b>] ? __dev_queue_xmit+0x4b/0x930
> >> [ 802.945281] [<ffffffff810a963a>] ? mark_held_locks+0x6a/0x90
> >> [ 802.945283] [<ffffffff8177af8e>] dev_queue_xmit_sk+0xe/0x10
> >> [ 802.945285] [<ffffffff8185e824>] ip6_finish_output2+0x304/0x790
> >> [ 802.945287] [<ffffffff81860dde>] ? ip6_finish_output+0x9e/0x1e0
> >> [ 802.945288] [<ffffffff81860dde>] ip6_finish_output+0x9e/0x1e0
> >> [ 802.945290] [<ffffffff81860fdb>] ip6_output+0xbb/0x180
> >> [ 802.945302] [<ffffffff818a7d0a>] ? ip6_find_1stfragopt+0x9a/0xa0
> >> [ 802.945304] [<ffffffff81860d40>] ? ip6_fragment+0xe80/0xe80
> >> [ 802.945306] [<ffffffff818a805c>] ip6_local_out_sk+0x2c/0x70
> >> [ 802.945308] [<ffffffff818a80b0>] ip6_local_out+0x10/0x20
> >> [ 802.945309] [<ffffffff81861571>] ip6_send_skb+0x31/0xd0
> >> [ 802.945311] [<ffffffff81861644>] ip6_push_pending_frames+0x34/0x40
> >> [ 802.945313] [<ffffffff81881368>] rawv6_sendmsg+0x908/0xbb0
> >> [ 802.945328] [<ffffffff810aa5a4>] ? __lock_is_held+0x54/0x70
> >> [ 802.945340] [<ffffffff81817e9e>] inet_sendmsg+0x10e/0x1f0
> >> [ 802.945343] [<ffffffff81817d90>] ? inet_recvmsg+0x200/0x200
> >> [ 802.945351] [<ffffffff81758275>] sock_sendmsg+0x45/0x50
> >> [ 802.945354] [<ffffffff8175a719>] SYSC_sendto+0xd9/0x110
> >> [ 802.945357] [<ffffffff8175ac99>] SyS_sendto+0x9/0x10
> >> [ 802.945362] [<ffffffff819ebfae>] system_call_fastpath+0x12/0x76
> >> ---
> >> drivers/net/ipvlan/ipvlan_main.c | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> >> index 77b92a0fe557..0cafd3e6f02d 100644
> >> --- a/drivers/net/ipvlan/ipvlan_main.c
> >> +++ b/drivers/net/ipvlan/ipvlan_main.c
> >> @@ -173,6 +173,7 @@ static int ipvlan_stop(struct net_device *dev)
> >> return 0;
> >> }
> >>
> >> +/* called with rcu_read_lock_bh. */
> >> static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
> >> struct net_device *dev)
> >> {
> >> @@ -180,6 +181,7 @@ static netdev_tx_t ipvlan_start_xmit(struct sk_buff *skb,
> >> int skblen = skb->len;
> >> int ret;
> >>
> >> + rcu_read_lock();
> >
> > I don't believe this is correct. The correct way would be the way it
> > is fixed in the following patch -
> > https://patchwork.ozlabs.org/patch/471481/
>
> I'm not sure, that might just plug the warning without fixing problem.
> The rest code uses call_rcu()/synchronize_rcu(). Probably relation
> between quiescent states of rcu and rcu_bh isn't that easy.
>
In theory both, rcu_read_lock and rcu_read_lock_bh must be held,
otherwise call_rcu would be allowed to execute callback if no
rcu_read_lock but only rcu_read_lock_bh is taken. *AFAIK* in normal
kernels, grace period for call_rcu is just longer, call_rcu_bh does
check transition from and to softirqs.
Bye,
Hannes
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses
2015-05-21 11:38 ` Konstantin Khlebnikov
@ 2015-05-21 12:09 ` Hannes Frederic Sowa
0 siblings, 0 replies; 12+ messages in thread
From: Hannes Frederic Sowa @ 2015-05-21 12:09 UTC (permalink / raw)
To: Konstantin Khlebnikov, Mahesh Bandewar
Cc: linux-netdev, David S. Miller, Jiri Benc
On Thu, May 21, 2015, at 13:38, Konstantin Khlebnikov wrote:
> On 20.05.2015 02:59, Mahesh Bandewar wrote:
> > On Thu, May 14, 2015 at 6:56 AM, Konstantin Khlebnikov
> > <khlebnikov@yandex-team.ru> wrote:
> >> All ipvlan ports use one MAC address, this way ipv6 RA tries to assign
> >> one ipv6 address to all of them. This patch assigns unique dev_id to each
> >> ipvlan port. This field is used instead of common FF-FE in Modified EUI-64.
> >>
> >> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> >> ---
> >> Documentation/networking/ipvlan.txt | 12 +++++++++++-
> >> drivers/net/ipvlan/ipvlan.h | 1 +
> >> drivers/net/ipvlan/ipvlan_main.c | 20 ++++++++++++++++++++
> >> 3 files changed, 32 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/networking/ipvlan.txt b/Documentation/networking/ipvlan.txt
> >> index cf996394e466..cb0b777bce58 100644
> >> --- a/Documentation/networking/ipvlan.txt
> >> +++ b/Documentation/networking/ipvlan.txt
> >> @@ -24,7 +24,7 @@ using IProute2/ip utility.
> >>
> >> ip link add link <master-dev> <slave-dev> type ipvlan mode { l2 | L3 }
> >>
> >> - e.g. ip link add link ipvl0 eth0 type ipvlan mode l2
> >> + e.g. ip link add link eth0 ipvl0 type ipvlan mode l2
> >>
> >>
> >> 4. Operating modes:
> >> @@ -41,6 +41,15 @@ slave device and packets are switched and queued to the master device to send
> >> out. In this mode the slaves will RX/TX multicast and broadcast (if applicable)
> >> as well.
> >>
> >> + In L2 mode slave devices receive Router Advertisements from the network
> >> +and perform autoconfiguration as well as master device. Each port has unique
> >> +16-bit device id which is used for filling octets 4-5 of Modified EUI-64.
> >> +That gives 65533 addresses (FF-FE used by master, FF-FF/00-00 reserved/not used).
> >> +
> > This is nice, thanks for fixing this! However how is "unique" id
> > guaranteed? Especially when multiple virtual drivers are stacked? Not
> > necessarily all of them may use the dev_id, but to avoid any possible
> > collision, shouldn't the device hierarchy (especially lower_dev) be
> > traversed before settling on the initial value?
>
> Well, uniqueness isn't guaranteed but that should work in most cases.
> ipv6 anyway checks for duplicate addresses after configuration.
>
> As I see creation of ipvlan on ipvlan just creates slave at original
> master device, so this will work as expected. And ipvlan cannot share
> physical device with bonding/bridge/macvlan, so I don't see how to
> stack more than one layer of ipvlans accidentally.
I hope that stable ipv6 privacy addresses will be used in future and old
eui-48 based LL addresses will just disappear. That said, I would be
even fine with a RNG generated dev_id, because we cannot really ensure
uniqueness.
Stable privacy addresses even use DAD retry counter to regenerate a new
LL address in case the address becomes IFA_F_DADFAILED.
Bye,
Hannes
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2015-05-21 12:09 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-14 13:56 [PATCH 0/3] ipvlan: fix ipv6 autoconfiguration via RA Konstantin Khlebnikov
2015-05-14 13:56 ` [PATCH 1/3] ipv6: make inet6addr_chain blocking and always call with rtnl locked Konstantin Khlebnikov
2015-05-16 21:22 ` David Miller
2015-05-18 10:13 ` Konstantin Khlebnikov
2015-05-14 13:56 ` [PATCH 2/3] ipvlan: grab rcu_read_lock on xmit path Konstantin Khlebnikov
2015-05-19 23:33 ` Mahesh Bandewar
2015-05-21 9:51 ` Konstantin Khlebnikov
2015-05-21 11:47 ` Hannes Frederic Sowa
2015-05-14 13:56 ` [PATCH 3/3] ipvlan: set dev_id for l2 ports to generate unique IPv6 addresses Konstantin Khlebnikov
2015-05-19 23:59 ` Mahesh Bandewar
2015-05-21 11:38 ` Konstantin Khlebnikov
2015-05-21 12:09 ` Hannes Frederic Sowa
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).