* [PATCH net-next 0/5] Per-net and on-demand link indices (and related)
@ 2012-08-06 10:30 Pavel Emelyanov
2012-08-06 10:30 ` [PATCH 1/5] net: Don't use ifindices in hash fns Pavel Emelyanov
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 10:30 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Eric W. Biederman, Linux Netdev List
Hi!
This set tries to summarize the recent discussion of making ifindices friendly
to checkpoint-restore and consists of:
1. Prepare hash function to non-unique ifindices
2. Allow for specifying the desired ifindex on net link creation
3. Make ifindex generation per-net
4. Simplify loopback device ifindex access
Thanks,
Pavel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/5] net: Don't use ifindices in hash fns
2012-08-06 10:30 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) Pavel Emelyanov
@ 2012-08-06 10:30 ` Pavel Emelyanov
2012-08-06 10:43 ` Eric Dumazet
2012-08-06 10:31 ` [PATCH 2/5] net: Allow to create links with given ifindex Pavel Emelyanov
` (3 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 10:30 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Eric W. Biederman, Linux Netdev List
Eric noticed, that when there will be devices with equal indices, some
hash functions that use them will become less effective as they could.
Fix this in advance by taking the net_device address into calculations
instead of the device index. Since the net_device is always aligned in
memory, shift the pointer to eliminate always zero bits (like we do it
in net_hash_mix).
This is true for arp and ndisc hash fns. The netlabel, can and llc ones
are also ifindex-based, but that three are init_net-only, thus will not
be affected.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
include/linux/netdevice.h | 6 ++++++
include/net/arp.h | 2 +-
include/net/ndisc.h | 2 +-
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a9db4f3..6010b37 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1330,6 +1330,12 @@ struct net_device {
#define NETDEV_ALIGN 32
+static inline unsigned int netdev_hash_mix(const struct net_device *dev)
+{
+ return (unsigned int)(((unsigned long)dev) >>
+ max(L1_CACHE_BYTES, NETDEV_ALIGN));
+}
+
static inline
int netdev_get_prio_tc_map(const struct net_device *dev, u32 prio)
{
diff --git a/include/net/arp.h b/include/net/arp.h
index 7f7df93..0305a38 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -10,7 +10,7 @@ extern struct neigh_table arp_tbl;
static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd)
{
- u32 val = key ^ dev->ifindex;
+ u32 val = key ^ netdev_hash_mix(dev);
return val * hash_rnd;
}
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 96a3b5c..ae7c1fd 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -134,7 +134,7 @@ static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, _
{
const u32 *p32 = pkey;
- return (((p32[0] ^ dev->ifindex) * hash_rnd[0]) +
+ return (((p32[0] ^ netdev_hash_mix(dev)) * hash_rnd[0]) +
(p32[1] * hash_rnd[1]) +
(p32[2] * hash_rnd[2]) +
(p32[3] * hash_rnd[3]));
--
1.7.6.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/5] net: Allow to create links with given ifindex
2012-08-06 10:30 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) Pavel Emelyanov
2012-08-06 10:30 ` [PATCH 1/5] net: Don't use ifindices in hash fns Pavel Emelyanov
@ 2012-08-06 10:31 ` Pavel Emelyanov
2012-08-06 10:31 ` [PATCH 3/5] veth: Allow to create peer link " Pavel Emelyanov
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 10:31 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Eric W. Biederman, Linux Netdev List
Currently the RTM_NEWLINK results in -EOPNOTSUPP if the ifinfomsg->ifi_index
is not zero. I propose to allow requesting ifindices on link creation. This
is required by the checkpoint-restore to correctly restore a net namespace
(i.e. -- a container).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
net/core/dev.c | 7 ++++++-
net/core/rtnetlink.c | 12 +++++++-----
2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index f91abf8..3ca300d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5579,7 +5579,12 @@ int register_netdevice(struct net_device *dev)
}
}
- dev->ifindex = dev_new_index(net);
+ ret = -EBUSY;
+ if (!dev->ifindex)
+ dev->ifindex = dev_new_index(net);
+ else if (__dev_get_by_index(net, dev->ifindex))
+ goto err_uninit;
+
if (dev->iflink == -1)
dev->iflink = dev->ifindex;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2c5a0a0..1aa1456 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1812,8 +1812,6 @@ replay:
return -ENODEV;
}
- if (ifm->ifi_index)
- return -EOPNOTSUPP;
if (tb[IFLA_MAP] || tb[IFLA_MASTER] || tb[IFLA_PROTINFO])
return -EOPNOTSUPP;
@@ -1839,10 +1837,14 @@ replay:
return PTR_ERR(dest_net);
dev = rtnl_create_link(net, dest_net, ifname, ops, tb);
-
- if (IS_ERR(dev))
+ if (IS_ERR(dev)) {
err = PTR_ERR(dev);
- else if (ops->newlink)
+ goto out;
+ }
+
+ dev->ifindex = ifm->ifi_index;
+
+ if (ops->newlink)
err = ops->newlink(net, dev, tb, data);
else
err = register_netdevice(dev);
--
1.7.6.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/5] veth: Allow to create peer link with given ifindex
2012-08-06 10:30 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) Pavel Emelyanov
2012-08-06 10:30 ` [PATCH 1/5] net: Don't use ifindices in hash fns Pavel Emelyanov
2012-08-06 10:31 ` [PATCH 2/5] net: Allow to create links with given ifindex Pavel Emelyanov
@ 2012-08-06 10:31 ` Pavel Emelyanov
2012-08-06 10:31 ` [PATCH 4/5] net: Make ifindex generation per-net namespace Pavel Emelyanov
2012-08-06 10:32 ` [PATCH 5/5] net: Loopback ifindex is constant now Pavel Emelyanov
4 siblings, 0 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 10:31 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Eric W. Biederman, Linux Netdev List
The ifinfomsg is in there (thanks kaber@ for foreseeing this long time ago),
so take the given ifidex and register netdev with it.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
drivers/net/veth.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 5852361..496c026 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -348,6 +348,9 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
if (tbp[IFLA_ADDRESS] == NULL)
eth_hw_addr_random(peer);
+ if (ifmp)
+ peer->ifindex = ifmp->ifi_index;
+
err = register_netdevice(peer);
put_net(net);
net = NULL;
--
1.7.6.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/5] net: Make ifindex generation per-net namespace
2012-08-06 10:30 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) Pavel Emelyanov
` (2 preceding siblings ...)
2012-08-06 10:31 ` [PATCH 3/5] veth: Allow to create peer link " Pavel Emelyanov
@ 2012-08-06 10:31 ` Pavel Emelyanov
2012-08-06 10:32 ` [PATCH 5/5] net: Loopback ifindex is constant now Pavel Emelyanov
4 siblings, 0 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 10:31 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Eric W. Biederman, Linux Netdev List
Strictly speaking this is only _really_ required for checkpoint-restore to
make loopback device always have the same index.
This change appears to be safe wrt "ifindex should be unique per-system"
concept, as all the ifindex usage is either already made per net namespace
of is explicitly limited with init_net only.
There are two cool side effects of this. The first one -- ifindices of
devices in container are always small, regardless of how many containers
we've started (and re-started) so far. The second one is -- we can speed
up the loopback ifidex access as shown in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
include/net/net_namespace.h | 1 +
net/core/dev.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index ae1cd6c..c5fbebf 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -62,6 +62,7 @@ struct net {
struct sock *rtnl; /* rtnetlink socket */
struct sock *genl_sock;
+ int ifindex;
struct list_head dev_base_head;
struct hlist_head *dev_name_head;
struct hlist_head *dev_index_head;
diff --git a/net/core/dev.c b/net/core/dev.c
index 3ca300d..1f06df8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5221,12 +5221,12 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
*/
static int dev_new_index(struct net *net)
{
- static int ifindex;
+ int ifindex = net->ifindex;
for (;;) {
if (++ifindex <= 0)
ifindex = 1;
if (!__dev_get_by_index(net, ifindex))
- return ifindex;
+ return net->ifindex = ifindex;
}
}
--
1.7.6.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 5/5] net: Loopback ifindex is constant now
2012-08-06 10:30 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) Pavel Emelyanov
` (3 preceding siblings ...)
2012-08-06 10:31 ` [PATCH 4/5] net: Make ifindex generation per-net namespace Pavel Emelyanov
@ 2012-08-06 10:32 ` Pavel Emelyanov
2012-08-06 11:54 ` Eric Dumazet
4 siblings, 1 reply; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 10:32 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Eric W. Biederman, Linux Netdev List
As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
drivers/net/loopback.c | 1 +
include/net/net_namespace.h | 6 ++++++
net/decnet/dn_route.c | 6 +++---
net/ipv4/fib_frontend.c | 2 +-
net/ipv4/ipmr.c | 2 +-
net/ipv4/netfilter/ipt_rpfilter.c | 2 +-
net/ipv4/route.c | 6 +++---
net/ipv6/route.c | 2 +-
8 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index e2a06fd..4a075ba 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -197,6 +197,7 @@ static __net_init int loopback_net_init(struct net *net)
if (err)
goto out_free_netdev;
+ BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
net->loopback_dev = dev;
return 0;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c5fbebf..841581b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -105,6 +105,12 @@ struct net {
struct sock *diag_nlsk;
};
+/*
+ * ifindex generation is per-net namespace, thus any loopback
+ * device should get ifindex 1
+ */
+
+#define LOOPBACK_IFINDEX 1
#include <linux/seq_file_net.h>
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 85a3604..c855e8d 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -961,7 +961,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, const struct flowidn *o
.saddr = oldflp->saddr,
.flowidn_scope = RT_SCOPE_UNIVERSE,
.flowidn_mark = oldflp->flowidn_mark,
- .flowidn_iif = init_net.loopback_dev->ifindex,
+ .flowidn_iif = LOOPBACK_IFINDEX,
.flowidn_oif = oldflp->flowidn_oif,
};
struct dn_route *rt = NULL;
@@ -979,7 +979,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, const struct flowidn *o
"dn_route_output_slow: dst=%04x src=%04x mark=%d"
" iif=%d oif=%d\n", le16_to_cpu(oldflp->daddr),
le16_to_cpu(oldflp->saddr),
- oldflp->flowidn_mark, init_net.loopback_dev->ifindex,
+ oldflp->flowidn_mark, LOOPBACK_IFINDEX,
oldflp->flowidn_oif);
/* If we have an output interface, verify its a DECnet device */
@@ -1042,7 +1042,7 @@ source_ok:
if (!fld.daddr)
goto out;
}
- fld.flowidn_oif = init_net.loopback_dev->ifindex;
+ fld.flowidn_oif = LOOPBACK_IFINDEX;
res.type = RTN_LOCAL;
goto make_route;
}
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index c43ae3f..7f073a3 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -218,7 +218,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb)
scope = RT_SCOPE_UNIVERSE;
if (!ipv4_is_zeronet(ip_hdr(skb)->saddr)) {
fl4.flowi4_oif = 0;
- fl4.flowi4_iif = net->loopback_dev->ifindex;
+ fl4.flowi4_iif = LOOPBACK_IFINDEX;
fl4.daddr = ip_hdr(skb)->saddr;
fl4.saddr = 0;
fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8eec8f4..3a57570 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1798,7 +1798,7 @@ static struct mr_table *ipmr_rt_fib_lookup(struct net *net, struct sk_buff *skb)
.flowi4_oif = (rt_is_output_route(rt) ?
skb->dev->ifindex : 0),
.flowi4_iif = (rt_is_output_route(rt) ?
- net->loopback_dev->ifindex :
+ LOOPBACK_IFINDEX :
skb->dev->ifindex),
.flowi4_mark = skb->mark,
};
diff --git a/net/ipv4/netfilter/ipt_rpfilter.c b/net/ipv4/netfilter/ipt_rpfilter.c
index 31371be..24eb18c 100644
--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -85,7 +85,7 @@ static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
return ipv4_is_local_multicast(iph->daddr) ^ invert;
flow.flowi4_iif = 0;
} else {
- flow.flowi4_iif = dev_net(par->in)->loopback_dev->ifindex;
+ flow.flowi4_iif = LOOPBACK_IFINDEX;
}
flow.daddr = iph->saddr;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 21ad369..c581373 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1619,7 +1619,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
if (res.type == RTN_LOCAL) {
err = fib_validate_source(skb, saddr, daddr, tos,
- net->loopback_dev->ifindex,
+ LOOPBACK_IFINDEX,
dev, in_dev, &itag);
if (err < 0)
goto martian_source_keep_err;
@@ -1895,7 +1895,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
orig_oif = fl4->flowi4_oif;
- fl4->flowi4_iif = net->loopback_dev->ifindex;
+ fl4->flowi4_iif = LOOPBACK_IFINDEX;
fl4->flowi4_tos = tos & IPTOS_RT_MASK;
fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
@@ -1984,7 +1984,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
if (!fl4->daddr)
fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK);
dev_out = net->loopback_dev;
- fl4->flowi4_oif = net->loopback_dev->ifindex;
+ fl4->flowi4_oif = LOOPBACK_IFINDEX;
res.type = RTN_LOCAL;
flags |= RTCF_LOCAL;
goto make_route;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8e80fd2..0ddf2d1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -965,7 +965,7 @@ struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,
{
int flags = 0;
- fl6->flowi6_iif = net->loopback_dev->ifindex;
+ fl6->flowi6_iif = LOOPBACK_IFINDEX;
if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr))
flags |= RT6_LOOKUP_F_IFACE;
--
1.7.6.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/5] net: Don't use ifindices in hash fns
2012-08-06 10:30 ` [PATCH 1/5] net: Don't use ifindices in hash fns Pavel Emelyanov
@ 2012-08-06 10:43 ` Eric Dumazet
2012-08-06 11:01 ` Pavel Emelyanov
0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-08-06 10:43 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: David Miller, Eric W. Biederman, Linux Netdev List
On Mon, 2012-08-06 at 14:30 +0400, Pavel Emelyanov wrote:
> Eric noticed, that when there will be devices with equal indices, some
> hash functions that use them will become less effective as they could.
>
> Fix this in advance by taking the net_device address into calculations
> instead of the device index. Since the net_device is always aligned in
> memory, shift the pointer to eliminate always zero bits (like we do it
> in net_hash_mix).
>
> This is true for arp and ndisc hash fns. The netlabel, can and llc ones
> are also ifindex-based, but that three are init_net-only, thus will not
> be affected.
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
> ---
> include/linux/netdevice.h | 6 ++++++
> include/net/arp.h | 2 +-
> include/net/ndisc.h | 2 +-
> 3 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a9db4f3..6010b37 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1330,6 +1330,12 @@ struct net_device {
>
> #define NETDEV_ALIGN 32
>
> +static inline unsigned int netdev_hash_mix(const struct net_device *dev)
> +{
> + return (unsigned int)(((unsigned long)dev) >>
> + max(L1_CACHE_BYTES, NETDEV_ALIGN));
> +}
> +
I guess you didnt test this patch very well ...
This returns 0 as is
I would define a generic pointer hash mix instead of a 'net_device
thing'
static inline u32 ptr_hash_mix(void *ptr)
{
#if BITS_PER_LONG==32
return (u32)(unsigned long)ptr;
#else
return (u32)((unsigned long)ptr >> L1_CACHE_SHIFT);
#endif
}
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/5] net: Don't use ifindices in hash fns
2012-08-06 10:43 ` Eric Dumazet
@ 2012-08-06 11:01 ` Pavel Emelyanov
2012-08-06 11:51 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 11:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Eric W. Biederman, Linux Netdev List
On 08/06/2012 02:43 PM, Eric Dumazet wrote:
> On Mon, 2012-08-06 at 14:30 +0400, Pavel Emelyanov wrote:
>> Eric noticed, that when there will be devices with equal indices, some
>> hash functions that use them will become less effective as they could.
>>
>> Fix this in advance by taking the net_device address into calculations
>> instead of the device index. Since the net_device is always aligned in
>> memory, shift the pointer to eliminate always zero bits (like we do it
>> in net_hash_mix).
>>
>> This is true for arp and ndisc hash fns. The netlabel, can and llc ones
>> are also ifindex-based, but that three are init_net-only, thus will not
>> be affected.
>>
>> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>>
>> ---
>> include/linux/netdevice.h | 6 ++++++
>> include/net/arp.h | 2 +-
>> include/net/ndisc.h | 2 +-
>> 3 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index a9db4f3..6010b37 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1330,6 +1330,12 @@ struct net_device {
>>
>> #define NETDEV_ALIGN 32
>>
>> +static inline unsigned int netdev_hash_mix(const struct net_device *dev)
>> +{
>> + return (unsigned int)(((unsigned long)dev) >>
>> + max(L1_CACHE_BYTES, NETDEV_ALIGN));
>> +}
>> +
>
> I guess you didnt test this patch very well ...
Damn :( You're right.
> This returns 0 as is
Well, on 64-bit no, but what it does is also not what it was supposed to.
> I would define a generic pointer hash mix instead of a 'net_device
> thing'
>
> static inline u32 ptr_hash_mix(void *ptr)
> {
> #if BITS_PER_LONG==32
> return (u32)(unsigned long)ptr;
> #else
> return (u32)((unsigned long)ptr >> L1_CACHE_SHIFT);
> #endif
> }
OK. It will also obsolete the net_hash_mix then. Any suggestions where to put
the new one?
Thanks,
Pavel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/5] net: Don't use ifindices in hash fns
2012-08-06 11:01 ` Pavel Emelyanov
@ 2012-08-06 11:51 ` Eric Dumazet
0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2012-08-06 11:51 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: David Miller, Eric W. Biederman, Linux Netdev List
On Mon, 2012-08-06 at 15:01 +0400, Pavel Emelyanov wrote:
> Damn :( You're right.
>
> > This returns 0 as is
>
> Well, on 64-bit no, but what it does is also not what it was supposed to.
>
Here, my L1_CACHE_BYTES is 64, but whatever
> > I would define a generic pointer hash mix instead of a 'net_device
> > thing'
> >
> > static inline u32 ptr_hash_mix(void *ptr)
> > {
> > #if BITS_PER_LONG==32
> > return (u32)(unsigned long)ptr;
> > #else
> > return (u32)((unsigned long)ptr >> L1_CACHE_SHIFT);
> > #endif
> > }
>
> OK. It will also obsolete the net_hash_mix then. Any suggestions where to put
> the new one?
Not obsolete net_hash_mix(), since this one can return 0 if
CONFIG_NET_NS is not set.
static inline unsigned int net_hash_mix(struct net *net)
{
#ifdef CONFIG_NET_NS
return ptr_hash_mix(net);
#else
return 0;
#endif
}
I dont know, you could add this to include/linux/hash.h
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 5/5] net: Loopback ifindex is constant now
2012-08-06 10:32 ` [PATCH 5/5] net: Loopback ifindex is constant now Pavel Emelyanov
@ 2012-08-06 11:54 ` Eric Dumazet
2012-08-06 12:13 ` Pavel Emelyanov
0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2012-08-06 11:54 UTC (permalink / raw)
To: Pavel Emelyanov; +Cc: David Miller, Eric W. Biederman, Linux Netdev List
On Mon, 2012-08-06 at 14:32 +0400, Pavel Emelyanov wrote:
> As pointed out, there are places, that access net->loopback_dev->ifindex
> and after ifindex generation is made per-net this value becomes constant
> equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
> it where appropriate.
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
What guarantee do we have that loopback is the first device per net ?
You should add this to the changelog because its not that obvious.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 5/5] net: Loopback ifindex is constant now
2012-08-06 11:54 ` Eric Dumazet
@ 2012-08-06 12:13 ` Pavel Emelyanov
0 siblings, 0 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-06 12:13 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Eric W. Biederman, Linux Netdev List
On 08/06/2012 03:54 PM, Eric Dumazet wrote:
> On Mon, 2012-08-06 at 14:32 +0400, Pavel Emelyanov wrote:
>> As pointed out, there are places, that access net->loopback_dev->ifindex
>> and after ifindex generation is made per-net this value becomes constant
>> equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
>> it where appropriate.
>>
>> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
> What guarantee do we have that loopback is the first device per net ?
In net_dev_init():
/* The loopback device is special if any other network devices
* is present in a network namespace the loopback device must
* be present. Since we now dynamically allocate and free the
* loopback device ensure this invariant is maintained by
* keeping the loopback device as the first device on the
* list of network devices. Ensuring the loopback devices
* is the first device that appears and the last network device
* that disappears.
*/
if (register_pernet_device(&loopback_net_ops))
goto out;
> You should add this to the changelog because its not that obvious.
OK.
Thanks,
Pavel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 5/5] net: Loopback ifindex is constant now
2012-08-09 7:52 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) v3 Pavel Emelyanov
@ 2012-08-09 7:53 ` Pavel Emelyanov
0 siblings, 0 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-08-09 7:53 UTC (permalink / raw)
To: David Miller, Eric Dumazet, Eric W. Biederman, Linux Netdev List,
Ben Hutchings
As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
drivers/net/loopback.c | 1 +
include/net/net_namespace.h | 7 +++++++
net/decnet/dn_route.c | 6 +++---
net/ipv4/fib_frontend.c | 2 +-
net/ipv4/ipmr.c | 2 +-
net/ipv4/netfilter/ipt_rpfilter.c | 2 +-
net/ipv4/route.c | 6 +++---
net/ipv6/route.c | 2 +-
8 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index e2a06fd..4a075ba 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -197,6 +197,7 @@ static __net_init int loopback_net_init(struct net *net)
if (err)
goto out_free_netdev;
+ BUG_ON(dev->ifindex != LOOPBACK_IFINDEX);
net->loopback_dev = dev;
return 0;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 6dc3db3..97e4419 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -105,6 +105,13 @@ struct net {
struct sock *diag_nlsk;
};
+/*
+ * ifindex generation is per-net namespace, and loopback is
+ * always the 1st device in ns (see net_dev_init), thus any
+ * loopback device should get ifindex 1
+ */
+
+#define LOOPBACK_IFINDEX 1
#include <linux/seq_file_net.h>
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 85a3604..c855e8d 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -961,7 +961,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, const struct flowidn *o
.saddr = oldflp->saddr,
.flowidn_scope = RT_SCOPE_UNIVERSE,
.flowidn_mark = oldflp->flowidn_mark,
- .flowidn_iif = init_net.loopback_dev->ifindex,
+ .flowidn_iif = LOOPBACK_IFINDEX,
.flowidn_oif = oldflp->flowidn_oif,
};
struct dn_route *rt = NULL;
@@ -979,7 +979,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, const struct flowidn *o
"dn_route_output_slow: dst=%04x src=%04x mark=%d"
" iif=%d oif=%d\n", le16_to_cpu(oldflp->daddr),
le16_to_cpu(oldflp->saddr),
- oldflp->flowidn_mark, init_net.loopback_dev->ifindex,
+ oldflp->flowidn_mark, LOOPBACK_IFINDEX,
oldflp->flowidn_oif);
/* If we have an output interface, verify its a DECnet device */
@@ -1042,7 +1042,7 @@ source_ok:
if (!fld.daddr)
goto out;
}
- fld.flowidn_oif = init_net.loopback_dev->ifindex;
+ fld.flowidn_oif = LOOPBACK_IFINDEX;
res.type = RTN_LOCAL;
goto make_route;
}
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index c43ae3f..7f073a3 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -218,7 +218,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb)
scope = RT_SCOPE_UNIVERSE;
if (!ipv4_is_zeronet(ip_hdr(skb)->saddr)) {
fl4.flowi4_oif = 0;
- fl4.flowi4_iif = net->loopback_dev->ifindex;
+ fl4.flowi4_iif = LOOPBACK_IFINDEX;
fl4.daddr = ip_hdr(skb)->saddr;
fl4.saddr = 0;
fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8eec8f4..3a57570 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1798,7 +1798,7 @@ static struct mr_table *ipmr_rt_fib_lookup(struct net *net, struct sk_buff *skb)
.flowi4_oif = (rt_is_output_route(rt) ?
skb->dev->ifindex : 0),
.flowi4_iif = (rt_is_output_route(rt) ?
- net->loopback_dev->ifindex :
+ LOOPBACK_IFINDEX :
skb->dev->ifindex),
.flowi4_mark = skb->mark,
};
diff --git a/net/ipv4/netfilter/ipt_rpfilter.c b/net/ipv4/netfilter/ipt_rpfilter.c
index 31371be..c301300 100644
--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -85,7 +85,7 @@ static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
return ipv4_is_local_multicast(iph->daddr) ^ invert;
flow.flowi4_iif = 0;
} else {
- flow.flowi4_iif = dev_net(par->in)->loopback_dev->ifindex;
+ flow.flowi4_iif = LOOPBACK_IFINDEX;
}
flow.daddr = iph->saddr;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 21ad369..c581373 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1619,7 +1619,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
if (res.type == RTN_LOCAL) {
err = fib_validate_source(skb, saddr, daddr, tos,
- net->loopback_dev->ifindex,
+ LOOPBACK_IFINDEX,
dev, in_dev, &itag);
if (err < 0)
goto martian_source_keep_err;
@@ -1895,7 +1895,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
orig_oif = fl4->flowi4_oif;
- fl4->flowi4_iif = net->loopback_dev->ifindex;
+ fl4->flowi4_iif = LOOPBACK_IFINDEX;
fl4->flowi4_tos = tos & IPTOS_RT_MASK;
fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
@@ -1984,7 +1984,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
if (!fl4->daddr)
fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK);
dev_out = net->loopback_dev;
- fl4->flowi4_oif = net->loopback_dev->ifindex;
+ fl4->flowi4_oif = LOOPBACK_IFINDEX;
res.type = RTN_LOCAL;
flags |= RTCF_LOCAL;
goto make_route;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8e80fd2..0ddf2d1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -965,7 +965,7 @@ struct dst_entry * ip6_route_output(struct net *net, const struct sock *sk,
{
int flags = 0;
- fl6->flowi6_iif = net->loopback_dev->ifindex;
+ fl6->flowi6_iif = LOOPBACK_IFINDEX;
if ((sk && sk->sk_bound_dev_if) || rt6_need_strict(&fl6->daddr))
flags |= RT6_LOOKUP_F_IFACE;
--
1.7.6.5
^ permalink raw reply related [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-08-09 7:53 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-06 10:30 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) Pavel Emelyanov
2012-08-06 10:30 ` [PATCH 1/5] net: Don't use ifindices in hash fns Pavel Emelyanov
2012-08-06 10:43 ` Eric Dumazet
2012-08-06 11:01 ` Pavel Emelyanov
2012-08-06 11:51 ` Eric Dumazet
2012-08-06 10:31 ` [PATCH 2/5] net: Allow to create links with given ifindex Pavel Emelyanov
2012-08-06 10:31 ` [PATCH 3/5] veth: Allow to create peer link " Pavel Emelyanov
2012-08-06 10:31 ` [PATCH 4/5] net: Make ifindex generation per-net namespace Pavel Emelyanov
2012-08-06 10:32 ` [PATCH 5/5] net: Loopback ifindex is constant now Pavel Emelyanov
2012-08-06 11:54 ` Eric Dumazet
2012-08-06 12:13 ` Pavel Emelyanov
-- strict thread matches above, loose matches on Subject: below --
2012-08-09 7:52 [PATCH net-next 0/5] Per-net and on-demand link indices (and related) v3 Pavel Emelyanov
2012-08-09 7:53 ` [PATCH 5/5] net: Loopback ifindex is constant now Pavel Emelyanov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox