* [PATCH] [next-next-2.6] net: configurable device name hash
@ 2009-11-11 19:16 Octavian Purdila
2009-11-11 19:21 ` David Miller
0 siblings, 1 reply; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 19:16 UTC (permalink / raw)
To: netdev
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0addd45..8a129d5 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -29,8 +29,7 @@ struct net_generic;
struct sock;
-#define NETDEV_HASHBITS 8
-#define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
+#define NETDEV_HASHENTRIES (1 << CONFIG_NETDEV_HASHBITS)
struct net {
atomic_t count; /* To decided when the network
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..f5db7b2 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,13 @@ config COMPAT_NETLINK_MESSAGES
menu "Networking options"
+config NETDEV_HASHBITS
+ int "Network device hash size"
+ range 8 20
+ default 8
+ help
+ Select network device hash size as a power of 2.
+
source "net/packet/Kconfig"
source "net/unix/Kconfig"
source "net/xfrm/Kconfig"
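For readers following the patch: the bit count only matters because the bucket index is taken by masking a string hash with the table size minus one, which requires a power-of-two table. The sketch below is a user-space illustration under stated assumptions. `HASHBITS`, `HASHENTRIES`, `name_hash()` and `name_bucket()` are hypothetical names, and the FNV-1a hash is a stand-in, not the hash function the kernel uses.

```c
/* Stand-in for CONFIG_NETDEV_HASHBITS; 8 is the default in the patch. */
#define HASHBITS 8
#define HASHENTRIES (1u << HASHBITS)

/* Stand-in string hash (32-bit FNV-1a); NOT the kernel's hash function. */
static unsigned int name_hash(const char *name)
{
	unsigned int h = 2166136261u;

	while (*name) {
		h ^= (unsigned char)*name++;
		h *= 16777619u;
	}
	return h;
}

/* Map a device name to a bucket; the mask only works for power-of-two sizes. */
static unsigned int name_bucket(const char *name)
{
	return name_hash(name) & (HASHENTRIES - 1);
}
```

Raising `HASHBITS` by one doubles the number of buckets and widens the mask by one bit; nothing else in the lookup changes.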
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 19:16 [PATCH] [next-next-2.6] net: configurable device name hash Octavian Purdila
@ 2009-11-11 19:21 ` David Miller
2009-11-11 19:38 ` Octavian Purdila
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2009-11-11 19:21 UTC (permalink / raw)
To: opurdila; +Cc: netdev
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 21:16:14 +0200
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
We're not doing this sorry.
Dynamically size it at boot time or something, but a config
option is out of the question.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 19:21 ` David Miller
@ 2009-11-11 19:38 ` Octavian Purdila
2009-11-11 20:08 ` Eric Dumazet
2009-11-11 20:42 ` David Miller
0 siblings, 2 replies; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 19:38 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On Wednesday 11 November 2009 21:21:20 you wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 21:16:14 +0200
>
> > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
>
> We're not doing this sorry.
>
> Dynamically size it at boot time or something, but a config
> option is out of the question.
>
I don't think we can dynamically size it at boot time since it depends on the
usage pattern which is impossible to determine at boot time, right?
Would it be acceptable to grow it at runtime, in list_netdevice for instance?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 19:38 ` Octavian Purdila
@ 2009-11-11 20:08 ` Eric Dumazet
2009-11-11 20:32 ` Octavian Purdila
2009-11-11 20:42 ` David Miller
1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2009-11-11 20:08 UTC (permalink / raw)
To: Octavian Purdila; +Cc: David Miller, netdev
Octavian Purdila wrote:
> On Wednesday 11 November 2009 21:21:20 you wrote:
>> From: Octavian Purdila <opurdila@ixiacom.com>
>> Date: Wed, 11 Nov 2009 21:16:14 +0200
>>
>>> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
>> We're not doing this sorry.
>>
>> Dynamically size it at boot time or something, but a config
>> option is out of the question.
>>
>
> I don't think we can dynamically size it at boot time since it depends on the
> usage pattern which is impossible to determine at boot time, right?
>
> Would it be acceptable to grow it at runtime, in list_netdevice for instance?
It will be really hard, now we use RCU lookups...
What workload could reasonably need 1.000.000 hash slots, and 16.000.000 netdevices ?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 20:08 ` Eric Dumazet
@ 2009-11-11 20:32 ` Octavian Purdila
0 siblings, 0 replies; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 20:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
On Wednesday 11 November 2009 22:08:31 you wrote:
> Octavian Purdila wrote:
> > On Wednesday 11 November 2009 21:21:20 you wrote:
> >> From: Octavian Purdila <opurdila@ixiacom.com>
> >> Date: Wed, 11 Nov 2009 21:16:14 +0200
> >>
> >>> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> >>
> >> We're not doing this sorry.
> >>
> >> Dynamically size it at boot time or something, but a config
> >> option is out of the question.
> >
> > I don't think we can dynamically size it at boot time since it depends on
> > the usage pattern which is impossible to determine at boot time, right?
> >
> > Would it be acceptable to grow it at runtime, in list_netdevice for
> > instance?
>
> It will be really hard, now we use RCU lookups...
>
OK, I'd forgotten about that :)
> What workload could reasonably need 1.000.000 hash slots, and 16.000.000
> netdevices ?
>
And yes, I clearly got ahead of myself with those 20 bits.
Let's say we cap it at 14 for machines with over 1G of memory; would it be
acceptable to consume 64K out of that even if in most use cases we will
only have a handful of interfaces?
So, on second thought, perhaps it is better to leave this alone and have those
few users who need it change NETDEV_HASHBITS themselves - it's not like it's
too heavy a patch to carry around.
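The 64K figure can be checked with a short calculation. This is a sketch that assumes each bucket head is a single pointer and that pointers are 4 bytes wide (a 32-bit machine); with 8-byte pointers the same table would take 128K. `netdev_hash_bytes()` is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/*
 * Bytes used by one hash table of (1 << bits) bucket heads.
 * head_size is the size of one bucket head, assumed here to be
 * one pointer (4 bytes on 32-bit, 8 bytes on 64-bit).
 */
static size_t netdev_hash_bytes(unsigned int bits, size_t head_size)
{
	return ((size_t)1 << bits) * head_size;
}
```

For example, `netdev_hash_bytes(14, 4)` gives 65536 bytes, and `netdev_hash_bytes(8, 4)` gives 1024 bytes for the existing 8-bit default. The cost applies to each table that is sized this way.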
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 19:38 ` Octavian Purdila
2009-11-11 20:08 ` Eric Dumazet
@ 2009-11-11 20:42 ` David Miller
2009-11-11 21:33 ` Stephen Hemminger
1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2009-11-11 20:42 UTC (permalink / raw)
To: opurdila; +Cc: netdev
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 21:38:44 +0200
> I don't think we can dynamically size it at boot time since it
> depends on the usage pattern which is impossible to determine at
> boot time, right?
We have no idea how many sockets will be used by the system yet we
dynamically size the socket hash tables.
Please do some research and see how we handle this elsewhere in the
networking.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 20:42 ` David Miller
@ 2009-11-11 21:33 ` Stephen Hemminger
2009-11-11 21:47 ` Octavian Purdila
0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2009-11-11 21:33 UTC (permalink / raw)
To: David Miller; +Cc: opurdila, netdev
On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 21:38:44 +0200
>
> > I don't think we can dynamically size it at boot time since it
> > depends on the usage pattern which is impossible to determine at
> > boot time, right?
>
> We have no idea how many sockets will be used by the system yet we
> dynamically size the socket hash tables.
>
> Please do some research and see how we handle this elsewhere in the
> networking.
dcache also sizes hash bits at boot time on available memory.
See alloc_large_system_hash().
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 21:33 ` Stephen Hemminger
@ 2009-11-11 21:47 ` Octavian Purdila
2009-11-11 22:24 ` Stephen Hemminger
2009-11-12 2:36 ` David Miller
0 siblings, 2 replies; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 21:47 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, netdev
On Wednesday 11 November 2009 23:33:42 you wrote:
> On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
>
> David Miller <davem@davemloft.net> wrote:
> > From: Octavian Purdila <opurdila@ixiacom.com>
> > Date: Wed, 11 Nov 2009 21:38:44 +0200
> >
> > > I don't think we can dynamically size it at boot time since it
> > > depends on the usage pattern which is impossible to determine at
> > > boot time, right?
> >
> > We have no idea how many sockets will be used by the system yet we
> > dynamically size the socket hash tables.
> >
> > Please do some research and see how we handle this elsewhere in the
> > networking.
>
> dcache also sizes hash bits at boot time on available memory.
> See alloc_large_system_hash().
>
Thanks Stephen.
I was actually taking a look at that but I see that the device hash is
allocated per net namespace which means we can't use
alloc_large_system_hash().
We could use a similar function that will work in the per namespace
initialization context, but this might upset net namespace folks since we will
get a large hash for every namespace.
Not sure what can be done to address that problem now except using a boot
parameter to override the defaults. A better solution would be to be able to
use "namespace create" parameters but it appears we don't have this
possibility, yet.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 21:47 ` Octavian Purdila
@ 2009-11-11 22:24 ` Stephen Hemminger
2009-11-12 2:36 ` David Miller
1 sibling, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2009-11-11 22:24 UTC (permalink / raw)
To: Octavian Purdila; +Cc: David Miller, netdev
On Wed, 11 Nov 2009 23:47:41 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:
> On Wednesday 11 November 2009 23:33:42 you wrote:
> > On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
> >
> > David Miller <davem@davemloft.net> wrote:
> > > From: Octavian Purdila <opurdila@ixiacom.com>
> > > Date: Wed, 11 Nov 2009 21:38:44 +0200
> > >
> > > > I don't think we can dynamically size it at boot time since it
> > > > depends on the usage pattern which is impossible to determine at
> > > > boot time, right?
> > >
> > > We have no idea how many sockets will be used by the system yet we
> > > dynamically size the socket hash tables.
> > >
> > > Please do some research and see how we handle this elsewhere in the
> > > networking.
> >
> > dcache also sizes hash bits at boot time on available memory.
> > See alloc_large_system_hash().
> >
>
> Thanks Stephen.
>
> I was actually taking a look at that but I see that the device hash is
> allocated per net namespace which means we can't use
> alloc_large_system_hash().
>
> We could use a similar function that will work in the per namespace
> initialization context, but this might upset net namespace folks since we will
> get a large hash for every namespace.
>
> Not sure what can be done to address that problem now except using a boot
> parameter to override the defaults. A better solution would be to be able to
> use "namespace create" parameters but it appears we don't have this
> possibility, yet.
>
Remember though that larger hash sizes really don't buy that much more speed.
Going from 256 to 1024 gives a 4x benefit but with 10,000 devices that
just means scanning 10 vs. 40 names. It is not like the file system cache
where name lookup is a major component of overhead.
You can still use alloc_large_system_hash, but just constrain it to a maximum
of order 10 or something.
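The "10 vs. 40 names" estimate is the average chain length per bucket. A minimal sketch of that arithmetic, assuming the hash spreads names evenly across buckets; `avg_chain_len()` is a hypothetical helper used only for illustration.

```c
/*
 * Average number of entries per bucket, rounded up, for a table of
 * (1 << bits) buckets, assuming a perfectly uniform hash.
 */
static unsigned long avg_chain_len(unsigned long devices, unsigned int bits)
{
	unsigned long buckets = 1UL << bits;

	return (devices + buckets - 1) / buckets;
}
```

With 10,000 devices, `avg_chain_len(10000, 8)` is 40 and `avg_chain_len(10000, 10)` is 10, which matches the numbers quoted above.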
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-11 21:47 ` Octavian Purdila
2009-11-11 22:24 ` Stephen Hemminger
@ 2009-11-12 2:36 ` David Miller
2009-11-12 12:46 ` Mark Smith
1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2009-11-12 2:36 UTC (permalink / raw)
To: opurdila; +Cc: shemminger, netdev
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 23:47:41 +0200
> We could use a similar function that will work in the per namespace
> initialization context, but this might upset net namespace folks
> since we will get a large hash for every namespace.
Use kzalloc(), that's sufficient for a 64K or so hash table which is
way more than you ever will need.
Use the GFP_* flags that will silently (ie. without a log message)
fail, and divide by two until you successfully allocate the table if
you're worried about memory fragmentation at allocation time.
This is so straightforward, I can't believe we're talking so much
about how to implement this, it's a 15 minute hack :-)
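The halve-until-it-fits approach described above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: `struct bucket`, `zalloc_fn`, `alloc_hash_halving()` and `zalloc_capped()` are hypothetical names, and a `calloc()`-based hook stands in for `kzalloc()` called with flags that fail without logging.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bucket head: a single pointer per bucket. */
struct bucket {
	void *first;
};

/* Allocator hook returning zeroed memory or NULL; stands in for kzalloc(). */
typedef void *(*zalloc_fn)(size_t bytes);

/*
 * Try to allocate (1 << *bits) zeroed buckets. On failure, halve the
 * table (decrement *bits) and retry, giving up once min_bits has also
 * failed. On success *bits holds the size that was actually obtained.
 */
static struct bucket *alloc_hash_halving(unsigned int *bits,
					 unsigned int min_bits,
					 zalloc_fn zalloc)
{
	for (;;) {
		struct bucket *tbl = zalloc(((size_t)1 << *bits) * sizeof(*tbl));

		if (tbl || *bits <= min_bits)
			return tbl;
		(*bits)--;
	}
}

/* Demo allocator: refuses anything above 1024 buckets to mimic fragmentation. */
static void *zalloc_capped(size_t bytes)
{
	if (bytes > 1024 * sizeof(struct bucket))
		return NULL;
	return calloc(1, bytes);
}
```

Starting from 14 bits with `zalloc_capped`, the loop fails four times and settles on a 10-bit table. Because the final size is only known after allocation, lookups would have to use a stored mask rather than a compile-time constant.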
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-12 2:36 ` David Miller
@ 2009-11-12 12:46 ` Mark Smith
2009-11-12 14:09 ` Eric Dumazet
0 siblings, 1 reply; 12+ messages in thread
From: Mark Smith @ 2009-11-12 12:46 UTC (permalink / raw)
To: David Miller; +Cc: opurdila, shemminger, netdev
On Wed, 11 Nov 2009 18:36:26 -0800 (PST)
David Miller <davem@davemloft.net> wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 23:47:41 +0200
>
> > We could use a similar function that will work in the per namespace
> > initialization context, but this might upset net namespace folks
> > since we will get a large hash for every namespace.
>
> Use kzalloc(), that's sufficient for a 64K or so hash table which is
> way more than you ever will need.
>
> Use the GFP_* flags that will silently (ie. without a log message)
> fail, and divide by two until you successfully allocate the table if
> you're worried about memory fragmentation at allocation time.
>
> This is so straightforward, I can't believe we're talking so much
> about how to implement this, it's a 15 minute hack :-)
Yes, but sadly, sometimes there is too much history(!) to be able to be
fully aware of it. "suck-it-and-see" type patches are possibly a
quicker way to find out what people are thinking right now!
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] [next-next-2.6] net: configurable device name hash
2009-11-12 12:46 ` Mark Smith
@ 2009-11-12 14:09 ` Eric Dumazet
0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2009-11-12 14:09 UTC (permalink / raw)
To: Mark Smith; +Cc: David Miller, opurdila, shemminger, netdev
Mark Smith wrote:
> On Wed, 11 Nov 2009 18:36:26 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Octavian Purdila <opurdila@ixiacom.com>
>> Date: Wed, 11 Nov 2009 23:47:41 +0200
>>
>>> We could use a similar function that will work in the per namespace
>>> initialization context, but this might upset net namespace folks
>>> since we will get a large hash for every namespace.
>> Use kzalloc(), that's sufficient for a 64K or so hash table which is
>> way more than you ever will need.
>>
>> Use the GFP_* flags that will silently (ie. without a log message)
>> fail, and divide by two until you successfully allocate the table if
>> you're worried about memory fragmentation at allocation time.
>>
>> This is so straightforward, I can't believe we're talking so much
>> about how to implement this, it's a 15 minute hack :-)
>
> Yes, but sadly, sometimes there is too much history(!) to be able to be
> fully aware of it. "suck-it-and-see" type patches are possibly a
> quicker way to find out what people are thinking right now!
>
Before extending hash tables, we should make sure existing algorithms are going to
scale with millions of netdevices, and they don't scale that well for the moment.
We still have many for_each_netdev() loops...
It's easy to change a constant somewhere in an include file; it's less easy to make
real scalability changes :(
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2009-11-12 14:10 UTC | newest]
Thread overview: 12+ messages
2009-11-11 19:16 [PATCH] [next-next-2.6] net: configurable device name hash Octavian Purdila
2009-11-11 19:21 ` David Miller
2009-11-11 19:38 ` Octavian Purdila
2009-11-11 20:08 ` Eric Dumazet
2009-11-11 20:32 ` Octavian Purdila
2009-11-11 20:42 ` David Miller
2009-11-11 21:33 ` Stephen Hemminger
2009-11-11 21:47 ` Octavian Purdila
2009-11-11 22:24 ` Stephen Hemminger
2009-11-12 2:36 ` David Miller
2009-11-12 12:46 ` Mark Smith
2009-11-12 14:09 ` Eric Dumazet