netdev.vger.kernel.org archive mirror
* [PATCH] [next-next-2.6] net: configurable device name hash
@ 2009-11-11 19:16 Octavian Purdila
  2009-11-11 19:21 ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 19:16 UTC (permalink / raw)
  To: netdev

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0addd45..8a129d5 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -29,8 +29,7 @@ struct net_generic;
 struct sock;
 
 
-#define NETDEV_HASHBITS    8
-#define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
+#define NETDEV_HASHENTRIES (1 << CONFIG_NETDEV_HASHBITS)
 
 struct net {
 	atomic_t		count;		/* To decided when the network
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..f5db7b2 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,13 @@ config COMPAT_NETLINK_MESSAGES
 
 menu "Networking options"
 
+config NETDEV_HASHBITS
+	int "Network device hash size"
+	range 8 20
+	default 8
+	help
+	  Select network device hash size as a power of 2.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"



* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 19:16 [PATCH] [next-next-2.6] net: configurable device name hash Octavian Purdila
@ 2009-11-11 19:21 ` David Miller
  2009-11-11 19:38   ` Octavian Purdila
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2009-11-11 19:21 UTC (permalink / raw)
  To: opurdila; +Cc: netdev

From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 21:16:14 +0200

> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>

We're not doing this, sorry.

Dynamically size it at boot time or something, but a config
option is out of the question.


* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 19:21 ` David Miller
@ 2009-11-11 19:38   ` Octavian Purdila
  2009-11-11 20:08     ` Eric Dumazet
  2009-11-11 20:42     ` David Miller
  0 siblings, 2 replies; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 19:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Wednesday 11 November 2009 21:21:20 you wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 21:16:14 +0200
> 
> > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> 
> We're not doing this sorry.
> 
> Dynamically size it at boot time or something, but a config
> option is out of the question.
> 

I don't think we can dynamically size it at boot time since it depends on the 
usage pattern which is impossible to determine at boot time, right?

Would it be acceptable to grow it at runtime, in list_netdevice for instance?



* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 19:38   ` Octavian Purdila
@ 2009-11-11 20:08     ` Eric Dumazet
  2009-11-11 20:32       ` Octavian Purdila
  2009-11-11 20:42     ` David Miller
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2009-11-11 20:08 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: David Miller, netdev

Octavian Purdila wrote:
> On Wednesday 11 November 2009 21:21:20 you wrote:
>> From: Octavian Purdila <opurdila@ixiacom.com>
>> Date: Wed, 11 Nov 2009 21:16:14 +0200
>>
>>> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
>> We're not doing this sorry.
>>
>> Dynamically size it at boot time or something, but a config
>> option is out of the question.
>>
> 
> I don't think we can dynamically size it at boot time since it depends on the 
> usage pattern which is impossible to determine at boot time, right?
> 
> Would it be acceptable to grow it at runtime, in list_netdevice for instance?

It will be really hard now that we use RCU lookups...

What workload could reasonably need 1,000,000 hash slots and 16,000,000 netdevices?




* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 20:08     ` Eric Dumazet
@ 2009-11-11 20:32       ` Octavian Purdila
  0 siblings, 0 replies; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 20:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On Wednesday 11 November 2009 22:08:31 you wrote:
> Octavian Purdila wrote:
> > On Wednesday 11 November 2009 21:21:20 you wrote:
> >> From: Octavian Purdila <opurdila@ixiacom.com>
> >> Date: Wed, 11 Nov 2009 21:16:14 +0200
> >>
> >>> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> >>
> >> We're not doing this sorry.
> >>
> >> Dynamically size it at boot time or something, but a config
> >> option is out of the question.
> >
> > I don't think we can dynamically size it at boot time since it depends on
> > the usage pattern which is impossible to determine at boot time, right?
> >
> > Would it be acceptable to grow it at runtime, in list_netdevice for
> > instance?
> 
> It will be really hard, now we use RCU lookups...
> 

OK, I had forgotten about that :)

> What workload could reasonably need 1.000.000 hash slots, and 16.000.000
>  netdevices ?
> 

And yes, I clearly got ahead of myself with those 20 bits.

Let's say we cap it at 14 for machines with over 1G of memory. Would it be 
acceptable to consume 64K out of that even if in most use cases we will 
only have a handful of interfaces?

So, on second thought, perhaps it is better to leave this alone and have those 
few users who need it change NETDEV_HASHBITS themselves. It's not like it's 
too heavy a patch to carry around.
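
For reference, the memory cost being weighed here can be worked out as
below. This is only a rough sketch: it assumes each bucket is a single
pointer-sized list head, and hash_table_bytes() is a hypothetical helper
for illustration, not something in the kernel tree.

```c
#include <stddef.h>

/*
 * Bytes consumed by one hash table of (1 << bits) buckets.
 * bucket_size is the size of one bucket head in bytes
 * (assumed to be one pointer: 4 on 32-bit, 8 on 64-bit).
 */
static size_t hash_table_bytes(unsigned int bits, size_t bucket_size)
{
	return ((size_t)1 << bits) * bucket_size;
}
```

With 4-byte buckets, 14 bits comes to 65536 bytes, which matches the 64K
figure; with 8-byte buckets the same table would be 128K, and 20 bits
would be 8M.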


* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 19:38   ` Octavian Purdila
  2009-11-11 20:08     ` Eric Dumazet
@ 2009-11-11 20:42     ` David Miller
  2009-11-11 21:33       ` Stephen Hemminger
  1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2009-11-11 20:42 UTC (permalink / raw)
  To: opurdila; +Cc: netdev

From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 21:38:44 +0200

> I don't think we can dynamically size it at boot time since it
> depends on the usage pattern which is impossible to determine at
> boot time, right?

We have no idea how many sockets will be used by the system yet we
dynamically size the socket hash tables.

Please do some research and see how we handle this elsewhere in the
networking.


* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 20:42     ` David Miller
@ 2009-11-11 21:33       ` Stephen Hemminger
  2009-11-11 21:47         ` Octavian Purdila
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2009-11-11 21:33 UTC (permalink / raw)
  To: David Miller; +Cc: opurdila, netdev

On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 21:38:44 +0200
> 
> > I don't think we can dynamically size it at boot time since it
> > depends on the usage pattern which is impossible to determine at
> > boot time, right?
> 
> We have no idea how many sockets will be used by the system yet we
> dynamically size the socket hash tables.
> 
> Please do some research and see how we handle this elsewhere in the
> networking.

dcache also sizes hash bits at boot time on available memory.
See alloc_large_system_hash().


* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 21:33       ` Stephen Hemminger
@ 2009-11-11 21:47         ` Octavian Purdila
  2009-11-11 22:24           ` Stephen Hemminger
  2009-11-12  2:36           ` David Miller
  0 siblings, 2 replies; 12+ messages in thread
From: Octavian Purdila @ 2009-11-11 21:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Wednesday 11 November 2009 23:33:42 you wrote:
> On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
> 
> David Miller <davem@davemloft.net> wrote:
> > From: Octavian Purdila <opurdila@ixiacom.com>
> > Date: Wed, 11 Nov 2009 21:38:44 +0200
> >
> > > I don't think we can dynamically size it at boot time since it
> > > depends on the usage pattern which is impossible to determine at
> > > boot time, right?
> >
> > We have no idea how many sockets will be used by the system yet we
> > dynamically size the socket hash tables.
> >
> > Please do some research and see how we handle this elsewhere in the
> > networking.
> 
> dcache also sizes hash bits at boot time on available memory.
> See alloc_large_system_hash().
> 

Thanks Stephen.

I was actually taking a look at that but I see that the device hash is 
allocated per net namespace which means we can't use 
alloc_large_system_hash().

We could use a similar function that will work in the per namespace 
initialization context, but this might upset net namespace folks since we will 
get a large hash for every namespace.

Not sure what can be done to address that problem now except using a boot 
parameter to override the defaults. A better solution would be to be able to 
use "namespace create" parameters but it appears we don't have this 
possibility, yet.



* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 21:47         ` Octavian Purdila
@ 2009-11-11 22:24           ` Stephen Hemminger
  2009-11-12  2:36           ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2009-11-11 22:24 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: David Miller, netdev

On Wed, 11 Nov 2009 23:47:41 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> On Wednesday 11 November 2009 23:33:42 you wrote:
> > On Wed, 11 Nov 2009 12:42:35 -0800 (PST)
> > 
> > David Miller <davem@davemloft.net> wrote:
> > > From: Octavian Purdila <opurdila@ixiacom.com>
> > > Date: Wed, 11 Nov 2009 21:38:44 +0200
> > >
> > > > I don't think we can dynamically size it at boot time since it
> > > > depends on the usage pattern which is impossible to determine at
> > > > boot time, right?
> > >
> > > We have no idea how many sockets will be used by the system yet we
> > > dynamically size the socket hash tables.
> > >
> > > Please do some research and see how we handle this elsewhere in the
> > > networking.
> > 
> > dcache also sizes hash bits at boot time on available memory.
> > See alloc_large_system_hash().
> > 
> 
> Thanks Stephen.
> 
> I was actually taking a look at that but I see that the device hash is 
> allocated per net namespace which means we can't use 
> alloc_large_system_hash().
> 
> We could use a similar function that will work in the per namespace 
> initialization context, but this might upset net namespace folks since we will 
> get a large hash for every namespace.
> 
> Not sure what can be done to address that problem now except using a boot 
> parameter to override the defaults. A better solution would be to be able to 
> use "namespace create" parameters but it appears we don't have this 
> possibility, yet.
> 

Remember, though, that larger hash sizes really don't buy that much more speed.
Going from 256 to 1024 buckets gives a 4x benefit, but with 10,000 devices that
just means scanning about 10 names instead of 40. It is not like the file system
cache, where name lookup is a major component of overhead.
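
The arithmetic behind those numbers can be sketched as follows. It
assumes the hash spreads names perfectly evenly across buckets, which a
real hash will only approximate; avg_chain_len() is a hypothetical
helper, not a kernel function.

```c
/*
 * Average number of names scanned in one bucket, rounded up,
 * for ndevs devices spread evenly over (1 << bits) buckets.
 */
static unsigned long avg_chain_len(unsigned long ndevs, unsigned int bits)
{
	unsigned long buckets = 1UL << bits;

	return (ndevs + buckets - 1) / buckets;
}
```

For 10,000 devices this gives 40 with 8 bits (256 buckets) and 10 with
10 bits (1024 buckets).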

You can still use alloc_large_system_hash, but just constrain it to a maximum
of order 10 or something.
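
A memory-scaled size with an upper bound might look roughly like this.
It is a userspace sketch of the general idea only, not the algorithm
alloc_large_system_hash() actually uses; pick_hash_bits() and the
"one bucket per N bytes of RAM" ratio are assumptions made up for the
example.

```c
/*
 * Pick a hash size (in bits) from the amount of memory:
 * one bucket per bytes_per_bucket of RAM, rounded down to a
 * power of two, then clamped to [min_bits, max_bits].
 * bytes_per_bucket must be non-zero and max_bits below 64.
 */
static unsigned int pick_hash_bits(unsigned long long mem_bytes,
				   unsigned long long bytes_per_bucket,
				   unsigned int min_bits,
				   unsigned int max_bits)
{
	unsigned long long buckets = mem_bytes / bytes_per_bucket;
	unsigned int bits = 0;

	/* Grow while the next power of two still fits and the cap allows. */
	while (bits < max_bits && (2ULL << bits) <= buckets)
		bits++;

	if (bits < min_bits)
		bits = min_bits;
	return bits;
}
```

With one bucket per 1M of RAM and a cap of 10 bits, a 1G machine and a
64G machine both end up with 1024 buckets, while a 16M machine falls
back to the minimum.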

-- 


* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-11 21:47         ` Octavian Purdila
  2009-11-11 22:24           ` Stephen Hemminger
@ 2009-11-12  2:36           ` David Miller
  2009-11-12 12:46             ` Mark Smith
  1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2009-11-12  2:36 UTC (permalink / raw)
  To: opurdila; +Cc: shemminger, netdev

From: Octavian Purdila <opurdila@ixiacom.com>
Date: Wed, 11 Nov 2009 23:47:41 +0200

> We could use a similar function that will work in the per namespace
> initialization context, but this might upset net namespace folks
> since we will get a large hash for every namespace.

Use kzalloc(), that's sufficient for a 64K or so hash table which is
way more than you ever will need.

Use the GFP_* flags that will silently (ie. without a log message)
fail, and divide by two until you successfully allocate the table if
you're worried about memory fragmentation at allocation time.

This is so straightforward, I can't believe we're talking so much
about how to implement this, it's a 15 minute hack :-)
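
The halve-until-it-fits approach can be sketched in userspace as below.
This is an illustration under assumptions, not the eventual patch:
alloc_hash_table(), struct bucket and capped_calloc() are hypothetical
names, and calloc() stands in for kzalloc(). In the kernel the
allocation would presumably be kzalloc() with a flag such as
__GFP_NOWARN so that failed attempts stay silent.

```c
#include <stdlib.h>

struct bucket {
	void *first;	/* stand-in for a pointer-sized list head */
};

/*
 * Simulated allocator for demonstration only: refuses requests for
 * more than 2048 elements, mimicking a large allocation failing
 * because of fragmentation.
 */
static void *capped_calloc(size_t n, size_t size)
{
	if (n > 2048)
		return NULL;
	return calloc(n, size);
}

/*
 * Try to allocate a zeroed table of (1 << *bits) buckets. On failure,
 * halve the table (decrement *bits) and retry, down to min_bits.
 * Returns the table, or NULL if even the smallest size fails;
 * *bits is updated to the size that was last attempted.
 */
static struct bucket *alloc_hash_table(unsigned int *bits,
				       unsigned int min_bits,
				       void *(*alloc)(size_t, size_t))
{
	for (;;) {
		struct bucket *tbl = alloc((size_t)1 << *bits, sizeof(*tbl));

		if (tbl || *bits <= min_bits)
			return tbl;
		(*bits)--;
	}
}
```

Asking for 14 bits through capped_calloc() ends up with an 11-bit
table, since 16384, 8192 and 4096 buckets are all refused. The caller
has to use the returned *bits, not a compile-time constant, when
hashing.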


* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-12  2:36           ` David Miller
@ 2009-11-12 12:46             ` Mark Smith
  2009-11-12 14:09               ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Smith @ 2009-11-12 12:46 UTC (permalink / raw)
  To: David Miller; +Cc: opurdila, shemminger, netdev

On Wed, 11 Nov 2009 18:36:26 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Wed, 11 Nov 2009 23:47:41 +0200
> 
> > We could use a similar function that will work in the per namespace
> > initialization context, but this might upset net namespace folks
> > since we will get a large hash for every namespace.
> 
> Use kzalloc(), that's sufficient for a 64K or so hash table which is
> way more than you ever will need.
> 
> Use the GFP_* flags that will silently (ie. without a log message)
> fail, and divide by two until you successfully allocate the table if
> you're worried about memory fragmentation at allocation time.
> 
> This is so straightforward, I can't believe we're talking so much
> about how to implement this, it's a 15 minute hack :-)

Yes, but sadly, sometimes there is too much history(!) to be able to be
fully aware of it. "suck-it-and-see" type patches are possibly a
quicker way to find out what people are thinking right now!



* Re: [PATCH] [next-next-2.6] net: configurable device name hash
  2009-11-12 12:46             ` Mark Smith
@ 2009-11-12 14:09               ` Eric Dumazet
  0 siblings, 0 replies; 12+ messages in thread
From: Eric Dumazet @ 2009-11-12 14:09 UTC (permalink / raw)
  To: Mark Smith; +Cc: David Miller, opurdila, shemminger, netdev

Mark Smith wrote:
> On Wed, 11 Nov 2009 18:36:26 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Octavian Purdila <opurdila@ixiacom.com>
>> Date: Wed, 11 Nov 2009 23:47:41 +0200
>>
>>> We could use a similar function that will work in the per namespace
>>> initialization context, but this might upset net namespace folks
>>> since we will get a large hash for every namespace.
>> Use kzalloc(), that's sufficient for a 64K or so hash table which is
>> way more than you ever will need.
>>
>> Use the GFP_* flags that will silently (ie. without a log message)
>> fail, and divide by two until you successfully allocate the table if
>> you're worried about memory fragmentation at allocation time.
>>
>> This is so straightforward, I can't believe we're talking so much
>> about how to implement this, it's a 15 minute hack :-)
> 
> Yes, but sadly, sometimes there is too much history(!) to be able to be
> fully aware of it. "suck-it-and-see" type patches are possibly a
> quicker way to find out what people are thinking right now!
> 

Before extending hash tables, we should make sure the existing algorithms are going
to scale with millions of netdevices, and they don't scale that well at the moment.
We still have many for_each_netdev() loops...

It's easy to change a constant somewhere in an include file; it's less easy to make
real scalability changes :(


