netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	David Miller <davem@davemloft.net>,
	paul.gortmaker@windriver.com, tim.bird@am.sony.com,
	kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: RFC: memory leak in udp_table_init
Date: Thu, 15 Mar 2012 10:44:23 -0700	[thread overview]
Message-ID: <20120315174422.GG2381@linux.vnet.ibm.com> (raw)
In-Reply-To: <1330605218.2465.50.camel@edumazet-laptop>

On Thu, Mar 01, 2012 at 04:33:38AM -0800, Eric Dumazet wrote:
> Le jeudi 01 mars 2012 à 08:55 +0000, David Laight a écrit :
> > > > The pid table is a good example of something where a hash
> > > > table is unnecessary.
> > > > Linux should steal the code I put into NetBSD :-)
> > > 
> > > On this unrelated topic.  What algorithm did you use on NetBSD for
> > > dealing with pids?
> > 
> > Basically I forced the hash chain length to one by allocating
> > a pid that hit an empty entry in the table.
> > 
> > So you start off with (say) 64 entries and use the low 6
> > bits to index the table. The higher bits are incremented
> > each time a 'slot' is reused.
> > Free entries are kept in a FIFO list.
> > So each entry either contains a pointer to the process,
> > or the high bits and the index of the next free slot.
> > (and the PGID pointer).
> > When there are only (say) 2 free entries, then the table
> > size is doubled, the pointers moved to the correct places,
> > the free ist fixed up, and the minimum number of free entries
> > doubled.
> > 
> > The overall effect:
> > - lookup is only ever a mask and index + compare.
> > - Allocate is always fast and fixed cost (except when
> >   the table size has to be doubled).
> > - A pid value will never be reused within (about) 2000
> >   allocates (for 16bit pids, much larger for 32bit ones).
> > - Allocated pid numbers tend to be random, certainly
> >   very difficult to predict.
> > - Small memory footprint for small systems.
> > For pids we normally avoid issuing large values, but
> > will do so to avoid immediate re-use on systems that
> > have 1000s of active processes.
> > 
> 
> 
> You describe a hash table mechanism still, and you made chain lengthes
> be 0 or 1.
> 
> This GEN_ID/SLOT schem is the one used in IBM AIX for pid allocations
> (with a 31 (or was it 32) bit range). They did not use FIFO, because
> they tried to use lower slots of the proc table.
> 
> Hashes values in network land are unpredictable, so we cannot make sure
> a slot contains at most one entry.
> 
> Note: If you manage 4 million tcp sockets in your server, hash table
> must be at _least_ 4 million slots. I would not call that suboptimal as
> you said in your previous mail.
> 
> We could argue that default sizes of these hash tables (ip route,
> tcp, ...) are now more suited for high end uses instead of
> desktop/embedded uses, because ram sizes increased so much last 10
> years.
> 
> [ We added in commits 0ccfe61803ad & c9503e0fe05 a 512K limits for
> tcp/ip route hash tables to stop insanity, but thats it ]
> 
> RCU lookups make dynamic resizes of these hash table a bit complex.
> IIRC David Miller have submitted a patch but this work was not
> completed. Not sure its an issue these days, as we prefer to work on ip
> route cache (and its associated hash table) removal anyway.

One approach is to put a pair of list_head structures in each object, then
proceed as Herbert Xu did: http://lists.openwall.net/netdev/2010/02/28/20.

There is another RCU-protected resizable hash table with the userspace
RCU library: http://lttng.org/urcu.  And another approach is described
in a USENIX paper:

http://www.usenix.org/event/atc11/tech/final_files/Triplett.pdf

						Thanx, Paul

> For UDP/UDPLite it certainly is not an issue, since max size is 65536
> slots. On a 32bit kernel, with at more 1GB of LOWMEM, max is 512 slots.
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

  reply	other threads:[~2012-03-15 17:44 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-25  0:55 RFC: memory leak in udp_table_init Tim Bird
2012-02-25  1:27 ` Paul Gortmaker
2012-02-25  5:19   ` Eric Dumazet
2012-02-26 19:20     ` David Miller
2012-02-27  5:40       ` Eric Dumazet
2012-02-27  5:44         ` David Miller
2012-02-27 11:33         ` David Laight
2012-02-29 18:28           ` Eric W. Biederman
2012-03-01  8:55             ` David Laight
2012-03-01 12:33               ` Eric Dumazet
2012-03-15 17:44                 ` Paul E. McKenney [this message]
2012-03-02  0:12               ` Eric W. Biederman
2012-02-27  6:03     ` [PATCH v2] mm: add a low limit to alloc_large_system_hash Eric Dumazet
2012-02-27  6:45       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120315174422.GG2381@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paul.gortmaker@windriver.com \
    --cc=tim.bird@am.sony.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).