All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	David Miller <davem@davemloft.net>,
	paul.gortmaker@windriver.com, tim.bird@am.sony.com,
	kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: RFC: memory leak in udp_table_init
Date: Thu, 15 Mar 2012 10:44:23 -0700	[thread overview]
Message-ID: <20120315174422.GG2381@linux.vnet.ibm.com> (raw)
In-Reply-To: <1330605218.2465.50.camel@edumazet-laptop>

On Thu, Mar 01, 2012 at 04:33:38AM -0800, Eric Dumazet wrote:
> Le jeudi 01 mars 2012 à 08:55 +0000, David Laight a écrit :
> > > > The pid table is a good example of something where a hash
> > > > table is unnecessary.
> > > > Linux should steal the code I put into NetBSD :-)
> > > 
> > > On this unrelated topic.  What algorithm did you use on NetBSD for
> > > dealing with pids?
> > 
> > Basically I forced the hash chain length to one by allocating
> > a pid that hit an empty entry in the table.
> > 
> > So you start off with (say) 64 entries and use the low 6
> > bits to index the table. The higher bits are incremented
> > each time a 'slot' is reused.
> > Free entries are kept in a FIFO list.
> > So each entry either contains a pointer to the process,
> > or the high bits and the index of the next free slot.
> > (and the PGID pointer).
> > When there are only (say) 2 free entries, then the table
> > size is doubled, the pointers moved to the correct places,
> > the free ist fixed up, and the minimum number of free entries
> > doubled.
> > 
> > The overall effect:
> > - lookup is only ever a mask and index + compare.
> > - Allocate is always fast and fixed cost (except when
> >   the table size has to be doubled).
> > - A pid value will never be reused within (about) 2000
> >   allocates (for 16bit pids, much larger for 32bit ones).
> > - Allocated pid numbers tend to be random, certainly
> >   very difficult to predict.
> > - Small memory footprint for small systems.
> > For pids we normally avoid issuing large values, but
> > will do so to avoid immediate re-use on systems that
> > have 1000s of active processes.
> > 
> 
> 
> You describe a hash table mechanism still, and you made chain lengthes
> be 0 or 1.
> 
> This GEN_ID/SLOT schem is the one used in IBM AIX for pid allocations
> (with a 31 (or was it 32) bit range). They did not use FIFO, because
> they tried to use lower slots of the proc table.
> 
> Hashes values in network land are unpredictable, so we cannot make sure
> a slot contains at most one entry.
> 
> Note: If you manage 4 million tcp sockets in your server, hash table
> must be at _least_ 4 million slots. I would not call that suboptimal as
> you said in your previous mail.
> 
> We could argue that default sizes of these hash tables (ip route,
> tcp, ...) are now more suited for high end uses instead of
> desktop/embedded uses, because ram sizes increased so much last 10
> years.
> 
> [ We added in commits 0ccfe61803ad & c9503e0fe05 a 512K limits for
> tcp/ip route hash tables to stop insanity, but thats it ]
> 
> RCU lookups make dynamic resizes of these hash table a bit complex.
> IIRC David Miller have submitted a patch but this work was not
> completed. Not sure its an issue these days, as we prefer to work on ip
> route cache (and its associated hash table) removal anyway.

One approach is to put a pair of list_head structures in each object, then
proceed as Herbert Xu did: http://lists.openwall.net/netdev/2010/02/28/20.

There is another RCU-protected resizable hash table with the userspace
RCU library: http://lttng.org/urcu.  And another approach is described
in a USENIX paper:

http://www.usenix.org/event/atc11/tech/final_files/Triplett.pdf

						Thanx, Paul

> For UDP/UDPLite it certainly is not an issue, since max size is 65536
> slots. On a 32bit kernel, with at more 1GB of LOWMEM, max is 512 slots.
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


  reply	other threads:[~2012-03-15 17:44 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-25  0:55 RFC: memory leak in udp_table_init Tim Bird
2012-02-25  1:27 ` Paul Gortmaker
2012-02-25  5:19   ` Eric Dumazet
2012-02-26 19:20     ` David Miller
2012-02-27  5:40       ` Eric Dumazet
2012-02-27  5:44         ` David Miller
2012-02-27 11:33         ` David Laight
2012-02-29 18:28           ` Eric W. Biederman
2012-03-01  8:55             ` David Laight
2012-03-01 12:33               ` Eric Dumazet
2012-03-15 17:44                 ` Paul E. McKenney [this message]
2012-03-02  0:12               ` Eric W. Biederman
2012-02-27  6:03     ` [PATCH v2] mm: add a low limit to alloc_large_system_hash Eric Dumazet
2012-02-27  6:45       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120315174422.GG2381@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=David.Laight@ACULAB.COM \
    --cc=davem@davemloft.net \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paul.gortmaker@windriver.com \
    --cc=tim.bird@am.sony.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.