From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: [RFC PATCH]: Dynamically sized routing cache hash table. Date: Tue, 06 Mar 2007 08:58:45 +0100 Message-ID: <45ED1F35.4070600@cosmosbay.com> References: <20070305.202632.74752497.davem@davemloft.net> <45ED14E6.7090109@cosmosbay.com> <20070305.232329.95506510.davem@davemloft.net> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: netdev@vger.kernel.org, robert.olsson@its.uu.se, npiggin@suse.de To: David Miller Return-path: Received: from gw1.cosmosbay.com ([86.65.150.130]:57569 "EHLO gw1.cosmosbay.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933996AbXCFH7A (ORCPT ); Tue, 6 Mar 2007 02:59:00 -0500 In-Reply-To: <20070305.232329.95506510.davem@davemloft.net> Sender: netdev-owner@vger.kernel.org List-Id: netdev.vger.kernel.org David Miller a =E9crit : > From: Eric Dumazet > Date: Tue, 06 Mar 2007 08:14:46 +0100 >=20 >> I wonder... are you sure this has no relation with the size of rt_ha= sh_locks /=20 >> RT_HASH_LOCK_SZ ? >> One entry must have the same lock in the two tables when resizing is= in flight. >> #define MIN_RTHASH_SHIFT LOG2(RT_HASH_LOCK_SZ) >=20 > Good point. >=20 >>> +static struct rt_hash_bucket *rthash_alloc(unsigned int sz) >>> +{ >>> + struct rt_hash_bucket *n; >>> + >>> + if (sz <=3D PAGE_SIZE) >>> + n =3D kmalloc(sz, GFP_KERNEL); >>> + else if (hashdist) >>> + n =3D __vmalloc(sz, GFP_KERNEL, PAGE_KERNEL); >>> + else >>> + n =3D (struct rt_hash_bucket *) >>> + __get_free_pages(GFP_KERNEL, get_order(sz)); >> I dont feel well with this. >> Maybe we could try a __get_free_pages(), and in case of failure, fal= lback to=20 >> vmalloc(). Then keep a flag to be able to free memory correctly. Any= way, if=20 >> (get_order(sz)>=3DMAX_ORDER) we know __get_free_pages() will fail. >=20 > We have to use vmalloc() for the hashdist case so that the pages > are spread out properly on NUMA systems. That's exactly what the > large system hash allocator is going to do on bootup anyways. Yes, but on bootup you have an appropriate NUMA active policy. (Well...= we=20 hope so, but it broke several time in the past) I am not sure what kind of mm policy is active for scheduled works. Anyway I have some XX GB machines, non NUMA, and I would love to be abl= e to=20 have a 2^20 slots hash table, without having to increase MAX_ORDER. >=20 > Look, either both are right or both are wrong. I'm just following > protocol above and you'll note the PRECISE same logic exists in other > dynamically growing hash table implementations such as > net/xfrm/xfrm_hash.c >=20 Yes, they are both wrong/dumb :) Can we be smarter, or do we have to stay dumb ? :) struct rt_hash_bucket *n =3D NULL; if (sz <=3D PAGE_SIZE) { n =3D kmalloc(sz, GFP_KERNEL); *kind =3D allocated_by_kmalloc; } else if (!hashdist) { n =3D (struct rt_hash_bucket *) __get_free_pages(GFP_KERNEL, get_order(sz)); *kind =3D allocated_by_get_free_pages; } if (!n) { n =3D __vmalloc(sz, GFP_KERNEL, PAGE_KERNEL); *kind =3D allocated_by_vmalloc; }