From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alan Jenkins Date: Wed, 12 Nov 2008 20:48:56 +0000 Subject: Re: [PATCH] udevd: de-duplicate strings in rules Message-Id: <491B4138.9070706@tuffmail.co.uk> List-Id: References: <4919E8FB.2050809@tuffmail.co.uk> In-Reply-To: <4919E8FB.2050809@tuffmail.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-hotplug@vger.kernel.org Kay Sievers wrote: > On Wed, Nov 12, 2008 at 19:05, Alan Jenkins wrote: > >> Kay Sievers wrote: >> >>> On Wed, Nov 12, 2008 at 17:50, Alan Jenkins wrote: >>> >>>> Kay Sievers wrote: >>>> >>>>> On Tue, Nov 11, 2008 at 22:23, Kay Sievers wrote: >>>>> >>>>>> On Tue, Nov 11, 2008 at 21:20, Alan Jenkins wrote: >>>>>> >>>>>> >>>>>>> On my Ubuntu installation this removes 15k of duplicate strings, >>>>>>> using a temporary index of about 25k. >>>>>>> >>>>>>> >>>>>> Great. That looks nice. >>>>>> >>>>>> Thats's the diff of the rule dump before and after the patch: >>>>>> ... >>>>>> -[] shrunk to 64896 bytes tokens (5408 * 12 bytes), 57298 bytes buffer >>>>>> -[] dumping 5408 (64896 bytes) tokens, 5818 (57298 bytes) strings >>>>>> +[] shrunk to 64896 bytes tokens (5408 * 12 bytes), 18204 bytes buffer >>>>>> +[] used 40512 bytes of string index nodes (844 * 48 bytes) >>>>>> +[] dumping 5408 (64896 bytes) tokens, 1369 (18204 bytes) strings >>>>>> >>>>>> >>>>> I split the nodes and the childs in two independent arrays, so we got >>>>> rid of the limit of 10 childs per node. I've got ~200 fully uses slots >>>>> with the huge rules set here. Unlimited childs in the index removes >>>>> another 3 kB of duplicates, and the temporary index seems also a bit >>>>> smaller: >>>>> shrunk to 64896 bytes tokens (5408 * 12 bytes), 15324 bytes buffer >>>>> used 29456 bytes for index (1076 * 16 bytes nodes, 1020 * 12 bytes >>>>> child links) >>>>> >>>>> Would be great, if you can check if it still works for you as expected. :) >>>>> >>>>> >>>> Did you have a particular reason to keep the trie_root array? Now >>>> there's no fixed limit on children, you could just use trie[1] as the >>>> root node. Remove the special case for depth = 0. And initialize it's >>>> value and length to 0, then you can remove the special case for len = 0. >>>> >>>> >>> No special reason, I thought about that too, but it was already 5am, >>> and I was unable to think it through. :) >>> >>> Sounds nice to do that, did you try already, have a patch? >>> >>> >> No, sorry :). >> > > Ah, now by looking at it, maybe the then needed linear search for the > key in the root is not as good as the plain root array index? > Mmm. Ok, without the root array add_string() takes twice as long, which increases the total rules-loading time by 10%. (user time measured by cachegrind). Let's leave it.