From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751117AbWAYLKv (ORCPT ); Wed, 25 Jan 2006 06:10:51 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751118AbWAYLKu (ORCPT ); Wed, 25 Jan 2006 06:10:50 -0500 Received: from gw1.cosmosbay.com ([62.23.185.226]:60842 "EHLO gw1.cosmosbay.com") by vger.kernel.org with ESMTP id S1751117AbWAYLKu (ORCPT ); Wed, 25 Jan 2006 06:10:50 -0500 Message-ID: <43D75CB8.9090101@cosmosbay.com> Date: Wed, 25 Jan 2006 12:10:48 +0100 From: Eric Dumazet User-Agent: Thunderbird 1.5 (Windows/20051201) MIME-Version: 1.0 To: Nick Piggin CC: Linux Kernel Mailing List , Linux Memory Management List Subject: Re: [RFC] non-refcounted pages, application to slab? References: <20060125093909.GE32653@wotan.suse.de> <43D75239.90907@cosmosbay.com> <20060125105737.GB30421@wotan.suse.de> In-Reply-To: <20060125105737.GB30421@wotan.suse.de> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.6 (gw1.cosmosbay.com [172.16.8.80]); Wed, 25 Jan 2006 12:10:47 +0100 (CET) Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Nick Piggin a écrit : > On Wed, Jan 25, 2006 at 11:26:01AM +0100, Eric Dumazet wrote: >> Nick Piggin a écrit : >>> If an allocator knows exactly the lifetime of its page, then there is no >>> need to do refcounting or the final put_page_zestzero (atomic op + mem >>> barriers). >>> >>> This is probably not worthwhile for most cases, but slab did strike me >>> as a potential candidate (however the complication here is that some >>> code I think uses the refcount of underlying pages of slab allocations >>> eg nommu code). So it is not a complete patch, but I wonder if anyone >>> thinks the savings might be worth the complexity? >>> >>> Is there any particular code that is really heavy on slab allocations? >>> That isn't mostly handled by the slab's internal freelists? >> Hi Nick >> >> After reading your patch, I have some crazy idea. >> >> The atomic op + mem barrier you want to avoid could be avoided more >> generally just by changing atomic_dec_and_test(atomic_t *v). >> >> If the current thread is the last referer (refcnt = 1), then it can safely >> set the value to 0 because no other CPU can be touching the value (or else >> there must be a bug somewhere, as the 'other cpu' could touch the value >> just after us and we could free an object still in use by 'other cpu' >> > > I think that would work for this case, but you change the semantics > of the function for all users which is bad. Yes :) I did a test with a patched kernel and I got : BUG: atomic counter underflow at: show_trace+0x20/0x22 dump_stack+0x1e/0x20 _atomic_dec_and_lock+0x78/0x88 dput+0xbf/0x187 path_release+0x14/0x30 __link_path_walk+0x36d/0xd5f link_path_walk+0x52/0xd6 do_path_lookup+0xfc/0x220 __path_lookup_intent_open+0x3e/0x73 path_lookup_open+0x35/0x37 open_namei+0x83/0x631 do_filp_open+0x38/0x56 do_sys_open+0x5c/0x99 sys_open+0x27/0x29 sysenter_past_esp+0x54/0x75 So we cannot change atomic_dec_and_test(atomic_t *v) but introduce a new function like : int atomic_dec_refcount(atomic_t *v) { #ifdef CONFIG_SMP /* avoid an atomic op if we are the last user of this refcount */ if (atomic_read(v) == 1) { atomic_set(v, 0); /* not a real atomic op on most machines */ return 1; } #endif return atomic_dec_and_test(v); } The cost of the extra conditional branch is worth, if it can avoid an atomic op. > > Such a test could be open coded in __free_page, although that does > add a branch + some icache, but that might also be an option. (and > my patch does also add to total icache footprint and is much uglier ;)) > > Thanks, > Nick > >