From: Eric Dumazet <dada1@cosmosbay.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [RFC] non-refcounted pages, application to slab?
Date: Wed, 25 Jan 2006 12:10:48 +0100
Message-ID: <43D75CB8.9090101@cosmosbay.com>
In-Reply-To: <20060125105737.GB30421@wotan.suse.de>

Nick Piggin wrote:
> On Wed, Jan 25, 2006 at 11:26:01AM +0100, Eric Dumazet wrote:
>> Nick Piggin wrote:
>>> If an allocator knows exactly the lifetime of its page, then there is no
>>> need to do refcounting or the final put_page_testzero (atomic op + mem
>>> barriers).
>>>
>>> This is probably not worthwhile for most cases, but slab did strike me
>>> as a potential candidate (however the complication here is that some
>>> code I think uses the refcount of underlying pages of slab allocations
>>> eg nommu code). So it is not a complete patch, but I wonder if anyone
>>> thinks the savings might be worth the complexity?
>>>
>>> Is there any particular code that is really heavy on slab allocations?
>>> That isn't mostly handled by the slab's internal freelists?
>> Hi Nick
>>
>> After reading your patch, I have a crazy idea.
>>
>> The atomic op + mem barrier you want to avoid could be avoided more 
>> generally just by changing atomic_dec_and_test(atomic_t *v).
>>
>> If the current thread holds the last reference (refcnt == 1), then it can
>> safely set the value to 0, because no other CPU can be touching the value
>> (or else there must be a bug somewhere, as the 'other cpu' could touch the
>> value just after us and we could free an object still in use by 'other cpu').
>>
> 
> I think that would work for this case, but you change the semantics
> of the function for all users, which is bad.

Yes :) I tested a patched kernel and got:

BUG: atomic counter underflow at:
  <c0103a3a> show_trace+0x20/0x22   <c0103b5b> dump_stack+0x1e/0x20
  <c01d6934> _atomic_dec_and_lock+0x78/0x88   <c0177599> dput+0xbf/0x187
  <c016dc96> path_release+0x14/0x30   <c016e540> __link_path_walk+0x36d/0xd5f
  <c016ef84> link_path_walk+0x52/0xd6   <c016f2ec> do_path_lookup+0xfc/0x220
  <c016f467> __path_lookup_intent_open+0x3e/0x73   <c016f4d1> path_lookup_open+0x35/0x37
  <c016fc79> open_namei+0x83/0x631   <c015f811> do_filp_open+0x38/0x56
  <c015fb83> do_sys_open+0x5c/0x99   <c015fbe7> sys_open+0x27/0x29
  <c0102bb3> sysenter_past_esp+0x54/0x75


So we cannot change atomic_dec_and_test(atomic_t *v), but we could introduce
a new function like:

int atomic_dec_refcount(atomic_t *v)
{
#ifdef CONFIG_SMP
	/* avoid an atomic op if we are the last user of this refcount */
	if (atomic_read(v) == 1) {
		atomic_set(v, 0); /* not a real atomic op on most machines */
		return 1;
	}
#endif
	return atomic_dec_and_test(v);
}

The cost of the extra conditional branch is worth it if it avoids an atomic op.
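
For anyone who wants to play with the idea outside the kernel, here is a
minimal userspace sketch using C11 <stdatomic.h>. It is not kernel code:
refcount_dec_and_test() is a hypothetical name, atomic_int stands in for
atomic_t, and the memory orderings are my own assumption. As with the function
above, the fast path is only valid if holding the last reference really does
mean that no other CPU can touch the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace analogue of the proposed atomic_dec_refcount().
 * Returns true when the caller dropped the last reference.
 */
static bool refcount_dec_and_test(atomic_int *v)
{
	int cur = atomic_load_explicit(v, memory_order_acquire);

	/* Dropping a reference we do not hold would underflow the counter. */
	assert(cur > 0);

	/*
	 * Fast path: we hold the only reference, so (by assumption) nobody
	 * else may read or write the counter. A plain store is enough and
	 * the atomic read-modify-write is skipped.
	 */
	if (cur == 1) {
		atomic_store_explicit(v, 0, memory_order_relaxed);
		return true;
	}

	/* Slow path: other holders exist, so do a real atomic decrement. */
	return atomic_fetch_sub_explicit(v, 1, memory_order_acq_rel) == 1;
}
```

The fast path replaces one locked instruction with a load, a compare and a
store. Callers that allow another CPU to take a new reference while the count
is 1 break that assumption, which is why this has to be a separate function
rather than a change to atomic_dec_and_test().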


> 
> Such a test could be open coded in __free_page, although that does
> add a branch + some icache, but that might also be an option. (and
> my patch does also add to total icache footprint and is much uglier ;))
> 
> Thanks,
> Nick
> 
> 

