Re: [PATCH 09/10] percpu: implement new dynamic percpu allocator

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tejun Heo <tj@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: rusty@rustcorp.com.au, tglx@linutronix.de, x86@kernel.org,
	linux-kernel@vger.kernel.org, hpa@zytor.com, jeremy@goop.org,
	cpw@sgi.com, mingo@elte.hu
Subject: Re: [PATCH 09/10] percpu: implement new dynamic percpu allocator
Date: Fri, 20 Feb 2009 14:29:30 +0900	[thread overview]
Message-ID: <499E3FBA.50102@kernel.org> (raw)
In-Reply-To: <20090219190417.c664bff4.akpm@linux-foundation.org>

Hello, Andrew.

Andrew Morton wrote:
>> Hmmm... Is it customary to dump stack on allocation failure?  AFAICS,
>> kmalloc or vmalloc isn't doing it.
> 
> The page allocator (and hence kmalloc) will do it.
> 
> But a more important question is "is a trace useful".  I'd say "yes". 
> Because being told that something ran out of memory isn't terribly
> useful.  The very first question is "OK, well _what_ ran out of
> memory?".

Then the page allocator will do it for percpu allocator too other than
get_vm_area() failure.  I think the right place to add dump_stack()
would be failure path of get_mv_area().  Right?

>> Because those parameters are initialized during boot but should work
>> as constant once they're set up.  I wanted to make sure that these
>> values don't get assigned to or changed and make that clear by making
>> them look like constants as except for the init code they're constants
>> for all purposes.
> 
> Well, there are an infinite number of ways in which people can later
> introduce bugs.  Why defend against just one?  Particularly in a way
> which muckies up the code?
> 
> If you really want to defend against alterations, access these things
> via function calls rather than via nastycasts which masquerade as
> constants?
> 
> static inline int pcpu_unit_pages_shift(void)
> {
> 	return __pcpu_unit_pages_shift;
> }

It's more a notation to signify the semantics of usage rather than the
mechanics.  I don't really see why this is such a big deal.  Macros
evaluating to rvalue to act as pseudo constant isn't that uncommon.
Anyways, I'll just drop the macros and use the raw variables.

>>> Methinks vmalloc() should have a might_sleep().  Dunno.
>> I can add that but it's an internal utility function which is called
>> from a few well known obvious call sites to replace krealloc, so I
>> don't thinks it's a big deal.  Maybe the correct thing to do is adding
>> might_sleep() to vmalloc?
> 
> I think so.  It perhaps already has one, via indirect means.

Alright, will look into it and add it if it actually is missing.

>>> A designed decision has been made to not permit the caller to specify
>>> the allocation mode?
>> The design decision was inherited from the original percpu allocator.
> 
> That doesn't mean it's right ;)

:-)

> But I don't recall us ever wishing that the gfp_t arg had been
> included.
> 
>>>> +static void free_pcpu_chunk(struct pcpu_chunk *chunk)
>>>> +{
>>>> +	if (!chunk)
>>>> +		return;
>>> afaict this test is unneeded.
>> I think it's better to put NULL test to free functions in general.
>> People expect free functions to take NULL argument and swallow it.
>> Doint it otherwise adds unnecessary danger for subtle bugs later on.
> 
> It's a dumb convention.  In the vast majority of cases the pointer is
> not NULL.  We add a test-n-branch to 99.999999999% of cases just to
> save three seconds of programmer effort a single time.
> 
> A better design would have been to have kfree() and
> kfree_might_be_null().  (We can still do that by adding a new
> kfree_im_not_stupid() which doesn't do the check).
> 
> It's a bad tradeoff to expend billions of cycles on millions of
> machines to save a little programmer effort.
> 
> (And we're not consistent anyway - see pci_free_consistent)

By making free_pcpu_chunk() not accept NULL, we'll only increase the
inconsistency.  The given fact is that we simply can't remove it from
kfree() at this point.  With the most popular free function supporting
that convention, it's silly and unfruitful to do things otherwise.  It
forces callers of any free functions to go look for each function
implementation to check whether it accepts NULL or not and in many
cases to wrongly assume one way or the other.  I don't think the
minute performance gain justifies the programming overhead.  Given the
current situation, what needs fixing is pci_free_consistent().

>>>> +void *__alloc_percpu(size_t size, size_t align)
>>>> +{
>>>> +	void *ptr = NULL;
>>>> +	struct pcpu_chunk *chunk;
>>>> +	int slot, off, err;
>>>> +
>>>> +	if (unlikely(!size))
>>>> +		return NULL;
>>> hm.  Why do we do this?  Perhaps emitting this warning:
>>>
>>>> +	if (unlikely(size > PAGE_SIZE << PCPU_MIN_UNIT_PAGES_SHIFT ||
>>>> +		     align > PAGE_SIZE)) {
>>>> +		printk(KERN_WARNING "illegal size (%zu) or align (%zu) for "
>>>> +		       "percpu allocation\n", size, align);
>>> would be more appropriate.
>> Maybe.  Dunno.  Returning NULL is what malloc/calloc are allowed to do
>> at least.
> 
> Yes, but it is probably a programming error in the caller.  We want to
> report that asap, not hide it.  The buggy caller will probably now
> assume that the memory allocation failed and will bale out altogether,
> leaving everyone all confused.

I kind of agree to this one.  Most alloc functions do allow it but
yeap its usage isn't very prevalent and is much more likely to be
buggy.  Alright, WARN_ON() then.

>>>> +/**
>>>> + * free_percpu - free percpu area
>>>> + * @ptr: pointer to area to free
>>>> + *
>>>> + * Free percpu area @ptr.  Might sleep.
>>>> + */
>>>> +void free_percpu(void *ptr)
>>>> +{
>>>> +	void *addr = __pcpu_ptr_to_addr(ptr);
>>>> +	struct pcpu_chunk *chunk;
>>>> +	int off;
>>>> +
>>>> +	if (!ptr)
>>>> +		return;
>>> Do we ever do this?  Should it be permitted?  Should we warn?
>> Dunno but should be allowed, yes, no.  :-)
> 
> It adds cycles and hides caller bugs.  Zap it!

Heh heh... No! :-) I'm sorry but I think that's the wrong decision.

Thanks.

-- 
tejun

next prev parent reply	other threads:[~2009-02-20  5:30 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-18 12:04 [PATCHSET x86/core/percpu] implement dynamic percpu allocator Tejun Heo
2009-02-18 12:04 ` [PATCH 01/10] vmalloc: call flush_cache_vunmap() from unmap_kernel_range() Tejun Heo
2009-02-19 12:06   ` Nick Piggin
2009-02-19 22:36     ` David Miller
2009-02-18 12:04 ` [PATCH 02/10] module: fix out-of-range memory access Tejun Heo
2009-02-19 12:08   ` Nick Piggin
2009-02-20  7:16   ` Tejun Heo
2009-02-18 12:04 ` [PATCH 03/10] module: reorder module pcpu related functions Tejun Heo
2009-02-18 12:04 ` [PATCH 04/10] alloc_percpu: change percpu_ptr to per_cpu_ptr Tejun Heo
2009-02-18 12:04 ` [PATCH 05/10] alloc_percpu: add align argument to __alloc_percpu Tejun Heo
2009-02-18 12:04 ` [PATCH 06/10] percpu: kill percpu_alloc() and friends Tejun Heo
2009-02-19  0:17   ` Rusty Russell
2009-03-11 18:36   ` Tony Luck
2009-03-11 22:44     ` Rusty Russell
2009-03-12  2:06     ` Tejun Heo
2009-02-18 12:04 ` [PATCH 07/10] vmalloc: implement vm_area_register_early() Tejun Heo
2009-02-19  0:55   ` Tejun Heo
2009-02-19 12:09   ` Nick Piggin
2009-02-18 12:04 ` [PATCH 08/10] vmalloc: add un/map_kernel_range_noflush() Tejun Heo
2009-02-19 12:17   ` Nick Piggin
2009-02-20  1:27     ` Tejun Heo
2009-02-20  7:15   ` Subject: [PATCH 08/10 UPDATED] " Tejun Heo
2009-02-20  8:32     ` Andrew Morton
2009-02-21  3:21       ` Tejun Heo
2009-02-18 12:04 ` [PATCH 09/10] percpu: implement new dynamic percpu allocator Tejun Heo
2009-02-19 10:10   ` Andrew Morton
2009-02-19 11:01     ` Ingo Molnar
2009-02-20  2:45       ` Tejun Heo
2009-02-19 12:07     ` Rusty Russell
2009-02-20  2:35     ` Tejun Heo
2009-02-20  3:04       ` Andrew Morton
2009-02-20  5:29         ` Tejun Heo [this message]
2009-02-24  2:52         ` Rusty Russell
2009-02-19 11:51   ` Rusty Russell
2009-02-20  3:01     ` Tejun Heo
2009-02-20  3:02       ` Tejun Heo
2009-02-24  2:56       ` Rusty Russell
2009-02-24  5:27         ` [PATCH tj-percpu] percpu: add __read_mostly to variables which are mostly read only Tejun Heo
2009-02-24  5:47         ` [PATCH 09/10] percpu: implement new dynamic percpu allocator Tejun Heo
2009-02-24 17:41           ` Luck, Tony
2009-02-26  3:17             ` Tejun Heo
2009-02-27 19:41               ` Luck, Tony
2009-02-19 12:36   ` Nick Piggin
2009-02-20  3:04     ` Tejun Heo
2009-02-20  7:30   ` [PATCH UPDATED " Tejun Heo
2009-02-20  8:37     ` Andrew Morton
2009-02-21  3:23       ` Tejun Heo
2009-02-21  3:42         ` [PATCH tj-percpu] percpu: s/size/bytes/g in new percpu allocator and interface Tejun Heo
2009-02-21  7:48           ` Tejun Heo
2009-02-21  7:55             ` [PATCH tj-percpu] percpu: clean up size usage Tejun Heo
2009-02-21  7:56               ` Tejun Heo
2009-02-18 12:04 ` [PATCH 10/10] x86: convert to the new dynamic percpu allocator Tejun Heo
2009-02-18 13:43 ` [PATCHSET x86/core/percpu] implement " Ingo Molnar
2009-02-19  0:31   ` Tejun Heo
2009-02-19 10:51   ` Rusty Russell
2009-02-19 11:06     ` Ingo Molnar
2009-02-19 12:14       ` Rusty Russell
2009-02-20  3:08         ` Tejun Heo
2009-02-20  5:36           ` Tejun Heo
2009-02-20  7:33             ` Tejun Heo
2009-02-19  0:30 ` Tejun Heo
2009-02-19 11:07   ` Ingo Molnar
2009-02-20  3:17     ` Tejun Heo
2009-02-20  9:32       ` Ingo Molnar
2009-02-21  7:10         ` Tejun Heo
2009-02-21  7:33           ` Tejun Heo
2009-02-22 19:38             ` Ingo Molnar
2009-02-23  0:43               ` Tejun Heo
2009-02-23 10:17                 ` Ingo Molnar
2009-02-23 13:38                   ` [patch] x86: optimize __pa() to be linear again on 64-bit x86 Ingo Molnar
2009-02-23 14:08                     ` Nick Piggin
2009-02-23 14:53                       ` Ingo Molnar
2009-02-24 16:00                         ` Andi Kleen
2009-02-27  5:57                         ` Tejun Heo
2009-02-27  6:57                           ` Ingo Molnar
2009-02-27  7:11                             ` Tejun Heo
2009-02-22 19:27           ` [PATCHSET x86/core/percpu] implement dynamic percpu allocator Ingo Molnar
2009-02-23  0:47             ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=499E3FBA.50102@kernel.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cpw@sgi.com \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.