From: Oleg Nesterov <oleg@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH for-3.3] mempool: clean up and document synchronization and memory barrier usage
Date: Wed, 21 Dec 2011 15:55:56 +0100	[thread overview]
Message-ID: <20111221145556.GA25657@redhat.com> (raw)
In-Reply-To: <20111220221818.GJ10752@google.com>

On 12/20, Tejun Heo wrote:
>
> For #1, The only necessary condition is that curr_nr visible at free
> is from after the allocation of the element being freed (details in
> the comment).  For most cases, this is true without any barrier but
> there can be fringe cases where the allocated pointer is passed to the
> freeing task without going through memory barriers.  To cover this
> case, wmb is necessary before returning from allocation and rmb is
> necessary before reading curr_nr.  IOW,
>
> 	ALLOCATING TASK			FREEING TASK
>
> 	update pool state after alloc;
> 	wmb();
> 	pass pointer to freeing task;
> 					read pointer;
> 					rmb();
> 					read pool state to free;
>
> The wmb can be the implied one from unlocking pool->lock and smp_mb()
> in mempool_free() can be replaced with smp_rmb().

OK... I do not know whether this is an actual bug in practice, but I agree
this race is possible.
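
Just to spell the bad case out for myself (an illustration only, not code
from the patch), with @p NULL at the beginning:

	ALLOCATING TASK			FREEING TASK

	--curr_nr;	/* update pool state */
	p = element;	/* pass pointer */
					while (!p)
						cpu_relax();
					read curr_nr;	/* can be stale */

Without a wmb between the curr_nr update and the store to p, plus an rmb
between the read of p and the read of curr_nr, the freeing task can see the
pre-allocation value of curr_nr, decide "curr_nr < min_nr" is false, and
hand the element back to the underlying allocator instead of refilling the
pool.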

> Furthermore, mempool_alloc() is already holding pool->lock when it
> decides that it needs to wait.  There is no reason to do unlock - add
> waitqueue - test condition again.  It can simply add itself to
> waitqueue while holding pool->lock and then unlock and sleep.

I'm confused. I agree that we can hold pool->lock until schedule(). But, at
the same time, why should we hold it?

Or have I missed the reason why we must not unlock before prepare_to_wait()?
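
Just so we are talking about the same thing, roughly (a sketch only, details
elided and possibly not matching the code exactly):

	/* with the patch: queue the waiter under pool->lock */
	spin_lock_irqsave(&pool->lock, flags);
	/* pool->curr_nr == 0 */
	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
	spin_unlock_irqrestore(&pool->lock, flags);
	io_schedule_timeout(5 * HZ);
	finish_wait(&pool->wait, &wait);
	goto repeat_alloc;

	/* vs. dropping the lock first and re-checking, as before */
	spin_unlock_irqrestore(&pool->lock, flags);
	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
	smp_mb();
	if (!pool->curr_nr)
		io_schedule_timeout(5 * HZ);
	finish_wait(&pool->wait, &wait);
	goto repeat_alloc;

Either ordering avoids the lost wakeup; the difference is only whether the
re-check of curr_nr is needed after prepare_to_wait().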

> --- work.orig/mm/mempool.c
> +++ work/mm/mempool.c
> @@ -223,29 +223,31 @@ repeat_alloc:
>  	spin_lock_irqsave(&pool->lock, flags);
>  	if (likely(pool->curr_nr)) {
>  		element = remove_element(pool);
> +		/* implied wmb in unlock is faired with rmb in mempool_free() */
>  		spin_unlock_irqrestore(&pool->lock, flags);

Not really afaics. unlock() doesn't imply wmb().

So, if some thread does PTR = mempool_alloc(), the "final" store to PTR
can leak into the critical section, and it can be reordered inside this
section with --curr_nr.

See the fat comment about set_current_state() in prepare_to_wait().
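
IOW, something like this is possible afaics (illustration only):

	PTR = mempool_alloc(pool, gfp_mask);

can effectively execute as

	spin_lock_irqsave(&pool->lock, flags);
	PTR = element;			/* the caller's store leaked into
					 * the critical section ...	 */
	--pool->curr_nr;		/* ... and got reordered with the
					 * decrement			 */
	spin_unlock_irqrestore(&pool->lock, flags);

UNLOCK only guarantees that the accesses before it do not move past it; it
does not stop a later store from moving up into the critical section, and
once it is there nothing orders it wrt --curr_nr.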

> @@ -265,7 +267,39 @@ void mempool_free(void *element, mempool
>  	if (unlikely(element == NULL))
>  		return;
>  
> -	smp_mb();
> +	/*
> +	 * Paired with implied wmb of @pool->lock in mempool_alloc().  The
> +	 * preceding read is for @element and the following @pool->curr_nr.
> +	 * This ensures that the visible value of @pool->curr_nr is from
> +	 * after the allocation of @element.  This is necessary for fringe
> +	 * cases where @element was passed to this task without going
> +	 * through barriers.
> +	 *
> +	 * For example, assume @p is %NULL at the beginning and one task
> +	 * performs "p = mempool_alloc(...);" while another task is doing
> +	 * "while (!p) cpu_relax(); mempool_free(p, ...);".  This function
> +	 * may end up using curr_nr value which is from before allocation
> +	 * of @p without the following rmb.
> +	 */
> +	smp_rmb();
> +
> +	/*
> +	 * For correctness, we need a test which is guaranteed to trigger
> +	 * if curr_nr + #allocated == min_nr.  Testing curr_nr < min_nr
> +	 * without locking achieves that and refilling as soon as possible
> +	 * is desirable.
> +	 *
> +	 * Because curr_nr visible here is always a value after the
> +	 * allocation of @element, any task which decremented curr_nr below
> +	 * min_nr is guaranteed to see curr_nr < min_nr unless curr_nr gets
> +	 * incremented to min_nr afterwards.  If curr_nr gets incremented
> +	 * to min_nr after the allocation of @element, the elements
> +	 * allocated after that are subject to the same guarantee.
> +	 *
> +	 * Waiters happen iff curr_nr is 0 and the above guarantee also
> +	 * ensures that there will be frees which return elements to the
> +	 * pool waking up the waiters.
> +	 */
>  	if (pool->curr_nr < pool->min_nr) {
>  		spin_lock_irqsave(&pool->lock, flags);
>  		if (pool->curr_nr < pool->min_nr) {

Just to check my understanding...

IOW, whatever we do, we can race with other threads and miss the
"curr_nr < min_nr" condition. But this is fine: unless this element (the one
we are going to free) was the reason for that condition, we can rely on
another mempool_free(), so the waiters in mempool_alloc() can never hang
forever.
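
For example (illustration), take min_nr == 1: the allocation which takes the
last element decrements curr_nr from 1 to 0, so that element "is the reason"
for curr_nr < min_nr.  Thanks to the rmb, whoever frees it is guaranteed to
see curr_nr == 0 (unless someone else already refilled the pool), takes
pool->lock, puts the element back and wakes up the waiters.  The frees of
other elements may miss the check, but we do not depend on them.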

Yes, I think this is right.

Oleg.

