Re: [RFC][PATCH 6/7] freelist: Lock less freelist

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mhiramat@kernel.org,
	Eddy_Wu@trendmicro.com, x86@kernel.org, davem@davemloft.net,
	rostedt@goodmis.org, naveen.n.rao@linux.ibm.com,
	anil.s.keshavamurthy@intel.com, linux-arch@vger.kernel.org,
	cameron@moodycamel.com, will@kernel.org, paulmck@kernel.org
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist
Date: Fri, 28 Aug 2020 16:46:52 +0200	[thread overview]
Message-ID: <20200828144650.GF28468@redhat.com> (raw)
In-Reply-To: <20200827161754.535381269@infradead.org>

On 08/27, Peter Zijlstra wrote:
>
>  1 file changed, 129 insertions(+)

129 lines! And I spent more than 2 hours trying to understand these
129 lines ;) looks correct...

However, I still can't understand the usage of _acquire/release ops
in this code.

> +static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
> +{
> +	/*
> +	 * Since the refcount is zero, and nobody can increase it once it's
> +	 * zero (except us, and we run only one copy of this method per node at
> +	 * a time, i.e. the single thread case), then we know we can safely
> +	 * change the next pointer of the node; however, once the refcount is
> +	 * back above zero, then other threads could increase it (happens under
> +	 * heavy contention, when the refcount goes to zero in between a load
> +	 * and a refcount increment of a node in try_get, then back up to
> +	 * something non-zero, then the refcount increment is done by the other
> +	 * thread) -- so if the CAS to add the node to the actual list fails,
> +	 * decrese the refcount and leave the add operation to the next thread
> +	 * who puts the refcount back to zero (which could be us, hence the
> +	 * loop).
> +	 */
> +	struct freelist_node *head = READ_ONCE(list->head);
> +
> +	for (;;) {
> +		WRITE_ONCE(node->next, head);
> +		atomic_set_release(&node->refs, 1);
> +
> +		if (!try_cmpxchg_release(&list->head, &head, node)) {

OK, these 2 _release above look understandable, they pair with
atomic_try_cmpxchg_acquire/try_cmpxchg_acquire in freelist_try_get().

> +			/*
> +			 * Hmm, the add failed, but we can only try again when
> +			 * the refcount goes back to zero.
> +			 */
> +			if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
> +				continue;

Do we really need _release ? Why can't atomic_fetch_add_relaxed() work?

> +static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
> +{
> +	struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
> +	unsigned int refs;
> +
> +	while (head) {
> +		prev = head;
> +		refs = atomic_read(&head->refs);
> +		if ((refs & REFS_MASK) == 0 ||
> +		    !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
> +			head = smp_load_acquire(&list->head);
> +			continue;
> +		}
> +
> +		/*
> +		 * Good, reference count has been incremented (it wasn't at
> +		 * zero), which means we can read the next and not worry about
> +		 * it changing between now and the time we do the CAS.
> +		 */
> +		next = READ_ONCE(head->next);
> +		if (try_cmpxchg_acquire(&list->head, &head, next)) {
> +			/*
> +			 * Yay, got the node. This means it was on the list,
> +			 * which means should-be-on-freelist must be false no
> +			 * matter the refcount (because nobody else knows it's
> +			 * been taken off yet, it can't have been put back on).
> +			 */
> +			WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
> +
> +			/*
> +			 * Decrease refcount twice, once for our ref, and once
> +			 * for the list's ref.
> +			 */
> +			atomic_fetch_add(-2, &head->refs);

Do we the barriers implied by _fetch_? Why can't atomic_sub(2, refs) work?

> +		/*
> +		 * OK, the head must have changed on us, but we still need to decrement
> +		 * the refcount we increased.
> +		 */
> +		refs = atomic_fetch_add(-1, &prev->refs);

Cosmetic, but why not atomic_fetch_dec() ?

Oleg.

next prev parent reply	other threads:[~2020-08-28 14:47 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-27 16:12 [RFC][PATCH 0/7] kprobes: Make kretprobes lockless Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 1/7] llist: Add nonatomic __llist_add() Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 2/7] sched: Fix try_invoke_on_locked_down_task() semantics Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 3/7] kprobes: Remove kretprobe hash Peter Zijlstra
2020-08-27 18:00   ` Masami Hiramatsu
2020-08-28  8:44     ` peterz
2020-08-28  9:07     ` Masami Hiramatsu
2020-08-28  4:44   ` Masami Hiramatsu
2020-08-28 13:11   ` Eddy_Wu
2020-08-28 13:38     ` peterz
2020-08-28 13:51     ` Masami Hiramatsu
2020-08-28 13:58       ` peterz
2020-08-28 14:19         ` Masami Hiramatsu
2020-08-28 14:11       ` Eddy_Wu
2020-08-28 14:19         ` peterz
2020-08-28 14:41           ` Masami Hiramatsu
2020-08-28 14:49     ` Masami Hiramatsu
2020-08-27 16:12 ` [RFC][PATCH 4/7] kprobe: Dont kfree() from breakpoint context Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 5/7] asm-generic/atomic: Add try_cmpxchg() fallbacks Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 6/7] freelist: Lock less freelist Peter Zijlstra
2020-08-27 16:37   ` peterz
     [not found]     ` <CAFCw3doX6KK5DwpG_OB331Mdw8uYeVqn8YPTjKh_a-m7ZB9+3A@mail.gmail.com>
2020-08-27 16:56       ` peterz
2020-08-27 17:00         ` Cameron
2020-08-27 19:08   ` Boqun Feng
2020-08-27 19:57     ` Cameron
2020-08-28  1:34       ` Boqun Feng
2020-08-28  4:03   ` Lai Jiangshan
2020-08-28 14:46   ` Oleg Nesterov [this message]
2020-08-28 15:29     ` peterz
2020-08-29  3:05       ` Cameron
2020-08-27 16:12 ` [RFC][PATCH 7/7] kprobes: Replace rp->free_instance with freelist Peter Zijlstra
2020-08-28  8:48   ` peterz
2020-08-28  9:13     ` Masami Hiramatsu
2020-08-28  9:18       ` peterz
2020-08-28 10:44         ` Masami Hiramatsu
2020-08-29  2:29         ` Cameron
2020-08-29  2:31           ` Cameron
2020-08-29  9:15             ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200828144650.GF28468@redhat.com \
    --to=oleg@redhat.com \
    --cc=Eddy_Wu@trendmicro.com \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=cameron@moodycamel.com \
    --cc=davem@davemloft.net \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox