Re: [RFC][PATCH 6/7] freelist: Lock less freelist

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mhiramat@kernel.org,
	Eddy_Wu@trendmicro.com, x86@kernel.org, davem@davemloft.net,
	rostedt@goodmis.org, naveen.n.rao@linux.ibm.com,
	anil.s.keshavamurthy@intel.com, linux-arch@vger.kernel.org,
	cameron@moodycamel.com, will@kernel.org, paulmck@kernel.org
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist
Date: Fri, 28 Aug 2020 16:46:52 +0200	[thread overview]
Message-ID: <20200828144650.GF28468@redhat.com> (raw)
In-Reply-To: <20200827161754.535381269@infradead.org>

On 08/27, Peter Zijlstra wrote:
>
>  1 file changed, 129 insertions(+)

129 lines! And I spent more than 2 hours trying to understand these
129 lines ;) looks correct...

However, I still can't understand the usage of _acquire/release ops
in this code.

> +static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
> +{
> +	/*
> +	 * Since the refcount is zero, and nobody can increase it once it's
> +	 * zero (except us, and we run only one copy of this method per node at
> +	 * a time, i.e. the single thread case), then we know we can safely
> +	 * change the next pointer of the node; however, once the refcount is
> +	 * back above zero, then other threads could increase it (happens under
> +	 * heavy contention, when the refcount goes to zero in between a load
> +	 * and a refcount increment of a node in try_get, then back up to
> +	 * something non-zero, then the refcount increment is done by the other
> +	 * thread) -- so if the CAS to add the node to the actual list fails,
> +	 * decrese the refcount and leave the add operation to the next thread
> +	 * who puts the refcount back to zero (which could be us, hence the
> +	 * loop).
> +	 */
> +	struct freelist_node *head = READ_ONCE(list->head);
> +
> +	for (;;) {
> +		WRITE_ONCE(node->next, head);
> +		atomic_set_release(&node->refs, 1);
> +
> +		if (!try_cmpxchg_release(&list->head, &head, node)) {

OK, these 2 _release above look understandable, they pair with
atomic_try_cmpxchg_acquire/try_cmpxchg_acquire in freelist_try_get().

> +			/*
> +			 * Hmm, the add failed, but we can only try again when
> +			 * the refcount goes back to zero.
> +			 */
> +			if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
> +				continue;

Do we really need _release ? Why can't atomic_fetch_add_relaxed() work?

> +static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
> +{
> +	struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
> +	unsigned int refs;
> +
> +	while (head) {
> +		prev = head;
> +		refs = atomic_read(&head->refs);
> +		if ((refs & REFS_MASK) == 0 ||
> +		    !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
> +			head = smp_load_acquire(&list->head);
> +			continue;
> +		}
> +
> +		/*
> +		 * Good, reference count has been incremented (it wasn't at
> +		 * zero), which means we can read the next and not worry about
> +		 * it changing between now and the time we do the CAS.
> +		 */
> +		next = READ_ONCE(head->next);
> +		if (try_cmpxchg_acquire(&list->head, &head, next)) {
> +			/*
> +			 * Yay, got the node. This means it was on the list,
> +			 * which means should-be-on-freelist must be false no
> +			 * matter the refcount (because nobody else knows it's
> +			 * been taken off yet, it can't have been put back on).
> +			 */
> +			WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
> +
> +			/*
> +			 * Decrease refcount twice, once for our ref, and once
> +			 * for the list's ref.
> +			 */
> +			atomic_fetch_add(-2, &head->refs);

Do we the barriers implied by _fetch_? Why can't atomic_sub(2, refs) work?

> +		/*
> +		 * OK, the head must have changed on us, but we still need to decrement
> +		 * the refcount we increased.
> +		 */
> +		refs = atomic_fetch_add(-1, &prev->refs);

Cosmetic, but why not atomic_fetch_dec() ?

Oleg.

next prev parent reply	other threads:[~2020-08-28 14:47 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-27 16:12 [RFC][PATCH 0/7] kprobes: Make kretprobes lockless Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 1/7] llist: Add nonatomic __llist_add() Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 2/7] sched: Fix try_invoke_on_locked_down_task() semantics Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 3/7] kprobes: Remove kretprobe hash Peter Zijlstra
2020-08-27 18:00   ` Masami Hiramatsu
2020-08-28  8:44     ` peterz
2020-08-28  9:07     ` Masami Hiramatsu
2020-08-28  4:44   ` Masami Hiramatsu
2020-08-28 13:11   ` Eddy_Wu
2020-08-28 13:38     ` peterz
2020-08-28 13:51     ` Masami Hiramatsu
2020-08-28 13:58       ` peterz
2020-08-28 14:19         ` Masami Hiramatsu
2020-08-28 14:11       ` Eddy_Wu
2020-08-28 14:19         ` peterz
2020-08-28 14:41           ` Masami Hiramatsu
2020-08-28 14:49     ` Masami Hiramatsu
2020-08-27 16:12 ` [RFC][PATCH 4/7] kprobe: Dont kfree() from breakpoint context Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 5/7] asm-generic/atomic: Add try_cmpxchg() fallbacks Peter Zijlstra
2020-08-27 16:12 ` [RFC][PATCH 6/7] freelist: Lock less freelist Peter Zijlstra
2020-08-27 16:37   ` peterz
     [not found]     ` <CAFCw3doX6KK5DwpG_OB331Mdw8uYeVqn8YPTjKh_a-m7ZB9+3A@mail.gmail.com>
2020-08-27 16:56       ` peterz
2020-08-27 17:00         ` Cameron
2020-08-27 19:08   ` Boqun Feng
2020-08-27 19:57     ` Cameron
2020-08-28  1:34       ` Boqun Feng
2020-08-28  4:03   ` Lai Jiangshan
2020-08-28 14:46   ` Oleg Nesterov [this message]
2020-08-28 15:29     ` peterz
2020-08-29  3:05       ` Cameron
2020-08-27 16:12 ` [RFC][PATCH 7/7] kprobes: Replace rp->free_instance with freelist Peter Zijlstra
2020-08-28  8:48   ` peterz
2020-08-28  9:13     ` Masami Hiramatsu
2020-08-28  9:18       ` peterz
2020-08-28 10:44         ` Masami Hiramatsu
2020-08-29  2:29         ` Cameron
2020-08-29  2:31           ` Cameron
2020-08-29  9:15             ` Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200828144650.GF28468@redhat.com \
    --to=oleg@redhat.com \
    --cc=Eddy_Wu@trendmicro.com \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=cameron@moodycamel.com \
    --cc=davem@davemloft.net \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.