Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrea Parri <andrea.parri@amarulasolutions.com>
To: Roman Penyaev <rpenyaev@suse.de>
Cc: Davidlohr Bueso <dbueso@suse.de>, Jason Baron <jbaron@akamai.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention
Date: Thu, 13 Dec 2018 12:19:57 +0100	[thread overview]
Message-ID: <20181213111957.GA8459@andrea> (raw)
In-Reply-To: <a30306ad68b4b7171fbb587f5f845ea5@suse.de>

On Thu, Dec 13, 2018 at 11:13:58AM +0100, Roman Penyaev wrote:
> On 2018-12-12 18:13, Andrea Parri wrote:
> > On Wed, Dec 12, 2018 at 12:03:57PM +0100, Roman Penyaev wrote:
> 
> [...]
> 
> > > +static inline void list_add_tail_lockless(struct list_head *new,
> > > +					  struct list_head *head)
> > > +{
> > > +	struct list_head *prev;
> > > +
> > > +	new->next = head;
> > > +
> > > +	/*
> > > +	 * Initially ->next of a new element must be updated with the head
> > > +	 * (we are inserting to the tail) and only then pointers are
> > > atomically
> > > +	 * exchanged.  XCHG guarantees memory ordering, thus ->next should
> > > be
> > > +	 * updated before pointers are actually swapped.
> > > +	 */
> > > +
> > > +	prev = xchg(&head->prev, new);
> > > +
> > > +	/*
> > > +	 * It is safe to modify prev->next and new->prev, because a new
> > > element
> > > +	 * is added only to the tail and new->next is updated before XCHG.
> > > +	 */
> > 
> > IIUC, you're also relying on "some" ordering between the atomic load
> > of &head->prev above and the store to prev->next below: consider the
> > following snippet for two concurrent list_add_tail_lockless()'s:
> > 
> > {Initially: List := H -> A -> B}
> > 
> > CPU0					CPU1
> > 
> > list_add_tail_lockless(C, H):		list_add_tail_lockless(D, H):
> > 
> > C->next = H				D->next = H
> > prev = xchg(&H->prev, C) // =B		prev = xchg(&H->prev, D) // =C
> > B->next = C				C->next = D
> > C->prev = B				D->prev = C
> > 
> > Here, as annotated, CPU0's xchg() "wins" over CPU1's xchg() (i.e., the
> > latter reads the value of &H->prev that the former stored to that same
> > location).
> > 
> > As you noted above, the xchg() guarantees that CPU0's store to C->next
> > is "ordered before" CPU0's store to &H->prev.
> > 
> > But we also want CPU1's load from &H->prev to be ordered before CPU1's
> > store to C->next, which is also guaranteed by the xchg() (or, FWIW, by
> > the address dependency between these two memory accesses).
> > 
> > I do not see what could guarantee "C->next == D" in the end, otherwise.
> > 
> > What am I missing?
> 
> Hi Andrea,
> 
> xchg always acts as a full memory barrier, i.e. mfence in x86 terms.  So the
> following statement should be always true, otherwise nothing should work as
> the same code pattern is used in many generic places:
> 
>    CPU0               CPU1
> 
>  C->next = H
>  xchg(&ptr, C)
>                      C = xchg(&ptr, D)
>                      C->next = D
> 
> 
> This is the only guarantee we need, i.e. make it simplier:
> 
>    C->next = H
>    mfence            mfence
>                      C->next = D
> 
> the gurantee that two stores won't reorder.  Pattern is always the same: we
> prepare chunk of memory on CPU0 and do pointers xchg, CPU1 sees chunks of
> memory with all stores committed by CPU0 (regardless of CPU1 does loads
> or stores to this chunk).
> 
> I am repeating the same thing which you also noted, but I just want to be
> sure that I do not say nonsense.  So basically repeating to myself.
> 
> Ok, let's commit that.  Returning to your question: "I do not see what
> could guarantee "C->next == D" in the end"
> 
> At the end of what?  Lockless insert procedure (insert to tail) relies only
> on "head->prev".  This is the single "place" where we atomically exchange
> list elements and "somehow" chain them.  So insert needs only actual
> "head->prev", and xchg provides this guarantees to us.

When all the operations reported in the snippet have completed (i.e.,
executed and propagated to memory).

To rephrase my remark:

I am saying that we do need some ordering between the xchg() and the
program-order _subsequent stores, and implicitly suggesting to write
this down in the comment.  As I wrote, this ordering _is provided by
the xchg() itself or by the dependency; so, maybe, something like:

	/*
	 * [...]  XCHG guarantees memory ordering, thus new->next is
	 * updated before pointers are actually swapped and pointers
	 * are swapped before prev->next is updated.
	 */

Adding a snippet, say in the form you reported above, would not hurt
of course. ;-)

  Andrea


> 
> But there is also a user of the list, who needs to iterate over the list
> or to delete elements, etc, i.e. this user of the list needs list fully
> committed to the memory.  This user takes write_lock().  So answering your
> question (if I understood it correctly): at the end write_lock() guarantees
> that list won't be seen as corrupted and updates to the last element, i.e.
> "->next" or "->prev" pointers of the last element are committed and seen
> correctly.
> 
> --
> Roman
> 
> 
> 
> 
>

next prev parent reply	other threads:[~2018-12-13 11:20 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-12 11:03 [PATCH 0/3] use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
2018-12-12 11:03 ` [PATCH 1/3] epoll: make sure all elements in ready list are in FIFO order Roman Penyaev
2018-12-13 19:30   ` Davidlohr Bueso
2018-12-12 11:03 ` [PATCH 2/3] epoll: loosen irq safety in ep_poll_callback() Roman Penyaev
2018-12-12 11:03 ` [PATCH 3/3] epoll: use rwlock in order to reduce ep_poll_callback() contention Roman Penyaev
     [not found]   ` <20181212171348.GA12786@andrea>
2018-12-13 10:13     ` Roman Penyaev
2018-12-13 11:19       ` Andrea Parri [this message]
2018-12-13 12:19         ` Roman Penyaev
2018-12-13 18:13 ` [PATCH 0/3] " Davidlohr Bueso
2018-12-17 11:49   ` Roman Penyaev
2018-12-17 18:01     ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181213111957.GA8459@andrea \
    --to=andrea.parri@amarulasolutions.com \
    --cc=akpm@linux-foundation.org \
    --cc=dbueso@suse.de \
    --cc=jbaron@akamai.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rpenyaev@suse.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.