linuxppc-dev.lists.ozlabs.org archive mirror
From: Nicholas Piggin <npiggin@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-arch <linux-arch@vger.kernel.org>,
	Avi Kivity <avi@scylladb.com>,
	maged michael <maged.michael@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Dave Watson <davejwatson@fb.com>,
	Will Deacon <will.deacon@arm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Andrew Hunter <ahh@google.com>, Paul Mackerras <paulus@samba.org>,
	Andy Lutomirski <luto@kernel.org>,
	Alan Stern <stern@rowland.harvard.edu>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	gromer <gromer@google.com>
Subject: Re: [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command
Date: Fri, 29 Sep 2017 21:38:53 +1000	[thread overview]
Message-ID: <20170929213853.46c4675b@roar.ozlabs.ibm.com> (raw)
In-Reply-To: <20170929103131.un7tzxsixjoretal@hirez.programming.kicks-ass.net>

On Fri, 29 Sep 2017 12:31:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Sep 29, 2017 at 02:27:57AM +1000, Nicholas Piggin wrote:
> 
> > The biggest power boxes are more tightly coupled than those big
> > SGI systems, but even so just plodding along taking and releasing
> > locks in turn would be fine on those SGI ones as well really. Not DoS
> > level. This is not a single mega hot cache line or lock that is
> > bouncing over the entire machine, but one process grabbing a line and
> > lock from each of 1000 CPUs.
> > 
> > Slight disturbance sure, but each individual CPU will see it as 1/1000th
> > of a disturbance, most of the cost will be concentrated in the syscall
> > caller.  
> 
> But once the:
> 
> 	while (1)
> 		sys_membarrier()
> 
> thread has all those (lock) lines in M state locally, it will become
> very hard for the remote CPUs to claim them back, because it's constantly

Not really. There is some ability to hold onto a line for a time, but
there is no way to starve them, let alone starve hundreds of other
CPUs. They will request the cacheline exclusive and eventually get it;
then the membarrier CPU has to pay to get it back. If there is a lot of
activity on the locks, the membarrier caller will have a difficult time
taking each one.

I'm not saying there is zero cost or that it can't interfere with
others, only that it does not seem particularly bad compared with other
things. Once you restrict it to mm_cpumask, it's quite partitionable.

I would really prefer to go this way on powerpc first. We could add
the registration APIs as basically no-ops, which would allow the
locking approach to be changed later if we find it causes issues. I'll
try to find some time and a big system when I can.

> touching them. Sure it will touch a 1000 other lines before its back to
> this one, but if they're all local that's fairly quick.
> 
> But you're right, your big machines have far smaller NUMA factors.
> 
> > > Bouncing that lock across the machine is *painful*, I have vague
> > > memories of cases where the lock ping-pong was most of the time spent.
> > > 
> > > But only Power needs this, all the other architectures are fine with the
> > > lockless approach for MEMBAR_EXPEDITED_PRIVATE.  
> > 
> > Yes, we can add an iterator function that power can override in a few
> > lines. Less arch specific code than this proposal.  
> 
> A semi-related issue; I suppose we can do an arch upcall to flush_tlb_mm
> and reset the mm_cpumask when we change cpuset groups.

For powerpc we have been looking at how mm_cpumask can be improved.
It has real drawbacks even when you don't consider this new syscall.

Thanks,
Nick

Thread overview: 12+ messages
     [not found] <20170926175151.14264-1-mathieu.desnoyers@efficios.com>
2017-09-26 20:43 ` [PATCH v4 for 4.14 1/3] membarrier: Provide register expedited private command Mathieu Desnoyers
2017-09-27 13:04   ` Nicholas Piggin
2017-09-28 13:31     ` Mathieu Desnoyers
2017-09-28 15:01       ` Nicholas Piggin
2017-09-28 15:29         ` Mathieu Desnoyers
2017-09-28 16:16           ` Nicholas Piggin
2017-09-28 18:28             ` Mathieu Desnoyers
2017-09-28 15:51         ` Peter Zijlstra
2017-09-28 16:27           ` Nicholas Piggin
2017-09-29 10:31             ` Peter Zijlstra
2017-09-29 11:38               ` Nicholas Piggin [this message]
2017-09-29 11:45                 ` Peter Zijlstra
