netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: David Laight <David.Laight@ACULAB.COM>,
	Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	Sinan Kaya <okaya@codeaurora.org>, Arnd Bergmann <arnd@arndb.de>,
	Jason Gunthorpe <jgg@ziepe.ca>, Oliver <oohall@gmail.com>,
	"open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)"
	<linuxppc-dev@lists.ozlabs.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: RFC on writel and writel_relaxed
Date: Thu, 29 Mar 2018 02:12:30 +1100	[thread overview]
Message-ID: <1522249950.21446.23.camel@kernel.crashing.org> (raw)
In-Reply-To: <a642dc060b484d98bc728eb1f4b5c600@AcuMS.aculab.com>

On Wed, 2018-03-28 at 11:30 +0000, David Laight wrote:
> From: Benjamin Herrenschmidt
> > Sent: 28 March 2018 10:56
> 
> ...
> > For example, let's say I have a device with a reset bit and the spec
> > says the reset bit needs to be set for at least 10us.
> > 
> > This is wrong:
> > 
> > 	writel(1, RESET_REG);
> > 	usleep(10);
> > 	writel(0, RESET_REG);
> > 
> > Because of write posting, the first write might arrive to the device
> > right before the second one.
> > 
> > The typical "fix" is to turn that into:
> > 
> > 	writel(1, RESET_REG);
> > 	readl(RESET_REG); /* Flush posted writes */
> 
> Would a writel(1, RESET_REG) here provide enough synchronsiation?

Probably yes. It's one of those things where you try to deal with the
fact that 90% of driver writers barely understand the basic stuff and
so you need the "default" accessors to be hardened as much as possible.

We still need to get a reasonably definition of the semantics of the
relaxed ones vs. WC memory but let's get through that exercise first
and hopefully for the last time.
 
> > 	usleep(10);
> > 	writel(0, RESET_REG);
> > 
> > *However* the issue here, at least on power, is that the CPU can issue
> > that readl but doesn't necessarily wait for it to complete (ie, the
> > data to return), before proceeding to the usleep. Now a usleep contains
> > a bunch of loads and stores and is probably fine, but a udelay which
> > just loops on the timebase may not be.
> > 
> > Thus we may still violate the timing requirement.
> 
> I've seem that sort of code (with udelay() and no read back) quite often.
> How many were in linux I don't know.
> 
> For small delays I usually fix it by repeated writes (of the same value)
> to the device register. That can guarantee very short intervals.

As long as you know the bus frequency...

> The only time I've actually seen buffered writes break timing was
> between a 286 and an 8859 interrupt controller.

:-)

The problem for me is not so much what I've seen, I realize that most
of the issues we are talking about are the kind that will hit once in a
thousand times or less.

But we *can* reason about them in a way that can effectively prevent
the problem completely and when your cluster has 10000 machine, 1/1000
starts becoming significant.

These days the vast majority of IO devices either are 99% DMA driven so
that a bit of overhead on MMIOs is irrelevant, or have one fast path
(low latency IB etc...) that needs some finer control, and the rest is
all setup which can be paranoid at will.

So I think we should aim for the "default" accessors most people use to
be as hadened as we can think of. I favor correctness over performance
in all cases. But then we also define a reasonable semantic for the
relaxed ones (well, we sort-of do have one, we might have to make it a
bit more precise in some areas) that allows the few MMIO fast path that
care to be optimized.

> If you wrote to the mask then enabled interrupts the first IACK cycle
> could be too close to write and break the cycle recovery time.
> That clobbered most of the interrupt controller registers.
> That probably affected every 286 board ever built!
> Not sure how much software added the required extra bus cycle.
> 
> > What we did inside readl, with the twi;isync sequence (which basically
> > means, trap on return value with "trap never" as a condition, followed
> > by isync that ensures all excpetion conditions are resolved), is force
> > the CPU to "consume" the data from the read before moving on.
> > 
> > This effectively makes readl fully synchronous (we would probably avoid
> > that if we were to implement a readl_relaxed).
> 
> I've always wondered exactly what the twi;isync were for - always seemed
> very heavy handed for most mmio reads.
> Particularly if you are doing mmio reads from a data fifo.

If you do that you should use the "s" version of the accessors. Those
will only do the above trick at the end of the access series. Also a
FIFO needs special care about endianness anyway, so you should use
those accessors regardless. (Hint: you never endian swap a FIFO even on
BE on a LE device, unless something's been wired very badly in HW).

> Perhaps there should be a writel_wait() that is allowed to do a read back
> for such code paths?

I think what we have is fine, we just define that the standard
writel/readl as fairly simple and hardened, and we look at providing a
somewhat reasonable set of relaxed variants for optimizing fast path.
We pretty much already are there, we just need to be better at defining
the semantics.

And for the super high perf case, which thankfully is either seldom
(server high perf network stuff) or very arch specific (ARM SoC stuff),
then arch specific driver hacks will always remain the norm.

Cheers,
Ben. 

> 	David
> 

  reply	other threads:[~2018-03-28 15:41 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1521692689.16434.293.camel@kernel.crashing.org>
     [not found] ` <CAOSf1CFv1HHCL3YOVhRn2U=grdNjaQ=A4m3xwxN2Rek1-_TySg@mail.gmail.com>
     [not found]   ` <1521726722.16434.312.camel@kernel.crashing.org>
     [not found]     ` <20180323163510.GC13033@ziepe.ca>
     [not found]       ` <1521854626.16434.359.camel@kernel.crashing.org>
     [not found]         ` <58ce5b83f40f4775bec1be8db66adb0d@AcuMS.aculab.com>
     [not found]           ` <20180326165425.GA15554@ziepe.ca>
     [not found]             ` <CAK8P3a1zeMyj+Z-y4ER4moY6Zip9EWNOinf+VnboGOrgiwbBZA@mail.gmail.com>
     [not found]               ` <20180326202545.GB15554@ziepe.ca>
     [not found]                 ` <CAK8P3a3fc43ZcW626hmsd3DVcLw7hGkdUMxp7s4Rn3mdkziwMQ@mail.gmail.com>
     [not found]                   ` <20180326210951.GD15554@ziepe.ca>
     [not found]                     ` <CAK8P3a2UU1xAM0NLo7Q4-Xgo1SzY3De1uqpFudr+2ZW7nHEPmA@mail.gmail.com>
     [not found]                       ` <1522101616.7364.13.camel@kernel.crashing.org>
2018-03-27 14:46                         ` RFC on writel and writel_relaxed Sinan Kaya
2018-03-27 15:01                           ` Jose Abreu
2018-03-27 15:10                           ` Will Deacon
2018-03-27 18:54                             ` Alexander Duyck
2018-03-27 19:54                               ` Arnd Bergmann
2018-03-27 20:46                                 ` Arnd Bergmann
2018-03-27 21:33                               ` Benjamin Herrenschmidt
2018-03-28  0:39                                 ` Linus Torvalds
2018-03-28  1:03                                   ` Benjamin Herrenschmidt
2018-03-28  2:51                                     ` Linus Torvalds
2018-03-28  3:24                                       ` Sinan Kaya
2018-03-28  4:41                                         ` Benjamin Herrenschmidt
2018-03-28  6:14                                         ` Linus Torvalds
2018-03-28 11:41                                           ` okaya
2018-03-28 15:13                                             ` Benjamin Herrenschmidt
2018-03-28 15:55                                               ` David Miller
2018-03-28 16:23                                                 ` Nicholas Piggin
2018-03-28 21:31                                                   ` Benjamin Herrenschmidt
2018-03-28 22:09                                                     ` Nicholas Piggin
2018-03-29  9:20                                                     ` Will Deacon
2018-03-29 13:56                                                 ` Sinan Kaya
2018-03-29 14:04                                                   ` David Miller
2018-03-29 16:29                                                   ` Arnd Bergmann
2018-03-29 16:59                                                     ` Sinan Kaya
2018-03-30  1:40                                                   ` Benjamin Herrenschmidt
2018-04-02 13:01                                                     ` Sinan Kaya
2018-03-28  4:33                                       ` Benjamin Herrenschmidt
2018-03-28  6:26                                         ` Linus Torvalds
2018-03-28  6:42                                           ` Benjamin Herrenschmidt
2018-03-28  6:53                                             ` Linus Torvalds
2018-03-28  6:56                                               ` Benjamin Herrenschmidt
2018-03-28  7:11                                                 ` Arnd Bergmann
2018-03-28  7:42                                                   ` Benjamin Herrenschmidt
2018-03-28  9:07                                             ` Will Deacon
2018-03-28  9:56                                               ` Benjamin Herrenschmidt
2018-03-28 10:13                                                 ` Aw: " Lino Sanfilippo
2018-03-28 10:20                                                   ` Benjamin Herrenschmidt
2018-03-28 11:30                                                 ` David Laight
2018-03-28 15:12                                                   ` Benjamin Herrenschmidt [this message]
2018-03-28 16:16                                                     ` David Laight
2018-03-28  1:21                             ` Benjamin Herrenschmidt
2018-03-27 21:35                           ` Benjamin Herrenschmidt
2018-03-27 21:54 Alexander Duyck
2018-03-27 22:35 ` Sinan Kaya
2018-03-27 23:43 ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1522249950.21446.23.camel@kernel.crashing.org \
    --to=benh@kernel.crashing.org \
    --cc=David.Laight@ACULAB.COM \
    --cc=alexander.duyck@gmail.com \
    --cc=arnd@arndb.de \
    --cc=jgg@ziepe.ca \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=netdev@vger.kernel.org \
    --cc=okaya@codeaurora.org \
    --cc=oohall@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).