All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Jeffery <andrew@aj.id.au>
To: Eddie James <eajames@linux.vnet.ibm.com>, openbmc@lists.ozlabs.org
Cc: "Edward A. James" <eajames@us.ibm.com>, cbostic@linux.vnet.ibm.com
Subject: Re: [PATCH linux dev-4.10 v2] drivers: i2c: fsi: Add proper abort method
Date: Wed, 25 Oct 2017 11:33:37 +1030	[thread overview]
Message-ID: <1508893417.13477.1.camel@aj.id.au> (raw)
In-Reply-To: <de884d77-04a2-1095-8198-89f2ce93e029@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 7347 bytes --]

> > > +static int fsi_i2c_reset(struct fsi_i2c_master *i2c, u16 port)
> > > +{
> > 
> > I can't see that this (the whole reset sequence) is called with pre-emption
> > disabled, perhaps we should do so? It looks like this will be called in process
> > context through fsi_i2c_xfer(). Stalling the recovery will just hold up other
> > processes wanting to use the bus as we'll hold the bus mutex across the task
> > switch.
> 
> Well, presumably we want to block other access to the bus? We need to 
> get the master back in a good state before releasing it for other transfers.

Right, that's my point. By disabling pre-emption we do that as fast as
possible by avoiding switching to other process contexts whilst holding the
mutex. This would happen at the cost of processes that don't care about the
bus (those that do are, or will, be waiting on the mutex anyway), but as we're
performing a hardware recovery sequence I feel it should be prioritised. Only
the context owning the mutex can recover the bus, therefore I think it's more
useful to avoid switching to other process contexts until we finish the
recovery.

I guess we need to think about system robustness in that case though. Can we
get stuck in this recovery sequence? fsi_*() calls time out if I recall
correctly, so the answer would appear to be no, in which case this shouldn't be
problematic. The system might be "sticky" if we start hitting FSI timeouts but
beyond that things should continue (aside from either FSI or the I2C bus being
broken).

So this is all coming from my gut feeling that things like bus recovery would
usually happen under a spinlock, which disables pre-emption. Maybe we should
discuss that before getting down in the weeds like we are.

> 
> > 
> > > +	int rc;
> > > +	u32 mode, stat, dummy = 0;
> > > +
> > > +	/* reset engine */
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* re-init engine */
> > > +	rc = fsi_i2c_dev_init(i2c);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_MODE, &mode);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* set port; default after reset is 0 */
> > > +	if (port) {
> > > +		mode = SETFIELD(I2C_MODE_PORT, mode, port);
> > > +		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_MODE, &mode);
> > > +		if (rc)
> > > +			return rc;
> > > +	}
> > > +
> > > +	/* reset busy register; hw workaround */
> > > +	dummy = I2C_PORT_BUSY_RESET;
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_PORT_BUSY, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* force bus reset */
> > > +	rc = fsi_i2c_reset_bus(i2c);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* reset errors */
> > > +	dummy = 0;
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* wait for command complete */
> > > +	set_current_state(TASK_UNINTERRUPTIBLE);
> > > +	schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT);
> > > +
> > > +	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_STAT, &stat);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	if (stat & I2C_STAT_CMD_COMP)
> > > +		return 0;
> > > +
> > > +	/* failed to get command complete; reset engine again */
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* re-init engine again */
> > > +	rc = fsi_i2c_dev_init(i2c);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int fsi_i2c_abort(struct fsi_i2c_port *port, u32 status)
> > > +{
> > > +	int rc;
> > > +	unsigned long start;
> > > +	u32 cmd = I2C_CMD_WITH_STOP;
> > > +	struct fsi_device *fsi = port->master->fsi;
> > > +
> > > +	rc = fsi_i2c_reset(port->master, port->port);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* skip final stop command for these errors */
> > > +	if (status & (I2C_STAT_PARITY | I2C_STAT_LOST_ARB | I2C_STAT_STOP_ERR))
> > > +		return 0;
> > > +
> > > +	rc = fsi_i2c_write_reg(fsi, I2C_FSI_CMD, &cmd);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	start = jiffies;
> > > +
> > > +	do {
> > > +		rc = fsi_i2c_read_reg(fsi, I2C_FSI_STAT, &status);
> > > +		if (rc)
> > > +			return rc;
> > > +
> > > +		if (status & I2C_STAT_CMD_COMP)
> > > +			return 0;
> > > +
> > > +		set_current_state(TASK_INTERRUPTIBLE);
> > > +		if (schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT) > 0)
> > > +			return -EINTR;
> > > +
> > > +	} while (time_after(start + I2C_ABORT_TIMEOUT, jiffies));
> > 
> > I'm a bit ignorant here, but why do we expect the abort operation to take up to
> > 100ms? You mentioned you invented the numbers, but it's not clear what the
> > justification is.
> 
> I used the same timeout used in the FSP driver, which seems to work. I 
> really have no other justification for this value.

That's probably worth a comment in the code.

> 
> > 
> > > +
> > > +	return -ETIME;
> > > +}
> > > +
> > >   static int fsi_i2c_handle_status(struct fsi_i2c_port *port,
> > >   				 struct i2c_msg *msg, u32 status)
> > >   {
> > >   	int rc;
> > >   	u8 fifo_count;
> > > -	struct fsi_i2c_master *i2c = port->master;
> > > -	u32 dummy = 0;
> > >   
> > >   	if (status & I2C_STAT_ERR) {
> > > -		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
> > > +		rc = fsi_i2c_abort(port, status);
> > >   		if (rc)
> > >   			return rc;
> > >   
> > > +		if (status & I2C_STAT_INV_CMD)
> > > +			return -EINVAL;
> > > +
> > > +		if (status & (I2C_STAT_PARITY | I2C_STAT_BE_OVERRUN |
> > > +		    I2C_STAT_BE_ACCESS))
> > > +			return -EPROTO;
> > > +
> > >   		if (status & I2C_STAT_NACK)
> > >   			return -EFAULT;
> > 
> > EFAULT is "Bad address", but slaves can NACK things that aren't addresses.
> > As an example i2c-aspeed calls a NACK an EIO.
> 
> True, but I dislike returning EIO as it's somewhat of a default option 
> and makes it very difficult to determine what actually went wrong. How 
> about ENXIO?

What does userspace do with that knowledge? Is it worth differentiating? I
can't immediately see that it is. Maybe it's worth a dev_dbg() or something?
I feel ENXIO doesn't address my point either, as it means "No such device or
address", which is still inappropriate for data NACKs.

> 
> > 
> > >   
> > > +		if (status & I2C_STAT_LOST_ARB)
> > > +			return -ECANCELED;
> > 
> > So checking around the tree, all of i2c-aspeed, i2c-designware-core,
> > i2c-hix5hd2, i2c-kempld to name a few (I stopped checking at that point) use
> > -EAGAIN for arbitration loss. I don't think -ECANCELLED is appropriate, and
> > certainly think that -EAGAIN is what we want: Nothing went wrong aside from
> > this master lost the arbitration race, so the caller should really just retry.
> 
> Good point, however, for this master, it can often return arbitration 
> lost if the clock or data line is stuck, and retries will not work. I'll 
> switch it to EAGAIN, callers shouldn't try too many times hopefully...

Can we test status again when a new transfer is starting to see if we need to
recover the bus before proceeding?

Cheers,

Andrew

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

  reply	other threads:[~2017-10-25  1:03 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-20 20:02 [PATCH linux dev-4.10 v2] drivers: i2c: fsi: Add proper abort method Eddie James
2017-10-20 20:30 ` Christopher Bostic
2017-10-23  0:31 ` Andrew Jeffery
2017-10-24 15:39   ` Eddie James
2017-10-25  1:03     ` Andrew Jeffery [this message]
2017-10-27 17:10       ` Eddie James
2017-10-27 20:41         ` Andrew Jeffery
2017-10-30 16:27           ` Eddie James

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1508893417.13477.1.camel@aj.id.au \
    --to=andrew@aj.id.au \
    --cc=cbostic@linux.vnet.ibm.com \
    --cc=eajames@linux.vnet.ibm.com \
    --cc=eajames@us.ibm.com \
    --cc=openbmc@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.