From: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
To: Ian Munsie <imunsie@au1.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>, mikey <mikey@neuling.org>,
linuxppc-dev@lists.ozlabs.org,
Frederic Barrat <frederic.barrat@fr.ibm.com>,
Huy Nguyen <huyn@mellanox.com>
Subject: Re: [PATCH v2] cxl: Fix bug where AFU disable operation had no effect
Date: Fri, 1 Jul 2016 10:09:12 +0200 [thread overview]
Message-ID: <57762528.3040104@linux.vnet.ibm.com> (raw)
In-Reply-To: <1467305440-23492-1-git-send-email-imunsie@au.ibm.com>
It looks good to me. I can live with the extra 'clear' parameter,
because as you say, the approach I had suggested would require more
testing, so let's play safe for the time being.
And thanks for clarifying/fixing the deactivate paths.
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Fred
Le 30/06/2016 18:50, Ian Munsie a écrit :
> From: Ian Munsie <imunsie@au1.ibm.com>
>
> The AFU disable operation has a bug where it will not clear the enable
> bit and therefore will have no effect. To date this has likely been
> masked by fact that we perform an AFU reset before the disable, which
> also has the effect of clearing the enable bit, making the following
> disable operation effectively a noop on most hardware. This patch
> modifies the afu_control function to take a parameter to clear from the
> AFU control register so that the disable operation can clear the
> appropriate bit.
>
> This bug was uncovered on the Mellanox CX4, which uses an XSL rather
> than a PSL. On the XSL the reset operation will not complete while the
> AFU is enabled, meaning the enable bit was still set at the start of the
> disable and as a result this bug was hit and the disable also timed out.
>
> Because of this difference in behaviour between the PSL and XSL, this
> patch now makes the reset dependent on the card using a PSL to avoid
> waiting for a timeout on the XSL. It is entirely possible that we may be
> able to drop the reset altogether if it turns out we only ever needed it
> due to this bug - however I am not willing to drop it without further
> regression testing and have added comments to the code explaining the
> background.
>
> This also fixes a small issue where the AFU_Cntl register was read
> outside of the lock that protects it.
>
> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
> ---
> Changes since v1:
> - Modified comment to dedicated process disable path to explain
> the architected procedure, and the origin of our more heavy weight
> procedure.
> - Modified comment to afu directed disable path with a note that the
> procedure is AFU specific, and explaining the origin of our heavy
> weight prcedure.
> - Removed needs_reset_before_disable condition from dedicated process
> disable path. The XSL in the CX4 does not use this mode (AFU directed
> only), and since the reset in this procedure is architected we
> *should* never need to skip it.
>
> drivers/misc/cxl/cxl.h | 1 +
> drivers/misc/cxl/native.c | 58 +++++++++++++++++++++++++++++++++++++++++------
> drivers/misc/cxl/pci.c | 1 +
> 3 files changed, 53 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index ce2b9d5..bab8dfd 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -544,6 +544,7 @@ struct cxl_service_layer_ops {
> void (*write_timebase_ctrl)(struct cxl *adapter);
> u64 (*timebase_read)(struct cxl *adapter);
> int capi_mode;
> + bool needs_reset_before_disable;
> };
>
> struct cxl_native {
> diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
> index 120c468..e774505 100644
> --- a/drivers/misc/cxl/native.c
> +++ b/drivers/misc/cxl/native.c
> @@ -21,10 +21,10 @@
> #include "cxl.h"
> #include "trace.h"
>
> -static int afu_control(struct cxl_afu *afu, u64 command,
> +static int afu_control(struct cxl_afu *afu, u64 command, u64 clear,
> u64 result, u64 mask, bool enabled)
> {
> - u64 AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
> + u64 AFU_Cntl;
> unsigned long timeout = jiffies + (HZ * CXL_TIMEOUT);
> int rc = 0;
>
> @@ -33,7 +33,8 @@ static int afu_control(struct cxl_afu *afu, u64 command,
>
> trace_cxl_afu_ctrl(afu, command);
>
> - cxl_p2n_write(afu, CXL_AFU_Cntl_An, AFU_Cntl | command);
> + AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
> + cxl_p2n_write(afu, CXL_AFU_Cntl_An, (AFU_Cntl & ~clear) | command);
>
> AFU_Cntl = cxl_p2n_read(afu, CXL_AFU_Cntl_An);
> while ((AFU_Cntl & mask) != result) {
> @@ -67,7 +68,7 @@ static int afu_enable(struct cxl_afu *afu)
> {
> pr_devel("AFU enable request\n");
>
> - return afu_control(afu, CXL_AFU_Cntl_An_E,
> + return afu_control(afu, CXL_AFU_Cntl_An_E, 0,
> CXL_AFU_Cntl_An_ES_Enabled,
> CXL_AFU_Cntl_An_ES_MASK, true);
> }
> @@ -76,7 +77,8 @@ int cxl_afu_disable(struct cxl_afu *afu)
> {
> pr_devel("AFU disable request\n");
>
> - return afu_control(afu, 0, CXL_AFU_Cntl_An_ES_Disabled,
> + return afu_control(afu, 0, CXL_AFU_Cntl_An_E,
> + CXL_AFU_Cntl_An_ES_Disabled,
> CXL_AFU_Cntl_An_ES_MASK, false);
> }
>
> @@ -85,7 +87,7 @@ static int native_afu_reset(struct cxl_afu *afu)
> {
> pr_devel("AFU reset request\n");
>
> - return afu_control(afu, CXL_AFU_Cntl_An_RA,
> + return afu_control(afu, CXL_AFU_Cntl_An_RA, 0,
> CXL_AFU_Cntl_An_RS_Complete | CXL_AFU_Cntl_An_ES_Disabled,
> CXL_AFU_Cntl_An_RS_MASK | CXL_AFU_Cntl_An_ES_MASK,
> false);
> @@ -595,7 +597,33 @@ static int deactivate_afu_directed(struct cxl_afu *afu)
> cxl_sysfs_afu_m_remove(afu);
> cxl_chardev_afu_remove(afu);
>
> - cxl_ops->afu_reset(afu);
> + /*
> + * The CAIA section 2.2.1 indicates that the procedure for starting and
> + * stopping an AFU in AFU directed mode is AFU specific, which is not
> + * ideal since this code is generic and with one exception has no
> + * knowledge of the AFU. This is in contrast to the procedure for
> + * disabling a dedicated process AFU, which is documented to just
> + * require a reset. The architecture does indicate that both an AFU
> + * reset and an AFU disable should result in the AFU being disabled and
> + * we do both followed by a PSL purge for safety.
> + *
> + * Notably we used to have some issues with the disable sequence on PSL
> + * cards, which is why we ended up using this heavy weight procedure in
> + * the first place, however a bug was discovered that had rendered the
> + * disable operation ineffective, so it is conceivable that was the
> + * sole explanation for those difficulties. Careful regression testing
> + * is recommended if anyone attempts to remove or reorder these
> + * operations.
> + *
> + * The XSL on the Mellanox CX4 behaves a little differently from the
> + * PSL based cards and will time out an AFU reset if the AFU is still
> + * enabled. That card is special in that we do have a means to identify
> + * it from this code, so in that case we skip the reset and just use a
> + * disable/purge to avoid the timeout and corresponding noise in the
> + * kernel log.
> + */
> + if (afu->adapter->native->sl_ops->needs_reset_before_disable)
> + cxl_ops->afu_reset(afu);
> cxl_afu_disable(afu);
> cxl_psl_purge(afu);
>
> @@ -735,6 +763,22 @@ static int native_attach_process(struct cxl_context *ctx, bool kernel,
>
> static inline int detach_process_native_dedicated(struct cxl_context *ctx)
> {
> + /*
> + * The CAIA section 2.1.1 indicates that we need to do an AFU reset to
> + * stop the AFU in dedicated mode (we therefore do not make that
> + * optional like we do in the afu directed path). It does not indicate
> + * that we need to do an explicit disable (which should occur
> + * implicitly as part of the reset) or purge, but we do these as well
> + * to be on the safe side.
> + *
> + * Notably we used to have some issues with the disable sequence
> + * (before the sequence was spelled out in the architecture) which is
> + * why we were so heavy weight in the first place, however a bug was
> + * discovered that had rendered the disable operation ineffective, so
> + * it is conceivable that was the sole explanation for those
> + * difficulties. Point is, we should be careful and do some regression
> + * testing if we ever attempt to remove any part of this procedure.
> + */
> cxl_ops->afu_reset(ctx->afu);
> cxl_afu_disable(ctx->afu);
> cxl_psl_purge(ctx->afu);
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 58d7d821..b7f2e96 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1309,6 +1309,7 @@ static const struct cxl_service_layer_ops psl_ops = {
> .write_timebase_ctrl = write_timebase_ctrl_psl,
> .timebase_read = timebase_read_psl,
> .capi_mode = OPAL_PHB_CAPI_MODE_CAPI,
> + .needs_reset_before_disable = true,
> };
>
> static const struct cxl_service_layer_ops xsl_ops = {
>
next prev parent reply other threads:[~2016-07-01 8:09 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-29 18:51 [PATCH 1/2] cxl: Fix bug where AFU disable operation had no effect Ian Munsie
2016-06-29 18:51 ` [PATCH 2/2] cxl: Workaround XSL bug that does not clear the RA bit after a reset Ian Munsie
2016-06-30 11:05 ` Frederic Barrat
2016-07-11 10:19 ` [2/2] " Michael Ellerman
2016-06-30 14:19 ` [PATCH 1/2] cxl: Fix bug where AFU disable operation had no effect Frederic Barrat
2016-06-30 15:32 ` Ian Munsie
2016-06-30 15:50 ` Frederic Barrat
2016-06-30 16:45 ` Ian Munsie
2016-06-30 16:50 ` [PATCH v2] " Ian Munsie
2016-07-01 8:09 ` Frederic Barrat [this message]
2016-07-11 10:19 ` [v2] " Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57762528.3040104@linux.vnet.ibm.com \
--to=fbarrat@linux.vnet.ibm.com \
--cc=frederic.barrat@fr.ibm.com \
--cc=huyn@mellanox.com \
--cc=imunsie@au1.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mikey@neuling.org \
--cc=mpe@ellerman.id.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.