* [PATCH] cxl: Force context lock during EEH flow
@ 2017-02-03 6:57 Vaibhav Jain
2017-02-06 16:18 ` Frederic Barrat
0 siblings, 1 reply; 3+ messages in thread
From: Vaibhav Jain @ 2017-02-03 6:57 UTC (permalink / raw)
To: linuxppc-dev
Cc: Vaibhav Jain, frederic.barrat, Andrew Donnellan, Ian Munsie,
Christophe Lombard, Philippe Bergheaud, Gregory Kurz, stable
During an eeh event when the cxl card is fenced and card sysfs attr
perst_reloads_same_image is set following warning message is seen in the
kernel logs:
[ 60.622727] Adapter context unlocked with 0 active contexts
[ 60.622762] ------------[ cut here ]------------
[ 60.622771] WARNING: CPU: 12 PID: 627 at
../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
Even though this warning is harmless, it clutters the kernel log
during an eeh event. This warning is triggered as the EEH callback
cxl_pci_error_detected doesn't obtain a context-lock before forcibly
detaching all active context and when context-lock is released during
call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
cxl_adapter_context_unlock is triggered.
To fix this warning and also prevent activation of any context during
eeh the patch introduces a new function cxl_adapter_context_force_lock
that would forcefully acquire the context_lock and warn if any active
contexts exists when this happens. After the EEH flow concludes with
call to cxl_pci_resume the context-lock is released with a call to
cxl_adapter_context_unlock.
Cc: stable@vger.kernel.org
Fixes: 70b565bbdb91("cxl: Prevent adapter reset if an active context exists")
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
---
drivers/misc/cxl/cxl.h | 3 +++
drivers/misc/cxl/main.c | 10 ++++++++++
drivers/misc/cxl/pci.c | 18 ++++++++++++++++--
3 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
index b24d767..34d2ca0 100644
--- a/drivers/misc/cxl/cxl.h
+++ b/drivers/misc/cxl/cxl.h
@@ -970,4 +970,7 @@ int cxl_adapter_context_lock(struct cxl *adapter);
/* Unlock the contexts-lock if taken. Warn and force unlock otherwise */
void cxl_adapter_context_unlock(struct cxl *adapter);
+/* Force contexts-lock to be taken */
+void cxl_adapter_context_force_lock(struct cxl *adapter);
+
#endif
diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
index 62e0dfb..1754a0c 100644
--- a/drivers/misc/cxl/main.c
+++ b/drivers/misc/cxl/main.c
@@ -301,6 +301,16 @@ void cxl_adapter_context_put(struct cxl *adapter)
atomic_dec_if_positive(&adapter->contexts_num);
}
+void cxl_adapter_context_force_lock(struct cxl *adapter)
+{
+ int count = atomic_read(&adapter->contexts_num);
+
+ if (count > 0)
+ pr_warn("Forcing context lock with %d active contexts\n",
+ count);
+ atomic_set(&adapter->contexts_num, -1);
+}
+
int cxl_adapter_context_lock(struct cxl *adapter)
{
int rc;
diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 80a87ab..d76fd4d 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1487,8 +1487,6 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
if ((rc = cxl_native_register_psl_err_irq(adapter)))
goto err;
- /* Release the context lock as adapter is configured */
- cxl_adapter_context_unlock(adapter);
return 0;
err:
@@ -1587,6 +1585,9 @@ static struct cxl *cxl_pci_init_adapter(struct pci_dev *dev)
if ((rc = cxl_sysfs_adapter_add(adapter)))
goto err_put1;
+ /* Release the context lock as adapter is configured */
+ cxl_adapter_context_unlock(adapter);
+
return adapter;
err_put1:
@@ -1607,6 +1608,9 @@ static void cxl_pci_remove_adapter(struct cxl *adapter)
{
pr_devel("cxl_remove_adapter\n");
+ /* Forcibly take the adapter context lock */
+ cxl_adapter_context_force_lock(adapter);
+
cxl_sysfs_adapter_remove(adapter);
cxl_debugfs_adapter_remove(adapter);
@@ -1778,6 +1782,9 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
*/
schedule();
+ /* forcibly take the context lock to prevent new context activation */
+ cxl_adapter_context_force_lock(adapter);
+
/* If we're permanently dead, give up. */
if (state == pci_channel_io_perm_failure) {
/* Tell the AFU drivers; but we don't care what they
@@ -1879,11 +1886,15 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
/* Only continue if everyone agrees on NEED_RESET */
if (result != PCI_ERS_RESULT_NEED_RESET)
return result;
+ }
+ for (i = 0; i < adapter->slices; i++) {
+ afu = adapter->afu[i];
cxl_context_detach_all(afu);
cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
pci_deconfigure_afu(afu);
}
+
cxl_deconfigure_adapter(adapter);
return result;
@@ -1979,6 +1990,9 @@ static void cxl_pci_resume(struct pci_dev *pdev)
afu_dev->driver->err_handler->resume(afu_dev);
}
}
+
+ /* Unlock context activation for the adapter */
+ cxl_adapter_context_unlock(adapter);
}
static const struct pci_error_handlers cxl_err_handler = {
--
2.9.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] cxl: Force context lock during EEH flow
2017-02-03 6:57 [PATCH] cxl: Force context lock during EEH flow Vaibhav Jain
@ 2017-02-06 16:18 ` Frederic Barrat
2017-02-08 14:31 ` Vaibhav Jain
0 siblings, 1 reply; 3+ messages in thread
From: Frederic Barrat @ 2017-02-06 16:18 UTC (permalink / raw)
To: Vaibhav Jain, linuxppc-dev
Cc: Philippe Bergheaud, frederic.barrat, stable, Ian Munsie,
Andrew Donnellan, Gregory Kurz, Christophe Lombard
Le 03/02/2017 à 07:57, Vaibhav Jain a écrit :
> During an eeh event when the cxl card is fenced and card sysfs attr
> perst_reloads_same_image is set following warning message is seen in the
> kernel logs:
>
> [ 60.622727] Adapter context unlocked with 0 active contexts
> [ 60.622762] ------------[ cut here ]------------
> [ 60.622771] WARNING: CPU: 12 PID: 627 at
> ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
>
> Even though this warning is harmless, it clutters the kernel log
> during an eeh event. This warning is triggered as the EEH callback
> cxl_pci_error_detected doesn't obtain a context-lock before forcibly
> detaching all active context and when context-lock is released during
> call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
> cxl_adapter_context_unlock is triggered.
>
> To fix this warning and also prevent activation of any context during
> eeh the patch introduces a new function cxl_adapter_context_force_lock
> that would forcefully acquire the context_lock and warn if any active
> contexts exists when this happens. After the EEH flow concludes with
> call to cxl_pci_resume the context-lock is released with a call to
> cxl_adapter_context_unlock.
>
> Cc: stable@vger.kernel.org
> Fixes: 70b565bbdb91("cxl: Prevent adapter reset if an active context exists")
> Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
> ---
> drivers/misc/cxl/cxl.h | 3 +++
> drivers/misc/cxl/main.c | 10 ++++++++++
> drivers/misc/cxl/pci.c | 18 ++++++++++++++++--
> 3 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h
> index b24d767..34d2ca0 100644
> --- a/drivers/misc/cxl/cxl.h
> +++ b/drivers/misc/cxl/cxl.h
> @@ -970,4 +970,7 @@ int cxl_adapter_context_lock(struct cxl *adapter);
> /* Unlock the contexts-lock if taken. Warn and force unlock otherwise */
> void cxl_adapter_context_unlock(struct cxl *adapter);
>
> +/* Force contexts-lock to be taken */
> +void cxl_adapter_context_force_lock(struct cxl *adapter);
> +
> #endif
> diff --git a/drivers/misc/cxl/main.c b/drivers/misc/cxl/main.c
> index 62e0dfb..1754a0c 100644
> --- a/drivers/misc/cxl/main.c
> +++ b/drivers/misc/cxl/main.c
> @@ -301,6 +301,16 @@ void cxl_adapter_context_put(struct cxl *adapter)
> atomic_dec_if_positive(&adapter->contexts_num);
> }
>
> +void cxl_adapter_context_force_lock(struct cxl *adapter)
> +{
> + int count = atomic_read(&adapter->contexts_num);
> +
> + if (count > 0)
> + pr_warn("Forcing context lock with %d active contexts\n",
> + count);
> + atomic_set(&adapter->contexts_num, -1);
> +}
> +
> int cxl_adapter_context_lock(struct cxl *adapter)
> {
> int rc;
> diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
> index 80a87ab..d76fd4d 100644
> --- a/drivers/misc/cxl/pci.c
> +++ b/drivers/misc/cxl/pci.c
> @@ -1487,8 +1487,6 @@ static int cxl_configure_adapter(struct cxl *adapter, struct pci_dev *dev)
> if ((rc = cxl_native_register_psl_err_irq(adapter)))
> goto err;
>
> - /* Release the context lock as adapter is configured */
> - cxl_adapter_context_unlock(adapter);
> return 0;
>
> err:
> @@ -1587,6 +1585,9 @@ static struct cxl *cxl_pci_init_adapter(struct pci_dev *dev)
> if ((rc = cxl_sysfs_adapter_add(adapter)))
> goto err_put1;
>
> + /* Release the context lock as adapter is configured */
> + cxl_adapter_context_unlock(adapter);
> +
> return adapter;
>
> err_put1:
> @@ -1607,6 +1608,9 @@ static void cxl_pci_remove_adapter(struct cxl *adapter)
> {
> pr_devel("cxl_remove_adapter\n");
>
> + /* Forcibly take the adapter context lock */
> + cxl_adapter_context_force_lock(adapter);
> +
> cxl_sysfs_adapter_remove(adapter);
> cxl_debugfs_adapter_remove(adapter);
>
> @@ -1778,6 +1782,9 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> */
> schedule();
>
> + /* forcibly take the context lock to prevent new context activation */
> + cxl_adapter_context_force_lock(adapter);
> +
I'm not a fan of the "force lock" call and I'm not convinced it's
needed. When an EEH is triggered, we'll force-detach all the contexts.
So that in itself should bring the counter of active contexts to 0. So
why not just add a cxl_adapter_context_lock() call at the end of
cxl_pci_error_detected()? The lock would be released in
cxl_configure_adapter when cxl_pci_slot_reset() is called, as is already
the case with the current code.
(as a side note, I don't believe the code is racy and that we need to
add extra protection to avoid context activation while
cxl_pci_error_detected() is called. We can talk about it if needed)
Fred
> /* If we're permanently dead, give up. */
> if (state == pci_channel_io_perm_failure) {
> /* Tell the AFU drivers; but we don't care what they
> @@ -1879,11 +1886,15 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev,
> /* Only continue if everyone agrees on NEED_RESET */
> if (result != PCI_ERS_RESULT_NEED_RESET)
> return result;
> + }
>
> + for (i = 0; i < adapter->slices; i++) {
> + afu = adapter->afu[i];
> cxl_context_detach_all(afu);
> cxl_ops->afu_deactivate_mode(afu, afu->current_mode);
> pci_deconfigure_afu(afu);
> }
> +
> cxl_deconfigure_adapter(adapter);
>
> return result;
> @@ -1979,6 +1990,9 @@ static void cxl_pci_resume(struct pci_dev *pdev)
> afu_dev->driver->err_handler->resume(afu_dev);
> }
> }
> +
> + /* Unlock context activation for the adapter */
> + cxl_adapter_context_unlock(adapter);
> }
>
> static const struct pci_error_handlers cxl_err_handler = {
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] cxl: Force context lock during EEH flow
2017-02-06 16:18 ` Frederic Barrat
@ 2017-02-08 14:31 ` Vaibhav Jain
0 siblings, 0 replies; 3+ messages in thread
From: Vaibhav Jain @ 2017-02-08 14:31 UTC (permalink / raw)
To: Frederic Barrat, linuxppc-dev
Cc: Philippe Bergheaud, frederic.barrat, stable, Ian Munsie,
Andrew Donnellan, Gregory Kurz, Christophe Lombard
Frederic Barrat <fbarrat@linux.vnet.ibm.com> writes:
>>
>> + /* forcibly take the context lock to prevent new context activation */
>> + cxl_adapter_context_force_lock(adapter);
>> +
>
> I'm not a fan of the "force lock" call and I'm not convinced it's
> needed. When an EEH is triggered, we'll force-detach all the contexts.
> So that in itself should bring the counter of active contexts to 0. So
> why not just add a cxl_adapter_context_lock() call at the end of
> cxl_pci_error_detected()? The lock would be released in
> cxl_configure_adapter when cxl_pci_slot_reset() is called, as is already
> the case with the current code.
> (as a side note, I don't believe the code is racy and that we need to
> add extra protection to avoid context activation while
> cxl_pci_error_detected() is called. We can talk about it if needed)
Hi Fred,
Thanks for reviewing the patch.
Function cxl_adapter_context_force_lock is introduced to handle cases
where in we don't want any new contexts to be activated while we go ahead
and detach all the currently activated contexts. This usually is the
case in error paths when we are performing cleanups of existing contexts
(control flows with cxl_context_detach_all).
This is in contrast with cxl_adapter_context_lock which need to wait for
all the active contexts to be detached and during the wait other
contexts can be activated. The _force_lock variant provides guarantees
against this by not letting any context to be activated and still
letting existing contexts be cleaned up.
This behavior though can also be achieved by having a separate flag in
the adapter-struct and checking it before letting a context
activate. However with cxl_adapter_context_force_lock I tried to re-use
the existing infrastructure we have already put in place to prevent
context activation while we are doing card reset.
In retrospect this is a more generic issue that we may wish to address
in cxl where in we want to prevent certain ops like context activation
or reading config space and also let existing consumers of these ops to
complete. It would be also better if other contributers can also pitch
in and provide there feedback as to possibly solve this problem in a
more generic manner.
Thanks,
--
Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Linux Technology Center, IBM India Pvt. Ltd.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-02-08 14:31 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-03 6:57 [PATCH] cxl: Force context lock during EEH flow Vaibhav Jain
2017-02-06 16:18 ` Frederic Barrat
2017-02-08 14:31 ` Vaibhav Jain
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).