linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Ganesh G R <ganeshgr@linux.ibm.com>
To: Michael Ellerman <mpe@ellerman.id.au>, linuxppc-dev@lists.ozlabs.org
Cc: Sahitya.Damerla@ibm.com, mahesh@linux.ibm.com
Subject: Re: [PATCH] powerpc/eeh: Permanently disable the removed device
Date: Mon, 15 Apr 2024 12:19:32 +0530	[thread overview]
Message-ID: <73516a29-a172-4b47-8f69-3e40ffacd47c@linux.ibm.com> (raw)
In-Reply-To: <87v84qaopi.fsf@mail.lhotse>

On 4/9/24 14:37, Michael Ellerman wrote:

> Hi Ganesh,
>
> Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
>> When a device is hot removed on powernv, the hotplug
>> driver clears the device's state. However, on pseries,
>> if a device is removed by phyp after reaching the error
>> threshold, the kernel remains unaware, leading to the
>> device not being torn down. This prevents necessary
>> remediation actions like failover.
>>
>> Permanently disable the device if the presence check
>> fails.
> You can wrap your changelogs a bit wider, 70 or 80 columns is fine.

ok

>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> index ab316e155ea9..8d1606406d3f 100644
>> --- a/arch/powerpc/kernel/eeh.c
>> +++ b/arch/powerpc/kernel/eeh.c
>> @@ -508,7 +508,9 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
>>   	 * state, PE is in good state.
>>   	 */
>>   	if ((ret < 0) ||
>> -	    (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) {
>> +	    (ret == EEH_STATE_NOT_SUPPORT &&
>> +	     dev->error_state == pci_channel_io_perm_failure) ||
>> +	    eeh_state_active(ret)) {
>>   		eeh_stats.false_positives++;
>>   		pe->false_positives++;
>>   		rc = 0;
> How does this hunk relate the changelog?
>
> This is adding an extra condition to the false positive check, so
> there's a risk this causes devices to go into failure when previously
> they didn't, right? So please explain why it's a good change. The
> comment above the if needs updating too.

We need this change to log the event and get the device removed, I will explain this
in commit message.

>> diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
>> index 48773d2d9be3..10317badf471 100644
>> --- a/arch/powerpc/kernel/eeh_driver.c
>> +++ b/arch/powerpc/kernel/eeh_driver.c
>> @@ -867,7 +867,13 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
>>   	if (!devices) {
>>   		pr_debug("EEH: Frozen PHB#%x-PE#%x is empty!\n",
>>   			pe->phb->global_number, pe->addr);
>> -		goto out; /* nothing to recover */
> The other cases that go to recover_failed usually print something at
> warn level, so this probably should too. So either make the above a
> pr_warn(), or change it to a warn with a more helpful message.

ok

>> +		/*
>> +		 * The device is removed, Tear down its state,
>> +		 * On powernv hotplug driver would take care of
>> +		 * it but not on pseries, Permanently disable the
>> +		 * card as it is hot removed.
>> +		 */
> Formatting and punctuation is weird. It can be wider, and capital letter
> is only required after a full stop, not a comma.

ok, i will take care of it.

> Also you say that the powernv hotplug driver "would" take care of it,
> that's past tense, is that what you mean? Does the powernv hotplug
> driver still take care of it after this change? And (how) does that
> driver cope with it happening here also?

Yes, hotplug driver can still remove the device and the removal of
device is covered by pci rescan lock.

>> +		goto recover_failed;
>>   	}
>>   	
> cheers

      reply	other threads:[~2024-04-15  6:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-05 13:14 [PATCH] powerpc/eeh: Permanently disable the removed device Ganesh Goudar
2024-04-09  9:07 ` Michael Ellerman
2024-04-15  6:49   ` Ganesh G R [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=73516a29-a172-4b47-8f69-3e40ffacd47c@linux.ibm.com \
    --to=ganeshgr@linux.ibm.com \
    --cc=Sahitya.Damerla@ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mahesh@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).