linuxppc-dev.lists.ozlabs.org archive mirror
From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Russell Currey <ruscur@russell.cc>
Cc: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Frederic Barrat <fbarrat@linux.vnet.ibm.com>,
	Andrew Donnellan <andrew.donnellan@au1.ibm.com>,
	Ian Munsie <imunsie@au1.ibm.com>,
	Christophe Lombard <christophe_lombard@fr.ibm.com>,
	Philippe Bergheaud <philippe.bergheaud@fr.ibm.com>,
	Greg Kurz <gkurz@linux.vnet.ibm.com>,
	Gavin Shan <gwshan@linux.vnet.ibm.com>
Subject: Re: [RESEND-RFC v2 2/3] powerpc/eeh: Introduce function eeh_pe_reset_freeze_counter()
Date: Fri, 3 Mar 2017 16:45:14 +1100
Message-ID: <20170303054514.GA29434@gwshan>
In-Reply-To: <1488515705.6003.1.camel@russell.cc>

On Fri, Mar 03, 2017 at 03:35:05PM +1100, Russell Currey wrote:
>On Fri, 2017-03-03 at 09:51 +0530, Vaibhav Jain wrote:
>> Hi Russell,
>> 
>> Vaibhav Jain <vaibhav@linux.vnet.ibm.com> writes:
>> 
>> > This patch introduces function eeh_pe_reset_freeze_counter() which can
>> > be used to reset the PE's freeze count variable outside eeh code. This
>> > is useful for devices that can acquire a different personality after
>> > a PERST event (e.g. FPGA adapters). Presently an existing freeze
>> > count for an adapter with personality N will be taken into account
>> > when the adapter acquires personality N+1.
>> > 
>> > By calling eeh_pe_reset_freeze_counter() drivers can reset the freeze
>> > counter for an adapter once it has acquired a new personality and
>> > ideally won't be plagued by failures similar to the earlier ones.
>> > 
>> > Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
>> > ---
>> 
>> Had a short chat with Gavin Shan about this patchset, and he
>> prefers restoring the freeze_count on the eeh_pe once FRESET is done.
>> He expects the flow to be similar to the one below:
>> 
>> 1. module caches the value of freeze_count and resets it
>> 2. Issue warm reset
>> 3. During eeh error-detected callback module restores the freeze_count
>> from the cached value.
>> 
>> Russell, what do you think? 
>> 
>I thought about this but figured it didn't really make sense from a CAPI
>perspective.  If you're flashing the device, it is going to behave differently
>than before it was flashed, and it should be treated differently as a result
>(and thus restoring the freeze_count doesn't make much sense).
>

Nothing has changed on the PHB. This patch clears the error count of the
PHB PE, not that of the PE for the CAPI device. We shouldn't clear the error
count of the PHB PE; otherwise, it's not consistent.
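
For reference, the cache/reset/restore flow quoted above can be sketched as a
toy userspace model. This is not kernel code: ToyPE, ToyDriver and their
methods are hypothetical names made up for illustration, only the counter is
modelled, and the warm reset itself (step 2) is left out.

```python
from dataclasses import dataclass


@dataclass
class ToyPE:
    """Toy stand-in for a partitionable endpoint; only the counter is modelled."""
    name: str
    freeze_count: int = 0


class ToyDriver:
    """Hypothetical driver following the cache / reset / restore flow."""

    def __init__(self, pe: ToyPE):
        self.pe = pe
        self.cached_count = None  # nothing cached until a reset is prepared

    def prepare_warm_reset(self) -> None:
        # Step 1: cache the current freeze_count, then clear it.
        self.cached_count = self.pe.freeze_count
        self.pe.freeze_count = 0

    def error_detected(self) -> None:
        # Step 3: in the error-detected callback, put the cached count back.
        if self.cached_count is not None:
            self.pe.freeze_count = self.cached_count
            self.cached_count = None


pe = ToyPE("device-pe", freeze_count=4)
drv = ToyDriver(pe)
drv.prepare_warm_reset()
count_during_reset = pe.freeze_count
drv.error_detected()  # step 2 (the warm reset itself) is not modelled
count_after_restore = pe.freeze_count
print(count_during_reset, count_after_restore)
```

Which PE the counter belongs to (the PHB PE or the PE of the CAPI device) is
exactly the open question here; the model deliberately does not answer it.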

>Consider a case where there's a buggy FPGA image on an adapter that's failed 4
>times in the past hour, and generally has frequent errors.  You decide to update
>it to something that's less buggy, so you flash the adapter.  The freeze_count
>gets cached and thus is restored to 4 after the flash.  Now even if the new
>image is less buggy and may only fail once an hour instead of multiple times, if
>it happens to fail within an hour of the earlier failures the device is now
>fenced and you need to reboot.
>
>I don't mind either way - I just don't get the logic of restoring the count.
>

I don't get your point. The FPGA image isn't the only source of EEH errors.
Also, it's not related to the PHB PE's error count, which is what this patch
clears.
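
To make the quoted scenario concrete, here is a toy counter, again not kernel
code. The one-hour window and the limit of 5 are assumptions chosen to match
the scenario as described, not values read from the EEH source, and
ToyFreezeCounter is a made-up name.

```python
WINDOW_SECONDS = 3600  # assumed: a freeze window lasts one hour
FENCE_LIMIT = 5        # assumed: this many freezes in a window fences the device


class ToyFreezeCounter:
    """Toy model of a per-PE freeze counter; not the kernel's implementation."""

    def __init__(self, count=0, window_start=0):
        self.count = count
        self.window_start = window_start

    def record_freeze(self, now):
        """Record a freeze at `now` (seconds). Return True if the device is fenced."""
        if now - self.window_start > WINDOW_SECONDS:
            # The old window has expired: start a new one with a clean count.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count >= FENCE_LIMIT


# New image fails once, 30 minutes into the window of the old failures.
restored = ToyFreezeCounter(count=4)  # old count carried over the flash
cleared = ToyFreezeCounter(count=0)   # count reset when the image changed
fenced_if_restored = restored.record_freeze(now=1800)
fenced_if_cleared = cleared.record_freeze(now=1800)

# Same single failure, but more than an hour after the old window began.
late = ToyFreezeCounter(count=4)
fenced_if_late = late.record_freeze(now=4000)

print(fenced_if_restored, fenced_if_cleared, fenced_if_late)
```

Under these assumptions a restored count only matters when the new failure
lands inside the old window. The model says nothing about whether the count
being discussed sits on the device PE or on the PHB PE.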

Cheers,
Gavin
