netdev.vger.kernel.org archive mirror
* [PATCH] cxgb4: fix probe when already with invalid parameters
@ 2014-03-19 18:49 Thadeu Lima de Souza Cascardo
  2014-03-19 21:38 ` Dimitrios Michailidis
  0 siblings, 1 reply; 9+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2014-03-19 18:49 UTC
  To: netdev; +Cc: dm, Thadeu Lima de Souza Cascardo

Since commit 636f9d371f70f22961fd598fe18380057518ca31 ("cxgb4: Add
support for T4 configuration file"), probing fails when the device is
found to be already initialized but does not have valid buffer sizes
set up.

This may happen after a kexec without a proper shutdown, or with buggy
firmware or a buggy bootloader.

The usual symptom is that probe fails:

[    2.605494] cxgb4 0000:50:00.4: Coming up as MASTER: Adapter already initialized
[    2.605511] cxgb4 0000:50:00.4: bad SGE FL page buffer sizes [0, 0]
[    2.625629] cxgb4: probe of 0000:50:00.4 failed with error -22
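
The -22 in the last line is -EINVAL. As a rough illustration of why an unprogrammed adapter trips the second message, here is a minimal sketch of a free-list buffer size check. It assumes the small buffer must equal the host page size and the large buffer must be either 0 (unused) or a power of two no smaller than it; check_fl_sizes() and HOST_PAGE_SIZE are hypothetical names, and this is not the driver's actual code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the host PAGE_SIZE (4 KiB assumed here). */
#define HOST_PAGE_SIZE 4096u

/*
 * Hypothetical model of a free-list buffer size sanity check:
 * the small buffer must match the page size, and the large buffer
 * must be 0 (unused) or a power of two no smaller than the small one.
 * Returns 0 on success or -EINVAL, mirroring "error -22" in the log.
 */
static int check_fl_sizes(unsigned int fl_small_pg, unsigned int fl_large_pg)
{
	if (fl_small_pg != HOST_PAGE_SIZE ||
	    (fl_large_pg != 0 &&
	     (fl_large_pg < fl_small_pg ||
	      (fl_large_pg & (fl_large_pg - 1)) != 0))) {
		fprintf(stderr, "bad SGE FL page buffer sizes [%u, %u]\n",
			fl_small_pg, fl_large_pg);
		return -EINVAL;
	}
	return 0;
}
```

With both sizes reading back as 0, the first condition already fails, so check_fl_sizes(0, 0) returns -EINVAL, the same error the probe reports.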

The solution is to treat the adapter as not initialized in case the
parameters are invalid.
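
A minimal model of the decision this patch introduces, assuming that -EINVAL from the SGE setup means "the existing parameters are unusable" while any other negative value is a hard failure. The enums and decide_init_v1() are hypothetical simplifications of adap_init0(), not code from the driver.

```c
#include <errno.h>

/* Hypothetical simplification of the firmware-reported device state. */
enum dev_state { DEV_STATE_UNINIT, DEV_STATE_INIT };

/* Hypothetical outcome of the probe-time decision. */
enum init_action {
	USE_EXISTING_PARAMS,	/* live with what is already programmed */
	INITIALIZE_ADAPTER,	/* run the full initialization path */
	FAIL_PROBE		/* give up and return the error */
};

/*
 * Model of the v1 logic. sge_ret stands for the result of validating
 * the existing (soft) parameters and is only consulted when the
 * adapter reports itself as already initialized.
 */
static enum init_action decide_init_v1(enum dev_state state, int sge_ret)
{
	if (state == DEV_STATE_INIT) {
		if (sge_ret == 0)
			return USE_EXISTING_PARAMS;
		if (sge_ret != -EINVAL)
			return FAIL_PROBE;
		/* -EINVAL: treat the adapter as never initialized. */
	}
	return INITIALIZE_ADAPTER;
}
```

In this model a rejected configuration is never fatal: the adapter is simply reinitialized, whichever PF happens to be probing.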

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   34 +++++++++++++---------
 1 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 34e2488..d0638f9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5237,13 +5237,27 @@ static int adap_init0(struct adapter *adap)
 	 * master initialization), note that we're living with existing
 	 * adapter parameters.  Otherwise, it's time to try initializing the
 	 * adapter ...
+	 *
+	 * If we're living with non-hard-coded parameters (either from a
+	 * Firmware Configuration File or values programmed by a different PF
+	 * Driver), give the SGE code a chance to pull in anything that it
+	 * needs ...  Note that this must be called after we retrieve our VPD
+	 * parameters in order to know how to convert core ticks to seconds.
 	 */
 	if (state == DEV_STATE_INIT) {
 		dev_info(adap->pdev_dev, "Coming up as %s: "\
 			 "Adapter already initialized\n",
 			 adap->flags & MASTER_PF ? "MASTER" : "SLAVE");
 		adap->flags |= USING_SOFT_PARAMS;
-	} else {
+		ret = t4_sge_init(adap);
+		if (ret == -EINVAL) {
+			adap->flags &= ~USING_SOFT_PARAMS;
+			state = DEV_STATE_UNINIT;
+		} else if (ret < 0) {
+			goto bye;
+		}
+	}
+	if (state != DEV_STATE_INIT) {
 		dev_info(adap->pdev_dev, "Coming up as MASTER: "\
 			 "Initializing adapter\n");
 
@@ -5300,19 +5314,11 @@ static int adap_init0(struct adapter *adap)
 				-ret);
 			goto bye;
 		}
-	}
-
-	/*
-	 * If we're living with non-hard-coded parameters (either from a
-	 * Firmware Configuration File or values programmed by a different PF
-	 * Driver), give the SGE code a chance to pull in anything that it
-	 * needs ...  Note that this must be called after we retrieve our VPD
-	 * parameters in order to know how to convert core ticks to seconds.
-	 */
-	if (adap->flags & USING_SOFT_PARAMS) {
-		ret = t4_sge_init(adap);
-		if (ret < 0)
-			goto bye;
+		if (adap->flags & USING_SOFT_PARAMS) {
+			ret = t4_sge_init(adap);
+			if (ret < 0)
+				goto bye;
+		}
 	}
 
 	if (is_bypass_device(adap->pdev->device))
-- 
1.7.1

* RE: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 18:49 [PATCH] cxgb4: fix probe when already with invalid parameters Thadeu Lima de Souza Cascardo
@ 2014-03-19 21:38 ` Dimitrios Michailidis
  2014-03-19 21:55   ` Casey Leedom
  2014-03-19 21:57   ` Thadeu Lima de Souza Cascardo
  0 siblings, 2 replies; 9+ messages in thread
From: Dimitrios Michailidis @ 2014-03-19 21:38 UTC
  To: Thadeu Lima de Souza Cascardo, netdev@vger.kernel.org; +Cc: Casey Leedom

Thadeu Lima de Souza Cascardo wrote:
> Since commit 636f9d371f70f22961fd598fe18380057518ca31 ("cxgb4: Add
> support for T4 configuration file"), probing fails when the device is
> found to be already initialized but does not have valid buffer sizes
> set up.
> 
> This may happen after a kexec without a proper shutdown, or with buggy
> firmware or a buggy bootloader.
> 
> The usual symptom is that probe fails:
> 
> [    2.605494] cxgb4 0000:50:00.4: Coming up as MASTER: Adapter already
> initialized
> [    2.605511] cxgb4 0000:50:00.4: bad SGE FL page buffer sizes [0, 0]
> [    2.625629] cxgb4: probe of 0000:50:00.4 failed with error -22
> 
> The solution is to treat the adapter as not initialized in case the
> parameters are invalid.

The patch doesn't look right to me.  Besides reinitializing the device when it finds
disagreeable settings, it disregards that this PF may not be in charge of the device.
If the controlling PF (what the code calls master) selects values this PF doesn't like,
then with the patch this PF will elevate itself to master and install its own preferences.

Also wrong, of course, is that the FW claims the device is initialized when it clearly isn't.
Can you tell me which FW version is involved here and what steps got the device into this state?

* Re: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 21:38 ` Dimitrios Michailidis
@ 2014-03-19 21:55   ` Casey Leedom
  2014-03-19 21:57   ` Thadeu Lima de Souza Cascardo
  1 sibling, 0 replies; 9+ messages in thread
From: Casey Leedom @ 2014-03-19 21:55 UTC
  To: Dimitrios Michailidis, Thadeu Lima de Souza Cascardo,
	netdev@vger.kernel.org

   Yes, knowing which firmware version you're using is critical.  Thanks!

Casey

* Re: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 21:38 ` Dimitrios Michailidis
  2014-03-19 21:55   ` Casey Leedom
@ 2014-03-19 21:57   ` Thadeu Lima de Souza Cascardo
  2014-03-19 22:13     ` Casey Leedom
  2014-03-19 22:54     ` Dimitrios Michailidis
  1 sibling, 2 replies; 9+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2014-03-19 21:57 UTC
  To: Dimitrios Michailidis; +Cc: netdev@vger.kernel.org, Casey Leedom

On Wed, Mar 19, 2014 at 09:38:49PM +0000, Dimitrios Michailidis wrote:
> Thadeu Lima de Souza Cascardo wrote:
> > Since commit 636f9d371f70f22961fd598fe18380057518ca31 ("cxgb4: Add
> > support for T4 configuration file"), probing fails when the device is
> > found to be already initialized but does not have valid buffer sizes
> > set up.
> > 
> > This may happen after a kexec without a proper shutdown, or with buggy
> > firmware or a buggy bootloader.
> > 
> > The usual symptom is that probe fails:
> > 
> > [    2.605494] cxgb4 0000:50:00.4: Coming up as MASTER: Adapter already
> > initialized
> > [    2.605511] cxgb4 0000:50:00.4: bad SGE FL page buffer sizes [0, 0]
> > [    2.625629] cxgb4: probe of 0000:50:00.4 failed with error -22
> > 
> > The solution is to treat the adapter as not initialized in case the
> > parameters are invalid.
> 
> The patch doesn't look right to me.  Besides reinitializing the device when it finds
> disagreeable settings, it disregards that this PF may not be in charge of the device.
> If the controlling PF (what the code calls master) selects values this PF doesn't like,
> then with the patch this PF will elevate itself to master and install its own preferences.
> 
> Also wrong, of course, is that the FW claims the device is initialized when it clearly isn't.
> Can you tell me which FW version is involved here and what steps got the device into this state?
> 

We are trying to netboot, so it's possibly a problem in the Open
Firmware driver that keeps it from sending FW BYE before handing the CPU
over to the bootloader. I could easily reproduce a similar situation by
removing the call to t4_fw_bye from the driver remove path and reloading
the driver without commit 940d9d34a5467c2e2574866eb009d4cb61e27299
("cxgb4: allow large buffer size to have page size").

The firmware I am using is:
firmware-version: 1.9.23.0, TP 0.1.9.1

How about the change below?

If that's OK, I'll send a v2.

Cascardo.


> > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> > ---
> >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   34 +++++++++++++---------
> >  1 files changed, 20 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > index 34e2488..d0638f9 100644
> > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > @@ -5237,13 +5237,27 @@ static int adap_init0(struct adapter *adap)
> >  	 * master initialization), note that we're living with existing
> >  	 * adapter parameters.  Otherwise, it's time to try initializing the
> >  	 * adapter ...
> > +	 *
> > +	 * If we're living with non-hard-coded parameters (either from a
> > +	 * Firmware Configuration File or values programmed by a different PF
> > +	 * Driver), give the SGE code a chance to pull in anything that it
> > +	 * needs ...  Note that this must be called after we retrieve our VPD
> > +	 * parameters in order to know how to convert core ticks to seconds.
> >  	 */
> >  	if (state == DEV_STATE_INIT) {
> >  		dev_info(adap->pdev_dev, "Coming up as %s: "\
> >  			 "Adapter already initialized\n",
> >  			 adap->flags & MASTER_PF ? "MASTER" : "SLAVE");
> >  		adap->flags |= USING_SOFT_PARAMS;
> > -	} else {
> > +		ret = t4_sge_init(adap);
> > +		if (ret == -EINVAL) {

-		if (ret == -EINVAL) {
+		if (ret == -EINVAL && adap->flags & MASTER_PF) {

> > +			adap->flags &= ~USING_SOFT_PARAMS;
> > +			state = DEV_STATE_UNINIT;
> > +		} else if (ret < 0) {
> > +			goto bye;
> > +		}
> > +	}
> > +	if (state != DEV_STATE_INIT) {
> >  		dev_info(adap->pdev_dev, "Coming up as MASTER: "\
> >  			 "Initializing adapter\n");
> > 
> > @@ -5300,19 +5314,11 @@ static int adap_init0(struct adapter *adap)
> >  				-ret);
> >  			goto bye;
> >  		}
> > -	}
> > -
> > -	/*
> > -	 * If we're living with non-hard-coded parameters (either from a
> > -	 * Firmware Configuration File or values programmed by a different PF
> > -	 * Driver), give the SGE code a chance to pull in anything that it
> > -	 * needs ...  Note that this must be called after we retrieve our VPD
> > -	 * parameters in order to know how to convert core ticks to seconds.
> > -	 */
> > -	if (adap->flags & USING_SOFT_PARAMS) {
> > -		ret = t4_sge_init(adap);
> > -		if (ret < 0)
> > -			goto bye;
> > +		if (adap->flags & USING_SOFT_PARAMS) {
> > +			ret = t4_sge_init(adap);
> > +			if (ret < 0)
> > +				goto bye;
> > +		}
> >  	}
> > 
> >  	if (is_bypass_device(adap->pdev->device))
> > --
> > 1.7.1
> 

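Dimitrios's objection and the proposed amendment can be compared side by side in a small model. This sketch assumes that, for an adapter reporting itself as already initialized, the only inputs that matter are whether this PF is the master and what the SGE validation returned; on_init_v1() and on_init_v2() are hypothetical helpers, not driver code. In the real condition, ret == -EINVAL && adap->flags & MASTER_PF groups as intended because & binds more tightly than &&.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of probing an "already initialized" adapter. */
enum init_action { USE_EXISTING_PARAMS, INITIALIZE_ADAPTER, FAIL_PROBE };

/* v1 patch: any PF that sees -EINVAL reinitializes the adapter. */
static enum init_action on_init_v1(bool is_master, int sge_ret)
{
	(void)is_master;	/* v1 ignores mastership entirely */
	if (sge_ret == -EINVAL)
		return INITIALIZE_ADAPTER;
	return sge_ret < 0 ? FAIL_PROBE : USE_EXISTING_PARAMS;
}

/* Proposed amendment: only the master PF may reinitialize. */
static enum init_action on_init_v2(bool is_master, int sge_ret)
{
	if (sge_ret == -EINVAL && is_master)
		return INITIALIZE_ADAPTER;
	/* A non-master PF falls through to the ordinary error path. */
	return sge_ret < 0 ? FAIL_PROBE : USE_EXISTING_PARAMS;
}
```

In this model, a SLAVE PF that dislikes the master's settings fails its probe under v2 instead of taking over the adapter as it would under v1.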
* Re: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 21:57   ` Thadeu Lima de Souza Cascardo
@ 2014-03-19 22:13     ` Casey Leedom
  2014-03-19 22:25       ` Casey Leedom
  2014-03-19 22:54     ` Dimitrios Michailidis
  1 sibling, 1 reply; 9+ messages in thread
From: Casey Leedom @ 2014-03-19 22:13 UTC
  To: Thadeu Lima de Souza Cascardo, Dimitrios Michailidis
  Cc: netdev@vger.kernel.org, Hariprasad S

   Okay, that firmware is recent enough, but Dimitris is right: the patch 
has a ton of problems.  This section of code is very tricky and we've 
worked hard to get it right.  The firmware also has special code in it 
to ~try to catch~ cases where a previous driver on the same PF failed to 
issue a BYE command.  When the firmware sees a new HELLO command come in 
from a PF and that PF is already "registered" (via a previous HELLO 
command), if it's the _only_ PF "registered" and the new HELLO command 
provides the Clear Initialized flag, then the firmware will assume a 
missing BYE and automatically "de-initialize" the firmware/chip.  So 
it's Very Weird that you're getting a "PF is MASTER and Chip/Firmware is 
Initialized" return.

Casey
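
The firmware heuristic described above can be modeled as a tiny state machine. This is a sketch that follows the rule exactly as stated in the message; the PF count, struct fw_model, fw_hello() and fw_bye() are hypothetical and not taken from the Chelsio firmware.

```c
#include <stdbool.h>

#define NUM_PFS 8	/* hypothetical number of physical functions */

/* Hypothetical model of the firmware's bookkeeping. */
struct fw_model {
	bool registered[NUM_PFS];	/* PF sent HELLO and no BYE since */
	bool initialized;		/* chip/firmware already configured */
};

static int registered_count(const struct fw_model *fw)
{
	int n = 0;

	for (int pf = 0; pf < NUM_PFS; pf++)
		n += fw->registered[pf];
	return n;
}

/*
 * HELLO from `pf`: if that PF is already registered (its previous
 * driver never sent BYE), it is the only registered PF, and the HELLO
 * carries the Clear Initialized flag, assume a missing BYE and
 * de-initialize. Returns the initialized state the PF will observe.
 */
static bool fw_hello(struct fw_model *fw, int pf, bool clear_init)
{
	if (fw->registered[pf] && registered_count(fw) == 1 && clear_init)
		fw->initialized = false;
	fw->registered[pf] = true;
	return fw->initialized;
}

/* BYE from `pf`: drop its registration. */
static void fw_bye(struct fw_model *fw, int pf)
{
	fw->registered[pf] = false;
}
```

In this model, a stale registration left behind by a different PF does not trigger the automatic de-initialization, so a HELLO from a second PF still sees the adapter as initialized.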

* Re: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 22:13     ` Casey Leedom
@ 2014-03-19 22:25       ` Casey Leedom
  2014-03-19 22:30         ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 9+ messages in thread
From: Casey Leedom @ 2014-03-19 22:25 UTC
  To: Thadeu Lima de Souza Cascardo, Dimitrios Michailidis
  Cc: netdev@vger.kernel.org, Hariprasad S

   I think I know what's happening.  You mentioned that you're doing a 
"netboot" so it's probably using PF0..3 for the network boot (depending 
on the port being booted from) and is failing to issue a BYE command.  
When cxgb4 comes along it gets MASTER (because presumably the "netboot" 
said it didn't want to be MASTER) and the chip/firmware is reported as 
being Already Initialized.  The "netboot" here seems to be pretty crazy 
though since the above scenario would require it to explicitly refuse to 
be MASTER and fail to issue a BYE command.

   Which "netboot" are you using and what's the version number, etc.?

Casey


* Re: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 22:25       ` Casey Leedom
@ 2014-03-19 22:30         ` Thadeu Lima de Souza Cascardo
  2014-03-19 22:41           ` Casey Leedom
  0 siblings, 1 reply; 9+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2014-03-19 22:30 UTC
  To: Casey Leedom; +Cc: Dimitrios Michailidis, netdev@vger.kernel.org, Hariprasad S

On Wed, Mar 19, 2014 at 03:25:42PM -0700, Casey Leedom wrote:
>   I think I know what's happening.  You mentioned that you're doing
> a "netboot" so it's probably using PF0..3 for the network boot
> (depending on the port being booted from) and is failing to issue a
> BYE command.  When cxgb4 comes along it gets MASTER (because
> presumably the "netboot" said it didn't want to be MASTER) and the
> chip/firmware is reported as being Already Initialized.  The
> "netboot" here seems to be pretty crazy though since the above
> scenario would require it to explicitly refuse to be MASTER and fail
> to issue a BYE command.
> 
>   Which "netboot" are you using and what's the version number, etc.?
> 

It will take me a while to get that answered. I'll see if I can find who
wrote that code, but it's a driver for Open Firmware on IBM Power
Systems. I am not really sure this is the cause, but that's what was
reported to me.

By the way, what exactly are the tons of problems with my patch,
considering the change I proposed, i.e., checking that it's the MASTER?
What scenarios could it get wrong? Do you suggest any further checking?

Cascardo.

> Casey
> 
> 
> On 03/19/14 15:13, Casey Leedom wrote:
> >Okay, that firmware is recent enough but Dimitris is right, the
> >patch has a ton of problems.  This section of code is very tricky
> >and we've worked hard to get it right.  The firmware also has
> >special code in it to ~try to catch~ cases where a previous driver
> >on the same PF failed to issue a BYE command.  When the firmware
> >sees a new HELLO command come in from a PF and that PF is already
> >"registered" (via a previous HELLO command), if it's the _only_ PF
> >"registered" and the new HELLO command provides the Clear
> >Initialized flag, then the firmware will assume a missing BYE and
> >automatically "de-initialize" the firmware/chip.  So it's Very
> >Weird that you're getting a PF is MASTER and Chip/Firmware is
> >Initialized return.
> >
> >Casey
> >
> >On 03/19/14 14:57, Thadeu Lima de Souza Cascardo wrote:
> >>On Wed, Mar 19, 2014 at 09:38:49PM +0000, Dimitrios Michailidis wrote:
> >>>Thadeu Lima de Souza Cascardo wrote:
> >>>>Since commit 636f9d371f70f22961fd598fe18380057518ca31 ("cxgb4: Add
> >>>>support for T4 configuration file"), we have problems when probing the
> >>>>device, and finding out it's already initialized, but does not have
> >>>>valid buffer sizes setup.
> >>>>
> >>>>This may happen with kexec without shutdown, or bad firmware or
> >>>>bootloader.
> >>>>
> >>>>The usual symptom is that probe fails:
> >>>>
> >>>>[    2.605494] cxgb4 0000:50:00.4: Coming up as MASTER: Adapter already initialized
> >>>>[    2.605511] cxgb4 0000:50:00.4: bad SGE FL page buffer sizes [0, 0]
> >>>>[    2.625629] cxgb4: probe of 0000:50:00.4 failed with error -22
> >>>>
> >>>>The solution is to treat the adapter as not initialized in case the
> >>>>parameters are invalid.
> >>>The patch doesn't look right to me.  Besides reinitializing the device when it finds
> >>>disagreeable settings it disregards that this PF may not be in charge of the device.
> >>>If the controlling PF (what the code calls master) selects values this PF doesn't like
> >>>with the patch it will elevate itself to master and install its own preferences.
> >>>
> >>>Also not right of course is that FW is claiming the device is initialized when clearly it isn't.
> >>>Can you tell me which FW version is involved here and what steps got the device in this state?
> >>>
> >>We are trying to netboot, so it's possibly a problem on the Open
> >>Firmware driver that makes it not send FW BYE before handing the CPU to
> >>the bootloader. I could easily reproduce a similar situation by removing
> >>the call to t4_fw_bye during the driver remove path, and reloading the
> >>driver without commit 940d9d34a5467c2e2574866eb009d4cb61e27299 ("cxgb4:
> >>allow large buffer size to have page size").
> >>
> >>The firmware I am using is:
> >>firmware-version: 1.9.23.0, TP 0.1.9.1
> >>
> >>How about the change below?
> >>
> >>If that's OK, I'll send a v2.
> >>
> >>Cascardo.
> >>
> >>
> >>>>Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> >>>>---
> >>>>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   34 +++++++++++++---------
> >>>>  1 files changed, 20 insertions(+), 14 deletions(-)
> >>>>
> >>>>diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> >>>>index 34e2488..d0638f9 100644
> >>>>--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> >>>>+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> >>>>@@ -5237,13 +5237,27 @@ static int adap_init0(struct adapter *adap)
> >>>>       * master initialization), note that we're living with existing
> >>>>       * adapter parameters.  Otherwise, it's time to try initializing the
> >>>>       * adapter ...
> >>>>+     *
> >>>>+     * If we're living with non-hard-coded parameters (either from a
> >>>>+     * Firmware Configuration File or values programmed by a different PF
> >>>>+     * Driver), give the SGE code a chance to pull in anything that it
> >>>>+     * needs ...  Note that this must be called after we retrieve our VPD
> >>>>+     * parameters in order to know how to convert core ticks to seconds.
> >>>>       */
> >>>>      if (state == DEV_STATE_INIT) {
> >>>>          dev_info(adap->pdev_dev, "Coming up as %s: "\
> >>>>               "Adapter already initialized\n",
> >>>>               adap->flags & MASTER_PF ? "MASTER" : "SLAVE");
> >>>>          adap->flags |= USING_SOFT_PARAMS;
> >>>>-    } else {
> >>>>+        ret = t4_sge_init(adap);
> >>>>+        if (ret == -EINVAL) {
> >>-        if (ret == -EINVAL) {
> >>+        if (ret == -EINVAL && adap->flags & MASTER_PF) {
> >>
> >>>>+            adap->flags &= ~USING_SOFT_PARAMS;
> >>>>+            state = DEV_STATE_UNINIT;
> >>>>+        } else if (ret < 0) {
> >>>>+            goto bye;
> >>>>+        }
> >>>>+    }
> >>>>+    if (state != DEV_STATE_INIT) {
> >>>>          dev_info(adap->pdev_dev, "Coming up as MASTER: "\
> >>>>               "Initializing adapter\n");
> >>>>
> >>>>@@ -5300,19 +5314,11 @@ static int adap_init0(struct adapter *adap)
> >>>>                  -ret);
> >>>>              goto bye;
> >>>>          }
> >>>>-    }
> >>>>-
> >>>>-    /*
> >>>>-     * If we're living with non-hard-coded parameters (either from a
> >>>>-     * Firmware Configuration File or values programmed by a different PF
> >>>>-     * Driver), give the SGE code a chance to pull in anything that it
> >>>>-     * needs ...  Note that this must be called after we retrieve our VPD
> >>>>-     * parameters in order to know how to convert core ticks to seconds.
> >>>>-     */
> >>>>-    if (adap->flags & USING_SOFT_PARAMS) {
> >>>>-        ret = t4_sge_init(adap);
> >>>>-        if (ret < 0)
> >>>>-            goto bye;
> >>>>+        if (adap->flags & USING_SOFT_PARAMS) {
> >>>>+            ret = t4_sge_init(adap);
> >>>>+            if (ret < 0)
> >>>>+                goto bye;
> >>>>+        }
> >>>>      }
> >>>>
> >>>>      if (is_bypass_device(adap->pdev->device))
> >>>>-- 
> >>>>1.7.1
> >
> 
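The missing-BYE heuristic Casey describes above can be sketched as a toy C model. Everything here (`struct fw_model`, `fw_hello()`, the PF bitmap) is hypothetical and only mirrors the behaviour described in this thread; it is not Chelsio firmware code. The enum names are borrowed from the patch for readability only.

```c
#include <stdbool.h>

/* Toy model only: names mirror the patch, behaviour mirrors the thread. */
enum dev_state { DEV_STATE_UNINIT, DEV_STATE_INIT };

struct fw_model {
	unsigned int registered;	/* bit n set: PF n sent HELLO, no BYE yet */
	enum dev_state state;
};

/*
 * Handle a HELLO from PF @pf.  The firmware assumes a lost BYE only when
 * @pf is already registered, it is the *only* registered PF, and the new
 * HELLO asks to clear the Initialized state.
 */
static enum dev_state fw_hello(struct fw_model *fw, unsigned int pf,
			       bool clear_init)
{
	unsigned int bit = 1u << pf;

	/* "registered == bit" means: this PF, and no other PF, is registered */
	if (fw->registered == bit && clear_init)
		fw->state = DEV_STATE_UNINIT;	/* treat stale registration as gone */

	fw->registered |= bit;
	return fw->state;
}
```

Under this model, a netboot driver on PF0 that initialized the adapter and never sent BYE leaves `registered == 1u << 0` with the state still initialized. A later HELLO from cxgb4 on PF4 does not trigger the heuristic, because PF4 is not the stale registration, so cxgb4 is told the adapter is already initialized. That would be consistent with the "Coming up as MASTER: Adapter already initialized" log in the report.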

* Re: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 22:30         ` Thadeu Lima de Souza Cascardo
@ 2014-03-19 22:41           ` Casey Leedom
  0 siblings, 0 replies; 9+ messages in thread
From: Casey Leedom @ 2014-03-19 22:41 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: Dimitrios Michailidis, netdev@vger.kernel.org, Hariprasad S

   The biggest problem is that just because a PF is MASTER it doesn't 
mean that it can do anything it wants to the adapter.  There may be 
other already registered users (PFs) which are happily going about their 
business.  Being MASTER implies responsibilities:

  1. Initialization if the adapter isn't already initialized.

  2. Handling adapter errors and error interrupts.

Casey
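Casey's objection to the v2 condition (`ret == -EINVAL && adap->flags & MASTER_PF`) can be illustrated with two small predicates. This is a hypothetical sketch: `MODEL_MASTER_PF`, `v2_would_reinit()` and `reinit_disturbs_others()` are illustrative names, not cxgb4 code, and the flag value is arbitrary.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the driver's MASTER_PF flag; the value is arbitrary. */
#define MODEL_MASTER_PF 0x1u

/*
 * The v2 proposal: a MASTER PF whose t4_sge_init() returned -EINVAL
 * treats the adapter as uninitialized and reprograms it.
 */
static bool v2_would_reinit(unsigned int flags, int sge_ret)
{
	return sge_ret == -EINVAL && (flags & MODEL_MASTER_PF);
}

/*
 * Casey's point, modelled: being MASTER says nothing about who else is
 * using the adapter.  Any other PF still registered with the firmware
 * would have the device reprogrammed underneath it.
 */
static bool reinit_disturbs_others(unsigned int registered_pfs,
				   unsigned int self_pf)
{
	return (registered_pfs & ~(1u << self_pf)) != 0;
}
```

In the netboot scenario (a PF0 registration that never sent BYE, cxgb4 on PF4 as MASTER) both predicates are true, which is exactly the case Casey warns about. The second predicate is at most a necessary condition for a safe reinit, not a sufficient one; Dimitrios's reply argues that the firmware has asked this PF not to change anything at all.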

* RE: [PATCH] cxgb4: fix probe when already with invalid parameters
  2014-03-19 21:57   ` Thadeu Lima de Souza Cascardo
  2014-03-19 22:13     ` Casey Leedom
@ 2014-03-19 22:54     ` Dimitrios Michailidis
  1 sibling, 0 replies; 9+ messages in thread
From: Dimitrios Michailidis @ 2014-03-19 22:54 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo; +Cc: netdev@vger.kernel.org, Casey Leedom

Thadeu Lima de Souza Cascardo wrote:
> On Wed, Mar 19, 2014 at 09:38:49PM +0000, Dimitrios Michailidis wrote:
> > Thadeu Lima de Souza Cascardo wrote:
> > > Since commit 636f9d371f70f22961fd598fe18380057518ca31 ("cxgb4: Add
> > > support for T4 configuration file"), we have problems when probing the
> > > device, and finding out it's already initialized, but does not have
> > > valid buffer sizes setup.
> > >
> > > This may happen with kexec without shutdown, or bad firmware or
> > > bootloader.
> > >
> > > The usual symptom is that probe fails:
> > >
> > > [    2.605494] cxgb4 0000:50:00.4: Coming up as MASTER: Adapter already
> > > initialized
> > > [    2.605511] cxgb4 0000:50:00.4: bad SGE FL page buffer sizes [0, 0]
> > > [    2.625629] cxgb4: probe of 0000:50:00.4 failed with error -22
> > >
> > > The solution is to treat the adapter as not initialized in case the
> > > parameters are invalid.
> >
> > The patch doesn't look right to me.  Besides reinitializing the device when it finds
> > disagreeable settings it disregards that this PF may not be in charge of the device.
> > If the controlling PF (what the code calls master) selects values this PF doesn't like
> > with the patch it will elevate itself to master and install its own preferences.
> >
> > Also not right of course is that FW is claiming the device is initialized when clearly it isn't.
> > Can you tell me which FW version is involved here and what steps got the device in this state?
> >
> 
> We are trying to netboot, so it's possibly a problem on the Open
> Firmware driver that makes it not send FW BYE before handing the CPU to
> the bootloader. I could easily reproduce a similar situation by removing
> the call to t4_fw_bye during the driver remove path, and reloading the
> driver without commit 940d9d34a5467c2e2574866eb009d4cb61e27299 ("cxgb4:
> allow large buffer size to have page size").
> 
> The firmware I am using is:
> firmware-version: 1.9.23.0, TP 0.1.9.1

Thanks for the info.

> How about the change below?

The problem with this one is once FW decides the device is initialized and ready for use
it can let in secondary PFs.  They'll get unhappy when you reset the device under them to
change the settings.  Here FW has put this PF in charge but also asked it not to change anything
while the device isn't really fully initialized.
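The hazard described here, secondary PFs admitted once the firmware reports the device as initialized and then having the device reset under them, can be shown with another toy model. It assumes, purely for illustration, that each admitted PF snapshots a configuration generation number; none of these names exist in cxgb4 or the firmware.

```c
#include <stdbool.h>

/* Hypothetical adapter state: initialized flag plus a config generation. */
struct adapter_model {
	bool initialized;
	unsigned int config_gen;	/* bumped on every reset/reprogram */
};

struct secondary_pf {
	bool attached;
	unsigned int seen_gen;		/* config_gen observed at attach time */
};

/* The firmware lets secondary PFs in only once the device is initialized. */
static bool admit_secondary(const struct adapter_model *a,
			    struct secondary_pf *s)
{
	if (!a->initialized)
		return false;
	s->attached = true;
	s->seen_gen = a->config_gen;
	return true;
}

/* The master resets the device to install its own settings. */
static void master_reprogram(struct adapter_model *a)
{
	a->config_gen++;
}

/* An attached PF whose snapshot no longer matches runs on stale settings. */
static bool secondary_is_stale(const struct adapter_model *a,
			       const struct secondary_pf *s)
{
	return s->attached && s->seen_gen != a->config_gen;
}
```

The point of the model is ordering: once the firmware has reported "initialized", admission can already have happened, so any later reprogramming by the master invalidates whatever the secondary PFs set up.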


> If that's OK, I'll send a v2.
> 
> Cascardo.
> 
> 
> > > Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> > > ---
> > >  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   34 +++++++++++++---------
> > >  1 files changed, 20 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > > b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > > index 34e2488..d0638f9 100644
> > > --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > > +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> > > @@ -5237,13 +5237,27 @@ static int adap_init0(struct adapter *adap)
> > >  	 * master initialization), note that we're living with existing
> > >  	 * adapter parameters.  Otherwise, it's time to try initializing the
> > >  	 * adapter ...
> > > +	 *
> > > +	 * If we're living with non-hard-coded parameters (either from a
> > > +	 * Firmware Configuration File or values programmed by a different PF
> > > +	 * Driver), give the SGE code a chance to pull in anything that it
> > > +	 * needs ...  Note that this must be called after we retrieve our VPD
> > > +	 * parameters in order to know how to convert core ticks to seconds.
> > >  	 */
> > >  	if (state == DEV_STATE_INIT) {
> > >  		dev_info(adap->pdev_dev, "Coming up as %s: "\
> > >  			 "Adapter already initialized\n",
> > >  			 adap->flags & MASTER_PF ? "MASTER" : "SLAVE");
> > >  		adap->flags |= USING_SOFT_PARAMS;
> > > -	} else {
> > > +		ret = t4_sge_init(adap);
> > > +		if (ret == -EINVAL) {
> 
> -		if (ret == -EINVAL) {
> +		if (ret == -EINVAL && adap->flags & MASTER_PF) {
> 
> > > +			adap->flags &= ~USING_SOFT_PARAMS;
> > > +			state = DEV_STATE_UNINIT;
> > > +		} else if (ret < 0) {
> > > +			goto bye;
> > > +		}
> > > +	}
> > > +	if (state != DEV_STATE_INIT) {
> > >  		dev_info(adap->pdev_dev, "Coming up as MASTER: "\
> > >  			 "Initializing adapter\n");
> > >
> > > @@ -5300,19 +5314,11 @@ static int adap_init0(struct adapter *adap)
> > >  				-ret);
> > >  			goto bye;
> > >  		}
> > > -	}
> > > -
> > > -	/*
> > > -	 * If we're living with non-hard-coded parameters (either from a
> > > -	 * Firmware Configuration File or values programmed by a different PF
> > > -	 * Driver), give the SGE code a chance to pull in anything that it
> > > -	 * needs ...  Note that this must be called after we retrieve our VPD
> > > -	 * parameters in order to know how to convert core ticks to seconds.
> > > -	 */
> > > -	if (adap->flags & USING_SOFT_PARAMS) {
> > > -		ret = t4_sge_init(adap);
> > > -		if (ret < 0)
> > > -			goto bye;
> > > +		if (adap->flags & USING_SOFT_PARAMS) {
> > > +			ret = t4_sge_init(adap);
> > > +			if (ret < 0)
> > > +				goto bye;
> > > +		}
> > >  	}
> > >
> > >  	if (is_bypass_device(adap->pdev->device))
> > > --
> > > 1.7.1
> >

Thread overview: 9+ messages
2014-03-19 18:49 [PATCH] cxgb4: fix probe when already with invalid parameters Thadeu Lima de Souza Cascardo
2014-03-19 21:38 ` Dimitrios Michailidis
2014-03-19 21:55   ` Casey Leedom
2014-03-19 21:57   ` Thadeu Lima de Souza Cascardo
2014-03-19 22:13     ` Casey Leedom
2014-03-19 22:25       ` Casey Leedom
2014-03-19 22:30         ` Thadeu Lima de Souza Cascardo
2014-03-19 22:41           ` Casey Leedom
2014-03-19 22:54     ` Dimitrios Michailidis