* [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
@ 2014-06-01 10:49 Or Gerlitz
2014-06-01 16:41 ` Sergei Shtylyov
2014-06-02 14:29 ` Wei Yang
0 siblings, 2 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-01 10:49 UTC (permalink / raw)
To: davem; +Cc: netdev, amirv, weiyang, Jack Morgenstein, Or Gerlitz
From: Jack Morgenstein <jackm@dev.mellanox.co.il>
Commit befdf89 did not take into account the case where the Host
driver is being unloaded. In this case, pci_get_drvdata for the VF
remove_one call may return NULL, so that dereferencing the priv
struct results in a kernel oops.
The fix is to also test that the dev pointer returned by
pci_get_drvdata is non-NULL.
Fixes: befdf89 ("preserve pci_dev_data after __mlx4_remove_one()")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c187d74..a6ae089 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
 	int pci_dev_data;
 	int p;
 
-	if (priv->removed)
+	if (!dev || priv->removed)
 		return;
 
 	pci_dev_data = priv->pci_dev_data;
--
1.7.1
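
The guard in this one-liner can be illustrated with a small userspace sketch. It is only a model under stated assumptions: `model_priv`, `model_pdev` and `model_remove` are hypothetical names, not mlx4 code, and the model collapses `dev` and `priv` into a single pointer, whereas the patch tests `dev` before reading `priv->removed`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct model_priv {
	bool removed;
	int pci_dev_data;
};

/* Hypothetical stand-in for struct pci_dev. */
struct model_pdev {
	struct model_priv *drvdata; /* what pci_get_drvdata() would return */
};

/*
 * Returns true when teardown work ran, false when the call was skipped.
 * The NULL test comes first, so priv->removed is never read through a
 * NULL pointer: || evaluates left to right and short-circuits.
 */
static bool model_remove(struct model_pdev *pdev)
{
	struct model_priv *priv;

	assert(pdev != NULL);
	priv = pdev->drvdata;

	if (!priv || priv->removed)
		return false;

	priv->removed = true;
	return true;
}
```

In this model a first call does the work, a repeated call is skipped because `removed` is set, and a call on a device whose drvdata was already cleared is skipped without touching memory.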
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-01 10:49 [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host Or Gerlitz
@ 2014-06-01 16:41 ` Sergei Shtylyov
2014-06-01 19:59 ` Or Gerlitz
2014-06-02 14:29 ` Wei Yang
1 sibling, 1 reply; 13+ messages in thread
From: Sergei Shtylyov @ 2014-06-01 16:41 UTC (permalink / raw)
To: Or Gerlitz, davem; +Cc: netdev, amirv, weiyang, Jack Morgenstein
Hello.
On 06/01/2014 02:49 PM, Or Gerlitz wrote:
> From: Jack Morgenstein <jackm@dev.mellanox.co.il>
> Commit befdf89 did not take into account the case where the Host
Please also specify that commit's summary line in parens.
> driver is being unloaded. In this case, pci_get_drvdata for the VF
> remove_one call may return NULL, so that dereferencing the priv
> struct results in a kernel oops.
> The fix is to also test that the dev pointer returned by
> pci_get_drvdata is non-NULL.
> Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
WBR, Sergei
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-01 16:41 ` Sergei Shtylyov
@ 2014-06-01 19:59 ` Or Gerlitz
0 siblings, 0 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-01 19:59 UTC (permalink / raw)
To: Sergei Shtylyov
Cc: Or Gerlitz, David Miller, netdev@vger.kernel.org, Amir Vadai,
Wei Yang, Jack Morgenstein
On Sun, Jun 1, 2014 at 7:41 PM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> On 06/01/2014 02:49 PM, Or Gerlitz wrote:
>> Commit befdf89 did not take into account the case where the Host
> Please also specify that commit's summary line in parens.
Did that below, see where we say
Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>> driver is being unloaded. In this case, pci_get_drvdata for the VF
>> remove_one call may return NULL, so that dereferencing the priv
>> struct results in a kernel oops.
>> The fix is to also test that the dev pointer returned by
>> pci_get_drvdata is non-NULL.
>> Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-01 10:49 [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host Or Gerlitz
2014-06-01 16:41 ` Sergei Shtylyov
@ 2014-06-02 14:29 ` Wei Yang
2014-06-02 16:10 ` Bjorn Helgaas
1 sibling, 1 reply; 13+ messages in thread
From: Wei Yang @ 2014-06-02 14:29 UTC (permalink / raw)
To: Or Gerlitz; +Cc: davem, netdev, amirv, weiyang, bhelgaas, Jack Morgenstein
On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote:
>From: Jack Morgenstein <jackm@dev.mellanox.co.il>
>
>Commit befdf89 did not take into account the case where the Host
>driver is being unloaded. In this case, pci_get_drvdata for the VF
As I understand it, unloading the PF's driver while there are live VFs is not
allowed. Quoting the driver code:
	/* in SRIOV it is not allowed to unload the pf's
	 * driver while there are alive vf's */
	if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev))
		printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");
Actually, I don't fully understand this restriction. Maybe my understanding
of what a live VF is isn't correct.
Also, in your code, unloading the PF's driver calls pci_disable_sriov(), which
destroys the VFs. In your test, is the VF's driver still bound at that point?
>remove_one call may return NULL, so that dereferencing the priv
>struct results in a kernel oops.
Sorry, I still can't understand this situation.
Could you describe it in more detail? Are you unloading the PF's driver in the
host first, and then trying to release the VF's driver?
>
>The fix is to also test that the dev pointer returned by
>pci_get_drvdata is non-NULL.
>
>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>---
> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>index c187d74..a6ae089 100644
>--- a/drivers/net/ethernet/mellanox/mlx4/main.c
>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
> int pci_dev_data;
> int p;
>
>- if (priv->removed)
>+ if (!dev || priv->removed)
> return;
This fix looks good to me.
As I remember, I had this check in my first version, but I removed the check
on dev at Bjorn's suggestion, since I agreed that there was no chance for dev
to be NULL. Bjorn, it seems we were not correct :(
>
> pci_dev_data = priv->pci_dev_data;
>--
>1.7.1
--
Richard Yang
Help you, Help me
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-02 14:29 ` Wei Yang
@ 2014-06-02 16:10 ` Bjorn Helgaas
2014-06-03 0:58 ` David Miller
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Bjorn Helgaas @ 2014-06-02 16:10 UTC (permalink / raw)
To: Wei Yang; +Cc: Or Gerlitz, David Miller, netdev, Amir Vadai, Jack Morgenstein
On Mon, Jun 2, 2014 at 8:29 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote:
>>From: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>
>>Commit befdf89 did not take into account the case where the Host
>>driver is being unloaded. In this case, pci_get_drvdata for the VF
>
> In my mind, unloading PF's driver when there is alive VFs is not allowed.
> Quoted in driver code:
>
> /* in SRIOV it is not allowed to unload the pf's
> * driver while there are alive vf's */
> if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev))
> printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");
>
> Actually, I don't understand this restriction clearly. Maybe my understanding
> of alive VF is not correct.
>
> And in your code, unload PF's driver would call pci_disable_sriov() which will
> destroy the VFs. While in your test, the VF's driver is still there?
>
>>remove_one call may return NULL, so that dereferencing the priv
>>struct results in a kernel oops.
>
> Sorry for my poor mind, I still can't understand this situation.
> Would you describe the situation more? You are unloading PF's driver in Host
> at first, and then try to release the VF's driver?
>
>>
>>The fix is to also test that the dev pointer returned by
>>pci_get_drvdata is non-NULL.
>>
>>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>>---
>> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>>index c187d74..a6ae089 100644
>>--- a/drivers/net/ethernet/mellanox/mlx4/main.c
>>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
>> int pci_dev_data;
>> int p;
>>
>>- if (priv->removed)
>>+ if (!dev || priv->removed)
>> return;
>
> This fix looks good to me.
>
> As I remembered, I had this check in my first version, but I removed the check
> on dev based on the suggestion from Bjorn. Since I agreed that there is no
> chance for dev to be NULL. Bjorn, seems we are not correct :(
Writing a driver is not an empirical process of trying things to see
what works. You need to actively design a consistent structure so you
know why and when things are safe. I object to gratuitous "dev ==
NULL" checks because often they are just a way of patching up a driver
design that isn't well thought-out.
As I wrote before:
From the PCI core's perspective, after .probe() returns successfully,
we can call any driver entry point and pass the pci_dev to it, and
expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected()
sort of breaks that assumption because you clear out pci_drvdata().
Right now, the only other entry point mlx4 really implements is
mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
is NULL. But that's ... a hack, and you'll have to do the same
if/when you implement suspend/resume/sriov_configure/etc.
Bjorn
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-02 16:10 ` Bjorn Helgaas
@ 2014-06-03 0:58 ` David Miller
2014-06-03 2:00 ` Wei Yang
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: David Miller @ 2014-06-03 0:58 UTC (permalink / raw)
To: bhelgaas; +Cc: weiyang, ogerlitz, netdev, amirv, jackm
From: Bjorn Helgaas <bhelgaas@google.com>
Date: Mon, 2 Jun 2014 10:10:01 -0600
> Writing a driver is not an empirical process of trying things to see
> what works. You need to actively design a consistent structure so you
> know why and when things are safe. I object to gratuitous "dev ==
> NULL" checks because often they are just a way of patching up a driver
> design that isn't well thought-out.
>
> As I wrote before:
>
> From the PCI core's perspective, after .probe() returns successfully,
> we can call any driver entry point and pass the pci_dev to it, and
> expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected()
> sort of breaks that assumption because you clear out pci_drvdata().
> Right now, the only other entry point mlx4 really implements is
> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
> is NULL. But that's ... a hack, and you'll have to do the same
> if/when you implement suspend/resume/sriov_configure/etc.
Agreed.
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-02 16:10 ` Bjorn Helgaas
2014-06-03 0:58 ` David Miller
@ 2014-06-03 2:00 ` Wei Yang
2014-06-03 8:15 ` Or Gerlitz
2014-06-08 9:16 ` Or Gerlitz
3 siblings, 0 replies; 13+ messages in thread
From: Wei Yang @ 2014-06-03 2:00 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Wei Yang, Or Gerlitz, David Miller, netdev, Amir Vadai,
Jack Morgenstein
On Mon, Jun 02, 2014 at 10:10:01AM -0600, Bjorn Helgaas wrote:
>On Mon, Jun 2, 2014 at 8:29 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
>> On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote:
>>>From: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>>
>>>Commit befdf89 did not take into account the case where the Host
>>>driver is being unloaded. In this case, pci_get_drvdata for the VF
>>
>> In my mind, unloading PF's driver when there is alive VFs is not allowed.
>> Quoted in driver code:
>>
>> /* in SRIOV it is not allowed to unload the pf's
>> * driver while there are alive vf's */
>> if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev))
>> printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");
>>
>> Actually, I don't understand this restriction clearly. Maybe my understanding
>> of alive VF is not correct.
>>
>> And in your code, unload PF's driver would call pci_disable_sriov() which will
>> destroy the VFs. While in your test, the VF's driver is still there?
>>
>>>remove_one call may return NULL, so that dereferencing the priv
>>>struct results in a kernel oops.
>>
>> Sorry for my poor mind, I still can't understand this situation.
>> Would you describe the situation more? You are unloading PF's driver in Host
>> at first, and then try to release the VF's driver?
>>
>>>
>>>The fix is to also test that the dev pointer returned by
>>>pci_get_drvdata is non-NULL.
>>>
>>>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>>>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>>>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>>>---
>>> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>index c187d74..a6ae089 100644
>>>--- a/drivers/net/ethernet/mellanox/mlx4/main.c
>>>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
>>> int pci_dev_data;
>>> int p;
>>>
>>>- if (priv->removed)
>>>+ if (!dev || priv->removed)
>>> return;
>>
>> This fix looks good to me.
>>
>> As I remembered, I had this check in my first version, but I removed the check
>> on dev based on the suggestion from Bjorn. Since I agreed that there is no
>> chance for dev to be NULL. Bjorn, seems we are not correct :(
>
>Writing a driver is not an empirical process of trying things to see
>what works. You need to actively design a consistent structure so you
>know why and when things are safe. I object to gratuitous "dev ==
>NULL" checks because often they are just a way of patching up a driver
>design that isn't well thought-out.
>
>As I wrote before:
>
> From the PCI core's perspective, after .probe() returns successfully,
> we can call any driver entry point and pass the pci_dev to it, and
> expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected()
> sort of breaks that assumption because you clear out pci_drvdata().
> Right now, the only other entry point mlx4 really implements is
> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
> is NULL. But that's ... a hack, and you'll have to do the same
> if/when you implement suspend/resume/sriov_configure/etc.
Thanks for your kindness. After re-reading it, I understand it better: it is
not only about the Mellanox driver, but about the whole picture of how to
write a driver.
1. We should make every driver entry point safe to call after .probe() returns
successfully.
2. If there is an exception and a hack that tests pci_drvdata(), we need to
have the same hack in suspend/resume/etc.
Now back to the current mlx4 driver: mlx4_remove_one() is called by .shutdown
and .remove. As I understand it, these two hooks are invoked by rmmod or reboot.
By doing so, it is trying to comply with rule 1 and make sure pci_drvdata() is
valid after .probe() succeeds.
So I am curious in which case the driver breaks this rule.
Here are my suggestions:
1. To comply with rule 1, it would be better to fix this point instead of adding
a hack.
2. Or, to comply with rule 2, the driver needs to check pci_drvdata() in every
driver entry point instead of in just one. For example,
mlx4_pci_slot_reset() needs this check too.
Bjorn, thanks again. I hope my understanding is correct this time :-)
>
>Bjorn
--
Richard Yang
Help you, Help me
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-02 16:10 ` Bjorn Helgaas
2014-06-03 0:58 ` David Miller
2014-06-03 2:00 ` Wei Yang
@ 2014-06-03 8:15 ` Or Gerlitz
2014-06-03 8:40 ` Wei Yang
` (2 more replies)
2014-06-08 9:16 ` Or Gerlitz
3 siblings, 3 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-03 8:15 UTC (permalink / raw)
To: Bjorn Helgaas, David Miller
Cc: Wei Yang, netdev, Amir Vadai, Jack Morgenstein, Tal Alon,
Yevgeny Petrilin
On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Writing a driver is not an empirical process of trying things to see
> what works. You need to actively design a consistent structure so you
> know why and when things are safe. I object to gratuitous "dev ==
> NULL" checks because often they are just a way of patching up a driver
> design that isn't well thought-out.
Bjorn, first and foremost -- agreed.
Next, to be precise, the use case of rebooting the host while the
driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
commit befdf89 and is now broken.
Reading your response further, I understand that the code was probably
using a sort of hackish branching to make that happen, and you
suggest we rewrite that section properly so it can serve us well when
we (hopefully soon) implement sriov_configure and possibly also
suspend/resume. Point taken.
Dave, as for this patch: again, the regression (inability to reboot
the host node while the driver is loaded) exists in the latest upstream
code as of befdf89 / 3.15-rc1.
Now, taking into account that 3.15 is past rc8 and the IL devel team
has a holiday this week, I don't see us coming up with a deeper fix in
time for 3.15, so maybe you can go ahead and merge this one-liner
for 3.15?
Or.
> As I wrote before:
> From the PCI core's perspective, after .probe() returns successfully,
> we can call any driver entry point and pass the pci_dev to it, and
> expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected()
> sort of breaks that assumption because you clear out pci_drvdata().
> Right now, the only other entry point mlx4 really implements is
> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
> is NULL. But that's ... a hack, and you'll have to do the same
> if/when you implement suspend/resume/sriov_configure/etc.
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-03 8:15 ` Or Gerlitz
@ 2014-06-03 8:40 ` Wei Yang
2014-06-04 9:50 ` Wei Yang
2014-06-06 2:52 ` Wei Yang
2 siblings, 0 replies; 13+ messages in thread
From: Wei Yang @ 2014-06-03 8:40 UTC (permalink / raw)
To: Or Gerlitz
Cc: Bjorn Helgaas, David Miller, Wei Yang, netdev, Amir Vadai,
Jack Morgenstein, Tal Alon, Yevgeny Petrilin
On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote:
>On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Writing a driver is not an empirical process of trying things to see
>> what works. You need to actively design a consistent structure so you
>> know why and when things are safe. I object to gratuitous "dev ==
>> NULL" checks because often they are just a way of patching up a driver
>> design that isn't well thought-out.
>
>Bjorn, 1st and most -- Agreed.
>
>Next, to be precise, the use case of rebooting the host while the
>driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
>commit befdf89 and is now broken.
>
>Reading further your response, I understand that the code was probably
>using a sort of hackish branching to make that to happen, and you
>suggest we re-write that section properly so it can serve well when
>(hopefully soon) implement
>sriov_configure and possibly also suspend/resume, point taken.
>
>Dave, as for this patch, again, the regression of inability to reboot
>the host node
>while the driver is loaded exists in the latest upstream code as of
>befdf89 / 3.15-rc1
>
>Now, taking into account that 3.15 is after rc8 and the IL devel team
>has a holiday this week, I don't see us coming in time with a more
>deeper fix for 3.15, so maybe you can eventually go and merge this one
>liner for 3.15?
I am glad to verify your patch, if you wish.
>
>Or.
>
>
>> As I wrote before:
>> From the PCI core's perspective, after .probe() returns successfully,
>> we can call any driver entry point and pass the pci_dev to it, and
>> expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected()
>> sort of breaks that assumption because you clear out pci_drvdata().
>> Right now, the only other entry point mlx4 really implements is
>> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>> is NULL. But that's ... a hack, and you'll have to do the same
>> if/when you implement suspend/resume/sriov_configure/etc.
--
Richard Yang
Help you, Help me
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-03 8:15 ` Or Gerlitz
2014-06-03 8:40 ` Wei Yang
@ 2014-06-04 9:50 ` Wei Yang
2014-06-06 2:52 ` Wei Yang
2 siblings, 0 replies; 13+ messages in thread
From: Wei Yang @ 2014-06-04 9:50 UTC (permalink / raw)
To: Or Gerlitz
Cc: Bjorn Helgaas, David Miller, Wei Yang, netdev, Amir Vadai,
Jack Morgenstein, Tal Alon, Yevgeny Petrilin
On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote:
>On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Writing a driver is not an empirical process of trying things to see
>> what works. You need to actively design a consistent structure so you
>> know why and when things are safe. I object to gratuitous "dev ==
>> NULL" checks because often they are just a way of patching up a driver
>> design that isn't well thought-out.
>
>Bjorn, 1st and most -- Agreed.
>
>Next, to be precise, the use case of rebooting the host while the
>driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
>commit befdf89 and is now broken.
>
>Reading further your response, I understand that the code was probably
>using a sort of hackish branching to make that to happen, and you
>suggest we re-write that section properly so it can serve well when
>(hopefully soon) implement
>sriov_configure and possibly also suspend/resume, point taken.
>
>Dave, as for this patch, again, the regression of inability to reboot
>the host node
>while the driver is loaded exists in the latest upstream code as of
>befdf89 / 3.15-rc1
>
>Now, taking into account that 3.15 is after rc8 and the IL devel team
>has a holiday this week, I don't see us coming in time with a more
>deeper fix for 3.15, so maybe you can eventually go and merge this one
>liner for 3.15?
>
>Or.
Hi, Or,
I ran some tests following your steps to reproduce the case. Below is my analysis:
I ran "rmmod mlx4_core" and "kexec" after probing the Mellanox driver. Below are
the logs from the two steps, respectively.
[root@tian-lp1 ywywyang]# rmmod mlx4_core
[ 534.159740] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[ 534.161272] mlx4_core 0003:05:00.0: Received reset from slave:1
[ 534.161509] mlx4_core 0003:05:00.0: mlx4_remove_one: called
[ 534.170823] mlx4_core 0003:05:00.0: Disabling SR-IOV
[root@tian-lp1 ywywyang]# kexec -e
[ 669.089322] kvm: exiting hardware virtualization
[ 669.091746] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[ 669.326754] mlx4_core 0003:05:00.0: Received reset from slave:1
[ 674.488417] lpfc 0006:01:00.4: 2:2885 Port Status Event: port status reg 0x81000000, port smphr reg 0xc000, error 1=0x9f000001, error 2=0xa9fa47fd
[ 675.618578] mlx4_core 0003:05:00.0: mlx4_remove_one: called
[ 675.691278] mlx4_en 0003:05:00.0: removed PHC
[ 675.700414] mlx4_core 0003:05:00.0: Disabling SR-IOV
[ 675.700630] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[ 675.700701] Unable to handle kernel paging request for data at address 0x00000370
[ 675.700769] Faulting instruction address: 0xd00000001a13fb88
[ 675.700826] Oops: Kernel access of bad area, sig: 11 [#1]
[---]
During rmmod the driver works fine, but during kexec there is an oops. The
kexec path is almost the same as reboot. We can see that the driver for PCI
device 0003:05:00.1 has been "removed" twice, and the second time the driver
triggers an error.
rmmod and kexec call different driver entry points: rmmod -> .remove and
kexec -> .shutdown. I think this is the reason for the oops during reboot.
With .shutdown, the driver is not detached. So when there are VFs, both
.shutdown and .remove are invoked on the VF.
I took a quick look at the e1000e driver, where .shutdown and .remove behave
differently. So maybe .shutdown needs different handling than .remove here.
Still, adding a check in .remove is a quick fix for this case.
This is my draft analysis for your reference. I hope it is correct and helps
you to some extent.
Have a good day :-)
--
Richard Yang
Help you, Help me
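
The ordering described in the analysis above can be replayed in a small userspace model. This is a sketch under stated assumptions, not mlx4 code: every name in it is hypothetical, teardown is reduced to clearing a pointer, and where the real driver would dereference NULL and oops the model returns `TEARDOWN_NULL_DRVDATA` instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in mlx4. */
struct model_priv {
	bool removed;
};

struct model_pdev {
	struct model_priv storage;  /* backing store, so no allocator is needed */
	struct model_priv *drvdata; /* models pci_get_drvdata(); NULL once cleared */
	bool driver_bound;          /* models whether a driver is still attached */
};

enum teardown_result { TEARDOWN_OK, TEARDOWN_NULL_DRVDATA };

/* Models .probe(): drvdata becomes valid and the driver is attached. */
static void model_probe(struct model_pdev *pdev)
{
	assert(pdev != NULL);
	pdev->storage.removed = false;
	pdev->drvdata = &pdev->storage;
	pdev->driver_bound = true;
}

/*
 * Models a teardown that releases priv and clears drvdata. Where the real
 * driver would dereference NULL, the model reports the condition instead.
 */
static enum teardown_result model_teardown(struct model_pdev *pdev)
{
	if (pdev->drvdata == NULL)
		return TEARDOWN_NULL_DRVDATA;

	pdev->drvdata->removed = true;
	pdev->drvdata = NULL;
	return TEARDOWN_OK;
}

/*
 * Replays the ordering seen in the logs: the VF is torn down first, then the
 * PF disables SR-IOV, which invokes .remove on a VF that is still bound.
 * via_shutdown selects the kexec/reboot path, where the driver stays attached.
 */
static enum teardown_result model_replay(bool via_shutdown)
{
	struct model_pdev vf, pf;
	enum teardown_result second_vf_call = TEARDOWN_OK;

	model_probe(&vf);
	model_probe(&pf);

	model_teardown(&vf);
	if (!via_shutdown)
		vf.driver_bound = false; /* .remove detaches the driver */

	if (vf.driver_bound)
		second_vf_call = model_teardown(&vf); /* the second call on the VF */

	model_teardown(&pf);
	return second_vf_call;
}
```

Calling `model_replay(false)` models the rmmod path, where the VF is detached and never sees a second call. `model_replay(true)` models kexec or reboot, where the second call on the VF finds drvdata already cleared.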
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-03 8:15 ` Or Gerlitz
2014-06-03 8:40 ` Wei Yang
2014-06-04 9:50 ` Wei Yang
@ 2014-06-06 2:52 ` Wei Yang
2014-06-08 9:18 ` Or Gerlitz
2 siblings, 1 reply; 13+ messages in thread
From: Wei Yang @ 2014-06-06 2:52 UTC (permalink / raw)
To: Or Gerlitz
Cc: Bjorn Helgaas, David Miller, Wei Yang, netdev, Amir Vadai,
Jack Morgenstein, Tal Alon, Yevgeny Petrilin
On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote:
>On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> Writing a driver is not an empirical process of trying things to see
>> what works. You need to actively design a consistent structure so you
>> know why and when things are safe. I object to gratuitous "dev ==
>> NULL" checks because often they are just a way of patching up a driver
>> design that isn't well thought-out.
>
>Bjorn, 1st and most -- Agreed.
>
>Next, to be precise, the use case of rebooting the host while the
>driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
>commit befdf89 and is now broken.
>
>Reading further your response, I understand that the code was probably
>using a sort of hackish branching to make that to happen, and you
>suggest we re-write that section properly so it can serve well when
>(hopefully soon) implement
>sriov_configure and possibly also suspend/resume, point taken.
>
>Dave, as for this patch, again, the regression of inability to reboot
>the host node
>while the driver is loaded exists in the latest upstream code as of
>befdf89 / 3.15-rc1
>
>Now, taking into account that 3.15 is after rc8 and the IL devel team
>has a holiday this week, I don't see us coming in time with a more
>deeper fix for 3.15, so maybe you can eventually go and merge this one
>liner for 3.15?
>
>Or.
>
Hi, Or,
After understanding the root cause of this error, I came up with another kind
of fix for this case. I have verified it on my machine and it looks good
to me. If your team could verify it too, that would be better :-)
I hope you like it. If there is any problem, please let me know :-)
From 3dec04518a3a373c2d2cf2c4088fd5bbf36b4b7b Mon Sep 17 00:00:00 2001
From: Wei Yang <weiyang@linux.vnet.ibm.com>
Date: Fri, 6 Jun 2014 10:16:45 +0800
Subject: [PATCH] net/mlx4_core: keep only one driver entry release mlx4_priv
After commit befdf89 ("net/mlx4_core: Preserve pci_dev_data after
__mlx4_remove_one()"), there are two driver entry points that try to
release mlx4_priv: .shutdown and .remove. As a consequence, mlx4_priv
can be released twice in some cases, which triggers "Oops: Kernel access
of bad area".
One such case is doing a reboot or kexec when VFs are enabled. During
reboot or kexec, .shutdown is called. When the VFs are shut down first and
then the PF, the PF triggers the VFs' .remove since the VFs still have a
driver attached.
This patch resolves this case by keeping only one driver entry point that
releases mlx4_priv.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Jack Morgenstein <jackm@dev.mellanox.co.il>
CC: Bjorn Helgaas <bhelgaas@google.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 7cf9dad..70a1356 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2799,7 +2799,7 @@ static struct pci_driver mlx4_driver = {
 	.name = DRV_NAME,
 	.id_table = mlx4_pci_table,
 	.probe = mlx4_init_one,
-	.shutdown = mlx4_remove_one,
+	.shutdown = __mlx4_remove_one,
 	.remove = mlx4_remove_one,
 	.err_handler = &mlx4_err_handler,
 };
--
1.7.9.5
--
Richard Yang
Help you, Help me
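
The effect of pointing .shutdown at the inner routine can be sketched in userspace as follows. This is a model under stated assumptions rather than the driver's code: all names are hypothetical, `model_inner_remove` stands in for an inner teardown that may safely run twice, and `model_remove` stands in for the only entry point that clears drvdata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in mlx4. */
struct model_priv {
	bool removed;
	int teardown_runs; /* counts how often the real work was performed */
};

struct model_pdev {
	struct model_priv storage;
	struct model_priv *drvdata; /* models pci_get_drvdata() */
};

/* Models the inner teardown: idempotent, and it leaves drvdata in place. */
static void model_inner_remove(struct model_pdev *pdev)
{
	struct model_priv *priv = pdev->drvdata;

	assert(priv != NULL); /* only model_remove() clears drvdata here */
	if (priv->removed)
		return;

	priv->teardown_runs++;
	priv->removed = true;
}

/* Models .remove: inner teardown, then the single place drvdata is cleared. */
static void model_remove(struct model_pdev *pdev)
{
	model_inner_remove(pdev);
	pdev->drvdata = NULL;
}

/* Returns how many times the teardown work ran across shutdown + remove. */
static int model_reboot_after_patch(void)
{
	struct model_pdev vf = { .storage = { .removed = false, .teardown_runs = 0 } };

	vf.drvdata = &vf.storage;
	model_inner_remove(&vf); /* .shutdown now points at the inner routine */
	model_remove(&vf);       /* .remove triggered when the PF disables SR-IOV */
	return vf.storage.teardown_runs;
}
```

In this model the shutdown step leaves drvdata valid, so the later .remove call finds `removed` set, skips the work, and then clears the pointer exactly once.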
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-02 16:10 ` Bjorn Helgaas
` (2 preceding siblings ...)
2014-06-03 8:15 ` Or Gerlitz
@ 2014-06-08 9:16 ` Or Gerlitz
3 siblings, 0 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-08 9:16 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Wei Yang, Or Gerlitz, David Miller, netdev, Amir Vadai,
Jack Morgenstein
On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
> From the PCI core's perspective, after .probe() returns successfully,
> we can call any driver entry point and pass the pci_dev to it, and
> expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected()
Note that __mlx4_remove_one() is what is called from mlx4_pci_err_detected(),
and it is built in a way that allows it to be called twice.
In that respect, I agree with the fix provided by Wei Yang in this thread, which
essentially makes .shutdown behave in a similar way and call
__mlx4_remove_one(), and I will submit it for inclusion.
> sort of breaks that assumption because you clear out pci_drvdata().
> Right now, the only other entry point mlx4 really implements is
> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
> is NULL. But that's ... a hack, and you'll have to do the same
> if/when you implement suspend/resume/sriov_configure/etc.
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
2014-06-06 2:52 ` Wei Yang
@ 2014-06-08 9:18 ` Or Gerlitz
0 siblings, 0 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-08 9:18 UTC (permalink / raw)
To: Wei Yang
Cc: Bjorn Helgaas, David Miller, netdev, Amir Vadai, Jack Morgenstein,
Tal Alon, Yevgeny Petrilin
On Fri, Jun 6, 2014 at 5:52 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote:
> After understanding the root cause of this error, I come up with another kind
> of fix to this case. Verification has been done on my machine, it looks good
> to me. If your team could verify this, it would be better :-)
Looks good. Let me do some rewording and I will submit it with you
as the author.