* [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
@ 2014-06-01 10:49 Or Gerlitz
  2014-06-01 16:41 ` Sergei Shtylyov
  2014-06-02 14:29 ` Wei Yang
  0 siblings, 2 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-01 10:49 UTC
  To: davem; +Cc: netdev, amirv, weiyang, Jack Morgenstein, Or Gerlitz

From: Jack Morgenstein <jackm@dev.mellanox.co.il>

Commit befdf89 did not take into account the case where the Host
driver is being unloaded. In this case, pci_get_drvdata for the VF
remove_one call may return NULL, so that dereferencing the priv
struct results in a kernel oops.

The fix is to also test that the dev pointer returned by
pci_get_drvdata is non-NULL.

Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index c187d74..a6ae089 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
 	int pci_dev_data;
 	int p;
 
-	if (priv->removed)
+	if (!dev || priv->removed)
 		return;
 
 	pci_dev_data = priv->pci_dev_data;
-- 
1.7.1
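For readers following along, the guard in the patch above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: `fake_priv`, `fake_pci_dev` and `fake_remove_core()` are hypothetical stand-ins, not the mlx4 structures or functions, and the `drvdata` field merely models what pci_get_drvdata() would return.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_priv {
	int removed;
};

/* Hypothetical stand-in for struct pci_dev. */
struct fake_pci_dev {
	struct fake_priv *drvdata; /* models the pci_get_drvdata() result */
};

/*
 * Models the guarded early return from the patch.
 * Returns 1 if teardown ran, 0 if it was skipped.
 */
static int fake_remove_core(struct fake_pci_dev *pdev)
{
	struct fake_priv *priv = pdev->drvdata;

	/* Test the pointer before touching any field behind it. */
	if (!priv || priv->removed)
		return 0;

	priv->removed = 1;
	return 1;
}
```

With the pointer test placed first, the `removed` field is only read when the private data still exists; without it, the same call on a device whose data was already cleared would dereference NULL.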
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-01 10:49 [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host Or Gerlitz @ 2014-06-01 16:41 ` Sergei Shtylyov 2014-06-01 19:59 ` Or Gerlitz 2014-06-02 14:29 ` Wei Yang 1 sibling, 1 reply; 13+ messages in thread From: Sergei Shtylyov @ 2014-06-01 16:41 UTC (permalink / raw) To: Or Gerlitz, davem; +Cc: netdev, amirv, weiyang, Jack Morgenstein Hello. On 06/01/2014 02:49 PM, Or Gerlitz wrote: > From: Jack Morgenstein <jackm@dev.mellanox.co.il> > Commit befdf89 did not take into account the case where the Host Please also specify that commit's summary line in parens. > driver is being unloaded. In this case, pci_get_drvdata for the VF > remove_one call may return NULL, so that dereferencing the priv > struct results in a kernel oops. > The fix is to also test that the dev pointer returned by > pci_get_drvdata is non-NULL. > Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()") > Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> > Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> WBR, Sergei ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-01 16:41 ` Sergei Shtylyov @ 2014-06-01 19:59 ` Or Gerlitz 0 siblings, 0 replies; 13+ messages in thread From: Or Gerlitz @ 2014-06-01 19:59 UTC (permalink / raw) To: Sergei Shtylyov Cc: Or Gerlitz, David Miller, netdev@vger.kernel.org, Amir Vadai, Wei Yang, Jack Morgenstein On Sun, Jun 1, 2014 at 7:41 PM, Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> wrote: > On 06/01/2014 02:49 PM, Or Gerlitz wrote: >> Commit befdf89 did not take into account the case where the Host > Please also specify that commit's summary line in parens. Did that below, see where we say Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()") >> driver is being unloaded. In this case, pci_get_drvdata for the VF >> remove_one call may return NULL, so that dereferencing the priv >> struct results in a kernel oops. >> The fix is to also test that the dev pointer returned by >> pci_get_drvdata is non-NULL. >> Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()") >> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> >> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
  2014-06-01 10:49 [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host Or Gerlitz
  2014-06-01 16:41 ` Sergei Shtylyov
@ 2014-06-02 14:29 ` Wei Yang
  2014-06-02 16:10   ` Bjorn Helgaas
  1 sibling, 1 reply; 13+ messages in thread
From: Wei Yang @ 2014-06-02 14:29 UTC
  To: Or Gerlitz; +Cc: davem, netdev, amirv, weiyang, bhelgaas, Jack Morgenstein

On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote:
>From: Jack Morgenstein <jackm@dev.mellanox.co.il>
>
>Commit befdf89 did not take into account the case where the Host
>driver is being unloaded. In this case, pci_get_drvdata for the VF

As I understand it, unloading the PF's driver while there are live VFs is
not allowed. Quoting the driver code:

	/* in SRIOV it is not allowed to unload the pf's
	 * driver while there are alive vf's */
	if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev))
		printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n");

Actually, I don't understand this restriction clearly. Maybe my understanding
of "alive VF" is not correct.

And in your code, unloading the PF's driver calls pci_disable_sriov(), which
destroys the VFs. Yet in your test, the VF's driver is still there?

>remove_one call may return NULL, so that dereferencing the priv
>struct results in a kernel oops.

Sorry, I still can't understand this situation. Would you describe it in
more detail? Are you unloading the PF's driver in the Host first, and then
trying to release the VF's driver?

>
>The fix is to also test that the dev pointer returned by
>pci_get_drvdata is non-NULL.
>
>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()")
>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>---
> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>index c187d74..a6ae089 100644
>--- a/drivers/net/ethernet/mellanox/mlx4/main.c
>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev)
> 	int pci_dev_data;
> 	int p;
>
>-	if (priv->removed)
>+	if (!dev || priv->removed)
> 		return;

This fix looks good to me.

As I remember, I had this check in my first version, but I removed the check
on dev based on Bjorn's suggestion, since I agreed that there was no chance
for dev to be NULL. Bjorn, it seems we were not correct :(

>
> 	pci_dev_data = priv->pci_dev_data;
>--
>1.7.1

-- 
Richard Yang
Help you, Help me
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-02 14:29 ` Wei Yang @ 2014-06-02 16:10 ` Bjorn Helgaas 2014-06-03 0:58 ` David Miller ` (3 more replies) 0 siblings, 4 replies; 13+ messages in thread From: Bjorn Helgaas @ 2014-06-02 16:10 UTC (permalink / raw) To: Wei Yang; +Cc: Or Gerlitz, David Miller, netdev, Amir Vadai, Jack Morgenstein On Mon, Jun 2, 2014 at 8:29 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote: > On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote: >>From: Jack Morgenstein <jackm@dev.mellanox.co.il> >> >>Commit befdf89 did not take into account the case where the Host >>driver is being unloaded. In this case, pci_get_drvdata for the VF > > In my mind, unloading PF's driver when there is alive VFs is not allowed. > Quoted in driver code: > > /* in SRIOV it is not allowed to unload the pf's > * driver while there are alive vf's */ > if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev)) > printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n"); > > Actually, I don't understand this restriction clearly. Maybe my understanding > of alive VF is not correct. > > And in your code, unload PF's driver would call pci_disable_sriov() which will > destroy the VFs. While in your test, the VF's driver is still there? > >>remove_one call may return NULL, so that dereferencing the priv >>struct results in a kernel oops. > > Sorry for my poor mind, I still can't understand this situation. > Would you describe the situation more? You are unloading PF's driver in Host > at first, and then try to release the VF's driver? > >> >>The fix is to also test that the dev pointer returned by >>pci_get_drvdata is non-NULL. 
>> >>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()") >>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> >>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> >>--- >> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c >>index c187d74..a6ae089 100644 >>--- a/drivers/net/ethernet/mellanox/mlx4/main.c >>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c >>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev) >> int pci_dev_data; >> int p; >> >>- if (priv->removed) >>+ if (!dev || priv->removed) >> return; > > This fix looks good to me. > > As I remembered, I had this check in my first version, but I removed the check > on dev based on the suggestion from Bjorn. Since I agreed that there is no > chance for dev to be NULL. Bjorn, seems we are not correct :( Writing a driver is not an empirical process of trying things to see what works. You need to actively design a consistent structure so you know why and when things are safe. I object to gratuitous "dev == NULL" checks because often they are just a way of patching up a driver design that isn't well thought-out. As I wrote before: From the PCI core's perspective, after .probe() returns successfully, we can call any driver entry point and pass the pci_dev to it, and expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected() sort of breaks that assumption because you clear out pci_drvdata(). Right now, the only other entry point mlx4 really implements is mlx4_remove_one(), and it has a hack that tests whether pci_drvdata() is NULL. But that's ... a hack, and you'll have to do the same if/when you implement suspend/resume/sriov_configure/etc. Bjorn ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-02 16:10 ` Bjorn Helgaas @ 2014-06-03 0:58 ` David Miller 2014-06-03 2:00 ` Wei Yang ` (2 subsequent siblings) 3 siblings, 0 replies; 13+ messages in thread From: David Miller @ 2014-06-03 0:58 UTC (permalink / raw) To: bhelgaas; +Cc: weiyang, ogerlitz, netdev, amirv, jackm From: Bjorn Helgaas <bhelgaas@google.com> Date: Mon, 2 Jun 2014 10:10:01 -0600 > Writing a driver is not an empirical process of trying things to see > what works. You need to actively design a consistent structure so you > know why and when things are safe. I object to gratuitous "dev == > NULL" checks because often they are just a way of patching up a driver > design that isn't well thought-out. > > As I wrote before: > > From the PCI core's perspective, after .probe() returns successfully, > we can call any driver entry point and pass the pci_dev to it, and > expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected() > sort of breaks that assumption because you clear out pci_drvdata(). > Right now, the only other entry point mlx4 really implements is > mlx4_remove_one(), and it has a hack that tests whether pci_drvdata() > is NULL. But that's ... a hack, and you'll have to do the same > if/when you implement suspend/resume/sriov_configure/etc. Agreed. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-02 16:10 ` Bjorn Helgaas 2014-06-03 0:58 ` David Miller @ 2014-06-03 2:00 ` Wei Yang 2014-06-03 8:15 ` Or Gerlitz 2014-06-08 9:16 ` Or Gerlitz 3 siblings, 0 replies; 13+ messages in thread From: Wei Yang @ 2014-06-03 2:00 UTC (permalink / raw) To: Bjorn Helgaas Cc: Wei Yang, Or Gerlitz, David Miller, netdev, Amir Vadai, Jack Morgenstein On Mon, Jun 02, 2014 at 10:10:01AM -0600, Bjorn Helgaas wrote: >On Mon, Jun 2, 2014 at 8:29 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote: >> On Sun, Jun 01, 2014 at 01:49:43PM +0300, Or Gerlitz wrote: >>>From: Jack Morgenstein <jackm@dev.mellanox.co.il> >>> >>>Commit befdf89 did not take into account the case where the Host >>>driver is being unloaded. In this case, pci_get_drvdata for the VF >> >> In my mind, unloading PF's driver when there is alive VFs is not allowed. >> Quoted in driver code: >> >> /* in SRIOV it is not allowed to unload the pf's >> * driver while there are alive vf's */ >> if (mlx4_is_master(dev) && mlx4_how_many_lives_vf(dev)) >> printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n"); >> >> Actually, I don't understand this restriction clearly. Maybe my understanding >> of alive VF is not correct. >> >> And in your code, unload PF's driver would call pci_disable_sriov() which will >> destroy the VFs. While in your test, the VF's driver is still there? >> >>>remove_one call may return NULL, so that dereferencing the priv >>>struct results in a kernel oops. >> >> Sorry for my poor mind, I still can't understand this situation. >> Would you describe the situation more? You are unloading PF's driver in Host >> at first, and then try to release the VF's driver? >> >>> >>>The fix is to also test that the dev pointer returned by >>>pci_get_drvdata is non-NULL. 
>>> >>>Fixes: befdf89 ("preserve pcd_dev_data after __mlx4_remove_one()") >>>Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> >>>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> >>>--- >>> drivers/net/ethernet/mellanox/mlx4/main.c | 2 +- >>> 1 files changed, 1 insertions(+), 1 deletions(-) >>> >>>diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c >>>index c187d74..a6ae089 100644 >>>--- a/drivers/net/ethernet/mellanox/mlx4/main.c >>>+++ b/drivers/net/ethernet/mellanox/mlx4/main.c >>>@@ -2629,7 +2629,7 @@ static void __mlx4_remove_one(struct pci_dev *pdev) >>> int pci_dev_data; >>> int p; >>> >>>- if (priv->removed) >>>+ if (!dev || priv->removed) >>> return; >> >> This fix looks good to me. >> >> As I remembered, I had this check in my first version, but I removed the check >> on dev based on the suggestion from Bjorn. Since I agreed that there is no >> chance for dev to be NULL. Bjorn, seems we are not correct :( > >Writing a driver is not an empirical process of trying things to see >what works. You need to actively design a consistent structure so you >know why and when things are safe. I object to gratuitous "dev == >NULL" checks because often they are just a way of patching up a driver >design that isn't well thought-out. > >As I wrote before: > > From the PCI core's perspective, after .probe() returns successfully, > we can call any driver entry point and pass the pci_dev to it, and > expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected() > sort of breaks that assumption because you clear out pci_drvdata(). > Right now, the only other entry point mlx4 really implements is > mlx4_remove_one(), and it has a hack that tests whether pci_drvdata() > is NULL. But that's ... a hack, and you'll have to do the same > if/when you implement suspend/resume/sriov_configure/etc. Thanks for your kindness. 
After re-reading it, I understand it better: this is not only about the
Mellanox driver, but about how to write a driver in general.

1. We should make the driver entry points safe once .probe() returns
   successfully.
2. If there is an exception and a hack that tests pci_drvdata(), we need the
   same hack in suspend/resume/etc.

Now back to the current mlx4 driver: mlx4_remove_one() is called by .shutdown
and .remove. As I understand it, these two hooks are invoked by rmmod or
reboot. By doing so, the driver is trying to comply with rule 1, making sure
pci_drvdata() is valid after .probe() succeeds. So I am curious in which case
the driver breaks this rule.

My suggestions are:

1. To comply with rule 1, it would be better to fix that case instead of
   adding a hack.
2. Or, to comply with rule 2, the driver needs to check pci_drvdata() in
   every driver entry point instead of in just one. For example,
   mlx4_pci_slot_reset() needs this check too.

Bjorn, thanks again, I hope my understanding is correct this time :-)

>
>Bjorn

-- 
Richard Yang
Help you, Help me
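To make the cost of the second suggestion concrete, here is a rough user-space sketch of what "check pci_drvdata() in every entry point" tends to look like. All names (`demo_priv`, `demo_pci_dev`, `demo_slot_reset()`, `demo_suspend()`, `demo_resume()`) are hypothetical, and returning -ENODEV is an assumption made for the illustration, not something the mlx4 driver is known to do.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical private state. */
struct demo_priv {
	int resets;
	int suspended;
};

/* Hypothetical stand-in for struct pci_dev. */
struct demo_pci_dev {
	struct demo_priv *drvdata; /* models the pci_get_drvdata() result */
};

/* Each entry point has to repeat the same NULL test before doing work. */
static int demo_slot_reset(struct demo_pci_dev *pdev)
{
	struct demo_priv *priv = pdev->drvdata;

	if (!priv)
		return -ENODEV;
	priv->resets++;
	return 0;
}

static int demo_suspend(struct demo_pci_dev *pdev)
{
	struct demo_priv *priv = pdev->drvdata;

	if (!priv)
		return -ENODEV;
	priv->suspended = 1;
	return 0;
}

static int demo_resume(struct demo_pci_dev *pdev)
{
	struct demo_priv *priv = pdev->drvdata;

	if (!priv)
		return -ENODEV;
	priv->suspended = 0;
	return 0;
}
```

The repetition is the point: every new callback needs the same guard, which is the maintenance burden Bjorn warns about when he calls the check a hack.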
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
  2014-06-02 16:10     ` Bjorn Helgaas
  2014-06-03  0:58       ` David Miller
  2014-06-03  2:00       ` Wei Yang
@ 2014-06-03  8:15       ` Or Gerlitz
  2014-06-03  8:40         ` Wei Yang
                           ` (2 more replies)
  2014-06-08  9:16       ` Or Gerlitz
  3 siblings, 3 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-03 8:15 UTC
  To: Bjorn Helgaas, David Miller
  Cc: Wei Yang, netdev, Amir Vadai, Jack Morgenstein, Tal Alon, Yevgeny Petrilin

On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> Writing a driver is not an empirical process of trying things to see
> what works. You need to actively design a consistent structure so you
> know why and when things are safe. I object to gratuitous "dev ==
> NULL" checks because often they are just a way of patching up a driver
> design that isn't well thought-out.

Bjorn, first and foremost: agreed.

Next, to be precise, the use case of rebooting the host while the
driver was loaded in SRIOV mode and NO VFs probed to VMs worked before
commit befdf89 and is now broken.

Reading your response further, I understand that the code was probably
using a sort of hackish branching to make that happen, and you
suggest we rewrite that section properly so it can serve us well when we
(hopefully soon) implement sriov_configure and possibly also
suspend/resume. Point taken.

Dave, as for this patch: again, the regression (inability to reboot
the host node while the driver is loaded) exists in the latest upstream
code as of befdf89 / 3.15-rc1.

Now, taking into account that 3.15 is after rc8 and the IL devel team
has a holiday this week, I don't see us coming up in time with a deeper
fix for 3.15, so maybe you can go ahead and merge this one-liner
for 3.15?

Or.

> As I wrote before:

>   From the PCI core's perspective, after .probe() returns successfully,
>   we can call any driver entry point and pass the pci_dev to it, and
>   expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()
>   sort of breaks that assumption because you clear out pci_drvdata().
>   Right now, the only other entry point mlx4 really implements is
>   mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
>   is NULL.  But that's ... a hack, and you'll have to do the same
>   if/when you implement suspend/resume/sriov_configure/etc.
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-03 8:15 ` Or Gerlitz @ 2014-06-03 8:40 ` Wei Yang 2014-06-04 9:50 ` Wei Yang 2014-06-06 2:52 ` Wei Yang 2 siblings, 0 replies; 13+ messages in thread From: Wei Yang @ 2014-06-03 8:40 UTC (permalink / raw) To: Or Gerlitz Cc: Bjorn Helgaas, David Miller, Wei Yang, netdev, Amir Vadai, Jack Morgenstein, Tal Alon, Yevgeny Petrilin On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote: >On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> Writing a driver is not an empirical process of trying things to see >> what works. You need to actively design a consistent structure so you >> know why and when things are safe. I object to gratuitous "dev == >> NULL" checks because often they are just a way of patching up a driver >> design that isn't well thought-out. > >Bjorn, 1st and most -- Agreed. > >Next, to be precise, the use case of rebooting the host while the >driver was loaded in SRIOV mode and NO VFs probed to VMs worked before >commit befdf89 and is now broken. > >Reading further your response, I understand that the code was probably >using a sort of hackish branching to make that to happen, and you >suggest we re-write that section properly so it can serve well when >(hopefully soon) implemenet >sriov_configure and possibly also suspend/resume, point taken. > >Dave, as for this patch, again, the regression of inability to reboot >the host node >while the driver is loaded exists in the latest upstream code as of >befdf89 / 3.15-rc1 > >Now, taking into account that 3.15 is after rc8 and the IL devel team >has a holiday this week, I don't see us coming in time with a more >deeper fix for 3.15, so maybe you can eventaully go and merge this one >liner for 3.15? I am glad to verify your patch, if you wish. > >Or. 
> > >> As I wrote before: >> From the PCI core's perspective, after .probe() returns successfully, >> we can call any driver entry point and pass the pci_dev to it, and >> expect it to work. Doing mlx4_remove_one() in mlx4_pci_err_detected() >> sort of breaks that assumption because you clear out pci_drvdata(). >> Right now, the only other entry point mlx4 really implements is >> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata() >> is NULL. But that's ... a hack, and you'll have to do the same >> if/when you implement suspend/resume/sriov_configure/etc. -- Richard Yang Help you, Help me ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-03 8:15 ` Or Gerlitz 2014-06-03 8:40 ` Wei Yang @ 2014-06-04 9:50 ` Wei Yang 2014-06-06 2:52 ` Wei Yang 2 siblings, 0 replies; 13+ messages in thread From: Wei Yang @ 2014-06-04 9:50 UTC (permalink / raw) To: Or Gerlitz Cc: Bjorn Helgaas, David Miller, Wei Yang, netdev, Amir Vadai, Jack Morgenstein, Tal Alon, Yevgeny Petrilin On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote: >On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> Writing a driver is not an empirical process of trying things to see >> what works. You need to actively design a consistent structure so you >> know why and when things are safe. I object to gratuitous "dev == >> NULL" checks because often they are just a way of patching up a driver >> design that isn't well thought-out. > >Bjorn, 1st and most -- Agreed. > >Next, to be precise, the use case of rebooting the host while the >driver was loaded in SRIOV mode and NO VFs probed to VMs worked before >commit befdf89 and is now broken. > >Reading further your response, I understand that the code was probably >using a sort of hackish branching to make that to happen, and you >suggest we re-write that section properly so it can serve well when >(hopefully soon) implemenet >sriov_configure and possibly also suspend/resume, point taken. > >Dave, as for this patch, again, the regression of inability to reboot >the host node >while the driver is loaded exists in the latest upstream code as of >befdf89 / 3.15-rc1 > >Now, taking into account that 3.15 is after rc8 and the IL devel team >has a holiday this week, I don't see us coming in time with a more >deeper fix for 3.15, so maybe you can eventaully go and merge this one >liner for 3.15? > >Or. Hi, Or, I did some tests with your steps to reproduce the case. Below is my analysis: I did "rmmod mlx4_core" and "kexec" after probe the Mellanox driver. 
Below are the logs from the two steps, respectively.

[root@tian-lp1 ywywyang]# rmmod mlx4_core
[ 534.159740] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[ 534.161272] mlx4_core 0003:05:00.0: Received reset from slave:1
[ 534.161509] mlx4_core 0003:05:00.0: mlx4_remove_one: called
[ 534.170823] mlx4_core 0003:05:00.0: Disabling SR-IOV

[root@tian-lp1 ywywyang]# kexec -e
[ 669.089322] kvm: exiting hardware virtualization
[ 669.091746] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[ 669.326754] mlx4_core 0003:05:00.0: Received reset from slave:1
[ 674.488417] lpfc 0006:01:00.4: 2:2885 Port Status Event: port status reg 0x81000000, port smphr reg 0xc000, error 1=0x9f000001, error 2=0xa9fa47fd
[ 675.618578] mlx4_core 0003:05:00.0: mlx4_remove_one: called
[ 675.691278] mlx4_en 0003:05:00.0: removed PHC
[ 675.700414] mlx4_core 0003:05:00.0: Disabling SR-IOV
[ 675.700630] mlx4_core 0003:05:00.1: mlx4_remove_one: called
[ 675.700701] Unable to handle kernel paging request for data at address 0x00000370
[ 675.700769] Faulting instruction address: 0xd00000001a13fb88
[ 675.700826] Oops: Kernel access of bad area, sig: 11 [#1]
[---]

During rmmod the driver works fine, while during kexec there is an oops
message. kexec is almost the same as reboot.

We see that the driver for PCI device 0003:05:00.1 has been "removed" twice,
and the second time the driver triggers an error.

rmmod and kexec call different driver entry points: rmmod -> .remove and
kexec -> .shutdown. I think this is the reason why there is an oops message
during reboot. In .shutdown, the driver is not detached, so when there are
VFs, both .shutdown and .remove are invoked on the VF.

I took a quick glance at the e1000e driver; there, .shutdown and .remove
behave differently. So maybe .shutdown needs different handling than
.remove. Adding a check in .remove is a quick fix for this case.

This is my draft analysis for your reference. I hope it is correct and helps
you to some extent.

Have a good day :-)

-- 
Richard Yang
Help you, Help me
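The ordering described in this analysis can be replayed with a small user-space model. This is a sketch under assumptions, not driver code: the `model_*` names are hypothetical, clearing `drvdata` stands in for freeing the private data and clearing pci_drvdata(), and the `null_deref` flag marks the point where the real driver would oops instead of returning.

```c
#include <stddef.h>

struct model_priv {
	int removed;
};

/* Hypothetical stand-in for struct pci_dev plus some bookkeeping. */
struct model_pci_dev {
	struct model_priv storage;  /* backing store for the private data */
	struct model_priv *drvdata; /* models the pci_get_drvdata() result */
	int bound;                  /* driver still attached to the device */
	int null_deref;             /* set where the real driver would oops */
};

static void model_probe(struct model_pci_dev *pdev)
{
	pdev->storage.removed = 0;
	pdev->drvdata = &pdev->storage;
	pdev->bound = 1;
	pdev->null_deref = 0;
}

/* Teardown wrapper used for both .remove and .shutdown before the fix. */
static void model_remove_one(struct model_pci_dev *pdev)
{
	struct model_priv *priv = pdev->drvdata;

	if (!priv)
		pdev->null_deref = 1; /* real code reads priv->removed here */
	else
		priv->removed = 1;
	pdev->drvdata = NULL;         /* models freeing and clearing drvdata */
}

/* rmmod path: .remove unbinds the VF, so the PF finds nothing to remove. */
static int model_rmmod(void)
{
	struct model_pci_dev pf, vf;

	model_probe(&pf);
	model_probe(&vf);
	model_remove_one(&vf);
	vf.bound = 0;
	if (vf.bound)                 /* PF disables SR-IOV */
		model_remove_one(&vf);
	model_remove_one(&pf);
	return vf.null_deref;
}

/* reboot/kexec path: .shutdown leaves the VF bound, so .remove runs later. */
static int model_reboot(void)
{
	struct model_pci_dev pf, vf;

	model_probe(&pf);
	model_probe(&vf);
	model_remove_one(&vf);        /* VF .shutdown */
	if (vf.bound) {               /* PF disables SR-IOV */
		model_remove_one(&vf);    /* VF .remove: drvdata already NULL */
		vf.bound = 0;
	}
	model_remove_one(&pf);
	return vf.null_deref;
}
```

In this model `model_rmmod()` never reaches the NULL case, while `model_reboot()` does, which mirrors the two logs: one `mlx4_remove_one: called` line for 0003:05:00.1 under rmmod, two under kexec.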
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-03 8:15 ` Or Gerlitz 2014-06-03 8:40 ` Wei Yang 2014-06-04 9:50 ` Wei Yang @ 2014-06-06 2:52 ` Wei Yang 2014-06-08 9:18 ` Or Gerlitz 2 siblings, 1 reply; 13+ messages in thread From: Wei Yang @ 2014-06-06 2:52 UTC (permalink / raw) To: Or Gerlitz Cc: Bjorn Helgaas, David Miller, Wei Yang, netdev, Amir Vadai, Jack Morgenstein, Tal Alon, Yevgeny Petrilin On Tue, Jun 03, 2014 at 11:15:43AM +0300, Or Gerlitz wrote: >On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote: >> Writing a driver is not an empirical process of trying things to see >> what works. You need to actively design a consistent structure so you >> know why and when things are safe. I object to gratuitous "dev == >> NULL" checks because often they are just a way of patching up a driver >> design that isn't well thought-out. > >Bjorn, 1st and most -- Agreed. > >Next, to be precise, the use case of rebooting the host while the >driver was loaded in SRIOV mode and NO VFs probed to VMs worked before >commit befdf89 and is now broken. > >Reading further your response, I understand that the code was probably >using a sort of hackish branching to make that to happen, and you >suggest we re-write that section properly so it can serve well when >(hopefully soon) implemenet >sriov_configure and possibly also suspend/resume, point taken. > >Dave, as for this patch, again, the regression of inability to reboot >the host node >while the driver is loaded exists in the latest upstream code as of >befdf89 / 3.15-rc1 > >Now, taking into account that 3.15 is after rc8 and the IL devel team >has a holiday this week, I don't see us coming in time with a more >deeper fix for 3.15, so maybe you can eventaully go and merge this one >liner for 3.15? > >Or. > Hi, Or, After understanding the root cause of this error, I come up with another kind of fix to this case. 
Verification has been done on my machine, and it looks good to me. If your
team could verify this as well, it would be better :-)

I hope you like it; if there is any problem, please let me know :-)

From 3dec04518a3a373c2d2cf2c4088fd5bbf36b4b7b Mon Sep 17 00:00:00 2001
From: Wei Yang <weiyang@linux.vnet.ibm.com>
Date: Fri, 6 Jun 2014 10:16:45 +0800
Subject: [PATCH] net/mlx4_core: keep only one driver entry release mlx4_priv

After commit befdf89 (net/mlx4_core: Preserve pci_dev_data after
__mlx4_remove_one()), there are two driver entry points that try to
release mlx4_priv: .shutdown and .remove. As a consequence, mlx4_priv
can be released twice in some cases, which triggers the oops
"Oops: Kernel access of bad area".

One case for this error is doing a reboot or kexec when VFs are enabled.
During reboot or kexec, .shutdown is called. When the VFs are shut down
first and then the PF, the PF triggers the VFs' .remove, since the VFs
still have a driver attached.

This patch resolves this case by keeping only one driver entry point that
releases mlx4_priv.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Jack Morgenstein <jackm@dev.mellanox.co.il>
CC: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 7cf9dad..70a1356 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2799,7 +2799,7 @@ static struct pci_driver mlx4_driver = {
 	.name = DRV_NAME,
 	.id_table = mlx4_pci_table,
 	.probe = mlx4_init_one,
-	.shutdown = mlx4_remove_one,
+	.shutdown = __mlx4_remove_one,
 	.remove = mlx4_remove_one,
 	.err_handler = &mlx4_err_handler,
 };
-- 
1.7.9.5

-- 
Richard Yang
Help you, Help me
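The idea behind this patch can be sketched in user space as follows, assuming (as the thread suggests) that the core teardown leaves the private data in place and records a removed flag, while only the .remove wrapper drops it. The `sketch_*` names and the `teardown_count` field are hypothetical and exist only for the illustration.

```c
#include <stddef.h>

struct sketch_priv {
	int removed;
	int teardown_count; /* how many times resources were released */
};

/* Hypothetical stand-in for struct pci_dev. */
struct sketch_pci_dev {
	struct sketch_priv *drvdata; /* models the pci_get_drvdata() result */
};

/* Idempotent core: releases resources once and keeps drvdata valid. */
static void sketch_remove_core(struct sketch_pci_dev *pdev)
{
	struct sketch_priv *priv = pdev->drvdata;

	if (priv->removed)
		return;
	priv->teardown_count++;
	priv->removed = 1;
}

/* Full removal: core teardown, then drop the private data. */
static void sketch_remove_one(struct sketch_pci_dev *pdev)
{
	sketch_remove_core(pdev);
	pdev->drvdata = NULL; /* models freeing and clearing drvdata */
}

/* Hypothetical callback table in the spirit of struct pci_driver. */
struct sketch_driver {
	void (*shutdown)(struct sketch_pci_dev *pdev);
	void (*remove)(struct sketch_pci_dev *pdev);
};

static const struct sketch_driver sketch_drv = {
	.shutdown = sketch_remove_core, /* no longer drops drvdata */
	.remove   = sketch_remove_one,
};

/* Replays the reboot ordering for a VF; returns the teardown count. */
static int sketch_shutdown_then_remove(void)
{
	struct sketch_priv priv = { 0, 0 };
	struct sketch_pci_dev vf = { &priv };

	sketch_drv.shutdown(&vf); /* VF .shutdown */
	sketch_drv.remove(&vf);   /* VF .remove, triggered by the PF */
	return priv.teardown_count;
}
```

Because .shutdown no longer clears the private data, the later .remove still finds a valid pointer, sees the removed flag, and skips the second teardown, so no NULL test is needed in the core path.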
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host 2014-06-06 2:52 ` Wei Yang @ 2014-06-08 9:18 ` Or Gerlitz 0 siblings, 0 replies; 13+ messages in thread From: Or Gerlitz @ 2014-06-08 9:18 UTC (permalink / raw) To: Wei Yang Cc: Bjorn Helgaas, David Miller, netdev, Amir Vadai, Jack Morgenstein, Tal Alon, Yevgeny Petrilin On Fri, Jun 6, 2014 at 5:52 AM, Wei Yang <weiyang@linux.vnet.ibm.com> wrote: > After understanding the root cause of this error, I come up with another kind > of fix to this case. Verification has been done on my machine, it looks good > to me. If your team could verify this, it would be better :-) looks good, let me do some re-wording and I will submit it, with you being the author ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH net] net/mlx4_core: Fix Oops on reboot when SRIOV VFs are probed into the Host
  2014-06-02 16:10     ` Bjorn Helgaas
                         ` (2 preceding siblings ...)
  2014-06-03  8:15       ` Or Gerlitz
@ 2014-06-08  9:16       ` Or Gerlitz
  3 siblings, 0 replies; 13+ messages in thread
From: Or Gerlitz @ 2014-06-08 9:16 UTC
  To: Bjorn Helgaas
  Cc: Wei Yang, Or Gerlitz, David Miller, netdev, Amir Vadai, Jack Morgenstein

On Mon, Jun 2, 2014 at 7:10 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
[...]
> From the PCI core's perspective, after .probe() returns successfully,
> we can call any driver entry point and pass the pci_dev to it, and
> expect it to work.  Doing mlx4_remove_one() in mlx4_pci_err_detected()

Note that __mlx4_remove_one() is what is called from
mlx4_pci_err_detected(), and it is built in a way that allows it to be
called twice. In that respect, I agree with the fix provided by Wei Yang
in this thread, which essentially makes .shutdown behave in a similar
way by calling __mlx4_remove_one(), and I will submit it for inclusion.

> sort of breaks that assumption because you clear out pci_drvdata().
> Right now, the only other entry point mlx4 really implements is
> mlx4_remove_one(), and it has a hack that tests whether pci_drvdata()
> is NULL.  But that's ... a hack, and you'll have to do the same
> if/when you implement suspend/resume/sriov_configure/etc.