From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Or Gerlitz <gerlitz.or@gmail.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>,
Linux Netdev List <netdev@vger.kernel.org>,
Amir Vadai <amirv@mellanox.com>,
David Miller <davem@davemloft.net>,
Wei Yang <weiyang@linux.vnet.ibm.com>,
Yishai Hadas <yishaih@mellanox.com>,
Jack Morgenstein <jackm@dev.mellanox.co.il>
Subject: Re: [PATCH] net/mlx4: Fix EEH recovery failure
Date: Wed, 26 Nov 2014 09:21:29 +1100
Message-ID: <20141125222128.GA7213@shangw>
In-Reply-To: <CAJ3xEMi9G+tFsaANwndhmOZ78gt79WZ35Oq4CiFOm1WqxBXyqQ@mail.gmail.com>
On Wed, Nov 26, 2014 at 12:00:31AM +0200, Or Gerlitz wrote:
>On Mon, Nov 24, 2014 at 11:55 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Mon, Nov 24, 2014 at 11:17:55PM +0200, Or Gerlitz wrote:
>>>On Sat, Nov 22, 2014 at 12:56 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>>> The patch fixes a couple of EEH recovery failures on the PPC PowerNV
>>>> platform:
>>>
>>>> * Don't clear the struct mlx4_priv instance in mlx4_pci_err_detected().
>>>> Otherwise, __mlx4_init_one() crashes the kernel by dereferencing
>>>> a NULL pointer.
>>>
>>>I don't see this change in the patch. What I see is the clearing of
>>>mlx4_priv being removed from __mlx4_unload_one. Please clarify. Also,
>>>is this patch based on, and targeted at, the net or net-next tree?
>>>
>>
>> Yes, it should read: don't clear the struct mlx4_priv instance in
>> mlx4_unload_one(), which is called by mlx4_pci_err_detected().
>
>
>But the struct mlx4_priv instance is cleared in mlx4_unload_one() for
>a reason. I suspect that you may have made the EEH callback work but
>broken something else. For example, did you make sure that kexec still
>works after your changes as it did before?
>
No, I haven't tried kexec yet. I'll give it a try, thanks!
Gavin
>> It's based on 3.18-rc5, with a couple of my EEH fixes on top of it.
>> I hit the issue when testing EEH on that tree.
>
>>>> With the patch applied, EEH recovery for mlx4 adapter succeeds on PPC
>>>> PowerNV platform.
>>>>
>>>> # lspci
>>>> 0003:0f:00.0 Network controller: Mellanox Technologies \
>>>> MT27500 Family [ConnectX-3]
>>>>
>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> ---
>>>> drivers/net/ethernet/mellanox/mlx4/main.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>> index 90de6e1..e118ac9 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>> @@ -2809,7 +2809,6 @@ static void mlx4_unload_one(struct pci_dev *pdev)
>>>>  	kfree(dev->caps.qp1_proxy);
>>>>  	kfree(dev->dev_vfs);
>>>>
>>>> -	memset(priv, 0, sizeof(*priv));
>>>>  	priv->pci_dev_data = pci_dev_data;
>>>>  	priv->removed = 1;
>>>>  }
>>>> @@ -2900,6 +2899,8 @@ static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev,
>>>>  					      pci_channel_state_t state)
>>>>  {
>>>>  	mlx4_unload_one(pdev);
>>>> +	pci_release_regions(pdev);
>>>> +	pci_disable_device(pdev);
>>>>
>>>>  	return state == pci_channel_io_perm_failure ?
>>>>  		PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
>>>> --
>
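
For readers following the thread, the trade-off under discussion can be
sketched outside the driver. This is only an illustration in plain C, not
the mlx4 code: struct demo_priv, its fields, and the demo_* functions are
all hypothetical names. It contrasts the save/zero/restore pattern visible
in the first hunk with the variant that leaves the structure intact, and
shows why a later re-init path can fail after the former. Which mlx4_priv
field actually triggers the crash is not stated in the thread, so the
"config" pointer here is an assumption made purely for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver-private structure; NOT the real mlx4_priv. */
struct demo_priv {
	int pci_dev_data;   /* value that must survive an unload */
	int removed;        /* set once the device has been torn down */
	const int *config;  /* assumed state that a later re-init expects to find */
};

/* Variant A: wipe the whole structure, then restore the one saved field. */
void demo_unload_zeroing(struct demo_priv *priv)
{
	int saved = priv->pci_dev_data;

	memset(priv, 0, sizeof(*priv));
	priv->pci_dev_data = saved;
	priv->removed = 1;
}

/* Variant B: keep the structure contents and only mark the device removed. */
void demo_unload_preserving(struct demo_priv *priv)
{
	priv->removed = 1;
}

/*
 * Re-init path that relies on state left behind by the unload.
 * Returns 0 on success, or -1 when the required pointer is gone.
 * A path without this check would dereference NULL at this point.
 */
int demo_reinit(struct demo_priv *priv)
{
	if (priv->config == NULL)
		return -1;
	priv->removed = 0;
	return 0;
}
```

Calling demo_reinit() after demo_unload_zeroing() returns -1, while after
demo_unload_preserving() it returns 0. The flip side, which is the concern
raised above, is that variant B leaves every other field untouched too, so
any path that counted on starting from a zeroed structure may now see stale
values.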