From: Gavin Shan <gwshan@linux.vnet.ibm.com>
To: Or Gerlitz <gerlitz.or@gmail.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>,
Linux Netdev List <netdev@vger.kernel.org>,
Amir Vadai <amirv@mellanox.com>,
David Miller <davem@davemloft.net>,
Wei Yang <weiyang@linux.vnet.ibm.com>,
Yishai Hadas <yishaih@mellanox.com>,
Jack Morgenstein <jackm@dev.mellanox.co.il>
Subject: Re: [PATCH] net/mlx4: Fix EEH recovery failure
Date: Wed, 26 Nov 2014 09:21:29 +1100
Message-ID: <20141125222128.GA7213@shangw>
In-Reply-To: <CAJ3xEMi9G+tFsaANwndhmOZ78gt79WZ35Oq4CiFOm1WqxBXyqQ@mail.gmail.com>
On Wed, Nov 26, 2014 at 12:00:31AM +0200, Or Gerlitz wrote:
>On Mon, Nov 24, 2014 at 11:55 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Mon, Nov 24, 2014 at 11:17:55PM +0200, Or Gerlitz wrote:
>>>On Sat, Nov 22, 2014 at 12:56 PM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>>>> The patch fixes a couple of EEH recovery failures on the PPC PowerNV
>>>> platform:
>>>
>>>> * Don't clear the struct mlx4_priv instance in mlx4_pci_err_detected().
>>>> Otherwise, __mlx4_init_one() crashes the kernel by dereferencing
>>>> a NULL pointer.
>>>
>>>I don't see this change in the patch. What I see is that the clearing
>>>of mlx4_priv is removed from __mlx4_unload_one. Please clarify. Also,
>>>is this patch based on, and targeted at, the net or the net-next tree?
>>>
>>
>> Yes, it should read: don't clear the struct mlx4_priv instance in
>> mlx4_unload_one(), which is called by mlx4_pci_err_detected().
>
>
>But the struct mlx4_priv instance is cleared in mlx4_unload_one() for
>a reason. I suspect that you might have made the EEH callback work but
>broken something else. For example, did you make sure that kexec still
>works after your changes as it did before?
>
No, I haven't tried kexec yet. I'll give it a try, thanks!
Gavin
>> It's based on 3.18-rc5, with a couple of my EEH fixes applied on top.
>> I hit the issue when testing EEH on that tree.
>
>>>> With the patch applied, EEH recovery for the mlx4 adapter succeeds
>>>> on the PPC PowerNV platform.
>>>>
>>>> # lspci
>>>> 0003:0f:00.0 Network controller: Mellanox Technologies \
>>>> MT27500 Family [ConnectX-3]
>>>>
>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> ---
>>>> drivers/net/ethernet/mellanox/mlx4/main.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>> index 90de6e1..e118ac9 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>>>> @@ -2809,7 +2809,6 @@ static void mlx4_unload_one(struct pci_dev *pdev)
>>>>  	kfree(dev->caps.qp1_proxy);
>>>>  	kfree(dev->dev_vfs);
>>>>
>>>> -	memset(priv, 0, sizeof(*priv));
>>>>  	priv->pci_dev_data = pci_dev_data;
>>>>  	priv->removed = 1;
>>>>  }
>>>> @@ -2900,6 +2899,8 @@ static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev,
>>>>  					      pci_channel_state_t state)
>>>>  {
>>>>  	mlx4_unload_one(pdev);
>>>> +	pci_release_regions(pdev);
>>>> +	pci_disable_device(pdev);
>>>>
>>>>  	return state == pci_channel_io_perm_failure ?
>>>>  		PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
>>>> --
>
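
The trade-off being debated (wiping the private structure on unload versus keeping it for a later re-init) can be modelled in plain user-space C. This is only a sketch under stated assumptions: struct fake_priv, its pdev_name field, and the three functions are hypothetical stand-ins and are not the mlx4 driver's real data structures or code paths. It shows that a memset() followed by restoring one field leaves every other pointer NULL for whatever runs next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a driver's private state (NOT struct mlx4_priv). */
struct fake_priv {
	const char *pdev_name;	/* models a pointer that is set once at probe time */
	int pci_dev_data;	/* the one field the unload path saves and restores */
	int removed;
};

/* Models an unload path that wipes the struct and restores a single field. */
void unload_with_memset(struct fake_priv *priv)
{
	int pci_dev_data;

	assert(priv != NULL);
	pci_dev_data = priv->pci_dev_data;
	memset(priv, 0, sizeof(*priv));
	priv->pci_dev_data = pci_dev_data;
	priv->removed = 1;
}

/* Models the same unload path with the memset() dropped. */
void unload_without_memset(struct fake_priv *priv)
{
	assert(priv != NULL);
	priv->removed = 1;
}

/*
 * Models a re-init path that depends on the probe-time pointer.
 * Returns 0 on success and -1 where real kernel code, lacking this
 * check, would dereference NULL.
 */
int reinit(struct fake_priv *priv)
{
	if (priv->pdev_name == NULL)
		return -1;
	priv->removed = 0;
	return 0;
}
```

In this model, reinit() fails after unload_with_memset() and succeeds after unload_without_memset(). The reverse concern also shows up: without the memset(), any stale state left in the structure survives into the next init, so that path has to tolerate it.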