Subject: Re: [PATCH v3 1/1] PCI/ERR: Fix reset logic in pcie_do_recovery() call
From: Sinan Kaya
To: "Kuppuswamy, Sathyanarayanan", Bjorn Helgaas
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, ashok.raj@intel.com, Jay Vosburgh
Date: Mon, 28 Sep 2020 07:17:24 -0400
In-Reply-To: <3bc0fd23-8ddd-32c5-1dd9-4d5209ea68c3@linux.intel.com>

On 9/27/2020 10:43 PM, Kuppuswamy, Sathyanarayanan wrote:
> FATAL + no-hotplug - In this case, link will still be reseted. But
> currently driver state is not properly restored. So I attempted
> to restore it using pci_reset_bus().
>
>          status = reset_link(dev);
> -        if (status != PCI_ERS_RESULT_RECOVERED) {
> +        if (status == PCI_ERS_RESULT_RECOVERED) {
> +            status = PCI_ERS_RESULT_NEED_RESET;
>
> ...
>
>      if (status == PCI_ERS_RESULT_NEED_RESET) {
>          /*
> -         * TODO: Should call platform-specific
> -         * functions to reset slot before calling
> -         * drivers' slot_reset callbacks?
> +         * TODO: Optimize the call to pci_reset_bus()
> +         *
> +         * There are two components to pci_reset_bus().
> +         *
> +         * 1. Do platform specific slot/bus reset.
> +         * 2. Save/Restore all devices in the bus.
> +         *
> +         * For hotplug capable devices and fatal errors,
> +         * device is already in reset state due to link
> +         * reset. So repeating platform specific slot/bus
> +         * reset via pci_reset_bus() call is redundant. So
> +         * can optimize this logic and conditionally call
> +         * pci_reset_bus().
>           */
> +        pci_reset_bus(dev);

I think we have to go to remove/rescan for this case, as you also
mentioned above. There is no state to save: all BAR assignments are
gone, and the entire device programming is lost as well.

I don't think pci_reset_bus() can recover from this situation safely.
It will make things worse by saving/restoring the hardware default
state.

This remove/rescan logic should be inside DPC's slot_reset()
function BTW. Not here.
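
To illustrate the ordering problem, here is a rough user-space sketch. It is a toy model only, not kernel code: struct toy_dev, TOY_CMD_MEMORY and every toy_*() helper are hypothetical names made up for this example, and the "save/restore" step only stands in for the save/restore component of pci_reset_bus() described in the quoted comment. It assumes the link reset has already wiped the device before anything gets saved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Memory Space Enable command bit. */
#define TOY_CMD_MEMORY 0x2u

/* Hypothetical toy model of a device's programmed state. */
struct toy_dev {
	uint32_t bar0;    /* BAR assignment; 0 models the hardware default */
	uint16_t command; /* command register; 0 models decoding disabled */
};

struct toy_saved {
	struct toy_dev copy;
	bool valid;
};

/* A link reset after a fatal error puts the device back to defaults. */
static void toy_link_reset(struct toy_dev *dev)
{
	memset(dev, 0, sizeof(*dev));
}

/* Save whatever the device holds right now. */
static void toy_save_state(const struct toy_dev *dev, struct toy_saved *s)
{
	s->copy = *dev;
	s->valid = true;
}

/* Write a previously saved snapshot back to the device. */
static void toy_restore_state(struct toy_dev *dev, const struct toy_saved *s)
{
	assert(s->valid);
	*dev = s->copy;
}

/*
 * Models the sequence objected to above: the link reset comes first,
 * so the later save only captures hardware defaults, and the restore
 * writes those defaults straight back.
 */
static void toy_reset_then_save_restore(struct toy_dev *dev)
{
	struct toy_saved s = { .valid = false };

	toy_link_reset(dev);        /* state is already lost here */
	toy_save_state(dev, &s);    /* captures defaults, not programming */
	toy_link_reset(dev);        /* the redundant second reset */
	toy_restore_state(dev, &s); /* restores defaults */
}

/* Models remove/rescan: drop stale state, then assign resources afresh. */
static void toy_remove_rescan(struct toy_dev *dev, uint32_t new_bar0)
{
	memset(dev, 0, sizeof(*dev));
	dev->bar0 = new_bar0;
	dev->command = TOY_CMD_MEMORY;
}

/* A device is usable only with a BAR assigned and decoding enabled. */
static bool toy_dev_usable(const struct toy_dev *dev)
{
	return dev->bar0 != 0 && (dev->command & TOY_CMD_MEMORY) != 0;
}
```

In this model, a device that goes through toy_reset_then_save_restore() ends up with bar0 == 0 and is not usable, while toy_remove_rescan() leaves it with a fresh assignment. The real remove/rescan path does far more than this (driver unbind, enumeration, resource allocation); the sketch only shows why a snapshot taken after the reset has nothing useful in it.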