From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keller, Jacob E
Date: Tue, 27 Sep 2016 22:41:59 +0000
Subject: [Intel-wired-lan] [PATCH net] i40e: avoid NULL pointer dereference and recursive errors on early PCI error
In-Reply-To: <1475010871-31682-1-git-send-email-gpiccoli@linux.vnet.ibm.com>
References: <1475010871-31682-1-git-send-email-gpiccoli@linux.vnet.ibm.com>
Message-ID: <1475016118.22933.1.camel@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: intel-wired-lan@osuosl.org
List-ID:

On Tue, 2016-09-27 at 18:14 -0300, Guilherme G. Piccoli wrote:
> Although rare, it's possible to hit PCI error early on device
> probe, meaning possibly some structs are not entirely initialized,
> and some might even be completely uninitialized, leading to NULL
> pointer dereference.
>
> The i40e driver currently presents a "bad" behavior if device hits
> such early PCI error: firstly, the struct i40e_pf might not be
> attached to pci_dev yet, leading to a NULL pointer dereference on
> access to pf->state.
>

Oops! Nice find!

> Even checking if the struct is NULL and avoiding the access in that
> case isn't enough, since the driver cannot recover from PCI error
> that early; in our experiments we saw multiple failures on kernel
> log, like:
>
>   [549.664] i40e 0007:01:00.1: Initial pf_reset failed: -15
>   [549.664] i40e: probe of 0007:01:00.1 failed with error -15
>   [...]
>   [871.644] i40e 0007:01:00.1: The driver for the device stopped because the
>   device firmware failed to init. Try updating your NVM image.
>   [871.644] i40e: probe of 0007:01:00.1 failed with error -32
>   [...]
>   [872.516] i40e 0007:01:00.0: ARQ: Unknown event 0x0000 ignored
>
> Between the first probe failure (error -15) and the second (error -32)
> another PCI error happened due to the first bad probe. Also, driver
> started to flood console with those ARQ event messages.
>
> This patch will prevent these issues by allowing error recovery
> mechanism to remove the failed device from the system instead of
> trying to recover from early PCI errors during device probe.
>

This seems reasonable.

> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index d0b3a1b..dad15b6 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -11360,6 +11360,12 @@ static pci_ers_result_t i40e_pci_error_detected(struct pci_dev *pdev,
>  
>  	dev_info(&pdev->dev, "%s: error %d\n", __func__, error);
>  
> +	if (!pf) {
> +		dev_info(&pdev->dev,
> +			 "Cannot recover - error happened during device probe\n");
> +		return PCI_ERS_RESULT_DISCONNECT;
> +	}
> +

Looks good to me.

Acked-by: Jacob Keller <jacob.e.keller@intel.com>

Thanks for the bug fix and detailed explanation!

Regards,
Jake

>  	/* shutdown all operations */
>  	if (!test_bit(__I40E_SUSPENDED, &pf->state)) {
>  		rtnl_lock();
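
For illustration only, the guard added in the quoted hunk can be mimicked in plain userspace C. This is a minimal sketch, not the driver code: `mock_pf`, `mock_pci_dev`, `mock_error_detected` and the `ERS_RESULT_*` values are hypothetical stand-ins for the kernel's `struct i40e_pf`, `struct pci_dev`, `i40e_pci_error_detected()` and `pci_ers_result_t`, and the "need reset" result on the normal path is an assumption made for the mock.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum ers_result {
	ERS_RESULT_NEED_RESET,	/* assumed result of the normal recovery path */
	ERS_RESULT_DISCONNECT	/* give up and let the device be removed */
};

struct mock_pf {
	unsigned long state;
	int shutdown_calls;	/* counts how often the normal path ran */
};

struct mock_pci_dev {
	struct mock_pf *drvdata;	/* NULL until probe attaches the private struct */
};

/* Mirrors the shape of the patched error handler. */
enum ers_result mock_error_detected(struct mock_pci_dev *pdev)
{
	struct mock_pf *pf = pdev->drvdata;

	/* Error arrived before probe attached pf: do not touch pf->state. */
	if (!pf) {
		fprintf(stderr,
			"Cannot recover - error happened during device probe\n");
		return ERS_RESULT_DISCONNECT;
	}

	/* Normal path: pf is valid, so shutting down operations is safe. */
	pf->shutdown_calls++;
	return ERS_RESULT_NEED_RESET;
}
```

Calling `mock_error_detected()` on a device whose `drvdata` is still NULL returns the disconnect result without dereferencing `pf`, which is the behavior the patch description asks for on early PCI errors.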