From mboxrd@z Thu Jan 1 00:00:00 1970 From: gpiccoli@linux.vnet.ibm.com (Guilherme G. Piccoli) Date: Wed, 5 Apr 2017 17:01:15 -0300 Subject: [PATCH] nvme: avoid NULL pointer dereference in error recovery path In-Reply-To: <20170405194357.GA11705@lst.de> References: <20170405194037.1019-1-gpiccoli@linux.vnet.ibm.com> <20170405194357.GA11705@lst.de> Message-ID: On 04/05/2017 04:43 PM, Christoph Hellwig wrote: > On Wed, Apr 05, 2017@04:40:37PM -0300, Guilherme G. Piccoli wrote: >> It's possible that driver fails to recover from a PCI error and the >> PCI core (or arch PCI specifics, like EEH in PowerPC) starts a process >> of device removal. While this removal process is happening, if another >> PCI error is triggered, we might have a NULL address for >> "struct *nvme_dev", pointed by "pci_dev *driver_data" - for example this >> happens if nvme_remove() already have set that pci_dev struct's field >> to NULL. >> >> In this case, the driver error handler functions will dereferece a NULL >> pointer, causing a kernel oops. This patch checks for NULL pointer on >> error handlers and in case "driver_data" points to NULL, it aborts the >> error recovery path and return a fail error value to PCI core. > > I think this needs to be fixed at a higher level, that is the PCI > core. Once you have the callbacks run in parallel a simple null check > isn't going to fix this but every single access to the structure > is a possible use after free. > Christoph, your point makes sense. It's possible (I guess not expected, but at least feasible) to have a 2nd PCI error being triggered while a 1st error is in recovery/removal-on-failure process, after the slot_reset() step of the 1st error specifically. Many drivers around do these NULL pointer checks - solving this in EEH core would mean having a way to every driver communicate the error recovery process is completed and the next error could be "issued". Like PCI core would "stack" the next error. But...what if the device has really a severe issue in its slot, and it's unable to recover at all? So, it'll trigger another error and naturally the callbacks are going to be called, expecting to drop recoveries and signal to PCI core something went real bad. So, I believe these NULL checks will avoid at least this dereference case and enforce signaling to PCI core to remove the device if this "racy" error sequence happens after a device recovery failure is in process. If you see another paths that multiple PCI errors happening in sequence might cause trouble to nvme driver that I'm not aware, please let me know and we can think a better solution. Also, if you have ideas on how to address this in a more elegant/generic way, let me know too, I really appreciate. Guess this patch is the most simplistic approach, at least the one I could think of. Thanks for the quick review! Cheers, Guilherme