From: Lukas Wunner <lukas@wunner.de>
To: gokul cg <gokuljnpr@gmail.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>,
Bjorn Helgaas <helgaas@kernel.org>,
Ashok Raj <ashok.raj@intel.com>,
Keith Busch <keith.busch@intel.com>,
Yinghai Lu <yinghai@kernel.org>, Sinan Kaya <okaya@kernel.org>,
linux-pci@vger.kernel.org,
Alexandru Gagniuc <mr.nuke.me@gmail.com>
Subject: Re: [PATCH] PCI: pciehp: Differentiate between surprise and safe removal
Date: Thu, 2 Aug 2018 10:46:57 +0200
Message-ID: <20180802084657.GA21267@wunner.de>
In-Reply-To: <CAFP4jM8AYG7hmkC_rYgXAfLoJmkJuW0e1UbgiayGrCPbb_yw8A@mail.gmail.com>
On Thu, Aug 02, 2018 at 12:59:18PM +0530, gokul cg wrote:
> I am suspecting a possible race condition in the kernel between PCI driver
> and AER handling.
>
> As a result, a kernel panic occurs in the worker thread which handles the
> bottom half of the AER IRQ.
>
> I am seeing this issue when I suddenly power off a PCI card which
> has PCIe AER error reporting enabled.
>
> While powering off PCI device, AER driver will get AER IRQ for the device,
> from AER IRQ handler, it will cache AER error code and schedule worker
> thread to handle error.
>
> The PCIe device will get removed from PCI tree before worker thread
> completes its task and kernel panic is happening when worker thread tries
> to access PCI device's config space.
>
> #5 [ffff88027469fc70] general_protection at ffffffff8176cdf2
> [exception RIP: pci_bus_read_config_dword+100]
> #6 [ffff88027469fd50] pci_find_next_ext_capability at ffffffff81345d7b
> #7 [ffff88027469fd90] pci_find_ext_capability at ffffffff81347225
> #8 [ffff88027469fda0] get_device_error_info at ffffffff81356c4d
> #9 [ffff88027469fdd0] aer_isr at ffffffff81357a38
> #10 [ffff88027469fe28] process_one_work at ffffffff8105d4c0
> #11 [ffff88027469fe70] worker_thread at ffffffff8105e251
> #12 [ffff88027469fed0] kthread at ffffffff81064260
> #13 [ffff88027469ff50] ret_from_fork at ffffffff81773a38
>
> I have tested it on kernel 3.10, but from the source I can see that this
> case is still relevant for the latest Linux source.
I'm not really familiar with the AER driver, but the problem is
actually easy to spot:
find_source_device() walks the hierarchy and saves pointers to the
pci_dev's it finds in an array, without taking a reference on them.
That array is later traversed and the pci_dev's are accessed, by
which time they may already have been removed.
The solution is to acquire a ref on each device in add_error_device():
- e_info->dev[e_info->error_dev_num] = dev;
+ e_info->dev[e_info->error_dev_num] = pci_dev_get(dev);
Then release the ref in aer_process_err_devices() by calling pci_dev_put().
I believe there's an ongoing refactoring of the AER driver and the
issue may be addressed in the course of it, but as a quick fix for
an ancient v3.10 kernel, the above should do the trick.
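To illustrate why taking a ref closes the race, here is a minimal userspace
sketch of the pattern. It is not kernel code: struct mock_dev, mock_dev_get(),
mock_dev_put(), struct mock_err_info, MAX_ERROR_DEVICES and
demo_surprise_removal() are hypothetical stand-ins invented for this example,
and "released" merely marks the point where the real structure would be freed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ERROR_DEVICES 5	/* hypothetical array size for this sketch */

/* Userspace stand-in for struct pci_dev: just a reference count. */
struct mock_dev {
	int refcount;
	int released;	/* set where the real structure would be freed */
};

/* Stand-in for pci_dev_get(): take a reference, return the device. */
static struct mock_dev *mock_dev_get(struct mock_dev *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Stand-in for pci_dev_put(): drop a reference, release on the last one. */
static void mock_dev_put(struct mock_dev *dev)
{
	if (dev && --dev->refcount == 0)
		dev->released = 1;
}

/* Stand-in for the error info that caches the affected devices. */
struct mock_err_info {
	int error_dev_num;
	struct mock_dev *dev[MAX_ERROR_DEVICES];
};

/* Cache the device and hold a reference for as long as it is cached. */
static int add_error_device(struct mock_err_info *e_info, struct mock_dev *dev)
{
	if (e_info->error_dev_num >= MAX_ERROR_DEVICES)
		return -1;
	e_info->dev[e_info->error_dev_num] = mock_dev_get(dev);
	e_info->error_dev_num++;
	return 0;
}

/*
 * Deferred work: access each cached device, then drop the reference.
 * Returns how many devices were still valid when they were accessed.
 */
static int process_err_devices(struct mock_err_info *e_info)
{
	int i, alive = 0;

	for (i = 0; i < e_info->error_dev_num; i++) {
		if (!e_info->dev[i]->released)
			alive++;
		mock_dev_put(e_info->dev[i]);
		e_info->dev[i] = NULL;
	}
	e_info->error_dev_num = 0;
	return alive;
}

/* Simulate removal happening between caching and the deferred work. */
static int demo_surprise_removal(void)
{
	struct mock_dev dev = { .refcount = 1, .released = 0 };
	struct mock_err_info e_info = { 0 };
	int alive;

	add_error_device(&e_info, &dev);	/* refcount: 2 */
	mock_dev_put(&dev);			/* removal drops its ref: 1 */
	alive = process_err_devices(&e_info);	/* still valid, then 0 */

	return alive == 1 && dev.released == 1;
}
```

In this sketch demo_surprise_removal() returns 1: the cached device stays
valid until the deferred work has finished with it, and is released only
when the last reference is dropped.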
HTH,
Lukas
Thread overview: 17+ messages
2018-07-31 5:50 [PATCH] PCI: pciehp: Differentiate between surprise and safe removal Lukas Wunner
2018-08-01 16:43 ` Mika Westerberg
2018-08-01 17:15 ` Lukas Wunner
2018-08-01 19:09 ` Alex G.
2018-08-02 7:20 ` Mika Westerberg
2018-08-02 7:29 ` gokul cg
2018-08-02 8:46 ` Lukas Wunner [this message]
2018-08-02 12:28 ` gokul cg
2018-08-02 15:07 ` Lukas Wunner
2018-08-02 17:09 ` Thomas Tai
2018-08-06 18:33 ` gokul cg
2018-08-07 14:26 ` Thomas Tai
2018-08-07 15:30 ` Thomas Tai
2018-08-08 9:59 ` gokul cg
2018-08-08 11:21 ` gokul cg
2018-08-08 20:49 ` Thomas Tai
2018-09-04 17:53 ` Bjorn Helgaas