From: Jack Pham <jackp@codeaurora.org>
To: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Peter Chen <peter.chen@nxp.com>,
mathias.nyman@intel.com, linux-usb@vger.kernel.org,
linux-imx@nxp.com, jun.li@nxp.com, Joel Stanley <joel@jms.id.au>,
Mayank Rana <mrana@codeaurora.org>
Subject: [1/3] usb: host: xhci: fix oops when removing hcd
Date: Wed, 26 Sep 2018 18:39:48 -0700 [thread overview]
Message-ID: <20180927013948.GB17520@jackp-linux.qualcomm.com> (raw)
On Wed, Sep 26, 2018 at 06:34:30PM -0700, Jack Pham wrote:
> Hi Mathias,
>
> On Wed, Sep 26, 2018 at 02:51:40PM +0300, Mathias Nyman wrote:
> > Hi Jack, Peter
> >
> > On 24.09.2018 19:37, Jack Pham wrote:
> > >Hi Peter,
> > >
> > >On Fri, Sep 21, 2018 at 09:48:43AM +0800, Peter Chen wrote:
> > >>Type-C-to-A cable, and the USB3 HCD has already been NULL at that time.
> > >>The oops log like below:
> > >>
> > >>[681.782288] xhci-hcd xhci-hcd.1.auto: remove, state 1
> > >>[681.787490] usb usb4: USB disconnect, device number 1
> > >>[681.792808] usb 4-1: USB disconnect, device number 2
> > >>[681.818089] xhci-hcd xhci-hcd.1.auto: USB bus 4 deregistered
> > >>[681.823803] Unable to handle kernel NULL pointer dereference at virtual address 000000a0
> > >>[681.823806] Mem abort info:
> > >>[681.823809] Exception class = DABT (current EL), IL = 32 bits
> > >>[681.823811] SET = 0, FnV = 0
> > >>[681.823813] EA = 0, S1PTW = 0
> > >>[681.823814] Data abort info:
> > >>[681.823816] ISV = 0, ISS = 0x00000004
> > >>[681.823818] CM = 0, WnR = 0
> > >>[681.823822] user pgtable: 4k pages, 48-bit VAs, pgd = ffff8000ae3fd000
> > >>[681.823824] [00000000000000a0] *pgd=0000000000000000
> > >>[681.823829] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > >>[681.823832] Modules linked in: 8021q garp stp mrp crc32_ce qca6174(O) crct10dif_ce galcore(O)
> > >>[681.823849] CPU: 0 PID: 94 Comm: kworker/0:1 Tainted: G O 4.14.62-imx_4.14.y+gcd63def #1
> > >>[681.823851] Hardware name: Freescale i.MX8MQ EVK (DT)
> > >>[681.823862] Workqueue: events_freezable __dwc3_set_mode
> > >>[681.823865] task: ffff8000b8a18000 task.stack: ffff00000a010000
> > >>[681.823872] PC is at xhci_irq+0x5fc/0x14b8
> > >>[681.823875] LR is at xhci_irq+0x3c/0x14b8
> > >
> > ><snip>
> > >
> > >>diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > >>index f0a99aa0ac58..2dc5176b79d0 100644
> > >>--- a/drivers/usb/host/xhci-ring.c
> > >>+++ b/drivers/usb/host/xhci-ring.c
> > >>@@ -2680,7 +2680,8 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
> > >> }
> > >> if (xhci->xhc_state & XHCI_STATE_DYING ||
> > >>- xhci->xhc_state & XHCI_STATE_HALTED) {
> > >>+ xhci->xhc_state & XHCI_STATE_HALTED ||
> > >>+ xhci->xhc_state & XHCI_STATE_REMOVING) {
> > >> xhci_dbg(xhci, "xHCI dying, ignoring interrupt. "
> > >> "Shouldn't IRQs be disabled?\n");
> > >> /* Clear the event handler busy flag (RW1C);
> > >
> > >We also noticed the same crash as you found, and tried to fix it in a
> > >similar way, but noticed that this still allows a memory leak to happen.
> > >
> > >It seems from commit fe190ed0d602a ("xhci: Do not halt the host until
> > >both HCD have disconnected their devices.") this was added to
> > >xhci_stop(), and is the reason we encounter the NULL pointer in
> > >xhci_irq() when it tries to access xhci->shared_hcd.
> > >
> > > + /* Only halt host and free memory after both hcds are removed */
> > > if (!usb_hcd_is_primary_hcd(hcd)) {
> > > + /* usb core will free this hcd shortly, unset pointer */
> > > + xhci->shared_hcd = NULL;
> > > mutex_unlock(&xhci->mutex);
> > > return;
> > > }
> > >
> > >While your fix will simply abort the xhci_irq() function if it
> > >encounters XHCI_STATE_REMOVING, during xhci_plat_remove():
> > >
> > > static int xhci_plat_remove(struct platform_device *dev)
> > > {
> > > struct usb_hcd *hcd = platform_get_drvdata(dev);
> > > struct xhci_hcd *xhci = hcd_to_xhci(hcd);
> > > struct clk *clk = xhci->clk;
> > > struct clk *reg_clk = xhci->reg_clk;
> > >
> > > xhci->xhc_state |= XHCI_STATE_REMOVING;
> > >
> > > usb_remove_hcd(xhci->shared_hcd);
> > > ^^^^^^^^^ calls xhci_stop() and sets shared_hcd=NULL
> > > usb_phy_shutdown(hcd->usb_phy);
> > >
> > > usb_remove_hcd(hcd);
> > > usb_put_hcd(xhci->shared_hcd);
> > > ^^^^^^^^^^^ shared_hcd==NULL, so this is a no-op
> > >
> > >Since usb_put_hcd() doesn't get called for shared_hcd, we end up
> > >with one additional kref count and hence a leak.
> >
> > Nice catch, this same issue exists in
> >
> > xhci_pci_remove()
> > tegra_xusb_remove()
> > xhci_mtk_remove()
> > xhci_histb_remove()
> >
> >
> > >
> > >Wondering if we need to also remove the xhci->shared_hcd = NULL from
> > >xhci_stop(), in addition to your patches. Thoughts?
> >
> > At some point the xhci->shared_hcd needs to be set NULL, it can be done in
> > in the xhci_plat_remove(), xhci_pci_remove() and the similar remove functions
> > after calling usb_remove_hcd(). we can't rely on xhci->shared_hcd after it
> > has been removed.
>
> Currently it is already done as follows:
>
> usb_remove_hcd(xhci->shared_hcd)
> -> xhci_stop()
> /* Only halt host and free memory after both hcds are removed */
> if (!usb_hcd_is_primary_hcd(hcd)) {
> /* usb core will free this hcd shortly, unset pointer */
> xhci->shared_hcd = NULL;
>
> I guess at the very least your suggestion to do it from the
> xhci_{plat,pci,mtk,tegra}_remove() context makes it more explicit rather
> than having it as a side effect of usb_remove_hcd()/xhci_stop(). The net
> effect is more or less the same though IMO.
D'oh. Of course it makes sense to move the NULL assignment to the caller
*after* calling usb_put_hcd() so it can be properly freed.
Jack
next reply other threads:[~2018-09-27 1:39 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-27 1:39 Jack Pham [this message]
-- strict thread matches above, loose matches on Subject: below --
2018-09-27 1:34 [1/3] usb: host: xhci: fix oops when removing hcd Jack Pham
2018-09-26 11:51 Mathias Nyman
2018-09-26 2:22 Peter Chen
2018-09-24 16:37 Jack Pham
2018-09-21 1:48 Peter Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180927013948.GB17520@jackp-linux.qualcomm.com \
--to=jackp@codeaurora.org \
--cc=joel@jms.id.au \
--cc=jun.li@nxp.com \
--cc=linux-imx@nxp.com \
--cc=linux-usb@vger.kernel.org \
--cc=mathias.nyman@intel.com \
--cc=mathias.nyman@linux.intel.com \
--cc=mrana@codeaurora.org \
--cc=peter.chen@nxp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).