public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Ethan Zhao <haifeng.zhao@linux.intel.com>
Cc: bhelgaas@google.com, baolu.lu@linux.intel.com,
	dwmw2@infradead.org, will@kernel.org, robin.murphy@arm.com,
	linux-pci@vger.kernel.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	Haorong Ye <yehaorong@bytedance.com>
Subject: Re: [PATCH v4 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected
Date: Fri, 22 Dec 2023 09:14:27 +0100	[thread overview]
Message-ID: <20231222081427.GA4134@wunner.de> (raw)
In-Reply-To: <02619a5c-842c-4441-85cb-0f7151705a5d@linux.intel.com> <589b2dbc-325b-404f-a387-b1c99a064d15@linux.intel.com> <cc6f7c1a-13f0-475a-9961-e22e73b13a32@linux.intel.com> <8fbd1a86-1ef5-4679-a4d9-b4faee2eda64@linux.intel.com> <94b08bab-6488-4c4a-9742-30a69972ba50@linux.intel.com>

On Fri, Dec 22, 2023 at 09:56:39AM +0800, Ethan Zhao wrote:
> I don't know if the polling along sleeping for completion of meanningless
> devTLB invalidation request blindly sent to (removed/powered down/link down)
> device makes sense or not.

If you have a way to get to the struct pci_dev * which you're waiting for
in qi_submit_sync() then I guess you could check for its presence and bail
out if it's gone, instead of issuing a cpu_relax().


> > Again, the proposed patch is not a proper solution.  It will paper over
> > the issue most of the time but every once in a while someone will still
> > get a hard lockup splat and it will then be more difficult to reproduce
> > and fix if the proposed patch is accepted.
> 
> Could you point out why is not proper ? Is there any other window
> the hard lockup still could happen with the ATS capable devcie
> supprise_removal case if we checked the connection state first ?
> Please help to elaberate it.

Even though user space may have initiated orderly removal via sysfs,
the device may be yanked from the slot (surprise removed) while the
orderly removal is happening.


> Yes, this is the old kernel stack trace, but customer also tried lasted
> 6.7rc4

If you could provide a stacktrace for a contemporary kernel,
I think that would be preferred.


> (doesn't work) and the patched 6.7rc4 (fixed).

Why is it fixed in v6.7-rc4?  Is the present patch thus unnecessary?


> > Finally, it is common to adhere to terms
> > used in the PCIe Base Spec in commit messages, so "ATC Invalidate Request"
> > might be preferable to "devTLB flush request".
> 
> ATS Invalidate Request ? devTLB flush request has the same meaning,
> 
> I thought all iommu/PCIe guys could understand.

I'm just pointing out the preferred way to write commit messages
in the PCI subsystem (as I've perceived it over the years) so that
you can reduce the number of iterations you have to go through
due to maintainer feedback.  I'm just trying to be helpful.


> How to define the point "some" msec to timeout while software
> break out the waiting loop ? or polling if the target is gone ?

I'd say adhere to the 1 min + 50% number provided in the spec.

If you know the device is gone before that then you can break out
of the loop in qi_submit_sync() of course.

The question is, does the Intel IOMMU have a timeout at all for
Invalidate Requests?  I guess we don't really know that because
in the stack trace you've provided, the watchdog stops the machine
before a timeout occurs.  So it's at least 12 sec.  Or there's
no timeout at all.

If the Intel IOMMU doesn't enforce a timeout, you should probably amend
qi_submit_sync() to break out of the loop once the 1 min + 50% limit
is exceeded.  And you need to amend the function to sleep instead of
polling in interrupt context.

Can you check with hardware engineers whether there's a timeout?

Thanks,

Lukas

  reply	other threads:[~2023-12-22  8:14 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-20  0:51 [PATCH v4 0/2] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-20  0:51 ` [PATCH v4 1/2] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-20  0:51 ` [PATCH v4 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected Ethan Zhao
2023-12-21 10:39   ` Lukas Wunner
2023-12-21 11:01     ` Lukas Wunner
2023-12-22  2:08       ` Ethan Zhao
2023-12-22  3:56       ` Ethan Zhao
2023-12-22  1:56     ` Ethan Zhao
2023-12-22  8:14       ` Lukas Wunner [this message]
2023-12-22  9:01         ` Ethan Zhao
  -- strict thread matches above, loose matches on Subject: below --
2023-12-13  3:46 [PATCH RFC 0/2] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-13  3:46 ` [PATCH 1/2] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-13 10:49   ` Lukas Wunner
2023-12-14  0:58     ` Ethan Zhao
2023-12-21 10:51       ` Lukas Wunner
2023-12-22  2:35         ` Ethan Zhao
2023-12-13  3:46 ` [PATCH 2/2] iommu/vt-d: don's issue devTLB flush request when device is disconnected Ethan Zhao
2023-12-13 10:44   ` Lukas Wunner
2023-12-13 11:54     ` Robin Murphy
2023-12-14  2:40       ` Ethan Zhao
2023-12-21 10:42       ` Lukas Wunner
2023-12-21 11:01         ` Robin Murphy
2023-12-21 11:07           ` Lukas Wunner
2023-12-22  3:20         ` Ethan Zhao
2023-12-14  2:16     ` Ethan Zhao
2023-12-15  0:43       ` Ethan Zhao
2023-12-13 11:59   ` Baolu Lu
2023-12-14  2:26     ` Ethan Zhao
2023-12-15  1:03       ` Ethan Zhao
2023-12-15  1:34         ` Baolu Lu
2023-12-15  1:51           ` Ethan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231222081427.GA4134@wunner.de \
    --to=lukas@wunner.de \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=dwmw2@infradead.org \
    --cc=haifeng.zhao@linux.intel.com \
    --cc=iommu@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    --cc=yehaorong@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox