From: Lukas Wunner <lukas@wunner.de>
To: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>,
peterhuewe@gmx.de, jgg@ziepe.ca, jsnitsel@redhat.com,
hdegoede@redhat.com, oe-lkp@lists.linux.dev, lkp@intel.com,
peter.ujfalusi@linux.intel.com, peterz@infradead.org,
linux@mniewoehner.de, linux-integrity@vger.kernel.org,
linux-kernel@vger.kernel.org, l.sanfilippo@kunbus.com,
p.rosenberger@kunbus.com
Subject: Re: [PATCH 1/2] tpm, tpm_tis: Handle interrupt storm
Date: Tue, 23 May 2023 21:37:45 +0200 [thread overview]
Message-ID: <20230523193745.GA5820@wunner.de> (raw)
In-Reply-To: <CSTVVFNKUVJW.P69FKI6IF3ZN@suppilovahvero>
On Tue, May 23, 2023 at 09:53:58PM +0300, Jarkko Sakkinen wrote:
> On Mon May 22, 2023 at 5:31 PM EEST, Lino Sanfilippo wrote:
> > Commit e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") enabled
> > interrupts instead of polling on all capable TPMs. Unfortunately, on some
> > products the interrupt line is either never asserted or never deasserted.
> >
> > The former causes interrupt timeouts and is detected by
> > tpm_tis_core_init(). The latter results in interrupt storms.
[...]
> > The current approach to avoid those storms is to disable interrupts by
> > adding a DMI quirk for the concerned device.
> >
> > However this is a maintenance burden in the long run, so use a generic
> > approach:
>
> I'm trying to comprehend how you evaluate, how big maintenance burden
> this would be. Adding even a few dozen table entries is not a
> maintenance burden.
>
> On the other hand any new functionality is objectively a maintanance
> burden of some measure (applies to any functionality). So how do we know
> that taking this change is less of a maintenance burden than just add
> new table entries, as they come up?
That approach seems irresponsible to me as you force users to report
the issue, you force developers to add a quirk. Users expect things
to just work. And that's precisely what this patch will achieve.
Lino introduced the issues by universally enabling interrupts.
He acted responsibly by making the issue disappear most likely
for everyone. I find it bewildering that you'd prefer the opposite
approach and rather inflict on every affected user to open a
bug report and hope that someone writes a patch for them.
So far interrupts are only enabled in an rc release and linux-next,
yet issue reports have popped up quickly. My expectation is we'll
see a lot more reports once v6.4-final is released, doubly so when
the next stable kernel with that feature is released.
> > Detect an interrupt storm by counting the number of unhandled interrupts
> > within a 10 ms time interval. In case that more than 1000 were unhandled
> > deactivate interrupts, deregister the handler and fall back to polling.
>
> I know it can be sometimes hard to evaluate but can you try to explain
> how you came up to the 10 ms sampling period and 1000 interrupt
> threshold? I just don't like abritrary numbers.
If you choose a too low number, you'll get false positives due to
regular interrupt assertion either by the TPM or by another device
sharing the interrupt.
Those numbers may have to be fine-tuned over time to eliminate
false positives and false negatives. They're as good a number as
any to get that fine-tuning process started.
> > +unhandled:
> > + tpm_tis_process_unhandled_interrupt(chip);
> > + return IRQ_HANDLED;
>
> Shouldn't the return value be IRQ_NONE?
No, absolutely not. If you return IRQ_NONE here then genirq code
will increase the spurious interrupt counter. That's bad because
the IRQ storm detection tpm_tis_core.c would race with the IRQ storm
detection in genirq code:
Note that disablement of the interrupt must happen in a work_struct
here to avoid a deadlock. (The deadlock would occur because
devm_free_irq() waits for the interrupt handler to finish.)
Now, let's say the 1000 unhandled interrupts limit has been reached
and the work_struct is scheduled. If the work_struct isn't run
quickly enough, you may reach the 99900 limit in note_interrupt()
(see kernel/irq/spurious.c) and then genirq code will force the
interrupt off completely.
To avoid that you *have* to return IRQ_HANDLED here and thus pretend
towards genirq code that the interrupt was not spurious.
> > struct tpm_tis_data {
> > + struct tpm_chip *chip;
> > u16 manufacturer_id;
> > struct mutex locality_count_mutex;
> > unsigned int locality_count;
> > int locality;
> > + /* Interrupts */
>
> Not relevant change for a bug fix.
But not harmful either, is it?
Thanks,
Lukas
next prev parent reply other threads:[~2023-05-23 19:38 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-22 14:31 [PATCH 1/2] tpm, tpm_tis: Handle interrupt storm Lino Sanfilippo
2023-05-22 14:31 ` [PATCH 2/2] tpm, tpm_tis: reuse code in disable_interrupts() Lino Sanfilippo
2023-05-22 22:45 ` Jerry Snitselaar
2023-05-23 7:08 ` Péter Ujfalusi
2023-05-23 19:07 ` Jarkko Sakkinen
2023-05-23 20:52 ` Lino Sanfilippo
2023-05-24 1:29 ` Jarkko Sakkinen
2023-05-22 22:44 ` [PATCH 1/2] tpm, tpm_tis: Handle interrupt storm Jerry Snitselaar
2023-05-23 6:48 ` Péter Ujfalusi
2023-05-23 7:07 ` Péter Ujfalusi
2023-05-23 7:44 ` Lukas Wunner
2023-05-23 9:14 ` Péter Ujfalusi
2023-05-23 9:20 ` Hans de Goede
2023-05-23 9:35 ` Péter Ujfalusi
2023-05-23 10:35 ` Lino Sanfilippo
2023-05-23 15:19 ` Lukas Wunner
2023-05-23 10:41 ` Lino Sanfilippo
2023-05-23 15:16 ` Lukas Wunner
2023-05-24 9:08 ` Hans de Goede
2023-05-29 10:44 ` Michael Niewöhner
2023-05-23 19:35 ` Jarkko Sakkinen
2023-05-23 18:53 ` Jarkko Sakkinen
2023-05-23 19:00 ` Jarkko Sakkinen
2023-05-23 19:46 ` Lukas Wunner
2023-05-24 1:44 ` Jarkko Sakkinen
2023-05-23 20:50 ` Lino Sanfilippo
2023-05-23 19:12 ` Jarkko Sakkinen
2023-05-23 22:32 ` Jerry Snitselaar
2023-05-24 1:21 ` Jarkko Sakkinen
2023-05-23 19:37 ` Lukas Wunner [this message]
2023-05-24 1:46 ` Jarkko Sakkinen
2023-05-23 20:46 ` Lino Sanfilippo
2023-05-29 6:46 ` Péter Ujfalusi
2023-05-29 13:15 ` Lino Sanfilippo
2023-06-06 16:42 ` Jarkko Sakkinen
2023-05-30 17:56 ` Jerry Snitselaar
2023-06-06 16:35 ` Jarkko Sakkinen
2023-06-06 16:45 ` Jarkko Sakkinen
2023-05-24 3:58 ` Jarkko Sakkinen
2023-05-24 3:59 ` Jarkko Sakkinen
2023-05-24 7:29 ` Lukas Wunner
2023-05-24 15:30 ` Jarkko Sakkinen
2023-05-26 0:37 ` Lino Sanfilippo
2023-05-30 10:31 ` Lino Sanfilippo
2023-05-25 23:45 ` Lino Sanfilippo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230523193745.GA5820@wunner.de \
--to=lukas@wunner.de \
--cc=LinoSanfilippo@gmx.de \
--cc=hdegoede@redhat.com \
--cc=jarkko@kernel.org \
--cc=jgg@ziepe.ca \
--cc=jsnitsel@redhat.com \
--cc=l.sanfilippo@kunbus.com \
--cc=linux-integrity@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@mniewoehner.de \
--cc=lkp@intel.com \
--cc=oe-lkp@lists.linux.dev \
--cc=p.rosenberger@kunbus.com \
--cc=peter.ujfalusi@linux.intel.com \
--cc=peterhuewe@gmx.de \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox