linux-serial.vger.kernel.org archive mirror
From: Leonardo Bras <leobras@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Leonardo Bras" <leobras@redhat.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Jiri Slaby" <jirislaby@kernel.org>,
	"Tony Lindgren" <tony@atomide.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"John Ogness" <john.ogness@linutronix.de>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>,
	"Florian Fainelli" <florian.fainelli@broadcom.com>,
	"Shanker Donthineni" <sdonthineni@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-serial@vger.kernel.org
Subject: Re: [RFC PATCH v2 3/4] irq: Introduce IRQ_HANDLED_MANY
Date: Thu, 14 Nov 2024 00:40:17 -0300	[thread overview]
Message-ID: <ZzVxIfb5KpL97P4Q@LeoBras> (raw)
In-Reply-To: <ZdghE6TNHgZ_bi19@LeoBras>

On Fri, Feb 23, 2024 at 01:37:39AM -0300, Leonardo Bras wrote:
> On Wed, Feb 21, 2024 at 04:41:20PM +0100, Thomas Gleixner wrote:
> > On Wed, Feb 21 2024 at 02:39, Leonardo Bras wrote:
> > > On Mon, Feb 19, 2024 at 12:03:07PM +0100, Thomas Gleixner wrote:
> > >> >> In scenarios where there is no need to keep track of IRQs handled, convert
> > >> >> it back to IRQ_HANDLED.
> > >> >
> > >> > That's not really workable as you'd have to update tons of drivers just
> > >> > to deal with that corner case. That's error prone and just extra
> > >> > complexity all over the place.
> > >
> > > I agree, that's a downside of this implementation. 
> > 
> > A serious one which is not really workable. See below.
> > 
> > > I agree the above may be able to solve the issue, but it would make 2 extra 
> > > atomic ops necessary in the thread handling the IRQ, as well as one extra 
> > > atomic operation in note_interrupt(), which could increase latency on this 
> > > IRQ deferring the handler to a thread.
> > >
> > > I mean, yes, the cpu running note_interrupt() would probably already have 
> > > exclusiveness for this cacheline, but it further increases cacheline 
> > > bouncing and also adds the mem barriers that incur on atomic operations, 
> > > even if we use an extra bit from threads_handled instead of allocate a new 
> > > field for threads_running.
> > 
> > I think that's a strawman. Atomic operations can of course be more
> > expensive than non-atomic ones, but they only start to make a difference
> > when the cache line is contended. That's not the case here for the
> > normal operations.
> > 
> > Interrupts and their threads are strictly targeted to a single CPU and
> > the cache line is already hot and had to be made exclusive because of
> > other write operations to it.
> > 
> > There is usually no concurrency at all, except for administrative
> > operations like enable/disable or affinity changes. Those administrative
> > operations are not high frequency and the resulting cache line bouncing
> > is unavoidable even without that change. But does it matter in the
> > larger picture? I don't think so.
> 
> That's a fair point, but there are some use cases that use CPU Isolation on 
> top of PREEMPT_RT in order to reduce interference on a CPU running an RT 
> workload.
> 
> For those cases, IIRC the handler will run on a different (housekeeping) 
> CPU when those IRQs originate on an Isolated CPU, meaning the above 
> described cacheline bouncing is expected.
> 
> 
> > 
> > > On top of that, let's think on a scenario where the threaded handler will 
> > > solve a lot of requests, but not necessarily spend a lot of time doing so.
> > > This allows the thread to run for little time while solving a lot of 
> > > requests.
> > >
> > > In this scenario, note_interrupt() could return without incrementing 
> > > irqs_unhandled for those IRQ that happen while the brief thread is running, 
> > > but every other IRQ would cause note_interrupt() to increase 
> > > irqs_unhandled, which would cause the bug to still reproduce.
> > 
> > In theory yes. Does it happen in practice?
> > 
> > But that exposes a flaw in the actual detection code. The code is
> > unconditionally accumulating if there is an unhandled interrupt within
> > 100ms after the last unhandled one. IOW, if there is a periodic
> > unhandled one every 50ms, the interrupt will be shut down after 100000 *
> > 50ms = 5000s ~= 83.3m ~= 1.4h. And it neither cares about the number of
> > actually handled interrupts.
> > 
> > The spurious detector is really about runaway interrupts which hog a CPU
> > completely, but the above is not what we want to protect against.
> 
> Now it makes a lot more sense to me.
> Thanks!

Hi Thomas,

I would like to go back to this discussion :)
From what I understand after reading back through the thread:

- The spurious detector exists to avoid hogging a CPU when lots of IRQs 
  are hitting it but few (< 100 in 100k) are being handled. It works by
  disabling that interrupt line.

- The bug I am dealing with (on serial8250) fits the above case exactly:
  lots of requests, but few counted as handled.
  The reason: it is a threaded handler and requests are dealt with in 
  batch: multiple requests are handled at once, but only a single 
  IRQ_HANDLED is returned.

- My proposed solution: find a way of accounting for the requests handled.

  - Implementation: add an option for drivers to voluntarily report how 
    many requests they handled. Current drivers need no change.

  - Limitation: if this issue shows up in another driver, we need to 
    implement the accounting there as well. This should only happen in
    drivers which handle over 1k requests at once.


What was left for me as a TODO:
Think of a generic solution for this issue, to avoid dealing with it on 
a per-driver basis.

This is what I was able to come up with:

- Only the driver code knows how many requests it handled, so without 
  touching the drivers we can't know how many requests were properly
  handled.

- I could try thinking of a different solution, one which involves
  changing only the spurious detector.

  - For that I would need to find a particular characteristic we want 
    spurious detection to ignore, and make sure ignoring it won't miss
    an actual case we want to be protected against.

Generic solutions(?) proposed:
- Zero irqs_unhandled if threaded & at least a single request was handled
  in the last 100k
  - Problem: a regular issue with the interrupt would not be detected 
    in the driver.

- Skip detection if threaded & the handling thread is running
  - Problem 1: the thread may run only briefly while batch-handling a lot
    of requests, and so not be caught by the spurious detector.
  - Problem 2: the thread may get stuck, not handle the IRQs, and still
    not be caught by the spurious detector. (IIUC)


In the end, I could not find a proper way of telling apart
a - "this is real spurious IRQ behavior, and the line needs to be disabled", and 
b - "this is just a handler that batch-handles its requests",
without touching the drivers' code.

Do you have any suggestion on how to do that?

Thanks!
Leo



Thread overview: 22+ messages
2024-02-16  7:59 [RFC PATCH v2 0/4] Fix force_irqthread + fast triggered edge-type IRQs Leonardo Bras
2024-02-16  7:59 ` [RFC PATCH v2 1/4] irq: Move spurious_deferred bit from BIT(31) to BIT(0) Leonardo Bras
2024-02-16  7:59 ` [RFC PATCH v2 2/4] irq/spurious: Account for multiple handles in note_interrupt Leonardo Bras
2024-02-16 15:36   ` Andy Shevchenko
2024-02-16 20:18     ` Leonardo Bras
2024-02-16  7:59 ` [RFC PATCH v2 3/4] irq: Introduce IRQ_HANDLED_MANY Leonardo Bras
2024-02-19  9:59   ` Thomas Gleixner
2024-02-19 11:03     ` Thomas Gleixner
2024-02-21  5:39       ` Leonardo Bras
2024-02-21 15:41         ` Thomas Gleixner
2024-02-21 17:04           ` Thomas Gleixner
2024-02-23  4:52             ` Leonardo Bras
2024-02-23  4:37           ` Leonardo Bras
2024-02-23  7:33             ` Thomas Gleixner
2024-11-14  3:40             ` Leonardo Bras [this message]
2024-11-14  7:50               ` Andy Shevchenko
2024-11-19  1:15                 ` Leonardo Bras
2024-11-19 10:06                   ` Andy Shevchenko
2024-12-02 22:53                   ` Thomas Gleixner
2024-02-16  7:59 ` [RFC PATCH v2 4/4] tty/serial8250: Make use of IRQ_HANDLED_MANY interface Leonardo Bras
2024-02-16 10:12   ` Ilpo Järvinen
2024-02-16 19:58     ` Leonardo Bras
