* Re: [PATCH v3 6/6] pid: drop irq disablement around pidmap_lock
From: David Laight @ 2025-02-02 13:55 UTC
To: Matthew Wilcox
Cc: Mateusz Guzik, ebiederm, oleg, brauner, akpm, linux-mm,
linux-kernel, netdev
On Sat, 1 Feb 2025 22:00:06 +0000
Matthew Wilcox <willy@infradead.org> wrote:
> On Sat, Feb 01, 2025 at 09:51:05PM +0000, David Laight wrote:
> > I'm not sure what you mean.
> > Disabling interrupts isn't as cheap as it ought to be, but probably isn't
> > that bad.
>
> Time it. You'll see.
The best scheme I've seen is to just increment a per-cpu value.
Let the interrupt happen, notice it isn't allowed and return with
interrupts disabled.
Then re-issue the interrupt when the count is decremented to zero.
Easy with level sensitive interrupts.
But I don't think Linux ever uses that scheme.
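Handler-side it might look something like this (a hypothetical sketch with
made-up names - nothing of the sort exists in Linux today):

	/* Per-cpu 'soft' irq-disable count and a slot for a deferred vector. */
	static DEFINE_PER_CPU(unsigned int, soft_irq_disable_count);
	static DEFINE_PER_CPU(unsigned int, deferred_vector);	/* 0 = none */

	/* Called from the low-level interrupt entry code. */
	static bool irq_delivery_allowed(unsigned int vector)
	{
		if (!__this_cpu_read(soft_irq_disable_count))
			return true;	/* deliver as normal */
		/*
		 * Record the vector and return from the interrupt with
		 * hardware interrupts left masked; a level-sensitive
		 * source would simply re-assert itself instead.
		 */
		__this_cpu_write(deferred_vector, vector);
		return false;
	}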
> > > So while this is indeed a tradeoff, as I understand the sane default
> > > is to *not* disable interrupts unless necessary.
> >
> > I beg to differ.
>
> You're wrong. It is utterly standard to take spinlocks without
> disabling IRQs. We do it all over the kernel. If you think that needs
> to change, then make your case, don't throw a driveby review.
>
> And I don't mean by arguing. Make a change, measure the difference.
The analysis was done on some userspace code that basically does:

	for (;;) {
		pthread_mutex_lock(lock);
		item = get_head(list);
		if (!item) {
			pthread_mutex_unlock(lock);	/* don't exit still holding the lock */
			break;
		}
		pthread_mutex_unlock(lock);
		process(item);
	}
For the test there were about 10000 items on the list and 30 threads
processing it (that was the target of the tests).
The entire list needs to be processed in 10ms (RTP audio).
There was a bit more code with the mutex held, but only 100 or so
instructions.
Mostly it works fine, some threads get delayed by interrupts (etc) but
the other threads carry on working and all the items get processed.
However sometimes an interrupt happens while the mutex is held.
In that case the other 29 threads get stuck waiting for the mutex.
No progress is made until the interrupt completes and it overruns
the 10ms period.
While this is a userspace test, the same thing will happen with
spin locks in the kernel.
In userspace you can't disable interrupts, but for kernel spinlocks
you can.
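In the kernel that is the usual spin_lock_irqsave() pattern (or the
spin_lock_irq() variant) - the kind of thing this patch is dropping
for pidmap_lock:

	unsigned long flags;

	spin_lock_irqsave(&pidmap_lock, flags);	/* irqs off while held */
	/* ... short critical section ... */
	spin_unlock_irqrestore(&pidmap_lock, flags);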
The problem is likely to show up as unexpected latency affecting
code with a hot mutex that is only held for short periods while
running a lot of network traffic.
That is also latency that affects all CPUs at the same time.
The interrupt itself will always cause latency to one CPU.
Note that I also had to enable RFS, threaded NAPI and move the NAPI
threads to RT priorities to avoid lost packets.
The fix was to replace the linked list with an array and use atomic
increment to get the index of the item to process.
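Roughly (a sketch of the idea with made-up names, not the actual code):

	#include <stdatomic.h>

	static struct item items[NITEMS];	/* filled before the workers start */
	static atomic_uint next_idx;

	static void *worker(void *arg)
	{
		for (;;) {
			unsigned int i = atomic_fetch_add(&next_idx, 1);

			if (i >= NITEMS)
				break;		/* all items already claimed */
			process(&items[i]);
		}
		return NULL;
	}

No lock to get stuck behind: a thread that takes an interrupt after the
fetch_add only delays its own item.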
David
* Re: [PATCH v3 6/6] pid: drop irq disablement around pidmap_lock
From: Mateusz Guzik @ 2025-02-02 19:34 UTC
To: David Laight
Cc: Matthew Wilcox, ebiederm, oleg, brauner, akpm, linux-mm,
linux-kernel, netdev
On Sun, Feb 2, 2025 at 2:55 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Sat, 1 Feb 2025 22:00:06 +0000
> Matthew Wilcox <willy@infradead.org> wrote:
>
> > On Sat, Feb 01, 2025 at 09:51:05PM +0000, David Laight wrote:
> > > I'm not sure what you mean.
> > > Disabling interrupts isn't as cheap as it ought to be, but probably isn't
> > > that bad.
> >
> > Time it. You'll see.
>
> The best scheme I've seen is to just increment a per-cpu value.
> Let the interrupt happen, notice it isn't allowed and return with
> interrupts disabled.
> Then re-issue the interrupt when the count is decremented to zero.
> Easy with level sensitive interrupts.
> But I don't think Linux ever uses that scheme.
>
I presume you are talking about the splhigh/splx set of primitives
from Unix kernels.
While "entering" is indeed cheap, undoing the work still needs to be
atomic vs interrupts.
I see NetBSD uses local cmpxchg8b on the interrupt level and interrupt
mask, while the rest takes the irq trip.
The NetBSD solution is still going to be visibly slower than not
messing with any of it as spin_unlock on amd64 is merely a store of 0
and cmpxchg even without the lock prefix costs several clocks.
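For reference, the unlock fast path really is just a release store;
paraphrased from include/asm-generic/qspinlock.h:

	static __always_inline void queued_spin_unlock(struct qspinlock *lock)
	{
		/* unlock() needs release semantics: a plain store of 0 */
		smp_store_release(&lock->locked, 0);
	}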
Maybe there is other hackery which could be done, but see below.
>
> > > > So while this is indeed a tradeoff, as I understand the sane default
> > > > is to *not* disable interrupts unless necessary.
> > >
> > > I beg to differ.
> >
> > You're wrong. It is utterly standard to take spinlocks without
> > disabling IRQs. We do it all over the kernel. If you think that needs
> > to change, then make your case, don't throw a driveby review.
> >
> > And I don't mean by arguing. Make a change, measure the difference.
>
> The analysis was done on some userspace code that basically does:
>
>	for (;;) {
>		pthread_mutex_lock(lock);
>		item = get_head(list);
>		if (!item) {
>			pthread_mutex_unlock(lock);	/* don't exit still holding the lock */
>			break;
>		}
>		pthread_mutex_unlock(lock);
>		process(item);
>	}
> For the test there were about 10000 items on the list and 30 threads
> processing it (that was the target of the tests).
> The entire list needs to be processed in 10ms (RTP audio).
> There was a bit more code with the mutex held, but only 100 or so
> instructions.
> Mostly it works fine, some threads get delayed by interrupts (etc) but
> the other threads carry on working and all the items get processed.
>
> However sometimes an interrupt happens while the mutex is held.
> In that case the other 29 threads get stuck waiting for the mutex.
> No progress is made until the interrupt completes and it overruns
> the 10ms period.
>
> While this is a userspace test, the same thing will happen with
> spin locks in the kernel.
>
> In userspace you can't disable interrupts, but for kernel spinlocks
> you can.
>
> The problem is likely to show up as unexpected latency affecting
> code with a hot mutex that is only held for short periods while
> running a lot of network traffic.
> That is also latency that affects all CPUs at the same time.
> The interrupt itself will always cause latency to one CPU.
>
Nobody is denying there is potential for lock hold time to get
significantly extended if you get unlucky enough vs interrupts. What
is questioned is whether defaulting to irqs off around lock-protected
areas is the right call.
As I noted in my previous e-mail, the spin_lock_irq stuff disables
interrupts upfront and does not touch them afterwards, even when
waiting for the lock to become free. Patching that up with queued
locks may be non-trivial, if at all possible. Thus contention on
irq-disabled locks *will* add latency to their handling unless this
gets addressed. Note that maintaining the forward-progress guarantee
in the locking mechanism is non-negotiable, so punting to TAS or
similar unfair locks does not cut it.
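The current code looks like this (paraphrased from
include/linux/spinlock_api_smp.h) - interrupts go off before the
acquisition attempt and are not touched again however long the wait
turns out to be:

	static inline void __raw_spin_lock_irq(raw_spinlock_t *lock)
	{
		local_irq_disable();		/* irqs off up front */
		preempt_disable();
		spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
		/* may spin here at length, with irqs still disabled */
		LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
	}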
This is on top of having to solve the overhead problem for taking the
trips (see earlier in the e-mail).
I would argue if the network stuff specifically is known to add
visible latency, then perhaps that's something to investigate.
Anyhow, as Willy said, you are welcome to code this up and demonstrate
it is better overall.
--
Mateusz Guzik <mjguzik gmail.com>
* Re: [PATCH v3 6/6] pid: drop irq disablement around pidmap_lock
From: David Laight @ 2025-02-02 20:44 UTC
To: Mateusz Guzik
Cc: Matthew Wilcox, ebiederm, oleg, brauner, akpm, linux-mm,
linux-kernel, netdev
On Sun, 2 Feb 2025 20:34:48 +0100
Mateusz Guzik <mjguzik@gmail.com> wrote:
> On Sun, Feb 2, 2025 at 2:55 PM David Laight
> <david.laight.linux@gmail.com> wrote:
> >
> > On Sat, 1 Feb 2025 22:00:06 +0000
> > Matthew Wilcox <willy@infradead.org> wrote:
> >
> > > On Sat, Feb 01, 2025 at 09:51:05PM +0000, David Laight wrote:
> > > > I'm not sure what you mean.
> > > > Disabling interrupts isn't as cheap as it ought to be, but probably isn't
> > > > that bad.
> > >
> > > Time it. You'll see.
> >
> > The best scheme I've seen is to just increment a per-cpu value.
> > Let the interrupt happen, notice it isn't allowed and return with
> > interrupts disabled.
> > Then re-issue the interrupt when the count is decremented to zero.
> > Easy with level sensitive interrupts.
> > But I don't think Linux ever uses that scheme.
> >
>
> I presume you are talking about the splhigh/splx set of primitives
> from Unix kernels.
>
> While "entering" is indeed cheap, undoing the work still needs to be
> atomic vs interrupts.
>
> I see NetBSD uses local cmpxchg8b on the interrupt level and interrupt
> mask, while the rest takes the irq trip.
>
> The NetBSD solution is still going to be visibly slower than not
> messing with any of it as spin_unlock on amd64 is merely a store of 0
> and cmpxchg even without the lock prefix costs several clocks.
>
> Maybe there is other hackery which could be done, but see below.
I was thinking it might be possible to merge an 'interrupts disabled' count
with the existing 'pre-emption disabled' count.
IIRC (on x86 at least) this is just a per-cpu variable accessed from %fs/%gs.
So you add one to disable pre-emption and (say) 1<<16 to disable interrupts.
If an interrupt happens while the count is 'big', the count value is changed
so that the last decrement of 1<<16 will set carry (or overflow), and a return
from interrupt is done that leaves interrupts disabled (traditionally easy).
The interrupt enable call just subtracts the 1<<16 and checks for carry (or
overflow); if not set all is fine, if set it needs to call something to
re-issue the interrupt - that is probably the hard bit.
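Something along these lines (a hypothetical sketch with made-up names;
the carry trick is replaced by an explicit pending bit for clarity):

	/* Low 16 bits: pre-emption count.  Bits 16..30: irq-disable count.
	 * Bit 31: set by the interrupt entry code when it defers an irq. */
	#define IRQOFF_ONE	(1u << 16)
	#define IRQOFF_MASK	(0x7fffu << 16)
	#define IRQ_PENDING	(1u << 31)

	static DEFINE_PER_CPU(unsigned int, crit_count);

	static inline void soft_irq_disable(void)
	{
		__this_cpu_add(crit_count, IRQOFF_ONE);	/* no cli, no atomics */
	}

	static inline void soft_irq_enable(void)
	{
		unsigned int c = __this_cpu_sub_return(crit_count, IRQOFF_ONE);

		/* Last enable, with an interrupt deferred meanwhile? */
		if (unlikely((c & IRQ_PENDING) && !(c & IRQOFF_MASK))) {
			this_cpu_and(crit_count, ~IRQ_PENDING);
			replay_deferred_irq();	/* the hard bit */
		}
	}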
>
> >
> > > > > So while this is indeed a tradeoff, as I understand the sane default
> > > > > is to *not* disable interrupts unless necessary.
> > > >
> > > > I beg to differ.
> > >
> > > You're wrong. It is utterly standard to take spinlocks without
> > > disabling IRQs. We do it all over the kernel. If you think that needs
> > > to change, then make your case, don't throw a driveby review.
> > >
> > > And I don't mean by arguing. Make a change, measure the difference.
> >
> > The analysis was done on some userspace code that basically does:
> >
> >	for (;;) {
> >		pthread_mutex_lock(lock);
> >		item = get_head(list);
> >		if (!item) {
> >			pthread_mutex_unlock(lock);	/* don't exit still holding the lock */
> >			break;
> >		}
> >		pthread_mutex_unlock(lock);
> >		process(item);
> >	}
> > For the test there were about 10000 items on the list and 30 threads
> > processing it (that was the target of the tests).
> > The entire list needs to be processed in 10ms (RTP audio).
> > There was a bit more code with the mutex held, but only 100 or so
> > instructions.
> > Mostly it works fine, some threads get delayed by interrupts (etc) but
> > the other threads carry on working and all the items get processed.
> >
> > However sometimes an interrupt happens while the mutex is held.
> > In that case the other 29 threads get stuck waiting for the mutex.
> > No progress is made until the interrupt completes and it overruns
> > the 10ms period.
> >
> > While this is a userspace test, the same thing will happen with
> > spin locks in the kernel.
> >
> > In userspace you can't disable interrupts, but for kernel spinlocks
> > you can.
> >
> > The problem is likely to show up as unexpected latency affecting
> > code with a hot mutex that is only held for short periods while
> > running a lot of network traffic.
> > That is also latency that affects all CPUs at the same time.
> > The interrupt itself will always cause latency to one CPU.
> >
>
> Nobody is denying there is potential for lock hold time to get
> significantly extended if you get unlucky enough vs interrupts. What
> is questioned is whether defaulting to irqs off around lock-protected
> areas is the right call.
I really commented because you were changing one lock which could
easily be 'hot' enough for there to be side effects, without even
a comment about any pitfalls.
David
>
> As I noted in my previous e-mail, the spin_lock_irq stuff disables
> interrupts upfront and does not touch them afterwards, even when
> waiting for the lock to become free. Patching that up with queued
> locks may be non-trivial, if at all possible. Thus contention on
> irq-disabled locks *will* add latency to their handling unless this
> gets addressed. Note that maintaining the forward-progress guarantee
> in the locking mechanism is non-negotiable, so punting to TAS or
> similar unfair locks does not cut it.
>
> This is on top of having to solve the overhead problem for taking the
> trips (see earlier in the e-mail).
>
> I would argue if the network stuff specifically is known to add
> visible latency, then perhaps that's something to investigate.
>
> Anyhow, as Willy said, you are welcome to code this up and demonstrate
> it is better overall.
>
* Re: [PATCH v3 6/6] pid: drop irq disablement around pidmap_lock
From: Matthew Wilcox @ 2025-02-02 22:06 UTC
To: David Laight
Cc: Mateusz Guzik, ebiederm, oleg, brauner, akpm, linux-mm,
linux-kernel, netdev
On Sun, Feb 02, 2025 at 08:44:49PM +0000, David Laight wrote:
> I really commented because you were changing one lock which could
> easily be 'hot' enough for there to be side effects, without even
> a comment about any pitfalls.
No such comment is needed. This is standard practice. You're the one
arguing for changing stuff. Have a different conversation separate from
this patch.