linux-usb.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Mathias Nyman <mathias.nyman@linux.intel.com>,
	Linux regressions mailing list <regressions@lists.linux.dev>
Cc: "Christian A. Ehrhardt" <lk@c--e.de>,
	niklas.neronin@linux.intel.com,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, linux-x86_64@vger.kernel.org,
	netdev@vger.kernel.org, Randy Dunlap <rdunlap@infradead.org>,
	Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Subject: Re: This is the fourth time I've tried to find what led to the regression of outgoing network speed and each time I find the merge commit 8c94ccc7cd691472461448f98e2372c75849406c
Date: Mon, 26 Feb 2024 19:09:22 +0100
Message-ID: <87r0gz9jxp.ffs@tglx>
In-Reply-To: <acc2b877-4b42-fd4d-867b-603dae95d09d@linux.intel.com>

On Mon, Feb 26 2024 at 12:54, Mathias Nyman wrote:
> On 26.2.2024 11.51, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> I don't think reverting this series is a solution.
>>>
>>> This isn't really about those usb xhci patches.
>>> This is about which interrupt gets assigned to which CPU.
>> 
>> I know, but from my understanding of Linus' expectations wrt handling
>> regressions it does not matter much if a bug existed earlier or
>> somewhere else: what counts is the commit that exposed the problem.
>> 
>> But I might be wrong here. Anyway, not CCing Linus for this; but I'll
>> likely point him to this direction on Sunday in my next weekly report,
>> unless some fix comes into sight.
>> 
>>> Mikhail got unlucky when the network adapter interrupts on that system were
>>> assigned to CPU0, clearly a more "clogged" CPU, thus causing a drop in max
>>> bandwidth.
>> 
>> But maybe others will be just as "unlucky". Or is there anything to
>> believe otherwise? Maybe some aspect of the .config or local setup that
>> is most likely unique to Mikhail's setup?
>
> I believe this is a zero-sum case.
>
> Others got equally lucky due to this change.
> Their devices end up interrupting less clogged CPUs and see a similar
> performance increase.

Reverting this does not make any sense.

The kernel assigns the initial interrupt affinities to the CPUs so that
the number of interrupts is roughly balanced. That spreading algorithm
is completely agnostic of the actual usage of the interrupts. Where
e.g. the network interrupt ends up depends on the probe/enumeration
order of devices. Add another PCI-E card into the machine and it will
again look different.
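
To illustrate the point about probe order, here is a toy model in Python.
It is only a sketch under stated assumptions: it is not the kernel's actual
spreading code, the function name spread_irqs and the device names are
made up for illustration, and the only rule it models is "place the next
interrupt on the CPU that currently has the fewest".

```python
def spread_irqs(devices, num_cpus):
    """Toy model: place each device's interrupt on the least-loaded CPU.

    'devices' is the probe order (hypothetical names). The placement only
    looks at how many interrupts each CPU already has, never at how busy
    those interrupts are.
    """
    counts = [0] * num_cpus
    placement = {}
    for dev in devices:
        # min() returns the first minimum, so ties go to the lowest CPU number.
        cpu = min(range(num_cpus), key=lambda c: counts[c])
        counts[cpu] += 1
        placement[dev] = cpu
    return placement


# Same devices, different probe order (both orders are invented examples).
order_a = ["nvme", "xhci", "eth0", "sound"]
order_b = ["eth0", "nvme", "xhci", "sound"]

print(spread_irqs(order_a, 4))
print(spread_irqs(order_b, 4))
```

In this model the network interrupt lands on a different CPU purely
because the probe order changed, although the per-CPU interrupt counts are
equally balanced in both cases.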

There is nothing the kernel can do about it and earlier attempts to do
interrupt frequency based balancing in the kernel ended up nowhere
simply because the kernel does not have enough information about the
overall requirements. That's why the kernel leaves the affinity
configuration for user space, e.g. irqbalance, except for true
multi-queue scenarios like NVME where the kernel binds queues and their
interrupts to specific CPUs or groups of CPUs.
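
For reference, user space can set the affinity of a single interrupt by
writing a CPU list to /proc/irq/<N>/smp_affinity_list. A minimal Python
sketch follows. The helper names, the proc_root parameter and the IRQ
number in the usage comment are assumptions for illustration; the write
needs root and the kernel can reject it for interrupts whose affinity is
managed.

```python
def format_cpu_list(cpus):
    """Return a CPU list string such as '2,3' from an iterable of CPU numbers."""
    return ",".join(str(cpu) for cpu in sorted(set(cpus)))


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the CPU list to <proc_root>/irq/<irq>/smp_affinity_list.

    proc_root is a parameter only so the function can be pointed at a
    scratch directory for testing (an assumption of this sketch).
    """
    path = f"{proc_root}/irq/{irq}/smp_affinity_list"
    with open(path, "w") as f:
        f.write(format_cpu_list(cpus))


# Usage (as root; 42 is a placeholder, look up the real number in
# /proc/interrupts first):
#   set_irq_affinity(42, [2, 3])
```

A running irqbalance daemon may move the interrupt again later, so a
manual setting like this is mostly useful for a quick experiment.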

Why ending up on CPU0 has this particular effect on Mikhail's machine is
unclear as we don't have any information about the overall workload,
other interrupt sources on CPU0 and their frequency. That'd need to be
investigated with instrumentation and might unearth some completely
different underlying reason causing this behavior.
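
A first, cheap step towards that information is to sample
/proc/interrupts twice and see which sources fire on CPU0 and how often.
The following Python sketch does that. The function names and the
sampling interval are assumptions, and it only reports counts, so it is a
starting point and not a replacement for proper tracing.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (per_cpu_counts, description)}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Skip rows without one count per CPU (e.g. the single-value ERR row).
        if len(fields) < ncpu or not all(f.isdigit() for f in fields[:ncpu]):
            continue
        counts = [int(f) for f in fields[:ncpu]]
        table[label.strip()] = (counts, " ".join(fields[ncpu:]))
    return table


def cpu_delta(before, after, cpu=0):
    """Return (delta, irq_label, description) for one CPU, busiest source first."""
    deltas = []
    for irq, (counts, desc) in after.items():
        if irq in before:
            d = counts[cpu] - before[irq][0][cpu]
            if d > 0:
                deltas.append((d, irq, desc))
    return sorted(deltas, reverse=True)


def sample_cpu(cpu=0, interval=5.0, path="/proc/interrupts"):
    """Take two snapshots 'interval' seconds apart and return the per-source deltas."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return cpu_delta(before, after, cpu)


# Usage, while the network test is running:
#   for delta, irq, desc in sample_cpu(cpu=0, interval=5.0):
#       print(f"{delta:10d}  {irq:>5}  {desc}")
```

Running it once with the network interrupt on CPU0 and once with it moved
elsewhere would show which other sources it shares the CPU with.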

So I don't think this is a regression in the true sense of
regressions. It's an unfortunate coincidence and reverting the
identified commits would just paper over the real problem, if there is
actually one single source of trouble which causes the performance drop
only on CPU0.  The commits are definitely _not_ the root cause, they
happen to unearth some other issue, which might be as mundane as
e.g. that the NVME interrupt on CPU0 is competing with the network
interrupt. So don't shoot the messenger.

Thanks,

        tglx