public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Nipun Gupta <nipun.gupta@amd.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"maz@kernel.org" <maz@kernel.org>, "jgg@ziepe.ca" <jgg@ziepe.ca>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "git (AMD-Xilinx)" <git@amd.com>,
	"Anand, Harpreet" <harpreet.anand@amd.com>,
	"Jansen Van Vuuren, Pieter" <pieter.jansen-van-vuuren@amd.com>,
	"Agarwal, Nikhil" <nikhil.agarwal@amd.com>,
	"Simek, Michal" <michal.simek@amd.com>,
	"Gangurde, Abhijit" <abhijit.gangurde@amd.com>,
	"Cascon, Pablo" <pablo.cascon@amd.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH] cdx: add MSI support for CDX bus
Date: Thu, 11 May 2023 01:52:28 +0200	[thread overview]
Message-ID: <878rdveocj.ffs@tglx> (raw)
In-Reply-To: <87ednnes6o.ffs@tglx>

Nipun!

On Thu, May 11 2023 at 00:29, Thomas Gleixner wrote:
> On Wed, May 10 2023 at 19:34, Nipun Gupta wrote:
> #2) That's actually the worse part of it and completely broken versus
>     device setup
>
>     probe()
>       cdx_msi_domain_alloc_irqs()
>       ...
>       request_irq() {
>         ...
>         irq_activate()
>           irq_chip_write_msi_msg()
>             ...
>             queue_work()
>           ...
>       }
>
>       enable_irq_in_device()
>       
>         <- device raises interrupt and eventually uses an uninitialized
>            MSI message because the scheduled work has not yet completed.
>
>     That's going to be a nightmare to debug and it's going to happen
>     once in a blue moon out in the field.

Here is another failure scenario:

cpu_down()
  // No scheduling possible as this happens in stomp_machine() context
  __cpu_disable()
    // Point of no return
    set_cpu_online(cpu, false);
    irq_migrate_all_off_this_cpu()
      ...
      set_affinity()

That works with your current implementation by some definition of works
because you schedule the affinity change async.

But what makes sure that this takes effect _before_ the CPU goes into
lala land and cannot respond to an interrupt anymore?

Nothing, other than pure luck, which is not really a solid engineering
principle.

While the regular operations can be fixed via the bus_lock mechanics,
this one falls flat on its nose because in that context this can't
schedule anymore so acquiring a mutex or waiting for some async muck to
complete is a no-no.

Not the end of the world though. There is a way to handle that
gracefully and we need that for the PCI/IMS implementations which will
do that via command queues too.

irq_migrate_all_off_this_cpu() is invoked from stomp_machine() context
with interrupts disabled because the current x86 interrupt management
hardware trainwreck requires it at least in the case that interrupt
remapping is not available.

Of course that got copied to all other architectures... Whether they
really require it or not from a hardware POV is a different
story. Though they all rely on the stomp_machine() context which
prevents concurrent modifications.

So from a historical POV all of this makes sense to some extent, but
that does not prevent us to fix this and make it an two stage process
which can actually handle both worlds.

I'm way too tired to think that through, but you get the idea and you
are welcome to beat me to it.

Thanks,

        tglx

  reply	other threads:[~2023-05-10 23:52 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-08 14:09 [PATCH] cdx: add MSI support for CDX bus Nipun Gupta
2023-05-09  8:01 ` Thomas Gleixner
2023-05-09 11:06   ` Gupta, Nipun
2023-05-09 22:01     ` Thomas Gleixner
2023-05-10 14:04       ` Nipun Gupta
2023-05-10 22:29         ` Thomas Gleixner
2023-05-10 23:52           ` Thomas Gleixner [this message]
2023-05-12 14:20           ` Nipun Gupta
2023-05-12 18:15             ` Thomas Gleixner
2023-05-15 13:09               ` Nipun Gupta
2023-05-15 16:46                 ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878rdveocj.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=abhijit.gangurde@amd.com \
    --cc=git@amd.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=harpreet.anand@amd.com \
    --cc=jgg@nvidia.com \
    --cc=jgg@ziepe.ca \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=michal.simek@amd.com \
    --cc=nikhil.agarwal@amd.com \
    --cc=nipun.gupta@amd.com \
    --cc=pablo.cascon@amd.com \
    --cc=pieter.jansen-van-vuuren@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox