From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Dimitri Sivanich <sivanich@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	David Woodhouse <dwmw2@infradead.org>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	Steve Wahl <steve.wahl@hpe.com>,
	Russ Anderson <russ.anderson@hpe.com>,
	jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH] Allocate DMAR fault interrupts locally
Date: Thu, 21 Mar 2024 15:13:57 -0700	[thread overview]
Message-ID: <20240321151357.1d18127f@jacob-builder> (raw)
In-Reply-To: <Ze9r47riIq9ovBCY@hpe.com>

Hi Dimitri,

On Mon, 11 Mar 2024 15:38:59 -0500, Dimitri Sivanich <sivanich@hpe.com>
wrote:

> Thomas!
> 
> On Thu, Feb 29, 2024 at 11:18:37PM +0100, Thomas Gleixner wrote:
> > Dimitri!
> > 
> > On Thu, Feb 29 2024 at 14:07, Dimitri Sivanich wrote:
> >   
> > > +}
> > > +
> > > +static int __init assign_dmar_vectors(void)
> > > +{
> > > +	struct work_struct irq_remap_work;
> > > +	int nid;
> > > +
> > > +	INIT_WORK(&irq_remap_work,
> > > irq_remap_enable_fault_handling_thr);
> > > +	cpus_read_lock();
> > > +	for_each_online_node(nid) {
> > > +		/* Boot cpu dmar vectors are assigned before the
> > > rest */
> > > +		if (nid == cpu_to_node(get_boot_cpu_id()))
> > > +			continue;
> > > +		schedule_work_on(cpumask_first(cpumask_of_node(nid)),
> > > +				 &irq_remap_work);
> > > +		flush_work(&irq_remap_work);
> > > +	}
> > > +	cpus_read_unlock();
> > > +	return 0;
> > > +}
> > > +
> > > +arch_initcall(assign_dmar_vectors);  
> > 
> > Stray newline before arch_initcall(), but that's not the problem.
> > 
> > The real problems are:
> > 
> >  1) This approach only works when _ALL_ APs have been brought up during
> >     boot. With 'maxcpus=N' on the command line this will fail to enable
> >     fault handling when the APs which have not been brought up initially
> >     are onlined later on.
> > 
> >     This might be working in practice because intel_iommu_init() will
> >     enable the interrupts later on via init_dmars() unconditionally, but
> >     that's far from correct because IRQ_REMAP does not depend on
> >     INTEL_IOMMU.
> > 
> >  2) It leaves a gap where the reporting is not working between bringing
> >     up the APs during boot and this initcall. Mostly theoretical, but
> >     that does not make it more correct either.
> > 
> > What you really want is a cpu hotplug state in the CPUHP_BP_PREPARE_DYN
> > space which enables the interrupt for the node _before_ the first AP of
> > the node is brought up. That will solve the problem nicely w/o any of
> > the above issues.
> >  
> 
> Initially this sounds like a good approach.  As things currently stand,
> however, there are (at least) several problems with attempting to
> allocate interrupts on cpus that are not running yet via the existing
> dmar_set_interrupt path.
> 
> - The code relies on node_to_cpumask_map (cpumask_of_node()), which has
> been allocated, but not populated at the CPUHP_BP_PREPARE_DYN stage.
> 
> - The irq_matrix cpumaps do not indicate being online or initialized yet,
> except for the boot cpu instance, of course.
> 
> So things still revert to boot cpu allocation, until we exhaust the
> vectors.
> 
> Of course, running the dmar_set_interrupt code from a CPUHP_AP_ONLINE_DYN
> state does work (although I believe there is a concurrency issue that
> could show up with the current dmar_set_interrupt path).
> 
> So the code seems to have been designed based on the assumption that it
> will be run on an already active (though not necessarily fully onlined?)
> cpu.  To make this work, any code based on that assumption would need to
> be fixed.  Otherwise, a different approach is needed.

This may not be pretty, but DMAR fault interrupts report unrecoverable
faults, which are rare and should never happen on a healthy system. Can
we share one BSP vector for all DMARs, i.e. let the dmar_fault() handler
search for the offending DMAR and its fault reasons?
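
As a rough illustration of the shared-vector idea, here is a minimal
userspace mock, not kernel code. struct mock_dmar, mock_dmar_service() and
mock_shared_fault_handler() are hypothetical names made up for this sketch;
the real dmar_fault() path and fault registers are not modeled. The point is
only that a single handler, bound to one vector, can walk every unit and
service whichever ones have a fault pending:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMAR unit (not struct intel_iommu). */
struct mock_dmar {
	int id;
	uint32_t fault_status;	/* non-zero: a fault is pending on this unit */
	unsigned int handled;	/* number of faults serviced so far */
};

/*
 * Hypothetical per-unit service routine: acknowledge the pending fault,
 * if any. Returns true when this unit actually had something to report.
 */
static bool mock_dmar_service(struct mock_dmar *d)
{
	if (!d->fault_status)
		return false;

	d->fault_status = 0;	/* acknowledge/clear the pending fault */
	d->handled++;
	return true;
}

/*
 * Shared handler: with one vector for all units, the handler cannot tell
 * from the interrupt alone which unit faulted, so it scans them all.
 * Returns the number of units that had a fault pending.
 */
static size_t mock_shared_fault_handler(struct mock_dmar *units, size_t n)
{
	size_t i, serviced = 0;

	for (i = 0; i < n; i++)
		if (mock_dmar_service(&units[i]))
			serviced++;

	return serviced;
}
```

The cost of this design is a linear scan over all units on every fault
interrupt, which seems acceptable only because such faults are expected to
be rare.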


Thanks,

Jacob
