Re: [PATCH 7/9] virtio-pci: harden INTX interrupts

From: Jason Wang <jasowang@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>, mst@redhat.com
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	david.kaplan@amd.com, konrad.wilk@oracle.com,
	Peter Zijlstra <peterz@infradead.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	f.hetzelt@tu-berlin.de, linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH 7/9] virtio-pci: harden INTX interrupts
Date: Tue, 14 Sep 2021 10:50:06 +0800
Message-ID: <dae944d8-a658-cb52-2c4b-076c6a41c458@redhat.com>
In-Reply-To: <875yv4f99j.ffs@tglx>

On 2021/9/14 5:36 AM, Thomas Gleixner wrote:
> Jason,
>
> On Mon, Sep 13 2021 at 13:53, Jason Wang wrote:
>> This patch tries to make sure the virtio interrupt handler for INTX
>> won't be called after a reset and before virtio_device_ready(). We
>> can't use IRQF_NO_AUTOEN since we're using a shared interrupt
>> (IRQF_SHARED). So this patch tracks the INTX enabling status in a new
>> intx_soft_enabled variable and toggles it in
>> vp_disable/enable_vectors(). The INTX interrupt handler will check
>> intx_soft_enabled before processing the actual interrupt.
> Ah, there it is :)
>
> Cc'ed our memory ordering wizards as I might be wrong as usual.
>
>> -	if (vp_dev->intx_enabled)
>> +	if (vp_dev->intx_enabled) {
>> +		vp_dev->intx_soft_enabled = false;
>> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
>> +		smp_wmb();
>>   		synchronize_irq(vp_dev->pci_dev->irq);
> As you are synchronizing the interrupt here anyway, what is the value of
> the barrier?
>
>   		vp_dev->intx_soft_enabled = false;
>    		synchronize_irq(vp_dev->pci_dev->irq);
>
> is sufficient because of:
>
> synchronize_irq()
>     do {
>         raw_spin_lock(desc->lock);
>         in_progress = check_inprogress(desc);
>         raw_spin_unlock(desc->lock);
>     } while (in_progress);
>
> raw_spin_lock() has ACQUIRE semantics so the store to intx_soft_enabled
> can complete after lock has been acquired which is uninteresting.
>
> raw_spin_unlock() has RELEASE semantics so the store to intx_soft_enabled
> has to be completed before the unlock completes.
>
> So if the interrupt is in flight then it might or might not see
> intx_soft_enabled == false. But that's true for your barrier construct
> as well.
>
> The important part is that any interrupt for this line arriving after
> synchronize_irq() has completed is guaranteed to see intx_soft_enabled
> == false.
>
> That is what you want to achieve, right?

Right.

>
>>   	for (i = 0; i < vp_dev->msix_vectors; ++i)
>>   		disable_irq(pci_irq_vector(vp_dev->pci_dev, i));
>> @@ -43,8 +47,12 @@ void vp_enable_vectors(struct virtio_device *vdev)
>>   	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>   	int i;
>>   
>> -	if (vp_dev->intx_enabled)
>> +	if (vp_dev->intx_enabled) {
>> +		vp_dev->intx_soft_enabled = true;
>> +		/* ensure the vp_interrupt see this intx_soft_enabled value */
>> +		smp_wmb();
> For the enable case the barrier is pointless vs. intx_soft_enabled
>
> CPU 0                                           CPU 1
>
> interrupt                                       vp_enable_vectors()
>    vp_interrupt()
>      if (!vp_dev->intx_soft_enabled)
>         return IRQ_NONE;
>                                                    vp_dev->intx_soft_enabled = true;
>
> IOW, the concurrent interrupt might or might not see the store. That's
> not a problem for legacy PCI interrupts. If it did not see the store and
> the interrupt originated from that device then it will account it as one
> spurious interrupt which will get raised again because those interrupts
> are level triggered and nothing acknowledged it at the device level.

I see.
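
To convince myself of the spurious-interrupt argument, here is a small
userspace model in C. It is not kernel code: intx_model, model_handler()
and model_deliver() are names made up for illustration, and the model
assumes a level-triggered line that stays asserted until the handler
acknowledges the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a level-triggered INTx line; not kernel code. */
enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct intx_model {
	bool line_asserted;     /* device holds the line high until acked */
	bool intx_soft_enabled; /* driver-side gate, as in the patch */
	unsigned int spurious;  /* deliveries the handler declined */
	unsigned int handled;   /* deliveries the handler processed */
};

/* Mirrors the gate check in vp_interrupt(): decline while disabled. */
static enum model_irqreturn model_handler(struct intx_model *m)
{
	assert(m != NULL);
	if (!m->intx_soft_enabled) {
		m->spurious++;
		return MODEL_IRQ_NONE; /* nothing acked, line stays high */
	}
	m->line_asserted = false;      /* handling acks the device */
	m->handled++;
	return MODEL_IRQ_HANDLED;
}

/* One delivery attempt: a level interrupt fires while the line is high. */
static bool model_deliver(struct intx_model *m)
{
	assert(m != NULL);
	if (!m->line_asserted)
		return false;
	model_handler(m);
	return true;
}
```

Calling model_deliver() once with the gate closed counts one spurious
interrupt and leaves the line asserted; calling it again after setting
intx_soft_enabled handles the interrupt, so nothing is lost.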

>
> Now, what's more interesting is that it has to be guaranteed that the
> interrupt which observes
>
>          vp_dev->intx_soft_enabled == true
>
> also observes all preceding stores, i.e. those which make the interrupt
> handler capable of handling the interrupt.
>
> That's the real problem and for that your barrier is at the wrong place
> because you want to make sure that those stores are visible before the
> store to intx_soft_enabled becomes visible, i.e. this should be:
>
>
>          /* Ensure that all preceding stores are visible before intx_soft_enabled */
>          smp_wmb();
>          vp_dev->intx_soft_enabled = true;

Yes, I see.
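
For reference, a userspace sketch of that ordering using C11 atomics.
This is only an analogy, not the kernel primitives: the release fence
stands in for smp_wmb() (it is somewhat stronger than a pure store-store
barrier), the acquire load stands in for the read-side barrier that the
handler would need, and dev_model, vq_ready and the model_*() functions
are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the handler depends on. */
struct dev_model {
	int vq_ready;                  /* a "prerequisite" store */
	atomic_bool intx_soft_enabled; /* the gate */
};

static void model_init(struct dev_model *d)
{
	assert(d != NULL);
	d->vq_ready = 0;
	atomic_init(&d->intx_soft_enabled, false);
}

/* Writer: prerequisites first, then the barrier, then the flag. */
static void model_enable(struct dev_model *d)
{
	assert(d != NULL);
	d->vq_ready = 1;
	/* Analog of smp_wmb(): order the store above before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->intx_soft_enabled, true,
			      memory_order_relaxed);
}

/* Reader: returns -1 while the gate is closed, else the observed state. */
static int model_interrupt(struct dev_model *d)
{
	assert(d != NULL);
	if (!atomic_load_explicit(&d->intx_soft_enabled,
				  memory_order_acquire))
		return -1;
	return d->vq_ready; /* ordered after the flag load by the acquire */
}
```

With the fence after the flag store instead (as in my patch), nothing
would order vq_ready before the flag, which is the problem you describe.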

>
> Now Michael is not really enthusiastic about the barrier in the interrupt
> handler hotpath, which is understandable.
>
> As the device startup is not really happening often it's sensible to do
> the following
>
>          disable_irq();
>          vp_dev->intx_soft_enabled = true;
>          enable_irq();
>
> because:
>
>          disable_irq()
>            synchronize_irq()
>
> acts as a barrier for the preceding stores:
>
>          disable_irq()
>            raw_spin_lock(desc->lock);
>            __disable_irq(desc);
>            raw_spin_unlock(desc->lock);
>
>            synchronize_irq()
>              do {
>                raw_spin_lock(desc->lock);
>                in_progress = check_inprogress(desc);
>                raw_spin_unlock(desc->lock);
>              } while (in_progress);
>
>          intx_soft_enabled = true;
>
>          enable_irq();
>
> In this case synchronize_irq() prevents the subsequent store to
> intx_soft_enabled from leaking into the __disable_irq(desc) section,
> which in turn makes it impossible for an interrupt handler to observe
> intx_soft_enabled == true before the prerequisites which precede the
> call to disable_irq() are visible.
>
> Of course the memory ordering wizards might disagree, but if they do,
> then we have a massive chase of ordering problems vs. similar constructs
> all over the tree ahead of us.
>
> From the interrupt perspective the sequence:
>
>          disable_irq();
>          vp_dev->intx_soft_enabled = true;
>          enable_irq();
>
> is perfectly fine as well. Any interrupt arriving during the disabled
> section will be reraised on enable_irq() in hardware because it's a
> level interrupt. Any resulting failure is either a hardware or a
> hypervisor bug.

Thanks a lot for the detailed clarifications. I will switch to using
disable_irq()/enable_irq() if there is no objection from the memory
ordering wizards.
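
As a sanity check of the interrupt-side behaviour, a minimal userspace
model of that sequence in C. It only models the level-triggered
re-raise, not the memory ordering, and irq_line_model and the line_*()
helpers are hypothetical names rather than kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one level-triggered line; not kernel code. */
struct irq_line_model {
	unsigned int disable_depth; /* > 0 while the line is masked */
	bool line_asserted;         /* device keeps it high until acked */
	bool intx_soft_enabled;     /* driver-side gate */
	unsigned int handled;       /* interrupts actually processed */
};

/* Deliver only if unmasked and asserted; the handler checks the gate. */
static void line_try_deliver(struct irq_line_model *l)
{
	assert(l != NULL);
	if (l->disable_depth > 0 || !l->line_asserted)
		return;
	if (!l->intx_soft_enabled)
		return;             /* handler would return IRQ_NONE */
	l->line_asserted = false;   /* handling acks the device */
	l->handled++;
}

static void line_device_raise(struct irq_line_model *l)
{
	assert(l != NULL);
	l->line_asserted = true;
	line_try_deliver(l);
}

static void line_disable_irq(struct irq_line_model *l)
{
	assert(l != NULL);
	l->disable_depth++;
}

/* Unmasking re-delivers a still-asserted level interrupt. */
static void line_enable_irq(struct irq_line_model *l)
{
	assert(l != NULL && l->disable_depth > 0);
	if (--l->disable_depth == 0)
		line_try_deliver(l);
}
```

An interrupt raised between line_disable_irq() and line_enable_irq() is
held back, and once the gate has been set it is delivered and handled
on enable.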

>
> Thanks,
>
>          tglx
>
