From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v3 1/2] IOMMU/spinlock: Fix a bug found in AMD IOMMU initialization Date: Fri, 11 Mar 2016 11:35:37 +0100 Message-ID: <1457692537.3102.563.camel@citrix.com> References: <1457529455-38314-1-git-send-email-quan.xu@intel.com> <1457529455-38314-2-git-send-email-quan.xu@intel.com> <945CA011AD5F084CBEA3E851C0AB28894B861913@SHSMSX101.ccr.corp.intel.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============4330058773822853301==" Return-path: In-Reply-To: <945CA011AD5F084CBEA3E851C0AB28894B861913@SHSMSX101.ccr.corp.intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: "Xu, Quan" , Meng Xu Cc: Suravee Suthikulpanit , Jan Beulich , "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org --===============4330058773822853301== Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-PrCiXIjrxczy0HkvUluF" --=-PrCiXIjrxczy0HkvUluF Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 2016-03-11 at 06:54 +0000, Xu, Quan wrote: > On March 11, 2016 11:25am, wrote: > >=20 > > On Wed, Mar 9, 2016 at 8:17 AM, Quan Xu wrote: > > >=20 > > > pcidevs_lock should be held with interrupt enabled. However there > > > remains an exception in AMD IOMMU code, where the lock is > > > acquired > > > with interrupt disabled. This inconsistency might lead to > > > deadlock. > > Why will this inconsistency lead to deadlock? > > I understand the difference between spin_lock_irqsave(), which > > disable interrupt, > > and spin_lock(), which allows interrupt, however, I didn't get why > > misuse the > > spin_lock_irqsave() at the place of spin_lock() could potentially > > lead to a > > deadlock? >=C2=A0 > =C2=A01).As Jan mentioned, The implication from disabling interrupts whil= e > acquiring a lock is that the lock is also being acquired by some > interrupt handler. > =C2=A0 If you mix acquire types, the one not disabling interrupts is pron= e > to be interrupted, and the interrupt trying to get hold of the lock > the same CPU already owns. >=20 The key issue is whether or not a lock can be acquired from IRQ context (i.e., in an interrupt handler, or a function called by that, as a result of an interrupt occurring). For lock that can, IRQ disabling variants must be used *everywhere* the lock is taken (so, e.g., not only when it is actually taken from IRQ context, just *always*!). If that rule is not honored, we are ok if the lock is taken on CPUA, and the interrupt handled on CPUB: =C2=A0 CPUA =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0CPUB =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 . =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 . =C2=A0 spin_lock(l) =C2=A0 =C2=A0 =C2=A0. =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 . =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 . <-- Inte= rrupt! =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0. =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 spin_lock(l); //spins on l =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 x =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 //spins on l =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 x =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 //spins on l =C2=A0 spin_unlock(l) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0. =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 //takes l =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 . =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 spin_unlock(l); =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 . <-- . =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 . but the following can happen, if the interrupt is delivered to the CPU that has already taken the lock: =C2=A0 =C2=A0 =C2=A0CPU =C2=A0 =C2=A0 =C2=A0. =C2=A0 =C2=A0 =C2=A0. [1] =C2=A0spin_lock(l); =C2=A0 =C2=A0 =C2=A0 //takes l =C2=A0 =C2=A0 =C2=A0. =C2=A0 =C2=A0 =C2=A0. <-- Interrupt! =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0. =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0spin_lock(l) // CPU deadlocks If, in the latter case, spin_lock_irq(l) were used at [1], things would have been fine, as the interrupt wouldn't have occurred until l weren't released: =C2=A0 CPU =C2=A0 . =C2=A0 . =C2=A0 spin_lock_irq(l) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0//takes l =C2=A0 . =C2=A0 . <---------------- Interrupt! =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 //IRQs are d= isabled =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 //IRQs are d= isabled =C2=A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 //IRQs are d= isabled =C2=A0 spin_unlock_irq(l) =C2=A0. =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 //IRQs re-hanbled =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 spin_lock_irq(l); =C2=A0 //takes l =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 . =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 spin_unlock_irq(l); =C2=A0. <----------------- . =C2=A0. Note that whether spin_lock_irq() or spin_lock_irqsave() should be used, is completely independent from this, and it must be decided according to whether IRQs are disabled already or not when taking the lock. And (quite obviously, but since wre're here) it is fine to mix spin_lock_irq() and spin_lock_irqsave(), as they both disable interrupts, which is what matters. So, now, if we know for sure that a lock is _never_ever_ever_ taken from interrupt context, can we mix spin_lock() and spin_lock_irq() on it (for whatever reason)? Well, as far as the above reasoning is concerned, yes. In fact, the deadlock arises because IRQs interrupt asynchronously what the CPU is doing, and that can happen when the CPU has taken the lock already. But if the 'asynchronous' part goes away, we really don't care whether a lock is take at time t1 with IRQ enabled, and at time t2 with IRQ disabled, don't you think? Well, here it is where what the comment inside check_lock() describes comes into play. As quoted by Qaun already: > =C2=A02). Comment inside check_lock(),=C2=A0 > we partition locks into IRQ-safe (always held with IRQs disabled) and > IRQ-unsafe (always held with IRQs enabled) types. The convention for > every lock must be consistently observed else we can deadlock in > IRQ-context rendezvous functions (__a rendezvous which gets every CPU > into IRQ context before any CPU is released from the rendezvous__). > If we can mix IRQ-disabled and IRQ-enabled callers, the following can > happen: > =C2=A0* Lock is held by CPU A, with IRQs enabled > =C2=A0* CPU B is spinning on same lock, with IRQs disabled > =C2=A0* Rendezvous starts -- CPU A takes interrupt and enters rendezbous > spin > =C2=A0* DEADLOCK -- CPU B will never enter rendezvous, CPU A will never > exit > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0the rendezvous, and will hence never release the lock. >=20 > To guard against this subtle bug we latch the IRQ safety of every > spinlock in the system, on first use. >=20 This is a very good safety measure, but it can be quite a restriction in some (luckily limited) cases. And that's why it is possible -- although absolutely discouraged-- to relax it. See, for an example of this, the comment in start_secondary(), in=C2=A0xen/arch/x86/smpboot.c: =C2=A0 =C2=A0 ... =C2=A0 =C2=A0 /* =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* Just as during early bootstrap, it is conve= nient here to disable =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* spinlock checking while we have IRQs disabl= ed. This allows us to =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* acquire IRQ-unsafe locks when it would othe= rwise be disallowed. =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* It is safe because the race we are usually = trying to avoid involves =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* a group of CPUs rendezvousing in an IPI han= dler, where one cannot =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* join because it is spinning with IRQs disab= led waiting to acquire a =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* lock held by another in the rendezvous grou= p (the lock must be an =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* IRQ-unsafe lock since the CPU took the IPI = after acquiring it, and =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* hence had IRQs enabled). This is a deadlock= scenario. =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* However, no CPU can be involved in rendezvo= us until it is online, =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* hence no such group can be waiting for this= CPU until it is =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* visible in cpu_online_map. Hence such a dea= dlock is not possible. =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ =C2=A0=C2=A0=C2=A0=C2=A0spin_debug_disable(); ... Just FTR, at least as far as I can remember, Linux does not enforce anything like that. Hope this clarifies things. Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-PrCiXIjrxczy0HkvUluF Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlbin3oACgkQk4XaBE3IOsSJTQCggwEpCP5OXPlBEzUB4hF5nIA7 pZ0AnReH747kshVvI+r95AuiqHBAeBLO =70/s -----END PGP SIGNATURE----- --=-PrCiXIjrxczy0HkvUluF-- --===============4330058773822853301== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y Zy94ZW4tZGV2ZWwK --===============4330058773822853301==--