From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Rostedt Subject: [PATCH RT 03/12 rc3] intel-iommu: Fix AB-BA lockdep report Date: Mon, 05 Dec 2011 18:00:49 -0500 Message-ID: <20111205230252.092820216@goodmis.org> References: <20111205230046.736851081@goodmis.org> Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="00GvhwF7k39YY" Cc: Thomas Gleixner , Carsten Emde , John Kacur , Roland Dreier , David Woodhouse To: linux-kernel@vger.kernel.org, linux-rt-users Return-path: Received: from hrndva-omtalb.mail.rr.com ([71.74.56.122]:64097 "EHLO hrndva-omtalb.mail.rr.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932708Ab1LEXCy (ORCPT ); Mon, 5 Dec 2011 18:02:54 -0500 Content-Disposition: inline; filename=0003-intel-iommu-Fix-AB-BA-lockdep-report.patch Sender: linux-rt-users-owner@vger.kernel.org List-ID: --00GvhwF7k39YY Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable From: Roland Dreier When unbinding a device so that I could pass it through to a KVM VM, I got the lockdep report below. It looks like a legitimate lock ordering problem: - domain_context_mapping_one() takes iommu->lock and calls iommu_support_dev_iotlb(), which takes device_domain_lock (inside iommu->lock). - domain_remove_one_dev_info() starts by taking device_domain_lock then takes iommu->lock inside it (near the end of the function). So this is the classic AB-BA deadlock. It looks like a safe fix is to simply release device_domain_lock a bit earlier, since as far as I can tell, it doesn't protect any of the stuff accessed at the end of domain_remove_one_dev_info() anyway. BTW, the use of device_domain_lock looks a bit unsafe to me... it's at least not obvious to me why we aren't vulnerable to the race below: iommu_support_dev_iotlb() domain_remove_dev_info() lock device_domain_lock find info unlock device_domain_lock lock device_domain_lock find same info unlock device_domain_lock free_devinfo_mem(info) do stuff with info after it's free However I don't understand the locking here well enough to know if this is a real problem, let alone what the best fix is. Anyway here's the full lockdep output that prompted all of this: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D [ INFO: possible circular locking dependency detected ] 2.6.39.1+ #1 ------------------------------------------------------- bash/13954 is trying to acquire lock: (&(&iommu->lock)->rlock){......}, at: [] domain_rem= ove_one_dev_info+0x121/0x230 but task is already holding lock: (device_domain_lock){-.-...}, at: [] domain_remove_= one_dev_info+0x208/0x230 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (device_domain_lock){-.-...}: [] lock_acquire+0x9d/0x130 [] _raw_spin_lock_irqsave+0x55/0xa0 [] domain_context_mapping_one+0x600/0x750 [] domain_context_mapping+0x3f/0x120 [] iommu_prepare_identity_map+0x1c5/0x1e0 [] intel_iommu_init+0x88e/0xb5e [] pci_iommu_init+0x16/0x41 [] do_one_initcall+0x45/0x190 [] kernel_init+0xe3/0x168 [] kernel_thread_helper+0x4/0x10 -> #0 (&(&iommu->lock)->rlock){......}: [] __lock_acquire+0x195e/0x1e10 [] lock_acquire+0x9d/0x130 [] _raw_spin_lock_irqsave+0x55/0xa0 [] domain_remove_one_dev_info+0x121/0x230 [] device_notifier+0x72/0x90 [] notifier_call_chain+0x8c/0xc0 [] __blocking_notifier_call_chain+0x78/0xb0 [] blocking_notifier_call_chain+0x16/0x20 [] __device_release_driver+0xbc/0xe0 [] device_release_driver+0x2f/0x50 [] driver_unbind+0xa3/0xc0 [] drv_attr_store+0x2c/0x30 [] sysfs_write_file+0xe6/0x170 [] vfs_write+0xce/0x190 [] sys_write+0x54/0xa0 [] system_call_fastpath+0x16/0x1b other info that might help us debug this: 6 locks held by bash/13954: #0: (&buffer->mutex){+.+.+.}, at: [] sysfs_write_f= ile+0x44/0x170 #1: (s_active#3){++++.+}, at: [] sysfs_write_file+= 0xcd/0x170 #2: (&__lockdep_no_validate__){+.+.+.}, at: [] dri= ver_unbind+0x9b/0xc0 #3: (&__lockdep_no_validate__){+.+.+.}, at: [] dev= ice_release_driver+0x27/0x50 #4: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [= ] __blocking_notifier_call_chain+0x5f/0xb0 #5: (device_domain_lock){-.-...}, at: [] domain_re= move_one_dev_info+0x208/0x230 stack backtrace: Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1 Call Trace: [] print_circular_bug+0xf7/0x100 [] __lock_acquire+0x195e/0x1e10 [] ? trace_hardirqs_off+0xd/0x10 [] ? trace_hardirqs_on_caller+0x13d/0x180 [] lock_acquire+0x9d/0x130 [] ? domain_remove_one_dev_info+0x121/0x230 [] _raw_spin_lock_irqsave+0x55/0xa0 [] ? domain_remove_one_dev_info+0x121/0x230 [] ? trace_hardirqs_off+0xd/0x10 [] domain_remove_one_dev_info+0x121/0x230 [] device_notifier+0x72/0x90 [] notifier_call_chain+0x8c/0xc0 [] __blocking_notifier_call_chain+0x78/0xb0 [] blocking_notifier_call_chain+0x16/0x20 [] __device_release_driver+0xbc/0xe0 [] device_release_driver+0x2f/0x50 [] driver_unbind+0xa3/0xc0 [] drv_attr_store+0x2c/0x30 [] sysfs_write_file+0xe6/0x170 [] vfs_write+0xce/0x190 [] sys_write+0x54/0xa0 [] system_call_fastpath+0x16/0x1b Signed-off-by: Roland Dreier Signed-off-by: David Woodhouse --- drivers/pci/intel-iommu.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index 8c2564d..bc05a51 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -3569,6 +3569,8 @@ static void domain_remove_one_dev_info(struct dmar_do= main *domain, found =3D 1; } =20 + spin_unlock_irqrestore(&device_domain_lock, flags); + if (found =3D=3D 0) { unsigned long tmp_flags; spin_lock_irqsave(&domain->iommu_lock, tmp_flags); @@ -3585,8 +3587,6 @@ static void domain_remove_one_dev_info(struct dmar_do= main *domain, spin_unlock_irqrestore(&iommu->lock, tmp_flags); } } - - spin_unlock_irqrestore(&device_domain_lock, flags); } =20 static void vm_domain_remove_all_dev_info(struct dmar_domain *domain) --=20 1.7.7.1 --00GvhwF7k39YY Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJO3U2cAAoJEIy3vGnGbaoAblwQAIY665ywhxaW6bnLtYbqvDmq F8wesCSB0+BfkVYiVf7Sm3tER22fx2QaMl7fiqphJFoB6EfKIIXuxGJnbzvUiCLC cZcAJLPY+Efr+eLxxFFUFTUCWbXuDtJxmk4iy+YwjIPRT5/eAMXqyhBA1YvUVtw7 DchvLmAPVURVQAkZrPfKHeC81zc1nln3+Dht38ujQRguFqVkGbzbNcsjacWGsyE8 NVEf4eYym751t/6rsGTEIsS3ehZer53Ei0i+eezLfKLam315F5zR4GXaZ+6Dj6aN 2gOapV29xsDL2aw0c/ZE5FosrLsXHhaCT3jjI92qAp0Ytz9wArfTDY0RrBL0Yc4D urHmJLah2eLBo1Gg2mLrx26mP8/6ANz3Y+B3ose9/dn+dXDts0QGsfgjvLFsBMkd 6VelNSxp6efQvj/DisGqDQPUg8rHaN58Y/AhyHbG9Pzne5LNJVl22/DiNtIJYpeA 39vUGyJhkf9bo2jgVtqXw1c1W3lAkM9nzjTd8wSjbk7jlOlfXY+sVHM4q2OCNgda fty9Z+0KShbzPlXyMjwVAqoWODzskDBj38wJCyB9TStAo39XLrSZesidwYCAxfeA URivWBKk6oJrxfA6aZJbeEa+gdpP/4O+aq5gEFBjVu09LnEpcJaZoTZWEpmSZeXT kwlIbCZ5fMZkFZ/24UbU =ukc9 -----END PGP SIGNATURE----- --00GvhwF7k39YY--