* "Convert the AMD iommu driver to the dma-iommu api" is buggy
@ 2019-10-16 14:55 Qian Cai
2019-10-16 14:59 ` Qian Cai
0 siblings, 1 reply; 12+ messages in thread
From: Qian Cai @ 2019-10-16 14:55 UTC (permalink / raw)
To: Tom Murphy
Cc: Robin Murphy, Christoph Hellwig, Joerg Roedel, iommu,
linux-kernel
Today's linux-next generates a lot of warnings on multiple servers during boot
due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api"
[1]. Reverted the whole things fixed them.
[1] https://lore.kernel.org/lkml/20190908165642.22253-1-murphyt7@tcd.ie/
[ 257.785749][ T6184] BUG: sleeping function called from invalid context at
mm/page_alloc.c:4692
[ 257.794886][ T6184] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
6184, name: readelf
[ 257.803574][ T6184] INFO: lockdep is turned off.
[ 257.808233][ T6184] CPU: 86 PID: 6184 Comm: readelf Tainted:
G W 5.4.0-rc3-next-20191016+ #7
[ 257.818035][ T6184] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 07/10/2019
[ 257.827313][ T6184] Call Trace:
[ 257.830487][ T6184] dump_stack+0x86/0xca
[ 257.834530][ T6184] ___might_sleep.cold.92+0xd2/0x122
[ 257.839708][ T6184] __might_sleep+0x73/0xe0
[ 257.844011][ T6184] __alloc_pages_nodemask+0x442/0x720
[ 257.849274][ T6184] ? __alloc_pages_slowpath+0x18d0/0x18d0
[ 257.854886][ T6184] ? debug_lockdep_rcu_enabled+0x27/0x60
[ 257.860415][ T6184] ? lock_downgrade+0x3c0/0x3c0
[ 257.865156][ T6184] alloc_pages_current+0x9c/0x110
[ 257.870071][ T6184] __get_free_pages+0x12/0x60
[ 257.874634][ T6184] get_zeroed_page+0x16/0x20
[ 257.879112][ T6184] amd_iommu_map+0x504/0x850
[ 257.883588][ T6184] ? amd_iommu_domain_direct_map+0x60/0x60
[ 257.889286][ T6184] ? lockdep_hardirqs_on+0x16/0x2a0
[ 257.894373][ T6184] ? alloc_iova+0x189/0x210
[ 257.898765][ T6184] ? trace_hardirqs_on+0x3a/0x160
[ 257.903677][ T6184] iommu_map+0x1b3/0x4d0
[ 257.907802][ T6184] ? iommu_unmap+0xf0/0xf0
[ 257.912104][ T6184] ? alloc_iova_fast+0x258/0x3d1
[ 257.916929][ T6184] ? create_object+0x4a2/0x540
[ 257.921579][ T6184] iommu_map_sg+0x9d/0x120
[ 257.925882][ T6184] iommu_dma_map_sg+0x2c3/0x450
[ 257.930627][ T6184] scsi_dma_map+0xd7/0x160
[ 257.934936][ T6184] pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420
[smartpqi]
[ 257.942735][ T6184] ? pqi_alloc_io_request+0x127/0x140 [smartpqi]
[ 257.948962][ T6184] pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
[ 257.955192][ T6184] ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi]
[ 257.962029][ T6184] ? sd_init_command+0xa25/0x1346 [sd_mod]
[ 257.967730][ T6184] scsi_queue_rq+0xd19/0x1360
[ 257.972298][ T6184] __blk_mq_try_issue_directly+0x295/0x3f0
[ 257.977999][ T6184] ? blk_mq_request_bypass_insert+0xd0/0xd0
[ 257.983787][ T6184] ? debug_lockdep_rcu_enabled+0x27/0x60
[ 257.989312][ T6184] blk_mq_request_issue_directly+0xb5/0x100
[ 257.995098][ T6184] ? blk_mq_flush_plug_list+0x7e0/0x7e0
[ 258.000537][ T6184] ? blk_mq_sched_insert_requests+0xd6/0x380
[ 258.006409][ T6184] ? lock_downgrade+0x3c0/0x3c0
[ 258.011147][ T6184] blk_mq_try_issue_list_directly+0xa9/0x160
[ 258.017023][ T6184] blk_mq_sched_insert_requests+0x228/0x380
[ 258.022810][ T6184] blk_mq_flush_plug_list+0x448/0x7e0
[ 258.028073][ T6184] ? blk_mq_insert_requests+0x380/0x380
[ 258.033516][ T6184] blk_flush_plug_list+0x1eb/0x230
[ 258.038515][ T6184] ? blk_insert_cloned_request+0x1b0/0x1b0
[ 258.044215][ T6184] blk_finish_plug+0x43/0x5d
[ 258.048695][ T6184] read_pages+0xf6/0x3b0
[ 258.052823][ T6184] ? read_cache_pages+0x350/0x350
[ 258.057737][ T6184] ? __page_cache_alloc+0x12c/0x230
[ 258.062826][ T6184] __do_page_cache_readahead+0x346/0x3a0
[ 258.068348][ T6184] ? read_pages+0x3b0/0x3b0
[ 258.072738][ T6184] ? lockdep_hardirqs_on+0x16/0x2a0
[ 258.077928][ T6184] ? __xfs_filemap_fault+0x167/0x4a0 [xfs]
[ 258.083625][ T6184] filemap_fault+0xa13/0xe70
[ 258.088201][ T6184] __xfs_filemap_fault+0x167/0x4a0 [xfs]
[ 258.093731][ T6184] ? kmemleak_alloc+0x57/0x90
[ 258.098397][ T6184] ? xfs_file_read_iter+0x3c0/0x3c0 [xfs]
[ 258.104009][ T6184] ? debug_check_no_locks_freed+0x2c/0xe0
[ 258.109618][ T6184] ? lockdep_init_map+0x8b/0x2b0
[ 258.114543][ T6184] xfs_filemap_fault+0x68/0x70 [xfs]
[ 258.119720][ T6184] __do_fault+0x83/0x220
[ 258.123847][ T6184] __handle_mm_fault+0xd76/0x1f40
[ 258.128757][ T6184] ? __pmd_alloc+0x280/0x280
[ 258.133231][ T6184] ? debug_lockdep_rcu_enabled+0x27/0x60
[ 258.138755][ T6184] ? handle_mm_fault+0x178/0x4c0
[ 258.143581][ T6184] ? lockdep_hardirqs_on+0x16/0x2a0
[ 258.148674][ T6184] ? __do_page_fault+0x29c/0x640
[ 258.153501][ T6184] handle_mm_fault+0x205/0x4c0
[ 258.158151][ T6184] __do_page_fault+0x29c/0x640
[ 258.162800][ T6184] do_page_fault+0x50/0x37f
[ 258.167189][ T6184] page_fault+0x34/0x40
[ 258.171228][ T6184] RIP: 0010:__clear_user+0x3b/0x70
[ 183.553150] BUG: sleeping function called from invalid context at
drivers/iommu/iommu.c:1950
[ 183.562306] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1486,
name: kworker/0:3
[ 183.571450] 5 locks held by kworker/0:3/1486:
[ 183.576510] #0: 44ff0008000ce128 ((wq_completion)events){+.+.}, at:
process_one_work+0x25c/0x948
[ 183.586110] #1: 43ff00081fb2fcf8 ((work_completion)(&wfc.work)){+.+.}, at:
process_one_work+0x280/0x948
[ 183.596310] #2: ffff000a2c661a08 (&dev->intf_state_mutex){+.+.}, at:
mlx5_load_one+0x68/0x12e0 [mlx5_core]
[ 183.606916] #3: ffff9000127e4560 (irq_domain_mutex){+.+.}, at:
__irq_domain_alloc_irqs+0x1f8/0x430
[ 183.616683] #4: 02ff0095ca0ed8f0 (&(&cookie->msi_lock)->rlock){....}, at:
iommu_dma_prepare_msi+0x70/0x210
[ 183.627146] irq event stamp: 378872
[ 183.631345] hardirqs last enabled at (378871): [<ffff9000109d0230>]
_raw_write_unlock_irqrestore+0x4c/0x84
[ 183.641791] hardirqs last disabled at (378872): [<ffff9000109cf7a0>]
_raw_spin_lock_irqsave+0x38/0x9c
[ 183.651717] softirqs last enabled at (377854): [<ffff9000100824f4>]
__do_softirq+0x864/0x900
[ 183.660951] softirqs last disabled at (377841): [<ffff900010118768>]
irq_exit+0x1c8/0x238
[ 183.669836] CPU: 0 PID: 1486 Comm: kworker/0:3 Tainted:
G W L 5.4.0-rc3-next-20191016+ #8
[ 183.679845] Hardware name: HPE Apollo 70 /C01_APACHE_MB ,
BIOS L50_5.13_1.11 06/18/2019
[ 183.690292] Workqueue: events work_for_cpu_fn
[ 183.695357] Call trace:
[ 183.698510] dump_backtrace+0x0/0x248
[ 183.702878] show_stack+0x20/0x2c
[ 183.706900] dump_stack+0xc8/0x130
[ 183.711009] ___might_sleep+0x314/0x328
[ 183.715551] __might_sleep+0x7c/0xe0
[ 183.719832] iommu_map+0x40/0x70
[ 183.723766] iommu_dma_prepare_msi+0x16c/0x210
[ 183.728916] its_irq_domain_alloc+0x100/0x254
[ 183.733979] irq_domain_alloc_irqs_parent+0x74/0x90
[ 183.739562] msi_domain_alloc+0xa0/0x170
[ 183.744190] __irq_domain_alloc_irqs+0x228/0x430
[ 183.749512] msi_domain_alloc_irqs+0x130/0x548
[ 183.754663] pci_msi_setup_msi_irqs+0x64/0x74
[ 183.759726] __pci_enable_msix_range+0x52c/0x878
[ 183.765049] pci_alloc_irq_vectors_affinity+0x94/0x168
[ 183.771028] mlx5_irq_table_create+0x178/0x748 [mlx5_core]
[ 183.777353] mlx5_load_one+0x710/0x12e0 [mlx5_core]
[ 183.783069] init_one+0x514/0x898 [mlx5_core]
[ 183.788134] local_pci_probe+0x74/0xcc
[ 183.792589] work_for_cpu_fn+0x30/0x4c
[ 183.797045] process_one_work+0x4f4/0x948
[ 183.801760] process_scheduled_works+0x34/0x54
[ 183.806909] worker_thread+0x348/0x4bc
[ 183.811364] kthread+0x1cc/0x1e8
[ 183.815299] ret_from_fork+0x10/0x18
[ 184.621631] mlx5_core 0000:0b:00.1: Port module event: module 1, Cable
unplugged
[ 184.867367] mlx5_core 0000:0b:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256)
RxCqeCmprss(0)
[ 186.181802] mlx5_core 0000:0b:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256)
RxCqeCmprss(0)
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 14:55 "Convert the AMD iommu driver to the dma-iommu api" is buggy Qian Cai @ 2019-10-16 14:59 ` Qian Cai 2019-10-16 15:31 ` Joerg Roedel 2019-10-16 15:44 ` Joerg Roedel 0 siblings, 2 replies; 12+ messages in thread From: Qian Cai @ 2019-10-16 14:59 UTC (permalink / raw) To: Tom Murphy Cc: Robin Murphy, Christoph Hellwig, Joerg Roedel, iommu, linux-kernel On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote: > Today's linux-next generates a lot of warnings on multiple servers during boot > due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api" > [1]. Reverted the whole things fixed them. > > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murphyt7@tcd.ie/ > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the correct warning. [ 564.365768][ T6222] BUG: sleeping function called from invalid context at mm/page_alloc.c:4692 [ 564.374447][ T6222] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6222, name: git [ 564.382969][ T6222] INFO: lockdep is turned off. [ 564.387644][ T6222] CPU: 25 PID: 6222 Comm: git Tainted: G W 5.4.0-rc3-next-20191016 #6 [ 564.397011][ T6222] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 [ 564.406291][ T6222] Call Trace: [ 564.409470][ T6222] dump_stack+0x86/0xca [ 564.413517][ T6222] ___might_sleep.cold.92+0xd2/0x122 [ 564.418694][ T6222] __might_sleep+0x73/0xe0 [ 564.422999][ T6222] __alloc_pages_nodemask+0x442/0x720 [ 564.428265][ T6222] ? __alloc_pages_slowpath+0x18d0/0x18d0 [ 564.433883][ T6222] ? arch_stack_walk+0x7f/0xf0 [ 564.438534][ T6222] ? create_object+0x4a2/0x540 [ 564.443188][ T6222] alloc_pages_current+0x9c/0x110 [ 564.448098][ T6222] __get_free_pages+0x12/0x60 [ 564.452659][ T6222] get_zeroed_page+0x16/0x20 [ 564.457137][ T6222] amd_iommu_map+0x504/0x850 [ 564.461612][ T6222] ? amd_iommu_domain_direct_map+0x60/0x60 [ 564.467312][ T6222] ? lockdep_hardirqs_on+0x16/0x2a0 [ 564.472400][ T6222] ? alloc_iova+0x189/0x210 [ 564.476790][ T6222] __iommu_map+0x1c1/0x4e0 [ 564.481090][ T6222] ? iommu_get_dma_domain+0x40/0x40 [ 564.486181][ T6222] ? alloc_iova_fast+0x258/0x3d1 [ 564.491009][ T6222] ? create_object+0x4a2/0x540 [ 564.495656][ T6222] __iommu_map_sg+0xa5/0x130 [ 564.500135][ T6222] iommu_map_sg_atomic+0x14/0x20 [ 564.504958][ T6222] iommu_dma_map_sg+0x2c3/0x450 [ 564.509699][ T6222] scsi_dma_map+0xd7/0x160 [ 564.514010][ T6222] pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420 [smartpqi] [ 564.521811][ T6222] ? pqi_alloc_io_request+0x127/0x140 [smartpqi] [ 564.528037][ T6222] pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi] [ 564.534264][ T6222] ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi] [ 564.541100][ T6222] ? sd_init_command+0xa25/0x1346 [sd_mod] [ 564.546802][ T6222] scsi_queue_rq+0xd19/0x1360 [ 564.551372][ T6222] __blk_mq_try_issue_directly+0x295/0x3f0 [ 564.557071][ T6222] ? blk_mq_request_bypass_insert+0xd0/0xd0 [ 564.562860][ T6222] ? debug_lockdep_rcu_enabled+0x27/0x60 [ 564.568384][ T6222] blk_mq_try_issue_directly+0xad/0x130 [ 564.573821][ T6222] ? __blk_mq_try_issue_directly+0x3f0/0x3f0 [ 564.579693][ T6222] ? blk_add_rq_to_plug+0xcd/0x110 [ 564.584693][ T6222] blk_mq_make_request+0xcee/0x1120 [ 564.589777][ T6222] ? lock_downgrade+0x3c0/0x3c0 [ 564.594517][ T6222] ? blk_mq_try_issue_directly+0x130/0x130 [ 564.600218][ T6222] ? blk_queue_enter+0x78d/0x810 [ 564.605041][ T6222] ? generic_make_request_checks+0xf30/0xf30 [ 564.610915][ T6222] ? lock_downgrade+0x3c0/0x3c0 [ 564.615655][ T6222] ? __srcu_read_unlock+0x24/0x50 [ 564.620565][ T6222] ? generic_make_request+0x150/0x650 [ 564.625833][ T6222] generic_make_request+0x196/0x650 [ 564.630921][ T6222] ? blk_queue_enter+0x810/0x810 [ 564.635747][ T6222] submit_bio+0xaa/0x270 [ 564.639873][ T6222] ? submit_bio+0xaa/0x270 [ 564.644172][ T6222] ? generic_make_request+0x650/0x650 [ 564.649437][ T6222] ? iomap_readpage+0x260/0x260 [ 564.654173][ T6222] iomap_readpages+0x154/0x3d0 [ 564.658823][ T6222] ? iomap_zero_range_actor+0x330/0x330 [ 564.664257][ T6222] ? __might_sleep+0x73/0xe0 [ 564.668836][ T6222] xfs_vm_readpages+0xaf/0x1f0 [xfs] [ 564.674016][ T6222] read_pages+0xe2/0x3b0 [ 564.678142][ T6222] ? read_cache_pages+0x350/0x350 [ 564.683057][ T6222] ? __page_cache_alloc+0x12c/0x230 [ 564.688148][ T6222] __do_page_cache_readahead+0x346/0x3a0 [ 564.693670][ T6222] ? read_pages+0x3b0/0x3b0 [ 564.698059][ T6222] ? lockdep_hardirqs_on+0x16/0x2a0 [ 564.703247][ T6222] ? __xfs_filemap_fault+0x167/0x4a0 [xfs] [ 564.708947][ T6222] filemap_fault+0xa13/0xe70 [ 564.713528][ T6222] __xfs_filemap_fault+0x167/0x4a0 [xfs] [ 564.719059][ T6222] ? kmemleak_alloc+0x57/0x90 [ 564.723724][ T6222] ? xfs_file_read_iter+0x3c0/0x3c0 [xfs] [ 564.729337][ T6222] ? debug_check_no_locks_freed+0x2c/0xe0 [ 564.734946][ T6222] ? lockdep_init_map+0x8b/0x2b0 [ 564.739872][ T6222] xfs_filemap_fault+0x68/0x70 [xfs] [ 564.745046][ T6222] __do_fault+0x83/0x220 [ 564.749172][ T6222] __handle_mm_fault+0xd76/0x1f40 [ 564.754084][ T6222] ? __pmd_alloc+0x280/0x280 [ 564.758559][ T6222] ? debug_lockdep_rcu_enabled+0x27/0x60 > > [ 183.553150] BUG: sleeping function called from invalid context at > drivers/iommu/iommu.c:1950 > [ 183.562306] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1486, > name: kworker/0:3 > [ 183.571450] 5 locks held by kworker/0:3/1486: > [ 183.576510] #0: 44ff0008000ce128 ((wq_completion)events){+.+.}, at: > process_one_work+0x25c/0x948 > [ 183.586110] #1: 43ff00081fb2fcf8 ((work_completion)(&wfc.work)){+.+.}, at: > process_one_work+0x280/0x948 > [ 183.596310] #2: ffff000a2c661a08 (&dev->intf_state_mutex){+.+.}, at: > mlx5_load_one+0x68/0x12e0 [mlx5_core] > [ 183.606916] #3: ffff9000127e4560 (irq_domain_mutex){+.+.}, at: > __irq_domain_alloc_irqs+0x1f8/0x430 > [ 183.616683] #4: 02ff0095ca0ed8f0 (&(&cookie->msi_lock)->rlock){....}, at: > iommu_dma_prepare_msi+0x70/0x210 > [ 183.627146] irq event stamp: 378872 > [ 183.631345] hardirqs last enabled at (378871): [<ffff9000109d0230>] > _raw_write_unlock_irqrestore+0x4c/0x84 > [ 183.641791] hardirqs last disabled at (378872): [<ffff9000109cf7a0>] > _raw_spin_lock_irqsave+0x38/0x9c > [ 183.651717] softirqs last enabled at (377854): [<ffff9000100824f4>] > __do_softirq+0x864/0x900 > [ 183.660951] softirqs last disabled at (377841): [<ffff900010118768>] > irq_exit+0x1c8/0x238 > [ 183.669836] CPU: 0 PID: 1486 Comm: kworker/0:3 Tainted: > G W L 5.4.0-rc3-next-20191016+ #8 > [ 183.679845] Hardware name: HPE Apollo 70 /C01_APACHE_MB , > BIOS L50_5.13_1.11 06/18/2019 > [ 183.690292] Workqueue: events work_for_cpu_fn > [ 183.695357] Call trace: > [ 183.698510] dump_backtrace+0x0/0x248 > [ 183.702878] show_stack+0x20/0x2c > [ 183.706900] dump_stack+0xc8/0x130 > [ 183.711009] ___might_sleep+0x314/0x328 > [ 183.715551] __might_sleep+0x7c/0xe0 > [ 183.719832] iommu_map+0x40/0x70 > [ 183.723766] iommu_dma_prepare_msi+0x16c/0x210 > [ 183.728916] its_irq_domain_alloc+0x100/0x254 > [ 183.733979] irq_domain_alloc_irqs_parent+0x74/0x90 > [ 183.739562] msi_domain_alloc+0xa0/0x170 > [ 183.744190] __irq_domain_alloc_irqs+0x228/0x430 > [ 183.749512] msi_domain_alloc_irqs+0x130/0x548 > [ 183.754663] pci_msi_setup_msi_irqs+0x64/0x74 > [ 183.759726] __pci_enable_msix_range+0x52c/0x878 > [ 183.765049] pci_alloc_irq_vectors_affinity+0x94/0x168 > [ 183.771028] mlx5_irq_table_create+0x178/0x748 [mlx5_core] > [ 183.777353] mlx5_load_one+0x710/0x12e0 [mlx5_core] > [ 183.783069] init_one+0x514/0x898 [mlx5_core] > [ 183.788134] local_pci_probe+0x74/0xcc > [ 183.792589] work_for_cpu_fn+0x30/0x4c > [ 183.797045] process_one_work+0x4f4/0x948 > [ 183.801760] process_scheduled_works+0x34/0x54 > [ 183.806909] worker_thread+0x348/0x4bc > [ 183.811364] kthread+0x1cc/0x1e8 > [ 183.815299] ret_from_fork+0x10/0x18 > [ 184.621631] mlx5_core 0000:0b:00.1: Port module event: module 1, Cable > unplugged > [ 184.867367] mlx5_core 0000:0b:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) > RxCqeCmprss(0) > [ 186.181802] mlx5_core 0000:0b:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) > RxCqeCmprss(0) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 14:59 ` Qian Cai @ 2019-10-16 15:31 ` Joerg Roedel 2019-10-16 15:53 ` Qian Cai 2019-10-16 15:44 ` Joerg Roedel 1 sibling, 1 reply; 12+ messages in thread From: Joerg Roedel @ 2019-10-16 15:31 UTC (permalink / raw) To: Qian Cai; +Cc: Tom Murphy, Robin Murphy, Christoph Hellwig, iommu, linux-kernel Hi Qian, thanks for the report! On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote: > On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote: > > Today's linux-next generates a lot of warnings on multiple servers during boot > > due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api" > > [1]. Reverted the whole things fixed them. > > > > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murphyt7@tcd.ie/ > > > > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the > correct warning. I am looking into it and may send you fixes to test. Thanks, Joerg ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 15:31 ` Joerg Roedel @ 2019-10-16 15:53 ` Qian Cai 2019-10-16 16:03 ` Joerg Roedel 0 siblings, 1 reply; 12+ messages in thread From: Qian Cai @ 2019-10-16 15:53 UTC (permalink / raw) To: Joerg Roedel Cc: Tom Murphy, Robin Murphy, Christoph Hellwig, iommu, linux-kernel On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote: > Hi Qian, > > thanks for the report! > > On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote: > > On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote: > > > Today's linux-next generates a lot of warnings on multiple servers during boot > > > due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api" > > > [1]. Reverted the whole things fixed them. > > > > > > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murphyt7@tcd.ie/ > > > > > > > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp > > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the > > correct warning. The x86 one might just be a mistake. diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index ad05484d0c80..63c4b894751d 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, if (iommu_prot & IOMMU_WRITE) prot |= IOMMU_PROT_IW; - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); domain_flush_np_cache(domain, iova, page_size); The arm64 -- does it forget to do this? diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index ecc08aef9b58..8dd0ef0656f4 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, if (!iova) goto out_free_page; - if (iommu_map(domain, iova, msi_addr, size, prot)) + if (iommu_map_atomic(domain, iova, msi_addr, size, prot)) goto out_free_iova; INIT_LIST_HEAD(&msi_page->list); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 15:53 ` Qian Cai @ 2019-10-16 16:03 ` Joerg Roedel 2019-10-16 16:09 ` Robin Murphy 2019-10-16 16:11 ` Qian Cai 0 siblings, 2 replies; 12+ messages in thread From: Joerg Roedel @ 2019-10-16 16:03 UTC (permalink / raw) To: Qian Cai; +Cc: Tom Murphy, Robin Murphy, Christoph Hellwig, iommu, linux-kernel On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote: > On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote: > The x86 one might just be a mistake. > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index ad05484d0c80..63c4b894751d 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, > unsigned long iova, > if (iommu_prot & IOMMU_WRITE) > prot |= IOMMU_PROT_IW; > > - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); > + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); Yeah, that is a bug, I spotted that too. > @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page > *iommu_dma_get_msi_page(struct device *dev, > if (!iova) > goto out_free_page; > > - if (iommu_map(domain, iova, msi_addr, size, prot)) > + if (iommu_map_atomic(domain, iova, msi_addr, size, prot)) > goto out_free_iova; Not so sure this is a bug, this code is only about setting up MSIs on ARM. It probably doesn't need to be atomic. Regards, Joerg ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 16:03 ` Joerg Roedel @ 2019-10-16 16:09 ` Robin Murphy 2019-10-16 16:36 ` Julien Grall 2019-10-16 16:11 ` Qian Cai 1 sibling, 1 reply; 12+ messages in thread From: Robin Murphy @ 2019-10-16 16:09 UTC (permalink / raw) To: Joerg Roedel, Qian Cai Cc: iommu, Christoph Hellwig, Tom Murphy, linux-kernel, Julien Grall, Marc Zyngier On 16/10/2019 17:03, Joerg Roedel wrote: > On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote: >> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote: > >> The x86 one might just be a mistake. >> >> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c >> index ad05484d0c80..63c4b894751d 100644 >> --- a/drivers/iommu/amd_iommu.c >> +++ b/drivers/iommu/amd_iommu.c >> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, >> unsigned long iova, >> if (iommu_prot & IOMMU_WRITE) >> prot |= IOMMU_PROT_IW; >> >> - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); >> + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); > > Yeah, that is a bug, I spotted that too. > >> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page >> *iommu_dma_get_msi_page(struct device *dev, >> if (!iova) >> goto out_free_page; >> >> - if (iommu_map(domain, iova, msi_addr, size, prot)) >> + if (iommu_map_atomic(domain, iova, msi_addr, size, prot)) >> goto out_free_iova; > > Not so sure this is a bug, this code is only about setting up MSIs on > ARM. It probably doesn't need to be atomic. Right, since the iommu_dma_prepare_msi() operation was broken out to happen at MSI domain setup time, I don't think the comment in there applies any more, so we can probably stop disabling irqs around the iommu_dma_get_msi_page() call. Robin. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 16:09 ` Robin Murphy @ 2019-10-16 16:36 ` Julien Grall 0 siblings, 0 replies; 12+ messages in thread From: Julien Grall @ 2019-10-16 16:36 UTC (permalink / raw) To: Robin Murphy, Joerg Roedel, Qian Cai Cc: iommu, Christoph Hellwig, Tom Murphy, linux-kernel, Marc Zyngier Hi Robin, On 16/10/2019 17:09, Robin Murphy wrote: > On 16/10/2019 17:03, Joerg Roedel wrote: >> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote: >>> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote: >> >>> The x86 one might just be a mistake. >>> >>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c >>> index ad05484d0c80..63c4b894751d 100644 >>> --- a/drivers/iommu/amd_iommu.c >>> +++ b/drivers/iommu/amd_iommu.c >>> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, >>> unsigned long iova, >>> if (iommu_prot & IOMMU_WRITE) >>> prot |= IOMMU_PROT_IW; >>> - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); >>> + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); >> >> Yeah, that is a bug, I spotted that too. >> >>> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page >>> *iommu_dma_get_msi_page(struct device *dev, >>> if (!iova) >>> goto out_free_page; >>> - if (iommu_map(domain, iova, msi_addr, size, prot)) >>> + if (iommu_map_atomic(domain, iova, msi_addr, size, prot)) >>> goto out_free_iova; >> >> Not so sure this is a bug, this code is only about setting up MSIs on >> ARM. It probably doesn't need to be atomic. > Right, since the iommu_dma_prepare_msi() operation was broken out to happen at > MSI domain setup time, I don't think the comment in there applies any more, so > we can probably stop disabling irqs around the iommu_dma_get_msi_page() call. Yes, I agree that it does not need to be atomic anymore. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 16:03 ` Joerg Roedel 2019-10-16 16:09 ` Robin Murphy @ 2019-10-16 16:11 ` Qian Cai 2019-10-16 16:26 ` Robin Murphy 1 sibling, 1 reply; 12+ messages in thread From: Qian Cai @ 2019-10-16 16:11 UTC (permalink / raw) To: Joerg Roedel Cc: Tom Murphy, Robin Murphy, Christoph Hellwig, iommu, linux-kernel On Wed, 2019-10-16 at 18:03 +0200, Joerg Roedel wrote: > On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote: > > On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote: > > The x86 one might just be a mistake. > > > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > > index ad05484d0c80..63c4b894751d 100644 > > --- a/drivers/iommu/amd_iommu.c > > +++ b/drivers/iommu/amd_iommu.c > > @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, > > unsigned long iova, > > if (iommu_prot & IOMMU_WRITE) > > prot |= IOMMU_PROT_IW; > > > > - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); > > + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); > > Yeah, that is a bug, I spotted that too. > > > @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page > > *iommu_dma_get_msi_page(struct device *dev, > > if (!iova) > > goto out_free_page; > > > > - if (iommu_map(domain, iova, msi_addr, size, prot)) > > + if (iommu_map_atomic(domain, iova, msi_addr, size, prot)) > > goto out_free_iova; > > Not so sure this is a bug, this code is only about setting up MSIs on > ARM. It probably doesn't need to be atomic. The patch "iommu: Add gfp parameter to iommu_ops::map" does this. It could be called from an atomic context as showed in the arm64 call traces, +int iommu_map(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot) +{ + might_sleep(); + return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); +} ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 16:11 ` Qian Cai @ 2019-10-16 16:26 ` Robin Murphy 0 siblings, 0 replies; 12+ messages in thread From: Robin Murphy @ 2019-10-16 16:26 UTC (permalink / raw) To: Qian Cai, Joerg Roedel; +Cc: Tom Murphy, Christoph Hellwig, iommu, linux-kernel On 16/10/2019 17:11, Qian Cai wrote: > On Wed, 2019-10-16 at 18:03 +0200, Joerg Roedel wrote: >> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote: >>> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote: >>> The x86 one might just be a mistake. >>> >>> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c >>> index ad05484d0c80..63c4b894751d 100644 >>> --- a/drivers/iommu/amd_iommu.c >>> +++ b/drivers/iommu/amd_iommu.c >>> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom, >>> unsigned long iova, >>> if (iommu_prot & IOMMU_WRITE) >>> prot |= IOMMU_PROT_IW; >>> >>> - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); >>> + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); >> >> Yeah, that is a bug, I spotted that too. >> >>> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page >>> *iommu_dma_get_msi_page(struct device *dev, >>> if (!iova) >>> goto out_free_page; >>> >>> - if (iommu_map(domain, iova, msi_addr, size, prot)) >>> + if (iommu_map_atomic(domain, iova, msi_addr, size, prot)) >>> goto out_free_iova; >> >> Not so sure this is a bug, this code is only about setting up MSIs on >> ARM. It probably doesn't need to be atomic. > > The patch "iommu: Add gfp parameter to iommu_ops::map" does this. It could be > called from an atomic context as showed in the arm64 call traces, > > +int iommu_map(struct iommu_domain *domain, unsigned long iova, > + phys_addr_t paddr, size_t size, int prot) > +{ > + might_sleep(); > + return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); > +} Also note that it's *only* the might_sleep() at issue here - none of the arm64 IOMMU drivers have been converted to look at the new gfp argument yet, so anything they actually allocate while mapping will still be GFP_ATOMIC anyway. (Carrying that flag down through the whole io-pgtable stack is on my to-do list...) Robin. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 14:59 ` Qian Cai 2019-10-16 15:31 ` Joerg Roedel @ 2019-10-16 15:44 ` Joerg Roedel 2019-10-17 14:39 ` Qian Cai 1 sibling, 1 reply; 12+ messages in thread From: Joerg Roedel @ 2019-10-16 15:44 UTC (permalink / raw) To: Qian Cai; +Cc: Tom Murphy, Robin Murphy, Christoph Hellwig, iommu, linux-kernel On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote: > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the > correct warning. Can you please test this small fix: diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 78a2cca3ac5c..e7a4464e8594 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2562,7 +2562,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, if (iommu_prot & IOMMU_WRITE) prot |= IOMMU_PROT_IW; - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); domain_flush_np_cache(domain, iova, page_size); ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-16 15:44 ` Joerg Roedel @ 2019-10-17 14:39 ` Qian Cai 2019-10-18 9:39 ` Joerg Roedel 0 siblings, 1 reply; 12+ messages in thread From: Qian Cai @ 2019-10-17 14:39 UTC (permalink / raw) To: Joerg Roedel Cc: Tom Murphy, Robin Murphy, Christoph Hellwig, iommu, linux-kernel On Wed, 2019-10-16 at 17:44 +0200, Joerg Roedel wrote: > On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote: > > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp > > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the > > correct warning. > > Can you please test this small fix: This works fine so far. > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > index 78a2cca3ac5c..e7a4464e8594 100644 > --- a/drivers/iommu/amd_iommu.c > +++ b/drivers/iommu/amd_iommu.c > @@ -2562,7 +2562,7 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, > if (iommu_prot & IOMMU_WRITE) > prot |= IOMMU_PROT_IW; > > - ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL); > + ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp); > > domain_flush_np_cache(domain, iova, page_size); > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy 2019-10-17 14:39 ` Qian Cai @ 2019-10-18 9:39 ` Joerg Roedel 0 siblings, 0 replies; 12+ messages in thread From: Joerg Roedel @ 2019-10-18 9:39 UTC (permalink / raw) To: Qian Cai; +Cc: Tom Murphy, Robin Murphy, Christoph Hellwig, iommu, linux-kernel On Thu, Oct 17, 2019 at 10:39:13AM -0400, Qian Cai wrote: > On Wed, 2019-10-16 at 17:44 +0200, Joerg Roedel wrote: > > On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote: > > > BTW, the previous x86 warning was from only reverted one patch "iommu: Add gfp > > > parameter to iommu_ops::map" where proved to be insufficient. Now, pasting the > > > correct warning. > > > > Can you please test this small fix: > > This works fine so far. Thanks for testing, I queued the fix and will push it to my next branch today. Regards, Joerg ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-10-18 9:39 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-10-16 14:55 "Convert the AMD iommu driver to the dma-iommu api" is buggy Qian Cai 2019-10-16 14:59 ` Qian Cai 2019-10-16 15:31 ` Joerg Roedel 2019-10-16 15:53 ` Qian Cai 2019-10-16 16:03 ` Joerg Roedel 2019-10-16 16:09 ` Robin Murphy 2019-10-16 16:36 ` Julien Grall 2019-10-16 16:11 ` Qian Cai 2019-10-16 16:26 ` Robin Murphy 2019-10-16 15:44 ` Joerg Roedel 2019-10-17 14:39 ` Qian Cai 2019-10-18 9:39 ` Joerg Roedel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox