From: Matthew Brost <matthew.brost@intel.com>
To: Matthew Auld <matthew.auld@intel.com>
Cc: intel-xe@lists.freedesktop.org, Rodrigo Vivi <rodrigo.vivi@intel.com>
Subject: Re: [Intel-xe] [PATCH v11 11/12] drm/xe: drop xe_device_mem_access_get() from invalidation_vma
Date: Mon, 12 Jun 2023 18:22:37 +0000
Message-ID: <ZIdibdrkepdcWikv@DUT025-TGLU.fm.intel.com>
In-Reply-To: <20230612171225.88689-12-matthew.auld@intel.com>

On Mon, Jun 12, 2023 at 06:12:24PM +0100, Matthew Auld wrote:
> Lockdep gives the following splat:
> 
> [  594.158863] ffff888140da53f0 (&vm->userptr.notifier_lock){++++}-{3:3}, at: vma_userptr_invalidate+0xeb/0x330 [xe]
> [  594.158921]
>                but task is already holding lock:
> [  594.158926] ffffffff82761940 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: unmap_vmas+0x0/0x1c0
> [  594.158941]
>                which lock already depends on the new lock.
> 
> [  594.158947]
>                the existing dependency chain (in reverse order) is:
> [  594.158953]
>                -> #5 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
> [  594.158961]        fs_reclaim_acquire+0x68/0xd0
> [  594.158969]        __kmem_cache_alloc_node+0x2c/0x1b0
> [  594.158975]        kmalloc_node_trace+0x1d/0xb0
> [  594.158983]        alloc_worker+0x18/0x50
> [  594.158989]        init_rescuer.part.0+0x13/0xa0
> [  594.158995]        workqueue_init+0xdf/0x210
> [  594.159001]        kernel_init_freeable+0x5c/0x2f0
> [  594.159009]        kernel_init+0x11/0x1a0
> [  594.159017]        ret_from_fork+0x29/0x50
> [  594.159023]
>                -> #4 (fs_reclaim){+.+.}-{0:0}:
> [  594.159031]        fs_reclaim_acquire+0xa0/0xd0
> [  594.159037]        __kmem_cache_alloc_node+0x2c/0x1b0
> [  594.159042]        kmalloc_trace+0x20/0xb0
> [  594.159048]        acpi_device_add+0x25a/0x3f0
> [  594.159056]        acpi_add_single_object+0x387/0x750
> [  594.159063]        acpi_bus_check_add+0x108/0x280
> [  594.159069]        acpi_bus_scan+0x34/0xf0
> [  594.159075]        acpi_scan_init+0xed/0x2b0
> [  594.159082]        acpi_init+0x21e/0x520
> [  594.159087]        do_one_initcall+0x53/0x260
> [  594.159092]        kernel_init_freeable+0x18a/0x2f0
> [  594.159099]        kernel_init+0x11/0x1a0
> [  594.159105]        ret_from_fork+0x29/0x50
> [  594.159110]
>                -> #3 (acpi_device_lock){+.+.}-{3:3}:
> [  594.159117]        __mutex_lock+0x95/0xd10
> [  594.159122]        acpi_enable_wakeup_device_power+0x30/0x120
> [  594.159130]        __acpi_device_wakeup_enable+0x34/0x110
> [  594.159138]        acpi_pm_set_device_wakeup+0x55/0x140
> [  594.159143]        __pci_enable_wake+0x56/0xb0
> [  594.159150]        pci_finish_runtime_suspend+0x35/0x80
> [  594.159157]        pci_pm_runtime_suspend+0xb5/0x1a0
> [  594.159162]        __rpm_callback+0x3c/0x110
> [  594.159170]        rpm_callback+0x58/0x70
> [  594.159176]        rpm_suspend+0x15c/0x6f0
> [  594.159182]        pm_runtime_work+0x9b/0xb0
> [  594.159188]        process_one_work+0x263/0x520
> [  594.159195]        worker_thread+0x4d/0x3b0
> [  594.159200]        kthread+0xeb/0x120
> [  594.159206]        ret_from_fork+0x29/0x50
> [  594.159211]
>                -> #2 (acpi_wakeup_lock){+.+.}-{3:3}:
> [  594.159218]        __mutex_lock+0x95/0xd10
> [  594.159223]        acpi_pm_set_device_wakeup+0x7a/0x140
> [  594.159228]        __pci_enable_wake+0x77/0xb0
> [  594.159234]        pci_pm_runtime_resume+0x70/0xd0
> [  594.159240]        __rpm_callback+0x3c/0x110
> [  594.159246]        rpm_callback+0x58/0x70
> [  594.159252]        rpm_resume+0x50d/0x7a0
> [  594.159258]        rpm_resume+0x267/0x7a0
> [  594.159264]        __pm_runtime_resume+0x45/0x90
> [  594.159270]        xe_pm_runtime_resume_and_get+0x12/0x50 [xe]
> [  594.159314]        xe_device_mem_access_get+0x97/0xc0 [xe]
> [  594.159346]        hw_engines+0x65/0xf0 [xe]
> [  594.159380]        seq_read_iter+0x10d/0x4b0
> [  594.159385]        seq_read+0x9e/0xd0
> [  594.159390]        full_proxy_read+0x4e/0x80
> [  594.159396]        vfs_read+0xb6/0x310
> [  594.159401]        ksys_read+0x60/0xe0
> [  594.159406]        do_syscall_64+0x38/0x90
> [  594.159413]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [  594.159419]
>                -> #1 (&xe->mem_access.lock){+.+.}-{3:3}:
> [  594.159427]        xe_device_mem_access_get+0x43/0xc0 [xe]
> [  594.159457]        xe_gt_tlb_invalidation_vma+0x53/0x190 [xe]
> [  594.159490]        invalidation_fence_init+0x1d2/0x2c0 [xe]
> [  594.159529]        __xe_pt_unbind_vma+0x151/0x4e0 [xe]
> [  594.159564]        vm_bind_ioctl+0x48a/0xae0 [xe]
> [  594.159602]        async_op_work_func+0x20c/0x530 [xe]
> [  594.159634]        process_one_work+0x263/0x520
> [  594.159640]        worker_thread+0x4d/0x3b0
> [  594.159646]        kthread+0xeb/0x120
> [  594.159650]        ret_from_fork+0x29/0x50
> [  594.159655]
>                -> #0 (&vm->userptr.notifier_lock){++++}-{3:3}:
> [  594.159663]        __lock_acquire+0x16fa/0x2850
> [  594.159670]        lock_acquire+0xd2/0x2e0
> [  594.159676]        down_write+0x36/0xd0
> [  594.159681]        vma_userptr_invalidate+0xeb/0x330 [xe]
> [  594.159714]        __mmu_notifier_invalidate_range_start+0x239/0x2a0
> [  594.159722]        unmap_vmas+0x1ac/0x1c0
> [  594.159727]        unmap_region+0xb5/0x120
> [  594.159732]        do_vmi_align_munmap+0x2be/0x430
> [  594.159739]        do_vmi_munmap+0xea/0x120
> [  594.159744]        __vm_munmap+0x9c/0x160
> [  594.159750]        __x64_sys_munmap+0x12/0x20
> [  594.159756]        do_syscall_64+0x38/0x90
> [  594.159761]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [  594.159768]
>                other info that might help us debug this:
> 
> [  594.159773] Chain exists of:
>                  &vm->userptr.notifier_lock --> fs_reclaim --> mmu_notifier_invalidate_range_start
> 
> [  594.159785]  Possible unsafe locking scenario:
> 
> [  594.159790]        CPU0                    CPU1
> [  594.159794]        ----                    ----
> [  594.159797]   lock(mmu_notifier_invalidate_range_start);
> [  594.159802]                                lock(fs_reclaim);
> [  594.159808]                                lock(mmu_notifier_invalidate_range_start);
> [  594.159814]   lock(&vm->userptr.notifier_lock);
> [  594.159819]
> 
> The VM should already be holding a mem_access.ref, so this looks like a
> false positive, and we can just drop the explicit mem_access get/put in
> xe_gt_tlb_invalidation_vma().  The GGTT invalidation path also takes care
> to hold a mem_access.ref, so that should be fine too.

Also, the MMIO write that notifies the GuC of the H2G should have a memory
access assert.
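
To illustrate the idea only, here is a minimal userspace sketch of an
assert-style check placed in front of the notify write. Everything in it
is a hypothetical stand-in (struct mock_xe_device, the mock_* helpers, the
fake notify_reg doorbell); it is not the driver's code, and the real
assert helper and its exact behavior are assumptions on my side. The point
is that the notify path only checks that the caller already holds a
reference, it never takes one itself, so no new lock dependency is added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; NOT the real struct xe_device. */
struct mock_xe_device {
	int mem_access_ref;   /* outstanding mem_access references */
	uint32_t notify_reg;  /* fake doorbell register written to notify the GuC */
	bool assert_failed;   /* set when the check trips (stands in for a WARN) */
};

/* Caller-side get/put, modelled as a plain counter with no locking. */
static void mock_mem_access_get(struct mock_xe_device *xe)
{
	xe->mem_access_ref++;
}

static void mock_mem_access_put(struct mock_xe_device *xe)
{
	assert(xe->mem_access_ref > 0);
	xe->mem_access_ref--;
}

/* Assert-style check: only inspects the counter, never takes a reference. */
static bool mock_assert_mem_access(struct mock_xe_device *xe)
{
	if (xe->mem_access_ref <= 0) {
		xe->assert_failed = true;
		return false;
	}
	return true;
}

/*
 * Hypothetical H2G notify. It checks that the caller holds a reference,
 * records a failure if not, and then performs the (fake) MMIO write
 * either way, similar to a warn-and-continue assert.
 */
static void mock_guc_notify(struct mock_xe_device *xe)
{
	mock_assert_mem_access(xe);
	xe->notify_reg = 1;
}
```

A caller that wraps mock_guc_notify() in mock_mem_access_get()/put()
leaves assert_failed clear, while a caller that forgets the reference
gets flagged at the write itself.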

With that.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 2fcb477604e2..19826488d3da 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -194,7 +194,7 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
>  	struct xe_device *xe = gt_to_xe(gt);
>  #define MAX_TLB_INVALIDATION_LEN	7
>  	u32 action[MAX_TLB_INVALIDATION_LEN];
> -	int len = 0, ret;
> +	int len = 0;
>  
>  	XE_BUG_ON(!vma);
>  
> @@ -248,11 +248,7 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
>  
>  	XE_BUG_ON(len > MAX_TLB_INVALIDATION_LEN);
>  
> -	xe_device_mem_access_get(xe);
> -	ret = send_tlb_invalidation(&gt->uc.guc, fence, action, len);
> -	xe_device_mem_access_put(xe);
> -
> -	return ret;
> +	return send_tlb_invalidation(&gt->uc.guc, fence, action, len);
>  }
>  
>  static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
> -- 
> 2.40.1
> 
