All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gustavo Sousa <gustavo.sousa@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 4/6] drm/xe: Separate early xe_device initialization
Date: Mon, 25 May 2026 16:05:50 -0300	[thread overview]
Message-ID: <87tsrvgvht.fsf@intel.com> (raw)
In-Reply-To: <cb7116e6-e713-41e6-a687-8e814ff2dcde@intel.com>

Michal Wajdeczko <michal.wajdeczko@intel.com> writes:

> On 5/25/2026 8:21 PM, Gustavo Sousa wrote:
>> Michal Wajdeczko <michal.wajdeczko@intel.com> writes:
>> 
>>> We would like to initialize more of the xe_device struct also from
>>> the kunit code, as it should be safe to use most of the generic drm
>>> or xe components without doing any additional tweaks. Separate early
>>> xe initialization code to a new function, so it can be reused.
>>>
>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> ---
>>>  drivers/gpu/drm/xe/xe_device.c | 39 ++++++++++++++++++++++++----------
>>>  drivers/gpu/drm/xe/xe_device.h |  1 +
>>>  2 files changed, 29 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>> index b498147dcf61..7ba407f73a02 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.c
>>> +++ b/drivers/gpu/drm/xe/xe_device.c
>>> @@ -506,27 +506,45 @@ struct xe_device *xe_device_create(struct pci_dev *pdev)
>>>  	if (IS_ERR(xe))
>>>  		return xe;
>>>  
>>> +	err = xe_device_init_early(xe);
>>> +	if (err)
>>> +		return ERR_PTR(err);
>>> +
>>> +	return xe;
>>> +}
>>> +ALLOW_ERROR_INJECTION(xe_device_create, ERRNO); /* See xe_pci_probe() */
>>> +
>>> +/**
>>> + * xe_device_init_early() - Initialize a new &xe_device instance
>>> + * @xe: the &xe_device to initialize
>>> + *
>>> + * Return: 0 on success or a negative error code on failure.
>>> + */
>>> +int xe_device_init_early(struct xe_device *xe)
>>> +{
>>> +	int err;
>>> +
>>>  	err = ttm_device_init(&xe->ttm, &xe_ttm_funcs, xe->drm.dev,
>>>  			      xe->drm.anon_inode->i_mapping,
>>>  			      xe->drm.vma_offset_manager, 0);
>>> -	if (WARN_ON(err))
>>> -		return ERR_PTR(err);
>>> +	if (err)
>>> +		return err;
>>>  
>>>  	xe_bo_dev_init(&xe->bo_device);
>>>  	err = drmm_add_action_or_reset(&xe->drm, xe_device_destroy, NULL);
>> 
>> Curious: are the drmm_* calls going to work fine for the automated
>> teardown when they are called from a kunit context?
>
> that's correct, see:
>
> WARNING: drivers/gpu/drm/xe/tests/xe_kunit_helpers.c:67 at my_drm_fini+0x72/0x80, CPU#0: kunit_try_catch/20
> drm-kunit-mock-device test_data.drm-kunit-mock-device: [drm] WARN_ON(true)
> CPU: 0 UID: 0 PID: 20 Comm: kunit_try_catch Tainted: G        W        N  7.1.0-rc5-g547691814d7b-dirty #1020 VOLUNTARY
> Tainted: [W]=WARN, [N]=TEST
> Stack:
>  607ffcd5 00000000 00000000 00000001
>  ffffff00 607ffcd5 00000009 60029a90
>  6086f348 60064b86 605cf832 a088bcb0
> Call Trace:
>  [<60029a90>] ? _printk+0x0/0x65
>  [<60064b86>] ? dump_stack_lvl+0x6f/0xb8
>  [<605cf832>] ? my_drm_fini+0x72/0x80
>  [<600231a3>] ? __warn.cold+0x82/0x1cf
>  [<601aacb0>] ? kfree_const+0x0/0x40
>  [<60023384>] ? warn_slowpath_fmt+0x94/0xa1
>  [<60409500>] ? __drm_dev_dbg+0x0/0xb0
>  [<605cf832>] ? my_drm_fini+0x72/0x80
>  [<603fb4a7>] ? drm_managed_release+0xa7/0x190
>  [<605ec9a0>] ? devres_log+0x0/0xc0
>  [<603e6590>] ? devm_drm_dev_init_release+0x60/0x90
>  [<605ecb75>] ? release_nodes+0x55/0x80
>  [<6069ae80>] ? mutex_lock_nested+0x0/0x20
>  [<605ee3f7>] ? devres_release_all+0x97/0xf0
>  [<605e68c0>] ? bus_notify+0x0/0x60
>  [<60695df0>] ? mutex_unlock+0x0/0x20
>  [<605e6f54>] ? device_unbind_cleanup+0x14/0xb0
>  [<605e88e6>] ? device_release_driver_internal+0x266/0x2c0
>  [<60669f50>] ? kobject_put+0x0/0x150
>  [<602ef010>] ? sysfs_remove_file_ns+0x0/0x20
>  [<605e64fd>] ? bus_remove_device+0x11d/0x1b0
>  [<6069ae80>] ? mutex_lock_nested+0x0/0x20
>  [<605dfc1f>] ? device_del+0x1bf/0x5f0
>  [<60075330>] ? um_set_signals+0x0/0x60
>  [<60075320>] ? um_get_signals+0x0/0x10
>  [<606a2160>] ? _raw_spin_lock_irqsave+0x0/0xa0
>  [<605e0064>] ? device_unregister+0x14/0x40
>  [<60375cde>] ? kunit_remove_resource+0x8e/0xd0
>  [<606a24d0>] ? _raw_spin_unlock_irqrestore+0x0/0xb0
>  [<60375323>] ? kunit_cleanup+0x63/0xb0
>  [<60377a70>] ? kunit_generic_run_threadfn_adapter+0x0/0x30
>  [<60377a86>] ? kunit_generic_run_threadfn_adapter+0x16/0x30
>
>
>
>> 
>> What about devm_* calls? From a quick look, I know that at least
>> xe_bo_pinned_init() will make one of those.
>
> kunit device follows device model, it is just created on a dummy bus,
> but everything else works fine, see:
>
> WARNING: drivers/gpu/drm/xe/tests/xe_kunit_helpers.c:60 at my_dev_fini+0x6f/0x80, CPU#0: kunit_try_catch/20
> drm-kunit-mock-device test_data.drm-kunit-mock-device: [drm] WARN_ON(true)
> CPU: 0 UID: 0 PID: 20 Comm: kunit_try_catch Tainted: G                 N  7.1.0-rc5-g547691814d7b-dirty #1020 VOLUNTARY
> Tainted: [N]=TEST
> Stack:
>  607ffcd5 00000000 00000000 00000001
>  ffffff00 607ffcd5 00000009 60029a90
>  6086f348 60064b86 605cf7af a088bd00
> Call Trace:
>  [<60029a90>] ? _printk+0x0/0x65
>  [<60064b86>] ? dump_stack_lvl+0x6f/0xb8
>  [<605cf7af>] ? my_dev_fini+0x6f/0x80
>  [<600231a3>] ? __warn.cold+0x82/0x1cf
>  [<605ee3b5>] ? devres_release_all+0x55/0xf0
>  [<60023384>] ? warn_slowpath_fmt+0x94/0xa1
>  [<605ec9a0>] ? devres_log+0x0/0xc0
>  [<605cf7af>] ? my_dev_fini+0x6f/0x80
>  [<605ecb75>] ? release_nodes+0x55/0x80
>  [<6069ae80>] ? mutex_lock_nested+0x0/0x20
>  [<605ee3f7>] ? devres_release_all+0x97/0xf0
>  [<605e68c0>] ? bus_notify+0x0/0x60
>  [<60695df0>] ? mutex_unlock+0x0/0x20
>  [<605e6f54>] ? device_unbind_cleanup+0x14/0xb0
>  [<605e88e6>] ? device_release_driver_internal+0x266/0x2c0
>  [<60669f50>] ? kobject_put+0x0/0x150
>  [<602ef010>] ? sysfs_remove_file_ns+0x0/0x20
>  [<605e64fd>] ? bus_remove_device+0x11d/0x1b0
>  [<6069ae80>] ? mutex_lock_nested+0x0/0x20
>  [<605dfc1f>] ? device_del+0x1bf/0x5f0
>  [<60075330>] ? um_set_signals+0x0/0x60
>  [<60075320>] ? um_get_signals+0x0/0x10
>  [<606a2160>] ? _raw_spin_lock_irqsave+0x0/0xa0
>  [<605e0064>] ? device_unregister+0x14/0x40
>  [<60375cde>] ? kunit_remove_resource+0x8e/0xd0
>  [<606a24d0>] ? _raw_spin_unlock_irqrestore+0x0/0xb0
>  [<60375323>] ? kunit_cleanup+0x63/0xb0
>  [<60377a70>] ? kunit_generic_run_threadfn_adapter+0x0/0x30
>  [<60377a86>] ? kunit_generic_run_threadfn_adapter+0x16/0x30
>
>> 
>>>  	if (err)
>>> -		return ERR_PTR(err);
>>> +		return err;
>>>  
>>>  	err = xe_shrinker_create(xe);
>>>  	if (err)
>>> -		return ERR_PTR(err);
>>> +		return err;
>>>  
>>>  	xe->atomic_svm_timeslice_ms = 5;
>>>  	xe->min_run_period_lr_ms = 5;
>>>  
>>>  	err = xe_irq_init(xe);
>>>  	if (err)
>>> -		return ERR_PTR(err);
>>> +		return err;
>>>  
>>>  	xe_validation_device_init(&xe->val);
>>>  
>>> @@ -536,7 +554,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev)
>>>  
>>>  	err = xe_pagemap_shrinker_create(xe);
>>>  	if (err)
>>> -		return ERR_PTR(err);
>>> +		return err;
>>>  
>>>  	xa_init_flags(&xe->usm.asid_to_vm, XA_FLAGS_ALLOC);
>>>  
>>> @@ -555,7 +573,7 @@ struct xe_device *xe_device_create(struct pci_dev *pdev)
>>>  
>>>  	err = xe_bo_pinned_init(xe);
>>>  	if (err)
>>> -		return ERR_PTR(err);
>>> +		return err;
>>>  
>>>  	xe->preempt_fence_wq = alloc_ordered_workqueue("xe-preempt-fence-wq",
>>>  						       WQ_MEM_RECLAIM);
>>> @@ -569,16 +587,15 @@ struct xe_device *xe_device_create(struct pci_dev *pdev)
>>>  		 * drmm_add_action_or_reset register above
>>>  		 */
>>>  		drm_err(&xe->drm, "Failed to allocate xe workqueues\n");
>>> -		return ERR_PTR(-ENOMEM);
>>> +		return -ENOMEM;
>>>  	}
>>>  
>>>  	err = drmm_mutex_init(&xe->drm, &xe->pmt.lock);
>>>  	if (err)
>>> -		return ERR_PTR(err);
>>> +		return err;
>>>  
>>> -	return xe;
>>> +	return 0;
>>>  }
>>> -ALLOW_ERROR_INJECTION(xe_device_create, ERRNO); /* See xe_pci_probe() */
>> 
>> So, there are things allocated/initialized in xe_device_create() that
>> get deallocated/finalized in xe_device_destroy().  With this change, we
>> now have things extracted out of xe_device_create() into
>> xe_device_init_early(), and it appears that xe_device_destroy() is now
>> really the counterpart of the latter than the former.
>> 
>> I think we should just call the function xe_device_init() and then
>> rename xe_device_destroy() to xe_device_fini().
>
> there is yet another small pending series that introduces drmm_alloc_workqueue
> and after that converts all WQ initializations into self-contained statements
>
> didn't include here, as it touches include/drm so can be done in its own pace
>
> but with that xe_device_destroy (which should never be called that way)
> will be greatly reduced, and even possibly completely removed after moving
> registration of some cleanup steps to xe_bo_dev_init and/or ttm_init
>
>> 
>>>  
>>>  static bool xe_driver_flr_disabled(struct xe_device *xe)
>>>  {
>>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>>> index 27cd2329b99f..975768a6a9c8 100644
>>> --- a/drivers/gpu/drm/xe/xe_device.h
>>> +++ b/drivers/gpu/drm/xe/xe_device.h
>>> @@ -44,6 +44,7 @@ static inline struct xe_device *ttm_to_xe_device(struct ttm_device *ttm)
>>>  }
>>>  
>>>  struct xe_device *xe_device_create(struct pci_dev *pdev);
>>> +int xe_device_init_early(struct xe_device *xe);
>> 
>> Right now we do not have any external users for this function.  We could
>> keep it static until we get to use them in kunit.
>
> my kunit series just got too big so I've started posting smaller chunks
> don't really want to go into static/public noise just for few days delay
> between series
>
>> 
>> That said, I think it would also make sense to separate it completely
>> from xe_device_create(), i.e., have xe_pci.c call the two functions
>> instead of xe_device_init_early() being implicitly called by
>> xe_device_create().  I think that would be more alligned with the
>> function names.
>
> hmm, maybe
>
> but that would likely require also extracting call to
>
> 	aperture_remove_conflicting_pci_devices
>
> and I'm not ready for that
>
> so IMO it's better to keep "xe_device_create()" as "Allocate and initialize"
> as we have now in the kernel-doc ;)

Fair.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>

>
>> 
>> --
>> Gustavo Sousa
>> 
>>>  int xe_device_probe_early(struct xe_device *xe);
>>>  int xe_device_probe(struct xe_device *xe);
>>>  void xe_device_remove(struct xe_device *xe);
>>> -- 
>>> 2.47.1

  reply	other threads:[~2026-05-25 19:06 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-25 16:05 [PATCH 0/6] drm/xe: Misc initialization improvements Michal Wajdeczko
2026-05-25 16:05 ` [PATCH 1/6] drm/xe: Drop unused param from xe_device_create() Michal Wajdeczko
2026-05-25 18:13   ` Raag Jadav
2026-05-25 18:45   ` Gustavo Sousa
2026-05-25 16:05 ` [PATCH 2/6] drm/xe: Move xe->info.force_execlist initialization Michal Wajdeczko
2026-05-25 18:46   ` Gustavo Sousa
2026-05-25 16:05 ` [PATCH 3/6] drm/xe: Move xe->info.devid|revid initialization Michal Wajdeczko
2026-05-25 17:47   ` Gustavo Sousa
2026-05-25 18:25     ` Michal Wajdeczko
2026-05-25 19:12       ` Gustavo Sousa
2026-05-25 19:41   ` [PATCH v2 " Michal Wajdeczko
2026-05-25 20:01     ` Gustavo Sousa
2026-05-25 16:05 ` [PATCH 4/6] drm/xe: Separate early xe_device initialization Michal Wajdeczko
2026-05-25 18:21   ` Gustavo Sousa
2026-05-25 18:57     ` Michal Wajdeczko
2026-05-25 19:05       ` Gustavo Sousa [this message]
2026-05-25 19:00     ` Gustavo Sousa
2026-05-25 19:14       ` Michal Wajdeczko
2026-05-25 18:22   ` Raag Jadav
2026-05-25 16:05 ` [PATCH 5/6] drm/xe/pm: Don't access device in init_early() Michal Wajdeczko
2026-05-25 18:30   ` Raag Jadav
2026-05-25 18:47   ` Gustavo Sousa
2026-05-27 15:46   ` Rodrigo Vivi
2026-05-25 16:05 ` [PATCH 6/6] drm/xe/pm: Do early initialization " Michal Wajdeczko
2026-05-25 18:47   ` Gustavo Sousa
2026-05-25 16:36 ` ✓ CI.KUnit: success for drm/xe: Misc initialization improvements Patchwork
2026-05-25 17:42 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-25 19:48 ` ✓ CI.KUnit: success for drm/xe: Misc initialization improvements (rev2) Patchwork
2026-05-25 20:49 ` ✓ Xe.CI.BAT: " Patchwork
2026-05-26  1:59 ` ✓ Xe.CI.FULL: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tsrvgvht.fsf@intel.com \
    --to=gustavo.sousa@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=michal.wajdeczko@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.