From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: intel-xe@lists.freedesktop.org,
"Piotr Piórkowski" <piotr.piorkowski@intel.com>
Subject: Re: [PATCH] drm/xe/pf: Restore TotalVFs
Date: Mon, 15 Jul 2024 19:40:46 +0200 [thread overview]
Message-ID: <64a46ab4-6b7c-49e0-862c-cb8829658d54@intel.com> (raw)
In-Reply-To: <rzlfgpxzbypgidpwtybd4usfsdk7kic4ieqjzawzpjziv2vj65@tp7zl6unxjpx>
On 15.07.2024 19:09, Lucas De Marchi wrote:
> On Mon, Jul 15, 2024 at 01:40:52PM GMT, Michal Wajdeczko wrote:
>> If we update TotalVFs to a new value and fail the .probe() then
>> the PCI subsystem will not restore original value, which might
>> impact next .probe() attempt. As a backup plan, use managed action
>> to restore it ourselves.
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
>> ---
>> Attempt to fix that at PCI layer is here [1], but until it will be
>> available for us, we can still fix that on our side.
>>
>> [1]
>> https://lore.kernel.org/linux-pci/20240715102835.1286-1-michal.wajdeczko@intel.com/
>
> I don't think we should merge this while that is pending with no reply.
> If that is not accepted, then we can think about doing this.
fair enough, but idea was to have it little sooner so it will help Piotr
with his quite frequent issue
>
>> ---
>> drivers/gpu/drm/xe/xe_sriov_pf.c | 48 +++++++++++++++++++++++++++-----
>> 1 file changed, 41 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_pf.c
>> b/drivers/gpu/drm/xe/xe_sriov_pf.c
>> index 0f721ae17b26..d61933770f1c 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_pf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_pf.c
>> @@ -17,12 +17,40 @@ static unsigned int wanted_max_vfs(struct
>> xe_device *xe)
>> return xe_modparam.max_vfs;
>> }
>>
>> -static int pf_reduce_totalvfs(struct xe_device *xe, int limit)
>> +static void restore_totalvfs(void *arg)
>> +{
>> + struct xe_device *xe = arg;
>> + struct device *dev = xe->drm.dev;
>> + struct pci_dev *pdev = to_pci_dev(dev);
>> + int totalvfs = xe->sriov.pf.device_total_vfs;
>> +
>> + xe_sriov_dbg_verbose(xe, "restoring totalvfs to %d\n", totalvfs);
>> + pci_sriov_set_totalvfs(pdev, totalvfs);
>> +}
>> +
>> +static int pf_reduce_totalvfs(struct xe_device *xe, u16 limit)
>
> ^^^^^^
>
> so the current logic assumes the maximum comes from BIOS, but it could
> rather come from a previous (failed) driver load? once we set the value
> is the information lost from the device or can we recover it from
> somewhere?
> somewhere != where we saved in the this driver.
PCI layer keeps the original value read from the device PCI config space
and currently restores it only on .remove but not after failed .probe
[2] https://elixir.bootlin.com/linux/latest/source/drivers/pci/iov.c#L934
lucky for us, we already were caching this original value for debugfs
reporting, so it's quite easy to restore it by ourselves
>
>
>> {
>> struct device *dev = xe->drm.dev;
>> struct pci_dev *pdev = to_pci_dev(dev);
>> int err;
>>
>> + xe->sriov.pf.device_total_vfs = pci_sriov_get_totalvfs(pdev);
>> + xe->sriov.pf.driver_max_vfs = limit;
>> +
>> + xe_sriov_dbg_verbose(xe, "reducing totalvfs from %u to %u\n",
>> + xe->sriov.pf.device_total_vfs,
>> + xe->sriov.pf.driver_max_vfs);
>> +
>> + /*
>> + * XXX: If we update TotalVFs to a new value and fail the .probe()
>
> why XXX?
as only this chunk (comment/call) could be removed when [1] lands on our
tree, but even if it stays it will be harmless
>
>> + * then the PCI subsystem might not restore original TotalVFs value,
>> + * which might impact next .probe() attempt. As a backup plan, use
>> + * managed action to restore it ourselves.
>> + *
>> + * Note that it's not critical if registering that action fails.
>> + */
>> + devm_add_action(xe->drm.dev, restore_totalvfs, xe);
>> +
>> err = pci_sriov_set_totalvfs(pdev, limit);
>> if (err)
>> xe_sriov_notice(xe, "Failed to set number of VFs to %d (%pe)\n",
>> @@ -37,6 +65,14 @@ static bool pf_continue_as_native(struct xe_device
>> *xe, const char *why)
>> return false;
>> }
>>
>> +static bool pf_continue_ready(struct xe_device *xe, u16 max_vfs)
>> +{
>> + xe_assert(xe, max_vfs > 0);
>> + xe_sriov_dbg(xe, "preparing for %u VF%s\n", max_vfs,
>> str_plural(max_vfs));
>> + pf_reduce_totalvfs(xe, max_vfs);
>> + return true;
>
> what's the point of splitting this function that always return true? I'd
> rather leave it where it was
I've moved statements to store device_total_vfs & driver_max_vfs to
pf_reduce_totalvfs() so below function is now just a _dispatcher_ and
this function name also acts a self-comment
>
> Lucas De Marchi
>
>> +}
>> +
>> /**
>> * xe_sriov_pf_readiness - Check if PF functionality can be enabled.
>> * @xe: the &xe_device to check
>> @@ -58,18 +94,16 @@ bool xe_sriov_pf_readiness(struct xe_device *xe)
>> if (!dev_is_pf(dev))
>> return false;
>>
>> + if (!totalvfs)
>> + return pf_continue_as_native(xe, "no VFs reported");
>> +
>> if (!xe_device_uc_enabled(xe))
>> return pf_continue_as_native(xe, "Guc submission disabled");
>>
>> if (!newlimit)
>> return pf_continue_as_native(xe, "all VFs disabled");
>>
>> - pf_reduce_totalvfs(xe, newlimit);
>> -
>> - xe->sriov.pf.device_total_vfs = totalvfs;
>> - xe->sriov.pf.driver_max_vfs = newlimit;
>> -
>> - return true;
>> + return pf_continue_ready(xe, newlimit);
>> }
>>
>> /**
>> --
>> 2.43.0
>>
next prev parent reply other threads:[~2024-07-15 17:40 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-15 11:40 [PATCH] drm/xe/pf: Restore TotalVFs Michal Wajdeczko
2024-07-15 11:46 ` ✓ CI.Patch_applied: success for " Patchwork
2024-07-15 11:47 ` ✓ CI.checkpatch: " Patchwork
2024-07-15 11:49 ` ✓ CI.KUnit: " Patchwork
2024-07-15 12:01 ` ✓ CI.Build: " Patchwork
2024-07-15 12:04 ` ✓ CI.Hooks: " Patchwork
2024-07-15 12:05 ` ✓ CI.checksparse: " Patchwork
2024-07-15 12:28 ` ✓ CI.BAT: " Patchwork
2024-07-15 13:53 ` ✗ CI.FULL: failure " Patchwork
2024-07-15 15:16 ` Michal Wajdeczko
2024-07-15 17:09 ` [PATCH] " Lucas De Marchi
2024-07-15 17:40 ` Michal Wajdeczko [this message]
2024-08-07 10:05 ` Michal Wajdeczko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=64a46ab4-6b7c-49e0-862c-cb8829658d54@intel.com \
--to=michal.wajdeczko@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=lucas.demarchi@intel.com \
--cc=piotr.piorkowski@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox