From: Riana Tauro <riana.tauro@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
<matthew.d.roper@intel.com>, <aravind.iddamsetty@linux.intel.com>
Subject: Re: [PATCH 0/2] Add configfs support for survivability mode
Date: Tue, 1 Apr 2025 11:48:54 +0530 [thread overview]
Message-ID: <cf183155-8df0-4f62-8e95-ec001ce01bb5@intel.com> (raw)
In-Reply-To: <t33j6tcubwtq6r4wuhmz2p6s2wwpqsv7bye7ilzwf7r3vwpt2p@bop726wytsvs>
Hi Lucas
On 4/1/2025 3:15 AM, Lucas De Marchi wrote:
> On Mon, Mar 31, 2025 at 04:19:28PM -0400, Rodrigo Vivi wrote:
>> On Thu, Mar 27, 2025 at 09:40:39AM -0500, Lucas De Marchi wrote:
>>> On Thu, Mar 27, 2025 at 12:12:00PM +0530, Riana Tauro wrote:
>>> > This series proposes to expose attributes via xe configfs
>>> > subsystem. Xe registers a configfs subsystem named 'xe'.
>>> > Userspace can then create directories for the devices they
>>> > want to configure and set appropriate attributes
>>> >
>>> > This is done by
>>> >
>>> > mount -t configfs none /config
>>> > mkdir /config/xe/0000:03:00.0
>>> >
>>>
>>> If we need a new version or to document anywhere in our docs, I'd add a
>>> comment here:
>>>
>>> # If driver is already bound, unbind it as this configuration
>>> # applies only when probing it
>>>
>>> > echo 0000:03:00.0 > /sys/bus/pci/drivers/xe/unbind
>>> > echo 1 > sys/kernel/config/xe/0000:03:00.0/survivability_mode
>>> > echo 0000:03:00.0 > /sys/bus/pci/drivers/xe/bind
>>> >
>>> > This is an alternative to introducing module param that causes
>>> > all the connected and supported GPU cards to enter survivability mode.
>>> > Manually entering survivability mode is useful when pcode does not
>>> > report failure, in field repairs and validation
>>> >
>>> > Rev2: use config_groups (Lucas)
>>>
>>> Awesome. I have some other work pending that will make use of
>>> it. I will play with these patches soon.
>>
>> I really liked this new flow and I was giving it a try here right now.
>>
>> However it didn't work. It didn't take me to the survivability mode,
>> but also, I cannot unload the xe after creating this configfs file:
>>
>> sudo remove /sys/kernel/config/xe/0000\:0*
>> rm: cannot remove '0000:00:02.0/survivability_mode': Operation not
>> permitted
>> rm: cannot remove '0000:03:00.0/survivability_mode': Operation not
>> permitted
>
> humn... testing on a bmg, it works:
>
> # # first of all, make sure autoprobe doesn't do what we don't
> # # want:
> # echo 0 > /sys/bus/pci/drivers_autoprobe
>
> # # load module and set the configuration
> # modprobe xe
> # mkdir /sys/kernel/config/xe/0000:03:00.0
> # echo 1 > /sys/kernel/config/xe/0000\:03\:00.0/survivability_mode
>
> # # bind the driver and check it enters survivability mode
> # echo 0000:03:00.0 > /sys/module/xe/drivers/pci\:xe/bind
> # dmesg | tail -n1
> [ 1994.807063] xe 0000:03:00.0: In Survivability Mode
> # cat /sys/bus/pci/drivers/xe/0000\:03\:00.0/survivability_mode
> Capability Info: 0x138320 - 0x2001ae06
> Postcode Info: 0x138324 - 0x0
> Overflow Info: 0x138328 - 0x0
> Auxiliary Info 0: 0x13832c - 0x0
>
> Unbind first and test we can remove the configuration for next bind:
>
> # echo 0000:03:00.0 > /sys/module/xe/drivers/pci\:xe/unbind
> # tree /sys/kernel/config/xe
> /sys/kernel/config/xe
> └── 0000:03:00.0
> └── survivability_mode
> # rmdir /sys/kernel/config/xe/0000:03:00.0/
> # tree /sys/kernel/config/xe
> /sys/kernel/config/xe
>
> Remove module:
> # modprobe -r xe
>
>
> What doesn't work:
>
> 1) Remove the module without unbinding. This is already the case
> 2) Remove the module without unbinding first
Do you mean removing directory?>
> For (2) it's basicaly: when you create a configfs connection, configfs
> increments the module's refcount. You have to remove them first.
yeah it takes a refcount and i think subsystem register fails if that's
not taken. haven't tried >
> For my surprise, it's possible to remove the config after binding -
> there's no error.
When i was trying, even with config_item reference is taken, it had
allowed rmdir. drop_item is void and cannot return a error but didn't
search further as survivability mode didn't require a reference.
Thanks
Riana> I need to double check if this wouldn't create some
> UAF depending on how we use the config, but for survivability purposes,
> I don't think it has an issue with that single bool.
>
> Lucas De Marchi
>
>>
>> Tried to unbind and had the same failure.
>>
>> then with the configfs there we cannot remove the module:
>> $ sudo rmmod xe
>> rmmod: ERROR: Module xe is in use
>
> what's the `lsmod | grep xe` output?
>
> You should always be able to unbind. There's nothing the driver can do
> that would block the unbind. After unbind, you should rmdir every dir in
> /sys/kernel/config/xe/*. Note that it's not an rm -r since you can't
> remove the inner configuration files, only the directory that is the
> "connection" between configfs and the driver. You also can't remove the
> xe dir (as it's owned by the module, not the connection to the device),
> only the dirs under it.
>
> Lucas De Marchi
>
>
>>
>>
>> So, it looks we have some stuff to adjust here before we can move
>> further,
>> but so far things are looking promising indeed
>>
>>>
>>> thanks
>>> Lucas De Marchi
next prev parent reply other threads:[~2025-04-01 6:19 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-27 6:42 [PATCH 0/2] Add configfs support for survivability mode Riana Tauro
2025-03-27 6:25 ` ✓ CI.Patch_applied: success for Add configfs support for survivability mode (rev2) Patchwork
2025-03-27 6:26 ` ✗ CI.checkpatch: warning " Patchwork
2025-03-27 6:26 ` ✗ CI.KUnit: failure " Patchwork
2025-03-27 6:42 ` [PATCH 1/2] drm/xe: Add configfs to enable survivability mode Riana Tauro
2025-04-01 1:03 ` Lucas De Marchi
2025-04-01 8:13 ` Riana Tauro
2025-03-27 6:42 ` [PATCH 2/2] drm/xe: Enable configfs support for " Riana Tauro
2025-04-01 3:25 ` Lucas De Marchi
2025-03-27 14:40 ` [PATCH 0/2] Add " Lucas De Marchi
2025-03-31 20:19 ` Rodrigo Vivi
2025-03-31 21:45 ` Lucas De Marchi
2025-04-01 6:18 ` Riana Tauro [this message]
2025-04-01 5:55 ` Riana Tauro
-- strict thread matches above, loose matches on Subject: below --
2025-03-07 14:24 Riana Tauro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cf183155-8df0-4f62-8e95-ec001ce01bb5@intel.com \
--to=riana.tauro@intel.com \
--cc=anshuman.gupta@intel.com \
--cc=aravind.iddamsetty@linux.intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=lucas.demarchi@intel.com \
--cc=matthew.d.roper@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox