Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Riana Tauro <riana.tauro@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <anshuman.gupta@intel.com>,
	<matthew.d.roper@intel.com>, <aravind.iddamsetty@linux.intel.com>
Subject: Re: [PATCH 0/2] Add configfs support for survivability mode
Date: Tue, 1 Apr 2025 11:48:54 +0530	[thread overview]
Message-ID: <cf183155-8df0-4f62-8e95-ec001ce01bb5@intel.com> (raw)
In-Reply-To: <t33j6tcubwtq6r4wuhmz2p6s2wwpqsv7bye7ilzwf7r3vwpt2p@bop726wytsvs>

Hi Lucas

On 4/1/2025 3:15 AM, Lucas De Marchi wrote:
> On Mon, Mar 31, 2025 at 04:19:28PM -0400, Rodrigo Vivi wrote:
>> On Thu, Mar 27, 2025 at 09:40:39AM -0500, Lucas De Marchi wrote:
>>> On Thu, Mar 27, 2025 at 12:12:00PM +0530, Riana Tauro wrote:
>>> > This series proposes to expose attributes via xe configfs
>>> > subsystem. Xe registers a configfs subsystem named 'xe'.
>>> > Userspace can then create directories for the devices they
>>> > want to configure and set appropriate attributes
>>> >
>>> > This is done by
>>> >
>>> > mount -t configfs none /config
>>> > mkdir /config/xe/0000:03:00.0
>>> >
>>>
>>> If we need a new version or to document anywhere in our docs, I'd add a
>>> comment here:
>>>
>>> # If driver is already bound, unbind it as this configuration
>>> # applies only when probing it
>>>
>>> > echo 0000:03:00.0 > /sys/bus/pci/drivers/xe/unbind
>>> > echo 1 > sys/kernel/config/xe/0000:03:00.0/survivability_mode
>>> > echo 0000:03:00.0 > /sys/bus/pci/drivers/xe/bind
>>> >
>>> > This is an alternative to introducing module param that causes
>>> > all the connected and supported GPU cards to enter survivability mode.
>>> > Manually entering survivability mode is useful when pcode does not
>>> > report failure, in field repairs and validation
>>> >
>>> > Rev2: use config_groups (Lucas)
>>>
>>> Awesome. I have some other work pending that will make use of
>>> it. I will play with these patches soon.
>>
>> I really liked this new flow and I was giving it a try here right now.
>>
>> However it didn't work. It didn't take me to the survivability mode,
>> but also, I cannot unload the xe after creating this configfs file:
>>
>> sudo remove /sys/kernel/config/xe/0000\:0*
>> rm: cannot remove '0000:00:02.0/survivability_mode': Operation not 
>> permitted
>> rm: cannot remove '0000:03:00.0/survivability_mode': Operation not 
>> permitted
> 
> humn... testing on a bmg, it works:
> 
>      # # first of all, make sure autoprobe doesn't do what we don't
>      # # want:
>      # echo 0  > /sys/bus/pci/drivers_autoprobe
> 
>      # # load module and set the configuration
>      # modprobe xe
>      # mkdir /sys/kernel/config/xe/0000:03:00.0
>      # echo 1 > /sys/kernel/config/xe/0000\:03\:00.0/survivability_mode
> 
>      # # bind the driver and check it enters survivability mode
>      # echo 0000:03:00.0 > /sys/module/xe/drivers/pci\:xe/bind
>      # dmesg | tail -n1
>      [ 1994.807063] xe 0000:03:00.0: In Survivability Mode
>      # cat  /sys/bus/pci/drivers/xe/0000\:03\:00.0/survivability_mode 
>      Capability Info: 0x138320 - 0x2001ae06
>      Postcode Info: 0x138324 - 0x0
>      Overflow Info: 0x138328 - 0x0
>      Auxiliary Info 0: 0x13832c - 0x0
> 
> Unbind first and test we can remove the configuration for next bind:
> 
>      # echo 0000:03:00.0 > /sys/module/xe/drivers/pci\:xe/unbind
>      # tree /sys/kernel/config/xe
>      /sys/kernel/config/xe
>      └── 0000:03:00.0
>          └── survivability_mode
>      # rmdir /sys/kernel/config/xe/0000:03:00.0/
>      # tree /sys/kernel/config/xe
>      /sys/kernel/config/xe
> 
> Remove module:
>      # modprobe -r xe
> 
> 
> What doesn't work:
> 
>      1) Remove the module without unbinding. This is already the case
>      2) Remove the module without unbinding first
Do you mean removing directory?>
> For (2) it's basicaly: when you create a configfs connection, configfs
> increments the module's refcount. You have to remove them first.
yeah it takes a refcount and i think subsystem register fails if that's 
not taken. haven't tried >
> For my surprise, it's possible to remove the config after binding -
> there's no error. 
When i was trying, even with config_item reference is taken, it had 
allowed rmdir. drop_item is void and cannot return a error but didn't 
search further as survivability mode didn't require a reference.

Thanks
Riana> I need to double check if this wouldn't create some
> UAF depending on how we use the config, but for survivability purposes,
> I don't think it has an issue with that single bool.
> 
> Lucas De Marchi
> 
>>
>> Tried to unbind and had the same failure.
>>
>> then with the configfs there we cannot remove the module:
>> $ sudo rmmod xe
>> rmmod: ERROR: Module xe is in use
> 
> what's the `lsmod | grep xe` output?
> 
> You should always be able to unbind. There's nothing the driver can do
> that would block the unbind. After unbind, you should rmdir every dir in
> /sys/kernel/config/xe/*. Note that it's not an rm -r since you can't
> remove the inner configuration files, only the directory that is the
> "connection" between configfs and the driver. You also can't remove the
> xe dir (as it's owned by the module, not the connection to the device),
> only  the dirs under it.
> 
> Lucas De Marchi
> 
> 
>>
>>
>> So, it looks we have some stuff to adjust here before we can move 
>> further,
>> but so far things are looking promising indeed
>>
>>>
>>> thanks
>>> Lucas De Marchi


  reply	other threads:[~2025-04-01  6:19 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-27  6:42 [PATCH 0/2] Add configfs support for survivability mode Riana Tauro
2025-03-27  6:25 ` ✓ CI.Patch_applied: success for Add configfs support for survivability mode (rev2) Patchwork
2025-03-27  6:26 ` ✗ CI.checkpatch: warning " Patchwork
2025-03-27  6:26 ` ✗ CI.KUnit: failure " Patchwork
2025-03-27  6:42 ` [PATCH 1/2] drm/xe: Add configfs to enable survivability mode Riana Tauro
2025-04-01  1:03   ` Lucas De Marchi
2025-04-01  8:13     ` Riana Tauro
2025-03-27  6:42 ` [PATCH 2/2] drm/xe: Enable configfs support for " Riana Tauro
2025-04-01  3:25   ` Lucas De Marchi
2025-03-27 14:40 ` [PATCH 0/2] Add " Lucas De Marchi
2025-03-31 20:19   ` Rodrigo Vivi
2025-03-31 21:45     ` Lucas De Marchi
2025-04-01  6:18       ` Riana Tauro [this message]
2025-04-01  5:55     ` Riana Tauro
  -- strict thread matches above, loose matches on Subject: below --
2025-03-07 14:24 Riana Tauro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf183155-8df0-4f62-8e95-ec001ce01bb5@intel.com \
    --to=riana.tauro@intel.com \
    --cc=anshuman.gupta@intel.com \
    --cc=aravind.iddamsetty@linux.intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.d.roper@intel.com \
    --cc=rodrigo.vivi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox