From: "Tauro, Riana" <riana.tauro@intel.com>
To: "Upadhyay, Tejas" <tejas.upadhyay@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Cc: "Gupta, Anshuman" <anshuman.gupta@intel.com>,
"Vivi, Rodrigo" <rodrigo.vivi@intel.com>,
"aravind.iddamsetty@linux.intel.com"
<aravind.iddamsetty@linux.intel.com>,
"Nilawar, Badal" <badal.nilawar@intel.com>,
"Jadav, Raag" <raag.jadav@intel.com>,
"Koppuravuri, Ravi Kishore" <ravi.kishore.koppuravuri@intel.com>,
"Koujalagi, Mallesh" <mallesh.koujalagi@intel.com>,
"Purkait, Soham" <soham.purkait@intel.com>,
"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
Subject: Re: [PATCH v4 11/13] drm/xe/xe_ras: Add support for page offline list and queue commands
Date: Tue, 5 May 2026 10:47:15 +0530
Message-ID: <dea087ac-68fd-42dd-ba9b-abfed59321fb@intel.com>
In-Reply-To: <DM6PR11MB32277AE94FAD89591953174B812C2@DM6PR11MB3227.namprd11.prod.outlook.com>
On 4/21/2026 2:40 PM, Upadhyay, Tejas wrote:
>
>> -----Original Message-----
>> From: Tauro, Riana <riana.tauro@intel.com>
>> Sent: 17 April 2026 14:28
>> To: intel-xe@lists.freedesktop.org
>> Cc: Tauro, Riana <riana.tauro@intel.com>; Gupta, Anshuman
>> <anshuman.gupta@intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>;
>> aravind.iddamsetty@linux.intel.com; Nilawar, Badal
>> <badal.nilawar@intel.com>; Jadav, Raag <raag.jadav@intel.com>;
>> Koppuravuri, Ravi Kishore <ravi.kishore.koppuravuri@intel.com>; Koujalagi,
>> Mallesh <mallesh.koujalagi@intel.com>; Purkait, Soham
>> <soham.purkait@intel.com>; Upadhyay, Tejas <tejas.upadhyay@intel.com>;
>> Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>
>> Subject: [PATCH v4 11/13] drm/xe/xe_ras: Add support for page offline list
>> and queue commands
>>
>> Add handling for page offline list and queue sysctrl commands. The page
>> offline list command retrieves pages that are already offlined by the firmware.
>> The page offline queue command retrieves the pages pending to be offlined by
>> the firmware.
>>
>> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Riana Tauro <riana.tauro@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_ras.c | 85 +++++++++++++++++++
>> drivers/gpu/drm/xe/xe_ras_types.h | 39 +++++++++
>> drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h | 6 +-
>> 3 files changed, 129 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ras.c b/drivers/gpu/drm/xe/xe_ras.c
>> index cd1ac6441b92..42ec27c05e9a 100644
>> --- a/drivers/gpu/drm/xe/xe_ras.c
>> +++ b/drivers/gpu/drm/xe/xe_ras.c
>> @@ -319,6 +319,87 @@ int xe_ras_page_offline(struct xe_device *xe, enum xe_ras_page_action action, u6
>> return 0;
>> }
>>
>> +static void get_queued_pages(struct xe_device *xe)
>> +{
>> + struct xe_sysctrl_mailbox_command command = {0};
>> + struct xe_ras_page_offline_queue response = {0};
>> + u32 count = 0;
>> + size_t rlen;
>> + int ret;
>> +
>> + /* Supported only on platforms with system controller */
>> + if (!xe->info.has_sysctrl)
>> + return;
>> +
>> + prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE, NULL, 0,
>> + &response, sizeof(response));
>> +
>> + do {
>> + memset(&response, 0, sizeof(response));
>> +
>> + ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
>> + if (ret) {
>> + xe_err(xe, "[RAS]: sysctrl: error ret %d\n", ret);
>> + return;
>> + }
>> +
>> + if (rlen != sizeof(response)) {
>> + xe_err(xe, "[RAS]: sysctrl: response size mismatch. Expected %zu, got %zu\n",
>> + sizeof(response), rlen);
>> + return;
>> + }
>> +
>> + /* TODO: Process pages and offline them */
>> +
>> + count += response.pages_returned;
>> + if (count > response.total_pages) {
>> + xe_err(xe, "[RAS]: sysctrl: Pages returned exceed total pages %u, returned %u\n",
>> + response.total_pages, count);
>> + return;
>> + }
>> + } while (response.additional_data);
>> +}
>> +
>> +static void get_offlined_list(struct xe_device *xe)
>> +{
>> + struct xe_sysctrl_mailbox_command command = {0};
>> + struct xe_ras_page_offline_list response = {0};
>> + int ret, count = 0;
>> + size_t rlen;
>> +
>> + /* Supported only on platforms with system controller */
>> + if (!xe->info.has_sysctrl)
>> + return;
>> +
>> + prepare_sysctrl_command(&command, XE_SYSCTRL_CMD_GET_OFFLINE_LIST, NULL, 0,
>> + &response, sizeof(response));
>> +
>> + do {
>> + memset(&response, 0, sizeof(response));
>> +
>> + ret = xe_sysctrl_send_command(&xe->sc, &command, &rlen);
>> + if (ret) {
>> + xe_err(xe, "[RAS]: sysctrl: error ret %d\n", ret);
>> + return;
>> + }
>> +
>> + if (rlen != sizeof(response)) {
>> + xe_err(xe, "[RAS]: sysctrl: response size mismatch. Expected %zu, got %zu\n",
>> + sizeof(response), rlen);
>> + return;
>> + }
>> +
>> + /* TODO: Process pages and offline them */
>> +
>> + count += response.pages_returned;
>> + if (count > response.total_pages) {
>> + xe_err(xe, "[RAS]: sysctrl: Pages returned exceed total pages %u, returned %u\n",
>> + response.total_pages, count);
>> + return;
>> + }
>> + } while (response.additional_data);
>> +}
>> +
>> #ifdef CONFIG_PCIEAER
>> static void aer_unmask_and_downgrade_internal_error(struct xe_device *xe)
>> {
>> @@ -394,4 +475,8 @@ void xe_ras_init(struct xe_device *xe)
>> #ifdef CONFIG_PCIEAER
>> aer_unmask_and_downgrade_internal_error(xe);
>> #endif
>> +
>> + /* Get any pages that need to be offlined from firmware and reserve them */
>> + get_offlined_list(xe);
>> + get_queued_pages(xe);
>> }
>> diff --git a/drivers/gpu/drm/xe/xe_ras_types.h b/drivers/gpu/drm/xe/xe_ras_types.h
>> index d76310866da5..1a5de6cd16a1 100644
>> --- a/drivers/gpu/drm/xe/xe_ras_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ras_types.h
>> @@ -10,6 +10,7 @@
>>
>> #define XE_RAS_NUM_ERROR_ARR 3
>> #define XE_RAS_MAX_ERROR_DETAILS 16
>> +#define XE_RAS_NUM_PAGES 25
>> #define XE_RAS_IEH_PUNIT_ERROR BIT(1)
>>
>> /**
>> @@ -297,4 +298,42 @@ struct xe_ras_page_offline_response {
>> /** @reserved: Reserved for future use */
>> u32 reserved;
>> } __packed;
>> +
>> +/**
>> + * struct xe_ras_page_offline_list - Response from get offline list command
>> + *
>> + * This structure provides the details of offlined pages from flash.
>> + */
>> +struct xe_ras_page_offline_list {
>> + /** @max_entries: Total no of pages that can be stored in flash */
>> + u32 max_entries;
>> + /** @total_pages: Total number of permanently offlined pages */
>> + u32 total_pages;
>> + /** @pages_returned: Number of pages returned in this response */
>> + u32 pages_returned;
>> + /** @page_addresses: Array of permanently offlined page addresses (4KB aligned) */
>> + u64 page_addresses[XE_RAS_NUM_PAGES];
>> + /** @additional_data: Indicates if more data is available */
>> + u8 additional_data;
>> + /** @reserved: Reserved for future use */
>> + u8 reserved[3];
>> +} __packed;
>> +
>> +/**
>> + * struct xe_ras_page_offline_queue - Response from get offline queue command
>> + *
>> + * This structure provides the details of queued offlined pages from firmware.
>> + */
>> +struct xe_ras_page_offline_queue {
>> + /** @total_pages: Total number of queued pages */
>> + u32 total_pages;
>> + /** @pages_returned: Number of pages returned in this response */
>> + u32 pages_returned;
>> + /** @page_addresses: Array of offlined page addresses (4KB aligned)
> I understand that we will report 4K-aligned addresses to the core offline API, right? If so, can you also add a check that each address is really 4K aligned before passing it on?
Yes, will add this in the integration patch.
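For illustration, a minimal standalone sketch of such a check, assuming the firmware reports physical addresses at a 4 KiB granularity. RAS_PAGE_SIZE, ras_addr_is_4k_aligned() and ras_filter_aligned() are hypothetical names that do not exist in the driver; in-kernel code would more naturally use IS_ALIGNED(addr, SZ_4K) and xe_err() in place of the userspace helpers here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: offline granularity assumed to be 4 KiB. */
#define RAS_PAGE_SIZE 4096ULL

/* True if addr sits exactly on a 4 KiB boundary (low 12 bits clear). */
static bool ras_addr_is_4k_aligned(uint64_t addr)
{
	return (addr & (RAS_PAGE_SIZE - 1)) == 0;
}

/*
 * Hypothetical filter: copy only aligned addresses from one firmware
 * response into out[], log and skip the rest, and return how many were kept.
 */
static unsigned int ras_filter_aligned(const uint64_t *addrs, unsigned int n,
				       uint64_t *out)
{
	unsigned int kept = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (!ras_addr_is_4k_aligned(addrs[i])) {
			fprintf(stderr, "[RAS]: skipping unaligned address 0x%llx\n",
				(unsigned long long)addrs[i]);
			continue;
		}
		out[kept++] = addrs[i];
	}
	return kept;
}
```

Skipping an unaligned entry (rather than rejecting the whole response) is only one possible policy; treating it as a firmware error and bailing out, as the existing size-mismatch path does, would be equally valid.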
Thanks
Riana
>
> Tejas
>> */
>> + u64 page_addresses[XE_RAS_NUM_PAGES];
>> + /** @additional_data: Indicates if more data is available */
>> + u8 additional_data;
>> + /** @reserved: Reserved for future use */
>> + u8 reserved[3];
>> +} __packed;
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> index 2cafa8a14cc3..b6139ac0eaca 100644
>> --- a/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> +++ b/drivers/gpu/drm/xe/xe_sysctrl_mailbox_types.h
>> @@ -15,10 +15,14 @@
>> *
>> * @XE_SYSCTRL_CMD_GET_SOC_ERROR: Get basic error information
>> * @XE_SYSCTRL_CMD_PAGE_OFFLINE: Instruct firmware to offline/decline a page
>> + * @XE_SYSCTRL_CMD_GET_OFFLINE_LIST: Get list of all offlined pages from flash
>> + * @XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE: Get list of offlined queued pages from firmware
>> */
>> enum xe_sysctrl_mailbox_command_id {
>> XE_SYSCTRL_CMD_GET_SOC_ERROR = 0x01,
>> - XE_SYSCTRL_CMD_PAGE_OFFLINE = 0x08
>> + XE_SYSCTRL_CMD_PAGE_OFFLINE = 0x08,
>> + XE_SYSCTRL_CMD_GET_OFFLINE_LIST = 0x09,
>> + XE_SYSCTRL_CMD_GET_OFFLINE_QUEUE = 0x0A
>> };
>>
>> enum xe_sysctrl_group {
>> --
>> 2.47.1
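For reference, the rlen != sizeof(response) checks in the quoted patch depend on the packed layouts of the two response structures. A standalone userspace sketch that pins down the expected response sizes, assuming the quoted definitions match the firmware interface; ras_page_offline_list and ras_page_offline_queue are hypothetical mirrors with u32/u64/u8 mapped to stdint types.

```c
#include <stddef.h>
#include <stdint.h>

#define XE_RAS_NUM_PAGES 25

/* Mirror of struct xe_ras_page_offline_list from the quoted patch. */
struct ras_page_offline_list {
	uint32_t max_entries;
	uint32_t total_pages;
	uint32_t pages_returned;
	uint64_t page_addresses[XE_RAS_NUM_PAGES];
	uint8_t additional_data;
	uint8_t reserved[3];
} __attribute__((packed));

/* Mirror of struct xe_ras_page_offline_queue from the quoted patch. */
struct ras_page_offline_queue {
	uint32_t total_pages;
	uint32_t pages_returned;
	uint64_t page_addresses[XE_RAS_NUM_PAGES];
	uint8_t additional_data;
	uint8_t reserved[3];
} __attribute__((packed));

/* 3 * 4 + 25 * 8 + 1 + 3 = 216 bytes */
_Static_assert(sizeof(struct ras_page_offline_list) == 216, "list size");
/* 2 * 4 + 25 * 8 + 1 + 3 = 212 bytes */
_Static_assert(sizeof(struct ras_page_offline_queue) == 212, "queue size");
/* Packed: the u64 array starts right after the header, with no padding. */
_Static_assert(offsetof(struct ras_page_offline_list, page_addresses) == 12,
	       "list header not padded");
_Static_assert(offsetof(struct ras_page_offline_queue, additional_data) == 208,
	       "queue trailer offset");
```

Without __packed, on x86-64 the u64 array in the list response would be padded out to offset 16 and the struct would grow to 224 bytes, so a firmware reply of 216 bytes would trip the size-mismatch error.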