From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
To: Tomer Tayar <ttayar@habana.ai>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"alexander.deucher@amd.com" <alexander.deucher@amd.com>,
	"airlied@gmail.com" <airlied@gmail.com>,
	"daniel@ffwll.ch" <daniel@ffwll.ch>,
	"joonas.lahtinen@linux.intel.com"
	<joonas.lahtinen@linux.intel.com>,
	"ogabbay@kernel.org" <ogabbay@kernel.org>,
	"Hawking.Zhang@amd.com" <Hawking.Zhang@amd.com>,
	"Harish.Kasiviswanathan@amd.com" <Harish.Kasiviswanathan@amd.com>,
	"Felix.Kuehling@amd.com" <Felix.Kuehling@amd.com>,
	"Luben.Tuikov@amd.com" <Luben.Tuikov@amd.com>,
	"Ruhl, Michael J" <michael.j.ruhl@intel.com>
Subject: Re: [Intel-xe] [RFC v2 5/5] drm/xe/RAS: send multicast event on occurrence of an error
Date: Wed, 22 Nov 2023 20:04:21 +0530
Message-ID: <07e4bc45-5dcf-4a57-9543-cc981991337d@linux.intel.com>
In-Reply-To: <e58e5753-d501-4a5a-86be-4cdebc31a287@habana.ai>


On 11/12/23 20:58, Tomer Tayar wrote:
> On 10/11/2023 14:27, Tomer Tayar wrote:
>> On 20/10/2023 18:58, Aravind Iddamsetty wrote:
>>> Whenever a correctable or an uncorrectable error happens, an event is sent
>>> to the listeners of the corresponding multicast group.
>>>
>>> v2: Rebase
>>>
>>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_hw_error.c | 33 ++++++++++++++++++++++++++++++++
>>>    1 file changed, 33 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
>>> index bab6d4cf0b69..b0befb5e01cb 100644
>>> --- a/drivers/gpu/drm/xe/xe_hw_error.c
>>> +++ b/drivers/gpu/drm/xe/xe_hw_error.c
>>> @@ -786,6 +786,37 @@ xe_soc_hw_error_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>>    				(HARDWARE_ERROR_MAX << 1) + 1);
>>>    }
>>>    
>>> +static void
>>> +generate_netlink_event(struct xe_device *xe, const enum hardware_error hw_err)
>>> +{
>>> +	struct sk_buff *msg;
>>> +	void *hdr;
>>> +
>>> +	if (!xe->drm.drm_genl_family.module)
>>> +		return;
>>> +
>>> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
>>> +	if (!msg) {
>>> +		drm_dbg_driver(&xe->drm, "couldn't allocate memory for error multicast event\n");
>>> +		return;
>>> +	}
>>> +
>>> +	hdr = genlmsg_put(msg, 0, 0, &xe->drm.drm_genl_family, 0, DRM_RAS_CMD_ERROR_EVENT);
>>> +	if (!hdr) {
>>> +		drm_dbg_driver(&xe->drm, "multicast msg buffer is too small\n");
>>> +		nlmsg_free(msg);
>>> +		return;
>>> +	}
>>> +
>>> +	genlmsg_end(msg, hdr);
>>> +
>>> +	genlmsg_multicast(&xe->drm.drm_genl_family, msg, 0,
>>> +			  hw_err ?
>>> +			  DRM_GENL_MCAST_UNCORR_ERR
>>> +			  : DRM_GENL_MCAST_CORR_ERR,
>>> +			  GFP_ATOMIC);
>> I agree that hiding/wrapping any netlink/genetlink API/macro with a DRM
>> helper would sometimes be redundant,
>> and that in some cases the specific DRM driver would have to "get its
>> hands dirty" and deal with netlink (e.g. fill_error_details() in patch #3).
>> However, maybe a DRM helper would have been useful here, so that we won't
>> see a copy of this sequence in other DRM drivers?
>>
>> Thanks,
>> Tomer
> After rethinking, it is possible that different DRM drivers will need
> some flexibility when it comes to calling genlmsg_put(), as they might
> want to do more than just this call in order to attach data related to
> the error indication.
> In that case, adding a DRM function that wraps it may be redundant.
> What do you think?
I think we can expose this base-level call to every DRM driver; a driver that
wants to add a custom message can define its own helper. I believe that should be OK.
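A minimal userspace sketch of the split proposed above, for illustration only: one
base-level helper owns the header/finalize/send sequence, and a driver that needs
extra attributes passes an optional fill callback instead of copying the whole
sequence. Every name here (struct model_msg, model_send_event(), model_fill_source())
is hypothetical, and struct model_msg merely stands in for an sk_buff; this is not
the DRM netlink API from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for a generic netlink message. */
struct model_msg {
	int cmd;             /* command id set by the header step */
	unsigned int group;  /* multicast group it was "sent" to */
	char payload[32];    /* optional driver-specific data */
	size_t len;          /* payload bytes actually used */
};

/* Optional per-driver hook: returns 0 on success, nonzero to abort. */
typedef int (*model_fill_fn)(struct model_msg *msg, const void *priv);

/*
 * Hypothetical base-level helper. The steps loosely mirror the patch:
 * header (genlmsg_put), optional driver attributes, finalize (genlmsg_end),
 * deliver (genlmsg_multicast).
 */
static int model_send_event(struct model_msg *out, int cmd, unsigned int group,
			    model_fill_fn fill, const void *priv)
{
	struct model_msg msg = { .cmd = cmd, .group = group, .len = 0 };

	assert(out != NULL);
	if (fill && fill(&msg, priv) != 0)
		return -1;  /* like nlmsg_free() on failure: nothing is sent */
	*out = msg;         /* stands in for delivery to listeners of @group */
	return 0;
}

/* Example driver-specific hook: append a short error-source string. */
static int model_fill_source(struct model_msg *msg, const void *priv)
{
	const char *src = priv;
	size_t n = strlen(src);

	if (n >= sizeof(msg->payload))
		return -1;  /* would not fit, like an attribute put failing */
	memcpy(msg->payload, src, n + 1);
	msg->len = n;
	return 0;
}
```

In the kernel, the same shape would keep nlmsg_new()/genlmsg_put()/genlmsg_end()/
genlmsg_multicast() in one place, while a driver with nothing extra to report simply
passes no callback.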


Thanks,
Aravind.
>
>>> +}
>>> +
>>>    static void
>>>    xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_err)
>>>    {
>>> @@ -849,6 +880,8 @@ xe_hw_error_source_handler(struct xe_tile *tile, const enum hardware_error hw_er
>>>    	}
>>>    
>>>    	xe_mmio_write32(gt, DEV_ERR_STAT_REG(hw_err), errsrc);
>>> +
>>> +	generate_netlink_event(tile_to_xe(tile), hw_err);
>>>    unlock:
>>>    	spin_unlock_irqrestore(&tile_to_xe(tile)->irq.lock, flags);
>>>    }
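The group selection in the patch (hw_err ? DRM_GENL_MCAST_UNCORR_ERR :
DRM_GENL_MCAST_CORR_ERR) relies on the correctable error being encoded as 0, so that
any nonzero hw_err (non-fatal or fatal) goes to the uncorrectable group. A small
sketch that makes that assumption explicit; the enum values below are assumed to
match the driver's enum hardware_error, and the MODEL_MCAST_* indices are
hypothetical placeholders for the groups defined in patch 4/5.

```c
#include <assert.h>

/* Assumed to match the driver's enum hardware_error; not copied from the series. */
enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE = 0,
	HARDWARE_ERROR_NONFATAL = 1,
	HARDWARE_ERROR_FATAL = 2,
	HARDWARE_ERROR_MAX,
};

/* The patch's `hw_err ? UNCORR : CORR` test is only correct if this holds. */
static_assert(HARDWARE_ERROR_CORRECTABLE == 0,
	      "correctable errors must be encoded as zero");

/* Hypothetical group indices standing in for DRM_GENL_MCAST_*. */
enum {
	MODEL_MCAST_CORR_ERR = 0,
	MODEL_MCAST_UNCORR_ERR = 1,
};

/* Explicit form of the ternary: everything except a correctable error
 * is reported to the uncorrectable group. */
static unsigned int model_mcast_group(enum hardware_error hw_err)
{
	switch (hw_err) {
	case HARDWARE_ERROR_CORRECTABLE:
		return MODEL_MCAST_CORR_ERR;
	default:
		return MODEL_MCAST_UNCORR_ERR;
	}
}
```

Note also that generate_netlink_event() is called while tile_to_xe(tile)->irq.lock is
held (taken with spin_lock_irqsave()), so neither the allocation nor the send may
sleep; that is why both nlmsg_new() and genlmsg_multicast() are given GFP_ATOMIC.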
