* Re: [PATCH v11 03/22] drm: Add new general DRM property "color format"
From: Dave Stevenson @ 2026-03-25 12:49 UTC (permalink / raw)
To: Nicolas Frattaroli
Cc: Harry Wentland, Leo Li, Rodrigo Siqueira, Alex Deucher,
Christian König, David Airlie, Simona Vetter,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Andrzej Hajda, Neil Armstrong, Robert Foss, Laurent Pinchart,
Jonas Karlman, Jernej Skrabec, Sandy Huang, Heiko Stübner,
Andy Yan, Jani Nikula, Rodrigo Vivi, Joonas Lahtinen,
Tvrtko Ursulin, Dmitry Baryshkov, Sascha Hauer, Rob Herring,
Jonathan Corbet, Shuah Khan, kernel, amd-gfx, dri-devel,
linux-kernel, linux-arm-kernel, linux-rockchip, intel-gfx,
intel-xe, linux-doc, Werner Sembach, Andri Yngvason, Marius Vlad
In-Reply-To: <20260324-color-format-v11-3-605559af4fb4@collabora.com>
On Tue, 24 Mar 2026 at 16:02, Nicolas Frattaroli
<nicolas.frattaroli@collabora.com> wrote:
>
> Add a new general DRM property named "color format" which can be used by
> userspace to request the display driver to output a particular color
> format.
>
> Possible options are:
> - auto (setup by default, driver internally picks the color format)
> - rgb
> - ycbcr444
> - ycbcr422
> - ycbcr420
>
> Drivers should advertise from this list which formats they support.
> Together with this list and EDID data from the sink we should be able
> to relay a list of usable color formats to users to pick from.
>
> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
> Co-developed-by: Andri Yngvason <andri@yngvason.is>
> Signed-off-by: Andri Yngvason <andri@yngvason.is>
> Signed-off-by: Marius Vlad <marius.vlad@collabora.com>
> Reviewed-by: Maxime Ripard <mripard@kernel.org>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
> ---
> drivers/gpu/drm/drm_atomic_helper.c | 5 ++
> drivers/gpu/drm/drm_atomic_uapi.c | 11 ++++
> drivers/gpu/drm/drm_connector.c | 108 ++++++++++++++++++++++++++++++++++++
> include/drm/drm_connector.h | 104 ++++++++++++++++++++++++++++++++++
> 4 files changed, 228 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 26953ed6b53e..b7753454b777 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -737,6 +737,11 @@ drm_atomic_helper_check_modeset(struct drm_device *dev,
> if (old_connector_state->max_requested_bpc !=
> new_connector_state->max_requested_bpc)
> new_crtc_state->connectors_changed = true;
> +
> + if (old_connector_state->color_format !=
> + new_connector_state->color_format)
> + new_crtc_state->connectors_changed = true;
> +
> }
>
> if (funcs->atomic_check)
> diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
> index 5bd5bf6661df..dee510c85e59 100644
> --- a/drivers/gpu/drm/drm_atomic_uapi.c
> +++ b/drivers/gpu/drm/drm_atomic_uapi.c
> @@ -935,6 +935,15 @@ static int drm_atomic_connector_set_property(struct drm_connector *connector,
> state->privacy_screen_sw_state = val;
> } else if (property == connector->broadcast_rgb_property) {
> state->hdmi.broadcast_rgb = val;
> + } else if (property == connector->color_format_property) {
> + if (val > INT_MAX || !drm_connector_color_format_valid(val)) {
> + drm_dbg_atomic(connector->dev,
> + "[CONNECTOR:%d:%s] unknown color format %llu\n",
> + connector->base.id, connector->name, val);
> + return -EINVAL;
> + }
> +
> + state->color_format = val;
> } else if (connector->funcs->atomic_set_property) {
> return connector->funcs->atomic_set_property(connector,
> state, property, val);
> @@ -1020,6 +1029,8 @@ drm_atomic_connector_get_property(struct drm_connector *connector,
> *val = state->privacy_screen_sw_state;
> } else if (property == connector->broadcast_rgb_property) {
> *val = state->hdmi.broadcast_rgb;
> + } else if (property == connector->color_format_property) {
> + *val = state->color_format;
> } else if (connector->funcs->atomic_get_property) {
> return connector->funcs->atomic_get_property(connector,
> state, property, val);
> diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
> index 47dc53c4a738..e848374dee0b 100644
> --- a/drivers/gpu/drm/drm_connector.c
> +++ b/drivers/gpu/drm/drm_connector.c
> @@ -1388,6 +1388,18 @@ static const u32 hdmi_colorspaces =
> BIT(DRM_MODE_COLORIMETRY_DCI_P3_RGB_D65) |
> BIT(DRM_MODE_COLORIMETRY_DCI_P3_RGB_THEATER);
>
> +static const u32 hdmi_colorformats =
> + BIT(DRM_OUTPUT_COLOR_FORMAT_RGB444) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR444) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR422) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR420);
> +
> +static const u32 dp_colorformats =
> + BIT(DRM_OUTPUT_COLOR_FORMAT_RGB444) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR444) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR422) |
> + BIT(DRM_OUTPUT_COLOR_FORMAT_YCBCR420);
> +
> /*
> * As per DP 1.4a spec, 2.2.5.7.5 VSC SDP Payload for Pixel Encoding/Colorimetry
> * Format Table 2-120
> @@ -2940,6 +2952,102 @@ int drm_connector_attach_colorspace_property(struct drm_connector *connector)
> }
> EXPORT_SYMBOL(drm_connector_attach_colorspace_property);
>
> +/**
> + * drm_connector_attach_color_format_property - create and attach color format property
> + * @connector: connector to create the color format property on
> + * @supported_color_formats: bitmask of bit-shifted &enum drm_output_color_format
> + * values the connector supports
> + *
> + * Called by a driver to create a color format property. The property is
> + * attached to the connector automatically on success.
> + *
> + * @supported_color_formats should only include color formats the connector
> + * type can actually support.
> + *
> + * Returns:
> + * 0 on success, negative errno on error
> + */
> +int drm_connector_attach_color_format_property(struct drm_connector *connector,
> + unsigned long supported_color_formats)
> +{
> + struct drm_device *dev = connector->dev;
> + struct drm_prop_enum_list enum_list[DRM_CONNECTOR_COLOR_FORMAT_COUNT];
> + unsigned int i = 0;
> + unsigned long fmt;
> +
> + if (connector->color_format_property)
> + return 0;
> +
> + if (!supported_color_formats) {
> + drm_err(dev, "No supported color formats provided on [CONNECTOR:%d:%s]\n",
> + connector->base.id, connector->name);
> + return -EINVAL;
> + }
> +
> + if (supported_color_formats & ~GENMASK(DRM_OUTPUT_COLOR_FORMAT_COUNT - 1, 0)) {
> + drm_err(dev, "Unknown color formats provided on [CONNECTOR:%d:%s]\n",
> + connector->base.id, connector->name);
> + return -EINVAL;
> + }
> +
> + switch (connector->connector_type) {
> + case DRM_MODE_CONNECTOR_HDMIA:
> + case DRM_MODE_CONNECTOR_HDMIB:
> + if (supported_color_formats & ~hdmi_colorformats) {
> + drm_err(dev, "Color formats not allowed for HDMI on [CONNECTOR:%d:%s]\n",
> + connector->base.id, connector->name);
> + return -EINVAL;
> + }
> + break;
> + case DRM_MODE_CONNECTOR_DisplayPort:
> + case DRM_MODE_CONNECTOR_eDP:
> + if (supported_color_formats & ~dp_colorformats) {
> + drm_err(dev, "Color formats not allowed for DP on [CONNECTOR:%d:%s]\n",
> + connector->base.id, connector->name);
> + return -EINVAL;
> + }
> + break;
> + }
> +
> + enum_list[0].name = "AUTO";
> + enum_list[0].type = DRM_CONNECTOR_COLOR_FORMAT_AUTO;
> +
> + for_each_set_bit(fmt, &supported_color_formats, DRM_OUTPUT_COLOR_FORMAT_COUNT) {
> + switch (fmt) {
> + case DRM_OUTPUT_COLOR_FORMAT_RGB444:
> + enum_list[++i].type = DRM_CONNECTOR_COLOR_FORMAT_RGB444;
> + break;
> + case DRM_OUTPUT_COLOR_FORMAT_YCBCR444:
> + enum_list[++i].type = DRM_CONNECTOR_COLOR_FORMAT_YCBCR444;
> + break;
> + case DRM_OUTPUT_COLOR_FORMAT_YCBCR422:
> + enum_list[++i].type = DRM_CONNECTOR_COLOR_FORMAT_YCBCR422;
> + break;
> + case DRM_OUTPUT_COLOR_FORMAT_YCBCR420:
> + enum_list[++i].type = DRM_CONNECTOR_COLOR_FORMAT_YCBCR420;
> + break;
> + default:
> + drm_warn(dev, "Unknown supported format %ld on [CONNECTOR:%d:%s]\n",
> + fmt, connector->base.id, connector->name);
> + continue;
> + }
> + enum_list[i].name = drm_hdmi_connector_get_output_format_name(fmt);
> + }
> +
> + connector->color_format_property =
> + drm_property_create_enum(dev, DRM_MODE_PROP_ENUM, "color format",
> + enum_list, i + 1);
> +
> + if (!connector->color_format_property)
> + return -ENOMEM;
> +
> + drm_object_attach_property(&connector->base, connector->color_format_property,
> + DRM_CONNECTOR_COLOR_FORMAT_AUTO);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_connector_attach_color_format_property);
> +
> /**
> * drm_connector_atomic_hdr_metadata_equal - checks if the hdr metadata changed
> * @old_state: old connector state to compare
> diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h
> index af8b92d2d5b7..bd549f912b76 100644
> --- a/include/drm/drm_connector.h
> +++ b/include/drm/drm_connector.h
> @@ -571,14 +571,102 @@ enum drm_colorspace {
> * YCbCr 4:2:2 output format (ie. with horizontal subsampling)
> * @DRM_OUTPUT_COLOR_FORMAT_YCBCR420:
> * YCbCr 4:2:0 output format (ie. with horizontal and vertical subsampling)
> + * @DRM_OUTPUT_COLOR_FORMAT_COUNT:
> + * Number of valid output color format values in this enum
> */
> enum drm_output_color_format {
> DRM_OUTPUT_COLOR_FORMAT_RGB444 = 0,
> DRM_OUTPUT_COLOR_FORMAT_YCBCR444,
> DRM_OUTPUT_COLOR_FORMAT_YCBCR422,
> DRM_OUTPUT_COLOR_FORMAT_YCBCR420,
> + DRM_OUTPUT_COLOR_FORMAT_COUNT,
> };
>
> +/**
> + * enum drm_connector_color_format - Connector Color Format Request
> + *
> + * This enum, unlike &enum drm_output_color_format, is used to specify requests
> + * for a specific color format on a connector through the DRM "color format"
> + * property. The difference is that it has an "AUTO" value to specify that
> + * no specific choice has been made.
> + */
> +enum drm_connector_color_format {
> + /**
> + * @DRM_CONNECTOR_COLOR_FORMAT_AUTO: The driver or display protocol
> + * helpers should pick a suitable color format. All implementations of a
> + * specific display protocol must behave the same way with "AUTO", but
> + * different display protocols do not necessarily have the same "AUTO"
> + * semantics.
> + *
> + * For HDMI, "AUTO" picks RGB, but falls back to YCbCr 4:2:0 if the
> + * bandwidth required for full-scale RGB is not available, or the mode
> + * is YCbCr 4:2:0-only, as long as the mode and output both support
> + * YCbCr 4:2:0.
Is there a reason you propose dropping back to YCbCr 4:2:0 without
trying YCbCr 4:2:2 first? Minimising the subsampling is surely
beneficial, and vc4 for one can do 4:2:2 but not 4:2:0.
Dave
> + *
> + * For display protocols other than HDMI, the recursive bridge chain
> + * format selection picks the first chain of bridge formats that works,
> + * as has already been the case before the introduction of the "color
> + * format" property. Non-HDMI bridges should therefore either sort their
> + * bus output formats by preference, or agree on a unified auto format
> + * selection logic that's implemented in a common state helper (like
> + * how HDMI does it).
> + */
> + DRM_CONNECTOR_COLOR_FORMAT_AUTO = 0,
> +
> + /**
> + * @DRM_CONNECTOR_COLOR_FORMAT_RGB444: RGB output format
> + */
> + DRM_CONNECTOR_COLOR_FORMAT_RGB444,
> +
> + /**
> + * @DRM_CONNECTOR_COLOR_FORMAT_YCBCR444: YCbCr 4:4:4 output format (ie.
> + * not subsampled)
> + */
> + DRM_CONNECTOR_COLOR_FORMAT_YCBCR444,
> +
> + /**
> + * @DRM_CONNECTOR_COLOR_FORMAT_YCBCR422: YCbCr 4:2:2 output format (ie.
> + * with horizontal subsampling)
> + */
> + DRM_CONNECTOR_COLOR_FORMAT_YCBCR422,
> +
> + /**
> + * @DRM_CONNECTOR_COLOR_FORMAT_YCBCR420: YCbCr 4:2:0 output format (ie.
> + * with horizontal and vertical subsampling)
> + */
> + DRM_CONNECTOR_COLOR_FORMAT_YCBCR420,
> +
> + /**
> + * @DRM_CONNECTOR_COLOR_FORMAT_COUNT: Number of valid connector color
> + * format values in this enum
> + */
> + DRM_CONNECTOR_COLOR_FORMAT_COUNT,
> +};
> +
> +/**
> + * drm_connector_color_format_valid - Validate drm_connector_color_format value
> + * @fmt: value to check against all values of &enum drm_connector_color_format
> + *
> + * Checks whether the passed in value of @fmt is one of the allowable values in
> + * &enum drm_connector_color_format.
> + *
> + * Returns: %true if it's a valid value for the enum, %false otherwise.
> + */
> +static inline bool __pure
> +drm_connector_color_format_valid(enum drm_connector_color_format fmt)
> +{
> + switch (fmt) {
> + case DRM_CONNECTOR_COLOR_FORMAT_AUTO:
> + case DRM_CONNECTOR_COLOR_FORMAT_RGB444:
> + case DRM_CONNECTOR_COLOR_FORMAT_YCBCR444:
> + case DRM_CONNECTOR_COLOR_FORMAT_YCBCR422:
> + case DRM_CONNECTOR_COLOR_FORMAT_YCBCR420:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> const char *
> drm_hdmi_connector_get_output_format_name(enum drm_output_color_format fmt);
>
> @@ -1129,6 +1217,13 @@ struct drm_connector_state {
> */
> enum drm_colorspace colorspace;
>
> + /**
> + * @color_format: State variable for Connector property to request
> + * color format change on Sink. This is most commonly used to switch
> + * between RGB to YUV and vice-versa.
> + */
> + enum drm_connector_color_format color_format;
> +
> /**
> * @writeback_job: Writeback job for writeback connectors
> *
> @@ -2127,6 +2222,12 @@ struct drm_connector {
> */
> struct drm_property *colorspace_property;
>
> + /**
> + * @color_format_property: Connector property to set the suitable
> + * color format supported by the sink.
> + */
> + struct drm_property *color_format_property;
> +
> /**
> * @path_blob_ptr:
> *
> @@ -2610,6 +2711,9 @@ bool drm_connector_has_possible_encoder(struct drm_connector *connector,
> struct drm_encoder *encoder);
> const char *drm_get_colorspace_name(enum drm_colorspace colorspace);
>
> +int drm_connector_attach_color_format_property(struct drm_connector *connector,
> + unsigned long supported_color_formats);
> +
> /**
> * drm_for_each_connector_iter - connector_list iterator macro
> * @connector: &struct drm_connector pointer used as cursor
>
> --
> 2.53.0
>
^ permalink raw reply
* Re: [PATCH v3 0/6] can: flexcan: Add NXP S32N79 SoC support
From: Ciprian Marian Costea @ 2026-03-25 12:45 UTC (permalink / raw)
To: Marc Kleine-Budde
Cc: Vincent Mailhol, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Frank Li, Sascha Hauer, Fabio Estevam, Pengutronix Kernel Team,
linux-can, devicetree, linux-kernel, imx, linux-arm-kernel,
NXP S32 Linux Team, Christophe Lizzi, Alberto Ruiz,
Enric Balletbo, Eric Chanudet
In-Reply-To: <20260324-visionary-vigilant-alpaca-a492d8-mkl@pengutronix.de>
On 3/24/2026 3:46 PM, Marc Kleine-Budde wrote:
> On 23.03.2026 14:58:21, Ciprian Costea wrote:
>> From: Ciprian Marian Costea <ciprianmarian.costea@oss.nxp.com>
>>
>> This patch series adds FlexCAN support for the NXP S32N79 SoC.
>>
>> The S32N79 is an automotive-grade processor from NXP with multiple
>> FlexCAN instances. The FlexCAN IP integration on S32N79 differs from
>> other SoCs in the interrupt routing - it uses two separate interrupt
>> lines:
>> - one interrupt for mailboxes 0-127
>> - one interrupt for bus error detection and device state changes
>
> Can you check if the S32N79 suffers from the
>
> | /* No interrupt for error passive */
> | #define FLEXCAN_QUIRK_BROKEN_PERR_STATE BIT(6)
>
> problem? Maybe everyone just added the FLEXCAN_QUIRK_BROKEN_PERR_STATE
> for the new SoC without actually testing it.
>
> regards,
> Marc
>
Hello Marc,
I've tested on S32N79 hardware. Without the quirk, the controller
reaches Transmit-Error-Counter=128 (error-passive at HW level) but the
driver only sees ERROR-WARNING - the passive interrupt never fires.
With the quirk enabled, ERROR-PASSIVE is correctly reported.
So FLEXCAN_QUIRK_BROKEN_PERR_STATE quirk is needed for S32N79.
Best Regards,
Ciprian
^ permalink raw reply
* Re: [PATCH] ACPI: APEI: Handle repeated SEA error interrupts storm scenarios
From: Shuai Xue @ 2026-03-25 12:40 UTC (permalink / raw)
To: hejunhao, Rafael J. Wysocki, Luck, Tony
Cc: bp, guohanjun, mchehab, jarkko, yazen.ghannam, jane.chu, lenb,
Jonathan.Cameron, linux-acpi, linux-arm-kernel, linux-kernel,
linux-edac, shiju.jose, tanxiaofei, Linuxarm
In-Reply-To: <22567fce-992c-89df-28fe-3d5959b8b205@h-partners.com>
On 3/25/26 5:24 PM, hejunhao wrote:
>
>
> On 2026/3/25 10:12, Shuai Xue wrote:
>> Hi, junhao
>>
>> On 3/24/26 6:04 PM, hejunhao wrote:
>>> Hi shuai xue,
>>>
>>>
>>> On 2026/3/3 22:42, Shuai Xue wrote:
>>>> Hi, junhao,
>>>>
>>>> On 2/27/26 8:12 PM, hejunhao wrote:
>>>>>
>>>>>
>>>>> On 2025/11/4 9:32, Shuai Xue wrote:
>>>>>>
>>>>>>
>>>>>> 在 2025/11/4 00:19, Rafael J. Wysocki 写道:
>>>>>>> On Thu, Oct 30, 2025 at 8:13 AM Junhao He <hejunhao3@h-partners.com> wrote:
>>>>>>>>
>>>>>>>> The do_sea() function defaults to using firmware-first mode, if supported.
>>>>>>>> It invoke acpi/apei/ghes ghes_notify_sea() to report and handling the SEA
>>>>>>>> error, The GHES uses a buffer to cache the most recent 4 kinds of SEA
>>>>>>>> errors. If the same kind SEA error continues to occur, GHES will skip to
>>>>>>>> reporting this SEA error and will not add it to the "ghes_estatus_llist"
>>>>>>>> list until the cache times out after 10 seconds, at which point the SEA
>>>>>>>> error will be reprocessed.
>>>>>>>>
>>>>>>>> The GHES invoke ghes_proc_in_irq() to handle the SEA error, which
>>>>>>>> ultimately executes memory_failure() to process the page with hardware
>>>>>>>> memory corruption. If the same SEA error appears multiple times
>>>>>>>> consecutively, it indicates that the previous handling was incomplete or
>>>>>>>> unable to resolve the fault. In such cases, it is more appropriate to
>>>>>>>> return a failure when encountering the same error again, and then proceed
>>>>>>>> to arm64_do_kernel_sea for further processing.
>>>>
>>>> There is no such function in the arm64 tree. If apei_claim_sea() returns
>>>
>>> Sorry for the mistake in the commit message. The function arm64_do_kernel_sea() should
>>> be arm64_notify_die().
>>>
>>>> an error, the actual fallback path in do_sea() is arm64_notify_die(),
>>>> which sends SIGBUS?
>>>>
>>>
>>> If apei_claim_sea() returns an error, arm64_notify_die() will call arm64_force_sig_fault(inf->sig /* SIGBUS */, , , ),
>>> followed by force_sig_fault(SIGBUS, , ) to force the process to receive the SIGBUS signal.
>>
>> So the process is expected to killed by SIGBUS?
>
> Yes. The devmem process is expected to terminate upon receiving a SIGBUS signal, you can
> see this at the last line of the test log after the patch is applied.
> For other processes whether it terminates depends on whether it catches the signal; the kernel is
> responsible for sending it immediately.
>
>>
>>>
>>>>>>>>
>>>>>>>> When hardware memory corruption occurs, a memory error interrupt is
>>>>>>>> triggered. If the kernel accesses this erroneous data, it will trigger
>>>>>>>> the SEA error exception handler. All such handlers will call
>>>>>>>> memory_failure() to handle the faulty page.
>>>>>>>>
>>>>>>>> If a memory error interrupt occurs first, followed by an SEA error
>>>>>>>> interrupt, the faulty page is first marked as poisoned by the memory error
>>>>>>>> interrupt process, and then the SEA error interrupt handling process will
>>>>>>>> send a SIGBUS signal to the process accessing the poisoned page.
>>>>>>>>
>>>>>>>> However, if the SEA interrupt is reported first, the following exceptional
>>>>>>>> scenario occurs:
>>>>>>>>
>>>>>>>> When a user process directly requests and accesses a page with hardware
>>>>>>>> memory corruption via mmap (such as with devmem), the page containing this
>>>>>>>> address may still be in a free buddy state in the kernel. At this point,
>>>>>>>> the page is marked as "poisoned" during the SEA claim memory_failure().
>>>>>>>> However, since the process does not request the page through the kernel's
>>>>>>>> MMU, the kernel cannot send SIGBUS signal to the processes. And the memory
>>>>>>>> error interrupt handling process not support send SIGBUS signal. As a
>>>>>>>> result, these processes continues to access the faulty page, causing
>>>>>>>> repeated entries into the SEA exception handler. At this time, it lead to
>>>>>>>> an SEA error interrupt storm.
>>>>
>>>> In such case, the user process which accessing the poisoned page will be killed
>>>> by memory_fauilre?
>>>>
>>>> // memory_failure():
>>>>
>>>> if (TestSetPageHWPoison(p)) {
>>>> res = -EHWPOISON;
>>>> if (flags & MF_ACTION_REQUIRED)
>>>> res = kill_accessing_process(current, pfn, flags);
>>>> if (flags & MF_COUNT_INCREASED)
>>>> put_page(p);
>>>> action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
>>>> goto unlock_mutex;
>>>> }
>>>>
>>>> I think this problem has already been fixed by commit 2e6053fea379 ("mm/memory-failure:
>>>> fix infinite UCE for VM_PFNMAP pfn").
>>>>
>>>> The root cause is that walk_page_range() skips VM_PFNMAP vmas by default when
>>>> no .test_walk callback is set, so kill_accessing_process() returns 0 for a
>>>> devmem-style mapping (remap_pfn_range, VM_PFNMAP), making the caller believe
>>>> the UCE was handled properly while the process was never actually killed.
>>>>
>>>> Did you try the lastest kernel version?
>>>>
>>>
>>> I retested this issue on the kernel v7.0.0-rc4 with the following debug patch and was still able to reproduce it.
>>>
>>>
>>> @@ -1365,8 +1365,11 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
>>> ghes_clear_estatus(ghes, &tmp_header, buf_paddr, fixmap_idx);
>>>
>>> /* This error has been reported before, don't process it again. */
>>> - if (ghes_estatus_cached(estatus))
>>> + if (ghes_estatus_cached(estatus)) {
>>> + pr_info("This error has been reported before, don't process it again.\n");
>>> goto no_work;
>>> + }
>>>
>>> the test log Only some debug logs are retained here.
>>>
>>> [2026/3/24 14:51:58.199] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32 0
>>> [2026/3/24 14:51:58.369] [root@localhost ~]# taskset -c 40 busybox devmem 0x1351811824 32
>>> [2026/3/24 14:51:58.458] [ 130.558038][ C40] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>> [2026/3/24 14:51:58.459] [ 130.572517][ C40] {1}[Hardware Error]: event severity: recoverable
>>> [2026/3/24 14:51:58.459] [ 130.578861][ C40] {1}[Hardware Error]: Error 0, type: recoverable
>>> [2026/3/24 14:51:58.459] [ 130.585203][ C40] {1}[Hardware Error]: section_type: ARM processor error
>>> [2026/3/24 14:51:58.459] [ 130.592238][ C40] {1}[Hardware Error]: MIDR: 0x0000000000000000
>>> [2026/3/24 14:51:58.459] [ 130.598492][ C40] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>> [2026/3/24 14:51:58.459] [ 130.607871][ C40] {1}[Hardware Error]: error affinity level: 0
>>> [2026/3/24 14:51:58.459] [ 130.614038][ C40] {1}[Hardware Error]: running state: 0x1
>>> [2026/3/24 14:51:58.459] [ 130.619770][ C40] {1}[Hardware Error]: Power State Coordination Interface state: 0
>>> [2026/3/24 14:51:58.459] [ 130.627673][ C40] {1}[Hardware Error]: Error info structure 0:
>>> [2026/3/24 14:51:58.459] [ 130.633839][ C40] {1}[Hardware Error]: num errors: 1
>>> [2026/3/24 14:51:58.459] [ 130.639137][ C40] {1}[Hardware Error]: error_type: 0, cache error
>>> [2026/3/24 14:51:58.459] [ 130.645652][ C40] {1}[Hardware Error]: error_info: 0x0000000020400014
>>> [2026/3/24 14:51:58.459] [ 130.652514][ C40] {1}[Hardware Error]: cache level: 1
>>> [2026/3/24 14:51:58.551] [ 130.658073][ C40] {1}[Hardware Error]: the error has not been corrected
>>> [2026/3/24 14:51:58.551] [ 130.665194][ C40] {1}[Hardware Error]: physical fault address: 0x0000001351811800
>>> [2026/3/24 14:51:58.551] [ 130.673097][ C40] {1}[Hardware Error]: Vendor specific error info has 48 bytes:
>>> [2026/3/24 14:51:58.551] [ 130.680744][ C40] {1}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:51:58.551] [ 130.690471][ C40] {1}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:51:58.552] [ 130.700198][ C40] {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:51:58.552] [ 130.710083][ T9767] Memory failure: 0x1351811: recovery action for free buddy page: Recovered
>>> [2026/3/24 14:51:58.638] [ 130.790952][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:51:58.903] [ 131.046994][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:51:58.991] [ 131.132360][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:51:59.969] [ 132.071431][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:00.860] [ 133.010255][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:01.927] [ 134.034746][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:02.906] [ 135.058973][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:03.971] [ 136.083213][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:04.860] [ 137.021956][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:06.018] [ 138.131460][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:06.905] [ 139.070280][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:07.886] [ 140.009147][ C40] This error has been reported before, don't process it again.
>>> [2026/3/24 14:52:08.596] [ 140.777368][ C40] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>>> [2026/3/24 14:52:08.683] [ 140.791921][ C40] {2}[Hardware Error]: event severity: recoverable
>>> [2026/3/24 14:52:08.683] [ 140.798263][ C40] {2}[Hardware Error]: Error 0, type: recoverable
>>> [2026/3/24 14:52:08.683] [ 140.804606][ C40] {2}[Hardware Error]: section_type: ARM processor error
>>> [2026/3/24 14:52:08.683] [ 140.811641][ C40] {2}[Hardware Error]: MIDR: 0x0000000000000000
>>> [2026/3/24 14:52:08.684] [ 140.817895][ C40] {2}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000081010400
>>> [2026/3/24 14:52:08.684] [ 140.827274][ C40] {2}[Hardware Error]: error affinity level: 0
>>> [2026/3/24 14:52:08.684] [ 140.833440][ C40] {2}[Hardware Error]: running state: 0x1
>>> [2026/3/24 14:52:08.684] [ 140.839173][ C40] {2}[Hardware Error]: Power State Coordination Interface state: 0
>>> [2026/3/24 14:52:08.684] [ 140.847076][ C40] {2}[Hardware Error]: Error info structure 0:
>>> [2026/3/24 14:52:08.684] [ 140.853241][ C40] {2}[Hardware Error]: num errors: 1
>>> [2026/3/24 14:52:08.684] [ 140.858540][ C40] {2}[Hardware Error]: error_type: 0, cache error
>>> [2026/3/24 14:52:08.684] [ 140.865055][ C40] {2}[Hardware Error]: error_info: 0x0000000020400014
>>> [2026/3/24 14:52:08.684] [ 140.871917][ C40] {2}[Hardware Error]: cache level: 1
>>> [2026/3/24 14:52:08.684] [ 140.877475][ C40] {2}[Hardware Error]: the error has not been corrected
>>> [2026/3/24 14:52:08.764] [ 140.884596][ C40] {2}[Hardware Error]: physical fault address: 0x0000001351811800
>>> [2026/3/24 14:52:08.764] [ 140.892499][ C40] {2}[Hardware Error]: Vendor specific error info has 48 bytes:
>>> [2026/3/24 14:52:08.766] [ 140.900145][ C40] {2}[Hardware Error]: 00000000: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:52:08.767] [ 140.909872][ C40] {2}[Hardware Error]: 00000010: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:52:08.767] [ 140.919598][ C40] {2}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>> [2026/3/24 14:52:08.768] [ 140.929346][ T9767] Memory failure: 0x1351811: already hardware poisoned
>>> [2026/3/24 14:52:08.768] [ 140.936072][ T9767] Memory failure: 0x1351811: Sending SIGBUS to busybox:9767 due to hardware memory corruption
>>
>> Did you cut off some logs here?
>
> I just removed some duplicate debug logs: "This error has already been...", these were added by myself.
>> The error log also indicates that the SIGBUS is delivered as expected.
>
> An SError occurs at kernel time 130.558038. Then, after 10 seconds, the kernel
> can re-enter the SEA processing flow and send the SIGBUS signal to the process.
> This 10-second delay corresponds to the cache timeout threshold of the
> ghes_estatus_cached() feature.
> Therefore, the purpose of this patch is to send the SIGBUS signal to the process
> immediately, rather than waiting for the timeout to expire.
Hi, hejun,
Sorry, but I am still not convinced by the log you provided.
As I understand your commit message, there are two different cases being discussed:
Case 1: memory error interrupt first, then SEA
When hardware memory corruption occurs, a memory error interrupt is
triggered first. If the kernel later accesses the corrupted data, it may
then enter the SEA handler. In this case, the faulty page would already
have been marked poisoned by the memory error interrupt path, and the SEA
handling path would eventually send SIGBUS to the task accessing that page.
Case 2: SEA first, then memory error interrupt
Your commit message describes this as the problematic scenario:
A user process directly accesses hardware-corrupted memory through a
PFNMAP-style mapping such as devmem. The page may still be in the free
buddy state when SEA is handled first. In that case, memory_failure()
poisons the page during SEA handling, but the process is not killed
immediately. Since the task continues accessing the same corrupted
location, it keeps re-entering the SEA handler, leading to an SEA storm.
Later, the memory error interrupt path also cannot kill the task, so the
system remains stuck in this repeated SEA loop.
My concern is that your recent explanation and log seem to demonstrate
something different from what the commit message claims to fix.
From the log, what I can see is:
the first SEA occurs,
the page is marked poisoned as a free buddy page,
repeated SEAs are suppressed by ghes_estatus_cached(),
after the cache timeout expires, the SEA path runs again,
then memory_failure() reports "already hardware poisoned" and SIGBUS is
sent to the busybox devmem process.
This seems to show a delayed SIGBUS delivery caused by the GHES cache
timeout, rather than clearly demonstrating the SEA storm problem described
in the commit message.
So I think there is still a mismatch here:
If the patch is intended to fix the SEA storm described in case 2,
then I would expect evidence that the storm still exists on the latest
kernel and that this patch is what actually breaks that loop.
If instead the patch is intended to avoid the 10-second delay before
SIGBUS delivery, then that should be stated explicitly, because that is
a different problem statement from what the current commit message says.
Also, regarding the devmem/PFNMAP case: I previously pointed to commit
2e6053fea379 ("mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn"),
which was meant to address the failure to kill tasks accessing poisoned
VM_PFNMAP mappings.
So my main question is:
Does the SEA storm issue still exist on the latest kernel version, or is
the remaining issue only that SIGBUS is delayed by the GHES estatus cache
timeout?
I think the answer to that question is important before deciding whether
this patch is correct, and before finalizing the commit message.
Thanks,
Shuai
^ permalink raw reply
* Re: i.MX8MP: Fix HDMI LCDIF FIFO underruns
From: Krzysztof Hałasa @ 2026-03-25 12:40 UTC (permalink / raw)
To: Liu Ying
Cc: Maxime Ripard, Marco Felsch, Marek Vasut, Stefan Agner,
Simona Vetter, imx, Fabio Estevam, Pengutronix Kernel Team,
Maarten Lankhorst, Sascha Hauer, Frank Li, linux-kernel,
dri-devel, Thomas Zimmermann, David Airlie, linux-arm-kernel,
Lucas Stach
In-Reply-To: <ecd02f87-c4b7-44ed-8ecf-98e0990af34a@nxp.com>
Hi,
maybe I don't understand this, but... added a bunch of printk()s:
static void dump(struct lcdif_drm_private *lcdif, const char *func, unsigned line)
{
uint32_t d1 = readl(lcdif->base + LCDC_V8_INT_STATUS_D1);
printk(KERN_DEBUG "%s[%u]\n", func, line);
printk(KERN_DEBUG "CTRL %08X PARA %08X %08X %08X %08X %08X\n",
readl(lcdif->base + LCDC_V8_CTRL),
readl(lcdif->base + LCDC_V8_DISP_PARA),
readl(lcdif->base + LCDC_V8_DISP_SIZE),
readl(lcdif->base + LCDC_V8_HSYN_PARA),
readl(lcdif->base + LCDC_V8_VSYN_PARA),
readl(lcdif->base + LCDC_V8_VSYN_HSYN_WIDTH));
printk(KERN_DEBUG "D0 %08X %08X D1 %08X(C) %08X\n",
readl(lcdif->base + LCDC_V8_INT_STATUS_D0),
readl(lcdif->base + LCDC_V8_INT_ENABLE_D0),
d1,
readl(lcdif->base + LCDC_V8_INT_ENABLE_D1));
printk(KERN_DEBUG "DESC %08X %08X %08X %08X %08X\n",
readl(lcdif->base + LCDC_V8_CTRLDESCL0_1),
readl(lcdif->base + LCDC_V8_CTRLDESCL0_3),
readl(lcdif->base + LCDC_V8_CTRLDESCL_LOW0_4),
readl(lcdif->base + LCDC_V8_CTRLDESCL_HIGH0_4),
readl(lcdif->base + LCDC_V8_CTRLDESCL0_5));
if (d1)
writel(d1, lcdif->base + LCDC_V8_INT_STATUS_D1); // clear D1 status
}
(C) after LCDC_V8_INT_STATUS_D1 value means I clear it after read. It's
the single PANIC bit, "write 1 to clear".
LCDIFV3_PANIC0_THRES is 0x15501AA, but 2/4 and 3/4 didn't fix it either
(completely at least). 2160p30. Also added the undocumented
CTRLDESCL0_3_STATE_CLEAR_VSYNC bit as in NXP's driver (clearing FIFO on
VSYNC).
Normal start (before Weston):
4.943 lcdif_reset_block[432]
4.943 CTRL 80000000 PARA 00000000 00000000 00030003 00030003 00030003 <<<<< SW_RESET
4.943 D0 00000000 00000000 D1 00000000(C) 00000000
4.943 DESC 00000000 00000000 00000000 00000000 00000000
4.943 lcdif_reset_block[437]
4.943 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
4.943 D0 00000000 00000000 D1 00000000(C) 00000000
4.943 DESC 00000000 00000000 00000000 00000000 00000000
4.943 lcdif_set_formats[197]
4.943 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
4.943 D0 00000000 00000000 D1 00000000(C) 00000000
4.943 DESC 00000000 00000000 00000000 00000000 00000000
4.943 lcdif_set_formats[320]
4.943 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
4.943 D0 00000000 00000000 D1 00000000(C) 00000000
4.943 DESC 00000000 00000000 00000000 00000000 09000000
4.943 lcdif_set_mode[340]
4.943 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
4.943 D0 00000000 00000000 D1 00000000(C) 00000000
4.943 DESC 00000000 00000000 00000000 00000000 09000000
4.943 lcdif_enable_controller[382]
4.943 CTRL 00000002 PARA 00000000 08700F00 012800B0 00480008 000A0058
4.943 D0 00000000 00000000 D1 00000000(C) 00000000
4.943 DESC 08700F00 00A23C00 ED000000 00000000 09000000
4.948 lcdif_enable_controller[402]
4.948 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.949 D0 00000000 00000000 D1 00000000(C) 00000001
4.949 DESC 08700F00 00A23C00 ED000000 00000000 89000000 <<<<< DMA_EN
I don't know what exactly happens at this point, the DMA starts and the
CRTC also starts scanning right away? No underrun follows. Not even
"panic" indication:
4.949 lcdif_crtc_atomic_enable[604]
4.950 lcdif_plane_primary_atomic_update[746]
4.951 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.951 D0 00000000 00000000 D1 00000000(C) 00000001
4.951 DESC 08700F00 00A23C00 ED000000 00000000 89000000
4.956 lcdif_plane_primary_atomic_update[755]
4.956 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.956 D0 00000000 00000000 D1 00000000(C) 00000001
4.956 DESC 08700F00 00A23C00 ED000000 00000000 89000000
4.956 lcdif_crtc_atomic_flush[554]
4.956 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.956 D0 00000000 00000000 D1 00000000(C) 00000001
4.957 DESC 08700F00 00A23C00 ED000000 00000000 89000000
4.957 lcdif_crtc_atomic_flush[558]
4.957 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.957 D0 00000000 00000000 D1 00000000(C) 00000001
4.957 DESC 08700F00 00A23C00 ED000000 00000000 C9000000 <<<<< SHADOWS
^^^ The code programs the CRTC to update DMA registers (only?), I guess
mostly the frame buffer address, on the next frame end / start /
whatever event.
4.957 lcdif_crtc_enable_vblank[678]
4.957 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.957 D0 00000000 00000000 D1 00000000(C) 00000001
4.957 DESC 08700F00 00A23C00 ED000000 00000000 C9000000 <<<<< SHADOWS
4.962 lcdif_crtc_enable_vblank[683]
4.962 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.962 D0 00000000 00000004 D1 00000000(C) 00000001
4.962 DESC 08700F00 00A23C00 ED000000 00000000 C9000000 <<<<< SHADOWS
4.963 lcdif_crtc_atomic_flush[572]
if the frame started at 4.948 - 4.949, the next one is due at ca. 4.982
(30 Hz). VSync should start not sooner than 4.980 (32.0 ms).
4.977 lcdif_crtc_atomic_destroy_state[632]
4.977 Console: switching to colour frame buffer device 240x67
4.977 lcdif_crtc_atomic_duplicate_state[661]
4.977 lcdif_crtc_atomic_check[480]
4.977 lcdif_crtc_atomic_check[542]
4.977 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.977 D0 00000001 00000004 D1 00000000(C) 00000001 <<<<< VSYNC
4.978 DESC 08700F00 00A23C00 ED000000 00000000 89000000
^^^ Here the "update shadow" bit is (self) cleared, so I think the frame
has ended (or the next one started), and a new one has started (or is
about to). No underrun. This is not an IRQ. Interestingly, it's not
4.980 yet.
4.978 lcdif_crtc_atomic_check[543]
4.978 lcdif_plane_primary_atomic_update[746]
4.978 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.978 D0 00000001 00000004 D1 00000000(C) 00000001
4.978 DESC 08700F00 00A23C00 ED000000 00000000 89000000
4.985 lcdif_plane_primary_atomic_update[755]
4.985 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
4.985 D0 00000001 00000004 D1 00000000(C) 00000001
4.985 DESC 08700F00 00A23C00 ED000000 00000000 89000000
At this point, the new frame definitely has started (7+ ms since VSYNC).
Now starting Weston:
...
34.655 CTRL 80000000 PARA 00000000 00000000 00030003 00030003 00030003 <<<<< SW_RESET
34.655 D0 00000000 00000000 D1 00000000(C) 00000000
34.655 DESC 00000000 00000000 00000000 00000000 00000000
34.655 lcdif_reset_block[437]
34.655 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
34.655 D0 00000000 00000000 D1 00000000(C) 00000000
34.655 DESC 00000000 00000000 00000000 00000000 00000000
34.655 lcdif_set_formats[197]
34.655 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
34.655 D0 00000000 00000000 D1 00000000(C) 00000000
34.655 DESC 00000000 00000000 00000000 00000000 00000000
34.655 lcdif_set_formats[320]
34.655 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
34.655 D0 00000000 00000000 D1 00000000(C) 00000000
34.655 DESC 00000000 00000000 00000000 00000000 09000000
34.655 lcdif_set_mode[340]
34.655 CTRL 00000000 PARA 00000000 00000000 00030003 00030003 00030003
34.655 D0 00000000 00000000 D1 00000000(C) 00000000
34.655 DESC 00000000 00000000 00000000 00000000 09000000
34.655 lcdif_enable_controller[382]
34.655 CTRL 00000002 PARA 00000000 08700F00 012800B0 00480008 000A0058
34.655 D0 00000000 00000000 D1 00000000(C) 00000000
34.655 DESC 08700F00 00A23C00 EF000000 00000000 09000000
34.660 lcdif_enable_controller[402]
34.660 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
34.660 D0 00000000 00000000 D1 00000000(C) 00000001
34.660 DESC 08700F00 00A23C00 EF000000 00000000 89000000 <<<<< DMA_EN
As above, the only difference is frame buffer address.
34.660 lcdif_crtc_atomic_enable[604]
34.661 lcdif_plane_primary_atomic_update[746]
34.661 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
34.661 D0 00000000 00000000 D1 00000000(C) 00000001
34.661 DESC 08700F00 00A23C00 EF000000 00000000 89000000
34.666 lcdif_plane_primary_atomic_update[755]
34.666 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
34.666 D0 00000000 00000000 D1 00000000(C) 00000001
34.667 DESC 08700F00 00A23C00 EF000000 00000000 89000000
34.667 lcdif_crtc_atomic_flush[554]
34.667 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
34.667 D0 00000000 00000000 D1 00000000(C) 00000001
34.667 DESC 08700F00 00A23C00 EF000000 00000000 89000000
34.667 lcdif_crtc_atomic_flush[558]
34.667 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
34.667 D0 00000000 00000000 D1 00000000(C) 00000001
34.667 DESC 08700F00 00A23C00 EF000000 00000000 C9000000 <<<<< SHADOWS
The DMA registers are to be updated on the next VSYNC.
34.667 lcdif_crtc_enable_vblank[678]
34.667 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
34.667 D0 00000000 00000000 D1 00000000(C) 00000001
34.667 DESC 08700F00 00A23C00 EF000000 00000000 C9000000 <<<<< SHADOWS
34.672 lcdif_crtc_enable_vblank[683]
34.672 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
34.672 D0 00000000 00000004 D1 00000000(C) 00000001
34.672 DESC 08700F00 00A23C00 EF000000 00000000 C9000000 <<<<< SHADOWS
34.689 lcdif_crtc_atomic_destroy_state[632]
35.482 lcdif_crtc_atomic_duplicate_state[661]
35.482 lcdif_crtc_atomic_check[480]
35.482 lcdif_crtc_atomic_check[542]
35.482 CTRL 00000002 PARA 80000000 08700F00 012800B0 00480008 000A0058
35.482 D0 01000002 00000004 D1 00000001(C) 00000001 <<<<< UNDERRUN
35.482 DESC 08700F00 00A23C00 EF000000 00000000 89000000
No VSYNC bit (though shadow updates happen on VSYNC), PANIC registered,
and UNDERRUN.
Why did the first (boot) sequence result in success and this second one
in failure? This is somehow reproducible.
Interestingly, Weston usually starts fine in subsequent launches.
How is this first (actually second - after the first VSYNC) frame
different from all the others? Maybe we're programming UPDATE_SHADOW
after the actual start of blanking period, but before DE, so it has no
chance to reload registers, yet the DMA is somehow not ready?
The manual (LCDIF display control Register (CTRL)) suggests that the DMA
(with present settings) starts to fetch FB data at the end of the
previous active video pixel time (DE and pixel data going inactive):
fetch_start_option (bits 9-8): Indicates when to start fetching for new
frame. This signals also decide the shadow load, fifo clear time
00b - fetch start as soon as FPV begins(as the end of the data_enable).
This should leave a lot of time to fill the FIFO (before the next DE),
shouldn't it?
297 MHz pixclk, 3840x2160, 30 Hz, H back porch 296 pixclks, Vfp 176,
V back porch 72 lines, Hfp 8 lines, VSYNC 10 lines, HSYNC 88 pixclks.
HTotal = 4400 pixels which makes Vactive time = 4400 * 2160/297e6 =
32 ms, with the remaining 1.3333 ms for all the V sync stuff.
--
Krzysztof "Chris" Hałasa
Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa
^ permalink raw reply
* [RFC PATCH 5/6] net: ethernet: ti: am65-cpsw-nuss: Recycle TX and RX CPPI Descriptors
From: Siddharth Vadapalli @ 2026-03-25 12:38 UTC (permalink / raw)
To: peter.ujfalusi, vkoul, Frank.Li, andrew+netdev, davem, edumazet,
kuba, pabeni, nm, ssantosh, horms, c-vankar, mwalle
Cc: dmaengine, linux-kernel, netdev, linux-arm-kernel, danishanwar,
srk, s-vadapalli
In-Reply-To: <20260325123850.638748-1-s-vadapalli@ti.com>
The existing implementation allocates the CPPI Descriptors from the
Descriptor Pool of the TX and RX Channels on demand and returns them to
the Pool on completion. Recycle descriptors to speed up the transmit and
receive paths. Use a Cyclic Queue (Ring) to hold the TX and RX Descriptors
for the respective TX Channel and RX Flow and utilize atomic operations for
guarding against concurrent modification.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 237 ++++++++++++++++++++---
drivers/net/ethernet/ti/am65-cpsw-nuss.h | 19 ++
2 files changed, 233 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 6df6cb52d952..fc165579a479 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -145,9 +145,6 @@
AM65_CPSW_PN_TS_CTL_RX_ANX_F_EN)
#define AM65_CPSW_ALE_AGEOUT_DEFAULT 30
-/* Number of TX/RX descriptors per channel/flow */
-#define AM65_CPSW_MAX_TX_DESC 500
-#define AM65_CPSW_MAX_RX_DESC 500
#define AM65_CPSW_NAV_PS_DATA_SIZE 16
#define AM65_CPSW_NAV_SW_DATA_SIZE 16
@@ -374,6 +371,122 @@ static void am65_cpsw_slave_set_promisc(struct am65_cpsw_port *port,
}
}
+static size_t am65_cpsw_nuss_num_free_tx_desc(struct am65_cpsw_tx_chn *tx_chn)
+{
+ struct am65_cpsw_tx_ring *tx_ring = &tx_chn->tx_ring;
+ int head_idx, tail_idx, num_free;
+
+ /* Atomically read both head and tail indices */
+ head_idx = atomic_read(&tx_ring->tx_desc_ring_head_idx);
+ tail_idx = atomic_read(&tx_ring->tx_desc_ring_tail_idx);
+
+ /* Calculate number of available descriptors in circular queue */
+ num_free = (tail_idx - head_idx + (AM65_CPSW_MAX_TX_DESC + 1)) %
+ (AM65_CPSW_MAX_TX_DESC + 1);
+
+ return num_free;
+}
+
+static void am65_cpsw_nuss_put_tx_desc(struct am65_cpsw_tx_chn *tx_chn,
+ struct cppi5_host_desc_t *desc)
+{
+ struct am65_cpsw_tx_ring *tx_ring = &tx_chn->tx_ring;
+ int tail_idx, new_tail_idx;
+
+ /* Atomically get current tail index and calculate new wrapped index */
+ do {
+ tail_idx = atomic_read(&tx_ring->tx_desc_ring_tail_idx);
+ new_tail_idx = tail_idx + 1;
+ if (new_tail_idx > AM65_CPSW_MAX_TX_DESC)
+ new_tail_idx = 0;
+ } while (atomic_cmpxchg(&tx_ring->tx_desc_ring_tail_idx,
+ tail_idx, new_tail_idx) != tail_idx);
+
+ /* Store the descriptor at the tail position */
+ tx_ring->tx_descs[tail_idx] = desc;
+}
+
+static void am65_cpsw_nuss_put_rx_desc(struct am65_cpsw_rx_flow *flow,
+ struct cppi5_host_desc_t *desc)
+{
+ struct am65_cpsw_rx_ring *rx_ring = &flow->rx_ring;
+ int tail_idx, new_tail_idx;
+
+ /* Atomically get current tail index and calculate new wrapped index */
+ do {
+ tail_idx = atomic_read(&rx_ring->rx_desc_ring_tail_idx);
+ new_tail_idx = tail_idx + 1;
+ if (new_tail_idx > AM65_CPSW_MAX_RX_DESC)
+ new_tail_idx = 0;
+ } while (atomic_cmpxchg(&rx_ring->rx_desc_ring_tail_idx,
+ tail_idx, new_tail_idx) != tail_idx);
+
+ /* Store the descriptor at the tail position */
+ rx_ring->rx_descs[tail_idx] = desc;
+}
+
+static void *am65_cpsw_nuss_get_tx_desc(struct am65_cpsw_tx_chn *tx_chn)
+{
+ struct am65_cpsw_tx_ring *tx_ring = &tx_chn->tx_ring;
+ int head_idx, tail_idx, new_head_idx;
+
+ /* Atomically get current head index and check if queue is empty */
+ do {
+ head_idx = atomic_read(&tx_ring->tx_desc_ring_head_idx);
+ tail_idx = atomic_read(&tx_ring->tx_desc_ring_tail_idx);
+
+ /* Queue is empty when head == tail */
+ if (head_idx == tail_idx)
+ return NULL;
+
+ /* Calculate new head with wraparound */
+ new_head_idx = head_idx + 1;
+ if (new_head_idx > AM65_CPSW_MAX_TX_DESC)
+ new_head_idx = 0;
+
+ } while (atomic_cmpxchg(&tx_ring->tx_desc_ring_head_idx,
+ head_idx, new_head_idx) != head_idx);
+
+ return tx_ring->tx_descs[head_idx];
+}
+
+static void *am65_cpsw_nuss_get_rx_desc(struct am65_cpsw_rx_flow *flow)
+{
+ struct am65_cpsw_rx_ring *rx_ring = &flow->rx_ring;
+ int head_idx, tail_idx, new_head_idx;
+
+ /* Atomically get current head index and check if queue is empty */
+ do {
+ head_idx = atomic_read(&rx_ring->rx_desc_ring_head_idx);
+ tail_idx = atomic_read(&rx_ring->rx_desc_ring_tail_idx);
+
+ /* Queue is empty when head == tail */
+ if (head_idx == tail_idx)
+ return NULL;
+
+ /* Calculate new head with wraparound */
+ new_head_idx = head_idx + 1;
+ if (new_head_idx > AM65_CPSW_MAX_RX_DESC)
+ new_head_idx = 0;
+
+ } while (atomic_cmpxchg(&rx_ring->rx_desc_ring_head_idx,
+ head_idx, new_head_idx) != head_idx);
+
+ return rx_ring->rx_descs[head_idx];
+}
+
+static inline int am65_cpsw_nuss_tx_descs_available(struct am65_cpsw_tx_chn *tx_chn)
+{
+ struct am65_cpsw_tx_ring *tx_ring = &tx_chn->tx_ring;
+ int head_idx, tail_idx;
+
+ head_idx = atomic_read(&tx_ring->tx_desc_ring_head_idx);
+ tail_idx = atomic_read(&tx_ring->tx_desc_ring_tail_idx);
+
+ return (tail_idx - head_idx + (AM65_CPSW_MAX_TX_DESC + 1)) %
+ (AM65_CPSW_MAX_TX_DESC + 1);
+}
+
static void am65_cpsw_nuss_ndo_slave_set_rx_mode(struct net_device *ndev)
{
struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
@@ -423,7 +536,7 @@ static void am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,
netif_tx_queue_stopped(netif_txq),
jiffies_to_msecs(jiffies - trans_start),
netdev_queue_dql_avail(netif_txq),
- k3_cppi_desc_pool_avail(tx_chn->desc_pool));
+ am65_cpsw_nuss_num_free_tx_desc(tx_chn));
if (netif_tx_queue_stopped(netif_txq)) {
/* try recover if stopped by us */
@@ -442,7 +555,7 @@ static int am65_cpsw_nuss_rx_push(struct am65_cpsw_common *common,
dma_addr_t desc_dma;
dma_addr_t buf_dma;
- desc_rx = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
+ desc_rx = am65_cpsw_nuss_get_rx_desc(&rx_chn->flows[flow_idx]);
if (!desc_rx) {
dev_err(dev, "Failed to allocate RXFDQ descriptor\n");
return -ENOMEM;
@@ -453,7 +566,7 @@ static int am65_cpsw_nuss_rx_push(struct am65_cpsw_common *common,
page_address(page) + AM65_CPSW_HEADROOM,
AM65_CPSW_MAX_PACKET_SIZE, DMA_FROM_DEVICE);
if (unlikely(dma_mapping_error(rx_chn->dma_dev, buf_dma))) {
- k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
+ am65_cpsw_nuss_put_rx_desc(&rx_chn->flows[flow_idx], desc_rx);
dev_err(dev, "Failed to map rx buffer\n");
return -EINVAL;
}
@@ -508,6 +621,7 @@ static void am65_cpsw_nuss_tx_cleanup(void *data, dma_addr_t desc_dma);
static void am65_cpsw_destroy_rxq(struct am65_cpsw_common *common, int id)
{
struct am65_cpsw_rx_chn *rx_chn = &common->rx_chns;
+ struct cppi5_host_desc_t *rx_desc;
struct am65_cpsw_rx_flow *flow;
struct xdp_rxq_info *rxq;
int port;
@@ -515,6 +629,12 @@ static void am65_cpsw_destroy_rxq(struct am65_cpsw_common *common, int id)
flow = &rx_chn->flows[id];
napi_disable(&flow->napi_rx);
hrtimer_cancel(&flow->rx_hrtimer);
+ /* return descriptors to pool */
+ rx_desc = am65_cpsw_nuss_get_rx_desc(flow);
+ while (rx_desc) {
+ k3_cppi_desc_pool_free(rx_chn->desc_pool, rx_desc);
+ rx_desc = am65_cpsw_nuss_get_rx_desc(flow);
+ }
k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, id, rx_chn,
am65_cpsw_nuss_rx_cleanup);
@@ -603,7 +723,12 @@ static int am65_cpsw_create_rxq(struct am65_cpsw_common *common, int id)
goto err;
}
+ /* Preallocate all RX Descriptors */
+ atomic_set(&flow->rx_ring.rx_desc_ring_head_idx, 0);
+ atomic_set(&flow->rx_ring.rx_desc_ring_tail_idx, AM65_CPSW_MAX_RX_DESC);
+
for (i = 0; i < AM65_CPSW_MAX_RX_DESC; i++) {
+ flow->rx_ring.rx_descs[i] = k3_cppi_desc_pool_alloc(rx_chn->desc_pool);
page = page_pool_dev_alloc_pages(flow->page_pool);
if (!page) {
dev_err(common->dev, "cannot allocate page in flow %d\n",
@@ -661,9 +786,16 @@ static int am65_cpsw_create_rxqs(struct am65_cpsw_common *common)
static void am65_cpsw_destroy_txq(struct am65_cpsw_common *common, int id)
{
struct am65_cpsw_tx_chn *tx_chn = &common->tx_chns[id];
+ struct cppi5_host_desc_t *tx_desc;
napi_disable(&tx_chn->napi_tx);
hrtimer_cancel(&tx_chn->tx_hrtimer);
+ /* return descriptors to pool */
+ tx_desc = am65_cpsw_nuss_get_tx_desc(tx_chn);
+ while (tx_desc) {
+ k3_cppi_desc_pool_free(tx_chn->desc_pool, tx_desc);
+ tx_desc = am65_cpsw_nuss_get_tx_desc(tx_chn);
+ }
k3_udma_glue_reset_tx_chn(tx_chn->tx_chn, tx_chn,
am65_cpsw_nuss_tx_cleanup);
k3_udma_glue_disable_tx_chn(tx_chn->tx_chn);
@@ -695,7 +827,13 @@ static void am65_cpsw_destroy_txqs(struct am65_cpsw_common *common)
static int am65_cpsw_create_txq(struct am65_cpsw_common *common, int id)
{
struct am65_cpsw_tx_chn *tx_chn = &common->tx_chns[id];
- int ret;
+ int ret, i;
+
+ /* Preallocate all TX Descriptors */
+ atomic_set(&tx_chn->tx_ring.tx_desc_ring_head_idx, 0);
+ atomic_set(&tx_chn->tx_ring.tx_desc_ring_tail_idx, AM65_CPSW_MAX_TX_DESC);
+ for (i = 0; i < AM65_CPSW_MAX_TX_DESC; i++)
+ tx_chn->tx_ring.tx_descs[i] = k3_cppi_desc_pool_alloc(tx_chn->desc_pool);
ret = k3_udma_glue_enable_tx_chn(tx_chn->tx_chn);
if (ret)
@@ -1103,7 +1241,7 @@ static int am65_cpsw_xdp_tx_frame(struct net_device *ndev,
u32 pkt_len = xdpf->len;
int ret;
- host_desc = k3_cppi_desc_pool_alloc(tx_chn->desc_pool);
+ host_desc = am65_cpsw_nuss_get_tx_desc(tx_chn);
if (unlikely(!host_desc)) {
ndev->stats.tx_dropped++;
return AM65_CPSW_XDP_CONSUMED; /* drop */
@@ -1161,7 +1299,7 @@ static int am65_cpsw_xdp_tx_frame(struct net_device *ndev,
k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &dma_buf);
dma_unmap_single(tx_chn->dma_dev, dma_buf, pkt_len, DMA_TO_DEVICE);
pool_free:
- k3_cppi_desc_pool_free(tx_chn->desc_pool, host_desc);
+ am65_cpsw_nuss_put_tx_desc(tx_chn, host_desc);
return ret;
}
@@ -1320,7 +1458,7 @@ static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_rx_flow *flow,
dev_dbg(dev, "%s rx csum_info:%#x\n", __func__, csum_info);
dma_unmap_single(rx_chn->dma_dev, buf_dma, buf_dma_len, DMA_FROM_DEVICE);
- k3_cppi_desc_pool_free(rx_chn->desc_pool, desc_rx);
+ am65_cpsw_nuss_put_rx_desc(flow, desc_rx);
if (port->xdp_prog) {
xdp_init_buff(&xdp, PAGE_SIZE, &port->xdp_rxq[flow->id]);
@@ -1444,13 +1582,48 @@ static void am65_cpsw_nuss_tx_wake(struct am65_cpsw_tx_chn *tx_chn, struct net_d
*/
__netif_tx_lock(netif_txq, smp_processor_id());
if (netif_running(ndev) &&
- (k3_cppi_desc_pool_avail(tx_chn->desc_pool) >= MAX_SKB_FRAGS))
+ (am65_cpsw_nuss_num_free_tx_desc(tx_chn) >= MAX_SKB_FRAGS))
netif_tx_wake_queue(netif_txq);
__netif_tx_unlock(netif_txq);
}
}
+static inline void am65_cpsw_nuss_xmit_recycle(struct am65_cpsw_tx_chn *tx_chn,
+ struct cppi5_host_desc_t *desc)
+{
+ struct cppi5_host_desc_t *first_desc, *next_desc;
+ dma_addr_t buf_dma, next_desc_dma;
+ u32 buf_dma_len;
+
+ first_desc = desc;
+ next_desc = first_desc;
+
+ cppi5_hdesc_get_obuf(first_desc, &buf_dma, &buf_dma_len);
+ k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &buf_dma);
+
+ dma_unmap_single(tx_chn->dma_dev, buf_dma, buf_dma_len, DMA_TO_DEVICE);
+
+ next_desc_dma = cppi5_hdesc_get_next_hbdesc(first_desc);
+ k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &next_desc_dma);
+ while (next_desc_dma) {
+ next_desc = k3_cppi_desc_pool_dma2virt(tx_chn->desc_pool,
+ next_desc_dma);
+ cppi5_hdesc_get_obuf(next_desc, &buf_dma, &buf_dma_len);
+ k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &buf_dma);
+
+ dma_unmap_page(tx_chn->dma_dev, buf_dma, buf_dma_len,
+ DMA_TO_DEVICE);
+
+ next_desc_dma = cppi5_hdesc_get_next_hbdesc(next_desc);
+ k3_udma_glue_tx_cppi5_to_dma_addr(tx_chn->tx_chn, &next_desc_dma);
+
+ am65_cpsw_nuss_put_tx_desc(tx_chn, next_desc);
+ }
+
+ am65_cpsw_nuss_put_tx_desc(tx_chn, first_desc);
+}
+
static int am65_cpsw_nuss_tx_compl_packets(struct am65_cpsw_common *common,
int chn, unsigned int budget, bool *tdown)
{
@@ -1509,7 +1682,7 @@ static int am65_cpsw_nuss_tx_compl_packets(struct am65_cpsw_common *common,
total_bytes += pkt_len;
num_tx++;
- am65_cpsw_nuss_xmit_free(tx_chn, desc_tx);
+ am65_cpsw_nuss_xmit_recycle(tx_chn, desc_tx);
dev_sw_netstats_tx_add(ndev, 1, pkt_len);
if (!single_port) {
/* as packets from multi ports can be interleaved
@@ -1624,7 +1797,7 @@ static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
goto err_free_skb;
}
- first_desc = k3_cppi_desc_pool_alloc(tx_chn->desc_pool);
+ first_desc = am65_cpsw_nuss_get_tx_desc(tx_chn);
if (!first_desc) {
dev_dbg(dev, "Failed to allocate descriptor\n");
dma_unmap_single(tx_chn->dma_dev, buf_dma, pkt_len,
@@ -1672,7 +1845,7 @@ static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
u32 frag_size = skb_frag_size(frag);
- next_desc = k3_cppi_desc_pool_alloc(tx_chn->desc_pool);
+ next_desc = am65_cpsw_nuss_get_tx_desc(tx_chn);
if (!next_desc) {
dev_err(dev, "Failed to allocate descriptor\n");
goto busy_free_descs;
@@ -1682,7 +1855,7 @@ static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
DMA_TO_DEVICE);
if (unlikely(dma_mapping_error(tx_chn->dma_dev, buf_dma))) {
dev_err(dev, "Failed to map tx skb page\n");
- k3_cppi_desc_pool_free(tx_chn->desc_pool, next_desc);
+ am65_cpsw_nuss_put_tx_desc(tx_chn, next_desc);
ndev->stats.tx_errors++;
goto err_free_descs;
}
@@ -1725,14 +1898,14 @@ static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
goto err_free_descs;
}
- if (k3_cppi_desc_pool_avail(tx_chn->desc_pool) < MAX_SKB_FRAGS) {
+ if (am65_cpsw_nuss_num_free_tx_desc(tx_chn) < MAX_SKB_FRAGS) {
netif_tx_stop_queue(netif_txq);
/* Barrier, so that stop_queue visible to other cpus */
smp_mb__after_atomic();
dev_dbg(dev, "netif_tx_stop_queue %d\n", q_idx);
/* re-check for smp */
- if (k3_cppi_desc_pool_avail(tx_chn->desc_pool) >=
+ if (am65_cpsw_nuss_num_free_tx_desc(tx_chn) >=
MAX_SKB_FRAGS) {
netif_tx_wake_queue(netif_txq);
dev_dbg(dev, "netif_tx_wake_queue %d\n", q_idx);
@@ -1742,14 +1915,14 @@ static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
return NETDEV_TX_OK;
err_free_descs:
- am65_cpsw_nuss_xmit_free(tx_chn, first_desc);
+ am65_cpsw_nuss_xmit_recycle(tx_chn, first_desc);
err_free_skb:
ndev->stats.tx_dropped++;
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
busy_free_descs:
- am65_cpsw_nuss_xmit_free(tx_chn, first_desc);
+ am65_cpsw_nuss_xmit_recycle(tx_chn, first_desc);
busy_stop_q:
netif_tx_stop_queue(netif_txq);
return NETDEV_TX_BUSY;
@@ -2195,7 +2368,7 @@ static void am65_cpsw_nuss_free_tx_chns(void *data)
static void am65_cpsw_nuss_remove_tx_chns(struct am65_cpsw_common *common)
{
struct device *dev = common->dev;
- int i;
+ int i, j;
common->tx_ch_rate_msk = 0;
for (i = 0; i < common->tx_ch_num; i++) {
@@ -2205,6 +2378,10 @@ static void am65_cpsw_nuss_remove_tx_chns(struct am65_cpsw_common *common)
devm_free_irq(dev, tx_chn->irq, tx_chn);
netif_napi_del(&tx_chn->napi_tx);
+
+ for (j = 0; j < AM65_CPSW_MAX_TX_DESC; j++)
+ k3_cppi_desc_pool_free(tx_chn->desc_pool,
+ tx_chn->tx_ring.tx_descs[j]);
}
am65_cpsw_nuss_free_tx_chns(common);
@@ -2260,7 +2437,7 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common)
.flags = 0
};
u32 hdesc_size, hdesc_size_out;
- int i, ret = 0;
+ int i, j, ret = 0;
hdesc_size = cppi5_hdesc_calc_size(true, AM65_CPSW_NAV_PS_DATA_SIZE,
AM65_CPSW_NAV_SW_DATA_SIZE);
@@ -2329,6 +2506,13 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common)
return 0;
err:
+ /* Free descriptors */
+ while (i--) {
+ struct am65_cpsw_tx_chn *tx_chn = &common->tx_chns[i];
+
+ for (j = 0; j < AM65_CPSW_MAX_TX_DESC; j++)
+ k3_cppi_desc_pool_free(tx_chn->desc_pool, tx_chn->tx_ring.tx_descs[j]);
+ }
am65_cpsw_nuss_free_tx_chns(common);
return ret;
@@ -2353,7 +2537,7 @@ static void am65_cpsw_nuss_remove_rx_chns(struct am65_cpsw_common *common)
struct device *dev = common->dev;
struct am65_cpsw_rx_chn *rx_chn;
struct am65_cpsw_rx_flow *flows;
- int i;
+ int i, j;
rx_chn = &common->rx_chns;
flows = rx_chn->flows;
@@ -2362,6 +2546,9 @@ static void am65_cpsw_nuss_remove_rx_chns(struct am65_cpsw_common *common)
if (!(flows[i].irq < 0))
devm_free_irq(dev, flows[i].irq, &flows[i]);
netif_napi_del(&flows[i].napi_rx);
+ for (j = 0; j < AM65_CPSW_MAX_RX_DESC; j++)
+ k3_cppi_desc_pool_free(rx_chn->desc_pool,
+ flows[i].rx_ring.rx_descs[j]);
}
am65_cpsw_nuss_free_rx_chns(common);
@@ -2378,7 +2565,7 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
struct am65_cpsw_rx_flow *flow;
u32 hdesc_size, hdesc_size_out;
u32 fdqring_id;
- int i, ret = 0;
+ int i, j, ret = 0;
hdesc_size = cppi5_hdesc_calc_size(true, AM65_CPSW_NAV_PS_DATA_SIZE,
AM65_CPSW_NAV_SW_DATA_SIZE);
@@ -2498,10 +2685,14 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
err_request_irq:
netif_napi_del(&flow->napi_rx);
+ for (j = 0; j < AM65_CPSW_MAX_RX_DESC; j++)
+ k3_cppi_desc_pool_free(rx_chn->desc_pool, flow->rx_ring.rx_descs[j]);
err_flow:
for (--i; i >= 0; i--) {
flow = &rx_chn->flows[i];
+ for (j = 0; j < AM65_CPSW_MAX_RX_DESC; j++)
+ k3_cppi_desc_pool_free(rx_chn->desc_pool, flow->rx_ring.rx_descs[j]);
devm_free_irq(dev, flow->irq, flow);
netif_napi_del(&flow->napi_rx);
}
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.h b/drivers/net/ethernet/ti/am65-cpsw-nuss.h
index 7750448e4746..e64b4cfd6f2c 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.h
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.h
@@ -6,6 +6,7 @@
#ifndef AM65_CPSW_NUSS_H_
#define AM65_CPSW_NUSS_H_
+#include <linux/dma/ti-cppi5.h>
#include <linux/if_ether.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -23,6 +24,10 @@ struct am65_cpts;
#define AM65_CPSW_MAX_QUEUES 8 /* both TX & RX */
+/* Number of TX/RX descriptors per channel/flow */
+#define AM65_CPSW_MAX_TX_DESC 500
+#define AM65_CPSW_MAX_RX_DESC 500
+
#define AM65_CPSW_PORT_VLAN_REG_OFFSET 0x014
struct am65_cpsw_slave_data {
@@ -75,6 +80,12 @@ struct am65_cpsw_host {
u32 vid_context;
};
+struct am65_cpsw_tx_ring {
+ struct cppi5_host_desc_t *tx_descs[AM65_CPSW_MAX_TX_DESC + 1];
+ atomic_t tx_desc_ring_head_idx; /* Points to dequeuing place for free descriptor */
+ atomic_t tx_desc_ring_tail_idx; /* Points to queuing place for freed descriptor */
+};
+
struct am65_cpsw_tx_chn {
struct device *dma_dev;
struct napi_struct napi_tx;
@@ -82,6 +93,7 @@ struct am65_cpsw_tx_chn {
struct k3_cppi_desc_pool *desc_pool;
struct k3_udma_glue_tx_channel *tx_chn;
spinlock_t lock; /* protect TX rings in multi-port mode */
+ struct am65_cpsw_tx_ring tx_ring;
struct hrtimer tx_hrtimer;
unsigned long tx_pace_timeout;
int irq;
@@ -92,12 +104,19 @@ struct am65_cpsw_tx_chn {
u32 rate_mbps;
};
+struct am65_cpsw_rx_ring {
+ struct cppi5_host_desc_t *rx_descs[AM65_CPSW_MAX_RX_DESC + 1];
+ atomic_t rx_desc_ring_head_idx; /* Points to dequeuing place for free descriptor */
+ atomic_t rx_desc_ring_tail_idx; /* Points to queuing place for freed descriptor */
+};
+
struct am65_cpsw_rx_flow {
u32 id;
struct napi_struct napi_rx;
struct am65_cpsw_common *common;
int irq;
bool irq_disabled;
+ struct am65_cpsw_rx_ring rx_ring;
struct hrtimer rx_hrtimer;
unsigned long rx_pace_timeout;
struct page_pool *page_pool;
--
2.51.1
^ permalink raw reply related
* [RFC PATCH 6/6] net: ethernet: ti: am65-cpsw-nuss: Enable batch processing for TX / TX CMPL
From: Siddharth Vadapalli @ 2026-03-25 12:38 UTC (permalink / raw)
To: peter.ujfalusi, vkoul, Frank.Li, andrew+netdev, davem, edumazet,
kuba, pabeni, nm, ssantosh, horms, c-vankar, mwalle
Cc: dmaengine, linux-kernel, netdev, linux-arm-kernel, danishanwar,
srk, s-vadapalli
In-Reply-To: <20260325123850.638748-1-s-vadapalli@ti.com>
Enable batch processing on the transmit and transmit completion paths by
submitting a batch of packet descriptors on transmit and similarly by
dequeueing a batch of packet descriptors on transmit completion.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 201 +++++++++++++++++++----
drivers/net/ethernet/ti/am65-cpsw-nuss.h | 12 ++
2 files changed, 178 insertions(+), 35 deletions(-)
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index fc165579a479..2b354af14cb7 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -1624,14 +1624,14 @@ static inline void am65_cpsw_nuss_xmit_recycle(struct am65_cpsw_tx_chn *tx_chn,
am65_cpsw_nuss_put_tx_desc(tx_chn, first_desc);
}
-static int am65_cpsw_nuss_tx_compl_packets(struct am65_cpsw_common *common,
- int chn, unsigned int budget, bool *tdown)
+static int am65_cpsw_nuss_tx_cmpl_free_batch(struct am65_cpsw_common *common, int chn,
+ u32 batch_size, unsigned int budget,
+ bool *tdown)
{
bool single_port = AM65_CPSW_IS_CPSW2G(common);
enum am65_cpsw_tx_buf_type buf_type;
struct am65_cpsw_tx_swdata *swdata;
struct cppi5_host_desc_t *desc_tx;
- struct device *dev = common->dev;
struct am65_cpsw_tx_chn *tx_chn;
struct netdev_queue *netif_txq;
unsigned int total_bytes = 0;
@@ -1640,21 +1640,13 @@ static int am65_cpsw_nuss_tx_compl_packets(struct am65_cpsw_common *common,
unsigned int pkt_len;
struct sk_buff *skb;
dma_addr_t desc_dma;
- int res, num_tx = 0;
+ int num_tx = 0, i;
tx_chn = &common->tx_chns[chn];
- while (true) {
- if (!single_port)
- spin_lock(&tx_chn->lock);
- res = k3_udma_glue_pop_tx_chn(tx_chn->tx_chn, &desc_dma);
- if (!single_port)
- spin_unlock(&tx_chn->lock);
-
- if (res == -ENODATA)
- break;
-
- if (cppi5_desc_is_tdcm(desc_dma)) {
+ for (i = 0; i < batch_size; i++) {
+ desc_dma = tx_chn->cmpl_desc_dma_array[i];
+ if (unlikely(cppi5_desc_is_tdcm(desc_dma))) {
if (atomic_dec_and_test(&common->tdown_cnt))
complete(&common->tdown_complete);
*tdown = true;
@@ -1701,7 +1693,34 @@ static int am65_cpsw_nuss_tx_compl_packets(struct am65_cpsw_common *common,
am65_cpsw_nuss_tx_wake(tx_chn, ndev, netif_txq);
}
- dev_dbg(dev, "%s:%u pkt:%d\n", __func__, chn, num_tx);
+ return num_tx;
+}
+
+static int am65_cpsw_nuss_tx_compl_packets(struct am65_cpsw_common *common,
+ int chn, unsigned int budget, bool *tdown)
+{
+ bool single_port = AM65_CPSW_IS_CPSW2G(common);
+ struct am65_cpsw_tx_chn *tx_chn;
+ u32 batch_size = 0;
+ int res, num_tx;
+
+ tx_chn = &common->tx_chns[chn];
+
+ if (!single_port)
+ spin_lock(&tx_chn->lock);
+
+ res = k3_udma_glue_pop_tx_chn_batch(tx_chn->tx_chn, tx_chn->cmpl_desc_dma_array,
+ &batch_size, AM65_CPSW_TX_BATCH_SIZE);
+ if (!batch_size) {
+ if (!single_port)
+ spin_unlock(&tx_chn->lock);
+ return 0;
+ }
+
+ num_tx = am65_cpsw_nuss_tx_cmpl_free_batch(common, chn, batch_size, budget, tdown);
+
+ if (!single_port)
+ spin_unlock(&tx_chn->lock);
return num_tx;
}
@@ -1760,18 +1779,48 @@ static irqreturn_t am65_cpsw_nuss_tx_irq(int irq, void *dev_id)
return IRQ_HANDLED;
}
+static void am65_cpsw_nuss_submit_ndev_batch(struct am65_cpsw_common *common)
+{
+ bool single_port = AM65_CPSW_IS_CPSW2G(common);
+ struct am65_cpsw_tx_desc_batch *tx_desc_batch;
+ struct am65_cpsw_tx_chn *tx_chn;
+ int ret, i;
+
+ /* Submit packets across netdevs across TX Channels */
+ for (i = 0; i < AM65_CPSW_MAX_QUEUES; i++) {
+ if (common->tx_desc_batch[i].tx_batch_idx) {
+ tx_chn = &common->tx_chns[i];
+ tx_desc_batch = &common->tx_desc_batch[i];
+ if (!single_port)
+ spin_lock_bh(&tx_chn->lock);
+ ret = k3_udma_glue_push_tx_chn_batch(tx_chn->tx_chn,
+ tx_desc_batch->desc_tx_array,
+ tx_desc_batch->desc_dma_array,
+ tx_desc_batch->tx_batch_idx);
+ if (!single_port)
+ spin_unlock_bh(&tx_chn->lock);
+ if (ret)
+ dev_err(common->dev, "failed to push %u pkts on queue %d\n",
+ tx_desc_batch->tx_batch_idx, i);
+ tx_desc_batch->tx_batch_idx = 0;
+ }
+ }
+ atomic_set(&common->tx_batch_count, 0);
+}
+
static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
struct net_device *ndev)
{
struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
struct cppi5_host_desc_t *first_desc, *next_desc, *cur_desc;
struct am65_cpsw_port *port = am65_ndev_to_port(ndev);
+ struct am65_cpsw_tx_desc_batch *tx_desc_batch;
struct am65_cpsw_tx_swdata *swdata;
struct device *dev = common->dev;
struct am65_cpsw_tx_chn *tx_chn;
struct netdev_queue *netif_txq;
dma_addr_t desc_dma, buf_dma;
- int ret, q_idx, i;
+ int q_idx, i;
u32 *psdata;
u32 pkt_len;
@@ -1883,20 +1932,31 @@ static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
cppi5_hdesc_set_pktlen(first_desc, pkt_len);
desc_dma = k3_cppi_desc_pool_virt2dma(tx_chn->desc_pool, first_desc);
- if (AM65_CPSW_IS_CPSW2G(common)) {
- ret = k3_udma_glue_push_tx_chn(tx_chn->tx_chn, first_desc, desc_dma);
- } else {
- spin_lock_bh(&tx_chn->lock);
- ret = k3_udma_glue_push_tx_chn(tx_chn->tx_chn, first_desc, desc_dma);
- spin_unlock_bh(&tx_chn->lock);
- }
- if (ret) {
- dev_err(dev, "can't push desc %d\n", ret);
- /* inform bql */
- netdev_tx_completed_queue(netif_txq, 1, pkt_len);
- ndev->stats.tx_errors++;
- goto err_free_descs;
- }
+
+ /* Batch processing begins */
+ spin_lock_bh(&common->tx_batch_lock);
+
+ tx_desc_batch = &common->tx_desc_batch[q_idx];
+ tx_desc_batch->desc_tx_array[tx_desc_batch->tx_batch_idx] = first_desc;
+ tx_desc_batch->desc_dma_array[tx_desc_batch->tx_batch_idx] = desc_dma;
+ tx_desc_batch->tx_batch_idx++;
+
+ /* Push the batch across all queues and all netdevs in any of the
+ * following scenarios:
+ * 1. If we reach the batch size
+ * 2. If queue is stopped
+ * 3. No more packets are expected for ndev
+ * 4. We do not have sufficient free descriptors for upcoming packets
+ * and need to push the batch to reclaim them via completion
+ */
+ if ((atomic_inc_return(&common->tx_batch_count) == AM65_CPSW_TX_BATCH_SIZE) ||
+ netif_xmit_stopped(netif_txq) ||
+ !netdev_xmit_more() ||
+ (am65_cpsw_nuss_num_free_tx_desc(tx_chn) < MAX_SKB_FRAGS))
+ am65_cpsw_nuss_submit_ndev_batch(common);
+
+ /* Batch processing ends */
+ spin_unlock_bh(&common->tx_batch_lock);
if (am65_cpsw_nuss_num_free_tx_desc(tx_chn) < MAX_SKB_FRAGS) {
netif_tx_stop_queue(netif_txq);
@@ -2121,19 +2181,88 @@ static int am65_cpsw_ndo_xdp_xmit(struct net_device *ndev, int n,
struct xdp_frame **frames, u32 flags)
{
struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
+ struct am65_cpsw_port *port = am65_ndev_to_port(ndev);
+ struct am65_cpsw_tx_desc_batch *tx_desc_batch;
+ struct cppi5_host_desc_t *host_desc;
+ struct am65_cpsw_tx_swdata *swdata;
struct am65_cpsw_tx_chn *tx_chn;
struct netdev_queue *netif_txq;
+ dma_addr_t dma_desc, dma_buf;
int cpu = smp_processor_id();
- int i, nxmit = 0;
+ int i, q_idx, nxmit = 0;
+ struct xdp_frame *xdpf;
+ u32 pkt_len;
- tx_chn = &common->tx_chns[cpu % common->tx_ch_num];
+ q_idx = cpu % common->tx_ch_num;
+ tx_chn = &common->tx_chns[q_idx];
netif_txq = netdev_get_tx_queue(ndev, tx_chn->id);
__netif_tx_lock(netif_txq, cpu);
for (i = 0; i < n; i++) {
- if (am65_cpsw_xdp_tx_frame(ndev, tx_chn, frames[i],
- AM65_CPSW_TX_BUF_TYPE_XDP_NDO))
+ host_desc = am65_cpsw_nuss_get_tx_desc(tx_chn);
+ if (unlikely(!host_desc)) {
+ ndev->stats.tx_dropped++;
+ break;
+ }
+
+ xdpf = frames[i];
+ pkt_len = xdpf->len;
+
+ am65_cpsw_nuss_set_buf_type(tx_chn, host_desc, AM65_CPSW_TX_BUF_TYPE_XDP_NDO);
+
+ dma_buf = dma_map_single(tx_chn->dma_dev, xdpf->data,
+ pkt_len, DMA_TO_DEVICE);
+ if (unlikely(dma_mapping_error(tx_chn->dma_dev, dma_buf))) {
+ ndev->stats.tx_dropped++;
+ am65_cpsw_nuss_put_tx_desc(tx_chn, host_desc);
break;
+ }
+
+ cppi5_hdesc_init(host_desc, CPPI5_INFO0_HDESC_EPIB_PRESENT,
+ AM65_CPSW_NAV_PS_DATA_SIZE);
+ cppi5_hdesc_set_pkttype(host_desc, AM65_CPSW_CPPI_TX_PKT_TYPE);
+ cppi5_hdesc_set_pktlen(host_desc, pkt_len);
+ cppi5_desc_set_pktids(&host_desc->hdr, 0, AM65_CPSW_CPPI_TX_FLOW_ID);
+ cppi5_desc_set_tags_ids(&host_desc->hdr, 0, port->port_id);
+
+ k3_udma_glue_tx_dma_to_cppi5_addr(tx_chn->tx_chn, &dma_buf);
+ cppi5_hdesc_attach_buf(host_desc, dma_buf, pkt_len, dma_buf, pkt_len);
+
+ swdata = cppi5_hdesc_get_swdata(host_desc);
+ swdata->ndev = ndev;
+ swdata->xdpf = xdpf;
+
+ /* Report BQL before sending the packet */
+ netif_txq = netdev_get_tx_queue(ndev, tx_chn->id);
+ netdev_tx_sent_queue(netif_txq, pkt_len);
+
+ dma_desc = k3_cppi_desc_pool_virt2dma(tx_chn->desc_pool, host_desc);
+
+ /* Batch processing begins */
+ spin_lock_bh(&common->tx_batch_lock);
+
+ tx_desc_batch = &common->tx_desc_batch[q_idx];
+ tx_desc_batch->desc_tx_array[tx_desc_batch->tx_batch_idx] = host_desc;
+ tx_desc_batch->desc_dma_array[tx_desc_batch->tx_batch_idx] = dma_desc;
+ tx_desc_batch->tx_batch_idx++;
+
+ /* Push the batch across all queues and all netdevs in any of the
+ * following scenarios:
+ * 1. If we reach the batch size
+ * 2. If queue is stopped
+ * 3. We are at the last XDP frame in the batch
+ * 4. We do not have sufficient free descriptors for upcoming packets
+ * and need to push the batch to reclaim them via completion
+ */
+ if ((atomic_inc_return(&common->tx_batch_count) == AM65_CPSW_TX_BATCH_SIZE) ||
+ netif_xmit_stopped(netif_txq) ||
+ (i == (n - 1)) ||
+ (am65_cpsw_nuss_num_free_tx_desc(tx_chn) < MAX_SKB_FRAGS))
+ am65_cpsw_nuss_submit_ndev_batch(common);
+
+ /* Batch processing ends */
+ spin_unlock_bh(&common->tx_batch_lock);
+
nxmit++;
}
__netif_tx_unlock(netif_txq);
@@ -2497,6 +2626,8 @@ static int am65_cpsw_nuss_init_tx_chns(struct am65_cpsw_common *common)
dev_name(dev), tx_chn->id);
}
+ atomic_set(&common->tx_batch_count, 0);
+
ret = am65_cpsw_nuss_ndev_add_tx_napi(common);
if (ret) {
dev_err(dev, "Failed to add tx NAPI %d\n", ret);
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.h b/drivers/net/ethernet/ti/am65-cpsw-nuss.h
index e64b4cfd6f2c..81405e3bed79 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.h
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.h
@@ -28,6 +28,8 @@ struct am65_cpts;
#define AM65_CPSW_MAX_TX_DESC 500
#define AM65_CPSW_MAX_RX_DESC 500
+#define AM65_CPSW_TX_BATCH_SIZE 128
+
#define AM65_CPSW_PORT_VLAN_REG_OFFSET 0x014
struct am65_cpsw_slave_data {
@@ -93,6 +95,7 @@ struct am65_cpsw_tx_chn {
struct k3_cppi_desc_pool *desc_pool;
struct k3_udma_glue_tx_channel *tx_chn;
spinlock_t lock; /* protect TX rings in multi-port mode */
+ dma_addr_t cmpl_desc_dma_array[AM65_CPSW_TX_BATCH_SIZE];
struct am65_cpsw_tx_ring tx_ring;
struct hrtimer tx_hrtimer;
unsigned long tx_pace_timeout;
@@ -165,6 +168,12 @@ struct am65_cpsw_devlink {
struct am65_cpsw_common *common;
};
+struct am65_cpsw_tx_desc_batch {
+ struct cppi5_host_desc_t *desc_tx_array[AM65_CPSW_TX_BATCH_SIZE];
+ dma_addr_t desc_dma_array[AM65_CPSW_TX_BATCH_SIZE];
+ u8 tx_batch_idx;
+};
+
struct am65_cpsw_common {
struct device *dev;
struct device *mdio_dev;
@@ -188,6 +197,9 @@ struct am65_cpsw_common {
struct am65_cpsw_tx_chn tx_chns[AM65_CPSW_MAX_QUEUES];
struct completion tdown_complete;
atomic_t tdown_cnt;
+ atomic_t tx_batch_count;
+ spinlock_t tx_batch_lock; /* protect TX batch operations */
+ struct am65_cpsw_tx_desc_batch tx_desc_batch[AM65_CPSW_MAX_QUEUES];
int rx_ch_num_flows;
struct am65_cpsw_rx_chn rx_chns;
--
2.51.1
^ permalink raw reply related
* [RFC PATCH 4/6] net: ethernet: ti: am65-cpsw-nuss: Do not set buf_type for SKB fragments
From: Siddharth Vadapalli @ 2026-03-25 12:38 UTC (permalink / raw)
To: peter.ujfalusi, vkoul, Frank.Li, andrew+netdev, davem, edumazet,
kuba, pabeni, nm, ssantosh, horms, c-vankar, mwalle
Cc: dmaengine, linux-kernel, netdev, linux-arm-kernel, danishanwar,
srk, s-vadapalli
In-Reply-To: <20260325123850.638748-1-s-vadapalli@ti.com>
There are two kinds of descriptors:
1. Host Packet Descriptor
2. Host Buffer Descriptor
Unfragmented SKBs are always associated with a single Host Packet
Descriptor. Fragmented SKBs on the other hand have the Start-of-Packet
SKB associated with a single Host Packet Descriptor and the remaining
fragments are associated with a Host Buffer Descriptor. A single Host
Packet Descriptor is linked to a chain of Host Buffer Descriptors for
fragmented SKBs with as many Host Buffer Descriptors as the number of
SKB fragments.
Since packet completion handling only uses the buffer type of the Host
Packet Descriptor, setting the buffer type of the linked Host Buffer
Descriptors is an unnecessary operation which wastes CPU cycles per SKB
fragment.
Hence, do not set buffer type for SKB fragments.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index d9400599e80a..6df6cb52d952 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -1678,9 +1678,6 @@ static netdev_tx_t am65_cpsw_nuss_ndo_slave_xmit(struct sk_buff *skb,
goto busy_free_descs;
}
- am65_cpsw_nuss_set_buf_type(tx_chn, next_desc,
- AM65_CPSW_TX_BUF_TYPE_SKB);
-
buf_dma = skb_frag_dma_map(tx_chn->dma_dev, frag, 0, frag_size,
DMA_TO_DEVICE);
if (unlikely(dma_mapping_error(tx_chn->dma_dev, buf_dma))) {
--
2.51.1
^ permalink raw reply related
* [RFC PATCH 3/6] dmaengine: ti: k3-udma-glue: Add helpers for batch operations on TX/RX DMA
From: Siddharth Vadapalli @ 2026-03-25 12:38 UTC (permalink / raw)
To: peter.ujfalusi, vkoul, Frank.Li, andrew+netdev, davem, edumazet,
kuba, pabeni, nm, ssantosh, horms, c-vankar, mwalle
Cc: dmaengine, linux-kernel, netdev, linux-arm-kernel, danishanwar,
srk, s-vadapalli
In-Reply-To: <20260325123850.638748-1-s-vadapalli@ti.com>
To allow pushing and popping a batch of DMA Descriptors on the Transmit
and Receive DMA Channels (Flows), introduce four helpers:
1. k3_udma_glue_push_tx_chn_batch
2. k3_udma_glue_pop_tx_chn_batch
3. k3_udma_glue_push_rx_chn_batch
4. k3_udma_glue_pop_rx_chn_batch
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
drivers/dma/ti/k3-udma-glue.c | 55 ++++++++++++++++++++++++++++++++
include/linux/dma/k3-udma-glue.h | 12 +++++++
2 files changed, 67 insertions(+)
diff --git a/drivers/dma/ti/k3-udma-glue.c b/drivers/dma/ti/k3-udma-glue.c
index f87d244cc2d6..15835c521977 100644
--- a/drivers/dma/ti/k3-udma-glue.c
+++ b/drivers/dma/ti/k3-udma-glue.c
@@ -485,6 +485,25 @@ int k3_udma_glue_push_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
}
EXPORT_SYMBOL_GPL(k3_udma_glue_push_tx_chn);
+int k3_udma_glue_push_tx_chn_batch(struct k3_udma_glue_tx_channel *tx_chn,
+ struct cppi5_host_desc_t **desc_tx,
+ dma_addr_t *desc_dma, u32 batch_size)
+{
+ u32 ringtxcq_id;
+ int i;
+
+ if (!atomic_add_unless(&tx_chn->free_pkts, -1 * batch_size, 0))
+ return -ENOMEM;
+
+ ringtxcq_id = k3_ringacc_get_ring_id(tx_chn->ringtxcq);
+
+ for (i = 0; i < batch_size; i++)
+ cppi5_desc_set_retpolicy(&desc_tx[i]->hdr, 0, ringtxcq_id);
+
+ return k3_ringacc_ring_push_batch(tx_chn->ringtx, desc_dma, batch_size);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_push_tx_chn_batch);
+
int k3_udma_glue_pop_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
dma_addr_t *desc_dma)
{
@@ -498,6 +517,21 @@ int k3_udma_glue_pop_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
}
EXPORT_SYMBOL_GPL(k3_udma_glue_pop_tx_chn);
+int k3_udma_glue_pop_tx_chn_batch(struct k3_udma_glue_tx_channel *tx_chn,
+ dma_addr_t *desc_dma, u32 *batch_size,
+ u32 max_batch)
+{
+ int ret;
+
+ ret = k3_ringacc_ring_pop_batch(tx_chn->ringtxcq, desc_dma, batch_size,
+ max_batch);
+ if (!ret)
+ atomic_add(*batch_size, &tx_chn->free_pkts);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_pop_tx_chn_batch);
+
int k3_udma_glue_enable_tx_chn(struct k3_udma_glue_tx_channel *tx_chn)
{
int ret;
@@ -1512,6 +1546,16 @@ int k3_udma_glue_push_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
}
EXPORT_SYMBOL_GPL(k3_udma_glue_push_rx_chn);
+int k3_udma_glue_push_rx_chn_batch(struct k3_udma_glue_rx_channel *rx_chn,
+ u32 flow_num, dma_addr_t desc_dma,
+ u32 batch_size)
+{
+ struct k3_udma_glue_rx_flow *flow = &rx_chn->flows[flow_num];
+
+ return k3_ringacc_ring_push_batch(flow->ringrxfdq, &desc_dma, batch_size);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_push_rx_chn_batch);
+
int k3_udma_glue_pop_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
u32 flow_num, dma_addr_t *desc_dma)
{
@@ -1521,6 +1565,17 @@ int k3_udma_glue_pop_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
}
EXPORT_SYMBOL_GPL(k3_udma_glue_pop_rx_chn);
+int k3_udma_glue_pop_rx_chn_batch(struct k3_udma_glue_rx_channel *rx_chn,
+ u32 flow_num, dma_addr_t *desc_dma,
+ u32 *batch_size, u32 max_batch)
+{
+ struct k3_udma_glue_rx_flow *flow = &rx_chn->flows[flow_num];
+
+ return k3_ringacc_ring_pop_batch(flow->ringrx, desc_dma, batch_size,
+ max_batch);
+}
+EXPORT_SYMBOL_GPL(k3_udma_glue_pop_rx_chn_batch);
+
int k3_udma_glue_rx_get_irq(struct k3_udma_glue_rx_channel *rx_chn,
u32 flow_num)
{
diff --git a/include/linux/dma/k3-udma-glue.h b/include/linux/dma/k3-udma-glue.h
index 5d43881e6fb7..9fe3f51c230c 100644
--- a/include/linux/dma/k3-udma-glue.h
+++ b/include/linux/dma/k3-udma-glue.h
@@ -35,8 +35,14 @@ void k3_udma_glue_release_tx_chn(struct k3_udma_glue_tx_channel *tx_chn);
int k3_udma_glue_push_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
struct cppi5_host_desc_t *desc_tx,
dma_addr_t desc_dma);
+int k3_udma_glue_push_tx_chn_batch(struct k3_udma_glue_tx_channel *tx_chn,
+ struct cppi5_host_desc_t **desc_tx,
+ dma_addr_t *desc_dma, u32 batch_size);
int k3_udma_glue_pop_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
dma_addr_t *desc_dma);
+int k3_udma_glue_pop_tx_chn_batch(struct k3_udma_glue_tx_channel *tx_chn,
+ dma_addr_t *desc_dma, u32 *batch_size,
+ u32 max_batch);
int k3_udma_glue_enable_tx_chn(struct k3_udma_glue_tx_channel *tx_chn);
void k3_udma_glue_disable_tx_chn(struct k3_udma_glue_tx_channel *tx_chn);
void k3_udma_glue_tdown_tx_chn(struct k3_udma_glue_tx_channel *tx_chn,
@@ -127,8 +133,14 @@ void k3_udma_glue_tdown_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
int k3_udma_glue_push_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
u32 flow_num, struct cppi5_host_desc_t *desc_tx,
dma_addr_t desc_dma);
+int k3_udma_glue_push_rx_chn_batch(struct k3_udma_glue_rx_channel *rx_chn,
+ u32 flow_num, dma_addr_t desc_dma,
+ u32 batch_size);
int k3_udma_glue_pop_rx_chn(struct k3_udma_glue_rx_channel *rx_chn,
u32 flow_num, dma_addr_t *desc_dma);
+int k3_udma_glue_pop_rx_chn_batch(struct k3_udma_glue_rx_channel *rx_chn,
+ u32 flow_num, dma_addr_t *desc_dma,
+ u32 *batch_size, u32 max_batch);
int k3_udma_glue_rx_flow_init(struct k3_udma_glue_rx_channel *rx_chn,
u32 flow_idx, struct k3_udma_glue_rx_flow_cfg *flow_cfg);
u32 k3_udma_glue_rx_flow_get_fdq_id(struct k3_udma_glue_rx_channel *rx_chn,
--
2.51.1
^ permalink raw reply related
* [RFC PATCH 2/6] soc: ti: k3-ringacc: Add helpers for batch push and pop operations
From: Siddharth Vadapalli @ 2026-03-25 12:38 UTC (permalink / raw)
To: peter.ujfalusi, vkoul, Frank.Li, andrew+netdev, davem, edumazet,
kuba, pabeni, nm, ssantosh, horms, c-vankar, mwalle
Cc: dmaengine, linux-kernel, netdev, linux-arm-kernel, danishanwar,
srk, s-vadapalli
In-Reply-To: <20260325123850.638748-1-s-vadapalli@ti.com>
To allow pushing and popping a batch of descriptors at once to improve
efficiency, introduce two helpers:
1. k3_ringacc_ring_push_batch
2. k3_ringacc_ring_pop_batch
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
drivers/soc/ti/k3-ringacc.c | 88 +++++++++++++++++++++++++++++++
include/linux/soc/ti/k3-ringacc.h | 27 ++++++++++
2 files changed, 115 insertions(+)
diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
index 1751d42ee2d3..33ae7db9c2a1 100644
--- a/drivers/soc/ti/k3-ringacc.c
+++ b/drivers/soc/ti/k3-ringacc.c
@@ -1223,6 +1223,41 @@ int k3_ringacc_ring_push(struct k3_ring *ring, void *elem)
}
EXPORT_SYMBOL_GPL(k3_ringacc_ring_push);
+int k3_ringacc_ring_push_batch(struct k3_ring *ring, void *elem_arr,
+ u32 batch_size)
+{
+ void *elem_ptr, *elem;
+ int ret = 0;
+ u32 i;
+
+ if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+ return -EINVAL;
+
+ if (k3_ringacc_ring_get_free(ring) < batch_size)
+ if (k3_ringacc_ring_get_rt_free(ring) < batch_size)
+ return -ENOMEM;
+
+ dev_dbg(ring->parent->dev, "ring_push_batch: free%d index%d\n",
+ ring->state.free, ring->state.windex);
+
+ for (i = 0; i < batch_size; i++) {
+ elem_ptr = k3_ringacc_get_elm_addr(ring, ring->state.windex);
+ elem = &((dma_addr_t *)elem_arr)[i];
+ memcpy(elem_ptr, elem, (4 << ring->elm_size));
+ if (ring->parent->dma_rings) {
+ u64 *addr = elem_ptr;
+ *addr |= ((u64)ring->asel << K3_ADDRESS_ASEL_SHIFT);
+ }
+ ring->state.windex = (ring->state.windex + 1) % ring->size;
+ }
+
+ ring->state.free -= batch_size;
+ writel(batch_size, &ring->rt->db);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_push_batch);
+
int k3_ringacc_ring_push_head(struct k3_ring *ring, void *elem)
{
int ret = -EOPNOTSUPP;
@@ -1266,6 +1301,59 @@ int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem)
}
EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop);
+int k3_ringacc_ring_pop_batch(struct k3_ring *ring, void *elem_arr,
+ u32 *batch_size, u32 max_batch)
+{
+ void *elem_ptr, *elem;
+ u32 ring_occupancy, i;
+ u32 num_to_pop;
+
+ if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+ return -EINVAL;
+
+ if (!ring->state.occ || ring->state.occ < max_batch)
+ k3_ringacc_ring_update_occ(ring);
+
+ if (!ring->state.occ) {
+ if (likely(!ring->state.tdown_complete))
+ return -ENODATA;
+
+ /* Handle teardown */
+ elem = &((dma_addr_t *)elem_arr)[0];
+ dma_addr_t *value = elem;
+ *value = CPPI5_TDCM_MARKER;
+ writel(K3_DMARING_RT_DB_TDOWN_ACK, &ring->rt->db);
+ ring->state.tdown_complete = false;
+ *batch_size = 1;
+ return 0;
+ }
+
+ ring_occupancy = ring->state.occ;
+ if (ring_occupancy > max_batch)
+ num_to_pop = max_batch;
+ else
+ num_to_pop = ring_occupancy;
+
+ dev_dbg(ring->parent->dev, "ring_pop_batch: occ%d index%d\n",
+ ring->state.occ, ring->state.rindex);
+
+ for (i = 0; i < num_to_pop; i++) {
+ elem_ptr = k3_ringacc_get_elm_addr(ring, ring->state.rindex);
+ elem = &((dma_addr_t *)elem_arr)[i];
+ memcpy(elem, elem_ptr, (4 << ring->elm_size));
+ k3_dmaring_remove_asel_from_elem(elem);
+ ring->state.rindex = (ring->state.rindex + 1) % ring->size;
+ dev_dbg(ring->parent->dev, "occ%d index%d pos_ptr%p\n",
+ ring->state.occ, ring->state.rindex, elem_ptr);
+ }
+ ring->state.occ -= num_to_pop;
+ writel(-1 * num_to_pop, &ring->rt->db);
+ *batch_size = num_to_pop;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_pop_batch);
+
int k3_ringacc_ring_pop_tail(struct k3_ring *ring, void *elem)
{
int ret = -EOPNOTSUPP;
diff --git a/include/linux/soc/ti/k3-ringacc.h b/include/linux/soc/ti/k3-ringacc.h
index 091cf551932d..6fffa65ee760 100644
--- a/include/linux/soc/ti/k3-ringacc.h
+++ b/include/linux/soc/ti/k3-ringacc.h
@@ -220,6 +220,19 @@ u32 k3_ringacc_ring_is_full(struct k3_ring *ring);
*/
int k3_ringacc_ring_push(struct k3_ring *ring, void *elem);
+/**
+ * k3_ringacc_ring_push_batch - push a batch of elements to the ring tail
+ * @ring: pointer on ring
+ * @elem_arr: pointer to array of ring element buffers
+ * @batch_size: count of element buffers to be pushed
+ *
+ * Push the batch of element buffers to the ring tail.
+ *
+ * Returns 0 on success, errno otherwise.
+ */
+int k3_ringacc_ring_push_batch(struct k3_ring *ring, void *elem_arr,
+ u32 batch_size);
+
/**
* k3_ringacc_ring_pop - pop element from the ring head
* @ring: pointer on ring
@@ -232,6 +245,20 @@ int k3_ringacc_ring_push(struct k3_ring *ring, void *elem);
*/
int k3_ringacc_ring_pop(struct k3_ring *ring, void *elem);
+/**
+ * k3_ringacc_ring_pop_batch - pop all elements from the ring head
+ * @ring: pointer on ring
+ * @elem_ar: pointer to array of ring element buffers
+ * @batch_size: pointer to count of elements popped from ring
+ * @max_batch: maximum number of elements to pop
+ *
+ * Pop a batch of element buffers from the ring head.
+ *
+ * Returns 0 on success, errno otherwise.
+ */
+int k3_ringacc_ring_pop_batch(struct k3_ring *ring, void *elem_arr,
+ u32 *batch_size, u32 max_batch);
+
/**
* k3_ringacc_ring_push_head - push element to the ring head
* @ring: pointer on ring
--
2.51.1
^ permalink raw reply related
* [RFC PATCH 1/6] soc: ti: k3-ringacc: Add helper to get realtime count of free elements
From: Siddharth Vadapalli @ 2026-03-25 12:38 UTC (permalink / raw)
To: peter.ujfalusi, vkoul, Frank.Li, andrew+netdev, davem, edumazet,
kuba, pabeni, nm, ssantosh, horms, c-vankar, mwalle
Cc: dmaengine, linux-kernel, netdev, linux-arm-kernel, danishanwar,
srk, s-vadapalli
In-Reply-To: <20260325123850.638748-1-s-vadapalli@ti.com>
The existing helper k3_ringacc_ring_get_free() updates the count of free
elements only when the software maintained counter decrements to zero.
As a result, for batch processing, we may read a lower count of free
elements than the actual count. To address this, introduce a new helper
that provides realtime count of free elements.
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
---
drivers/soc/ti/k3-ringacc.c | 11 +++++++++++
include/linux/soc/ti/k3-ringacc.h | 8 ++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c
index 7602b8a909b0..1751d42ee2d3 100644
--- a/drivers/soc/ti/k3-ringacc.c
+++ b/drivers/soc/ti/k3-ringacc.c
@@ -905,6 +905,17 @@ u32 k3_ringacc_ring_get_free(struct k3_ring *ring)
}
EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_free);
+u32 k3_ringacc_ring_get_rt_free(struct k3_ring *ring)
+{
+ if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
+ return -EINVAL;
+
+ ring->state.free = ring->size - k3_ringacc_ring_read_occ(ring);
+
+ return ring->state.free;
+}
+EXPORT_SYMBOL_GPL(k3_ringacc_ring_get_rt_free);
+
u32 k3_ringacc_ring_get_occ(struct k3_ring *ring)
{
if (!ring || !(ring->flags & K3_RING_FLAG_BUSY))
diff --git a/include/linux/soc/ti/k3-ringacc.h b/include/linux/soc/ti/k3-ringacc.h
index 39b022b92598..091cf551932d 100644
--- a/include/linux/soc/ti/k3-ringacc.h
+++ b/include/linux/soc/ti/k3-ringacc.h
@@ -184,6 +184,14 @@ u32 k3_ringacc_ring_get_size(struct k3_ring *ring);
*/
u32 k3_ringacc_ring_get_free(struct k3_ring *ring);
+/**
+ * k3_ringacc_ring_get_rt_free - get realtime value of free elements
+ * @ring: pointer on ring
+ *
+ * Returns realtime count of free elements in the ring.
+ */
+u32 k3_ringacc_ring_get_rt_free(struct k3_ring *ring);
+
/**
* k3_ringacc_ring_get_occ - get ring occupancy
* @ring: pointer on ring
--
2.51.1
^ permalink raw reply related
* [RFC PATCH 0/6] Descriptor Recycling and Batch processing for CPSW
From: Siddharth Vadapalli @ 2026-03-25 12:38 UTC (permalink / raw)
To: peter.ujfalusi, vkoul, Frank.Li, andrew+netdev, davem, edumazet,
kuba, pabeni, nm, ssantosh, horms, c-vankar, mwalle
Cc: dmaengine, linux-kernel, netdev, linux-arm-kernel, danishanwar,
srk, s-vadapalli
Hello,
NOTE for MAINTAINERS:
Patches in this series span 3 subsystems and I have posted this as an RFC
series to make it easy for the reviewers to understand the complete
implementation. I will eventually split the series and post them
sequentially to the respective subsystem's mailing list:
1. SoC
2. DMAEngine
3. Netdev
Series is based on commit
d1e59a469737 tcp: add cwnd_event_tx_start to tcp_congestion_ops
of the main branch of net-next tree. When I split the series in the
future, I shall base the patches for SoC and DMAEngine on linux-next
and the patches for Netdev on net-next.
This series enables batch processing for the am65-cpsw-nuss.c driver
on the transmit path (ndo_start_xmit and ndo_xdp_xmit) and transmit
completion path. Additionally, this series also recycles descriptors
instead of releasing them to the pool and reallocating them. The
difference in memory footprint without this series and with this series
is hardly noticeable (being under 1 MB).
Feedback on the implementation w.r.t. correctness, ease of use /
maintenance and configurability (sysfs based option for changing batch
size) is appreciated.
Series has been tested in the following combinations to cover edge
cases:
1. Single-Port (CPSW2G on J784S4-EVM)
2. Multi-Port (CPSW3G on AM625-SK)
3. Bidirectional TCP Iperf followed by interfaces being brought down
with traffic in flight (and TX / RX DMA Channel Teardown) followed
by interfaces being brought up and ensuring that Iperf traffic
resumes.
The primary motivation for this series is to improve performance in
terms of lowering the CPU load and achieving higher throughput for
gigabit and multi-gigabit operation.
The upcoming features that I plan to implement are:
1. Enable batch processing on RX
2. Batch processing on ICSSG similar to CPSW (since batch processing
increases latency, it might not be desirable to enable batch
processing and may be skipped as well).
The following sections capture the improvements brought about by this
series.
[1] AM625-SK with CPSW3G (multi-port / two netdevs) and single A53
processor (remaining CPUs are disabled) with each MAC Port operating
at 1 Gbps Full-Duplex.
===========================================================================
Baseline for [1]
===========================================================================
Dual TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
403 Mbps + 408 Mbps = 811 Mbps
Dual RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
336 Mbps + 331 Mbps = 667 Mbps
===========================================================================
With this series for [1]
===========================================================================
Dual TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
428 Mbps + 437 Mbps = 865 Mbps
Dual RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
332 Mbps + 337 Mbps = 669 Mbps
[2] J784S4-EVM with CPSW2G (single-port) and single A72 processor
(remaining CPUs are disabled) with the MAC Port operating at 1 Gbps Full-
Duplex.
===========================================================================
Baseline for [2]
===========================================================================
TX Iperf UDP traffic at 84% CPU Load averaged over 30 seconds:
956 Mbps
RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
941 Mbps
===========================================================================
With this series for [2]
===========================================================================
TX Iperf UDP traffic at 80% CPU Load averaged over 30 seconds:
956 Mbps
RX Iperf TCP traffic at 100% CPU Load averaged over 30 seconds:
941 Mbps
[3] J784S4-EVM with CPSW9G (multi-port) and single A72 processor
(remaining CPUs are disabled) with one MAC Port operating at 5 Gbps
Full-Duplex.
===========================================================================
Baseline for [3]
===========================================================================
TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
1.26 Gbps
RX Iperf TCP traffic at 75% CPU Load averaged over 30 seconds:
1.73 Gbps
===========================================================================
With this series for [3]
===========================================================================
TX Iperf UDP traffic at 100% CPU Load averaged over 30 seconds:
1.28 Gbps
RX Iperf TCP traffic at 75% CPU Load averaged over 30 seconds:
1.75 Gbps
Regards,
Siddharth.
Siddharth Vadapalli (6):
soc: ti: k3-ringacc: Add helper to get realtime count of free elements
soc: ti: k3-ringacc: Add helpers for batch push and pop operations
dmaengine: ti: k3-udma-glue: Add helpers for batch operations on TX/RX
DMA
net: ethernet: ti: am65-cpsw-nuss: Do not set buf_type for SKB
fragments
net: ethernet: ti: am65-cpsw-nuss: Recycle TX and RX CPPI Descriptors
net: ethernet: ti: am65-cpsw-nuss: Enable batch processing for TX / TX
CMPL
drivers/dma/ti/k3-udma-glue.c | 55 +++
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 441 +++++++++++++++++++----
drivers/net/ethernet/ti/am65-cpsw-nuss.h | 31 ++
drivers/soc/ti/k3-ringacc.c | 99 +++++
include/linux/dma/k3-udma-glue.h | 12 +
include/linux/soc/ti/k3-ringacc.h | 35 ++
6 files changed, 612 insertions(+), 61 deletions(-)
--
2.51.1
^ permalink raw reply
* Re: [PATCH v2 08/13] firmware: arm_scmi: Harden clock protocol initialization
From: Cristian Marussi @ 2026-03-25 12:27 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Cristian Marussi, linux-kernel, linux-arm-kernel, arm-scmi,
linux-clk, linux-renesas-soc, sudeep.holla, philip.radford,
james.quinlan, f.fainelli, vincent.guittot, etienne.carriere,
peng.fan, michal.simek, dan.carpenter, geert+renesas,
kuninori.morimoto.gx, marek.vasut+renesas
In-Reply-To: <9b574ac5-09fa-4e7a-b2bb-a339fbb319bc@samsung.com>
On Wed, Mar 25, 2026 at 12:02:41PM +0100, Marek Szyprowski wrote:
> On 10.03.2026 19:40, Cristian Marussi wrote:
> > Add proper error handling on failure to enumerate clocks features or
> > rates.
> >
> > Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
>
Hi Marek,
> This patch landed yesterday in linux-next as commit 0d8b0c8068a8
> ("firmware: arm_scmi: Harden clock protocol initialization"). In my
> tests I found that it causes a regression on RK3568 Odroid-M1 board
> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts), cpufreq and GPU
> device are not probed properly:
>
> # dmesg | grep scmi
> scmi_core: SCMI protocol bus registered
> arm-scmi arm-scmi.0.auto: Using scmi_smc_transport
> arm-scmi arm-scmi.0.auto: SCMI max-rx-timeout: 30ms / max-msg-size:
> 104bytes / max-msg: 20
> scmi_protocol scmi_dev.1: Enabled polling mode TX channel - prot_id:16
> arm-scmi arm-scmi.0.auto: SCMI Notifications - Core Enabled.
> arm-scmi arm-scmi.0.auto: Malformed reply - real_sz:8 calc_sz:4
> (loop_num_ret:1)
> arm-scmi arm-scmi.0.auto: SCMI Protocol v2.0 'rockchip:' Firmware
> version 0x0
> arm-scmi arm-scmi.0.auto: Enabling SCMI Quirk
> [quirk_clock_rates_triplet_out_of_spec]
> scmi-clocks scmi_dev.3: probe with driver scmi-clocks failed with error -22
>
Yes there are multiple reports of issues on this hardening, the series
is on hold and wont go into v7.1 as of now...it needs some basic fixes
and various quirks probably to address non-compliant firmwares...
It will be pushed to next again with a few more fixes in the coming
days and then we'll need to figure out how many quirks will be needed on
top of that and if it is acceptable at all...
Thanks for the report,
Cristian
^ permalink raw reply
* Re: [PATCH v7 0/2] Add support for Variscite DART-MX95 and Sonata board
From: Francesco Dolcini @ 2026-03-25 12:21 UTC (permalink / raw)
To: Frank Li
Cc: Stefano Radaelli, devicetree, linux-kernel, pierluigi.p,
Stefano Radaelli, Shawn Guo, Sascha Hauer,
Pengutronix Kernel Team, Fabio Estevam, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, imx, linux-arm-kernel
In-Reply-To: <acLqjeqmR4eVDoKw@lizhi-Precision-Tower-5810>
On Tue, Mar 24, 2026 at 03:48:29PM -0400, Frank Li wrote:
> On Fri, Mar 13, 2026 at 02:56:03PM -0400, Frank Li wrote:
> > On Fri, Mar 13, 2026 at 06:47:01PM +0100, Stefano Radaelli wrote:
> > > This patch series adds support for the Variscite DART-MX95 system on
> > > module and the Sonata carrier board.
> > >
> > > The series includes:
> > > - Device tree bindings documentation for both SOM and carrier board
> > > - SOM device tree with on-module peripherals
> > > - Sonata carrier board device tree with board-specific features
> > >
> > > The implementation follows the standard SOM + carrier board pattern
> > > where the SOM dtsi contains only peripherals mounted on the module,
> > > while carrier-specific interfaces are enabled in the board dts.
> > >
> > > v7:
> > > - Order nodes in symphony dts
> >
> > Can you run https://github.com/lznuaa/dt-format for new file to easy
> > track and check by script later.
>
> I applied with below change
> - update copywrite to 2026
How do you know that the correct copyright year is 2026?
The copyright years does not change to today just because it is the year
it was submitted.
I think that you should refrain yourself for doing these kind of changes
to the code when you apply patches.
Thanks
Francesco
^ permalink raw reply
* Re: [PATCH] firmware: arm_scmi: clock: Relax check in scmi_clock_protocol_init
From: Cristian Marussi @ 2026-03-25 12:18 UTC (permalink / raw)
To: Sudeep Holla
Cc: Geert Uytterhoeven, Cristian Marussi, Peng Fan (OSS), Jacky Bai,
arm-scmi, linux-arm-kernel, linux-kernel, Peng Fan
In-Reply-To: <20260325-spiffy-analytic-sloth-eabfea@sudeepholla>
On Wed, Mar 25, 2026 at 12:06:15PM +0000, Sudeep Holla wrote:
> On Tue, Mar 24, 2026 at 04:32:55PM +0100, Geert Uytterhoeven wrote:
> > Hi Cristian,
> >
Hi Sudeep,
> > On Tue, 24 Mar 2026 at 15:35, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > On Tue, Mar 24, 2026 at 02:15:36PM +0100, Geert Uytterhoeven wrote:
> > > > On Tue, 24 Mar 2026 at 09:41, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > > > Looking at the spec 3.6.2.5 CLOCK_ATTRIBUTES
> > > > >
> > > > > "This command returns the attributes that are associated with a specific clock. An agent might be allowed access to only
> > > > > a subset of the clocks available in the system. The platform must thus guarantee that clocks that an agent cannot access
> > > > > are not visible to it."
> > > > >
> > > > > ...not sure if this sheds some light or it is ambiguos anyway...I'd say that
> > > > > NOT_FOUND does NOT equate to be invisible...
> > > > >
> > > > > ...BUT at the same time I think that this practice of exposing a non-contiguos
> > > > > set of resources IDs (a set with holes in it) is the a well-known spec-loophole
> > > > > used by many vendors to deploy one single FW image across all of their platforms
> > > > > without having to reconfigure their reosurces IDs ro expose a common set of
> > > > > contiguos IDs like the spec would suggest...
> > > > >
> > > > > Having said that, since we unfortunately left this door open in the
> > > > > implementation, now this loophole has become common practice
> > > > > apparently...
> > > >
> > > > When I first read that paragraph, I was also confused.
> > > > What does "not visible" mean?
> > > > - Not present in the clock ID space exposed to that client of
> > > > the system?
> > > > Yeah, multiple different sequences of contiguous IDs, depending
> > > > on client!
> > >
> > > Yes that is the most spec-compliant interpretation usually; in general
> > > across all protocols the SCMI server, through customized enumeration
> > > results, should provide a per-agent view of the system: this should help
> > > handling shared or virtualized resources, since the agent always see
> > > only the 'illusion' provided by the server...
> > >
> > > ...under this assumption if you dont even need a resource at all (not
> > > RW nor RO) you should NOT even be able to see it...this in turn of
> > > course means that in order to expose a contiguous set of IDs you should
> > > be able to properly configure at build time the FW resources on a per
> > > platform basis...
> > >
> > > > - Return failure on CLOCK_ATTRIBUTES?
> > > > Which is what implementations seem to do.
> > >
> > > Yes this is what is done leveraging the gap in the implementation...I am
> > > not sure that the non-contiguous set of IDs is supported if not by
> > > chance as of now :P (especially in other protocols)
> > >
> > > > The next step in the fun is when the system actually needs to know the
> > > > clock rate of such a clock...
> > >
> > > Well...that seems a bit of wishful thinking ...
> >
> > Unfortunately it is real... [cliffhanger, to be continued... :-]
> >
>
> I am late to the discussion. Based on all the discussion so far, I don't
> want to rush the clk changes from Cristian that adds the hardening yet.
Agreed, the whole series with the iterators and hardening needs fixes from
Geert upfront...
> I won't make it part of my SCMI v7.1 PR. However is it good idea to keep
> it in the next so that we can converge towards some solution in v7.2 ?
>
> So, the question is if I add the fixes from Geert[1] to my queue, will it
> be a good baseline for all these changes and discussions ?
Yes having my iterator series plus Geert fixes on top would be a good
starting point to have a sane kernel...then we'll see how many quirks or
de-Hardening will be needed on top of that to make all firmware
survive...
I will re-test on my side in the coming days but anything that goes on
linux-next from your next of course has a more broad audience...
Thanks,
Cristian
^ permalink raw reply
* Re: [PATCH v2] KVM: arm64: Prevent the host from using an smc with imm16 != 0
From: Fuad Tabba @ 2026-03-25 12:16 UTC (permalink / raw)
To: Sebastian Ene
Cc: Vincent Donnefort, kvmarm, linux-arm-kernel, linux-kernel,
android-kvm, catalin.marinas, joey.gouly, mark.rutland, maz,
oupton, suzuki.poulose, will, yuzenghui
In-Reply-To: <acPJ3Z9hDKc0Sm2t@google.com>
On Wed, 25 Mar 2026 at 11:41, Sebastian Ene <sebastianene@google.com> wrote:
>
> On Wed, Mar 25, 2026 at 11:35:18AM +0000, Vincent Donnefort wrote:
> > On Wed, Mar 25, 2026 at 11:31:38AM +0000, Sebastian Ene wrote:
> > > The ARM Service Calling Convention (SMCCC) specifies that the function
> > > identifier and parameters should be passed in registers, leaving the
> > > 16-bit immediate field of the SMC instruction un-handled.
> > > Currently, our pKVM handler ignores the immediate value, which could lead
> > > to non-compliant software relying on implementation-defined behavior.
> > > Enforce the host kernel running under pKVM to use an immediate value
> > > of 0 by decoding the ISS from the ESR_EL2 and return a not supported
> > > error code back to the caller.
> > >
> > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > ---
> > > v1 -> v2:
> > >
> > > - Dropped injecting an UNDEF and return an error instead
> > > (SMCCC_RET_NOT_SUPPORTED)
> > > - Used the mask ESR_ELx_xVC_IMM_MASK instead of masking with U16_MAX
> > > - Updated the title of the commit message from:
> > > "[PATCH] KVM: arm64: Inject UNDEF when host is executing an
> > > smc with imm16 != 0
> >
> > > ---
> > > arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > index e7790097db93..4ffe30fd8707 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > @@ -762,6 +762,12 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
> > > handle_host_hcall(host_ctxt);
> > > break;
> > > case ESR_ELx_EC_SMC64:
> > > + if (ESR_ELx_xVC_IMM_MASK & esr) {
> > > + cpu_reg(host_ctxt, 0) = SMCCC_RET_NOT_SUPPORTED;
> > > + kvm_skip_host_instr();
> > > + break;
> > > + }
> > > +
> >
> > I wonder if it isn't better to move that into handle_host_smc() as this is part
> > of how we handle the SMC after all? (and it calls that kvm_skip_host_instr()
> > already)
> >
>
> I was thinking of doing that as well but I prefer this since we don't
> have to look again at the esr in the callee.
This is a non-trivial amount of code to have in a switch. Can you
please put it in a helper?
Thanks,
/fuad
>
> > > handle_host_smc(host_ctxt);
> > > break;
> > > case ESR_ELx_EC_IABT_LOW:
> > > --
> > > 2.53.0.1018.g2bb0e51243-goog
> > >
^ permalink raw reply
* Re: [PATCH] firmware: arm_scmi: clock: Relax check in scmi_clock_protocol_init
From: Sudeep Holla @ 2026-03-25 12:14 UTC (permalink / raw)
To: Cristian Marussi
Cc: Peng Fan (OSS), Jacky Bai, arm-scmi, linux-arm-kernel,
Geert Uytterhoeven, linux-kernel, Peng Fan
In-Reply-To: <acJI7o79lVNMjV_d@pluto>
On Tue, Mar 24, 2026 at 08:18:54AM +0000, Cristian Marussi wrote:
> On Tue, Mar 24, 2026 at 07:49:22AM +0000, Sudeep Holla wrote:
> > On Tue, Mar 24, 2026 at 02:24:14PM +0800, Peng Fan (OSS) wrote:
> > > From: Peng Fan <peng.fan@nxp.com>
> > >
>
> Hi,
>
> > > On i.MX95, the SCMI Clock protocol defines several reserved clock IDs that
> > > are not backed by real clock devices
> > > (see arch/arm64/boot/dts/freescale/imx95-clock.h).
> > >
> > > For these reserved IDs, the SCMI firmware correctly returns NOT_FOUND in
> > > response to the CLOCK_ATTRIBUTES command. According to the SCMI Clock
> > > specification, NOT_FOUND is expected when a clock_id does not correspond to
> > > a valid clock device.
> > >
> > > The recent hardening added in scmi_clock_protocol_init() treats any error
> > > return as fatal, causing SCMI clock probe to fail and preventing i.MX9
> > > platforms from booting.
> > >
> > > Relax the check so that -ENOENT is treated as a non-fatal condition.
> > >
> >
> > I understand the use-case and the fix here, but still wonder if this
> > should be treated as quirk or handle it in the core. I am inclined to
> > latter as reserved SCMI clock/resource ID seems to be trend in its usage
> > and hard to classify as quirks.
> >
> > Cristain, agree or have a different view ?
> >
>
> I was just replying...
>
> Looking at the spec 3.6.2.5 CLOCK_ATTRIBUTES
>
> "This command returns the attributes that are associated with a specific
> clock. An agent might be allowed access to only a subset of the clocks
> available in the system. The platform must thus guarantee that clocks that
> an agent cannot access are not visible to it."
>
> ...not sure if this sheds some light or it is ambiguos anyway...I'd say that
> NOT_FOUND does NOT equate to be invisible...
>
> ...BUT at the same time I think that this practice of exposing a non-contiguos
> set of resources IDs (a set with holes in it) is the a well-known spec-loophole
> used by many vendors to deploy one single FW image across all of their platforms
> without having to reconfigure their reosurces IDs ro expose a common set of
> contiguos IDs like the spec would suggest...
>
> Having said that, since we unfortunately left this door open in the
> implementation, now this loophole has become common practice
> apparently...
>
Indeed, I agree. I wasn't fully convinced to relax and this discussion
seem to strive towards right direction IMO.
> ...I am more concerned about the impact that this COULD have on underlying
> resources allocations that at the driver level was certainly conceived
> to manage a contiguos set of IDs, even though it was certanly the
> usecase until my harden patches...so it could be safe but to be double
> checked...
>
> Quirk or core I suppose it depends on how much we want to 'legalize' this
> trick for the future (thinking also about other protocols)...
>
Agreed. I am more nervous about legalising it. Let us see what are the
issues with the firmware in the wild before we can decide. That's one reason
I would like keep your changes in -next even though I have now dropped the
idea of pushing in for v7.1
> ...a middle ground could be to implement it as an always-on quirk that
> matches any FW (like the existing out_of_spec_triplet quirk)... as a way
> to keep on allowing the existing behaviour but sort of discouraging it...
>
Fair enough, worth an attempt at the least.
> Not really strong opinions about this, BUT at this point, in general I
> would keep all of this series on hold for further testing, given these
> issues together the bugs fixed by Geert on iterators and the lack of
> clock-MAINTs acks...
>
Agreed.
> ...also Geert was investigating the need of a different quirks in these
> regards for the Renesas boards...
>
Yes still waiting for the outcome from the cliffhanger Geert left us 😉.
Just kidding take your time Geert, no rush.
--
Regards,
Sudeep
^ permalink raw reply
* Re: [PATCH v2] KVM: arm64: Prevent the host from using an smc with imm16 != 0
From: Will Deacon @ 2026-03-25 12:12 UTC (permalink / raw)
To: Marc Zyngier
Cc: Sebastian Ene, Vincent Donnefort, kvmarm, linux-arm-kernel,
linux-kernel, android-kvm, catalin.marinas, joey.gouly,
mark.rutland, oupton, suzuki.poulose, tabba, yuzenghui
In-Reply-To: <86ldfg3ze2.wl-maz@kernel.org>
On Wed, Mar 25, 2026 at 11:46:29AM +0000, Marc Zyngier wrote:
> On Wed, 25 Mar 2026 11:35:18 +0000,
> Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > On Wed, Mar 25, 2026 at 11:31:38AM +0000, Sebastian Ene wrote:
> > > The ARM Service Calling Convention (SMCCC) specifies that the function
> > > identifier and parameters should be passed in registers, leaving the
> > > 16-bit immediate field of the SMC instruction un-handled.
> > > Currently, our pKVM handler ignores the immediate value, which could lead
> > > to non-compliant software relying on implementation-defined behavior.
> > > Enforce the host kernel running under pKVM to use an immediate value
> > > of 0 by decoding the ISS from the ESR_EL2 and return a not supported
> > > error code back to the caller.
> > >
> > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > ---
> > > v1 -> v2:
> > >
> > > - Dropped injecting an UNDEF and return an error instead
> > > (SMCCC_RET_NOT_SUPPORTED)
> > > - Used the mask ESR_ELx_xVC_IMM_MASK instead of masking with U16_MAX
> > > - Updated the title of the commit message from:
> > > "[PATCH] KVM: arm64: Inject UNDEF when host is executing an
> > > smc with imm16 != 0
> >
> > > ---
> > > arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > index e7790097db93..4ffe30fd8707 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > @@ -762,6 +762,12 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
> > > handle_host_hcall(host_ctxt);
> > > break;
> > > case ESR_ELx_EC_SMC64:
> > > + if (ESR_ELx_xVC_IMM_MASK & esr) {
> > > + cpu_reg(host_ctxt, 0) = SMCCC_RET_NOT_SUPPORTED;
> > > + kvm_skip_host_instr();
> > > + break;
> > > + }
> > > +
> >
> > I wonder if it isn't better to move that into handle_host_smc() as this is part
> > of how we handle the SMC after all? (and it calls that kvm_skip_host_instr()
> > already)
>
> Yes, that'd be vastly better.
>
> It also begs the question: if you don't want to handle SMCs with a
> non-zero immediate, why is it OK to do it for HVCs?
I suppose the difference is that the HVC API is a private interface
between EL2 and the host. As it stands, EL2 ignores the immediate but we
don't have a way to know how EL3 responds to the immediate for an SMC.
When proxying an SMC from the host, EL2 therefore has three choices:
1. Ignore the immediate from the host and always use zero when talking
to EL3. That's the current behaviour, but it could theoretically
lead to problems if EL3 is using the immediate for something.
2. Propagate the immediate from the host. That should work, but it's
a bit involved.
3. Reject non-zero immediates (this patch).
Will
^ permalink raw reply
* Re: [PATCH] firmware: arm_scmi: clock: Relax check in scmi_clock_protocol_init
From: Sudeep Holla @ 2026-03-25 12:06 UTC (permalink / raw)
To: Geert Uytterhoeven
Cc: Cristian Marussi, Peng Fan (OSS), Jacky Bai, arm-scmi,
linux-arm-kernel, linux-kernel, Peng Fan
In-Reply-To: <CAMuHMdW==oMpCS==L8ADexPC=4uczNwK5UD6DequBjGOKQfOZw@mail.gmail.com>
On Tue, Mar 24, 2026 at 04:32:55PM +0100, Geert Uytterhoeven wrote:
> Hi Cristian,
>
> On Tue, 24 Mar 2026 at 15:35, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > On Tue, Mar 24, 2026 at 02:15:36PM +0100, Geert Uytterhoeven wrote:
> > > On Tue, 24 Mar 2026 at 09:41, Cristian Marussi <cristian.marussi@arm.com> wrote:
> > > > Looking at the spec 3.6.2.5 CLOCK_ATTRIBUTES
> > > >
> > > > "This command returns the attributes that are associated with a specific clock. An agent might be allowed access to only
> > > > a subset of the clocks available in the system. The platform must thus guarantee that clocks that an agent cannot access
> > > > are not visible to it."
> > > >
> > > > ...not sure if this sheds some light or it is ambiguos anyway...I'd say that
> > > > NOT_FOUND does NOT equate to be invisible...
> > > >
> > > > ...BUT at the same time I think that this practice of exposing a non-contiguos
> > > > set of resources IDs (a set with holes in it) is the a well-known spec-loophole
> > > > used by many vendors to deploy one single FW image across all of their platforms
> > > > without having to reconfigure their reosurces IDs ro expose a common set of
> > > > contiguos IDs like the spec would suggest...
> > > >
> > > > Having said that, since we unfortunately left this door open in the
> > > > implementation, now this loophole has become common practice
> > > > apparently...
> > >
> > > When I first read that paragraph, I was also confused.
> > > What does "not visible" mean?
> > > - Not present in the clock ID space exposed to that client of
> > > the system?
> > > Yeah, multiple different sequences of contiguous IDs, depending
> > > on client!
> >
> > Yes that is the most spec-compliant interpretation usually; in general
> > across all protocols the SCMI server, through customized enumeration
> > results, should provide a per-agent view of the system: this should help
> > handling shared or virtualized resources, since the agent always see
> > only the 'illusion' provided by the server...
> >
> > ...under this assumption if you dont even need a resource at all (not
> > RW nor RO) you should NOT even be able to see it...this in turn of
> > course means that in order to expose a contiguous set of IDs you should
> > be able to properly configure at build time the FW resources on a per
> > platform basis...
> >
> > > - Return failure on CLOCK_ATTRIBUTES?
> > > Which is what implementations seem to do.
> >
> > Yes this is what is done leveraging the gap in the implementation...I am
> > not sure that the non-contiguous set of IDs is supported if not by
> > chance as of now :P (especially in other protocols)
> >
> > > The next step in the fun is when the system actually needs to know the
> > > clock rate of such a clock...
> >
> > Well...that seems a bit of wishful thinking ...
>
> Unfortunately it is real... [cliffhanger, to be continued... :-]
>
I am late to the discussion. Based on all the discussion so far, I don't
want to rush the clk changes from Cristian that adds the hardening yet.
I won't make it part of my SCMI v7.1 PR. However is it good idea to keep
it in the next so that we can converge towards some solution in v7.2 ?
So, the question is if I add the fixes from Geert[1] to my queue, will it
be a good baseline for all these changes and discussions ?
--
Regards,
Sudeep
^ permalink raw reply
* Re: [LSF/MM/BPF TOPIC] Per-process page size
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 12:04 UTC (permalink / raw)
To: Dev Jain
Cc: lsf-pc, ryan.roberts, catalin.marinas, will, ardb, willy, hughd,
baolin.wang, akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, linux-mm, linux-arm-kernel, linux-kernel
In-Reply-To: <20260217145026.3880286-1-dev.jain@arm.com>
Sorry if I repeat points others have raised, but I've just not had the time to
look at LSF/MM topics.
On Tue, Feb 17, 2026 at 08:20:26PM +0530, Dev Jain wrote:
> Hi everyone,
>
> We propose per-process page size on arm64. Although the proposal is for
> arm64, perhaps the concept can be extended to other arches, thus the
> generic topic name.
Well it's inevitably going to interact with core mm right and be super invasive?
So it's not just an arm64-only thing unless you feel you can keep it all in
arch/arm?
So you're asking essentially core mm to be radically altered to support an
arch-dependent feature.
I think we should be clear about that.
>
> -------------
> INTRODUCTION
> -------------
> While mTHP has brought the performance of many workloads running on an arm64 4K
Honestly this whole suggestion to me - instinctively - feels like 'mTHP doesn't
do what we want, so let's work around it'.
Wouldn't it be better to improve mTHP to provide what you are looking for?
> kernel closer to that of the performance on an arm64 64K kernel, a performance
> gap still remains. This is attributed to a combination of greater number of
> pgtable levels, less reach within the walk cache and higher data cache footprint
> for pgtable memory. At the same time, 64K is not suitable for general
This all sounds like things mTHP can address.
> purpose environments due to it's significantly higher memory footprint.
>
> To solve this, we have been experimenting with a concept called "per-process
> page size". This breaks the historic assumption of a single page size for the
> entire system: a process will now operate on a page size ABI that is greater
Fundamentally you'd be adding a great deal more complexity into mm, so the
trade-off has to be really excellent for that to be worthwhile.
It also essentially, to me, spells a giving up on mTHP ever being something that
automatically gives us what we want in these regards.
> than or equal to the kernel's page size. This is enabled by a key architectural
> feature on Arm: the separation of user and kernel page tables.
Isn't that true of every architecture? I mean we map userland and kernel memory
separately everywhere right?
Presumably you mean that arm64 has a feature whereby there are always entirely
separate page table structures for each.
But the kernel _regularly_ manipulates userland memory mappings, directly via
uaccess, via GUP, page table set up and manipulation via rmap etc. etc. etc.
uaccess means the kernel uses the same page tables that userland uses.
So is the proposed mechanism contpte or actually having a base page size at
higher granularity? Presumably the latter.
GUP fast as well is another concern - how can that be kept working with all
this?
And TLB maintenance, now you're dealing with fundamentally different
granularity, the kernel will have to surely become 'enlightened' somehow
(IOW - radically altered adding further complexity and bug risk) to handle
that?
>
> This can also lead to a future of a single kernel image instead of 4K, 16K
> and 64K images.
Except then every bit of kernel memory is at what 4K? Surely there's perf impact
there too..
I wonder about how on earth pageblocks are supposed to work in this
mechanism. Because now the whole heuristic means of avoiding fragmentation for
PMD sized pages won't work anywhere, because any given process might need more.
And pageblocks are currently broken for 64KB page size anyway, as the reserves
required are huge due to 64KB PMD size being gigantic (512GB isn't it?)
And then, there's the page cache :) but I guess we address that below.
I do wonder how this is supposed to interact with THP and mTHP which now
will have to maintain entirely separate PMD sizes depending on process
surely?
I mean correct me if I've misinterpreted and you intend to implement this
using contpte's or something, but if you are I'd ask why don't we just
improve mTHP?
>
> --------------
> CURRENT DESIGN
> --------------
> The design is based on one core idea; most of the kernel continues to believe
> there is only one page size in use across the whole system. That page size is
> the size selected at compile-time, as is done today. But every process (more
> accurately mm_struct) has a page size ABI which is one of the 3 page sizes
> (4K, 16K or 64K) as long as that page size is greater than or equal to the
> kernel page size (kernel page size is the macro PAGE_SIZE).
>
> Pagesize selection
> ------------------
> A process' selected page size ABI comes into force at execve() time and
> remains fixed until the process exits or until the next execve(). Any forked
> processes inherit the page size of their parent.
> The personality() mechanism already exists for similar cases, so we propose
> to extend it to enable specifying the required page size.
Well it's designed for providing basic compatibility layers and address
space limits and so on, this feels like something _hugely_ more impactful.
Doesn't personality() _immediately_ change the current process's state? I
don't think we should implement that that way.
If we were to do this, I'd say something like a clone flag would be better
maybe? or an elf setting.
I also wonder the degree to which kernel mechanisms already assume
PAGE_SIZE == userland PAGE_SIZE.
We'd have to rework all of the page table walkers for one no?
>
> There are 3 layers to the design. The first two are not arch-dependent,
> and makes Linux support a per-process pagesize ABI. The last layer is
> arch-specific.
>
> 1. ABI adapter
> --------------
> A translation layer is added at the syscall boundary to convert between the
> process page size and the kernel page size. This effectively means enforcing
> alignment requirements for addresses passed to syscalls and ensuring that
> quantities passed as “number of pages” are interpreted relative to the process
> page size and not the kernel page size. In this way the process has the illusion
> that it is working in units of its page size, but the kernel is working in
> units of the kernel page size.
I mean I think we'd also need to radically alter a lot of other things. But
this is already adding a bunch of complexity and ways for things to be
buggy and broken.
>
> 2. Generic Linux MM enlightenment
> ---------------------------------
> We enlighten the Linux MM code to always hand out memory in the granularity
> of process pages. Most of this work is greatly simplified because of the
> existing mTHP allocation paths, and the ongoing support for large folios
> across different areas of the kernel. The process order will be used as the
> hard minimum mTHP order to allocate.
I mean, this is again another radical change everywhere. I don't understand
'the process order will be used as the hard minimum mTHP order to
allocate'.
So now we're allocating all physical memory for these processes using mTHP
somehow? How does the mTHP mechanism interact with this?
What will `cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size` show?
And HugePageSize in /proc/meminfo?
Presumably:
sysconf(_SC_PAGESIZE); / getconf PAGE_SIZE/PAGESIZE
Also there's the thorny thing of in /proc/$pid/smaps 'MMUPageSize'
vs. 'KernelPageSize', which was instituted to handle some PPC-specific I
think variation in the two.
This was updated after recent discussions by David at:
https://lore.kernel.org/linux-mm/20260306081916.38872-1-david@kernel.org/
"KernelPageSize" always corresponds to "MMUPageSize", except when a larger
kernel page size is emulated on a system with a smaller page size used by the
MMU, which is the case for some PPC64 setups with hugetlb."
This will be invalidated if we change these, but surely you'd have to?
Also w.r.t. hugetlb how will PMD sharing work across processes with
different page size?
And what about drivers that allocate PAGE_SIZE memory to share with
userland? This does happen, or at the wrong granularity?
How can they be 'educated' to allocate the right amount of memory? I don't
see a way around one that actually.
Also what about bounce buffers for e.g. /proc/kcore?
And talking about proc, what about procfs interfaces that count per-page,
except they won't be counting per-page any more, they'll be counting
per-kernel page...?
What about bpf hooks that rely on page size? Surely we're potentially
breaking things there?
(Sorry just asking questions as they come up in my mind)
THP now surely has multiple different PMD sizes to contend with, but is
also being used to provide the mTHP-sized folios to the process?...
I mean unless you intend that these processes are just implementing page
table size via contpte only, and keep userland page size = kernel page size
except for this, but then the question again becomes 'what in mTHP is not
working such that you feel you need to do this'?
>
> File memory
> -----------
> For a growing list of compliant file systems, large folios can already be
> stored in the page cache. There is even a mechanism, introduced to support
> filesystems with block sizes larger than the system page size, to set a
> hard-minimum size for folios on a per-address-space basis. This mechanism
> will be reused and extended to service the per-process page size requirements.
>
> One key reason that the 64K kernel currently consumes considerably more memory
> than the 4K kernel is that Linux systems often have lots of small
> configuration files which each require a page in the page cache. But these
> small files are (likely) only used by certain processes. So, we prefer to
> continue to cache those using a 4K page.
> Therefore, if a process with a larger page size maps a file whose pagecache
> contains smaller folios, we drop them and re-read the range with a folio
> order at least that of the process order.
Yeah this is surely terrible? One process can cause evictions from the page
cache for others, then when they get scheduled they drop them again and
again over and over and over just because files are being accessed by
different processes no?
I don't think this is at all workable, unless I'm missing something?
This feels like you need a complete rework of the page cache to implement
this really, and I wonder at that being a realistic goal.
>
> 3. Translation from Linux pagetable to native pagetable
> -------------------------------------------------------
> Assume the case of a kernel pagesize of 4K and app pagesize of 64K.
> Now that enlightenment is done, it is guaranteed that every single mapping
> in the 4K pagetable (which we call the Linux pagetable) is of granularity
> at least 64K. In the arm64 MM code, we maintain a "native" pagetable per
> mm_struct, which is based off a 64K geometry. Because of the guarantee
I mean yeah, this 'enlightenment' feels a lot more like - radically
altering a ton of mm code and never quite being sure it will all work :)
> aforementioned, any pagetable operation on the Linux pagetable
> (set_ptes, clear_flush_ptes, modify_prot_start_ptes, etc) is going to happen
> at a granularity of at least 16 PTEs - therefore we can translate this
> operation to modify a single PTE entry in the native pagetable.
> Given that enlightenment may miss corner cases, we insert a warning in the
> architecture code - on being presented with an operation not translatable
Yeah this is already hugely concerning, introducing subtle ways to break
things.
> into a native operation, we fallback to the Linux pagetable, thus losing
> the benefits borne out of the pagetable geometry but keeping
> the emulation intact.
I really dislike the idea of maintaining an 'emulation' between userland
and the kernel on a fundamental basis. The kernel is supposed to deal with
the system 'as it is'.
Adding layers of abstraction just adds ever more possibilities of ever more
subtle and difficult-to-solve bugs.
>
> -----------------------
> What we want to discuss
> -----------------------
I think we also need to discuss whether this is generally viable or a good
idea :)
I suppose that is implicit.
> - Are there other arches which could benefit from this?
> - What level of compatibility we can achieve - is it even possible to
> contain userspace within the emulated ABI?
> - Rough edges of compatibility layer - pfnmaps, ksm, procfs, etc. For
> example, what happens when a 64K process opens a procfs file of
> a 4K process?
> - native pgtable implementation - perhaps inspiration can be taken
> from other arches with an involved pgtable logic (ppc, s390)?
>
> -------------
> Key Attendees
> -------------
> - Ryan Roberts (co-presenter)
> - mm folks (David Hildenbrand, Matthew Wilcox, Liam Howlett, Lorenzo Stoakes,
> and many others)
Thanks :)
> - arch folks
Overall this feels like a workaround for mTHP not doing what you want, and
to me that speaks to us needing to improve mTHP rather than add a ton of
complexity.
I fear we are only touching the tip of the iceberg of ways in which this
could break.
Regards, Lorenzo
^ permalink raw reply
* Re: [PATCH 09/10] PCI: Align head space better
From: Ilpo Järvinen @ 2026-03-25 12:04 UTC (permalink / raw)
To: linux-pci
Cc: Bjorn Helgaas, Guenter Roeck, linux-alpha, linux-arm-kernel,
linux-m68k, linux-mips, linux-parisc, linuxppc-dev, linux-s390,
linux-sh, Russell King, Geert Uytterhoeven, Thomas Bogendoerfer,
James E.J. Bottomley, Helge Deller, Michael Ellerman,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Chris Zankel, Max Filippov, Madhavan Srinivasan,
Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
Nicholas Piggin, Christophe Leroy (CS GROUP), x86, LKML
In-Reply-To: <20260324165633.4583-10-ilpo.jarvinen@linux.intel.com>
[-- Attachment #1: Type: text/plain, Size: 2856 bytes --]
On Tue, 24 Mar 2026, Ilpo Järvinen wrote:
> When a bridge window contains big and small resource(s), the small
> resource(s) may not amount to the half of the size of the big resource
> which would allow calculate_head_align() to shrink the head alignment.
> This results in always placing the small resource(s) after the big
> resource.
>
> In general, it would be good to be able to place the small resource(s)
> before the big resource to achieve better utilization of the address
> space. In the cases where the large resource can only fit at the end
> of the window, it is even required.
>
> However, carrying the information over from pbus_size_mem() and
> calculate_head_align() to __pci_assign_resource() and
> pcibios_align_resource() is not easy with the current data structures.
>
> A somewhat hacky way to move the non-aligning tail part to the head is
> possible within pcibios_align_resource(). The free space between the
> start of the free space span and the aligned start address can be
> compared with the non-aligning remainder of the size. If the free space
> is larger than the remainder, placing the remainder before the start
> address is possible. This relocation should generally work, because PCI
> resources consist only power-of-2 atoms.
>
> Various arch requirements may still need to override the relocation, so
> the relocation is only applied selectively in such cases.
>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221205
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
> arch/arm/kernel/bios32.c | 3 +++
> arch/m68k/kernel/pcibios.c | 4 ++++
> arch/mips/pci/pci-generic.c | 3 +++
> arch/mips/pci/pci-legacy.c | 2 ++
> arch/parisc/kernel/pci.c | 3 +++
> arch/powerpc/kernel/pci-common.c | 2 ++
> arch/sh/drivers/pci/pci.c | 2 ++
> arch/x86/pci/i386.c | 2 ++
> arch/xtensa/kernel/pci.c | 2 ++
> drivers/pci/setup-res.c | 39 +++++++++++++++++++++++++++++++-
> include/linux/pci.h | 5 ++++
> kernel/resource.c | 2 +-
> 12 files changed, 67 insertions(+), 2 deletions(-)
> diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
> index 7a0522316ee3..994c3bd36ef2 100644
> --- a/arch/sh/drivers/pci/pci.c
> +++ b/arch/sh/drivers/pci/pci.c
> @@ -185,6 +185,8 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res,
> */
> if (start & 0x300)
> start = (start + 0x3ff) & ~0x3ff;
> + } else (res->flags & IORESOURCE_MEM) {
I'll be adding the missing if into this in v2 (found by sashiko). It seems
lkp didn't test this so it was not caught earlier.
> + start = pci_align_resource(dev, res, empty_res, size, align);
> }
>
> return start;
--
i.
^ permalink raw reply
* Re: [PATCH 02/12] bus: fsl-mc: use generic driver_override infrastructure
From: Ioana Ciornei @ 2026-03-25 12:01 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki, Nipun Gupta,
Nikhil Agarwal, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li, Bjorn Helgaas, Armin Wolf, Bjorn Andersson,
Mathieu Poirier, Vineeth Vijayan, Peter Oberparleiter,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Harald Freudenberger,
Holger Dengler, Mark Brown, Michael S. Tsirkin, Jason Wang,
Xuan Zhuo, Eugenio Pérez, Alex Williamson,
Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
Christophe Leroy (CS GROUP), linux-kernel, driver-core,
linuxppc-dev, linux-hyperv, linux-pci, platform-driver-x86,
linux-arm-msm, linux-remoteproc, linux-s390, linux-spi,
virtualization, kvm, xen-devel, linux-arm-kernel, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-3-dakr@kernel.org>
On Tue, Mar 24, 2026 at 01:59:06AM +0100, Danilo Krummrich wrote:
> When a driver is probed through __driver_attach(), the bus' match()
> callback is called without the device lock held, thus accessing the
> driver_override field without a lock, which can cause a UAF.
>
> Fix this by using the driver-core driver_override infrastructure taking
> care of proper locking internally.
>
> Note that calling match() from __driver_attach() without the device lock
> held is intentional. [1]
>
> Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
> Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
> Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
^ permalink raw reply
* Re: [PATCH v2] KVM: arm64: Prevent the host from using an smc with imm16 != 0
From: Marc Zyngier @ 2026-03-25 11:58 UTC (permalink / raw)
To: Sebastian Ene
Cc: Vincent Donnefort, kvmarm, linux-arm-kernel, linux-kernel,
android-kvm, catalin.marinas, joey.gouly, mark.rutland, oupton,
suzuki.poulose, tabba, will, yuzenghui
In-Reply-To: <acPJ3Z9hDKc0Sm2t@google.com>
On Wed, 25 Mar 2026 11:41:17 +0000,
Sebastian Ene <sebastianene@google.com> wrote:
>
> On Wed, Mar 25, 2026 at 11:35:18AM +0000, Vincent Donnefort wrote:
> > On Wed, Mar 25, 2026 at 11:31:38AM +0000, Sebastian Ene wrote:
> > > The ARM Service Calling Convention (SMCCC) specifies that the function
> > > identifier and parameters should be passed in registers, leaving the
> > > 16-bit immediate field of the SMC instruction un-handled.
> > > Currently, our pKVM handler ignores the immediate value, which could lead
> > > to non-compliant software relying on implementation-defined behavior.
> > > Enforce the host kernel running under pKVM to use an immediate value
> > > of 0 by decoding the ISS from the ESR_EL2 and return a not supported
> > > error code back to the caller.
> > >
> > > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > > ---
> > > v1 -> v2:
> > >
> > > - Dropped injecting an UNDEF and return an error instead
> > > (SMCCC_RET_NOT_SUPPORTED)
> > > - Used the mask ESR_ELx_xVC_IMM_MASK instead of masking with U16_MAX
> > > - Updated the title of the commit message from:
> > > "[PATCH] KVM: arm64: Inject UNDEF when host is executing an
> > > smc with imm16 != 0
> >
> > > ---
> > > arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > index e7790097db93..4ffe30fd8707 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > @@ -762,6 +762,12 @@ void handle_trap(struct kvm_cpu_context *host_ctxt)
> > > handle_host_hcall(host_ctxt);
> > > break;
> > > case ESR_ELx_EC_SMC64:
> > > + if (ESR_ELx_xVC_IMM_MASK & esr) {
> > > + cpu_reg(host_ctxt, 0) = SMCCC_RET_NOT_SUPPORTED;
> > > + kvm_skip_host_instr();
> > > + break;
> > > + }
> > > +
> >
> > I wonder if it isn't better to move that into handle_host_smc() as this is part
> > of how we handle the SMC after all? (and it calls that kvm_skip_host_instr()
> > already)
> >
>
> I was thinking of doing that as well but I prefer this since we don't
> have to look again at the esr in the callee.
I don't see that obtaining ESR_EL2 is that hard or expensive. Given
that everything is in the same compilation unit, the compiler will
probably optimise it.
But if I have to state the obvious: you are *handling* an SMC
instruction. Surely handle_host_smc() is the correct spot for that
job?
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply
* Re: [PATCH v5 phy-next 10/27] scsi: ufs: qcom: keep parallel track of PHY power state
From: Vladimir Oltean @ 2026-03-25 11:57 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: linux-phy, Vinod Koul, Neil Armstrong, dri-devel, freedreno,
linux-arm-kernel, linux-arm-msm, linux-can, linux-gpio, linux-ide,
linux-kernel, linux-media, linux-pci, linux-renesas-soc,
linux-riscv, linux-rockchip, linux-samsung-soc, linux-scsi,
linux-sunxi, linux-tegra, linux-usb, netdev, spacemit,
UNGLinuxDriver, James E.J. Bottomley, Martin K. Petersen,
Nitin Rawat
In-Reply-To: <vu3cxpynr5mu2fzkrtmjcwijc5jz323wlnbc3r7lp2wxqmhydx@z5xhgf4myw2d>
On Wed, Mar 25, 2026 at 05:21:14PM +0530, Manivannan Sadhasivam wrote:
> I believe I added the power_count check for phy_exit(). But since that got
> moved, the check becomes no longer necessary.
FYI, the power_count keeps track of the balance of phy_power_on() and
phy_power_off() calls, whereas it is the init_count keeps track of
phy_init() and phy_exit() calls. They are only related to the extent
that you must respect the phy_init() -> phy_power_on() -> phy_power_off()
-> phy_exit() sequence. But in any case, both should be considered
PHY-internal fields. The "Order of API calls" section from
Documentation/driver-api/phy/phy.rst mentions the order that I just
described above, and consumers should just ensure they follow that.
^ permalink raw reply
* [PATCH 3/3] arm64: dts: freescale: imx95-toradex-smarc: Use gpio-hog for WIFI_UART_EN
From: Franz Schnyder @ 2026-03-25 11:55 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Frank Li,
Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam
Cc: Franz Schnyder, devicetree, imx, linux-arm-kernel, linux-kernel,
Francesco Dolcini
In-Reply-To: <20260325-mainline-update-imx95-v1-0-b5ebe976655b@toradex.com>
From: Franz Schnyder <franz.schnyder@toradex.com>
On the Toradex SMARC iMX95, the WiFi UART signals are shared with the
JTAG. The WIFI_UART_EN signal is used to select between these
two functions.
Configure the signal as gpio-hog and drive it high to select the UART
function by default. Add a label to override the hog in derived
device trees.
Signed-off-by: Franz Schnyder <franz.schnyder@toradex.com>
---
arch/arm64/boot/dts/freescale/imx95-toradex-smarc.dtsi | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/arm64/boot/dts/freescale/imx95-toradex-smarc.dtsi b/arch/arm64/boot/dts/freescale/imx95-toradex-smarc.dtsi
index a90edefc5197..29e3f5bf867b 100644
--- a/arch/arm64/boot/dts/freescale/imx95-toradex-smarc.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx95-toradex-smarc.dtsi
@@ -451,6 +451,13 @@ som_gpio_expander_1: gpio@21 {
"",
"",
"SMARC_SDIO_WP";
+
+ wifi_uart_en: wifi-uart-en-hog {
+ gpio-hog;
+ gpios = <12 GPIO_ACTIVE_HIGH>;
+ line-name = "WIFI_UART_EN";
+ output-high;
+ };
};
embedded-controller@28 {
--
2.43.0
^ permalink raw reply related
* Re: [PATCH v5 phy-next 10/27] scsi: ufs: qcom: keep parallel track of PHY power state
From: Manivannan Sadhasivam @ 2026-03-25 11:51 UTC (permalink / raw)
To: Vladimir Oltean
Cc: linux-phy, Vinod Koul, Neil Armstrong, dri-devel, freedreno,
linux-arm-kernel, linux-arm-msm, linux-can, linux-gpio, linux-ide,
linux-kernel, linux-media, linux-pci, linux-renesas-soc,
linux-riscv, linux-rockchip, linux-samsung-soc, linux-scsi,
linux-sunxi, linux-tegra, linux-usb, netdev, spacemit,
UNGLinuxDriver, James E.J. Bottomley, Martin K. Petersen,
Nitin Rawat
In-Reply-To: <20260325114309.3k7xkfrffpxp5xq4@skbuf>
On Wed, Mar 25, 2026 at 01:43:09PM +0200, Vladimir Oltean wrote:
> On Tue, Mar 24, 2026 at 11:00:10AM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Mar 20, 2026 at 12:32:24AM +0200, Vladimir Oltean wrote:
> > > As explained in the similar ufs-exynos.c change, PHY consumer drivers
> > > should not look at the phy->power_count, because in the general case
> > > there might also be other consumers who have called phy_power_on() too,
> > > so the fact that the power_count is non-zero does not mean that we did.
> > >
> > > Moreover, struct phy will become opaque soon, so the qcom UFS driver
> > > will not be able to apply this pattern. Keep parallel track of the PHY
> > > power state, instead of looking at a field which will become unavailable
> > > (phy->power_count).
> > >
> > > About treating the phy_power_off() return code: from an API perspective,
> > > this should have probably returned void, otherwise consumers would be
> > > stuck in a state they can't escape. The provider, phy-qcom-qmp-ufs.c,
> > > does return 0 in its power_off() implementation. I consider it safe to
> > > discard potential errors from phy_power_off() instead of complicating
> > > the phy_powered_on logic.
> > >
> >
> > You could even simplify the code by getting rid of the 'phy_powered_on' check
> > altogether. There is no real need to track the PHY power state in this driver.
> > It is safe to call phy_power_off() without any checks.
> >
> > - Mani
>
> Ok.. as the author of commit 7bac65687510 ("scsi: ufs: qcom: Power off
> the PHY if it was already powered on in ufs_qcom_power_up_sequence()"),
> I assume you have hardware to test. Would you mind writing a patch that
> I could pick up to replace this one with?
>
Sure, will do.
> I suppose that the power_count test is somehow no longer necessary after
> commit 77d2fa54a945 ("scsi: ufs: qcom : Refactor phy_power_on/off
> calls"), but frankly I don't see it - the ufshcd state machine is a bit
> too complicated for me to just statically analyze.
I believe I added the power_count check for phy_exit(). But since that got
moved, the check becomes no longer necessary.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox