* Re: [PATCH v4 4/6] pds_core: add PLDM component info display
From: Simon Horman @ 2026-06-15 16:08 UTC (permalink / raw)
To: nikhil.rao
Cc: Simon Horman, netdev, brett.creeley, andrew+netdev, davem,
edumazet, kuba, pabeni, eric.joyner
In-Reply-To: <20260614050052.1048328-5-nikhil.rao@amd.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
Simon says: This review seems to be a bit of a mixed bag.
I've annotated the issues it raises that I'd appreciate
you looking over, and the rest, which I don't feel fall
in that category.
---
pds_core: add PLDM component info display
This patch reports per-component firmware versions through devlink info on
v2+ devices, replacing the single fw running version with names derived
from the firmware-supplied component list and slot/flag fields.
Simon says: I'm not particularly concerned about the issue raised below -
I'm forwarding this review because of others raised further down.
[Low]
Before going to the diff, this patch immediately renames user-visible
component names introduced in the immediately preceding commit in the
same series ("pds_core: add PLDM firmware update support via devlink
flash"):
fw.mainfw -> fw
fw.mainfw.gold -> fw.gold
Could these two patches be folded together (or reordered) so the series
introduces the final naming once? Anyone bisecting between the two
commits would land on the obsolete names.
> diff --git a/drivers/net/ethernet/amd/pds_core/devlink.c b/drivers/net/ethernet/amd/pds_core/devlink.c
> index 3b763ee1715e..2f40b97affd6 100644
> --- a/drivers/net/ethernet/amd/pds_core/devlink.c
> +++ b/drivers/net/ethernet/amd/pds_core/devlink.c
> @@ -93,14 +93,105 @@ int pdsc_dl_flash_update(struct devlink *dl,
> return pdsc_firmware_update(pdsc, params, extack);
> }
>
> +static int pdsc_dl_report_component(struct devlink_info_req *req,
> + struct pds_core_fw_component_info *info)
> +{
> + enum devlink_info_version_type ver_type;
> + u16 flags = le16_to_cpu(info->flags);
> + char *ver = info->version;
> + const char *name;
> + char buf[32];
> +
> + /* Main firmware is reported as generic "fw" */
> + if (info->component_type == PDS_CORE_FW_TYPE_MAIN) {
> + if (info->slot_id == PDS_CORE_FW_SLOT_GOLD)
> + snprintf(buf, sizeof(buf), "fw.gold");
> + else
> + snprintf(buf, sizeof(buf), "fw");
> + } else {
> + name = pdsc_fw_type_to_name(info->component_type);
> + if (!name)
> + return 0;
Simon says: Ditto
[Low]
When firmware reports a component_type that pdsc_fw_type_names[] does
not know about, this returns 0 and silently drops the entry. Should
this at least dev_warn_once() so a newer firmware advertising a new
component type does not vanish from devlink output without any
operator-visible signal?
> +
> + if (info->slot_id == PDS_CORE_FW_SLOT_GOLD)
> + snprintf(buf, sizeof(buf), "fw.%s.gold", name);
> + else
> + snprintf(buf, sizeof(buf), "fw.%s", name);
> + }
Simon says: And this one seems intentional, so I think
this portion of the review can be ignored.
[Low]
This branch appends ".gold" to any non-MAIN component when slot_id ==
PDS_CORE_FW_SLOT_GOLD, which produces names like fw.bootloader.gold,
fw.uboot.gold and fw.cpld.gold. The Documentation table updated in
this same patch only lists fw.gold as the stored variant, and the
example output only mentions fw.uboot.gold.
Should the gold suffix be restricted to MAIN, or should the
documentation table be expanded to cover any component_type also
appearing as a .gold variant?
> +
> + ver_type = DEVLINK_INFO_VERSION_TYPE_NONE;
> + if (flags & PDS_CORE_FW_COMPONENT_INFO_F_UPDATE_BY_NAME)
> + ver_type = DEVLINK_INFO_VERSION_TYPE_COMPONENT;
> +
> + if (flags & PDS_CORE_FW_COMPONENT_INFO_F_FIXED) {
[ ... ]
> + return 0;
> +}
> +
> +static int pdsc_dl_component_info_get(struct devlink *dl,
> + struct devlink_info_req *req,
> + struct netlink_ext_ack *extack)
> +{
> + struct pds_core_component_list_info *list_info;
> + struct pdsc *pdsc = devlink_priv(dl);
> + u8 num_components;
> + int err;
> + int i;
> +
> + if (!pdsc->fw_components.num_components) {
> + err = pdsc_get_component_info(pdsc);
> + if (err) {
> + dev_err(pdsc->dev, "Failed to get component_info %pe\n",
> + ERR_PTR(err));
> + return err;
> + }
> + }
Simon says: ditto
[Medium, Medium]
Two questions about the cache check above.
First, pdsc_get_component_info() has a branch that returns 0 without
populating num_components when firmware reports
comp.get_component_info.ver == 0:
if (comp.get_component_info.ver == 0) {
dev_warn_once(pdsc->dev,
"Incompatible get_component_info version %u ...\n",
comp.get_component_info.ver);
err = 0;
goto out;
}
Since pdsc->fw_components.num_components stays 0 on this path, every
subsequent devlink dev info request re-enters pdsc_get_component_info()
and runs through kmalloc / kzalloc(PDS_PAGE_SIZE) / dma_map_single() /
a full devcmd round-trip with pdsc->devcmd_timeout * 2 under
devcmd_lock / dma_unmap_single(). dev_warn_once() suppresses the
warning after the first hit, so the repetition is invisible. Should
there be an "unsupported" sentinel that prevents re-querying once v0
firmware is detected?
Simon says: However, the issue flagged below seems to warrant
investigation in the context of this patch-set.
Second, pdsc->fw_components is only invalidated at the end of
pdsc_firmware_update() ("pdsc->fw_components.num_components = 0;").
If firmware versions change through any other path (FW recovery via
health_work, PCI reset via pci_reset_work, side-channel upgrade) the
cached list stays around and devlink dev info reports stale versions.
Should those reset paths also clear pdsc->fw_components?
> +
> + list_info = &pdsc->fw_components;
> + num_components = min_t(u16, list_info->num_components,
> + le16_to_cpu(pdsc->dev_ident.max_fw_slots));
Simon says: Likewise, the one below seems to warrant investigation too.
[Medium]
num_components has already been bounded against
PDS_CORE_FW_COMPONENT_LIST_LEN inside pdsc_get_component_info().
max_fw_slots looks like a slot count (gold/main-a/main-b), but the
component list also contains cpld/bootloader/uboot/etc. If firmware
reports six components on a three-slot device, this min_t() silently
truncates legitimate component entries from the devlink output.
Is this cap intentional, and if so should truncation be logged? If
not, can the cap simply be dropped?
> + for (i = 0; i < num_components; i++) {
> + err = pdsc_dl_report_component(req, &list_info->info[i]);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +
> static char *fw_slotnames[] = {
> "fw.goldfw",
> "fw.mainfwa",
> "fw.mainfwb",
> };
>
> -int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
> - struct netlink_ext_ack *extack)
> +static int pdsc_dl_fw_list_info_get(struct devlink *dl,
> + struct devlink_info_req *req,
> + struct netlink_ext_ack *extack)
> {
[ ... ]
> @@ -134,12 +225,49 @@ int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
> return err;
> }
>
> + return 0;
> +}
> +
> +static int pdsc_dl_info_get_v1(struct devlink *dl,
> + struct devlink_info_req *req,
> + struct netlink_ext_ack *extack)
> +{
> + struct pdsc *pdsc = devlink_priv(dl);
> + int err;
> +
> + err = pdsc_dl_fw_list_info_get(dl, req, extack);
> + if (err)
> + dev_warn_once(pdsc->dev, "Failed to get fw list: %pe\n",
> + ERR_PTR(err));
Simon says: Again, the one below seems to warrant investigation.
[Medium]
Pre-patch, devlink_info_version_stored_put() failures inside the fw
list loop returned to the devlink layer immediately. After this
refactor, pdsc_dl_fw_list_info_get() returns the error but
pdsc_dl_info_get_v1() converts it to dev_warn_once() and continues to
append the running fw version and the ASIC fields.
Does this mean a mid-loop failure (for example -EMSGSIZE from netlink
attribute append) is now masked, leaving a partially populated devlink
info reply that gets reported as success?
> +
> + /* Version 1: report fw from dev_info (running only) */
> err = devlink_info_version_running_put(req,
> DEVLINK_INFO_VERSION_GENERIC_FW,
> pdsc->dev_info.fw_version);
> if (err)
> return err;
>
> + return 0;
> +}
> +
> +int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
> + struct netlink_ext_ack *extack)
> +{
> + struct pdsc *pdsc = devlink_priv(dl);
> + char buf[32];
> + int err;
> +
> + if (pdsc->dev_ident.version >= PDS_CORE_IDENTITY_VERSION_2) {
> + err = pdsc_dl_component_info_get(dl, req, extack);
> + if (err)
> + dev_warn_once(pdsc->dev, "Failed to get component info: %pe\n",
> + ERR_PTR(err));
> + } else {
> + err = pdsc_dl_info_get_v1(dl, req, extack);
> + if (err)
> + return err;
> + }
Simon says: but this one seems to be more a question of style that substance.
[Medium]
On v2+ devices, errors from pdsc_dl_component_info_get() are swallowed
into dev_warn_once() and the function falls through to the ASIC
fields. Pre-patch behavior always reported pdsc->dev_info.fw_version
via DEVLINK_INFO_VERSION_GENERIC_FW.
After this change, transient failures inside pdsc_get_component_info()
(DMA mapping failure, -ENOMEM, the -ETIMEDOUT/-EAGAIN deferred-DMA
path, the v0-firmware return-0-without-populating branch, the
num_components > LIST_LEN -ENOMEM branch), or even firmware enumerating
components without setting PDS_CORE_FW_COMPONENT_INFO_F_RUNNING on the
MAIN component, cause devlink dev info to show only asic.id, asic.rev
and the serial number with no firmware version reported at all.
Could the v2 path either retain the unconditional running fw fallback
from dev_info.fw_version, or fall through to pdsc_dl_info_get_v1() on
failure, so devlink info never silently loses the fw running version?
Also, since dev_warn_once() suppresses the warning across the lifetime
of the device, callers cannot tell that the netlink reply is partial:
a partially populated message gets the ASIC fields appended and is
reported as success.
> +
> snprintf(buf, sizeof(buf), "0x%x", pdsc->dev_info.asic_type);
> err = devlink_info_version_fixed_put(req,
> DEVLINK_INFO_VERSION_GENERIC_ASIC_ID,
^ permalink raw reply
* Re: [PATCH net-next] docs: exclude driver and netdevsim bugs
From: Jakub Kicinski @ 2026-06-15 16:19 UTC (permalink / raw)
To: Leon Romanovsky
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, johannes,
corbet, skhan, workflows, linux-doc
In-Reply-To: <20260615091436.GP327369@unreal>
On Mon, 15 Jun 2026 12:14:36 +0300 Leon Romanovsky wrote:
> > +Unless explicitly excluded all bug fixes should be targeting the ``net``
> > +tree and contain an appropriate Fixes tag.
> > +
> > +Obvious exclusions:
> > +
> > + - fixes for bugs which only exist in ``net-next`` should target ``net-next``
> > + (please still include the Fixes tag in the commit message)
> > + - bugs which cannot be reached, e.g. in code paths not executed given
> > + current in-tree callers
> > + - fixes for compiler warnings and typos
>
> If you decide to resubmit this patch, could you please remove "fixes for
> compiler warnings" from the exclusion list?
>
> It is quite frustrating to receive a compiler warning originating from a
> different subsystem after the merge window, knowing it will not be
> addressed until the next merge window (around eight weeks later).
Agreed, FWIW, but not planning to resubmit.
I think people misunderstood that I'm __documenting what I already do__
rather than trying to have a discussion :/
^ permalink raw reply
* Re: [PATCH v4 5/6] pds_core: add host backed memory support for firmware
From: Simon Horman @ 2026-06-15 16:19 UTC (permalink / raw)
To: nikhil.rao
Cc: Simon Horman, netdev, brett.creeley, andrew+netdev, davem,
edumazet, kuba, pabeni, eric.joyner
In-Reply-To: <20260614050052.1048328-6-nikhil.rao@amd.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
pds_core: add host backed memory support for firmware
This commit adds support for host-backed DMA memory to provide additional
memory for firmware control plane components on newer AMD/Pensando cards.
> diff --git a/drivers/net/ethernet/amd/pds_core/core.c b/drivers/net/ethernet/amd/pds_core/core.c
> index 705cab7b07273..d1695ca95440c 100644
> --- a/drivers/net/ethernet/amd/pds_core/core.c
> +++ b/drivers/net/ethernet/amd/pds_core/core.c
[ ... ]
> @@ -658,3 +660,161 @@ void pdsc_health_thread(struct work_struct *work)
> out_unlock:
> mutex_unlock(&pdsc->config_lock);
> }
> +
> +static void pdsc_host_mem_del_one(struct pdsc *pdsc, u16 tag, u8 reason)
> +{
> + union pds_core_dev_comp comp = {};
> + union pds_core_dev_cmd cmd = {
> + .host_mem.opcode = PDS_CORE_CMD_HOST_MEM,
> + .host_mem.oper = PDS_CORE_HOST_MEM_DEL,
> + .host_mem.tag = cpu_to_le16(tag),
> + .host_mem.reason = reason,
> + };
[ ... ]
> + hm->pa = dma_map_page(pdsc->dev, hm->pg, 0, hm->size,
> + DMA_BIDIRECTIONAL);
> + if (dma_mapping_error(pdsc->dev, hm->pa)) {
> + dev_err(pdsc->dev, "dma map failed for tag %d size %d\n",
> + hm->tag, hm->size);
> + __free_pages(hm->pg, hm->order);
> + hm->pg = NULL;
> + err = -EIO;
> + goto err_del;
> + }
> +
> + /* Track this allocation so pdsc_host_mem_free() can clean it up */
> + pdsc->num_host_mem_reqs++;
> +
> + memset(&cmd, 0, sizeof(cmd));
> + memset(&comp, 0, sizeof(comp));
> + cmd.host_mem.opcode = PDS_CORE_CMD_HOST_MEM;
> + cmd.host_mem.oper = PDS_CORE_HOST_MEM_ADD;
> + cmd.host_mem.tag = cpu_to_le16(hm->tag);
> + cmd.host_mem.size = cpu_to_le32(hm->size);
> + cmd.host_mem.buf_pa = cpu_to_le64(hm->pa);
> +
> + dev_dbg(pdsc->dev, "Sending devcmd for mem add tag %d size %d pa %pad\n",
> + hm->tag, hm->size, &hm->pa);
> + err = pdsc_devcmd(pdsc, &cmd, &comp, pdsc->devcmd_timeout);
> + if (err || comp.status != PDS_RC_SUCCESS) {
> + dev_err(pdsc->dev, "mem add failed err %d status %d for tag %d\n",
> + err, comp.status, hm->tag);
> + err = err ? err : -EIO;
> + goto err_del;
> + }
> + dev_dbg(pdsc->dev, "mem add completed for tag %d\n", hm->tag);
> +
> + return 0;
> +
> +err_del:
> + /* After MEM_QUERY succeeds, firmware expects MEM_ADD or MEM_DEL */
> + pdsc_host_mem_del_one(pdsc, hm->tag, PDS_RC_ENOMEM);
> + return err;
[Severity: Medium]
Does this leave the failed memory allocation mapped and wasted for the
lifetime of the driver?
Because pdsc->num_host_mem_reqs is incremented before sending the MEM_ADD
command, jumping to the err_del label here skips the DMA unmap and page free
steps. The failed request is tracked and will eventually be cleaned up by
pdsc_host_mem_free() during device teardown.
However, the caller pdsc_start() ignores the failure, so the driver operates
normally while potentially tying up to 4MB of DMA-mapped memory until unload
or reset.
> +}
[ ... ]
> +void pdsc_host_mem_free(struct pdsc *pdsc)
> +{
> + int i;
> +
> + if (!pdsc->host_mem_reqs)
> + return;
> +
> + for (i = 0; i < pdsc->num_host_mem_reqs; i++) {
> + dma_unmap_page(pdsc->dev, pdsc->host_mem_reqs[i].pa,
> + pdsc->host_mem_reqs[i].size,
> + DMA_BIDIRECTIONAL);
> + __free_pages(pdsc->host_mem_reqs[i].pg,
> + pdsc->host_mem_reqs[i].order);
> + }
[Severity: Critical]
Is there a missing notification to the firmware that this memory is being
unmapped?
This loop unmaps the DMA regions and frees the pages, but it doesn't send a
PDS_CORE_HOST_MEM_DEL command.
For physical functions, pdsc_devcmd_reset() runs before this during
teardown, which might clear hardware state. However, virtual functions
explicitly bypass the reset in pdsc_teardown():
pdsc_teardown() {
if (!pdsc->pdev->is_virtfn)
pdsc_devcmd_reset(pdsc);
...
If a virtual function uses host memory, skipping the MEM_DEL command means
the firmware could be completely unaware the buffers were returned to the
OS. Could this lead to the device performing DMA into freed memory and
causing memory corruption or IOMMU faults?
> +
> + kfree(pdsc->host_mem_reqs);
> + pdsc->host_mem_reqs = NULL;
> + pdsc->num_host_mem_reqs = 0;
> +}
^ permalink raw reply
* Re: [PATCH net-next v7 11/12] net: pcs: airoha: add PCS driver for Airoha AN7581 SoC
From: Benjamin Larsson @ 2026-06-15 16:31 UTC (permalink / raw)
To: Christian Marangi, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, Simon Horman, Jonathan Corbet, Shuah Khan,
Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-12-ansuelsmth@gmail.com>
Hi.
On 15/06/2026 14:29, Christian Marangi wrote:
> Add PCS driver for Airoha AN7581 SoC for Ethernet/PON/PCIe/USB SERDES
> and permit usage of external PHY or connected SFP cage. Supported modes
> are USXGMII, 10G-BASER, 2500BASE-X, 1000BASE-X and SGMII.
>
> The driver probe and register the various needed registers and register as
> a PCS provider for fwnode usage.
>
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
> drivers/net/pcs/Kconfig | 2 +
> drivers/net/pcs/Makefile | 2 +
> drivers/net/pcs/airoha/Kconfig | 12 +
> drivers/net/pcs/airoha/Makefile | 7 +
> drivers/net/pcs/airoha/pcs-airoha-common.c | 1318 ++++++++++++
> drivers/net/pcs/airoha/pcs-airoha.h | 1309 ++++++++++++
> drivers/net/pcs/airoha/pcs-an7581.c | 2093 ++++++++++++++++++++
> 7 files changed, 4743 insertions(+)
> create mode 100644 drivers/net/pcs/airoha/Kconfig
> create mode 100644 drivers/net/pcs/airoha/Makefile
> create mode 100644 drivers/net/pcs/airoha/pcs-airoha-common.c
> create mode 100644 drivers/net/pcs/airoha/pcs-airoha.h
> create mode 100644 drivers/net/pcs/airoha/pcs-an7581.c
Most likely there will be pcs drivers for the EN7523 platform also. Can
the common code for an7581 have an7581 in the name instead of airoha?
MvH
Benjamin Larsson
^ permalink raw reply
* Re: [PATCH net-next 0/9] atm: remove more dead code
From: Simon Horman @ 2026-06-15 16:33 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, 3chas3, mitch,
linux-atm-general, dwmw2
In-Reply-To: <20260613201032.77274-1-kuba@kernel.org>
On Sat, Jun 13, 2026 at 01:10:23PM -0700, Jakub Kicinski wrote:
> Commit 6deb53595092 ("net: remove unused ATM protocols and legacy
> ATM device drivers") removed a good chunk of old ATM drivers.
> Our goal going forward is to limit the ATM support to PPPoATM
> used in ADSL deployments.
>
> A recent burst of AI generated fixes for net/atm/signaling.c and
> net/atm/svc.c made me look closer at the remaining code. PPPoATM runs
> over permanent virtual circuits (PF_ATMPVC) with a statically
> configured VPI/VCI. We can drop switched virtual circuits (SVCs)
> and user-space signaling (atmsigd) support. While digging around
> I noticed a few more obviously dead pieces of code.
>
> Annoyingly, I have applied one "fix" to QoS config which will
> now make net conflict with this series :/
>
> Jakub Kicinski (9):
> atm: remove AAL3/4 transport support
> atm: remove the unused send_oam / push_oam callbacks
> atm: remove dead SONET PHY ioctls
> atm: remove the local ATM (NSAP) address registry
> atm: remove SVC socket support and the signaling daemon interface
> atm: remove the unused change_qos device operation
> atm: remove the unused pre_send and send_bh device operations
> atm: remove unused ATM PHY operations
> atm: remove orphaned uAPI for deleted drivers, protocols and SVCs
There is a compile time nit on patch 4/9.
../net/atm/resources.c: In function ‘atm_dev_ioctl’:
../net/atm/resources.c:227:20: warning: variable ‘len’ set but not used [-Wunused-but-set-variable=]
227 | int error, len, size = 0;
| ^~~
It might be nice to clear that one.
But overall this looks good to me.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH v4] ice: wait for reset completion in ice_resume()
From: Nowlin, Alexander @ 2026-06-15 16:41 UTC (permalink / raw)
To: Aaron Ma, Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Akeem G Abodunrin, Jesse Brandeburg,
intel-wired-lan@lists.osuosl.org, Loktionov, Aleksandr,
kohei@enjuk.jp, Paul Menzel
In-Reply-To: <20260429034849.1686650-1-aaron.ma@canonical.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Aaron Ma via Intel-wired-lan
> Sent: Tuesday, April 28, 2026 8:49 PM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>; Jesse Brandeburg <jesse.brandeburg@intel.com>; intel-wired-lan@lists.osuosl.org; Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; kohei@enjuk.jp; Paul Menzel <pmenzel@molgen.mpg.de>
> Subject: [Intel-wired-lan] [PATCH v4] ice: wait for reset completion in ice_resume()
>
> ice_resume() schedules an asynchronous PF reset and returns immediately. The reset runs later in ice_service_task(). If userspace tries to bring up the net device before the reset finishes, ice_open() fails with -EBUSY:
>
> ice_resume()
> ice_schedule_reset() # sets ICE_PFR_REQ, returns
> ...
> ice_open()
> ice_is_reset_in_progress() # ICE_PFR_REQ still set, -EBUSY
> ...
> ice_service_task()
> ice_do_reset()
> ice_rebuild() # clears ICE_PFR_REQ, too late
>
> Reproduced on E800 series NICs during suspend/resume with irdma enabled, where the aux device probe widens the race window.
>
> ice 0000:81:00.0: can't open net device while reset is in progress
>
> Add a best-effort wait (10s timeout, matching ice_devlink_info_get()) for the reset to complete before returning from ice_resume(). In practice the reset completes in ~300ms.
>
> Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
> Cc: stable@vger.kernel.org
> Reviewed-by: Kohei Enju <kohei@enjuk.jp>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
> ---
> v4: use secs_to_jiffies() instead of 10 * HZ (Przemek Kitszel)
> v3: add error message to commit message for searchability, mention
> timeout in dev_err (Paul Menzel)
> v2: reword comment to clarify best-effort semantics (Kohei Enju)
> v1: https://lore.kernel.org/intel-wired-lan/20260402024220.210466-1-aaron.ma@canonical.com/
>
> drivers/net/ethernet/intel/ice/ice_main.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
^ permalink raw reply
* Re: e1000e: Report link down after "Detected Hardware Unit Hang" ?
From: Andrew Lunn @ 2026-06-15 16:41 UTC (permalink / raw)
To: Helge Deller; +Cc: Tony Nguyen, Przemek Kitszel, intel-wired-lan, netdev
In-Reply-To: <ai8hmGQmXQu64Ld3@carbonx1>
On Sun, Jun 14, 2026 at 11:48:08PM +0200, Helge Deller wrote:
> I'm regularily facing the known "eno1: Detected Hardware Unit Hang:"
> with my on-board intel e1000e NIC hardware.
> Since none of he various tips on the internet helped, I had the idea
> to setup a master/slave bond networking to fail over to another NIC when
> the Intel chip hangs.
>
> Sadly this doesn't work as intended, because the link of the intel NIC
> isn't reported "down", so the failover never happens, unless I manually
> start "ifconfig eno1 down".
>
> My question: Shouldn't the intel NIC ideally report Link Down if we know
> it hangs? That way a fail-over should at least happen, right?
>
> Below is a completely untested patch.
> Does it make sense that I try to test and/or develop such a patch, or
> are there things I miss?
If the interface is dead, then setting the carrier down makes a lot of
sense. One question i have is, what do you need to do to recover the
hardware? Will it correctly set the carrier up when you do the
recovery?
Also, just looking at your proposed change, it is not clear to me why
such an assignment will result in carrier down. It would be good to
explain it in the commit message.
Andrew
^ permalink raw reply
* Re: [PATCH RFC 3/9] net: stmmac: qcom-ethqos: fix RGMII_ID mode to use DLL bypass
From: Andrew Lunn @ 2026-06-15 16:48 UTC (permalink / raw)
To: Mohd Ayaan Anwar
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Richard Cochran, Bjorn Andersson, Konrad Dybcio, Maxime Coquelin,
Alexandre Torgue, Russell King, linux-arm-msm, netdev, devicetree,
linux-kernel, linux-stm32, linux-arm-kernel
In-Reply-To: <ai93X/cNWHtEQsDt@oss.qualcomm.com>
On Mon, Jun 15, 2026 at 09:24:07AM +0530, Mohd Ayaan Anwar wrote:
> Hello Andrew,
> On Thu, Jun 11, 2026 at 10:54:37PM +0200, Andrew Lunn wrote:
> > On Fri, Jun 12, 2026 at 12:06:59AM +0530, Mohd Ayaan Anwar wrote:
> > > When "rgmii-id" is selected the PHY supplies both TX and RX delays, so
> > > the MAC must not add its own. The driver currently falls through to the
> > > generic DLL initialisation path which programs it to add a delay.
> > >
> > > Power down the DLL and set DDR bypass mode for RGMII_ID, then program
> > > the IO_MACRO via a new ethqos_rgmii_id_macro_init() helper. Also fix
> > > ethqos_set_clk_tx_rate() to not double the clock rate in bypass mode at
> > > 100M/10M, and remove RGMII_ID from the phase-shift suppression in
> > > ethqos_rgmii_macro_init() since RGMII_ID no longer reaches that path.
> >
> > I'm curious how this works at the moment? Do no boards make use of
> > RGMII ID? Are all current boards broken?
>
> Searching through the DTS, I found that we have two boards using "rgmii"
> (qcs404-evb-4000.dts and sa8155-adp.dts) and another board using
> "rgmii-txid" (sa8540p-ride.dts). No board which uses RGMII ID.
So this causes problems. We cannot break existing boards, yet it would
be good to fix the current broken behaviour.
> I don't think any of these boards have extra long wires which would add
> PCB level delay. They are against the netdev definitions for "rgmii" and
> "rgmii-txid".
>
> But the first two boards should still be working fine since the current
> driver programs the IO_MACRO to add the delay when operating in RGMII
> mode.
Which is wrong, given the current definition. No delays should be
added, by either the MAC or the PHY.
Please could you contact the Maintainers of these boards and find out
the real situation with the hardware.
It could be the best way forward is that you issue a warning when
"rgmii" is found and pass rgmii-id to the PHY. And you also change the
two boards to use rgmii-id. Lets think about the rgmii-txid case once
we better understand it.
Andrew
^ permalink raw reply
* Re: [PATCH net] appletalk: Hold socket reference in atalk_rcv()
From: Simon Horman @ 2026-06-15 16:48 UTC (permalink / raw)
To: zhaoyz24
Cc: Simon Horman, netdev, davem, edumazet, kuba, pabeni, kees,
veritas501, linux-kernel, yangyx22, wangao, fengxw06, qli01, xuke,
stable
In-Reply-To: <20260614095226.1210-1-zhaoyz24@mails.tsinghua.edu.cn>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
appletalk: Hold socket reference in atalk_rcv()
This commit addresses a use-after-free race by holding a reference to the
socket found by atalk_search_socket() until after the packet is queued in
atalk_rcv().
> diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
> index 30a6dc06291c3..61ec5c569dc3c 100644
> --- a/net/appletalk/ddp.c
> +++ b/net/appletalk/ddp.c
[ ... ]
> @@ -1474,9 +1476,12 @@ static int atalk_rcv(struct sk_buff *skb, struct net_device *dev,
> goto drop;
>
> /* Queue packet (standard) */
> - if (sock_queue_rcv_skb(sock, skb) < 0)
> + if (sock_queue_rcv_skb(sock, skb) < 0) {
> + sock_put(sock);
> goto drop;
> + }
[Severity: High]
Does this code introduce a memory leak or an infinite timer loop?
By holding a socket reference in atalk_rcv(), the socket memory is kept alive,
but atalk_rcv() does not hold the socket lock. A concurrent atalk_release()
could lock the socket, mark it SOCK_DEAD, and call atalk_destroy_socket(),
which logically purges the receive queue:
net/appletalk/ddp.c:atalk_destroy_socket() {
...
skb_queue_purge(&sk->sk_receive_queue);
...
}
After this purge, atalk_rcv() might call sock_queue_rcv_skb(). The core
function __sock_queue_rcv_skb() does not reject packets for SOCK_DEAD sockets.
It will successfully enqueue the skb and increment sk->sk_rmem_alloc.
Since AppleTalk uses the empty sock_def_destruct() instead of a custom
sk_destruct callback, is the receive queue ever purged upon final sk_free(),
or does this permanently leak the skb?
Furthermore, if atalk_destroy_socket() finds other pending allocations and
arms the destruction timer, will the timer endlessly re-arm itself?
net/appletalk/ddp.c:atalk_destroy_timer() {
...
if (sk_has_allocations(sk)) {
sk->sk_timer.expires = jiffies + SOCK_DESTROY_TIME;
add_timer(&sk->sk_timer);
}
...
}
The timer checks sk_has_allocations(sk), which will remain true indefinitely
due to the uncleared sk_rmem_alloc from the leaked skb.
>
> + sock_put(sock);
> return NET_RX_SUCCESS;
^ permalink raw reply
* Re: [PATCH net-next v7 2/2] net: dsa: mv88e6xxx: add support for credit based shaper
From: Jakub Kicinski @ 2026-06-15 16:51 UTC (permalink / raw)
To: Luke Howard
Cc: Cedric Jehasse via B4 Relay, cedric.jehasse, Andrew Lunn,
Vladimir Oltean, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Russell King, netdev, linux-kernel,
Marek Behún, Cedric Jehasse
In-Reply-To: <9E7B1B5A-ACE1-4488-A600-96FAD88D070E@padl.com>
On Sat, 13 Jun 2026 13:37:08 +1000 Luke Howard wrote:
> On egress, one can use the switch chips’ AVB support, but this limits
> the number of TCs to three (corresponding to SR class A, B, and
> ‘legacy’ traffic). I have a submitted MQPRIO offload patch that does
> this. This matches how MQPRIO and CBS are used on end station NICs,
> but is not flexible enough to support an arbitrary number of TCs.
> Because of this, offloading PRIO may be cleaner.
Yes, it's been a while since I worked on qdisc offload, but the NIC /
host-centric offloads operate quite differently than the switch-centric
ones. Offloading MQPRIO to a switch is likely to lead to technical debt.
^ permalink raw reply
* Re: [PATCH net-next v7 2/2] net: dsa: mv88e6xxx: add support for credit based shaper
From: Jakub Kicinski @ 2026-06-15 16:54 UTC (permalink / raw)
To: Cedric Jehasse
Cc: Cedric Jehasse via B4 Relay, cedric.jehasse, Andrew Lunn,
Vladimir Oltean, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Russell King, netdev, linux-kernel, Luke Howard,
Marek Behún
In-Reply-To: <4dvydmir7hw4zhfrxonjuu4l7as6hma5yg4iuv3vgoqed7zai6@knkmyvh4sa54>
On Mon, 15 Jun 2026 16:29:34 +0200 Cedric Jehasse wrote:
> > > +static int mv88e6xxx_setup_tc_cbs(struct dsa_switch *ds, int port,
> > > + struct tc_cbs_qopt_offload *cbs)
> >
> > please stick an extack into struct tc_cbs_qopt_offload and use it to
> > report the reason for rejection back to the user
>
> Do you mean modifying cbs_enable_offload to report a driver specified reason isof
> reportig the generic "Specified device failed to setup cbs hardware offload"?
> What should be done for other drivers who handle TC_SETUP_QDISC_CBS, but don't
> fill in extack? Should cbs_enable_offload fill in extack with "Specified device
> failed to setup cbs hardware offload"?
There's a "weak" version of the extack-setting macro which you can use
to avoid overriding the driver-specified message. With that you should
be able to set the generic error for existing switches while allowing
your driver to provide more specific reason.
BTW please note that net-next is now closed for the duration of the
merge window.
^ permalink raw reply
* Re: [PATCH net] appletalk: Hold socket reference in atalk_rcv()
From: Eric Dumazet @ 2026-06-15 16:53 UTC (permalink / raw)
To: Yizhou Zhao
Cc: netdev, David S. Miller, Jakub Kicinski, Paolo Abeni,
Simon Horman, Kees Cook, Kito Xu, linux-kernel, Yuxiang Yang,
Ao Wang, Xuewei Feng, Qi Li, Ke Xu, stable
In-Reply-To: <20260614095226.1210-1-zhaoyz24@mails.tsinghua.edu.cn>
On Sun, Jun 14, 2026 at 2:52 AM Yizhou Zhao
<zhaoyz24@mails.tsinghua.edu.cn> wrote:
>
> atalk_search_socket() walks the global atalk_sockets list while holding
> atalk_sockets_lock, but it returns the matching socket after dropping the
> lock without taking a reference. atalk_rcv() then passes that pointer to
> sock_queue_rcv_skb().
>
> That leaves a race with close(). A concurrent atalk_release() can orphan
> the socket, remove it from atalk_sockets, and drop the final reference via
> atalk_destroy_socket(), freeing the socket before atalk_rcv() queues the
> incoming skb.
>
> On a KASAN-enabled kernel this can be reproduced by racing AppleTalk DDP
> delivery on loopback against close/rebind of the destination DGRAM socket:
>
> BUG: KASAN: slab-use-after-free in selinux_socket_sock_rcv_skb()
> sk_filter_trim_cap()
> sock_queue_rcv_skb_reason()
> atalk_rcv()
> snap_rcv()
> llc_rcv()
>
> Take a reference on the selected socket before dropping
> atalk_sockets_lock, and put it after sock_queue_rcv_skb() has finished.
> This keeps the socket alive for the receive path without changing socket
> lookup semantics. A malformed or racing receive still drops the skb on
> queueing failure as before.
No idea why linux still carries appletalk.
MacOS dropped it 20 years ago.
^ permalink raw reply
* Re: [PATCH] net: ehea: unwind probe_port sysfs file on failure
From: Andrew Lunn @ 2026-06-15 16:54 UTC (permalink / raw)
To: Pengpeng Hou
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Kees Cook, netdev, linux-kernel
In-Reply-To: <20260615070033.43461-1-pengpeng@iscas.ac.cn>
>On Mon, Jun 15, 2026 at 03:00:31PM +0800, Pengpeng Hou wrote:
> ehea_create_device_sysfs() creates probe_port and then remove_port. If
> the second device_create_file() fails, the helper returns the error but
> leaves probe_port installed even though probe treats the sysfs setup as
> failed.
>
> Remove probe_port on the remove_port creation failure path so the helper
> leaves no partial sysfs state behind.
>
> Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH v14 1/2] tcp: rehash onto different local ECMP path on retransmit timeout
From: Eric Dumazet @ 2026-06-15 16:55 UTC (permalink / raw)
To: Neil Spring
Cc: netdev, ncardwell, kuniyu, davem, kuba, dsahern, pabeni, horms,
shuah, linux-kselftest, bpf, martin.lau, daniel
In-Reply-To: <20260615042158.1600746-2-ntspring@meta.com>
On Sun, Jun 14, 2026 at 9:22 PM Neil Spring <ntspring@meta.com> wrote:
>
> Currently sk_rethink_txhash() re-rolls the socket's txhash on RTO, PLB,
> and spurious-retransmission events, but the cached route is reused and
> the new hash is not propagated into the ECMP path selection logic. Two
> changes are needed to make rehash select a different local ECMP path:
>
> 1. Add __sk_dst_reset() alongside sk_rethink_txhash() in
> tcp_write_timeout(), tcp_rcv_spurious_retrans(), and
> tcp_plb_check_rehash() so the cached dst is invalidated and the
> next transmit triggers a fresh route lookup.
>
> 2. Set fl6->mp_hash from sk_txhash (or tcp_rsk(req)->txhash for
> SYN/ACK retransmits and syncookies) in tcp_v6_connect(),
> inet6_sk_rebuild_header(), inet6_csk_route_req(),
> inet6_csk_route_socket(), tcp_v6_send_response(), and
> cookie_v6_check() so fib6_select_path() picks a path based on the
> new hash.
>
> The mp_hash override only applies to fib_multipath_hash_policy 0 (the
> default L3 policy). Its hash includes the flow label, but that is 0 by
> default -- np->flow_label is unset, and auto_flowlabels only computes
> the on-wire label later, per packet -- so flows to the same peer share
> one local path. Keying the hash on sk_txhash makes the local path
> per-connection and lets a rehash re-select it. Policies 1-3 are left
> unchanged.
>
> The mp_hash assignment is factored into a small helper,
> ip6_ecmp_set_mp_hash(), shared by inet6_csk_route_req(),
> inet6_csk_route_socket(), tcp_v6_connect(), inet6_sk_rebuild_header(),
> tcp_v6_send_response(), and cookie_v6_check(). It applies
> (txhash >> 1) ?: 1 for policy 0 (the >> 1 keeps mp_hash in the 31-bit
> range; ?: 1 keeps it non-zero, since 0 would fall back to
> rt6_multipath_hash()). inet6_csk_route_socket() calls it only for
> sk_protocol == IPPROTO_TCP so that non-TCP callers (e.g., L2TP via
> inet6_csk_xmit) fall through to rt6_multipath_hash() and retain their
> existing flow-key-based ECMP behavior.
>
> tcp_v6_send_response() also sets mp_hash from the response txhash so
> that a control packet (a RST from the full socket, or an ACK from a
> time-wait socket) selects the same local ECMP nexthop as the
> connection's txhash rather than falling back to the flow hash. The
> time-wait socket's tw_txhash is copied from sk_txhash when the
> connection enters TIME_WAIT, so it reflects any rehash that occurred.
>
> Setting mp_hash explicitly is necessary because the default ECMP hash
> derives from fl6->flowlabel via np->flow_label, which is not updated
> from sk_txhash (REPFLOW is off by default). ip6_make_flowlabel()
> cannot help either, as it runs after the route lookup.
>
> As a consequence, for policy 0 the local ECMP path of an IPv6 TCP
> flow follows sk_txhash even when fl6->flowlabel is non-zero, e.g. a
> reflected (REPFLOW) or explicitly set (IPV6_FLOWLABEL_MGR) flow
> label. This is intentional: only local path selection changes, so
> rehash can recover from a failed path; the on-wire flow label is
> unchanged.
>
> sk_set_txhash() is moved before ip6_dst_lookup_flow() in
> tcp_v6_connect() so the initial ECMP path is selected by the same
> txhash that subsequent route rebuilds will use. This avoids
> unintended path changes when the cached dst is naturally invalidated
> (e.g., by PMTU discovery or route changes).
>
> The rehash sites (tcp_write_timeout(), tcp_plb_check_rehash(), and
> tcp_rcv_spurious_retrans()) call __sk_rethink_txhash_reset_dst(),
> which re-rolls the txhash and, when it changed, drops the cached dst
> so the next transmit re-runs route selection. The dst reset is
> guarded by sk->sk_family == AF_INET6 since IPv4 ECMP does not
> currently use sk_txhash for path selection. For IPv4-mapped IPv6
> sockets this produces a redundant dst reset on a cold path
> (RTO/PLB); the subsequent IPv4 route lookup returns the same result.
> The helper is deliberately separate from sk_rethink_txhash() itself:
> dst_negative_advice() calls sk_rethink_txhash() before its own dst op,
> so resetting the dst inside sk_rethink_txhash() would skip that op
> (e.g. rt6_remove_exception_rt()).
>
> For syncookies, cookie_init_sequence() computes the cookie value
> before route_req() and sets txhash so the SYN-ACK selects the same
> ECMP path that cookie_v6_check() will use when the full socket is
> created. cookie_tcp_reqsk_init() derives txhash from the cookie so
> the full socket's ECMP path matches the SYN-ACK. Both the SYN-ACK
> assignment in tcp_conn_request() and the full-socket assignment in
> cookie_tcp_reqsk_init() set txhash from the cookie for IPv4 and IPv6
> alike. On IPv6 this drives ECMP path selection; on IPv4, which does
> not use sk_txhash for ECMP, it only affects TX-queue selection. That
> selection scales the hash by its high bits (reciprocal_scale()), which
> are uniform in the keyed secure_tcp_syn_cookie() output -- the MSS index
> only perturbs the low bits -- so the queue distribution matches
> net_tx_rndhash().
>
> cookie_init_sequence() is split from the former version that also
> called tcp_synq_overflow() and incremented SYNCOOKIESSENT; those
> side effects are now in cookie_record_sent(), called after
> route_req() succeeds so they are not bumped when route_req() fails.
> cookie_record_sent() is guarded by CONFIG_SYN_COOKIES to
> match the guard on tcp_synq_overflow(). route_req() receives 0 as
> tw_isn for the syncookie path so that tcp_v6_init_req() still saves
> ireq->pktopts for REPFLOW flowlabel reflection and IPv6 cmsg
> options. The ecn_ok clear for syncookies without timestamps stays
> after tcp_ecn_create_request() so it takes precedence.
>
> Signed-off-by: Neil Spring <ntspring@meta.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH v2] net: mvneta_bm: add suspend/resume support to prevent crash after resume
From: Simon Horman @ 2026-06-15 16:58 UTC (permalink / raw)
To: yun.zhou
Cc: Simon Horman, marcin.s.wojtas, andrew+netdev, davem, edumazet,
kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20260614103835.2708515-1-yun.zhou@windriver.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
net: mvneta_bm: add suspend/resume support to prevent crash after resume
This commit adds suspend and resume callbacks to the mvneta_bm driver
to handle hardware Buffer Manager state correctly across power cycles.
It drains and frees buffers on suspend, then reinitializes the BM hardware
defaults on resume.
> diff --git a/drivers/net/ethernet/marvell/mvneta_bm.c b/drivers/net/ethernet/marvell/mvneta_bm.c
> index 6bb3804949190..ff2fe4020a456 100644
> --- a/drivers/net/ethernet/marvell/mvneta_bm.c
> +++ b/drivers/net/ethernet/marvell/mvneta_bm.c
[ ... ]
> @@ -477,6 +477,52 @@ static void mvneta_bm_remove(struct platform_device *pdev)
> clk_disable_unprepare(priv->clk);
> }
>
> +#ifdef CONFIG_PM_SLEEP
> +static int mvneta_bm_suspend(struct device *dev)
> +{
> + struct mvneta_bm *priv = dev_get_drvdata(dev);
> + int i;
> +
> + /* Drain buffers and free pool resources while BM is still clocked */
> + for (i = 0; i < MVNETA_BM_POOLS_NUM; i++) {
> + struct mvneta_bm_pool *bm_pool = &priv->bm_pools[i];
> + int size_bytes;
> +
> + if (bm_pool->type == MVNETA_BM_FREE)
> + continue;
> +
> + mvneta_bm_bufs_free(priv, bm_pool, bm_pool->port_map);
> +
> + size_bytes = sizeof(u32) * bm_pool->hwbm_pool.size;
> + dma_free_coherent(&priv->pdev->dev, size_bytes,
> + bm_pool->virt_addr, bm_pool->phys_addr);
[Severity: Critical]
Can this dma_free_coherent() call cause a panic or double-free if pool
creation failed during initialization?
If mvneta_bm_pool_create() fails in mvneta_bm_pool_use(), the pool type is
left as non-FREE. For example, if the failure is due to an alignment issue
in mvneta_bm_pool_create():
if (!IS_ALIGNED((u32)bm_pool->virt_addr, MVNETA_BM_POOL_PTR_ALIGN)) {
dma_free_coherent(&pdev->dev, size_bytes, bm_pool->virt_addr,
bm_pool->phys_addr);
...
return -ENOMEM;
}
This frees virt_addr but leaves the dangling pointer in bm_pool->virt_addr.
Since the type is no longer MVNETA_BM_FREE, the suspend loop will not
skip it, leading to a second dma_free_coherent() call on the same pointer.
If the allocation itself failed, virt_addr is NULL, which could also lead
to a panic here.
> + bm_pool->virt_addr = NULL;
> + bm_pool->type = MVNETA_BM_FREE;
[Severity: Medium]
Does setting the pool type to MVNETA_BM_FREE cause lockdep tracking issues
on resume?
When the device resumes, mvneta_resume() will call mvneta_bm_pool_use()
to recreate the pool. Because the type was reset to MVNETA_BM_FREE here,
mvneta_bm_pool_use() will execute its initialization block:
if (new_pool->type == MVNETA_BM_FREE) {
...
mutex_init(&hwbm_pool->buf_lock);
}
Calling mutex_init() on the existing statically-allocated lock structure
wipes its wait list and lockdep state. Is there a way to avoid
re-initializing locks that are already set up?
> + }
> +
> + mvneta_bm_write(priv, MVNETA_BM_COMMAND_REG, MVNETA_BM_STOP_MASK);
> + clk_disable_unprepare(priv->clk);
> + return 0;
> +}
> +
> +static int mvneta_bm_resume(struct device *dev)
> +{
> + struct mvneta_bm *priv = dev_get_drvdata(dev);
> + int err;
> +
> + err = clk_prepare_enable(priv->clk);
> + if (err)
> + return err;
> +
> + /* Reinitialize BM hardware; pools are refilled by mvneta_resume() */
> + mvneta_bm_default_set(priv);
> + mvneta_bm_write(priv, MVNETA_BM_COMMAND_REG, MVNETA_BM_START_MASK);
[Severity: Critical]
Is the hardware pool state completely restored during resume?
The commit message notes that mvneta_bm_resume() and mvneta_resume()
perform full initialization identical to probe. However, mvneta_resume()
recreates the pool via mvneta_bm_pool_create(), which only restores the
base address and target attributes:
mvneta_bm_write(priv, MVNETA_BM_POOL_BASE_REG(bm_pool->id),
bm_pool->phys_addr);
mvneta_bm_pool_target_set(priv, bm_pool->id, target_id, attr);
mvneta_bm_pool_enable(priv, bm_pool->id);
It appears the pool capacity (MVNETA_BM_POOL_SIZE_REG) and the read/write
pointers (MVNETA_BM_POOL_READ_PTR_REG and MVNETA_BM_POOL_WRITE_PTR_REG)
are missing, as they are normally configured in mvneta_bm_pools_init()
which isn't called during resume.
Since the BM loses its register state during suspend, won't the hardware
operate with uninitialized or zero capacity after resume? Could this lead
to incorrectly wrapped pointers or dropped buffers when refilling the pool?
> + return 0;
> +}
> +#endif
^ permalink raw reply
* Re: [PATCH v2] net: mvneta_bm: add suspend/resume support to prevent crash after resume
From: Simon Horman @ 2026-06-15 16:59 UTC (permalink / raw)
To: yun.zhou
Cc: Simon Horman, marcin.s.wojtas, andrew+netdev, davem, edumazet,
kuba, pabeni, netdev, linux-kernel
In-Reply-To: <20260614103835.2708515-1-yun.zhou@windriver.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: mvneta_bm: add suspend/resume support to prevent crash after resume
This patch adds suspend/resume callbacks to the mvneta_bm driver so the
hardware Buffer Manager is drained on suspend and reinitialised on resume,
and adds a device_link from mvneta to mvneta_bm so the ordering of the two
drivers' PM transitions is enforced.
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 0c061fb0ed07..cfaf5ea1db9e 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -5678,6 +5678,11 @@ static int mvneta_probe(struct platform_device *pdev)
> "use SW buffer management\n");
> mvneta_bm_put(pp->bm_priv);
> pp->bm_priv = NULL;
> + } else {
> + /* Ensure BM suspends after us, resumes before us */
> + device_link_add(&pdev->dev,
> + &pp->bm_priv->pdev->dev,
> + DL_FLAG_AUTOREMOVE_CONSUMER);
[Medium]
Should the return value of device_link_add() be checked here?
device_link_add() can return NULL on failure (allocation failure,
supplier-not-yet-bound, invalid flag combinations, or a pre-existing
reverse dependency). The commit message and the comment both indicate
that the suspend/resume ordering correctness depends on this link
existing.
If the link is not created, the system silently regresses to the
pre-patch ordering: BM may suspend before mvneta finishes its own
suspend, or resume after mvneta and hand back stale buffer pointers.
The surrounding block already falls back to SW buffer management when
mvneta_bm_port_init() fails; would a similar dev_warn (or fallback) be
appropriate here when device_link_add() returns NULL?
> }
> }
> /* Set RX packet offset correction for platforms, whose
> diff --git a/drivers/net/ethernet/marvell/mvneta_bm.c b/drivers/net/ethernet/marvell/mvneta_bm.c
> index 6bb380494919..ff2fe4020a45 100644
> --- a/drivers/net/ethernet/marvell/mvneta_bm.c
> +++ b/drivers/net/ethernet/marvell/mvneta_bm.c
> @@ -477,6 +477,52 @@ static void mvneta_bm_remove(struct platform_device *pdev)
[ ... ]
> +static int mvneta_bm_resume(struct device *dev)
> +{
> + struct mvneta_bm *priv = dev_get_drvdata(dev);
> + int err;
> +
> + err = clk_prepare_enable(priv->clk);
> + if (err)
> + return err;
> +
> + /* Reinitialize BM hardware; pools are refilled by mvneta_resume() */
> + mvneta_bm_default_set(priv);
> + mvneta_bm_write(priv, MVNETA_BM_COMMAND_REG, MVNETA_BM_START_MASK);
> + return 0;
> +}
[High]
Is this hardware re-initialisation on resume complete?
The commit message says "mvneta disables its clock, causing BM to lose
all buffer address state". If clock gating loses BM state, the per-pool
size and ring pointer registers programmed in mvneta_bm_pools_init()
would also be lost:
/* Reset read pointer */
mvneta_bm_write(priv, MVNETA_BM_POOL_READ_PTR_REG(i), 0);
/* Reset write pointer */
mvneta_bm_write(priv, MVNETA_BM_POOL_WRITE_PTR_REG(i), 0);
...
mvneta_bm_write(priv, MVNETA_BM_POOL_SIZE_REG(i),
bm_pool->hwbm_pool.size);
mvneta_bm_default_set() only touches MVNETA_BM_INTR_MASK_REG,
MVNETA_BM_INTR_CAUSE_REG, and the burst-size bits in
MVNETA_BM_CONFIG_REG.
The downstream mvneta_bm_pool_create() invoked via mvneta_resume() ->
mvneta_bm_port_init() -> mvneta_bm_pool_use() writes only
POOL_BASE_REG, the XBAR target/attr, and the ENABLE bit; it does not
program POOL_SIZE_REG, POOL_READ_PTR_REG, or POOL_WRITE_PTR_REG.
After resume, POOL_SIZE_REG would hold its hardware reset value while
hwbm_pool->size in software is non-zero. Could buffer puts via
mvneta_bm_pool_put_bp() then misuse the BPPE ring and recreate the
class of failure this patch is trying to fix?
Would it be more correct for mvneta_bm_resume() to call
mvneta_bm_pools_init() (or rewrite SIZE/READ_PTR/WRITE_PTR for every
pool) instead of only mvneta_bm_default_set()?
> +#endif
> +
> +static SIMPLE_DEV_PM_OPS(mvneta_bm_pm_ops, mvneta_bm_suspend, mvneta_bm_resume);
> +
^ permalink raw reply
* [PATCH net-next] eth: ionic: delete the incorrect link_down_count reporting
From: Jakub Kicinski @ 2026-06-15 17:01 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
brett.creeley, eric.joyner, leitao
The definition of the statistic is quite clear,
struct ethtool_link_ext_stats says:
This statistic counts when PHY _actually_ went down, or lost link.
Reportedly this is not how the device-counted stat on ionic behaves.
The goal is to detect flapping links, due to bad cabling.
ionic reportedly uses this for some firmware stat of how many times
the traffic was stopped. This is _not_ what should be reported here.
Link: https://lore.kernel.org/20260610061830.51037-1-eric.joyner@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: brett.creeley@amd.com
CC: eric.joyner@amd.com
CC: leitao@debian.org
---
drivers/net/ethernet/pensando/ionic/ionic_ethtool.c | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
index 78a802eb159f..f7dcfe3d032d 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
@@ -111,15 +111,6 @@ static void ionic_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
memcpy_fromio(p + offset, idev->dev_cmd_regs->words, size);
}
-static void ionic_get_link_ext_stats(struct net_device *netdev,
- struct ethtool_link_ext_stats *stats)
-{
- struct ionic_lif *lif = netdev_priv(netdev);
-
- if (lif->ionic->pdev->is_physfn)
- stats->link_down_events = lif->link_down_count;
-}
-
static int ionic_get_link_ksettings(struct net_device *netdev,
struct ethtool_link_ksettings *ks)
{
@@ -1132,7 +1123,6 @@ static const struct ethtool_ops ionic_ethtool_ops = {
.get_regs_len = ionic_get_regs_len,
.get_regs = ionic_get_regs,
.get_link = ethtool_op_get_link,
- .get_link_ext_stats = ionic_get_link_ext_stats,
.get_link_ksettings = ionic_get_link_ksettings,
.set_link_ksettings = ionic_set_link_ksettings,
.get_coalesce = ionic_get_coalesce,
--
2.54.0
^ permalink raw reply related
* Re: [PATCH net-next v4 0/5] ionic: Expose more port stats to ethtool
From: Jakub Kicinski @ 2026-06-15 17:03 UTC (permalink / raw)
To: Eric Joyner
Cc: netdev, Brett Creeley, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Jacob Keller
In-Reply-To: <4f6de4ed-c687-4d1b-86fd-1c8e63e08fb3@amd.com>
On Sat, 13 Jun 2026 18:34:23 -0700 Eric Joyner wrote:
> I could add the overflow and reset handling (I was working on it for a v5), but
> to me, it didn't seem like it was worth the effort to modify the stat from
> firmware instead of continuing to use the existing driver-calculated stat. It
> wasn't really a "can I do this" question but more "is it worth doing this?"
>
> To start, it didn't seem like there was a specific standard that the API
> expected; it looked like "copy straight from the adapter if you can, or just
> calculate something if you can't". It's hard to tell what the actual
> expectations for the interface outside of the struct comment since there aren't
> many drivers using it. The Mellanox and Broadcom driver stat handlers just read
> the link down count straight from hardware (though I don't know if those stats
> get reset on driver load or reboot and we're the outlier in the way our firmware
> behaves). The Intel drivers are similar to what we do now in the ionic driver,
> and they just count the link down events that the driver detects, but I know
> that they don't expose a link down count from the hardware. What we do now
> seems good enough for that purpose? It's bigger than 16 bits and gets reset on
> driver load, so why add more code to handle something that this seems to do well
> enough?
>
> But, I also still want the raw firmware stat because the firmware has a
> different lifetime than the driver, and so it will count link down events while
> the driver isn't loaded or in a state to receive link events. But since maybe
> that's only useful for debugging, it belongs in debugfs instead? I just thought
> drivers could more or less put what they want in `ethtool -S` output, so that
> seemed like an okay place to put it.
I sent a patch to delete this incorrect stat ASAP. Feel free to add it
to ethtool -S under a name that clearly indicates that it's _not_ the
phy down count in the next cycle.
^ permalink raw reply
* Re: [PATCH net-next] eth: ionic: delete the incorrect link_down_count reporting
From: Breno Leitao @ 2026-06-15 17:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
brett.creeley, eric.joyner
In-Reply-To: <20260615170153.592171-1-kuba@kernel.org>
On Mon, Jun 15, 2026 at 10:01:53AM -0700, Jakub Kicinski wrote:
> The definition of the statistic is quite clear,
> struct ethtool_link_ext_stats says:
>
> This statistic counts when PHY _actually_ went down, or lost link.
>
> Reportedly this is not how the device-counted stat on ionic behaves.
> The goal is to detect flapping links, due to bad cabling.
> ionic reportedly uses this for some firmware stat of how many times
> the traffic was stopped. This is _not_ what should be reported here.
>
> Link: https://lore.kernel.org/20260610061830.51037-1-eric.joyner@amd.com
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Breno Leitao <leitao@debian.org>
CC: brett.creeley@amd.com
> CC: eric.joyner@amd.com
> CC: leitao@debian.org
> ---
> drivers/net/ethernet/pensando/ionic/ionic_ethtool.c | 10 ----------
> 1 file changed, 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
> index 78a802eb159f..f7dcfe3d032d 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
> @@ -111,15 +111,6 @@ static void ionic_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
> memcpy_fromio(p + offset, idev->dev_cmd_regs->words, size);
> }
>
> -static void ionic_get_link_ext_stats(struct net_device *netdev,
> - struct ethtool_link_ext_stats *stats)
> -{
> - struct ionic_lif *lif = netdev_priv(netdev);
> -
> - if (lif->ionic->pdev->is_physfn)
> - stats->link_down_events = lif->link_down_count;
It seems this is the only place where link_down_count is read. Maybe you
want to kill it as well?
^ permalink raw reply
* [PATCH net] netdev-genl: report NAPI thread PID in the caller's pid namespace
From: Maoyi Xie @ 2026-06-15 17:17 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Daniel Borkmann, Nikolay Aleksandrov, David Wei,
Stanislav Fomichev, Dragos Tatulea, Samiullah Khawaja, netdev,
linux-kernel, stable
netdev_nl_napi_fill_one() reports the NAPI kthread PID in NETDEV_A_NAPI_PID
using task_pid_nr(), which returns the PID in the initial pid namespace.
NETDEV_CMD_NAPI_GET does not have GENL_ADMIN_PERM and the netdev genl family
is netnsok, so a caller in a child pid namespace can issue it. That caller
then sees the kthread's global PID, even though the kthread is not visible
in its pid namespace, where the value should be 0.
Translate the PID through the caller's pid namespace, the same way commit
3799c2570982 ("io_uring/fdinfo: translate SqThread PID through caller's
pid_ns") did for the io_uring SQPOLL thread. The doit and dumpit paths both
run synchronously in the caller's context, so task_active_pid_ns(current) is
the caller's pid namespace.
Fixes: db4704f4e4df ("netdev-genl: Add PID for the NAPI thread")
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
net/core/netdev-genl.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index b8f6076d8007..4c23e985cc01 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -2,6 +2,7 @@
#include <linux/netdevice.h>
#include <linux/notifier.h>
+#include <linux/pid_namespace.h>
#include <linux/rtnetlink.h>
#include <net/busy_poll.h>
#include <net/net_namespace.h>
@@ -189,7 +190,8 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
goto nla_put_failure;
if (napi->thread) {
- pid = task_pid_nr(napi->thread);
+ pid = task_pid_nr_ns(napi->thread,
+ task_active_pid_ns(current));
if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid))
goto nla_put_failure;
}
--
2.43.0
^ permalink raw reply related
* Re: [PATCH net-next v3] net: phy: sfp: detect presence via I2C when no MOD_DEF0 GPIO
From: Maxime Chevallier @ 2026-06-15 17:19 UTC (permalink / raw)
To: Greg Patrick, Russell King, Andrew Lunn, Heiner Kallweit
Cc: netdev, linux-kernel, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Manuel Stocker
In-Reply-To: <20260611175341.2223184-1-gregspatrick@hotmail.com>
Hi,
On 6/11/26 19:53, Greg Patrick wrote:
> An SFP cage (compatible "sff,sfp") whose MOD_DEF0 signal is not wired to a
> GPIO currently falls back to sff_gpio_get_state(), which unconditionally
> reports the module as present. An empty cage therefore fails its probe and
> is parked in SFP_MOD_ERROR forever; because SFP_F_PRESENT never deasserts
> there is no REMOVE event to recover the state machine, so a module inserted
> after boot is never detected, and empty cages spam -EIO at boot.
>
> This affects boards that route none of the cage presence signal to a
> software-readable input. On the NicGiga S100-0800S-M (RTL9303, 8x SFP+) the
> cage I2C bus is the switch's SMBus master; TX_DISABLE is driven via a
> PCA9534 I/O expander, but no MOD_ABS/MOD_DEF0 line reaches a readable GPIO
> (the RTL9303 gpio0 lines read stuck-low, the single PCA9534 is fully
> consumed by TX_DISABLE, and there is no RTL8231). The Horaco ZX-SW82TS-L2P
> (RTL9302D, 2x SFP+) is independently affected in the same way.
>
> For such an SFP cage, derive presence from a throttled single-byte I2C read
> of the module EEPROM instead: a successful read asserts SFP_F_PRESENT,
> R_PROBE_ABSENT consecutive failures clear it (to ride out a transient error
> on a live module). The existing poll then emits SFP_E_INSERT / SFP_E_REMOVE
> normally, giving working hot-plug and silencing the boot-time -EIO spam on
> empty cages. Presence is re-probed every T_PROBE_PRESENT, so insertion is
> detected within that interval and removal within
> T_PROBE_PRESENT * R_PROBE_ABSENT.
>
> A soldered-down module (compatible "sff,sff") has no presence signal and is
> genuinely always present, so it continues to use sff_gpio_get_state(); the
> new path is gated on the cage type advertising SFP_F_PRESENT.
>
> Signed-off-by: Greg Patrick <gregspatrick@hotmail.com>
> Tested-by: Manuel Stocker <mensi@mensi.ch>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
And it doesn't regress any boards I've tested this on (although this was
very quick testing), so
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
I'm wondering if a followup could be to add another loud warning to
dmesg that the HW design is broken, leading to potentially quirky behaviour.
We have lots of those for SFP.
Maxime
^ permalink raw reply
* Re: [PATCH net] appletalk: Hold socket reference in atalk_rcv()
From: Jakub Kicinski @ 2026-06-15 17:23 UTC (permalink / raw)
To: Eric Dumazet
Cc: Yizhou Zhao, netdev, David S. Miller, Paolo Abeni, Simon Horman,
Kees Cook, Kito Xu, linux-kernel, Yuxiang Yang, Ao Wang,
Xuewei Feng, Qi Li, Ke Xu, stable
In-Reply-To: <CANn89iLwy0tsB5wMrREnGSvmPrThyCkjHEz0hpWbCiTJSG0NCA@mail.gmail.com>
On Mon, 15 Jun 2026 09:53:59 -0700 Eric Dumazet wrote:
> > atalk_search_socket() walks the global atalk_sockets list while holding
> > atalk_sockets_lock, but it returns the matching socket after dropping the
> > lock without taking a reference. atalk_rcv() then passes that pointer to
> > sock_queue_rcv_skb().
> >
> > That leaves a race with close(). A concurrent atalk_release() can orphan
> > the socket, remove it from atalk_sockets, and drop the final reference via
> > atalk_destroy_socket(), freeing the socket before atalk_rcv() queues the
> > incoming skb.
> >
> > On a KASAN-enabled kernel this can be reproduced by racing AppleTalk DDP
> > delivery on loopback against close/rebind of the destination DGRAM socket:
> >
> > BUG: KASAN: slab-use-after-free in selinux_socket_sock_rcv_skb()
> > sk_filter_trim_cap()
> > sock_queue_rcv_skb_reason()
> > atalk_rcv()
> > snap_rcv()
> > llc_rcv()
> >
> > Take a reference on the selected socket before dropping
> > atalk_sockets_lock, and put it after sock_queue_rcv_skb() has finished.
> > This keeps the socket alive for the receive path without changing socket
> > lookup semantics. A malformed or racing receive still drops the skb on
> > queueing failure as before.
>
> No idea why linux still carries appletalk.
>
> MacOS dropped it 20 years ago.
Yes. Let me try to move it to mod-orphan.
^ permalink raw reply
* Re: [PATCH net-next] eth: ionic: delete the incorrect link_down_count reporting
From: Creeley, Brett @ 2026-06-15 17:29 UTC (permalink / raw)
To: Breno Leitao, Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
brett.creeley, eric.joyner
In-Reply-To: <ajAye8LnVeqkcGLZ@gmail.com>
On 6/15/2026 10:14 AM, Breno Leitao wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> On Mon, Jun 15, 2026 at 10:01:53AM -0700, Jakub Kicinski wrote:
>> The definition of the statistic is quite clear,
>> struct ethtool_link_ext_stats says:
>>
>> This statistic counts when PHY _actually_ went down, or lost link.
>>
>> Reportedly this is not how the device-counted stat on ionic behaves.
>> The goal is to detect flapping links, due to bad cabling.
>> ionic reportedly uses this for some firmware stat of how many times
>> the traffic was stopped. This is _not_ what should be reported here.
>>
>> Link: https://lore.kernel.org/20260610061830.51037-1-eric.joyner@amd.com
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Acked-by: Breno Leitao <leitao@debian.org>
>
> CC: brett.creeley@amd.com
>> CC: eric.joyner@amd.com
>> CC: leitao@debian.org
>> ---
>> drivers/net/ethernet/pensando/ionic/ionic_ethtool.c | 10 ----------
>> 1 file changed, 10 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
>> index 78a802eb159f..f7dcfe3d032d 100644
>> --- a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
>> +++ b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
>> @@ -111,15 +111,6 @@ static void ionic_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
>> memcpy_fromio(p + offset, idev->dev_cmd_regs->words, size);
>> }
>>
>> -static void ionic_get_link_ext_stats(struct net_device *netdev,
>> - struct ethtool_link_ext_stats *stats)
>> -{
>> - struct ionic_lif *lif = netdev_priv(netdev);
>> -
>> - if (lif->ionic->pdev->is_physfn)
>> - stats->link_down_events = lif->link_down_count;
> It seems this is the only place where link_down_count is read. Maybe you
> want to kill it as well?
It still shows in debugfs, but we'd be fine with it being removed
completely in favor of the correct implementation from Eric at:
https://lore.kernel.org/netdev/20260614205303.48088-5-eric.joyner@amd.com/.
Thanks,
Brett
^ permalink raw reply
* Re: [PATCH net 0/3][pull request] Intel Wired LAN Driver Updates 2026-06-09 (idpf, ixgbe, igc)
From: Tony Nguyen @ 2026-06-15 17:35 UTC (permalink / raw)
To: Paolo Abeni, Simon Horman, Przemyslaw Korba, khai.wen.tan
Cc: davem, kuba, edumazet, andrew+netdev, netdev
In-Reply-To: <7ede2980-e7ac-4311-99af-5cca54841f36@redhat.com>
On 6/13/2026 2:49 AM, Paolo Abeni wrote:
> On 6/12/26 2:03 PM, Simon Horman wrote:
>> On Tue, Jun 09, 2026 at 10:24:53AM -0700, Tony Nguyen wrote:
>>> Przemyslaw adds needed padding to idpf PTP structures to match firmware
>>> expectations.
>>>
>>> Larysa bypasses XPS configuration on XDP queues for ixgbe.
>>>
>>> Khai Wen corrects offset into packet buffer when handling for frame
>>> preemption on igc.
>>
>> Hi Tony, Przemyslaw, and KhaiWenTan,
>>
>> There is AI-generated review of this patch-set available on both
>> https://sashiko.dev and https://netdev-ai.bots.linux.dev/sashiko/
>>
>> I would appreciate it if you could look over the AI-generated review of
>> patches 1 and 3.
>
> Going over such comments looks like pre-existing issues or things better
> handled as follow-up.
For idpf, Przemyslaw is preparing a patch for the uninitialized memory
issue. I don't believe the firmware is a problem, but we are
investigating/testing this.
For igc, KhaiWen is looking into them.
Thanks,
Tony
> /P
>
^ permalink raw reply
* Re: [PATCH net 2/2] net: pse-pd: guard against freed PI data on regulator disable
From: Carlo Szelinsky @ 2026-06-15 18:00 UTC (permalink / raw)
To: Simon Horman
Cc: Oleksij Rempel, Kory Maincent, Andrew Lunn, David S . Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel,
Carlo Szelinsky
In-Reply-To: <20260527122418.2410341-2-horms@kernel.org>
Hi Simon,
Thanks for forwarding this... I'd missed it the first time round.
Both points are valid; answering inline.
> [High]
> Is the pcdev->pi = NULL store correctly synchronized with the readers?
> ...
> Would taking pcdev->lock around the kfree()+NULL store in
> pse_release_pis() (so the NULL observation under pcdev->lock is
> authoritative) close that window? Alternatively, can the regulator
> unregister be ordered to run before pse_release_pis() so no consumer
> can invoke a regulator op while the PI array is being torn down?
Agreed, the guard is not authoritative as it stands. For v2 I'd take
pcdev->lock around the kfree() + pcdev->pi = NULL in pse_release_pis(),
so the NULL observed under the lock is authoritative.
I leaned away from reordering the regulator unregister, because both
the PI regulators and their consumer are devm-bound to pcdev->dev
(devm_regulator_register() and devm_regulator_get_exclusive()), so
reordering means tearing the regulators down by hand in
pse_controller_unregister() instead of letting devres do it, heavier
than a net fix wants. Does the contained fix (lock around the free)
work for you, or would you rather see the reorder?
One thing I'd like your read on for the commit message: the consumer
get is exclusive on pcdev->dev, so I couldn't pin down a concrete
external consumer that calls a regulator op on another CPU during
unbind. Do you have a specific path in mind, or should I describe the
lock as closing a narrow theoretical window rather than a proven race?
I'll add it either way.. I just want to make sure we are aligned :-)
> [High]
> Should the same NULL guard be applied to pse_pi_is_enabled() and
> pse_pi_enable()?
> ...
> Should these two callbacks receive the same guard, or alternatively
> should the unwind order be changed so that no callback can fire after
> the PI array is freed?
Yes. Both follow the same rdev_get_drvdata() -> mutex_lock() ->
unconditional pcdev->pi[id] pattern as the disable path, so for v2 I'll
add the same !pcdev->pi guard to pse_pi_is_enabled() and
pse_pi_enable().
Thanks,
Carlo
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox