Linux Documentation


* Re: [PATCH 1/2] mm/memory-failure: add panic_on_unrecoverable_memory_failure sysctl
From: Breno Leitao @ 2026-03-31 10:25 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: linux-mm, linux-kernel, linux-doc, kernel-team, Naoya Horiguchi,
	Andrew Morton, Jonathan Corbet, Shuah Khan
In-Reply-To: <d8d2a5ad-9b8a-f0e2-3eb0-ee820eb7a148@huawei.com>

Hi Miaohe,

On Tue, Mar 31, 2026 at 10:27:33AM +0800, Miaohe Lin wrote:
> On 2026/3/30 21:45, Breno Leitao wrote:
> > On Mon, Mar 30, 2026 at 03:55:00PM +0800, Miaohe Lin wrote:
> >> On 2026/3/23 23:29, Breno Leitao wrote:
> >>
> >>> @@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
> >>>  	pr_err("%#lx: recovery action for %s: %s\n",
> >>>  		pfn, action_page_types[type], action_name[result]);
> >>>
> >>> +	if (sysctl_panic_on_unrecoverable_mf &&
> >>> +	    type == MF_MSG_GET_HWPOISON && result == MF_IGNORED)
> >>> +		panic("Memory failure: %#lx: unrecoverable page", pfn);
> >>
> >> MF_MSG_GET_HWPOISON covers some other scenarios. For example, an isolated folio will
> >> make get_hwpoison_page return -EIO so we will see MF_MSG_GET_HWPOISON and MF_IGNORED in
> >> action_result. But that's recoverable if the folio is used by userspace, so a panic would be
> >> unacceptable.
> >> Would it be better to check type against MF_MSG_KERNEL_HIGH_ORDER?
> >
> > Yes, I was discussing this with akpm, and maybe the better
> > approach would be to panic for types MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_KERNEL.
> >
> > In both cases, it seems that the page could not be migrated. What do
> > you think about a change like this:
> >
> >
> > @@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
> >         pr_err("%#lx: recovery action for %s: %s\n",
> >                 pfn, action_page_types[type], action_name[result]);
> >
> > +       if (sysctl_panic_on_unrecoverable_mf && result == MF_IGNORED &&
> > +           (type == MF_MSG_KERNEL || type == MF_MSG_KERNEL_HIGH_ORDER))
> > +               panic("Memory failure: %#lx: unrecoverable page", pfn);
> > +
> >         return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
> >  }
> >
>
> Maybe MF_MSG_UNKNOWN can also be considered? Kernel can't do anything further
> for those folios.

Agreed, I'll incorporate that change.

> BTW I think the current code can't reach the MF_MSG_KERNEL and MF_MSG_UNKNOWN cases
> because there is always a (PageHuge() || HWPoisonHandlable()) check before calling
> identify_page_state.

You're absolutely right. I'd like to address this observation as well in the
updated patch.

Thanks,
--breno
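
The condition discussed above, extended with MF_MSG_UNKNOWN, can be sketched as a standalone predicate. This is only an illustration: the enums are reduced stand-ins for the kernel definitions, and mf_should_panic() is a hypothetical helper name that does not appear in the patch.

```c
#include <stdbool.h>

/* Reduced stand-ins for the kernel enums; only values named in the thread. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
	MF_MSG_UNKNOWN,
};

/* Hypothetical helper: true when an ignored failure should trigger a panic. */
static bool mf_should_panic(bool sysctl_enabled, enum mf_action_page_type type,
			    enum mf_result result)
{
	if (!sysctl_enabled || result != MF_IGNORED)
		return false;

	switch (type) {
	case MF_MSG_KERNEL:
	case MF_MSG_KERNEL_HIGH_ORDER:
	case MF_MSG_UNKNOWN:
		/* The kernel cannot do anything further for these pages. */
		return true;
	default:
		/* e.g. MF_MSG_GET_HWPOISON may still be recoverable. */
		return false;
	}
}
```

Keeping the type list in a switch makes it easy to see that MF_MSG_GET_HWPOISON is deliberately excluded.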


* Re: [PATCH v2] bootconfig: Apply early options from embedded config
From: Kiryl Shutsemau @ 2026-03-31 10:18 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Breno Leitao, Jonathan Corbet, Shuah Khan, linux-kernel,
	linux-trace-kernel, linux-doc, oss, paulmck, rostedt, kernel-team
In-Reply-To: <20260325232204.05edbb21c7602b6408ca007b@kernel.org>

On Wed, Mar 25, 2026 at 11:22:04PM +0900, Masami Hiramatsu wrote:
> > diff --git a/init/main.c b/init/main.c
> > index 453ac9dff2da0..14a04c283fa48 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -416,9 +416,64 @@ static int __init warn_bootconfig(char *str)
> >  	return 0;
> >  }
> >  
> > +/*
> > + * do_early_param() is defined later in this file but called from
> > + * bootconfig_apply_early_params() below, so we need a forward declaration.
> > + */
> > +static int __init do_early_param(char *param, char *val,
> > +				 const char *unused, void *arg);
> > +
> > +/*
> > + * bootconfig_apply_early_params - dispatch kernel.* keys from the embedded
> > + * bootconfig as early_param() calls.
> > + *
> > + * early_param() handlers must run before most of the kernel initialises
> > + * (e.g. before the GIC driver reads irqchip.gicv3_pseudo_nmi).  A bootconfig
> > + * attached to the initrd arrives too late for this because the initrd is not
> > + * mapped yet when early params are processed.  The embedded bootconfig lives
> > + * in the kernel image itself (.init.data), so it is always reachable.
> > + *
> > + * This function is called from setup_boot_config() which runs in
> > + * start_kernel() before parse_early_param(), making the timing correct.
> > + */
> > +static void __init bootconfig_apply_early_params(void)
> 
> [sashiko comment]
> | Does this run early enough for architectural parameters?
> | While setup_boot_config() runs before parse_early_param() in start_kernel(),
> | it runs after setup_arch(). setup_boot_config() relies on xbc_init() which
> | uses the memblock allocator, requiring setup_arch() to have already
> | initialized it.
> | However, the kernel expects many early parameters (like mem=, earlycon,
> | noapic, and iommu) to be parsed during setup_arch() via the architecture's
> | call to parse_early_param(). Since setup_arch() completes before
> | setup_boot_config() runs, will these architectural early parameters be
> | silently ignored because the decisions they influence were already
> | finalized?
> 
> This is the major reason that I did not support early parameter
> in bootconfig. Some archs initialize kernel_cmdline in setup_arch()
> and setup early parameters in it.
> To fix this, we need to change setup_arch() for each architecture so
> that it calls this bootconfig_apply_early_params().

Hi Masami,

I have a question about bootconfig design. Is it necessary to parse the
bootconfig at boot time?

We could avoid a lot of complexity if we flattened the bootconfig into a
simple key-value list before embedding it in the kernel image or
attaching it to the initrd. That would eliminate the need for memory
allocations and allow the config to be used earlier during boot.
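
A rough sketch of what such a flattened list could look like, purely as an assumption for illustration: entries are "key=value" strings, each NUL-terminated, with an empty string ending the list. Neither this layout nor flat_cfg_find() exists in the bootconfig code today.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical flattened format: "key=value" entries, each NUL-terminated,
 * with an empty string marking the end of the list. The lookup walks the
 * blob in place and never allocates.
 */
static const char *flat_cfg_find(const char *blob, const char *key)
{
	size_t klen = strlen(key);

	for (const char *p = blob; *p; p += strlen(p) + 1) {
		/* Match the whole key, then require the '=' separator. */
		if (!strncmp(p, key, klen) && p[klen] == '=')
			return p + klen + 1;
	}

	return NULL;
}
```

Since the lookup only reads the blob, it would not depend on memblock being initialized.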

What am I missing?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH v8 2/3] hwmon: ltc4283: Add support for the LTC4283 Swap Controller
From: Nuno Sá @ 2026-03-31  9:48 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Nuno Sá, linux-gpio, linux-hwmon, devicetree, linux-doc,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Jonathan Corbet,
	Linus Walleij, Bartosz Golaszewski
In-Reply-To: <e0c96f38-6742-4b86-8938-64e4e6063119@roeck-us.net>

On Mon, Mar 30, 2026 at 08:47:32AM -0700, Guenter Roeck wrote:
> On 3/30/26 02:28, Nuno Sá wrote:
> > Hi Guenter, Regarding AI review, I think most of the points were
> > discussed in previous revisions, but two of them are valid.
> > 
> > On Fri, Mar 27, 2026 at 05:26:15PM +0000, Nuno Sá wrote:
> > > Support the LTC4283 Hot Swap Controller. The device features programmable
> > > current limit with foldback and independently adjustable inrush current to
> > > optimize the MOSFET safe operating area (SOA). The SOA timer limits MOSFET
> > > temperature rise for reliable protection against overstresses.
> > > 
> > > An I2C interface and onboard ADC allow monitoring of board current,
> > > voltage, power, energy, and fault status.
> > > 
> > > Signed-off-by: Nuno Sá <nuno.sa@analog.com>
> > > ---
> > >   Documentation/hwmon/index.rst   |    1 +
> > >   Documentation/hwmon/ltc4283.rst |  266 ++++++
> > >   MAINTAINERS                     |    1 +
> > >   drivers/hwmon/Kconfig           |   12 +
> > >   drivers/hwmon/Makefile          |    1 +
> > >   drivers/hwmon/ltc4283.c         | 1796 +++++++++++++++++++++++++++++++++++++++
> > >   6 files changed, 2077 insertions(+)
> > > 
> > 
> > ...
> > 
> > > +static int ltc4283_read_in_alarm(struct ltc4283_hwmon *st, u32 channel,
> > > +				 bool max_alm, long *val)
> > > +{
> > > +	if (channel == LTC4283_VPWR)
> > > +		return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_1,
> > > +					  BIT(2 + max_alm), val);
> > > +
> > > +	if (channel >= LTC4283_CHAN_ADI_1 && channel <= LTC4283_CHAN_ADI_4) {
> > > +		u32 bit = (channel - LTC4283_CHAN_ADI_1) * 2;
> > > +		/*
> > > +		 * Lower channels go to higher bits. We also want to go +1 down
> > > +		 * in the min_alarm case.
> > > +		 */
> > > +		return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_2,
> > > +					  BIT(7 - bit - !max_alm), val);
> > > +	}
> > > +
> > > +	if (channel >= LTC4283_CHAN_ADIO_1 && channel <= LTC4283_CHAN_ADIO_4) {
> > > +		u32 bit = (channel - LTC4283_CHAN_ADIO_1) * 2;
> > > +
> > > +		return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_3,
> > > +					  BIT(7 - bit - !max_alm), val);
> > > +	}
> > > +
> > > +	if (channel >= LTC4283_CHAN_ADIN12 && channel <= LTC4283_CHAN_ADIN34) {
> > > +		u32 bit = (channel - LTC4283_CHAN_ADIN12) * 2;
> > > +
> > > +		return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_5,
> > > +					  BIT(7 - bit - !max_alm), val);
> > > +	}
> > 
> > "Will this condition handle the ADIO12 and ADIO34 differential channels?
> > It looks like channels 14 and 15 fall through to the default return intended
> > for the DRAIN channel. Since reading the alarm implicitly clears the register
> > bits, could reading these ADIO alarms unintentionally clear actual DRAIN
> > alarms? Should the upper bound be LTC4283_CHAN_ADIO34?"
> > 
> > Good catch and should be:
> > 
> > -       if (channel >= LTC4283_CHAN_ADIN12 && channel <= LTC4283_CHAN_ADIN34) {
> > +       if (channel >= LTC4283_CHAN_ADIN12 && channel <= LTC4283_CHAN_ADIO34) {
> > 
> > > +
> > > +	if (channel == LTC4283_CHAN_DRNS)
> > > +		return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_4,
> > > +					  BIT(6 + max_alm), val);
> > > +
> > > +	return ltc4283_read_alarm(st, LTC4283_ADC_ALM_LOG_4, BIT(4 + max_alm),
> > > +				  val);
> > > +}
> > 
> > ...
> > 
> > > +
> > > +static int ltc4283_probe(struct i2c_client *client)
> > > +{
> > > +	struct device *dev = &client->dev, *hwmon;
> > > +	struct auxiliary_device *adev;
> > > +	struct ltc4283_hwmon *st;
> > > +	int ret;
> > > +
> > > +	st = devm_kzalloc(dev, sizeof(*st), GFP_KERNEL);
> > > +	if (!st)
> > > +		return -ENOMEM;
> > > +
> > > +	if (!i2c_check_functionality(client->adapter,
> > > +				     I2C_FUNC_SMBUS_BYTE_DATA |
> > > +				     I2C_FUNC_SMBUS_WORD_DATA |
> > > +				     I2C_FUNC_SMBUS_READ_I2C_BLOCK))
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	st->client = client;
> > > +	st->map = devm_regmap_init(dev, &ltc4283_regmap_bus, client,
> > > +				   &ltc4283_regmap_config);
> > > +	if (IS_ERR(st->map))
> > > +		return dev_err_probe(dev, PTR_ERR(st->map),
> > > +				     "Failed to create regmap\n");
> > > +
> > > +	ret = ltc4283_setup(st, dev);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	hwmon = devm_hwmon_device_register_with_info(dev, "ltc4283", st,
> > > +						     &ltc4283_chip_info, NULL);
> > > +
> > > +	if (IS_ERR(hwmon))
> > > +		return PTR_ERR(hwmon);
> > > +
> > > +	ltc4283_debugfs_init(st, client);
> > > +
> > > +	if (!st->gpio_mask)
> > > +		return 0;
> > > +
> > > +	adev = devm_auxiliary_device_create(dev, "gpio", &st->gpio_mask);
> > > +	if (!adev)
> > > +		return dev_err_probe(dev, -ENODEV, "Failed to add GPIO device\n");
> > 
> > "Does this allow multiple LTC4283 chips to probe successfully?
> > Without allocating a unique ID per I2C instance, it seems the first probed
> > chip takes the generic name. If a second chip is present, it might attempt
> > to register with the exact same name, resulting in a failure in device_add()
> > and aborting the probe."
> > 
> > Also looks valid, and I suspect it is one of those where a quick look will
> > find more "offenders". I would propose:
> > 
> > -       adev = devm_auxiliary_device_create(dev, "gpio", &st->gpio_mask);
> > +       adev = __devm_auxiliary_device_create(dev, KBUILD_MODNAME, "gpio",
> > +                                             &st->gpio_mask, client->addr);
> > 
> 
> That would still fail if there are multiple chips at the same I2C address
> on multiple I2C busses. Check drivers/gpu/drm/bridge/ti-sn65dsi86.c which has
> the same problem.

I did look at that one but totally forgot the multiple busses
scenario.

> 
> > If there's nothing else and you agree with the above, is this something
> > you can tweak while applying or should I spin a new version?
> > 
> 
> Please respin. Also, regarding the other concerns:
> 
>   Can BIT(8) * st->rsense wrap to zero on 32-bit architectures?
>   BIT(8) is a 32-bit unsigned long and st->rsense is a u32. If a user sets a
>   very large sense resistor value via the device tree, the multiplication could
>   wrap to 0, causing a division-by-zero kernel panic. Should the divisor use
>   BIT_ULL(8)?
> 
> Unless I am missing something, this _can_ overflow. Try to provide a sense
> resistor value of 1677721600. Yes, it is unreasonable to specify such large
> rsense values, but why not just limit it such that it does not overflow ?

Yes, that's pretty much my reasoning (regarding the unreasonable
rsense). I could just use BIT_ULL() and be done with it. I can also
cap rsense to a max value but I'm not 100% sure what that value would
be. Maybe 1 ohm is already more than reasonable. I can also ask internally. Any
preference on this one?
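
To make the wrap concrete, here is a small userspace sketch (not driver code) that models BIT(8) * rsense on a 32-bit unsigned long with uint32_t and the BIT_ULL(8) variant with uint64_t. The helper names are made up for the example.

```c
#include <stdint.h>

/* 32-bit product, as on an architecture where unsigned long is 32 bits. */
static uint32_t divisor_u32(uint32_t rsense)
{
	return (UINT32_C(1) << 8) * rsense;
}

/* 64-bit product, the equivalent of multiplying by BIT_ULL(8). */
static uint64_t divisor_u64(uint32_t rsense)
{
	return (UINT64_C(1) << 8) * rsense;
}

/* Largest rsense whose 32-bit product with 256 still fits in 32 bits. */
static uint32_t rsense_max_u32(void)
{
	return UINT32_MAX >> 8;
}
```

With rsense = 1677721600 the exact product is 429496729600, which is 100 * 2^32, so the 32-bit version wraps to exactly 0 while the 64-bit version keeps the full value.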

> 
> Also, for the overflow concerns, if you are sure they can not happen, I'll
> really need to write the unit test code to make sure that this is indeed
> the case.
>

Hmm, for the val * MILLI case, well, it should not happen, but given it
depends on user input, it is better if I clamp it before passing the
value to ltc4283_write_in_byte(). Yes, we clamp again inside the
write_bytes() API but that's not a big deal.

For st->power_max, this is again one of those cases where the values would
not make sense (I think - the combination of vsense_max and rsense). Just looking
at the code, it can overflow, but for this one I'm not really sure how we could handle it.
Maybe clamp power_max to U8_MAX and have a warning message in ltc4283_read_power_byte() if
we overflow long, in which case we need a power64 attr?

But even clamping does not make much sense here. The power limit register
is 8 bits, so if our design (rsense + vsense_max) overflows that,
there's nothing we can do other than erroring out.

- Nuno Sá

> Thanks,
> Guenter
> 


* Re: [PATCH v5 2/4] iio: adc: ad4691: add initial driver for AD4691 family
From: Andy Shevchenko @ 2026-03-31  8:58 UTC (permalink / raw)
  To: Sabau, Radu bogdan
  Cc: Andy Shevchenko, Lars-Peter Clausen, Hennerich, Michael,
	Jonathan Cameron, David Lechner, Sa, Nuno, Andy Shevchenko,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
	Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
	linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <LV9PR03MB8414CB05EB794F6974584C2AF753A@LV9PR03MB8414.namprd03.prod.outlook.com>

On Tue, Mar 31, 2026 at 08:36:42AM +0000, Sabau, Radu bogdan wrote:
> > -----Original Message-----
> > From: Andy Shevchenko <andy.shevchenko@gmail.com>
> > Sent: Monday, March 30, 2026 8:24 PM

...

> > > > > +#include <linux/bitfield.h>
> > > > > +#include <linux/bitops.h>
> > > > > +#include <linux/cleanup.h>
> > > > > +#include <linux/delay.h>
> > > > > +#include <linux/device.h>
> > > >
> > > > Hmm... Is it used? Or perhaps you need only
> > > > dev_printk.h
> > > > device/devres.h
> > > > ?
> > 
> > > I have checked this out and it seems device.h doesn't actually need
> > > to be included anyway since spi.h directly includes device.h, and since
> > > this is a SPI driver that's never going away, it's covered. Will drop it!
> > 
> > No, this is the wrong justification. IWYU principle is about exact
> > match between what is used and included in a file (module). spi.h is
> > not dev_*() provider and may not be considered for that.
> > 
> 
> You are right, my justification was incorrect. Under IWYU, relying on
> spi.h's transitive pull of device.h is not valid. However, I think device.h
> is still needed in this case since struct device is used directly in the code
> both as local variables and in the regmap callbacks.

Really? I can't see that.
(Hint: there is a huge difference between using a data type and using a pointer to it.)

> Also dev_err_probe() is called directly and lives in device.h.

No, as I said at the start of my replies, the proper header that provides it is
dev_printk.h.
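
To illustrate the hint about types versus pointers, here is a minimal standalone sketch in plain C (not kernel code). struct demo_state and the demo_*() helpers are hypothetical names; the point is only that storing and passing a pointer needs no more than a forward declaration.

```c
#include <stddef.h>

/* Forward declaration only: the layout of struct device is unknown here. */
struct device;

/* Hypothetical driver state that stores a pointer to the incomplete type. */
struct demo_state {
	struct device *dev;
};

/* Passing and storing the pointer compiles without the full definition. */
static void demo_bind(struct demo_state *st, struct device *dev)
{
	st->dev = dev;
}

/* Comparing the pointer against NULL does not need the definition either. */
static int demo_is_bound(const struct demo_state *st)
{
	return st->dev != NULL;
}
```

Declaring a struct device by value or accessing one of its members would need the complete definition, which is where including the defining header becomes a real requirement.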

> What's your take on this?

-- 
With Best Regards,
Andy Shevchenko




* RE: [PATCH v5 2/4] iio: adc: ad4691: add initial driver for AD4691 family
From: Sabau, Radu bogdan @ 2026-03-31  8:36 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Andy Shevchenko, Lars-Peter Clausen, Hennerich, Michael,
	Jonathan Cameron, David Lechner, Sa, Nuno, Andy Shevchenko,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
	Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pwm@vger.kernel.org,
	linux-gpio@vger.kernel.org, linux-doc@vger.kernel.org
In-Reply-To: <CAHp75VcUCM8aeUpNaFEXnS+Cm08Mq5j+Qp2gYqWP9vCO+9CtQA@mail.gmail.com>



> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Monday, March 30, 2026 8:24 PM
> To: Sabau, Radu bogdan <Radu.Sabau@analog.com>
> 
> ...
> 
> > > > +#include <linux/bitfield.h>
> > > > +#include <linux/bitops.h>
> > > > +#include <linux/cleanup.h>
> > > > +#include <linux/delay.h>
> > > > +#include <linux/device.h>
> > >
> > > Hmm... Is it used? Or perhaps you need only
> > > dev_printk.h
> > > device/devres.h
> > > ?
> 
> > I have checked this out and it seems device.h doesn't actually need
> > to be included anyway since spi.h directly includes device.h, and since
> > this is a SPI driver that's never going away, it's covered. Will drop it!
> 
> No, this is the wrong justification. IWYU principle is about exact
> match between what is used and included in a file (module). spi.h is
> not dev_*() provider and may not be considered for that.
> 

You are right, my justification was incorrect. Under IWYU, relying on
spi.h's transitive pull of device.h is not valid. However, I think device.h
is still needed in this case since struct device is used directly in the code
both as local variables and in the regmap callbacks. Also dev_err_probe()
is called directly and lives in device.h.

What's your take on this?

Best Regards,
Radu


* [PATCH net-next v03 6/6] hinic3: Remove unneeded coalesce parameters
From: Fan Gong @ 2026-03-31  7:56 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <cover.1774940117.git.zhuyikai1@h-partners.com>

  Remove unneeded coalesce parameters in irq handling.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
---
 drivers/net/ethernet/huawei/hinic3/hinic3_irq.c | 6 +-----
 drivers/net/ethernet/huawei/hinic3/hinic3_rx.h  | 3 ---
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c b/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
index d3b3927b5408..42464c007174 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
@@ -156,13 +156,9 @@ static int hinic3_set_interrupt_moder(struct net_device *netdev, u16 q_id,
 	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
 
 	err = hinic3_set_interrupt_cfg(nic_dev->hwdev, info);
-	if (err) {
+	if (err)
 		netdev_err(netdev,
 			   "Failed to modify moderation for Queue: %u\n", q_id);
-	} else {
-		nic_dev->rxqs[q_id].last_coalesc_timer_cfg = coalesc_timer_cfg;
-		nic_dev->rxqs[q_id].last_pending_limit = pending_limit;
-	}
 
 	return err;
 }
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h b/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h
index cd2dcaab6cf7..a64a51d766c5 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h
@@ -112,9 +112,6 @@ struct hinic3_rxq {
 	dma_addr_t             cqe_start_paddr;
 
 	struct dim             dim;
-
-	u8                     last_coalesc_timer_cfg;
-	u8                     last_pending_limit;
 } ____cacheline_aligned;
 
 struct hinic3_dyna_rxq_res {
-- 
2.43.0



* [PATCH net-next v03 3/6] hinic3: Add ethtool coalesce ops
From: Fan Gong @ 2026-03-31  7:56 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <cover.1774940117.git.zhuyikai1@h-partners.com>

  Implement the following ethtool callback functions:
.get_coalesce
.set_coalesce

  These callbacks allow users to utilize ethtool for detailed
RX coalesce configuration and monitoring.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
---
 .../ethernet/huawei/hinic3/hinic3_ethtool.c   | 233 +++++++++++++++++-
 1 file changed, 231 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
index 7fd8ad053c6e..a9599a63696f 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
@@ -17,6 +17,11 @@
 #include "hinic3_nic_cfg.h"
 
 #define HINIC3_MGMT_VERSION_MAX_LEN     32
+/* Coalesce time properties in microseconds */
+#define COALESCE_PENDING_LIMIT_UNIT     8
+#define COALESCE_TIMER_CFG_UNIT         5
+#define COALESCE_MAX_PENDING_LIMIT      (255 * COALESCE_PENDING_LIMIT_UNIT)
+#define COALESCE_MAX_TIMER_CFG          (255 * COALESCE_TIMER_CFG_UNIT)
 
 static void hinic3_get_drvinfo(struct net_device *netdev,
 			       struct ethtool_drvinfo *info)
@@ -986,9 +991,231 @@ static void hinic3_get_pause_stats(struct net_device *netdev,
 	kfree(ps);
 }
 
+static int hinic3_set_queue_coalesce(struct net_device *netdev, u16 q_id,
+				     struct hinic3_intr_coal_info *coal)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_intr_coal_info *intr_coal;
+	struct hinic3_interrupt_info info = {};
+	int err;
+
+	intr_coal = &nic_dev->intr_coalesce[q_id];
+
+	intr_coal->coalesce_timer_cfg = coal->coalesce_timer_cfg;
+	intr_coal->pending_limit = coal->pending_limit;
+	intr_coal->rx_pending_limit_low = coal->rx_pending_limit_low;
+	intr_coal->rx_pending_limit_high = coal->rx_pending_limit_high;
+
+	if (!test_bit(HINIC3_INTF_UP, &nic_dev->flags) ||
+	    q_id >= nic_dev->q_params.num_qps || nic_dev->adaptive_rx_coal)
+		return 0;
+
+	info.msix_index = nic_dev->q_params.irq_cfg[q_id].msix_entry_idx;
+	info.interrupt_coalesc_set = 1;
+	info.coalesc_timer_cfg = intr_coal->coalesce_timer_cfg;
+	info.pending_limit = intr_coal->pending_limit;
+	info.resend_timer_cfg = intr_coal->resend_timer_cfg;
+	err = hinic3_set_interrupt_cfg(nic_dev->hwdev, info);
+	if (err) {
+		netdev_warn(netdev, "Failed to set queue%u coalesce\n", q_id);
+		return err;
+	}
+
+	return 0;
+}
+
+static int is_coalesce_exceed_limit(struct net_device *netdev,
+				    const struct ethtool_coalesce *coal)
+{
+	const struct {
+		const char *name;
+		u32 value;
+		u32 limit;
+	} coalesce_limits[] = {
+		{"rx_coalesce_usecs",
+		 coal->rx_coalesce_usecs,
+		 COALESCE_MAX_TIMER_CFG},
+		{"rx_max_coalesced_frames",
+		 coal->rx_max_coalesced_frames,
+		 COALESCE_MAX_PENDING_LIMIT},
+		{"rx_max_coalesced_frames_low",
+		 coal->rx_max_coalesced_frames_low,
+		 COALESCE_MAX_PENDING_LIMIT},
+		{"rx_max_coalesced_frames_high",
+		 coal->rx_max_coalesced_frames_high,
+		 COALESCE_MAX_PENDING_LIMIT},
+	};
+
+	for (int i = 0; i < ARRAY_SIZE(coalesce_limits); i++) {
+		if (coalesce_limits[i].value > coalesce_limits[i].limit) {
+			netdev_err(netdev, "%s out of range %d-%d\n",
+				   coalesce_limits[i].name, 0,
+				   coalesce_limits[i].limit);
+			return -EOPNOTSUPP;
+		}
+	}
+	return 0;
+}
+
+static int is_coalesce_legal(struct net_device *netdev,
+			     const struct ethtool_coalesce *coal)
+{
+	int err;
+
+	err = is_coalesce_exceed_limit(netdev, coal);
+	if (err)
+		return err;
+
+	if (coal->rx_max_coalesced_frames_low >=
+	    coal->rx_max_coalesced_frames_high &&
+	    coal->rx_max_coalesced_frames_high > 0) {
+		netdev_err(netdev, "invalid coalesce frame high %u, low %u, unit %d\n",
+			   coal->rx_max_coalesced_frames_high,
+			   coal->rx_max_coalesced_frames_low,
+			   COALESCE_PENDING_LIMIT_UNIT);
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static void check_coalesce_align(struct net_device *netdev,
+				 u32 item, u32 unit, const char *str)
+{
+	if (item % unit)
+		netdev_warn(netdev, "%s in %d units, change to %u\n",
+			    str, unit, item - item % unit);
+}
+
+#define CHECK_COALESCE_ALIGN(member, unit) \
+	check_coalesce_align(netdev, member, unit, #member)
+
+static void check_coalesce_changed(struct net_device *netdev,
+				   u32 item, u32 unit, u32 ori_val,
+				   const char *obj_str, const char *str)
+{
+	if ((item / unit) != ori_val)
+		netdev_dbg(netdev, "Change %s from %d to %u %s\n",
+			   str, ori_val * unit, item - item % unit, obj_str);
+}
+
+#define CHECK_COALESCE_CHANGED(member, unit, ori_val, obj_str) \
+	check_coalesce_changed(netdev, member, unit, ori_val, obj_str, #member)
+
+static int hinic3_set_hw_coal_param(struct net_device *netdev,
+				    struct hinic3_intr_coal_info *intr_coal)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	int err;
+	u16 i;
+
+	for (i = 0; i < nic_dev->max_qps; i++) {
+		err = hinic3_set_queue_coalesce(netdev, i, intr_coal);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int hinic3_get_coalesce(struct net_device *netdev,
+			       struct ethtool_coalesce *coal,
+			       struct kernel_ethtool_coalesce *kernel_coal,
+			       struct netlink_ext_ack *extack)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_intr_coal_info *interrupt_info;
+
+	interrupt_info = &nic_dev->intr_coalesce[0];
+
+	/* TX/RX uses the same interrupt.
+	 * So we only declare RX ethtool_coalesce parameters.
+	 */
+	coal->rx_coalesce_usecs = interrupt_info->coalesce_timer_cfg *
+				  COALESCE_TIMER_CFG_UNIT;
+	coal->rx_max_coalesced_frames = interrupt_info->pending_limit *
+					COALESCE_PENDING_LIMIT_UNIT;
+
+	coal->use_adaptive_rx_coalesce = nic_dev->adaptive_rx_coal;
+
+	coal->rx_max_coalesced_frames_high =
+		interrupt_info->rx_pending_limit_high *
+		COALESCE_PENDING_LIMIT_UNIT;
+
+	coal->rx_max_coalesced_frames_low =
+		interrupt_info->rx_pending_limit_low *
+		COALESCE_PENDING_LIMIT_UNIT;
+
+	return 0;
+}
+
+static int hinic3_set_coalesce(struct net_device *netdev,
+			       struct ethtool_coalesce *coal,
+			       struct kernel_ethtool_coalesce *kernel_coal,
+			       struct netlink_ext_ack *extack)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_intr_coal_info *ori_intr_coal;
+	struct hinic3_intr_coal_info intr_coal = {};
+	char obj_str[32];
+	int err;
+
+	err = is_coalesce_legal(netdev, coal);
+	if (err)
+		return err;
+
+	CHECK_COALESCE_ALIGN(coal->rx_coalesce_usecs, COALESCE_TIMER_CFG_UNIT);
+	CHECK_COALESCE_ALIGN(coal->rx_max_coalesced_frames,
+			     COALESCE_PENDING_LIMIT_UNIT);
+	CHECK_COALESCE_ALIGN(coal->rx_max_coalesced_frames_high,
+			     COALESCE_PENDING_LIMIT_UNIT);
+	CHECK_COALESCE_ALIGN(coal->rx_max_coalesced_frames_low,
+			     COALESCE_PENDING_LIMIT_UNIT);
+
+	ori_intr_coal = &nic_dev->intr_coalesce[0];
+	snprintf(obj_str, sizeof(obj_str), "for netdev");
+
+	CHECK_COALESCE_CHANGED(coal->rx_coalesce_usecs, COALESCE_TIMER_CFG_UNIT,
+			       ori_intr_coal->coalesce_timer_cfg, obj_str);
+	CHECK_COALESCE_CHANGED(coal->rx_max_coalesced_frames,
+			       COALESCE_PENDING_LIMIT_UNIT,
+			       ori_intr_coal->pending_limit, obj_str);
+	CHECK_COALESCE_CHANGED(coal->rx_max_coalesced_frames_high,
+			       COALESCE_PENDING_LIMIT_UNIT,
+			       ori_intr_coal->rx_pending_limit_high, obj_str);
+	CHECK_COALESCE_CHANGED(coal->rx_max_coalesced_frames_low,
+			       COALESCE_PENDING_LIMIT_UNIT,
+			       ori_intr_coal->rx_pending_limit_low, obj_str);
+
+	intr_coal.coalesce_timer_cfg =
+		(u8)(coal->rx_coalesce_usecs / COALESCE_TIMER_CFG_UNIT);
+	intr_coal.pending_limit = (u8)(coal->rx_max_coalesced_frames /
+				      COALESCE_PENDING_LIMIT_UNIT);
+
+	nic_dev->adaptive_rx_coal = coal->use_adaptive_rx_coalesce;
+
+	intr_coal.rx_pending_limit_high =
+		(u8)(coal->rx_max_coalesced_frames_high /
+		     COALESCE_PENDING_LIMIT_UNIT);
+
+	intr_coal.rx_pending_limit_low =
+		(u8)(coal->rx_max_coalesced_frames_low /
+		     COALESCE_PENDING_LIMIT_UNIT);
+
+	/* coalesce timer or pending set to zero will disable coalesce */
+	if (!nic_dev->adaptive_rx_coal &&
+	    (!intr_coal.coalesce_timer_cfg || !intr_coal.pending_limit))
+		netdev_warn(netdev, "Coalesce will be disabled\n");
+
+	return hinic3_set_hw_coal_param(netdev, &intr_coal);
+}
+
 static const struct ethtool_ops hinic3_ethtool_ops = {
-	.supported_coalesce_params      = ETHTOOL_COALESCE_USECS |
-					  ETHTOOL_COALESCE_PKT_RATE_RX_USECS,
+	.supported_coalesce_params      = ETHTOOL_COALESCE_RX_USECS |
+					  ETHTOOL_COALESCE_RX_MAX_FRAMES |
+					  ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
+					  ETHTOOL_COALESCE_RX_MAX_FRAMES_LOW |
+					  ETHTOOL_COALESCE_RX_MAX_FRAMES_HIGH,
 	.get_link_ksettings             = hinic3_get_link_ksettings,
 	.get_drvinfo                    = hinic3_get_drvinfo,
 	.get_msglevel                   = hinic3_get_msglevel,
@@ -1004,6 +1231,8 @@ static const struct ethtool_ops hinic3_ethtool_ops = {
 	.get_eth_ctrl_stats             = hinic3_get_eth_ctrl_stats,
 	.get_rmon_stats                 = hinic3_get_rmon_stats,
 	.get_pause_stats                = hinic3_get_pause_stats,
+	.get_coalesce                   = hinic3_get_coalesce,
+	.set_coalesce                   = hinic3_set_coalesce,
 };
 
 void hinic3_set_ethtool_ops(struct net_device *netdev)
-- 
2.43.0
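
The unit conversion used by the set/get callbacks in the patch above can be sketched in isolation. The two unit macros are copied from the patch; coalesce_to_hw() and coalesce_from_hw() are hypothetical helper names used only for this userspace illustration.

```c
#include <stdint.h>

#define COALESCE_PENDING_LIMIT_UNIT 8
#define COALESCE_TIMER_CFG_UNIT     5

/* Convert a user value to the 8-bit register encoding (rounds down). */
static uint8_t coalesce_to_hw(uint32_t user_val, uint32_t unit)
{
	return (uint8_t)(user_val / unit);
}

/* Convert the register encoding back to the value reported to the user. */
static uint32_t coalesce_from_hw(uint8_t hw_val, uint32_t unit)
{
	return (uint32_t)hw_val * unit;
}
```

For example, a request of 23 usecs is stored as 4 and read back as 20, which is the rounding the alignment warning in the patch reports. The maximum accepted values, 255 * 5 = 1275 usecs and 255 * 8 = 2040 frames, both encode to 255.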



* [PATCH net-next v03 2/6] hinic3: Add ethtool statistic ops
From: Fan Gong @ 2026-03-31  7:56 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <cover.1774940117.git.zhuyikai1@h-partners.com>

  Add PF/VF statistics functions in TX and RX processing.
  Implement the following ethtool callback functions:
.get_sset_count
.get_ethtool_stats
.get_strings
.get_eth_phy_stats
.get_eth_mac_stats
.get_eth_ctrl_stats
.get_rmon_stats
.get_pause_stats

  These callbacks allow users to utilize ethtool for detailed
TX and RX netdev stats monitoring.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
---
 .../ethernet/huawei/hinic3/hinic3_ethtool.c   | 493 ++++++++++++++++++
 .../ethernet/huawei/hinic3/hinic3_hw_intf.h   |  13 +-
 .../net/ethernet/huawei/hinic3/hinic3_main.c  |   1 +
 .../huawei/hinic3/hinic3_mgmt_interface.h     |  37 ++
 .../ethernet/huawei/hinic3/hinic3_nic_cfg.c   |  64 +++
 .../ethernet/huawei/hinic3/hinic3_nic_cfg.h   | 109 ++++
 .../ethernet/huawei/hinic3/hinic3_nic_dev.h   |   8 +
 .../net/ethernet/huawei/hinic3/hinic3_rx.c    |  59 ++-
 .../net/ethernet/huawei/hinic3/hinic3_rx.h    |  14 +
 .../net/ethernet/huawei/hinic3/hinic3_tx.c    |  80 ++-
 .../net/ethernet/huawei/hinic3/hinic3_tx.h    |   2 +
 11 files changed, 871 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
index d78aff802a20..7fd8ad053c6e 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
@@ -501,6 +501,491 @@ static int hinic3_set_ringparam(struct net_device *netdev,
 	return 0;
 }
 
+struct hinic3_stats {
+	char name[ETH_GSTRING_LEN];
+	u32  size;
+	int  offset;
+};
+
+#define HINIC3_NIC_STAT(_stat_item) { \
+	.name   = #_stat_item, \
+	.size   = sizeof_field(struct hinic3_nic_stats, _stat_item), \
+	.offset = offsetof(struct hinic3_nic_stats, _stat_item) \
+}
+
+#define HINIC3_RXQ_STAT(_stat_item) { \
+	.name   = "rxq%d_"#_stat_item, \
+	.size   = sizeof_field(struct hinic3_rxq_stats, _stat_item), \
+	.offset = offsetof(struct hinic3_rxq_stats, _stat_item) \
+}
+
+#define HINIC3_TXQ_STAT(_stat_item) { \
+	.name   = "txq%d_"#_stat_item, \
+	.size   = sizeof_field(struct hinic3_txq_stats, _stat_item), \
+	.offset = offsetof(struct hinic3_txq_stats, _stat_item) \
+}
+
+static struct hinic3_stats hinic3_rx_queue_stats[] = {
+	HINIC3_RXQ_STAT(csum_errors),
+	HINIC3_RXQ_STAT(other_errors),
+	HINIC3_RXQ_STAT(rx_buf_empty),
+	HINIC3_RXQ_STAT(alloc_skb_err),
+	HINIC3_RXQ_STAT(alloc_rx_buf_err),
+	HINIC3_RXQ_STAT(restore_drop_sge),
+};
+
+static struct hinic3_stats hinic3_tx_queue_stats[] = {
+	HINIC3_TXQ_STAT(busy),
+	HINIC3_TXQ_STAT(skb_pad_err),
+	HINIC3_TXQ_STAT(frag_len_overflow),
+	HINIC3_TXQ_STAT(offload_cow_skb_err),
+	HINIC3_TXQ_STAT(map_frag_err),
+	HINIC3_TXQ_STAT(unknown_tunnel_pkt),
+	HINIC3_TXQ_STAT(frag_size_err),
+};
+
+#define HINIC3_FUNC_STAT(_stat_item) {	\
+	.name   = #_stat_item, \
+	.size   = sizeof_field(struct l2nic_vport_stats, _stat_item), \
+	.offset = offsetof(struct l2nic_vport_stats, _stat_item) \
+}
+
+static struct hinic3_stats hinic3_function_stats[] = {
+	HINIC3_FUNC_STAT(tx_unicast_pkts_vport),
+	HINIC3_FUNC_STAT(tx_unicast_bytes_vport),
+	HINIC3_FUNC_STAT(tx_multicast_pkts_vport),
+	HINIC3_FUNC_STAT(tx_multicast_bytes_vport),
+	HINIC3_FUNC_STAT(tx_broadcast_pkts_vport),
+	HINIC3_FUNC_STAT(tx_broadcast_bytes_vport),
+
+	HINIC3_FUNC_STAT(rx_unicast_pkts_vport),
+	HINIC3_FUNC_STAT(rx_unicast_bytes_vport),
+	HINIC3_FUNC_STAT(rx_multicast_pkts_vport),
+	HINIC3_FUNC_STAT(rx_multicast_bytes_vport),
+	HINIC3_FUNC_STAT(rx_broadcast_pkts_vport),
+	HINIC3_FUNC_STAT(rx_broadcast_bytes_vport),
+
+	HINIC3_FUNC_STAT(tx_discard_vport),
+	HINIC3_FUNC_STAT(rx_discard_vport),
+	HINIC3_FUNC_STAT(tx_err_vport),
+	HINIC3_FUNC_STAT(rx_err_vport),
+};
+
+#define HINIC3_PORT_STAT(_stat_item) { \
+	.name   = #_stat_item, \
+	.size   = sizeof_field(struct mag_cmd_port_stats, _stat_item), \
+	.offset = offsetof(struct mag_cmd_port_stats, _stat_item) \
+}
+
+static struct hinic3_stats hinic3_port_stats[] = {
+	HINIC3_PORT_STAT(mac_tx_fragment_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_undersize_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_undermin_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_1519_max_bad_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_1519_max_good_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_oversize_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_jabber_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_bad_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_bad_oct_num),
+	HINIC3_PORT_STAT(mac_tx_good_oct_num),
+	HINIC3_PORT_STAT(mac_tx_total_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_uni_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri0_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri1_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri2_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri3_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri4_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri5_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri6_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_pfc_pri7_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_err_all_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_from_app_good_pkt_num),
+	HINIC3_PORT_STAT(mac_tx_from_app_bad_pkt_num),
+
+	HINIC3_PORT_STAT(mac_rx_undermin_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_1519_max_bad_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_1519_max_good_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_bad_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_bad_oct_num),
+	HINIC3_PORT_STAT(mac_rx_good_oct_num),
+	HINIC3_PORT_STAT(mac_rx_total_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_uni_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri0_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri1_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri2_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri3_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri4_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri5_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri6_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_pfc_pri7_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_send_app_good_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_send_app_bad_pkt_num),
+	HINIC3_PORT_STAT(mac_rx_unfilter_pkt_num),
+};
+
+static int hinic3_get_sset_count(struct net_device *netdev, int sset)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	int count, q_num;
+
+	switch (sset) {
+	case ETH_SS_STATS:
+		q_num = nic_dev->q_params.num_qps;
+		count = ARRAY_SIZE(hinic3_function_stats) +
+			(ARRAY_SIZE(hinic3_tx_queue_stats) +
+			 ARRAY_SIZE(hinic3_rx_queue_stats)) *
+			q_num;
+
+		if (!HINIC3_IS_VF(nic_dev->hwdev))
+			count += ARRAY_SIZE(hinic3_port_stats);
+
+		return count;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static u64 get_val_of_ptr(u32 size, const void *ptr)
+{
+	u64 ret = size == sizeof(u64) ? *(const u64 *)ptr :
+		  size == sizeof(u32) ? *(const u32 *)ptr :
+		  size == sizeof(u16) ? *(const u16 *)ptr :
+		  *(const u8 *)ptr;
+
+	return ret;
+}
+
+static void hinic3_get_drv_queue_stats(struct net_device *netdev, u64 *data)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_txq_stats txq_stats = {};
+	struct hinic3_rxq_stats rxq_stats = {};
+	u16 i = 0, j, qid;
+	char *p;
+
+	u64_stats_init(&txq_stats.syncp);
+	u64_stats_init(&rxq_stats.syncp);
+
+	for (qid = 0; qid < nic_dev->q_params.num_qps; qid++) {
+		if (!nic_dev->txqs)
+			break;
+
+		hinic3_txq_get_stats(&nic_dev->txqs[qid], &txq_stats);
+		for (j = 0; j < ARRAY_SIZE(hinic3_tx_queue_stats); j++, i++) {
+			p = (char *)&txq_stats +
+			    hinic3_tx_queue_stats[j].offset;
+			data[i] = get_val_of_ptr(hinic3_tx_queue_stats[j].size,
+						 p);
+		}
+	}
+
+	for (qid = 0; qid < nic_dev->q_params.num_qps; qid++) {
+		if (!nic_dev->rxqs)
+			break;
+
+		hinic3_rxq_get_stats(&nic_dev->rxqs[qid], &rxq_stats);
+		for (j = 0; j < ARRAY_SIZE(hinic3_rx_queue_stats); j++, i++) {
+			p = (char *)&rxq_stats +
+			    hinic3_rx_queue_stats[j].offset;
+			data[i] = get_val_of_ptr(hinic3_rx_queue_stats[j].size,
+						 p);
+		}
+	}
+}
+
+static u16 hinic3_get_ethtool_port_stats(struct net_device *netdev, u64 *data)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct mag_cmd_port_stats *ps;
+	u16 i = 0, j;
+	char *p;
+	int err;
+
+	ps = kmalloc_obj(*ps);
+	if (!ps)
+		goto err_zero_stats;
+
+	err = hinic3_get_phy_port_stats(nic_dev->hwdev, ps);
+	if (err) {
+		kfree(ps);
+		netdev_err(netdev, "Failed to get port stats from fw\n");
+		goto err_zero_stats;
+	}
+
+	for (j = 0; j < ARRAY_SIZE(hinic3_port_stats); j++, i++) {
+		p = (char *)ps + hinic3_port_stats[j].offset;
+		data[i] = get_val_of_ptr(hinic3_port_stats[j].size, p);
+	}
+
+	kfree(ps);
+
+	return i;
+
+err_zero_stats:
+	memset(&data[i], 0, ARRAY_SIZE(hinic3_port_stats) * sizeof(*data));
+
+	return i + ARRAY_SIZE(hinic3_port_stats);
+}
+
+static void hinic3_get_ethtool_stats(struct net_device *netdev,
+				     struct ethtool_stats *stats, u64 *data)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct l2nic_vport_stats vport_stats = {};
+	u16 i = 0, j;
+	char *p;
+	int err;
+
+	err = hinic3_get_vport_stats(nic_dev->hwdev,
+				     hinic3_global_func_id(nic_dev->hwdev),
+				     &vport_stats);
+	if (err)
+		netdev_err(netdev, "Failed to get function stats from fw\n");
+
+	for (j = 0; j < ARRAY_SIZE(hinic3_function_stats); j++, i++) {
+		p = (char *)&vport_stats + hinic3_function_stats[j].offset;
+		data[i] = get_val_of_ptr(hinic3_function_stats[j].size, p);
+	}
+
+	if (!HINIC3_IS_VF(nic_dev->hwdev))
+		i += hinic3_get_ethtool_port_stats(netdev, data + i);
+
+	hinic3_get_drv_queue_stats(netdev, data + i);
+}
+
+static u16 hinic3_get_hw_stats_strings(struct net_device *netdev, char *p)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	u16 i, cnt = 0;
+
+	for (i = 0; i < ARRAY_SIZE(hinic3_function_stats); i++) {
+		memcpy(p, hinic3_function_stats[i].name, ETH_GSTRING_LEN);
+		p += ETH_GSTRING_LEN;
+		cnt++;
+	}
+
+	if (!HINIC3_IS_VF(nic_dev->hwdev)) {
+		for (i = 0; i < ARRAY_SIZE(hinic3_port_stats); i++) {
+			memcpy(p, hinic3_port_stats[i].name, ETH_GSTRING_LEN);
+			p += ETH_GSTRING_LEN;
+			cnt++;
+		}
+	}
+
+	return cnt;
+}
+
+static void hinic3_get_qp_stats_strings(const struct net_device *netdev,
+					char *p)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	u8 *data = p;
+	u16 i, j;
+
+	for (i = 0; i < nic_dev->q_params.num_qps; i++) {
+		for (j = 0; j < ARRAY_SIZE(hinic3_tx_queue_stats); j++)
+			ethtool_sprintf(&data,
+					hinic3_tx_queue_stats[j].name, i);
+	}
+
+	for (i = 0; i < nic_dev->q_params.num_qps; i++) {
+		for (j = 0; j < ARRAY_SIZE(hinic3_rx_queue_stats); j++)
+			ethtool_sprintf(&data,
+					hinic3_rx_queue_stats[j].name, i);
+	}
+}
+
+static void hinic3_get_strings(struct net_device *netdev,
+			       u32 stringset, u8 *data)
+{
+	char *p = (char *)data;
+	u16 offset;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		offset = hinic3_get_hw_stats_strings(netdev, p);
+		hinic3_get_qp_stats_strings(netdev,
+					    p + offset * ETH_GSTRING_LEN);
+
+		return;
+	default:
+		netdev_err(netdev, "Invalid string set %u.\n", stringset);
+		return;
+	}
+}
+
+static void hinic3_get_eth_phy_stats(struct net_device *netdev,
+				     struct ethtool_eth_phy_stats *phy_stats)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct mag_cmd_port_stats *ps;
+	int err;
+
+	ps = kmalloc_obj(*ps);
+	if (!ps)
+		return;
+
+	err = hinic3_get_phy_port_stats(nic_dev->hwdev, ps);
+	if (err) {
+		kfree(ps);
+		netdev_err(netdev, "Failed to get eth phy stats from fw\n");
+		return;
+	}
+
+	phy_stats->SymbolErrorDuringCarrier = ps->mac_rx_sym_err_pkt_num;
+
+	kfree(ps);
+}
+
+static void hinic3_get_eth_mac_stats(struct net_device *netdev,
+				     struct ethtool_eth_mac_stats *mac_stats)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct mag_cmd_port_stats *ps;
+	int err;
+
+	ps = kmalloc_obj(*ps);
+	if (!ps)
+		return;
+
+	err = hinic3_get_phy_port_stats(nic_dev->hwdev, ps);
+	if (err) {
+		kfree(ps);
+		netdev_err(netdev, "Failed to get eth mac stats from fw\n");
+		return;
+	}
+
+	mac_stats->FramesTransmittedOK = ps->mac_tx_good_pkt_num;
+	mac_stats->FramesReceivedOK = ps->mac_rx_good_pkt_num;
+	mac_stats->FrameCheckSequenceErrors = ps->mac_rx_fcs_err_pkt_num;
+	mac_stats->OctetsTransmittedOK = ps->mac_tx_total_oct_num;
+	mac_stats->OctetsReceivedOK = ps->mac_rx_total_oct_num;
+	mac_stats->MulticastFramesXmittedOK = ps->mac_tx_multi_pkt_num;
+	mac_stats->BroadcastFramesXmittedOK = ps->mac_tx_broad_pkt_num;
+	mac_stats->MulticastFramesReceivedOK = ps->mac_rx_multi_pkt_num;
+	mac_stats->BroadcastFramesReceivedOK = ps->mac_rx_broad_pkt_num;
+
+	kfree(ps);
+}
+
+static void hinic3_get_eth_ctrl_stats(struct net_device *netdev,
+				      struct ethtool_eth_ctrl_stats *ctrl_stats)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct mag_cmd_port_stats *ps;
+	int err;
+
+	ps = kmalloc_obj(*ps);
+	if (!ps)
+		return;
+
+	err = hinic3_get_phy_port_stats(nic_dev->hwdev, ps);
+	if (err) {
+		kfree(ps);
+		netdev_err(netdev, "Failed to get eth ctrl stats from fw\n");
+		return;
+	}
+
+	ctrl_stats->MACControlFramesTransmitted = ps->mac_tx_control_pkt_num;
+	ctrl_stats->MACControlFramesReceived = ps->mac_rx_control_pkt_num;
+
+	kfree(ps);
+}
+
+static const struct ethtool_rmon_hist_range hinic3_rmon_ranges[] = {
+	{     0,    64 },
+	{    65,   127 },
+	{   128,   255 },
+	{   256,   511 },
+	{   512,  1023 },
+	{  1024,  1518 },
+	{  1519,  2047 },
+	{  2048,  4095 },
+	{  4096,  8191 },
+	{  8192,  9216 },
+	{  9217, 12287 },
+	{}
+};
+
+static void hinic3_get_rmon_stats(struct net_device *netdev,
+				  struct ethtool_rmon_stats *rmon_stats,
+				  const struct ethtool_rmon_hist_range **ranges)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct mag_cmd_port_stats *ps;
+	int err;
+
+	ps = kmalloc_obj(*ps);
+	if (!ps)
+		return;
+
+	err = hinic3_get_phy_port_stats(nic_dev->hwdev, ps);
+	if (err) {
+		kfree(ps);
+		netdev_err(netdev, "Failed to get eth rmon stats from fw\n");
+		return;
+	}
+
+	rmon_stats->undersize_pkts	= ps->mac_rx_undersize_pkt_num;
+	rmon_stats->oversize_pkts	= ps->mac_rx_oversize_pkt_num;
+	rmon_stats->fragments		= ps->mac_rx_fragment_pkt_num;
+	rmon_stats->jabbers		= ps->mac_rx_jabber_pkt_num;
+
+	rmon_stats->hist[0]		= ps->mac_rx_64_oct_pkt_num;
+	rmon_stats->hist[1]		= ps->mac_rx_65_127_oct_pkt_num;
+	rmon_stats->hist[2]		= ps->mac_rx_128_255_oct_pkt_num;
+	rmon_stats->hist[3]		= ps->mac_rx_256_511_oct_pkt_num;
+	rmon_stats->hist[4]		= ps->mac_rx_512_1023_oct_pkt_num;
+	rmon_stats->hist[5]		= ps->mac_rx_1024_1518_oct_pkt_num;
+	rmon_stats->hist[6]		= ps->mac_rx_1519_2047_oct_pkt_num;
+	rmon_stats->hist[7]		= ps->mac_rx_2048_4095_oct_pkt_num;
+	rmon_stats->hist[8]		= ps->mac_rx_4096_8191_oct_pkt_num;
+	rmon_stats->hist[9]		= ps->mac_rx_8192_9216_oct_pkt_num;
+	rmon_stats->hist[10]		= ps->mac_rx_9217_12287_oct_pkt_num;
+
+	rmon_stats->hist_tx[0]		= ps->mac_tx_64_oct_pkt_num;
+	rmon_stats->hist_tx[1]		= ps->mac_tx_65_127_oct_pkt_num;
+	rmon_stats->hist_tx[2]		= ps->mac_tx_128_255_oct_pkt_num;
+	rmon_stats->hist_tx[3]		= ps->mac_tx_256_511_oct_pkt_num;
+	rmon_stats->hist_tx[4]		= ps->mac_tx_512_1023_oct_pkt_num;
+	rmon_stats->hist_tx[5]		= ps->mac_tx_1024_1518_oct_pkt_num;
+	rmon_stats->hist_tx[6]		= ps->mac_tx_1519_2047_oct_pkt_num;
+	rmon_stats->hist_tx[7]		= ps->mac_tx_2048_4095_oct_pkt_num;
+	rmon_stats->hist_tx[8]		= ps->mac_tx_4096_8191_oct_pkt_num;
+	rmon_stats->hist_tx[9]		= ps->mac_tx_8192_9216_oct_pkt_num;
+	rmon_stats->hist_tx[10]		= ps->mac_tx_9217_12287_oct_pkt_num;
+
+	*ranges = hinic3_rmon_ranges;
+
+	kfree(ps);
+}
+
+static void hinic3_get_pause_stats(struct net_device *netdev,
+				   struct ethtool_pause_stats *pause_stats)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct mag_cmd_port_stats *ps;
+	int err;
+
+	ps = kmalloc_obj(*ps);
+	if (!ps)
+		return;
+
+	err = hinic3_get_phy_port_stats(nic_dev->hwdev, ps);
+	if (err) {
+		kfree(ps);
+		netdev_err(netdev, "Failed to get eth pause stats from fw\n");
+		return;
+	}
+
+	pause_stats->tx_pause_frames = ps->mac_tx_pause_num;
+	pause_stats->rx_pause_frames = ps->mac_rx_pause_num;
+
+	kfree(ps);
+}
+
 static const struct ethtool_ops hinic3_ethtool_ops = {
 	.supported_coalesce_params      = ETHTOOL_COALESCE_USECS |
 					  ETHTOOL_COALESCE_PKT_RATE_RX_USECS,
@@ -511,6 +996,14 @@ static const struct ethtool_ops hinic3_ethtool_ops = {
 	.get_link                       = ethtool_op_get_link,
 	.get_ringparam                  = hinic3_get_ringparam,
 	.set_ringparam                  = hinic3_set_ringparam,
+	.get_sset_count                 = hinic3_get_sset_count,
+	.get_ethtool_stats              = hinic3_get_ethtool_stats,
+	.get_strings                    = hinic3_get_strings,
+	.get_eth_phy_stats              = hinic3_get_eth_phy_stats,
+	.get_eth_mac_stats              = hinic3_get_eth_mac_stats,
+	.get_eth_ctrl_stats             = hinic3_get_eth_ctrl_stats,
+	.get_rmon_stats                 = hinic3_get_rmon_stats,
+	.get_pause_stats                = hinic3_get_pause_stats,
 };
 
 void hinic3_set_ethtool_ops(struct net_device *netdev)
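Reviewer note, not part of the patch: ethtool pairs the names from
.get_strings with the values from .get_ethtool_stats purely by index, so
both callbacks must emit the per-queue entries in the same order (all TX
queues first, then all RX queues) and .get_sset_count must return the
same total. The user-space sketch below shows that layout with
fixed-width name slots. The demo_* names are hypothetical, only four of
the counters are listed, and snprintf stands in for ethtool_sprintf.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_GSTRING_LEN 32	/* same value as the kernel's ETH_GSTRING_LEN */
#define DEMO_N(a) (sizeof(a) / sizeof((a)[0]))

/* Per-queue name templates; %d is replaced by the queue index. */
static const char *const demo_txq_fmt[] = {
	"txq%d_busy",
	"txq%d_map_frag_err",
};

static const char *const demo_rxq_fmt[] = {
	"rxq%d_csum_errors",
	"rxq%d_rx_buf_empty",
};

/* Number of per-queue entries; must match what the fill routine emits. */
size_t demo_string_count(unsigned int num_qps)
{
	return (DEMO_N(demo_txq_fmt) + DEMO_N(demo_rxq_fmt)) * num_qps;
}

/* Write one NUL-terminated name per 32-byte slot; returns slots written. */
size_t demo_fill_strings(char *buf, unsigned int num_qps)
{
	char *p = buf;
	unsigned int q;
	size_t j;

	/* All TX queues first ... */
	for (q = 0; q < num_qps; q++)
		for (j = 0; j < DEMO_N(demo_txq_fmt); j++, p += DEMO_GSTRING_LEN)
			snprintf(p, DEMO_GSTRING_LEN, demo_txq_fmt[j], (int)q);

	/* ... then all RX queues, mirroring the order of the values. */
	for (q = 0; q < num_qps; q++)
		for (j = 0; j < DEMO_N(demo_rxq_fmt); j++, p += DEMO_GSTRING_LEN)
			snprintf(p, DEMO_GSTRING_LEN, demo_rxq_fmt[j], (int)q);

	return (size_t)(p - buf) / DEMO_GSTRING_LEN;
}
```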
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_hw_intf.h b/drivers/net/ethernet/huawei/hinic3/hinic3_hw_intf.h
index cfc9daa3034f..0b2ebef04c02 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_hw_intf.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_hw_intf.h
@@ -51,7 +51,18 @@ static inline void mgmt_msg_params_init_default(struct mgmt_msg_params *msg_para
 	msg_params->in_size = buf_size;
 	msg_params->expected_out_size = buf_size;
 	msg_params->timeout_ms = 0;
 }
+
+static inline void
+mgmt_msg_params_init_in_out(struct mgmt_msg_params *msg_params, void *in_buf,
+			    void *out_buf, u32 in_buf_size, u32 out_buf_size)
+{
+	msg_params->buf_in = in_buf;
+	msg_params->buf_out = out_buf;
+	msg_params->in_size = in_buf_size;
+	msg_params->expected_out_size = out_buf_size;
+	msg_params->timeout_ms = 0;
+}
 
 enum cfg_cmd {
 	CFG_CMD_GET_DEV_CAP = 0,
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
index 3b470978714a..60834f8dffcd 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
@@ -153,6 +153,7 @@ static int hinic3_init_nic_dev(struct net_device *netdev,
 		return -ENOMEM;
 
 	nic_dev->nic_svc_cap = hwdev->cfg_mgmt->cap.nic_svc_cap;
+	u64_stats_init(&nic_dev->stats.syncp);
 
 	nic_dev->workq = create_singlethread_workqueue(HINIC3_NIC_DEV_WQ_NAME);
 	if (!nic_dev->workq) {
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h b/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h
index c5bca3c4af96..76c691f82703 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h
@@ -143,6 +143,41 @@ struct l2nic_cmd_set_dcb_state {
 	u8                   rsvd[7];
 };
 
+struct l2nic_port_stats_info {
+	struct mgmt_msg_head msg_head;
+	u16                  func_id;
+	u16                  rsvd1;
+};
+
+struct l2nic_vport_stats {
+	u64 tx_unicast_pkts_vport;
+	u64 tx_unicast_bytes_vport;
+	u64 tx_multicast_pkts_vport;
+	u64 tx_multicast_bytes_vport;
+	u64 tx_broadcast_pkts_vport;
+	u64 tx_broadcast_bytes_vport;
+
+	u64 rx_unicast_pkts_vport;
+	u64 rx_unicast_bytes_vport;
+	u64 rx_multicast_pkts_vport;
+	u64 rx_multicast_bytes_vport;
+	u64 rx_broadcast_pkts_vport;
+	u64 rx_broadcast_bytes_vport;
+
+	u64 tx_discard_vport;
+	u64 rx_discard_vport;
+	u64 tx_err_vport;
+	u64 rx_err_vport;
+};
+
+struct l2nic_cmd_vport_stats {
+	struct mgmt_msg_head     msg_head;
+	u32                      stats_size;
+	u32                      rsvd1;
+	struct l2nic_vport_stats stats;
+	u64                      rsvd2[6];
+};
+
 struct l2nic_cmd_lro_config {
 	struct mgmt_msg_head msg_head;
 	u16                  func_id;
@@ -234,6 +269,7 @@ enum l2nic_cmd {
 	L2NIC_CMD_SET_VPORT_ENABLE    = 6,
 	L2NIC_CMD_SET_RX_MODE         = 7,
 	L2NIC_CMD_SET_SQ_CI_ATTR      = 8,
+	L2NIC_CMD_GET_VPORT_STAT      = 9,
 	L2NIC_CMD_CLEAR_QP_RESOURCE   = 11,
 	L2NIC_CMD_CFG_RX_LRO          = 13,
 	L2NIC_CMD_CFG_LRO_TIMER       = 14,
@@ -272,6 +308,7 @@ enum mag_cmd {
 	MAG_CMD_SET_PORT_ENABLE = 6,
 	MAG_CMD_GET_LINK_STATUS = 7,
 
+	MAG_CMD_GET_PORT_STAT   = 151,
 	MAG_CMD_GET_PORT_INFO   = 153,
 };
 
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.c b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.c
index de5a7984d2cb..1b14dc824ce1 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.c
@@ -639,6 +639,42 @@ int hinic3_get_link_status(struct hinic3_hwdev *hwdev, bool *link_status_up)
 	return 0;
 }
 
+int hinic3_get_phy_port_stats(struct hinic3_hwdev *hwdev,
+			      struct mag_cmd_port_stats *stats)
+{
+	struct mag_cmd_port_stats_info stats_info = {};
+	struct mag_cmd_get_port_stat *ps;
+	struct mgmt_msg_params msg_params = {};
+	int err;
+
+	ps = kzalloc_obj(*ps);
+	if (!ps)
+		return -ENOMEM;
+
+	stats_info.port_id = hinic3_physical_port_id(hwdev);
+
+	mgmt_msg_params_init_in_out(&msg_params, &stats_info, ps,
+				    sizeof(stats_info), sizeof(*ps));
+
+	err = hinic3_send_mbox_to_mgmt(hwdev, MGMT_MOD_HILINK,
+				       MAG_CMD_GET_PORT_STAT, &msg_params);
+
+	if (err || ps->head.status) {
+		dev_err(hwdev->dev,
+			"Failed to get port statistics, err: %d, status: 0x%x\n",
+			err, ps->head.status);
+		err = -EFAULT;
+		goto out;
+	}
+
+	memcpy(stats, &ps->counter, sizeof(*stats));
+
+out:
+	kfree(ps);
+
+	return err;
+}
+
 int hinic3_get_port_info(struct hinic3_hwdev *hwdev,
 			 struct hinic3_nic_port_info *port_info)
 {
@@ -738,3 +774,31 @@ int hinic3_get_pause_info(struct hinic3_nic_dev *nic_dev,
 	return hinic3_cfg_hw_pause(nic_dev->hwdev, MGMT_MSG_CMD_OP_GET,
 				   nic_pause);
 }
+
+int hinic3_get_vport_stats(struct hinic3_hwdev *hwdev, u16 func_id,
+			   struct l2nic_vport_stats *stats)
+{
+	struct l2nic_cmd_vport_stats vport_stats = {};
+	struct l2nic_port_stats_info stats_info = {};
+	struct mgmt_msg_params msg_params = {};
+	int err;
+
+	stats_info.func_id = func_id;
+
+	mgmt_msg_params_init_in_out(&msg_params, &stats_info, &vport_stats,
+				    sizeof(stats_info), sizeof(vport_stats));
+
+	err = hinic3_send_mbox_to_mgmt(hwdev, MGMT_MOD_L2NIC,
+				       L2NIC_CMD_GET_VPORT_STAT, &msg_params);
+
+	if (err || vport_stats.msg_head.status) {
+		dev_err(hwdev->dev,
+			"Failed to get function statistics, err: %d, status: 0x%x\n",
+			err, vport_stats.msg_head.status);
+		return -EFAULT;
+	}
+
+	memcpy(stats, &vport_stats.stats, sizeof(*stats));
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.h
index 5d52202a8d4e..80573c121539 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_cfg.h
@@ -129,6 +129,110 @@ struct mag_cmd_get_xsfp_present {
 	u8                   rsvd[2];
 };
 
+struct mag_cmd_port_stats {
+	u64 mac_tx_fragment_pkt_num;
+	u64 mac_tx_undersize_pkt_num;
+	u64 mac_tx_undermin_pkt_num;
+	u64 mac_tx_64_oct_pkt_num;
+	u64 mac_tx_65_127_oct_pkt_num;
+	u64 mac_tx_128_255_oct_pkt_num;
+	u64 mac_tx_256_511_oct_pkt_num;
+	u64 mac_tx_512_1023_oct_pkt_num;
+	u64 mac_tx_1024_1518_oct_pkt_num;
+	u64 mac_tx_1519_2047_oct_pkt_num;
+	u64 mac_tx_2048_4095_oct_pkt_num;
+	u64 mac_tx_4096_8191_oct_pkt_num;
+	u64 mac_tx_8192_9216_oct_pkt_num;
+	u64 mac_tx_9217_12287_oct_pkt_num;
+	u64 mac_tx_12288_16383_oct_pkt_num;
+	u64 mac_tx_1519_max_bad_pkt_num;
+	u64 mac_tx_1519_max_good_pkt_num;
+	u64 mac_tx_oversize_pkt_num;
+	u64 mac_tx_jabber_pkt_num;
+	u64 mac_tx_bad_pkt_num;
+	u64 mac_tx_bad_oct_num;
+	u64 mac_tx_good_pkt_num;
+	u64 mac_tx_good_oct_num;
+	u64 mac_tx_total_pkt_num;
+	u64 mac_tx_total_oct_num;
+	u64 mac_tx_uni_pkt_num;
+	u64 mac_tx_multi_pkt_num;
+	u64 mac_tx_broad_pkt_num;
+	u64 mac_tx_pause_num;
+	u64 mac_tx_pfc_pkt_num;
+	u64 mac_tx_pfc_pri0_pkt_num;
+	u64 mac_tx_pfc_pri1_pkt_num;
+	u64 mac_tx_pfc_pri2_pkt_num;
+	u64 mac_tx_pfc_pri3_pkt_num;
+	u64 mac_tx_pfc_pri4_pkt_num;
+	u64 mac_tx_pfc_pri5_pkt_num;
+	u64 mac_tx_pfc_pri6_pkt_num;
+	u64 mac_tx_pfc_pri7_pkt_num;
+	u64 mac_tx_control_pkt_num;
+	u64 mac_tx_err_all_pkt_num;
+	u64 mac_tx_from_app_good_pkt_num;
+	u64 mac_tx_from_app_bad_pkt_num;
+
+	u64 mac_rx_fragment_pkt_num;
+	u64 mac_rx_undersize_pkt_num;
+	u64 mac_rx_undermin_pkt_num;
+	u64 mac_rx_64_oct_pkt_num;
+	u64 mac_rx_65_127_oct_pkt_num;
+	u64 mac_rx_128_255_oct_pkt_num;
+	u64 mac_rx_256_511_oct_pkt_num;
+	u64 mac_rx_512_1023_oct_pkt_num;
+	u64 mac_rx_1024_1518_oct_pkt_num;
+	u64 mac_rx_1519_2047_oct_pkt_num;
+	u64 mac_rx_2048_4095_oct_pkt_num;
+	u64 mac_rx_4096_8191_oct_pkt_num;
+	u64 mac_rx_8192_9216_oct_pkt_num;
+	u64 mac_rx_9217_12287_oct_pkt_num;
+	u64 mac_rx_12288_16383_oct_pkt_num;
+	u64 mac_rx_1519_max_bad_pkt_num;
+	u64 mac_rx_1519_max_good_pkt_num;
+	u64 mac_rx_oversize_pkt_num;
+	u64 mac_rx_jabber_pkt_num;
+	u64 mac_rx_bad_pkt_num;
+	u64 mac_rx_bad_oct_num;
+	u64 mac_rx_good_pkt_num;
+	u64 mac_rx_good_oct_num;
+	u64 mac_rx_total_pkt_num;
+	u64 mac_rx_total_oct_num;
+	u64 mac_rx_uni_pkt_num;
+	u64 mac_rx_multi_pkt_num;
+	u64 mac_rx_broad_pkt_num;
+	u64 mac_rx_pause_num;
+	u64 mac_rx_pfc_pkt_num;
+	u64 mac_rx_pfc_pri0_pkt_num;
+	u64 mac_rx_pfc_pri1_pkt_num;
+	u64 mac_rx_pfc_pri2_pkt_num;
+	u64 mac_rx_pfc_pri3_pkt_num;
+	u64 mac_rx_pfc_pri4_pkt_num;
+	u64 mac_rx_pfc_pri5_pkt_num;
+	u64 mac_rx_pfc_pri6_pkt_num;
+	u64 mac_rx_pfc_pri7_pkt_num;
+	u64 mac_rx_control_pkt_num;
+	u64 mac_rx_sym_err_pkt_num;
+	u64 mac_rx_fcs_err_pkt_num;
+	u64 mac_rx_send_app_good_pkt_num;
+	u64 mac_rx_send_app_bad_pkt_num;
+	u64 mac_rx_unfilter_pkt_num;
+};
+
+struct mag_cmd_port_stats_info {
+	struct mgmt_msg_head head;
+
+	u8                   port_id;
+	u8                   rsvd0[3];
+};
+
+struct mag_cmd_get_port_stat {
+	struct mgmt_msg_head      head;
+
+	struct mag_cmd_port_stats counter;
+	u64                       rsvd1[15];
+};
+
 enum link_err_type {
 	LINK_ERR_MODULE_UNRECOGENIZED,
 	LINK_ERR_NUM,
@@ -209,6 +313,11 @@ int hinic3_get_port_info(struct hinic3_hwdev *hwdev,
 			 struct hinic3_nic_port_info *port_info);
 int hinic3_set_vport_enable(struct hinic3_hwdev *hwdev, u16 func_id,
 			    bool enable);
+int hinic3_get_phy_port_stats(struct hinic3_hwdev *hwdev,
+			      struct mag_cmd_port_stats *stats);
+int hinic3_get_vport_stats(struct hinic3_hwdev *hwdev, u16 func_id,
+			   struct l2nic_vport_stats *stats);
+
 int hinic3_add_vlan(struct hinic3_hwdev *hwdev, u16 vlan_id, u16 func_id);
 int hinic3_del_vlan(struct hinic3_hwdev *hwdev, u16 vlan_id, u16 func_id);
 
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
index 55b280888ad8..8f6e0914c31e 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
@@ -34,6 +34,13 @@ enum hinic3_event_work_flags {
 	HINIC3_EVENT_WORK_TX_TIMEOUT,
 };
 
+struct hinic3_nic_stats {
+	/* Subdivision statistics shown in the private tool */
+	u64                   tx_carrier_off_drop;
+	u64                   tx_invalid_qid;
+	struct u64_stats_sync syncp;
+};
+
 enum hinic3_rx_mode_state {
 	HINIC3_HW_PROMISC_ON,
 	HINIC3_HW_ALLMULTI_ON,
@@ -120,6 +127,7 @@ struct hinic3_nic_dev {
 	struct hinic3_dyna_txrxq_params q_params;
 	struct hinic3_txq               *txqs;
 	struct hinic3_rxq               *rxqs;
+	struct hinic3_nic_stats         stats;
 
 	enum hinic3_rss_hash_type       rss_hash_type;
 	struct hinic3_rss_type          rss_type;
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_rx.c b/drivers/net/ethernet/huawei/hinic3/hinic3_rx.c
index 309ab5901379..8951df172f0e 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_rx.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_rx.c
@@ -29,7 +29,7 @@
 #define HINIC3_LRO_PKT_HDR_LEN_IPV4     66
 #define HINIC3_LRO_PKT_HDR_LEN_IPV6     86
 #define HINIC3_LRO_PKT_HDR_LEN(cqe) \
-	(RQ_CQE_OFFOLAD_TYPE_GET((cqe)->offload_type, IP_TYPE) == \
+	(RQ_CQE_OFFOLAD_TYPE_GET(le32_to_cpu((cqe)->offload_type), IP_TYPE) == \
 	 HINIC3_RX_IPV6_PKT ? HINIC3_LRO_PKT_HDR_LEN_IPV6 : \
 	 HINIC3_LRO_PKT_HDR_LEN_IPV4)
 
@@ -155,8 +155,12 @@ static u32 hinic3_rx_fill_buffers(struct hinic3_rxq *rxq)
 
 		err = rx_alloc_mapped_page(rxq->page_pool, rx_info,
 					   rxq->buf_len);
-		if (unlikely(err))
+		if (unlikely(err)) {
+			u64_stats_update_begin(&rxq->rxq_stats.syncp);
+			rxq->rxq_stats.alloc_rx_buf_err++;
+			u64_stats_update_end(&rxq->rxq_stats.syncp);
 			break;
+		}
 
 		dma_addr = page_pool_get_dma_addr(rx_info->page) +
 			rx_info->page_offset;
@@ -170,6 +174,10 @@ static u32 hinic3_rx_fill_buffers(struct hinic3_rxq *rxq)
 				rxq->next_to_update << HINIC3_NORMAL_RQ_WQE);
 		rxq->delta -= i;
 		rxq->next_to_alloc = rxq->next_to_update;
+	} else if (free_wqebbs == rxq->q_depth - 1) {
+		u64_stats_update_begin(&rxq->rxq_stats.syncp);
+		rxq->rxq_stats.rx_buf_empty++;
+		u64_stats_update_end(&rxq->rxq_stats.syncp);
 	}
 
 	return i;
@@ -330,11 +338,23 @@ static void hinic3_rx_csum(struct hinic3_rxq *rxq, u32 offload_type,
 	struct net_device *netdev = rxq->netdev;
 	bool l2_tunnel;
 
+	if (unlikely(csum_err == HINIC3_RX_CSUM_IPSU_OTHER_ERR)) {
+		u64_stats_update_begin(&rxq->rxq_stats.syncp);
+		rxq->rxq_stats.other_errors++;
+		u64_stats_update_end(&rxq->rxq_stats.syncp);
+	}
+
 	if (!(netdev->features & NETIF_F_RXCSUM))
 		return;
 
 	if (unlikely(csum_err)) {
 		/* pkt type is recognized by HW, and csum is wrong */
+		if (!(csum_err & (HINIC3_RX_CSUM_HW_CHECK_NONE |
+				  HINIC3_RX_CSUM_IPSU_OTHER_ERR))) {
+			u64_stats_update_begin(&rxq->rxq_stats.syncp);
+			rxq->rxq_stats.csum_errors++;
+			u64_stats_update_end(&rxq->rxq_stats.syncp);
+		}
 		skb->ip_summed = CHECKSUM_NONE;
 		return;
 	}
@@ -387,8 +407,12 @@ static int recv_one_pkt(struct hinic3_rxq *rxq, struct hinic3_rq_cqe *rx_cqe,
 	u16 num_lro;
 
 	skb = hinic3_fetch_rx_buffer(rxq, pkt_len);
-	if (unlikely(!skb))
+	if (unlikely(!skb)) {
+		u64_stats_update_begin(&rxq->rxq_stats.syncp);
+		rxq->rxq_stats.alloc_skb_err++;
+		u64_stats_update_end(&rxq->rxq_stats.syncp);
 		return -ENOMEM;
+	}
 
 	/* place header in linear portion of buffer */
 	if (skb_is_nonlinear(skb))
@@ -550,11 +574,29 @@ int hinic3_configure_rxqs(struct net_device *netdev, u16 num_rq,
 	return 0;
 }
 
+void hinic3_rxq_get_stats(struct hinic3_rxq *rxq,
+			  struct hinic3_rxq_stats *stats)
+{
+	struct hinic3_rxq_stats *rxq_stats = &rxq->rxq_stats;
+	unsigned int start;
+
+	do {
+		start = u64_stats_fetch_begin(&rxq_stats->syncp);
+		stats->csum_errors = rxq_stats->csum_errors;
+		stats->other_errors = rxq_stats->other_errors;
+		stats->rx_buf_empty = rxq_stats->rx_buf_empty;
+		stats->alloc_skb_err = rxq_stats->alloc_skb_err;
+		stats->alloc_rx_buf_err = rxq_stats->alloc_rx_buf_err;
+		stats->restore_drop_sge = rxq_stats->restore_drop_sge;
+	} while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
+}
+
 int hinic3_rx_poll(struct hinic3_rxq *rxq, int budget)
 {
 	struct hinic3_nic_dev *nic_dev = netdev_priv(rxq->netdev);
 	u32 sw_ci, status, pkt_len, vlan_len;
 	struct hinic3_rq_cqe *rx_cqe;
+	u64 rx_bytes = 0;
 	u32 num_wqe = 0;
 	int nr_pkts = 0;
 	u16 num_lro;
@@ -574,10 +616,14 @@ int hinic3_rx_poll(struct hinic3_rxq *rxq, int budget)
 		if (recv_one_pkt(rxq, rx_cqe, pkt_len, vlan_len, status))
 			break;
 
+		rx_bytes += pkt_len;
 		nr_pkts++;
 		num_lro = RQ_CQE_STATUS_GET(status, NUM_LRO);
-		if (num_lro)
+		if (num_lro) {
+			rx_bytes += (num_lro - 1) *
+				    HINIC3_LRO_PKT_HDR_LEN(rx_cqe);
 			num_wqe += hinic3_get_sge_num(rxq, pkt_len);
+		}
 
 		rx_cqe->status = 0;
 
@@ -588,5 +634,10 @@ int hinic3_rx_poll(struct hinic3_rxq *rxq, int budget)
 	if (rxq->delta >= HINIC3_RX_BUFFER_WRITE)
 		hinic3_rx_fill_buffers(rxq);
 
+	u64_stats_update_begin(&rxq->rxq_stats.syncp);
+	rxq->rxq_stats.packets += (u64)nr_pkts;
+	rxq->rxq_stats.bytes += rx_bytes;
+	u64_stats_update_end(&rxq->rxq_stats.syncp);
+
 	return nr_pkts;
 }
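Reviewer note, not part of the patch: as I read the hinic3_rx_poll()
change, an LRO-coalesced packet is credited with pkt_len plus one extra
header for every additional segment the hardware merged, so the byte
counter approximates what was on the wire. A small user-space sketch of
that arithmetic follows. demo_rx_wire_bytes and the DEMO_* macros are
hypothetical names, and the header lengths are copied from the
HINIC3_LRO_PKT_HDR_LEN_IPV4/IPV6 defines.

```c
#include <stdint.h>

/* Header lengths copied from the HINIC3_LRO_PKT_HDR_LEN_* defines. */
#define DEMO_LRO_HDR_LEN_IPV4 66
#define DEMO_LRO_HDR_LEN_IPV6 86

/*
 * Bytes to credit for one completed RX descriptor. pkt_len already
 * contains one set of headers; each further coalesced segment adds one
 * more. num_lro == 0 means the packet was not aggregated.
 */
uint64_t demo_rx_wire_bytes(uint32_t pkt_len, uint16_t num_lro, int is_ipv6)
{
	uint32_t hdr_len = is_ipv6 ? DEMO_LRO_HDR_LEN_IPV6 :
				     DEMO_LRO_HDR_LEN_IPV4;
	uint64_t bytes = pkt_len;

	if (num_lro)
		bytes += (uint64_t)(num_lro - 1) * hdr_len;

	return bytes;
}
```

For example, three IPv4 segments merged into a 4410-byte packet are
counted as 4410 + 2 * 66 = 4542 bytes.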
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h b/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h
index 06d1b3299e7c..cd2dcaab6cf7 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_rx.h
@@ -8,6 +8,17 @@
 #include <linux/dim.h>
 #include <linux/netdevice.h>
 
+/* rx cqe checksum err */
+#define HINIC3_RX_CSUM_IP_CSUM_ERR      BIT(0)
+#define HINIC3_RX_CSUM_TCP_CSUM_ERR     BIT(1)
+#define HINIC3_RX_CSUM_UDP_CSUM_ERR     BIT(2)
+#define HINIC3_RX_CSUM_IGMP_CSUM_ERR    BIT(3)
+#define HINIC3_RX_CSUM_ICMPV4_CSUM_ERR  BIT(4)
+#define HINIC3_RX_CSUM_ICMPV6_CSUM_ERR  BIT(5)
+#define HINIC3_RX_CSUM_SCTP_CRC_ERR     BIT(6)
+#define HINIC3_RX_CSUM_HW_CHECK_NONE    BIT(7)
+#define HINIC3_RX_CSUM_IPSU_OTHER_ERR   BIT(8)
+
 #define RQ_CQE_OFFOLAD_TYPE_PKT_TYPE_MASK           GENMASK(4, 0)
 #define RQ_CQE_OFFOLAD_TYPE_IP_TYPE_MASK            GENMASK(6, 5)
 #define RQ_CQE_OFFOLAD_TYPE_TUNNEL_PKT_FORMAT_MASK  GENMASK(11, 8)
@@ -123,6 +134,9 @@ void hinic3_free_rxqs_res(struct net_device *netdev, u16 num_rq,
 			  u32 rq_depth, struct hinic3_dyna_rxq_res *rxqs_res);
 int hinic3_configure_rxqs(struct net_device *netdev, u16 num_rq,
 			  u32 rq_depth, struct hinic3_dyna_rxq_res *rxqs_res);
+
+void hinic3_rxq_get_stats(struct hinic3_rxq *rxq,
+			  struct hinic3_rxq_stats *stats);
 int hinic3_rx_poll(struct hinic3_rxq *rxq, int budget);
 
 #endif
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_tx.c b/drivers/net/ethernet/huawei/hinic3/hinic3_tx.c
index 9306bf0020ca..58c1f1f40f5c 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_tx.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_tx.c
@@ -97,8 +97,12 @@ static int hinic3_tx_map_skb(struct net_device *netdev, struct sk_buff *skb,
 
 	dma_info[0].dma = dma_map_single(&pdev->dev, skb->data,
 					 skb_headlen(skb), DMA_TO_DEVICE);
-	if (dma_mapping_error(&pdev->dev, dma_info[0].dma))
+	if (dma_mapping_error(&pdev->dev, dma_info[0].dma)) {
+		u64_stats_update_begin(&txq->txq_stats.syncp);
+		txq->txq_stats.map_frag_err++;
+		u64_stats_update_end(&txq->txq_stats.syncp);
 		return -EFAULT;
+	}
 
 	dma_info[0].len = skb_headlen(skb);
 
@@ -117,6 +121,9 @@ static int hinic3_tx_map_skb(struct net_device *netdev, struct sk_buff *skb,
 						     skb_frag_size(frag),
 						     DMA_TO_DEVICE);
 		if (dma_mapping_error(&pdev->dev, dma_info[idx].dma)) {
+			u64_stats_update_begin(&txq->txq_stats.syncp);
+			txq->txq_stats.map_frag_err++;
+			u64_stats_update_end(&txq->txq_stats.syncp);
 			err = -EFAULT;
 			goto err_unmap_page;
 		}
@@ -260,6 +267,9 @@ static int hinic3_tx_csum(struct hinic3_txq *txq, struct hinic3_sq_task *task,
 		if (l4_proto != IPPROTO_UDP ||
 		    ((struct udphdr *)skb_transport_header(skb))->dest !=
 		    VXLAN_OFFLOAD_PORT_LE) {
+			u64_stats_update_begin(&txq->txq_stats.syncp);
+			txq->txq_stats.unknown_tunnel_pkt++;
+			u64_stats_update_end(&txq->txq_stats.syncp);
 			/* Unsupported tunnel packet, disable csum offload */
 			skb_checksum_help(skb);
 			return 0;
@@ -433,6 +443,27 @@ static u32 hinic3_tx_offload(struct sk_buff *skb, struct hinic3_sq_task *task,
 	return offload;
 }
 
+static void hinic3_get_pkt_stats(struct hinic3_txq *txq, struct sk_buff *skb)
+{
+	u32 hdr_len, tx_bytes;
+	unsigned short pkts;
+
+	if (skb_is_gso(skb)) {
+		hdr_len = (skb_shinfo(skb)->gso_segs - 1) *
+			  skb_tcp_all_headers(skb);
+		tx_bytes = skb->len + hdr_len;
+		pkts = skb_shinfo(skb)->gso_segs;
+	} else {
+		tx_bytes = skb->len > ETH_ZLEN ? skb->len : ETH_ZLEN;
+		pkts = 1;
+	}
+
+	u64_stats_update_begin(&txq->txq_stats.syncp);
+	txq->txq_stats.bytes += tx_bytes;
+	txq->txq_stats.packets += pkts;
+	u64_stats_update_end(&txq->txq_stats.syncp);
+}
+
 static u16 hinic3_get_and_update_sq_owner(struct hinic3_io_queue *sq,
 					  u16 curr_pi, u16 wqebb_cnt)
 {
@@ -539,8 +570,12 @@ static netdev_tx_t hinic3_send_one_skb(struct sk_buff *skb,
 	int err;
 
 	if (unlikely(skb->len < MIN_SKB_LEN)) {
-		if (skb_pad(skb, MIN_SKB_LEN - skb->len))
+		if (skb_pad(skb, MIN_SKB_LEN - skb->len)) {
+			u64_stats_update_begin(&txq->txq_stats.syncp);
+			txq->txq_stats.skb_pad_err++;
+			u64_stats_update_end(&txq->txq_stats.syncp);
 			goto err_out;
+		}
 
 		skb->len = MIN_SKB_LEN;
 	}
@@ -595,6 +630,7 @@ static netdev_tx_t hinic3_send_one_skb(struct sk_buff *skb,
 				  txq->tx_stop_thrs,
 				  txq->tx_start_thrs);
 
+	hinic3_get_pkt_stats(txq, skb);
 	hinic3_prepare_sq_ctrl(&wqe_combo, queue_info, num_sge, owner);
 	hinic3_write_db(txq->sq, 0, DB_CFLAG_DP_SQ,
 			hinic3_get_sq_local_pi(txq->sq));
@@ -604,6 +640,10 @@ static netdev_tx_t hinic3_send_one_skb(struct sk_buff *skb,
 err_drop_pkt:
 	dev_kfree_skb_any(skb);
 err_out:
+	u64_stats_update_begin(&txq->txq_stats.syncp);
+	txq->txq_stats.dropped++;
+	u64_stats_update_end(&txq->txq_stats.syncp);
+
 	return NETDEV_TX_OK;
 }
 
@@ -611,12 +651,26 @@ netdev_tx_t hinic3_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 {
 	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
 	u16 q_id = skb_get_queue_mapping(skb);
+	struct hinic3_txq *txq;
 
-	if (unlikely(!netif_carrier_ok(netdev)))
+	if (unlikely(!netif_carrier_ok(netdev))) {
+		u64_stats_update_begin(&nic_dev->stats.syncp);
+		nic_dev->stats.tx_carrier_off_drop++;
+		u64_stats_update_end(&nic_dev->stats.syncp);
 		goto err_drop_pkt;
+	}
+
+	if (unlikely(q_id >= nic_dev->q_params.num_qps)) {
+		txq = &nic_dev->txqs[0];
+		u64_stats_update_begin(&txq->txq_stats.syncp);
+		txq->txq_stats.dropped++;
+		u64_stats_update_end(&txq->txq_stats.syncp);
 
-	if (unlikely(q_id >= nic_dev->q_params.num_qps))
+		u64_stats_update_begin(&nic_dev->stats.syncp);
+		nic_dev->stats.tx_invalid_qid++;
+		u64_stats_update_end(&nic_dev->stats.syncp);
 		goto err_drop_pkt;
+	}
 
 	return hinic3_send_one_skb(skb, netdev, &nic_dev->txqs[q_id]);
 
@@ -754,6 +808,24 @@ int hinic3_configure_txqs(struct net_device *netdev, u16 num_sq,
 	return 0;
 }
 
+void hinic3_txq_get_stats(struct hinic3_txq *txq,
+			  struct hinic3_txq_stats *stats)
+{
+	struct hinic3_txq_stats *txq_stats = &txq->txq_stats;
+	unsigned int start;
+
+	do {
+		start = u64_stats_fetch_begin(&txq_stats->syncp);
+		stats->busy = txq_stats->busy;
+		stats->skb_pad_err = txq_stats->skb_pad_err;
+		stats->frag_len_overflow = txq_stats->frag_len_overflow;
+		stats->offload_cow_skb_err = txq_stats->offload_cow_skb_err;
+		stats->map_frag_err = txq_stats->map_frag_err;
+		stats->unknown_tunnel_pkt = txq_stats->unknown_tunnel_pkt;
+		stats->frag_size_err = txq_stats->frag_size_err;
+	} while (u64_stats_fetch_retry(&txq_stats->syncp, start));
+}
+
 bool hinic3_tx_poll(struct hinic3_txq *txq, int budget)
 {
 	struct net_device *netdev = txq->netdev;
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_tx.h b/drivers/net/ethernet/huawei/hinic3/hinic3_tx.h
index 00194f2a1bcc..0a21c423618f 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_tx.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_tx.h
@@ -157,6 +157,8 @@ int hinic3_configure_txqs(struct net_device *netdev, u16 num_sq,
 			  u32 sq_depth, struct hinic3_dyna_txq_res *txqs_res);
 
 netdev_tx_t hinic3_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
+void hinic3_txq_get_stats(struct hinic3_txq *txq,
+			  struct hinic3_txq_stats *stats);
 bool hinic3_tx_poll(struct hinic3_txq *txq, int budget);
 void hinic3_flush_txqs(struct net_device *netdev);
 
-- 
2.43.0



* [PATCH net-next v03 5/6] hinic3: Configure netdev->watchdog_timeo to set nic tx timeout
From: Fan Gong @ 2026-03-31  7:56 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <cover.1774940117.git.zhuyikai1@h-partners.com>

  Set netdev->watchdog_timeo to HINIC3_WATCHDOG_TIMEOUT * HZ (5 seconds) to
configure the NIC Tx timeout and improve transmission reliability.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
---
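Note: a minimal userspace sketch of what a timeout of
HINIC3_WATCHDOG_TIMEOUT * HZ means in ticks, assuming the Tx watchdog
compares the last transmit time against the current tick counter in a
wraparound-safe way. SKETCH_HZ and every sketch_* name are hypothetical and
are not part of this patch; the real HZ is a kernel build-time setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HZ               250u	/* assumed tick rate, illustration only */
#define SKETCH_WATCHDOG_TIMEOUT 5u	/* seconds, mirrors HINIC3_WATCHDOG_TIMEOUT */

/* Convert whole seconds to ticks, as "seconds * HZ" does in the patch. */
static uint32_t sketch_secs_to_ticks(uint32_t secs)
{
	assert(secs <= UINT32_MAX / SKETCH_HZ);	/* avoid overflow */
	return secs * SKETCH_HZ;
}

/*
 * True once "now" is strictly past last_tx + timeo. The signed
 * difference keeps the comparison valid across counter wraparound.
 */
static bool sketch_tx_timed_out(uint32_t now, uint32_t last_tx, uint32_t timeo)
{
	return (int32_t)(now - (last_tx + timeo)) > 0;
}
```

With the assumed tick rate of 250, sketch_secs_to_ticks(5) yields 1250
ticks, and a queue whose last transmit was at tick 0 counts as timed out
from tick 1251 onwards.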
 drivers/net/ethernet/huawei/hinic3/hinic3_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
index 60834f8dffcd..4742c881b7a6 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
@@ -33,6 +33,8 @@
 #define HINIC3_RX_PENDING_LIMIT_LOW   2
 #define HINIC3_RX_PENDING_LIMIT_HIGH  8
 
+#define HINIC3_WATCHDOG_TIMEOUT       5
+
 static void init_intr_coal_param(struct net_device *netdev)
 {
 	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
@@ -247,6 +249,8 @@ static void hinic3_assign_netdev_ops(struct net_device *netdev)
 {
 	hinic3_set_netdev_ops(netdev);
 	hinic3_set_ethtool_ops(netdev);
+
+	netdev->watchdog_timeo = HINIC3_WATCHDOG_TIMEOUT * HZ;
 }
 
 static void netdev_feature_init(struct net_device *netdev)
-- 
2.43.0



* [PATCH net-next v03 0/6] net: hinic3: PF initialization
From: Fan Gong @ 2026-03-31  7:56 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier

This is part [3/3] of the second submission of the hinic3 Ethernet driver.
With this series, hinic3 becomes a complete Ethernet driver with
PF and VF support.

Add 20 ethtool ops covering queue, RSS, coalesce and Ethernet data.
Add MTU size validation.
Configure the netdev watchdog timeout.
Remove unneeded coalesce parameters.

Changes:

PATCH 03 V01: https://lore.kernel.org/netdev/cover.1773387649.git.zhuyikai1@h-partners.com/
* Add rmon/pause/phy/mac/ctrl stats (Ioana Ciornei)

PATCH 03 V02: https://lore.kernel.org/netdev/cover.1774684571.git.zhuyikai1@h-partners.com/
* Fix "return -EINVAL" indentation problem (AI review)
* Use le16_to_cpu for rss_indir pair.out->buf (AI review)
* Use u32 instead of int in coalesce_limits to avoid overflow (AI review)
* Remove redundant u64_stats_update_begin/end when reading stats without
  concurrent reader (AI review)
* Modify nic_dev->stats.syncp logic (AI review)
* Complete rxq/txq stats fields in hinic3_rx/txq_get_stats (AI review)
* Remove statistics values in rtnl_link_stats64 from ethtool statistics
  values (AI review)
* Add channel_cfg_lock & channel_res_lock to protect resources access (AI review)
* Remove OutOfRangeLengthField, FrameToolong and InRangeLengthErrors (Ioana Ciornei)
* Remove redundant mtu commit (Maxime Chevallier)

PATCH 03 V03:
* Change unnedd to unneeded (AI review)
* Remove packets, bytes, errors and dropped in hinic3_rx/tx_queue_stats (AI review)
* Remove duplicated entries in hinic3_port_stats[] (AI review)
* Change stats_info.head.status to ps->head.status (AI review)
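
For reference, the V02 item about le16_to_cpu for the RSS indirection table
can be illustrated with a small userspace sketch. It assumes only what the
changelog states, namely that the device returns the table as little-endian
16-bit entries which are widened to u32. The sketch_* helpers are
hypothetical stand-ins, not the kernel's le16_to_cpu() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian 16-bit value; correct on any host byte order. */
static uint16_t sketch_le16_to_cpu(const uint8_t *p)
{
	return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Widen n little-endian 16-bit entries from buf into the u32 table out. */
static void sketch_read_indir(const uint8_t *buf, uint32_t *out, size_t n)
{
	size_t i;

	assert(buf != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = sketch_le16_to_cpu(buf + 2 * i);
}
```

Decoding byte by byte avoids reinterpreting the buffer in host order, which
would give wrong queue indices on a big-endian host.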

Fan Gong (6):
  hinic3: Add ethtool queue ops
  hinic3: Add ethtool statistic ops
  hinic3: Add ethtool coalesce ops
  hinic3: Add ethtool rss ops
  hinic3: Configure netdev->watchdog_timeo to set nic tx timeout
  hinic3: Remove unneeded coalesce parameters

 .../ethernet/huawei/hinic3/hinic3_ethtool.c   | 829 +++++++++++++++++-
 .../ethernet/huawei/hinic3/hinic3_hw_intf.h   |  13 +-
 .../net/ethernet/huawei/hinic3/hinic3_irq.c   |  16 +-
 .../net/ethernet/huawei/hinic3/hinic3_main.c  |  16 +
 .../huawei/hinic3/hinic3_mgmt_interface.h     |  39 +
 .../huawei/hinic3/hinic3_netdev_ops.c         | 101 ++-
 .../ethernet/huawei/hinic3/hinic3_nic_cfg.c   |  64 ++
 .../ethernet/huawei/hinic3/hinic3_nic_cfg.h   | 109 +++
 .../ethernet/huawei/hinic3/hinic3_nic_dev.h   |  24 +
 .../ethernet/huawei/hinic3/hinic3_nic_io.h    |   4 +
 .../net/ethernet/huawei/hinic3/hinic3_rss.c   | 487 +++++++++-
 .../net/ethernet/huawei/hinic3/hinic3_rss.h   |  19 +
 .../net/ethernet/huawei/hinic3/hinic3_rx.c    |  59 +-
 .../net/ethernet/huawei/hinic3/hinic3_rx.h    |  17 +-
 .../net/ethernet/huawei/hinic3/hinic3_tx.c    |  80 +-
 .../net/ethernet/huawei/hinic3/hinic3_tx.h    |   2 +
 16 files changed, 1853 insertions(+), 26 deletions(-)


base-commit: 8e7adcf81564a3fe886a6270eea7558f063e5538
-- 
2.43.0



* [PATCH net-next v03 4/6] hinic3: Add ethtool rss ops
From: Fan Gong @ 2026-03-31  7:56 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <cover.1774940117.git.zhuyikai1@h-partners.com>

  Implement the following ethtool callback functions:
.get_rxnfc
.set_rxnfc
.get_channels
.set_channels
.get_rxfh_indir_size
.get_rxfh_key_size
.get_rxfh
.set_rxfh

  These callbacks allow users to configure and monitor RSS parameters
in detail through ethtool.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
---
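Note: a minimal userspace sketch of the flow-hash field validation that the
set_rxnfc path in this patch performs: only IP source/destination and L4
port fields are accepted, both IP fields are mandatory, and the two L4
fields must be enabled or disabled together. The SK_* flag values and the
sketch_* function are hypothetical stand-ins for the RXH_* flags and are
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bit flags; the real RXH_* values come from the ethtool UAPI. */
#define SK_IP_SRC	(1u << 0)
#define SK_IP_DST	(1u << 1)
#define SK_L4_B_0_1	(1u << 2)
#define SK_L4_B_2_3	(1u << 3)

/*
 * Validate the requested hash fields.
 * Returns 0 and sets *l4_en to 0 or 1 on success, -1 if the
 * combination is not accepted.
 */
static int sketch_parse_hash_fields(uint64_t data, int *l4_en)
{
	const uint64_t l4 = SK_L4_B_0_1 | SK_L4_B_2_3;
	const uint64_t ip = SK_IP_SRC | SK_IP_DST;

	assert(l4_en != NULL);

	if (data & ~(ip | l4))		/* unknown field requested */
		return -1;
	if ((data & ip) != ip)		/* both IP addresses are required */
		return -1;

	if ((data & l4) == 0)
		*l4_en = 0;
	else if ((data & l4) == l4)
		*l4_en = 1;
	else				/* only one half of the L4 ports */
		return -1;

	return 0;
}
```

A request with only source and destination addresses is accepted with L4
hashing off, while a request that names just one of the two L4 port fields
is rejected.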
 .../ethernet/huawei/hinic3/hinic3_ethtool.c   |   9 +
 .../huawei/hinic3/hinic3_mgmt_interface.h     |   2 +
 .../net/ethernet/huawei/hinic3/hinic3_rss.c   | 487 +++++++++++++++++-
 .../net/ethernet/huawei/hinic3/hinic3_rss.h   |  19 +
 4 files changed, 515 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
index a9599a63696f..c29ed438dd27 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
@@ -15,6 +15,7 @@
 #include "hinic3_hw_comm.h"
 #include "hinic3_nic_dev.h"
 #include "hinic3_nic_cfg.h"
+#include "hinic3_rss.h"
 
 #define HINIC3_MGMT_VERSION_MAX_LEN     32
 /* Coalesce time properties in microseconds */
@@ -1233,6 +1234,14 @@ static const struct ethtool_ops hinic3_ethtool_ops = {
 	.get_pause_stats                = hinic3_get_pause_stats,
 	.get_coalesce                   = hinic3_get_coalesce,
 	.set_coalesce                   = hinic3_set_coalesce,
+	.get_rxnfc                      = hinic3_get_rxnfc,
+	.set_rxnfc                      = hinic3_set_rxnfc,
+	.get_channels                   = hinic3_get_channels,
+	.set_channels                   = hinic3_set_channels,
+	.get_rxfh_indir_size            = hinic3_get_rxfh_indir_size,
+	.get_rxfh_key_size              = hinic3_get_rxfh_key_size,
+	.get_rxfh                       = hinic3_get_rxfh,
+	.set_rxfh                       = hinic3_set_rxfh,
 };
 
 void hinic3_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h b/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h
index 76c691f82703..3c1263ff99ff 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_mgmt_interface.h
@@ -282,6 +282,7 @@ enum l2nic_cmd {
 	L2NIC_CMD_SET_VLAN_FILTER_EN  = 26,
 	L2NIC_CMD_SET_RX_VLAN_OFFLOAD = 27,
 	L2NIC_CMD_CFG_RSS             = 60,
+	L2NIC_CMD_GET_RSS_CTX_TBL     = 62,
 	L2NIC_CMD_CFG_RSS_HASH_KEY    = 63,
 	L2NIC_CMD_CFG_RSS_HASH_ENGINE = 64,
 	L2NIC_CMD_SET_RSS_CTX_TBL     = 65,
@@ -301,6 +302,7 @@ enum l2nic_ucode_cmd {
 	L2NIC_UCODE_CMD_MODIFY_QUEUE_CTX  = 0,
 	L2NIC_UCODE_CMD_CLEAN_QUEUE_CTX   = 1,
 	L2NIC_UCODE_CMD_SET_RSS_INDIR_TBL = 4,
+	L2NIC_UCODE_CMD_GET_RSS_INDIR_TBL = 6,
 };
 
 /* hilink mac group command */
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c b/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c
index 25db74d8c7dd..1c8aea9d8887 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_rss.c
@@ -155,7 +155,7 @@ static int hinic3_set_rss_type(struct hinic3_hwdev *hwdev,
 				       L2NIC_CMD_SET_RSS_CTX_TBL, &msg_params);
 
 	if (ctx_tbl.msg_head.status == MGMT_STATUS_CMD_UNSUPPORTED) {
-		return MGMT_STATUS_CMD_UNSUPPORTED;
+		return -EOPNOTSUPP;
 	} else if (err || ctx_tbl.msg_head.status) {
 		dev_err(hwdev->dev, "mgmt Failed to set rss context offload, err: %d, status: 0x%x\n",
 			err, ctx_tbl.msg_head.status);
@@ -165,6 +165,39 @@ static int hinic3_set_rss_type(struct hinic3_hwdev *hwdev,
 	return 0;
 }
 
+static int hinic3_get_rss_type(struct hinic3_hwdev *hwdev,
+			       struct hinic3_rss_type *rss_type)
+{
+	struct l2nic_cmd_rss_ctx_tbl ctx_tbl = {};
+	struct mgmt_msg_params msg_params = {};
+	int err;
+
+	ctx_tbl.func_id = hinic3_global_func_id(hwdev);
+
+	mgmt_msg_params_init_default(&msg_params, &ctx_tbl, sizeof(ctx_tbl));
+
+	err = hinic3_send_mbox_to_mgmt(hwdev, MGMT_MOD_L2NIC,
+				       L2NIC_CMD_GET_RSS_CTX_TBL,
+				       &msg_params);
+	if (err || ctx_tbl.msg_head.status) {
+		dev_err(hwdev->dev, "Failed to get hash type, err: %d, status: 0x%x\n",
+			err, ctx_tbl.msg_head.status);
+		return -EINVAL;
+	}
+
+	rss_type->ipv4         = L2NIC_RSS_TYPE_GET(ctx_tbl.context, IPV4);
+	rss_type->ipv6         = L2NIC_RSS_TYPE_GET(ctx_tbl.context, IPV6);
+	rss_type->ipv6_ext     = L2NIC_RSS_TYPE_GET(ctx_tbl.context, IPV6_EXT);
+	rss_type->tcp_ipv4     = L2NIC_RSS_TYPE_GET(ctx_tbl.context, TCP_IPV4);
+	rss_type->tcp_ipv6     = L2NIC_RSS_TYPE_GET(ctx_tbl.context, TCP_IPV6);
+	rss_type->tcp_ipv6_ext = L2NIC_RSS_TYPE_GET(ctx_tbl.context,
+						    TCP_IPV6_EXT);
+	rss_type->udp_ipv4     = L2NIC_RSS_TYPE_GET(ctx_tbl.context, UDP_IPV4);
+	rss_type->udp_ipv6     = L2NIC_RSS_TYPE_GET(ctx_tbl.context, UDP_IPV6);
+
+	return 0;
+}
+
 static int hinic3_rss_cfg_hash_type(struct hinic3_hwdev *hwdev, u8 opcode,
 				    enum hinic3_rss_hash_type *type)
 {
@@ -264,7 +297,8 @@ static int hinic3_set_hw_rss_parameters(struct net_device *netdev, u8 rss_en)
 	if (err)
 		return err;
 
-	hinic3_fillout_indir_tbl(netdev, nic_dev->rss_indir);
+	if (!netif_is_rxfh_configured(netdev))
+		hinic3_fillout_indir_tbl(netdev, nic_dev->rss_indir);
 
 	err = hinic3_config_rss_hw_resource(netdev, nic_dev->rss_indir);
 	if (err)
@@ -334,3 +368,452 @@ void hinic3_try_to_enable_rss(struct net_device *netdev)
 	clear_bit(HINIC3_RSS_ENABLE, &nic_dev->flags);
 	nic_dev->q_params.num_qps = nic_dev->max_qps;
 }
+
+static int hinic3_set_l4_rss_hash_ops(const struct ethtool_rxnfc *cmd,
+				      struct hinic3_rss_type *rss_type)
+{
+	u8 rss_l4_en;
+
+	switch (cmd->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+	case 0:
+		rss_l4_en = 0;
+		break;
+	case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
+		rss_l4_en = 1;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	switch (cmd->flow_type) {
+	case TCP_V4_FLOW:
+		rss_type->tcp_ipv4 = rss_l4_en;
+		break;
+	case TCP_V6_FLOW:
+		rss_type->tcp_ipv6 = rss_l4_en;
+		break;
+	case UDP_V4_FLOW:
+		rss_type->udp_ipv4 = rss_l4_en;
+		break;
+	case UDP_V6_FLOW:
+		rss_type->udp_ipv6 = rss_l4_en;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int hinic3_update_rss_hash_opts(struct net_device *netdev,
+				       struct ethtool_rxnfc *cmd,
+				       struct hinic3_rss_type *rss_type)
+{
+	int err;
+
+	switch (cmd->flow_type) {
+	case TCP_V4_FLOW:
+	case TCP_V6_FLOW:
+	case UDP_V4_FLOW:
+	case UDP_V6_FLOW:
+		err = hinic3_set_l4_rss_hash_ops(cmd, rss_type);
+		if (err)
+			return err;
+
+		break;
+	case IPV4_FLOW:
+		rss_type->ipv4 = 1;
+		break;
+	case IPV6_FLOW:
+		rss_type->ipv6 = 1;
+		break;
+	default:
+		netdev_err(netdev, "Unsupported flow type\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int hinic3_set_rss_hash_opts(struct net_device *netdev,
+				    struct ethtool_rxnfc *cmd)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_rss_type *rss_type;
+	int err;
+
+	rss_type = &nic_dev->rss_type;
+
+	if (!test_bit(HINIC3_RSS_ENABLE, &nic_dev->flags)) {
+		cmd->data = 0;
+		netdev_err(netdev, "RSS is disabled, setting flow-hash is not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* RSS only supports hashing of IP addresses and L4 ports */
+	if (cmd->data & ~(RXH_IP_SRC | RXH_IP_DST |
+			  RXH_L4_B_0_1 | RXH_L4_B_2_3))
+		return -EINVAL;
+
+	/* Both IP addresses must be part of the hash tuple */
+	if (!(cmd->data & RXH_IP_SRC) || !(cmd->data & RXH_IP_DST))
+		return -EINVAL;
+
+	err = hinic3_get_rss_type(nic_dev->hwdev, rss_type);
+	if (err) {
+		netdev_err(netdev, "Failed to get rss type\n");
+		return err;
+	}
+
+	err = hinic3_update_rss_hash_opts(netdev, cmd, rss_type);
+	if (err)
+		return err;
+
+	err = hinic3_set_rss_type(nic_dev->hwdev, *rss_type);
+	if (err) {
+		netdev_err(netdev, "Failed to set rss type\n");
+		return err;
+	}
+
+	return 0;
+}
+
+static void convert_rss_type(u8 rss_opt, struct ethtool_rxnfc *cmd)
+{
+	if (rss_opt)
+		cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+}
+
+static int hinic3_convert_rss_type(struct net_device *netdev,
+				   struct hinic3_rss_type *rss_type,
+				   struct ethtool_rxnfc *cmd)
+{
+	cmd->data = RXH_IP_SRC | RXH_IP_DST;
+	switch (cmd->flow_type) {
+	case TCP_V4_FLOW:
+		convert_rss_type(rss_type->tcp_ipv4, cmd);
+		break;
+	case TCP_V6_FLOW:
+		convert_rss_type(rss_type->tcp_ipv6, cmd);
+		break;
+	case UDP_V4_FLOW:
+		convert_rss_type(rss_type->udp_ipv4, cmd);
+		break;
+	case UDP_V6_FLOW:
+		convert_rss_type(rss_type->udp_ipv6, cmd);
+		break;
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+		break;
+	default:
+		netdev_err(netdev, "Unsupported flow type\n");
+		cmd->data = 0;
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int hinic3_get_rss_hash_opts(struct net_device *netdev,
+				    struct ethtool_rxnfc *cmd)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_rss_type rss_type;
+	int err;
+
+	cmd->data = 0;
+
+	if (!test_bit(HINIC3_RSS_ENABLE, &nic_dev->flags))
+		return 0;
+
+	err = hinic3_get_rss_type(nic_dev->hwdev, &rss_type);
+	if (err) {
+		netdev_err(netdev, "Failed to get rss type\n");
+		return err;
+	}
+
+	return hinic3_convert_rss_type(netdev, &rss_type, cmd);
+}
+
+int hinic3_get_rxnfc(struct net_device *netdev,
+		     struct ethtool_rxnfc *cmd, u32 *rule_locs)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	int err = 0;
+
+	switch (cmd->cmd) {
+	case ETHTOOL_GRXRINGS:
+		cmd->data = nic_dev->q_params.num_qps;
+		break;
+	case ETHTOOL_GRXFH:
+		err = hinic3_get_rss_hash_opts(netdev, cmd);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+int hinic3_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
+{
+	int err;
+
+	switch (cmd->cmd) {
+	case ETHTOOL_SRXFH:
+		err = hinic3_set_rss_hash_opts(netdev, cmd);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static u16 hinic3_max_channels(struct net_device *netdev)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	u8 tcs = netdev_get_num_tc(netdev);
+
+	return tcs ? nic_dev->max_qps / tcs : nic_dev->max_qps;
+}
+
+static u16 hinic3_curr_channels(struct net_device *netdev)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+
+	if (netif_running(netdev))
+		return nic_dev->q_params.num_qps ?
+				nic_dev->q_params.num_qps : 1;
+	else
+		return min_t(u16, hinic3_max_channels(netdev),
+			     nic_dev->q_params.num_qps);
+}
+
+void hinic3_get_channels(struct net_device *netdev,
+			 struct ethtool_channels *channels)
+{
+	channels->max_rx = 0;
+	channels->max_tx = 0;
+	channels->max_other = 0;
+	/* report maximum channels */
+	channels->max_combined = hinic3_max_channels(netdev);
+	channels->rx_count = 0;
+	channels->tx_count = 0;
+	channels->other_count = 0;
+	/* report the current number of combined channels */
+	channels->combined_count = hinic3_curr_channels(netdev);
+}
+
+static int
+hinic3_validate_channel_parameter(struct net_device *netdev,
+				  const struct ethtool_channels *channels)
+{
+	u16 max_channel = hinic3_max_channels(netdev);
+	unsigned int count = channels->combined_count;
+
+	if (!count) {
+		netdev_err(netdev, "Unsupported combined_count=0\n");
+		return -EINVAL;
+	}
+
+	if (channels->tx_count || channels->rx_count || channels->other_count) {
+		netdev_err(netdev, "Setting rx/tx/other count not supported\n");
+		return -EINVAL;
+	}
+
+	if (count > max_channel) {
+		netdev_err(netdev, "Combined count %u exceeds limit %u\n", count,
+			   max_channel);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int hinic3_set_channels(struct net_device *netdev,
+			struct ethtool_channels *channels)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	unsigned int count = channels->combined_count;
+	struct hinic3_dyna_txrxq_params q_params;
+	int err;
+
+	if (hinic3_validate_channel_parameter(netdev, channels))
+		return -EINVAL;
+
+	if (!test_bit(HINIC3_RSS_ENABLE, &nic_dev->flags)) {
+		netdev_err(netdev, "This function doesn't support RSS, only support 1 queue pair\n");
+		return -EOPNOTSUPP;
+	}
+
+	netdev_dbg(netdev, "Set max combined queue number from %u to %u\n",
+		   nic_dev->q_params.num_qps, count);
+
+	if (netif_running(netdev)) {
+		q_params = nic_dev->q_params;
+		q_params.num_qps = (u16)count;
+		q_params.txqs_res = NULL;
+		q_params.rxqs_res = NULL;
+		q_params.irq_cfg = NULL;
+
+		err = hinic3_change_channel_settings(netdev, &q_params);
+		if (err) {
+			netdev_err(netdev, "Failed to change channel settings\n");
+			return err;
+		}
+	} else {
+		nic_dev->q_params.num_qps = (u16)count;
+	}
+
+	return 0;
+}
+
+u32 hinic3_get_rxfh_indir_size(struct net_device *netdev)
+{
+	return L2NIC_RSS_INDIR_SIZE;
+}
+
+static int hinic3_set_rss_rxfh(struct net_device *netdev,
+			       const u32 *indir, u8 *key)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	int err;
+	u32 i;
+
+	if (indir) {
+		for (i = 0; i < L2NIC_RSS_INDIR_SIZE; i++)
+			nic_dev->rss_indir[i] = (u16)indir[i];
+
+		err = hinic3_rss_set_indir_tbl(nic_dev->hwdev,
+					       nic_dev->rss_indir);
+		if (err) {
+			netdev_err(netdev, "Failed to set rss indir table\n");
+			return err;
+		}
+	}
+
+	if (key) {
+		err = hinic3_rss_set_hash_key(nic_dev->hwdev, key);
+		if (err) {
+			netdev_err(netdev, "Failed to set rss key\n");
+			return err;
+		}
+
+		memcpy(nic_dev->rss_hkey, key, L2NIC_RSS_KEY_SIZE);
+	}
+
+	return 0;
+}
+
+u32 hinic3_get_rxfh_key_size(struct net_device *netdev)
+{
+	return L2NIC_RSS_KEY_SIZE;
+}
+
+static int hinic3_rss_get_indir_tbl(struct hinic3_hwdev *hwdev,
+				    u32 *indir_table)
+{
+	struct hinic3_cmd_buf_pair pair;
+	__le16 *indir_tbl = NULL;
+	int err, i;
+
+	err = hinic3_cmd_buf_pair_init(hwdev, &pair);
+	if (err) {
+		dev_err(hwdev->dev, "Failed to allocate cmd_buf.\n");
+		return err;
+	}
+
+	err = hinic3_cmdq_detail_resp(hwdev, MGMT_MOD_L2NIC,
+				      L2NIC_UCODE_CMD_GET_RSS_INDIR_TBL,
+				      pair.in, pair.out, NULL);
+	if (err) {
+		dev_err(hwdev->dev, "Failed to get rss indir table\n");
+		goto err_get_indir_tbl;
+	}
+
+	indir_tbl = (__le16 *)pair.out->buf;
+	for (i = 0; i < L2NIC_RSS_INDIR_SIZE; i++)
+		indir_table[i] = le16_to_cpu(*(indir_tbl + i));
+
+err_get_indir_tbl:
+	hinic3_cmd_buf_pair_uninit(hwdev, &pair);
+
+	return err;
+}
+
+int hinic3_get_rxfh(struct net_device *netdev,
+		    struct ethtool_rxfh_param *rxfh)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	int err = 0;
+
+	if (!test_bit(HINIC3_RSS_ENABLE, &nic_dev->flags)) {
+		netdev_err(netdev, "Rss is disabled\n");
+		return -EOPNOTSUPP;
+	}
+
+	rxfh->hfunc =
+		nic_dev->rss_hash_type == HINIC3_RSS_HASH_ENGINE_TYPE_XOR ?
+		ETH_RSS_HASH_XOR : ETH_RSS_HASH_TOP;
+
+	if (rxfh->indir) {
+		err = hinic3_rss_get_indir_tbl(nic_dev->hwdev, rxfh->indir);
+		if (err)
+			return err;
+	}
+
+	if (rxfh->key)
+		memcpy(rxfh->key, nic_dev->rss_hkey, L2NIC_RSS_KEY_SIZE);
+
+	return err;
+}
+
+static int hinic3_update_hash_func_type(struct net_device *netdev, u8 hfunc)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	enum hinic3_rss_hash_type new_rss_hash_type;
+
+	switch (hfunc) {
+	case ETH_RSS_HASH_NO_CHANGE:
+		return 0;
+	case ETH_RSS_HASH_XOR:
+		new_rss_hash_type = HINIC3_RSS_HASH_ENGINE_TYPE_XOR;
+		break;
+	case ETH_RSS_HASH_TOP:
+		new_rss_hash_type = HINIC3_RSS_HASH_ENGINE_TYPE_TOEP;
+		break;
+	default:
+		netdev_err(netdev, "Unsupported hash func %u\n", hfunc);
+		return -EOPNOTSUPP;
+	}
+
+	if (new_rss_hash_type == nic_dev->rss_hash_type)
+		return 0;
+
+	nic_dev->rss_hash_type = new_rss_hash_type;
+	return hinic3_rss_set_hash_type(nic_dev->hwdev, nic_dev->rss_hash_type);
+}
+
+int hinic3_set_rxfh(struct net_device *netdev,
+		    struct ethtool_rxfh_param *rxfh,
+		    struct netlink_ext_ack *extack)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	int err;
+
+	if (!test_bit(HINIC3_RSS_ENABLE, &nic_dev->flags)) {
+		netdev_err(netdev, "Setting RSS parameters is not supported when RSS is disabled\n");
+		return -EOPNOTSUPP;
+	}
+
+	err = hinic3_update_hash_func_type(netdev, rxfh->hfunc);
+	if (err)
+		return err;
+
+	err = hinic3_set_rss_rxfh(netdev, rxfh->indir, rxfh->key);
+
+	return err;
+}
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_rss.h b/drivers/net/ethernet/huawei/hinic3/hinic3_rss.h
index 78d82c2aca06..9f1b77780cd4 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_rss.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_rss.h
@@ -5,10 +5,29 @@
 #define _HINIC3_RSS_H_
 
 #include <linux/netdevice.h>
+#include <linux/ethtool.h>
 
 int hinic3_rss_init(struct net_device *netdev);
 void hinic3_rss_uninit(struct net_device *netdev);
 void hinic3_try_to_enable_rss(struct net_device *netdev);
 void hinic3_clear_rss_config(struct net_device *netdev);
 
+int hinic3_get_rxnfc(struct net_device *netdev,
+		     struct ethtool_rxnfc *cmd, u32 *rule_locs);
+int hinic3_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd);
+
+void hinic3_get_channels(struct net_device *netdev,
+			 struct ethtool_channels *channels);
+int hinic3_set_channels(struct net_device *netdev,
+			struct ethtool_channels *channels);
+
+u32 hinic3_get_rxfh_indir_size(struct net_device *netdev);
+u32 hinic3_get_rxfh_key_size(struct net_device *netdev);
+
+int hinic3_get_rxfh(struct net_device *netdev,
+		    struct ethtool_rxfh_param *rxfh);
+int hinic3_set_rxfh(struct net_device *netdev,
+		    struct ethtool_rxfh_param *rxfh,
+		    struct netlink_ext_ack *extack);
+
 #endif
-- 
2.43.0



* [PATCH net-next v03 1/6] hinic3: Add ethtool queue ops
From: Fan Gong @ 2026-03-31  7:56 UTC (permalink / raw)
  To: Fan Gong, Zhu Yikai, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn,
	Ioana Ciornei
  Cc: linux-kernel, linux-doc, luosifu, Xin Guo, Zhou Shuai, Wu Like,
	Shi Jing, Zheng Jiezhen, Maxime Chevallier
In-Reply-To: <cover.1774940117.git.zhuyikai1@h-partners.com>

  Implement the following ethtool callback functions:
.get_ringparam
.set_ringparam

  These callbacks allow users to configure and monitor queue depths
in detail through ethtool.

Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com>
Signed-off-by: Fan Gong <gongfan1@huawei.com>
---
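Note: a minimal userspace sketch of how set_ringparam in this patch derives
the effective ring depth: the request is range-checked and then rounded
down to a power of two with 1U << ilog2(). The SKETCH_* limits are
hypothetical, since the values of HINIC3_MIN_QUEUE_DEPTH and
HINIC3_MAX_*_QUEUE_DEPTH are not shown in this patch, and sketch_ilog2() is
a stand-in for the kernel's ilog2().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed limits for illustration; not the driver's actual values. */
#define SKETCH_MIN_DEPTH	128u
#define SKETCH_MAX_DEPTH	16384u

/* Index of the highest set bit, i.e. floor(log2(v)) for v > 0. */
static unsigned int sketch_ilog2(uint32_t v)
{
	unsigned int log = 0;

	assert(v != 0);
	while (v >>= 1)
		log++;
	return log;
}

/*
 * Range-check the requested depth and round it down to a power of two.
 * Returns 0 and stores the result in *depth, or -1 if out of range.
 */
static int sketch_round_depth(uint32_t requested, uint32_t *depth)
{
	if (requested < SKETCH_MIN_DEPTH || requested > SKETCH_MAX_DEPTH)
		return -1;

	*depth = 1u << sketch_ilog2(requested);
	return 0;
}
```

Under these assumed limits a request for 1000 descriptors results in a
depth of 512, which keeps q_mask = depth - 1 usable as an index mask.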
 .../ethernet/huawei/hinic3/hinic3_ethtool.c   |  94 ++++++++++++++++
 .../net/ethernet/huawei/hinic3/hinic3_irq.c   |  10 +-
 .../net/ethernet/huawei/hinic3/hinic3_main.c  |  11 ++
 .../huawei/hinic3/hinic3_netdev_ops.c         | 101 +++++++++++++++++-
 .../ethernet/huawei/hinic3/hinic3_nic_dev.h   |  16 +++
 .../ethernet/huawei/hinic3/hinic3_nic_io.h    |   4 +
 6 files changed, 231 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
index 90fc16288de9..d78aff802a20 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_ethtool.c
@@ -409,6 +409,98 @@ hinic3_get_link_ksettings(struct net_device *netdev,
 	return 0;
 }
 
+static void hinic3_get_ringparam(struct net_device *netdev,
+				 struct ethtool_ringparam *ring,
+				 struct kernel_ethtool_ringparam *kernel_ring,
+				 struct netlink_ext_ack *extack)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+
+	ring->rx_max_pending = HINIC3_MAX_RX_QUEUE_DEPTH;
+	ring->tx_max_pending = HINIC3_MAX_TX_QUEUE_DEPTH;
+	ring->rx_pending = nic_dev->rxqs[0].q_depth;
+	ring->tx_pending = nic_dev->txqs[0].q_depth;
+}
+
+static void hinic3_update_qp_depth(struct net_device *netdev,
+				   u32 sq_depth, u32 rq_depth)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	u16 i;
+
+	nic_dev->q_params.sq_depth = sq_depth;
+	nic_dev->q_params.rq_depth = rq_depth;
+	for (i = 0; i < nic_dev->max_qps; i++) {
+		nic_dev->txqs[i].q_depth = sq_depth;
+		nic_dev->txqs[i].q_mask = sq_depth - 1;
+		nic_dev->rxqs[i].q_depth = rq_depth;
+		nic_dev->rxqs[i].q_mask = rq_depth - 1;
+	}
+}
+
+static int hinic3_check_ringparam_valid(struct net_device *netdev,
+					const struct ethtool_ringparam *ring)
+{
+	if (ring->rx_jumbo_pending || ring->rx_mini_pending) {
+		netdev_err(netdev, "Unsupported rx_jumbo_pending/rx_mini_pending\n");
+		return -EINVAL;
+	}
+
+	if (ring->tx_pending > HINIC3_MAX_TX_QUEUE_DEPTH ||
+	    ring->tx_pending < HINIC3_MIN_QUEUE_DEPTH ||
+	    ring->rx_pending > HINIC3_MAX_RX_QUEUE_DEPTH ||
+	    ring->rx_pending < HINIC3_MIN_QUEUE_DEPTH) {
+		netdev_err(netdev,
+			   "Queue depth out of range tx[%d-%d] rx[%d-%d]\n",
+			   HINIC3_MIN_QUEUE_DEPTH, HINIC3_MAX_TX_QUEUE_DEPTH,
+			   HINIC3_MIN_QUEUE_DEPTH, HINIC3_MAX_RX_QUEUE_DEPTH);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int hinic3_set_ringparam(struct net_device *netdev,
+				struct ethtool_ringparam *ring,
+				struct kernel_ethtool_ringparam *kernel_ring,
+				struct netlink_ext_ack *extack)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_dyna_txrxq_params q_params = {};
+	u32 new_sq_depth, new_rq_depth;
+	int err;
+
+	err = hinic3_check_ringparam_valid(netdev, ring);
+	if (err)
+		return err;
+
+	new_sq_depth = 1U << ilog2(ring->tx_pending);
+	new_rq_depth = 1U << ilog2(ring->rx_pending);
+	if (new_sq_depth == nic_dev->q_params.sq_depth &&
+	    new_rq_depth == nic_dev->q_params.rq_depth)
+		return 0;
+
+	netdev_dbg(netdev, "Change Tx/Rx ring depth from %u/%u to %u/%u\n",
+		   nic_dev->q_params.sq_depth, nic_dev->q_params.rq_depth,
+		   new_sq_depth, new_rq_depth);
+
+	if (!netif_running(netdev)) {
+		hinic3_update_qp_depth(netdev, new_sq_depth, new_rq_depth);
+	} else {
+		q_params = nic_dev->q_params;
+		q_params.sq_depth = new_sq_depth;
+		q_params.rq_depth = new_rq_depth;
+
+		err = hinic3_change_channel_settings(netdev, &q_params);
+		if (err) {
+			netdev_err(netdev, "Failed to change channel settings\n");
+			return err;
+		}
+	}
+
+	return 0;
+}
+
 static const struct ethtool_ops hinic3_ethtool_ops = {
 	.supported_coalesce_params      = ETHTOOL_COALESCE_USECS |
 					  ETHTOOL_COALESCE_PKT_RATE_RX_USECS,
@@ -417,6 +509,8 @@ static const struct ethtool_ops hinic3_ethtool_ops = {
 	.get_msglevel                   = hinic3_get_msglevel,
 	.set_msglevel                   = hinic3_set_msglevel,
 	.get_link                       = ethtool_op_get_link,
+	.get_ringparam                  = hinic3_get_ringparam,
+	.set_ringparam                  = hinic3_set_ringparam,
 };
 
 void hinic3_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c b/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
index e7d6c2033b45..d3b3927b5408 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_irq.c
@@ -135,10 +135,16 @@ static int hinic3_set_interrupt_moder(struct net_device *netdev, u16 q_id,
 {
 	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
 	struct hinic3_interrupt_info info = {};
+	unsigned long flags;
 	int err;
 
-	if (q_id >= nic_dev->q_params.num_qps)
+	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
+
+	if (!HINIC3_CHANNEL_RES_VALID(nic_dev) ||
+	    q_id >= nic_dev->q_params.num_qps) {
+		spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
 		return 0;
+	}
 
 	info.interrupt_coalesc_set = 1;
 	info.coalesc_timer_cfg = coalesc_timer_cfg;
@@ -147,6 +153,8 @@ static int hinic3_set_interrupt_moder(struct net_device *netdev, u16 q_id,
 	info.resend_timer_cfg =
 		nic_dev->intr_coalesce[q_id].resend_timer_cfg;
 
+	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
+
 	err = hinic3_set_interrupt_cfg(nic_dev->hwdev, info);
 	if (err) {
 		netdev_err(netdev,
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
index 0a888fe4c975..3b470978714a 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_main.c
@@ -179,6 +179,8 @@ static int hinic3_sw_init(struct net_device *netdev)
 	int err;
 
 	mutex_init(&nic_dev->port_state_mutex);
+	mutex_init(&nic_dev->channel_cfg_lock);
+	spin_lock_init(&nic_dev->channel_res_lock);
 
 	nic_dev->q_params.sq_depth = HINIC3_SQ_DEPTH;
 	nic_dev->q_params.rq_depth = HINIC3_RQ_DEPTH;
@@ -314,6 +316,15 @@ static void hinic3_link_status_change(struct net_device *netdev,
 				      bool link_status_up)
 {
 	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	unsigned long flags;
+	bool valid;
+
+	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
+	valid = HINIC3_CHANNEL_RES_VALID(nic_dev);
+	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
+
+	if (!valid)
+		return;
 
 	if (link_status_up) {
 		if (netif_carrier_ok(netdev))
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c b/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
index da73811641a9..ae485afeb14e 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_netdev_ops.c
@@ -428,6 +428,82 @@ static void hinic3_vport_down(struct net_device *netdev)
 	}
 }
 
+int
+hinic3_change_channel_settings(struct net_device *netdev,
+			       struct hinic3_dyna_txrxq_params *trxq_params)
+{
+	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
+	struct hinic3_dyna_qp_params new_qp_params = {};
+	struct hinic3_dyna_qp_params cur_qp_params = {};
+	bool need_teardown = false;
+	unsigned long flags;
+	int err;
+
+	mutex_lock(&nic_dev->channel_cfg_lock);
+
+	hinic3_config_num_qps(netdev, trxq_params);
+
+	err = hinic3_alloc_channel_resources(netdev, &new_qp_params,
+					     trxq_params);
+	if (err) {
+		netdev_err(netdev, "Failed to alloc channel resources\n");
+		mutex_unlock(&nic_dev->channel_cfg_lock);
+		return err;
+	}
+
+	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
+	if (!test_and_set_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags))
+		need_teardown = true;
+	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
+
+	if (need_teardown) {
+		hinic3_vport_down(netdev);
+		hinic3_close_channel(netdev);
+		hinic3_uninit_qps(nic_dev, &cur_qp_params);
+		hinic3_free_channel_resources(netdev, &cur_qp_params,
+					      &nic_dev->q_params);
+	}
+
+	if (nic_dev->num_qp_irq > trxq_params->num_qps)
+		hinic3_qp_irq_change(netdev, trxq_params->num_qps);
+
+	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
+	nic_dev->q_params = *trxq_params;
+	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
+
+	hinic3_init_qps(nic_dev, &new_qp_params);
+
+	err = hinic3_open_channel(netdev);
+	if (err)
+		goto err_uninit_qps;
+
+	err = hinic3_vport_up(netdev);
+	if (err)
+		goto err_close_channel;
+
+	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
+	clear_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags);
+	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
+
+	mutex_unlock(&nic_dev->channel_cfg_lock);
+
+	return 0;
+
+err_close_channel:
+	hinic3_close_channel(netdev);
+err_uninit_qps:
+	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
+	memset(&nic_dev->q_params, 0, sizeof(nic_dev->q_params));
+	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
+
+	hinic3_uninit_qps(nic_dev, &new_qp_params);
+	hinic3_free_channel_resources(netdev, &new_qp_params, trxq_params);
+
+	mutex_unlock(&nic_dev->channel_cfg_lock);
+
+	return err;
+}
+
 static int hinic3_open(struct net_device *netdev)
 {
 	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
@@ -487,16 +563,33 @@ static int hinic3_close(struct net_device *netdev)
 {
 	struct hinic3_nic_dev *nic_dev = netdev_priv(netdev);
 	struct hinic3_dyna_qp_params qp_params;
+	bool need_teardown = false;
+	unsigned long flags;
 
 	if (!test_and_clear_bit(HINIC3_INTF_UP, &nic_dev->flags)) {
 		netdev_dbg(netdev, "Netdev already close, do nothing\n");
 		return 0;
 	}
 
-	hinic3_vport_down(netdev);
-	hinic3_close_channel(netdev);
-	hinic3_uninit_qps(nic_dev, &qp_params);
-	hinic3_free_channel_resources(netdev, &qp_params, &nic_dev->q_params);
+	mutex_lock(&nic_dev->channel_cfg_lock);
+
+	spin_lock_irqsave(&nic_dev->channel_res_lock, flags);
+	if (!test_and_set_bit(HINIC3_CHANGE_RES_INVALID, &nic_dev->flags))
+		need_teardown = true;
+	spin_unlock_irqrestore(&nic_dev->channel_res_lock, flags);
+
+	if (need_teardown) {
+		hinic3_vport_down(netdev);
+		hinic3_close_channel(netdev);
+		hinic3_uninit_qps(nic_dev, &qp_params);
+		hinic3_free_channel_resources(netdev, &qp_params,
+					      &nic_dev->q_params);
+	}
+
+	hinic3_free_nicio_res(nic_dev);
+	hinic3_destroy_num_qps(netdev);
+
+	mutex_unlock(&nic_dev->channel_cfg_lock);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
index 9502293ff710..55b280888ad8 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_dev.h
@@ -10,6 +10,9 @@
 #include "hinic3_hw_cfg.h"
 #include "hinic3_hwdev.h"
 #include "hinic3_mgmt_interface.h"
+#include "hinic3_nic_io.h"
+#include "hinic3_tx.h"
+#include "hinic3_rx.h"
 
 #define HINIC3_VLAN_BITMAP_BYTE_SIZE(nic_dev)  (sizeof(*(nic_dev)->vlan_bitmap))
 #define HINIC3_VLAN_BITMAP_SIZE(nic_dev)  \
@@ -20,8 +23,13 @@ enum hinic3_flags {
 	HINIC3_MAC_FILTER_CHANGED,
 	HINIC3_RSS_ENABLE,
 	HINIC3_UPDATE_MAC_FILTER,
+	HINIC3_CHANGE_RES_INVALID,
 };
 
+#define HINIC3_CHANNEL_RES_VALID(nic_dev) \
+	(test_bit(HINIC3_INTF_UP, &(nic_dev)->flags) && \
+	 !test_bit(HINIC3_CHANGE_RES_INVALID, &(nic_dev)->flags))
+
 enum hinic3_event_work_flags {
 	HINIC3_EVENT_WORK_TX_TIMEOUT,
 };
@@ -129,6 +137,10 @@ struct hinic3_nic_dev {
 	struct work_struct              rx_mode_work;
 	/* lock for enable/disable port */
 	struct mutex                    port_state_mutex;
+	/* lock for channel configuration */
+	struct mutex                    channel_cfg_lock;
+	/* lock for channel resources */
+	spinlock_t                      channel_res_lock;
 
 	struct list_head                uc_filter_list;
 	struct list_head                mc_filter_list;
@@ -143,6 +155,10 @@ struct hinic3_nic_dev {
 
 void hinic3_set_netdev_ops(struct net_device *netdev);
 int hinic3_set_hw_features(struct net_device *netdev);
+int
+hinic3_change_channel_settings(struct net_device *netdev,
+			       struct hinic3_dyna_txrxq_params *trxq_params);
+
 int hinic3_qps_irq_init(struct net_device *netdev);
 void hinic3_qps_irq_uninit(struct net_device *netdev);
 
diff --git a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
index 12eefabcf1db..3791b9bc865b 100644
--- a/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
+++ b/drivers/net/ethernet/huawei/hinic3/hinic3_nic_io.h
@@ -14,6 +14,10 @@ struct hinic3_nic_dev;
 #define HINIC3_RQ_WQEBB_SHIFT      3
 #define HINIC3_SQ_WQEBB_SIZE       BIT(HINIC3_SQ_WQEBB_SHIFT)
 
+#define HINIC3_MAX_TX_QUEUE_DEPTH  65536
+#define HINIC3_MAX_RX_QUEUE_DEPTH  16384
+#define HINIC3_MIN_QUEUE_DEPTH     128
+
 /* ******************** RQ_CTRL ******************** */
 enum hinic3_rq_wqe_type {
 	HINIC3_NORMAL_RQ_WQE = 1,
-- 
2.43.0


^ permalink raw reply related

* Re: (sashiko status) [PATCH 0/2] Docs/admin-guide/mm/damon: warn commit_inputs vs other params race
From: SeongJae Park @ 2026-03-31  4:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: SeongJae Park, Greg KH, Liam R. Howlett, # 5 . 19 . x,
	David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka,
	damon, linux-doc, linux-kernel, linux-mm, Roman Gushchin
In-Reply-To: <20260330142205.e7c7d7b47ec15a634f6eebf4@linux-foundation.org>

On Mon, 30 Mar 2026 14:22:05 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Sun, 29 Mar 2026 12:32:26 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > On Sun, 29 Mar 2026 20:05:53 +0200 Greg KH <gregkh@linuxfoundation.org> wrote:
> > 
> > > On Sun, Mar 29, 2026 at 08:49:16AM -0700, SeongJae Park wrote:
> > > > Forwarding sashiko.dev review status for this thread.
> > > > 
> > > > # review url: https://sashiko.dev/#/patchset/20260329153052.46657-1-sj@kernel.org
> > > 
> > > Why are you doing this?  If we want to see the review, can't we just go
> > > and look at the tool itself?
> > 
> > We can.  But it is bit cumbersome to opening web browser and moving my focus to
> > there.  Reading everything on the mailing tool is easier for some people like
> > me.  Like some test bots send reports are replying to patches, or we sometimes
> > forwarding bugzilla reports to mailing lists in a form of a plain text mail.
> > 
> > Secondly, I have to share my opinions about the reviews, as many times AI
> > reviews need human's opinions.  There is no good way to do that on the web ui
> > of the tool (sashiko) for now, and I think this mail based flow is the best.
> 
> I do agree with Greg that it's all a bit excessive.  Thanks for your
> diligence, but perhaps dial it back a bit?  It's OK - we're all
> trying to figure out how best to utilize this tool.

Thank you for your kind words, Andrew.  I understand and admit the fact that
this looks excessive.

> 
> I view Sashiko as primarily an author tool.  Sometimes I call it
> checkpatch++.

Thank you for sharing your perspective.  It helps me understand what you want
from the use of the tool.

I viewed sashiko as a human reviewer who has very odd characteristics and
cannot respond to my feedback for some reason, but is still useful in many
cases.  Hence I wanted to help this special reviewer communicate with
others on the mailing list.  I also thought that is what sashiko would do
anyway, because the public announcement listed sending reviews by mail as one
of the TODO items for sashiko, and I onboarded DAMON for that.

But apparently not everyone shares the same view.  My understanding of the TODO
item in the sashiko public announcement may also be biased.  Maybe being a
subsystem's sole maintainer who is looking for a reviewer made my perspective
carelessly biased.

> In a better world, author would be able to sort out
> Sashiko issues before ever sending out the patchset.  But in this
> world, a public send is needed to obtain that review.
> 
> So what we're presently seeing is author development activity which is
> unfortunately and inappropriately being conducted on a public list.

Makes sense.  Now I understand why you and Roman were discussing having a
separate mailing list for sharing the reviews via mail as a path forward, and
I agree that could be a good option.

> 
> Personally, I pay only a little attention to author's Sashiko activity.
> Just enough to see whether I should pay more attention.  If author
> says "oops, let me redo" then fine, I'll await the next spin.  If
> author says "that was all nonsense" then fine, time to take a closer
> look.

Makes sense.  I will try to keep sharing necessary information, but only with
targeted audiences, and with less traffic.

Thanks,
SJ

^ permalink raw reply

* Re: (sashiko status) [PATCH 0/2] Docs/admin-guide/mm/damon: warn commit_inputs vs other params race
From: SeongJae Park @ 2026-03-31  4:34 UTC (permalink / raw)
  To: Greg KH
  Cc: SeongJae Park, Andrew Morton, Liam R. Howlett, # 5 . 19 . x,
	David Hildenbrand, Jonathan Corbet, Lorenzo Stoakes, Michal Hocko,
	Mike Rapoport, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka,
	damon, linux-doc, linux-kernel, linux-mm, Roman Gushchin
In-Reply-To: <2026033013-drainage-stylized-43d6@gregkh>

On Mon, 30 Mar 2026 07:47:54 +0200 Greg KH <gregkh@linuxfoundation.org> wrote:

> On Sun, Mar 29, 2026 at 12:32:26PM -0700, SeongJae Park wrote:
> > + Roman for a case he has any opinion about my sashiko usage.
> > 
> > Hello Greg,
> > 
> > On Sun, 29 Mar 2026 20:05:53 +0200 Greg KH <gregkh@linuxfoundation.org> wrote:
> > 
> > > On Sun, Mar 29, 2026 at 08:49:16AM -0700, SeongJae Park wrote:
> > > > Forwarding sashiko.dev review status for this thread.
> > > > 
> > > > # review url: https://sashiko.dev/#/patchset/20260329153052.46657-1-sj@kernel.org
> > > 
> > > Why are you doing this?  If we want to see the review, can't we just go
> > > and look at the tool itself?
> > 
> > We can.  But it is bit cumbersome to opening web browser and moving my focus to
> > there.  Reading everything on the mailing tool is easier for some people like
> > me.  Like some test bots send reports are replying to patches, or we sometimes
> > forwarding bugzilla reports to mailing lists in a form of a plain text mail.
> 
> Sure, but are you going to now forward all random tool reviews that are
> run on your subsystem to all of these lists (your distribution cc: is
> quite large here)?

Obviously not for random tools.  But if there are a few tools that (nearly)
everyone agrees are useful and worth integrating into the mailing list
workflow, I would like to.

Now it seems I was much more optimistic than others.

> 
> > Secondly, I have to share my opinions about the reviews, as many times AI
> > reviews need human's opinions.  There is no good way to do that on the web ui
> > of the tool (sashiko) for now, and I think this mail based flow is the best.
> 
> That is assuming that you can fix up the AI reviews, is that happening
> here?

What I mean by the human opinions that AI reviews need is not only fixups, but
also sharing the reviews on which the human and the tool are aligned.

But in this case, I was sharing that the review results seem incorrect, or at
least do not need a deep dive:
https://lore.kernel.org/20260329163102.58683-1-sj@kernel.org

> 
> > And anyway I'm supposed to share at least my review of AI reviews, in mm
> > community.  If I ignore, I will only make Andrew have to reply asking that.
> > 
> > I used to share only my review of the AI reviews as replies, instead of
> > forwarding AI reviews and then replies to those.  But it was
> > 1. cumbersome for me (should summarize AI review and then my review; feeling
> >    doing work twice), and
> > 2. feeling not optimal at sharing all concerning comments with others.  My
> >    summary might miss some points of AI review but other reviewers might just
> >    believe me and don't read the full review due to the additional web browser
> >    opening work.  Also some other reivewers might kindly review AI reviews
> >    before I do, and save my (or their) time.
> > 
> > Hence I ended up to do this bit odd workflow:  Forwarding the full AI review on
> > the mailing list first, then reply my responses.
> > 
> > > sending it back to all of us feels odd,
> > 
> > If this is polluting your inbox and/or distract you, I'm so sorry for that.
> > Please let me know if this is distracting you.  Maybe I can filtering people
> > who don't want this kind of replies out of the recipients for the forwarding
> > mails.  Or, if you have a suggestion about what need to be changed, please let
> > me know.
> 
> It just seemed odd, and might get crazy over time if this happens for
> all random AI tools that happen to be popping up now, right?

As I also mentioned above, I agree.  And it seems that in this case I was much
more optimistic than others, or hallucinated ;)

> If this is
> the "official" one for -mm, that's fine, but consider the distribution
> and intended audience a bit please.

Andrew replied that this is not an official or recommended action for mm.  I
once thought this could be the official one for DAMON only.  But in any case, I
now understand this can look crazy, odd or excessive to some people, including
those I trust.  I will think about a better way to use this tool, while
keeping your input in mind.

Thank you so much for sharing your opinions, Greg.


Thanks,
SJ

[...]

^ permalink raw reply

* Re: [PATCH v2] bootconfig: Apply early options from embedded config
From: Masami Hiramatsu @ 2026-03-31  3:58 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
	linux-doc, oss, paulmck, rostedt, kernel-team
In-Reply-To: <acqJk-zbyjIiy6hJ@gmail.com>

On Mon, 30 Mar 2026 08:04:23 -0700
Breno Leitao <leitao@debian.org> wrote:

> On Mon, Mar 30, 2026 at 06:15:17AM -0700, Breno Leitao wrote:
> > On Fri, Mar 27, 2026 at 10:37:44PM +0900, Masami Hiramatsu wrote:
> > > On Fri, 27 Mar 2026 03:06:41 -0700
> > > Breno Leitao <leitao@debian.org> wrote:
> >
> > > > > To fix this, we need to change setup_arch() for each architecture so
> > > > > that it calls this bootconfig_apply_early_params().
> > > >
> > > > Could we instead integrate this into parse_early_param() itself? That
> > > > approach would avoid the need to modify each architecture individually.
> > >
> > > Ah, indeed.
> >
> > I investigated integrating bootconfig into parse_early_param() and hit a
> > blocker: xbc_init() and xbc_make_cmdline() depend on memblock_alloc(), but on
> > most architectures (x86, arm64, arm, s390, riscv) parse_early_param() is called
> > from setup_arch() _before_ memblock is initialized.
> 
> That said, I'd like to propose a simpler approach as a first step:
> 
> 1) Keep calling bootconfig_apply_early_params() from setup_boot_config().
>    This is the least intrusive approach and expands bootconfig support to
>    additional early boot parameters.
> 
> 2) Document that architecture-specific early parameters might be ignored.
>    If a parameter is consumed early enough (during setup_arch()), it will
>    not see the bootconfig value.

We have to do this carefully, because whether a parameter is arch-specific
depends on the architecture and is undocumented.

How about introducing a new Kconfig option that enables early params from the
embedded bootconfig?

> 
> 3) Ensure that early bootconfig parameters don't overwrite the boot command
>    line. For example, if the boot command line has foo=bar and bootconfig
>    later has foo=baz, the command line value should take precedence.
>    This prevents early boot code (in setup_arch()) from seeing a parameter
>    value that will be changed later.

OK, this also needs to be considered. Currently we just pass the bootconfig
parameters right before the bootloader-given parameters as "extra_command_line"
if "bootconfig" is in the cmdline or CONFIG_BOOT_CONFIG_FORCE=y.

[boot_config(.kernel)]<command_line>[ -- [boot_config(.init)][init_command_line]]

This is the currently expected behavior. The bootconfig parameters are
expected to be overridden by command_line, or the command_line values are
appended.

If we change this for early params, we should also change the expected
output of /proc/cmdline. I think we have two options:

 - As before, we expect the parameters provided by the boot configuration
   to be processed first and then overridden later by the command line.

Or,

 - ignore all parameters which are given from the command line; this also
   changes the existing setup_boot_config() (meaning xbc_snprint_cmdline()).

Anyway, this behavior change will also be a bit critical... We have
to announce it.
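
The current ordering described above can be sketched in user-space C. This is
only an illustration, assuming a simple key=value option for which the last
occurrence wins; last_value(), effective_value() and the sample strings are
hypothetical names, not kernel code, and real parameter handlers may behave
differently (for example by appending values).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): copy the value of the LAST
 * "key=value" token of a space-separated parameter string into out.
 * Returns 1 if the key was found, 0 otherwise.
 */
static int last_value(const char *params, const char *key,
		      char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = params;
	int found = 0;

	if (!outlen)
		return 0;

	while (*p) {
		size_t tlen;

		while (*p == ' ')
			p++;
		if (!*p)
			break;

		tlen = strcspn(p, " ");	/* length of the current token */
		if (tlen > klen && !strncmp(p, key, klen) && p[klen] == '=') {
			size_t vlen = tlen - klen - 1;

			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			found = 1;	/* keep scanning: later tokens win */
		}
		p += tlen;
	}
	return found;
}

/*
 * Model of "[boot_config(.kernel)]<command_line>": the bootconfig
 * parameters are placed first, so a command line token overrides them.
 */
static int effective_value(const char *bootconfig, const char *cmdline,
			   const char *key, char *out, size_t outlen)
{
	char merged[256];

	snprintf(merged, sizeof(merged), "%s %s", bootconfig, cmdline);
	return last_value(merged, key, out, outlen);
}
```

With bootconfig "foo=baz" and command line "foo=bar quiet", this model
resolves foo to "bar"; with command line "quiet" alone, it resolves to "baz".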

> 
> 
> If that is OK, that is what I have right now:

Let me comment on it.

> diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
> index f712758472d5c..6ed852a0c66d8 100644
> --- a/Documentation/admin-guide/bootconfig.rst
> +++ b/Documentation/admin-guide/bootconfig.rst
> @@ -169,6 +169,15 @@ Boot Kernel With a Boot Config
>  There are two options to boot the kernel with bootconfig: attaching the
>  bootconfig to the initrd image or embedding it in the kernel itself.
>  
> +Early options (those registered with ``early_param()``) may only be
> +specified in the embedded bootconfig, because the initrd is not yet
> +available when early parameters are processed.
> +
> +Note that embedded bootconfig is parsed after ``setup_arch()``, so
> +early options that are consumed during architecture initialization
> +(e.g., ``mem=``, ``memmap=``, ``earlycon``, ``noapic``, ``nolapic``,
> +``acpi=``, ``numa=``, ``iommu=``) may not take effect from bootconfig.
> +

This is easy to explain, but it's quite troublesome for users to
determine which parameters are unavailable. Currently we can identify
them with `git grep early_param -- arch/${ARCH}`. But to tell whether one is
consumed in setup_arch(), we need to trace the source code. (Or ask AI :))


>  Attaching a Boot Config to Initrd
>  ---------------------------------
>  
> diff --git a/init/Kconfig b/init/Kconfig
> index 7484cd703bc1a..34adcc1feb9b6 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1525,6 +1525,16 @@ config BOOT_CONFIG_EMBED
>  	  image. But if the system doesn't support initrd, this option will
>  	  help you by embedding a bootconfig file while building the kernel.
>  
> +	  Unlike bootconfig attached to initrd, the embedded bootconfig also
> +	  supports early options (those registered with early_param()). Any
> +	  kernel.* key in the embedded bootconfig is applied before
> +	  parse_early_param() runs.  Early options in initrd bootconfig will
> +	  not be applied.  Early options consumed during setup_arch() (e.g.
> +	  mem=, memmap=, earlycon, noapic, acpi=, numa=, iommu=) may not
> +	  take effect.  If the same early option
> +	  appears in both bootconfig and the kernel command line, the
> +	  command line value takes precedence.
> +

As a compromise, how about making this a separate Kconfig?

config BOOT_CONFIG_EMBED_EARLY_PARAM
	bool "Support early_params in embedded bootconfig"

and afterwards, we can introduce 

	depends on ARCH_BOOT_CONFIG_EMBED_EARLY_PARAM


Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH -next] hwmon: (yogafan) fix markup warning
From: Guenter Roeck @ 2026-03-31  2:45 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel, Sergio Melas, linux-hwmon, linux-doc
In-Reply-To: <20260330214624.3781789-1-rdunlap@infradead.org>

On Mon, Mar 30, 2026 at 02:46:24PM -0700, Randy Dunlap wrote:
> Add a blank line between the License and heading lines to prevent a
> documentation build warning:
> 
> Documentation/hwmon/yogafan.rst:2: WARNING: Explicit markup ends without
>   a blank line; unexpected unindent. [docutils]
> 
> Fixes: b773f2e6b472 ("hwmon: (yogafan) Add support for Lenovo Yoga/Legion fan monitoring")
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>

Applied.

Thanks,
Guenter

^ permalink raw reply

* Re: [PATCH 1/2] mm/memory-failure: add panic_on_unrecoverable_memory_failure sysctl
From: Miaohe Lin @ 2026-03-31  2:27 UTC (permalink / raw)
  To: Breno Leitao
  Cc: linux-mm, linux-kernel, linux-doc, kernel-team, Naoya Horiguchi,
	Andrew Morton, Jonathan Corbet, Shuah Khan
In-Reply-To: <acp8wYLHDGAfhzI5@gmail.com>

On 2026/3/30 21:45, Breno Leitao wrote:
> On Mon, Mar 30, 2026 at 03:55:00PM +0800, Miaohe Lin wrote:
>> On 2026/3/23 23:29, Breno Leitao wrote:
>>
>>> @@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
>>>  	pr_err("%#lx: recovery action for %s: %s\n",
>>>  		pfn, action_page_types[type], action_name[result]);
>>>  
>>> +	if (sysctl_panic_on_unrecoverable_mf &&
>>> +	    type == MF_MSG_GET_HWPOISON && result == MF_IGNORED)
>>> +		panic("Memory failure: %#lx: unrecoverable page", pfn);
>>
>> MF_MSG_GET_HWPOISON contains some other scenarios. For example, an isolated folio will
>> make get_hwpoison_page return -EIO so we will see MF_MSG_GET_HWPOISON and MF_IGNORED in
>> action_result. But that's recoverable if folio is used by userspace thus panic will be
>> unacceptable.
>> Will it better to check type against MF_MSG_KERNEL_HIGH_ORDER?
> 
> Yes, I was discussing this with akpm, and maybe the better
> approach would be to panic for types MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_KERNEL.
> 
> In both cases, it seems that, the page would not be able to migrate. What do
> you think about a change like this:
> 
> 
> @@ -1298,6 +1309,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
>         pr_err("%#lx: recovery action for %s: %s\n",
>                 pfn, action_page_types[type], action_name[result]);
> 
> +       if (sysctl_panic_on_unrecoverable_mf && result == MF_IGNORED &&
> +           (type == MF_MSG_KERNEL || type == MF_MSG_KERNEL_HIGH_ORDER))
> +               panic("Memory failure: %#lx: unrecoverable page", pfn);
> +
>         return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
>  }
> 

Maybe MF_MSG_UNKNOWN can also be considered? The kernel can't do anything
further for those folios.
BTW I think the current code can't reach the MF_MSG_KERNEL and MF_MSG_UNKNOWN
cases because there is always a (PageHuge() || HWPoisonHandlable()) check
before calling identify_page_state.
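
The condition under discussion can be modelled as a small user-space
predicate. This is a sketch only: the enums below are reduced stand-ins with
arbitrary values, mf_should_panic() is a hypothetical name, and including
MF_MSG_UNKNOWN reflects the suggestion above rather than any agreed patch.

```c
#include <stdbool.h>

/* Reduced stand-ins for the kernel enums; numeric values are arbitrary. */
enum mf_result_model { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

enum mf_type_model {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
	MF_MSG_UNKNOWN,
};

/*
 * Panic only when the sysctl is set, the result is MF_IGNORED, and the
 * page type is one the kernel cannot handle any further.
 * MF_MSG_GET_HWPOISON is deliberately excluded because it also covers
 * recoverable cases such as an isolated userspace folio.
 */
static bool mf_should_panic(bool sysctl_enabled, enum mf_type_model type,
			    enum mf_result_model result)
{
	if (!sysctl_enabled || result != MF_IGNORED)
		return false;

	switch (type) {
	case MF_MSG_KERNEL:
	case MF_MSG_KERNEL_HIGH_ORDER:
	case MF_MSG_UNKNOWN:
		return true;
	default:
		return false;
	}
}
```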

Thanks.
.

^ permalink raw reply

* Re: [PATCH net-next V9 12/14] net/mlx5: qos: Support cross-device tx scheduling
From: Jakub Kicinski @ 2026-03-31  2:08 UTC (permalink / raw)
  To: tariqt
  Cc: Jakub Kicinski, edumazet, pabeni, andrew+netdev, davem,
	donald.hunter, horms, jiri, corbet, skhan, saeedm, leon, mbloch,
	shuah, chuck.lever, matttbe, cjubran, cratiu, dtatulea,
	jacob.e.keller, shshitrit, daniel.zahka, parav, ajayachandra,
	kees, shayd, danielj, moshe, willemb, dw, petrm, sdf, daniel, joe,
	razor, vadim.fedorenko, mst, antonio, allison.henderson,
	minhquangbui99, noren, netdev, linux-kernel, linux-doc,
	linux-rdma, linux-kselftest, gal, jiri
In-Reply-To: <20260326065949.44058-13-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: qos: Support cross-device tx scheduling

This commit relaxes the restriction that rate groups can only contain vports
from the same E-Switch. It enables cross-device TX scheduling if the hardware
supports it, LAG is enabled, and the E-Switches share a devlink device.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c

[ ... ]

> @@ -1738,18 +1761,44 @@ int mlx5_esw_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv,
>  	return 0;
>  }
>  
> +static int
> +mlx5_esw_validate_cross_esw_scheduling(struct mlx5_eswitch *esw,
> +				       struct mlx5_esw_sched_node *parent,
> +				       struct netlink_ext_ack *extack)
> +{

[ ... ]

> +	if (!mlx5_lag_is_active(esw->dev)) {
> +		NL_SET_ERR_MSG_MOD(extack,
> +				   "Cross E-Switch scheduling requires LAG to be activated");
> +		return -EOPNOTSUPP;
> +	}

What happens to the cross-device rate scheduling if LAG is deactivated
after the configuration is applied?

When LAG is deactivated, the LAG state is torn down in hardware, but there
does not appear to be a corresponding cleanup or update triggered for the
cross-eswitch QoS scheduling elements.

Could this result in a regression where the hardware TSAR tree and devlink
rate configuration remain pointing to a now-invalid VHCA association,
bypassing the state validation and causing traffic drops or firmware errors?

> +
> +	return 0;
> +}

^ permalink raw reply

* Re: [PATCH net-next V9 11/14] net/mlx5: qos: Remove qos domains and use shd lock
From: Jakub Kicinski @ 2026-03-31  2:08 UTC (permalink / raw)
  To: tariqt
  Cc: Jakub Kicinski, edumazet, pabeni, andrew+netdev, davem,
	donald.hunter, horms, jiri, corbet, skhan, saeedm, leon, mbloch,
	shuah, chuck.lever, matttbe, cjubran, cratiu, dtatulea,
	jacob.e.keller, shshitrit, daniel.zahka, parav, ajayachandra,
	kees, shayd, danielj, moshe, willemb, dw, petrm, sdf, daniel, joe,
	razor, vadim.fedorenko, mst, antonio, allison.henderson,
	minhquangbui99, noren, netdev, linux-kernel, linux-doc,
	linux-rdma, linux-kselftest, gal, jiri
In-Reply-To: <20260326065949.44058-12-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
    net/mlx5: qos: Remove qos domains and use shd lock

This commit removes QoS domains and switches to using the shd lock to
protect against concurrent QoS modifications. It also enables the
supported_cross_device_rate_nodes devlink ops attribute.

Link: https://lore.kernel.org/netdev/20250213180134.323929-1-tariqt@nvidia.com/

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
> index 6698ac55a4bfa..c051605fecd23 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
> @@ -385,6 +385,7 @@ static const struct devlink_ops mlx5_devlink_ops = {
>  	.rate_node_del = mlx5_esw_devlink_rate_node_del,
>  	.rate_leaf_parent_set = mlx5_esw_devlink_rate_leaf_parent_set,
>  	.rate_node_parent_set = mlx5_esw_devlink_rate_node_parent_set,
> +	.supported_cross_device_rate_nodes = true,

The commit message states that enabling supported_cross_device_rate_nodes
now is safe because mlx5_esw_qos_vport_update_parent rejects cross-esw
parent updates.

When a user provides a DEVLINK_ATTR_PARENT_DEV belonging to an unrelated
device, the devlink core (devlink_nl_rate_parent_node_set) drops the lock on
the parent devlink and calls devlink_rate_node_get_by_name(). This function
traverses the rate_list of the unlocked parent devlink instance.

Does this leave the devlink core exposed to data races and potential
use-after-free vulnerabilities, since the driver's rejection callback
executes only after this lockless list traversal has occurred?

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
> index 0be516003bcde..f67f99428959f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
> @@ -1110,7 +1068,7 @@ void mlx5_esw_qos_vport_disable(struct mlx5_vport *vport)
>  	struct mlx5_esw_sched_node *parent;
>  
>  	lockdep_assert_held(&esw->state_lock);
> -	esw_qos_lock(esw);
> +	devl_lock(esw->dev->shd);

Could esw->dev->shd be NULL here?

In mlx5_shd_init(), initialization is explicitly skipped for non-PF devices,
meaning Sub-Functions (SFs) acting as E-Switch managers will have a NULL
shd pointer. If the device lacks VPD data, pci_vpd_alloc() returns -ENODEV,
causing mlx5_shd_init() to return 0 but leaving dev->shd as NULL.

When QoS operations are attempted on these devices, calling devl_lock()
dereferences the pointer, which could cause an immediate NULL pointer
dereference and kernel panic.

Additionally, this acquires the devlink lock while already holding
esw->state_lock, establishing the lock order:
state_lock -> devlink->lock.

Conversely, the devlink core invokes driver callbacks with the devlink
instance lock held, and the driver subsequently acquires esw->state_lock
inside these callbacks, creating the reverse order:
devlink->lock -> state_lock.

Does this introduce a circular locking dependency splat between state_lock
and the devlink instance lock?

> @@ -1159,26 +1117,25 @@ int mlx5_esw_qos_set_vport_rate(struct mlx5_vport *vport, u32 max_rate, u32 min_
>  	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
>  	int err;
>  
> -	esw_qos_lock(esw);
> +	devl_lock(esw->dev->shd);

This function is invoked by mlx5_eswitch_set_vport_rate(), which acts as the
handler for the ndo_set_vf_rate netdev callback. The core network stack
executes ndo_set_vf_rate with the rtnl_lock held, establishing the lock
order: rtnl_lock -> devlink->lock.

However, the devlink subsystem enforces the opposite locking order
(devlink->lock -> rtnl_lock). Because all devlink locks share the same
lockdep class, does this inversion trigger a circular locking dependency
lockdep splat and pose an ABBA deadlock risk?

[ ... ]

> @@ -1859,13 +1763,15 @@ mlx5_esw_qos_vport_update_parent(struct mlx5_vport *vport,
>  						  parent ? : esw->qos.root,
>  						  extack);
>  	}
> -	esw_qos_unlock(esw);
> +
>  	return err;
>  }
>  
>  void mlx5_esw_qos_vport_clear_parent(struct mlx5_vport *vport)
>  {
> +	devl_lock(vport->dev->shd);
>  	mlx5_esw_qos_vport_update_parent(vport, NULL, NULL);
> +	devl_unlock(vport->dev->shd);
>  }

During device teardown, execution paths such as mlx5_unload_one_devl_locked()
hold the primary devlink instance lock. This path invokes
mlx5_eswitch_unload_vf_vports() -> mlx5_esw_offloads_devlink_port_unregister()
-> mlx5_esw_qos_vport_clear_parent().

Because both the primary devlink instance lock and the shd devlink instance
lock are initialized from the same site (devlink_alloc), they share the same
lockdep class.

Does calling devl_lock() directly (instead of using the devlink core's
mutex_lock_nested capability) while a lock of the same class is already held
trigger a recursive locking lockdep warning?
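
To make the lock-order questions above concrete, here is a toy,
userspace-only sketch of class-based order tracking. It is not lockdep and
not kernel code: the class names, tracker_acquire() and the result enum are
all made up for illustration, and it only catches direct A->B / B->A
inversions and same-class nesting, not longer cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes; every devlink instance lock maps to CLS_DEVLINK,
 * mirroring the "same lockdep class" point made in the review. */
enum lock_class { CLS_RTNL, CLS_DEVLINK, CLS_STATE, CLS_MAX };

enum order_result { ORDER_OK, ORDER_RECURSIVE, ORDER_INVERSION };

struct order_tracker {
	/* seen[a][b] is true once class b was taken while class a was held */
	bool seen[CLS_MAX][CLS_MAX];
};

static void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/* Record "inner taken while outer is held" and report what a simple
 * order checker would flag for that acquisition. */
static enum order_result tracker_acquire(struct order_tracker *t,
					 enum lock_class outer,
					 enum lock_class inner)
{
	assert(outer < CLS_MAX && inner < CLS_MAX);

	if (outer == inner)
		return ORDER_RECURSIVE;	/* same class nested without annotation */
	if (t->seen[inner][outer])
		return ORDER_INVERSION;	/* reverse order was seen before: ABBA */

	t->seen[outer][inner] = true;
	return ORDER_OK;
}
```

Under these assumptions, recording devlink -> rtnl first and rtnl -> devlink
second flags the second acquisition, and taking a devlink-class lock while
another devlink-class lock is held is flagged regardless of which instance
it is.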

* Re: [PATCH net-next V9 07/14] devlink: Allow rate node parents from other devlinks
From: Jakub Kicinski @ 2026-03-31  2:08 UTC (permalink / raw)
  To: tariqt
  Cc: Jakub Kicinski, edumazet, pabeni, andrew+netdev, davem,
	donald.hunter, horms, jiri, corbet, skhan, saeedm, leon, mbloch,
	shuah, chuck.lever, matttbe, cjubran, cratiu, dtatulea,
	jacob.e.keller, shshitrit, daniel.zahka, parav, ajayachandra,
	kees, shayd, danielj, moshe, willemb, dw, petrm, sdf, daniel, joe,
	razor, vadim.fedorenko, mst, antonio, allison.henderson,
	minhquangbui99, noren, netdev, linux-kernel, linux-doc,
	linux-rdma, linux-kselftest, gal, jiri
In-Reply-To: <20260326065949.44058-8-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
devlink: Allow rate node parents from other devlinks

This commit makes use of the building blocks previously added to
implement cross-device rate nodes.

> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index 3038af6ec017e..8d5ad5d4f1d05 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -1585,6 +1585,11 @@ struct devlink_ops {
>  				    struct devlink_rate *parent,
>  				    void *priv_child, void *priv_parent,
>  				    struct netlink_ext_ack *extack);
> +	/* Indicates if cross-device rate nodes are supported.
> +	 * This also requires a shared common ancestor object all devices that
> +	 * could share rate nodes are nested in.
> +	 */
> +	bool supported_cross_device_rate_nodes;

When a device acting as a cross-device rate parent is destroyed,
devl_rate_nodes_destroy() iterates the shared rate_list to detach its
children. If it encounters a child rate node belonging to a different device,
it invokes the child driver's ops->rate_node_parent_set or
ops->rate_leaf_parent_set callbacks.

Since the cleanup path only holds the parent's devlink lock and the shared
root devlink lock, does this allow the child driver's rate operations to
execute concurrently with other operations on the child device without holding
the child's devlink instance lock?

[ ... ]
> diff --git a/net/devlink/rate.c b/net/devlink/rate.c
> index 1949746fab291..f243cccc95bee 100644
> --- a/net/devlink/rate.c
> +++ b/net/devlink/rate.c
> @@ -30,19 +30,53 @@ devlink_rate_leaf_get_from_info(struct devlink *devlink, struct genl_info *info)
>  	return devlink_rate ?: ERR_PTR(-ENODEV);
>  }
>  
> +/* Repeatedly locks the nested-in devlink instances while cross device rate
> + * nodes are supported. Returns the devlink instance where rates should be
> + * stored.
> + */
>  static struct devlink *devl_rate_lock(struct devlink *devlink)
>  {
> -	return devlink;
> +	struct devlink *rate_devlink = devlink;
> +
> +	while (rate_devlink->ops &&
> +	       rate_devlink->ops->supported_cross_device_rate_nodes) {
> +		devlink = devlink_nested_in_get_lock(rate_devlink->rel);
> +		if (!devlink)
> +			break;
> +		rate_devlink = devlink;
> +	}
> +	return rate_devlink;
>  }

[ ... ]

> +/* Repeatedly unlocks the nested-in devlink instances of 'devlink' while cross
> + * device nodes are supported.
> + */
>  static void devl_rate_unlock(struct devlink *devlink)
>  {
> +	if (!devlink || !devlink->ops ||
> +	    !devlink->ops->supported_cross_device_rate_nodes)
> +		return;
> +
> +	devl_rate_unlock(devlink_nested_in_get_locked(devlink->rel));
> +	devlink_nested_in_put_unlock(devlink->rel);
>  }

If a parent devlink is not yet registered, devlink_nested_in_get_lock() can
return NULL in devl_rate_lock() and break the loop without taking a lock.

If another thread completes registering the parent devlink concurrently,
devlink_nested_in_get_locked() in devl_rate_unlock() will find it. 

Could this lead to unlocking a mutex that was never acquired and dropping a
reference that was never taken, resulting in a use-after-free or lock
corruption?

[ ... ]
> @@ -320,13 +371,14 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
>  				struct genl_info *info,
>  				struct nlattr *nla_parent)
>  {
> -	struct devlink *devlink = devlink_rate->devlink;
> +	struct devlink *devlink = devlink_rate->devlink, *parent_devlink;
>  	const char *parent_name = nla_data(nla_parent);
>  	const struct devlink_ops *ops = devlink->ops;
>  	size_t len = strlen(parent_name);
>  	struct devlink_rate *parent;
>  	int err = -EOPNOTSUPP;
>  
> +	parent_devlink = devlink_nl_ctx(info)->parent_devlink ? : devlink;
>  	parent = devlink_rate->parent;
>  
>  	if (parent && !len) {
> @@ -344,7 +396,13 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
>  		refcount_dec(&parent->refcnt);
>  		devlink_rate->parent = NULL;
>  	} else if (len) {
> -		parent = devlink_rate_node_get_by_name(devlink, parent_name);
> +		/* parent_devlink (when different than devlink) isn't locked,
> +		 * but the rate node devlink instance is, so nobody from the
> +		 * same group of devices sharing rates could change the used
> +		 * fields or unregister the parent.
> +		 */
> +		parent = devlink_rate_node_get_by_name(parent_devlink,
> +						       parent_name);

Is parent_devlink validated to ensure it shares the same common rate root
ancestor as devlink?

If an arbitrary unrelated parent_devlink is specified,
devlink_rate_node_get_by_name() could traverse an unrelated parent's
rate_list that was never locked by devl_rate_lock(), leading to a data race.

Additionally, if a rate node is found, its priv pointer is passed to the
target driver's ops->rate_node_parent_set. Could this cause the target driver
to cast a foreign priv pointer to its own private struct type, resulting in
type confusion and memory corruption?

* Re: [PATCH net-next V9 04/14] devlink: Decouple rate storage from associated devlink object
From: Jakub Kicinski @ 2026-03-31  2:08 UTC (permalink / raw)
  To: tariqt
  Cc: Jakub Kicinski, edumazet, pabeni, andrew+netdev, davem,
	donald.hunter, horms, jiri, corbet, skhan, saeedm, leon, mbloch,
	shuah, chuck.lever, matttbe, cjubran, cratiu, dtatulea,
	jacob.e.keller, shshitrit, daniel.zahka, parav, ajayachandra,
	kees, shayd, danielj, moshe, willemb, dw, petrm, sdf, daniel, joe,
	razor, vadim.fedorenko, mst, antonio, allison.henderson,
	minhquangbui99, noren, netdev, linux-kernel, linux-doc,
	linux-rdma, linux-kselftest, gal, jiri
In-Reply-To: <20260326065949.44058-5-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
devlink: Decouple rate storage from associated devlink object

Devlink rate leafs and nodes were stored in their respective devlink
objects pointed to by devlink_rate->devlink.

> diff --git a/net/devlink/rate.c b/net/devlink/rate.c
> index 4781429109195..9ebbc72130c6a 100644
> --- a/net/devlink/rate.c
> +++ b/net/devlink/rate.c
[ ... ]
> @@ -648,8 +702,9 @@ int devlink_nl_rate_new_doit(struct sk_buff *skb, struct genl_info *info)
>  		goto err_rate_set;
>  
>  	refcount_set(&rate_node->refcnt, 1);
> -	list_add(&rate_node->list, &devlink->rate_list);
> +	list_add(&rate_node->list, &rate_devlink->rate_list);
>  	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
> +	devl_rate_unlock(devlink);
>  	return 0;
>  
>  err_rate_set:
>  	ops->rate_node_del(rate_node, rate_node->priv, info->extack);
> @@ -658,6 +713,8 @@ int devlink_nl_rate_new_doit(struct sk_buff *skb, struct genl_info *info)
>  	kfree(rate_node->name);
>  err_strdup:
>  	kfree(rate_node);
> +unlock:
> +	devl_rate_unlock(devlink);
>  	return err;
>  }

If devlink_nl_rate_set() fails after it successfully sets the parent node
and increments the parent's refcount, does this error path permanently
leak that reference since it frees the node without decrementing it?
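
For illustration only, the pattern in question can be sketched in plain
userspace C. None of the names below come from net/devlink/rate.c;
struct rate_node, node_get()/node_put() and the fail_name switch are
hypothetical stand-ins. The point is just that an error label reached after
the parent reference was taken has to drop it again before freeing the node.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted rate node. */
struct rate_node {
	int refcnt;
	char *name;
	struct rate_node *parent;
};

static void node_get(struct rate_node *n)
{
	n->refcnt++;
}

static void node_put(struct rate_node *n)
{
	assert(n->refcnt > 0);	/* catch underflow early */
	n->refcnt--;
}

/* fail_name simulates the name allocation failing after the parent
 * reference has already been taken. */
static struct rate_node *node_create(const char *name,
				     struct rate_node *parent, int fail_name)
{
	struct rate_node *n = calloc(1, sizeof(*n));
	size_t len = strlen(name) + 1;

	if (!n)
		return NULL;

	if (parent) {
		n->parent = parent;
		node_get(parent);
	}

	n->name = fail_name ? NULL : malloc(len);
	if (!n->name)
		goto err_put_parent;
	memcpy(n->name, name, len);

	n->refcnt = 1;
	return n;

err_put_parent:
	if (n->parent)
		node_put(n->parent);	/* undo the reference taken above */
	free(n);
	return NULL;
}

static void node_destroy(struct rate_node *n)
{
	if (n->parent)
		node_put(n->parent);
	free(n->name);
	free(n);
}
```

With the err_put_parent step in place, a failed creation leaves the parent's
count where it started; without it, each failure would pin the parent by one
extra reference.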

[ ... ]
> @@ -716,14 +785,20 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
>  		      struct devlink_rate *parent)
>  {
>  	struct devlink_rate *rate_node;
> +	struct devlink *rate_devlink;
>  
> +	rate_devlink = devl_rate_lock(devlink);
>  	rate_node = devlink_rate_node_get_by_name(devlink, node_name);
> -	if (!IS_ERR(rate_node))
> -		return ERR_PTR(-EEXIST);
> +	if (!IS_ERR(rate_node)) {
> +		rate_node = ERR_PTR(-EEXIST);
> +		goto unlock;
> +	}
>  
>  	rate_node = kzalloc_obj(*rate_node);
> -	if (!rate_node)
> -		return ERR_PTR(-ENOMEM);
> +	if (!rate_node) {
> +		rate_node = ERR_PTR(-ENOMEM);
> +		goto unlock;
> +	}
>  
>  	if (parent) {
>  		rate_node->parent = parent;
> @@ -737,12 +812,15 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
>  	rate_node->name = kstrdup(node_name, GFP_KERNEL);
>  	if (!rate_node->name) {
>  		kfree(rate_node);
> -		return ERR_PTR(-ENOMEM);
> +		rate_node = ERR_PTR(-ENOMEM);
> +		goto unlock;
>  	}

Similarly, if kstrdup() fails here, does this leak the parent's refcount
since it was already incremented earlier in the function?

* Re: [PATCH net-next V9 02/14] devlink: Add helpers to lock nested-in instances
From: Jakub Kicinski @ 2026-03-31  2:08 UTC (permalink / raw)
  To: tariqt
  Cc: Jakub Kicinski, edumazet, pabeni, andrew+netdev, davem,
	donald.hunter, horms, jiri, corbet, skhan, saeedm, leon, mbloch,
	shuah, chuck.lever, matttbe, cjubran, cratiu, dtatulea,
	jacob.e.keller, shshitrit, daniel.zahka, parav, ajayachandra,
	kees, shayd, danielj, moshe, willemb, dw, petrm, sdf, daniel, joe,
	razor, vadim.fedorenko, mst, antonio, allison.henderson,
	minhquangbui99, noren, netdev, linux-kernel, linux-doc,
	linux-rdma, linux-kselftest, gal, jiri
In-Reply-To: <20260326065949.44058-3-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
devlink: Add helpers to lock nested-in instances

Upcoming code will need to obtain a reference to locked nested-in
devlink instances. Add helpers to lock, obtain an already locked
reference and unlock/unref the nested-in instance.

> diff --git a/net/devlink/core.c b/net/devlink/core.c
> index eeb6a71f5f56e..db11248df7120 100644
> --- a/net/devlink/core.c
> +++ b/net/devlink/core.c

[ ... ]

> +/* Returns the nested in devlink object and validates its lock is held. */
> +struct devlink *devlink_nested_in_get_locked(struct devlink_rel *rel)
> +{
> +	struct devlink *devlink;
> +	unsigned long index;
> +
> +	if (!rel)
> +		return NULL;
> +	index = rel->nested_in.devlink_index;
> +	devlink = xa_find(&devlinks, &index, index, DEVLINK_REGISTERED);
> +	if (devlink)
> +		devl_assert_locked(devlink);

Can this dereference a freed pointer if the lock is not actually held?

xa_find() acquires and releases the RCU read lock internally. When it returns,
the devlink pointer is no longer protected by RCU or a reference count.

If a caller invokes this helper without holding the devlink lock (which is the
exact scenario the assertion is designed to detect), another thread could
concurrently unregister and free the devlink.

If the RCU grace period expires before the assertion executes,
devl_assert_locked() would dereference freed memory. Should the lookup and
assertion be enclosed within an explicit rcu_read_lock() block to safely
assert the lock?

> +	return devlink;
> +}
> +
> +void devlink_nested_in_put_unlock(struct devlink_rel *rel)
> +{
> +	struct devlink *devlink = devlink_nested_in_get_locked(rel);

Could this corrupt the mutex state and cause a refcount underflow?

This helper takes a devlink_rel pointer instead of the devlink pointer
acquired by devlink_nested_in_get_lock(), and performs a secondary global
lookup to find the devlink.

If a caller mistakenly calls this in an error cleanup path where they did not
actually acquire the lock, the global xa_find() will still locate the
registered devlink. This would execute devl_unlock() and devlink_put() on a
devlink the current thread does not own.

Would it be safer for unlock/put helpers to take the exact pointer returned by
the lock/get helper to ensure safe resource cleanup?
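
A minimal userspace sketch of that alternative, under loud assumptions:
struct instance, nested_get_lock() and nested_put_unlock() are invented for
this example and only model the lock and the reference as plain fields. The
unlock side acts solely on the handle the lock side returned, so a NULL
result from the lock step turns the unlock step into a no-op even if the
instance becomes visible in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a registered, lockable, refcounted object. */
struct instance {
	bool registered;
	bool locked;
	int refs;
};

/* Returns the instance with a reference taken and its lock held,
 * or NULL when there is nothing (yet) to lock. */
static struct instance *nested_get_lock(struct instance *inst)
{
	if (!inst || !inst->registered)
		return NULL;

	inst->refs++;
	assert(!inst->locked);
	inst->locked = true;
	return inst;
}

/* Takes exactly the pointer nested_get_lock() returned. No second lookup
 * happens here, so state that changed after the lock step cannot cause an
 * unlock or put on an object this caller never acquired. */
static void nested_put_unlock(struct instance *held)
{
	if (!held)
		return;

	assert(held->locked && held->refs > 0);
	held->locked = false;
	held->refs--;
}
```

The caller keeps the returned pointer (possibly NULL) and hands it back on
every exit path, which makes the get/put pairing visible at the call site.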

> +
> +	if (devlink) {
> +		devl_unlock(devlink);
> +		devlink_put(devlink);
> +	}
> +}
-- 
pw-bot: cr

* Re: [PATCH v2] bootconfig: Apply early options from embedded config
From: Masami Hiramatsu @ 2026-03-31  0:00 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Jonathan Corbet, Shuah Khan, linux-kernel, linux-trace-kernel,
	linux-doc, oss, paulmck, rostedt, kernel-team
In-Reply-To: <acpzhCBEPh-tKVqg@gmail.com>

On Mon, 30 Mar 2026 06:15:17 -0700
Breno Leitao <leitao@debian.org> wrote:

> On Fri, Mar 27, 2026 at 10:37:44PM +0900, Masami Hiramatsu wrote:
> > On Fri, 27 Mar 2026 03:06:41 -0700
> > Breno Leitao <leitao@debian.org> wrote:
> 
> > > > To fix this, we need to change setup_arch() for each architecture so
> > > > that it calls this bootconfig_apply_early_params().
> > > 
> > > Could we instead integrate this into parse_early_param() itself? That
> > > approach would avoid the need to modify each architecture individually.
> > 
> > Ah, indeed. 
> 
> I investigated integrating bootconfig into parse_early_param() and hit a
> blocker: xbc_init() and xbc_make_cmdline() depend on memblock_alloc(), but on
> most architectures (x86, arm64, arm, s390, riscv) parse_early_param() is called
> from setup_arch() _before_ memblock is initialized.

Yeah, that's right.

> 
> So, bootconfig will not be available as early as parse_early_param(). 
> 
> An alternative is to replace memblock allocations in lib/bootconfig.c with static
> __initdata buffers, similar to Petr's approach in 2023:
> 
> 	https://lore.kernel.org/all/20231121231342.193646-3-oss@malat.biz/
> 
> But there were concerns about the allocation size:
> 
> 	Petr Malat <oss@malat.biz> wrote: 
> 	> To allow handling of early options, it's necessary to eliminate allocations
> 	> from embedded bootconfig handling
> 
> 	"Hm, my concern is that this can introduce some sort of overhead to parse the bootconfig."
> 

As long as we can handle the early params correctly and this is limited to
the embedded bootconfig, I think it is OK to allocate it statically.
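
Just to illustrate the idea, a rough userspace sketch of a fixed static pool
with a bump allocator is below. This is not the lib/bootconfig.c code:
EARLY_POOL_SIZE, early_pool and early_alloc() are hypothetical names, the
size is arbitrary, and in the kernel the buffer would presumably carry
__initdata so it is discarded after init.

```c
#include <assert.h>
#include <stddef.h>

#define EARLY_POOL_SIZE 4096	/* hypothetical; real sizing is the open question */

/* Fixed pool reserved at build time, so nothing depends on a runtime
 * allocator being initialized. */
static _Alignas(max_align_t) unsigned char early_pool[EARLY_POOL_SIZE];
static size_t early_used;

/* Hand out the next aligned chunk of the pool, or NULL once it is full.
 * align must be a power of two no larger than the pool's own alignment.
 * There is no free(): the whole pool is dropped at once. */
static void *early_alloc(size_t size, size_t align)
{
	size_t start;

	assert(align && (align & (align - 1)) == 0);
	assert(align <= _Alignof(max_align_t));

	start = (early_used + align - 1) & ~(align - 1);
	if (start > EARLY_POOL_SIZE || size > EARLY_POOL_SIZE - start)
		return NULL;

	early_used = start + size;
	return early_pool + start;
}
```

The trade-off is the one raised in the quoted thread: the pool costs its full
size in the image whether or not an embedded bootconfig uses it, so it only
makes sense while the data it has to hold is small and bounded.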

Thank you,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

* Re: [PATCH v11 03/22] drm: Add new general DRM property "color format"
From: Ville Syrjälä @ 2026-03-30 23:56 UTC (permalink / raw)
  To: Nicolas Frattaroli
  Cc: Maxime Ripard, Harry Wentland, Leo Li, Rodrigo Siqueira,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Maarten Lankhorst, Thomas Zimmermann, Andrzej Hajda,
	Neil Armstrong, Robert Foss, Laurent Pinchart, Jonas Karlman,
	Jernej Skrabec, Sandy Huang, Heiko Stübner, Andy Yan,
	Jani Nikula, Rodrigo Vivi, Joonas Lahtinen, Tvrtko Ursulin,
	Dmitry Baryshkov, Sascha Hauer, Rob Herring, Jonathan Corbet,
	Shuah Khan, kernel, amd-gfx, dri-devel, linux-kernel,
	linux-arm-kernel, linux-rockchip, intel-gfx, intel-xe, linux-doc,
	Werner Sembach, Andri Yngvason, Marius Vlad
In-Reply-To: <acclgID7lSVNten2@intel.com>

On Sat, Mar 28, 2026 at 02:49:04AM +0200, Ville Syrjälä wrote:
> On Fri, Mar 27, 2026 at 01:56:06PM +0100, Nicolas Frattaroli wrote:
> > On Thursday, 26 March 2026 18:58:25 Central European Standard Time Ville Syrjälä wrote:
> > > On Thu, Mar 26, 2026 at 06:02:47PM +0100, Maxime Ripard wrote:
> > > > On Wed, Mar 25, 2026 at 08:43:15PM +0200, Ville Syrjälä wrote:
> > > > > On Wed, Mar 25, 2026 at 03:56:58PM +0100, Maxime Ripard wrote:
> > > > > > On Wed, Mar 25, 2026 at 01:03:07PM +0200, Ville Syrjälä wrote:
> > > > > > > On Wed, Mar 25, 2026 at 09:24:27AM +0100, Maxime Ripard wrote:
> > > > > > > > On Tue, Mar 24, 2026 at 09:53:35PM +0200, Ville Syrjälä wrote:
> > > > > > > > > On Tue, Mar 24, 2026 at 08:10:11PM +0100, Nicolas Frattaroli wrote:
> > > > > > > > > > On Tuesday, 24 March 2026 18:00:45 Central European Standard Time Ville Syrjälä wrote:
> > > > > > > > > > > On Tue, Mar 24, 2026 at 05:01:07PM +0100, Nicolas Frattaroli wrote:
> > > > > > > > > > > > +enum drm_connector_color_format {
> > > > > > > > > > > > +	/**
> > > > > > > > > > > > +	 * @DRM_CONNECTOR_COLOR_FORMAT_AUTO: The driver or display protocol
> > > > > > > > > > > > +	 * helpers should pick a suitable color format. All implementations of a
> > > > > > > > > > > > +	 * specific display protocol must behave the same way with "AUTO", but
> > > > > > > > > > > > +	 * different display protocols do not necessarily have the same "AUTO"
> > > > > > > > > > > > +	 * semantics.
> > > > > > > > > > > > +	 *
> > > > > > > > > > > > +	 * For HDMI, "AUTO" picks RGB, but falls back to YCbCr 4:2:0 if the
> > > > > > > > > > > > +	 * bandwidth required for full-scale RGB is not available, or the mode
> > > > > > > > > > > > +	 * is YCbCr 4:2:0-only, as long as the mode and output both support
> > > > > > > > > > > > +	 * YCbCr 4:2:0.
> > > > > > > > > > > > +	 *
> > > > > > > > > > > > +	 * For display protocols other than HDMI, the recursive bridge chain
> > > > > > > > > > > > +	 * format selection picks the first chain of bridge formats that works,
> > > > > > > > > > > > +	 * as has already been the case before the introduction of the "color
> > > > > > > > > > > > +	 * format" property. Non-HDMI bridges should therefore either sort their
> > > > > > > > > > > > +	 * bus output formats by preference, or agree on a unified auto format
> > > > > > > > > > > > +	 * selection logic that's implemented in a common state helper (like
> > > > > > > > > > > > +	 * how HDMI does it).
> > > > > > > > > > > > +	 */
> > > > > > > > > > > > +	DRM_CONNECTOR_COLOR_FORMAT_AUTO = 0,
> > > > > > > > > > > > +
> > > > > > > > > > > > +	/**
> > > > > > > > > > > > +	 * @DRM_CONNECTOR_COLOR_FORMAT_RGB444: RGB output format
> > > > > > > > > > > > +	 */
> > > > > > > > > > > > +	DRM_CONNECTOR_COLOR_FORMAT_RGB444,
> > > > > > > > > > > > +
> > > > > > > > > > > > +	/**
> > > > > > > > > > > > +	 * @DRM_CONNECTOR_COLOR_FORMAT_YCBCR444: YCbCr 4:4:4 output format (ie.
> > > > > > > > > > > > +	 * not subsampled)
> > > > > > > > > > > > +	 */
> > > > > > > > > > > > +	DRM_CONNECTOR_COLOR_FORMAT_YCBCR444,
> > > > > > > > > > > > +
> > > > > > > > > > > > +	/**
> > > > > > > > > > > > +	 * @DRM_CONNECTOR_COLOR_FORMAT_YCBCR422: YCbCr 4:2:2 output format (ie.
> > > > > > > > > > > > +	 * with horizontal subsampling)
> > > > > > > > > > > > +	 */
> > > > > > > > > > > > +	DRM_CONNECTOR_COLOR_FORMAT_YCBCR422,
> > > > > > > > > > > > +
> > > > > > > > > > > > +	/**
> > > > > > > > > > > > +	 * @DRM_CONNECTOR_COLOR_FORMAT_YCBCR420: YCbCr 4:2:0 output format (ie.
> > > > > > > > > > > > +	 * with horizontal and vertical subsampling)
> > > > > > > > > > > > +	 */
> > > > > > > > > > > > +	DRM_CONNECTOR_COLOR_FORMAT_YCBCR420,
> > > > > > > > > > > 
> > > > > > > > > > > Seems like this should document what the quantization range
> > > > > > > > > > > should be for each format.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I don't think so? If you want per-component bit depth values,
> > > > > > > > > > DRM_FORMAT_* defines would be the appropriate values to use. This
> > > > > > > > > > enum is more abstract than that, and is there to communicate
> > > > > > > > > > YUV vs. RGB and chroma subsampling, with bit depth being handled
> > > > > > > > > > by other properties.
> > > > > > > > > > 
> > > > > > > > > > If you mean the factor used for subsampling, then that'd only be
> > > > > > > > > > relevant if YCBCR410 was supported where one chroma plane isn't
> > > > > > > > > > halved but quartered in resolution. I suspect 4:1:0 will never
> > > > > > > > > > be added; no digital display protocol standard supports it to my
> > > > > > > > > > knowledge, and hopefully none ever will.
> > > > > > > > > 
> > > > > > > > > No, I mean the quantization range (16-235 vs. 0-255 etc).
> > > > > > > > > 
> > > > > > > > > The i915 behaviour is that YCbCr is always limited range,
> > > > > > > > > RGB can either be full or limited range depending on the 
> > > > > > > > > "Broadcast RGB" property and other related factors.
> > > > > > > > 
> > > > > > > > So far the HDMI state has both the format and quantization range as
> > > > > > > > different fields. I'm not sure we need to document the range in the
> > > > > > > > format field, maybe only mention it's not part of the format but has a
> > > > > > > > field of its own?
> > > > > > > 
> > > > > > > I think we only have it for RGB (on some drivers only?). For YCbCr
> > > > > > > I think the assumption is limited range everywhere.
> > > > > > > 
> > > > > > > But I'm not really concerned about documenting struct members.
> > > > > > > What I'm talking about is the *uapi* docs. Surely userspace
> > > > > > > will want to know what the new property actually does so the
> > > > > > > uapi needs to be documented properly. And down the line some
> > > > > > > new driver might also implement the wrong behaviour if there
> > > > > > > is no clear specification.
> > > > > > 
> > > > > > Ack
> > > > > > 
> > > > > > > So I'm thinking (or perhaps hoping) the rule might be something like:
> > > > > > > - YCbCr limited range 
> > > > > > > - RGB full range if "Broadcast RGB" property is not present
> > > > > > 
> > > > > > Isn't it much more complicated than that for HDMI though? My
> > > > > > recollection was that any VIC but VIC1 would be limited range, and
> > > > > > anything else full range?
> > > > > 
> > > > > Do we have some driver that implements the CTA-861 CE vs. IT mode
> > > > > logic but doesn't expose the "Broadcast RGB" property? I was hoping
> > > > > those would always go hand in hand now.
> > > > 
> > > > I'm not sure. i915 and the HDMI state helpers handle it properly (I
> > > > think?) but it looks like only vc4 registers the Broadcast RGB property
> > > > and uses the HDMI state helpers.
> > > > 
> > > > And it looks like amdgpu registers Broadcast RGB but doesn't use
> > > > drm_default_rgb_quant_range() which seems suspicious?
> > > 
> > > If they want just manual full vs. limited then they should
> > > limit the property to not expose the "auto" option at all.
> > > 
> > > amdgpu also ties this in with the "colorspace" property, which
> > > originally in i915 only controlled the infoframes/etc. But on
> > > amdgpu it now controls various aspects of output color
> > > transformation. The end result is that the property is a complete
> > > mess with most of the values making no sense. And for whatever
> > > reason everyone involved refused to remove/deprecate the
> > > nonsensical values :/
> > > 
> > > Looks like this series should make sure the documentation for
> > > the "colorspace" property is in sync with the new property
> > > as well. Currently now it's giving conflicting information.
> > > 
> > 
> > I take it the problematic information is in
> > 
> >     * DOC: standard connector properties
> >     *
> >     * Colorspace:
> > 
> > and probably specifically BT2020_YCC's (and BT2020_RGB's?) insistence
> > that they "produce RGB content".
> > 
> > I think we probably just have to change the statement "The variants
> > BT2020_RGB and BT2020_YCC are equivalent and the driver chooses between
> > RGB and YCbCr on its own."
> > 
> > The "on its own" here would get turned into "based on the color format
> > property".
> > 
> > Speaking of i915, that patch is one of the very few (5) patches in
> > this series still lacking a review (hint hint nudge nudge). I'd like
> > to get some more feedback on the remaining patches before I send out
> > another revision, so that it's hopefully not just docs changes (I
> > know better than to think those patches must be perfect and won't
> > need revision.)
> 
> The i915 code around this is already a big mess, and I don't really like
> adding to that mess. So I think we'll need to do some refactoring before
> we add anything there. I already started typing something and so far
> it looks fairly straightforward, so I should have something soon.

OK, posted something
https://lore.kernel.org/intel-gfx/20260330235339.29479-1-ville.syrjala@linux.intel.com/T/#m7c349478ca6c856fbc68d5e2178f1aa31678a05f

Are the wayland/compositor/color management folks on board with
these new properties? I don't think I see the usual suspects on
the cc list.

> 
> While doing that several questions came to my mind though:
> 
> * More interactions with the colorspace property, but I sent
>   a separate mail already about that
> 
> * Which conversion matrix to use, and the answer I suspect
>   should be "ask the colorspace property", as mentioned in the
>   other mail
> 
> * Should we flat out reject color formats (and I suppose also
>   colorspace prop values) the sink doesn't claim to support?
> 
>   If yes, then I think we'll have to forget about adding anything 
>   to i915 MST code. The way the MST stuff works is that if one
>   stream needs a modeset then all the related streams get modeset
>   as well. Thus if the user replaces a monitor getting fed with a
>   YCbCr stream just as another stream is being modeset, then the
>   entire atomic commit could fail due to the YCbCr stream getting
>   rejected.
> 
>   I think eventually we might have to invent some mechanism where
>   all the input into the modeset computation is cached somehow,
>   and said cache updated only on explicit userspace modesets.
>   Either that or we have to come up with a way to skip some of
>   the calculations that depend on external factors. Either way
>   it's going to be a pain.
> 
>   OTOH if we don't mind feeding the sink with stuff it can't
>   understand, then I suppose we might add YCbCr 4:4:4 support
>   for MST. It shouldn't be any different from RGB apart from
>   the RGB->YCbCr conversion, which is handled elsewhere. But
>   YCbCr 4:2:0 is definitely out either way, the MST code has
>   no support for that currently.
> 
> -- 
> Ville Syrjälä
> Intel

-- 
Ville Syrjälä
Intel

* Re: [PATCH v5 3/4] iio: adc: ad4691: add triggered buffer support
From: kernel test robot @ 2026-03-30 23:50 UTC (permalink / raw)
  To: Radu Sabau via B4 Relay, Lars-Peter Clausen, Michael Hennerich,
	Jonathan Cameron, David Lechner, Nuno Sá, Andy Shevchenko,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Uwe Kleine-König, Liam Girdwood, Mark Brown, Linus Walleij,
	Bartosz Golaszewski, Philipp Zabel, Jonathan Corbet, Shuah Khan
  Cc: oe-kbuild-all, linux-iio, devicetree, linux-kernel, linux-pwm,
	linux-gpio, linux-doc, Radu Sabau
In-Reply-To: <20260327-ad4692-multichannel-sar-adc-driver-v5-3-11f789de47b8@analog.com>

Hi Radu,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 11439c4635edd669ae435eec308f4ab8a0804808]

url:    https://github.com/intel-lab-lkp/linux/commits/Radu-Sabau-via-B4-Relay/dt-bindings-iio-adc-add-AD4691-family/20260330-200546
base:   11439c4635edd669ae435eec308f4ab8a0804808
patch link:    https://lore.kernel.org/r/20260327-ad4692-multichannel-sar-adc-driver-v5-3-11f789de47b8%40analog.com
patch subject: [PATCH v5 3/4] iio: adc: ad4691: add triggered buffer support
config: i386-randconfig-r131-20260331 (https://download.01.org/0day-ci/archive/20260331/202603310753.zLWq0JDB-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
sparse: v0.6.5-rc1
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260331/202603310753.zLWq0JDB-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603310753.zLWq0JDB-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/iio/adc/ad4691.c:675:47: sparse: sparse: dereference of noderef expression
>> drivers/iio/adc/ad4691.c:675:47: sparse: sparse: dereference of noderef expression
   drivers/iio/adc/ad4691.c:757:47: sparse: sparse: dereference of noderef expression
   drivers/iio/adc/ad4691.c:757:47: sparse: sparse: dereference of noderef expression
   drivers/iio/adc/ad4691.c:809:46: sparse: sparse: dereference of noderef expression
   drivers/iio/adc/ad4691.c:809:46: sparse: sparse: dereference of noderef expression
   drivers/iio/adc/ad4691.c:815:40: sparse: sparse: dereference of noderef expression
   drivers/iio/adc/ad4691.c:815:40: sparse: sparse: dereference of noderef expression
   drivers/iio/adc/ad4691.c: note: in included file:
   include/linux/bitmap.h:797:55: sparse: sparse: shift too big (32) for type unsigned long
   include/linux/bitmap.h:797:55: sparse: sparse: shift too big (32) for type unsigned long

vim +675 drivers/iio/adc/ad4691.c

   668	
   669	static int ad4691_manual_buffer_preenable(struct iio_dev *indio_dev)
   670	{
   671		struct ad4691_state *st = iio_priv(indio_dev);
   672		struct device *dev = regmap_get_device(st->regmap);
   673		struct spi_device *spi = to_spi_device(dev);
   674		unsigned int n_active = bitmap_weight(indio_dev->active_scan_mask,
 > 675						      indio_dev->masklength);
   676		unsigned int n_xfers = n_active + 1;
   677		unsigned int k, i;
   678		int ret;
   679	
   680		st->scan_xfers = kcalloc(n_xfers, sizeof(*st->scan_xfers), GFP_KERNEL);
   681		if (!st->scan_xfers)
   682			return -ENOMEM;
   683	
   684		st->scan_tx = kcalloc(n_xfers, sizeof(*st->scan_tx), GFP_KERNEL);
   685		if (!st->scan_tx) {
   686			kfree(st->scan_xfers);
   687			return -ENOMEM;
   688		}
   689	
   690		st->scan_rx = kcalloc(n_xfers, sizeof(*st->scan_rx), GFP_KERNEL);
   691		if (!st->scan_rx) {
   692			kfree(st->scan_tx);
   693			kfree(st->scan_xfers);
   694			return -ENOMEM;
   695		}
   696	
   697		spi_message_init(&st->scan_msg);
   698	
   699		k = 0;
   700		iio_for_each_active_channel(indio_dev, i) {
   701			st->scan_tx[k] = cpu_to_be16(AD4691_ADC_CHAN(i));
   702			st->scan_xfers[k].tx_buf = &st->scan_tx[k];
   703			st->scan_xfers[k].rx_buf = &st->scan_rx[k];
   704			st->scan_xfers[k].len = sizeof(__be16);
   705			st->scan_xfers[k].cs_change = 1;
   706			spi_message_add_tail(&st->scan_xfers[k], &st->scan_msg);
   707			k++;
   708		}
   709	
   710		/* Final NOOP transfer to retrieve last channel's result. */
   711		st->scan_tx[k] = cpu_to_be16(AD4691_NOOP);
   712		st->scan_xfers[k].tx_buf = &st->scan_tx[k];
   713		st->scan_xfers[k].rx_buf = &st->scan_rx[k];
   714		st->scan_xfers[k].len = sizeof(__be16);
   715		spi_message_add_tail(&st->scan_xfers[k], &st->scan_msg);
   716	
   717		st->scan_msg.spi = spi;
   718	
   719		ret = spi_optimize_message(spi, &st->scan_msg);
   720		if (ret) {
   721			ad4691_free_scan_bufs(st);
   722			return ret;
   723		}
   724	
   725		ret = ad4691_enter_conversion_mode(st);
   726		if (ret) {
   727			spi_unoptimize_message(&st->scan_msg);
   728			ad4691_free_scan_bufs(st);
   729			return ret;
   730		}
   731	
   732		return 0;
   733	}
   734	
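
A note on the flagged line: sparse reports "dereference of noderef
expression" when a struct member annotated as private is read directly.
The warning at line 675 points at indio_dev->masklength, and the IIO core
provides, as far as I know, an accessor iio_get_masklength(indio_dev) for
this purpose. The reports at lines 757, 809 and 815 are probably the same
pattern, though that code is not shown here. Below is a minimal userspace
sketch of the accessor idea, not the driver fix itself. Every name in it
(MOCK_PRIVATE, MOCK_ACCESS_PRIVATE, struct mock_iio_dev,
mock_get_masklength, mock_bitmap_weight, mock_count_active) is a
hypothetical stand-in, and the macros are simplified: in the kernel the
annotation only has an effect when building with sparse.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace stand-ins for the kernel's private-member
 * annotation and its access macro. Both are no-ops here. */
#define MOCK_PRIVATE
#define MOCK_ACCESS_PRIVATE(p, member) ((p)->member)

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, cut-down model of an IIO device: only the two fields
 * used by the n_active computation in the excerpt above. */
struct mock_iio_dev {
	const unsigned long *active_scan_mask;
	unsigned int MOCK_PRIVATE masklength;
};

/* Accessor in the spirit of iio_get_masklength(): callers never touch
 * the annotated member directly. */
static inline unsigned int mock_get_masklength(const struct mock_iio_dev *d)
{
	return MOCK_ACCESS_PRIVATE(d, masklength);
}

/* Simplified bitmap weight: count set bits among the first nbits. */
static unsigned int mock_bitmap_weight(const unsigned long *map,
				       unsigned int nbits)
{
	unsigned int i, w = 0;

	for (i = 0; i < nbits; i++)
		if (map[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG)))
			w++;
	return w;
}

/* Mirrors the n_active computation, but reads the mask length through
 * the accessor instead of dereferencing the member. */
static unsigned int mock_count_active(const struct mock_iio_dev *d)
{
	assert(d && d->active_scan_mask);
	return mock_bitmap_weight(d->active_scan_mask,
				  mock_get_masklength(d));
}
```

In the driver, the equivalent change would presumably be to pass
iio_get_masklength(indio_dev) as the second argument of bitmap_weight();
please check this against the IIO headers in the base tree before relying
on it.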

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
