Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] bpf: return EOPNOTSUPP when JIT is needed and not possible
From: Daniel Borkmann @ 2021-12-09 23:03 UTC (permalink / raw)
  To: Ido Schimmel, John Fastabend
  Cc: Thadeu Lima de Souza Cascardo, bpf, netdev, ast, linux-kernel,
	kuba
In-Reply-To: <YbJZoK+qBEiLAxxM@shredder>

On 12/9/21 8:31 PM, Ido Schimmel wrote:
> On Thu, Dec 09, 2021 at 11:05:18AM -0800, John Fastabend wrote:
>> Thadeu Lima de Souza Cascardo wrote:
>>> When a CBPF program is JITed and CONFIG_BPF_JIT_ALWAYS_ON is enabled, and
>>> the JIT fails, it would return ENOTSUPP, which is not a valid userspace
>>> error code.  Instead, EOPNOTSUPP should be returned.
>>>
>>> Fixes: 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
>>> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
>>> ---
>>>   kernel/bpf/core.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> index de3e5bc6781f..5c89bae0d6f9 100644
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>> @@ -1931,7 +1931,7 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
>>>   		fp = bpf_int_jit_compile(fp);
>>>   		bpf_prog_jit_attempt_done(fp);
>>>   		if (!fp->jited && jit_needed) {
>>> -			*err = -ENOTSUPP;
>>> +			*err = -EOPNOTSUPP;
>>>   			return fp;
>>>   		}
>>>   	} else {
>>
>> It seems BPF subsys returns ENOTSUPP in multiple places. This fixes one
>> paticular case and is user facing. Not sure we want to one-off fix them
>> here creating user facing changes over multiple kernel versions. On the
>> fence with this one curious to see what others think. Haven't apps
>> already adapted to the current convention or they don't care?
> 
> Similar issue was discussed in the past. See:
> https://lore.kernel.org/netdev/20191204.125135.750458923752225025.davem@davemloft.net/

With regards to ENOTSUPP exposure, if the consensus is that we should fix all
occurences over to EOPNOTSUPP even if they've been exposed for quite some time
(Jakub?), we could give this patch a try maybe via bpf-next and see if anyone
complains.

Thadeu, I think you also need to fix up BPF selftests as test_verifier, to mention
one example (there are also bunch of others under tools/testing/selftests/), is
checking for ENOTSUPP specifically..

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: mv88e6xxx: Add tx fwd offload PVT on intermediate devices
From: Tobias Waldekranz @ 2021-12-09 22:52 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: davem, kuba, andrew, vivien.didelot, f.fainelli, netdev
In-Reply-To: <20211209224146.gfldu66kqmkgcg54@skbuf>

On Fri, Dec 10, 2021 at 00:41, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Thu, Dec 09, 2021 at 11:24:24PM +0100, Tobias Waldekranz wrote:
>> In a typical mv88e6xxx switch tree like this:
>> 
>>   CPU
>>    |    .----.
>> .--0--. | .--0--.
>> | sw0 | | | sw1 |
>> '-1-2-' | '-1-2-'
>>     '---'
>> 
>> If sw1p{1,2} are added to a bridge that sw0p1 is not a part of, sw0
>> still needs to add a crosschip PVT entry for the virtual DSA device
>> assigned to represent the bridge.
>> 
>> Fixes: ce5df6894a57 ("net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT")
>> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>> ---
>
> This makes sense. Sorry, my Turris MOX has 3 cascaded switches but I
> only test it using a single bridge that spans all of the ports.
> So this is why in my case the DSA and CPU ports could receive packets
> using the virtual bridge device, because mv88e6xxx_port_vlan() had been
> called on them through the direct mv88e6xxx_port_bridge_join(), not
> through mv88e6xxx_crosschip_bridge_join().

Yeah this is by far the most common setup, that's why I missed it as
well.

> I guess you have a use case
> where some leaf ports are in a bridge but some upstream ports aren't,
> and this is how you caught this?

I've been doing some work on running kselftest-like tests on a multichip
mv88e6xxx system. In that process, I discovered this issue along with a
whole slew of other nasty things related to isolation of standalone
ports.

I am finalizing a series to tackle that which (while not exactly
elegant) should get the job done. Stay tuned :)

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: mv88e6xxx: Add tx fwd offload PVT on intermediate devices
From: Vladimir Oltean @ 2021-12-09 22:41 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: davem, kuba, andrew, vivien.didelot, f.fainelli, netdev
In-Reply-To: <20211209222424.124791-1-tobias@waldekranz.com>

On Thu, Dec 09, 2021 at 11:24:24PM +0100, Tobias Waldekranz wrote:
> In a typical mv88e6xxx switch tree like this:
> 
>   CPU
>    |    .----.
> .--0--. | .--0--.
> | sw0 | | | sw1 |
> '-1-2-' | '-1-2-'
>     '---'
> 
> If sw1p{1,2} are added to a bridge that sw0p1 is not a part of, sw0
> still needs to add a crosschip PVT entry for the virtual DSA device
> assigned to represent the bridge.
> 
> Fixes: ce5df6894a57 ("net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT")
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---

This makes sense. Sorry, my Turris MOX has 3 cascaded switches but I
only test it using a single bridge that spans all of the ports.
So this is why in my case the DSA and CPU ports could receive packets
using the virtual bridge device, because mv88e6xxx_port_vlan() had been
called on them through the direct mv88e6xxx_port_bridge_join(), not
through mv88e6xxx_crosschip_bridge_join(). I guess you have a use case
where some leaf ports are in a bridge but some upstream ports aren't,
and this is how you caught this?

Thanks!

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>

> 
> Though this is a bugfix, it still targets net-next as it depends on
> the recent work done by Vladimir here:
> 
> https://lore.kernel.org/netdev/20211206165758.1553882-1-vladimir.oltean@nxp.com/
> 
>  drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 7fadbf987b23..85f5a35340d7 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -2522,6 +2522,7 @@ static int mv88e6xxx_crosschip_bridge_join(struct dsa_switch *ds,
>  
>  	mv88e6xxx_reg_lock(chip);
>  	err = mv88e6xxx_pvt_map(chip, sw_index, port);
> +	err = err ? : mv88e6xxx_map_virtual_bridge_to_pvt(ds, bridge.num);
>  	mv88e6xxx_reg_unlock(chip);
>  
>  	return err;
> @@ -2537,7 +2538,8 @@ static void mv88e6xxx_crosschip_bridge_leave(struct dsa_switch *ds,
>  		return;
>  
>  	mv88e6xxx_reg_lock(chip);
> -	if (mv88e6xxx_pvt_map(chip, sw_index, port))
> +	if (mv88e6xxx_pvt_map(chip, sw_index, port) ||
> +	    mv88e6xxx_map_virtual_bridge_to_pvt(ds, bridge.num))
>  		dev_err(ds->dev, "failed to remap cross-chip Port VLAN\n");
>  	mv88e6xxx_reg_unlock(chip);
>  }
> -- 
> 2.25.1
> 


^ permalink raw reply

* Re: [PATCH net-next v3] net: phylink: Add helpers for c22 registers without MDIO
From: Sean Anderson @ 2021-12-09 22:26 UTC (permalink / raw)
  To: netdev, Russell King
  Cc: linux-kernel, Horatiu Vultur, Jakub Kicinski, Andrew Lunn,
	David S . Miller, Heiner Kallweit, Russell King
In-Reply-To: <20211119155809.2707305-1-sean.anderson@seco.com>

ping?

On 11/19/21 10:58 AM, Sean Anderson wrote:
> Some devices expose memory-mapped c22-compliant PHYs. Because these
> devices do not have an MDIO bus, we cannot use the existing helpers.
> Refactor the existing helpers to allow supplying the values for c22
> registers directly, instead of using MDIO to access them. Only get_state
> and set_advertisement are converted, since they contain the most complex
> logic. Because set_advertisement is never actually used outside
> phylink_mii_c22_pcs_config, move the MDIO-writing part into that
> function. Because some modes do not need the advertisement register set
> at all, we use -EINVAL for this purpose.
> 
> Additionally, a new function phylink_pcs_enable_an is provided to
> determine whether to enable autonegotiation.
> 
> Signed-off-by: Sean Anderson <sean.anderson@seco.com>
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> ---
> This series was originally submitted as [1]. Although does it not
> include its intended user (macb), I have submitted it separately at the
> behest of Russell.
> 
> [1] https://lore.kernel.org/netdev/YVtypfZJfivfDnu7@lunn.ch/T/#m50877e4daf344ac0b5efced38c79246ad2b9cb6e
> 
> Changes in v3:
> - Change adv type from u16 to int
> 
> Changes in v2:
> - Add phylink_pcs_enable_an
> - Also remove set_advertisement
> - Use mdiobus_modify_changed
> 
>   drivers/net/phy/phylink.c | 120 +++++++++++++++++++++-----------------
>   include/linux/phylink.h   |   7 ++-
>   2 files changed, 72 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 33462fdc7add..428f9dc02d0e 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -2813,6 +2813,52 @@ void phylink_decode_usxgmii_word(struct phylink_link_state *state,
>   }
>   EXPORT_SYMBOL_GPL(phylink_decode_usxgmii_word);
>   
> +/**
> + * phylink_mii_c22_pcs_decode_state() - Decode MAC PCS state from MII registers
> + * @state: a pointer to a &struct phylink_link_state.
> + * @bmsr: The value of the %MII_BMSR register
> + * @lpa: The value of the %MII_LPA register
> + *
> + * Helper for MAC PCS supporting the 802.3 clause 22 register set for
> + * clause 37 negotiation and/or SGMII control.
> + *
> + * Parse the Clause 37 or Cisco SGMII link partner negotiation word into
> + * the phylink @state structure. This is suitable to be used for implementing
> + * the mac_pcs_get_state() member of the struct phylink_mac_ops structure if
> + * accessing @bmsr and @lpa cannot be done with MDIO directly.
> + */
> +void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state,
> +				      u16 bmsr, u16 lpa)
> +{
> +	state->link = !!(bmsr & BMSR_LSTATUS);
> +	state->an_complete = !!(bmsr & BMSR_ANEGCOMPLETE);
> +	/* If there is no link or autonegotiation is disabled, the LP advertisement
> +	 * data is not meaningful, so don't go any further.
> +	 */
> +	if (!state->link || !state->an_enabled)
> +		return;
> +
> +	switch (state->interface) {
> +	case PHY_INTERFACE_MODE_1000BASEX:
> +		phylink_decode_c37_word(state, lpa, SPEED_1000);
> +		break;
> +
> +	case PHY_INTERFACE_MODE_2500BASEX:
> +		phylink_decode_c37_word(state, lpa, SPEED_2500);
> +		break;
> +
> +	case PHY_INTERFACE_MODE_SGMII:
> +	case PHY_INTERFACE_MODE_QSGMII:
> +		phylink_decode_sgmii_word(state, lpa);
> +		break;
> +
> +	default:
> +		state->link = false;
> +		break;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(phylink_mii_c22_pcs_decode_state);
> +
>   /**
>    * phylink_mii_c22_pcs_get_state() - read the MAC PCS state
>    * @pcs: a pointer to a &struct mdio_device.
> @@ -2839,55 +2885,26 @@ void phylink_mii_c22_pcs_get_state(struct mdio_device *pcs,
>   		return;
>   	}
>   
> -	state->link = !!(bmsr & BMSR_LSTATUS);
> -	state->an_complete = !!(bmsr & BMSR_ANEGCOMPLETE);
> -	/* If there is no link or autonegotiation is disabled, the LP advertisement
> -	 * data is not meaningful, so don't go any further.
> -	 */
> -	if (!state->link || !state->an_enabled)
> -		return;
> -
> -	switch (state->interface) {
> -	case PHY_INTERFACE_MODE_1000BASEX:
> -		phylink_decode_c37_word(state, lpa, SPEED_1000);
> -		break;
> -
> -	case PHY_INTERFACE_MODE_2500BASEX:
> -		phylink_decode_c37_word(state, lpa, SPEED_2500);
> -		break;
> -
> -	case PHY_INTERFACE_MODE_SGMII:
> -	case PHY_INTERFACE_MODE_QSGMII:
> -		phylink_decode_sgmii_word(state, lpa);
> -		break;
> -
> -	default:
> -		state->link = false;
> -		break;
> -	}
> +	phylink_mii_c22_pcs_decode_state(state, bmsr, lpa);
>   }
>   EXPORT_SYMBOL_GPL(phylink_mii_c22_pcs_get_state);
>   
>   /**
> - * phylink_mii_c22_pcs_set_advertisement() - configure the clause 37 PCS
> + * phylink_mii_c22_pcs_encode_advertisement() - configure the clause 37 PCS
>    *	advertisement
> - * @pcs: a pointer to a &struct mdio_device.
>    * @interface: the PHY interface mode being configured
>    * @advertising: the ethtool advertisement mask
>    *
>    * Helper for MAC PCS supporting the 802.3 clause 22 register set for
>    * clause 37 negotiation and/or SGMII control.
>    *
> - * Configure the clause 37 PCS advertisement as specified by @state. This
> - * does not trigger a renegotiation; phylink will do that via the
> - * mac_an_restart() method of the struct phylink_mac_ops structure.
> + * Encode the clause 37 PCS advertisement as specified by @interface and
> + * @advertising.
>    *
> - * Returns negative error code on failure to configure the advertisement,
> - * zero if no change has been made, or one if the advertisement has changed.
> + * Return: The new value for @adv, or ``-EINVAL`` if it should not be changed.
>    */
> -int phylink_mii_c22_pcs_set_advertisement(struct mdio_device *pcs,
> -					  phy_interface_t interface,
> -					  const unsigned long *advertising)
> +int phylink_mii_c22_pcs_encode_advertisement(phy_interface_t interface,
> +					     const unsigned long *advertising)
>   {
>   	u16 adv;
>   
> @@ -2901,18 +2918,15 @@ int phylink_mii_c22_pcs_set_advertisement(struct mdio_device *pcs,
>   		if (linkmode_test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
>   				      advertising))
>   			adv |= ADVERTISE_1000XPSE_ASYM;
> -
> -		return mdiodev_modify_changed(pcs, MII_ADVERTISE, 0xffff, adv);
> -
> +		return adv;
>   	case PHY_INTERFACE_MODE_SGMII:
> -		return mdiodev_modify_changed(pcs, MII_ADVERTISE, 0xffff, 0x0001);
> -
> +		return 0x0001;
>   	default:
>   		/* Nothing to do for other modes */
> -		return 0;
> +		return -EINVAL;
>   	}
>   }
> -EXPORT_SYMBOL_GPL(phylink_mii_c22_pcs_set_advertisement);
> +EXPORT_SYMBOL_GPL(phylink_mii_c22_pcs_encode_advertisement);
>   
>   /**
>    * phylink_mii_c22_pcs_config() - configure clause 22 PCS
> @@ -2930,16 +2944,18 @@ int phylink_mii_c22_pcs_config(struct mdio_device *pcs, unsigned int mode,
>   			       phy_interface_t interface,
>   			       const unsigned long *advertising)
>   {
> -	bool changed;
> +	bool changed = 0;
>   	u16 bmcr;
> -	int ret;
> +	int ret, adv;
>   
> -	ret = phylink_mii_c22_pcs_set_advertisement(pcs, interface,
> -						    advertising);
> -	if (ret < 0)
> -		return ret;
> -
> -	changed = ret > 0;
> +	adv = phylink_mii_c22_pcs_encode_advertisement(interface, advertising);
> +	if (adv >= 0) {
> +		ret = mdiobus_modify_changed(pcs->bus, pcs->addr,
> +					     MII_ADVERTISE, 0xffff, adv);
> +		if (ret < 0)
> +			return ret;
> +		changed = ret;
> +	}
>   
>   	/* Ensure ISOLATE bit is disabled */
>   	if (mode == MLO_AN_INBAND &&
> @@ -2952,7 +2968,7 @@ int phylink_mii_c22_pcs_config(struct mdio_device *pcs, unsigned int mode,
>   	if (ret < 0)
>   		return ret;
>   
> -	return changed ? 1 : 0;
> +	return changed;
>   }
>   EXPORT_SYMBOL_GPL(phylink_mii_c22_pcs_config);
>   
> diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> index 3563820a1765..01224235df0f 100644
> --- a/include/linux/phylink.h
> +++ b/include/linux/phylink.h
> @@ -527,11 +527,12 @@ void phylink_set_port_modes(unsigned long *bits);
>   void phylink_set_10g_modes(unsigned long *mask);
>   void phylink_helper_basex_speed(struct phylink_link_state *state);
>   
> +void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state,
> +				      u16 bmsr, u16 lpa);
>   void phylink_mii_c22_pcs_get_state(struct mdio_device *pcs,
>   				   struct phylink_link_state *state);
> -int phylink_mii_c22_pcs_set_advertisement(struct mdio_device *pcs,
> -					  phy_interface_t interface,
> -					  const unsigned long *advertising);
> +int phylink_mii_c22_pcs_encode_advertisement(phy_interface_t interface,
> +					     const unsigned long *advertising);
>   int phylink_mii_c22_pcs_config(struct mdio_device *pcs, unsigned int mode,
>   			       phy_interface_t interface,
>   			       const unsigned long *advertising);
> 

^ permalink raw reply

* [PATCH net-next] net: dsa: mv88e6xxx: Add tx fwd offload PVT on intermediate devices
From: Tobias Waldekranz @ 2021-12-09 22:24 UTC (permalink / raw)
  To: davem, kuba; +Cc: andrew, vivien.didelot, f.fainelli, olteanv, netdev

In a typical mv88e6xxx switch tree like this:

  CPU
   |    .----.
.--0--. | .--0--.
| sw0 | | | sw1 |
'-1-2-' | '-1-2-'
    '---'

If sw1p{1,2} are added to a bridge that sw0p1 is not a part of, sw0
still needs to add a crosschip PVT entry for the virtual DSA device
assigned to represent the bridge.

Fixes: ce5df6894a57 ("net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT")
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---

Though this is a bugfix, it still targets net-next as it depends on
the recent work done by Vladimir here:

https://lore.kernel.org/netdev/20211206165758.1553882-1-vladimir.oltean@nxp.com/

 drivers/net/dsa/mv88e6xxx/chip.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 7fadbf987b23..85f5a35340d7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2522,6 +2522,7 @@ static int mv88e6xxx_crosschip_bridge_join(struct dsa_switch *ds,
 
 	mv88e6xxx_reg_lock(chip);
 	err = mv88e6xxx_pvt_map(chip, sw_index, port);
+	err = err ? : mv88e6xxx_map_virtual_bridge_to_pvt(ds, bridge.num);
 	mv88e6xxx_reg_unlock(chip);
 
 	return err;
@@ -2537,7 +2538,8 @@ static void mv88e6xxx_crosschip_bridge_leave(struct dsa_switch *ds,
 		return;
 
 	mv88e6xxx_reg_lock(chip);
-	if (mv88e6xxx_pvt_map(chip, sw_index, port))
+	if (mv88e6xxx_pvt_map(chip, sw_index, port) ||
+	    mv88e6xxx_map_virtual_bridge_to_pvt(ds, bridge.num))
 		dev_err(ds->dev, "failed to remap cross-chip Port VLAN\n");
 	mv88e6xxx_reg_unlock(chip);
 }
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCHv2 net-next 1/2] net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX
From: Richard Cochran @ 2021-12-09 21:38 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S . Miller, Jakub Kicinski, Heiner Kallweit
In-Reply-To: <20211209213347.GE21819@hoboy.vegasvil.org>

On Thu, Dec 09, 2021 at 01:33:47PM -0800, Richard Cochran wrote:
> On Thu, Dec 09, 2021 at 06:24:48PM +0800, Hangbin Liu wrote:
> 
> > +/* possible values for hwtstamp_config->flags */
> > +enum hwtstamp_flags {
> > +	/*
> > +	 * With this flag, the user could get bond active interface's
> > +	 * PHC index. Note this PHC index is not stable as when there
> > +	 * is a failover, the bond active interface will be changed, so
> > +	 * will be the PHC index.
> > +	 */
> > +	HWTSTAMP_FLAG_BONDED_PHC_INDEX = (1<<0),
> > +
> > +	/* add new constants above here */
> > +	__HWTSTAMP_FLAGS_CNT
> > +};
> 
> I think this shouldn't be an enumerated type, but rather simply a bit
> field of independent options.

Ok, it can be an enum (to be like the other fields in this header) but
still the bits need to be independent of each other.

IOW, you should drop __HWTSTAMP_FLAGS_CNT and instead use a mask of
valid bits.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCHv2 net-next 1/2] net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX
From: Richard Cochran @ 2021-12-09 21:33 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S . Miller, Jakub Kicinski, Heiner Kallweit
In-Reply-To: <20211209102449.2000401-2-liuhangbin@gmail.com>

On Thu, Dec 09, 2021 at 06:24:48PM +0800, Hangbin Liu wrote:

> +/* possible values for hwtstamp_config->flags */
> +enum hwtstamp_flags {
> +	/*
> +	 * With this flag, the user could get bond active interface's
> +	 * PHC index. Note this PHC index is not stable as when there
> +	 * is a failover, the bond active interface will be changed, so
> +	 * will be the PHC index.
> +	 */
> +	HWTSTAMP_FLAG_BONDED_PHC_INDEX = (1<<0),
> +
> +	/* add new constants above here */
> +	__HWTSTAMP_FLAGS_CNT
> +};

I think this shouldn't be an enumerated type, but rather simply a bit
field of independent options.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH net-next 1/2] net_tstamp: add new flag HWTSTAMP_FLAGS_UNSTABLE_PHC
From: Richard Cochran @ 2021-12-09 21:30 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S . Miller, Jakub Kicinski, Heiner Kallweit
In-Reply-To: <20211209212258.GB21819@hoboy.vegasvil.org>

On Thu, Dec 09, 2021 at 01:22:58PM -0800, Richard Cochran wrote:
> On Thu, Dec 09, 2021 at 12:31:30PM +0800, Hangbin Liu wrote:
> > On Wed, Dec 08, 2021 at 07:20:22AM -0800, Richard Cochran wrote:
> > > I guess that the original intent of hwtstamp_config.flags was for user
> > > space to SET flags that it wanted. 
> > > Now this has become a place for drivers to return values back.
> > 
> > I think it's a flag that when uses want phc index of bond.
> > There is no affect for other drivers. It only affect bond interfaces.
> > When this flag set, it means users want to get the info from bond.
> > 
> > Do I missed something?
> 
> No, I simply mean that the input/output direction of the bit in the
> flags should be clear.
> 
> - User space will not set this bit, only read it.
> - Drivers may set this bit, but if user sets it, it is an error.

Oh, I am confused.  Your patch does this:

- if user sets bit, then return bonded index
- otherwise, return EOPNOTSUPP

That is fine with me.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH] skbuff: Extract list pointers to silence compiler warnings
From: patchwork-bot+netdevbpf @ 2021-12-09 21:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: kuba, davem, jonathan.lemon, edumazet, elver, alobakin, pabeni,
	cong.wang, talalahmad, haokexin, netdev, linux-kernel,
	linux-hardening
In-Reply-To: <20211207062758.2324338-1-keescook@chromium.org>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon,  6 Dec 2021 22:27:58 -0800 you wrote:
> Under both -Warray-bounds and the object_size sanitizer, the compiler is
> upset about accessing prev/next of sk_buff when the object it thinks it
> is coming from is sk_buff_head. The warning is a false positive due to
> the compiler taking a conservative approach, opting to warn at casting
> time rather than access time.
> 
> However, in support of enabling -Warray-bounds globally (which has
> found many real bugs), arrange things for sk_buff so that the compiler
> can unambiguously see that there is no intention to access anything
> except prev/next.  Introduce and cast to a separate struct sk_buff_list,
> which contains _only_ the first two fields, silencing the warnings:
> 
> [...]

Here is the summary with links:
  - skbuff: Extract list pointers to silence compiler warnings
    https://git.kernel.org/netdev/net-next/c/1a2fb220edca

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCHv2 net-next 1/2] net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEX
From: Richard Cochran @ 2021-12-09 21:27 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S . Miller, Jakub Kicinski, Heiner Kallweit
In-Reply-To: <20211209102449.2000401-2-liuhangbin@gmail.com>

On Thu, Dec 09, 2021 at 06:24:48PM +0800, Hangbin Liu wrote:
> diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
> index 1d309a666932..10ac5457dcbc 100644
> --- a/net/core/dev_ioctl.c
> +++ b/net/core/dev_ioctl.c
> @@ -186,18 +186,27 @@ static int net_hwtstamp_validate(struct ifreq *ifr)
>  	struct hwtstamp_config cfg;
>  	enum hwtstamp_tx_types tx_type;
>  	enum hwtstamp_rx_filters rx_filter;
> -	int tx_type_valid = 0;
> +	enum hwtstamp_flags flag;
>  	int rx_filter_valid = 0;
> +	int tx_type_valid = 0;
> +	int flag_valid = 0;
>  
>  	if (copy_from_user(&cfg, ifr->ifr_data, sizeof(cfg)))
>  		return -EFAULT;
>  
> -	if (cfg.flags) /* reserved for future extensions */
> -		return -EINVAL;
> -
> +	flag = cfg.flags;
>  	tx_type = cfg.tx_type;
>  	rx_filter = cfg.rx_filter;
>  
> +	switch (flag) {
> +	case HWTSTAMP_FLAG_BONDED_PHC_INDEX:
> +		flag_valid = 1;

This isn't correct.  You need to use a bitwise test.

Thanks,
Richard


> +		break;
> +	case __HWTSTAMP_FLAGS_CNT:
> +		/* not a real value */
> +		break;
> +	}
> +
>  	switch (tx_type) {
>  	case HWTSTAMP_TX_OFF:
>  	case HWTSTAMP_TX_ON:
> @@ -234,7 +243,7 @@ static int net_hwtstamp_validate(struct ifreq *ifr)
>  		break;
>  	}
>  
> -	if (!tx_type_valid || !rx_filter_valid)
> +	if (!flag_valid || !tx_type_valid || !rx_filter_valid)
>  		return -ERANGE;
>  
>  	return 0;
> -- 
> 2.31.1
> 

^ permalink raw reply

* Re: [PATCH net-next 1/2] net_tstamp: add new flag HWTSTAMP_FLAGS_UNSTABLE_PHC
From: Richard Cochran @ 2021-12-09 21:22 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S . Miller, Jakub Kicinski, Heiner Kallweit
In-Reply-To: <YbGGosXXCvBAJEx4@Laptop-X1>

On Thu, Dec 09, 2021 at 12:31:30PM +0800, Hangbin Liu wrote:
> On Wed, Dec 08, 2021 at 07:20:22AM -0800, Richard Cochran wrote:
> > I guess that the original intent of hwtstamp_config.flags was for user
> > space to SET flags that it wanted. 
> > Now this has become a place for drivers to return values back.
> 
> I think it's a flag that when uses want phc index of bond.
> There is no affect for other drivers. It only affect bond interfaces.
> When this flag set, it means users want to get the info from bond.
> 
> Do I missed something?

No, I simply mean that the input/output direction of the bit in the
flags should be clear.

- User space will not set this bit, only read it.
- Drivers may set this bit, but if user sets it, it is an error.

Thanks,
Richard

^ permalink raw reply

* RE: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Haiyang Zhang @ 2021-12-09 20:40 UTC (permalink / raw)
  To: Michael Kelley (LINUX), Tianyu Lan, KY Srinivasan,
	Stephen Hemminger, wei.liu@kernel.org, Dexuan Cui,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	davem@davemloft.net, kuba@kernel.org, jejb@linux.ibm.com,
	martin.petersen@oracle.com, arnd@arndb.de, hch@infradead.org,
	m.szyprowski@samsung.com, robin.murphy@arm.com, Tianyu Lan,
	thomas.lendacky@amd.com
  Cc: iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, netdev@vger.kernel.org, vkuznets,
	brijesh.singh@amd.com, konrad.wilk@oracle.com, hch@lst.de,
	joro@8bytes.org, parri.andrea@gmail.com, dave.hansen@intel.com
In-Reply-To: <MWHPR21MB1593AF3BB6CBCA14B3805D35D7709@MWHPR21MB1593.namprd21.prod.outlook.com>



> -----Original Message-----
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Thursday, December 9, 2021 2:54 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>; Tianyu Lan <ltykernel@gmail.com>; KY
> Srinivasan <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de; dave.hansen@linux.intel.com; x86@kernel.org; hpa@zytor.com;
> davem@davemloft.net; kuba@kernel.org; jejb@linux.ibm.com; martin.petersen@oracle.com;
> arnd@arndb.de; hch@infradead.org; m.szyprowski@samsung.com; robin.murphy@arm.com; Tianyu
> Lan <Tianyu.Lan@microsoft.com>; thomas.lendacky@amd.com
> Cc: iommu@lists.linux-foundation.org; linux-arch@vger.kernel.org; linux-
> hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; linux-scsi@vger.kernel.org;
> netdev@vger.kernel.org; vkuznets <vkuznets@redhat.com>; brijesh.singh@amd.com;
> konrad.wilk@oracle.com; hch@lst.de; joro@8bytes.org; parri.andrea@gmail.com;
> dave.hansen@intel.com
> Subject: RE: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
> 
> From: Haiyang Zhang <haiyangz@microsoft.com> Sent: Wednesday, December 8, 2021 12:14 PM
> > > From: Tianyu Lan <ltykernel@gmail.com>
> > > Sent: Tuesday, December 7, 2021 2:56 AM
> 
> [snip]
> 
> > >  static inline int netvsc_send_pkt(
> > >  	struct hv_device *device,
> > >  	struct hv_netvsc_packet *packet,
> > > @@ -986,14 +1105,24 @@ static inline int netvsc_send_pkt(
> > >
> > >  	trace_nvsp_send_pkt(ndev, out_channel, rpkt);
> > >
> > > +	packet->dma_range = NULL;
> > >  	if (packet->page_buf_cnt) {
> > >  		if (packet->cp_partial)
> > >  			pb += packet->rmsg_pgcnt;
> > >
> > > +		ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
> > > +		if (ret) {
> > > +			ret = -EAGAIN;
> > > +			goto exit;
> > > +		}
> >
> > Returning EAGAIN will let the upper network layer busy retry,
> > which may make things worse.
> > I suggest to return ENOSPC here like another place in this
> > function, which will just drop the packet, and let the network
> > protocol/app layer decide how to recover.
> >
> > Thanks,
> > - Haiyang
> 
> I made the original suggestion to return -EAGAIN here.   A
> DMA mapping failure should occur only if swiotlb bounce
> buffer space is unavailable, which is a transient condition.
> The existing code already stops the queue and returns
> -EAGAIN when the ring buffer is full, which is also a transient
> condition.  My sense is that the two conditions should be
> handled the same way.  Or is there a reason why a ring
> buffer full condition should stop the queue and retry, while
> a mapping failure should drop the packet?

The netvsc_dma_map() fails in these two places. The dma_map_single() 
is doing the maping with swiotlb bounce buffer, correct? And it will 
become successful after the previous packets are unmapped?

+	packet->dma_range = kcalloc(page_count,
+				    sizeof(*packet->dma_range),
+				    GFP_KERNEL);

+		dma = dma_map_single(&hv_dev->device, src, len,
+				     DMA_TO_DEVICE);

I recalled your previous suggestion now, and agree with you that 
we can treat it the same way (return -EAGAIN) in this case. And 
the existing code will stop the queue temporarily.

Thanks,
- Haiyang

^ permalink raw reply

* RE: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Michael Kelley (LINUX) @ 2021-12-09 20:38 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, davem@davemloft.net,
	kuba@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com,
	arnd@arndb.de, hch@infradead.org, m.szyprowski@samsung.com,
	robin.murphy@arm.com, Tianyu Lan, thomas.lendacky@amd.com
  Cc: iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, netdev@vger.kernel.org, vkuznets,
	brijesh.singh@amd.com, konrad.wilk@oracle.com, hch@lst.de,
	joro@8bytes.org, parri.andrea@gmail.com, dave.hansen@intel.com
In-Reply-To: <20211207075602.2452-3-ltykernel@gmail.com>

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, December 6, 2021 11:56 PM
> 
> Hyper-V provides Isolation VM which has memory encrypt support. Add
> hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT
> attribute.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v3:
> 	* Change code style of checking GUEST_MEM attribute in the
> 	  hyperv_cc_platform_has().
> ---
>  arch/x86/kernel/cc_platform.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
> index 03bb2f343ddb..47db88c275d5 100644
> --- a/arch/x86/kernel/cc_platform.c
> +++ b/arch/x86/kernel/cc_platform.c
> @@ -11,6 +11,7 @@
>  #include <linux/cc_platform.h>
>  #include <linux/mem_encrypt.h>
> 
> +#include <asm/mshyperv.h>
>  #include <asm/processor.h>
> 
>  static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
> @@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr)
>  #endif
>  }
> 
> +static bool hyperv_cc_platform_has(enum cc_attr attr)
> +{
> +	return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
> +}
> 
>  bool cc_platform_has(enum cc_attr attr)
>  {
> +	if (hv_is_isolation_supported())
> +		return hyperv_cc_platform_has(attr);
> +
>  	if (sme_me_mask)
>  		return amd_cc_platform_has(attr);
> 

Throughout Linux kernel code, there are about 20 calls to cc_platform_has()
with CC_ATTR_GUEST_MEM_ENCRYPT as the argument.  The original code
(from v1 of this patch set) only dealt with the call in sev_setup_arch().   But
with this patch, all the other calls that previously returned "false" will now
return "true" in a Hyper-V Isolated VM.  I didn't try to analyze all these other
calls, so I think there's an open question about whether this is the behavior
we want.

Michael

^ permalink raw reply

* Re: [PATCH] tun: avoid double free in tun_free_netdev
From: George Kennedy @ 2021-12-09 20:26 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Stephen Hemminger, davem, netdev, linux-kernel
In-Reply-To: <20211208155838.24556030@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>



On 12/8/2021 6:58 PM, Jakub Kicinski wrote:
> On Wed, 8 Dec 2021 11:44:02 -0500 George Kennedy wrote:
>>> It looks like a lot of the problem is duplicate unwind.
>>> Why does err_free_flow, err_free_stat etc unwinds need to exist if
>>> the free_netdev is going to do same thing.
>> Maybe instead do not call security_tun_dev_free_security(tun->security)
>> in err_free_flow if it's going to be done anyway in tun_free_netdev().
> That won't be good either. register_netdevice() has multiple failure
> modes, it may or may not call the destructor depending on where it
> fails. Either the stuff that destructor undoes needs to be moved to
> ndo_init (which is what destructor always pairs with), or you can check
> dev->reg_state. If dev->reg_state is NETREG_UNREGISTERING that means
> the destructor will be caller later.
>
> The ndo_init way is preferable, just cut and past the appropriate lines
> preceding registration into a ndo_init callback.
Thank you Jakub,

Looking at other ndo_init's to try and figure out how much before 
register_netdevice() should go in it.

George

^ permalink raw reply

* RE: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
From: Michael Kelley (LINUX) @ 2021-12-09 20:09 UTC (permalink / raw)
  To: Tianyu Lan, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, davem@davemloft.net,
	kuba@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com,
	arnd@arndb.de, hch@infradead.org, m.szyprowski@samsung.com,
	robin.murphy@arm.com, Tianyu Lan, thomas.lendacky@amd.com
  Cc: iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, netdev@vger.kernel.org, vkuznets,
	brijesh.singh@amd.com, konrad.wilk@oracle.com, hch@lst.de,
	joro@8bytes.org, parri.andrea@gmail.com, dave.hansen@intel.com
In-Reply-To: <20211207075602.2452-4-ltykernel@gmail.com>

From: Tianyu Lan <ltykernel@gmail.com> Sent: Monday, December 6, 2021 11:56 PM
> 
> hyperv Isolation VM requires bounce buffer support to copy
> data from/to encrypted memory and so enable swiotlb force
> mode to use swiotlb bounce buffer for DMA transaction.
> 
> In Isolation VM with AMD SEV, the bounce buffer needs to be
> accessed via extra address space which is above shared_gpa_boundary
> (E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
> The access physical address will be original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
> spec is called virtual top of memory(vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Swiotlb bounce buffer code calls set_memory_decrypted()
> to mark bounce buffer visible to host and map it in extra
> address space via memremap. Populate the shared_gpa_boundary
> (vTOM) via swiotlb_unencrypted_base variable.
> 
> The map function memremap() can't work in the early place
> (e.g ms_hyperv_init_platform()) and so call swiotlb_update_mem_
> attributes() in the hyperv_init().
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> Change since v4:
> 	* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
> 	  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
> 	  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
> 	  in the hyperv_init().
> 
> Change since v3:
> 	* Add comment in pci-swiotlb-xen.c to explain why add
> 	  dependency between hyperv_swiotlb_detect() and pci_
> 	  xen_swiotlb_detect().
> 	* Return directly when fails to allocate Hyper-V swiotlb
> 	  buffer in the hyperv_iommu_swiotlb_init().
> ---
>  arch/x86/hyperv/hv_init.c      | 10 ++++++++++
>  arch/x86/kernel/cpu/mshyperv.c | 11 ++++++++++-
>  include/linux/hyperv.h         |  8 ++++++++
>  3 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 24f4a06ac46a..9e18a280f89d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -28,6 +28,7 @@
>  #include <linux/syscore_ops.h>
>  #include <clocksource/hyperv_timer.h>
>  #include <linux/highmem.h>
> +#include <linux/swiotlb.h>
> 
>  int hyperv_init_cpuhp;
>  u64 hv_current_partition_id = ~0ull;
> @@ -502,6 +503,15 @@ void __init hyperv_init(void)
> 
>  	/* Query the VMs extended capability once, so that it can be cached. */
>  	hv_query_ext_cap(0);
> +
> +	/*
> +	 * Swiotlb bounce buffer needs to be mapped in extra address
> +	 * space. Map function doesn't work in the early place and so
> +	 * call swiotlb_update_mem_attributes() here.
> +	 */
> +	if (hv_is_isolation_supported())
> +		swiotlb_update_mem_attributes();
> +
>  	return;
> 
>  clean_guest_os_id:
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 4794b716ec79..baf3a0873552 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -18,6 +18,7 @@
>  #include <linux/kexec.h>
>  #include <linux/i8253.h>
>  #include <linux/random.h>
> +#include <linux/swiotlb.h>
>  #include <asm/processor.h>
>  #include <asm/hypervisor.h>
>  #include <asm/hyperv-tlfs.h>
> @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> 
> -		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
> +		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>  			static_branch_enable(&isolation_type_snp);
> +			swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
> +		}
> +
> +		/*
> +		 * Enable swiotlb force mode in Isolation VM to
> +		 * use swiotlb bounce buffer for dma transaction.
> +		 */
> +		swiotlb_force = SWIOTLB_FORCE;

I'm good with this approach that directly updates the swiotlb settings here
rather than in IOMMU initialization code.  It's a lot more straightforward.

However, there's an issue if building for X86_32 without PAE, in that the 
swiotlb module may not be built, resulting in compile and link errors.  The
swiotlb.h file needs to be updated to provide a stub function for
swiotlb_update_mem_attributes().   swiotlb_unencrypted_base probably
needs wrapper functions to get/set it, which can be stubs when 
CONFIG_SWIOTLB is not set.  swiotlb_force is a bit of a mess in that it already
has a stub definition that assumes it will only be read, and not set.  A bit of
thinking will be needed to sort that out.

>  	}
> 
>  	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index b823311eac79..1f037e114dc8 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
>  int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
>  				void (*block_invalidate)(void *context,
>  							 u64 block_mask));
> +#if IS_ENABLED(CONFIG_HYPERV)
> +int __init hyperv_swiotlb_detect(void);
> +#else
> +static inline int __init hyperv_swiotlb_detect(void)
> +{
> +	return 0;
> +}
> +#endif

I don't think hyperv_swiotlb_detect() is used any longer, so this change
should be dropped.

> 
>  struct hyperv_pci_block_ops {
>  	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
> --
> 2.25.1


^ permalink raw reply

* Re: [PATCH net-next v3 2/6] dt-bindings: net: lan966x: Extend with the analyzer interrupt
From: Vladimir Oltean @ 2021-12-09 20:23 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: Vladimir Oltean, netdev@vger.kernel.org,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net, kuba@kernel.org, robh+dt@kernel.org,
	UNGLinuxDriver@microchip.com, linux@armlinux.org.uk,
	f.fainelli@gmail.com, vivien.didelot@gmail.com, andrew@lunn.ch
In-Reply-To: <20211209154247.kzsrwli5fqautqtm@soft-dev3-1.localhost>

On Thu, Dec 09, 2021 at 04:42:47PM +0100, Horatiu Vultur wrote:
> The 12/09/2021 10:58, Vladimir Oltean wrote:
> > 
> > On Thu, Dec 09, 2021 at 10:46:11AM +0100, Horatiu Vultur wrote:
> > > Extend dt-bindings for lan966x with analyzer interrupt.
> > > This interrupt can be generated for example when the HW learn/forgets
> > > an entry in the MAC table.
> > >
> > > Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> > > ---
> > 
> > Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > Why don't you describe your hardware in the device tree all at once?
> > Doing it piece by piece means that every time when you add a new
> > functionality you need to be compatible with the absence of a certain
> > reg, interrupt etc.
> 
> I though it is more clear what is added in the patch series.
> But then, if for example add more interrupts in DT than what the
> driver support, that would not be an issue?

I haven't kept track of the lan966x driver development. It looks like it
is pretty new, so I think it's ok in this case. But I've also seen
features introduced years after the driver was initially published (see
ocelot fdma) where device tree updates were still necessary, due to
minor things like these: an interrupt isn't there, the registers for
FDMA aren't there, etc. After that kind of time you'd expect the DT
to no longer require updates unless there is some unforeseen event
(something is broken, a driver is radically rethought). Sure there's a
fine line between how much you can add to the device tree and the
how many consumers there are in the kernel, but on the other hand the
kernel doesn't have to use everything that's in the device tree.
For example, at Mark Brown's suggestion, the DSPI nodes in ls1028a.dtsi
declare their DMA channels even though the driver does not use them
(it could, though, but it would be slower). Similarly, the DSPI driver
for LS1021A has had a while in which it ignored the interrupt line from
the device tree, because poll mode was simply faster. To me, this kind
of approach where the device tree is provisioned even for configurations
that aren't supported today makes sense, precisely because the DT blob
and the kernel have different lifetimes. It's better to have the
interrupt and not use it than to need it and not have it.

^ permalink raw reply

* Re: [PATCH] net: bonding: Add support for IPV6 ns/na
From: Jay Vosburgh @ 2021-12-09 20:22 UTC (permalink / raw)
  To: Sun Shouxin; +Cc: vfalico, andy, davem, kuba, netdev, linux-kernel, huyd12
In-Reply-To: <1639032622-28098-1-git-send-email-sunshouxin@chinatelecom.cn>

Sun Shouxin <sunshouxin@chinatelecom.cn> wrote:

>Since ipv6 neighbor solicitation and advertisement messages
>isn't handled gracefully in bonding6 driver, we can see packet
>drop due to inconsistency bewteen mac address in the option
>message and source MAC .

	Could you provide a specific example where this occurs?

>Another examples is ipv6 neighbor solicitation and advertisement
>messages from VM via tap attached to host brighe, the src mac
>mighe be changed through balance-alb mode, but it is not synced
>with Link-layer address in the option message.

	What happens if the MAC assignment changes because alb does a
rebalance?

>The patch implements bond6's tx handle for ipv6 neighbor
>solicitation and advertisement messages.

	A few additional minor comments below.

>Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
>Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
>---
> drivers/net/bonding/bond_alb.c | 127 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 127 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 2ec8e01..01566ba 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -22,6 +22,7 @@
> #include <asm/byteorder.h>
> #include <net/bonding.h>
> #include <net/bond_alb.h>
>+#include <net/ndisc.h>
> 
> static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
> 	0x33, 0x33, 0x00, 0x00, 0x00, 0x01
>@@ -1269,6 +1270,112 @@ static int alb_set_mac_address(struct bonding *bond, void *addr)
> 	return res;
> }
> 
>+static void alb_change_nd_option(struct sk_buff *skb, void *data)
>+{
>+	struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb);
>+	struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)msg->opt;
>+	struct net_device *dev = skb->dev;
>+	struct icmp6hdr *icmp6h = icmp6_hdr(skb);
>+	struct ipv6hdr *ip6hdr = ipv6_hdr(skb);
>+	u8 *lladdr = NULL;
>+	u32 ndoptlen = skb_tail_pointer(skb) - (skb_transport_header(skb) +
>+				offsetof(struct nd_msg, opt));
>+
>+	while (ndoptlen) {
>+		int l;
>+
>+		switch (nd_opt->nd_opt_type) {
>+		case ND_OPT_SOURCE_LL_ADDR:
>+		case ND_OPT_TARGET_LL_ADDR:
>+		lladdr = ndisc_opt_addr_data(nd_opt, dev);
>+		break;
>+
>+		default:
>+		break;
>+		}
>+
>+		l = nd_opt->nd_opt_len << 3;
>+
>+		if (ndoptlen < l || l == 0)
>+			return;
>+
>+		if (lladdr) {
>+			memcpy(lladdr, data, dev->addr_len);
>+			lladdr = NULL;
>+			icmp6h->icmp6_cksum = 0;
>+
>+			icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
>+							      &ip6hdr->daddr,
>+						ntohs(ip6hdr->payload_len),
>+						IPPROTO_ICMPV6,
>+						csum_partial(icmp6h,
>+							     ntohs(ip6hdr->payload_len), 0));
>+			lladdr = NULL;

	"lladdr = NULL" could be in the default: case, above, instead of
being done here (and it's here twice).

>+		}
>+		ndoptlen -= l;
>+		nd_opt = ((void *)nd_opt) + l;
>+	}
>+}
>+
>+static u8 *alb_get_lladdr(struct sk_buff *skb)
>+{
>+	struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb);
>+	struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)msg->opt;
>+	struct net_device *dev = skb->dev;
>+	u8 *lladdr = NULL;
>+	u32 ndoptlen = skb_tail_pointer(skb) - (skb_transport_header(skb) +
>+				offsetof(struct nd_msg, opt));
>+
>+	while (ndoptlen) {
>+		int l;
>+
>+		switch (nd_opt->nd_opt_type) {
>+		case ND_OPT_SOURCE_LL_ADDR:
>+		case ND_OPT_TARGET_LL_ADDR:
>+			lladdr = ndisc_opt_addr_data(nd_opt, dev);
>+			break;
>+
>+		default:
>+			break;
>+		}
>+
>+		l = nd_opt->nd_opt_len << 3;
>+
>+		if (ndoptlen < l || l == 0)
>+			return lladdr;
>+
>+		if (lladdr)
>+			return lladdr;
>+
>+		ndoptlen -= l;
>+		nd_opt = ((void *)nd_opt) + l;
>+	}
>+
>+	return lladdr;
>+}
>+
>+static void alb_set_nd_option(struct sk_buff *skb, struct bonding *bond,
>+			      struct slave *tx_slave)
>+{
>+	struct ipv6hdr *ip6hdr;
>+	struct icmp6hdr *hdr = NULL;
>+
>+	if (tx_slave && tx_slave != rcu_access_pointer(bond->curr_active_slave)) {
>+		if (ntohs(skb->protocol) == ETH_P_IPV6) {

	Nit: use "skb->protocol == htons(ETH_P_IPV6)" as the compiler
should optimize the htons() of a constant.  Also, you may want to
consider reordering the tests here, as IPv6 NA/NS traffic is likely to
be the vast minority.

>+			ip6hdr = ipv6_hdr(skb);
>+			if (ip6hdr->nexthdr == IPPROTO_ICMPV6) {
>+				hdr = icmp6_hdr(skb);
>+				if (hdr->icmp6_type ==
>+				    NDISC_NEIGHBOUR_ADVERTISEMENT ||
>+				     hdr->icmp6_type ==
>+				     NDISC_NEIGHBOUR_SOLICITATION) {

	This construct appears twice, perhaps it deserves its own
boolean-return function?

	-J

>+					alb_change_nd_option(skb, tx_slave->dev->dev_addr);
>+				}
>+			}
>+		}
>+	}
>+}
>+
> /************************ exported alb functions ************************/
> 
> int bond_alb_initialize(struct bonding *bond, int rlb_enabled)
>@@ -1415,6 +1522,7 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
> 	}
> 	case ETH_P_IPV6: {
> 		const struct ipv6hdr *ip6hdr;
>+		struct icmp6hdr *hdr = NULL;
> 
> 		/* IPv6 doesn't really use broadcast mac address, but leave
> 		 * that here just in case.
>@@ -1446,6 +1554,24 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
> 			break;
> 		}
> 
>+		if (ip6hdr->nexthdr == IPPROTO_ICMPV6) {
>+			hdr = icmp6_hdr(skb);
>+			if (hdr->icmp6_type ==
>+			    NDISC_NEIGHBOUR_ADVERTISEMENT ||
>+			    hdr->icmp6_type ==
>+			    NDISC_NEIGHBOUR_SOLICITATION) {
>+				u8 *lladdr = NULL;
>+
>+				lladdr = alb_get_lladdr(skb);
>+				if (lladdr) {
>+					if (!bond_slave_has_mac_rx(bond, lladdr)) {
>+						do_tx_balance = false;
>+						break;
>+					}
>+				}
>+			}
>+		}
>+
> 		hash_start = (char *)&ip6hdr->daddr;
> 		hash_size = sizeof(ip6hdr->daddr);
> 		break;
>@@ -1489,6 +1615,7 @@ netdev_tx_t bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 	struct slave *tx_slave = NULL;
> 
> 	tx_slave = bond_xmit_alb_slave_get(bond, skb);
>+	alb_set_nd_option(skb, bond, tx_slave);
> 	return bond_do_alb_xmit(skb, bond, tx_slave);
> }
> 
>-- 
>1.8.3.1
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* RE: [PATCH v2] samples/bpf: xdpsock: fix swap.cocci warning
From: John Fastabend @ 2021-12-09 20:11 UTC (permalink / raw)
  To: Yihao Han, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, netdev, bpf, linux-kernel
  Cc: kernel, Yihao Han
In-Reply-To: <20211209092250.56430-1-hanyihao@vivo.com>

Yihao Han wrote:
> Fix following swap.cocci warning:
> ./samples/bpf/xdpsock_user.c:528:22-23:
> WARNING opportunity for swap()
> 
> Signed-off-by: Yihao Han <hanyihao@vivo.com>
> ---

LGTM

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* Re: [PATCH net v2] tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
From: Martin KaFai Lau @ 2021-12-09 20:06 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jakub Kicinski, Kuniyuki Iwashima, eric.dumazet, davem, dsahern,
	efault, netdev, tglx, yoshfuji
In-Reply-To: <20211206120216.mgo6qibl5fmzdcrp@linutronix.de>

On Mon, Dec 06, 2021 at 01:02:16PM +0100, Sebastian Andrzej Siewior wrote:
> Commit
>    9652dc2eb9e40 ("tcp: relax listening_hash operations")
> 
> removed the need to disable bottom half while acquiring
> listening_hash.lock. There are still two callers left which disable
> bottom half before the lock is acquired.
> 
> On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> as a lock to ensure that resources, that are protected by disabling
> bottom halves, remain protected.
> This leads to a circular locking dependency if the lock acquired with
> disabled bottom halves is also acquired with enabled bottom halves
> followed by disabling bottom halves. This is the reverse locking order.
> It has been observed with inet_listen_hashbucket::lock:
> 
> local_bh_disable() + spin_lock(&ilb->lock):
>   inet_listen()
>     inet_csk_listen_start()
>       sk->sk_prot->hash() := inet_hash()
> 	local_bh_disable()
> 	__inet_hash()
> 	  spin_lock(&ilb->lock);
> 	    acquire(&ilb->lock);
> 
> Reverse order: spin_lock(&ilb->lock) + local_bh_disable():
>   tcp_seq_next()
>     listening_get_next()
>       spin_lock(&ilb->lock);
The net tree has already been using ilb2 instead of ilb.
It does not change the problem though but updating
the commit log will be useful to avoid future confusion.

iiuc, established_get_next() does not hit this because
it calls spin_lock_bh() which then keeps the
local_bh_disable() => spin_lock() ordering?

The patch lgtm.
Acked-by: Martin KaFai Lau <kafai@fb.com>

> 	acquire(&ilb->lock);
> 
>   tcp4_seq_show()
>     get_tcp4_sock()
>       sock_i_ino()
> 	read_lock_bh(&sk->sk_callback_lock);
> 	  acquire(softirq_ctrl)	// <---- whoops
> 	  acquire(&sk->sk_callback_lock)
> 
> Drop local_bh_disable() around __inet_hash() which acquires
> listening_hash->lock. Split inet_unhash() and acquire the
> listen_hashbucket lock without disabling bottom halves; the inet_ehash
> lock with disabled bottom halves.

^ permalink raw reply

* Re: [PATCH net-next 00/17] net: netns refcount tracking series
From: Jakub Kicinski @ 2021-12-09 20:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet
In-Reply-To: <20211207005142.1688204-1-eric.dumazet@gmail.com>

On Mon,  6 Dec 2021 16:51:25 -0800 Eric Dumazet wrote:
> We have 100+ syzbot reports about netns being dismantled too soon,
> still unresolved as of today.
> 
> We think a missing get_net() or an extra put_net() is the root cause.
> 
> In order to find the bug(s), and be able to spot future ones,
> this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers
> to precisely pair all put_net() with corresponding get_net().
> 
> To use these helpers, each data structure owning a refcount
> should also use a "netns_tracker" to pair the get() and put().
> 
> Small sections of codes where the get()/put() are in sight
> do not need to have a tracker, because they are short lived,
> but in theory it is also possible to declare an on-stack tracker.

Ugh, I realized after a week of waiting that vfs / sunrpc / audit folks
are not even CCed here. I think we should give them the courtesy of
being able to ack the patches.. Can you split out 1-4,6,7 for immediate
merging and repost the rest with the right CCs?

^ permalink raw reply

* Re: [PATCH net-next 1/5] net: phylink: add legacy_pre_march2020 indicator
From: patchwork-bot+netdevbpf @ 2021-12-09 20:00 UTC (permalink / raw)
  To: Russell King
  Cc: chris.snook, nbd, f.fainelli, john, Mark-MC.Lee, matthias.bgg,
	sean.wang, vivien.didelot, olteanv, andrew, hkallweit1,
	linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <E1mucmf-00EyCl-KA@rmk-PC.armlinux.org.uk>

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 07 Dec 2021 15:53:37 +0000 you wrote:
> Add a boolean to phylink_config to indicate whether a driver has not
> been updated for the changes in commit 7cceb599d15d ("net: phylink:
> avoid mac_config calls"), and thus are reliant on the old behaviour.
> 
> We were currently keying the phylink behaviour on the presence of a
> PCS, but this is sub-optimal for modern drivers that may not have a
> PCS.
> 
> [...]

Here is the summary with links:
  - [net-next,1/5] net: phylink: add legacy_pre_march2020 indicator
    https://git.kernel.org/netdev/net-next/c/3e5b1feccea7
  - [net-next,2/5] net: dsa: mark DSA phylink as legacy_pre_march2020
    https://git.kernel.org/netdev/net-next/c/0a9f0794d9bd
  - [net-next,3/5] net: mtk_eth_soc: mark as a legacy_pre_march2020 driver
    https://git.kernel.org/netdev/net-next/c/b06515367fac
  - [net-next,4/5] net: phylink: use legacy_pre_march2020
    https://git.kernel.org/netdev/net-next/c/001f4261fe4d
  - [net-next,5/5] net: ag71xx: remove unnecessary legacy methods
    https://git.kernel.org/netdev/net-next/c/11053047a4af

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] net: phy: prefer 1000baseT over 1000baseKX
From: patchwork-bot+netdevbpf @ 2021-12-09 20:00 UTC (permalink / raw)
  To: Russell King; +Cc: andrew, hkallweit1, davem, netdev, kuba
In-Reply-To: <E1muvFO-00F6jY-1K@rmk-PC.armlinux.org.uk>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 08 Dec 2021 11:36:30 +0000 you wrote:
> The PHY settings table is supposed to be sorted by descending match
> priority - in other words, earlier entries are preferred over later
> entries.
> 
> The order of 1000baseKX/Full and 1000baseT/Full is such that we
> prefer 1000baseKX/Full over 1000baseT/Full, but 1000baseKX/Full is
> a lot rarer than 1000baseT/Full, and thus is much less likely to
> be preferred.
> 
> [...]

Here is the summary with links:
  - [net-next] net: phy: prefer 1000baseT over 1000baseKX
    https://git.kernel.org/netdev/net-next/c/f20f94f7f52c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next] xfrm: use net device refcount tracker helpers
From: patchwork-bot+netdevbpf @ 2021-12-09 20:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, netdev, edumazet, amwang, steffen.klassert
In-Reply-To: <20211207193203.2706158-1-eric.dumazet@gmail.com>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  7 Dec 2021 11:32:03 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> xfrm4_fill_dst() and xfrm6_fill_dst() build dst,
> getting a device reference that will likely be released
> by standard dst_release() code.
> 
> We have to track these references or risk a warning if
> CONFIG_NET_DEV_REFCNT_TRACKER=y
> 
> [...]

Here is the summary with links:
  - [net-next] xfrm: use net device refcount tracker helpers
    https://git.kernel.org/netdev/net-next/c/4177e4960594

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net-next 0/5] net: phylink: introduce legacy mode flag
From: patchwork-bot+netdevbpf @ 2021-12-09 20:00 UTC (permalink / raw)
  To: Russell King
  Cc: chris.snook, nbd, f.fainelli, john, Mark-MC.Lee, matthias.bgg,
	sean.wang, vivien.didelot, olteanv, andrew, davem, hkallweit1,
	kuba, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <Ya+DGaGmGgWrlVkW@shell.armlinux.org.uk>

Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 7 Dec 2021 15:51:53 +0000 you wrote:
> Hi all,
> 
> In March 2020, phylink gained support to split the PCS support out of
> the MAC callbacks. By doing so, a slight behavioural difference was
> introduced when a PCS is present, specifically:
> 
> 1) the call to mac_config() when the link comes up or advertisement
>    changes were eliminated
> 2) mac_an_restart() will never be called
> 3) mac_pcs_get_state() will never be called
> 
> [...]

Here is the summary with links:
  - [net-next,1/5] net: phylink: add legacy_pre_march2020 indicator
    https://git.kernel.org/netdev/net-next/c/3e5b1feccea7
  - [net-next,2/5] net: dsa: mark DSA phylink as legacy_pre_march2020
    https://git.kernel.org/netdev/net-next/c/0a9f0794d9bd
  - [net-next,3/5] net: mtk_eth_soc: mark as a legacy_pre_march2020 driver
    https://git.kernel.org/netdev/net-next/c/b06515367fac
  - [net-next,4/5] net: phylink: use legacy_pre_march2020
    https://git.kernel.org/netdev/net-next/c/001f4261fe4d
  - [net-next,5/5] net: ag71xx: remove unnecessary legacy methods
    https://git.kernel.org/netdev/net-next/c/11053047a4af

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* RE: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Michael Kelley (LINUX) @ 2021-12-09 19:54 UTC (permalink / raw)
  To: Haiyang Zhang, Tianyu Lan, KY Srinivasan, Stephen Hemminger,
	wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, davem@davemloft.net,
	kuba@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com,
	arnd@arndb.de, hch@infradead.org, m.szyprowski@samsung.com,
	robin.murphy@arm.com, Tianyu Lan, thomas.lendacky@amd.com
  Cc: iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, netdev@vger.kernel.org, vkuznets,
	brijesh.singh@amd.com, konrad.wilk@oracle.com, hch@lst.de,
	joro@8bytes.org, parri.andrea@gmail.com, dave.hansen@intel.com
In-Reply-To: <BN8PR21MB128401EEDE6B8C8553CC8009CA6F9@BN8PR21MB1284.namprd21.prod.outlook.com>

From: Haiyang Zhang <haiyangz@microsoft.com> Sent: Wednesday, December 8, 2021 12:14 PM
> > From: Tianyu Lan <ltykernel@gmail.com>
> > Sent: Tuesday, December 7, 2021 2:56 AM

[snip]

> >  static inline int netvsc_send_pkt(
> >  	struct hv_device *device,
> >  	struct hv_netvsc_packet *packet,
> > @@ -986,14 +1105,24 @@ static inline int netvsc_send_pkt(
> >
> >  	trace_nvsp_send_pkt(ndev, out_channel, rpkt);
> >
> > +	packet->dma_range = NULL;
> >  	if (packet->page_buf_cnt) {
> >  		if (packet->cp_partial)
> >  			pb += packet->rmsg_pgcnt;
> >
> > +		ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
> > +		if (ret) {
> > +			ret = -EAGAIN;
> > +			goto exit;
> > +		}
> 
> Returning EAGAIN will let the upper network layer busy retry,
> which may make things worse.
> I suggest to return ENOSPC here like another place in this
> function, which will just drop the packet, and let the network
> protocol/app layer decide how to recover.
> 
> Thanks,
> - Haiyang

I made the original suggestion to return -EAGAIN here.   A
DMA mapping failure should occur only if swiotlb bounce
buffer space is unavailable, which is a transient condition.
The existing code already stops the queue and returns
-EAGAIN when the ring buffer is full, which is also a transient
condition.  My sense is that the two conditions should be
handled the same way.  Or is there a reason why a ring
buffer full condition should stop the queue and retry, while
a mapping failure should drop the packet?

Michael

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox