Netdev List
 help / color / mirror / Atom feed
* [PATCH net 1/2] iavf: Fix error when changing ring parameters on ice PF
From: Tony Nguyen @ 2022-04-20 17:26 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: Michal Maloszewski, netdev, anthony.l.nguyen, sassmann,
	Sylwester Dziedziuch, Konrad Jankowski
In-Reply-To: <20220420172624.931237-1-anthony.l.nguyen@intel.com>

From: Michal Maloszewski <michal.maloszewski@intel.com>

Reset is triggered when ring parameters are being changed through
ethtool and queues are reconfigured for VF's VSI. If ring is changed
again immediately, then the next reset could be executed before
queues could be properly reinitialized on VF's VSI. It caused ice PF
to mess up the VSI resource tree.

Add a check in iavf_set_ringparam for adapter and VF's queue
state. If VF is currently resetting or queues are disabled for the VF
return with EAGAIN error.

Fixes: d732a1844507 ("i40evf: fix crash when changing ring sizes")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 3bb56714beb0..08efbc50fbe9 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -631,6 +631,11 @@ static int iavf_set_ringparam(struct net_device *netdev,
 	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
 		return -EINVAL;
 
+	if (adapter->state == __IAVF_RESETTING ||
+	    (adapter->state == __IAVF_RUNNING &&
+	     (adapter->flags & IAVF_FLAG_QUEUES_DISABLED)))
+		return -EAGAIN;
+
 	if (ring->tx_pending > IAVF_MAX_TXD ||
 	    ring->tx_pending < IAVF_MIN_TXD ||
 	    ring->rx_pending > IAVF_MAX_RXD ||
-- 
2.31.1


^ permalink raw reply related

* [PATCH net 2/2] i40e: i40e_main: fix a missing check on list iterator
From: Tony Nguyen @ 2022-04-20 17:26 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: Xiaomeng Tong, netdev, anthony.l.nguyen, sassmann, stable,
	Gurucharan
In-Reply-To: <20220420172624.931237-1-anthony.l.nguyen@intel.com>

From: Xiaomeng Tong <xiam0nd.tong@gmail.com>

The bug is here:
	ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err);

The list iterator 'ch' will point to a bogus position containing
HEAD if the list is empty or no element is found. This case must
be checked before any use of the iterator, otherwise it will
lead to a invalid memory access.

To fix this bug, use a new variable 'iter' as the list iterator,
while use the origin variable 'ch' as a dedicated pointer to
point to the found element.

Cc: stable@vger.kernel.org
Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 27 +++++++++++----------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6778df2177a1..98871f014994 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7549,42 +7549,43 @@ static void i40e_free_macvlan_channels(struct i40e_vsi *vsi)
 static int i40e_fwd_ring_up(struct i40e_vsi *vsi, struct net_device *vdev,
 			    struct i40e_fwd_adapter *fwd)
 {
+	struct i40e_channel *ch = NULL, *ch_tmp, *iter;
 	int ret = 0, num_tc = 1,  i, aq_err;
-	struct i40e_channel *ch, *ch_tmp;
 	struct i40e_pf *pf = vsi->back;
 	struct i40e_hw *hw = &pf->hw;
 
-	if (list_empty(&vsi->macvlan_list))
-		return -EINVAL;
-
 	/* Go through the list and find an available channel */
-	list_for_each_entry_safe(ch, ch_tmp, &vsi->macvlan_list, list) {
-		if (!i40e_is_channel_macvlan(ch)) {
-			ch->fwd = fwd;
+	list_for_each_entry_safe(iter, ch_tmp, &vsi->macvlan_list, list) {
+		if (!i40e_is_channel_macvlan(iter)) {
+			iter->fwd = fwd;
 			/* record configuration for macvlan interface in vdev */
 			for (i = 0; i < num_tc; i++)
 				netdev_bind_sb_channel_queue(vsi->netdev, vdev,
 							     i,
-							     ch->num_queue_pairs,
-							     ch->base_queue);
-			for (i = 0; i < ch->num_queue_pairs; i++) {
+							     iter->num_queue_pairs,
+							     iter->base_queue);
+			for (i = 0; i < iter->num_queue_pairs; i++) {
 				struct i40e_ring *tx_ring, *rx_ring;
 				u16 pf_q;
 
-				pf_q = ch->base_queue + i;
+				pf_q = iter->base_queue + i;
 
 				/* Get to TX ring ptr */
 				tx_ring = vsi->tx_rings[pf_q];
-				tx_ring->ch = ch;
+				tx_ring->ch = iter;
 
 				/* Get the RX ring ptr */
 				rx_ring = vsi->rx_rings[pf_q];
-				rx_ring->ch = ch;
+				rx_ring->ch = iter;
 			}
+			ch = iter;
 			break;
 		}
 	}
 
+	if (!ch)
+		return -EINVAL;
+
 	/* Guarantee all rings are updated before we update the
 	 * MAC address filter.
 	 */
-- 
2.31.1


^ permalink raw reply related

* Re: IPv6 multicast with VRF
From: David Ahern @ 2022-04-20 18:59 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev
In-Reply-To: <20220420165457.kd5yz6a6itqfcysj@skbuf>

On 4/20/22 10:54 AM, Vladimir Oltean wrote:
> Hi,
> 
> I don't have experience with either IPv6 multicast or VRF, yet I need to
> send some IPv6 multicast packets from a device enslaved to a VRF, and I
> don't really know what's wrong with the routing table setup.
> 
> The system is configured in the following way:
> 
>  ip link set dev eth0 up
> 
>  # The kernel kindly creates a ff00::/8 route for IPv6 multicast traffic
>  # in the local table, and I think this is what makes multicast route
>  # lookups find the egress device.
>  ip -6 route show table local
> local ::1 dev lo proto kernel metric 0 pref medium
> local fe80::204:9fff:fe05:f4ab dev eth0 proto kernel metric 0 pref medium
> multicast ff00::/8 dev eth0 proto kernel metric 256 pref medium
> 
>  ip -6 route get ff02::1
> multicast ff02::1 dev eth0 table local proto kernel src fe80::204:9fff:fe05:f4ab metric 256 pref medium
> 
>  ip link add dev vrf0 type vrf table 3 && ip link set dev vrf0 up
> 
>  ip -4 route add table 3 unreachable default metric 4278198272
> 
>  ip -6 route add table 3 unreachable default metric 4278198272
> 
>  ip link set dev eth0 master vrf0
> 
> The problem seems to be that, although the "ff00::/8 dev eth0" route
> migrates from table 255 to table 3, route lookups after this point fail
> to find it and return -ENETUNREACH (ip6_null_entry).
> 
>  ip -6 route show table local
> local ::1 dev lo proto kernel metric 0 pref medium
> 
>  ip -6 route show table main
> ::1 dev lo proto kernel metric 256 pref medium
> 
>  ip -6 route show table 3
> local fe80::204:9fff:fe05:f4ab dev eth0 proto kernel metric 0 pref medium
> fe80::/64 dev eth0 proto kernel metric 256 pref medium
> multicast ff00::/8 dev eth0 proto kernel metric 256 pref medium
> unreachable default dev lo metric 4278198272 pref medium
> 
>  ip -6 route get ff02::1
> RTNETLINK answers: Network is unreachable
> 
>  ip -6 route get vrf vrf0 ff02::1
> RTNETLINK answers: Network is unreachable
> 
> I'm not exactly sure what is missing?

Did you adjust the FIB rules? See the documentation in the kernel repo.

And add a device scope to the `get`. e.g.,

    ip -6 route get ff02::1%eth0


^ permalink raw reply

* Re: IPv6 multicast with VRF
From: Vladimir Oltean @ 2022-04-20 19:18 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev
In-Reply-To: <97eaffb8-2125-834e-641f-c99c097b6ee2@gmail.com>

On Wed, Apr 20, 2022 at 12:59:45PM -0600, David Ahern wrote:
> Did you adjust the FIB rules? See the documentation in the kernel repo.

Sorry, I don't understand what you mean by "adjusting". I tried various
forms of adding an IPv6 multicast route on eth0, to multiple tables,
some routes more generic and some more specific, and none seem to match
when eth0 is under a VRF, for a reason I don't really know. This does
not occur with IPv4 multicast, by the way.

By documentation I think you mean Documentation/networking/vrf.rst.
I went through it but I didn't notice something that would make me
realize what the issue is.

> And add a device scope to the `get`. e.g.,
> 
>     ip -6 route get ff02::1%eth0

I'm probably not understanding this, because:

 ip -6 route get ff02::1%eth0
Error: inet6 prefix is expected rather than "ff02::1%eth0".

^ permalink raw reply

* Re: Support for IEEE1588 timestamping in the BCM54210PE PHY using the kernel mii_timestamper interface
From: Andrew Lunn @ 2022-04-20 19:19 UTC (permalink / raw)
  To: Lasse Johnsen; +Cc: netdev, richardcochran, Gordon Hollingworth, Ahmad Byagowi
In-Reply-To: <928593CA-9CE9-4A54-B84A-9973126E026D@timebeat.app>

On Wed, Apr 20, 2022 at 03:03:26PM +0100, Lasse Johnsen wrote:
> Hello,
> 
> 
> The attached set of patches adds support for the IEEE1588 functionality on the BCM54210PE PHY using the Linux Kernel mii_timestamper interface. The BCM54210PE PHY can be found in the Raspberry PI Compute Module 4 and the work has been undertaken by Timebeat.app on behalf of Raspberry PI with help and support from the nice engineers at Broadcom.

Hi Lasse

There are a few process issues you should address.

Please wrap your email at about 80 characters.

Please take a read of

https://www.kernel.org/doc/html/latest/process/submitting-patches.html

and

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html#netdev-faq

It is a bit of a learning curve getting patches accepted, and you have
to follow the processes defined in these documents.

>  arch/arm/configs/bcm2711_defconfig            |    1 +
>  arch/arm64/configs/bcm2711_defconfig          |    1 +

You will need to split these changes up. defconfg changes go via the
Broadcom maintainers. PHY driver changes go via netdev. You can
initially post them as a series, but in the end you might need to send
them to different people/lists.

> +obj-$(CONFIG_BCM54120PE_PHY)	+= bcm54210pe_ptp.o

How specific is this code to the bcm54210pe? Should it work for any
bcm54xxx PHY? You might want to name this file broadcom_ptp.c if it
will work with any PHY supported by broadcom.c.

> +static bool bcm54210pe_fetch_timestamp(u8 txrx, u8 message_type, u16 seq_id, struct bcm54210pe_private *private, u64 *timestamp)
> +{
> +	struct bcm54210pe_circular_buffer_item *item; 
> +	struct list_head *this, *next;
> +
> +	u8 index = (txrx * 4) + message_type;
> +
> +	if(index >= CIRCULAR_BUFFER_COUNT)
> +	{
> +		return false;
> +	}

Please run your code through ./scripts/checkpatch.pl. You will find
the code has a number of code style issues which need cleaning up.

> +#if IS_ENABLED (CONFIG_BCM54120PE_PHY)
> +{
> +	.phy_id		= PHY_ID_BCM54213PE,
> +	.phy_id_mask	= 0xffffffff,
> +        .name           = "Broadcom BCM54210PE",
> +        /* PHY_GBIT_FEATURES */
> +        .config_init    = bcm54xx_config_init,
> +        .ack_interrupt  = bcm_phy_ack_intr,
> +        .config_intr    = bcm_phy_config_intr,
> +	.probe		= bcm54210pe_probe,
> +#elif
> +{ 
>  	.phy_id		= PHY_ID_BCM54213PE,
>  	.phy_id_mask	= 0xffffffff,
>  	.name		= "Broadcom BCM54213PE",
> @@ -786,6 +804,7 @@ static struct phy_driver broadcom_drivers[] = {
>  	.config_init	= bcm54xx_config_init,
>  	.ack_interrupt	= bcm_phy_ack_intr,
>  	.config_intr	= bcm_phy_config_intr,
> +#endif

Don't replace the existing entry, extend it with your new
functionality.

	Andrew

^ permalink raw reply

* Re: [PATCH net-next 08/12] net: dsa: rzn1-a5psw: add FDB support
From: Vladimir Oltean @ 2022-04-20 19:52 UTC (permalink / raw)
  To: Clément Léger
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S . Miller,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Geert Uytterhoeven, Magnus Damm, Heiner Kallweit, Russell King,
	Thomas Petazzoni, Herve Codina, Miquèl Raynal,
	Milan Stevanovic, Jimmy Lalande, linux-kernel, devicetree,
	linux-renesas-soc, netdev
In-Reply-To: <20220420101648.7aa973b2@fixe.home>

On Wed, Apr 20, 2022 at 10:16:48AM +0200, Clément Léger wrote:
> Le Thu, 14 Apr 2022 20:51:40 +0300,
> Vladimir Oltean <olteanv@gmail.com> a écrit :
> 
> > > +
> > > +static int a5psw_port_fdb_add(struct dsa_switch *ds, int port,
> > > +			      const unsigned char *addr, u16 vid,
> > > +			      struct dsa_db db)  
> > 
> > This isn't something that is documented because I haven't had time to
> > update that, but new drivers should comply to the requirements for FDB
> > isolation (not ignore the passed "db" here) and eventually set
> > ds->fdb_isolation = true. Doing so would allow your switch to behave
> > correctly when
> > - there is more than one bridge spanning its ports,
> > - some ports are standalone and some ports are bridged
> > - standalone ports are looped back via an external cable with bridged
> >   ports
> > - unrecognized upper interfaces (bond, team) are used, and those are
> >   bridged directly with some other switch ports
> > 
> > The most basic thing you need to do to satisfy the requirements is to
> > figure out what mechanism for FDB partitioning does your hardware have.
> > If the answer is "none", then we'll have to use VLANs for that: all
> > standalone ports to share a VLAN, each VLAN-unaware bridge to share a
> > VLAN across all member ports, each VLAN of a VLAN-aware bridge to
> > reserve its own VLAN. Up to a total of 32 VLANs, since I notice that's
> > what the limit for your hardware is.
> 
> Ok, I see the idea. In the mean time, could we make a first step with a
> single bridge and without VLAN support ? This is expected to come later
> anyway.
> 
> > 
> > But I see this patch set doesn't include VLAN functionality (and also
> > ignores the "vid" from FDB entries), so I can't really say more right now.
> > But if you could provide more information about the hardware
> > capabilities we can discuss implementation options.
> 
> That's indeed the problem. The FDB table does not seems to have
> partitionning at all (except for ports) and entries (such as seen below)
> do not contain any VLAN information.
> 
> > > diff --git a/drivers/net/dsa/rzn1_a5psw.h b/drivers/net/dsa/rzn1_a5psw.h
> > > index b34ea549e936..37aa89383e70 100644
> > > --- a/drivers/net/dsa/rzn1_a5psw.h
> > > +++ b/drivers/net/dsa/rzn1_a5psw.h
> > > @@ -167,6 +167,22 @@
> > >  #define A5PSW_CTRL_TIMEOUT		1000
> > >  #define A5PSW_TABLE_ENTRIES		8192
> > >  
> > > +struct fdb_entry {  
> > 
> > Shouldn't this contain something along the lines of a VID, FID, something?
> 
> This is extracted directly from the datasheet [1]. The switch FDB table
> does not seems to store the VID with the entries (See page 300).
> 
> [1]
> https://www.renesas.com/us/en/document/mah/rzn1d-group-rzn1s-group-rzn1l-group-users-manual-r-engine-and-ethernet-peripherals

Thanks for the link. I see that the switch has a non-partitionable
lookup table, not even by VLAN. A shame.

This is also in contrast with the software bridge driver, where FDB and
MDB entries can have independent destinations per VID.

So there's nothing you can do beyond limiting to a single offloaded
bridge and hoping for the best w.r.t. per-VLAN forwarding destinations.

Note that if you limit to a single bridge does not mean that you can
declare ds->fdb_isolation = true. Declaring that would opt you into
unicast and multicast filtering towards the CPU, i.o.w. a method for
software to only receive the addresses it has expressed an interest in,
rather than all packets received on standalone ports. The way that is
implemented in DSA is by adding FDB and MDB entries on the management
port, and it would break a lot of things without a partitioning scheme
for the lookup table.

^ permalink raw reply

* Re: [PATCH v3 2/2] net: sysctl: introduce sysctl SYSCTL_THREE
From: Luis Chamberlain @ 2022-04-20 19:56 UTC (permalink / raw)
  To: Tonghao Zhang
  Cc: Jakub Kicinski, Kees Cook, Iurii Zaikin, David S. Miller,
	Hideaki YOSHIFUJI, David Ahern, Simon Horman, Julian Anastasov,
	Pablo Neira Ayuso, Jozsef Kadlecsik,
	Linux Kernel Network Developers, Florian Westphal, Dmitry Vyukov,
	Alexei Starovoitov, Eric Dumazet, Marc Kleine-Budde, Lorenz Bauer,
	Akhmat Karakotov
In-Reply-To: <CAMDZJNXb8bwXbbSoum3488c+zYA_4Ow3o5dXiWvquywnBtNg=A@mail.gmail.com>

On Wed, Apr 20, 2022 at 08:43:14PM +0800, Tonghao Zhang wrote:
> On Sat, Apr 16, 2022 at 12:40 AM <xiangxia.m.yue@gmail.com> wrote:
> >
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch introdues the SYSCTL_THREE.
> Hi Luis, Jakub
> any thoughts? I have fixed v2.
> > KUnit:
> > [00:10:14] ================ sysctl_test (10 subtests) =================
> > [00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data
> > [00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset
> > [00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero
> > [00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set
> > [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive
> > [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative
> > [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive
> > [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative
> > [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min
> > [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max
> > [00:10:14] =================== [PASSED] sysctl_test ===================
> >
> > ./run_kselftest.sh -c sysctl
> > ...
> > ok 1 selftests: sysctl: sysctl.sh
> >
> > Cc: Luis Chamberlain <mcgrof@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Iurii Zaikin <yzaikin@google.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> > Cc: David Ahern <dsahern@kernel.org>
> > Cc: Simon Horman <horms@verge.net.au>
> > Cc: Julian Anastasov <ja@ssi.bg>
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
> > Cc: Florian Westphal <fw@strlen.de>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Marc Kleine-Budde <mkl@pengutronix.de>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>

It would be good for you to also have a separate patch which extends
the selftest for sysctl which tests that each of the values always
matches, I thought we had that test already, if not one needs to be
added for this. That should be the first patch. The second one would
add this as you are here in this patch, and the last one adds the new
SYSCTL_THREE to the selftest.

Otherwise looks good to me.

Happy to route this via sysclt-next if Jacub is OK with that.

  Luis

^ permalink raw reply

* [PATCH 0/5] hv_sock: Hardening changes
From: Andrea Parri (Microsoft) @ 2022-04-20 20:07 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)

Changes since RFC[1]:

  - Massage changelogs, fix typo
  - Drop "hv_sock: Initialize send_buf in hvs_stream_enqueue()"
  - Remove style/newline change
  - Remove/"inline" hv_pkt_iter_first_raw()

Applies to v5.18-rc3.

Thanks,
  Andrea

[1] https://lkml.kernel.org/r/20220413204742.5539-1-parri.andrea@gmail.com

Andrea Parri (Microsoft) (5):
  hv_sock: Check hv_pkt_iter_first_raw()'s return value
  hv_sock: Copy packets sent by Hyper-V out of the ring buffer
  hv_sock: Add validation for untrusted Hyper-V values
  Drivers: hv: vmbus: Accept hv_sock offers in isolated guests
  Drivers: hv: vmbus: Refactor the ring-buffer iterator functions

 drivers/hv/channel_mgmt.c        |  8 ++++--
 drivers/hv/ring_buffer.c         | 32 ++++++---------------
 include/linux/hyperv.h           | 48 ++++++++++----------------------
 net/vmw_vsock/hyperv_transport.c | 22 ++++++++++++---
 4 files changed, 48 insertions(+), 62 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH 1/5] hv_sock: Check hv_pkt_iter_first_raw()'s return value
From: Andrea Parri (Microsoft) @ 2022-04-20 20:07 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220420200720.434717-1-parri.andrea@gmail.com>

The function returns NULL if the ring buffer doesn't contain enough
readable bytes to constitute a packet descriptor.  The ring buffer's
write_index is in memory which is shared with the Hyper-V host, an
erroneous or malicious host could thus change its value and overturn
the result of hvs_stream_has_data().

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 net/vmw_vsock/hyperv_transport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index e111e13b66604..943352530936e 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -603,6 +603,8 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
 
 	if (need_refill) {
 		hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan);
+		if (!hvs->recv_desc)
+			return -ENOBUFS;
 		ret = hvs_update_recv_data(hvs);
 		if (ret)
 			return ret;
-- 
2.25.1


^ permalink raw reply related

* [PATCH 2/5] hv_sock: Copy packets sent by Hyper-V out of the ring buffer
From: Andrea Parri (Microsoft) @ 2022-04-20 20:07 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220420200720.434717-1-parri.andrea@gmail.com>

Pointers to VMbus packets sent by Hyper-V are used by the hv_sock driver
within the guest VM.  Hyper-V can send packets with erroneous values or
modify packet fields after they are processed by the guest.  To defend
against these scenarios, copy the incoming packet after validating its
length and offset fields using hv_pkt_iter_{first,next}().  In this way,
the packet can no longer be modified by the host.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 net/vmw_vsock/hyperv_transport.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 943352530936e..8c37d07017fc4 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -78,6 +78,9 @@ struct hvs_send_buf {
 					 ALIGN((payload_len), 8) + \
 					 VMBUS_PKT_TRAILER_SIZE)
 
+/* Upper bound on the size of a VMbus packet for hv_sock */
+#define HVS_MAX_PKT_SIZE	HVS_PKT_LEN(HVS_MTU_SIZE)
+
 union hvs_service_id {
 	guid_t	srv_id;
 
@@ -378,6 +381,8 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
 	}
 
+	chan->max_pkt_size = HVS_MAX_PKT_SIZE;
+
 	ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
 			 conn_from_host ? new : sk);
 	if (ret != 0) {
@@ -602,7 +607,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
 		return -EOPNOTSUPP;
 
 	if (need_refill) {
-		hvs->recv_desc = hv_pkt_iter_first_raw(hvs->chan);
+		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
 		if (!hvs->recv_desc)
 			return -ENOBUFS;
 		ret = hvs_update_recv_data(hvs);
@@ -618,7 +623,7 @@ static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
 
 	hvs->recv_data_len -= to_read;
 	if (hvs->recv_data_len == 0) {
-		hvs->recv_desc = hv_pkt_iter_next_raw(hvs->chan, hvs->recv_desc);
+		hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc);
 		if (hvs->recv_desc) {
 			ret = hvs_update_recv_data(hvs);
 			if (ret)
-- 
2.25.1


^ permalink raw reply related

* [PATCH 3/5] hv_sock: Add validation for untrusted Hyper-V values
From: Andrea Parri (Microsoft) @ 2022-04-20 20:07 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220420200720.434717-1-parri.andrea@gmail.com>

For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer.  Ensure that
invalid values cannot cause data being copied out of the bounds of the
source buffer in hvs_stream_dequeue().

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 include/linux/hyperv.h           |  5 +++++
 net/vmw_vsock/hyperv_transport.c | 11 +++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index fe2e0179ed51e..55478a6810b60 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1663,6 +1663,11 @@ static inline u32 hv_pkt_datalen(const struct vmpacket_descriptor *desc)
 	return (desc->len8 << 3) - (desc->offset8 << 3);
 }
 
+/* Get packet length associated with descriptor */
+static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc)
+{
+	return desc->len8 << 3;
+}
 
 struct vmpacket_descriptor *
 hv_pkt_iter_first_raw(struct vmbus_channel *channel);
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 8c37d07017fc4..092cadc2c866d 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -577,12 +577,19 @@ static bool hvs_dgram_allow(u32 cid, u32 port)
 static int hvs_update_recv_data(struct hvsock *hvs)
 {
 	struct hvs_recv_buf *recv_buf;
-	u32 payload_len;
+	u32 pkt_len, payload_len;
+
+	pkt_len = hv_pkt_len(hvs->recv_desc);
+
+	/* Ensure the packet is big enough to read its header */
+	if (pkt_len < HVS_HEADER_LEN)
+		return -EIO;
 
 	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
 	payload_len = recv_buf->hdr.data_size;
 
-	if (payload_len > HVS_MTU_SIZE)
+	/* Ensure the packet is big enough to read its payload */
+	if (payload_len > pkt_len - HVS_HEADER_LEN || payload_len > HVS_MTU_SIZE)
 		return -EIO;
 
 	if (payload_len == 0)
-- 
2.25.1


^ permalink raw reply related

* [PATCH 4/5] Drivers: hv: vmbus: Accept hv_sock offers in isolated guests
From: Andrea Parri (Microsoft) @ 2022-04-20 20:07 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220420200720.434717-1-parri.andrea@gmail.com>

So that isolated guests can communicate with the host via hv_sock
channels.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/channel_mgmt.c | 8 ++++++--
 include/linux/hyperv.h    | 8 ++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 67be81208a2d9..d800220ee54f4 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -976,13 +976,17 @@ find_primary_channel_by_offer(const struct vmbus_channel_offer_channel *offer)
 	return channel;
 }
 
-static bool vmbus_is_valid_device(const guid_t *guid)
+static bool vmbus_is_valid_offer(const struct vmbus_channel_offer_channel *offer)
 {
+	const guid_t *guid = &offer->offer.if_type;
 	u16 i;
 
 	if (!hv_is_isolation_supported())
 		return true;
 
+	if (is_hvsock_offer(offer))
+		return true;
+
 	for (i = 0; i < ARRAY_SIZE(vmbus_devs); i++) {
 		if (guid_equal(guid, &vmbus_devs[i].guid))
 			return vmbus_devs[i].allowed_in_isolated;
@@ -1004,7 +1008,7 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 
 	trace_vmbus_onoffer(offer);
 
-	if (!vmbus_is_valid_device(&offer->offer.if_type)) {
+	if (!vmbus_is_valid_offer(offer)) {
 		pr_err_ratelimited("Invalid offer %d from the host supporting isolation\n",
 				   offer->child_relid);
 		atomic_dec(&vmbus_connection.offer_in_progress);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 55478a6810b60..1112c5cf894e6 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1044,10 +1044,14 @@ struct vmbus_channel {
 u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr);
 u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id);
 
+static inline bool is_hvsock_offer(const struct vmbus_channel_offer_channel *o)
+{
+	return !!(o->offer.chn_flags & VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
+}
+
 static inline bool is_hvsock_channel(const struct vmbus_channel *c)
 {
-	return !!(c->offermsg.offer.chn_flags &
-		  VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
+	return is_hvsock_offer(&c->offermsg);
 }
 
 static inline bool is_sub_channel(const struct vmbus_channel *c)
-- 
2.25.1


^ permalink raw reply related

* [PATCH 5/5] Drivers: hv: vmbus: Refactor the ring-buffer iterator functions
From: Andrea Parri (Microsoft) @ 2022-04-20 20:07 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Stefano Garzarella, David Miller,
	Jakub Kicinski, Paolo Abeni
  Cc: linux-hyperv, virtualization, netdev, linux-kernel,
	Andrea Parri (Microsoft)
In-Reply-To: <20220420200720.434717-1-parri.andrea@gmail.com>

With no users of hv_pkt_iter_next_raw() and no "external" users of
hv_pkt_iter_first_raw(), the iterator functions can be refactored
and simplified to remove some indirection/code.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
---
 drivers/hv/ring_buffer.c | 32 +++++++++-----------------------
 include/linux/hyperv.h   | 35 ++++-------------------------------
 2 files changed, 13 insertions(+), 54 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 3d215d9dec433..fa98b3a91206a 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -421,7 +421,7 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
 	memcpy(buffer, (const char *)desc + offset, packetlen);
 
 	/* Advance ring index to next packet descriptor */
-	__hv_pkt_iter_next(channel, desc, true);
+	__hv_pkt_iter_next(channel, desc);
 
 	/* Notify host of update */
 	hv_pkt_iter_close(channel);
@@ -456,22 +456,6 @@ static u32 hv_pkt_iter_avail(const struct hv_ring_buffer_info *rbi)
 		return (rbi->ring_datasize - priv_read_loc) + write_loc;
 }
 
-/*
- * Get first vmbus packet without copying it out of the ring buffer
- */
-struct vmpacket_descriptor *hv_pkt_iter_first_raw(struct vmbus_channel *channel)
-{
-	struct hv_ring_buffer_info *rbi = &channel->inbound;
-
-	hv_debug_delay_test(channel, MESSAGE_DELAY);
-
-	if (hv_pkt_iter_avail(rbi) < sizeof(struct vmpacket_descriptor))
-		return NULL;
-
-	return (struct vmpacket_descriptor *)(hv_get_ring_buffer(rbi) + rbi->priv_read_index);
-}
-EXPORT_SYMBOL_GPL(hv_pkt_iter_first_raw);
-
 /*
  * Get first vmbus packet from ring buffer after read_index
  *
@@ -483,11 +467,14 @@ struct vmpacket_descriptor *hv_pkt_iter_first(struct vmbus_channel *channel)
 	struct vmpacket_descriptor *desc, *desc_copy;
 	u32 bytes_avail, pkt_len, pkt_offset;
 
-	desc = hv_pkt_iter_first_raw(channel);
-	if (!desc)
+	hv_debug_delay_test(channel, MESSAGE_DELAY);
+
+	bytes_avail = hv_pkt_iter_avail(rbi);
+	if (bytes_avail < sizeof(struct vmpacket_descriptor))
 		return NULL;
+	bytes_avail = min(rbi->pkt_buffer_size, bytes_avail);
 
-	bytes_avail = min(rbi->pkt_buffer_size, hv_pkt_iter_avail(rbi));
+	desc = (struct vmpacket_descriptor *)(hv_get_ring_buffer(rbi) + rbi->priv_read_index);
 
 	/*
 	 * Ensure the compiler does not use references to incoming Hyper-V values (which
@@ -534,8 +521,7 @@ EXPORT_SYMBOL_GPL(hv_pkt_iter_first);
  */
 struct vmpacket_descriptor *
 __hv_pkt_iter_next(struct vmbus_channel *channel,
-		   const struct vmpacket_descriptor *desc,
-		   bool copy)
+		   const struct vmpacket_descriptor *desc)
 {
 	struct hv_ring_buffer_info *rbi = &channel->inbound;
 	u32 packetlen = desc->len8 << 3;
@@ -548,7 +534,7 @@ __hv_pkt_iter_next(struct vmbus_channel *channel,
 		rbi->priv_read_index -= dsize;
 
 	/* more data? */
-	return copy ? hv_pkt_iter_first(channel) : hv_pkt_iter_first_raw(channel);
+	return hv_pkt_iter_first(channel);
 }
 EXPORT_SYMBOL_GPL(__hv_pkt_iter_next);
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 1112c5cf894e6..370adc9971d3e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1673,55 +1673,28 @@ static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc)
 	return desc->len8 << 3;
 }
 
-struct vmpacket_descriptor *
-hv_pkt_iter_first_raw(struct vmbus_channel *channel);
-
 struct vmpacket_descriptor *
 hv_pkt_iter_first(struct vmbus_channel *channel);
 
 struct vmpacket_descriptor *
 __hv_pkt_iter_next(struct vmbus_channel *channel,
-		   const struct vmpacket_descriptor *pkt,
-		   bool copy);
+		   const struct vmpacket_descriptor *pkt);
 
 void hv_pkt_iter_close(struct vmbus_channel *channel);
 
 static inline struct vmpacket_descriptor *
-hv_pkt_iter_next_pkt(struct vmbus_channel *channel,
-		     const struct vmpacket_descriptor *pkt,
-		     bool copy)
+hv_pkt_iter_next(struct vmbus_channel *channel,
+		 const struct vmpacket_descriptor *pkt)
 {
 	struct vmpacket_descriptor *nxt;
 
-	nxt = __hv_pkt_iter_next(channel, pkt, copy);
+	nxt = __hv_pkt_iter_next(channel, pkt);
 	if (!nxt)
 		hv_pkt_iter_close(channel);
 
 	return nxt;
 }
 
-/*
- * Get next packet descriptor without copying it out of the ring buffer
- * If at end of list, return NULL and update host.
- */
-static inline struct vmpacket_descriptor *
-hv_pkt_iter_next_raw(struct vmbus_channel *channel,
-		     const struct vmpacket_descriptor *pkt)
-{
-	return hv_pkt_iter_next_pkt(channel, pkt, false);
-}
-
-/*
- * Get next packet descriptor from iterator
- * If at end of list, return NULL and update host.
- */
-static inline struct vmpacket_descriptor *
-hv_pkt_iter_next(struct vmbus_channel *channel,
-		 const struct vmpacket_descriptor *pkt)
-{
-	return hv_pkt_iter_next_pkt(channel, pkt, true);
-}
-
 #define foreach_vmbus_pkt(pkt, channel) \
 	for (pkt = hv_pkt_iter_first(channel); pkt; \
 	    pkt = hv_pkt_iter_next(channel, pkt))
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net-next 03/12] dt-bindings: net: pcs: add bindings for Renesas RZ/N1 MII converter
From: Rob Herring @ 2022-04-20 20:11 UTC (permalink / raw)
  To: Clément Léger
  Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	David S . Miller, Jakub Kicinski, Paolo Abeni,
	Krzysztof Kozlowski, Geert Uytterhoeven, Magnus Damm,
	Heiner Kallweit, Russell King, Thomas Petazzoni, Herve Codina,
	Miquèl Raynal, Milan Stevanovic, Jimmy Lalande, linux-kernel,
	devicetree, linux-renesas-soc, netdev
In-Reply-To: <20220419170044.450050ca@fixe.home>

On Tue, Apr 19, 2022 at 05:00:44PM +0200, Clément Léger wrote:
> Le Tue, 19 Apr 2022 08:43:47 -0500,
> Rob Herring <robh@kernel.org> a écrit :
> 
> > > +  clocks:
> > > +    items:
> > > +      - description: MII reference clock
> > > +      - description: RGMII reference clock
> > > +      - description: RMII reference clock
> > > +      - description: AHB clock used for the MII converter register interface
> > > +
> > > +  renesas,miic-cfg-mode:
> > > +    description: MII mux configuration mode. This value should use one of the
> > > +                 value defined in dt-bindings/net/pcs-rzn1-miic.h.  
> > 
> > Describe possible values here as constraints. At present, I don't see 
> > the point of this property if there is only 1 possible value and it is 
> > required.
> 
> The ethernet subsystem contains a number of internal muxes that allows
> to configure ethernet routing. This configuration option allows to set
> the register that configure these muxes.
> 
> After talking with Andrew, I considered moving to something like this:
> 
> eth-miic@44030000 {
>   compatible = "renesas,rzn1-miic";
> 
>   mii_conv1: mii-conv-1 {
>     renesas,miic-input = <MIIC_GMAC1_PORT>;
>     port = <1>;

'port' is already used, find another name. Maybe 'reg' here. Don't know. 
What do 1 and 2 represent?


>   };
>   mii_conv2: mii-conv-2 {
>     renesas,miic-input = <MIIC_SWITCHD_PORT>;
>     port = <2>;
>   };
>    ...
> };
> 
> Which would allow embedding the configuration inside the port
> sub-nodes. Moreover, it allows a better validation of the values using
> the schema validation directly since only a limited number of values
> are allowed for each port.
> 
> > 
> > > +    $ref: /schemas/types.yaml#/definitions/uint32
> > > +  
> > > +patternProperties:
> > > +  "^mii-conv@[0-4]$":
> > > +    type: object  
> > 
> >        additionalProperties: false
> > 
> > > +    description: MII converter port
> > > +
> > > +    properties:
> > > +      reg:
> > > +        maxItems: 1  
> > 
> > Why do you need sub-nodes? They don't have any properties. A simple mask 
> > property could tell you which ports are present/active/enabled if that's 
> > what you are tracking. Or the SoC specific compatibles you need to add 
> > can imply the ports if they are SoC specific.
> 
> The MACs are using phandles to these sub-nodes to query a specific MII
> converter port PCS:
> 
> switch@44050000 {
>     compatible = "renesas,rzn1-a5psw";
> 
>     ports {
>         port@0 {

ethernet-ports and ethernet-port so we don't collide with the graph 
binding.


>             reg = <0>;
>             label = "lan0";
>             phy-handle = <&switch0phy3>;
>             pcs-handle = <&mii_conv4>;
>         };
>     };
> };
> 
> According to Andrew, this is not a good idea to represent the PCS as a
> bus since it is indeed not a bus. I could also switch to something like
> pcs-handle = <&eth_mii 4> but i'm not sure what you'd prefer. We could
> also remove this from the device-tree and consider each driver to
> request the MII ouput to be configured using something like this for
> instance:

I'm pretty sure we already defined pcs-handle as only a phandle. If you 
want variable cells, then it's got to be extended like all the other 
properties using that pattern.

> 
>  miic_request_pcs(pcs_np, miic_port_nr, MIIC_SWITCHD_PORT);
> 
> But I'm not really fan of this because it requires the drivers to
> assume some specificities of the MII converter (port number are not in
> the same order of the switch for instance) and thus I would prefer this
> to be in the device-tree.
> 
> Let me know if you can think of something that would suit you
> better but  keep in mind that I need to correctly match a switch/MAC
> port with a PCS port and that I also need to configure MII internal
> muxes. 
> 
> For more information, you can look at section 8 of the manual at [1].

I can't really. Other people want their bindings reviewed too.

Rob

^ permalink raw reply

* Re: [PATCH net-next 09/12] ARM: dts: r9a06g032: describe MII converter
From: Rob Herring @ 2022-04-20 20:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Clément Léger, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Krzysztof Kozlowski, Geert Uytterhoeven, Magnus Damm,
	Heiner Kallweit, Russell King, Thomas Petazzoni, Herve Codina,
	Miquèl Raynal, Milan Stevanovic, Jimmy Lalande, linux-kernel,
	devicetree, linux-renesas-soc, netdev
In-Reply-To: <YlmbIjoIZ8Xb4Kh/@lunn.ch>

On Fri, Apr 15, 2022 at 06:19:46PM +0200, Andrew Lunn wrote:
> > I think it would be good to modify it like this:
> > 
> > eth-miic@44030000 {
> >     ...
> >   converters {
> >     mii_conv0: mii-conv@0 {
> >       // Even if useless, maybe keeping it for the sake of coherency
> >       renesas,miic-input = <MIIC_GMAC1>;
> >       reg = <0>;
> >     };
> 
> This is not a 'bus', so using reg, and @0, etc is i think wrong.  You
> just have a collection of properties.

'reg' can be for anything you need to address, not just buses.

Rob

^ permalink raw reply

* Re: [PATCH net v2 1/2] dt-bindings: net: dsa: realtek: cleanup compatible strings
From: Luiz Angelo Daros de Luca @ 2022-04-20 20:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: open list:NETWORKING DRIVERS, Linus Walleij, Alvin Šipraga,
	Andrew Lunn, Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	Jakub Kicinski, Paolo Abeni, Rob Herring, krzk+dt,
	Arınç ÜNAL, devicetree
In-Reply-To: <165044941250.8751.17513068846690831070.git-patchwork-notify@kernel.org>

> This series was applied to netdev/net-next.git (master)
> by David S. Miller <davem@davemloft.net>:
>
> On Mon, 18 Apr 2022 20:35:57 -0300 you wrote:
> > Compatible strings are used to help the driver find the chip ID/version
> > register for each chip family. After that, the driver can setup the
> > switch accordingly. Keep only the first supported model for each family
> > as a compatible string and reference other chip models in the
> > description.
> >
> > The removed compatible strings have never been used in a released kernel.
> >
> > [...]
>
> Here is the summary with links:
>   - [net,v2,1/2] dt-bindings: net: dsa: realtek: cleanup compatible strings
>     https://git.kernel.org/netdev/net-next/c/6f2d04ccae9b
>   - [net,v2,2/2] net: dsa: realtek: remove realtek,rtl8367s string
>     https://git.kernel.org/netdev/net-next/c/fcd30c96af95
>

Hi David,

I was expecting to get those patches merged to net as well. Otherwise,
the "realtek,rtl8367s" we are removing will get into a released
kernel.

Regards,

^ permalink raw reply

* Re: IPv6 multicast with VRF
From: David Ahern @ 2022-04-20 20:40 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: netdev
In-Reply-To: <20220420191824.wgdh5tr3mzisalsh@skbuf>

On 4/20/22 1:18 PM, Vladimir Oltean wrote:
> On Wed, Apr 20, 2022 at 12:59:45PM -0600, David Ahern wrote:
>> Did you adjust the FIB rules? See the documentation in the kernel repo.
> 
> Sorry, I don't understand what you mean by "adjusting". I tried various
> forms of adding an IPv6 multicast route on eth0, to multiple tables,
> some routes more generic and some more specific, and none seem to match
> when eth0 is under a VRF, for a reason I don't really know. This does
> not occur with IPv4 multicast, by the way.
> 
> By documentation I think you mean Documentation/networking/vrf.rst.
> I went through it but I didn't notice something that would make me
> realize what the issue is.

try this:
    https://static.sched.com/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf
slide 79 and on

> 
>> And add a device scope to the `get`. e.g.,
>>
>>     ip -6 route get ff02::1%eth0
> 
> I'm probably not understanding this, because:
> 
>  ip -6 route get ff02::1%eth0
> Error: inet6 prefix is expected rather than "ff02::1%eth0".

ip -6 ro get oif eth0 ff02::1

(too many syntax differences between tools)

^ permalink raw reply

* [PATCH 1/1] ixgbe: correct SDP0 check of SFP cage for X550
From: Jeff Daly @ 2022-04-20 20:51 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Stephen Douthit, Jesse Brandeburg, Tony Nguyen, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Don Skidmore, Jeff Kirsher,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list

SDP0 for X550 NICs is active low to indicate the presence of an SFP in the
cage (MOD_ABS#).  Invert the results of the logical AND to set
sfp_cage_full variable correctly.

Fixes: aac9e053f104 ("ixgbe: cleanup crosstalk fix")

Suggested-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 4c26c4b92f07..26d16bc85c59 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -3308,8 +3308,8 @@ s32 ixgbe_check_mac_link_generic(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
 			break;
 		case ixgbe_mac_X550EM_x:
 		case ixgbe_mac_x550em_a:
-			sfp_cage_full = IXGBE_READ_REG(hw, IXGBE_ESDP) &
-					IXGBE_ESDP_SDP0;
+			sfp_cage_full = !(IXGBE_READ_REG(hw, IXGBE_ESDP) &
+					IXGBE_ESDP_SDP0);
 			break;
 		default:
 			/* sanity check - No SFP+ devices here */
-- 
2.25.1


^ permalink raw reply related

* [PATCH net] sctp: check asoc strreset_chunk in sctp_generate_reconf_event
From: Xin Long @ 2022-04-20 20:52 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: davem, kuba, Marcelo Ricardo Leitner, Neil Horman

A null pointer reference issue can be triggered when the response of a
stream reconf request arrives after the timer is triggered, such as:

  send Incoming SSN Reset Request --->
  CPU0:
   reconf timer is triggered,
   go to the handler code before hold sk lock
                            <--- reply with Outgoing SSN Reset Request
  CPU1:
   process Outgoing SSN Reset Request,
   and set asoc->strreset_chunk to NULL
  CPU0:
   continue the handler code, hold sk lock,
   and try to hold asoc->strreset_chunk, crash!

In Ying Xu's testing, the call trace is:

  [ ] BUG: kernel NULL pointer dereference, address: 0000000000000010
  [ ] RIP: 0010:sctp_chunk_hold+0xe/0x40 [sctp]
  [ ] Call Trace:
  [ ]  <IRQ>
  [ ]  sctp_sf_send_reconf+0x2c/0x100 [sctp]
  [ ]  sctp_do_sm+0xa4/0x220 [sctp]
  [ ]  sctp_generate_reconf_event+0xbd/0xe0 [sctp]
  [ ]  call_timer_fn+0x26/0x130

This patch is to fix it by returning from the timer handler if asoc
strreset_chunk is already set to NULL.

Fixes: 7b9438de0cd4 ("sctp: add stream reconf timer")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 net/sctp/sm_sideeffect.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index b3815b568e8e..463c4a58d2c3 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -458,6 +458,10 @@ void sctp_generate_reconf_event(struct timer_list *t)
 		goto out_unlock;
 	}
 
+	/* This happens when the response arrives after the timer is triggered. */
+	if (!asoc->strreset_chunk)
+		goto out_unlock;
+
 	error = sctp_do_sm(net, SCTP_EVENT_T_TIMEOUT,
 			   SCTP_ST_TIMEOUT(SCTP_EVENT_TIMEOUT_RECONF),
 			   asoc->state, asoc->ep, asoc,
-- 
2.31.1


^ permalink raw reply related

* Ethernet TX buffer crossing 4K boundary?
From: Joakim Tjernlund @ 2022-04-20 21:09 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Eric Gratorp

We have this custom Ethernet controller that cannot DMA a buffer if the buffer crosses 4K boundary.
Any ideas how to deal with that limitation in the driver?

 Joakim


^ permalink raw reply

* Re: [PATCHv2 bpf-next 3/4] bpf: Resolve symbols with kallsyms_lookup_names for kprobe multi link
From: Andrii Nakryiko @ 2022-04-20 21:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Masami Hiramatsu, Networking, bpf, lkml, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh
In-Reply-To: <20220418124834.829064-4-jolsa@kernel.org>

On Mon, Apr 18, 2022 at 5:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Using kallsyms_lookup_names function to speed up symbols lookup in
> kprobe multi link attachment and replacing with it the current
> kprobe_multi_resolve_syms function.
>
> This speeds up bpftrace kprobe attachment:
>
>   # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* {  } i:ms:1 { exit(); }'
>   ...
>   6.5681 +- 0.0225 seconds time elapsed  ( +-  0.34% )
>
> After:
>
>   # perf stat -r 5 -e cycles ./src/bpftrace -e 'kprobe:x* {  } i:ms:1 { exit(); }'
>   ...
>   0.5661 +- 0.0275 seconds time elapsed  ( +-  4.85% )
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  kernel/trace/bpf_trace.c | 113 +++++++++++++++++++++++----------------
>  1 file changed, 67 insertions(+), 46 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index b26f3da943de..f49cdc46a21f 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2226,6 +2226,60 @@ struct bpf_kprobe_multi_run_ctx {
>         unsigned long entry_ip;
>  };
>
> +struct user_syms {
> +       const char **syms;
> +       char *buf;
> +};
> +
> +static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32 cnt)
> +{
> +       unsigned long __user usymbol;
> +       const char **syms = NULL;
> +       char *buf = NULL, *p;
> +       int err = -EFAULT;
> +       unsigned int i;
> +
> +       err = -ENOMEM;
> +       syms = kvmalloc(cnt * sizeof(*syms), GFP_KERNEL);
> +       if (!syms)
> +               goto error;
> +
> +       buf = kvmalloc(cnt * KSYM_NAME_LEN, GFP_KERNEL);
> +       if (!buf)
> +               goto error;
> +
> +       for (p = buf, i = 0; i < cnt; i++) {
> +               if (__get_user(usymbol, usyms + i)) {
> +                       err = -EFAULT;
> +                       goto error;
> +               }
> +               err = strncpy_from_user(p, (const char __user *) usymbol, KSYM_NAME_LEN);
> +               if (err == KSYM_NAME_LEN)
> +                       err = -E2BIG;
> +               if (err < 0)
> +                       goto error;
> +               syms[i] = p;
> +               p += err + 1;
> +       }
> +
> +       err = 0;
> +       us->syms = syms;
> +       us->buf = buf;

return 0 here instead of falling through into error: block?

> +
> +error:
> +       if (err) {
> +               kvfree(syms);
> +               kvfree(buf);
> +       }
> +       return err;
> +}
> +
> +static void free_user_syms(struct user_syms *us)
> +{
> +       kvfree(us->syms);
> +       kvfree(us->buf);
> +}
> +
>  static void bpf_kprobe_multi_link_release(struct bpf_link *link)
>  {
>         struct bpf_kprobe_multi_link *kmulti_link;

[...]

^ permalink raw reply

* Re: [PATCH bpf-next v4 0/3] Enlarge offset check value in bpf_skb_load_bytes
From: patchwork-bot+netdevbpf @ 2022-04-20 21:50 UTC (permalink / raw)
  To: Liu Jian
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, davem, kuba, sdf, netdev, bpf, pabeni
In-Reply-To: <20220416105801.88708-1-liujian56@huawei.com>

Hello:

This series was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Sat, 16 Apr 2022 18:57:58 +0800 you wrote:
> The data length of skb frags + frag_list may be greater than 0xffff,
> and skb_header_pointer can not handle negative offset.
> So here INT_MAX is used to check the validity of offset.
> 
> And add the test case for the change.
> 
> Liu Jian (3):
>   net: Enlarge offset check value from 0xffff to INT_MAX in
>     bpf_skb_load_bytes
>   net: change skb_ensure_writable()'s write_len param to unsigned int
>     type
>   selftests: bpf: add test for skb_load_bytes
> 
> [...]

Here is the summary with links:
  - [bpf-next,v4,1/3] net: Enlarge offset check value from 0xffff to INT_MAX in bpf_skb_load_bytes
    https://git.kernel.org/bpf/bpf-next/c/45969b4152c1
  - [bpf-next,v4,2/3] net: change skb_ensure_writable()'s write_len param to unsigned int type
    https://git.kernel.org/bpf/bpf-next/c/92ece28072f1
  - [bpf-next,v4,3/3] selftests: bpf: add test for skb_load_bytes
    https://git.kernel.org/bpf/bpf-next/c/127e7dca427b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCHv2 bpf-next 4/4] selftests/bpf: Add attach bench test
From: Andrii Nakryiko @ 2022-04-20 21:56 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Masami Hiramatsu, Networking, bpf, lkml, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh
In-Reply-To: <20220418124834.829064-5-jolsa@kernel.org>

On Mon, Apr 18, 2022 at 5:49 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding test that reads all functions from ftrace available_filter_functions
> file and attach them all through kprobe_multi API.
>
> It also prints stats info with -v option, like on my setup:
>
>   test_bench_attach: found 48712 functions
>   test_bench_attach: attached in   1.069s
>   test_bench_attach: detached in   0.373s
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  .../bpf/prog_tests/kprobe_multi_test.c        | 136 ++++++++++++++++++
>  .../selftests/bpf/progs/kprobe_multi_empty.c  |  12 ++
>  2 files changed, 148 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_empty.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
> index b9876b55fc0c..05f0fab8af89 100644
> --- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
> +++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
> @@ -2,6 +2,9 @@
>  #include <test_progs.h>
>  #include "kprobe_multi.skel.h"
>  #include "trace_helpers.h"
> +#include "kprobe_multi_empty.skel.h"
> +#include "bpf/libbpf_internal.h"
> +#include "bpf/hashmap.h"
>
>  static void kprobe_multi_test_run(struct kprobe_multi *skel, bool test_return)
>  {
> @@ -301,6 +304,137 @@ static void test_attach_api_fails(void)
>         kprobe_multi__destroy(skel);
>  }
>
> +static inline __u64 get_time_ns(void)
> +{
> +       struct timespec t;
> +
> +       clock_gettime(CLOCK_MONOTONIC, &t);
> +       return (__u64) t.tv_sec * 1000000000 + t.tv_nsec;
> +}
> +
> +static size_t symbol_hash(const void *key, void *ctx __maybe_unused)
> +{
> +       return str_hash((const char *) key);
> +}
> +
> +static bool symbol_equal(const void *key1, const void *key2, void *ctx __maybe_unused)
> +{
> +       return strcmp((const char *) key1, (const char *) key2) == 0;
> +}
> +
> +#define DEBUGFS "/sys/kernel/debug/tracing/"
> +
> +static int get_syms(char ***symsp, size_t *cntp)
> +{
> +       size_t cap = 0, cnt = 0, i;
> +       char *name, **syms = NULL;
> +       struct hashmap *map;
> +       char buf[256];
> +       FILE *f;
> +       int err;
> +
> +       /*
> +        * The available_filter_functions contains many duplicates,
> +        * but other than that all symbols are usable in kprobe multi
> +        * interface.
> +        * Filtering out duplicates by using hashmap__add, which won't
> +        * add existing entry.
> +        */
> +       f = fopen(DEBUGFS "available_filter_functions", "r");

nit: DEBUGFS "constant" just makes it harder to follow the code and
doesn't add anything, please just use the full path here directly

> +       if (!f)
> +               return -EINVAL;
> +
> +       map = hashmap__new(symbol_hash, symbol_equal, NULL);
> +       err = libbpf_get_error(map);

hashmap__new() is an internal API, so please use IS_ERR() directly
here. libbpf_get_error() should be used for public libbpf APIs, and
preferably not in libbpf 1.0 mode

> +       if (err)
> +               goto error;
> +
> +       while (fgets(buf, sizeof(buf), f)) {
> +               /* skip modules */
> +               if (strchr(buf, '['))
> +                       continue;

[...]

> +       attach_delta = (attach_end_ns - attach_start_ns) / 1000000000.0;
> +       detach_delta = (detach_end_ns - detach_start_ns) / 1000000000.0;
> +
> +       fprintf(stderr, "%s: found %lu functions\n", __func__, cnt);
> +       fprintf(stderr, "%s: attached in %7.3lfs\n", __func__, attach_delta);
> +       fprintf(stderr, "%s: detached in %7.3lfs\n", __func__, detach_delta);


why stderr? just do printf() and let test_progs handle output


> +
> +cleanup:
> +       kprobe_multi_empty__destroy(skel);
> +       if (syms) {
> +               for (i = 0; i < cnt; i++)
> +                       free(syms[i]);
> +               free(syms);
> +       }
> +}
> +
>  void test_kprobe_multi_test(void)
>  {
>         if (!ASSERT_OK(load_kallsyms(), "load_kallsyms"))
> @@ -320,4 +454,6 @@ void test_kprobe_multi_test(void)
>                 test_attach_api_syms();
>         if (test__start_subtest("attach_api_fails"))
>                 test_attach_api_fails();
> +       if (test__start_subtest("bench_attach"))
> +               test_bench_attach();
>  }
> diff --git a/tools/testing/selftests/bpf/progs/kprobe_multi_empty.c b/tools/testing/selftests/bpf/progs/kprobe_multi_empty.c
> new file mode 100644
> index 000000000000..be9e3d891d46
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kprobe_multi_empty.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +SEC("kprobe.multi/*")

use SEC("kprobe.multi") to make it clear that we are attaching it "manually"?

> +int test_kprobe_empty(struct pt_regs *ctx)
> +{
> +       return 0;
> +}
> --
> 2.35.1
>

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: use bpf_prog_run_array_cg_flags everywhere
From: Andrii Nakryiko @ 2022-04-20 22:04 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
In-Reply-To: <20220419222259.287515-1-sdf@google.com>

On Tue, Apr 19, 2022 at 3:23 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Rename bpf_prog_run_array_cg_flags to bpf_prog_run_array_cg and
> use it everywhere. check_return_code already enforces sane
> return ranges for all cgroup types. (only egress and bind hooks have
> uncanonical return ranges, the rest is using [0, 1])
>
> No functional changes.
>
> Suggested-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/bpf-cgroup.h |  8 ++---
>  kernel/bpf/cgroup.c        | 70 ++++++++++++--------------------------
>  2 files changed, 24 insertions(+), 54 deletions(-)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 88a51b242adc..669d96d074ad 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -225,24 +225,20 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
>
>  #define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype)                                      \
>  ({                                                                            \
> -       u32 __unused_flags;                                                    \
>         int __ret = 0;                                                         \
>         if (cgroup_bpf_enabled(atype))                                         \
>                 __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
> -                                                         NULL,                \
> -                                                         &__unused_flags);    \
> +                                                         NULL, NULL);         \
>         __ret;                                                                 \
>  })
>
>  #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx)                  \
>  ({                                                                            \
> -       u32 __unused_flags;                                                    \
>         int __ret = 0;                                                         \
>         if (cgroup_bpf_enabled(atype))  {                                      \
>                 lock_sock(sk);                                                 \
>                 __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype,     \
> -                                                         t_ctx,               \
> -                                                         &__unused_flags);    \
> +                                                         t_ctx, NULL);        \
>                 release_sock(sk);                                              \
>         }                                                                      \
>         __ret;                                                                 \
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 0cb6211fcb58..f61eca32c747 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -25,50 +25,18 @@ EXPORT_SYMBOL(cgroup_bpf_enabled_key);
>  /* __always_inline is necessary to prevent indirect call through run_prog
>   * function pointer.
>   */
> -static __always_inline int
> -bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> -                           enum cgroup_bpf_attach_type atype,
> -                           const void *ctx, bpf_prog_run_fn run_prog,
> -                           int retval, u32 *ret_flags)
> -{
> -       const struct bpf_prog_array_item *item;
> -       const struct bpf_prog *prog;
> -       const struct bpf_prog_array *array;
> -       struct bpf_run_ctx *old_run_ctx;
> -       struct bpf_cg_run_ctx run_ctx;
> -       u32 func_ret;
> -
> -       run_ctx.retval = retval;
> -       migrate_disable();
> -       rcu_read_lock();
> -       array = rcu_dereference(cgrp->effective[atype]);
> -       item = &array->items[0];
> -       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> -       while ((prog = READ_ONCE(item->prog))) {
> -               run_ctx.prog_item = item;
> -               func_ret = run_prog(prog, ctx);
> -               if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval))
> -                       run_ctx.retval = -EPERM;
> -               *(ret_flags) |= (func_ret >> 1);
> -               item++;
> -       }
> -       bpf_reset_run_ctx(old_run_ctx);
> -       rcu_read_unlock();
> -       migrate_enable();
> -       return run_ctx.retval;
> -}
> -
>  static __always_inline int
>  bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
>                       enum cgroup_bpf_attach_type atype,
>                       const void *ctx, bpf_prog_run_fn run_prog,
> -                     int retval)
> +                     int retval, u32 *ret_flags)
>  {
>         const struct bpf_prog_array_item *item;
>         const struct bpf_prog *prog;
>         const struct bpf_prog_array *array;
>         struct bpf_run_ctx *old_run_ctx;
>         struct bpf_cg_run_ctx run_ctx;
> +       u32 func_ret;
>
>         run_ctx.retval = retval;
>         migrate_disable();
> @@ -78,8 +46,11 @@ bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
>         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>         while ((prog = READ_ONCE(item->prog))) {
>                 run_ctx.prog_item = item;
> -               if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval))
> +               func_ret = run_prog(prog, ctx);
> +               if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval))

to be completely true to previous behavior, shouldn't there be

if (ret_flags)
    func_ret &= 1;
if (!func_ret && !IS_ERR_VALUE(...))

here?

This might have been discussed previously and I missed it. If that's
so, please ignore.


>                         run_ctx.retval = -EPERM;
> +               if (ret_flags)
> +                       *(ret_flags) |= (func_ret >> 1);
>                 item++;
>         }
>         bpf_reset_run_ctx(old_run_ctx);

[..]

^ permalink raw reply

* Re: [PATCH bpf-next] samples/bpf: reduce the sampling interval in xdp1_user
From: patchwork-bot+netdevbpf @ 2022-04-20 22:10 UTC (permalink / raw)
  To: Zhengchao Shao
  Cc: bpf, netdev, linux-kernel, ast, daniel, davem, kuba, hawk,
	john.fastabend, andrii, kafai, songliubraving, yhs, kpsingh,
	weiyongjun1, yuehaibing
In-Reply-To: <20220419114746.291613-1-shaozhengchao@huawei.com>

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Tue, 19 Apr 2022 19:47:46 +0800 you wrote:
> If interval is 2, and sum - prev[key] = 1, the result = 0. This will
> mislead the tester that the port has no traffic right now. So reduce the
> sampling interval to 1.
> 
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>  samples/bpf/xdp1_user.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Here is the summary with links:
  - [bpf-next] samples/bpf: reduce the sampling interval in xdp1_user
    https://git.kernel.org/bpf/bpf-next/c/db69264f983a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox