Netdev List
 help / color / mirror / Atom feed
* Re: [bpf-next] selftests/bpf: fix race in test_tcp_rtt test
From: Stanislav Fomichev @ 2019-08-16 16:13 UTC (permalink / raw)
  To: Petar Penkov; +Cc: netdev, bpf, davem, ast, daniel, sdf, Petar Penkov
In-Reply-To: <20190816160339.249832-1-ppenkov.kernel@gmail.com>

On 08/16, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> There is a race in this test between receiving the ACK for the
> single-byte packet sent in the test, and reading the values from the
> map.
> 
> This patch fixes this by having the client wait until there are no more
> unacknowledged packets.
> 
> Before:
> for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> < trimmed error messages >
> 993
> 
> After:
> for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> 10000
> 
> Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> ---
>  tools/testing/selftests/bpf/test_tcp_rtt.c | 31 ++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
> index 90c3862f74a8..2b4754473956 100644
> --- a/tools/testing/selftests/bpf/test_tcp_rtt.c
> +++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
> @@ -6,6 +6,7 @@
>  #include <sys/types.h>
>  #include <sys/socket.h>
>  #include <netinet/in.h>
> +#include <netinet/tcp.h>
>  #include <pthread.h>
>  
>  #include <linux/filter.h>
> @@ -34,6 +35,30 @@ static void send_byte(int fd)
>  		error(1, errno, "Failed to send single byte");
>  }
>  
> +static int wait_for_ack(int fd, int retries)
> +{
> +	struct tcp_info info;
> +	socklen_t optlen;
> +	int i, err;
> +
> +	for (i = 0; i < retries; i++) {
> +		optlen = sizeof(info);
> +		err = getsockopt(fd, SOL_TCP, TCP_INFO, &info, &optlen);
> +		if (err < 0) {
> +			log_err("Failed to lookup TCP stats");
> +			return err;
> +		}
> +
> +		if (info.tcpi_unacked == 0)
> +			return 0;
> +
> +		sleep(1);
Isn't it too big of a hammer? Maybe usleep(10) here and do x100 retries
instead?

> +	}
> +
> +	log_err("Did not receive ACK");
> +	return -1;
> +}
> +
>  static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
>  		     __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
>  		     __u32 icsk_retransmits)
> @@ -149,6 +174,11 @@ static int run_test(int cgroup_fd, int server_fd)
>  			 /*icsk_retransmits=*/0);
>  
>  	send_byte(client_fd);
> +	if (wait_for_ack(client_fd, 5) < 0) {
> +		err = -1;
> +		goto close_client_fd;
> +	}
> +
>  
>  	err += verify_sk(map_fd, client_fd, "first payload byte",
>  			 /*invoked=*/2,
> @@ -157,6 +187,7 @@ static int run_test(int cgroup_fd, int server_fd)
>  			 /*delivered_ce=*/0,
>  			 /*icsk_retransmits=*/0);
>  
> +close_client_fd:
>  	close(client_fd);
>  
>  close_bpf_object:
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
> 

^ permalink raw reply

* RE: [PATCH net-next, 2/6] PCI: hv: Add a Hyper-V PCI mini driver for software backchannel interface
From: Vitaly Kuznetsov @ 2019-08-16 16:16 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: KY Srinivasan, Stephen Hemminger, linux-kernel@vger.kernel.org,
	sashal@kernel.org, davem@davemloft.net, saeedm@mellanox.com,
	leon@kernel.org, eranbe@mellanox.com, lorenzo.pieralisi@arm.com,
	bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <DM6PR21MB13375FA0BA0220A91EF448E1CAAF0@DM6PR21MB1337.namprd21.prod.outlook.com>

Haiyang Zhang <haiyangz@microsoft.com> writes:

>
> The pci_hyperv can only be loaded on VMs on Hyper-V and Azure. Other 
> drivers like MLX5e will have symbolic dependency of pci_hyperv if they 
> use functions exported by pci_hyperv. This dependency will cause other 
> drivers fail to load on other platforms, like VMs on KVM. So we created 
> this mini driver, which can be loaded on any platforms to provide the 
> symbolic dependency.

(/me wondering is there a nicer way around this, by using __weak or
something like that...)

In case this stub is the best solution I'd suggest to rename it to
something like PCI_HYPERV_INTERFACE to make it clear it is not a
separate driver (_MINI makes me think so).

-- 
Vitaly

^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Stanislav Fomichev @ 2019-08-16 16:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Toshiaki Makita, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, David S. Miller,
	Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, netdev, bpf, William Tu
In-Reply-To: <20190816155911.GP2820@mini-arch>

On 08/16, Stanislav Fomichev wrote:
> On 08/15, Jakub Kicinski wrote:
> > On Thu, 15 Aug 2019 08:21:00 -0700, Stanislav Fomichev wrote:
> > > On 08/15, Toshiaki Makita wrote:
> > > > On 2019/08/15 2:07, Stanislav Fomichev wrote:  
> > > > > On 08/13, Toshiaki Makita wrote:  
> > > > > > * Implementation
> > > > > > 
> > > > > > xdp_flow makes use of UMH to load an eBPF program for XDP, similar to
> > > > > > bpfilter. The difference is that xdp_flow does not generate the eBPF
> > > > > > program dynamically but a prebuilt program is embedded in UMH. This is
> > > > > > mainly because flow insertion is considerably frequent. If we generate
> > > > > > and load an eBPF program on each insertion of a flow, the latency of the
> > > > > > first packet of ping in above test will incease, which I want to avoid.  
> > > > > Can this be instead implemented with a new hook that will be called
> > > > > for TC events? This hook can write to perf event buffer and control
> > > > > plane will insert/remove/modify flow tables in the BPF maps (contol
> > > > > plane will also install xdp program).
> > > > > 
> > > > > Why do we need UMH? What am I missing?  
> > > > 
> > > > So you suggest doing everything in xdp_flow kmod?  
> > > You probably don't even need xdp_flow kmod. Add new tc "offload" mode
> > > (bypass) that dumps every command via netlink (or calls the BPF hook
> > > where you can dump it into perf event buffer) and then read that info
> > > from userspace and install xdp programs and modify flow tables.
> > > I don't think you need any kernel changes besides that stream
> > > of data from the kernel about qdisc/tc flow creation/removal/etc.
> > 
> > There's a certain allure in bringing the in-kernel BPF translation
> > infrastructure forward. OTOH from system architecture perspective IMHO
> > it does seem like a task best handed in user space. bpfilter can replace
> > iptables completely, here we're looking at an acceleration relatively
> > loosely coupled with flower.
> Even for bpfilter I would've solved it using something similar:
> iptables bypass + redirect iptables netlink requests to some
> userspace helper that was registered to be iptables compatibility
> manager. And then, again, it becomes a purely userspace problem.
Oh, wait, isn't iptables kernel api is setsockopt/getsockopt?
With the new cgroup hooks you can now try to do bpfilter completely
in BPF 🤯

> The issue with UMH is that the helper has to be statically compiled
> from the kernel tree, which means we can't bring in any dependencies
> (stuff like libkefir you mentioned below).
> 
> But I digress :-)
> 
> > FWIW Quentin spent some time working on a universal flow rule to BPF
> > translation library:
> > 
> > https://github.com/Netronome/libkefir
> > 
> > A lot remains to be done there, but flower front end is one of the
> > targets. A library can be tuned for any application, without a
> > dependency on flower uAPI.
> > 
> > > But, I haven't looked at the series deeply, so I might be missing
> > > something :-)
> > 
> > I don't think you are :)

^ permalink raw reply

* Re: [PATCH RFC net-next 3/3] net: dsa: mv88e6xxx: setup SERDES irq also for CPU/DSA ports
From: Vivien Didelot @ 2019-08-16 16:25 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Vladimir Oltean, Florian Fainelli,
	Marek Behún
In-Reply-To: <20190816150834.26939-4-marek.behun@nic.cz>

On Fri, 16 Aug 2019 17:08:34 +0200, Marek Behún <marek.behun@nic.cz> wrote:
> @@ -2151,16 +2151,6 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
>  	if (err)
>  		return err;
>  
> -	/* Enable the SERDES interface for DSA and CPU ports. Normal
> -	 * ports SERDES are enabled when the port is enabled, thus
> -	 * saving a bit of power.
> -	 */
> -	if ((dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))) {
> -		err = mv88e6xxx_serdes_power(chip, port, true);
> -		if (err)
> -			return err;
> -	}
> -
>  	/* Port Control 2: don't force a good FCS, set the maximum frame size to
>  	 * 10240 bytes, disable 802.1q tags checking, don't discard tagged or
>  	 * untagged frames on this port, do a destination address lookup on all
> @@ -2557,6 +2547,48 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
>  	return err;
>  }
>  
> +static int mv88e6xxx_port_setup(struct dsa_switch *ds, int port)
> +{
> +	struct mv88e6xxx_chip *chip = ds->priv;
> +	int err;
> +
> +	/* Enable the SERDES interface for DSA and CPU ports. Normal
> +	 * ports SERDES are enabled when the port is enabled, thus
> +	 * saving a bit of power.
> +	 */
> +	if ((dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))) {
> +		mv88e6xxx_reg_lock(chip);
> +
> +		err = mv88e6xxx_serdes_power(chip, port, true);
> +
> +		if (!err && chip->info->ops->serdes_irq_setup)
> +			err = chip->info->ops->serdes_irq_setup(chip, port);
> +
> +		mv88e6xxx_reg_unlock(chip);
> +
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static void mv88e6xxx_port_teardown(struct dsa_switch *ds, int port)
> +{
> +	struct mv88e6xxx_chip *chip = ds->priv;
> +
> +	if ((dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))) {
> +		mv88e6xxx_reg_lock(chip);
> +
> +		if (chip->info->ops->serdes_irq_free)
> +			chip->info->ops->serdes_irq_free(chip, port);
> +
> +		if (mv88e6xxx_serdes_power(chip, port, false))
> +			dev_err(chip->dev, "failed to power off SERDES\n");
> +
> +		mv88e6xxx_reg_unlock(chip);
> +	}
> +}

So now we have mv88e6xxx_setup_port() and mv88e6xxx_port_setup(), which both
setup a port, differently, at different time. This is definitely error prone.

^ permalink raw reply

* regression in ath10k dma allocation
From: Tobias Klausmann @ 2019-08-16 16:08 UTC (permalink / raw)
  To: kvalo, davem, ath10k, linux-wireless, netdev, linux-kernel,
	nicoleotsuka, hch, m.szyprowski, robin.murphy, iommu
  Cc: tobias.klausmann

Hello all,

within the current development cycle i noted the ath10k driver failing 
to setup:

[    3.185660] ath10k_pci 0000:02:00.0: failed to alloc CE dest ring 1: -12
[    3.185664] ath10k_pci 0000:02:00.0: failed to allocate copy engine 
pipe 1: -12
[    3.185667] ath10k_pci 0000:02:00.0: failed to allocate copy engine 
pipes: -12
[    3.185669] ath10k_pci 0000:02:00.0: failed to setup resource: -12
[    3.185692] ath10k_pci: probe of 0000:02:00.0 failed with error -12

the actual failure comes from [1] and indeed bisecting brought me to a 
related commit "dma-contiguous: add dma_{alloc,free}_contiguous() 
helpers" [2]. Reverting the commit fixes the problem, yet this might 
just be the driver abusing the dma infrastructure, so hopefully someone 
can have a look at it, as i'm not familiar with the code!


[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/ath/ath10k/ce.c?h=v5.3-rc4#n1650

[2]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b1d2dc009dece4cd7e629419b52266ba51960a6b


Greetings,

Tobias


^ permalink raw reply

* [PATCH net-next 0/3] Improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec
From: Hubert Feurstein @ 2019-08-16 16:31 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Vivien Didelot,
	Florian Fainelli, Heiner Kallweit, Vladimir Oltean, Fugang Duan,
	David S. Miller

With this patchset the phc2sys synchronisation precision improved to +/-555ns on an IMX6DL with an MV88E6220 switch attached.

This patchset takes into account the comments from the following discussions:
- https://lkml.org/lkml/2019/8/2/1364
- https://lkml.org/lkml/2019/8/5/169

Patch 01 adds the required infrastructure in the MDIO layer.
Patch 02 adds support for the PTP_SYS_OFFSET_EXTENDED ioctl in the mv88e6xxx driver.
Patch 03 adds support for the PTP system timestamping in the imx-fec driver.

The following tests show the improvement caused by each patch. The system clock precision was set to 15ns instead of 333ns (as described in https://lkml.org/lkml/2019/8/2/1364).

Without this patchset applied, the phc2sys synchronisation performance was very poor:

  offset: min -27120 max 28840 mean 2.44 stddev 8040.78 count 1236
  delay:  min 282103 max 386385 mean 352568.03 stddev 27814.27 count 1236
  (test runtime 20 minutes)

Results after appling patch 01 and 02:

  offset: min -12316 max 13314 mean -9.38 stddev 4274.82 count 1022
  delay:  min 69977 max 96266 mean 87939.04 stddev 6466.17 count 1022
  (test runtime 16 minutes)

Results after appling patch 03:

  offset: min -788 max 528 mean -0.06 stddev 185.02 count 7171
  delay:  min 1773 max 2031 mean 1909.43 stddev 33.74 count 7171
  (test runtime 119 minutes)

Hubert Feurstein (3):
  net: mdio: add support for passing a PTP system timestamp to the
    mii_bus driver
  net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
  net: fec: add support for PTP system timestamping for MDIO devices

 drivers/net/dsa/mv88e6xxx/chip.h          |   2 +
 drivers/net/dsa/mv88e6xxx/ptp.c           |  11 ++-
 drivers/net/dsa/mv88e6xxx/smi.c           |   3 +-
 drivers/net/ethernet/freescale/fec_main.c |   3 +
 drivers/net/phy/mdio_bus.c                | 105 ++++++++++++++++++++++
 include/linux/mdio.h                      |   7 ++
 include/linux/phy.h                       |  25 ++++++
 7 files changed, 151 insertions(+), 5 deletions(-)

-- 
2.22.1


^ permalink raw reply

* [PATCH net-next 2/3] net: dsa: mv88e6xxx: extend PTP gettime function to read system clock
From: Hubert Feurstein @ 2019-08-16 16:31 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, David S. Miller
In-Reply-To: <20190816163157.25314-1-h.feurstein@gmail.com>

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.h |  2 ++
 drivers/net/dsa/mv88e6xxx/ptp.c  | 11 +++++++----
 drivers/net/dsa/mv88e6xxx/smi.c  |  3 ++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 01963ee94c50..9e14dc406415 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -277,6 +277,8 @@ struct mv88e6xxx_chip {
 	struct ptp_clock_info	ptp_clock_info;
 	struct delayed_work	tai_event_work;
 	struct ptp_pin_desc	pin_config[MV88E6XXX_MAX_GPIO];
+	struct ptp_system_timestamp *ptp_sts;
+
 	u16 trig_config;
 	u16 evcap_config;
 	u16 enable_count;
diff --git a/drivers/net/dsa/mv88e6xxx/ptp.c b/drivers/net/dsa/mv88e6xxx/ptp.c
index 073cbd0bb91b..cf6e52ee9e0a 100644
--- a/drivers/net/dsa/mv88e6xxx/ptp.c
+++ b/drivers/net/dsa/mv88e6xxx/ptp.c
@@ -235,14 +235,17 @@ static int mv88e6xxx_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 	return 0;
 }
 
-static int mv88e6xxx_ptp_gettime(struct ptp_clock_info *ptp,
-				 struct timespec64 *ts)
+static int mv88e6xxx_ptp_gettimex(struct ptp_clock_info *ptp,
+				  struct timespec64 *ts,
+				  struct ptp_system_timestamp *sts)
 {
 	struct mv88e6xxx_chip *chip = ptp_to_chip(ptp);
 	u64 ns;
 
 	mv88e6xxx_reg_lock(chip);
+	chip->ptp_sts = sts;
 	ns = timecounter_read(&chip->tstamp_tc);
+	chip->ptp_sts = NULL;
 	mv88e6xxx_reg_unlock(chip);
 
 	*ts = ns_to_timespec64(ns);
@@ -426,7 +429,7 @@ static void mv88e6xxx_ptp_overflow_check(struct work_struct *work)
 	struct mv88e6xxx_chip *chip = dw_overflow_to_chip(dw);
 	struct timespec64 ts;
 
-	mv88e6xxx_ptp_gettime(&chip->ptp_clock_info, &ts);
+	mv88e6xxx_ptp_gettimex(&chip->ptp_clock_info, &ts, NULL);
 
 	schedule_delayed_work(&chip->overflow_work,
 			      MV88E6XXX_TAI_OVERFLOW_PERIOD);
@@ -472,7 +475,7 @@ int mv88e6xxx_ptp_setup(struct mv88e6xxx_chip *chip)
 	chip->ptp_clock_info.max_adj    = MV88E6XXX_MAX_ADJ_PPB;
 	chip->ptp_clock_info.adjfine	= mv88e6xxx_ptp_adjfine;
 	chip->ptp_clock_info.adjtime	= mv88e6xxx_ptp_adjtime;
-	chip->ptp_clock_info.gettime64	= mv88e6xxx_ptp_gettime;
+	chip->ptp_clock_info.gettimex64	= mv88e6xxx_ptp_gettimex;
 	chip->ptp_clock_info.settime64	= mv88e6xxx_ptp_settime;
 	chip->ptp_clock_info.enable	= ptp_ops->ptp_enable;
 	chip->ptp_clock_info.verify	= ptp_ops->ptp_verify;
diff --git a/drivers/net/dsa/mv88e6xxx/smi.c b/drivers/net/dsa/mv88e6xxx/smi.c
index 5fc78a063843..e3b0096a9d94 100644
--- a/drivers/net/dsa/mv88e6xxx/smi.c
+++ b/drivers/net/dsa/mv88e6xxx/smi.c
@@ -45,7 +45,8 @@ static int mv88e6xxx_smi_direct_write(struct mv88e6xxx_chip *chip,
 {
 	int ret;
 
-	ret = mdiobus_write_nested(chip->bus, dev, reg, data);
+	ret = mdiobus_write_sts_nested(chip->bus, dev, reg, data,
+				       chip->ptp_sts);
 	if (ret < 0)
 		return ret;
 
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next 1/3] net: mdio: add support for passing a PTP system timestamp to the mii_bus driver
From: Hubert Feurstein @ 2019-08-16 16:31 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Florian Fainelli,
	Heiner Kallweit, Vladimir Oltean, David S. Miller
In-Reply-To: <20190816163157.25314-1-h.feurstein@gmail.com>

In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.

This patch adds the required infrastructure to solve the problem described
above.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/phy/mdio_bus.c | 105 +++++++++++++++++++++++++++++++++++++
 include/linux/mdio.h       |   7 +++
 include/linux/phy.h        |  25 +++++++++
 3 files changed, 137 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index bd04fe762056..167a21f267fa 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -34,6 +34,7 @@
 #include <linux/phy.h>
 #include <linux/io.h>
 #include <linux/uaccess.h>
+#include <linux/ptp_clock_kernel.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/mdio.h>
@@ -697,6 +698,110 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
 }
 EXPORT_SYMBOL(mdiobus_write);
 
+/**
+ * __mdiobus_write_sts - Unlocked version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * Write a MDIO bus register and request the MDIO bus driver to take the
+ * system timestamps when sts-pointer is valid. When the bus driver doesn't
+ * support this, the timestamps are taken in this function instead.
+ *
+ * In order to improve the synchronisation precision of phc2sys (from
+ * the linuxptp project) for devices like switches which are attached
+ * to the MDIO bus, it is necessary the get the system timestamps as
+ * close as possible to the access which causes the PTP timestamp
+ * register to be snapshotted in the switch hardware. Usually this is
+ * triggered by an MDIO write access, the snapshotted timestamp is then
+ * transferred by several MDIO reads.
+ *
+ * Caller must hold the mdio bus lock.
+ *
+ * NOTE: MUST NOT be called from interrupt context.
+ */
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			struct ptp_system_timestamp *sts)
+{
+	int retval;
+
+	WARN_ON_ONCE(!mutex_is_locked(&bus->mdio_lock));
+
+	if (!bus->ptp_sts_supported)
+		ptp_read_system_prets(sts);
+
+	bus->ptp_sts = sts;
+	retval = __mdiobus_write(bus, addr, regnum, val);
+	bus->ptp_sts = NULL;
+
+	if (!bus->ptp_sts_supported)
+		ptp_read_system_postts(sts);
+
+	return retval;
+}
+EXPORT_SYMBOL(__mdiobus_write_sts);
+
+/**
+ * mdiobus_write_sts - Convenience function for writing a given MII mgmt
+ * register
+ *
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+		      struct ptp_system_timestamp *sts)
+{
+	int retval;
+
+	BUG_ON(in_interrupt());
+
+	mutex_lock(&bus->mdio_lock);
+	retval = __mdiobus_write_sts(bus, addr, regnum, val, sts);
+	mutex_unlock(&bus->mdio_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(mdiobus_write_sts);
+
+/**
+ * mdiobus_write_sts_nested - Nested version of the mdiobus_write_sts function
+ * @bus: the mii_bus struct
+ * @addr: the phy address
+ * @regnum: register number to write
+ * @val: value to write to @regnum
+ * @sts: the ptp system timestamp
+ *
+ * In case of nested MDIO bus access avoid lockdep false positives by
+ * using mutex_lock_nested().
+ *
+ * NOTE: MUST NOT be called from interrupt context,
+ * because the bus read/write functions may wait for an interrupt
+ * to conclude the operation.
+ */
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			     struct ptp_system_timestamp *sts)
+{
+	int retval;
+
+	BUG_ON(in_interrupt());
+
+	mutex_lock_nested(&bus->mdio_lock, MDIO_MUTEX_NESTED);
+	retval = __mdiobus_write_sts(bus, addr, regnum, val, sts);
+	mutex_unlock(&bus->mdio_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(mdiobus_write_sts_nested);
+
 /**
  * mdio_bus_match - determine if given MDIO driver supports the given
  *		    MDIO device
diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index e8242ad88c81..d65625c75b15 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -9,6 +9,7 @@
 #include <uapi/linux/mdio.h>
 #include <linux/mod_devicetable.h>
 
+struct ptp_system_timestamp;
 struct gpio_desc;
 struct mii_bus;
 
@@ -305,11 +306,17 @@ static inline void mii_10gbt_stat_mod_linkmode_lpa_t(unsigned long *advertising,
 
 int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			struct ptp_system_timestamp *sts);
 
 int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum);
 int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val);
 int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val);
+int mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+		      struct ptp_system_timestamp *sts);
+int mdiobus_write_sts_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val,
+			     struct ptp_system_timestamp *sts);
 
 int mdiobus_register_device(struct mdio_device *mdiodev);
 int mdiobus_unregister_device(struct mdio_device *mdiodev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 462b90b73f93..15afe9c5256b 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -252,6 +252,31 @@ struct mii_bus {
 	int reset_delay_us;
 	/* RESET GPIO descriptor pointer */
 	struct gpio_desc *reset_gpiod;
+
+	/* PTP system timestamping support
+	 *
+	 * In order to improve the synchronisation precision of phc2sys (from
+	 * the linuxptp project) for devices like switches which are attached
+	 * to the MDIO bus, it is necessary the get the system timestamps as
+	 * close as possible to the access which causes the PTP timestamp
+	 * register to be snapshotted in the switch hardware. Usually this is
+	 * triggered by an MDIO write access, the snapshotted timestamp is then
+	 * transferred by several MDIO reads.
+	 *
+	 * The switch driver can use mdio_write_sts*() to pass through the
+	 * system timestamp pointer @ptp_sts to the MDIO bus driver. The bus
+	 * driver simply has to do the following calls in its write handler:
+	 *	ptp_read_system_prets(bus->ptp_sts);
+	 *	writel(value, mdio-register)
+	 *	ptp_read_system_postts(bus->ptp_sts);
+	 *
+	 * The ptp_read_system_*ts functions already check the ptp_sts pointer.
+	 *
+	 * @ptp_sts_supported: Must be set to true when the MDIO bus driver
+	 * takes the timestamps as described above.
+	 */
+	struct ptp_system_timestamp *ptp_sts;
+	bool ptp_sts_supported;
 };
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)
 
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next 3/3] net: fec: add support for PTP system timestamping for MDIO devices
From: Hubert Feurstein @ 2019-08-16 16:31 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Hubert Feurstein, Andrew Lunn, Richard Cochran, Fugang Duan,
	Vladimir Oltean, David S. Miller
In-Reply-To: <20190816163157.25314-1-h.feurstein@gmail.com>

In order to improve the synchronisation precision of phc2sys (from
the linuxptp project) for devices like switches which are attached
to the MDIO bus, it is necessary the get the system timestamps as
close as possible to the access which causes the PTP timestamp
register to be snapshotted in the switch hardware. Usually this is
triggered by an MDIO write access, the snapshotted timestamp is then
transferred by several MDIO reads.

The ptp_read_system_*ts functions already check the ptp_sts pointer.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2f6057e7335d..60e866631b61 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1815,10 +1815,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	reinit_completion(&fep->mdio_done);
 
 	/* start a write op */
+	ptp_read_system_prets(bus->ptp_sts);
 	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
 		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
 		FEC_MMFR_TA | FEC_MMFR_DATA(value),
 		fep->hwp + FEC_MII_DATA);
+	ptp_read_system_postts(bus->ptp_sts);
 
 	/* wait for end of transfer */
 	time_left = wait_for_completion_timeout(&fep->mdio_done,
@@ -2034,6 +2036,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
 		pdev->name, fep->dev_id + 1);
 	fep->mii_bus->priv = fep;
 	fep->mii_bus->parent = &pdev->dev;
+	fep->mii_bus->ptp_sts_supported = true;
 
 	node = of_get_child_by_name(pdev->dev.of_node, "mdio");
 	err = of_mdiobus_register(fep->mii_bus, node);
-- 
2.22.1


^ permalink raw reply related

* Re: [bpf-next] selftests/bpf: fix race in test_tcp_rtt test
From: Petar Penkov @ 2019-08-16 16:37 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Networking, bpf, davem, Alexei Starovoitov, Daniel Borkmann,
	Stanislav Fomichev, Petar Penkov
In-Reply-To: <20190816161302.GQ2820@mini-arch>

On Fri, Aug 16, 2019 at 9:13 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 08/16, Petar Penkov wrote:
> > From: Petar Penkov <ppenkov@google.com>
> >
> > There is a race in this test between receiving the ACK for the
> > single-byte packet sent in the test, and reading the values from the
> > map.
> >
> > This patch fixes this by having the client wait until there are no more
> > unacknowledged packets.
> >
> > Before:
> > for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> > done | grep -c PASSED
> > < trimmed error messages >
> > 993
> >
> > After:
> > for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> > done | grep -c PASSED
> > 10000
> >
> > Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
> > Signed-off-by: Petar Penkov <ppenkov@google.com>
> > ---
> >  tools/testing/selftests/bpf/test_tcp_rtt.c | 31 ++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> >
> > diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
> > index 90c3862f74a8..2b4754473956 100644
> > --- a/tools/testing/selftests/bpf/test_tcp_rtt.c
> > +++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
> > @@ -6,6 +6,7 @@
> >  #include <sys/types.h>
> >  #include <sys/socket.h>
> >  #include <netinet/in.h>
> > +#include <netinet/tcp.h>
> >  #include <pthread.h>
> >
> >  #include <linux/filter.h>
> > @@ -34,6 +35,30 @@ static void send_byte(int fd)
> >               error(1, errno, "Failed to send single byte");
> >  }
> >
> > +static int wait_for_ack(int fd, int retries)
> > +{
> > +     struct tcp_info info;
> > +     socklen_t optlen;
> > +     int i, err;
> > +
> > +     for (i = 0; i < retries; i++) {
> > +             optlen = sizeof(info);
> > +             err = getsockopt(fd, SOL_TCP, TCP_INFO, &info, &optlen);
> > +             if (err < 0) {
> > +                     log_err("Failed to lookup TCP stats");
> > +                     return err;
> > +             }
> > +
> > +             if (info.tcpi_unacked == 0)
> > +                     return 0;
> > +
> > +             sleep(1);
> Isn't it too big of a hammer? Maybe usleep(10) here and do x100 retries
> instead?
>
I guess this is more consistent with the time we expect to wait for an
ACK in the test, will change and send out a v2 shortly. Thank you for
your quick review!
> > +     }
> > +
> > +     log_err("Did not receive ACK");
> > +     return -1;
> > +}
> > +
> >  static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
> >                    __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
> >                    __u32 icsk_retransmits)
> > @@ -149,6 +174,11 @@ static int run_test(int cgroup_fd, int server_fd)
> >                        /*icsk_retransmits=*/0);
> >
> >       send_byte(client_fd);
> > +     if (wait_for_ack(client_fd, 5) < 0) {
> > +             err = -1;
> > +             goto close_client_fd;
> > +     }
> > +
> >
> >       err += verify_sk(map_fd, client_fd, "first payload byte",
> >                        /*invoked=*/2,
> > @@ -157,6 +187,7 @@ static int run_test(int cgroup_fd, int server_fd)
> >                        /*delivered_ce=*/0,
> >                        /*icsk_retransmits=*/0);
> >
> > +close_client_fd:
> >       close(client_fd);
> >
> >  close_bpf_object:
> > --
> > 2.23.0.rc1.153.gdeed80330f-goog
> >

^ permalink raw reply

* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Quentin Monnet @ 2019-08-16 16:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <CAADnVQKpPaZ3wJJwSn=JPML9pWzwy_8G9c0H=ToaaxZEJ8isnQ@mail.gmail.com>

2019-08-15 22:08 UTC-0700 ~ Alexei Starovoitov
<alexei.starovoitov@gmail.com>
> On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
> <quentin.monnet@netronome.com> wrote:
>>
>> Hi,
>> Because the "__printf()" attributes were used only where the functions are
>> implemented, and not in header files, the checks have not been enforced on
>> all the calls to printf()-like functions, and a number of errors slipped in
>> bpftool over time.
>>
>> This set cleans up such errors, and then moves the "__printf()" attributes
>> to header files, so that the checks are performed at all locations.
> 
> Applied. Thanks
> 

Thanks Alexei!

I noticed the set was applied to the bpf-next tree, and not bpf. Just
checking if this is intentional?

Regards,
Quentin

^ permalink raw reply

* Re: regression in ath10k dma allocation
From: Christoph Hellwig @ 2019-08-16 16:43 UTC (permalink / raw)
  To: Tobias Klausmann
  Cc: kvalo, davem, ath10k, linux-wireless, netdev, linux-kernel,
	nicoleotsuka, hch, m.szyprowski, robin.murphy, iommu,
	tobias.klausmann
In-Reply-To: <8fe8b415-2d34-0a14-170b-dcb31c162e67@mni.thm.de>

Hi Tobias,

do you have CONFIG_DMA_CMA set in your config?  If not please make sure
you have this commit in your testing tree, and if the problem still
persists it would be a little odd and we'd have to dig deeper:

commit dd3dcede9fa0a0b661ac1f24843f4a1b1317fdb6
Author: Nicolin Chen <nicoleotsuka@gmail.com>
Date:   Wed May 29 17:54:25 2019 -0700

    dma-contiguous: fix !CONFIG_DMA_CMA version of dma_{alloc, free}_contiguous()


^ permalink raw reply

* Re: [PATCH net-next,v4 08/12] drivers: net: use flow block API
From: Edward Cree @ 2019-08-16 17:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, netfilter-devel
In-Reply-To: <20190816010421.if6mbyl2n3fsujy4@salvia>

On 16/08/2019 02:04, Pablo Neira Ayuso wrote:
> On Wed, Aug 14, 2019 at 05:17:20PM +0100, Edward Cree wrote:
>> TBH I'm still not clear why you need a flow_block per subsystem, rather than
>>  just having multiple subsystems feed their offload requests through the same
>>  flow_block but with different enum tc_setup_type or enum tc_fl_command or
>>  some other indication that this is "netfilter" rather than "tc" asking for a
>>  tc_cls_flower_offload.
> In tc, the flow_block is set up by when the ingress qdisc is
> registered. The usual scenario for most drivers is to have one single
> flow_block per registered ingress qdisc, this makes a 1:1 mapping
> between ingress qdisc and flow_block.
>
> Still, you can register two or more ingress qdiscs to make them share
> the same policy via 'tc block'. In that case all those qdiscs use one
> single flow_block. This makes a N:1 mapping between these qdisc
> ingress and the flow_block. This policy applies to all ingress qdiscs
> that are part of the same tc block. By 'tc block', I'm refering to the
> tcf_block structure.
>
> In netfilter, there are ingress basechains that are registered per
> device. Each basechain gets a flow_block by when the basechain is
> registered. Shared blocks as in tcf_block are not yet supported, but
> it should not be hard to extend it to make this work.
>
> To reuse the same flow_block as entry point for all subsystems as your
> propose - assuming offloads for two or more subsystems are in place -
> then all of them would need to have the same block sharing
> configuration, which might not be the case, ie. tc ingress might have
> a eth0 and eth1 use the same policy via flow_block, while netfilter
> might have one basechain for eth0 and another for eth1 (no policy
> sharing).
Thank you, that's very helpful.

>> This really needs a design document explaining what all the bits are, how
>>  they fit together, and why they need to be like that.
> I did not design this flow_block abstraction, this concept was already
> in place under a different name and extend it so the ethtool/netfilter
> subsystems to avoid driver code duplication for offloads.
It's more the new implementation that you've created as part of this
 extension that I was asking about, although I agree that the
 abstraction that already existed is in need of documentation too.

^ permalink raw reply

* Re: [PATCH RFC net-next 3/3] net: dsa: mv88e6xxx: setup SERDES irq also for CPU/DSA ports
From: Marek Behun @ 2019-08-16 17:05 UTC (permalink / raw)
  To: Vivien Didelot; +Cc: netdev, Andrew Lunn, Vladimir Oltean, Florian Fainelli
In-Reply-To: <20190816122552.GC629@t480s.localdomain>

On Fri, 16 Aug 2019 12:25:52 -0400
Vivien Didelot <vivien.didelot@gmail.com> wrote:

> So now we have mv88e6xxx_setup_port() and mv88e6xxx_port_setup(), which both
> setup a port, differently, at different time. This is definitely error prone.

Hmm. I don't know how much of mv88e6xxx_setup_port() could be moved to
this new port_setup(), since there are other setup functions called in
mv88e6xxx_setup() that can possibly depend on what was done by
mv88e6xxx_setup_port().

Maybe the new DSA operations should be called .after_setup()
and .before_teardown(), and be called just once for the whole switch,
not for each port?

^ permalink raw reply

* [bpf-next,v2] selftests/bpf: fix race in test_tcp_rtt test
From: Petar Penkov @ 2019-08-16 17:08 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, sdf, Petar Penkov

From: Petar Penkov <ppenkov@google.com>

There is a race in this test between receiving the ACK for the
single-byte packet sent in the test, and reading the values from the
map.

This patch fixes this by having the client wait until there are no more
unacknowledged packets.

Before:
for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
done | grep -c PASSED
< trimmed error messages >
993

After:
for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
done | grep -c PASSED
10000

Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/testing/selftests/bpf/test_tcp_rtt.c | 31 ++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
index 90c3862f74a8..93916a69823e 100644
--- a/tools/testing/selftests/bpf/test_tcp_rtt.c
+++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
@@ -6,6 +6,7 @@
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <netinet/in.h>
+#include <netinet/tcp.h>
 #include <pthread.h>
 
 #include <linux/filter.h>
@@ -34,6 +35,30 @@ static void send_byte(int fd)
 		error(1, errno, "Failed to send single byte");
 }
 
+static int wait_for_ack(int fd, int retries)
+{
+	struct tcp_info info;
+	socklen_t optlen;
+	int i, err;
+
+	for (i = 0; i < retries; i++) {
+		optlen = sizeof(info);
+		err = getsockopt(fd, SOL_TCP, TCP_INFO, &info, &optlen);
+		if (err < 0) {
+			log_err("Failed to lookup TCP stats");
+			return err;
+		}
+
+		if (info.tcpi_unacked == 0)
+			return 0;
+
+		usleep(10);
+	}
+
+	log_err("Did not receive ACK");
+	return -1;
+}
+
 static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
 		     __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
 		     __u32 icsk_retransmits)
@@ -149,6 +174,11 @@ static int run_test(int cgroup_fd, int server_fd)
 			 /*icsk_retransmits=*/0);
 
 	send_byte(client_fd);
+	if (wait_for_ack(client_fd, 100) < 0) {
+		err = -1;
+		goto close_client_fd;
+	}
+
 
 	err += verify_sk(map_fd, client_fd, "first payload byte",
 			 /*invoked=*/2,
@@ -157,6 +187,7 @@ static int run_test(int cgroup_fd, int server_fd)
 			 /*delivered_ce=*/0,
 			 /*icsk_retransmits=*/0);
 
+close_client_fd:
 	close(client_fd);
 
 close_bpf_object:
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related

* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Alexei Starovoitov @ 2019-08-16 17:11 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <10602447-213f-fce5-54c7-7952eb3e8712@netronome.com>

On Fri, Aug 16, 2019 at 9:41 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> 2019-08-15 22:08 UTC-0700 ~ Alexei Starovoitov
> <alexei.starovoitov@gmail.com>
> > On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
> > <quentin.monnet@netronome.com> wrote:
> >>
> >> Hi,
> >> Because the "__printf()" attributes were used only where the functions are
> >> implemented, and not in header files, the checks have not been enforced on
> >> all the calls to printf()-like functions, and a number of errors slipped in
> >> bpftool over time.
> >>
> >> This set cleans up such errors, and then moves the "__printf()" attributes
> >> to header files, so that the checks are performed at all locations.
> >
> > Applied. Thanks
> >
>
> Thanks Alexei!
>
> I noticed the set was applied to the bpf-next tree, and not bpf. Just
> checking if this is intentional?

Yes. I don't see the _fix_ part in there.
Looks like cleanup to me.
I've also considered to push
commit d34b044038bf ("tools: bpftool: close prog FD before exit on
showing a single program")
to bpf-next as well.
That fd leak didn't feel that necessary to push to bpf tree
and risk merge conflicts... but I pushed it to bpf at the end.

^ permalink raw reply

* Re: [bpf-next,v2] selftests/bpf: fix race in test_tcp_rtt test
From: Stanislav Fomichev @ 2019-08-16 17:14 UTC (permalink / raw)
  To: Petar Penkov; +Cc: netdev, bpf, davem, ast, daniel, sdf, Petar Penkov
In-Reply-To: <20190816170825.22500-1-ppenkov.kernel@gmail.com>

On 08/16, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> There is a race in this test between receiving the ACK for the
> single-byte packet sent in the test, and reading the values from the
> map.
> 
> This patch fixes this by having the client wait until there are no more
> unacknowledged packets.
Reviewed-by: Stanislav Fomichev <sdf@google.com>

Thanks!
> 
> Before:
> for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> < trimmed error messages >
> 993
> 
> After:
> for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> 10000
> 
> Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
> Signed-off-by: Petar Penkov <ppenkov@google.com>
> ---
>  tools/testing/selftests/bpf/test_tcp_rtt.c | 31 ++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/tools/testing/selftests/bpf/test_tcp_rtt.c b/tools/testing/selftests/bpf/test_tcp_rtt.c
> index 90c3862f74a8..93916a69823e 100644
> --- a/tools/testing/selftests/bpf/test_tcp_rtt.c
> +++ b/tools/testing/selftests/bpf/test_tcp_rtt.c
> @@ -6,6 +6,7 @@
>  #include <sys/types.h>
>  #include <sys/socket.h>
>  #include <netinet/in.h>
> +#include <netinet/tcp.h>
>  #include <pthread.h>
>  
>  #include <linux/filter.h>
> @@ -34,6 +35,30 @@ static void send_byte(int fd)
>  		error(1, errno, "Failed to send single byte");
>  }
>  
> +static int wait_for_ack(int fd, int retries)
> +{
> +	struct tcp_info info;
> +	socklen_t optlen;
> +	int i, err;
> +
> +	for (i = 0; i < retries; i++) {
> +		optlen = sizeof(info);
> +		err = getsockopt(fd, SOL_TCP, TCP_INFO, &info, &optlen);
> +		if (err < 0) {
> +			log_err("Failed to lookup TCP stats");
> +			return err;
> +		}
> +
> +		if (info.tcpi_unacked == 0)
> +			return 0;
> +
> +		usleep(10);
> +	}
> +
> +	log_err("Did not receive ACK");
> +	return -1;
> +}
> +
>  static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 invoked,
>  		     __u32 dsack_dups, __u32 delivered, __u32 delivered_ce,
>  		     __u32 icsk_retransmits)
> @@ -149,6 +174,11 @@ static int run_test(int cgroup_fd, int server_fd)
>  			 /*icsk_retransmits=*/0);
>  
>  	send_byte(client_fd);
> +	if (wait_for_ack(client_fd, 100) < 0) {
> +		err = -1;
> +		goto close_client_fd;
> +	}
> +
>  
>  	err += verify_sk(map_fd, client_fd, "first payload byte",
>  			 /*invoked=*/2,
> @@ -157,6 +187,7 @@ static int run_test(int cgroup_fd, int server_fd)
>  			 /*delivered_ce=*/0,
>  			 /*icsk_retransmits=*/0);
>  
> +close_client_fd:
>  	close(client_fd);
>  
>  close_bpf_object:
> -- 
> 2.23.0.rc1.153.gdeed80330f-goog
> 

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Greg KH @ 2019-08-16 17:15 UTC (permalink / raw)
  To: Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Andrii Nakryiko, bpf, netdev, andrii.nakryiko, kernel-team,
	Michael Holzheu, Naveen N . Rao, David S . Miller,
	Michal Rostecki, John Fastabend, Sargun Dhillon
In-Reply-To: <23a87525-acf5-7a7e-b7b6-3c47b9760eeb@iogearbox.net>

On Fri, Aug 16, 2019 at 05:29:27PM +0200, Daniel Borkmann wrote:
> On 8/16/19 2:10 PM, Jesper Dangaard Brouer wrote:
> > On Thu, 15 Aug 2019 22:45:43 -0700
> > Andrii Nakryiko <andriin@fb.com> wrote:
> > 
> > > bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> > > definitions essential to almost every BPF program. Which makes them
> > > useful not just for selftests. To be able to expose them as part of
> > > libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> > > BSD-2-Clause. This patch updates licensing of those two files.
> > 
> > I've already ACKed this, and is fine with (LGPL-2.1 OR BSD-2-Clause).
> > 
> > I just want to understand, why "BSD-2-Clause" and not "Apache-2.0" ?
> > 
> > The original argument was that this needed to be compatible with
> > "Apache-2.0", then why not simply add this in the "OR" ?
> 
> It's use is discouraged in the kernel tree, see also LICENSES/dual/Apache-2.0 (below) and
> statement wrt compatibility from https://www.apache.org/licenses/GPL-compatibility.html:
> 
>   Valid-License-Identifier: Apache-2.0
>   SPDX-URL: https://spdx.org/licenses/Apache-2.0.html
>   Usage-Guide:
>     Do NOT use. The Apache-2.0 is not GPL2 compatible. [...]

That is correct, don't use Apache-2 code in the kernel please.  Even as
a dual-license, it's a total mess.

Having this be BSD-2 is actually better, as it should be fine to use
with Apache 2 code, right?

Jesper, do you know of any license that BSD-2 is not compatible with
that is needed?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Quentin Monnet @ 2019-08-16 17:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <CAADnVQLPg8jEsUbKOxzQc5Q1BKrB=urSWiniGwsJhcm=UM7oKA@mail.gmail.com>

2019-08-16 10:11 UTC-0700 ~ Alexei Starovoitov
<alexei.starovoitov@gmail.com>
> On Fri, Aug 16, 2019 at 9:41 AM Quentin Monnet
> <quentin.monnet@netronome.com> wrote:
>>
>> 2019-08-15 22:08 UTC-0700 ~ Alexei Starovoitov
>> <alexei.starovoitov@gmail.com>
>>> On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
>>> <quentin.monnet@netronome.com> wrote:
>>>>
>>>> Hi,
>>>> Because the "__printf()" attributes were used only where the functions are
>>>> implemented, and not in header files, the checks have not been enforced on
>>>> all the calls to printf()-like functions, and a number of errors slipped in
>>>> bpftool over time.
>>>>
>>>> This set cleans up such errors, and then moves the "__printf()" attributes
>>>> to header files, so that the checks are performed at all locations.
>>>
>>> Applied. Thanks
>>>
>>
>> Thanks Alexei!
>>
>> I noticed the set was applied to the bpf-next tree, and not bpf. Just
>> checking if this is intentional?
> 
> Yes. I don't see the _fix_ part in there.
> Looks like cleanup to me.
> I've also considered to push
> commit d34b044038bf ("tools: bpftool: close prog FD before exit on
> showing a single program")
> to bpf-next as well.
> That fd leak didn't feel that necessary to push to bpf tree
> and risk merge conflicts... but I pushed it to bpf at the end.
> 

Ok, thanks for explaining. I'll consider submitting this kind of patches
to bpf-next instead in the future.

Quentin

^ permalink raw reply

* Re: Unable to create htb tc classes more than 64K
From: Cong Wang @ 2019-08-16 17:45 UTC (permalink / raw)
  To: Akshat Kakkar; +Cc: NetFilter, lartc, netdev
In-Reply-To: <CAA5aLPhf1=wzQG0BAonhR3td-RhEmXaczug8n4hzXCzreb+52g@mail.gmail.com>

On Fri, Aug 16, 2019 at 5:49 AM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>
> I want to have around 1 Million htb tc classes.
> The simple structure of htb tc class, allow having only 64K classes at once.

This is probably due the limit of class ID which is 16bit for minor.


> But, it is possible to make it more hierarchical using hierarchy of
> qdisc and classes.
> For this I tried something like this
>
> tc qdisc add dev eno2 root handle 100: htb
> tc class add dev eno2 parent 100: classid 100:1 htb rate 100Mbps
> tc class add dev eno2 parent 100: classid 100:2 htb rate 100Mbps
>
> tc qdisc add dev eno2 parent 100:1 handle 1: htb
> tc class add dev eno2 parent 1: classid 1:10 htb rate 100kbps
> tc class add dev eno2 parent 1: classid 1:20 htb rate 300kbps
>
> tc qdisc add dev eno2 parent 100:2 handle 2: htb
> tc class add dev eno2 parent 2: classid 2:10 htb rate 100kbps
> tc class add dev eno2 parent 2: classid 2:20 htb rate 300kbps
>
> What I want is something like:
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000001 fw flowid 1:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000002 fw flowid 1:20
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000003 fw flowid 2:10
> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
> 0x00000004 fw flowid 2:20
>
> But I am unable to shape my traffic by any of 1:10, 1:20, 2:10 or 2:20.
>
> Can you please suggest, where is it going wrong?
> Is it not possible altogether?

The filter could only filter for classes on the same level, you are
trying to filter for the children classes, which doesn't work.

Thanks.

^ permalink raw reply

* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Jakub Kicinski @ 2019-08-16 17:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann, bpf,
	Network Development, oss-drivers
In-Reply-To: <CAADnVQLPg8jEsUbKOxzQc5Q1BKrB=urSWiniGwsJhcm=UM7oKA@mail.gmail.com>

On Fri, 16 Aug 2019 10:11:12 -0700, Alexei Starovoitov wrote:
> On Fri, Aug 16, 2019 at 9:41 AM Quentin Monnet wrote:
> > 2019-08-15 22:08 UTC-0700 ~ Alexei Starovoitov
> > <alexei.starovoitov@gmail.com>  
> > > On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
> > > <quentin.monnet@netronome.com> wrote:  
> > >>
> > >> Hi,
> > >> Because the "__printf()" attributes were used only where the functions are
> > >> implemented, and not in header files, the checks have not been enforced on
> > >> all the calls to printf()-like functions, and a number of errors slipped in
> > >> bpftool over time.
> > >>
> > >> This set cleans up such errors, and then moves the "__printf()" attributes
> > >> to header files, so that the checks are performed at all locations.  
> > >
> > > Applied. Thanks
> > >  
> >
> > Thanks Alexei!
> >
> > I noticed the set was applied to the bpf-next tree, and not bpf. Just
> > checking if this is intentional?  
> 
> Yes. I don't see the _fix_ part in there.

Mm.. these are not critical indeed, but patches 1 and 3 do fix a crash.
Perhaps those should had been a series on their own. 

We'll recalibrate :)

> Looks like cleanup to me.
> I've also considered to push
> commit d34b044038bf ("tools: bpftool: close prog FD before exit on
> showing a single program")
> to bpf-next as well.
> That fd leak didn't feel that necessary to push to bpf tree
> and risk merge conflicts... but I pushed it to bpf at the end.


^ permalink raw reply

* Re: [PATCH net-next v7 5/6] flow_offload: support get multi-subsystem block
From: Jakub Kicinski @ 2019-08-16 17:56 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: wenxu, David Miller, Jiri Pirko, pablo@netfilter.org,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <vbfpnl55eyg.fsf@mellanox.com>

On Fri, 16 Aug 2019 15:04:44 +0000, Vlad Buslov wrote:
> >> [  401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> >> [  401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8
> >> [  401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001
> >> [  401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a
> >> [  401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002
> >> [  401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740
> >>
> >> I don't think it is correct approach to try to call these callbacks with
> >> rcu protection because:
> >>
> >> - Cls API uses sleeping locks that cannot be used in rcu read section
> >>   (hence the included trace).
> >>
> >> - It assumes that all implementation of classifier ops reoffload() don't
> >>   sleep.
> >>
> >> - And that all driver offload callbacks (both block and classifier
> >>   setup) don't sleep, which is not the case.
> >>
> >> I don't see any straightforward way to fix this, besides using some
> >> other locking mechanism to protect block_ing_cb_list.
> >>
> >> Regards,
> >> Vlad  
> >
> > Maybe get the  mutex flow_indr_block_ing_cb_lock for both lookup, add, delete? 
> >
> > the callbacks_lists. the add and delete is work only on modules init case. So the
> >
> > lookup is also not frequently(ony [un]register) and can protect with the locks.  
> 
> That should do the job. I'll send the patch.

Hi Vlad! 

While looking into this, would you mind also add the missing
flow_block_cb_is_busy() calls in the indirect handlers in the drivers?

LMK if you're too busy, I don't want this to get forgotten :)

^ permalink raw reply

* Re: [net-next 01/15] ice: Implement ethtool ops for channels
From: Nguyen, Anthony L @ 2019-08-16 18:01 UTC (permalink / raw)
  To: jakub.kicinski@netronome.com
  Cc: nhorman@redhat.com, davem@davemloft.net, Kirsher, Jeffrey T,
	Bowers, AndrewX, netdev@vger.kernel.org, sassmann@redhat.com,
	Tieman, Henry W
In-Reply-To: <20190812152416.35f98091@cakuba.netronome.com>

[-- Attachment #1: Type: text/plain, Size: 1527 bytes --]

On Mon, 2019-08-12 at 15:24 -0700, Jakub Kicinski wrote:
> On Mon, 12 Aug 2019 15:07:09 +0000, Nguyen, Anthony L wrote:
> > On Fri, 2019-08-09 at 14:15 -0700, Jakub Kicinski wrote:
> > > On Fri,  9 Aug 2019 11:31:25 -0700, Jeff Kirsher wrote:  
> > > > From: Henry Tieman <henry.w.tieman@intel.com>
> > > > 
> > > > Add code to query and set the number of queues on the primary
> > > > VSI for a PF. This is accessed from the 'ethtool -l' and
> > > > 'ethtool
> > > > -L'
> > > > commands, respectively.
> > > > 
> > > > Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
> > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > > Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> > > > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>  
> > > 
> > > If you're using the same IRQ vector for RX and TX queue the
> > > channel
> > > counts as combined. Looks like you are counting RX and TX
> > > separately
> > > here. That's incorrect.  
> > 
> > Hi Jakub,
> > 
> > The ice driver can support asymmetric queues.  We report these
> > seperately, as opposed to combined, so that the user can specify a
> > different number of Rx and Tx queues.
> 
> If you have 20 IRQ vectors, 10 TX queues and 20 RX queues, the first
> 10
> RX queues share a IRQ vector with TX queues the ethool API counts
> them
> as 10 combined and 10 rx-only. 
> 
> 10 tx-only and 20 rx-only would require 30 IRQ vectors.

Thanks for the feedback Jakub.  We are looking into this.

-Tony

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3277 bytes --]

^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the kbuild tree
From: Kees Cook @ 2019-08-16 18:03 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stephen Rothwell, David Miller, Networking, Masahiro Yamada,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Andrii Nakryiko, Daniel Borkmann
In-Reply-To: <CAEf4BzY9dDZF-DBDmuQQz0Rcx3DNGvQn_GLr0Uar1PAbAf2iig@mail.gmail.com>

On Thu, Aug 15, 2019 at 10:21:29PM -0700, Andrii Nakryiko wrote:
> On Thu, Aug 15, 2019 at 7:42 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > Hi all,
> >
> > Today's linux-next merge of the net-next tree got a conflict in:
> >
> >   scripts/link-vmlinux.sh
> >
> > between commit:
> >
> >   e167191e4a8a ("kbuild: Parameterize kallsyms generation and correct reporting")
> >
> > from the kbuild tree and commits:
> >
> >   341dfcf8d78e ("btf: expose BTF info through sysfs")
> >   7fd785685e22 ("btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux")
> >
> > from the net-next tree.
> >
> > I fixed it up (I think - see below) and can carry the fix as necessary.
> 
> Thanks, Stephen! Looks good except one minor issue below.
> 
> > This is now fixed as far as linux-next is concerned, but any non trivial
> > conflicts should be mentioned to your upstream maintainer when your tree
> > is submitted for merging.  You may also want to consider cooperating
> > with the maintainer of the conflicting tree to minimise any particularly
> > complex conflicts.
> >
> > --
> > Cheers,
> > Stephen Rothwell
> >
> > diff --cc scripts/link-vmlinux.sh
> > index 2438a9faf3f1,c31193340108..000000000000
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@@ -56,11 -56,10 +56,11 @@@ modpost_link(
> >   }
> >
> >   # Link of vmlinux
> > - # ${1} - optional extra .o files
> > - # ${2} - output file
> > + # ${1} - output file
> > + # ${@:2} - optional extra .o files
> >   vmlinux_link()
> >   {
> >  +      info LD ${2}
> 
> This needs to be ${1}.
> 
> >         local lds="${objtree}/${KBUILD_LDS}"
> >         local objects
> >
> > @@@ -139,18 -149,6 +150,18 @@@ kallsyms(
> >         ${CC} ${aflags} -c -o ${2} ${afile}
> >   }
> >
> >  +# Perform one step in kallsyms generation, including temporary linking of
> >  +# vmlinux.
> >  +kallsyms_step()
> >  +{
> >  +      kallsymso_prev=${kallsymso}
> >  +      kallsymso=.tmp_kallsyms${1}.o
> >  +      kallsyms_vmlinux=.tmp_vmlinux${1}
> >  +
> > -       vmlinux_link "${kallsymso_prev}" ${kallsyms_vmlinux}
> > ++      vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}

Good cleanup on the "optional .o files" reordering! With your ordering
change, I think the ""s around ${kallsymso_prev} here are no longer needed
(which makes it read a bit more nicely).

> >  +      kallsyms ${kallsyms_vmlinux} ${kallsymso}
> >  +}
> >  +
> >   # Create map file with all symbols from ${1}
> >   # See mksymap for additional details
> >   mksysmap()
> > @@@ -228,8 -227,14 +240,15 @@@ ${MAKE} -f "${srctree}/scripts/Makefile
> >   info MODINFO modules.builtin.modinfo
> >   ${OBJCOPY} -j .modinfo -O binary vmlinux.o modules.builtin.modinfo
> >
> > + btf_vmlinux_bin_o=""
> > + if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> > +       if gen_btf .tmp_vmlinux.btf .btf.vmlinux.bin.o ; then
> > +               btf_vmlinux_bin_o=.btf.vmlinux.bin.o
> > +       fi
> > + fi
> > +
> >   kallsymso=""
> >  +kallsymso_prev=""
> >   kallsyms_vmlinux=""
> >   if [ -n "${CONFIG_KALLSYMS}" ]; then
> >
> > @@@ -268,11 -285,8 +287,7 @@@
> >         fi
> >   fi
> >
> > - vmlinux_link "${kallsymso}" vmlinux
> > -
> > - if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> > -       gen_btf vmlinux
> > - fi
> >  -info LD vmlinux
> > + vmlinux_link vmlinux "${kallsymso}" "${btf_vmlinux_bin_o}"

And, I think, also not here for either trailing argument.

> >
> >   if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
> >         info SORTEX vmlinux

-Kees

-- 
Kees Cook

^ permalink raw reply

* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Edward Cree @ 2019-08-16 18:13 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, netdev, oss-drivers
In-Reply-To: <031de7fd-caa7-9e66-861f-8e46e5bb8851@netronome.com>

On 15/08/2019 15:15, Quentin Monnet wrote:
> So if I understand correctly, we would use the bpf() syscall to trigger
> a run of such program on all map entries (for map implementing the new
> operation), and the context would include pointers to the key and the
> value for the entry being processed so we can count/sum/compute an
> average of the values or any other kind of processing?
Yep, that's pretty much exactly what I had in mind.

-Ed

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox