Netdev List
 help / color / mirror / Atom feed
* Re: [next-queue PATCH 2/3] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Cong Wang @ 2017-09-27 20:23 UTC (permalink / raw)
  To: Vinicius Costa Gomes
  Cc: Linux Kernel Network Developers, intel-wired-lan,
	Jamal Hadi Salim, Jiri Pirko, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong, richardcochran, henrik
In-Reply-To: <20170926233916.11774-3-vinicius.gomes@intel.com>

On Tue, Sep 26, 2017 at 4:39 PM, Vinicius Costa Gomes
<vinicius.gomes@intel.com> wrote:
> +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
> +{
> +       struct cbs_sched_data *q = qdisc_priv(sch);
> +       struct net_device *dev = qdisc_dev(sch);
> +
> +       if (!opt)
> +               return -EINVAL;
> +
> +       /* FIXME: this means that we can only install this qdisc
> +        * "under" mqprio. Do we need a more generic way to retrieve
> +        * the queue, or do we pass the netdev_queue to the driver?
> +        */
> +       q->queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
> +
> +       return cbs_change(sch, opt);
> +}

Yeah it is ugly to assume its parent is mqprio, at least you should
error out if it is not the case.

I am not sure how we can solve this elegantly, perhaps you should
extend mqprio rather than add a new one?

^ permalink raw reply

* RTL8169 vs low-latency (was: Re: Re: RTL 8169  linux driver question)
From: Kirill Smelkov @ 2017-09-27 20:30 UTC (permalink / raw)
  To: Francois Romieu, Hayes Wang
  Cc: David Laight, Stéphane ANCELOT, netdev, sancelot,
	Klaus Wölfel, Ivan Tyagov, Julien Muchembled,
	Vincent Pelletier, Rafael Monnerat, Hardik Juneja,
	Jean-Paul Smets
In-Reply-To: <20121128231822.GA10158@electric-eye.fr.zoreil.com>

+ klaus, ivan, ...

On Mon, Nov 26, 2012 at 09:15:19AM -0000, David Laight wrote:
> On Fri, Nov 23, 2012 at 08:14:37PM +0100, Stéphane ANCELOT wrote:
> > I had problem with it, my application sends a frame that is immediately
> > transmitted back by some slaves, there was abnormally 100us  lost
> > between the send and receive call.
> > 
> > Finally I found it was coming from the following register setup in the
> > driver :
> > 
> > RTL_W16(IntrMitigate, 0x5151);
> > 
> > Can you give me some details about it, since I do not have the RTL8169
> > programming guide.
> 
> That sounds like an 'interrupt mitigation' setting - which will cause
> RX interrupts to be delayed a short time in order to reduce the
> interrupt load on the kernel.
> 
> There is usually an 'ethtool' setting to disable interrupt mitigation.

On Wed, Nov 28, 2012 at 10:32:12AM +0800, hayeswang wrote:
> Francois Romieu [mailto:romieu@fr.zoreil.com] 
> [...]
> > Something like the patch below against net-next could help once I will
> > have tested it.
> > 
> > I completely guessed the Tx usec scale factor at gigabit 
> > speed (125 us, 
> > 100 us, disabled, who knows ?) and I have no idea which 
> > specific chipsets
> > it should work with.
> > 
> > Hayes, may I expect some hindsight regarding:
> > 1 - the availability of the IntrMitigate (0xe2) register through the
> >     8169, 8168 and 810x line of chipsets
> 
> 8169, 8168, and 8136(810x) serial chipsets support it.
> 
> > 2 - the Tx timer unit at gigabit speed
> 
> The unit of the timer depneds on both the speed and the setting of CPlusCmd
> (0xe0) bit 1 and bit 0.
> 
> For 8169
> bit[1:0] \ speed      1000M           100M            10M
> 0 0           320ns           2.56us          40.96us
> 0 1           2.56us          20.48us         327.7us
> 1 0           5.12us          40.96us         655.4us
> 1 1           10.24us         81.92us         1.31ms
> 
> For the other
> bit[1:0] \ speed      1000M           100M            10M
> 0 0           5us             2.56us          40.96us
> 0 1           40us            20.48us         327.7us
> 1 0           80us            40.96us         655.4us
> 1 1           160us           81.92us         1.31ms

On Thu, Nov 29, 2012 at 12:18:22AM +0100, Francois Romieu wrote:
> David Laight <David.Laight@ACULAB.COM> :
> [David's life]
> 
> 
> The version below fixes several bugs and refuses the frame or timing
> values it can't set. Hayes's Tx parameters still need to be pluged
> into rtl_coalesce_scale.
> 
> Rx delays seem lower than what I had expected when testing with a 8168b
> (XID 18000000).
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 248f883..d2594b1 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -349,6 +349,12 @@ enum rtl_registers {
>  	RxMaxSize	= 0xda,
>  	CPlusCmd	= 0xe0,
>  	IntrMitigate	= 0xe2,
> +
> +#define RTL_COALESCE_MASK	0x0f
> +#define RTL_COALESCE_SHIFT	4
> +#define RTL_COALESCE_T_MAX	(RTL_COALESCE_MASK)
> +#define RTL_COALESCE_FRAME_MAX	(RTL_COALESCE_MASK << 2)
> +
>
> [...]

Hello up there. Let me chime in into this a bit old thread.

Like Stéphane I care about timings. It is not real-time but in my case network
round-trip latency almost directly translates into client-server
database request/response time and thus it significantly affects
throughput for workloads with many serially performed requests.

We have many computers with gigabit Realtek NICs. For 2 such computers
connected to a gigabit store-and-forward switch the minimum round-trip
time for small pings (`ping -i 0 -w 3 -s 56 -q peer`) is ~ 30μs.

However it turned out that when Ethernet frame length transitions 127 ->
128 bytes (`ping -i 0 -w 3 -s {81 -> 82} -q peer`) the lowest RTT
transitions step-wise to ~ 270μs.

As David said this is RX interrupt mitigation done by NIC which creates
the latency. For workloads when low-latency is required with e.g. Intel,
BCM etc NIC drivers one just uses `ethtool -C rx-usecs ...` to reduce
the time NIC delays before interrupting CPU, but it turned out
`ethtool -C` is not supported by r8169 driver.

Like Stéphane I've traced the problem down to IntrMitigate being
hardcoded to != 0 for our chips (we have 8168 based NICs):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n5460
static void rtl_hw_start_8169(struct net_device *dev) {
	...
        /*
         * Undocumented corner. Supposedly:
         * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
         */ 
        RTL_W16(IntrMitigate, 0x0000);

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n6346
static void rtl_hw_start_8168(struct net_device *dev) {
	...
        RTL_W16(IntrMitigate, 0x5151);

and then I've also found this thread.

So could we please finally get support for tuning r8169 interrupt
coalescing in tree? (so that next poor soul who hits the problem does
not need to go all the way to dig into driver sources and internet
wildly and finally patch locally

        -RTL_W16(IntrMitigate, 0x5151);
	+RTL_W16(IntrMitigate, 0x5100);

guessing whether it is right or not and also having to care to deploy
the patch everywhere it needs to be used, etc...).

To do so I've took Francois's patch and reworked it a bit:

- updated to latest net-next.git;
- adjusted scaling setup based on feedback from Hayes to pick up scaling
  vector depending not only on link speed but also on CPlusCmd[0:1] and to
  adjust CPlusCmd[0:1] correspondingly when setting timings;
- improved a bit (I think so) error handling.

I've tested the patch on "RTL8168d/8111d" (XID 083000c0) and with it and
`ethtool -C rx-usecs 0 rx-frames 0` on both ends it improves:

- minimum RTT latency:

	~270μs ->  ~30μs (small packet),
	~330μs -> ~110μs (full 1.5K ethernet frame)

- average RTT latency:

	~480μs ->  ~50μs (small packet),
	~560μs -> ~125μs (full 1.5K ethernet frame)

( before:

	root@neo1:# ping -i 0 -w 3 -s 82 -q neo2
	PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
	
	--- neo2.kirr.nexedi.com ping statistics ---
	5906 packets transmitted, 5905 received, 0% packet loss, time 2999ms
	rtt min/avg/max/mdev = 0.274/0.485/0.607/0.026 ms, ipg/ewma 0.508/0.489 ms
	
	root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
	PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
	
	--- neo2.kirr.nexedi.com ping statistics ---
	5073 packets transmitted, 5073 received, 0% packet loss, time 2999ms
	rtt min/avg/max/mdev = 0.330/0.566/0.710/0.028 ms, ipg/ewma 0.591/0.544 ms

  after:

	root@neo1# ping -i 0 -w 3 -s 82 -q neo2
	PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
	
	--- neo2.kirr.nexedi.com ping statistics ---
	45815 packets transmitted, 45815 received, 0% packet loss, time 3000ms
	rtt min/avg/max/mdev = 0.036/0.051/0.368/0.010 ms, ipg/ewma 0.065/0.053 ms
	
	root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
	PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
	
	--- neo2.kirr.nexedi.com ping statistics ---
	21250 packets transmitted, 21250 received, 0% packet loss, time 3000ms
	rtt min/avg/max/mdev = 0.112/0.125/0.390/0.007 ms, ipg/ewma 0.141/0.125 ms

  the small -> 1.5K latency growth is understandable as it takes ~15μs
  to transmit 1.5K on 1Gbps on the wire and with 2 hosts and 1 switch
  and ICMP ECHO + ECHO reply the packet has to travel 4 ethernet
  segments which is already 60μs;
  
  probably something a bit else is also there as e.g. on Linux, even
  with `cpupower frequency-set -g performance`, on some computers I've
  noticed the kernel can be spending more time in software-only mode
  when incoming packets go in less frequently. E.g. this program can
  demonstrate the effect for ICMP ECHO processing:
  
  https://lab.nexedi.com/kirr/bcc/blob/43cfc13b/tools/pinglat.py )

Once again let's please work towards including the patch into mainline
kernel.

It remains to be clarified whether RX and TX timers use the same base.
For now I've set them equally, but Francois's origianl patch version
suggests it could be not the same.

I would appreciate feedback from Hayes on this and also on whether 128
raw length is the threshold below which packets are considered by NIC as
small and go in without interrupt moderation.

Thanks beforehand,
Kirill

---- 8< ----
From: Francois Romieu <romieu@fr.zoreil.com>
Subject: [PATCH] r8169: Add support for interrupt coalesce tuning (ethtool -C)

In particular with

	ethtool -C <ifname> rx-usecs 0 rx-frames 0

now it is possible to disable RX delays when NIC usage requires low-latency.

See this thread for example and background:

	https://www.spinics.net/lists/netdev/msg217665.html

( kirr:

  - adjusted scaling setup based on feedback from Hayes to pick up scaling
    vector depending not only on speed but also on CPlusCmd[0:1] and to
    adjust CPlusCmd[0:1] correspondingly when setting timings;
  - improved a bit (I think so) error handling. )

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
---
 drivers/net/ethernet/realtek/r8169.c | 231 +++++++++++++++++++++++++++++++++++
 1 file changed, 231 insertions(+)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e03fcf914690..bebae6d8ea38 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -399,6 +399,12 @@ enum rtl_registers {
 	RxMaxSize	= 0xda,
 	CPlusCmd	= 0xe0,
 	IntrMitigate	= 0xe2,
+
+#define RTL_COALESCE_MASK	0x0f
+#define RTL_COALESCE_SHIFT	4
+#define RTL_COALESCE_T_MAX	(RTL_COALESCE_MASK)
+#define RTL_COALESCE_FRAME_MAX	(RTL_COALESCE_MASK << 2)
+
 	RxDescAddrLow	= 0xe4,
 	RxDescAddrHigh	= 0xe8,
 	EarlyTxThres	= 0xec,	/* 8169. Unit of 32 bytes. */
@@ -795,6 +801,7 @@ struct rtl8169_private {
 	u16 cp_cmd;
 
 	u16 event_slow;
+	const struct rtl_coalesce_info *coalesce_info;
 
 	struct mdio_ops {
 		void (*write)(struct rtl8169_private *, int, int);
@@ -2363,10 +2370,229 @@ static int rtl8169_nway_reset(struct net_device *dev)
 	return mii_nway_restart(&tp->mii);
 }
 
+/*
+ * Interrupt coalescing
+ *
+ * > 1 - the availability of the IntrMitigate (0xe2) register through the
+ * >     8169, 8168 and 810x line of chipsets
+ *
+ * 8169, 8168, and 8136(810x) serial chipsets support it.
+ *
+ * > 2 - the Tx timer unit at gigabit speed
+ *
+ * The unit of the timer depends on both the speed and the setting of CPlusCmd
+ * (0xe0) bit 1 and bit 0.
+ *
+ * For 8169
+ * bit[1:0] \ speed        1000M           100M            10M
+ * 0 0                     320ns           2.56us          40.96us
+ * 0 1                     2.56us          20.48us         327.7us
+ * 1 0                     5.12us          40.96us         655.4us
+ * 1 1                     10.24us         81.92us         1.31ms
+ *
+ * For the other
+ * bit[1:0] \ speed        1000M           100M            10M
+ * 0 0                     5us             2.56us          40.96us
+ * 0 1                     40us            20.48us         327.7us
+ * 1 0                     80us            40.96us         655.4us
+ * 1 1                     160us           81.92us         1.31ms
+ */
+
+/* rx/tx scale factors for one particular CPlusCmd[0:1] value */
+struct rtl_coalesce_scale {
+	/* Rx / Tx */
+	u32 nsecs[2];
+};
+
+/* rx/tx scale factors for all CPlusCmd[0:1] cases */
+struct rtl_coalesce_info {
+	u32 speed;
+	struct rtl_coalesce_scale scalev[4];	/* each CPlusCmd[0:1] case */
+};
+
+/* produce (r,t) pairs with each being in series of *1, *8, *8*2, *8*2*2 */
+#define rxtx_x1822(r, t) {		\
+	{{(r),		(t)}},		\
+	{{(r)*8,	(t)*8}},	\
+	{{(r)*8*2,	(t)*8*2}},	\
+	{{(r)*8*2*2,	(t)*8*2*2}},	\
+}
+static const struct rtl_coalesce_info rtl_coalesce_info_8169[] = {
+	/* speed	delays:     rx00   tx00	*/
+	{ SPEED_10,	rxtx_x1822(40960, 40960)	},
+	{ SPEED_100,	rxtx_x1822( 2560,  2560)	},
+	{ SPEED_1000,	rxtx_x1822(  320,   320)	},
+	{ 0 },
+};
+
+static const struct rtl_coalesce_info rtl_coalesce_info_8168_8136[] = {
+	/* speed	delays:     rx00   tx00	*/
+	{ SPEED_10,	rxtx_x1822(40960, 40960)	},
+	{ SPEED_100,	rxtx_x1822( 2560,  2560)	},
+	{ SPEED_1000,	rxtx_x1822( 5000,  5000)	},
+	{ 0 },
+};
+#undef rxtx_x1822
+
+/* get rx/tx scale vector corresponding to current speed */
+static const struct rtl_coalesce_info *rtl_coalesce_info(struct net_device *dev)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	struct ethtool_link_ksettings ecmd;
+	const struct rtl_coalesce_info *ci;
+	int rc;
+
+	rc = rtl8169_get_link_ksettings(dev, &ecmd);
+	if (rc < 0)
+		return ERR_PTR(rc);
+
+	for (ci = tp->coalesce_info; ci->speed != 0; ci++) {
+		if (ecmd.base.speed == ci->speed) {
+			return ci;
+		}
+	}
+
+	return ERR_PTR(-ELNRNG);
+}
+
+static int rtl_get_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	void __iomem *ioaddr = tp->mmio_addr;
+	const struct rtl_coalesce_info *ci;
+	const struct rtl_coalesce_scale *scale;
+	struct {
+		u32 *max_frames;
+		u32 *usecs;
+	} coal_settings [] = {
+		{ &ec->rx_max_coalesced_frames, &ec->rx_coalesce_usecs },
+		{ &ec->tx_max_coalesced_frames, &ec->tx_coalesce_usecs }
+	}, *p = coal_settings;
+	int i;
+	u16 w;
+
+	memset(ec, 0, sizeof(*ec));
+
+	/* get rx/tx scale corresponding to current speed and CPlusCmd[0:1] */
+	ci = rtl_coalesce_info(dev);
+	if (IS_ERR(ci))
+		return PTR_ERR(ci);
+
+	scale = &ci->scalev[RTL_R16(CPlusCmd) & 3];
+
+	/* read IntrMitigate and adjust according to scale */
+	for (w = RTL_R16(IntrMitigate); w; w >>= RTL_COALESCE_SHIFT, p++) {
+		*p->max_frames = (w & RTL_COALESCE_MASK) << 2;
+		w >>= RTL_COALESCE_SHIFT;
+		*p->usecs = w & RTL_COALESCE_MASK;
+	}
+
+	for (i = 0; i < 2; i++) {
+		p = coal_settings + i;
+		*p->usecs = (*p->usecs * scale->nsecs[i]) / 1000;
+
+		/*
+		 * ethtool_coalesce says it is illegal to set both usecs and
+		 * max_frames to 0.
+		 */
+		if (!*p->usecs && !*p->max_frames)
+			*p->max_frames = 1;
+	}
+
+	return 0;
+}
+
+/* choose appropriate scale factor and CPlusCmd[0:1] for (speed, nsec) */
+static const struct rtl_coalesce_scale *rtl_coalesce_choose_scale(
+			struct net_device *dev, u32 nsec, u16 *cp01)
+{
+	const struct rtl_coalesce_info *ci;
+	u16 i;
+
+	ci = rtl_coalesce_info(dev);
+	if (IS_ERR(ci))
+		return ERR_CAST(ci);
+
+	for (i = 0; i < 4; i++) {
+		u32 rxtx_maxscale = max(ci->scalev[i].nsecs[0],
+					ci->scalev[i].nsecs[1]);
+		if (nsec <= rxtx_maxscale * RTL_COALESCE_T_MAX) {
+			*cp01 = i;
+			return &ci->scalev[i];
+		}
+	}
+
+	return ERR_PTR(-EINVAL);
+}
+
+static int rtl_set_coalesce(struct net_device *dev, struct ethtool_coalesce *ec)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	void __iomem *ioaddr = tp->mmio_addr;
+	const struct rtl_coalesce_scale *scale;
+	struct {
+		u32 frames;
+		u32 usecs;
+	} coal_settings [] = {
+		{ ec->rx_max_coalesced_frames, ec->rx_coalesce_usecs },
+		{ ec->tx_max_coalesced_frames, ec->tx_coalesce_usecs }
+	}, *p = coal_settings;
+	u16 w = 0, cp01;
+	int i;
+
+	scale = rtl_coalesce_choose_scale(dev,
+			max(p[0].usecs, p[1].usecs) * 1000, &cp01);
+	if (IS_ERR(scale))
+		return PTR_ERR(scale);
+
+	for (i = 0; i < 2; i++, p++) {
+		u32 units;
+
+		/*
+		 * accept max_frames=1 we returned in rtl_get_coalesce.
+		 * accept it not only when usecs=0 because of e.g. the following scenario:
+		 *
+		 * - both rx_usecs=0 & rx_frames=0 in hardware (no delay on RX)
+		 * - rtl_get_coalesce returns rx_usecs=0, rx_frames=1
+		 * - then user does `ethtool -C eth0 rx-usecs 100`
+		 *
+		 * since ethtool sends to kernel whole ethtool_coalesce
+		 * settings, if we do not handle rx_usecs=!0, rx_frames=1
+		 * we'll reject it below in `frames % 4 != 0`.
+		 */
+		if (p->frames == 1) {
+			p->frames = 0;
+		}
+
+		units = p->usecs * 1000 / scale->nsecs[i];
+		if (p->frames > RTL_COALESCE_FRAME_MAX || p->frames % 4)
+			return -EINVAL;
+
+		w <<= RTL_COALESCE_SHIFT;
+		w |= units;
+		w <<= RTL_COALESCE_SHIFT;
+		w |= p->frames >> 2;
+	}
+
+	rtl_lock_work(tp);
+
+	RTL_W16(IntrMitigate, swab16(w));
+
+	tp->cp_cmd = (tp->cp_cmd & ~3) | cp01;
+	RTL_W16(CPlusCmd, tp->cp_cmd);
+	RTL_R16(CPlusCmd);
+
+	rtl_unlock_work(tp);
+
+	return 0;
+}
+
 static const struct ethtool_ops rtl8169_ethtool_ops = {
 	.get_drvinfo		= rtl8169_get_drvinfo,
 	.get_regs_len		= rtl8169_get_regs_len,
 	.get_link		= ethtool_op_get_link,
+	.get_coalesce		= rtl_get_coalesce,
+	.set_coalesce		= rtl_set_coalesce,
 	.set_settings		= rtl8169_set_settings,
 	.get_msglevel		= rtl8169_get_msglevel,
 	.set_msglevel		= rtl8169_set_msglevel,
@@ -8062,6 +8288,7 @@ static const struct rtl_cfg_info {
 	unsigned int align;
 	u16 event_slow;
 	unsigned features;
+	const struct rtl_coalesce_info *coalesce_info;
 	u8 default_ver;
 } rtl_cfg_infos [] = {
 	[RTL_CFG_0] = {
@@ -8070,6 +8297,7 @@ static const struct rtl_cfg_info {
 		.align		= 0,
 		.event_slow	= SYSErr | LinkChg | RxOverflow | RxFIFOOver,
 		.features	= RTL_FEATURE_GMII,
+		.coalesce_info	= rtl_coalesce_info_8169,
 		.default_ver	= RTL_GIGA_MAC_VER_01,
 	},
 	[RTL_CFG_1] = {
@@ -8078,6 +8306,7 @@ static const struct rtl_cfg_info {
 		.align		= 8,
 		.event_slow	= SYSErr | LinkChg | RxOverflow,
 		.features	= RTL_FEATURE_GMII | RTL_FEATURE_MSI,
+		.coalesce_info	= rtl_coalesce_info_8168_8136,
 		.default_ver	= RTL_GIGA_MAC_VER_11,
 	},
 	[RTL_CFG_2] = {
@@ -8087,6 +8316,7 @@ static const struct rtl_cfg_info {
 		.event_slow	= SYSErr | LinkChg | RxOverflow | RxFIFOOver |
 				  PCSTimeout,
 		.features	= RTL_FEATURE_MSI,
+		.coalesce_info	= rtl_coalesce_info_8168_8136,
 		.default_ver	= RTL_GIGA_MAC_VER_13,
 	}
 };
@@ -8450,6 +8680,7 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	tp->hw_start = cfg->hw_start;
 	tp->event_slow = cfg->event_slow;
+	tp->coalesce_info = cfg->coalesce_info;
 
 	tp->opts1_mask = (tp->mac_version != RTL_GIGA_MAC_VER_01) ?
 		~(RxBOVF | RxFOVF) : ~0;
-- 
2.14.1.581.gf28d330327

^ permalink raw reply related

* Re: [PATCH v2 2/2] ip_tunnel: add mpls over gre encapsulation
From: kbuild test robot @ 2017-09-27 20:45 UTC (permalink / raw)
  To: Amine Kherbouche
  Cc: kbuild-all, netdev, xeb, roopa, amine.kherbouche, equinox
In-Reply-To: <c40295d24ec5207f5be695a2f888bfa840e2ef2c.1506416988.git.amine.kherbouche@6wind.com>

[-- Attachment #1: Type: text/plain, Size: 753 bytes --]

Hi Amine,

[auto build test ERROR on net/master]
[also build test ERROR on v4.14-rc2 next-20170927]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Amine-Kherbouche/mpls-expose-stack-entry-function/20170928-012842
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> ERROR: "mpls_forward" [net/ipv4/gre.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40101 bytes --]

^ permalink raw reply

* Re: [PATCH net] ipv6: remove incorrect WARN_ON() in fib6_del()
From: Wei Wang @ 2017-09-27 21:08 UTC (permalink / raw)
  To: Eric Dumazet, Martin KaFai Lau, David Miller
  Cc: Linux Kernel Network Developers
In-Reply-To: <CANn89i+uLBxzBCUP_TdyswBgAAhUMZajkLN7NGo2+9vuzVc_5A@mail.gmail.com>

On Tue, Sep 26, 2017 at 6:20 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Mon, Sep 25, 2017 at 10:52 PM, Wei Wang <weiwan@google.com> wrote:
>> On Mon, Sep 25, 2017 at 7:23 PM, Eric Dumazet <edumazet@google.com> wrote:
>>> On Mon, Sep 25, 2017 at 7:07 PM, Martin KaFai Lau <kafai@fb.com> wrote:
>>>
>>>> I am probably still missing something.
>>>>
>>>> Considering the del operation should be under the writer lock,
>>>> if rt->rt6i_node should be NULL (for rt that has already been
>>>> removed from fib6), why this WARN_ON() is triggered?
>>>>
>>>> An example may help.
>>>>
>>>
>>> Look at the stack trace, you'll find the answers...
>>>
>>> ip6_link_failure() -> ip6_del_rt()
>>>
>>> Note that rt might have been deleted from the _tree_ already.
>>
>> Had a brief talk with Martin.
>> He has a valid point.
>> The current WARN_ON() code is as follows:
>> #if RT6_DEBUG >= 2
>>        if (rt->dst.obsolete > 0) {
>>                WARN_ON(fn);
>>                return -ENOENT;
>>        }
>> #endif
>>
>> The WARN_ON() only triggers when fn is not NULL. (I missed it before.)
>> In theory, fib6_del() calls fib6_del_route() which should set
>> rt->rt6i_node to NULL and rt->dst.obsolete to DST_OBSOLETE_DEAD within
>> the same write_lock session.
>> If those 2 values are inconsistent, it indicates something is wrong.
>> Will need more time to root cause the issue.
>>
>> Please ignore this patch. Sorry about the confusion.
>
> Oh well, for some reason I was seeing WARN_ON(1)  here, since this is
> a construct I often add in my tests ...

Just an update on this issue:
This WARNING issue should already be fixed by commit
7483cea79957312e9f8e9cf760a1bc5d6c507113:
Author: Ido Schimmel <idosch@mellanox.com>
Date:   Thu Aug 3 13:28:22 2017 +0200

    ipv6: fib: Unlink replaced routes from their nodes

    When a route is deleted its node pointer is set to NULL to indicate it's
    no longer linked to its node. Do the same for routes that are replaced.

    This will later allow us to test if a route is still in the FIB by
    checking its node pointer instead of its reference count.

    Signed-off-by: Ido Schimmel <idosch@mellanox.com>
    Signed-off-by: Jiri Pirko <jiri@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

So no further action is needed on this.

Thanks.
Wei

^ permalink raw reply

* Re: [next-queue PATCH 2/3] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Vinicius Costa Gomes @ 2017-09-27 21:14 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, intel-wired-lan,
	Jamal Hadi Salim, Jiri Pirko, andre.guedes, ivan.briano,
	jesus.sanchez-palencia, boon.leong.ong, richardcochran, henrik
In-Reply-To: <CAM_iQpVrz+e=TYHYqaYyUWRP528GSZXx8EnN1yPhZK8eLvP4Yw@mail.gmail.com>

Hi,

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Tue, Sep 26, 2017 at 4:39 PM, Vinicius Costa Gomes
> <vinicius.gomes@intel.com> wrote:
>> +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
>> +{
>> +       struct cbs_sched_data *q = qdisc_priv(sch);
>> +       struct net_device *dev = qdisc_dev(sch);
>> +
>> +       if (!opt)
>> +               return -EINVAL;
>> +
>> +       /* FIXME: this means that we can only install this qdisc
>> +        * "under" mqprio. Do we need a more generic way to retrieve
>> +        * the queue, or do we pass the netdev_queue to the driver?
>> +        */
>> +       q->queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
>> +
>> +       return cbs_change(sch, opt);
>> +}
>
> Yeah it is ugly to assume its parent is mqprio, at least you should
> error out if it is not the case.

Will add an error for this, for now.

>
> I am not sure how we can solve this elegantly, perhaps you should
> extend mqprio rather than add a new one?

Is the alternative hinted in the FIXME worse? Instead of passing the
index of the hardware queue to the driver we pass the pointer to a
netdev_queue to the driver and it "discovers" the HW queue from that.


Cheers,

^ permalink raw reply

* [PATCH] rndis_host: support Novatel Verizon USB730L
From: Aleksander Morgado @ 2017-09-27 21:31 UTC (permalink / raw)
  To: oliver; +Cc: davem, linux-usb, netdev, Aleksander Morgado

Treat the ef/04/01 interface class/subclass/protocol combination used
by the Novatel Verizon USB730L (1410:9030) as a possible RNDIS
interface.

 T:  Bus=01 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#= 17 Spd=480 MxCh= 0
 D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  3
 P:  Vendor=1410 ProdID=9030 Rev=03.10
 S:  Manufacturer=Novatel Wireless
 S:  Product=MiFi USB730L
 S:  SerialNumber=0123456789ABCDEF
 C:  #Ifs= 3 Cfg#= 1 Atr=80 MxPwr=500mA
 I:  If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host
 I:  If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
 I:  If#= 2 Alt= 0 #EPs= 1 Cls=03(HID  ) Sub=00 Prot=00 Driver=usbhid

Once the network interface is brought up, the user just needs to run a
DHCP client to get IP address and routing setup.

As a side note, other Novatel Verizon USB730L models with the same
vid:pid end up exposing a standard ECM interface which doesn't require
any other kernel update to make it work.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
---

Hey,

I'm not sure if binding this logic to a specific vid:pid (1410:9030) would be more appropriate here, or if it's ok to just bind class/subclass/protocol (as in the activesync case).
Let me know what you think.

---
 drivers/net/usb/cdc_ether.c  | 11 ++++++++++-
 drivers/net/usb/rndis_host.c |  4 ++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 8ab281b478f2..2df0bcc6d30b 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -54,11 +54,19 @@ static int is_wireless_rndis(struct usb_interface_descriptor *desc)
 		desc->bInterfaceProtocol == 3);
 }

+static int is_novatel_rndis(struct usb_interface_descriptor *desc)
+{
+	return (desc->bInterfaceClass == USB_CLASS_MISC &&
+		desc->bInterfaceSubClass == 4 &&
+		desc->bInterfaceProtocol == 1);
+}
+
 #else

 #define is_rndis(desc)		0
 #define is_activesync(desc)	0
 #define is_wireless_rndis(desc)	0
+#define is_novatel_rndis(desc)	0

 #endif

@@ -150,7 +158,8 @@ int usbnet_generic_cdc_bind(struct usbnet *dev, struct usb_interface *intf)
 	 */
 	rndis = (is_rndis(&intf->cur_altsetting->desc) ||
 		 is_activesync(&intf->cur_altsetting->desc) ||
-		 is_wireless_rndis(&intf->cur_altsetting->desc));
+		 is_wireless_rndis(&intf->cur_altsetting->desc) ||
+		 is_novatel_rndis(&intf->cur_altsetting->desc));

 	memset(info, 0, sizeof(*info));
 	info->control = intf;
diff --git a/drivers/net/usb/rndis_host.c b/drivers/net/usb/rndis_host.c
index a151f267aebb..b807c91abe1d 100644
--- a/drivers/net/usb/rndis_host.c
+++ b/drivers/net/usb/rndis_host.c
@@ -632,6 +632,10 @@ static const struct usb_device_id	products [] = {
 	/* RNDIS for tethering */
 	USB_INTERFACE_INFO(USB_CLASS_WIRELESS_CONTROLLER, 1, 3),
 	.driver_info = (unsigned long) &rndis_info,
+}, {
+	/* Novatel Verizon USB730L */
+	USB_INTERFACE_INFO(USB_CLASS_MISC, 4, 1),
+	.driver_info = (unsigned long) &rndis_info,
 },
 	{ },		// END
 };

^ permalink raw reply related

* [PATCH net-next 4/5] bpf: Swap the order of checking prog_info and map_info
From: Martin KaFai Lau @ 2017-09-27 21:37 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170927213756.1254938-1-kafai@fb.com>

This patch swaps the checking order.  It now checks the map_info
first and then prog_info.  It is a prep work for adding
test to the newly added fields (the map_ids of prog_info field
in particular).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/testing/selftests/bpf/test_progs.c | 58 +++++++++++++++++---------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 11ee25cea227..31ae27dc8d04 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -316,6 +316,36 @@ static void test_bpf_obj_id(void)
 			error_cnt++;
 		assert(!err);
 
+		/* Insert a magic value to the map */
+		map_fds[i] = bpf_find_map(__func__, objs[i], "test_map_id");
+		assert(map_fds[i] >= 0);
+		err = bpf_map_update_elem(map_fds[i], &array_key,
+					  &array_magic_value, 0);
+		assert(!err);
+
+		/* Check getting map info */
+		info_len = sizeof(struct bpf_map_info) * 2;
+		bzero(&map_infos[i], info_len);
+		err = bpf_obj_get_info_by_fd(map_fds[i], &map_infos[i],
+					     &info_len);
+		if (CHECK(err ||
+			  map_infos[i].type != BPF_MAP_TYPE_ARRAY ||
+			  map_infos[i].key_size != sizeof(__u32) ||
+			  map_infos[i].value_size != sizeof(__u64) ||
+			  map_infos[i].max_entries != 1 ||
+			  map_infos[i].map_flags != 0 ||
+			  info_len != sizeof(struct bpf_map_info),
+			  "get-map-info(fd)",
+			  "err %d errno %d type %d(%d) info_len %u(%lu) key_size %u value_size %u max_entries %u map_flags %X\n",
+			  err, errno,
+			  map_infos[i].type, BPF_MAP_TYPE_ARRAY,
+			  info_len, sizeof(struct bpf_map_info),
+			  map_infos[i].key_size,
+			  map_infos[i].value_size,
+			  map_infos[i].max_entries,
+			  map_infos[i].map_flags))
+			goto done;
+
 		/* Check getting prog info */
 		info_len = sizeof(struct bpf_prog_info) * 2;
 		bzero(&prog_infos[i], info_len);
@@ -347,34 +377,6 @@ static void test_bpf_obj_id(void)
 			  !!memcmp(xlated_insns, zeros, sizeof(zeros))))
 			goto done;
 
-		map_fds[i] = bpf_find_map(__func__, objs[i], "test_map_id");
-		assert(map_fds[i] >= 0);
-		err = bpf_map_update_elem(map_fds[i], &array_key,
-					  &array_magic_value, 0);
-		assert(!err);
-
-		/* Check getting map info */
-		info_len = sizeof(struct bpf_map_info) * 2;
-		bzero(&map_infos[i], info_len);
-		err = bpf_obj_get_info_by_fd(map_fds[i], &map_infos[i],
-					     &info_len);
-		if (CHECK(err ||
-			  map_infos[i].type != BPF_MAP_TYPE_ARRAY ||
-			  map_infos[i].key_size != sizeof(__u32) ||
-			  map_infos[i].value_size != sizeof(__u64) ||
-			  map_infos[i].max_entries != 1 ||
-			  map_infos[i].map_flags != 0 ||
-			  info_len != sizeof(struct bpf_map_info),
-			  "get-map-info(fd)",
-			  "err %d errno %d type %d(%d) info_len %u(%lu) key_size %u value_size %u max_entries %u map_flags %X\n",
-			  err, errno,
-			  map_infos[i].type, BPF_MAP_TYPE_ARRAY,
-			  info_len, sizeof(struct bpf_map_info),
-			  map_infos[i].key_size,
-			  map_infos[i].value_size,
-			  map_infos[i].max_entries,
-			  map_infos[i].map_flags))
-			goto done;
 	}
 
 	/* Check bpf_prog_get_next_id() */
-- 
2.9.5

^ permalink raw reply related

* [PATCH net-next 0/5] bpf: Extend bpf_{prog,map}_info
From: Martin KaFai Lau @ 2017-09-27 21:37 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch series adds more fields to bpf_prog_info and bpf_map_info.
Please see individual patch for details.

Martin KaFai Lau (5):
  bpf: Add name, load_time, uid and map_ids to bpf_prog_info
  bpf: Add map_name to bpf_map_info
  bpf: libbpf: Provide basic API support to specify BPF obj name
  bpf: Swap the order of checking prog_info and map_info
  bpf: Test new fields in bpf_attr and bpf_{prog,map}_info

 include/linux/bpf.h                         |   3 +
 include/uapi/linux/bpf.h                    |  10 ++
 kernel/bpf/syscall.c                        |  58 ++++++++-
 samples/bpf/bpf_load.c                      |   2 +
 samples/bpf/map_perf_test_user.c            |   1 +
 tools/include/uapi/linux/bpf.h              |  10 ++
 tools/lib/bpf/bpf.c                         |  57 ++++++--
 tools/lib/bpf/bpf.h                         |  23 +++-
 tools/lib/bpf/libbpf.c                      | 109 ++++++++++++----
 tools/testing/selftests/bpf/test_progs.c    | 195 +++++++++++++++++++++++-----
 tools/testing/selftests/bpf/test_verifier.c |   2 +-
 11 files changed, 385 insertions(+), 85 deletions(-)

-- 
2.9.5

^ permalink raw reply

* [PATCH net-next 1/5] bpf: Add name, load_time, uid and map_ids to bpf_prog_info
From: Martin KaFai Lau @ 2017-09-27 21:37 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170927213756.1254938-1-kafai@fb.com>

The patch adds name and load_time to struct bpf_prog_aux.  They
are also exported to bpf_prog_info.

The bpf_prog's name is passed by userspace during BPF_PROG_LOAD.
The kernel only stores the first (BPF_PROG_NAME_LEN - 1) bytes
and the name stored in the kernel is always \0 terminated.

The kernel will reject name that contains characters other than
isalnum() and '_'.  It will also reject name that is not null
terminated.

The existing 'user->uid' of the bpf_prog_aux is also exported to
the bpf_prog_info as created_by_uid.

The existing 'used_maps' of the bpf_prog_aux is exported to
the newly added members 'nr_map_ids' and 'map_ids' of
the bpf_prog_info.  On the input, nr_map_ids tells how
big the userspace's map_ids buffer is.  On the output,
nr_map_ids tells the exact user_map_cnt and it will only
copy up to the userspace's map_ids buffer is allowed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h      |  2 ++
 include/uapi/linux/bpf.h |  8 ++++++++
 kernel/bpf/syscall.c     | 51 +++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2b672c50f160..33ccc474fb04 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -187,6 +187,8 @@ struct bpf_prog_aux {
 	struct bpf_map **used_maps;
 	struct bpf_prog *prog;
 	struct user_struct *user;
+	u64 load_time; /* ns since boottime */
+	u8 name[BPF_OBJ_NAME_LEN];
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e43491ac4823..bd6348269bf5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -175,6 +175,8 @@ enum bpf_attach_type {
 /* Specify numa node during map creation */
 #define BPF_F_NUMA_NODE		(1U << 2)
 
+#define BPF_OBJ_NAME_LEN 16U
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -210,6 +212,7 @@ union bpf_attr {
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
 		__u32		prog_flags;
+		__u8		prog_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -812,6 +815,11 @@ struct bpf_prog_info {
 	__u32 xlated_prog_len;
 	__aligned_u64 jited_prog_insns;
 	__aligned_u64 xlated_prog_insns;
+	__u64 load_time;	/* ns since boottime */
+	__u32 created_by_uid;
+	__u32 nr_map_ids;
+	__aligned_u64 map_ids;
+	__u8  name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 25d074920a00..45970df3f820 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -23,6 +23,9 @@
 #include <linux/version.h>
 #include <linux/kernel.h>
 #include <linux/idr.h>
+#include <linux/cred.h>
+#include <linux/timekeeping.h>
+#include <linux/ctype.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PROG_ARRAY || \
 			   (map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
@@ -312,6 +315,30 @@ int bpf_map_new_fd(struct bpf_map *map)
 		   offsetof(union bpf_attr, CMD##_LAST_FIELD) - \
 		   sizeof(attr->CMD##_LAST_FIELD)) != NULL
 
+/* dst and src must have at least BPF_OBJ_NAME_LEN number of bytes.
+ * Return 0 on success and < 0 on error.
+ */
+static int bpf_obj_name_cpy(char *dst, const char *src)
+{
+	const char *end = src + BPF_OBJ_NAME_LEN;
+
+	/* Copy all isalnum() and '_' char */
+	while (src < end && *src) {
+		if (!isalnum(*src) && *src != '_')
+			return -EINVAL;
+		*dst++ = *src++;
+	}
+
+	/* No '\0' found in BPF_OBJ_NAME_LEN number of bytes */
+	if (src == end)
+		return -EINVAL;
+
+	/* '\0' terminates dst */
+	*dst = 0;
+
+	return 0;
+}
+
 #define BPF_MAP_CREATE_LAST_FIELD numa_node
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
@@ -973,7 +1000,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type)
 EXPORT_SYMBOL_GPL(bpf_prog_get_type);
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD prog_flags
+#define	BPF_PROG_LOAD_LAST_FIELD prog_name
 
 static int bpf_prog_load(union bpf_attr *attr)
 {
@@ -1037,6 +1064,11 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err < 0)
 		goto free_prog;
 
+	prog->aux->load_time = ktime_get_boot_ns();
+	err = bpf_obj_name_cpy(prog->aux->name, attr->prog_name);
+	if (err)
+		goto free_prog;
+
 	/* run eBPF verifier */
 	err = bpf_check(&prog, attr);
 	if (err < 0)
@@ -1358,8 +1390,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 
 	info.type = prog->type;
 	info.id = prog->aux->id;
+	info.load_time = prog->aux->load_time;
+	info.created_by_uid = from_kuid_munged(current_user_ns(),
+					       prog->aux->user->uid);
 
 	memcpy(info.tag, prog->tag, sizeof(prog->tag));
+	memcpy(info.name, prog->aux->name, sizeof(prog->aux->name));
+
+	ulen = info.nr_map_ids;
+	info.nr_map_ids = prog->aux->used_map_cnt;
+	ulen = min_t(u32, info.nr_map_ids, ulen);
+	if (ulen) {
+		u32 *user_map_ids = (u32 *)info.map_ids;
+		u32 i;
+
+		for (i = 0; i < ulen; i++)
+			if (put_user(prog->aux->used_maps[i]->id,
+				     &user_map_ids[i]))
+				return -EFAULT;
+	}
 
 	if (!capable(CAP_SYS_ADMIN)) {
 		info.jited_prog_len = 0;
-- 
2.9.5

^ permalink raw reply related

* [PATCH net-next 2/5] bpf: Add map_name to bpf_map_info
From: Martin KaFai Lau @ 2017-09-27 21:37 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170927213756.1254938-1-kafai@fb.com>

This patch allows userspace to specify a name for a map
during BPF_MAP_CREATE.

The map's name can later be exported to user space
via BPF_OBJ_GET_INFO_BY_FD.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h      | 1 +
 include/uapi/linux/bpf.h | 2 ++
 kernel/bpf/syscall.c     | 7 ++++++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 33ccc474fb04..252f4bc9eb25 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -56,6 +56,7 @@ struct bpf_map {
 	struct work_struct work;
 	atomic_t usercnt;
 	struct bpf_map *inner_map_meta;
+	u8 name[BPF_OBJ_NAME_LEN];
 };
 
 /* function argument constraints */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bd6348269bf5..6d2137b4cf38 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -190,6 +190,7 @@ union bpf_attr {
 		__u32	numa_node;	/* numa node (effective only if
 					 * BPF_F_NUMA_NODE is set).
 					 */
+		__u8	map_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -829,6 +830,7 @@ struct bpf_map_info {
 	__u32 value_size;
 	__u32 max_entries;
 	__u32 map_flags;
+	__u8  name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 45970df3f820..11a7f82a55d1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -339,7 +339,7 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 	return 0;
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD numa_node
+#define BPF_MAP_CREATE_LAST_FIELD map_name
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
@@ -361,6 +361,10 @@ static int map_create(union bpf_attr *attr)
 	if (IS_ERR(map))
 		return PTR_ERR(map);
 
+	err = bpf_obj_name_cpy(map->name, attr->map_name);
+	if (err)
+		goto free_map_nouncharge;
+
 	atomic_set(&map->refcnt, 1);
 	atomic_set(&map->usercnt, 1);
 
@@ -1462,6 +1466,7 @@ static int bpf_map_get_info_by_fd(struct bpf_map *map,
 	info.value_size = map->value_size;
 	info.max_entries = map->max_entries;
 	info.map_flags = map->map_flags;
+	memcpy(info.name, map->name, sizeof(map->name));
 
 	if (copy_to_user(uinfo, &info, info_len) ||
 	    put_user(info_len, &uattr->info.info_len))
-- 
2.9.5

^ permalink raw reply related

* [PATCH net-next 5/5] bpf: Test new fields in bpf_attr and bpf_{prog,map}_info
From: Martin KaFai Lau @ 2017-09-27 21:37 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170927213756.1254938-1-kafai@fb.com>

This patch tests newly added fields of the bpf_attr,
bpf_prog_info and bpf_map_info.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/testing/selftests/bpf/test_progs.c | 143 ++++++++++++++++++++++++++++---
 1 file changed, 132 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 31ae27dc8d04..69427531408d 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -10,6 +10,7 @@
 #include <string.h>
 #include <assert.h>
 #include <stdlib.h>
+#include <time.h>
 
 #include <linux/types.h>
 typedef __u16 __sum16;
@@ -19,6 +20,8 @@ typedef __u16 __sum16;
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/tcp.h>
+#include <linux/filter.h>
+#include <linux/unistd.h>
 
 #include <sys/wait.h>
 #include <sys/resource.h>
@@ -273,16 +276,26 @@ static void test_bpf_obj_id(void)
 	const int nr_iters = 2;
 	const char *file = "./test_obj_id.o";
 	const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
+	const char *expected_prog_name = "test_obj_id";
+	const char *expected_map_name = "test_map_id";
+	const __u64 nsec_per_sec = 1000000000;
 
 	struct bpf_object *objs[nr_iters];
 	int prog_fds[nr_iters], map_fds[nr_iters];
 	/* +1 to test for the info_len returned by kernel */
 	struct bpf_prog_info prog_infos[nr_iters + 1];
 	struct bpf_map_info map_infos[nr_iters + 1];
+	/* Each prog only uses one map. +1 to test nr_map_ids
+	 * returned by kernel.
+	 */
+	__u32 map_ids[nr_iters + 1];
 	char jited_insns[128], xlated_insns[128], zeros[128];
 	__u32 i, next_id, info_len, nr_id_found, duration = 0;
+	struct timespec real_time_ts, boot_time_ts;
 	int sysctl_fd, jit_enabled = 0, err = 0;
 	__u64 array_value;
+	uid_t my_uid = getuid();
+	time_t now, load_time;
 
 	sysctl_fd = open(jit_sysctl, 0, O_RDONLY);
 	if (sysctl_fd != -1) {
@@ -307,6 +320,7 @@ static void test_bpf_obj_id(void)
 	/* Check bpf_obj_get_info_by_fd() */
 	bzero(zeros, sizeof(zeros));
 	for (i = 0; i < nr_iters; i++) {
+		now = time(NULL);
 		err = bpf_prog_load(file, BPF_PROG_TYPE_SOCKET_FILTER,
 				    &objs[i], &prog_fds[i]);
 		/* test_obj_id.o is a dumb prog. It should never fail
@@ -334,16 +348,18 @@ static void test_bpf_obj_id(void)
 			  map_infos[i].value_size != sizeof(__u64) ||
 			  map_infos[i].max_entries != 1 ||
 			  map_infos[i].map_flags != 0 ||
-			  info_len != sizeof(struct bpf_map_info),
+			  info_len != sizeof(struct bpf_map_info) ||
+			  strcmp((char *)map_infos[i].name, expected_map_name),
 			  "get-map-info(fd)",
-			  "err %d errno %d type %d(%d) info_len %u(%lu) key_size %u value_size %u max_entries %u map_flags %X\n",
+			  "err %d errno %d type %d(%d) info_len %u(%lu) key_size %u value_size %u max_entries %u map_flags %X name %s(%s)\n",
 			  err, errno,
 			  map_infos[i].type, BPF_MAP_TYPE_ARRAY,
 			  info_len, sizeof(struct bpf_map_info),
 			  map_infos[i].key_size,
 			  map_infos[i].value_size,
 			  map_infos[i].max_entries,
-			  map_infos[i].map_flags))
+			  map_infos[i].map_flags,
+			  map_infos[i].name, expected_map_name))
 			goto done;
 
 		/* Check getting prog info */
@@ -355,8 +371,16 @@ static void test_bpf_obj_id(void)
 		prog_infos[i].jited_prog_len = sizeof(jited_insns);
 		prog_infos[i].xlated_prog_insns = ptr_to_u64(xlated_insns);
 		prog_infos[i].xlated_prog_len = sizeof(xlated_insns);
+		prog_infos[i].map_ids = ptr_to_u64(map_ids + i);
+		prog_infos[i].nr_map_ids = 2;
+		err = clock_gettime(CLOCK_REALTIME, &real_time_ts);
+		assert(!err);
+		err = clock_gettime(CLOCK_BOOTTIME, &boot_time_ts);
+		assert(!err);
 		err = bpf_obj_get_info_by_fd(prog_fds[i], &prog_infos[i],
 					     &info_len);
+		load_time = (real_time_ts.tv_sec - boot_time_ts.tv_sec)
+			+ (prog_infos[i].load_time / nsec_per_sec);
 		if (CHECK(err ||
 			  prog_infos[i].type != BPF_PROG_TYPE_SOCKET_FILTER ||
 			  info_len != sizeof(struct bpf_prog_info) ||
@@ -364,9 +388,14 @@ static void test_bpf_obj_id(void)
 			  (jit_enabled &&
 			   !memcmp(jited_insns, zeros, sizeof(zeros))) ||
 			  !prog_infos[i].xlated_prog_len ||
-			  !memcmp(xlated_insns, zeros, sizeof(zeros)),
+			  !memcmp(xlated_insns, zeros, sizeof(zeros)) ||
+			  load_time < now - 60 || load_time > now + 60 ||
+			  prog_infos[i].created_by_uid != my_uid ||
+			  prog_infos[i].nr_map_ids != 1 ||
+			  *(int *)prog_infos[i].map_ids != map_infos[i].id ||
+			  strcmp((char *)prog_infos[i].name, expected_prog_name),
 			  "get-prog-info(fd)",
-			  "err %d errno %d i %d type %d(%d) info_len %u(%lu) jit_enabled %d jited_prog_len %u xlated_prog_len %u jited_prog %d xlated_prog %d\n",
+			  "err %d errno %d i %d type %d(%d) info_len %u(%lu) jit_enabled %d jited_prog_len %u xlated_prog_len %u jited_prog %d xlated_prog %d load_time %lu(%lu) uid %u(%u) nr_map_ids %u(%u) map_id %u(%u) name %s(%s)\n",
 			  err, errno, i,
 			  prog_infos[i].type, BPF_PROG_TYPE_SOCKET_FILTER,
 			  info_len, sizeof(struct bpf_prog_info),
@@ -374,9 +403,13 @@ static void test_bpf_obj_id(void)
 			  prog_infos[i].jited_prog_len,
 			  prog_infos[i].xlated_prog_len,
 			  !!memcmp(jited_insns, zeros, sizeof(zeros)),
-			  !!memcmp(xlated_insns, zeros, sizeof(zeros))))
+			  !!memcmp(xlated_insns, zeros, sizeof(zeros)),
+			  load_time, now,
+			  prog_infos[i].created_by_uid, my_uid,
+			  prog_infos[i].nr_map_ids, 1,
+			  *(int *)prog_infos[i].map_ids, map_infos[i].id,
+			  prog_infos[i].name, expected_prog_name))
 			goto done;
-
 	}
 
 	/* Check bpf_prog_get_next_id() */
@@ -384,6 +417,7 @@ static void test_bpf_obj_id(void)
 	next_id = 0;
 	while (!bpf_prog_get_next_id(next_id, &next_id)) {
 		struct bpf_prog_info prog_info = {};
+		__u32 saved_map_id;
 		int prog_fd;
 
 		info_len = sizeof(prog_info);
@@ -406,16 +440,33 @@ static void test_bpf_obj_id(void)
 
 		nr_id_found++;
 
+		/* Negative test:
+		 * prog_info.nr_map_ids = 1
+		 * prog_info.map_ids = NULL
+		 */
+		prog_info.nr_map_ids = 1;
+		err = bpf_obj_get_info_by_fd(prog_fd, &prog_info, &info_len);
+		if (CHECK(!err || errno != EFAULT,
+			  "get-prog-fd-bad-nr-map-ids", "err %d errno %d(%d)",
+			  err, errno, EFAULT))
+			break;
+		bzero(&prog_info, sizeof(prog_info));
+		info_len = sizeof(prog_info);
+
+		saved_map_id = *(int *)(prog_infos[i].map_ids);
+		prog_info.map_ids = prog_infos[i].map_ids;
+		prog_info.nr_map_ids = 2;
 		err = bpf_obj_get_info_by_fd(prog_fd, &prog_info, &info_len);
 		prog_infos[i].jited_prog_insns = 0;
 		prog_infos[i].xlated_prog_insns = 0;
 		CHECK(err || info_len != sizeof(struct bpf_prog_info) ||
-		      memcmp(&prog_info, &prog_infos[i], info_len),
+		      memcmp(&prog_info, &prog_infos[i], info_len) ||
+		      *(int *)prog_info.map_ids != saved_map_id,
 		      "get-prog-info(next_id->fd)",
-		      "err %d errno %d info_len %u(%lu) memcmp %d\n",
+		      "err %d errno %d info_len %u(%lu) memcmp %d map_id %u(%u)\n",
 		      err, errno, info_len, sizeof(struct bpf_prog_info),
-		      memcmp(&prog_info, &prog_infos[i], info_len));
-
+		      memcmp(&prog_info, &prog_infos[i], info_len),
+		      *(int *)prog_info.map_ids, saved_map_id);
 		close(prog_fd);
 	}
 	CHECK(nr_id_found != nr_iters,
@@ -497,6 +548,75 @@ static void test_pkt_md_access(void)
 	bpf_object__close(obj);
 }
 
+static void test_obj_name(void)
+{
+	struct {
+		const char *name;
+		int success;
+		int expected_errno;
+	} tests[] = {
+		{ "", 1, 0 },
+		{ "_123456789ABCDE", 1, 0 },
+		{ "_123456789ABCDEF", 0, EINVAL },
+		{ "_123456789ABCD\n", 0, EINVAL },
+	};
+	struct bpf_insn prog[] = {
+		BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	__u32 duration = 0;
+	int i;
+
+	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+		size_t name_len = strlen(tests[i].name) + 1;
+		union bpf_attr attr;
+		size_t ncopy;
+		int fd;
+
+		/* test different attr.prog_name during BPF_PROG_LOAD */
+		ncopy = name_len < sizeof(attr.prog_name) ?
+			name_len : sizeof(attr.prog_name);
+		bzero(&attr, sizeof(attr));
+		attr.prog_type = BPF_PROG_TYPE_SCHED_CLS;
+		attr.insn_cnt = 2;
+		attr.insns = ptr_to_u64(prog);
+		attr.license = ptr_to_u64("");
+		memcpy(attr.prog_name, tests[i].name, ncopy);
+
+		fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+		CHECK((tests[i].success && fd < 0) ||
+		      (!tests[i].success && fd != -1) ||
+		      (!tests[i].success && errno != tests[i].expected_errno),
+		      "check-bpf-prog-name",
+		      "fd %d(%d) errno %d(%d)\n",
+		       fd, tests[i].success, errno, tests[i].expected_errno);
+
+		if (fd != -1)
+			close(fd);
+
+		/* test different attr.map_name during BPF_MAP_CREATE */
+		ncopy = name_len < sizeof(attr.map_name) ?
+			name_len : sizeof(attr.map_name);
+		bzero(&attr, sizeof(attr));
+		attr.map_type = BPF_MAP_TYPE_ARRAY;
+		attr.key_size = 4;
+		attr.value_size = 4;
+		attr.max_entries = 1;
+		attr.map_flags = 0;
+		memcpy(attr.map_name, tests[i].name, ncopy);
+		fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
+		CHECK((tests[i].success && fd < 0) ||
+		      (!tests[i].success && fd != -1) ||
+		      (!tests[i].success && errno != tests[i].expected_errno),
+		      "check-bpf-map-name",
+		      "fd %d(%d) errno %d(%d)\n",
+		      fd, tests[i].success, errno, tests[i].expected_errno);
+
+		if (fd != -1)
+			close(fd);
+	}
+}
+
 int main(void)
 {
 	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
@@ -509,6 +629,7 @@ int main(void)
 	test_tcp_estats();
 	test_bpf_obj_id();
 	test_pkt_md_access();
+	test_obj_name();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
-- 
2.9.5

^ permalink raw reply related

* [PATCH net-next 3/5] bpf: libbpf: Provide basic API support to specify BPF obj name
From: Martin KaFai Lau @ 2017-09-27 21:37 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170927213756.1254938-1-kafai@fb.com>

This patch extends the libbpf to provide API support to
allow specifying BPF object name.

In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used.  Regarding section name, all maps are
under the same section named "maps".  Hence, section name
is not a good choice for map's name.  To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.

This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.

The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument.  For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.

In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code.  For bpf_prog, bpf_load.c does
not collect the function symbol name.  We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 samples/bpf/bpf_load.c                      |   2 +
 samples/bpf/map_perf_test_user.c            |   1 +
 tools/include/uapi/linux/bpf.h              |  10 +++
 tools/lib/bpf/bpf.c                         |  57 +++++++++++----
 tools/lib/bpf/bpf.h                         |  23 ++++--
 tools/lib/bpf/libbpf.c                      | 109 +++++++++++++++++++++-------
 tools/testing/selftests/bpf/test_verifier.c |   2 +-
 7 files changed, 157 insertions(+), 47 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 6aa50098dfb8..18b1c8dd0391 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -221,6 +221,7 @@ static int load_maps(struct bpf_map_data *maps, int nr_maps,
 			int inner_map_fd = map_fd[maps[i].def.inner_map_idx];
 
 			map_fd[i] = bpf_create_map_in_map_node(maps[i].def.type,
+							maps[i].name,
 							maps[i].def.key_size,
 							inner_map_fd,
 							maps[i].def.max_entries,
@@ -228,6 +229,7 @@ static int load_maps(struct bpf_map_data *maps, int nr_maps,
 							numa_node);
 		} else {
 			map_fd[i] = bpf_create_map_node(maps[i].def.type,
+							maps[i].name,
 							maps[i].def.key_size,
 							maps[i].def.value_size,
 							maps[i].def.max_entries,
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index a0310fc70057..519d9af4b04a 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -137,6 +137,7 @@ static void do_test_lru(enum test_type test, int cpu)
 
 			inner_lru_map_fds[cpu] =
 				bpf_create_map_node(BPF_MAP_TYPE_LRU_HASH,
+						    test_map_names[INNER_LRU_HASH_PREALLOC],
 						    sizeof(uint32_t),
 						    sizeof(long),
 						    inner_lru_hash_size, 0,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e43491ac4823..6d2137b4cf38 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -175,6 +175,8 @@ enum bpf_attach_type {
 /* Specify numa node during map creation */
 #define BPF_F_NUMA_NODE		(1U << 2)
 
+#define BPF_OBJ_NAME_LEN 16U
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -188,6 +190,7 @@ union bpf_attr {
 		__u32	numa_node;	/* numa node (effective only if
 					 * BPF_F_NUMA_NODE is set).
 					 */
+		__u8	map_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -210,6 +213,7 @@ union bpf_attr {
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
 		__u32		prog_flags;
+		__u8		prog_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -812,6 +816,11 @@ struct bpf_prog_info {
 	__u32 xlated_prog_len;
 	__aligned_u64 jited_prog_insns;
 	__aligned_u64 xlated_prog_insns;
+	__u64 load_time;	/* ns since boottime */
+	__u32 created_by_uid;
+	__u32 nr_map_ids;
+	__aligned_u64 map_ids;
+	__u8  name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
@@ -821,6 +830,7 @@ struct bpf_map_info {
 	__u32 value_size;
 	__u32 max_entries;
 	__u32 map_flags;
+	__u8  name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 1d6907d379c9..daf624e4c720 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -46,6 +46,8 @@
 # endif
 #endif
 
+#define min(x, y) ((x) < (y) ? (x) : (y))
+
 static inline __u64 ptr_to_u64(const void *ptr)
 {
 	return (__u64) (unsigned long) ptr;
@@ -57,10 +59,11 @@ static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
 	return syscall(__NR_bpf, cmd, attr, size);
 }
 
-int bpf_create_map_node(enum bpf_map_type map_type, int key_size,
-			int value_size, int max_entries, __u32 map_flags,
-			int node)
+int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
+			int key_size, int value_size, int max_entries,
+			__u32 map_flags, int node)
 {
+	__u32 name_len = name ? strlen(name) : 0;
 	union bpf_attr attr;
 
 	memset(&attr, '\0', sizeof(attr));
@@ -70,6 +73,8 @@ int bpf_create_map_node(enum bpf_map_type map_type, int key_size,
 	attr.value_size = value_size;
 	attr.max_entries = max_entries;
 	attr.map_flags = map_flags;
+	memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
+
 	if (node >= 0) {
 		attr.map_flags |= BPF_F_NUMA_NODE;
 		attr.numa_node = node;
@@ -81,14 +86,23 @@ int bpf_create_map_node(enum bpf_map_type map_type, int key_size,
 int bpf_create_map(enum bpf_map_type map_type, int key_size,
 		   int value_size, int max_entries, __u32 map_flags)
 {
-	return bpf_create_map_node(map_type, key_size, value_size,
+	return bpf_create_map_node(map_type, NULL, key_size, value_size,
 				   max_entries, map_flags, -1);
 }
 
-int bpf_create_map_in_map_node(enum bpf_map_type map_type, int key_size,
-			       int inner_map_fd, int max_entries,
+int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
+			int key_size, int value_size, int max_entries,
+			__u32 map_flags)
+{
+	return bpf_create_map_node(map_type, name, key_size, value_size,
+				   max_entries, map_flags, -1);
+}
+
+int bpf_create_map_in_map_node(enum bpf_map_type map_type, const char *name,
+			       int key_size, int inner_map_fd, int max_entries,
 			       __u32 map_flags, int node)
 {
+	__u32 name_len = name ? strlen(name) : 0;
 	union bpf_attr attr;
 
 	memset(&attr, '\0', sizeof(attr));
@@ -99,6 +113,8 @@ int bpf_create_map_in_map_node(enum bpf_map_type map_type, int key_size,
 	attr.inner_map_fd = inner_map_fd;
 	attr.max_entries = max_entries;
 	attr.map_flags = map_flags;
+	memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
+
 	if (node >= 0) {
 		attr.map_flags |= BPF_F_NUMA_NODE;
 		attr.numa_node = node;
@@ -107,19 +123,24 @@ int bpf_create_map_in_map_node(enum bpf_map_type map_type, int key_size,
 	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
 
-int bpf_create_map_in_map(enum bpf_map_type map_type, int key_size,
-			  int inner_map_fd, int max_entries, __u32 map_flags)
+int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
+			  int key_size, int inner_map_fd, int max_entries,
+			  __u32 map_flags)
 {
-	return bpf_create_map_in_map_node(map_type, key_size, inner_map_fd,
-					  max_entries, map_flags, -1);
+	return bpf_create_map_in_map_node(map_type, name, key_size,
+					  inner_map_fd, max_entries, map_flags,
+					  -1);
 }
 
-int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
-		     size_t insns_cnt, const char *license,
-		     __u32 kern_version, char *log_buf, size_t log_buf_sz)
+int bpf_load_program_name(enum bpf_prog_type type, const char *name,
+			  const struct bpf_insn *insns,
+			  size_t insns_cnt, const char *license,
+			  __u32 kern_version, char *log_buf,
+			  size_t log_buf_sz)
 {
 	int fd;
 	union bpf_attr attr;
+	__u32 name_len = name ? strlen(name) : 0;
 
 	bzero(&attr, sizeof(attr));
 	attr.prog_type = type;
@@ -130,6 +151,7 @@ int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 	attr.log_size = 0;
 	attr.log_level = 0;
 	attr.kern_version = kern_version;
+	memcpy(attr.prog_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
 
 	fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 	if (fd >= 0 || !log_buf || !log_buf_sz)
@@ -143,6 +165,15 @@ int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 	return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
 }
 
+int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
+		     size_t insns_cnt, const char *license,
+		     __u32 kern_version, char *log_buf,
+		     size_t log_buf_sz)
+{
+	return bpf_load_program_name(type, NULL, insns, insns_cnt, license,
+				     kern_version, log_buf, log_buf_sz);
+}
+
 int bpf_verify_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		       size_t insns_cnt, int strict_alignment,
 		       const char *license, __u32 kern_version,
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index b8ea5843c39e..118d00535a0d 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -24,19 +24,28 @@
 #include <linux/bpf.h>
 #include <stddef.h>
 
-int bpf_create_map_node(enum bpf_map_type map_type, int key_size,
-			int value_size, int max_entries, __u32 map_flags,
-			int node);
+int bpf_create_map_node(enum bpf_map_type map_type, const char *name,
+			int key_size, int value_size, int max_entries,
+			__u32 map_flags, int node);
+int bpf_create_map_name(enum bpf_map_type map_type, const char *name,
+			int key_size, int value_size, int max_entries,
+			__u32 map_flags);
 int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
 		   int max_entries, __u32 map_flags);
-int bpf_create_map_in_map_node(enum bpf_map_type map_type, int key_size,
-			       int inner_map_fd, int max_entries,
+int bpf_create_map_in_map_node(enum bpf_map_type map_type, const char *name,
+			       int key_size, int inner_map_fd, int max_entries,
 			       __u32 map_flags, int node);
-int bpf_create_map_in_map(enum bpf_map_type map_type, int key_size,
-			  int inner_map_fd, int max_entries, __u32 map_flags);
+int bpf_create_map_in_map(enum bpf_map_type map_type, const char *name,
+			  int key_size, int inner_map_fd, int max_entries,
+			  __u32 map_flags);
 
 /* Recommend log buffer size */
 #define BPF_LOG_BUF_SIZE 65536
+int bpf_load_program_name(enum bpf_prog_type type, const char *name,
+			  const struct bpf_insn *insns,
+			  size_t insns_cnt, const char *license,
+			  __u32 kern_version, char *log_buf,
+			  size_t log_buf_sz);
 int bpf_load_program(enum bpf_prog_type type, const struct bpf_insn *insns,
 		     size_t insns_cnt, const char *license,
 		     __u32 kern_version, char *log_buf,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 35f6dfcdc565..4f402dcdf372 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -171,6 +171,7 @@ int libbpf_strerror(int err, char *buf, size_t size)
 struct bpf_program {
 	/* Index in elf obj file, for relocation use. */
 	int idx;
+	char *name;
 	char *section_name;
 	struct bpf_insn *insns;
 	size_t insns_cnt;
@@ -283,6 +284,7 @@ static void bpf_program__exit(struct bpf_program *prog)
 	prog->clear_priv = NULL;
 
 	bpf_program__unload(prog);
+	zfree(&prog->name);
 	zfree(&prog->section_name);
 	zfree(&prog->insns);
 	zfree(&prog->reloc_desc);
@@ -293,26 +295,27 @@ static void bpf_program__exit(struct bpf_program *prog)
 }
 
 static int
-bpf_program__init(void *data, size_t size, char *name, int idx,
-		    struct bpf_program *prog)
+bpf_program__init(void *data, size_t size, char *section_name, int idx,
+		  struct bpf_program *prog)
 {
 	if (size < sizeof(struct bpf_insn)) {
-		pr_warning("corrupted section '%s'\n", name);
+		pr_warning("corrupted section '%s'\n", section_name);
 		return -EINVAL;
 	}
 
 	bzero(prog, sizeof(*prog));
 
-	prog->section_name = strdup(name);
+	prog->section_name = strdup(section_name);
 	if (!prog->section_name) {
-		pr_warning("failed to alloc name for prog %s\n",
-			   name);
+		pr_warning("failed to alloc name for prog under section %s\n",
+			   section_name);
 		goto errout;
 	}
 
 	prog->insns = malloc(size);
 	if (!prog->insns) {
-		pr_warning("failed to alloc insns for %s\n", name);
+		pr_warning("failed to alloc insns for prog under section %s\n",
+			   section_name);
 		goto errout;
 	}
 	prog->insns_cnt = size / sizeof(struct bpf_insn);
@@ -331,12 +334,12 @@ bpf_program__init(void *data, size_t size, char *name, int idx,
 
 static int
 bpf_object__add_program(struct bpf_object *obj, void *data, size_t size,
-			char *name, int idx)
+			char *section_name, int idx)
 {
 	struct bpf_program prog, *progs;
 	int nr_progs, err;
 
-	err = bpf_program__init(data, size, name, idx, &prog);
+	err = bpf_program__init(data, size, section_name, idx, &prog);
 	if (err)
 		return err;
 
@@ -350,8 +353,8 @@ bpf_object__add_program(struct bpf_object *obj, void *data, size_t size,
 		 * is still valid, so don't need special treat for
 		 * bpf_close_object().
 		 */
-		pr_warning("failed to alloc a new program '%s'\n",
-			   name);
+		pr_warning("failed to alloc a new program under section '%s'\n",
+			   section_name);
 		bpf_program__exit(&prog);
 		return -ENOMEM;
 	}
@@ -364,6 +367,54 @@ bpf_object__add_program(struct bpf_object *obj, void *data, size_t size,
 	return 0;
 }
 
+static int
+bpf_object__init_prog_names(struct bpf_object *obj)
+{
+	Elf_Data *symbols = obj->efile.symbols;
+	struct bpf_program *prog;
+	size_t pi, si;
+
+	for (pi = 0; pi < obj->nr_programs; pi++) {
+		char *name = NULL;
+
+		prog = &obj->programs[pi];
+
+		for (si = 0; si < symbols->d_size / sizeof(GElf_Sym) && !name;
+		     si++) {
+			GElf_Sym sym;
+
+			if (!gelf_getsym(symbols, si, &sym))
+				continue;
+			if (sym.st_shndx != prog->idx)
+				continue;
+
+			name = elf_strptr(obj->efile.elf,
+					  obj->efile.strtabidx,
+					  sym.st_name);
+			if (!name) {
+				pr_warning("failed to get sym name string for prog %s\n",
+					   prog->section_name);
+				return -LIBBPF_ERRNO__LIBELF;
+			}
+		}
+
+		if (!name) {
+			pr_warning("failed to find sym for prog %s\n",
+				   prog->section_name);
+			return -EINVAL;
+		}
+
+		prog->name = strdup(name);
+		if (!prog->name) {
+			pr_warning("failed to allocate memory for prog sym %s\n",
+				   name);
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
 static struct bpf_object *bpf_object__new(const char *path,
 					  void *obj_buf,
 					  size_t obj_buf_sz)
@@ -766,8 +817,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 		pr_warning("Corrupted ELF file: index of strtab invalid\n");
 		return LIBBPF_ERRNO__FORMAT;
 	}
-	if (obj->efile.maps_shndx >= 0)
+	if (obj->efile.maps_shndx >= 0) {
 		err = bpf_object__init_maps(obj);
+		if (err)
+			goto out;
+	}
+	err = bpf_object__init_prog_names(obj);
 out:
 	return err;
 }
@@ -870,11 +925,12 @@ bpf_object__create_maps(struct bpf_object *obj)
 		struct bpf_map_def *def = &obj->maps[i].def;
 		int *pfd = &obj->maps[i].fd;
 
-		*pfd = bpf_create_map(def->type,
-				      def->key_size,
-				      def->value_size,
-				      def->max_entries,
-				      0);
+		*pfd = bpf_create_map_name(def->type,
+					   obj->maps[i].name,
+					   def->key_size,
+					   def->value_size,
+					   def->max_entries,
+					   0);
 		if (*pfd < 0) {
 			size_t j;
 			int err = *pfd;
@@ -982,7 +1038,7 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
 }
 
 static int
-load_program(enum bpf_prog_type type, struct bpf_insn *insns,
+load_program(enum bpf_prog_type type, const char *name, struct bpf_insn *insns,
 	     int insns_cnt, char *license, u32 kern_version, int *pfd)
 {
 	int ret;
@@ -995,8 +1051,8 @@ load_program(enum bpf_prog_type type, struct bpf_insn *insns,
 	if (!log_buf)
 		pr_warning("Alloc log buffer for bpf loader error, continue without log\n");
 
-	ret = bpf_load_program(type, insns, insns_cnt, license,
-			       kern_version, log_buf, BPF_LOG_BUF_SIZE);
+	ret = bpf_load_program_name(type, name, insns, insns_cnt, license,
+				    kern_version, log_buf, BPF_LOG_BUF_SIZE);
 
 	if (ret >= 0) {
 		*pfd = ret;
@@ -1021,9 +1077,9 @@ load_program(enum bpf_prog_type type, struct bpf_insn *insns,
 		if (type != BPF_PROG_TYPE_KPROBE) {
 			int fd;
 
-			fd = bpf_load_program(BPF_PROG_TYPE_KPROBE, insns,
-					      insns_cnt, license, kern_version,
-					      NULL, 0);
+			fd = bpf_load_program_name(BPF_PROG_TYPE_KPROBE, name,
+						   insns, insns_cnt, license,
+						   kern_version, NULL, 0);
 			if (fd >= 0) {
 				close(fd);
 				ret = -LIBBPF_ERRNO__PROGTYPE;
@@ -1067,8 +1123,8 @@ bpf_program__load(struct bpf_program *prog,
 			pr_warning("Program '%s' is inconsistent: nr(%d) != 1\n",
 				   prog->section_name, prog->instances.nr);
 		}
-		err = load_program(prog->type, prog->insns, prog->insns_cnt,
-				   license, kern_version, &fd);
+		err = load_program(prog->type, prog->name, prog->insns,
+				   prog->insns_cnt, license, kern_version, &fd);
 		if (!err)
 			prog->instances.fds[0] = fd;
 		goto out;
@@ -1096,7 +1152,8 @@ bpf_program__load(struct bpf_program *prog,
 			continue;
 		}
 
-		err = load_program(prog->type, result.new_insn_ptr,
+		err = load_program(prog->type, prog->name,
+				   result.new_insn_ptr,
 				   result.new_insn_cnt,
 				   license, kern_version, &fd);
 
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index a0426147523d..290d5056c165 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6939,7 +6939,7 @@ static int create_map_in_map(void)
 		return inner_map_fd;
 	}
 
-	outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS,
+	outer_map_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS, NULL,
 					     sizeof(int), inner_map_fd, 1, 0);
 	if (outer_map_fd < 0)
 		printf("Failed to create array of maps '%s'!\n",
-- 
2.9.5

^ permalink raw reply related

* [PATCH] mkiss: remove redundant check on len being zero
From: Colin King @ 2017-09-27 21:45 UTC (permalink / raw)
  To: David S . Miller, Stephen Hemminger, Johannes Berg, Ralf Baechle,
	netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The check on len is redundant as it is always greater than 1,
so just remove it and make the printk less complex.

Detected by CoverityScan, CID#1226729 ("Logically dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/hamradio/mkiss.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c
index aec6c26563cf..54bf8e6e4a09 100644
--- a/drivers/net/hamradio/mkiss.c
+++ b/drivers/net/hamradio/mkiss.c
@@ -477,7 +477,8 @@ static void ax_encaps(struct net_device *dev, unsigned char *icp, int len)
 				  cmd = 0;
 				}
 				ax->crcauto = (cmd ? 0 : 1);
-				printk(KERN_INFO "mkiss: %s: crc mode %s %d\n", ax->dev->name, (len) ? "set to" : "is", cmd);
+				printk(KERN_INFO "mkiss: %s: crc mode set to %d\n",
+				       ax->dev->name, cmd);
 			}
 			spin_unlock_bh(&ax->buflock);
 			netif_start_queue(dev);
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next] libbpf: use map_flags when creating maps
From: Daniel Borkmann @ 2017-09-27 22:03 UTC (permalink / raw)
  To: Alexei Starovoitov, Craig Gallek, David S . Miller; +Cc: Chonggang Li, netdev
In-Reply-To: <f7d4addc-9f6a-4ddb-b383-f255cd2f728f@fb.com>

On 09/27/2017 06:29 PM, Alexei Starovoitov wrote:
> On 9/27/17 7:04 AM, Craig Gallek wrote:
>> From: Craig Gallek <kraig@google.com>
>>
>> This extends struct bpf_map_def to include a flags field.  Note that
>> this has the potential to break the validation logic in
>> bpf_object__validate_maps and bpf_object__init_maps as they use
>> sizeof(struct bpf_map_def) as a minimal allowable size of a map section.
>> Any bpf program compiled with a smaller struct bpf_map_def will fail this
>> check.
>>
>> I don't believe this will be an issue in practice as both compile-time
>> definitions of struct bpf_map_def (in samples/bpf/bpf_load.h and
>> tools/testing/selftests/bpf/bpf_helpers.h) have always been larger
>> than this newly updated version in libbpf.h.
>>
>> Signed-off-by: Craig Gallek <kraig@google.com>
>> ---
>>  tools/lib/bpf/libbpf.c | 2 +-
>>  tools/lib/bpf/libbpf.h | 1 +
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 35f6dfcdc565..6bea85f260a3 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -874,7 +874,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>>                        def->key_size,
>>                        def->value_size,
>>                        def->max_entries,
>> -                      0);
>> +                      def->map_flags);
>>          if (*pfd < 0) {
>>              size_t j;
>>              int err = *pfd;
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index 7959086eb9c9..6e20003109e0 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -207,6 +207,7 @@ struct bpf_map_def {
>>      unsigned int key_size;
>>      unsigned int value_size;
>>      unsigned int max_entries;
>> +    unsigned int map_flags;
>>  };
>
> yes it will break loading of pre-compiled .o
> Instead of breaking, let's fix the loader to do it the way
> samples/bpf/bpf_load.c does.
> See commit 156450d9d964 ("samples/bpf: make bpf_load.c code compatible with ELF maps section changes")

+1, iproute2 loader also does map spec fixup

For libbpf it would be good also such that it reduces the diff
further between the libbpf and bpf_load so that it allows move
to libbpf for samples in future.

^ permalink raw reply

* Re: [PATCH net-next 0/3] support changing steering policies in tuntap
From: Michael S. Tsirkin @ 2017-09-27 22:13 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1506500637-13881-1-git-send-email-jasowang@redhat.com>

On Wed, Sep 27, 2017 at 04:23:54PM +0800, Jason Wang wrote:
> Hi all:
> 
> We use flow caches based flow steering policy now. This is good for
> connection-oriented communication such as TCP but not for the others
> e.g connectionless unidirectional workload which cares only about
> pps. This calls the ability of supporting changing steering policies
> in tuntap which was done by this series.
> 
> Flow steering policy was abstracted into tun_steering_ops in the first
> patch. Then new ioctls to set or query current policy were introduced,
> and the last patch introduces a very simple policy that select txq
> based on processor id as an example.
> 
> Test was done by using xdp_redirect to redirect traffic generated from
> MoonGen that was running on a remote machine. And I see 37%
> improvement for processor id policy compared to automatic flow
> steering policy.

For sure, if you don't need to figure out the flow hash then you can
save a bunch of cycles.  But I don't think the cpu policy is too
practical outside of a benchmark.

Did you generate packets and just send them to tun? If so, this is not a
typical configuration, is it? With packets coming e.g.  from a real nic
they might already have the hash pre-calculated, and you won't
see the benefit.

> In the future, both simple and sophisticated policy like RSS or other guest
> driven steering policies could be done on top.

IMHO there should be a more practical example before adding all this
indirection. And it would be nice to understand why this queue selection
needs to be tun specific.

> Thanks
> 
> Jason Wang (3):
>   tun: abstract flow steering logic
>   tun: introduce ioctls to set and get steering policies
>   tun: introduce cpu id based steering policy
> 
>  drivers/net/tun.c           | 151 +++++++++++++++++++++++++++++++++++++-------
>  include/uapi/linux/if_tun.h |   8 +++
>  2 files changed, 136 insertions(+), 23 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH net-next RFC 5/5] vhost_net: basic tx virtqueue batched processing
From: Michael S. Tsirkin @ 2017-09-27 22:19 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, linux-kernel, kvm
In-Reply-To: <16ea7512-d770-21ef-edb6-3ada51f08592@redhat.com>

On Wed, Sep 27, 2017 at 10:04:18AM +0800, Jason Wang wrote:
> 
> 
> On 2017年09月27日 03:25, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2017 at 04:02:35PM +0800, Jason Wang wrote:
> > > This patch implements basic batched processing of tx virtqueue by
> > > prefetching desc indices and updating used ring in a batch. For
> > > non-zerocopy case, vq->heads were used for storing the prefetched
> > > indices and updating used ring. It is also a requirement for doing
> > > more batching on top. For zerocopy case and for simplicity, batched
> > > processing were simply disabled by only fetching and processing one
> > > descriptor at a time, this could be optimized in the future.
> > > 
> > > XDP_DROP (without touching skb) on tun (with Moongen in guest) with
> > > zercopy disabled:
> > > 
> > > Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz:
> > > Before: 3.20Mpps
> > > After:  3.90Mpps (+22%)
> > > 
> > > No differences were seen with zerocopy enabled.
> > > 
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > So where is the speedup coming from? I'd guess the ring is
> > hot in cache, it's faster to access it in one go, then
> > pass many packets to net stack. Is that right?
> > 
> > Another possibility is better code cache locality.
> 
> Yes, I think the speed up comes from:
> 
> - less cache misses
> - less cache line bounce when virtqueue is about to be full (guest is faster
> than host which is the case of MoonGen)
> - less memory barriers
> - possible faster copy speed by using copy_to_user() on modern CPUs
> 
> > 
> > So how about this patchset is refactored:
> > 
> > 1. use existing APIs just first get packets then
> >     transmit them all then use them all
> 
> Looks like current API can not get packets first, it only support get packet
> one by one (if you mean vhost_get_vq_desc()). And used ring updating may get
> more misses in this case.

Right. So if you do

for (...)
	vhost_get_vq_desc


then later

for (...)
	vhost_add_used


then you get most of benefits except maybe code cache misses
and copy_to_user.







> > 2. add new APIs and move the loop into vhost core
> >     for more speedups
> 
> I don't see any advantages, looks like just need some e.g callbacks in this
> case.
> 
> Thanks

IUC callbacks pretty much destroy the code cache locality advantages,
IP is jumping around too much.


-- 
MST

^ permalink raw reply

* Re: [PATCH net-next RFC 0/5] batched tx processing in vhost_net
From: Michael S. Tsirkin @ 2017-09-27 22:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <afb7cad9-d760-b4d7-ecc5-518442e061b1@redhat.com>

On Wed, Sep 27, 2017 at 08:27:37AM +0800, Jason Wang wrote:
> 
> 
> On 2017年09月26日 21:45, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2017 at 04:02:30PM +0800, Jason Wang wrote:
> > > Hi:
> > > 
> > > This series tries to implement basic tx batched processing. This is
> > > done by prefetching descriptor indices and update used ring in a
> > > batch. This intends to speed up used ring updating and improve the
> > > cache utilization.
> > Interesting, thanks for the patches. So IIUC most of the gain is really
> > overcoming some of the shortcomings of virtio 1.0 wrt cache utilization?
> 
> Yes.
> 
> Actually, looks like batching in 1.1 is not as easy as in 1.0.
> 
> In 1.0, we could do something like:
> 
> batch update used ring by user copy_to_user()
> smp_wmb()
> update used_idx
> In 1.1, we need more memory barriers, can't benefit from fast copy helpers?
> 
> for () {
>     update desc.addr
>     smp_wmb()
>     update desc.flag
> }

Yes but smp_wmb is a NOP on e.g. x86. We can switch to other types of
barriers as well.  We do need to do the updates in order, so we might
need new APIs for that to avoid re-doing the translation all the time.

In 1.0 the last update is a cache miss always. You need batching to get
less misses. In 1.1 you don't have it so fundamentally there is less
need for batching. But batching does not always work.  DPDK guys (which
batch things aggressively) already tried 1.1 and saw performance gains
so we do not need to argue theoretically.



> > 
> > Which is fair enough (1.0 is already deployed) but I would like to avoid
> > making 1.1 support harder, and this patchset does this unfortunately,
> 
> I think the new APIs do not expose more internal data structure of virtio
> than before? (vq->heads has already been used by vhost_net for years).

For sure we might need to change vring_used_elem.

> Consider the layout is re-designed completely, I don't see an easy method to
> reuse current 1.0 API for 1.1.

Current API just says you get buffers then you use them. It is not tied
to actual separate used ring.


> > see comments on individual patches. I'm sure it can be addressed though.
> > 
> > > Test shows about ~22% improvement in tx pss.
> > Is this with or without tx napi in guest?
> 
> MoonGen is used in guest for better numbers.
> 
> Thanks

Not sure I understand. Did you set napi_tx to true or false?

> > 
> > > Please review.
> > > 
> > > Jason Wang (5):
> > >    vhost: split out ring head fetching logic
> > >    vhost: introduce helper to prefetch desc index
> > >    vhost: introduce vhost_add_used_idx()
> > >    vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH
> > >    vhost_net: basic tx virtqueue batched processing
> > > 
> > >   drivers/vhost/net.c   | 221 ++++++++++++++++++++++++++++----------------------
> > >   drivers/vhost/vhost.c | 165 +++++++++++++++++++++++++++++++------
> > >   drivers/vhost/vhost.h |   9 ++
> > >   3 files changed, 270 insertions(+), 125 deletions(-)
> > > 
> > > -- 
> > > 2.7.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* [PATCH v2] netlink: do not proceed if dump's start() errs
From: Jason A. Donenfeld @ 2017-09-27 22:41 UTC (permalink / raw)
  To: davem, johannes.berg, netdev, linux-kernel
  Cc: Jason A. Donenfeld, Johannes Berg
In-Reply-To: <CAHmME9oA+dE3eOCTWweJjZGL4sog30JShoCJoWaPn4JV7h3auA@mail.gmail.com>

Drivers that use the start method for netlink dumping rely on dumpit not
being called if start fails. For example, ila_xlat.c allocates memory
and assigns it to cb->args[0] in its start() function. It might fail to
do that and return -ENOMEM instead. However, even when returning an
error, dumpit will be called, which, in the example above, quickly
dereferences the memory in cb->args[0], which will OOPS the kernel. This
is but one example of how this goes wrong.

Since start() has always been a function with an int return type, it
therefore makes sense to use it properly, rather than ignoring it. This
patch thus returns early and does not call dumpit() when start() fails.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
---
 net/netlink/af_netlink.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 327807731b44..94c11cf0459d 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2270,10 +2270,13 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 
 	mutex_unlock(nlk->cb_mutex);
 
+	ret = 0;
 	if (cb->start)
-		cb->start(cb);
+		ret = cb->start(cb);
+
+	if (!ret)
+		ret = netlink_dump(sk);
 
-	ret = netlink_dump(sk);
 	sock_put(sk);
 
 	if (ret)
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next RFC 2/5] vhost: introduce helper to prefetch desc index
From: Michael S. Tsirkin @ 2017-09-27 22:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <17e9c3a9-7759-a674-bc00-414eabfed118@redhat.com>

On Wed, Sep 27, 2017 at 08:35:47AM +0800, Jason Wang wrote:
> 
> 
> On 2017年09月27日 03:19, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2017 at 04:02:32PM +0800, Jason Wang wrote:
> > > This patch introduces vhost_prefetch_desc_indices() which could batch
> > > descriptor indices fetching and used ring updating. This intends to
> > > reduce the cache misses of indices fetching and updating and reduce
> > > cache line bounce when virtqueue is almost full. copy_to_user() was
> > > used in order to benefit from modern cpus that support fast string
> > > copy. Batched virtqueue processing will be the first user.
> > > 
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >   drivers/vhost/vhost.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >   drivers/vhost/vhost.h |  3 +++
> > >   2 files changed, 58 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index f87ec75..8424166d 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -2437,6 +2437,61 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev,
> > >   }
> > >   EXPORT_SYMBOL_GPL(vhost_dequeue_msg);
> > > +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
> > > +				struct vring_used_elem *heads,
> > > +				u16 num, bool used_update)
> > why do you need to combine used update with prefetch?
> 
> For better performance


Why is sticking a branch in there better than requesting the update
conditionally from the caller?



> and I believe we don't care about the overhead when
> we meet errors in tx.

That's a separate question, I do not really understand how
you can fetch a descriptor and update the used ring at the same
time. This allows the guest to overwrite the buffer.
I might be misunderstanding what is going on here though.


> > 
> > > +{
> > > +	int ret, ret2;
> > > +	u16 last_avail_idx, last_used_idx, total, copied;
> > > +	__virtio16 avail_idx;
> > > +	struct vring_used_elem __user *used;
> > > +	int i;
> > > +
> > > +	if (unlikely(vhost_get_avail(vq, avail_idx, &vq->avail->idx))) {
> > > +		vq_err(vq, "Failed to access avail idx at %p\n",
> > > +		       &vq->avail->idx);
> > > +		return -EFAULT;
> > > +	}
> > > +	last_avail_idx = vq->last_avail_idx & (vq->num - 1);
> > > +	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> > > +	total = vq->avail_idx - vq->last_avail_idx;
> > > +	ret = total = min(total, num);
> > > +
> > > +	for (i = 0; i < ret; i++) {
> > > +		ret2 = vhost_get_avail(vq, heads[i].id,
> > > +				      &vq->avail->ring[last_avail_idx]);
> > > +		if (unlikely(ret2)) {
> > > +			vq_err(vq, "Failed to get descriptors\n");
> > > +			return -EFAULT;
> > > +		}
> > > +		last_avail_idx = (last_avail_idx + 1) & (vq->num - 1);
> > > +	}
> > > +
> > > +	if (!used_update)
> > > +		return ret;
> > > +
> > > +	last_used_idx = vq->last_used_idx & (vq->num - 1);
> > > +	while (total) {
> > > +		copied = min((u16)(vq->num - last_used_idx), total);
> > > +		ret2 = vhost_copy_to_user(vq,
> > > +					  &vq->used->ring[last_used_idx],
> > > +					  &heads[ret - total],
> > > +					  copied * sizeof(*used));
> > > +
> > > +		if (unlikely(ret2)) {
> > > +			vq_err(vq, "Failed to update used ring!\n");
> > > +			return -EFAULT;
> > > +		}
> > > +
> > > +		last_used_idx = 0;
> > > +		total -= copied;
> > > +	}
> > > +
> > > +	/* Only get avail ring entries after they have been exposed by guest. */
> > > +	smp_rmb();
> > Barrier before return is a very confusing API. I guess it's designed to
> > be used in a specific way to make it necessary - but what is it?
> 
> Looks like a and we need do this after reading avail_idx.
> 
> Thanks
> 
> > 
> > 
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL(vhost_prefetch_desc_indices);
> > >   static int __init vhost_init(void)
> > >   {
> > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > > index 39ff897..16c2cb6 100644
> > > --- a/drivers/vhost/vhost.h
> > > +++ b/drivers/vhost/vhost.h
> > > @@ -228,6 +228,9 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to,
> > >   ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
> > >   			     struct iov_iter *from);
> > >   int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
> > > +int vhost_prefetch_desc_indices(struct vhost_virtqueue *vq,
> > > +				struct vring_used_elem *heads,
> > > +				u16 num, bool used_update);
> > >   #define vq_err(vq, fmt, ...) do {                                  \
> > >   		pr_debug(pr_fmt(fmt), ##__VA_ARGS__);       \
> > > -- 
> > > 2.7.4
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next RFC 3/5] vhost: introduce vhost_add_used_idx()
From: Michael S. Tsirkin @ 2017-09-27 22:58 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, linux-kernel, kvm
In-Reply-To: <c42b7244-6d43-2ce5-46a3-ea70ec9d957f@redhat.com>

On Wed, Sep 27, 2017 at 08:38:24AM +0800, Jason Wang wrote:
> 
> 
> On 2017年09月27日 03:13, Michael S. Tsirkin wrote:
> > On Fri, Sep 22, 2017 at 04:02:33PM +0800, Jason Wang wrote:
> > > This patch introduces a helper which just increase the used idx. This
> > > will be used in pair with vhost_prefetch_desc_indices() by batching
> > > code.
> > > 
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >   drivers/vhost/vhost.c | 33 +++++++++++++++++++++++++++++++++
> > >   drivers/vhost/vhost.h |  1 +
> > >   2 files changed, 34 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 8424166d..6532cda 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -2178,6 +2178,39 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
> > >   }
> > >   EXPORT_SYMBOL_GPL(vhost_add_used);
> > > +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n)
> > > +{
> > > +	u16 old, new;
> > > +
> > > +	old = vq->last_used_idx;
> > > +	new = (vq->last_used_idx += n);
> > > +	/* If the driver never bothers to signal in a very long while,
> > > +	 * used index might wrap around. If that happens, invalidate
> > > +	 * signalled_used index we stored. TODO: make sure driver
> > > +	 * signals at least once in 2^16 and remove this.
> > > +	 */
> > > +	if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old)))
> > > +		vq->signalled_used_valid = false;
> > > +
> > > +	/* Make sure buffer is written before we update index. */
> > > +	smp_wmb();
> > > +	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
> > > +			   &vq->used->idx)) {
> > > +		vq_err(vq, "Failed to increment used idx");
> > > +		return -EFAULT;
> > > +	}
> > > +	if (unlikely(vq->log_used)) {
> > > +		/* Log used index update. */
> > > +		log_write(vq->log_base,
> > > +			  vq->log_addr + offsetof(struct vring_used, idx),
> > > +			  sizeof(vq->used->idx));
> > > +		if (vq->log_ctx)
> > > +			eventfd_signal(vq->log_ctx, 1);
> > > +	}
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vhost_add_used_idx);
> > > +
> > >   static int __vhost_add_used_n(struct vhost_virtqueue *vq,
> > >   			    struct vring_used_elem *heads,
> > >   			    unsigned count)
> > > diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> > > index 16c2cb6..5dd6c05 100644
> > > --- a/drivers/vhost/vhost.h
> > > +++ b/drivers/vhost/vhost.h
> > > @@ -199,6 +199,7 @@ int __vhost_get_vq_desc(struct vhost_virtqueue *vq,
> > >   void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
> > >   int vhost_vq_init_access(struct vhost_virtqueue *);
> > > +int vhost_add_used_idx(struct vhost_virtqueue *vq, int n);
> > >   int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
> > >   int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
> > >   		     unsigned count);
> > Please change the API to hide the fact that there's an index that needs
> > to be updated.
> 
> In fact, an interesting optimization on top is just call
> vhost_add_used_idx(vq, n) instead of n vhost_add_used_idx(vq, 1). That's the
> reason I leave n in the API.
> 
> Thanks

Right but you could increment some internal counter in the vq
structure then update the used index using some api
with a generic name, e.g.  add_used_complete or something like this.

> > 
> > > -- 
> > > 2.7.4

^ permalink raw reply

* Re: [next-queue PATCH 2/3] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Jesus Sanchez-Palencia @ 2017-09-27 22:57 UTC (permalink / raw)
  To: Vinicius Costa Gomes, Cong Wang
  Cc: Linux Kernel Network Developers, intel-wired-lan,
	Jamal Hadi Salim, Jiri Pirko, andre.guedes, ivan.briano,
	boon.leong.ong, richardcochran, henrik
In-Reply-To: <87lgkzg7xv.fsf@intel.com>

Hi,


On 09/27/2017 02:14 PM, Vinicius Costa Gomes wrote:
> Hi,
> 
> Cong Wang <xiyou.wangcong@gmail.com> writes:
> 
>> On Tue, Sep 26, 2017 at 4:39 PM, Vinicius Costa Gomes
>> <vinicius.gomes@intel.com> wrote:
>>> +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
>>> +{
>>> +       struct cbs_sched_data *q = qdisc_priv(sch);
>>> +       struct net_device *dev = qdisc_dev(sch);
>>> +
>>> +       if (!opt)
>>> +               return -EINVAL;
>>> +
>>> +       /* FIXME: this means that we can only install this qdisc
>>> +        * "under" mqprio. Do we need a more generic way to retrieve
>>> +        * the queue, or do we pass the netdev_queue to the driver?
>>> +        */
>>> +       q->queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
>>> +
>>> +       return cbs_change(sch, opt);
>>> +}
>>
>> Yeah it is ugly to assume its parent is mqprio, at least you should
>> error out if it is not the case.
> 
> Will add an error for this, for now.
> 
>>
>> I am not sure how we can solve this elegantly, perhaps you should
>> extend mqprio rather than add a new one?
> 
> Is the alternative hinted in the FIXME worse? Instead of passing the
> index of the hardware queue to the driver we pass the pointer to a
> netdev_queue to the driver and it "discovers" the HW queue from that.

What if we keep passing the index, but calculate it from the netdev_queue
pointer instead?

i.e.:  q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);

At least it wouldn't rely on the root qdisc being of any specific type.

Regards,
Jesus

^ permalink raw reply

* Re: [PATCH v4 net-next 00/12] gtp: Additional feature support - Part I
From: Tom Herbert @ 2017-09-27 23:09 UTC (permalink / raw)
  To: Harald Welte
  Cc: David S . Miller, Pablo Neira Ayuso, Andreas Schultz,
	Linux Kernel Network Developers, Rohit LastName
In-Reply-To: <20170927122431.h26ey3h2pzrgay7a@nataraja>

On Wed, Sep 27, 2017 at 5:24 AM, Harald Welte <laforge@gnumonks.org> wrote:
> Hi Tom,
>
> thanks for your updated series!
>
> I'll revie as soon as I a, but that may likely not be before next
> week.  As indicated before, I'm on a motorbike roadtrip on vacation,
> with very limited connectivity and even more limited time for any
> technical work.  Thanks for your understanding.
>
Harald,

No problem. Enjoy your vacation!

Tom

> --
> - Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
> ============================================================================
> "Privacy in residential applications is a desirable marketing option."
>                                                   (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: [next-queue PATCH 2/3] net/sched: Introduce Credit Based Shaper (CBS) qdisc
From: Guedes, Andre @ 2017-09-27 23:11 UTC (permalink / raw)
  To: Sanchez-Palencia, Jesus, Gomes, Vinicius,
	xiyou.wangcong@gmail.com
  Cc: jiri@resnulli.us, jhs@mojatatu.com, Ong, Boon Leong,
	richardcochran@gmail.com, netdev@vger.kernel.org,
	henrik@austad.us, Briano, Ivan, intel-wired-lan@lists.osuosl.org
In-Reply-To: <4b0e1610-c9df-191d-496e-4be04c785d2f@intel.com>

[-- Attachment #1: Type: text/plain, Size: 2074 bytes --]

On Wed, 2017-09-27 at 15:57 -0700, Jesus Sanchez-Palencia wrote:
> Hi,
> 
> 
> On 09/27/2017 02:14 PM, Vinicius Costa Gomes wrote:
> > Hi,
> > 
> > Cong Wang <xiyou.wangcong@gmail.com> writes:
> > 
> > > On Tue, Sep 26, 2017 at 4:39 PM, Vinicius Costa Gomes
> > > <vinicius.gomes@intel.com> wrote:
> > > > +static int cbs_init(struct Qdisc *sch, struct nlattr *opt)
> > > > +{
> > > > +       struct cbs_sched_data *q = qdisc_priv(sch);
> > > > +       struct net_device *dev = qdisc_dev(sch);
> > > > +
> > > > +       if (!opt)
> > > > +               return -EINVAL;
> > > > +
> > > > +       /* FIXME: this means that we can only install this qdisc
> > > > +        * "under" mqprio. Do we need a more generic way to retrieve
> > > > +        * the queue, or do we pass the netdev_queue to the driver?
> > > > +        */
> > > > +       q->queue = TC_H_MIN(sch->parent) - 1 - netdev_get_num_tc(dev);
> > > > +
> > > > +       return cbs_change(sch, opt);
> > > > +}
> > > 
> > > Yeah it is ugly to assume its parent is mqprio, at least you should
> > > error out if it is not the case.
> > 
> > Will add an error for this, for now.
> > 
> > > 
> > > I am not sure how we can solve this elegantly, perhaps you should
> > > extend mqprio rather than add a new one?
> > 
> > Is the alternative hinted in the FIXME worse? Instead of passing the
> > index of the hardware queue to the driver we pass the pointer to a
> > netdev_queue to the driver and it "discovers" the HW queue from that.

I don't see why we should move the queue index "discovery" into the driver. The
driver layer should be dead simple and getting the queue index from the upper
layer (qdisc) looks right to me.

> What if we keep passing the index, but calculate it from the netdev_queue
> pointer instead?
> 
> i.e.:  q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
> 
> At least it wouldn't rely on the root qdisc being of any specific type.

+1

- Andre

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3262 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 0/3] support changing steering policies in tuntap
From: Willem de Bruijn @ 2017-09-27 23:25 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Jason Wang, Network Development, LKML
In-Reply-To: <20170927230042-mutt-send-email-mst@kernel.org>

>> In the future, both simple and sophisticated policy like RSS or other guest
>> driven steering policies could be done on top.
>
> IMHO there should be a more practical example before adding all this
> indirection. And it would be nice to understand why this queue selection
> needs to be tun specific.

I was thinking the same and this reminds me of the various strategies
implemented in packet fanout. tun_cpu_select_queue is analogous to
fanout_demux_cpu though it is tun-specific in that it requires tun->numqueues.

Fanout accrued various strategies until it gained an eBPF variant. Just
supporting BPF is probably sufficient here, too.

^ permalink raw reply

* Re: [PATCH V3] r8152: add Linksys USB3GIGV1 id
From: Doug Anderson @ 2017-09-27 23:47 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Hayes Wang, Oliver Neukum, linux-usb, David S . Miller, LKML,
	netdev
In-Reply-To: <20170927172802.80654-1-grundler@chromium.org>

Hi,

On Wed, Sep 27, 2017 at 10:28 AM, Grant Grundler <grundler@chromium.org> wrote:
> This linksys dongle by default comes up in cdc_ether mode.
> This patch allows r8152 to claim the device:
>    Bus 002 Device 002: ID 13b1:0041 Linksys
>
> Signed-off-by: Grant Grundler <grundler@chromium.org>
> ---
>  drivers/net/usb/cdc_ether.c | 10 ++++++++++
>  drivers/net/usb/r8152.c     |  2 ++
>  2 files changed, 12 insertions(+)
>
> V3: for backwards compat, add #ifdef CONFIG_USB_RTL8152 around
>     the cdc_ether blacklist entry so the cdc_ether driver can
>     still claim the device if r8152 driver isn't configured.
>
> V2: add LINKSYS_VENDOR_ID to cdc_ether blacklist
>
> diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
> index 8ab281b478f2..446dcc0f1f70 100644
> --- a/drivers/net/usb/cdc_ether.c
> +++ b/drivers/net/usb/cdc_ether.c
> @@ -546,6 +546,7 @@ static const struct driver_info wwan_info = {
>  #define DELL_VENDOR_ID         0x413C
>  #define REALTEK_VENDOR_ID      0x0bda
>  #define SAMSUNG_VENDOR_ID      0x04e8
> +#define LINKSYS_VENDOR_ID      0x13b1
>  #define LENOVO_VENDOR_ID       0x17ef

Slight nit that "LI" sorts after "LE".  You got it right in the other case...


>  #define NVIDIA_VENDOR_ID       0x0955
>  #define HP_VENDOR_ID           0x03f0
> @@ -737,6 +738,15 @@ static const struct usb_device_id  products[] = {
>         .driver_info = 0,
>  },
>
> +#ifdef CONFIG_USB_RTL8152
> +/* Linksys USB3GIGV1 Ethernet Adapter */
> +{
> +       USB_DEVICE_AND_INTERFACE_INFO(LINKSYS_VENDOR_ID, 0x0041, USB_CLASS_COMM,
> +                       USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
> +       .driver_info = 0,
> +},
> +#endif

I believe you want to use IS_ENABLED(), don't you?

There's still a weird esoteric side case where kernel modules don't
all need to be included in the filesystem just because they were built
at the same time.  ...but IMHO that seems like enough of a nit that we
can probably ignore it unless someone has a better idea.


-Doug

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox