Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] can: c_can: Speed up rx_poll function
From: Joe Perches @ 2013-10-28 17:12 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
	linux-kernel, kernel
In-Reply-To: <1382979582-10352-1-git-send-email-mpa@pengutronix.de>

On Mon, 2013-10-28 at 17:59 +0100, Markus Pargmann wrote:
> This patch speeds up the rx_poll function by reducing the number of
> register reads.

trivial notes:

> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
[]
> @@ -259,6 +259,12 @@ static u32 c_can_read_reg32(struct c_can_priv *priv, enum reg index)
> +u16 c_can_read_reg16(struct c_can_priv *priv, enum reg index)
> +{
> +	u16 val = priv->read_reg(priv, index);
> +	return val;
> +}

This function doesn't seem useful at all.
It's not exported and it's not static.

Why not use an in-place priv->read_reg(priv, index)?

> +
>  static void c_can_enable_all_interrupts(struct c_can_priv *priv,
>  						int enable)
>  {
> @@ -798,17 +804,21 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
>  	u32 num_rx_pkts = 0;
>  	unsigned int msg_obj, msg_ctrl_save;
>  	struct c_can_priv *priv = netdev_priv(dev);
> -	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
> +	unsigned long val = c_can_read_reg16(priv, C_CAN_INTPND1_REG);

Probably better as a u16 as detailed below.

> +	/*
> +	 * It is faster to read only one 16bit register. This is only possible
> +	 * for a maximum number of 16 objects.
> +	 */
> +	BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
> +			"Implementation does not support more message objects than 16");
> +
> +	while (quota > 0 && (val = c_can_read_reg16(priv, C_CAN_INTPND1_REG))) {
> +		msg_obj = 0;
> +		while ((msg_obj = find_next_bit(&val, 16, msg_obj)) < 16 &&

Using ffs instead of find_next_bit would be more standard
and probably faster too.



^ permalink raw reply

* RE: transmit lockup using smsc95xx ethernet on usb3
From: David Laight @ 2013-10-28 17:09 UTC (permalink / raw)
  To: Sarah Sharp
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Xenia Ragiadakou
In-Reply-To: <20131017000758.GL7082@xanatos>

> > We are seeing complete lockups of the transmit side when using
> > the smsc95xx driver connected to a USB3 port on an i7 (Ivybridge) cpu.
> > These errors are very intermittent - less than once a day, and
> > it isn't actually clear that they are related to traffic load.
...
> I would suggest you try with the latest stable kernel, with
> CONFIG_USB_DEBUG enabled.

I've finally got the same failure with a 3.12-0-rc5 kernel.
The only change I made was to comment out the 'short packet' trace.

The kernel trace contains:
(I've shortened the lines by removing the fixed part of the header.
I've got the full trace, and the matching usbmon trace, but they
are a bit large to post to the mailing lists!

All from xhci_hcd 0000:00:14.0
[38704.116936] Transfer error on endpoint
[38704.116941] Cleaning up stalled endpoint ring
[38704.116944] Finding segment containing stopped TRB.
[38704.116946] Finding endpoint context
[38704.116948] Finding segment containing last TRB in TD.
[38704.116950] Cycle state = 0x0
[38704.116952] New dequeue segment = ffff880214565660 (virtual)
[38704.116954] New dequeue pointer = 0x210cf4ad0 (DMA)
[38704.116956] Queueing new dequeue state
[38704.116959] Set TR Deq Ptr cmd, new deq seg = ffff880214565660 (0x210cf4800 dma), new deq ptr = ffff880210cf4ad0 (0x210cf4ad0 dma), new cycle = 0
[38704.116962] // Ding dong!
[38704.116966] Giveback URB ffff8800d6841d80, len = 0, expected = 18944, status = -71
[38704.116970] Transfer error on endpoint
[38704.116973] Cleaning up stalled endpoint ring
[38704.116974] Finding segment containing stopped TRB.
[38704.116976] Finding endpoint context
[38704.116978] Finding segment containing last TRB in TD.
[38704.116980] Cycle state = 0x1
[38704.116982] New dequeue segment = ffff880214565680 (virtual)
[38704.116984] New dequeue pointer = 0x210cf47e0 (DMA)
[38704.116986] Queueing new dequeue state
[38704.116989] Set TR Deq Ptr cmd, new deq seg = ffff880214565680 (0x210cf4400 dma), new deq ptr = ffff880210cf47e0 (0x210cf47e0 dma), new cycle = 1
[38704.116991] // Ding dong!
[38704.116995] Giveback URB ffff8802132169c0, len = 1024, expected = 1526, status = -71
[38704.116998] Ignoring reset ep completion code of 1
[38704.117001] Successful Set TR Deq Ptr cmd, deq = @210cf4ad0
[38704.117004] Ignoring reset ep completion code of 1
[38704.117006] Successful Set TR Deq Ptr cmd, deq = @210cf47e1
[38704.240067] Stalled endpoint
[38704.240075] Giveback URB ffff8800d4438900, len = 0, expected = 1526, status = -32
[38704.240112] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf4030
[38704.240118] // Ding dong!
[38704.240124] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf4040
[38704.240129] Removing canceled TD starting at 0x210cf4030 (dma).
[38704.240132] TRB to noop at offset 0x210cf4030
[38704.240134] Removing canceled TD starting at 0x210cf4040 (dma).
[38704.240136] TRB to noop at offset 0x210cf4040
[38704.240155] WARN halted endpoint, queueing URB anyway.
[38704.240163] WARN halted endpoint, queueing URB anyway.
[38704.240170] WARN halted endpoint, queueing URB anyway.
[38704.240177] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf4050
[38704.240179] // Ding dong!
[38704.240184] Cancel URB ffff8800d44389c0, dev 1.1, ep 0x2, starting at offset 0x210cf4060
[38704.240189] Removing canceled TD starting at 0x210cf4050 (dma).
[38704.240191] TRB to noop at offset 0x210cf4050
[38704.240193] Removing canceled TD starting at 0x210cf4060 (dma).
[38704.240195] TRB to noop at offset 0x210cf4060
[38704.240205] WARN halted endpoint, queueing URB anyway.
[38704.240210] WARN halted endpoint, queueing URB anyway.
[38704.240216] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf4070
[38704.240219] // Ding dong!
[38704.240224] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf4080
[38704.240228] Removing canceled TD starting at 0x210cf4070 (dma).
[38704.240230] TRB to noop at offset 0x210cf4070
[38704.240232] Removing canceled TD starting at 0x210cf4080 (dma).
[38704.240234] TRB to noop at offset 0x210cf4080
[38704.240244] WARN halted endpoint, queueing URB anyway.
[38704.240249] WARN halted endpoint, queueing URB anyway.
[38704.240255] Cancel URB ffff8800d4438780, dev 1.1, ep 0x2, starting at offset 0x210cf4090
[38704.240257] // Ding dong!
[38704.240262] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf40a0
[38704.240266] Cancel URB ffff8800d44386c0, dev 1.1, ep 0x2, starting at offset 0x210cf40b0
[38704.240271] Removing canceled TD starting at 0x210cf4090 (dma).
[38704.240273] TRB to noop at offset 0x210cf4090
[38704.240275] Removing canceled TD starting at 0x210cf40a0 (dma).
[38704.240277] TRB to noop at offset 0x210cf40a0
[38704.240279] Removing canceled TD starting at 0x210cf40b0 (dma).
[38704.240281] TRB to noop at offset 0x210cf40b0
[38704.240292] WARN halted endpoint, queueing URB anyway.
[38704.240296] WARN halted endpoint, queueing URB anyway.
[38704.240301] WARN halted endpoint, queueing URB anyway.
[38704.240307] Cancel URB ffff8800d4438900, dev 1.1, ep 0x2, starting at offset 0x210cf40c0
[38704.240309] // Ding dong!
[38704.240314] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf40d0
[38704.240319] Removing canceled TD starting at 0x210cf40c0 (dma).
[38704.240321] TRB to noop at offset 0x210cf40c0
[38704.240323] Removing canceled TD starting at 0x210cf40d0 (dma).
[38704.240325] TRB to noop at offset 0x210cf40d0
[38704.240335] WARN halted endpoint, queueing URB anyway.
[38704.240340] WARN halted endpoint, queueing URB anyway.
[38704.240346] Cancel URB ffff8800d4438f00, dev 1.1, ep 0x2, starting at offset 0x210cf40e0
[38704.240348] // Ding dong!
[38704.240353] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf40f0
[38704.240357] Removing canceled TD starting at 0x210cf40e0 (dma).
[38704.240359] TRB to noop at offset 0x210cf40e0
[38704.240361] Removing canceled TD starting at 0x210cf40f0 (dma).
[38704.240363] TRB to noop at offset 0x210cf40f0
[38704.240373] WARN halted endpoint, queueing URB anyway.
[38704.240378] WARN halted endpoint, queueing URB anyway.
[38704.240384] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf4100
[38704.240386] // Ding dong!
[38704.240391] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf4110
[38704.240395] Cancel URB ffff8800d44389c0, dev 1.1, ep 0x2, starting at offset 0x210cf4120
[38704.240400] Removing canceled TD starting at 0x210cf4100 (dma).
[38704.240402] TRB to noop at offset 0x210cf4100
[38704.240404] Removing canceled TD starting at 0x210cf4110 (dma).
[38704.240406] TRB to noop at offset 0x210cf4110
[38704.240408] Removing canceled TD starting at 0x210cf4120 (dma).
[38704.240410] TRB to noop at offset 0x210cf4120
[38704.240420] WARN halted endpoint, queueing URB anyway.
[38704.240425] WARN halted endpoint, queueing URB anyway.
[38704.240430] WARN halted endpoint, queueing URB anyway.
[38704.240436] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf4130
[38704.240438] // Ding dong!
[38704.240443] Cancel URB ffff8800d4438780, dev 1.1, ep 0x2, starting at offset 0x210cf4140
[38704.240447] Removing canceled TD starting at 0x210cf4130 (dma).
[38704.240449] TRB to noop at offset 0x210cf4130
[38704.240451] Removing canceled TD starting at 0x210cf4140 (dma).
[38704.240453] TRB to noop at offset 0x210cf4140
[38704.240463] WARN halted endpoint, queueing URB anyway.
[38704.240468] WARN halted endpoint, queueing URB anyway.
[38704.240474] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf4150
[38704.240476] // Ding dong!
[38704.240481] Cancel URB ffff8800d4438900, dev 1.1, ep 0x2, starting at offset 0x210cf4160
[38704.240486] Removing canceled TD starting at 0x210cf4150 (dma).
[38704.240488] TRB to noop at offset 0x210cf4150
[38704.240490] Removing canceled TD starting at 0x210cf4160 (dma).
[38704.240492] TRB to noop at offset 0x210cf4160
[38704.240501] WARN halted endpoint, queueing URB anyway.
[38704.240506] WARN halted endpoint, queueing URB anyway.
[38704.240512] Cancel URB ffff8800d44386c0, dev 1.1, ep 0x2, starting at offset 0x210cf4170
[38704.240514] // Ding dong!
[38704.240519] Cancel URB ffff8800d4438f00, dev 1.1, ep 0x2, starting at offset 0x210cf4180
[38704.240524] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf4190
[38704.240528] Removing canceled TD starting at 0x210cf4170 (dma).
[38704.240530] TRB to noop at offset 0x210cf4170
[38704.240532] Removing canceled TD starting at 0x210cf4180 (dma).
[38704.240535] TRB to noop at offset 0x210cf4180
[38704.240537] Removing canceled TD starting at 0x210cf4190 (dma).
[38704.240539] TRB to noop at offset 0x210cf4190
[38704.240549] WARN halted endpoint, queueing URB anyway.
[38704.240554] WARN halted endpoint, queueing URB anyway.
[38704.240559] WARN halted endpoint, queueing URB anyway.
[38704.240565] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf41a0
[38704.240567] // Ding dong!
[38704.240572] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf41b0
[38704.240577] Removing canceled TD starting at 0x210cf41a0 (dma).
[38704.240579] TRB to noop at offset 0x210cf41a0
[38704.240581] Removing canceled TD starting at 0x210cf41b0 (dma).
[38704.240583] TRB to noop at offset 0x210cf41b0
[38704.240592] WARN halted endpoint, queueing URB anyway.
[38704.240598] WARN halted endpoint, queueing URB anyway.
[38704.240603] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf41c0
[38704.240605] // Ding dong!
[38704.240610] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf41d0
[38704.240614] Cancel URB ffff8800d44389c0, dev 1.1, ep 0x2, starting at offset 0x210cf41e0
[38704.240618] Removing canceled TD starting at 0x210cf41c0 (dma).
[38704.240620] TRB to noop at offset 0x210cf41c0
[38704.240623] Removing canceled TD starting at 0x210cf41d0 (dma).
[38704.240625] TRB to noop at offset 0x210cf41d0
[38704.240627] Removing canceled TD starting at 0x210cf41e0 (dma).
[38704.240629] TRB to noop at offset 0x210cf41e0
[38704.240639] WARN halted endpoint, queueing URB anyway.
[38704.240644] WARN halted endpoint, queueing URB anyway.
[38704.240648] WARN halted endpoint, queueing URB anyway.
[38704.240654] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf41f0
[38704.240656] // Ding dong!
[38704.240661] Cancel URB ffff8800d4438780, dev 1.1, ep 0x2, starting at offset 0x210cf4200
[38704.240665] Removing canceled TD starting at 0x210cf41f0 (dma).
[38704.240667] TRB to noop at offset 0x210cf41f0
[38704.240669] Removing canceled TD starting at 0x210cf4200 (dma).
[38704.240671] TRB to noop at offset 0x210cf4200
[38704.240681] WARN halted endpoint, queueing URB anyway.
[38704.240687] Cancel URB ffff8800d4438f00, dev 1.1, ep 0x2, starting at offset 0x210cf4210
[38704.240689] // Ding dong!
[38704.240694] Cancel URB ffff8800d44386c0, dev 1.1, ep 0x2, starting at offset 0x210cf4220
[38704.240699] Cancel URB ffff8800d4438900, dev 1.1, ep 0x2, starting at offset 0x210cf4230
[38704.240703] Cancel URB ffff8800d4438c00, dev 1.1, ep 0x2, starting at offset 0x210cf4240
[38704.240707] Removing canceled TD starting at 0x210cf4210 (dma).
[38704.240709] TRB to noop at offset 0x210cf4210
[38704.240711] Removing canceled TD starting at 0x210cf4220 (dma).
[38704.240713] TRB to noop at offset 0x210cf4220
[38704.240716] Removing canceled TD starting at 0x210cf4230 (dma).
[38704.240718] TRB to noop at offset 0x210cf4230
[38704.240720] Removing canceled TD starting at 0x210cf4240 (dma).
[38704.240722] TRB to noop at offset 0x210cf4240
[38704.240733] Cancel URB ffff8800d4438b40, dev 1.1, ep 0x2, starting at offset 0x210cf4250
[38704.240735] // Ding dong!
[38704.240740] Cancel URB ffff8800d4438480, dev 1.1, ep 0x2, starting at offset 0x210cf4260
[38704.240745] Removing canceled TD starting at 0x210cf4250 (dma).
[38704.240747] TRB to noop at offset 0x210cf4250
[38704.240749] Removing canceled TD starting at 0x210cf4260 (dma).
[38704.240751] TRB to noop at offset 0x210cf4260
[38704.240764] Cancel URB ffff8800d4438840, dev 1.1, ep 0x2, starting at offset 0x210cf4270
[38704.240766] // Ding dong!
[38704.240771] Cancel URB ffff8800d4438cc0, dev 1.1, ep 0x2, starting at offset 0x210cf4280
[38704.240776] Cancel URB ffff8800d4438a80, dev 1.1, ep 0x2, starting at offset 0x210cf4290
[38704.240787] Removing canceled TD starting at 0x210cf4270 (dma).
[38704.240789] TRB to noop at offset 0x210cf4270
[38704.240791] Removing canceled TD starting at 0x210cf4280 (dma).
[38704.240793] TRB to noop at offset 0x210cf4280
[38704.240796] Removing canceled TD starting at 0x210cf4290 (dma).
[38704.240798] TRB to noop at offset 0x210cf4290
[38704.240812] Queueing reset endpoint command
[38704.240814] Cleaning up stalled endpoint ring
[38704.240816] Finding segment containing stopped TRB.
[38704.240818] Finding endpoint context
[38704.240820] Finding segment containing last TRB in TD.
[38704.240822] Cycle state = 0x0
[38704.240825] New dequeue segment = ffff8802145657a0 (virtual)
[38704.240827] New dequeue pointer = 0x210cf4030 (DMA)
[38704.240829] Queueing new dequeue state
[38704.240832] Set TR Deq Ptr cmd, new deq seg = ffff8802145657a0 (0x210cf4000 dma), new deq ptr = ffff880210cf4030 (0x210cf4030 dma), new cycle = 0
[38704.240834] // Ding dong!
[38704.240843] Ignoring reset ep completion code of 1
[38704.240846] Successful Set TR Deq Ptr cmd, deq = @210cf4030
[38704.332072] Stalled endpoint
[38704.332080] Giveback URB ffff8800d4438480, len = 0, expected = 1526, status = -32
[38704.332135] Queueing reset endpoint command
[38704.332138] Cleaning up stalled endpoint ring
[38704.332140] Finding segment containing stopped TRB.
[38704.332142] Finding endpoint context
[38704.332144] Finding segment containing last TRB in TD.
[38704.332146] Cycle state = 0x0
[38704.332149] New dequeue segment = ffff8802145657a0 (virtual)
[38704.332151] New dequeue pointer = 0x210cf42b0 (DMA)
[38704.332153] Queueing new dequeue state
[38704.332156] Set TR Deq Ptr cmd, new deq seg = ffff8802145657a0 (0x210cf4000 dma), new deq ptr = ffff880210cf42b0 (0x210cf42b0 dma), new cycle = 0
[38704.332158] // Ding dong!
[38704.332170] Ignoring reset ep completion code of 1
[38704.332175] Successful Set TR Deq Ptr cmd, deq = @210cf42b0

>From then on this gets traced every few seconds - probably whenever
dhclient tries to send a packet.

[38704.747916] Stalled endpoint
[38704.747924] Giveback URB ffff8800d4438480, len = 0, expected = 1526, status = -32
[38704.747972] Queueing reset endpoint command
[38704.747982] Cleaning up stalled endpoint ring
[38704.747984] Finding segment containing stopped TRB.
[38704.747986] Finding endpoint context
[38704.747988] Finding segment containing last TRB in TD.
[38704.747990] Cycle state = 0x0
[38704.747992] New dequeue segment = ffff8802145657a0 (virtual)
[38704.747994] New dequeue pointer = 0x210cf42c0 (DMA)
[38704.747996] Queueing new dequeue state
[38704.747999] Set TR Deq Ptr cmd, new deq seg = ffff8802145657a0 (0x210cf4000 dma), new deq ptr = ffff880210cf42c0 (0x210cf42c0 dma), new cycle = 0
[38704.748002] // Ding dong!
[38704.748010] Ignoring reset ep completion code of 1
[38704.748018] Successful Set TR Deq Ptr cmd, deq = @210cf42c0

The only way to recover is to unplug and replug.

	David



--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] can: c_can: Speed up rx_poll function
From: Markus Pargmann @ 2013-10-28 16:59 UTC (permalink / raw)
  To: Marc Kleine-Budde, Wolfgang Grandegger
  Cc: linux-can, netdev, linux-kernel, kernel, Markus Pargmann

This patch speeds up the rx_poll function by reducing the number of
register reads.

Replace the 32bit register read by a 16bit register read. Currently
the 32bit register read is implemented by using 2 16bit reads. This is
inefficient as we only use the lower 16bit in rx_poll.

The for loops reads the pending interrupts in every iteration. This
leads up to 16 reads of pending interrupts. The patch introduces a new
outer loop to read the pending interrupts as long as 'quota' is above 0.
This reduces the total number of reads.

The third change is to replace the for-loop by a find_next_bit loop.

Tested on AM335x. I removed all 'static' and 'inline' from c_can.c to
see the timings for all functions. I used the function tracer with
trace_stats.

125kbit:
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_do_rx_poll                     63960    10168178 us     158.977 us      1493056 us
With patch:
  c_can_do_rx_poll                     63939    4268457 us     66.758 us       818790.9 us

1Mbit:
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_do_rx_poll                     69489    30049498 us     432.435 us      9271851 us
With patch:
  c_can_do_rx_poll                    103034    24220362 us     235.071 us      6016656 us

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
---
 drivers/net/can/c_can/c_can.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index a668cd4..31af033 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -259,6 +259,12 @@ static u32 c_can_read_reg32(struct c_can_priv *priv, enum reg index)
 	return val;
 }
 
+u16 c_can_read_reg16(struct c_can_priv *priv, enum reg index)
+{
+	u16 val = priv->read_reg(priv, index);
+	return val;
+}
+
 static void c_can_enable_all_interrupts(struct c_can_priv *priv,
 						int enable)
 {
@@ -798,17 +804,21 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
 	u32 num_rx_pkts = 0;
 	unsigned int msg_obj, msg_ctrl_save;
 	struct c_can_priv *priv = netdev_priv(dev);
-	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
+	unsigned long val = c_can_read_reg16(priv, C_CAN_INTPND1_REG);
+
+	/*
+	 * It is faster to read only one 16bit register. This is only possible
+	 * for a maximum number of 16 objects.
+	 */
+	BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
+			"Implementation does not support more message objects than 16");
+
+	while (quota > 0 && (val = c_can_read_reg16(priv, C_CAN_INTPND1_REG))) {
+		msg_obj = 0;
+		while ((msg_obj = find_next_bit(&val, 16, msg_obj)) < 16 &&
+				quota > 0) {
+			++msg_obj;
 
-	for (msg_obj = C_CAN_MSG_OBJ_RX_FIRST;
-			msg_obj <= C_CAN_MSG_OBJ_RX_LAST && quota > 0;
-			val = c_can_read_reg32(priv, C_CAN_INTPND1_REG),
-			msg_obj++) {
-		/*
-		 * as interrupt pending register's bit n-1 corresponds to
-		 * message object n, we need to handle the same properly.
-		 */
-		if (val & (1 << (msg_obj - 1))) {
 			c_can_object_get(dev, 0, msg_obj, IF_COMM_ALL &
 					~IF_COMM_TXRQST);
 			msg_ctrl_save = priv->read_reg(priv,
-- 
1.8.4.rc3


^ permalink raw reply related

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: David Ahern @ 2013-10-28 16:49 UTC (permalink / raw)
  To: Ingo Molnar, Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028162438.GB14350@gmail.com>

On 10/28/13 10:24 AM, Ingo Molnar wrote:

> The most accurate method of measurement for such single-threaded
> workloads is something like:
>
> 	taskset 0x1 perf stat -a -C 1 --repeat 20 ...
>
> this will bind your workload to CPU#0, and will do PMU measurements
> only there - without mixing in other CPUs or workloads.

you can drop the -a if you only want a specific CPU (-C arg). And -C in 
perf is cpu number starting with 0, so in your example above -C 1 means 
cpu1, not cpu0.

David

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-28 16:24 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028160131.GA31048@hmsreliant.think-freely.org>


* Neil Horman <nhorman@tuxdriver.com> wrote:

> Looking at the specific cpu counters we get this:
> 
> Base:
>      Total time: 0.179 [sec]
> 
>  Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
> 
>        1571.304618 task-clock                #    5.213 CPUs utilized            ( +-  0.45% )
>             14,423 context-switches          #    0.009 M/sec                    ( +-  4.28% )
>              2,710 cpu-migrations            #    0.002 M/sec                    ( +-  2.83% )

Hm, for these second round of measurements were you using 'perf stat 
-a -C ...'?

The most accurate method of measurement for such single-threaded 
workloads is something like:

	taskset 0x1 perf stat -a -C 1 --repeat 20 ...

this will bind your workload to CPU#0, and will do PMU measurements 
only there - without mixing in other CPUs or workloads.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Pali Rohár @ 2013-10-28 16:21 UTC (permalink / raw)
  To: Dan Williams
  Cc: Johannes Berg, Luciano Coelho, John W. Linville, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <1382974161.1542.18.camel@dcbw.foobar.com>

[-- Attachment #1: Type: Text/Plain, Size: 1983 bytes --]

On Monday 28 October 2013 16:29:21 Dan Williams wrote:
> On Mon, 2013-10-28 at 16:04 +0100, Pali Rohár wrote:
> > On Monday 28 October 2013 15:56:55 Johannes Berg wrote:
> > > On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:
> > > > If the device doesn't actually *have* a permanent MAC
> > > > address, then it shouldn't be returning one via ethtool,
> > > > and should return an error for ETHTOOL_GPERMADDR.
> > > 
> > > There's currently no provision for that, but I agree it
> > > would be better to do so.
> > > 
> > > johannes
> > 
> > So what to do with devices which has MAC address stored in
> > some obscure place and there is userspace binary which can
> > read it?
> > 
> > I think that there should be some way to tell kernel that
> > *this* is the permanent address and it should use it.
> > 
> > This is reason why I proposed patch which adding sysfs entry
> > for setting permanent address from userspace in wl1251
> > driver.
> 
> Ok, so the N900's MAC is stored in the "cal partition", which
> is a region of flash exposed to the OS as /dev/mtd1.  That
> also stores the regulatory domain.  Typically userspace
> binaries are used to pull out this and other data (see
> https://dev.openwrt.org/browser/packages/utils/calvaria/files/
> src/calvaria.c ) which is then used to initialize the device.
> 
> Any idea what kernel driver is used to expose the CAL
> partition as /dev/mtd1?  If it's nothing special maybe a
> small EEPROM-style driver could be written to read the
> relevant areas (and since they don't ever change, don't need
> to worry about something else writing at the same time).
> 
> Dan

/dev/mtd1 is second (0 is first) OneNand partition. Nothing 
special. Kernel driver for OnaNand is already in mainline kernel.

That partition also has NVS device data for wl1251 chip.

And you are right calvaria has GPL v2 code for parsing data in 
that partition.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Ingo Molnar @ 2013-10-28 16:20 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028160131.GA31048@hmsreliant.think-freely.org>

* Neil Horman <nhorman@tuxdriver.com> wrote:

> Base:
>        0.093269042 seconds time elapsed                                          ( +-  2.24% )
> Prefetch (5x64):
>        0.079440009 seconds time elapsed                                          ( +-  2.29% )
> Parallel ALU:
>        0.087666677 seconds time elapsed                                          ( +-  4.01% )
> Prefetch + Parallel ALU:
>        0.080758702 seconds time elapsed                                          ( +-  2.34% )
> 
> So we can see here that we get about a 1% speedup between the base 
> and the both (Prefetch + Parallel ALU) case, with prefetch 
> accounting for most of that speedup.

Hm, there's still something strange about these results. So the 
range of the results is 790-930 nsecs. The noise of the measurements 
is 2%-4%, i.e. 20-40 nsecs.

The prefetch-only result itself is the fastest of all - 
statistically equivalent to the prefetch+parallel-ALU result, within 
the noise range.

So if prefetch is enabled, turning on parallel-ALU has no measurable 
effect - which is counter-intuitive. Do you have an 
theory/explanation for that?

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-28 16:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131017084121.GC22705@gmail.com>



Ingo, et al.-
	Ok, sorry for the delay, here are the test results you've been asking
for.


First, some information about what I did.  I attached the module that I ran this
test with at the bottom of this email.  You'll note that I started using a
module parameter write patch to trigger the csum rather than the module load
path.  The latter seemed to be giving me lots of variance in my run times, which
I wanted to eliminate.  I attributed it to the module load mechanism itself, and
by using the parameter write path, I was able to get more consistent results.

First, the run time tests:

I ran this command:
for i in `seq 0 1 3`
do
	 echo $i > /sys/module/csum_test/parameters/module_test_mode
	 perf stat --repeat 20 --null echo 1 > echo 1 > /sys/module/csum_test/parameters/test_fire
done

The for loop allows me to chagne the module_test_mode, which is tied to a switch
statement in do_csum that selects which checksumming method we use
(base/prefetch/parallel alu/both).  The results are:


Base:
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.093269042 seconds time elapsed                                          ( +-  2.24% )

Prefetch (5x64):
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.079440009 seconds time elapsed                                          ( +-  2.29% )

Parallel ALU:
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.087666677 seconds time elapsed                                          ( +-  4.01% )

Prefetch + Parallel ALU:
 Performance counter stats for 'bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       0.080758702 seconds time elapsed                                          ( +-  2.34% )

So we can see here that we get about a 1% speedup between the base and the both
(Prefetch + Parallel ALU) case, with prefetch accounting for most of that
speedup.

Looking at the specific cpu counters we get this:


Base:
     Total time: 0.179 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1571.304618 task-clock                #    5.213 CPUs utilized            ( +-  0.45% )
            14,423 context-switches          #    0.009 M/sec                    ( +-  4.28% )
             2,710 cpu-migrations            #    0.002 M/sec                    ( +-  2.83% )
            75,402 page-faults               #    0.048 M/sec                    ( +-  0.07% )
     1,597,349,326 cycles                    #    1.017 GHz                      ( +-  1.74% ) [40.51%]
       104,882,858 stalled-cycles-frontend   #    6.57% frontend cycles idle     ( +-  1.25% ) [40.33%]
     1,043,429,984 stalled-cycles-backend    #   65.32% backend  cycles idle     ( +-  1.25% ) [39.73%]
       868,372,132 instructions              #    0.54  insns per cycle        
                                             #    1.20  stalled cycles per insn  ( +-  1.43% ) [39.88%]
       161,143,820 branches                  #  102.554 M/sec                    ( +-  1.49% ) [39.76%]
         4,348,075 branch-misses             #    2.70% of all branches          ( +-  1.43% ) [39.99%]
       457,042,576 L1-dcache-loads           #  290.868 M/sec                    ( +-  1.25% ) [40.63%]
         8,928,240 L1-dcache-load-misses     #    1.95% of all L1-dcache hits    ( +-  1.26% ) [41.17%]
        15,821,051 LLC-loads                 #   10.069 M/sec                    ( +-  1.56% ) [41.20%]
         4,902,576 LLC-load-misses           #   30.99% of all LL-cache hits     ( +-  1.51% ) [41.36%]
       235,775,688 L1-icache-loads           #  150.051 M/sec                    ( +-  1.39% ) [41.10%]
         3,116,106 L1-icache-load-misses     #    1.32% of all L1-icache hits    ( +-  3.43% ) [40.96%]
       461,315,416 dTLB-loads                #  293.588 M/sec                    ( +-  1.43% ) [41.18%]
           140,280 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  2.30% ) [40.96%]
       236,127,031 iTLB-loads                #  150.275 M/sec                    ( +-  1.63% ) [41.43%]
            46,173 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  3.40% ) [41.11%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [40.82%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [40.37%]

       0.301414024 seconds time elapsed                                          ( +-  0.47% )

Prefetch (5x64):
     Total time: 0.172 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1565.797128 task-clock                #    5.238 CPUs utilized            ( +-  0.46% )
            13,845 context-switches          #    0.009 M/sec                    ( +-  4.20% )
             2,624 cpu-migrations            #    0.002 M/sec                    ( +-  2.72% )
            75,452 page-faults               #    0.048 M/sec                    ( +-  0.08% )
     1,642,106,355 cycles                    #    1.049 GHz                      ( +-  1.33% ) [40.17%]
       107,786,666 stalled-cycles-frontend   #    6.56% frontend cycles idle     ( +-  1.37% ) [39.90%]
     1,065,286,880 stalled-cycles-backend    #   64.87% backend  cycles idle     ( +-  1.59% ) [39.14%]
       888,815,001 instructions              #    0.54  insns per cycle        
                                             #    1.20  stalled cycles per insn  ( +-  1.29% ) [38.92%]
       163,106,907 branches                  #  104.169 M/sec                    ( +-  1.32% ) [38.93%]
         4,333,456 branch-misses             #    2.66% of all branches          ( +-  1.94% ) [39.77%]
       459,779,806 L1-dcache-loads           #  293.639 M/sec                    ( +-  1.60% ) [40.23%]
         8,827,680 L1-dcache-load-misses     #    1.92% of all L1-dcache hits    ( +-  1.77% ) [41.38%]
        15,556,816 LLC-loads                 #    9.935 M/sec                    ( +-  1.76% ) [41.16%]
         4,885,618 LLC-load-misses           #   31.40% of all LL-cache hits     ( +-  1.40% ) [40.84%]
       236,131,778 L1-icache-loads           #  150.806 M/sec                    ( +-  1.32% ) [40.59%]
         3,037,537 L1-icache-load-misses     #    1.29% of all L1-icache hits    ( +-  2.23% ) [41.13%]
       454,835,028 dTLB-loads                #  290.481 M/sec                    ( +-  1.23% ) [41.34%]
           139,907 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  2.18% ) [41.21%]
       236,357,655 iTLB-loads                #  150.950 M/sec                    ( +-  1.31% ) [41.29%]
            46,633 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  2.74% ) [40.67%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [40.16%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [40.09%]

       0.298948767 seconds time elapsed                                          ( +-  0.36% )

Here it appears everything between the two runs is about the same.  We reduced
the number of dcache misses by a small amount (0.03 percentage points), which is
nice, but I'm not sure would account for the speedup we see in the run time.

Parallel ALU:
     Total time: 0.182 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1553.544876 task-clock                #    5.217 CPUs utilized            ( +-  0.42% )
            14,066 context-switches          #    0.009 M/sec                    ( +-  6.24% )
             2,831 cpu-migrations            #    0.002 M/sec                    ( +-  3.33% )
            75,432 page-faults               #    0.049 M/sec                    ( +-  0.08% )
     1,659,509,743 cycles                    #    1.068 GHz                      ( +-  1.27% ) [40.10%]
       106,466,680 stalled-cycles-frontend   #    6.42% frontend cycles idle     ( +-  1.50% ) [39.98%]
     1,035,481,957 stalled-cycles-backend    #   62.40% backend  cycles idle     ( +-  1.23% ) [39.38%]
       875,104,201 instructions              #    0.53  insns per cycle        
                                             #    1.18  stalled cycles per insn  ( +-  1.30% ) [38.66%]
       160,553,275 branches                  #  103.346 M/sec                    ( +-  1.32% ) [38.85%]
         4,329,119 branch-misses             #    2.70% of all branches          ( +-  1.39% ) [39.59%]
       448,195,116 L1-dcache-loads           #  288.498 M/sec                    ( +-  1.91% ) [41.07%]
         8,632,347 L1-dcache-load-misses     #    1.93% of all L1-dcache hits    ( +-  1.90% ) [41.56%]
        15,143,145 LLC-loads                 #    9.747 M/sec                    ( +-  1.89% ) [41.05%]
         4,698,204 LLC-load-misses           #   31.03% of all LL-cache hits     ( +-  1.03% ) [41.23%]
       224,316,468 L1-icache-loads           #  144.390 M/sec                    ( +-  1.27% ) [41.39%]
         2,902,842 L1-icache-load-misses     #    1.29% of all L1-icache hits    ( +-  2.65% ) [42.60%]
       433,914,588 dTLB-loads                #  279.306 M/sec                    ( +-  1.75% ) [43.07%]
           132,090 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  2.15% ) [43.12%]
       230,701,361 iTLB-loads                #  148.500 M/sec                    ( +-  1.77% ) [43.47%]
            45,562 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  3.76% ) [42.88%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [42.29%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [41.32%]

       0.297758185 seconds time elapsed                                          ( +-  0.40% )

Here It seems the major advantage was backend stall cycles saved (which makes
sense to me).  Since we split the instruction path into two units that could run
independently of each other we spent less time waiting for prior instructions to
retire.  As a result we dropped two percentage points in our stall number.

Prefetch + Parallel ALU:
     Total time: 0.182 [sec]

 Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):

       1549.171283 task-clock                #    5.231 CPUs utilized            ( +-  0.50% )
            13,717 context-switches          #    0.009 M/sec                    ( +-  4.32% )
             2,721 cpu-migrations            #    0.002 M/sec                    ( +-  2.47% )
            75,432 page-faults               #    0.049 M/sec                    ( +-  0.07% )
     1,579,140,244 cycles                    #    1.019 GHz                      ( +-  1.71% ) [40.06%]
       103,803,034 stalled-cycles-frontend   #    6.57% frontend cycles idle     ( +-  1.74% ) [39.60%]
     1,016,582,613 stalled-cycles-backend    #   64.38% backend  cycles idle     ( +-  1.79% ) [39.57%]
       881,036,653 instructions              #    0.56  insns per cycle        
                                             #    1.15  stalled cycles per insn  ( +-  1.61% ) [39.29%]
       164,333,010 branches                  #  106.078 M/sec                    ( +-  1.51% ) [39.38%]
         4,385,459 branch-misses             #    2.67% of all branches          ( +-  1.62% ) [40.29%]
       463,987,526 L1-dcache-loads           #  299.507 M/sec                    ( +-  1.52% ) [40.20%]
         8,739,535 L1-dcache-load-misses     #    1.88% of all L1-dcache hits    ( +-  1.95% ) [40.37%]
        15,318,497 LLC-loads                 #    9.888 M/sec                    ( +-  1.80% ) [40.43%]
         4,846,148 LLC-load-misses           #   31.64% of all LL-cache hits     ( +-  1.68% ) [40.59%]
       231,982,874 L1-icache-loads           #  149.746 M/sec                    ( +-  1.43% ) [41.25%]
         3,141,106 L1-icache-load-misses     #    1.35% of all L1-icache hits    ( +-  2.32% ) [41.76%]
       459,688,615 dTLB-loads                #  296.732 M/sec                    ( +-  1.75% ) [41.87%]
           138,667 dTLB-load-misses          #    0.03% of all dTLB cache hits   ( +-  1.97% ) [42.31%]
       235,629,204 iTLB-loads                #  152.100 M/sec                    ( +-  1.40% ) [42.04%]
            46,038 iTLB-load-misses          #    0.02% of all iTLB cache hits   ( +-  2.75% ) [41.20%]
                 0 L1-dcache-prefetches      #    0.000 K/sec                   [40.77%]
                 0 L1-dcache-prefetch-misses #    0.000 K/sec                   [40.27%]

       0.296173305 seconds time elapsed                                          ( +-  0.44% )
Here, with both optimizations, we've reduced both our backend stall cycles, and
our dcache miss rate (though our load misses here is higher than it was when we
are just doing parallel ALU execution.  I wonder if the separation of the adcx
path is leading to multiple load requests before the prefetch completes.  I'll
try messing with the stride a bit more to see if I can get some more insight
there.

So there you have it.  I think, looking at this, I can say that its not as big a
win as my initial measurements were indicating, but still a win.

Thoughts?

Regards
Neil

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/init.h>
#include <linux/moduleparam.h>
#include <linux/rtnetlink.h>
#include <net/rtnetlink.h>
#include <linux/u64_stats_sync.h>

#define BUFSIZ 2*1024*1024
#define NBPAGES 16

extern int csum_mode;
int module_test_mode = 0;
int test_fire = 0;

static int __init csum_init_module(void)
{
        return 0;
}

static void __exit csum_cleanup_module(void)
{
        return;
}

static int set_param_str(const char *val, const struct kernel_param *kp)
{
        int i;
        __wsum sum = 0;
        /*u64 start, end;*/
        void *base, *addrs[NBPAGES];
        u32 rnd, offset;

	
        memset(addrs, 0, sizeof(addrs));
        for (i = 0; i < NBPAGES; i++) {
                addrs[i] = kmalloc_node(BUFSIZ, GFP_KERNEL, 0);
                if (!addrs[i])
                        goto out;
        }

	csum_mode = module_test_mode;

	local_bh_disable();
        /*pr_err("STARTING ITERATIONS on cpu %d\n", smp_processor_id());*/
        /*start = ktime_to_ns(ktime_get());*/

        for (i = 0; i < 100000; i++) {
                rnd = prandom_u32();
                base = addrs[rnd % NBPAGES];
                rnd /= NBPAGES;
                offset = rnd % (BUFSIZ - 1500);
                offset &= ~1U;
                sum = csum_partial(base + offset, 1500, sum);
        }
        /*end = ktime_to_ns(ktime_get());*/
        local_bh_enable();

	/*pr_err("COMPLETED 100000 iterations of csum %x in %llu nanosec\n", sum, end - start);*/

	csum_mode = 0;
out:
        for (i = 0; i < NBPAGES; i++)
                kfree(addrs[i]);

        return 0;
}

static int get_param_str(char *buffer, const struct kernel_param *kp)
{
	return sprintf(buffer, "%d\n", test_fire);
}

static struct kernel_param_ops param_ops_str = {
	.set = set_param_str,
	.get = get_param_str,
};

module_param_named(module_test_mode, module_test_mode, int, 0644);
MODULE_PARM_DESC(module_test_mode, "csum test mode");
module_param_cb(test_fire, &param_ops_str, &test_fire, 0644);
module_init(csum_init_module);
module_exit(csum_cleanup_module);
MODULE_LICENSE("GPL");

^ permalink raw reply

* [PATCH net-next v2] net: sched: cls_bpf: add BPF-based classifier
From: Daniel Borkmann @ 2013-10-28 15:43 UTC (permalink / raw)
  To: davem; +Cc: eric.dumazet, netdev, Thomas Graf
In-Reply-To: <cover.1382974535.git.dborkman@redhat.com>

This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
---
 v1->v2:
  - Ok, applied Eric Dumazet's feedback with possibility to overwrite
    default flowid

 iproute2 part: http://patchwork.ozlabs.org/patch/286496/

 include/uapi/linux/pkt_cls.h |  14 ++
 net/sched/Kconfig            |  10 ++
 net/sched/Makefile           |   1 +
 net/sched/cls_bpf.c          | 385 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 410 insertions(+)
 create mode 100644 net/sched/cls_bpf.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 082eafa..25731df 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -388,6 +388,20 @@ enum {
 
 #define TCA_CGROUP_MAX (__TCA_CGROUP_MAX - 1)
 
+/* BPF classifier */
+
+enum {
+	TCA_BPF_UNSPEC,
+	TCA_BPF_ACT,
+	TCA_BPF_POLICE,
+	TCA_BPF_CLASSID,
+	TCA_BPF_OPS_LEN,
+	TCA_BPF_OPS,
+	__TCA_BPF_MAX,
+};
+
+#define TCA_BPF_MAX (__TCA_BPF_MAX - 1)
+
 /* Extended Matches */
 
 struct tcf_ematch_tree_hdr {
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index c03a32a..ad1f1d8 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -443,6 +443,16 @@ config NET_CLS_CGROUP
 	  To compile this code as a module, choose M here: the
 	  module will be called cls_cgroup.
 
+config NET_CLS_BPF
+	tristate "BPF-based classifier"
+	select NET_CLS
+	---help---
+	  If you say Y here, you will be able to classify packets based on
+	  programmable BPF (JIT'ed) filters as an alternative to ematches.
+
+	  To compile this code as a module, choose M here: the module will
+	  be called cls_bpf.
+
 config NET_EMATCH
 	bool "Extended Matches"
 	select NET_CLS
diff --git a/net/sched/Makefile b/net/sched/Makefile
index e5f9abe..35fa47a 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_NET_CLS_RSVP6)	+= cls_rsvp6.o
 obj-$(CONFIG_NET_CLS_BASIC)	+= cls_basic.o
 obj-$(CONFIG_NET_CLS_FLOW)	+= cls_flow.o
 obj-$(CONFIG_NET_CLS_CGROUP)	+= cls_cgroup.o
+obj-$(CONFIG_NET_CLS_BPF)	+= cls_bpf.o
 obj-$(CONFIG_NET_EMATCH)	+= ematch.o
 obj-$(CONFIG_NET_EMATCH_CMP)	+= em_cmp.o
 obj-$(CONFIG_NET_EMATCH_NBYTE)	+= em_nbyte.o
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
new file mode 100644
index 0000000..1002a82
--- /dev/null
+++ b/net/sched/cls_bpf.c
@@ -0,0 +1,385 @@
+/*
+ * Berkeley Packet Filter based traffic classifier
+ *
+ * Might be used to classify traffic through flexible, user-defined and
+ * possibly JIT-ed BPF filters for traffic control as an alternative to
+ * ematches.
+ *
+ * (C) 2013 Daniel Borkmann <dborkman@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/filter.h>
+#include <net/rtnetlink.h>
+#include <net/pkt_cls.h>
+#include <net/sock.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
+MODULE_DESCRIPTION("TC BPF based classifier");
+
+struct cls_bpf_head {
+	struct list_head plist;
+	u32 hgen;
+};
+
+struct cls_bpf_prog {
+	struct sk_filter *filter;
+	struct sock_filter *bpf_ops;
+	struct tcf_exts exts;
+	struct tcf_result res;
+	struct list_head link;
+	u32 handle;
+	u16 bpf_len;
+};
+
+static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
+	[TCA_BPF_CLASSID]	= { .type = NLA_U32 },
+	[TCA_BPF_OPS_LEN]	= { .type = NLA_U16 },
+	[TCA_BPF_OPS]		= { .type = NLA_BINARY,
+				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
+};
+
+static const struct tcf_ext_map bpf_ext_map = {
+	.action = TCA_BPF_ACT,
+	.police = TCA_BPF_POLICE,
+};
+
+static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+			    struct tcf_result *res)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	int ret;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		int filter_res = SK_RUN_FILTER(prog->filter, skb);
+
+		if (filter_res == 0)
+			continue;
+
+		*res = prog->res;
+		if (filter_res != -1)
+			res->classid = filter_res;
+
+		ret = tcf_exts_exec(skb, &prog->exts, res);
+		if (ret < 0)
+			continue;
+
+		return ret;
+	}
+
+	return -1;
+}
+
+static int cls_bpf_init(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (head == NULL)
+		return -ENOBUFS;
+
+	INIT_LIST_HEAD(&head->plist);
+	tp->root = head;
+
+	return 0;
+}
+
+static void cls_bpf_delete_prog(struct tcf_proto *tp, struct cls_bpf_prog *prog)
+{
+	tcf_unbind_filter(tp, &prog->res);
+	tcf_exts_destroy(tp, &prog->exts);
+
+	sk_unattached_filter_destroy(prog->filter);
+
+	kfree(prog->bpf_ops);
+	kfree(prog);
+}
+
+static int cls_bpf_delete(struct tcf_proto *tp, unsigned long arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *todel = (struct cls_bpf_prog *) arg;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog == todel) {
+			tcf_tree_lock(tp);
+			list_del(&prog->link);
+			tcf_tree_unlock(tp);
+
+			cls_bpf_delete_prog(tp, prog);
+			return 0;
+		}
+	}
+
+	return -ENOENT;
+}
+
+static void cls_bpf_destroy(struct tcf_proto *tp)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog, *tmp;
+
+	list_for_each_entry_safe(prog, tmp, &head->plist, link) {
+		list_del(&prog->link);
+		cls_bpf_delete_prog(tp, prog);
+	}
+
+	kfree(head);
+}
+
+static unsigned long cls_bpf_get(struct tcf_proto *tp, u32 handle)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+	unsigned long ret = 0UL;
+
+	if (head == NULL)
+		return 0UL;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (prog->handle == handle) {
+			ret = (unsigned long) prog;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void cls_bpf_put(struct tcf_proto *tp, unsigned long f)
+{
+}
+
+static int cls_bpf_modify_existing(struct net *net, struct tcf_proto *tp,
+				   struct cls_bpf_prog *prog,
+				   unsigned long base, struct nlattr **tb,
+				   struct nlattr *est)
+{
+	struct sock_filter *bpf_ops, *bpf_old;
+	struct tcf_exts exts;
+	struct sock_fprog tmp;
+	struct sk_filter *fp, *fp_old;
+	u16 bpf_size, bpf_len;
+	u32 classid;
+	int ret;
+
+	if (!tb[TCA_BPF_OPS_LEN] || !tb[TCA_BPF_OPS] || !tb[TCA_BPF_CLASSID])
+		return -EINVAL;
+
+	ret = tcf_exts_validate(net, tp, tb, est, &exts, &bpf_ext_map);
+	if (ret < 0)
+		return ret;
+
+	classid = nla_get_u32(tb[TCA_BPF_CLASSID]);
+	bpf_len = nla_get_u16(tb[TCA_BPF_OPS_LEN]);
+	if (bpf_len > BPF_MAXINSNS || bpf_len == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	bpf_size = bpf_len * sizeof(*bpf_ops);
+	bpf_ops = kzalloc(bpf_size, GFP_KERNEL);
+	if (bpf_ops == NULL) {
+		ret = -ENOMEM;
+		goto errout;
+	}
+
+	memcpy(bpf_ops, nla_data(tb[TCA_BPF_OPS]), bpf_size);
+
+	tmp.len = bpf_len;
+	tmp.filter = (struct sock_filter __user *) bpf_ops;
+
+	ret = sk_unattached_filter_create(&fp, &tmp);
+	if (ret)
+		goto errout_free;
+
+	tcf_tree_lock(tp);
+	fp_old = prog->filter;
+	bpf_old = prog->bpf_ops;
+
+	prog->bpf_len = bpf_len;
+	prog->bpf_ops = bpf_ops;
+	prog->filter = fp;
+	prog->res.classid = classid;
+	tcf_tree_unlock(tp);
+
+	tcf_bind_filter(tp, &prog->res, base);
+	tcf_exts_change(tp, &prog->exts, &exts);
+
+	if (fp_old)
+		sk_unattached_filter_destroy(fp_old);
+	if (bpf_old)
+		kfree(bpf_old);
+
+	return 0;
+
+errout_free:
+	kfree(bpf_ops);
+errout:
+	tcf_exts_destroy(tp, &exts);
+	return ret;
+}
+
+static u32 cls_bpf_grab_new_handle(struct tcf_proto *tp,
+				   struct cls_bpf_head *head)
+{
+	unsigned int i = 0x80000000;
+
+	do {
+		if (++head->hgen == 0x7FFFFFFF)
+			head->hgen = 1;
+	} while (--i > 0 && cls_bpf_get(tp, head->hgen));
+	if (i == 0)
+		pr_err("Insufficient number of handles\n");
+
+	return i;
+}
+
+static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
+			  struct tcf_proto *tp, unsigned long base,
+			  u32 handle, struct nlattr **tca,
+			  unsigned long *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) *arg;
+	struct nlattr *tb[TCA_BPF_MAX + 1];
+	int ret;
+
+	if (tca[TCA_OPTIONS] == NULL)
+		return -EINVAL;
+
+	ret = nla_parse_nested(tb, TCA_BPF_MAX, tca[TCA_OPTIONS], bpf_policy);
+	if (ret < 0)
+		return ret;
+
+	if (prog != NULL) {
+		if (handle && prog->handle != handle)
+			return -EINVAL;
+		return cls_bpf_modify_existing(net, tp, prog, base, tb,
+					       tca[TCA_RATE]);
+	}
+
+	prog = kzalloc(sizeof(*prog), GFP_KERNEL);
+	if (prog == NULL)
+		return -ENOBUFS;
+
+	if (handle == 0)
+		prog->handle = cls_bpf_grab_new_handle(tp, head);
+	else
+		prog->handle = handle;
+	if (prog->handle == 0) {
+		ret = -EINVAL;
+		goto errout;
+	}
+
+	ret = cls_bpf_modify_existing(net, tp, prog, base, tb, tca[TCA_RATE]);
+	if (ret < 0)
+		goto errout;
+
+	tcf_tree_lock(tp);
+	list_add(&prog->link, &head->plist);
+	tcf_tree_unlock(tp);
+
+	*arg = (unsigned long) prog;
+
+	return 0;
+errout:
+	if (*arg == 0UL && prog)
+		kfree(prog);
+
+	return ret;
+}
+
+static int cls_bpf_dump(struct tcf_proto *tp, unsigned long fh,
+			struct sk_buff *skb, struct tcmsg *tm)
+{
+	struct cls_bpf_prog *prog = (struct cls_bpf_prog *) fh;
+	struct nlattr *nest, *nla;
+
+	if (prog == NULL)
+		return skb->len;
+
+	tm->tcm_handle = prog->handle;
+
+	nest = nla_nest_start(skb, TCA_OPTIONS);
+	if (nest == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_BPF_CLASSID, prog->res.classid))
+		goto nla_put_failure;
+	if (nla_put_u16(skb, TCA_BPF_OPS_LEN, prog->bpf_len))
+		goto nla_put_failure;
+
+	nla = nla_reserve(skb, TCA_BPF_OPS, prog->bpf_len *
+			  sizeof(struct sock_filter));
+	if (nla == NULL)
+		goto nla_put_failure;
+
+        memcpy(nla_data(nla), prog->bpf_ops, nla_len(nla));
+
+	if (tcf_exts_dump(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	nla_nest_end(skb, nest);
+
+	if (tcf_exts_dump_stats(skb, &prog->exts, &bpf_ext_map) < 0)
+		goto nla_put_failure;
+
+	return skb->len;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
+static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+{
+	struct cls_bpf_head *head = tp->root;
+	struct cls_bpf_prog *prog;
+
+	list_for_each_entry(prog, &head->plist, link) {
+		if (arg->count < arg->skip)
+			goto skip;
+		if (arg->fn(tp, (unsigned long) prog, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+skip:
+		arg->count++;
+	}
+}
+
+static struct tcf_proto_ops cls_bpf_ops __read_mostly = {
+	.kind		=	"bpf",
+	.owner		=	THIS_MODULE,
+	.classify	=	cls_bpf_classify,
+	.init		=	cls_bpf_init,
+	.destroy	=	cls_bpf_destroy,
+	.get		=	cls_bpf_get,
+	.put		=	cls_bpf_put,
+	.change		=	cls_bpf_change,
+	.delete		=	cls_bpf_delete,
+	.walk		=	cls_bpf_walk,
+	.dump		=	cls_bpf_dump,
+};
+
+static int __init cls_bpf_init_mod(void)
+{
+	return register_tcf_proto_ops(&cls_bpf_ops);
+}
+
+static void __exit cls_bpf_exit_mod(void)
+{
+	unregister_tcf_proto_ops(&cls_bpf_ops);
+}
+
+module_init(cls_bpf_init_mod);
+module_exit(cls_bpf_exit_mod);
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Stephen Hemminger @ 2013-10-28 15:33 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Dan Williams, Pali Rohár, Luciano Coelho, John W. Linville,
	David S. Miller, linux-wireless, netdev, linux-kernel,
	freemangordon, aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <1382972215.17956.30.camel@jlt4.sipsolutions.net>

On Mon, 28 Oct 2013 15:56:55 +0100
Johannes Berg <johannes@sipsolutions.net> wrote:

> On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:
> 
> > If the device doesn't actually *have* a permanent MAC address, then it
> > shouldn't be returning one via ethtool, and should return an error for
> > ETHTOOL_GPERMADDR.
> 
> There's currently no provision for that, but I agree it would be better
> to do so.
> 
> johannes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

The current netdevice API handle the case of a device without a permanent
MAC address, slightly differently.

If device does not have a permanent address,
then:
  1. dev->addr_assign_type should not be NET_ADDR_PERM
  2. when device is registered dev->perm_addr will not be set
     and retain all zeros value
  4. when ethtool gets address it will return all zeros which
     is not a valid address.

This case doesn't seem to be handled threo mac80211 API's

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Dan Williams @ 2013-10-28 15:29 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Johannes Berg, Luciano Coelho, John W. Linville, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <201310281604.48542@pali>

On Mon, 2013-10-28 at 16:04 +0100, Pali Rohár wrote:
> On Monday 28 October 2013 15:56:55 Johannes Berg wrote:
> > On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:
> > > If the device doesn't actually *have* a permanent MAC
> > > address, then it shouldn't be returning one via ethtool,
> > > and should return an error for ETHTOOL_GPERMADDR.
> > 
> > There's currently no provision for that, but I agree it would
> > be better to do so.
> > 
> > johannes
> 
> So what to do with devices which has MAC address stored in some 
> obscure place and there is userspace binary which can read it?
> 
> I think that there should be some way to tell kernel that *this* 
> is the permanent address and it should use it.
> 
> This is reason why I proposed patch which adding sysfs entry for 
> setting permanent address from userspace in wl1251 driver.

Ok, so the N900's MAC is stored in the "cal partition", which is a
region of flash exposed to the OS as /dev/mtd1.  That also stores the
regulatory domain.  Typically userspace binaries are used to pull out
this and other data (see
https://dev.openwrt.org/browser/packages/utils/calvaria/files/src/calvaria.c ) which is then used to initialize the device.

Any idea what kernel driver is used to expose the CAL partition
as /dev/mtd1?  If it's nothing special maybe a small EEPROM-style driver
could be written to read the relevant areas (and since they don't ever
change, don't need to worry about something else writing at the same
time).

Dan

^ permalink raw reply

* Re: [PATCH net-next] net: sched: cls_bpf: add BPF-based classifier
From: Eric Dumazet @ 2013-10-28 15:27 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, Thomas Graf
In-Reply-To: <526E7556.1060902@redhat.com>

On Mon, 2013-10-28 at 15:31 +0100, Daniel Borkmann wrote:

> I thought about this, I think this can partially be resolved by the
> user implementing one BPF program to match all possible flows for a
> class instead of implementing multiple BPF programs as mentioned in
> the commit, iow that's up to the user space. And the case you suggest,
> would be another option to further improve this, but would come with
> some difficulties in contrast to the notion 0: mismatch, !0: match.
> I think, this would need additional walk-through through all 'ret'
> opcodes to see if those classes actually exist. Then, we would need
> refcounting and call tcf_{un,}bind_filter() for each class that is
> related to this filter, and tcf_exts_exec() would either need to be
> i) understood in a "per-filter" notion, so as long as something
> matches (!0) exec the very same action/policy (which might not be
> what we want) or ii) could just not be implemented as multiple
> user-defined filters could be defined in iproute2 with different
> actions each, but in some paths return the same flowid. So I think
> this here seems the better trade-off.

I have no idea why you want to do all this.

classid are not checked at filter installation time, but at run time.

All you need to do is change :

+       list_for_each_entry(prog, &head->plist, link) {
+               if (SK_RUN_FILTER(prog->filter, skb) == 0)
+                       continue;
+
+               *res = prog->res;
+               ret = tcf_exts_exec(skb, &prog->exts, res);
+               if (ret < 0)
+                       continue;
+
+               return ret;
+       }

to

	list_for_each_entry(prog, &head->plist, link) {
		int filter_res = SK_RUN_FILTER(prog->filter, skb);

		if (!filter_res)
			continue;


               *res = prog->res;
		if (filter_res != -1)
			res->classid = filter_res;

               ret = tcf_exts_exec(skb, &prog->exts, res);
               if (ret < 0)
                       continue;

               return ret;
       }

(Reserving filter returns codes :
 0 : No match
 -1: select classid given in the "tc filter ...." command)
other values : flowid, overiding the default one.

^ permalink raw reply

* Re: [PATCH 1/3] vxlan: silence one build warning
From: Stephen Hemminger @ 2013-10-28 15:25 UTC (permalink / raw)
  To: David Miller; +Cc: zwu.kernel, netdev, linux-kernel, wuzhy
In-Reply-To: <20131028.003807.1591686723281776238.davem@davemloft.net>

On Mon, 28 Oct 2013 00:38:07 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <stephen@networkplumber.org>
> Date: Fri, 25 Oct 2013 08:41:34 -0700
> 
> > I would rather not fix the warning this way since it risks masking
> > later bugs if this code ever changes.
> 
> But this is suboptimally coded, and is asking for the warning.
> 
> Anything returning a pointer by reference is asking for trouble
> in my opinion.
> 
> The correct thing to do is to make create_v{4,6}_sock() return
> the "struct socket *" as an error pointer.
> 
> No more ambiguous initializations, no more warnings.

Agreed, original code used ERR_PTR (see vxlan_socket_create),
the side effect stuff only came with the addition of IPv6.

^ permalink raw reply

* Re: [PATCH v2 1/3] vxlan: silence one build warning
From: Stephen Hemminger @ 2013-10-28 15:24 UTC (permalink / raw)
  To: Zhi Yong Wu; +Cc: netdev, linux-kernel, davem, Zhi Yong Wu
In-Reply-To: <1382940110-10737-1-git-send-email-zwu.kernel@gmail.com>

On Mon, 28 Oct 2013 14:01:48 +0800
Zhi Yong Wu <zwu.kernel@gmail.com> wrote:

> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> 
> drivers/net/vxlan.c: In function ‘vxlan_sock_add’:
> drivers/net/vxlan.c:2298:11: warning: ‘sock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> drivers/net/vxlan.c:2275:17: note: ‘sock’ was declared here
>   LD      drivers/net/built-in.o
> 
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>

This is much better.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Pali Rohár @ 2013-10-28 15:04 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Dan Williams, Luciano Coelho, John W. Linville, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <1382972215.17956.30.camel@jlt4.sipsolutions.net>

[-- Attachment #1: Type: Text/Plain, Size: 819 bytes --]

On Monday 28 October 2013 15:56:55 Johannes Berg wrote:
> On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:
> > If the device doesn't actually *have* a permanent MAC
> > address, then it shouldn't be returning one via ethtool,
> > and should return an error for ETHTOOL_GPERMADDR.
> 
> There's currently no provision for that, but I agree it would
> be better to do so.
> 
> johannes

So what to do with devices which has MAC address stored in some 
obscure place and there is userspace binary which can read it?

I think that there should be some way to tell kernel that *this* 
is the permanent address and it should use it.

This is reason why I proposed patch which adding sysfs entry for 
setting permanent address from userspace in wl1251 driver.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Johannes Berg @ 2013-10-28 14:56 UTC (permalink / raw)
  To: Dan Williams
  Cc: Pali Rohár, Luciano Coelho, John W. Linville,
	David S. Miller, linux-wireless, netdev, linux-kernel,
	freemangordon, aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <1382971562.1542.6.camel@dcbw.foobar.com>

On Mon, 2013-10-28 at 09:46 -0500, Dan Williams wrote:

> If the device doesn't actually *have* a permanent MAC address, then it
> shouldn't be returning one via ethtool, and should return an error for
> ETHTOOL_GPERMADDR.

There's currently no provision for that, but I agree it would be better
to do so.

johannes

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Dan Williams @ 2013-10-28 14:46 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Johannes Berg, Luciano Coelho, John W. Linville, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, pavel, sre, joni.lapilainen
In-Reply-To: <201310281500.24159@pali>

On Mon, 2013-10-28 at 15:00 +0100, Pali Rohár wrote:
> On Monday 28 October 2013 14:55:22 Johannes Berg wrote:
> > On Mon, 2013-10-28 at 14:49 +0100, Pali Rohár wrote:
> > > On Monday 28 October 2013 14:45:05 Johannes Berg wrote:
> > > > On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> > > > > Driver wl1251 generating mac address randomly at startup
> > > > > and there is no way to set permanent mac address via
> > > > > SET_IEEE80211_PERM_ADDR. This patch export sysfs file
> > > > > which can set permanent mac address by userspace helper
> > > > > program. Patch is needed for devices which do not store
> > > > > mac address in internal wl1251 eeprom.
> > > > 
> > > > This doesn't really seem like a good idea since you can
> > > > also just use 'ip' or whatever to set the MAC address.
> > > > 
> > > > johannes
> > > 
> > > AFAIK you cannot set permanent address (show by ethtool -P
> > > wlan0) via ip/ifconfig.
> > 
> > You probably can't, but that address also doesn't matter at
> > all and isn't really used anywhere.
> > 
> > johannes
> 
> There are some (proprietary) applications which using permanent 
> address for something...
> 
> And also more important: some network managing tools using it. 
> NetworkManager resetting current MAC address to permanent one 
> before starting configuring interface.
> 
> So this will lead to never use correct MAC address (but random 
> permanent one) assigned for that wl1251 wireless card...

If the device doesn't actually *have* a permanent MAC address, then it
shouldn't be returning one via ethtool, and should return an error for
ETHTOOL_GPERMADDR.  Setting the permanent MAC address shouldn't ever be
allowed except by the driver inspecting EEPROM, and certainly not via
sysfs.

mac80211 does have to do some special stuff due to virtual interfaces
and such, but in general, if the MAC address is randomly generated,
neither the driver nor mac80211 should be reporting that address as the
permanent address, only as the current one.

Dan

^ permalink raw reply

* Re: [PATCH net-next] net: sched: cls_bpf: add BPF-based classifier
From: Daniel Borkmann @ 2013-10-28 14:31 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, Thomas Graf
In-Reply-To: <1382967292.13037.22.camel@edumazet-glaptop.roam.corp.google.com>

On 10/28/2013 02:34 PM, Eric Dumazet wrote:
> On Mon, 2013-10-28 at 12:35 +0100, Daniel Borkmann wrote:
>> This work contains a lightweight BPF-based traffic classifier that can
>> serve as a flexible alternative to ematch-based tree classification, i.e.
>> now that BPF filter engine can also be JITed in the kernel. Naturally, tc
>> actions and policies are supported as well with cls_bpf. Multiple BPF
>> programs/filter can be attached for a class, or they can just as well be
>> written within a single BPF program, that's really up to the user how he
>> wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
>> The notion of a BPF program's return/exit codes is being kept as follows:
>> non-zero for match, zero for mismatch.
>>
>> As a minimal usage example with iproute2, we use a 3 band prio root qdisc
>> on a router with sfq each as leave, and assign ssh and icmp bpf-based
>> filters to band 1, http traffic to band 2 and the rest to band 3. For the
>> first two bands we load the bytecode from a file, in the 2nd we load it
>> inline as an example:
>>
>> echo 1 > /proc/sys/net/core/bpf_jit_enable
>>
>> tc qdisc del dev em1 root
>> tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
>
>> tc qdisc add dev em1 parent 1:1 sfq perturb 16
>> tc qdisc add dev em1 parent 1:2 sfq perturb 16
>> tc qdisc add dev em1 parent 1:3 sfq perturb 16
>>
>> tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
>> tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
>> tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
>> tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3
>>
>> BPF programs can be easily created and passed to tc, either as inline
>> 'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
>> compile opcodes, for example:
>>
>> 1) People familiar with tcpdump-like filters:
>>
>>     tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf
>>
>> 2) People that want to low-level program their filters or use BPF
>>     extensions that lack support by libpcap's compiler:
>>
>>     bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf
>>
>>     ssh.ops example code:
>>     ldh [12]
>>     jne #0x800, drop
>>     ldb [23]
>>     jneq #6, drop
>>     ldh [20]
>>     jset #0x1fff, drop
>>     ldxb 4 * ([14] & 0xf)
>>     ldh [%x + 14]
>>     jeq #0x16, pass
>>     ldh [%x + 16]
>>     jne #0x16, drop
>>     pass: ret #-1
>>     drop: ret #0
>>
>> It was chosen to load bytecode into tc, since the reverse operation,
>> tc filter list dev em1, is then able to show the exact commands again.
>> Possible follow-up work could also include a small expression compiler
>> for iproute2. Tested with the help of bmon. This idea came up during
>> the Netfilter Workshop 2013 in Copenhagen.
>>
>
> Well, running a large amount of filters might be very expensive [1],
> have you considered returning the flowid from the filter, instead of
> returning 0 and !0 ?
>
> 0 : would mean : not matched filter
> <>0 : flowid

I thought about this, I think this can partially be resolved by the
user implementing one BPF program to match all possible flows for a
class instead of implementing multiple BPF programs as mentioned in
the commit, iow that's up to the user space. And the case you suggest,
would be another option to further improve this, but would come with
some difficulties in contrast to the notion 0: mismatch, !0: match.
I think, this would need additional walk-through through all 'ret'
opcodes to see if those classes actually exist. Then, we would need
refcounting and call tcf_{un,}bind_filter() for each class that is
related to this filter, and tcf_exts_exec() would either need to be
i) understood in a "per-filter" notion, so as long as something
matches (!0) exec the very same action/policy (which might not be
what we want) or ii) could just not be implemented as multiple
user-defined filters could be defined in iproute2 with different
actions each, but in some paths return the same flowid. So I think
this here seems the better trade-off.

^ permalink raw reply

* RE: [PATCH 7/8] rds: Pass pointers to virt_to_page(), not integers
From: Venkat Venkatsubra @ 2013-10-28 14:22 UTC (permalink / raw)
  To: David Miller, ben; +Cc: linux-kernel, rds-devel, netdev
In-Reply-To: <20131028.002654.404759341390215962.davem@davemloft.net>

-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Sunday, October 27, 2013 11:27 PM
To: ben@decadent.org.uk
Cc: Venkat Venkatsubra; linux-kernel@vger.kernel.org; rds-devel@oss.oracle.com; netdev@vger.kernel.org
Subject: Re: [PATCH 7/8] rds: Pass pointers to virt_to_page(), not integers

From: Ben Hutchings <ben@decadent.org.uk>
Date: Sun, 27 Oct 2013 21:54:16 +0000

> Most architectures define virt_to_page() as a macro that casts its 
> argument such that an argument of type unsigned long will be accepted 
> without complaint.  However, the proper type is void *, and passing 
> unsigned long results in a warning on MIPS.
> 
> Compile-tested only.
> 
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

This looks fine:

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Pali Rohár @ 2013-10-28 14:00 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Luciano Coelho, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel, freemangordon, aaro.koskinen, pavel, sre,
	joni.lapilainen
In-Reply-To: <1382968522.17956.11.camel@jlt4.sipsolutions.net>

[-- Attachment #1: Type: Text/Plain, Size: 1415 bytes --]

On Monday 28 October 2013 14:55:22 Johannes Berg wrote:
> On Mon, 2013-10-28 at 14:49 +0100, Pali Rohár wrote:
> > On Monday 28 October 2013 14:45:05 Johannes Berg wrote:
> > > On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> > > > Driver wl1251 generating mac address randomly at startup
> > > > and there is no way to set permanent mac address via
> > > > SET_IEEE80211_PERM_ADDR. This patch export sysfs file
> > > > which can set permanent mac address by userspace helper
> > > > program. Patch is needed for devices which do not store
> > > > mac address in internal wl1251 eeprom.
> > > 
> > > This doesn't really seem like a good idea since you can
> > > also just use 'ip' or whatever to set the MAC address.
> > > 
> > > johannes
> > 
> > AFAIK you cannot set permanent address (show by ethtool -P
> > wlan0) via ip/ifconfig.
> 
> You probably can't, but that address also doesn't matter at
> all and isn't really used anywhere.
> 
> johannes

There are some (proprietary) applications which using permanent 
address for something...

And also more important: some network managing tools using it. 
NetworkManager resetting current MAC address to permanent one 
before starting configuring interface.

So this will lead to never use correct MAC address (but random 
permanent one) assigned for that wl1251 wireless card...

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Vladislav Yasevich @ 2013-10-28 13:56 UTC (permalink / raw)
  To: Jiri Pirko, Vladislav Yasevich, netdev@vger.kernel.org,
	David Miller, Alexey Kuznetsov, jmorris, yoshfuji,
	Patrick McHardy, thaller, Stephen Hemminger
In-Reply-To: <20131027164835.GB4714@order.stressinduktion.org>

On Sun, Oct 27, 2013 at 12:48 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Jiri!
>
> On Sun, Oct 27, 2013 at 02:29:41PM +0100, Jiri Pirko wrote:
>> The idea is to provide possibility to do address configuration not in
>> kernel but rather in userspace (as it is done for example in NetworkManager)
>>
>> Maybe I'm missing something, but why is it problem to have the
>> possibility to set lifetime even for temporary prefix?
>
> There is no problem setting the lifetime for a temporary prefix (in
> contrary, it needs one) but the code paths designed for IFA_F_TEMPORARY
> may not fiddle with it. This needs to be checked.
>
> In this constellation addrconf_verify does not refresh the privacy
> address when its preferred lifetime is expired, if you create the
> address by only passing IFA_F_TEMPORARY to rtm_newaddr (as Vlad pointed
> out). E.g. NetworkManager has to take care about that, then.
>
> A temporary address is also bound to a non-privacy public address so
> it's lifetime is determined by its lifetime (e.g. if you switch the
> network and don't receive on-link information for that prefix any
> more). NetworkManager would have to take care about that, too. It is
> just a question of what NetworkManager wants to handle itself or lets
> the kernel handle for it.
>


Exactly.  If we want to re-use the as much of the kernel implementation as
possible, it might be nice to also add a regen notification in case there
is no public address.  This way, Network Manger (or another application)
can do the re-generation if need be for these user configured temporary
addresses.

-vlad


> Greetings,
>
>   Hannes
>

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Johannes Berg @ 2013-10-28 13:55 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel, freemangordon, aaro.koskinen, pavel, sre,
	joni.lapilainen
In-Reply-To: <201310281449.58170@pali>

On Mon, 2013-10-28 at 14:49 +0100, Pali Rohár wrote:
> On Monday 28 October 2013 14:45:05 Johannes Berg wrote:
> > On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> > > Driver wl1251 generating mac address randomly at startup and
> > > there is no way to set permanent mac address via
> > > SET_IEEE80211_PERM_ADDR. This patch export sysfs file which
> > > can set permanent mac address by userspace helper program.
> > > Patch is needed for devices which do not store mac address
> > > in internal wl1251 eeprom.
> > 
> > This doesn't really seem like a good idea since you can also
> > just use 'ip' or whatever to set the MAC address.
> > 
> > johannes
> 
> AFAIK you cannot set permanent address (show by ethtool -P wlan0) 
> via ip/ifconfig.

You probably can't, but that address also doesn't matter at all and
isn't really used anywhere.

johannes

^ permalink raw reply

* Re: [PATCH 16/16] wl1251: Add sysfs file address for setting permanent mac address
From: Pali Rohár @ 2013-10-28 13:49 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Luciano Coelho, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel, freemangordon, aaro.koskinen, pavel, sre,
	joni.lapilainen
In-Reply-To: <1382967905.17956.8.camel@jlt4.sipsolutions.net>

[-- Attachment #1: Type: Text/Plain, Size: 726 bytes --]

On Monday 28 October 2013 14:45:05 Johannes Berg wrote:
> On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> > Driver wl1251 generating mac address randomly at startup and
> > there is no way to set permanent mac address via
> > SET_IEEE80211_PERM_ADDR. This patch export sysfs file which
> > can set permanent mac address by userspace helper program.
> > Patch is needed for devices which do not store mac address
> > in internal wl1251 eeprom.
> 
> This doesn't really seem like a good idea since you can also
> just use 'ip' or whatever to set the MAC address.
> 
> johannes

AFAIK you cannot set permanent address (show by ethtool -P wlan0) 
via ip/ifconfig.

-- 
Pali Rohár
pali.rohar@gmail.com

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH 01/16] mac80211: fix TX device statistics for monitor interfaces
From: Johannes Berg @ 2013-10-28 13:47 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel, freemangordon, aaro.koskinen, pavel, sre,
	joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-2-git-send-email-pali.rohar@gmail.com>

On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:

> +	dev->stats.tx_packets++;
> +	dev->stats.tx_bytes += skb->len;

We can still easily drop the packet after this - we can't guarantee
it'll go out but maybe we shouldn't do it here?

johannes

^ permalink raw reply

* Re: [PATCH 15/16] wl1251: Add sysfs file tx_mgmt_frm_rate for setting rate
From: Johannes Berg @ 2013-10-28 13:45 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, David S. Miller, linux-wireless,
	netdev, linux-kernel, freemangordon, aaro.koskinen, pavel, sre,
	joni.lapilainen
In-Reply-To: <1382819655-30430-16-git-send-email-pali.rohar@gmail.com>

On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> This patch was extracted from Maemo 2.6.28 kernel

That's not a description or justification for the patch ....

but again, it seems like a bad idea to use sysfs.

johannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox