* Re: extending ndo_add_rx_vxlan_port
From: Or Gerlitz @ 2013-10-28 18:19 UTC (permalink / raw)
To: Joseph Gasparakis
Cc: Or Gerlitz, John Fastabend, Yan Burman, netdev, Stephen Hemminger
In-Reply-To: <alpine.LFD.2.03.1310281049210.31450@intel.com>
On Mon, Oct 28, 2013 at 7:59 PM, Joseph Gasparakis
> On Mon, 28 Oct 2013, Or Gerlitz wrote:
>> On Mon, Oct 28, 2013 at 1:44 AM, Joseph Gasparakis wrote:
>> > VXLAN implementation is not done like VLAN. VLANs are stacked on top of
>> > real interfaces and what you are saying makes sense. VXLAN however is
>> > using ip[6]_tunnel_xmit, and this is why we need to notify all the
>>
>> As the name of the ndo you added ndo_add_rx_vxlan_port suggests -- HW
>> needs that for RX offloads. So basically, what I am saying is: in a
>> similar manner that we already program the NIC "over which" the vxlan
>> instance is set to listen on the multicast address which is associated
>> with that vxlan segement, lets give them a hint that packets arriving
>> on this group are vxlan ones, so they can use it for programming
>> steering rules.
>
> Why don't you write up some code so everyone has something to look at?
> Then we can see what makes sense to do in terms of the existing or new ndos.
sure, code talks, indeed, still, looking on net-next, for the current
ndo there's no in tree consumer unless I miss anything, did I?
Or.
^ permalink raw reply
* Re: [PATCH] can: c_can: Speed up rx_poll function
From: Markus Pargmann @ 2013-10-28 18:28 UTC (permalink / raw)
To: Joe Perches
Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
linux-kernel, kernel
In-Reply-To: <1382980336.30941.40.camel@joe-AO722>
On Mon, Oct 28, 2013 at 10:12:16AM -0700, Joe Perches wrote:
> On Mon, 2013-10-28 at 17:59 +0100, Markus Pargmann wrote:
> > This patch speeds up the rx_poll function by reducing the number of
> > register reads.
>
> trivial notes:
>
> > diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
> []
> > @@ -259,6 +259,12 @@ static u32 c_can_read_reg32(struct c_can_priv *priv, enum reg index)
> > +u16 c_can_read_reg16(struct c_can_priv *priv, enum reg index)
> > +{
> > + u16 val = priv->read_reg(priv, index);
> > + return val;
> > +}
>
> This function doesn't seem useful at all.
> It's not exported and it's not static.
Oh right, should be static.
>
> Why not use an in-place priv->read_reg(priv, index)?
No reason for that. Especially with a u16 as mentioned below, it is
clear enough that we have a 16bit value, so I will change it.
>
> > +
> > static void c_can_enable_all_interrupts(struct c_can_priv *priv,
> > int enable)
> > {
> > @@ -798,17 +804,21 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
> > u32 num_rx_pkts = 0;
> > unsigned int msg_obj, msg_ctrl_save;
> > struct c_can_priv *priv = netdev_priv(dev);
> > - u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
> > + unsigned long val = c_can_read_reg16(priv, C_CAN_INTPND1_REG);
>
> Probably better as a u16 as detailed below.
>
> > + /*
> > + * It is faster to read only one 16bit register. This is only possible
> > + * for a maximum number of 16 objects.
> > + */
> > + BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
> > + "Implementation does not support more message objects than 16");
> > +
> > + while (quota > 0 && (val = c_can_read_reg16(priv, C_CAN_INTPND1_REG))) {
> > + msg_obj = 0;
> > + while ((msg_obj = find_next_bit(&val, 16, msg_obj)) < 16 &&
>
> Using ffs instead of find_next_bit would be more standard
> and probably faster too.
I wasn't aware of that. I will change it
Thanks,
Markus Pargmann
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply
* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Neil Horman @ 2013-10-28 18:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: Eric Dumazet, linux-kernel, sebastien.dugue, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, netdev
In-Reply-To: <20131028174630.GB31048@hmsreliant.think-freely.org>
On Mon, Oct 28, 2013 at 01:46:30PM -0400, Neil Horman wrote:
> On Mon, Oct 28, 2013 at 05:24:38PM +0100, Ingo Molnar wrote:
> >
> > * Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > > Looking at the specific cpu counters we get this:
> > >
> > > Base:
> > > Total time: 0.179 [sec]
> > >
> > > Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
> > >
> > > 1571.304618 task-clock # 5.213 CPUs utilized ( +- 0.45% )
> > > 14,423 context-switches # 0.009 M/sec ( +- 4.28% )
> > > 2,710 cpu-migrations # 0.002 M/sec ( +- 2.83% )
> >
> > Hm, for these second round of measurements were you using 'perf stat
> > -a -C ...'?
> >
> > The most accurate method of measurement for such single-threaded
> > workloads is something like:
> >
> > taskset 0x1 perf stat -a -C 1 --repeat 20 ...
> >
> > this will bind your workload to CPU#0, and will do PMU measurements
> > only there - without mixing in other CPUs or workloads.
> >
> > Thanks,
> >
> > Ingo
> I wasn't, but I will...
> Neil
>
> > --
Heres my data for running the same test with taskset restricting execution to
only cpu0. I'm not quite sure whats going on here, but doing so resulted in a
10x slowdown of the runtime of each iteration which I can't explain. As before
however, both the parallel alu run and the prefetch run resulted in speedups,
but the two together were not in any way addative. I'm going to keep playing
with the prefetch stride, unless you have an alternate theory.
Regards
Neil
Base:
Total time: 1.013 [sec]
Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
1140.286043 task-clock # 1.001 CPUs utilized ( +- 0.65% ) [100.00%]
48,779 context-switches # 0.043 M/sec ( +- 10.08% ) [100.00%]
0 cpu-migrations # 0.000 K/sec [100.00%]
75,398 page-faults # 0.066 M/sec ( +- 0.05% )
2,950,225,491 cycles # 2.587 GHz ( +- 0.65% ) [16.63%]
263,349,439 stalled-cycles-frontend # 8.93% frontend cycles idle ( +- 1.87% ) [16.70%]
1,615,723,017 stalled-cycles-backend # 54.77% backend cycles idle ( +- 0.64% ) [16.76%]
2,168,440,946 instructions # 0.74 insns per cycle
# 0.75 stalled cycles per insn ( +- 0.52% ) [16.76%]
406,885,149 branches # 356.827 M/sec ( +- 0.61% ) [16.74%]
10,099,789 branch-misses # 2.48% of all branches ( +- 0.73% ) [16.73%]
1,138,829,982 L1-dcache-loads # 998.723 M/sec ( +- 0.57% ) [16.71%]
21,341,094 L1-dcache-load-misses # 1.87% of all L1-dcache hits ( +- 1.22% ) [16.69%]
38,453,870 LLC-loads # 33.723 M/sec ( +- 1.46% ) [16.67%]
9,587,987 LLC-load-misses # 24.93% of all LL-cache hits ( +- 0.48% ) [16.66%]
566,241,820 L1-icache-loads # 496.579 M/sec ( +- 0.70% ) [16.65%]
9,061,979 L1-icache-load-misses # 1.60% of all L1-icache hits ( +- 3.39% ) [16.65%]
1,130,620,555 dTLB-loads # 991.524 M/sec ( +- 0.64% ) [16.64%]
423,302 dTLB-load-misses # 0.04% of all dTLB cache hits ( +- 4.89% ) [16.63%]
563,371,089 iTLB-loads # 494.061 M/sec ( +- 0.62% ) [16.62%]
215,406 iTLB-load-misses # 0.04% of all iTLB cache hits ( +- 6.97% ) [16.60%]
0 L1-dcache-prefetches # 0.000 K/sec [16.59%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [16.58%]
1.139598762 seconds time elapsed ( +- 0.65% )
Prefetch:
Total time: 0.981 [sec]
Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
1128.603117 task-clock # 1.001 CPUs utilized ( +- 0.66% ) [100.00%]
45,992 context-switches # 0.041 M/sec ( +- 9.47% ) [100.00%]
0 cpu-migrations # 0.000 K/sec [100.00%]
75,428 page-faults # 0.067 M/sec ( +- 0.06% )
2,920,666,228 cycles # 2.588 GHz ( +- 0.66% ) [16.59%]
255,998,006 stalled-cycles-frontend # 8.77% frontend cycles idle ( +- 1.78% ) [16.67%]
1,601,090,475 stalled-cycles-backend # 54.82% backend cycles idle ( +- 0.69% ) [16.75%]
2,164,301,312 instructions # 0.74 insns per cycle
# 0.74 stalled cycles per insn ( +- 0.59% ) [16.78%]
404,920,928 branches # 358.781 M/sec ( +- 0.54% ) [16.77%]
10,025,146 branch-misses # 2.48% of all branches ( +- 0.66% ) [16.75%]
1,133,764,674 L1-dcache-loads # 1004.573 M/sec ( +- 0.47% ) [16.74%]
21,251,432 L1-dcache-load-misses # 1.87% of all L1-dcache hits ( +- 1.01% ) [16.72%]
38,006,432 LLC-loads # 33.676 M/sec ( +- 1.56% ) [16.70%]
9,625,034 LLC-load-misses # 25.32% of all LL-cache hits ( +- 0.40% ) [16.68%]
565,712,289 L1-icache-loads # 501.250 M/sec ( +- 0.57% ) [16.66%]
8,726,826 L1-icache-load-misses # 1.54% of all L1-icache hits ( +- 3.40% ) [16.64%]
1,130,140,463 dTLB-loads # 1001.362 M/sec ( +- 0.53% ) [16.63%]
419,645 dTLB-load-misses # 0.04% of all dTLB cache hits ( +- 4.44% ) [16.62%]
560,199,307 iTLB-loads # 496.365 M/sec ( +- 0.51% ) [16.61%]
213,413 iTLB-load-misses # 0.04% of all iTLB cache hits ( +- 6.65% ) [16.59%]
0 L1-dcache-prefetches # 0.000 K/sec [16.56%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [16.54%]
1.127934534 seconds time elapsed ( +- 0.66% )
Parallel ALU:
Total time: 0.986 [sec]
Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
1131.914738 task-clock # 1.001 CPUs utilized ( +- 0.49% ) [100.00%]
40,807 context-switches # 0.036 M/sec ( +- 10.72% ) [100.00%]
0 cpu-migrations # 0.000 K/sec ( +-100.00% ) [100.00%]
75,329 page-faults # 0.067 M/sec ( +- 0.04% )
2,929,149,996 cycles # 2.588 GHz ( +- 0.49% ) [16.58%]
250,428,558 stalled-cycles-frontend # 8.55% frontend cycles idle ( +- 1.75% ) [16.66%]
1,621,074,968 stalled-cycles-backend # 55.34% backend cycles idle ( +- 0.46% ) [16.73%]
2,147,405,781 instructions # 0.73 insns per cycle
# 0.75 stalled cycles per insn ( +- 0.56% ) [16.77%]
401,196,771 branches # 354.441 M/sec ( +- 0.58% ) [16.76%]
9,941,701 branch-misses # 2.48% of all branches ( +- 0.67% ) [16.74%]
1,126,651,774 L1-dcache-loads # 995.350 M/sec ( +- 0.50% ) [16.73%]
21,075,294 L1-dcache-load-misses # 1.87% of all L1-dcache hits ( +- 0.96% ) [16.72%]
37,885,850 LLC-loads # 33.471 M/sec ( +- 1.10% ) [16.71%]
9,729,116 LLC-load-misses # 25.68% of all LL-cache hits ( +- 0.62% ) [16.69%]
562,058,495 L1-icache-loads # 496.556 M/sec ( +- 0.54% ) [16.67%]
8,617,450 L1-icache-load-misses # 1.53% of all L1-icache hits ( +- 3.06% ) [16.65%]
1,121,765,737 dTLB-loads # 991.034 M/sec ( +- 0.57% ) [16.63%]
388,875 dTLB-load-misses # 0.03% of all dTLB cache hits ( +- 4.27% ) [16.62%]
556,029,393 iTLB-loads # 491.229 M/sec ( +- 0.64% ) [16.61%]
189,181 iTLB-load-misses # 0.03% of all iTLB cache hits ( +- 6.98% ) [16.60%]
0 L1-dcache-prefetches # 0.000 K/sec [16.58%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [16.56%]
1.131247174 seconds time elapsed ( +- 0.49% )
Both:
Total time: 0.993 [sec]
Performance counter stats for 'perf bench sched messaging -- bash -c echo 1 > /sys/module/csum_test/parameters/test_fire' (20 runs):
1130.912197 task-clock # 1.001 CPUs utilized ( +- 0.60% ) [100.00%]
45,859 context-switches # 0.041 M/sec ( +- 9.00% ) [100.00%]
0 cpu-migrations # 0.000 K/sec [100.00%]
75,398 page-faults # 0.067 M/sec ( +- 0.07% )
2,926,527,048 cycles # 2.588 GHz ( +- 0.60% ) [16.60%]
255,482,254 stalled-cycles-frontend # 8.73% frontend cycles idle ( +- 1.62% ) [16.67%]
1,608,247,364 stalled-cycles-backend # 54.95% backend cycles idle ( +- 0.73% ) [16.74%]
2,162,135,903 instructions # 0.74 insns per cycle
# 0.74 stalled cycles per insn ( +- 0.46% ) [16.77%]
403,436,790 branches # 356.736 M/sec ( +- 0.44% ) [16.76%]
10,062,572 branch-misses # 2.49% of all branches ( +- 0.85% ) [16.75%]
1,133,889,264 L1-dcache-loads # 1002.632 M/sec ( +- 0.56% ) [16.74%]
21,460,116 L1-dcache-load-misses # 1.89% of all L1-dcache hits ( +- 1.31% ) [16.73%]
38,070,119 LLC-loads # 33.663 M/sec ( +- 1.63% ) [16.72%]
9,593,162 LLC-load-misses # 25.20% of all LL-cache hits ( +- 0.42% ) [16.71%]
562,867,188 L1-icache-loads # 497.711 M/sec ( +- 0.59% ) [16.68%]
8,472,343 L1-icache-load-misses # 1.51% of all L1-icache hits ( +- 3.02% ) [16.64%]
1,126,997,403 dTLB-loads # 996.538 M/sec ( +- 0.53% ) [16.61%]
414,900 dTLB-load-misses # 0.04% of all dTLB cache hits ( +- 4.12% ) [16.60%]
561,156,032 iTLB-loads # 496.198 M/sec ( +- 0.56% ) [16.59%]
212,482 iTLB-load-misses # 0.04% of all iTLB cache hits ( +- 6.10% ) [16.58%]
0 L1-dcache-prefetches # 0.000 K/sec [16.57%]
0 L1-dcache-prefetch-misses # 0.000 K/sec [16.56%]
1.130242195 seconds time elapsed ( +- 0.60% )
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply
* Re: [PATCH 2/4] wl1251: move power GPIO handling into the driver
From: Mark Brown @ 2013-10-28 19:23 UTC (permalink / raw)
To: Grazvydas Ignotas
Cc: Alexander Shiyan, Luciano Coelho, Mark Rutland, devicetree,
Russell King, Pawel Moll, Ian Campbell, Tony Lindgren,
Greg Kroah-Hartman, Stephen Warren, linux-doc, John W. Linville,
Rob Herring, linux-kernel@vger.kernel.org, Sachin Kamat,
Bill Pemberton, Felipe Balbi, Rob Landley, netdev,
linux-wireless@vger.kernel.org, linux-omap@vger.kernel.org,
"linux-arm-kernel@lists.infradead.org"
In-Reply-To: <CANOLnONKcC1i19ypb4rh2Rff3WsVhhuUhNvZvYwDV2-ZNtjVxQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1042 bytes --]
On Mon, Oct 28, 2013 at 07:29:52PM +0200, Grazvydas Ignotas wrote:
> When wl12xx family of chips is connected through SDIO, we already have
> that pin set up as a regulator controlled with the help of mmc
> subsystem. When time comes to communicate with the chip, mmc subsystem
> sees this as yet another SD card and looks for associated regulator
> for it, and the board file has that set up as a fixed regulator
> controlling that pin (see pandora_vmmc3 in
> arch/arm/mach-omap2/board-omap3pandora.c). To prevent poweroff after
> first SDIO communications are over, pm_runtime calls are used in
> drivers/net/wireless/ti/wl1251/sdio.c .
Is this actually controlling VMMC though, or is it some other control?
If it's not controlling VMMC then it shouldn't say that it is.
> I don't know if something similar can be done done in SPI case, but
> I'm sure this is not the first your-so-called regulator misuse.
It's not the first but that doesn't make controlling something other
than a regulator through the regulator API any less broken.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* Re: [PATCH 4/4] wl1251: spi: add device tree support
From: Kumar Gala @ 2013-10-28 19:30 UTC (permalink / raw)
To: Grazvydas Ignotas
Cc: Sebastian Reichel, Mark Rutland, linux-doc, Tony Lindgren,
Russell King, Sachin Kamat, Ian Campbell, Sebastian Reichel,
Luciano Coelho, devicetree, Pawel Moll, Stephen Warren,
John W. Linville, Rob Herring, Bill Pemberton,
linux-omap@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
Greg Kroah-Hartman, linux-wireless@vger.kernel.org,
linux-kernel@vger.kernel.org, Felipe
In-Reply-To: <CANOLnOP=antQDwQLxWhuB8+-kZh9x-JJXrXmOz3FH67fM2Lang@mail.gmail.com>
On Oct 28, 2013, at 12:15 PM, Grazvydas Ignotas wrote:
> On Mon, Oct 28, 2013 at 8:37 AM, Kumar Gala <galak@codeaurora.org> wrote:
>> On Oct 27, 2013, at 11:14 AM, Sebastian Reichel wrote:
>>> +++ b/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt
>>> @@ -0,0 +1,36 @@
>>> +* Texas Instruments wl1251 controller
>>> +
>>> +The wl1251 chip can be connected via SPI or via SDIO. The linux
>>> +kernel currently only supports device tree for the SPI variant.
>>> +
>>
>> From the binding I have no idea what this chip actually does, also we don't normally reference linux kernel support in bindings specs (so please remove it).
>>
>> However, what would expect the SDIO binding to look like? Or more specifically, how would you distinguish the SPI vs SDIO binding/connection? I'm wondering if the compatible should be something like "ti,wl1251-spi" and than the sdio can be "ti,wl1251-sdio"
>
> When connected to SDIO, it doesn't act as standard SDIO device and
> can't be probed (standard SDIO registers missing), so information has
> to come some other way. That used to partially come through
> platform_data and partially through a callback from mmc subsystem (see
> pandora_wl1251_init_card() in
> arch/arm/mach-omap2/board-omap3pandora.c). I don't know much about DT,
> but maybe the information that comes from SDIO registers on "normal"
> SDIO devices should come through DT in this case too? I don't really
> know how that should be integrated with mmc subsystem though..
Ok, my point is still valid that we can use a different compatible for the SDIO case even if its no standard SDIO vs the SPI case.
>
>>> +Required properties:
>>> +- compatible : Should be "ti,wl1251"
>>
>> reg is not listed as a required prop.
>>
>>> +- interrupts : Should contain interrupt line
>>> +- interrupt-parent : Should be the phandle for the interrupt
>>> + controller that services interrupts for this device
>>> +- vio-supply : phandle to regulator providing VIO
>>> +- power-gpio : GPIO connected to chip's PMEN pin
>>
>> should be vendor prefixed: ti,power-gpio
>>
>>> +- For additional required properties on SPI, please consult
>>> + Documentation/devicetree/bindings/spi/spi-bus.txt
>>> +
>>> +Optional properties:
>>> +- ti,use-eeprom : If found, configuration will be loaded from eeprom.
>>
>> can you be a bit more specific on what cfg will be loaded. Also, is this property a boolean, if so how do I know which eeprom the cfg is loaded from (is it one that is directly connected to the wl1251?
>
> wl1251 is a wifi chip that can have an optional eeprom connected to it
> to store things like calibration stuff and MAC address, and that
> eeprom is usually inside a single module with some additional radio
> related chips. If the eeprom is connected (like the module on pandora
> board has), the driver can issue command to the firmware running on
> chip to load that data on it's startup, alternatively the driver can
> load calibration from other storage (like it happens on N900).
So sounds like a boolean. I think just adding that its a boolean and maybe something like "configuration (calibration data, MAC addr, etc..)"
- k
>
>
> Gražvydas
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH] bridge: pass correct vlan id to multicast code
From: Vlad Yasevich @ 2013-10-28 19:45 UTC (permalink / raw)
To: netdev; +Cc: shemminger, makita.toshiaki, Vlad Yasevich
Currently multicast code attempts to extrace the vlan id from
the skb even when vlan filtering is disabled. This can lead
to mdb entries being created with the wrong vlan id.
Pass the already extracted vlan id to the multicast
filtering code to make the correct id is used in
creation as well as lookup.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
---
net/bridge/br_device.c | 2 +-
net/bridge/br_input.c | 2 +-
net/bridge/br_multicast.c | 44 +++++++++++++++++++-------------------------
net/bridge/br_private.h | 6 ++++--
4 files changed, 25 insertions(+), 29 deletions(-)
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index ca04163..e6b7fec 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -64,7 +64,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
br_flood_deliver(br, skb, false);
goto out;
}
- if (br_multicast_rcv(br, NULL, skb)) {
+ if (br_multicast_rcv(br, NULL, skb, vid)) {
kfree_skb(skb);
goto out;
}
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index a2fd37e..7e73c32 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -80,7 +80,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
br_fdb_update(br, p, eth_hdr(skb)->h_source, vid);
if (!is_broadcast_ether_addr(dest) && is_multicast_ether_addr(dest) &&
- br_multicast_rcv(br, p, skb))
+ br_multicast_rcv(br, p, skb, vid))
goto drop;
if (p->state == BR_STATE_LEARNING)
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 8b0b610..686284f 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -947,7 +947,8 @@ void br_multicast_disable_port(struct net_bridge_port *port)
static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ u16 vid)
{
struct igmpv3_report *ih;
struct igmpv3_grec *grec;
@@ -957,12 +958,10 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
int type;
int err = 0;
__be32 group;
- u16 vid = 0;
if (!pskb_may_pull(skb, sizeof(*ih)))
return -EINVAL;
- br_vlan_get_tag(skb, &vid);
ih = igmpv3_report_hdr(skb);
num = ntohs(ih->ngrec);
len = sizeof(*ih);
@@ -1005,7 +1004,8 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
#if IS_ENABLED(CONFIG_IPV6)
static int br_ip6_multicast_mld2_report(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ u16 vid)
{
struct icmp6hdr *icmp6h;
struct mld2_grec *grec;
@@ -1013,12 +1013,10 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
int len;
int num;
int err = 0;
- u16 vid = 0;
if (!pskb_may_pull(skb, sizeof(*icmp6h)))
return -EINVAL;
- br_vlan_get_tag(skb, &vid);
icmp6h = icmp6_hdr(skb);
num = ntohs(icmp6h->icmp6_dataun.un_data16[1]);
len = sizeof(*icmp6h);
@@ -1141,7 +1139,8 @@ static void br_multicast_query_received(struct net_bridge *br,
static int br_ip4_multicast_query(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ u16 vid)
{
const struct iphdr *iph = ip_hdr(skb);
struct igmphdr *ih = igmp_hdr(skb);
@@ -1153,7 +1152,6 @@ static int br_ip4_multicast_query(struct net_bridge *br,
unsigned long now = jiffies;
__be32 group;
int err = 0;
- u16 vid = 0;
spin_lock(&br->multicast_lock);
if (!netif_running(br->dev) ||
@@ -1189,7 +1187,6 @@ static int br_ip4_multicast_query(struct net_bridge *br,
if (!group)
goto out;
- br_vlan_get_tag(skb, &vid);
mp = br_mdb_ip4_get(mlock_dereference(br->mdb, br), group, vid);
if (!mp)
goto out;
@@ -1219,7 +1216,8 @@ out:
#if IS_ENABLED(CONFIG_IPV6)
static int br_ip6_multicast_query(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ u16 vid)
{
const struct ipv6hdr *ip6h = ipv6_hdr(skb);
struct mld_msg *mld;
@@ -1231,7 +1229,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
unsigned long now = jiffies;
const struct in6_addr *group = NULL;
int err = 0;
- u16 vid = 0;
spin_lock(&br->multicast_lock);
if (!netif_running(br->dev) ||
@@ -1265,7 +1262,6 @@ static int br_ip6_multicast_query(struct net_bridge *br,
if (!group)
goto out;
- br_vlan_get_tag(skb, &vid);
mp = br_mdb_ip6_get(mlock_dereference(br->mdb, br), group, vid);
if (!mp)
goto out;
@@ -1439,7 +1435,8 @@ static void br_ip6_multicast_leave_group(struct net_bridge *br,
static int br_multicast_ipv4_rcv(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ u16 vid)
{
struct sk_buff *skb2 = skb;
const struct iphdr *iph;
@@ -1447,7 +1444,6 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
unsigned int len;
unsigned int offset;
int err;
- u16 vid = 0;
/* We treat OOM as packet loss for now. */
if (!pskb_may_pull(skb, sizeof(*iph)))
@@ -1508,7 +1504,6 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
err = 0;
- br_vlan_get_tag(skb2, &vid);
BR_INPUT_SKB_CB(skb)->igmp = 1;
ih = igmp_hdr(skb2);
@@ -1519,10 +1514,10 @@ static int br_multicast_ipv4_rcv(struct net_bridge *br,
err = br_ip4_multicast_add_group(br, port, ih->group, vid);
break;
case IGMPV3_HOST_MEMBERSHIP_REPORT:
- err = br_ip4_multicast_igmp3_report(br, port, skb2);
+ err = br_ip4_multicast_igmp3_report(br, port, skb2, vid);
break;
case IGMP_HOST_MEMBERSHIP_QUERY:
- err = br_ip4_multicast_query(br, port, skb2);
+ err = br_ip4_multicast_query(br, port, skb2, vid);
break;
case IGMP_HOST_LEAVE_MESSAGE:
br_ip4_multicast_leave_group(br, port, ih->group, vid);
@@ -1540,7 +1535,8 @@ err_out:
#if IS_ENABLED(CONFIG_IPV6)
static int br_multicast_ipv6_rcv(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ u16 vid)
{
struct sk_buff *skb2;
const struct ipv6hdr *ip6h;
@@ -1550,7 +1546,6 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
unsigned int len;
int offset;
int err;
- u16 vid = 0;
if (!pskb_may_pull(skb, sizeof(*ip6h)))
return -EINVAL;
@@ -1640,7 +1635,6 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
err = 0;
- br_vlan_get_tag(skb, &vid);
BR_INPUT_SKB_CB(skb)->igmp = 1;
switch (icmp6_type) {
@@ -1657,10 +1651,10 @@ static int br_multicast_ipv6_rcv(struct net_bridge *br,
break;
}
case ICMPV6_MLD2_REPORT:
- err = br_ip6_multicast_mld2_report(br, port, skb2);
+ err = br_ip6_multicast_mld2_report(br, port, skb2, vid);
break;
case ICMPV6_MGM_QUERY:
- err = br_ip6_multicast_query(br, port, skb2);
+ err = br_ip6_multicast_query(br, port, skb2, vid);
break;
case ICMPV6_MGM_REDUCTION:
{
@@ -1681,7 +1675,7 @@ out:
#endif
int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb, u16 vid)
{
BR_INPUT_SKB_CB(skb)->igmp = 0;
BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
@@ -1691,10 +1685,10 @@ int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
switch (skb->protocol) {
case htons(ETH_P_IP):
- return br_multicast_ipv4_rcv(br, port, skb);
+ return br_multicast_ipv4_rcv(br, port, skb, vid);
#if IS_ENABLED(CONFIG_IPV6)
case htons(ETH_P_IPV6):
- return br_multicast_ipv6_rcv(br, port, skb);
+ return br_multicast_ipv6_rcv(br, port, skb, vid);
#endif
}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e14c33b..2e8244e 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -451,7 +451,8 @@ extern int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __us
extern unsigned int br_mdb_rehash_seq;
extern int br_multicast_rcv(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb);
+ struct sk_buff *skb,
+ u16 vid);
extern struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
struct sk_buff *skb, u16 vid);
extern void br_multicast_add_port(struct net_bridge_port *port);
@@ -522,7 +523,8 @@ static inline bool br_multicast_querier_exists(struct net_bridge *br,
#else
static inline int br_multicast_rcv(struct net_bridge *br,
struct net_bridge_port *port,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ u16 vid)
{
return 0;
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next] netconsole: Convert to pr_<level>
From: Joe Perches @ 2013-10-28 19:53 UTC (permalink / raw)
To: netdev; +Cc: LKML
Use a more current logging style.
Convert printks to pr_<level>.
Consolidate multiple printks into a single printk to avoid
any possible dmesg interleaving. Add a default "event" msg
in case the listed types are ever expanded.
Signed-off-by: Joe Perches <joe@perches.com>
---
drivers/net/netconsole.c | 57 +++++++++++++++++++++++-------------------------
1 file changed, 27 insertions(+), 30 deletions(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index adeee61..a8ef4c4 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -34,6 +34,8 @@
*
****************************************************************/
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/mm.h>
#include <linux/init.h>
#include <linux/module.h>
@@ -319,8 +321,8 @@ static ssize_t store_enabled(struct netconsole_target *nt,
if (enabled < 0 || enabled > 1)
return -EINVAL;
if (enabled == nt->enabled) {
- printk(KERN_INFO "netconsole: network logging has already %s\n",
- nt->enabled ? "started" : "stopped");
+ pr_info("network logging has already %s\n",
+ nt->enabled ? "started" : "stopped");
return -EINVAL;
}
@@ -339,7 +341,7 @@ static ssize_t store_enabled(struct netconsole_target *nt,
return err;
}
- printk(KERN_INFO "netconsole: network logging started\n");
+ pr_info("network logging started\n");
} else { /* 0 */
netpoll_cleanup(&nt->np);
@@ -358,9 +360,8 @@ static ssize_t store_dev_name(struct netconsole_target *nt,
size_t len;
if (nt->enabled) {
- printk(KERN_ERR "netconsole: target (%s) is enabled, "
- "disable to update parameters\n",
- config_item_name(&nt->item));
+ pr_err("target (%s) is enabled, disable to update parameters\n",
+ config_item_name(&nt->item));
return -EINVAL;
}
@@ -381,9 +382,8 @@ static ssize_t store_local_port(struct netconsole_target *nt,
int rv;
if (nt->enabled) {
- printk(KERN_ERR "netconsole: target (%s) is enabled, "
- "disable to update parameters\n",
- config_item_name(&nt->item));
+ pr_err("target (%s) is enabled, disable to update parameters\n",
+ config_item_name(&nt->item));
return -EINVAL;
}
@@ -400,9 +400,8 @@ static ssize_t store_remote_port(struct netconsole_target *nt,
int rv;
if (nt->enabled) {
- printk(KERN_ERR "netconsole: target (%s) is enabled, "
- "disable to update parameters\n",
- config_item_name(&nt->item));
+ pr_err("target (%s) is enabled, disable to update parameters\n",
+ config_item_name(&nt->item));
return -EINVAL;
}
@@ -417,9 +416,8 @@ static ssize_t store_local_ip(struct netconsole_target *nt,
size_t count)
{
if (nt->enabled) {
- printk(KERN_ERR "netconsole: target (%s) is enabled, "
- "disable to update parameters\n",
- config_item_name(&nt->item));
+ pr_err("target (%s) is enabled, disable to update parameters\n",
+ config_item_name(&nt->item));
return -EINVAL;
}
@@ -427,7 +425,7 @@ static ssize_t store_local_ip(struct netconsole_target *nt,
const char *end;
if (in6_pton(buf, count, nt->np.local_ip.in6.s6_addr, -1, &end) > 0) {
if (*end && *end != '\n') {
- printk(KERN_ERR "netconsole: invalid IPv6 address at: <%c>\n", *end);
+ pr_err("invalid IPv6 address at: <%c>\n", *end);
return -EINVAL;
}
nt->np.ipv6 = true;
@@ -448,9 +446,8 @@ static ssize_t store_remote_ip(struct netconsole_target *nt,
size_t count)
{
if (nt->enabled) {
- printk(KERN_ERR "netconsole: target (%s) is enabled, "
- "disable to update parameters\n",
- config_item_name(&nt->item));
+ pr_err("target (%s) is enabled, disable to update parameters\n",
+ config_item_name(&nt->item));
return -EINVAL;
}
@@ -458,7 +455,7 @@ static ssize_t store_remote_ip(struct netconsole_target *nt,
const char *end;
if (in6_pton(buf, count, nt->np.remote_ip.in6.s6_addr, -1, &end) > 0) {
if (*end && *end != '\n') {
- printk(KERN_ERR "netconsole: invalid IPv6 address at: <%c>\n", *end);
+ pr_err("invalid IPv6 address at: <%c>\n", *end);
return -EINVAL;
}
nt->np.ipv6 = true;
@@ -481,9 +478,8 @@ static ssize_t store_remote_mac(struct netconsole_target *nt,
u8 remote_mac[ETH_ALEN];
if (nt->enabled) {
- printk(KERN_ERR "netconsole: target (%s) is enabled, "
- "disable to update parameters\n",
- config_item_name(&nt->item));
+ pr_err("target (%s) is enabled, disable to update parameters\n",
+ config_item_name(&nt->item));
return -EINVAL;
}
@@ -704,19 +700,20 @@ restart:
}
spin_unlock_irqrestore(&target_list_lock, flags);
if (stopped) {
- printk(KERN_INFO "netconsole: network logging stopped on "
- "interface %s as it ", dev->name);
+ const char *msg = "had an event";
switch (event) {
case NETDEV_UNREGISTER:
- printk(KERN_CONT "unregistered\n");
+ msg = "unregistered";
break;
case NETDEV_RELEASE:
- printk(KERN_CONT "released slaves\n");
+ msg = "released slaves";
break;
case NETDEV_JOIN:
- printk(KERN_CONT "is joining a master device\n");
+ msg = "is joining a master device";
break;
}
+ pr_info("network logging stopped on interface %s as it %s\n",
+ dev->name, msg);
}
done:
@@ -802,7 +799,7 @@ static int __init init_netconsole(void)
goto undonotifier;
register_console(&netconsole);
- printk(KERN_INFO "netconsole: network logging started\n");
+ pr_info("network logging started\n");
return err;
@@ -810,7 +807,7 @@ undonotifier:
unregister_netdevice_notifier(&netconsole_netdev_notifier);
fail:
- printk(KERN_ERR "netconsole: cleaning up\n");
+ pr_err("cleaning up\n");
/*
* Remove all targets and destroy them (only targets created
^ permalink raw reply related
* Re: extending ndo_add_rx_vxlan_port
From: Joseph Gasparakis @ 2013-10-28 20:24 UTC (permalink / raw)
To: Or Gerlitz
Cc: Joseph Gasparakis, Or Gerlitz, John Fastabend, Yan Burman, netdev,
Stephen Hemminger
In-Reply-To: <CAJZOPZJ3J7T=i2QXQPr80Qb-V8RT3uPJaNFDnrOpYxJ1Mt_LNQ@mail.gmail.com>
On Mon, 28 Oct 2013, Or Gerlitz wrote:
> On Mon, Oct 28, 2013 at 7:59 PM, Joseph Gasparakis
> > On Mon, 28 Oct 2013, Or Gerlitz wrote:
> >> On Mon, Oct 28, 2013 at 1:44 AM, Joseph Gasparakis wrote:
>
> >> > VXLAN implementation is not done like VLAN. VLANs are stacked on top of
> >> > real interfaces and what you are saying makes sense. VXLAN however is
> >> > using ip[6]_tunnel_xmit, and this is why we need to notify all the
> >>
> >> As the name of the ndo you added ndo_add_rx_vxlan_port suggests -- HW
> >> needs that for RX offloads. So basically, what I am saying is: in a
> >> similar manner that we already program the NIC "over which" the vxlan
> >> instance is set to listen on the multicast address which is associated
> >> with that vxlan segement, lets give them a hint that packets arriving
> >> on this group are vxlan ones, so they can use it for programming
> >> steering rules.
> >
> > Why don't you write up some code so everyone has something to look at?
> > Then we can see what makes sense to do in terms of the existing or new ndos.
>
> sure, code talks, indeed, still, looking on net-next, for the current
> ndo there's no in tree consumer unless I miss anything, did I?
>
> Or.
>
I don't think you missed anything. My patches for the i40e are currently
in our trees and making their way to netdev. It will be very soon.
^ permalink raw reply
* Re: [PATCH RESEND] packet: Deliver VLAN TPID to userspace
From: Ben Hutchings @ 2013-10-28 20:11 UTC (permalink / raw)
To: David Miller; +Cc: atzm, netdev, stephen
In-Reply-To: <20131025.185920.1893524127229371162.davem@davemloft.net>
On Fri, 2013-10-25 at 18:59 -0400, David Miller wrote:
> From: Atzm Watanabe <atzm@stratosphere.co.jp>
> Date: Tue, 22 Oct 2013 17:39:40 +0900
>
> > struct tpacket_hdr_variant1 {
> > __u32 tp_rxhash;
> > __u32 tp_vlan_tci;
> > + __u32 tp_vlan_tpid;
> > };
> >
>
> You cannot do this, the header length is not variable.
Well it is variable to an extent. This is used as a member of the final
anonymous union member of struct tpacket3_hdr, and the latter is
specified to be tail-padded to a multiple of 16 bytes
(TPACKET_ALIGNMENT). Since its current size is 36 bytes, I think it can
safely grow by 12 bytes, so long as userland doesn't depend on
getsockopt(..., PACKET_HDRLEN, ...) returning only specific values.
Possibly there should be a separate struct tpacket_hdr_variant2 which
includes the extra member. Possibly there should also be a status flag
to indicate that this member is present.
> This patch has been submitted several times, each of which you
> have been shown ways in which your changes break userspace in
> one way or another.
I think we established that struct tpacket3_hdr can't grow arbitrarily,
but not that it can't grow at all.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
From: David Miller @ 2013-10-28 20:28 UTC (permalink / raw)
To: zajec5; +Cc: netdev, openwrt-devel
In-Reply-To: <1382982142-16876-1-git-send-email-zajec5@gmail.com>
From: Rafał Miłecki <zajec5@gmail.com>
Date: Mon, 28 Oct 2013 18:42:22 +0100
> Copying whole packets with skb_copy_from_linear_data_offset is a pretty
> bad idea. CPU was spending time in __copy_user_common and network
> performance was lower. With the new solution iperf-measured speed
> increased from 116Mb/s to 134Mb/s.
>
> Another way to improve performance could be switching to build_skb. It
> is cache-specific, so will require testing of various devices.
>
> Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Semantically this looks good but it needs some coding style fixes:
The following is indented correctly:
> + dma_sync_single_for_device(dma_dev,
> + slot->dma_addr,
> + BGMAC_RX_BUF_SIZE,
> + DMA_FROM_DEVICE);
> + break;
> }
However, this is not:
> + /* Unmap old skb, we'll pass it to the netfif */
> + dma_unmap_single(dma_dev, old_dma_addr,
> + BGMAC_RX_BUF_SIZE,
> + DMA_FROM_DEVICE);
> +
Like the first case, please align the arguments on the second and subsequently
line to start at the first column after the openning parenthesis on the
first line.
Every time I see this not done correctly, it's because the person doing it
is trying to use TAB characters only to indent. You should be using the
appropriate number of TAB _and_ SPACE characters necessary to get things
lines up properly.
^ permalink raw reply
* Re: [RFC TRY#2][PATCH] bgmac: pass received packet to the netif instead of copying it
From: Rafał Miłecki @ 2013-10-28 20:36 UTC (permalink / raw)
To: David Miller; +Cc: Network Development, OpenWrt Development List
In-Reply-To: <20131028.162858.1940229138245324914.davem@davemloft.net>
2013/10/28 David Miller <davem@davemloft.net>:
> However, this is not:
>
>> + /* Unmap old skb, we'll pass it to the netfif */
>> + dma_unmap_single(dma_dev, old_dma_addr,
>> + BGMAC_RX_BUF_SIZE,
>> + DMA_FROM_DEVICE);
>> +
Oh, sorry for that. I had to copy & paste it in some silly way... Will fix that!
--
Rafał
^ permalink raw reply
* Re: extending ndo_add_rx_vxlan_port
From: Joseph Gasparakis @ 2013-10-28 21:13 UTC (permalink / raw)
To: Joseph Gasparakis
Cc: Or Gerlitz, Or Gerlitz, John Fastabend, Yan Burman, netdev,
Stephen Hemminger
In-Reply-To: <alpine.LFD.2.03.1310281322500.10719@intel.com>
On Mon, 28 Oct 2013, Joseph Gasparakis wrote:
>
>
> On Mon, 28 Oct 2013, Or Gerlitz wrote:
>
> > On Mon, Oct 28, 2013 at 7:59 PM, Joseph Gasparakis
> > > On Mon, 28 Oct 2013, Or Gerlitz wrote:
> > >> On Mon, Oct 28, 2013 at 1:44 AM, Joseph Gasparakis wrote:
> >
> > >> > VXLAN implementation is not done like VLAN. VLANs are stacked on top of
> > >> > real interfaces and what you are saying makes sense. VXLAN however is
> > >> > using ip[6]_tunnel_xmit, and this is why we need to notify all the
> > >>
> > >> As the name of the ndo you added ndo_add_rx_vxlan_port suggests -- HW
> > >> needs that for RX offloads. So basically, what I am saying is: in a
> > >> similar manner that we already program the NIC "over which" the vxlan
> > >> instance is set to listen on the multicast address which is associated
> > >> with that vxlan segement, lets give them a hint that packets arriving
> > >> on this group are vxlan ones, so they can use it for programming
> > >> steering rules.
> > >
> > > Why don't you write up some code so everyone has something to look at?
> > > Then we can see what makes sense to do in terms of the existing or new ndos.
> >
> > sure, code talks, indeed, still, looking on net-next, for the current
> > ndo there's no in tree consumer unless I miss anything, did I?
> >
> > Or.
> >
>
> I don't think you missed anything. My patches for the i40e are currently
> in our trees and making their way to netdev. It will be very soon.
>
>
Oh! And I also forgot to mention that if you want to see how it
will be used, together with the patch mentioned above that introduces
the ndos, I had added an RFC patch against ixgbe that showcases how the
driver could use them. The RFC patch was just loging the info, but the
actual talking to the HW would be vendor specific anyway.
^ permalink raw reply
* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: David Miller @ 2013-10-28 21:17 UTC (permalink / raw)
To: hannes
Cc: jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji, kaber,
thaller, stephen
In-Reply-To: <20131027164835.GB4714@order.stressinduktion.org>
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Sun, 27 Oct 2013 17:48:35 +0100
> A temporary address is also bound to a non-privacy public address so
> it's lifetime is determined by its lifetime (e.g. if you switch the
> network and don't receive on-link information for that prefix any
> more). NetworkManager would have to take care about that, too. It is
> just a question of what NetworkManager wants to handle itself or lets
> the kernel handle for it.
How much really needs to be in userspace to implement RFC4941?
I don't like the idea that even for a fully up and properly
functioning link, if NetworkManager wedges then critical things like
temporary address (re-)generation, will cease.
All that seems to matter is the secret used to generate the temporary
address sequence, and perhaps things like site specific lifetime
parameters. These are things userland can send to the kernel in
netlink messages.
Full disclosure that I am saying this from the perspective of someone
who believes that one of the biggest mistakes ever was allowing the
core of DHCP to be done in userspace.
Right now every ipv4 DHCP user ends up with their interface in
promiscuous mode. The DHCP client implementations are huge non-
trivial bodies of code, and getting them to adopt the changes necessary
to not put the interface in promiscuous mode is harder than pulling
teeth.
If at least the DHCP protocol communications part were in the kernel,
on the other hand, we could remove the problem quite swiftly.
This problem has existed for more than a decade, btw. There simply
exists no will to fix it properly, even after all this time, because
breaking the ice on those DHCP client code bases in userbase is hard.
So I want to see less fundamental stuff about interface configuration
and address assignment in userland, not more.
^ permalink raw reply
* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Jamal Hadi Salim @ 2013-10-28 22:29 UTC (permalink / raw)
To: Florian Fainelli
Cc: Felix Fietkau, Neil Horman, John Fastabend, netdev, David Miller,
Sascha Hauer, John Crispin, Jonas Gorski, Gary Thomas,
Vlad Yasevich, Stephen Hemminger
In-Reply-To: <CAGVrzcbmsMjB78_x8MoOf4C34OMMZ-+3RBbheA0bG=r5VUi4EQ@mail.gmail.com>
On 10/27/13 14:14, Florian Fainelli wrote:
> 2013/10/27 Jamal Hadi Salim <jhs@mojatatu.com>:
>> On 10/25/13 09:01, Felix Fietkau wrote:
>>>
>
> They are yes, the only "fancy" features these switches allow is
> basically to set a given's port vlan id, which is already a huge
> improvement compared to the vendor provided firmware.
>
Nice to know that you have something better than the vendor provided
stuff.
>
> The switch does have an address learning process which is usually not
> controlled by software at all, so yes, flooding is usually the way to
> get it to the CPU.
>
Ok.
> Which exact drivers are you refering to? If we are talking about DSA
> then yes, this is correct, but it is completely Ethernet MAC driver
> agnostic.
>
Sorry - cant point you to an exact one; one that i tried to convert to
NAPI and found these issues was from Netlogic (embedded 64 bit mips),
that i think now is in the kernel proper (and someone had converted to
NAPI as well). Let me get back to you with some sample examples..
> Why would we expose the hardware switch physical ports as netdevs if
> we cannot even any control over their data-path? Unlike these
> multiport NICs, the only traffic you see and you can control is the
> one from your CPU port.
>
Not necessarily for datapath, rather for control path. If i can
pull the stats, ifconfig up/down the port, set flow control
etc - then that is a good reason to expose them.
>
> I do not really see how we could bend the existing interface (is it
> rtnetlink we are talking about or something else btw?) to expose these
> switches, maybe we could with iproute2, but still, the user-space
> interface/tool is far from being the problem here.
>
Look at the FDB API.
The user space interface as well as reusing kernel interfaces is my main
arguement.
> I don't think at any point in this discussion there was a mention that
> we do not want to change the user or kernel interface in OpenWrt
> because we have been using this for the past 5 years, on the contrary,
> if we are bringing this to a wide audience, this is to get some proper
> review and eventually change it.
>
Ok, sorry - I misinterpreted you and Felix. Like i said, if you gave me
that reason I would understand.
cheers,
jamal
^ permalink raw reply
* Re: [PATCH 4/4] wl1251: spi: add device tree support
From: Tomasz Figa @ 2013-10-28 22:38 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Kumar Gala, Sebastian Reichel, Mark Rutland, linux-doc,
Tony Lindgren, Russell King, Sachin Kamat, Ian Campbell,
Sebastian Reichel, Luciano Coelho, devicetree, Pawel Moll,
Stephen Warren, John W. Linville, Rob Herring, Bill Pemberton,
linux-omap, Greg Kroah-Hartman, linux-wireless, linux-kernel,
Felipe Balbi, Rob Landley, netdev
In-Reply-To: <FF34C626-4A49-43B1-B0AD-DC6146ABBB11@codeaurora.org>
On Monday 28 of October 2013 01:37:34 Kumar Gala wrote:
> On Oct 27, 2013, at 11:14 AM, Sebastian Reichel wrote:
> > Add device tree support for the spi variant of wl1251
> > and document the binding.
> >
> > Signed-off-by: Sebastian Reichel <sre@debian.org>
> > ---
> > .../devicetree/bindings/net/wireless/ti,wl1251.txt | 36
> > ++++++++++++++++++++++ drivers/net/wireless/ti/wl1251/spi.c
> > | 23 ++++++++++---- 2 files changed, 53 insertions(+), 6
> > deletions(-)
> > create mode 100644
> > Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt
> > b/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt new
> > file mode 100644
> > index 0000000..5f8a154
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt
> > @@ -0,0 +1,36 @@
> > +* Texas Instruments wl1251 controller
> > +
> > +The wl1251 chip can be connected via SPI or via SDIO. The linux
> > +kernel currently only supports device tree for the SPI variant.
> > +
>
> From the binding I have no idea what this chip actually does, also we
> don't normally reference linux kernel support in bindings specs (so
> please remove it).
>
> However, what would expect the SDIO binding to look like? Or more
> specifically, how would you distinguish the SPI vs SDIO
> binding/connection? I'm wondering if the compatible should be
> something like "ti,wl1251-spi" and than the sdio can be
> "ti,wl1251-sdio"
Well, you can easily distinguish an SDIO device from an SPI device by its
parent node, but...
The binding for SDIO might require different set of properties (other than
ones inherited from generic SDIO or SPI bindings) than one for SPI. So
probably different compatible values might be justified.
Did we already have such case before? (maybe some I2C + SPI devices?)
> > +Required properties:
> > +- compatible : Should be "ti,wl1251"
>
> reg is not listed as a required prop.
It is implied by SPI bindings, but it might be a good idea to have this
stated here as well.
>
> > +- interrupts : Should contain interrupt line
> > +- interrupt-parent : Should be the phandle for the interrupt
> > + controller that services interrupts for this device
> > +- vio-supply : phandle to regulator providing VIO
> > +- power-gpio : GPIO connected to chip's PMEN pin
>
> should be vendor prefixed: ti,power-gpio
Hmm, out of curiosity, is it a rule for this kind of properties? I can see
both cases with and without prefixes when grepping for "-gpios" in
Documentation/devicetree/bindings. We should really have such things
written down somewhere.
>
> > +- For additional required properties on SPI, please consult
> > + Documentation/devicetree/bindings/spi/spi-bus.txt
> > +
> > +Optional properties:
> > +- ti,use-eeprom : If found, configuration will be loaded from eeprom.
>
> can you be a bit more specific on what cfg will be loaded. Also, is
> this property a boolean, if so how do I know which eeprom the cfg is
> loaded from (is it one that is directly connected to the wl1251?
Maybe one from ti,has-eeprom or ti,config-eeprom would be better name for
this property?
Best regards,
Tomasz
^ permalink raw reply
* [PATCH net-next] virtio_net: migrate mergeable rx buffers to page frag allocators
From: Michael Dalton @ 2013-10-28 22:44 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Eric Dumazet, Rusty Russell, Michael S. Tsirkin,
Daniel Borkmann, virtualization, Michael Dalton
The virtio_net driver's mergeable receive buffer allocator
uses 4KB packet buffers. For MTU-sized traffic, SKB truesize
is > 4KB but only ~1500 bytes of the buffer is used to store
packet data, reducing the effective TCP window size
substantially. This patch addresses the performance concerns
with mergeable receive buffers by allocating MTU-sized packet
buffers using page frag allocators. If more than MAX_SKB_FRAGS
buffers are needed, the SKB frag_list is used.
Signed-off-by: Michael Dalton <mwdalton@google.com>
---
drivers/net/virtio_net.c | 164 ++++++++++++++++++++++++++++++-----------------
1 file changed, 106 insertions(+), 58 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9fbdfcd..113ee93 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -124,6 +124,11 @@ struct virtnet_info {
/* Lock for config space updates */
struct mutex config_lock;
+ /* Page_frag for GFP_KERNEL packet buffer allocation when we run
+ * low on memory.
+ */
+ struct page_frag alloc_frag;
+
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
@@ -217,33 +222,18 @@ static void skb_xmit_done(struct virtqueue *vq)
netif_wake_subqueue(vi->dev, vq2txq(vq));
}
-static void set_skb_frag(struct sk_buff *skb, struct page *page,
- unsigned int offset, unsigned int *len)
-{
- int size = min((unsigned)PAGE_SIZE - offset, *len);
- int i = skb_shinfo(skb)->nr_frags;
-
- __skb_fill_page_desc(skb, i, page, offset, size);
-
- skb->data_len += size;
- skb->len += size;
- skb->truesize += PAGE_SIZE;
- skb_shinfo(skb)->nr_frags++;
- skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
- *len -= size;
-}
-
/* Called from bottom half context */
static struct sk_buff *page_to_skb(struct receive_queue *rq,
- struct page *page, unsigned int len)
+ struct page *page, unsigned int offset,
+ unsigned int len, unsigned int truesize)
{
struct virtnet_info *vi = rq->vq->vdev->priv;
struct sk_buff *skb;
struct skb_vnet_hdr *hdr;
- unsigned int copy, hdr_len, offset;
+ unsigned int copy, hdr_len, hdr_padded_len;
char *p;
- p = page_address(page);
+ p = page_address(page) + offset;
/* copy small packet so we can reuse these pages for small data */
skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
@@ -254,16 +244,17 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq,
if (vi->mergeable_rx_bufs) {
hdr_len = sizeof hdr->mhdr;
- offset = hdr_len;
+ hdr_padded_len = sizeof hdr->mhdr;
} else {
hdr_len = sizeof hdr->hdr;
- offset = sizeof(struct padded_vnet_hdr);
+ hdr_padded_len = sizeof(struct padded_vnet_hdr);
}
memcpy(hdr, p, hdr_len);
len -= hdr_len;
- p += offset;
+ offset += hdr_padded_len;
+ p += hdr_padded_len;
copy = len;
if (copy > skb_tailroom(skb))
@@ -273,6 +264,14 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq,
len -= copy;
offset += copy;
+ if (vi->mergeable_rx_bufs) {
+ if (len)
+ skb_add_rx_frag(skb, 0, page, offset, len, truesize);
+ else
+ put_page(page);
+ return skb;
+ }
+
/*
* Verify that we can indeed put this data into a skb.
* This is here to handle cases when the device erroneously
@@ -284,9 +283,12 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq,
dev_kfree_skb(skb);
return NULL;
}
-
+ BUG_ON(offset >= PAGE_SIZE);
while (len) {
- set_skb_frag(skb, page, offset, &len);
+ unsigned int frag_size = min((unsigned)PAGE_SIZE - offset, len);
+ skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, offset,
+ frag_size, truesize);
+ len -= frag_size;
page = (struct page *)page->private;
offset = 0;
}
@@ -297,33 +299,52 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq,
return skb;
}
-static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
+static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb)
{
- struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
+ struct skb_vnet_hdr *hdr = skb_vnet_hdr(head_skb);
+ struct sk_buff *curr_skb = head_skb;
+ char *buf;
struct page *page;
- int num_buf, i, len;
+ int num_buf, len;
num_buf = hdr->mhdr.num_buffers;
while (--num_buf) {
- i = skb_shinfo(skb)->nr_frags;
- if (i >= MAX_SKB_FRAGS) {
- pr_debug("%s: packet too long\n", skb->dev->name);
- skb->dev->stats.rx_length_errors++;
- return -EINVAL;
- }
- page = virtqueue_get_buf(rq->vq, &len);
- if (!page) {
+ int num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
+ buf = virtqueue_get_buf(rq->vq, &len);
+ if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers missing\n",
- skb->dev->name, hdr->mhdr.num_buffers);
- skb->dev->stats.rx_length_errors++;
+ head_skb->dev->name, hdr->mhdr.num_buffers);
+ head_skb->dev->stats.rx_length_errors++;
return -EINVAL;
}
-
- if (len > PAGE_SIZE)
- len = PAGE_SIZE;
-
- set_skb_frag(skb, page, 0, &len);
-
+ if (unlikely(len > MAX_PACKET_LEN)) {
+ pr_debug("%s: rx error: merge buffer too long\n",
+ head_skb->dev->name);
+ len = MAX_PACKET_LEN;
+ }
+ if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
+ struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
+ if (unlikely(!nskb)) {
+ head_skb->dev->stats.rx_dropped++;
+ return -ENOMEM;
+ }
+ if (curr_skb == head_skb)
+ skb_shinfo(curr_skb)->frag_list = nskb;
+ else
+ curr_skb->next = nskb;
+ curr_skb = nskb;
+ head_skb->truesize += nskb->truesize;
+ num_skb_frags = 0;
+ }
+ if (curr_skb != head_skb) {
+ head_skb->data_len += len;
+ head_skb->len += len;
+ head_skb->truesize += MAX_PACKET_LEN;
+ }
+ page = virt_to_head_page(buf);
+ skb_add_rx_frag(curr_skb, num_skb_frags, page,
+ buf - (char *)page_address(page), len,
+ MAX_PACKET_LEN);
--rq->num;
}
return 0;
@@ -341,8 +362,10 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
- if (vi->mergeable_rx_bufs || vi->big_packets)
+ if (vi->big_packets)
give_pages(rq, buf);
+ else if (vi->mergeable_rx_bufs)
+ put_page(virt_to_head_page(buf));
else
dev_kfree_skb(buf);
return;
@@ -352,19 +375,28 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
skb = buf;
len -= sizeof(struct virtio_net_hdr);
skb_trim(skb, len);
+ } else if (vi->mergeable_rx_bufs) {
+ struct page *page = virt_to_head_page(buf);
+ skb = page_to_skb(rq, page,
+ (char *)buf - (char *)page_address(page),
+ len, MAX_PACKET_LEN);
+ if (unlikely(!skb)) {
+ dev->stats.rx_dropped++;
+ put_page(page);
+ return;
+ }
+ if (receive_mergeable(rq, skb)) {
+ dev_kfree_skb(skb);
+ return;
+ }
} else {
page = buf;
- skb = page_to_skb(rq, page, len);
+ skb = page_to_skb(rq, page, 0, len, PAGE_SIZE);
if (unlikely(!skb)) {
dev->stats.rx_dropped++;
give_pages(rq, page);
return;
}
- if (vi->mergeable_rx_bufs)
- if (receive_mergeable(rq, skb)) {
- dev_kfree_skb(skb);
- return;
- }
}
hdr = skb_vnet_hdr(skb);
@@ -501,18 +533,28 @@ static int add_recvbuf_big(struct receive_queue *rq, gfp_t gfp)
static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
{
- struct page *page;
+ struct virtnet_info *vi = rq->vq->vdev->priv;
+ char *buf = NULL;
int err;
- page = get_a_page(rq, gfp);
- if (!page)
+ if (gfp & __GFP_WAIT) {
+ if (skb_page_frag_refill(MAX_PACKET_LEN, &vi->alloc_frag,
+ gfp)) {
+ buf = (char *)page_address(vi->alloc_frag.page) +
+ vi->alloc_frag.offset;
+ get_page(vi->alloc_frag.page);
+ vi->alloc_frag.offset += MAX_PACKET_LEN;
+ }
+ } else {
+ buf = netdev_alloc_frag(MAX_PACKET_LEN);
+ }
+ if (!buf)
return -ENOMEM;
- sg_init_one(rq->sg, page_address(page), PAGE_SIZE);
-
- err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, page, gfp);
+ sg_init_one(rq->sg, buf, MAX_PACKET_LEN);
+ err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, buf, gfp);
if (err < 0)
- give_pages(rq, page);
+ put_page(virt_to_head_page(buf));
return err;
}
@@ -1343,8 +1385,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
struct virtqueue *vq = vi->rq[i].vq;
while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
- if (vi->mergeable_rx_bufs || vi->big_packets)
+ if (vi->big_packets)
give_pages(&vi->rq[i], buf);
+ else if (vi->mergeable_rx_bufs)
+ put_page(virt_to_head_page(buf));
else
dev_kfree_skb(buf);
--vi->rq[i].num;
@@ -1650,6 +1694,8 @@ free_recv_bufs:
free_vqs:
cancel_delayed_work_sync(&vi->refill);
virtnet_del_vqs(vi);
+ if (vi->alloc_frag.page)
+ put_page(vi->alloc_frag.page);
free_index:
free_percpu(vi->vq_index);
free_stats:
@@ -1685,6 +1731,8 @@ static void virtnet_remove(struct virtio_device *vdev)
unregister_netdev(vi->dev);
remove_vq_common(vi);
+ if (vi->alloc_frag.page)
+ put_page(vi->alloc_frag.page);
flush_work(&vi->config_work);
--
1.8.4.1
^ permalink raw reply related
* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Jamal Hadi Salim @ 2013-10-28 22:53 UTC (permalink / raw)
To: Felix Fietkau, Florian Fainelli, Neil Horman
Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger,
Lennert Buytenhek
In-Reply-To: <526D6EBA.3020903@openwrt.org>
On 10/27/13 15:51, Felix Fietkau wrote:
> On 2013-10-27 6:19 PM, Jamal Hadi Salim wrote:
> That question does not make any sense to me. Aside from low level
> control frames like pause frames for flow control, the switch has no
> need to send packets to the CPU port on its own.
> Remember what I told you about the switch being a *separate* entity from
> the NIC that connects it to the CPU.
>
I am assuming there is a MAC address which is identified to be that of a
switch. Something responds to ARP for example for that MAC. I think
you are saying that for a certain class of switch chips, there is no
concept of "cpu port" - therefore there cannot be unicast from the
chip to the cpu.
> DSA does this, and last time I looked, it pushes *all* bridge traffic
> through the CPU, making it completely unusable for slower embedded CPUs.
>
I wasnt thinking DSA (rather some MIPS based embedded boards)- but now
that you bring it up, lets Cc Lennert.
> If I remember correctly, adding support 'bridge acceleration' was left
> as an exercise for the reader and never actually implemented.
>
From talking to you, I realize there are things that are dumb and
cant be "accelerated". The scenarios so far have been for accelaration
(or to be correct: offloading).
And my contention is - this is a matter of capability discovery as
advertised by the driver and as used by the user tools.
> Sure, this could be fixed somehow, but even then the model and
> assumptions that DSA is built on simply don't work for some of the
> dumber switches that we support.
>
Agreed.
[.. content removed for brevity, dont think we have disagreements ..]
> One of the currently very common switches in many embedded devices is
> the RTL8366/RTL8367. It has some flexibility when it comes to
> configuring VLANs, and it's one of the few ones where you can configure
> a forwarding table for a VLAN (which spans multiple ports), which allows
> software bridging between multiple VLANs.
> However, what this switch does *not* support is adding a header/trailer
> to packets to indicate the originating port.
> This means that all per-port netdevs will be dummy ports which don't
> include the data path.
>
My view is that netdevs are still valuable even if only they get used
for control path. Like you said earlier - you can still pull stats, flow
control messages still make it through etc. They provide you
the consistent api to configure the switch above, ex:
If i was to use the FDB api for this switch as long as i can
abstract it in software as a bridge, I could send it a switch config
via its ops which says:
"I am giving you this entry with vland 400 for port 2, but i want you to
send it to the hardware not to your local entry"
> So let's say you have a configuration where you're using VLAN ID 4 on
> port 1, and you want to bridge it to VLAN ID 400 on port 2.
>
> Sounds easy enough, you can easily create a bridge that spans port1.4
> and port2.400. Except, this particular switch (like pretty much any
> other switch supported by swconfig) isn't actually able to handle such a
> configuration on its own.
Makes sense.
Let me point that even the Linux bridge cant handle this on its own
either.
You would need two bridges instantiated. The "cpu port" (we should call
it the "L3 port" really) is implicit in the case of the bridge i.e it
is the Linux network stack.
You would need to set the vlan filters on the bridge to strip the vlan
on egress of the first bridge etc ..
> It needs two VLAN configurations, with different forwarding table IDs,
> and then the software bridge on the CPU port needs to forward between
> the two different VLANs.
> To be able to handle such a configuration, the code would have to detect
> this kind of special case scenario, somehow hook itself via rx handler
> into the NIC connected to the CPU port and emulate that VLAN ID
> replacement behavior.
>
IMO: You dont need to muck with rx handler if you used bridge
abstraction. It becomes a config issue.
> With swconfig, you create two VLANs: VLAN 4, containing CPU and port1;
> VLAN 400, containing CPU and port2. You then create a software bridge
> between eth0.4 and eth0.400 (assuming eth0 is the NIC connected to the
> switch).
>
Can we call that "L3" instead of software bridge?
It can be done if you create a Linux bridge in software per L2 table id
in your chip. Then you attach the bridge ports.
A linux bridge of the sort, assuming there's a subnet per bridge is
configured thus:
bridge-tab1: link ports {eth0:vlan4, eth1:vlan4}, subnet 1
bridge-tab2: link ports {eth0:vlan400, eth1:vlan400}, subnet 2
> In a different scenario, the code would also have to detect
> configurations that the switch isn't able to handle, e.g.: bridging
> port1.4 to eth1 and port2.4 to eth2.
> Such a configuration wouldn't work at all with such a switch, because
> the CPU isn't able to tell apart traffic from port1 and port2, and
> there's no way to tell the switch that port1.4 and port2.4 should not be
> connected to each other, but both should go to the CPU.
>
Understood.
I think that discovery is a must - so you can apply different behavior
to different switches.
But you seem to have solved this already. Linux as is does not.
You can either have the driver tell you what it can/cant do or you
can attempt to fire and miss and get a return code that will tell
you that it cant achieve what you are asking it to do. I prefer the
former.
>
> Those are just two simple scenarios from the top of my head - I'm pretty
> sure I could come up with a long list of further corner cases and
> quirks, which are simply either difficult to deal with, or completely
> unnatural in the model that you're describing.
>
I think these are the kind of things that need to be enumerated to come
to some conclusion.
> Trying to make all of these cases work in the code will make the whole
> thing a lot more difficult to deal with and maintain. It will also make
> it much harder for the user to figure out, what configurations work, and
> what configurations don't.
>
>
> Especially the case with reusing VLANs on different ports (but not
> connecting them to each other) is something that can easily work with
> software devices, but cannot be emulated on most embedded device
> switches. The software bridge configuration model raises a lot of
> expectations that these switches simply cannot meet.
>
I wouldnt expect every thing a software bridge does would be met by
a random switch.S/w bridge would be the super-set. But this is
not a new concept, example: Netdev itself is an abstraction - we have
USB, ethernet, wireless, variety of virtual interfaces etc.
Sometimes we dont even have the concept of a "link" in some of these
devices; infiniband would have a huge MAC address but i can still
use ifconfig on it etc.
> If you look at the swconfig model, you will see that the abstraction
> clearly communicates the limitations of these typical switches.
>
I will have to go back and look - but like i said earlier seems to me
you have solved this problem. Of the switch hardware i am familiar with
(high end pricey stuff), the capabilities tend to fall into the
following components:
-flooding control (i.e what should happen on destination failure)
-learning control (i.e what should happen on the source lookup failure)
(Ive seen knobs for "drop", "send to portX" where "X" could be cpu etc)
-fdb capacity
-whether it can do vlans, filtering pvids etc
-multicast snooping capability
To add to the above a few more based on talking to you:
- cpu port (in what ive come across this is always present, but
as you point out this cannot be assumed)
- ingress port tag (you point out that some cases this may never be
present even when the cpu port is present)
- ive never seen table id, but i think this is another one; in which
case the number of table ids becomes something one needs to discover..
cheers,
jamal
> The configuration model simply doesn't even let you express these kinds
> of unsuppported configurations that seem normal in the tools used to set
> up software bridges/vlans.
> At the same time, it's fairly consistent across the range of different
> chips that we have drivers for. That certainly leaves a much smaller
> amount of traps and surprises for users, compared to trying to emulate
> the software bridge model by hacking through the layers.
>
> Hopefully this will clear a few things up for you.
>
> - Felix
>
^ permalink raw reply
* Re: [PATCH 0/2] net_sched: Remove broken tc actions
From: Jamal Hadi Salim @ 2013-10-28 22:57 UTC (permalink / raw)
To: Alexander Duyck
Cc: Eric W. Biederman, David Miller, netdev, alexander.h.duyck
In-Reply-To: <526D799C.9050905@gmail.com>
On 10/27/13 16:37, Alexander Duyck wrote:
> The primary use case for act_skbedit was to have it associated with a
> filter. I based it off of act_simple so it isn't surprising that it
> inherited this issue.
Thats almost 100% of the use cases.
>
> From what I can tell all of the other actions are just using
> tcf_hash_search for lookup. Is there anything special that is needed in
> order to add the lookup call, or could we just add a one liner
> associating simple and skbedit lookup with tcf_hash_search?
>
Yes, that would do it. But dont bother - let me just send a generic
patch that will assume that for any other action that doesnt do it.
Will do it likely tomorow.
cheers,
jamal
^ permalink raw reply
* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Dan Williams @ 2013-10-28 23:16 UTC (permalink / raw)
To: David Miller
Cc: hannes, jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji, kaber,
thaller, stephen
In-Reply-To: <20131028.171725.1076499130782328273.davem@davemloft.net>
On Mon, 2013-10-28 at 17:17 -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Sun, 27 Oct 2013 17:48:35 +0100
>
> > A temporary address is also bound to a non-privacy public address so
> > it's lifetime is determined by its lifetime (e.g. if you switch the
> > network and don't receive on-link information for that prefix any
> > more). NetworkManager would have to take care about that, too. It is
> > just a question of what NetworkManager wants to handle itself or lets
> > the kernel handle for it.
>
> How much really needs to be in userspace to implement RFC4941?
>
> I don't like the idea that even for a fully up and properly
> functioning link, if NetworkManager wedges then critical things like
> temporary address (re-)generation, will cease.
Honestly, I'd be completely happy to leave temporary address handling up
to the kernel and *not* do it in userspace; the kernel already has all
the code. There are two problems with that though, (a) it's tied to
in-kernel RA handling, and (b) it's controlled by a CONFIG option. Both
these are solvable.
First off, what's the reasoning behind having IPv6 privacy as a config
option? It's off-by-default and must be explicitly turned on, so is
there any harm in removing the config? Or is it just for
smallest-kernel-ever folks?
Would a new IFA_F_MANAGE_TEMP (or better name) work here, indicating
that for some new static address, that the kernel should create and
manage the temporary privacy addresses associated with its prefix?
Dan
> All that seems to matter is the secret used to generate the temporary
> address sequence, and perhaps things like site specific lifetime
> parameters. These are things userland can send to the kernel in
> netlink messages.
>
> Full disclosure that I am saying this from the perspective of someone
> who believes that one of the biggest mistakes ever was allowing the
> core of DHCP to be done in userspace.
>
> Right now every ipv4 DHCP user ends up with their interface in
> promiscuous mode. The DHCP client implementations are huge non-
> trivial bodies of code, and getting them to adopt the changes necessary
> to not put the interface in promiscuous mode is harder than pulling
> teeth.
>
> If at least the DHCP protocol communications part were in the kernel,
> on the other hand, we could remove the problem quite swiftly.
>
> This problem has existed for more than a decade, btw. There simply
> exists no will to fix it properly, even after all this time, because
> breaking the ice on those DHCP client code bases in userbase is hard.
>
> So I want to see less fundamental stuff about interface configuration
> and address assignment in userland, not more.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next] virtio_net: migrate mergeable rx buffers to page frag allocators
From: Eric Dumazet @ 2013-10-28 23:19 UTC (permalink / raw)
To: Michael Dalton
Cc: Francesco Fusco, Michael S. Tsirkin, netdev, Daniel Borkmann,
virtualization, Eric Dumazet, David S. Miller
In-Reply-To: <1383000258-1458-1-git-send-email-mwdalton@google.com>
On Mon, 2013-10-28 at 15:44 -0700, Michael Dalton wrote:
> The virtio_net driver's mergeable receive buffer allocator
> uses 4KB packet buffers. For MTU-sized traffic, SKB truesize
> is > 4KB but only ~1500 bytes of the buffer is used to store
> packet data, reducing the effective TCP window size
> substantially. This patch addresses the performance concerns
> with mergeable receive buffers by allocating MTU-sized packet
> buffers using page frag allocators. If more than MAX_SKB_FRAGS
> buffers are needed, the SKB frag_list is used.
>
> Signed-off-by: Michael Dalton <mwdalton@google.com>
> ---
Signed-off-by: Eric Dumazet <edumazet@google.com>
Daniel & Francesco, this should address the performance problem you
tried to address with ("tcp: rcvbuf autotuning improvements")
( http://www.spinics.net/lists/netdev/msg252642.html )
Thanks !
^ permalink raw reply
* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Dan Williams @ 2013-10-28 23:23 UTC (permalink / raw)
To: David Miller
Cc: hannes, jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji, kaber,
thaller, stephen
In-Reply-To: <1383002179.28991.14.camel@dcbw.foobar.com>
On Mon, 2013-10-28 at 18:16 -0500, Dan Williams wrote:
> On Mon, 2013-10-28 at 17:17 -0400, David Miller wrote:
> > From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Date: Sun, 27 Oct 2013 17:48:35 +0100
> >
> > > A temporary address is also bound to a non-privacy public address so
> > > it's lifetime is determined by its lifetime (e.g. if you switch the
> > > network and don't receive on-link information for that prefix any
> > > more). NetworkManager would have to take care about that, too. It is
> > > just a question of what NetworkManager wants to handle itself or lets
> > > the kernel handle for it.
> >
> > How much really needs to be in userspace to implement RFC4941?
> >
> > I don't like the idea that even for a fully up and properly
> > functioning link, if NetworkManager wedges then critical things like
> > temporary address (re-)generation, will cease.
>
> Honestly, I'd be completely happy to leave temporary address handling up
> to the kernel and *not* do it in userspace; the kernel already has all
> the code. There are two problems with that though, (a) it's tied to
> in-kernel RA handling, and (b) it's controlled by a CONFIG option. Both
> these are solvable.
>
> First off, what's the reasoning behind having IPv6 privacy as a config
> option? It's off-by-default and must be explicitly turned on, so is
> there any harm in removing the config? Or is it just for
> smallest-kernel-ever folks?
>
> Would a new IFA_F_MANAGE_TEMP (or better name) work here, indicating
> that for some new static address, that the kernel should create and
> manage the temporary privacy addresses associated with its prefix?
Except that we've run out of IFA_F_* flags already, since it's a u8 and
there are already 8 flags. What to do?
Dan
> Dan
>
> > All that seems to matter is the secret used to generate the temporary
> > address sequence, and perhaps things like site specific lifetime
> > parameters. These are things userland can send to the kernel in
> > netlink messages.
> >
> > Full disclosure that I am saying this from the perspective of someone
> > who believes that one of the biggest mistakes ever was allowing the
> > core of DHCP to be done in userspace.
> >
> > Right now every ipv4 DHCP user ends up with their interface in
> > promiscuous mode. The DHCP client implementations are huge non-
> > trivial bodies of code, and getting them to adopt the changes necessary
> > to not put the interface in promiscuous mode is harder than pulling
> > teeth.
> >
> > If at least the DHCP protocol communications part were in the kernel,
> > on the other hand, we could remove the problem quite swiftly.
> >
> > This problem has existed for more than a decade, btw. There simply
> > exists no will to fix it properly, even after all this time, because
> > breaking the ice on those DHCP client code bases in userbase is hard.
> >
> > So I want to see less fundamental stuff about interface configuration
> > and address assignment in userland, not more.
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH 2/4] wl1251: move power GPIO handling into the driver
From: Sebastian Reichel @ 2013-10-28 23:26 UTC (permalink / raw)
To: Mark Brown
Cc: Grazvydas Ignotas, Alexander Shiyan, Luciano Coelho, Mark Rutland,
devicetree, Russell King, Pawel Moll, Ian Campbell, Tony Lindgren,
Greg Kroah-Hartman, Stephen Warren, linux-doc, John W. Linville,
Rob Herring, linux-kernel@vger.kernel.org, Sachin Kamat,
Bill Pemberton, Felipe Balbi, Rob Landley, netdev,
linux-wireless@vger.kernel.org, linux-omap@vger.kernel.org
In-Reply-To: <20131028192354.GA18208@sirena.org.uk>
[-- Attachment #1: Type: text/plain, Size: 1516 bytes --]
Hi,
On Mon, Oct 28, 2013 at 12:23:54PM -0700, Mark Brown wrote:
> On Mon, Oct 28, 2013 at 07:29:52PM +0200, Grazvydas Ignotas wrote:
>
> > When wl12xx family of chips is connected through SDIO, we already have
> > that pin set up as a regulator controlled with the help of mmc
> > subsystem. When time comes to communicate with the chip, mmc subsystem
> > sees this as yet another SD card and looks for associated regulator
> > for it, and the board file has that set up as a fixed regulator
> > controlling that pin (see pandora_vmmc3 in
> > arch/arm/mach-omap2/board-omap3pandora.c). To prevent poweroff after
> > first SDIO communications are over, pm_runtime calls are used in
> > drivers/net/wireless/ti/wl1251/sdio.c .
>
> Is this actually controlling VMMC though, or is it some other control?
> If it's not controlling VMMC then it shouldn't say that it is.
>
> > I don't know if something similar can be done done in SPI case, but
> > I'm sure this is not the first your-so-called regulator misuse.
>
> It's not the first but that doesn't make controlling something other
> than a regulator through the regulator API any less broken.
I gave it a second try to find out details for this pin:
1. The pin is named PMEN in the Nokia N900 schematics
2. PMEN is described as "Power management enable - system shutdown"
in a crippled datasheet of the wl1253, which I found in the internet.
I don't think this is supposed to be handled by the regulator API.
-- Sebastian
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply
* [PATCH NEXT] rtlwifi: Fix endian error in extracting packet type
From: Larry Finger @ 2013-10-28 23:28 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, Mark Cave-Ayland, netdev, Larry Finger, Stable
From: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
All of the rtlwifi drivers have an error in the routine that tests if
the received data is "special". The 16-bit quantity is big-endian, but
was being extracted in native CPU mode. One of the effects of this bug
is to inhibit association under some conditions.
A statement that would have made the code correct had been changed to
a comment. Rather than just reinstating that code, the fix here passes
sparse tests.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Stable <stable@vger.kernel.org> [2.6.38+]
---
drivers/net/wireless/rtlwifi/base.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/rtlwifi/base.c b/drivers/net/wireless/rtlwifi/base.c
index 9a78e3d..1efde7f 100644
--- a/drivers/net/wireless/rtlwifi/base.c
+++ b/drivers/net/wireless/rtlwifi/base.c
@@ -1077,8 +1077,8 @@ u8 rtl_is_special_data(struct ieee80211_hw *hw, struct sk_buff *skb, u8 is_tx)
ip = (struct iphdr *)((u8 *) skb->data + mac_hdr_len +
SNAP_SIZE + PROTOC_TYPE_SIZE);
- ether_type = *(u16 *) ((u8 *) skb->data + mac_hdr_len + SNAP_SIZE);
- /* ether_type = ntohs(ether_type); */
+ ether_type = be16_to_cpu(*(__be16 *)((u8 *)skb->data + mac_hdr_len +
+ SNAP_SIZE));
if (ETH_P_IP == ether_type) {
if (IPPROTO_UDP == ip->protocol) {
--
1.8.4
^ permalink raw reply related
* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Hannes Frederic Sowa @ 2013-10-28 23:31 UTC (permalink / raw)
To: David Miller
Cc: jiri, vyasevich, netdev, kuznet, jmorris, yoshfuji, kaber,
thaller, stephen
In-Reply-To: <20131028.171725.1076499130782328273.davem@davemloft.net>
On Mon, Oct 28, 2013 at 05:17:25PM -0400, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Sun, 27 Oct 2013 17:48:35 +0100
>
> > A temporary address is also bound to a non-privacy public address so
> > it's lifetime is determined by its lifetime (e.g. if you switch the
> > network and don't receive on-link information for that prefix any
> > more). NetworkManager would have to take care about that, too. It is
> > just a question of what NetworkManager wants to handle itself or lets
> > the kernel handle for it.
>
> How much really needs to be in userspace to implement RFC4941?
First off, I am not part of the NetworkManager community and don't know
their plans how and what features they want to implement. But I researched
and guess a bit. Maybe Jiri can jump in and provide links and details?
Currently we are pretty fine with our in-kernel implementation of RFC
4941. We may not have some features the RFC requires, like choosing if
we use the privacy address on a per-prefix basis. We only distinguish
per interface. But it is used by people and doing fine, I guess.
> I don't like the idea that even for a fully up and properly
> functioning link, if NetworkManager wedges then critical things like
> temporary address (re-)generation, will cease.
I am totally d'accord. It is essential that regeneration does not stop
under any circumstances.
> All that seems to matter is the secret used to generate the temporary
> address sequence, and perhaps things like site specific lifetime
> parameters. These are things userland can send to the kernel in
> netlink messages.
RFC 4941 does not dictate per se the use of a stable secret key. It is
currently generated in the kernel. If NM wants to provide a way to have
a stable key, I agree, it should be provided via netlink (maybe Nicolas
work on the netconf api could be extended for that).
I found this bug: <https://bugzilla.gnome.org/show_bug.cgi?id=705170#c2>
It does not seem to mention stable keys, so I guess it is not on
their radar. The comment I referenced talked about fixing the kernel
interface but I did not find any pointers what is wrong with it? There was
discussion that the /all/use_tempaddr should enable privacy extensions
globally on all interfaces even if they were running at that point:
http://patchwork.ozlabs.org/patch/131911/ (it got rejected). If something
like this is needed I guess we could provide such a setting. But as
NM has to have state on all interfaces I don't see a particular reason
they would need this.
I don't follow netdev closely that long so maybe I missed a lot of the
discussions that maybe took place about this.
> Full disclosure that I am saying this from the perspective of someone
> who believes that one of the biggest mistakes ever was allowing the
> core of DHCP to be done in userspace.
*nod*
> Right now every ipv4 DHCP user ends up with their interface in
> promiscuous mode. The DHCP client implementations are huge non-
> trivial bodies of code, and getting them to adopt the changes necessary
> to not put the interface in promiscuous mode is harder than pulling
> teeth.
> If at least the DHCP protocol communications part were in the kernel,
> on the other hand, we could remove the problem quite swiftly.
>
> This problem has existed for more than a decade, btw. There simply
> exists no will to fix it properly, even after all this time, because
> breaking the ice on those DHCP client code bases in userbase is hard.
I wonder why NetworkManager rewrote IPv6 router discovery but not IPv4
DHCP client stuff. It could have been a good moment to introduce something
like PROT_DHCP sockets. Maybe it is not too late... ;)
> So I want to see less fundamental stuff about interface configuration
> and address assignment in userland, not more.
I agree and don't think the kernel is that bad at it. If bugs pop up I'll
certainly will have a look at them. ;-)
Off-topic:
Down the road there may be the deprecation of EUI-64 (ipv6 address generation
based on the MAC address). It will be replaced by an approach to have an
install-time secret key which will used like this:
RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, stable_secret_key)
(network_id is something like the essid of the wlan etc.; F some hashing
function)
It should be used for link-local addresses as well as SLAAC:
<http://tools.ietf.org/html/draft-ietf-6man-stable-privacy-addresses-14>
So we should design the interfaces generic enough to cope with that. This
will have some implications: we need link-local address early in boot-up
because of the bind() of early boot-up daemons or even network booting,
mld etc.
My current idea to handle this, is that a kernel boot parameter is
provided to stop the generation of link-local addresses and that we kick
off address configuration from user-space at early as the secret key is
provided, which may well be from the initramfs. I'll send a RFC as soon
as I can find some time to clean it up and have it actually tested.
Greetings,
Hannes
^ permalink raw reply
* Re: [PATCH 03/16] wl1251: add sysfs interface for bluetooth coexistence mode configuration
From: Ben Hutchings @ 2013-10-28 23:39 UTC (permalink / raw)
To: Pali Rohár
Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
linux-wireless, netdev, linux-kernel, freemangordon,
aaro.koskinen, pavel, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-4-git-send-email-pali.rohar@gmail.com>
On Sat, 2013-10-26 at 22:34 +0200, Pali Rohár wrote:
> From: David Gnedt <david.gnedt@davizone.at>
>
> Port the bt_coex_mode sysfs interface from wl1251 driver version included
> in the Maemo Fremantle kernel to allow bt-coexistence mode configuration.
> This enables userspace applications to set one of the modes
> WL1251_BT_COEX_OFF, WL1251_BT_COEX_ENABLE and WL1251_BT_COEX_MONOAUDIO.
> The default mode is WL1251_BT_COEX_OFF.
> It should be noted that this driver always enabled bt-coexistence before
> and enabled bt-coexistence directly affects the receiving performance,
> rendering it unusable in some low-signal situations. Especially monitor
> mode is affected very badly with bt-coexistence enabled.
[...]
This should be implemented consistently with other drivers:
drivers/net/wireless/ath/ath9k/htc_drv_init.c:module_param_named(btcoex_enable, ath9k_htc_btcoex_enable, int, 0444);
drivers/net/wireless/ath/ath9k/init.c:module_param_named(btcoex_enable, ath9k_btcoex_enable, int, 0444);
drivers/net/wireless/b43/main.c:module_param_named(btcoex, modparam_btcoex, int, 0444);
drivers/net/wireless/ipw2x00/ipw2200.c:module_param(bt_coexist, int, 0444);
drivers/net/wireless/iwlegacy/common.c:module_param(bt_coex_active, bool, S_IRUGO);
drivers/net/wireless/iwlwifi/iwl-drv.c:module_param_named(bt_coex_active, iwlwifi_mod_params.bt_coex_active,
drivers/net/wireless/ti/wlcore/sysfs.c:static DEVICE_ATTR(bt_coex_state, S_IRUGO | S_IWUSR,
Oh, hmm, I see a problem here.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox