Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 0/5] SCTP fix/updates
From: Neil Horman @ 2013-10-30 11:29 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, linux-sctp
In-Reply-To: <1383130252-1515-1-git-send-email-dborkman@redhat.com>

On Wed, Oct 30, 2013 at 11:50:47AM +0100, Daniel Borkmann wrote:
> Please see patch 5 for the main description/motivation, the rest just
> brings in the needed functionality for that. Although this is actually
> a fix, I've based it against net-next as some additional work for
> fixing it was needed.
> 
> Thanks!
> 
> Daniel Borkmann (5):
>   lib: crc32: clean up spacing in test cases
>   lib: crc32: add functionality to combine two crc32{,c}s in GF(2)
>   lib: crc32: add test cases for crc32{,c}_combine routines
>   net: skb_checksum: allow custom update/combine for walking skb
>   net: sctp: fix and consolidate SCTP checksumming code
> 
>  include/linux/crc32.h       |  40 ++++
>  include/linux/skbuff.h      |  13 +-
>  include/net/checksum.h      |   6 +
>  include/net/sctp/checksum.h |  56 ++----
>  lib/crc32.c                 | 453 +++++++++++++++++++++++++-------------------
>  net/core/skbuff.c           |  31 ++-
>  net/sctp/output.c           |   9 +-
>  7 files changed, 350 insertions(+), 258 deletions(-)
> 
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply

* Re: [PATCH 06/16] wl1251: split RX and TX data path initialisation
From: Pavel Machek @ 2013-10-30 11:31 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-7-git-send-email-pali.rohar@gmail.com>

Hi!

> Split up data path initialisation into RX and TX data path initialisation
> functions. This change is required for channel switching in monitor mode.
> 
> Signed-off-by: David Gnedt <david.gnedt@davizone.at>
> ---
>  drivers/net/wireless/ti/wl1251/cmd.c  |   37 ++++++++++++++++++++++++++-------
>  drivers/net/wireless/ti/wl1251/cmd.h  |    3 ++-
>  drivers/net/wireless/ti/wl1251/init.c |    9 ++++++--
>  3 files changed, 39 insertions(+), 10 deletions(-)
> 
> @@ -238,6 +235,32 @@ int wl1251_cmd_data_path(struct wl1251 *wl, u8 channel, bool enable)
>  	wl1251_debug(DEBUG_BOOT, "rx %s cmd channel %d",
>  		     enable ? "start" : "stop", channel);
>  
> +out:
> +	kfree(cmd);
> +	return ret;
> +}
> +
> +int wl1251_cmd_data_path_tx(struct wl1251 *wl, u8 channel, bool enable)
> +{
> +	struct cmd_enabledisable_path *cmd;
> +	int ret;
> +	u16 cmd_tx;
> +
> +	wl1251_debug(DEBUG_CMD, "cmd data path");
> +
> +	cmd = kzalloc(sizeof(*cmd), GFP_KERNEL);
> +	if (!cmd) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}

Again, doing jump just to kfree(NULL)... is probably not worth
it.

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 07/16] wl1251: configure hardware en-/decryption for monitor mode
From: Pavel Machek @ 2013-10-30 11:35 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-8-git-send-email-pali.rohar@gmail.com>

On Sat 2013-10-26 22:34:06, Pali Rohár wrote:
> From: David Gnedt <david.gnedt@davizone.at>
> 
> Disable hardware encryption (DF_ENCRYPTION_DISABLE) and decryption
> (DF_SNIFF_MODE_ENABLE) via wl1251_acx_feature_cfg while monitor interface is
> present.
> 
> Signed-off-by: David Gnedt <david.gnedt@davizone.at>

You need to sign off the patch, otherwise looks ok.

Reviewed-by: Pavel Machek <pavel@ucw.cz>

Thanks,
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 08/16] wl1251: implement multicast address filtering
From: Pavel Machek @ 2013-10-30 11:41 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-9-git-send-email-pali.rohar@gmail.com>

Hi!

> Port multicast address filtering from wl1271 driver.
> It sets up the hardware multicast address filter in configure_filter() with
> addresses supplied through prepare_multicast().

> +static u64 wl1251_op_prepare_multicast(struct ieee80211_hw *hw,
> +				       struct netdev_hw_addr_list *mc_list)
> +{
> +	struct wl1251_filter_params *fp;
> +	struct netdev_hw_addr *ha;
> +	struct wl1251 *wl = hw->priv;
> +
> +	if (unlikely(wl->state == WL1251_STATE_OFF))
> +		return 0;
> +
> +	fp = kzalloc(sizeof(*fp), GFP_ATOMIC);
> +	if (!fp) {
> +		wl1251_error("Out of memory setting filters.");
> +		return 0;
> +	}

So if there's not enough memory, we return 0.

> +	/* update multicast filtering parameters */
> +	fp->mc_list_length = 0;
> +	if (netdev_hw_addr_list_count(mc_list) > ACX_MC_ADDRESS_GROUP_MAX) {
> +		fp->enabled = false;
> +	} else {
> +		fp->enabled = true;
> +		netdev_hw_addr_list_for_each(ha, mc_list) {
> +			memcpy(fp->mc_list[fp->mc_list_length],
> +					ha->addr, ETH_ALEN);
> +			fp->mc_list_length++;
> +		}
> +	}
> +
> +	return (u64)(unsigned long)fp;
> +}

Hiding pointers into u64 is not exactly nice, but I guess that's how
interface is designed? :-(.

> @@ -737,6 +779,15 @@ static void wl1251_op_configure_filter(struct ieee80211_hw *hw,
>  	if (ret < 0)
>  		goto out;
>  
> +	if (*total & FIF_ALLMULTI || *total & FIF_PROMISC_IN_BSS)
> +		ret = wl1251_acx_group_address_tbl(wl, false, NULL, 0);
> +	else if (fp)
> +		ret = wl1251_acx_group_address_tbl(wl, fp->enabled,
> +						   fp->mc_list,
> +						     	fp->mc_list_length);

Is it correct not to call anything in !fp case (for example because we
were out of memory?)

> +	if (ret < 0)
> +		goto out;
> +
>  	/* send filters to firmware */
>  	wl1251_acx_rx_config(wl, wl->rx_config, wl->rx_filter);
>  
> @@ -744,6 +795,7 @@ static void wl1251_op_configure_filter(struct ieee80211_hw *hw,
>  
>  out:
>  	mutex_unlock(&wl->mutex);
> +	kfree(fp);
>  }

Umm, this is interesting. Who frees the memory in the success case?

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Jamal Hadi Salim @ 2013-10-30 11:45 UTC (permalink / raw)
  To: Felix Fietkau, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger
In-Reply-To: <526F8118.50909@openwrt.org>

On 10/29/13 05:34, Felix Fietkau wrote:
> On 2013-10-28 23:53, Jamal Hadi Salim wrote:

>
> These are simple switches, why would they respond to ARP?
> I suspect that you're attributing too much functionality to the switch
> itself. Think of it as a device similar to the cheap unmanaged ones you
> can buy in a shop and hook up to your machine via Ethernet.
> Add to that some very limited VLAN grouping functionality, and you're
> pretty close to the limits of what these switches can do.
> They don't do ARP, IP or other things. They learn about MAC addresses
> from incoming packets to build their forwarding path.
> The CPU port in this case is whatever port on the switch that you plug
> the cable of your machine into :)

Ok, got it - the only use for cpu for these things is to retrieve things
like stats, link state, etc; can you even read the fdb?


> The FDB related abstraction that you're describing will not work with
> the hardware that I'm talking about. Let's leave that one out of this
> discussion.

sigh - ok. But you gotta help me understand why.

> As for per-port netdevs: Yes, you could pull stats.
> No, flow control messages would not make it through.
> No idea how it would provide a *consistent* API.
> Either way, if adding netdevs just for stats and link state, that could
> be easily added on top of swconfig (or whatever name we pick for it)
> later. I just don't think it's worth it at this point.
>

Ok, progress, lets leave this one out.

>> Can we call that "L3" instead of software bridge?
> L3? Why?

We have two L2 domains. You want to connect them - you need a higher
layer; Layer 3 seems to be the simple one (i.e typically people would
use ip to link two layer 2 broadcast domains).


> I think that's way more confusing to users than presenting a consistent
> model that properly reflects what you can do with the hardware.
>

I think discovery from a control view is always a win.

> But I sense a pattern here. I've long had my beef with quite a few Linux
> network related APIs for being inconsistent, having no decent error
> reporting when you're trying to configure things (errno doesn't count,
> it's just too ambiguous), and just making it hard to figure out the
> capabilities. Of course, none of this can be easily fixed due to ABI
> stability constraints.
> I do NOT wish to follow that pattern!
>

You are preaching to the choir. The whole errno 8 bit thing is a mess;
I used to printk things in the kernel to indicate granularity of
which EINVAL i was returning (but i was shot down); one suggestion is
to also include a string description on the error. But that is a side
issue.
So, nod. Discovery of capabilities is better - you still have to defer
to error codes when all else fails.


> I'm not going to try to enumerate all the case; I have other projects
> that I need to work on. :)
>

I understand. I am busy as well, just saying if we need to reach an
agreement to either agree or disagree we need to capture the esoterics
of the different cases; as you can see i tried to enumerate some in
my previous email. In my case this would be useful to see, using current
mechanisms, that it can or cant be done or can be done with mods etc.

> Only a *tiny* part of the software bridge configuration model can be
> emulated, the rest does not fit and has to be handled through extensions
> or different APIs anyway. That's why I am convinced that it's a really
> bad model to try to make these switches fit into it.
>
> You gain a tiny advantage with writing scripts, but at the same time,
> the code gets more complex, the configuration interface gets more
> confusing, there are more nasty corner cases to take care of.
> Why do you insist on making so many things worse just for one tiny
> advantage? Where's the pragmatic cost/benefit tradeoff?
>

There is nothing wrong with making extensions if they make sense. My
problem so far in this discussion is i havent figured which will be bad
extensions you bring up. My approach is to list things and
then point out which one will require some witchcraft on top of
current interfaces. I am afraid I am still missing that part. Maybe
I have to go back and study your patch some more.

> Right, with most of the switches that we support, almost none of these
> things work in a way that can be integrated with the network stack.
>

Good to know. These are useful components for slightly higher end
switches.


> I'm not even sure what you mean when you say 'cpu port cannot be
> assumed'.

Meant for other devices which are dumb - lets move past this point.

> On pretty much all devices that we work with, one of the ports
> connects to a NIC in the CPU. It's just that the switch cannot be
> assumed to have special treatment for that CPU port. As far as it is
> concerned, it is just another port like the others.
>

Aha. I think i see a small terminology cross-talk. You refer to things
as NICs when i use the term netdev. So now i understand better what you
mean by rx handler (I intepreted earlier to mean something at the tap
level). Ok, so Felix, for the case where we have switches with cpu ports
that can tag incoming packets with ingress port ids - can we say the
NIC rx handler is reasonable to be used as a demux point for the
software version of the ports? I am not talking about the corner
cases.

>> - ive never seen table id, but i think this is another one; in which
>> case the number of table ids becomes something one needs to discover..
> Yes, and this is something that doesn't even map directly to something
> in the software bridge world.
>

It does - There is a single table per bridge on the software bridge
world. You need multiple bridges, one per id.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH 09/16] wl1251: disable power saving in monitor mode
From: Pavel Machek @ 2013-10-30 11:46 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-10-git-send-email-pali.rohar@gmail.com>

On Sat 2013-10-26 22:34:08, Pali Rohár wrote:
> From: David Gnedt <david.gnedt@davizone.at>
> 
> Force power saving off while monitor interface is present.
> 
> Signed-off-by: David Gnedt <david.gnedt@davizone.at>
> ---
>  drivers/net/wireless/ti/wl1251/main.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/wireless/ti/wl1251/main.c b/drivers/net/wireless/ti/wl1251/main.c
> index 727f2ee..62cb374 100644
> --- a/drivers/net/wireless/ti/wl1251/main.c
> +++ b/drivers/net/wireless/ti/wl1251/main.c
> @@ -617,7 +617,8 @@ static int wl1251_op_config(struct ieee80211_hw *hw, u32 changed)
>  			goto out_sleep;
>  	}
>  
> -	if (conf->flags & IEEE80211_CONF_PS && !wl->psm_requested) {
> +	if (conf->flags & IEEE80211_CONF_PS && !wl->psm_requested &&
> +	    !wl->monitor_present) {
>  		wl1251_debug(DEBUG_PSM, "psm enabled");
>  
>  		wl->psm_requested = true;
> @@ -633,8 +634,8 @@ static int wl1251_op_config(struct ieee80211_hw *hw, u32 changed)
>  		ret = wl1251_ps_set_mode(wl, STATION_POWER_SAVE_MODE);
>  		if (ret < 0)
>  			goto out_sleep;
> -	} else if (!(conf->flags & IEEE80211_CONF_PS) &&
> -		   wl->psm_requested) {
> +	} else if ((!(conf->flags & IEEE80211_CONF_PS) || wl->monitor_present)
> +		   && wl->psm_requested) {
>  		wl1251_debug(DEBUG_PSM, "psm disabled");
>  

These boolean expressions make my head spin. Introduce helper function 

can_do_pm()... return (conf->flags & IEEE80211_CONF_PS) &&
!wl->monitor_present

?

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 10/16] wl1251: fix channel switching in monitor mode
From: Pavel Machek @ 2013-10-30 11:47 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-11-git-send-email-pali.rohar@gmail.com>

On Sat 2013-10-26 22:34:09, Pali Rohár wrote:
> From: David Gnedt <david.gnedt@davizone.at>
> 
> Use the ENABLE_RX command for channel switching when no interface is present
> (monitor mode only).
> The advantage of ENABLE_RX is that it leaves the tx data path disabled in
> firmware, whereas the usual JOIN command seems to transmit some frames at
> firmware level.
> 
> Signed-off-by: David Gnedt <david.gnedt@davizone.at>

Missing signoff, otherwise

Reviewed-by: Pavel Machek <pavel@ucw.cz>
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Jamal Hadi Salim @ 2013-10-30 11:50 UTC (permalink / raw)
  To: mbizon
  Cc: Felix Fietkau, Florian Fainelli, Neil Horman, John Fastabend,
	netdev, David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger
In-Reply-To: <1383088365.16822.22.camel@sakura.staff.proxad.net>

On 10/29/13 19:12, Maxime Bizon wrote:

>
>  From a user POV, when you see a netdevice, you expect to be able to
> receive or send packets from/to it. The ability to read stats/link is
> only a secondary feature.
>

The important part is all the APIs stay consistent. I can use
same netlink calls. ifconfig works.
iproute2 works. People have written books on this stuff - we dont
have MCSE(Must Call Software Engineer) certification, but this is
as close as it gets. i.e the knowledge has been commoditized, even
my kid knows how to use these tools.

If i can get stats by doing ifconfig - that should provide illusion that
the netdevice is sending/receiving packets.

> Wireless subsystem moved away from using dummy/additional netdevices
> because it caused confusion.
>

This is a good arguement.
Can we hear a little more about this?

> multiqueue devices forced us to separate struct netdevice and struct
> netdev_queue, maybe it's time for more surgery :)
>

I think that would be a reasonable thing to do if it becomes necessary.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH 11/16] wl1251: enable tx path in monitor mode if necessary for packet injection
From: Pavel Machek @ 2013-10-30 11:51 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-12-git-send-email-pali.rohar@gmail.com>

Hi!
> 
> If necessary enable the tx path in monitor mode for packet injection using
> the JOIN command with BSS_TYPE_STA_BSS and zero BSSID.
> 
> Signed-off-by: David Gnedt <david.gnedt@davizone.at>
> ---
>  drivers/net/wireless/ti/wl1251/main.c   |    5 +++++
>  drivers/net/wireless/ti/wl1251/tx.c     |   17 +++++++++++++++++
>  drivers/net/wireless/ti/wl1251/wl1251.h |    1 +
>  3 files changed, 23 insertions(+)
> 
> diff --git a/drivers/net/wireless/ti/wl1251/main.c b/drivers/net/wireless/ti/wl1251/main.c
> diff --git a/drivers/net/wireless/ti/wl1251/tx.c b/drivers/net/wireless/ti/wl1251/tx.c
> index 3cc82fd..1de4ccb 100644
> --- a/drivers/net/wireless/ti/wl1251/tx.c
> +++ b/drivers/net/wireless/ti/wl1251/tx.c
> @@ -28,6 +28,7 @@
>  #include "tx.h"
>  #include "ps.h"
>  #include "io.h"
> +#include "event.h"
>  
>  static bool wl1251_tx_double_buffer_busy(struct wl1251 *wl, u32 data_out_count)
>  {
> @@ -298,6 +299,22 @@ static int wl1251_tx_frame(struct wl1251 *wl, struct sk_buff *skb)
>  		}
>  	}
>  
> +	/* Enable tx path in monitor mode for packet injection */
> +	if ((wl->vif == NULL) && !wl->joined) {
> +		ret = wl1251_cmd_join(wl, BSS_TYPE_STA_BSS, wl->channel,
> +				      wl->beacon_int, wl->dtim_period);
> +		if (ret < 0)
> +			wl1251_warning("join failed");
> +		else {
> +			ret = wl1251_event_wait(wl, JOIN_EVENT_COMPLETE_ID,
> +						100);
> +			if (ret < 0)
> +				wl1251_warning("join timeout");
> +			else
> +				wl->joined = true;
> +		}
> +	}

Create function enable_tx_for_packet_injection() and then just return
so that you don't have to nest ifs?

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 12/16] wl1251: disable retry and ACK policy for injected packets
From: Pavel Machek @ 2013-10-30 11:52 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-13-git-send-email-pali.rohar@gmail.com>

On Sat 2013-10-26 22:34:11, Pali Rohár wrote:
> From: David Gnedt <david.gnedt@davizone.at>
> 
> Set the retry limit to 0 and disable the ACK policy for injected packets.
> 
> Signed-off-by: David Gnedt <david.gnedt@davizone.at>

Missing sign-off, otherwise ok:

Reviewed-by: Pavel Machek <pavel@ucw.cz>
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 13/16] wl1251: enforce changed hw encryption support on monitor state change
From: Pavel Machek @ 2013-10-30 11:55 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen, David Gnedt
In-Reply-To: <1382819655-30430-14-git-send-email-pali.rohar@gmail.com>

On Sat 2013-10-26 22:34:12, Pali Rohár wrote:
> From: David Gnedt <david.gnedt@davizone.at>
> 
> The firmware doesn't support per packet encryption selection, so disable hw
> encryption support completly while a monitor interface is present to support

"completely".

> injection of packets (which shouldn't get encrypted by hw).
> To enforce the changed hw encryption support force a disassociation on
> non-monitor interfaces.
> For disassociation a workaround using hw connection monitor is employed,
> which temporary enables hw connection manager flag.
> 
> Signed-off-by: David Gnedt <david.gnedt@davizone.at>

Missing sign-off.

> index 174f403..f054741 100644
> --- a/drivers/net/wireless/ti/wl1251/main.c
> +++ b/drivers/net/wireless/ti/wl1251/main.c
> @@ -685,6 +685,16 @@ static int wl1251_op_config(struct ieee80211_hw *hw, u32 changed)
>  		wl->power_level = conf->power_level;
>  	}
>  
> +	/*
> +	 * Tell stack that connection is lost because hw encryption isn't
> +	 * supported in monitor mode.
> +	 * XXX This requires temporary enabling the hw connection monitor flag
> +	 */

"of the" and I guess you can remove the XXX? This way it looks like
workaround is not complete.

Thanks,
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 14/16] wl1251: add nvs file name to module firmware list
From: Pavel Machek @ 2013-10-30 11:55 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Luciano Coelho, John W. Linville, Johannes Berg, David S. Miller,
	linux-wireless, netdev, linux-kernel, freemangordon,
	aaro.koskinen, sre, joni.lapilainen
In-Reply-To: <1382819655-30430-15-git-send-email-pali.rohar@gmail.com>

On Sat 2013-10-26 22:34:13, Pali Rohár wrote:
> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>

Reviewed-by: Pavel Machek <pavel@ucw.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Felix Fietkau @ 2013-10-30 11:58 UTC (permalink / raw)
  To: Jamal Hadi Salim, mbizon
  Cc: Florian Fainelli, Neil Horman, John Fastabend, netdev,
	David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger
In-Reply-To: <5270F282.6000708@mojatatu.com>

On 2013-10-30 12:50, Jamal Hadi Salim wrote:
> On 10/29/13 19:12, Maxime Bizon wrote:
> 
>>
>>  From a user POV, when you see a netdevice, you expect to be able to
>> receive or send packets from/to it. The ability to read stats/link is
>> only a secondary feature.
>>
> 
> The important part is all the APIs stay consistent. I can use
> same netlink calls. ifconfig works.
> iproute2 works. People have written books on this stuff - we dont
> have MCSE(Must Call Software Engineer) certification, but this is
> as close as it gets. i.e the knowledge has been commoditized, even
> my kid knows how to use these tools.
> 
> If i can get stats by doing ifconfig - that should provide illusion that
> the netdevice is sending/receiving packets.
Pretty much all of the above have serious limitations when you're not
actually able to run the data path through the per-port netdevs.
You can't assign IP addresses to them. The network stack will probably
even attempt to assign IPv6 link-local addresses to these things,
causing even more confusion.
You can't add them to normal software bridges like other devices.
You can't use bonding. I could probably go on for a while.
There's a huge list of things that you simply cannot do with these
interfaces, and without knowing the details of the implementation, users
will be left clueless as to why that is.
I'd say that's a very serious violation of the principle of least surprise.
And knowing what the typical OpenWrt users do with their devices, I can
already forsee the bogus bug reports trickling in, if this is to be
implemented.

- Felix

^ permalink raw reply

* RE: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: David Laight @ 2013-10-30 12:18 UTC (permalink / raw)
  To: Neil Horman, Doug Ledford; +Cc: Ingo Molnar, Eric Dumazet, linux-kernel, netdev
In-Reply-To: <20131030110214.GA10220@localhost.localdomain>

> /me wonders if rearranging the instructions into this order:
> adcq 0*8(src), res1
> adcq 1*8(src), res2
> adcq 2*8(src), res1

Those have to be sequenced.

Using a 64bit lea to add 32bit quantities should avoid the
dependencies on the flags register.
However you'd need to get 3 of those active to beat a 64bit adc.

	David

^ permalink raw reply

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
From: Felix Fietkau @ 2013-10-30 12:53 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger
In-Reply-To: <5270F161.80603@mojatatu.com>

On 2013-10-30 12:45, Jamal Hadi Salim wrote:
> On 10/29/13 05:34, Felix Fietkau wrote:
>> On 2013-10-28 23:53, Jamal Hadi Salim wrote:
> 
>>
>> These are simple switches, why would they respond to ARP?
>> I suspect that you're attributing too much functionality to the switch
>> itself. Think of it as a device similar to the cheap unmanaged ones you
>> can buy in a shop and hook up to your machine via Ethernet.
>> Add to that some very limited VLAN grouping functionality, and you're
>> pretty close to the limits of what these switches can do.
>> They don't do ARP, IP or other things. They learn about MAC addresses
>> from incoming packets to build their forwarding path.
>> The CPU port in this case is whatever port on the switch that you plug
>> the cable of your machine into :)
> 
> Ok, got it - the only use for cpu for these things is to retrieve things
> like stats, link state, etc; can you even read the fdb?
Where supported, all you can typically read is a list of which MAC
address was discovered behind which port - if you're lucky. You usually
won't find VLAN information attached to that.
Often it simply isn't supported at all.

>> The FDB related abstraction that you're describing will not work with
>> the hardware that I'm talking about. Let's leave that one out of this
>> discussion.
> 
> sigh - ok. But you gotta help me understand why.
The hardware implementation of MAC address handling isn't even
consistent across chips from different vendors. Often you don't even get
things like the VLAN ID. Sometimes there's a global forwarding table,
sometimes you can have multiple tables and assign them to VLANs.

>>> Can we call that "L3" instead of software bridge?
>> L3? Why?
> 
> We have two L2 domains. You want to connect them - you need a higher
> layer; Layer 3 seems to be the simple one (i.e typically people would
> use ip to link two layer 2 broadcast domains).
If you connect two L2 domains through a bridge, I still consider that L2
- it's still on the same layer, just goes through more hops.

>> I think that's way more confusing to users than presenting a consistent
>> model that properly reflects what you can do with the hardware.
> I think discovery from a control view is always a win.
Yes, and swconfig handles the discovery part fairly well.

>> I'm not going to try to enumerate all the case; I have other projects
>> that I need to work on. :)
> 
> I understand. I am busy as well, just saying if we need to reach an
> agreement to either agree or disagree we need to capture the esoterics
> of the different cases; as you can see i tried to enumerate some in
> my previous email. In my case this would be useful to see, using current
> mechanisms, that it can or cant be done or can be done with mods etc.
At this point, I'm not sure if we will be able to reach an agreement. I
think I've shown over and over again that what you're proposing comes
with huge costs in terms of complexity and bloat, as demonstrated by the
fact that it adds so many corner cases that would have to be dealt with,
including many for which we haven't even the slightest idea of a good
solution.
Now, to make this a viable option, the benefits would have to be big and
significant enough to offset these costs.
The only real benefit you've pointed out so far is to be able to reuse
existing tools/APIs (but only with modifications, not as-is). I think
that's fairly small, when put in perspective with the hard problems that
this approach creates, both for users (hidden traps and surprises) and
for developers (implementation difficulties and incompatible abstractions).

>> Only a *tiny* part of the software bridge configuration model can be
>> emulated, the rest does not fit and has to be handled through extensions
>> or different APIs anyway. That's why I am convinced that it's a really
>> bad model to try to make these switches fit into it.
>>
>> You gain a tiny advantage with writing scripts, but at the same time,
>> the code gets more complex, the configuration interface gets more
>> confusing, there are more nasty corner cases to take care of.
>> Why do you insist on making so many things worse just for one tiny
>> advantage? Where's the pragmatic cost/benefit tradeoff?
>>
> 
> There is nothing wrong with making extensions if they make sense.
Yes, but if the basic abstraction doesn't make sense for the use case,
and it leads to too many corner cases, there's everything wrong with
trying to work around that through extensions.

> My problem so far in this discussion is i havent figured which will be bad
> extensions you bring up. My approach is to list things and
> then point out which one will require some witchcraft on top of
> current interfaces. I am afraid I am still missing that part. Maybe
> I have to go back and study your patch some more.
Sure, go ahead.

>> On pretty much all devices that we work with, one of the ports
>> connects to a NIC in the CPU. It's just that the switch cannot be
>> assumed to have special treatment for that CPU port. As far as it is
>> concerned, it is just another port like the others.
> 
> Aha. I think i see a small terminology cross-talk. You refer to things
> as NICs when i use the term netdev. So now i understand better what you
> mean by rx handler (I intepreted earlier to mean something at the tap
> level). 
I only started using the term NIC to emphasize that it's not just a
netdev of the switch - it's a real Ethernet MAC (usually in the SoC),
with a separate driver that knows nothing about the switch.

> Ok, so Felix, for the case where we have switches with cpu ports
> that can tag incoming packets with ingress port ids - can we say the
> NIC rx handler is reasonable to be used as a demux point for the
> software version of the ports? I am not talking about the corner
> cases.
Yes, but when looking at the big picture, the switch being able to tag
incoming packets with the ingress port is a corner case!
Most switches that we work with aren't actually able to do that!
I want to have a decent baseline implementation that does not assume
this port tagging capability.

>>> - ive never seen table id, but i think this is another one; in which
>>> case the number of table ids becomes something one needs to discover..
>> Yes, and this is something that doesn't even map directly to something
>> in the software bridge world.
> It does - There is a single table per bridge on the software bridge
> world. You need multiple bridges, one per id.
Depends on which software bridge.
If I have two normal netdevs, eth0 and eth1, I can create eth0.4 and
bridge it to eth1.5. That's just one bridge.
I can't easily emulate that with fake per-port netdevs and a typical
switch supported by swconfig.
With just swconfig (no fake netdevs) switches that support these table
ids, I would need to have two VLANs in the switch (both connected to the
CPU port, each one getting a separate table id), and then one software
bridge between eth0.4 and eth0.5 (assuming eth0 connects to the switch).

- Felix

^ permalink raw reply

* Re: 3.12-rc7 regression - network panic from ipv6
From: Steffen Klassert @ 2013-10-30 13:09 UTC (permalink / raw)
  To: David Miller; +Cc: mroos, hannes, linux-kernel, netdev
In-Reply-To: <20131029.174258.1677867676998240250.davem@davemloft.net>

On Tue, Oct 29, 2013 at 05:42:58PM -0400, David Miller wrote:
> From: Meelis Roos <mroos@linux.ee>
> Date: Tue, 29 Oct 2013 23:38:28 +0200 (EET)
> 
> >> > Some bad news - in a system where 3.12-rc6 and earlier worked fine, 
> >> > 3.12-rc7 panics or hangs repeatedly with network traffic (torrent being 
> >> > good test). First there is BUG from ipv6 code, followed by panic.
> >> 
> >> Could you do a bisect on this? There seems to be one commit for this
> >> particular function _decode_session6:
> >> 
> >> commit bafd4bd4dcfa13145db7f951251eef3e10f8c278
> >> Author: Steffen Klassert <steffen.klassert@secunet.com>
> >> Date:   Mon Sep 9 10:38:38 2013 +0200
> >> 
> >>     xfrm: Decode sessions with output interface.
> >>     
> >>     The output interface matching does not work on forward
> >>     policy lookups, the output interface of the flowi is
> >>     always 0. Fix this by setting the output interface when
> >>     we decode the session.
> >> 
> >> Maybe try to just revert this change locally and try again?
> > 
> > Yes, just reverting this patch on top of rc7 gets rid of the problem for 
> > me.
> 
> Steffen please fix this or I'll have to revert.

I was a bit surprised that the skb has no dst_entry attached.
But in the reported case, ip6_frag_queue() removes the dst_entry
explicitly on all but the last received fragments. And unlike the
ipv4 case, it does not restore it before ip6_expire_frag_queue()
calls icmpv6_send().

I'm currently testing the patch below. Meelis, could you please
check if this patch fixes your problems?

Unfortunately I'm off without network access for the whole day
tomorrow. So in case the patch fixes the problems, I'd integrate it
into the final ipsec pull request for this release cycle on friday.


Subject: [PATCH] xfrm: Fix null pointer dereference when decoding sessions

On some codepaths the skb does not have a dst entry
when xfrm_decode_session() is called. So check for
a valid skb_dst() before dereferencing the device
interface index. We use 0 as the device index if
there is no valid skb_dst(), or at reverse decoding
we use skb_iif as device interface index.

Bug was introduced with git commit bafd4bd4dc
("xfrm: Decode sessions with output interface.").

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/xfrm4_policy.c |    6 +++++-
 net/ipv6/xfrm6_policy.c |    6 +++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4764ee4..e1a6393 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -104,10 +104,14 @@ _decode_session4(struct sk_buff *skb, struct flowi *fl, int reverse)
 	const struct iphdr *iph = ip_hdr(skb);
 	u8 *xprth = skb_network_header(skb) + iph->ihl * 4;
 	struct flowi4 *fl4 = &fl->u.ip4;
+	int oif = 0;
+
+	if (skb_dst(skb))
+		oif = skb_dst(skb)->dev->ifindex;
 
 	memset(fl4, 0, sizeof(struct flowi4));
 	fl4->flowi4_mark = skb->mark;
-	fl4->flowi4_oif = skb_dst(skb)->dev->ifindex;
+	fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
 
 	if (!ip_is_fragment(iph)) {
 		switch (iph->protocol) {
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index dd503a3..5f8e128 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -135,10 +135,14 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 	struct ipv6_opt_hdr *exthdr;
 	const unsigned char *nh = skb_network_header(skb);
 	u8 nexthdr = nh[IP6CB(skb)->nhoff];
+	int oif = 0;
+
+	if (skb_dst(skb))
+		oif = skb_dst(skb)->dev->ifindex;
 
 	memset(fl6, 0, sizeof(struct flowi6));
 	fl6->flowi6_mark = skb->mark;
-	fl6->flowi6_oif = skb_dst(skb)->dev->ifindex;
+	fl6->flowi6_oif = reverse ? skb->skb_iif : oif;
 
 	fl6->daddr = reverse ? hdr->saddr : hdr->daddr;
 	fl6->saddr = reverse ? hdr->daddr : hdr->saddr;
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Doug Ledford @ 2013-10-30 13:22 UTC (permalink / raw)
  To: David Laight, Neil Horman; +Cc: Ingo Molnar, Eric Dumazet, linux-kernel, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B73C5@saturn3.aculab.com>

On 10/30/2013 08:18 AM, David Laight wrote:
>> /me wonders if rearranging the instructions into this order:
>> adcq 0*8(src), res1
>> adcq 1*8(src), res2
>> adcq 2*8(src), res1
>
> Those have to be sequenced.
>
> Using a 64bit lea to add 32bit quantities should avoid the
> dependencies on the flags register.
> However you'd need to get 3 of those active to beat a 64bit adc.
>
> 	David
>
>
>

Already done (well, something similar to what you mention above anyway), 
doesn't help (although doesn't hurt either, even though it doubles the 
number of adds needed to complete the same work).  This is the code I 
tested:

#define ADDL_64                                         \
         asm("xorq  %%r8,%%r8\n\t"                       \
             "xorq  %%r9,%%r9\n\t"                       \
             "xorq  %%r10,%%r10\n\t"                     \
             "xorq  %%r11,%%r11\n\t"                     \
             "movl  0*4(%[src]),%%r8d\n\t"               \
             "movl  1*4(%[src]),%%r9d\n\t"               \
             "movl  2*4(%[src]),%%r10d\n\t"              \
             "movl  3*4(%[src]),%%r11d\n\t"              \
             "addq  %%r8,%[res1]\n\t"                    \
             "addq  %%r9,%[res2]\n\t"                    \
             "addq  %%r10,%[res3]\n\t"                   \
             "addq  %%r11,%[res4]\n\t"                   \
             "movl  4*4(%[src]),%%r8d\n\t"               \
             "movl  5*4(%[src]),%%r9d\n\t"               \
             "movl  6*4(%[src]),%%r10d\n\t"              \
             "movl  7*4(%[src]),%%r11d\n\t"              \
             "addq  %%r8,%[res1]\n\t"                    \
             "addq  %%r9,%[res2]\n\t"                    \
             "addq  %%r10,%[res3]\n\t"                   \
             "addq  %%r11,%[res4]\n\t"                   \
             "movl  8*4(%[src]),%%r8d\n\t"               \
             "movl  9*4(%[src]),%%r9d\n\t"               \
             "movl  10*4(%[src]),%%r10d\n\t"             \
             "movl  11*4(%[src]),%%r11d\n\t"             \
             "addq  %%r8,%[res1]\n\t"                    \
             "addq  %%r9,%[res2]\n\t"                    \
             "addq  %%r10,%[res3]\n\t"                   \
             "addq  %%r11,%[res4]\n\t"                   \
             "movl  12*4(%[src]),%%r8d\n\t"              \
             "movl  13*4(%[src]),%%r9d\n\t"              \
             "movl  14*4(%[src]),%%r10d\n\t"             \
             "movl  15*4(%[src]),%%r11d\n\t"             \
             "addq  %%r8,%[res1]\n\t"                    \
             "addq  %%r9,%[res2]\n\t"                    \
             "addq  %%r10,%[res3]\n\t"                   \
             "addq  %%r11,%[res4]"                       \
             : [res1] "=r" (result1),                    \
               [res2] "=r" (result2),                    \
               [res3] "=r" (result3),                    \
               [res4] "=r" (result4)                     \
             : [src] "r" (buff),                         \
               "[res1]" (result1), "[res2]" (result2),   \
               "[res3]" (result3), "[res4]" (result4)    \
             : "r8", "r9", "r10", "r11" )

^ permalink raw reply

* Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: Doug Ledford @ 2013-10-30 13:35 UTC (permalink / raw)
  To: Neil Horman; +Cc: Ingo Molnar, Eric Dumazet, linux-kernel, netdev, David Laight
In-Reply-To: <20131030110214.GA10220@localhost.localdomain>

On 10/30/2013 07:02 AM, Neil Horman wrote:

> That does makes sense, but it then begs the question, whats the advantage of
> having multiple alu's at all?

There's lots of ALU operations that don't operate on the flags or other 
entities that can be run in parallel.

> If they're just going to serialize on the
> updating of the condition register, there doesn't seem to be much advantage in
> having multiple alu's at all, especially if a common use case (parallelizing an
> operation on a large linear dataset) resulted in lower performance.
>
> /me wonders if rearranging the instructions into this order:
> adcq 0*8(src), res1
> adcq 1*8(src), res2
> adcq 2*8(src), res1
>
> would prevent pipeline stalls.  That would be interesting data, and (I think)
> support your theory, Doug.  I'll give that a try

Just to avoid spending too much time on various combinations, here are 
the methods I've tried:

Original code
2 chains doing interleaved memory accesses
2 chains doing serial memory accesses (as above)
4 chains doing serial memory accesses
4 chains using 32bit values in 64bit registers so you can always use add 
instead of adc and never need the carry flag

And I've done all of the above with simple prefetch and smart prefetch.

In all cases, the result is basically that the add method doesn't matter 
much in the grand scheme of things, but the prefetch does, and smart 
prefetch always beat simple prefetch.

My simple prefetch was to just go into the main while() loop for the 
csum operation and always prefetch 5*64 into the future.

My smart prefetch looks like this:

static inline void prefetch_line(unsigned long *cur_line,
                                  unsigned long *end_line,
                                  size_t size)
{
         size_t fetched = 0;

         while (*cur_line <= *end_line && fetched < size) {
                 prefetch((void *)*cur_line);
                 *cur_line += cache_line_size();
                 fetched += cache_line_size();
         }
}

static unsigned do_csum(const unsigned char *buff, unsigned len)
{
	...
         unsigned long cur_line = (unsigned long)buff & 
~(cache_line_size() - 1);
         unsigned long end_line = ((unsigned long)buff + len) & 
~(cache_line_size() - 1);

	...
         /* Don't bother to prefetch the first line, we'll end up 
stalling on
          * it anyway, but go ahead and start the prefetch on the next 3 */
         cur_line += cache_line_size();
         prefetch_line(&cur_line, &end_line, cache_line_size() * 3);
         odd = 1 & (unsigned long) buff;
         if (unlikely(odd)) {
                 result = *buff << 8;
	...
                 count >>= 1;            /* nr of 32-bit words.. */

                 /* prefetch line #4 ahead of main loop */
                 prefetch_line(&cur_line, &end_line, cache_line_size());

                 if (count) {
		...
                         while (count64) {
                                 /* we are now prefetching line #5 ahead of
                                  * where we are starting, and will stay 5
                                  * ahead throughout the loop, at least 
until
                                  * we get to the end line and then 
we'll stop
                                  * prefetching */
                                 prefetch_line(&cur_line, &end_line, 64);
                                 ADDL_64;
                                 buff += 64;
                                 count64--;
                         }

                         ADDL_64_FINISH;


I was going to tinker today and tomorrow with this function once I get a 
toolchain that will compile it (I reinstalled all my rhel6 hosts as f20 
and I'm hoping that does the trick, if not I need to do more work):

#define ADCXQ_64                                        \
         asm("xorq %[res1],%[res1]\n\t"                  \
             "adcxq 0*8(%[src]),%[res1]\n\t"             \
             "adoxq 1*8(%[src]),%[res2]\n\t"             \
             "adcxq 2*8(%[src]),%[res1]\n\t"             \
             "adoxq 3*8(%[src]),%[res2]\n\t"             \
             "adcxq 4*8(%[src]),%[res1]\n\t"             \
             "adoxq 5*8(%[src]),%[res2]\n\t"             \
             "adcxq 6*8(%[src]),%[res1]\n\t"             \
             "adoxq 7*8(%[src]),%[res2]\n\t"             \
             "adcxq %[zero],%[res1]\n\t"                 \
             "adoxq %[zero],%[res2]\n\t"                 \
             : [res1] "=r" (result1),                    \
               [res2] "=r" (result2)                     \
             : [src] "r" (buff), [zero] "r" (zero),      \
               "[res1]" (result1), "[res2]" (result2))

and then I also wanted to try using both xmm and ymm registers and doing 
64bit adds with 32bit numbers across multiple xmm/ymm registers as that 
should parallel nicely.  David, you mentioned you've tried this, how did 
your experiment turn out and what was your method?  I was planning on 
doing regular full size loads into one xmm/ymm register, then using 
pshufd/vshufd to move the data into two different registers, then 
summing into a fourth register, and possible running two of those 
pipelines in parallel.

^ permalink raw reply

* Re: [patch net-next RFC] netfilter: ip6_tables: use reasm skb for matching
From: Florian Westphal @ 2013-10-30 13:41 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber,
	mleitner
In-Reply-To: <1383130201-6198-1-git-send-email-jiri@resnulli.us>

Jiri Pirko <jiri@resnulli.us> wrote:
> Currently, when ipv6 fragment goes through the netfilter, match
> functions are called on them directly. This might cause match function
> to fail. So benefit from the fact that nf_defrag_ipv6 constructs
> reassembled skb for us and use this reassembled skb for matching.
> 
> This patch fixes for example following situation:
> On HOSTA do:
> ip6tables -I INPUT -p icmpv6 -j DROP
> ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
> 
> and on HOSTB you do:
> ping6 HOSTA -s2000    (MTU is 1500)
> 
> Incoming echo requests will be filtered out on HOSTA. This issue does
> not occur with smaller packets than MTU (where fragmentation does not happen).

[..]
> diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
> index 44400c2..5421beb0 100644
> --- a/net/ipv6/netfilter/ip6_tables.c
> +++ b/net/ipv6/netfilter/ip6_tables.c
> @@ -328,6 +328,7 @@ ip6t_do_table(struct sk_buff *skb,
>  	const struct xt_table_info *private;
>  	struct xt_action_param acpar;
>  	unsigned int addend;
> +	struct sk_buff *reasm = skb->nfct_reasm ? skb->nfct_reasm : skb;
>  
>  	/* Initialization */
>  	indev = in ? in->name : nulldevname;
> @@ -363,7 +364,7 @@ ip6t_do_table(struct sk_buff *skb,
>  
>  		IP_NF_ASSERT(e);
>  		acpar.thoff = 0;
> -		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
> +		if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6,
>  		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {

[..]

This is a bit backwards, I think.
- We gather frags
- Then we invoke ip6t_do_table for each individual fragment

So basically your patch is equivalent to
for_each_frag( )
  ip6t_do_table(reassembled_skb)

Which makes no sense to me - why traverse the ruleset n times with the same
packet?

^ permalink raw reply

* Re: [PATCH net-next] xen-netback: allocate xenvif arrays using vzalloc.
From: Eric Dumazet @ 2013-10-30 13:54 UTC (permalink / raw)
  To: Joby Poriyath
  Cc: wei.liu2, ian.campbell, netdev, xen-devel, david.vrabel,
	malcolm.crossley, andrew.bennieston
In-Reply-To: <20131030103958.GB3261@citrix.com>

On Wed, 2013-10-30 at 10:39 +0000, Joby Poriyath wrote:

> The net_device allocation rule {linux/Documentation/networking/netdevices.txt} states
> that net_device struct must be allocated using kmalloc.
> 
> Is this safe to do?

As long as the freeing path is aware of the possibility the memory is
either allocated with vmalloc() or kmalloc(), we are safe.

^ permalink raw reply

* Re: [PATCH net-next 1/5] lib: crc32: clean up spacing in test cases
From: Joe Perches @ 2013-10-30 13:56 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, linux-sctp, linux-kernel
In-Reply-To: <1383130252-1515-2-git-send-email-dborkman@redhat.com>

On Wed, 2013-10-30 at 11:50 +0100, Daniel Borkmann wrote:
> This is nothing more but a whitepace cleanup, as 80 chars is not a
> hard but soft limit, and otherwise makes the test cases arrary really
> look ugly. So fix it up.

That does look nicer.

Another option might be to take the repetitive
6 leading 0's out of column 2 and the repetitive
5 leading 0's out of column 3.

> diff --git a/lib/crc32.c b/lib/crc32.c
[]
> @@ -795,206 +795,106 @@ static struct crc_test {
>  	u32 crc32c_le;	/* expected crc32c_le result */
>  } test[] =
>  {
> -	{0x674bf11d, 0x00000038, 0x00000542, 0x0af6d466, 0xd8b6e4c1,
> -	 0xf6e93d6c},
> -	{0x35c672c6, 0x0000003a, 0x000001aa, 0xc6d3dfba, 0x28aaf3ad,
> -	 0x0fe92aca},
[etc...]
> +	{0x674bf11d, 0x00000038, 0x00000542, 0x0af6d466, 0xd8b6e4c1, 0xf6e93d6c},
> +	{0x35c672c6, 0x0000003a, 0x000001aa, 0xc6d3dfba, 0x28aaf3ad, 0x0fe92aca},

these could be

+	{0x674bf11d, 0x38, 0x542, 0x0af6d466, 0xd8b6e4c1, 0xf6e93d6c},
+	{0x35c672c6, 0x3a, 0x1aa, 0xc6d3dfba, 0x28aaf3ad, 0x0fe92aca},

etc...

^ permalink raw reply

* Re: [PATCH ] net_sched:  actions - Add default lookup
From: Eric Dumazet @ 2013-10-30 14:00 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David Miller, netdev@vger.kernel.org, Eric W. Biederman,
	Alexander Duyck
In-Reply-To: <5270ECB5.2030601@mojatatu.com>

On Wed, 2013-10-30 at 07:25 -0400, Jamal Hadi Salim wrote:
> Attached. Tested with simple action.
> 
> cheers,
> jamal

Why not setting .lookup to tcf_hash_search
in the few actions not already doing that ?

This would be more consistent.
# git grep -n tcf_hash_search
include/net/act_api.h:92:int tcf_hash_search(struct tc_action *a, u32 index);
net/sched/act_api.c:198:int tcf_hash_search(struct tc_action *a, u32 index)
net/sched/act_api.c:209:EXPORT_SYMBOL(tcf_hash_search);
net/sched/act_csum.c:588:       .lookup         = tcf_hash_search,
net/sched/act_gact.c:209:       .lookup         =       tcf_hash_search,
net/sched/act_ipt.c:301:        .lookup         =       tcf_hash_search,
net/sched/act_ipt.c:315:        .lookup         =       tcf_hash_search,
net/sched/act_mirred.c:274:     .lookup         =       tcf_hash_search,
net/sched/act_nat.c:311:        .lookup         =       tcf_hash_search,
net/sched/act_pedit.c:246:      .lookup         =       tcf_hash_search,
net/sched/act_police.c:410:     .lookup         =       tcf_hash_search,

^ permalink raw reply

* RE: [PATCH] x86: Run checksumming in parallel accross multiple alu's
From: David Laight @ 2013-10-30 14:04 UTC (permalink / raw)
  To: Doug Ledford, Neil Horman; +Cc: Ingo Molnar, Eric Dumazet, linux-kernel, netdev
In-Reply-To: <52710B09.6090302@redhat.com>

...
> and then I also wanted to try using both xmm and ymm registers and doing
> 64bit adds with 32bit numbers across multiple xmm/ymm registers as that
> should parallel nicely.  David, you mentioned you've tried this, how did
> your experiment turn out and what was your method?  I was planning on
> doing regular full size loads into one xmm/ymm register, then using
> pshufd/vshufd to move the data into two different registers, then
> summing into a fourth register, and possible running two of those
> pipelines in parallel.

It was a long time ago, and IIRC the code was just SSE so the
register length just wasn't going to give the required benefit.
I know I wrote the code, but I can't even remember whether I
actually got it working!
With the longer AVX words it might make enough difference.
Of course, this assumes that you have the fpu registers
available. If you have to do a fpu context switch it will
be a lot slower.

About the same time I did manage to an open coded copy
loop to run as fast as 'rep movs' - and without any unrolling
or any prefetch instructions.

Thinking about AVX you should be able to do (without looking up the
actual mnemonics):
	load
	add 32bit chunks to sum
	compare sum with read value (equiv of carry)
	add/subtract compare result (0 or ~0) to a carry-sum register
That is 4 instructions for 256 bits, so you can aim for 4 clocks.
You'd need to check the cpu book to see if any of those can
be scheduled at the same time (if not dependant).
(and also whether there is any result delay - don't think so.)

I'd try running two copies of the above - probably skewed so that
the memory accesses are separated, do the memory read for the
next iteration, and use the 3rd instruction unit for loop control.

	David

^ permalink raw reply

* RE: [PATCH net-next 1/5] lib: crc32: clean up spacing in test cases
From: David Laight @ 2013-10-30 14:10 UTC (permalink / raw)
  To: Joe Perches, Daniel Borkmann; +Cc: davem, netdev, linux-sctp, linux-kernel
In-Reply-To: <1383141418.12439.76.camel@joe-AO722>

> > +	{0x674bf11d, 0x00000038, 0x00000542, 0x0af6d466, 0xd8b6e4c1, 0xf6e93d6c},
>
> these could be
> 
> +	{0x674bf11d, 0x38, 0x542, 0x0af6d466, 0xd8b6e4c1, 0xf6e93d6c},

Or even:
#define X(a, b, c, d, e, f) {0x##a, 0x##b, 0x##c, 0x##d, 0x##e. 0x##f}
	X(674bf11d, 38, 542, 0af6d466, d8b6e4c1, f6e93d6c),
...
#undef X

	David

^ permalink raw reply

* Re: [patch net-next RFC] netfilter: ip6_tables: use reasm skb for matching
From: Jiri Pirko @ 2013-10-30 14:13 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, davem, pablo, netfilter-devel, yoshfuji, kadlec, kaber,
	mleitner
In-Reply-To: <20131030134100.GD16615@breakpoint.cc>

Wed, Oct 30, 2013 at 02:41:00PM CET, fw@strlen.de wrote:
>Jiri Pirko <jiri@resnulli.us> wrote:
>> Currently, when ipv6 fragment goes through the netfilter, match
>> functions are called on them directly. This might cause match function
>> to fail. So benefit from the fact that nf_defrag_ipv6 constructs
>> reassembled skb for us and use this reassembled skb for matching.
>> 
>> This patch fixes for example following situation:
>> On HOSTA do:
>> ip6tables -I INPUT -p icmpv6 -j DROP
>> ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
>> 
>> and on HOSTB you do:
>> ping6 HOSTA -s2000    (MTU is 1500)
>> 
>> Incoming echo requests will be filtered out on HOSTA. This issue does
>> not occur with smaller packets than MTU (where fragmentation does not happen).
>
>[..]
>> diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
>> index 44400c2..5421beb0 100644
>> --- a/net/ipv6/netfilter/ip6_tables.c
>> +++ b/net/ipv6/netfilter/ip6_tables.c
>> @@ -328,6 +328,7 @@ ip6t_do_table(struct sk_buff *skb,
>>  	const struct xt_table_info *private;
>>  	struct xt_action_param acpar;
>>  	unsigned int addend;
>> +	struct sk_buff *reasm = skb->nfct_reasm ? skb->nfct_reasm : skb;
>>  
>>  	/* Initialization */
>>  	indev = in ? in->name : nulldevname;
>> @@ -363,7 +364,7 @@ ip6t_do_table(struct sk_buff *skb,
>>  
>>  		IP_NF_ASSERT(e);
>>  		acpar.thoff = 0;
>> -		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
>> +		if (!ip6_packet_match(reasm, indev, outdev, &e->ipv6,
>>  		    &acpar.thoff, &acpar.fragoff, &acpar.hotdrop)) {
>
>[..]
>
>This is a bit backwards, I think.
>- We gather frags
>- Then we invoke ip6t_do_table for each individual fragment
>
>So basically your patch is equivalent to
>for_each_frag( )
>  ip6t_do_table(reassembled_skb)
>
>Which makes no sense to me - why traverse the ruleset n times with the same
>packet?


Because each fragment need to be pushed through separately.

What different approach would you suggest?

Thanks

Jiri

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox