Netdev List
* [PATCH RFC 3/6] skbuff: convert to skb_orphan_frags
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

Reduce code duplication a bit using the new helper.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/skbuff.c |   22 ++++++++--------------
 1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bd28e80..bdf5b09 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -785,10 +785,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 {
 	struct sk_buff *n;
 
-	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-		if (skb_copy_ubufs(skb, gfp_mask))
-			return NULL;
-	}
+	if (skb_orphan_frags(skb, gfp_mask))
+		return NULL;
 
 	n = skb + 1;
 	if (skb->fclone == SKB_FCLONE_ORIG &&
@@ -908,12 +906,10 @@ struct sk_buff *__pskb_copy(struct sk_buff *skb, int headroom, gfp_t gfp_mask)
 	if (skb_shinfo(skb)->nr_frags) {
 		int i;
 
-		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-			if (skb_copy_ubufs(skb, gfp_mask)) {
-				kfree_skb(n);
-				n = NULL;
-				goto out;
-			}
+		if (skb_orphan_frags(skb, gfp_mask)) {
+			kfree_skb(n);
+			n = NULL;
+			goto out;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
@@ -1005,10 +1001,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		skb_free_head(skb);
 	} else {
 		/* copy this zero copy skb frags */
-		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
-			if (skb_copy_ubufs(skb, gfp_mask))
-				goto nofrags;
-		}
+		if (skb_orphan_frags(skb, gfp_mask))
+			goto nofrags;
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
 			skb_frag_ref(skb, i);
 
-- 
MST

* [PATCH RFC 2/6] skbuff: add an api to orphan frags
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

Many places do
       if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY))
		skb_copy_ubufs(skb, gfp_mask);
to copy and invoke frag destructors if necessary.
Add an inline helper for this.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/linux/skbuff.h |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bb78f70..28d842e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1711,6 +1711,30 @@ static inline void skb_orphan(struct sk_buff *skb)
 }
 
 /**
+ *	skb_orphan_frags - orphan the frags contained in a buffer
+ *	@skb: buffer to orphan frags from
+ *	@gfp_mask: allocation mask for replacement pages
+ *
+ *	For each frag in the SKB which needs a destructor (i.e. has an
+ *	owner) create a copy of that frag and release the original
+ *	page by calling the destructor.
+ */
+static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
+{
+	if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)))
+		return 0;
+	return skb_copy_ubufs(skb, gfp_mask);
+}
+
+
+static inline void skb_copy_frag_destructor(struct sk_buff *to,
+					    struct sk_buff *from)
+{
+	skb_shinfo(to)->tx_flags |= skb_shinfo(from)->tx_flags &
+		SKBTX_DEV_ZEROCOPY;
+}
+
+/**
  *	__skb_queue_purge - empty a list
  *	@list: list to empty
  *
-- 
MST

* [PATCH RFC 1/6] skbuff: support per-page destructors in copy_ubufs
From: Michael S. Tsirkin @ 2012-05-07 13:54 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com
In-Reply-To: <cover.1336397823.git.mst@redhat.com>

sunrpc wants to use zero copy with tcp which means
some fragments are zero copy while others are not.
This in turn means there's no per skb destructor_arg,
instead some fragments have destructors. Teach
skb_copy_ubufs and skb_release_data to handle such skbs.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 net/core/skbuff.c |   18 ++++++++++++++----
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c81240c..bd28e80 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -423,7 +423,7 @@ static void skb_release_data(struct sk_buff *skb)
 			struct ubuf_info *uarg;
 
 			uarg = skb_shinfo(skb)->destructor_arg;
-			if (uarg->callback)
+			if (uarg && uarg->callback)
 				uarg->callback(uarg);
 		}
 
@@ -721,6 +721,8 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	for (i = 0; i < num_frags; i++) {
 		u8 *vaddr;
 		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+		if (unlikely((!uarg && !f->page.destructor)))
+			continue;
 
 		page = alloc_page(GFP_ATOMIC);
 		if (!page) {
@@ -740,13 +742,21 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	}
 
 	/* skb frags release userspace buffers */
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-		skb_frag_unref(skb, i);
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
+		if (unlikely((!uarg && !f->page.destructor)))
+			continue;
+		__skb_frag_unref(f);
+	}
 
-	uarg->callback(uarg);
+	if (uarg)
+		uarg->callback(uarg);
 
 	/* skb frags point to kernel buffers */
 	for (i = skb_shinfo(skb)->nr_frags; i > 0; i--) {
+		skb_frag_t *f = &skb_shinfo(skb)->frags[i - 1];
+		if (unlikely((!uarg && !f->page.destructor)))
+			continue;
 		__skb_fill_page_desc(skb, i-1, head, 0,
 				     skb_shinfo(skb)->frags[i - 1].size);
 		head = (struct page *)head->private;
-- 
MST

* [PATCH RFC 0/6] copy aside frags with destructors (was [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
From: Michael S. Tsirkin @ 2012-05-07 13:53 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, eric.dumazet@gmail.com

On Fri, May 04, 2012 at 11:51:31AM +0100, Ian Campbell wrote:
> On Fri, 2012-05-04 at 11:03 +0100, Michael S. Tsirkin wrote:
> > On Fri, May 04, 2012 at 02:54:33AM -0400, David Miller wrote:
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > Date: Fri, 4 May 2012 00:10:24 +0300
> > > 
> > > > Hmm we orphan skbs when we loop them back so how about reusing the
> > > > skb->destructor for this?
> > > 
> > > That's one possibility.

So originally I thought it would work: the destructor would wake the
original owner, which would copy the data and modify the fragments.  But
then I unfortunately realized that would be racy: the new owner could be
using the old frags, and there appears to be no way for us to make sure
it isn't so that we can put the original page.
The same logic applies to modifying the frags at any other time if the
skb is cloned. So it seems we must copy if we want to clone the skb.

Further, destructor itself can't do the copy because this needs
to allocate memory in atomic context, and destructor
itself can't fail.
For this second problem I see two solutions: either pre-allocate the
copy buffer just in case and track it together with the destructor, or
use an skb flag to make the check for destructors quick (if not
completely free).

Second option is what macvtap zero copy uses and it already
does copy on clone too. So I hacked that to make it support tcp/udp
used by sunrpc.

> > > 
> > > But I fear we're about to toss Ian into yet another rabbit hole. :-)
> > > 
> > > Let's try to converge on something quickly as I think integration of
> > > his work has been delayed enough as-is.

...

> > It's weekend here, I'll work on a patch like this
> > Sunday.
> 
> Thanks, I was starting to feel my nose twitching and my ears beginning
> to elongate ;-)
> 
> Ian.

Here's a first stab at a fix. Basically to be able to modify frags on
the fly we must make sure the skb isn't cloned, so the moment someone
clones the skb we need to trigger the frag copy logic.  And this is
exactly what happens with SKBTX_DEV_ZEROCOPY so it seems to make sense
to reuse that logic.

The below patchset replaces patch 7
([PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors)
in Ian's patchset and needs to be applied there.


Compiled only but I'd like to hear what people think all the same
because it does add a couple of branches on fast path.  On the other
hand this makes it generic so the same logic will be reusable for packet
sockets (which IIRC are currently buggy in the same way as sunrpc) and
for adding zero copy support to tun.

Please comment,
Thanks!

-- 
MST


Michael S. Tsirkin (6):
  skbuff: support per-page destructors in copy_ubufs
  skbuff: add an api to orphan frags
  skbuff: convert to skb_orphan_frags
  tun: orphan frags on xmit
  net: orphan frags on receive
  skbuff: set zerocopy flag on frag destructor

 drivers/net/tun.c      |    2 ++
 include/linux/skbuff.h |   41 +++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c         |    2 ++
 net/core/skbuff.c      |   43 +++++++++++++++++++++++++------------------
 4 files changed, 70 insertions(+), 18 deletions(-)

-- 
MST

* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: Eric Dumazet @ 2012-05-07 13:53 UTC (permalink / raw)
  To: Johannes Berg; +Cc: netdev
In-Reply-To: <1336397946.4325.27.camel@jlt3.sipsolutions.net>

On Mon, 2012-05-07 at 15:39 +0200, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> Neither compare_ether_addr() nor compare_ether_addr_64bits()
> (as it can fall back to the former) have comparison semantics
> like memcmp() where the sign of the return value indicates sort
> order. We had a bug in the wireless code due to a blind memcmp
> replacement because of this.
> 
> A cursory look suggests that the wireless bug was the only one
> due to this semantic difference.
> 
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
>  include/linux/etherdevice.h |   11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)

The right way to avoid this kind of problem is to change these
functions to return a bool.
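
To illustrate the point, here is a minimal userspace sketch, not the
kernel code. toy_compare_ether_addr() is a hypothetical stand-in that
mimics an XOR-style comparison (zero when equal, some non-zero value
otherwise), and toy_ether_addr_equal() is an assumed name for a
bool-returning wrapper of the kind suggested above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: ORs together the XOR of each byte pair. */
static unsigned toy_compare_ether_addr(const uint8_t *a, const uint8_t *b)
{
	unsigned diff = 0;
	int i;

	for (i = 0; i < 6; i++)
		diff |= (unsigned)(a[i] ^ b[i]);
	return diff;	/* 0 if equal; carries no sort order */
}

/* A bool return type makes it hard to mistake this for memcmp(). */
static bool toy_ether_addr_equal(const uint8_t *a, const uint8_t *b)
{
	return toy_compare_ether_addr(a, b) == 0;
}
```

Because the XOR result is the same whichever argument comes first, a
caller cannot derive "less than" or "greater than" from it, which is
why a bool is the more honest return type.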

* [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: Johannes Berg @ 2012-05-07 13:39 UTC (permalink / raw)
  To: netdev

From: Johannes Berg <johannes.berg@intel.com>

Neither compare_ether_addr() nor compare_ether_addr_64bits()
(as it can fall back to the former) have comparison semantics
like memcmp() where the sign of the return value indicates sort
order. We had a bug in the wireless code due to a blind memcmp
replacement because of this.

A cursory look suggests that the wireless bug was the only one
due to this semantic difference.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 include/linux/etherdevice.h |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/include/linux/etherdevice.h	2012-04-12 05:40:35.000000000 +0200
+++ b/include/linux/etherdevice.h	2012-05-07 15:34:28.000000000 +0200
@@ -159,7 +159,8 @@ static inline void eth_hw_addr_random(st
  * @addr1: Pointer to a six-byte array containing the Ethernet address
  * @addr2: Pointer other six-byte array containing the Ethernet address
  *
- * Compare two ethernet addresses, returns 0 if equal
+ * Compare two ethernet addresses, returns 0 if equal, non-zero otherwise.
+ * Unlike memcmp(), it doesn't return a value suitable for sorting.
  */
 static inline unsigned compare_ether_addr(const u8 *addr1, const u8 *addr2)
 {
@@ -184,10 +185,10 @@ static inline unsigned long zap_last_2by
  * @addr1: Pointer to an array of 8 bytes
  * @addr2: Pointer to an other array of 8 bytes
  *
- * Compare two ethernet addresses, returns 0 if equal.
- * Same result than "memcmp(addr1, addr2, ETH_ALEN)" but without conditional
- * branches, and possibly long word memory accesses on CPU allowing cheap
- * unaligned memory reads.
+ * Compare two ethernet addresses, returns 0 if equal, non-zero otherwise.
+ * Unlike memcmp(), it doesn't return a value suitable for sorting.
+ * The function doesn't need any conditional branches and possibly uses
+ * word memory accesses on CPU allowing cheap unaligned memory reads.
  * arrays = { byte1, byte2, byte3, byte4, byte6, byte7, pad1, pad2}
  *
  * Please note that alignment of addr1 & addr2 is only guaranted to be 16 bits.

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07 12:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120507122232.GA32146@1984>

On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > > We have plenty of rules where just source port mask is zero.
> > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > > 
> > > > > 0xffff and 0x0000 means on/off respectively.
> > > > > 
> > > > > Still curious, how can 0xfffc be useful?
> > > > 
> > > > That's a special case where an appl is using 4 ports.
> > > > But in general, have not seen other than "on/off" except for above.
> > > 
> > > I see. Well I'm fine with this way to switch on/off things, just
> > > > wanted some clarification.
> > > 
> > > Still one final thing I'd like to remove before inclusion:
> > > 
> > > +       union hmark_ports       port_mask;
> > > +       union hmark_ports       port_set;
> > > +       __u32                   spi_mask;
> > > +       __u32                   spi_set;
> > > 
> > > the spi_mask seems redundant. The port_mask already provides u32 for
> > > it.
> > 
> > No problems, I'll remove it.
> 
> OK. As a nice side-effect, this will lead to removing the branch that
> tests ESP/AH in hmark_set_tuple_ports.
>
Yes, only check if not ESP or AH to swap src/dst

+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+			info->port_set.v32;
+
+	if (t->proto != IPPROTO_ESP && t->proto != IPPROTO_AH)
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+}
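
The mask-then-swap step can be tried out in userspace. This is only a
sketch under assumptions: struct toy_ports and toy_normalize_ports()
are hypothetical names, ports are kept in host byte order, and none of
it is the xt_hmark code itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 16-bit port pair. */
struct toy_ports {
	uint16_t src;
	uint16_t dst;
};

/* Apply per-port AND masks and OR "set" values, then order the pair
 * so that the smaller port always ends up in src. */
static struct toy_ports toy_normalize_ports(struct toy_ports p,
					    struct toy_ports mask,
					    struct toy_ports set)
{
	p.src = (uint16_t)((p.src & mask.src) | set.src);
	p.dst = (uint16_t)((p.dst & mask.dst) | set.dst);
	if (p.dst < p.src) {
		uint16_t tmp = p.dst;

		p.dst = p.src;
		p.src = tmp;
	}
	return p;
}
```

With a mask of 0xfffc the two low bits are cleared, so four
consecutive ports such as 8080..8083 collapse to the same value. When
the same mask is applied to both ports, the swap makes both directions
of a flow normalize to the same pair.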

> Please, use the patch that I sent you yesterday. Recover the swap
> behaviour that you need, I'll mangle the patch myself to add the
> little comment to explain why we do this with CT as well.
> 
> BTW, note that you do *not* have to remove the XT_HMARK_SPI flags, we
> still need those for iptables-save.
> 
> While at it:
> 
> +enum {                      
> +       XT_HMARK_NONE,       
> +       XT_HMARK_SADR_AND,   
> +       XT_HMARK_DADR_AND,   
> +       XT_HMARK_SPI_AND,    
> +       XT_HMARK_SPI_OR,    
> 
> remove all trailing _OR
> 
> +       XT_HMARK_SPORT_AND,  
> +       XT_HMARK_DPORT_AND,  
> +       XT_HMARK_SPORT_OR,   
> +       XT_HMARK_DPORT_OR,   
> +       XT_HMARK_PROTO_AND,
> 
> rename all _AND by _MASK.
> 
> +       XT_HMARK_RND,        
> +       XT_HMARK_MODULUS,    
> +       XT_HMARK_OFFSET,     
> +       XT_HMARK_CT,         
> +       XT_HMARK_METHOD_L3,  
> +       XT_HMARK_METHOD_L3_4,
> };
> 
> What I'm asking should require very little changes in the kernel-code.
> 

I'll send you the updates later today.

> > > In case you want to support different masks for AH/ESP and TCP, you
> > > could do the following:
> > > 
> > > iptables -I PREROUTING -t mangle -p esp -j HMARK --spi-mask 0xffff0000
> > > iptables -I PREROUTING -t mangle -p tcp -j HMARK --port-mask 0xfffc
> > > 
> > > Any objection?
> > 
> > I don't think this is a problem, but it should be written in the man page
> > that ports and spi share mask so they can't be used at the same time.
> 
> documentation is fine.
> 
> iptables can stop this by spotting a warning message from user-space.

If you think that's enough, I'm fine with that.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Pablo Neira Ayuso @ 2012-05-07 12:22 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <201205071409.47945.hans.schillstrom@ericsson.com>

On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > We have plenty of rules where just source port mask is zero.
> > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > 
> > > > 0xffff and 0x0000 means on/off respectively.
> > > > 
> > > > Still curious, how can 0xfffc be useful?
> > > 
> > > That's a special case where an appl is using 4 ports.
> > > But in general, have not seen other than "on/off" except for above.
> > 
> > I see. Well I'm fine with this way to switch on/off things, just
> > > wanted some clarification.
> > 
> > Still one final thing I'd like to remove before inclusion:
> > 
> > +       union hmark_ports       port_mask;
> > +       union hmark_ports       port_set;
> > +       __u32                   spi_mask;
> > +       __u32                   spi_set;
> > 
> > the spi_mask seems redundant. The port_mask already provides u32 for
> > it.
> 
> No problems, I'll remove it.

OK. As a nice side-effect, this will lead to removing the branch that
tests ESP/AH in hmark_set_tuple_ports.

Please, use the patch that I sent you yesterday. Recover the swap
behaviour that you need, I'll mangle the patch myself to add the
little comment to explain why we do this with CT as well.

BTW, note that you do *not* have to remove the XT_HMARK_SPI flags, we
still need those for iptables-save.

While at it:

+enum {                      
+       XT_HMARK_NONE,       
+       XT_HMARK_SADR_AND,   
+       XT_HMARK_DADR_AND,   
+       XT_HMARK_SPI_AND,    
+       XT_HMARK_SPI_OR,    

remove all trailing _OR

+       XT_HMARK_SPORT_AND,  
+       XT_HMARK_DPORT_AND,  
+       XT_HMARK_SPORT_OR,   
+       XT_HMARK_DPORT_OR,   
+       XT_HMARK_PROTO_AND,

rename all _AND by _MASK.

+       XT_HMARK_RND,        
+       XT_HMARK_MODULUS,    
+       XT_HMARK_OFFSET,     
+       XT_HMARK_CT,         
+       XT_HMARK_METHOD_L3,  
+       XT_HMARK_METHOD_L3_4,
};

What I'm asking should require very little changes in the kernel-code.

> > In case you want to support different masks for AH/ESP and TCP, you
> > could do the following:
> > 
> > iptables -I PREROUTING -t mangle -p esp -j HMARK --spi-mask 0xffff0000
> > iptables -I PREROUTING -t mangle -p tcp -j HMARK --port-mask 0xfffc
> > 
> > Any objection?
> 
> I don't think this is a problem, but it should be written in the man page
> that ports and spi share mask so they can't be used at the same time.

documentation is fine.

iptables can stop this by spotting a warning message from user-space.

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07 12:09 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120507115612.GA31110@1984>

On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > We have plenty of rules where just source port mask is zero.
> > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > 
> > > 0xffff and 0x0000 means on/off respectively.
> > > 
> > > Still curious, how can 0xfffc be useful?
> > 
> > That's a special case where an appl is using 4 ports.
> > But in general, have not seen other than "on/off" except for above.
> 
> I see. Well I'm fine with this way to switch on/off things, just
> wanted some clarification.
> 
> Still one final thing I'd like to remove before inclusion:
> 
> +       union hmark_ports       port_mask;
> +       union hmark_ports       port_set;
> +       __u32                   spi_mask;
> +       __u32                   spi_set;
> 
> the spi_mask seems redundant. The port_mask already provides u32 for
> it.

No problems, I'll remove it.

> In case you want to support different masks for AH/ESP and TCP, you
> could do the following:
> 
> iptables -I PREROUTING -t mangle -p esp -j HMARK --spi-mask 0xffff0000
> iptables -I PREROUTING -t mangle -p tcp -j HMARK --port-mask 0xfffc
> 
> Any objection?

I don't think this is a problem, but it should be written in the man page
that ports and spi share mask so they can't be used at the same time.


> Yes, you'll have to change user-space again, but we have time for
> that.

:-)

> 
> > > > > I'm also telling this because I think that ICMP support will be
> > > > > easier to add if port masking is removed.
> > > > > 
> > > > > [...]
> > > > > > This is what I have done.
> > > > > >
> > > > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > > > >   (it's not set in the rtuple)
> > > > > 
> > > > > Good one, this made the code even smaller.
> > > > > 
> > > > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > > > 
> > > > > Not really, you don't need for the conntrack part. The original tuple
> > > > > is always the same, no matter where the packet is coming from. I have
> > > > > removed this again so it only affects packet-based hashing.
> > > > 
> > > > Yes original tuple is always the same but not always less than the rtuple.
> > > > If you have two nodes that should produce the same hmark,
> > > > one with conntrack and one without you must make a compare to make it consistent.
> > > 
> > > I see, for consistency still makes sense although this seems to me
> > > like still strange configuration. In what scenario would you use two
> > > different approaches?
> > 
> > In the way that we use HMARK,
> > in the incoming path there is conntrack disabled in the container,
> > for the outgoing path i.e. at the payloads there is conntrack used.
> > In that case the --hmark-ct makes life easier.
> 
> That's still not enough to guarantee that the mark will be consistent
> if NAT is in use, but I don't mind recovering the swap and add some
> comment on the code to explain this if this makes your life easier.

Thanks,  I will send a new patch soon.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Pablo Neira Ayuso @ 2012-05-07 11:56 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <201205071114.35324.hans.schillstrom@ericsson.com>

On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > We have plenty of rules where just source port mask is zero.
> > > and the dest-port-mask is 0xfffc (or 0xffff)
> > 
> > 0xffff and 0x0000 means on/off respectively.
> > 
> > Still curious, how can 0xfffc be useful?
> 
> That's a special case where an appl is using 4 ports.
> But in general, have not seen other than "on/off" except for above.

I see. Well I'm fine with this way to switch on/off things, just
wanted some clarification.

Still one final thing I'd like to remove before inclusion:

+       union hmark_ports       port_mask;
+       union hmark_ports       port_set;
+       __u32                   spi_mask;
+       __u32                   spi_set;

the spi_mask seems redundant. The port_mask already provides u32 for
it.

In case you want to support different masks for AH/ESP and TCP, you
could do the following:

iptables -I PREROUTING -t mangle -p esp -j HMARK --spi-mask 0xffff0000
iptables -I PREROUTING -t mangle -p tcp -j HMARK --port-mask 0xfffc

Any objection?

Yes, you'll have to change user-space again, but we have time for
that.

> > > > I'm also telling this because I think that ICMP support will be
> > > > easier to add if port masking is removed.
> > > > 
> > > > [...]
> > > > > This is what I have done.
> > > > >
> > > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > > >   (it's not set in the rtuple)
> > > > 
> > > > Good one, this made the code even smaller.
> > > > 
> > > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > > 
> > > > Not really, you don't need for the conntrack part. The original tuple
> > > > is always the same, no matter where the packet is coming from. I have
> > > > removed this again so it only affects packet-based hashing.
> > > 
> > > Yes original tuple is always the same but not always less than the rtuple.
> > > If you have two nodes that should produce the same hmark,
> > > one with conntrack and one without you must make a compare to make it consistent.
> > 
> > I see, for consistency still makes sense although this seems to me
> > like still strange configuration. In what scenario would you use two
> > different approaches?
> 
> In the way that we use HMARK,
> in the incoming path there is conntrack disabled in the container,
> for the outgoing path i.e. at the payloads there is conntrack used.
> In that case the --hmark-ct makes life easier.

That's still not enough to guarantee that the mark will be consistent
if NAT is in use, but I don't mind recovering the swap and add some
comment on the code to explain this if this makes your life easier.

* batostr() function
From: Johannes Berg @ 2012-05-07 11:49 UTC (permalink / raw)
  To: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev

Really? 2 static buffers that are used alternately based on a static
variable? How can that possibly be thread-safe? That may work in very
restricted scenarios, but ...

johannes
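
For comparison, a reentrant formatter would let the caller own the
buffer, so no static state is shared between threads. The sketch below
is an assumption about what such a helper could look like, not an
existing API: toy_ba_format() is a hypothetical name, and the byte
order (stored least-significant first, printed most-significant first)
is assumed.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reentrant formatter. The caller supplies the buffer,
 * which needs at least 18 bytes (17 characters plus the NUL). */
static int toy_ba_format(const uint8_t addr[6], char *buf, size_t len)
{
	/* Assumption: addr[5] is the most significant byte. */
	return snprintf(buf, len, "%02X:%02X:%02X:%02X:%02X:%02X",
			addr[5], addr[4], addr[3],
			addr[2], addr[1], addr[0]);
}
```

Two callers formatting into their own buffers cannot overwrite each
other's result, whichever thread runs first.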

* New commands to configure IOV features
From: Yuval Mintz @ 2012-05-07 11:17 UTC (permalink / raw)
  To: gregory.v.rose; +Cc: Ben Hutchings, netdev@vger.kernel.org

I've tried to figure out if there was a standard interface
(ethtool/iproute) through which a user could configure the number
of vfs in his system.

I've seen the RFC suggested in http://markmail.org/thread/qblfcv7zbxsxp7q6,
and http://markmail.org/thread/fw54dcppmxuxoe6n, but failed to see any
later references to it (commits or further discussion on this topic).

How exactly are things standing with these RFCs? Were they abandoned?

Thanks,
Yuval

* Re: [PATCH 7/9] net: add skb_orphan_frags to copy aside frags with destructors
From: Michael S. Tsirkin @ 2012-05-07 10:24 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev, David Miller, Eric Dumazet
In-Reply-To: <1336056971-7839-7-git-send-email-ian.campbell@citrix.com>

On Thu, May 03, 2012 at 03:56:09PM +0100, Ian Campbell wrote:
> This should be used by drivers which need to hold on to an skb for an extended
> (perhaps unbounded) period of time. e.g. the tun driver which relies on
> userspace consuming the skb.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: mst@redhat.com
> ---
>  drivers/net/tun.c      |    1 +
>  include/linux/skbuff.h |   11 ++++++++
>  net/core/skbuff.c      |   68 ++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 61 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index bb8c72c..b53e04e 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -415,6 +415,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  	/* Orphan the skb - required as we might hang on to it
>  	 * for indefinite time. */
>  	skb_orphan(skb);
> +	skb_orphan_frags(skb, GFP_KERNEL);
>  
>  	/* Enqueue packet */
>  	skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index ccc7d93..9145f83 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1711,6 +1711,17 @@ static inline void skb_orphan(struct sk_buff *skb)
>  }
>  
>  /**
> + *	skb_orphan_frags - orphan the frags contained in a buffer
> + *	@skb: buffer to orphan frags from
> + *	@gfp_mask: allocation mask for replacement pages
> + *
> + *	For each frag in the SKB which has a destructor (i.e. has an
> + *	owner) create a copy of that frag and release the original
> + *	page by calling the destructor.
> + */
> +extern int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask);
> +
> +/**
>   *	__skb_queue_purge - empty a list
>   *	@list: list to empty
>   *
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 945b807..f009abb 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -697,31 +697,25 @@ struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src)
>  }
>  EXPORT_SYMBOL_GPL(skb_morph);
>  
> -/*	skb_copy_ubufs	-	copy userspace skb frags buffers to kernel
> - *	@skb: the skb to modify
> - *	@gfp_mask: allocation priority
> - *
> - *	This must be called on SKBTX_DEV_ZEROCOPY skb.
> - *	It will copy all frags into kernel and drop the reference
> - *	to userspace pages.
> - *
> - *	If this function is called from an interrupt gfp_mask() must be
> - *	%GFP_ATOMIC.
> - *
> - *	Returns 0 on success or a negative error code on failure
> - *	to allocate kernel memory to copy to.
> +/*
> + * If uarg != NULL copy and replace all frags.
> + * If uarg == NULL then only copy and replace those which have a destructor
> + * pointer.
>   */
> -int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
> +static int skb_copy_frags(struct sk_buff *skb, gfp_t gfp_mask,
> +			  struct ubuf_info *uarg)
>  {
>  	int i;
>  	int num_frags = skb_shinfo(skb)->nr_frags;
>  	struct page *page, *head = NULL;
> -	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
>  
>  	for (i = 0; i < num_frags; i++) {
>  		u8 *vaddr;
>  		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
>  
> +		if (!uarg && !f->page.destructor)
> +			continue;
> +
>  		page = alloc_page(GFP_ATOMIC);
>  		if (!page) {
>  			while (head) {
> @@ -739,11 +733,16 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>  		head = page;
>  	}
>  
> -	/* skb frags release userspace buffers */
> -	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
> +	/* skb frags release buffers */
> +	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
> +		if (!uarg && !f->page.destructor)
> +			continue;
>  		skb_frag_unref(skb, i);
> +	}
>  
> -	uarg->callback(uarg);
> +	if (uarg)
> +		uarg->callback(uarg);
>  

So above we only linked up copied pages, but below we
try to use the list for all frags. Looks like a bug,
I think it needs to check destructor and uarg too.
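
The mismatch can be modeled in userspace: if the first loop chains
replacement pages only for some frags, the final loop has to skip the
same frags, or it walks off the end of the list. This is a sketch with
hypothetical toy_* types, not the kernel structures:

```c
#include <stddef.h>

struct toy_page {
	struct toy_page *next;	/* plays the role of page->private */
};

struct toy_frag {
	struct toy_page *page;
	int has_destructor;
};

/* Replace the page of every frag that has a destructor. The caller
 * must pass a pool with at least one entry per such frag.
 * Returns the number of frags replaced. */
static int toy_orphan_frags(struct toy_frag *frags, int n,
			    struct toy_page *pool)
{
	struct toy_page *head = NULL;
	int used = 0;
	int i;

	/* Chain one replacement page per frag that needs copying. */
	for (i = 0; i < n; i++) {
		if (!frags[i].has_destructor)
			continue;
		pool[used].next = head;
		head = &pool[used++];
	}

	/* Walk backwards and apply the SAME skip condition, so the
	 * list is consumed exactly once per chained page. */
	for (i = n; i > 0; i--) {
		if (!frags[i - 1].has_destructor)
			continue;
		frags[i - 1].page = head;
		head = head->next;
	}
	return used;
}
```

Dropping the check in the second loop would dereference a NULL head as
soon as one frag was skipped in the first loop.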


>  	/* skb frags point to kernel buffers */
>  	for (i = skb_shinfo(skb)->nr_frags; i > 0; i--) {
> @@ -752,10 +751,41 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
>  		head = (struct page *)head->private;
>  	}
>  
> -	skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
>  	return 0;
>  }
>  
> +/*	skb_copy_ubufs	-	copy userspace skb frags buffers to kernel
> + *	@skb: the skb to modify
> + *	@gfp_mask: allocation priority
> + *
> + *	This must be called on SKBTX_DEV_ZEROCOPY skb.
> + *	It will copy all frags into kernel and drop the reference
> + *	to userspace pages.
> + *
> + *	If this function is called from an interrupt gfp_mask() must be
> + *	%GFP_ATOMIC.
> + *
> + *	Returns 0 on success or a negative error code on failure
> + *	to allocate kernel memory to copy to.
> + */
> +int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
> +{
> +	struct ubuf_info *uarg = skb_shinfo(skb)->destructor_arg;
> +	int rc;
> +
> +	rc = skb_copy_frags(skb, gfp_mask, uarg);
> +
> +	if (rc == 0)
> +		skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
> +
> +	return rc;
> +}
> +
> +int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
> +{
> +	return skb_copy_frags(skb, gfp_mask, NULL);
> +}
> +EXPORT_SYMBOL(skb_orphan_frags);
>  
>  /**
>   *	skb_clone	-	duplicate an sk_buff
> -- 
> 1.7.2.5

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07  9:14 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120507090328.GA27650@1984>

On Monday 07 May 2012 11:03:28 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 10:20:42AM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 00:57:38 Pablo Neira Ayuso wrote:
> > > Hi Hans,
> > > 
> > > [...]
> > > > > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > > > >
> > > > > > Yes why not, I can give it a try.
> > > > > >
> > > >
> > > > I think we wait with this one..
> > > 
> > > I see. This is easy to add for the conntrack side, but it will require
> > > some extra code for the packet-based solution.
> > 
> > Actually I think there is very little gain to spread with type 
> > and then we must add a user mode possibility to turn it off 
> > i.e. a --hmark-icmp-type-mask 
> > 
> > > Not directly related to this but, I know that your intention is to
> > > make this as flexible as possible. However, I still don't find how I
> > > would use the port mask feature in any of my setups.  Basically, I
> > > don't come up with any useful example for this situation.
> > 
> > We have plenty of rules where just source port mask is zero.
> > and the dest-port-mask is 0xfffc (or 0xffff)
> 
> 0xffff and 0x0000 means on/off respectively.
> 
> Still curious, how can 0xfffc be useful?

That's a special case where an application is using 4 ports.
But in general, I have not seen anything other than "on/off", except for the above.
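
To illustrate the 0xfffc case: clearing the two low bits of the port maps
each aligned group of four consecutive ports to the same value, so all four
ports of such an application contribute identically to the hash. A minimal
sketch in plain C; the helper name mask_port and the port numbers are made
up for illustration and are not part of the patch:

```c
#include <stdint.h>

/* Hypothetical helper: apply a port mask before the port is hashed. */
static uint16_t mask_port(uint16_t port, uint16_t mask)
{
	return port & mask;
}
```

With this, mask_port(8000, 0xfffc) through mask_port(8003, 0xfffc) all give
8000, while 0xffff keeps the port unchanged and 0x0000 removes it from the
hash input entirely.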

> 
> > > I'm also telling this because I think that ICMP support will be
> > > easier to add if port masking is removed.
> > > 
> > > [...]
> > > > This is what I have done.
> > > >
> > > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > > >   (it's not set in the rtuple)
> > > 
> > > Good one, this made the code even smaller.
> > > 
> > > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > > 
> > > Not really, you don't need for the conntrack part. The original tuple
> > > is always the same, not matter where the packet is coming from. I have
> > > removed this again so it only affects packet-based hashing.
> > 
> > Yes original tuple is always the same but not always less than the rtuple.
> > If you have two nodes that should produce the same hmark,
> > one with conntrack an one without you must make a compare to make it consistent.
> 
> I see, for consistency still makes sense although this seems to me
> like still strange configuration. In what scenario would you use two
> different approaches?

In the way that we use HMARK:
on the incoming path conntrack is disabled in the container,
while on the outgoing path, i.e. at the payloads, conntrack is used.
In that case --hmark-ct makes life easier.

> 
> > > > - Moved the L3 check a little bit earlier.
> > > 
> > > good.
> > > 
> > > > - changed return values for fragments.
> > > 
> > > With this, you're giving up on trying to classify fragments. Do you
> > > really want this?
> > > 
> > > From my point of view, if your firewalls (assuming they are the HMARK
> > > classification) are stateless, it still makes sense to me to classify
> > > fragments using the XT_HMARK_METHOD_L3_4.
> > 
> > I do agree, it is back to "return 0" again.
> 
> OK.
> 

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* [net-next 2/2] stmmac: add mixed burst for DMA
From: Giuseppe CAVALLARO @ 2012-05-07  9:12 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro
In-Reply-To: <1336381953-18041-1-git-send-email-peppe.cavallaro@st.com>

In mixed burst (MB) mode, the AHB master always initiates
the bursts with fixed-size when the DMA requests transfers
of size less than or equal to 16 beats.
This patch adds the MB support and the flag that can be
passed from the platform to select it.
MB mode can also give some performance benefits
on some platforms.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
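As a rough illustration of what the dwmac1000 init path does with the new
flag, here is a standalone sketch. The two bit values are the ones defined
in dwmac1000.h by this patch; the helper name bus_mode_set_burst is
hypothetical and exists only for this example:

```c
#include <stdint.h>

/* Bit values as defined in dwmac1000.h by this patch. */
#define DMA_BUS_MODE_FB	0x00010000u	/* Fixed burst */
#define DMA_BUS_MODE_MB	0x04000000u	/* Mixed burst */

/* Hypothetical helper: fold the two burst flags into a bus mode value. */
static uint32_t bus_mode_set_burst(uint32_t value, int fb, int mb)
{
	if (fb)
		value |= DMA_BUS_MODE_FB;
	/* Per the patch comment, MB has no effect when FB is also set. */
	if (mb)
		value |= DMA_BUS_MODE_MB;
	return value;
}
```

A platform would request this by setting mixed_burst in its
struct stmmac_dma_cfg, which stmmac_init_dma_engine() then passes through.
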
 drivers/net/ethernet/stmicro/stmmac/common.h       |    4 ++--
 drivers/net/ethernet/stmicro/stmmac/dwmac1000.h    |    1 +
 .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |    6 +++++-
 drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |    2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |    6 ++++--
 include/linux/stmmac.h                             |    1 +
 6 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 7164509..b343b71 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -247,8 +247,8 @@ struct stmmac_desc_ops {
 
 struct stmmac_dma_ops {
 	/* DMA core initialization */
-	int (*init) (void __iomem *ioaddr, int pbl, int fb, int burst_len,
-			u32 dma_tx, u32 dma_rx);
+	int (*init) (void __iomem *ioaddr, int pbl, int fb, int mb,
+			int burst_len, u32 dma_tx, u32 dma_rx);
 	/* Dump DMA registers */
 	void (*dump_regs) (void __iomem *ioaddr);
 	/* Set tx/rx threshold in the csr6 register
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
index 53ed56b..f02162f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
@@ -141,6 +141,7 @@ enum rx_tx_priority_ratio {
 };
 
 #define DMA_BUS_MODE_FB		0x00010000	/* Fixed burst */
+#define DMA_BUS_MODE_MB		0x04000000	/* Mixed burst */
 #define DMA_BUS_MODE_RPBL_MASK	0x003e0000	/* Rx-Programmable Burst Len */
 #define DMA_BUS_MODE_RPBL_SHIFT	17
 #define DMA_BUS_MODE_USP	0x00800000
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
index 3675c57..15e1aa1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c
@@ -30,7 +30,7 @@
 #include "dwmac1000.h"
 #include "dwmac_dma.h"
 
-static int dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb,
+static int dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
 			      int burst_len, u32 dma_tx, u32 dma_rx)
 {
 	u32 value = readl(ioaddr + DMA_BUS_MODE);
@@ -66,6 +66,10 @@ static int dwmac1000_dma_init(void __iomem *ioaddr, int pbl, int fb,
 	if (fb)
 		value |= DMA_BUS_MODE_FB;
 
+	/* Mixed Burst has no effect when fb is set */
+	if (mb)
+		value |= DMA_BUS_MODE_MB;
+
 #ifdef CONFIG_STMMAC_DA
 	value |= DMA_BUS_MODE_DA;	/* Rx has priority over tx */
 #endif
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
index 92ed2e0..e4eca62 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c
@@ -32,7 +32,7 @@
 #include "dwmac100.h"
 #include "dwmac_dma.h"
 
-static int dwmac100_dma_init(void __iomem *ioaddr, int pbl, int fb,
+static int dwmac100_dma_init(void __iomem *ioaddr, int pbl, int fb, int mb,
 			     int burst_len, u32 dma_tx, u32 dma_rx)
 {
 	u32 value = readl(ioaddr + DMA_BUS_MODE);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a9699ae..4fd62ff 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -924,7 +924,8 @@ static void stmmac_check_ether_addr(struct stmmac_priv *priv)
 
 static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 {
-	int pbl = DEFAULT_DMA_PBL, fixed_burst = 0, burst_len = 0;
+	int pbl = DEFAULT_DMA_PBL, fixed_burst = 0, burst_len = 0,
+	    mixed_burst = 0;
 
 	/* Some DMA parameters can be passed from the platform;
 	 * in case of these are not passed we keep a default
@@ -932,10 +933,11 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 	if (priv->plat->dma_cfg) {
 		pbl = priv->plat->dma_cfg->pbl;
 		fixed_burst = priv->plat->dma_cfg->fixed_burst;
+		mixed_burst = priv->plat->dma_cfg->mixed_burst;
 		burst_len = priv->plat->dma_cfg->burst_len;
 	}
 
-	return priv->hw->dma->init(priv->ioaddr, pbl, fixed_burst,
+	return priv->hw->dma->init(priv->ioaddr, pbl, fixed_burst, mixed_burst,
 				   burst_len, priv->dma_tx_phy,
 				   priv->dma_rx_phy);
 }
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index f85c93d..b69bdb1 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -86,6 +86,7 @@ struct stmmac_mdio_bus_data {
 struct stmmac_dma_cfg {
 	int pbl;
 	int fixed_burst;
+	int mixed_burst;
 	int burst_len;
 };
 
-- 
1.7.4.4

^ permalink raw reply related

* [net-next 1/2] stmmac: extend mac addr reg and fix perfect filtering
From: Giuseppe CAVALLARO @ 2012-05-07  9:12 UTC (permalink / raw)
  To: netdev; +Cc: Giuseppe Cavallaro, Gianni Antoniazzi

This patch extends the number of MAC address registers
from 16 to 32. Newer chips provide 16 additional registers,
which helps perfect filtering mode for unicast.

This patch also fixes the perfect filtering mode by setting the
bit 31 in the MAC address registers.

Signed-off-by: Gianni Antoniazzi <gianni.antoniazzi-ext@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
---
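For reference, a standalone sketch of how the high and low register values
are packed once the Address Enable bit is set. It mirrors the arithmetic in
stmmac_set_mac_addr() from this patch; the helper names mac_addr_high and
mac_addr_low are hypothetical, and the explicit casts are an addition to
keep the shifts well defined in portable C:

```c
#include <stdint.h>

#define GMAC_HI_REG_AE	0x80000000u	/* Address Enable, bit 31 */

/* Hypothetical helper: value written to the "high" address register. */
static uint32_t mac_addr_high(const uint8_t addr[6])
{
	return ((uint32_t)addr[5] << 8) | addr[4] | GMAC_HI_REG_AE;
}

/* Hypothetical helper: value written to the "low" address register. */
static uint32_t mac_addr_low(const uint8_t addr[6])
{
	return ((uint32_t)addr[3] << 24) | ((uint32_t)addr[2] << 16) |
	       ((uint32_t)addr[1] << 8) | addr[0];
}
```

For the address 00:11:22:33:44:55 this gives 0x80005544 for the high
register and 0x33221100 for the low one.
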
 drivers/net/ethernet/stmicro/stmmac/common.h       |    2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac1000.h    |    8 +++++---
 .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   |   11 +++++++++--
 .../net/ethernet/stmicro/stmmac/dwmac100_core.c    |    2 +-
 drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c    |    7 ++++++-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    1 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |    4 ++--
 7 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index f5dedcb..7164509 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -280,7 +280,7 @@ struct stmmac_ops {
 	/* Handle extra events on specific interrupts hw dependent */
 	void (*host_irq_status) (void __iomem *ioaddr);
 	/* Multicast filter setting */
-	void (*set_filter) (struct net_device *dev);
+	void (*set_filter) (struct net_device *dev, int id);
 	/* Flow control setting */
 	void (*flow_ctrl) (void __iomem *ioaddr, unsigned int duplex,
 			   unsigned int fc, unsigned int pause_time);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
index 54339a7..53ed56b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000.h
@@ -61,9 +61,11 @@ enum power_event {
 };
 
 /* GMAC HW ADDR regs */
-#define GMAC_ADDR_HIGH(reg)		(0x00000040+(reg * 8))
-#define GMAC_ADDR_LOW(reg)		(0x00000044+(reg * 8))
-#define GMAC_MAX_UNICAST_ADDRESSES	16
+#define GMAC_ADDR_HIGH(reg)	(((reg > 15) ? 0x00000800 : 0x00000040) + \
+				(reg * 8))
+#define GMAC_ADDR_LOW(reg)	(((reg > 15) ? 0x00000804 : 0x00000044) + \
+				(reg * 8))
+#define GMAC_MAX_PERFECT_ADDRESSES	32
 
 #define GMAC_AN_CTRL	0x000000c0	/* AN control */
 #define GMAC_AN_STATUS	0x000000c4	/* AN status */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index e7cbcd9..c32af05 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -84,10 +84,11 @@ static void dwmac1000_get_umac_addr(void __iomem *ioaddr, unsigned char *addr,
 				GMAC_ADDR_LOW(reg_n));
 }
 
-static void dwmac1000_set_filter(struct net_device *dev)
+static void dwmac1000_set_filter(struct net_device *dev, int id)
 {
 	void __iomem *ioaddr = (void __iomem *) dev->base_addr;
 	unsigned int value = 0;
+	unsigned int perfect_addr_number;
 
 	CHIP_DBG(KERN_INFO "%s: # mcasts %d, # unicast %d\n",
 		 __func__, netdev_mc_count(dev), netdev_uc_count(dev));
@@ -121,8 +122,14 @@ static void dwmac1000_set_filter(struct net_device *dev)
 		writel(mc_filter[1], ioaddr + GMAC_HASH_HIGH);
 	}
 
+	/* Extra 16 regs are available in cores newer than the 3.40. */
+	if (id > 34)
+		perfect_addr_number = GMAC_MAX_PERFECT_ADDRESSES;
+	else
+		perfect_addr_number = GMAC_MAX_PERFECT_ADDRESSES / 2;
+
 	/* Handle multiple unicast addresses (perfect filtering)*/
-	if (netdev_uc_count(dev) > GMAC_MAX_UNICAST_ADDRESSES)
+	if (netdev_uc_count(dev) > perfect_addr_number)
 		/* Switch to promiscuous mode is more than 16 addrs
 		   are required */
 		value |= GMAC_FRAME_FILTER_PR;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index efde50f..19e0f4e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -89,7 +89,7 @@ static void dwmac100_get_umac_addr(void __iomem *ioaddr, unsigned char *addr,
 	stmmac_get_mac_addr(ioaddr, addr, MAC_ADDR_HIGH, MAC_ADDR_LOW);
 }
 
-static void dwmac100_set_filter(struct net_device *dev)
+static void dwmac100_set_filter(struct net_device *dev, int id)
 {
 	void __iomem *ioaddr = (void __iomem *) dev->base_addr;
 	u32 value = readl(ioaddr + MAC_CONTROL);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
index f20aa12..3edbfa2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
@@ -31,6 +31,8 @@
 #define DWMAC_LIB_DBG(fmt, args...)  do { } while (0)
 #endif
 
+#define GMAC_HI_REG_AE		0x80000000
+
 /* CSR1 enables the transmit DMA to check for new descriptor */
 void dwmac_enable_dma_transmission(void __iomem *ioaddr)
 {
@@ -233,7 +235,10 @@ void stmmac_set_mac_addr(void __iomem *ioaddr, u8 addr[6],
 	unsigned long data;
 
 	data = (addr[5] << 8) | addr[4];
-	writel(data, ioaddr + high);
+	/* For MAC Addr registers we have to set the Address Enable (AE)
+	 * bit that has no effect on the High Reg 0 where the bit 31 (MO)
+	 * is RO. */
+	writel(data | GMAC_HI_REG_AE, ioaddr + high);
 	data = (addr[3] << 24) | (addr[2] << 16) | (addr[1] << 8) | addr[0];
 	writel(data, ioaddr + low);
 }
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index db2de9a..6b5d060 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -85,6 +85,7 @@ struct stmmac_priv {
 	struct clk *stmmac_clk;
 #endif
 	int clk_csr;
+	int synopsys_id;
 };
 
 extern int phyaddr;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1a4cf81..a9699ae 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1465,7 +1465,7 @@ static void stmmac_set_rx_mode(struct net_device *dev)
 	struct stmmac_priv *priv = netdev_priv(dev);
 
 	spin_lock(&priv->lock);
-	priv->hw->mac->set_filter(dev);
+	priv->hw->mac->set_filter(dev, priv->synopsys_id);
 	spin_unlock(&priv->lock);
 }
 
@@ -1806,7 +1806,7 @@ static int stmmac_hw_init(struct stmmac_priv *priv)
 	priv->hw->ring = &ring_mode_ops;
 
 	/* Get and dump the chip ID */
-	stmmac_get_synopsys_id(priv);
+	priv->synopsys_id = stmmac_get_synopsys_id(priv);
 
 	/* Get the HW capability (new GMAC newer than 3.50a) */
 	priv->hw_cap_support = stmmac_get_hw_features(priv);
-- 
1.7.4.4

^ permalink raw reply related

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Pablo Neira Ayuso @ 2012-05-07  9:03 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <201205071020.44449.hans.schillstrom@ericsson.com>

On Mon, May 07, 2012 at 10:20:42AM +0200, Hans Schillstrom wrote:
> On Monday 07 May 2012 00:57:38 Pablo Neira Ayuso wrote:
> > Hi Hans,
> > 
> > [...]
> > > > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > > >
> > > > > Yes why not, I can give it a try.
> > > > >
> > >
> > > I think we wait with this one..
> > 
> > I see. This is easy to add for the conntrack side, but it will require
> > some extra code for the packet-based solution.
> 
> Actually I think there is very little gain to spread with type 
> and then we must add a user mode possibility to turn it off 
> i.e. a --hmark-icmp-type-mask 
> 
> > Not directly related to this but, I know that your intention is to
> > make this as flexible as possible. However, I still don't find how I
> > would use the port mask feature in any of my setups.  Basically, I
> > don't come up with any useful example for this situation.
> 
> We have plenty of rules where just source port mask is zero.
> and the dest-port-mask is 0xfffc (or 0xffff)

0xffff and 0x0000 means on/off respectively.

Still curious, how can 0xfffc be useful?

> > I'm also telling this because I think that ICMP support will be
> > easier to add if port masking is removed.
> > 
> > [...]
> > > This is what I have done.
> > >
> > > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> > >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> > >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> > >   (it's not set in the rtuple)
> > 
> > Good one, this made the code even smaller.
> > 
> > > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> > 
> > Not really, you don't need for the conntrack part. The original tuple
> > is always the same, not matter where the packet is coming from. I have
> > removed this again so it only affects packet-based hashing.
> 
> Yes original tuple is always the same but not always less than the rtuple.
> If you have two nodes that should produce the same hmark,
> one with conntrack an one without you must make a compare to make it consistent.

I see, for consistency still makes sense although this seems to me
like still strange configuration. In what scenario would you use two
different approaches?

> > > - Moved the L3 check a little bit earlier.
> > 
> > good.
> > 
> > > - changed return values for fragments.
> > 
> > With this, you're giving up on trying to classify fragments. Do you
> > really want this?
> > 
> > From my point of view, if your firewalls (assuming they are the HMARK
> > classification) are stateless, it still makes sense to me to classify
> > fragments using the XT_HMARK_METHOD_L3_4.
> 
> I do agree, it is back to "return 0" again.

OK.

^ permalink raw reply

* Re: [PATCH RESEND 3/5] can: flexcan: adopt pinctrl support
From: Marc Kleine-Budde @ 2012-05-07  8:29 UTC (permalink / raw)
  To: Shawn Guo
  Cc: linux-arm-kernel, Arnd Bergmann, Olof Johansson, Sascha Hauer,
	Dong Aisheng, linux-can, Linux Netdev List
In-Reply-To: <CAAQ0ZWSzrfFm+=m+s8S3FDG5k_OrULN58VUVCBE=fwU0r=ZD+g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

On 05/07/2012 10:17 AM, Shawn Guo wrote:
> On 7 May 2012 16:06, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>> It doesn't compile yet against net-next/master, which is based on v3.4-rc4:
>>
>> /home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c: In
>> function 'flexcan_probe':
>> /home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c:937:
>> error: implicit declaration of function 'devm_pinctrl_get_select_default'
>> /home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c:937:
>> warning: assignment makes pointer from integer without a cast
>>
>> Which tree does this series depend on? If this should go over the
>> linux-can tree, I have to ask David first to merge this.
>>
> Thanks for the response, Marc.  The patch depends on pinctrl tree and
> a couple of other patches that will go through arm-soc tree, so I
> would like to ask for your ack to have the patch go over arm-soc tree.

Fine with me. David, is this okay? Shawn, better wait for David's okay.

Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>

regards, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Eric Dumazet @ 2012-05-07  8:25 UTC (permalink / raw)
  To: Deng-Cheng Zhu; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <4FA7815F.8030101@mips.com>

On Mon, 2012-05-07 at 16:01 +0800, Deng-Cheng Zhu wrote:

> Did you really read my patch and understand what I commented? When I was
> talking about using rps_sparse_flow (initially cpu_flow), neither
> rps_sock_flow_table nor rps_dev_flow_table is activated (number of
> entries: 0).

I read your patch and am concerned about performance issues when handling
a typical workload, say between 0.1 and 20 Mpps on current hardware.

The argument "oh its only selected when
CONFIG_RPS_SPARSE_FLOW_OPTIMIZATION is set" is wrong.

CONFIG_NR_RPS_MAP_LOOPS is wrong.

Your HZ timeout is yet another dark side of your patch. 

Your (flow->dev == skb->dev) test is wrong.

Your : flow->ts = now; is wrong (dirtying memory for each packet)

Really, I don't like your patch.

You are kindly asked to find another way to solve your problem, a
generic mechanism that can help others, not only you.

We do think activating RFS is the way to go. It's the standard layer we
added below RPS; it's configurable and it scales. It can be expanded at will
with configurable plugins.

For example, with single-queue NICs, it makes sense to select the cpu based
on the output device only, not on the rxhash by itself (a modulo or
something), to reduce false sharing and qdisc/device lock contention on the
tx path.

If your machine has 4 cpus and 4 NICs, you can instruct the RFS table to
prefer the cpu associated with the NIC that the packet will use for output.

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-07  8:20 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120506225738.GA23009@1984>

[-- Attachment #1: Type: text/plain, Size: 4731 bytes --]

On Monday 07 May 2012 00:57:38 Pablo Neira Ayuso wrote:
> Hi Hans,
> 
> [...]
> > > > > Regarding ICMP traffic, I think we can use the ID field for the
> > > > > hashing as well. Thus, we handle ICMP like other protocols.
> > > >
> > > > Yes why not, I can give it a try.
> > > >
> >
> > I think we wait with this one..
> 
> I see. This is easy to add for the conntrack side, but it will require
> some extra code for the packet-based solution.

Actually I think there is very little gain to spread with type 
and then we must add a user mode possibility to turn it off 
i.e. a --hmark-icmp-type-mask 

> Not directly related to this but, I know that your intention is to
> make this as flexible as possible. However, I still don't find how I
> would use the port mask feature in any of my setups.  Basically, I
> don't come up with any useful example for this situation.

We have plenty of rules where just source port mask is zero.
and the dest-port-mask is 0xfffc (or 0xffff)


> I'm also telling this because I think that ICMP support will be
> easier to add if port masking is removed.
> 
> [...]
> > This is what I have done.
> >
> > - I reduced the code size a little bit by combining the hmark_ct_set_htuple_ipvX into one func.
> >   by adding a hmark_addr6_mask() and hmark_addr_any_mask()
> >   Note that using "otuple->src.l3num" as param 1 in both src and dst is not a typo.
> >   (it's not set in the rtuple)
> 
> Good one, this made the code even smaller.
> 
> > - Made the if (dst < src) swap() in the hmark_hash() since it should be used by every caller.
> 
> Not really, you don't need for the conntrack part. The original tuple
> is always the same, not matter where the packet is coming from. I have
> removed this again so it only affects packet-based hashing.

Yes original tuple is always the same but not always less than the rtuple.
If you have two nodes that should produce the same hmark,
one with conntrack an one without you must make a compare to make it consistent.
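
The compare-and-swap under discussion can be sketched as follows: ordering
the two addresses before hashing makes the result independent of which one
arrived as source, so nodes that see the flow from different sides compute
the same value. This is a simplified model with 32-bit addresses only; mix2
is a stand-in mixing function chosen for the example, not the hash used by
the module, and symmetric_hash is a hypothetical name:

```c
#include <stdint.h>

/* Stand-in mixing function; an assumption, not the hash used by xt_HMARK. */
static uint32_t mix2(uint32_t a, uint32_t b)
{
	uint32_t h = a * 0x9e3779b1u;

	h ^= h >> 15;
	h += b * 0x85ebca6bu;
	h ^= h >> 13;
	return h;
}

/* Order the two addresses first so both directions give the same hash. */
static uint32_t symmetric_hash(uint32_t src, uint32_t dst)
{
	if (dst < src) {
		uint32_t tmp = src;

		src = dst;
		dst = tmp;
	}
	return mix2(src, dst);
}
```

Doing the swap inside the hash function, rather than in each caller, is what
keeps the conntrack and packet-based paths consistent with each other.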

> 
> > - Moved the L3 check a little bit earlier.
> 
> good.
> 
> > - changed return values for fragments.
> 
> With this, you're giving up on trying to classify fragments. Do you
> really want this?
> 
> From my point of view, if your firewalls (assuming they are the HMARK
> classification) are stateless, it still makes sense to me to classify
> fragments using the XT_HMARK_METHOD_L3_4.

I do agree, it is back to "return 0" again.

> 
> > - Added nhoffs to: hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
> >   to get icmp working
> 
> good catch.
> 
> Below, some minor changes that I made to your patch (you can find a
> new version enclosed to this email).
> 
> [...]
> > +#ifndef XT_HMARK_H_
> > +#define XT_HMARK_H_
> > +
> > +#include <linux/types.h>
> > +
> > +enum {
> > +     XT_HMARK_NONE,
> > +     XT_HMARK_SADR_AND,
> > +     XT_HMARK_DADR_AND,
> > +     XT_HMARK_SPI_AND,
> > +     XT_HMARK_SPI_OR,
> > +     XT_HMARK_SPORT_AND,
> > +     XT_HMARK_DPORT_AND,
> > +     XT_HMARK_SPORT_OR,
> > +     XT_HMARK_DPORT_OR,
> > +     XT_HMARK_PROTO_AND,
> > +     XT_HMARK_RND,
> > +     XT_HMARK_MODULUS,
> > +     XT_HMARK_OFFSET,
> > +     XT_HMARK_CT,
> > +     XT_HMARK_METHOD_L3,
> > +     XT_HMARK_METHOD_L3_4,
> > +     XT_F_HMARK_SADR_AND    = 1 << XT_HMARK_SADR_AND,
> > +     XT_F_HMARK_DADR_AND    = 1 << XT_HMARK_DADR_AND,
> > +     XT_F_HMARK_SPI_AND     = 1 << XT_HMARK_SPI_AND,
> > +     XT_F_HMARK_SPI_OR      = 1 << XT_HMARK_SPI_OR,
> > +     XT_F_HMARK_SPORT_AND   = 1 << XT_HMARK_SPORT_AND,
> > +     XT_F_HMARK_DPORT_AND   = 1 << XT_HMARK_DPORT_AND,
> > +     XT_F_HMARK_SPORT_OR    = 1 << XT_HMARK_SPORT_OR,
> > +     XT_F_HMARK_DPORT_OR    = 1 << XT_HMARK_DPORT_OR,
> > +     XT_F_HMARK_PROTO_AND   = 1 << XT_HMARK_PROTO_AND,
> > +     XT_F_HMARK_RND         = 1 << XT_HMARK_RND,
> > +     XT_F_HMARK_MODULUS     = 1 << XT_HMARK_MODULUS,
> > +     XT_F_HMARK_OFFSET      = 1 << XT_HMARK_OFFSET,
> > +     XT_F_HMARK_CT          = 1 << XT_HMARK_CT,
> > +     XT_F_HMARK_METHOD_L3   = 1 << XT_HMARK_METHOD_L3,
> > +     XT_F_HMARK_METHOD_L3_4 = 1 << XT_HMARK_METHOD_L3_4,
> 
> I've defined:
> 
> #define XT_HMARK_FLAG(flag) (1 << flag)
> 
> So we save all those extra _F_ defintions, they look redundant.

OK, I had to change the user mode code to keep up with this change...
The user code part is also included now.
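
For illustration, this is roughly how the single macro replaces the
per-option _F_ constants. The enum is copied from the header in this patch;
the macro here adds parentheses around its argument and uses an unsigned
constant, which is a small deviation from the posted definition, and
hmark_flag_is_set is a hypothetical helper:

```c
#include <stdint.h>

/* Option indices, in the same order as in xt_HMARK.h. */
enum {
	XT_HMARK_NONE,
	XT_HMARK_SADR_AND,
	XT_HMARK_DADR_AND,
	XT_HMARK_SPI_AND,
	XT_HMARK_SPI_OR,
	XT_HMARK_SPORT_AND,
	XT_HMARK_DPORT_AND,
	XT_HMARK_SPORT_OR,
	XT_HMARK_DPORT_OR,
	XT_HMARK_PROTO_AND,
	XT_HMARK_RND,
	XT_HMARK_MODULUS,
	XT_HMARK_OFFSET,
	XT_HMARK_CT,
	XT_HMARK_METHOD_L3,
	XT_HMARK_METHOD_L3_4,
};

/* Parenthesized, unsigned variant of the macro (an assumption). */
#define XT_HMARK_FLAG(flag)	(1u << (flag))

/* Hypothetical helper: test whether an option bit is present in flags. */
static int hmark_flag_is_set(uint32_t flags, int flag)
{
	return (flags & XT_HMARK_FLAG(flag)) != 0;
}
```

User space would then build the flags word with expressions such as
XT_HMARK_FLAG(XT_HMARK_CT) | XT_HMARK_FLAG(XT_HMARK_MODULUS).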

[snip]

>+static inline u32
>+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
>+{
>+       switch (l3num) {
              ^
Added a space here

>+       case AF_INET:
>+               return *addr32 & *mask;
>+       case AF_INET6:
>+               return hmark_addr6_mask(addr32, mask);


-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-patch, Size: 14126 bytes --]

From 04cc88b2eec677fd8eab3fbf620ed9209b883b8c Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Mon, 7 May 2012 08:33:08 +0200
Subject: [PATCH 1/1] netfilter: add xt_hmark target for hash-based skb marking

The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

Some examples:

* Default rule handles all TCP, UDP, SCTP, ESP & AH

 iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
	-j HMARK --hmark-offset 10000 --hmark-mod 10

* Handle SCTP, hash on the dest port only, and produce an nfmark between 100 and 119.

 iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
	--sp-mask 0 --offset 100 --mod 20

* Fragment-safe, Layer 3 only, keeping a class C network flow together

 iptables -t mangle -A PREROUTING -j HMARK --method L3 \
	--src-mask 24 --mod 20 --offset 100

[ A big part of this patch has been refactored by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
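To make the "between 100 and 119" range in the examples concrete, here is a
sketch of the mapping from hash to mark. It assumes a plain modulo followed
by the offset; the module may scale the hash differently, and the function
name hmark_from_hash is hypothetical:

```c
#include <stdint.h>

/*
 * Assumed mapping: plain modulo plus offset. The caller must make sure
 * modulus is non-zero.
 */
static uint32_t hmark_from_hash(uint32_t hash, uint32_t modulus,
				uint32_t offset)
{
	return (hash % modulus) + offset;
}
```

With a modulus of 20 and an offset of 100, every hash value lands in the
range 100 to 119 inclusive.
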
 include/linux/netfilter/xt_HMARK.h |   48 +++++
 net/netfilter/Kconfig              |   15 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  355 ++++++++++++++++++++++++++++++++++++
 4 files changed, 419 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag)	(1 << flag)
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..209c1ed 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,21 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which set the skbuff mark by means of hash calculation within a given
+	range. The nfmark can influence the routing method (see "Use netfilter
+	MARK value as routing key") and can also be used by other subsystems to
+	change their behaviour.
+
+	To compile it as a module, choose M here. If unsure, say N.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..6954d40
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,355 @@
+/*
+ * xt_HMARK - Netfilter module to set mark by means of hashing
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+	u32			src;
+	u32			dst;
+	union hmark_ports	uports;
+	uint8_t			proto;
+};
+
+static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
+{
+	return (addr32[0] & mask[0]) ^
+	       (addr32[1] & mask[1]) ^
+	       (addr32[2] & mask[2]) ^
+	       (addr32[3] & mask[3]);
+}
+
+static inline u32
+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+	switch (l3num) {
+	case AF_INET:
+		return *addr32 & *mask;
+	case AF_INET6:
+		return hmark_addr6_mask(addr32, mask);
+	}
+	return 0;
+}
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+		    const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = hmark_addr_mask(otuple->src.l3num, otuple->src.u3.all,
+				 info->src_mask.all);
+	t->dst = hmark_addr_mask(otuple->src.l3num, rtuple->src.u3.all,
+				 info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = otuple->src.u.all;
+		t->uports.p16.dst = rtuple->src.u.all;
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static inline u32
+hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	u32 hash;
+
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+	hash = hash ^ (t->proto & info->proto_mask);
+
+	return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	if (t->proto == IPPROTO_ESP || t->proto == IPPROTO_AH)
+		t->uports.v32 = (t->uports.v32 & info->spi_mask) |
+				info->spi_set;
+	else {
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct ipv6hdr *ip6, _ip6;
+	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	unsigned int nhoff = 0;
+	u16 fragoff = 0;
+	int nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return 0;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoff)) {
+		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+		if (ip6 == NULL)
+			return -1;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return -1;
+	}
+noicmp:
+	t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
+	t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nexthdr;
+
+	if (t->proto == IPPROTO_ICMPV6)
+		return 0;
+
+	if (flag & IP6T_FH_F_FRAG)
+		return 0;
+
+	hmark_set_tuple_ports(skb, nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+#endif
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL || icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct iphdr *ip, _ip;
+	int nhoff = skb_network_offset(skb);
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* use inner header in case of ICMP errors */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (ip == NULL)
+				return -1;
+		}
+	}
+
+	t->src = (__force u32) ip->saddr;
+	t->dst = (__force u32) ip->daddr;
+
+	t->src &= info->src_mask.ip;
+	t->dst &= info->dst_mask.ip;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = ip->protocol;
+
+	/* ICMP has no ports, skip */
+	if (t->proto == IPPROTO_ICMP)
+		return 0;
+
+	/* follow-up fragments don't contain ports, skip */
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return 0;
+
+	hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("xt_HMARK: hash modulus can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask &&
+	    (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))) {
+		pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV4,
+		.target		= hmark_tg_v4,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV6,
+		.target		= hmark_tg_v6,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+	return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
-- 
1.7.2.3
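
The kernel patch above orders the masked addresses, and separately the
ports, before hashing, so that both directions of a flow get the same mark.
A minimal sketch of that idea follows. It assumes a stand-in mixer,
toy_mix(), which is hypothetical and only replaces jhash_3words() for the
sake of a self-contained example; symmetric_hash() is likewise an
illustrative name, and byte order is ignored.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's jhash_3words(); any mixer
 * is good enough to demonstrate the symmetry property. */
static uint32_t toy_mix(uint32_t a, uint32_t b, uint32_t c, uint32_t seed)
{
	uint32_t h = seed;

	h = (h ^ a) * 0x9e3779b1u;
	h = (h ^ b) * 0x85ebca6bu;
	h = (h ^ c) * 0xc2b2ae35u;
	return h ^ (h >> 16);
}

/* Order the address pair and the port pair independently, so that
 * A->B and B->A feed identical input to the mixer. */
static uint32_t symmetric_hash(uint32_t src, uint32_t dst,
			       uint16_t sport, uint16_t dport, uint32_t seed)
{
	uint32_t lo_addr = src < dst ? src : dst;
	uint32_t hi_addr = src < dst ? dst : src;
	uint16_t lo_port = sport < dport ? sport : dport;
	uint16_t hi_port = sport < dport ? dport : sport;
	uint32_t ports = ((uint32_t)hi_port << 16) | lo_port;

	return toy_mix(lo_addr, hi_addr, ports, seed);
}
```

Calling symmetric_hash() with source and destination swapped returns the
same value, which is the property the mark relies on when no masks are set
or when the same masks are used for both sides.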


[-- Attachment #3: 0001-netfilter-userspace-part-for-target-HMARK.patch --]
[-- Type: text/x-patch, Size: 24699 bytes --]

From edcb596187a50172481d1e9fa11ae062337c69eb Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Mon, 7 May 2012 09:46:38 +0200
Subject: [PATCH 1/1] netfilter: userspace part for target HMARK

    The target allows you to create rules in the "raw" and "mangle" tables
    which alter the netfilter mark (nfmark) field within a given range.
    First a 32-bit hash value is generated, then reduced modulo <limit>, and
    finally an offset is added before it is written to nfmark.
    Prior to routing, the nfmark can influence the routing method (see
    "Use netfilter MARK value as routing key") and can also be used by
    other subsystems to change their behaviour.

    The mark match can also be used to match nfmark produced by this module.
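
As a rough illustration of the last step described above (hash, then
modulus, then offset), the sketch below shows how the final mark ends up in
the range [offset, offset + limit - 1]. The function name hmark_from_hash()
is made up for this note and is not part of the patch; the hash is simply
taken as an input here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reduce a 32-bit hash modulo `mod`, then add `offset`.
 * The result lies in [offset, offset + mod - 1], ignoring u32
 * wrap-around for very large offsets.
 */
static uint32_t hmark_from_hash(uint32_t hash, uint32_t mod, uint32_t offset)
{
	assert(mod != 0);	/* a zero modulus would divide by zero */
	return (hash % mod) + offset;
}
```

With a modulus of 20 and an offset of 100, for instance, every mark falls
between 100 and 119, whatever the hash value is.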
Ver 13
    Name change of defines.

Ver 12
    Reset option flag in some cases, where option is disabled by value.

Ver 10
    conntrack reduced to --hmark-ct switch
    renaming of vars in xt_hmark_info
    Added help text and updated the man page for the --hmark-ct switch

Ver 9
    Formatting changes.

Ver 8
    Syntax changes more descriptive options
    --hmark-method added.

Ver 6-7 -

Ver 5
      smask and dmask changed to length

Ver 4
      xtoptions used for parsing.

Ver 3
       -

Ver 2
      IPv4 NAT added
      iptables ver 1.4.12.1 adaptions.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c           |  510 ++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man         |   84 ++++++
 include/linux/netfilter/xt_HMARK.h |   48 ++++
 3 files changed, 642 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man
 create mode 100644 include/linux/netfilter/xt_HMARK.h

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..4b13cd3
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,510 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet, i.e. the packet that caused the error (if a
+ * sufficient amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "xtables.h"
+#include <linux/netfilter/xt_HMARK.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+#define XT_F_HMARK_L4_OPTS \
+		(XT_HMARK_FLAG(XT_HMARK_SPI_AND) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPI_OR) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT_AND) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT_OR) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT_AND) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT_OR) |\
+		 XT_HMARK_FLAG(XT_HMARK_PROTO_AND))
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-method <method>          Overall L3/L4 and fragment behavior\n"
+"                 L3                Fragment safe, do not use ports or proto\n"
+"                                   i.e. Fragments don't need special care.\n"
+"                 L3-4 (Default)    Fragment unsafe, use ports and proto\n"
+"                                   if defrag off in conntrack\n"
+"                                      no hmark on any part of a fragment\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                nfmark modulus value\n"
+"  --hmark-offset value             Last action add value to nfmark\n\n"
+" Fine tuning of what will be included in hash calculation\n"
+"  --hmark-src-mask length          Source address mask length\n"
+"  --hmark-dst-mask length          Dest address mask length\n"
+"  --hmark-sport-mask value         Mask src port with value\n"
+"  --hmark-dport-mask value         Mask dst port with value\n"
+"  --hmark-spi-mask value           For esp and ah AND spi with value\n"
+"  --hmark-sport-set value          OR src port with value\n"
+"  --hmark-dport-set value          OR dst port with value\n"
+"  --hmark-spi-set value            For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value         Mask Protocol with value\n"
+"  --hmark-rnd value                Initial random value for hash calc.\n"
+" For NAT in IPv4: src part from original/reply tuple will always be used\n"
+" i.e. orig src part will be used as src address/port.\n"
+"     reply src part will be used as dst address/port\n"
+" Make sure to qualify the rule in a proper way when using NAT flag\n"
+" When --ct is used only tracked connections will match\n"
+"  --hmark-ct                       Force conntrack orig and reply tuples as\n"
+"                                   source and destination.\n\n"
+" In many cases hmark can be omitted i.e. --src-mask can be used\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[] = {
+	{ .name  = "hmark-method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "hmark-src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "hmark-dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "hmark-sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "hmark-dport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "hmark-spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "hmark-sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "hmark-dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "hmark-spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "hmark-proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "hmark-rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",
+	  .type = XTTYPE_UINT32,
+	  .id = XT_HMARK_MODULUS,
+	  .min = 1,
+	  .flags = XTOPT_PUT | XTOPT_MAND,
+	  XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "hmark-offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "hmark-ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+
+	{ .name  = "method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "dport-mask", .type = XTTYPE_UINT16,
+	  .id = XT_HMARK_DPORT_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_mask)
+	},
+	{ .name  = "sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_OR,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, spi_set)
+	},
+	{ .name  = "proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_AND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name  = "mod",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_MODULUS,
+	  .min   = 1,
+	  .flags = XTOPT_PUT | XTOPT_MAND,
+	  XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb, int plen)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->port_set.v32 = 0;
+		info->flags = 0;
+		info->spi_set = 0;
+		info->hoffset = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SADR_AND:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SADR_AND);
+		break;
+	case XT_HMARK_DADR_AND:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DADR_AND);
+		break;
+	case XT_HMARK_SPI_AND:
+		info->spi_mask = htonl(cb->val.u32);
+		if (cb->val.u32 == 0xffffffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI_AND);
+		break;
+	case XT_HMARK_SPI_OR:
+		info->spi_set = htonl(cb->val.u32);
+		if (cb->val.u32 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI_OR);
+		break;
+	case XT_HMARK_SPORT_AND:
+		info->port_mask.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT_AND);
+		break;
+	case XT_HMARK_DPORT_AND:
+		info->port_mask.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT_AND);
+		break;
+	case XT_HMARK_SPORT_OR:
+		info->port_set.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT_OR);
+		break;
+	case XT_HMARK_DPORT_OR:
+		info->port_set.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT_OR);
+		break;
+	case XT_HMARK_PROTO_AND:
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_PROTO_AND);
+		break;
+	case XT_HMARK_MODULUS:
+		if (info->hmodulus == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "HMARK: modulus must not be 0, "
+				      "that would be a division by zero");
+			info->hmodulus = 0xffffffff;
+		}
+		break;
+	case XT_HMARK_METHOD_L3:
+		if (strcmp(cb->arg, "L3") == 0) {
+			info->proto_mask = 0;
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		} else if (strcmp(cb->arg, "L3-4") == 0) {
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3);
+			cb->xflags |= XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		}
+		break;
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_ip4_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 32);
+}
+static void HMARK_ip6_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 128);
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_HMARK_FLAG(XT_HMARK_MODULUS)))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-mod "
+			      "is not set, or is zero, which would divide by zero");
+	/* Check for invalid options */
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3) &&
+	   (cb->xflags & XT_F_HMARK_L4_OPTS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-method L3 "
+			      "cannot be combined with Layer 4 options: "
+			      "port, spi or proto");
+}
+/*
+ * Common print for IPv4 & IPv6
+ */
+static void HMARK_print(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf("method L3 ");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf("method L3-4 ");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_AND))
+			printf("sport-mask 0x%x ",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_AND))
+			printf("dport-mask 0x%x ",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_AND))
+			printf("spi-mask 0x%x ", htonl(info->spi_mask));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_OR))
+			printf("sport-set 0x%x ",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_OR))
+			printf("dport-set 0x%x ",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_OR))
+			printf("spi-set 0x%x ", htonl(info->spi_set));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_AND))
+			printf("proto-mask 0x%x ", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf("rnd 0x%x ", info->hashrnd);
+
+}
+
+static void HMARK_ip6_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf("src-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf("dst-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_print(info);
+}
+static void HMARK_ip4_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf("src-mask %s ",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf("dst-mask %s ",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_print(info);
+}
+static void HMARK_save(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf(" --hmark-method L3");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf(" --hmark-method L3-4");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_AND))
+			printf(" --hmark-sport-mask 0x%x",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_AND))
+			printf(" --hmark-dport-mask 0x%x",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_AND))
+			printf(" --hmark-spi-mask 0x%x",
+			       htonl(info->spi_mask));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_OR))
+			printf(" --hmark-sport-set 0x%x",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_OR))
+			printf(" --hmark-dport-set 0x%x",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_OR))
+			printf(" --hmark-spi-set 0x%x", htonl(info->spi_set));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_AND))
+			printf(" --hmark-proto-mask 0x%x", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf(" --hmark-mod 0x%x", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf(" --hmark-offset 0x%x", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf(" --hmark-ct");
+}
+
+static void HMARK_ip6_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf(" --hmark-src-mask %s",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_save(info);
+}
+
+static void HMARK_ip4_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_AND))
+		printf(" --hmark-src-mask %s",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_AND))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_save(info);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip4_print,
+		.save          = HMARK_ip4_save,
+		.x6_parse      = HMARK_ip4_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip6_print,
+		.save          = HMARK_ip6_save,
+		.x6_parse      = HMARK_ip6_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..c258e59
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,84 @@
+This module does the same as MARK, i.e. sets an fwmark, but the mark is based on a hash value.
+The hash is based on src-addr, dst-addr, sport, dport and proto. The same mark will be produced independent of direction if no masks are set or the same masks are used for src and dest.
+The hash mark can be adjusted by a modulus, and finally an offset can be added, i.e. the final mark will be within a range.
+An ICMP error will use the original message for the hash calculation, not the ICMP message itself.
+
+Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
+      IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
+      Default behavior is to completely ignore any fragment if it reaches hmark.
+      --hmark-method L3 is fragment safe since neither ports nor the L4 protocol field is used.
+      None of the parameters affect the packet itself, only the calculated hash value.
+
+.PP
+Parameters:
+Short hand methods
+.TP
+\fB\-\-hmark\-method\fP \fIL3\fP
+Do not use the L4 protocol field, ports or spi, only Layer 3 addresses; mask lengths
+for L3 addresses can still be used. Whether a packet is a fragment does not matter in
+this case since only L3 addresses are used in the calculation of the hash value.
+.TP
+\fB\-\-hmark\-method\fP \fIL3-4\fP (Default)
+Include L4 in the calculation of the hash value, i.e. all masks below are valid.
+Fragments will be ignored (i.e. no hash value is produced).
+.PP
+For all masks the default is all ones; to disable a field use mask 0.
+.TP
+\fB\-\-hmark\-src\-mask\fP \fIlength\fP
+The length of the mask to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dst\-mask\fP \fIlength\fP
+The length of the mask to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sport\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dport\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-ct\fP
+When this flag is set, conntrack data is used. Useful when the NAT-internal addresses should be used in the calculation.
+Be careful when using DNAT, since the mangle table is handled before the nat table: putting HMARK in the PREROUTING chain of the mangle table will not work as expected. The initial packet will have its hash based on the original address, while the rest of the flow will use the NATed address.
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32-bit initial value for the hash calculation; the default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue (must be > 0)\fP
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offset\fP \fIvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark\-offset 10000 \-\-hmark\-mod 10
+.PP
+Handle SCTP, hash on the destination port only, and produce an nfmark between 100 and 119.
+.IP
+iptables \-t mangle \-A PREROUTING \-p SCTP \-j HMARK \-\-hmark\-src\-mask 0 \-\-hmark\-dst\-mask 0
+ \-\-hmark\-sport\-mask 0 \-\-hmark\-offset 100 \-\-hmark\-mod 20
+.PP
+Fragment-safe, Layer 3 only rule that keeps the flows of a class C network together
+.IP
+iptables \-t mangle \-A PREROUTING \-j HMARK \-\-hmark\-method L3 \-\-hmark\-src\-mask 24 \-\-hmark\-mod 20 \-\-hmark\-offset 100
+
diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,48 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_AND,
+	XT_HMARK_DADR_AND,
+	XT_HMARK_SPI_AND,
+	XT_HMARK_SPI_OR,
+	XT_HMARK_SPORT_AND,
+	XT_HMARK_DPORT_AND,
+	XT_HMARK_SPORT_OR,
+	XT_HMARK_DPORT_OR,
+	XT_HMARK_PROTO_AND,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag)	(1 << (flag))
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			spi_mask;
+	__u32			spi_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
-- 
1.7.2.3
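
For readers following the man page text in this patch, here is a rough
user-space sketch of how the documented parameters might combine into a
mark. It is an illustration only, not the code from the patch: the names
flow_key, hmark_params, prefix_to_mask, mix and hmark_sketch are
hypothetical, the mixer is a simple FNV-1a stand-in rather than the hash
the kernel module uses, applying the port AND mask before the OR value is
an assumption, and only IPv4 with host-order addresses is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flattened view of the fields that get hashed (IPv4 only). */
struct flow_key {
	uint32_t saddr;		/* source address, host byte order */
	uint32_t daddr;		/* destination address, host byte order */
	uint16_t sport;
	uint16_t dport;
	uint8_t  proto;		/* L4 protocol number */
};

/* Hypothetical subset of the rule parameters from the man page text. */
struct hmark_params {
	unsigned int src_len;	/* --hmark-src-mask length, 0..32 */
	unsigned int dst_len;	/* --hmark-dst-mask length, 0..32 */
	uint16_t sport_mask;	/* AND masks for the ports */
	uint16_t dport_mask;
	uint16_t sport_set;	/* OR values for the ports */
	uint16_t dport_set;
	uint8_t  proto_mask;
	uint32_t rnd;		/* --hmark-rnd seed */
	uint32_t mod;		/* --hmark-mod, must be > 0 */
	uint32_t offset;	/* --hmark-offset */
};

/* Convert a prefix length to a host-order netmask; length 0 disables the field. */
static uint32_t prefix_to_mask(unsigned int len)
{
	if (len == 0)
		return 0;
	if (len >= 32)
		return 0xffffffffu;
	return ~(0xffffffffu >> len);
}

/* Stand-in mixer (FNV-1a over the four bytes of a word); NOT the kernel hash. */
static uint32_t mix(uint32_t h, uint32_t word)
{
	int i;

	for (i = 0; i < 4; i++) {
		h ^= (word >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Mask the fields, hash them, then apply "mod" followed by "offset". */
static uint32_t hmark_sketch(const struct flow_key *k,
			     const struct hmark_params *p)
{
	uint32_t src = k->saddr & prefix_to_mask(p->src_len);
	uint32_t dst = k->daddr & prefix_to_mask(p->dst_len);
	uint16_t sport = (k->sport & p->sport_mask) | p->sport_set;
	uint16_t dport = (k->dport & p->dport_mask) | p->dport_set;
	uint32_t h = p->rnd;

	assert(p->mod > 0);

	h = mix(h, src);
	h = mix(h, dst);
	h = mix(h, ((uint32_t)sport << 16) | dport);
	h = mix(h, (uint32_t)(k->proto & p->proto_mask));

	return h % p->mod + p->offset;
}
```

With the source, destination and source-port masks set to 0, mod 20 and
offset 100 (the SCTP example in the man page), two flows that differ only
in addresses and source port end up with the same mark, and that mark lies
in the range 100 to 119.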


^ permalink raw reply related

* Re: [PATCH RESEND 3/5] can: flexcan: adopt pinctrl support
From: Shawn Guo @ 2012-05-07  8:17 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: linux-arm-kernel, Arnd Bergmann, Olof Johansson, Sascha Hauer,
	Dong Aisheng, linux-can, Linux Netdev List
In-Reply-To: <4FA78298.4080401@pengutronix.de>

On 7 May 2012 16:06, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> It doesn't compile yet against net-next/master, which is based on v3.4-rc4:
>
> /home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c: In
> function 'flexcan_probe':
> /home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c:937:
> error: implicit declaration of function 'devm_pinctrl_get_select_default'
> /home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c:937:
> warning: assignment makes pointer from integer without a cast
>
> Which tree does this series depend on? If this should go over the
> linux-can tree, I have to ask David first to merge this.
>
Thanks for the response, Marc.  The patch depends on the pinctrl tree and
a couple of other patches that will go through the arm-soc tree, so I
would like to ask for your ack to have the patch go through arm-soc too.

Regards,
Shawn

^ permalink raw reply

* Re: [PATCH RESEND 3/5] can: flexcan: adopt pinctrl support
From: Marc Kleine-Budde @ 2012-05-07  8:06 UTC (permalink / raw)
  To: Shawn Guo
  Cc: linux-arm-kernel, Arnd Bergmann, Olof Johansson, Sascha Hauer,
	Dong Aisheng, linux-can, Linux Netdev List
In-Reply-To: <1336352040-28447-4-git-send-email-shawn.guo@linaro.org>

Hello,

On 05/07/2012 02:53 AM, Shawn Guo wrote:
> Cc: linux-can@vger.kernel.org
> Cc: Marc Kleine-Budde <mkl@pengutronix.de>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> ---

It doesn't compile yet against net-next/master, which is based on v3.4-rc4:

/home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c: In
function 'flexcan_probe':
/home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c:937:
error: implicit declaration of function 'devm_pinctrl_get_select_default'
/home/frogger/pengutronix/socketcan/linux/drivers/net/can/flexcan.c:937:
warning: assignment makes pointer from integer without a cast

Which tree does this series depend on? If this should go over the
linux-can tree, I have to ask David first to merge this.

regards,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



^ permalink raw reply

* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Deng-Cheng Zhu @ 2012-05-07  8:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <1336376282.3752.2252.camel@edumazet-glaptop>

On 05/07/2012 03:38 PM, Eric Dumazet wrote:
> On Mon, 2012-05-07 at 14:48 +0800, Deng-Cheng Zhu wrote:
>> On 05/04/2012 11:31 PM, Tom Herbert wrote:
>>>> I think the mechanisms of rps_dev_flow_table and cpu_flow (in this
>>>> patch) are different: The former works along with rps_sock_flow_table
>>>> whose CPU info is based on recvmsg by the application. But for the tests
>>>> like what I did, there's no application involved.
>>>>
>>> While rps_sock_flow_table is currently only managed by recvmsg, it
>>> still is the general mechanism that maps flows to CPUs for steering.
>>> There should be nothing preventing you from populating and managing
>>> entries in other ways.
>>
>> Well, even using rps_sock_flow_table to map the sparse flows to CPUs,
>> we still need a data structure to describe a single flow -- that's what
>> struct cpu_flow is doing. Besides, rps_sock_flow_table, by its meaning,
>> does not seem to make sense for our purpose. How about keeping the patch
>> as is but renaming struct cpu_flow to struct rps_sparse_flow? It's like:
>>
>
> sock_flow_table is about mapping a flow (by its rxhash) to a CPU.
>
> If you feel 'sock' is a bad name, you can rename it.
>
> You don't need to add a new data structure and code in the fast path.
>
> Only the first packet of a new flow might be handled by 'the wrong cpu'.
>
> If you add code in the forward path to change the flow table for the next
> packets, the added cost in the fast path is zero.

Did you really read my patch and understand what I commented? When I was
talking about using rps_sparse_flow (initially cpu_flow), neither
rps_sock_flow_table nor rps_dev_flow_table is activated (number of
entries: 0).

FYI below:

On 05/04/2012 11:39 AM, Deng-Cheng Zhu wrote:
 > On 05/04/2012 11:22 AM, Tom Herbert wrote:
 >>> +struct cpu_flow {
 >>> + struct net_device *dev;
 >>> + u32 rxhash;
 >>> + unsigned long ts;
 >>> +};
 >>
 >> This seems like overkill, we already have the rps_flow_table and this
 >> used in accelerated RFS so the device can also take advantage of
 >> steering. Maybe somehow program that table for your sparse flows?
 >
 > In fact I did try different values of rps_flow_cnt (apart from
 > rps_cpus, it is the only RPS-related tunable in sysfs, or am I
 > missing something?) and found no effect in my tests (iperf between 2
 > PCs via Malta, which works as a router and uses iptables/NAT+RPS)...


Deng-Cheng

^ permalink raw reply

* Re: [PATCH RESEND 0/5] Adopt pinctrl support for a few outstanding imx drivers
From: Dong Aisheng @ 2012-05-07  7:53 UTC (permalink / raw)
  To: Shawn Guo
  Cc: Dong Aisheng-B29396, linux-arm-kernel@lists.infradead.org,
	Arnd Bergmann, netdev@vger.kernel.org, Sascha Hauer, Wolfram Sang,
	linux-can@vger.kernel.org, Grant Likely, Marc Kleine-Budde,
	linux-i2c@vger.kernel.org, linux-serial@vger.kernel.org,
	Greg Kroah-Hartman, Olof Johansson,
	spi-devel-general@lists.sourceforge.net, Dong Aisheng,
	David S. Miller
In-Reply-To: <20120507073403.GG19389@S2101-09.ap.freescale.net>

On Mon, May 07, 2012 at 03:34:06PM +0800, Shawn Guo wrote:
> On Mon, May 07, 2012 at 02:50:02PM +0800, Dong Aisheng wrote:
> > Shouldn't we add the pinctrl states to the dts files together with
> > this patch series, or in a separate patch applied before this
> > series, to avoid breaking the existing mx6q platforms?
> > 
> Ah, I just noticed that your patch "ARM: imx: enable pinctrl dummy
> states" did not cover imx6q.  I think we should do the same for imx6q,
Yes, that was done to force people to add pinctrl states in the dts file
rather than use the dummy state, since mx6 has a pinctrl driver.

> so that we can separate dts update from the driver change.  When all
> imx6q boards' dts files get updated to have pins defined for the
> devices, we can then remove dummy state for imx6q.  Doing so will ease
> the pinctrl migration for those imx6q boards.
> 
Well, considering we have several mx6 boards, I think I am also fine
with this approach to ease the mx6q pinctrl migration.

> Will update your patch on my branch to have dummy state enabled for
> imx6q.
> 
Then go ahead.

Regards
Dong Aisheng


^ permalink raw reply

* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Eric Dumazet @ 2012-05-07  7:38 UTC (permalink / raw)
  To: Deng-Cheng Zhu; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <4FA77051.20804@mips.com>

On Mon, 2012-05-07 at 14:48 +0800, Deng-Cheng Zhu wrote:
> On 05/04/2012 11:31 PM, Tom Herbert wrote:
> >> I think the mechanisms of rps_dev_flow_table and cpu_flow (in this
> >> patch) are different: The former works along with rps_sock_flow_table
> >> whose CPU info is based on recvmsg by the application. But for the tests
> >> like what I did, there's no application involved.
> >>
> > While rps_sock_flow_table is currently only managed by recvmsg, it
> > still is the general mechanism that maps flows to CPUs for steering.
> > There should be nothing preventing you from populating and managing
> > entries in other ways.
> 
> Well, even using rps_sock_flow_table to map the sparse flows to CPUs,
> we still need a data structure to describe a single flow -- that's what
> struct cpu_flow is doing. Besides, rps_sock_flow_table, by its meaning,
> does not seem to make sense for our purpose. How about keeping the patch
> as is but renaming struct cpu_flow to struct rps_sparse_flow? It's like:
> 

sock_flow_table is about mapping a flow (by its rxhash) to a CPU.

If you feel 'sock' is a bad name, you can rename it.

You don't need to add a new data structure and code in the fast path.

Only the first packet of a new flow might be handled by 'the wrong cpu'.

If you add code in the forward path to change the flow table for the next
packets, the added cost in the fast path is zero.
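
As a rough illustration of the idea above (a table indexed by rxhash that
the forward path updates and the receive path reads), here is a minimal
user-space sketch. It is not the kernel's rps_sock_flow_table code: the
names flow_table, flow_table_init, flow_table_record and flow_table_lookup,
the table size, and the NO_CPU marker are all assumptions made for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define FLOW_ENTRIES	256	/* hypothetical size, must be a power of two */
#define NO_CPU		0xffff	/* hypothetical "no CPU recorded" marker */

struct flow_table {
	uint32_t mask;			/* FLOW_ENTRIES - 1 */
	uint16_t ents[FLOW_ENTRIES];	/* desired CPU per hash bucket */
};

static void flow_table_init(struct flow_table *t)
{
	unsigned int i;

	assert((FLOW_ENTRIES & (FLOW_ENTRIES - 1)) == 0);
	t->mask = FLOW_ENTRIES - 1;
	for (i = 0; i < FLOW_ENTRIES; i++)
		t->ents[i] = NO_CPU;
}

/* Slow(er) path: called once the desired CPU for a flow is known. */
static void flow_table_record(struct flow_table *t, uint32_t rxhash,
			      uint16_t cpu)
{
	t->ents[rxhash & t->mask] = cpu;
}

/* Fast path: one masked array read, no per-flow structure to walk. */
static uint16_t flow_table_lookup(const struct flow_table *t, uint32_t rxhash,
				  uint16_t fallback_cpu)
{
	uint16_t cpu = t->ents[rxhash & t->mask];

	return cpu == NO_CPU ? fallback_cpu : cpu;
}
```

In this sketch only the first packet of a flow sees the fallback CPU; once
flow_table_record() has run, later packets with the same rxhash are steered
by a single array lookup. Flows whose hashes collide in the low bits share
a bucket, which is the usual trade-off of such a table.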

^ permalink raw reply

