Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH 07/25] ipvs: DH scheduler does not need GFP_ATOMIC allocation
From: pablo @ 2012-05-08  7:49 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336463394-3119-1-git-send-email-pablo@netfilter.org>

From: Julian Anastasov <ja@ssi.bg>

	Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_dh.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 1a53a7a..8b7dca9 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -149,7 +149,7 @@ static int ip_vs_dh_init_svc(struct ip_vs_service *svc)
 
 	/* allocate the DH table for this service */
 	tbl = kmalloc(sizeof(struct ip_vs_dh_bucket)*IP_VS_DH_TAB_SIZE,
-		      GFP_ATOMIC);
+		      GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;
 
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH 02/25] netfilter: nf_ct_helper: allow to disable automatic helper assignment
From: pablo @ 2012-05-08  7:49 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336463394-3119-1-git-send-email-pablo@netfilter.org>

From: Eric Leblond <eric@regit.org>

This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_conntrack_helper.h |    4 +-
 include/net/netns/conntrack.h               |    3 +
 net/netfilter/nf_conntrack_core.c           |   15 ++--
 net/netfilter/nf_conntrack_helper.c         |  108 ++++++++++++++++++++++++---
 4 files changed, 109 insertions(+), 21 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_helper.h b/include/net/netfilter/nf_conntrack_helper.h
index 5767dc2..1d18894 100644
--- a/include/net/netfilter/nf_conntrack_helper.h
+++ b/include/net/netfilter/nf_conntrack_helper.h
@@ -60,8 +60,8 @@ static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct)
 	return nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
 }
 
-extern int nf_conntrack_helper_init(void);
-extern void nf_conntrack_helper_fini(void);
+extern int nf_conntrack_helper_init(struct net *net);
+extern void nf_conntrack_helper_fini(struct net *net);
 
 extern int nf_conntrack_broadcast_help(struct sk_buff *skb,
 				       unsigned int protoff,
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 7a911ec..a053a19 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -26,11 +26,14 @@ struct netns_ct {
 	int			sysctl_tstamp;
 	int			sysctl_checksum;
 	unsigned int		sysctl_log_invalid; /* Log invalid packets */
+	int			sysctl_auto_assign_helper;
+	bool			auto_assign_helper_warned;
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*sysctl_header;
 	struct ctl_table_header	*acct_sysctl_header;
 	struct ctl_table_header	*tstamp_sysctl_header;
 	struct ctl_table_header	*event_sysctl_header;
+	struct ctl_table_header	*helper_sysctl_header;
 #endif
 	char			*slabname;
 };
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index cf0747c..32c5909 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1336,7 +1336,6 @@ static void nf_conntrack_cleanup_init_net(void)
 	while (untrack_refs() > 0)
 		schedule();
 
-	nf_conntrack_helper_fini();
 	nf_conntrack_proto_fini();
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 	nf_ct_extend_unregister(&nf_ct_zone_extend);
@@ -1354,6 +1353,7 @@ static void nf_conntrack_cleanup_net(struct net *net)
 	}
 
 	nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
+	nf_conntrack_helper_fini(net);
 	nf_conntrack_timeout_fini(net);
 	nf_conntrack_ecache_fini(net);
 	nf_conntrack_tstamp_fini(net);
@@ -1504,10 +1504,6 @@ static int nf_conntrack_init_init_net(void)
 	if (ret < 0)
 		goto err_proto;
 
-	ret = nf_conntrack_helper_init();
-	if (ret < 0)
-		goto err_helper;
-
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 	ret = nf_ct_extend_register(&nf_ct_zone_extend);
 	if (ret < 0)
@@ -1525,10 +1521,8 @@ static int nf_conntrack_init_init_net(void)
 
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 err_extend:
-	nf_conntrack_helper_fini();
-#endif
-err_helper:
 	nf_conntrack_proto_fini();
+#endif
 err_proto:
 	return ret;
 }
@@ -1589,9 +1583,14 @@ static int nf_conntrack_init_net(struct net *net)
 	ret = nf_conntrack_timeout_init(net);
 	if (ret < 0)
 		goto err_timeout;
+	ret = nf_conntrack_helper_init(net);
+	if (ret < 0)
+		goto err_helper;
 
 	return 0;
 
+err_helper:
+	nf_conntrack_timeout_fini(net);
 err_timeout:
 	nf_conntrack_ecache_fini(net);
 err_ecache:
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 436b7cb..55234dd 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -34,6 +34,66 @@ static struct hlist_head *nf_ct_helper_hash __read_mostly;
 static unsigned int nf_ct_helper_hsize __read_mostly;
 static unsigned int nf_ct_helper_count __read_mostly;
 
+static bool nf_ct_auto_assign_helper __read_mostly = true;
+module_param_named(nf_conntrack_helper, nf_ct_auto_assign_helper, bool, 0644);
+MODULE_PARM_DESC(nf_conntrack_helper,
+		 "Enable automatic conntrack helper assignment (default 1)");
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table helper_sysctl_table[] = {
+	{
+		.procname	= "nf_conntrack_helper",
+		.data		= &init_net.ct.sysctl_auto_assign_helper,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{}
+};
+
+static int nf_conntrack_helper_init_sysctl(struct net *net)
+{
+	struct ctl_table *table;
+
+	table = kmemdup(helper_sysctl_table, sizeof(helper_sysctl_table),
+			GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	table[0].data = &net->ct.sysctl_auto_assign_helper;
+
+	net->ct.helper_sysctl_header = register_net_sysctl_table(net,
+			nf_net_netfilter_sysctl_path, table);
+	if (!net->ct.helper_sysctl_header) {
+		pr_err("nf_conntrack_helper: can't register to sysctl.\n");
+		goto out_register;
+	}
+	return 0;
+
+out_register:
+	kfree(table);
+out:
+	return -ENOMEM;
+}
+
+static void nf_conntrack_helper_fini_sysctl(struct net *net)
+{
+	struct ctl_table *table;
+
+	table = net->ct.helper_sysctl_header->ctl_table_arg;
+	unregister_net_sysctl_table(net->ct.helper_sysctl_header);
+	kfree(table);
+}
+#else
+static int nf_conntrack_helper_init_sysctl(struct net *net)
+{
+	return 0;
+}
+
+static void nf_conntrack_helper_fini_sysctl(struct net *net)
+{
+}
+#endif /* CONFIG_SYSCTL */
 
 /* Stupid hash, but collision free for the default registrations of the
  * helpers currently in the kernel. */
@@ -118,6 +178,7 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 {
 	struct nf_conntrack_helper *helper = NULL;
 	struct nf_conn_help *help;
+	struct net *net = nf_ct_net(ct);
 	int ret = 0;
 
 	if (tmpl != NULL) {
@@ -127,8 +188,17 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
 	}
 
 	help = nfct_help(ct);
-	if (helper == NULL)
+	if (net->ct.sysctl_auto_assign_helper && helper == NULL) {
 		helper = __nf_ct_helper_find(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+		if (unlikely(!net->ct.auto_assign_helper_warned && helper)) {
+			pr_info("nf_conntrack: automatic helper "
+				"assignment is deprecated and it will "
+				"be removed soon. Use the iptables CT target "
+				"to attach helpers instead.\n");
+			net->ct.auto_assign_helper_warned = true;
+		}
+	}
+
 	if (helper == NULL) {
 		if (help)
 			RCU_INIT_POINTER(help->helper, NULL);
@@ -315,28 +385,44 @@ static struct nf_ct_ext_type helper_extend __read_mostly = {
 	.id	= NF_CT_EXT_HELPER,
 };
 
-int nf_conntrack_helper_init(void)
+int nf_conntrack_helper_init(struct net *net)
 {
 	int err;
 
-	nf_ct_helper_hsize = 1; /* gets rounded up to use one page */
-	nf_ct_helper_hash = nf_ct_alloc_hashtable(&nf_ct_helper_hsize, 0);
-	if (!nf_ct_helper_hash)
-		return -ENOMEM;
+	net->ct.auto_assign_helper_warned = false;
+	net->ct.sysctl_auto_assign_helper = nf_ct_auto_assign_helper;
 
-	err = nf_ct_extend_register(&helper_extend);
+	if (net_eq(net, &init_net)) {
+		nf_ct_helper_hsize = 1; /* gets rounded up to use one page */
+		nf_ct_helper_hash =
+			nf_ct_alloc_hashtable(&nf_ct_helper_hsize, 0);
+		if (!nf_ct_helper_hash)
+			return -ENOMEM;
+
+		err = nf_ct_extend_register(&helper_extend);
+		if (err < 0)
+			goto err1;
+	}
+
+	err = nf_conntrack_helper_init_sysctl(net);
 	if (err < 0)
-		goto err1;
+		goto out_sysctl;
 
 	return 0;
 
+out_sysctl:
+	if (net_eq(net, &init_net))
+		nf_ct_extend_unregister(&helper_extend);
 err1:
 	nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
 	return err;
 }
 
-void nf_conntrack_helper_fini(void)
+void nf_conntrack_helper_fini(struct net *net)
 {
-	nf_ct_extend_unregister(&helper_extend);
-	nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+	nf_conntrack_helper_fini_sysctl(net);
+	if (net_eq(net, &init_net)) {
+		nf_ct_extend_unregister(&helper_extend);
+		nf_ct_free_hashtable(nf_ct_helper_hash, nf_ct_helper_hsize);
+	}
 }
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH 01/25] netfilter: nf_ct_ecache: refactor notifier registration
From: pablo @ 2012-05-08  7:49 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1336463394-3119-1-git-send-email-pablo@netfilter.org>

From: Tony Zelenoff <antonz@parallels.com>

* ret variable initialization removed as useless
* similar code strings concatenated and functions code
  flow became more plain

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_ecache.c |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index 5bd3047d..3a3409f 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -84,7 +84,7 @@ EXPORT_SYMBOL_GPL(nf_ct_deliver_cached_events);
 int nf_conntrack_register_notifier(struct net *net,
 				   struct nf_ct_event_notifier *new)
 {
-	int ret = 0;
+	int ret;
 	struct nf_ct_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
@@ -95,8 +95,7 @@ int nf_conntrack_register_notifier(struct net *net,
 		goto out_unlock;
 	}
 	rcu_assign_pointer(net->ct.nf_conntrack_event_cb, new);
-	mutex_unlock(&nf_ct_ecache_mutex);
-	return ret;
+	ret = 0;
 
 out_unlock:
 	mutex_unlock(&nf_ct_ecache_mutex);
@@ -121,7 +120,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_unregister_notifier);
 int nf_ct_expect_register_notifier(struct net *net,
 				   struct nf_exp_event_notifier *new)
 {
-	int ret = 0;
+	int ret;
 	struct nf_exp_event_notifier *notify;
 
 	mutex_lock(&nf_ct_ecache_mutex);
@@ -132,8 +131,7 @@ int nf_ct_expect_register_notifier(struct net *net,
 		goto out_unlock;
 	}
 	rcu_assign_pointer(net->ct.nf_expect_event_cb, new);
-	mutex_unlock(&nf_ct_ecache_mutex);
-	return ret;
+	ret = 0;
 
 out_unlock:
 	mutex_unlock(&nf_ct_ecache_mutex);
-- 
1.7.9.5


^ permalink raw reply related

* [PATCH 00/25] [v2] netfilter updates for net-next (upcoming 3.5)
From: pablo @ 2012-05-08  7:49 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>

Hi David,

Second version including requested updates.

The following patchset contains the Netfilter updates for net-next.
Most notably:

* The new /proc/sys/net/netfilter/nf_conntrack_helper entry that
  allows to disable the automatic conntrack helper assignment from
  Eric Leblond. This patch also spots a warning to inform the user
  that this behaviour will be removed at some point. The automatic
  conntrack helper assignment may allows attackers to open hole in
  the firewall to access the protected network segments (with
  incorrect configurations). More information on this issue at:

  https://home.regit.org/netfilter-en/secure-use-of-helpers/

  In the near future, all conntrack helpers will be explicitly
  attached via the CT target, as we longing discussed during
  the last netfilter workshop.

* One new sysctl to translate the input device to vlan device name
  from Florian Westphal. He required this to get the REDIRECT target
  working with another sysctl vlan-on-top-of-bridge.

* Major improvements in the ip_vs_sync code from Julian Anastasov.
  They aim to improve scalability and to address possible message
  loss due to socket overrun under high rate of synchronization
  messages.

* Several minor memory allocation flags fixes from IPVS people
  contributors.

* Eric Leblond's patch spotted one problem that becomes noticeable
  if a) automatic helper assignment is disabled, and b) if NAT is
  in use, and c) the CT target is used to attach a non-standard
  conntrack helper port. This fix comes from myself.

* One small update to allow updating the expectation timeout from
  Kelvie Wong.

* Finally, remove ip[6]_queue support since they have been marked
  as obsolete since long time ago. Now, we have nfnetlink_queue
  which is way more flexible from myself.

You can pull these changes from:

git://1984.lsi.us.es/net-next master

If time allows, I'd like to send a second batch. There a several patches
that are very close to get into shape still on netfilter-devel.

Thanks!

Eric Dumazet (1):
  netfilter: nf_conntrack: use this_cpu_inc()

Eric Leblond (1):
  netfilter: nf_ct_helper: allow to disable automatic helper assignment

Florian Westphal (1):
  netfilter: bridge: optionally set indev to vlan

H Hartley Sweeten (2):
  ipvs: ip_vs_ftp: local functions should not be exposed globally
  ipvs: ip_vs_proto: local functions should not be exposed globally

Hans Schillstrom (1):
  net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync

Julian Anastasov (14):
  ipvs: timeout tables do not need GFP_ATOMIC allocation
  ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init
  ipvs: DH scheduler does not need GFP_ATOMIC allocation
  ipvs: WRR scheduler does not need GFP_ATOMIC allocation
  ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init
  ipvs: SH scheduler does not need GFP_ATOMIC allocation
  ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server
  ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
  ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter
  ipvs: always update some of the flags bits in backup
  ipvs: wakeup master thread
  ipvs: reduce sync rate with time thresholds
  ipvs: add support for sync threads
  ipvs: optimize the use of flags in ip_vs_bind_dest

Kelvie Wong (1):
  netfilter: nf_ct_expect: partially implement ctnetlink_change_expect

Pablo Neira Ayuso (2):
  netfilter: nf_conntrack: fix explicit helper attachment and NAT
  netfilter: remove ip_queue support

Sasha Levin (1):
  ipvs: use GFP_KERNEL allocation where possible

Tony Zelenoff (1):
  netfilter: nf_ct_ecache: refactor notifier registration

 Documentation/ABI/removed/ip_queue            |    9 +
 Documentation/networking/ip-sysctl.txt        |   13 +-
 include/linux/ip_vs.h                         |    5 +
 include/linux/netfilter/nf_conntrack_common.h |    4 +
 include/linux/netfilter_ipv4/Kbuild           |    1 -
 include/linux/netfilter_ipv4/ip_queue.h       |   72 ---
 include/linux/netlink.h                       |    2 +-
 include/net/ip_vs.h                           |   87 +++-
 include/net/netfilter/nf_conntrack.h          |   10 +-
 include/net/netfilter/nf_conntrack_helper.h   |    4 +-
 include/net/netns/conntrack.h                 |    3 +
 net/bridge/br_netfilter.c                     |   26 +-
 net/core/sock.c                               |    2 +
 net/ipv4/netfilter/Makefile                   |    3 -
 net/ipv4/netfilter/ip_queue.c                 |  639 ------------------------
 net/ipv6/netfilter/Kconfig                    |   22 -
 net/ipv6/netfilter/Makefile                   |    1 -
 net/ipv6/netfilter/ip6_queue.c                |  641 ------------------------
 net/netfilter/ipvs/ip_vs_conn.c               |   69 ++-
 net/netfilter/ipvs/ip_vs_core.c               |   30 +-
 net/netfilter/ipvs/ip_vs_ctl.c                |   70 ++-
 net/netfilter/ipvs/ip_vs_dh.c                 |    2 +-
 net/netfilter/ipvs/ip_vs_ftp.c                |    2 +-
 net/netfilter/ipvs/ip_vs_lblc.c               |    2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c              |    2 +-
 net/netfilter/ipvs/ip_vs_proto.c              |    6 +-
 net/netfilter/ipvs/ip_vs_sh.c                 |    2 +-
 net/netfilter/ipvs/ip_vs_sync.c               |  662 +++++++++++++++++--------
 net/netfilter/ipvs/ip_vs_wrr.c                |    2 +-
 net/netfilter/nf_conntrack_core.c             |   15 +-
 net/netfilter/nf_conntrack_ecache.c           |   10 +-
 net/netfilter/nf_conntrack_helper.c           |  120 ++++-
 net/netfilter/nf_conntrack_netlink.c          |   10 +-
 security/selinux/nlmsgtab.c                   |   13 -
 34 files changed, 853 insertions(+), 1708 deletions(-)
 create mode 100644 Documentation/ABI/removed/ip_queue
 delete mode 100644 include/linux/netfilter_ipv4/ip_queue.h
 delete mode 100644 net/ipv4/netfilter/ip_queue.c
 delete mode 100644 net/ipv6/netfilter/ip6_queue.c

-- 
1.7.9.5

^ permalink raw reply

* Re: [v12 PATCH 2/3] NETFILTER module xt_hmark, new target for HASH based fwmark
From: Hans Schillstrom @ 2012-05-08  7:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: kaber@trash.net, jengelh@medozas.de,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	hans@schillstrom.com
In-Reply-To: <20120507122232.GA32146@1984>

[-- Attachment #1: Type: text/plain, Size: 1517 bytes --]

On Monday 07 May 2012 14:22:32 Pablo Neira Ayuso wrote:
> On Mon, May 07, 2012 at 02:09:46PM +0200, Hans Schillstrom wrote:
> > On Monday 07 May 2012 13:56:12 Pablo Neira Ayuso wrote:
> > > On Mon, May 07, 2012 at 11:14:34AM +0200, Hans Schillstrom wrote:
> > > > > > We have plenty of rules where just source port mask is zero.
> > > > > > and the dest-port-mask is 0xfffc (or 0xffff)
> > > > > 
> > > > > 0xffff and 0x0000 means on/off respectively.
> > > > > 
> > > > > Still curious, how can 0xfffc be useful?
> > > > 
> > > > That's a special case where an appl is using 4 ports.
> > > > But in general, have not seen other than "on/off" except for above.
> > > 
> > > I see. Well I'm fine with this way to switch on/off things, just
> > > wanted some clafication.
> > > 
> > > Still one final thing I'd like to remove before inclusion:
> > > 
> > > +       union hmark_ports       port_mask;
> > > +       union hmark_ports       port_set;
> > > +       __u32                   spi_mask;
> > > +       __u32                   spi_set;
> > > 
> > > the spi_mask seems redundant. The port_mask already provides u32 for
> > > it.
> > 
> > No problems, I'll remove it.
> 

Done,

> OK. As a nice side-effect, this will lead to removing the branch that
> tests ESP/AH in hmark_set_tuple_ports.

Yes,

[snip]
> remove all trailing _OR
> rename all _AND by _MASK.
Done

[snip]
> iptables can stop this by spotting a warning message from user-space.
Done.


-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

[-- Attachment #2: 0001-netfilter-add-xt_hmark-target-for-hash-based-skb-mar.patch --]
[-- Type: text/x-patch, Size: 14512 bytes --]

From d5065af3988cc7561a02f30bae8342e1a89126a4 Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Wed, 2 May 2012 07:49:47 +0000
Subject: netfilter: add xt_hmark target for hash-based skb
 marking

The target allows you to create rules in the "raw" and "mangle" tables
which set the skbuff mark by means of hash calculation within a given
range. The nfmark can influence the routing method (see "Use netfilter
MARK value as routing key") and can also be used by other subsystems to
change their behaviour.

Some examples:

* Default rule handles all TCP, UDP, SCTP, ESP & AH

 iptables -t mangle -A PREROUTING -m state --state NEW,ESTABLISHED,RELATED \
	-j HMARK --hmark-offset 10000 --hmark-mod 10

* Handle SCTP and hash dest port only and produce a nfmark between 100-119.

 iptables -t mangle -A PREROUTING -p SCTP -j HMARK --src-mask 0 --dst-mask 0 \
	--sp-mask 0 --offset 100 --mod 20

* Fragment safe Layer 3 only, that keep a class C network flow together

 iptables -t mangle -A PREROUTING -j HMARK --method L3 \
	--src-mask 24 --mod 20 --offset 100

[ A big part of this patch has been refactorized by Pablo Neira Ayuso ]

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/linux/netfilter/xt_HMARK.h |   48 +++++
 net/netfilter/Kconfig              |   15 ++
 net/netfilter/Makefile             |    1 +
 net/netfilter/xt_HMARK.c           |  358 ++++++++++++++++++++++++++++++++++++
 4 files changed, 422 insertions(+)
 create mode 100644 include/linux/netfilter/xt_HMARK.h
 create mode 100644 net/netfilter/xt_HMARK.c

diff --git a/include/linux/netfilter/xt_HMARK.h b/include/linux/netfilter/xt_HMARK.h
new file mode 100644
index 0000000..05e43ba
--- /dev/null
+++ b/include/linux/netfilter/xt_HMARK.h
@@ -0,0 +1,46 @@
+#ifndef XT_HMARK_H_
+#define XT_HMARK_H_
+
+#include <linux/types.h>
+
+enum {
+	XT_HMARK_NONE,
+	XT_HMARK_SADR_MASK,
+	XT_HMARK_DADR_MASK,
+	XT_HMARK_SPI_MASK,
+	XT_HMARK_SPI,
+	XT_HMARK_SPORT_MASK,
+	XT_HMARK_DPORT_MASK,
+	XT_HMARK_SPORT,
+	XT_HMARK_DPORT,
+	XT_HMARK_PROTO_MASK,
+	XT_HMARK_RND,
+	XT_HMARK_MODULUS,
+	XT_HMARK_OFFSET,
+	XT_HMARK_CT,
+	XT_HMARK_METHOD_L3,
+	XT_HMARK_METHOD_L3_4,
+};
+#define XT_HMARK_FLAG(flag)	(1 << flag)
+
+union hmark_ports {
+	struct {
+		__u16	src;
+		__u16	dst;
+	} p16;
+	__u32	v32;
+};
+
+struct xt_hmark_info {
+	union nf_inet_addr	src_mask;	/* Source address mask */
+	union nf_inet_addr	dst_mask;	/* Dest address mask */
+	union hmark_ports	port_mask;
+	union hmark_ports	port_set;
+	__u32			flags;		/* Print out only */
+	__u16			proto_mask;	/* L4 Proto mask */
+	__u32			hashrnd;
+	__u32			hmodulus;	/* Modulus */
+	__u32			hoffset;	/* Offset */
+};
+
+#endif /* XT_HMARK_H_ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 0c6f67e..209c1ed 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -509,6 +509,21 @@ config NETFILTER_XT_TARGET_HL
 	since you can easily create immortal packets that loop
 	forever on the network.
 
+config NETFILTER_XT_TARGET_HMARK
+	tristate '"HMARK" target support'
+	depends on (IP6_NF_IPTABLES || IP6_NF_IPTABLES=n)
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds the "HMARK" target.
+
+	The target allows you to create rules in the "raw" and "mangle" tables
+	which set the skbuff mark by means of hash calculation within a given
+	range. The nfmark can influence the routing method (see "Use netfilter
+	MARK value as routing key") and can also be used by other subsystems to
+	change their behaviour.
+
+	To compile it as a module, choose M here. If unsure, say N.
+
 config NETFILTER_XT_TARGET_IDLETIMER
 	tristate  "IDLETIMER target support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca36765..4e7960c 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_CONNSECMARK) += xt_CONNSECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_CT) += xt_CT.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_DSCP) += xt_DSCP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_HL) += xt_HL.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_HMARK) += xt_HMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LED) += xt_LED.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_LOG) += xt_LOG.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_NFLOG) += xt_NFLOG.o
diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
new file mode 100644
index 0000000..b4aa912
--- /dev/null
+++ b/net/netfilter/xt_HMARK.c
@@ -0,0 +1,362 @@
+/*
+ * xt_HMARK - Netfilter module to set mark by means of hashing
+ *
+ * (C) 2012 by Hans Schillstrom <hans.schillstrom@ericsson.com>
+ * (C) 2012 by Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/icmp.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_HMARK.h>
+
+#include <net/ip.h>
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack.h>
+#endif
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#include <net/ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Hans Schillstrom <hans.schillstrom@ericsson.com>");
+MODULE_DESCRIPTION("Xtables: packet marking using hash calculation");
+MODULE_ALIAS("ipt_HMARK");
+MODULE_ALIAS("ip6t_HMARK");
+
+struct hmark_tuple {
+	u32			src;
+	u32			dst;
+	union hmark_ports	uports;
+	uint8_t			proto;
+};
+
+static inline u32 hmark_addr6_mask(const __u32 *addr32, const __u32 *mask)
+{
+	return (addr32[0] & mask[0]) ^
+	       (addr32[1] & mask[1]) ^
+	       (addr32[2] & mask[2]) ^
+	       (addr32[3] & mask[3]);
+}
+
+static inline u32
+hmark_addr_mask(int l3num, const __u32 *addr32, const __u32 *mask)
+{
+	switch (l3num) {
+	case AF_INET:
+		return *addr32 & *mask;
+	case AF_INET6:
+		return hmark_addr6_mask(addr32, mask);
+	}
+	return 0;
+}
+
+static int
+hmark_ct_set_htuple(const struct sk_buff *skb, struct hmark_tuple *t,
+		    const struct xt_hmark_info *info)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conntrack_tuple *otuple;
+	struct nf_conntrack_tuple *rtuple;
+
+	if (ct == NULL || nf_ct_is_untracked(ct))
+		return -1;
+
+	otuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	rtuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple;
+
+	t->src = hmark_addr_mask(otuple->src.l3num, otuple->src.u3.all,
+				 info->src_mask.all);
+	t->dst = hmark_addr_mask(otuple->src.l3num, rtuple->src.u3.all,
+				 info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nf_ct_protonum(ct);
+	if (t->proto != IPPROTO_ICMP) {
+		t->uports.p16.src = otuple->src.u.all;
+		t->uports.p16.dst = rtuple->src.u.all;
+		t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+				info->port_set.v32;
+		if (t->uports.p16.dst < t->uports.p16.src)
+			swap(t->uports.p16.dst, t->uports.p16.src);
+	}
+
+	return 0;
+#else
+	return -1;
+#endif
+}
+
+static inline u32
+hmark_hash(struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	u32 hash;
+
+	if (t->dst < t->src)
+		swap(t->src, t->dst);
+
+	hash = jhash_3words(t->src, t->dst, t->uports.v32, info->hashrnd);
+	hash = hash ^ (t->proto & info->proto_mask);
+
+	return (hash % info->hmodulus) + info->hoffset;
+}
+
+static void
+hmark_set_tuple_ports(const struct sk_buff *skb, unsigned int nhoff,
+		      struct hmark_tuple *t, const struct xt_hmark_info *info)
+{
+	int protoff;
+
+	protoff = proto_ports_offset(t->proto);
+	if (protoff < 0)
+		return;
+
+	nhoff += protoff;
+	if (skb_copy_bits(skb, nhoff, &t->uports, sizeof(t->uports)) < 0)
+		return;
+
+	t->uports.v32 = (t->uports.v32 & info->port_mask.v32) |
+			info->port_set.v32;
+
+	if (t->uports.p16.dst < t->uports.p16.src)
+		swap(t->uports.p16.dst, t->uports.p16.src);
+}
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+static int get_inner6_hdr(const struct sk_buff *skb, int *offset)
+{
+	struct icmp6hdr *icmp6h, _ih6;
+
+	icmp6h = skb_header_pointer(skb, *offset, sizeof(_ih6), &_ih6);
+	if (icmp6h == NULL)
+		return 0;
+
+	if (icmp6h->icmp6_type && icmp6h->icmp6_type < 128) {
+		*offset += sizeof(struct icmp6hdr);
+		return 1;
+	}
+	return 0;
+}
+
+static int
+hmark_pkt_set_htuple_ipv6(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct ipv6hdr *ip6, _ip6;
+	int flag = IP6T_FH_F_AUTH; /* Ports offset, find_hdr flags */
+	unsigned int nhoff = 0;
+	u16 fragoff = 0;
+	int nexthdr;
+
+	ip6 = (struct ipv6hdr *) (skb->data + skb_network_offset(skb));
+	nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+	if (nexthdr < 0)
+		return 0;
+	/* No need to check for icmp errors on fragments */
+	if ((flag & IP6T_FH_F_FRAG) || (nexthdr != IPPROTO_ICMPV6))
+		goto noicmp;
+	/* if an icmp error, use the inner header */
+	if (get_inner6_hdr(skb, &nhoff)) {
+		ip6 = skb_header_pointer(skb, nhoff, sizeof(_ip6), &_ip6);
+		if (ip6 == NULL)
+			return -1;
+		/* Treat AH as ESP, use SPI nothing else. */
+		flag = IP6T_FH_F_AUTH;
+		nexthdr = ipv6_find_hdr(skb, &nhoff, -1, &fragoff, &flag);
+		if (nexthdr < 0)
+			return -1;
+	}
+noicmp:
+	t->src = hmark_addr6_mask(ip6->saddr.s6_addr32, info->src_mask.all);
+	t->dst = hmark_addr6_mask(ip6->daddr.s6_addr32, info->dst_mask.all);
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = nexthdr;
+	if (t->proto == IPPROTO_ICMPV6)
+		return 0;
+
+	if (flag & IP6T_FH_F_FRAG)
+		return 0;
+
+	hmark_set_tuple_ports(skb, nhoff, t, info);
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v6(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv6(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+#endif
+
+static int get_inner_hdr(const struct sk_buff *skb, int iphsz, int *nhoff)
+{
+	const struct icmphdr *icmph;
+	struct icmphdr _ih;
+
+	/* Not enough header? */
+	icmph = skb_header_pointer(skb, *nhoff + iphsz, sizeof(_ih), &_ih);
+	if (icmph == NULL && icmph->type > NR_ICMP_TYPES)
+		return 0;
+
+	/* Error message? */
+	if (icmph->type != ICMP_DEST_UNREACH &&
+	    icmph->type != ICMP_SOURCE_QUENCH &&
+	    icmph->type != ICMP_TIME_EXCEEDED &&
+	    icmph->type != ICMP_PARAMETERPROB &&
+	    icmph->type != ICMP_REDIRECT)
+		return 0;
+
+	*nhoff += iphsz + sizeof(_ih);
+	return 1;
+}
+
+static int
+hmark_pkt_set_htuple_ipv4(const struct sk_buff *skb, struct hmark_tuple *t,
+			  const struct xt_hmark_info *info)
+{
+	struct iphdr *ip, _ip;
+	int nhoff = skb_network_offset(skb);
+
+	ip = (struct iphdr *) (skb->data + nhoff);
+	if (ip->protocol == IPPROTO_ICMP) {
+		/* use inner header in case of ICMP errors */
+		if (get_inner_hdr(skb, ip->ihl * 4, &nhoff)) {
+			ip = skb_header_pointer(skb, nhoff, sizeof(_ip), &_ip);
+			if (ip == NULL)
+				return -1;
+		}
+	}
+
+	t->src = (__force u32) ip->saddr;
+	t->dst = (__force u32) ip->daddr;
+
+	t->src &= info->src_mask.ip;
+	t->dst &= info->dst_mask.ip;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))
+		return 0;
+
+	t->proto = ip->protocol;
+
+	/* ICMP has no ports, skip */
+	if (t->proto == IPPROTO_ICMP)
+		return 0;
+
+	/* follow-up fragments don't contain ports, skip all fragments */
+	if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+		return 0;
+
+	hmark_set_tuple_ports(skb, (ip->ihl * 4) + nhoff, t, info);
+
+	return 0;
+}
+
+static unsigned int
+hmark_tg_v4(struct sk_buff *skb, const struct xt_action_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+	struct hmark_tuple t;
+
+	memset(&t, 0, sizeof(struct hmark_tuple));
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT)) {
+		if (hmark_ct_set_htuple(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	} else {
+		if (hmark_pkt_set_htuple_ipv4(skb, &t, info) < 0)
+			return XT_CONTINUE;
+	}
+
+	skb->mark = hmark_hash(&t, info);
+	return XT_CONTINUE;
+}
+
+static int hmark_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_hmark_info *info = par->targinfo;
+
+	if (!info->hmodulus) {
+		pr_info("xt_HMARK: hash modulus can't be zero\n");
+		return -EINVAL;
+	}
+	if (info->proto_mask &&
+	    (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))) {
+		pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
+		return -EINVAL;
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK) &&
+	    (info->flags & (XT_HMARK_FLAG(XT_HMARK_SPORT_MASK) |
+			     XT_HMARK_FLAG(XT_HMARK_DPORT_MASK)))) {
+		pr_info("xt_HMARK: spi-mask and port-mask can't be combined\n");
+		return -EINVAL;
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI) &&
+	    (info->flags & (XT_HMARK_FLAG(XT_HMARK_SPORT) |
+			     XT_HMARK_FLAG(XT_HMARK_DPORT)))) {
+		pr_info("xt_HMARK: spi-set and port-set can't be combined\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static struct xt_target hmark_tg_reg[] __read_mostly = {
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV4,
+		.target		= hmark_tg_v4,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+	{
+		.name		= "HMARK",
+		.family		= NFPROTO_IPV6,
+		.target		= hmark_tg_v6,
+		.targetsize	= sizeof(struct xt_hmark_info),
+		.checkentry	= hmark_tg_check,
+		.me		= THIS_MODULE,
+	},
+#endif
+};
+
+static int __init hmark_tg_init(void)
+{
+	return xt_register_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+static void __exit hmark_tg_exit(void)
+{
+	xt_unregister_targets(hmark_tg_reg, ARRAY_SIZE(hmark_tg_reg));
+}
+
+module_init(hmark_tg_init);
+module_exit(hmark_tg_exit);
-- 
1.7.9.5


[-- Attachment #3: 0001-netfilter-userspace-part-for-target-HMARK.patch --]
[-- Type: text/x-patch, Size: 23501 bytes --]

From 6e59e43e0e275918ae2c307e46a5581d5587459b Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans.schillstrom@ericsson.com>
Date: Tue, 8 May 2012 09:15:12 +0200
Subject: [PATCH] netfilter: userspace part for target HMARK

    The target allows you to create rules in the "raw" and "mangle" tables
    which alter the netfilter mark (nfmark) field within a given range.
    First a 32 bit hash value is generated then modulus by <limit> and
    finally an offset is added before it's written to nfmark.
    Prior to routing, the nfmark can influence the routing method (see
    "Use netfilter MARK value as routing key") and can also be used by
    other subsystems to change their behaviour.

    The mark match can also be used to match nfmark produced by this module.

Ver 13
    Name change of defines and spi / port check due to removal ov spi data.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 extensions/libxt_HMARK.c   |  522 ++++++++++++++++++++++++++++++++++++++++++++
 extensions/libxt_HMARK.man |   84 +++++++
 2 files changed, 606 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_HMARK.c
 create mode 100644 extensions/libxt_HMARK.man

diff --git a/extensions/libxt_HMARK.c b/extensions/libxt_HMARK.c
new file mode 100644
index 0000000..2442f05
--- /dev/null
+++ b/extensions/libxt_HMARK.c
@@ -0,0 +1,522 @@
+/*
+ * Shared library add-on to iptables to add HMARK target support.
+ *
+ * The kernel module calculates a hash value that can be modified by modulus
+ * and an offset. The hash value is based on a direction independent
+ * five tuple: src & dst addr src & dst ports and protocol.
+ * However src & dst port can be masked and are not used for fragmented
+ * packets, ESP and AH don't have ports so SPI will be used instead.
+ * For ICMP error messages the hash mark values will be calculated on
+ * the source packet i.e. the packet caused the error (If sufficient
+ * amount of data exists).
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <stdbool.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "xtables.h"
+#include <linux/netfilter/xt_HMARK.h>
+
+
+#define DEF_HRAND 0xc175a3b8	/* Default "random" value to jhash */
+
+#define XT_F_HMARK_L4_OPTS \
+		(XT_HMARK_FLAG(XT_HMARK_SPI_MASK) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPI) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT_MASK) |\
+		 XT_HMARK_FLAG(XT_HMARK_SPORT) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT_MASK) |\
+		 XT_HMARK_FLAG(XT_HMARK_DPORT) |\
+		 XT_HMARK_FLAG(XT_HMARK_PROTO_MASK))
+
+static void HMARK_help(void)
+{
+	printf(
+"HMARK target options, i.e. modify hash calculation by:\n"
+"  --hmark-method <method>          Overall L3/L4 and fragment behavior\n"
+"                 L3                Fragment safe, do not use ports or proto\n"
+"                                   i.e. Fragments don't need special care.\n"
+"                 L3-4 (Default)    Fragment unsafe, use ports and proto\n"
+"                                   if defrag off in conntrack\n"
+"                                      no hmark on any part of a fragment\n"
+"  Limit/modify the calculated hash mark by:\n"
+"  --hmark-mod value                nfmark modulus value\n"
+"  --hmark-offset value             Last action add value to nfmark\n\n"
+" Fine tuning of what will be included in hash calculation\n"
+"  --hmark-src-mask length          Source address mask length\n"
+"  --hmark-dst-mask length          Dest address mask length\n"
+"  --hmark-sport-mask value         Mask src port with value\n"
+"  --hmark-dport-mask value         Mask dst port with value\n"
+"  --hmark-spi-mask value           For esp and ah AND spi with value\n"
+"  --hmark-sport-set value          OR src port with value\n"
+"  --hmark-dport-set value          OR dst port with value\n"
+"  --hmark-spi-set value            For esp and ah OR spi with value\n"
+"  --hmark-proto-mask value         Mask Protocol with value\n"
+"  --hmark-rnd                      Initial Random value to hash cacl.\n"
+" For NAT in IPv4: src part from original/reply tuple will always be used\n"
+" i.e. orig src part will be used as src address/port.\n"
+"     reply src part will be used as dst address/port\n"
+" Make sure to qualify the rule in a proper way when using NAT flag\n"
+" When --ct is used only tracked connections will match\n"
+"  --hmark-ct                       Force conntrack orig and rely tuples as\n"
+"                                   source and destination.\n\n"
+" In many cases hmark can be omitted i.e. --src-mask can be used\n");
+}
+
+#define hi struct xt_hmark_info
+
+static const struct xt_option_entry HMARK_opts[] = {
+	{ .name  = "hmark-method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "hmark-src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_MASK,
+	  .flags = XTOPT_PUT, XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "hmark-dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "hmark-sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "hmark-dport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "hmark-spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.v32)
+	},
+	{ .name  = "hmark-sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "hmark-dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "hmark-spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.v32)
+	},
+	{ .name  = "hmark-proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "hmark-rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name = "hmark-mod",
+	  .type = XTTYPE_UINT32,
+	  .id = XT_HMARK_MODULUS,
+	  .min = 1,
+	  .flags = XTOPT_PUT | XTOPT_MAND,
+	  XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "hmark-offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "hmark-ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+
+	{ .name  = "method",
+	  .type  = XTTYPE_STRING,
+	  .id    = XT_HMARK_METHOD_L3
+	},
+	{ .name  = "src-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_SADR_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, src_mask)
+	},
+	{ .name  = "dst-mask",
+	  .type  = XTTYPE_PLENMASK,
+	  .id    = XT_HMARK_DADR_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, dst_mask)
+	},
+	{ .name  = "sport-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.src)
+	},
+	{ .name  = "dport-mask", .type = XTTYPE_UINT16,
+	  .id = XT_HMARK_DPORT_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.p16.dst)
+	},
+	{ .name  = "spi-mask",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_mask.v32)
+	},
+	{ .name  = "sport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_SPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.src)
+	},
+	{ .name  = "dport-set",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_DPORT,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.p16.dst)
+	},
+	{ .name  = "spi-set",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_SPI,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, port_set.v32)
+	},
+	{ .name  = "proto-mask",
+	  .type  = XTTYPE_UINT16,
+	  .id    = XT_HMARK_PROTO_MASK,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, proto_mask)
+	},
+	{ .name  = "rnd",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_RND,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hashrnd)
+	},
+	{ .name  = "mod",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_MODULUS,
+	  .min   = 1,
+	  .flags = XTOPT_PUT,
+	  XTOPT_MAND, XTOPT_POINTER(hi, hmodulus)
+	},
+	{ .name  = "offset",
+	  .type  = XTTYPE_UINT32,
+	  .id    = XT_HMARK_OFFSET,
+	  .flags = XTOPT_PUT,
+	  XTOPT_POINTER(hi, hoffset)
+	},
+	{ .name  = "ct",
+	  .type  = XTTYPE_NONE,
+	  .id    = XT_HMARK_CT
+	},
+	XTOPT_TABLEEND,
+};
+
+static void HMARK_parse(struct xt_option_call *cb, int plen)
+{
+	struct xt_hmark_info *info = cb->data;
+
+	if (!cb->xflags) {
+		memset(info, 0xff, sizeof(struct xt_hmark_info));
+		info->port_set.v32 = 0;
+		info->flags = 0;
+		info->hoffset = 0;
+		info->hashrnd = DEF_HRAND;
+	}
+	xtables_option_parse(cb);
+
+	switch (cb->entry->id) {
+	case XT_HMARK_SADR_MASK:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SADR_MASK);
+		break;
+	case XT_HMARK_DADR_MASK:
+		if (cb->val.hlen == plen)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DADR_MASK);
+		break;
+	case XT_HMARK_SPI_MASK:
+		info->port_mask.v32 = htonl(cb->val.u32);
+		if (cb->val.u32 == 0xffffffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI_MASK);
+		break;
+	case XT_HMARK_SPI:
+		info->port_set.v32 = htonl(cb->val.u32);
+		if (cb->val.u32 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPI);
+		break;
+	case XT_HMARK_SPORT_MASK:
+		info->port_mask.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT_MASK);
+		break;
+	case XT_HMARK_DPORT_MASK:
+		info->port_mask.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT_MASK);
+		break;
+	case XT_HMARK_SPORT:
+		info->port_set.p16.src = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_SPORT);
+		break;
+	case XT_HMARK_DPORT:
+		info->port_set.p16.dst = htons(cb->val.u16);
+		if (cb->val.u16 == 0)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_DPORT);
+		break;
+	case XT_HMARK_PROTO_MASK:
+		if (cb->val.u16 == 0xffff)
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_PROTO_MASK);
+		break;
+	case XT_HMARK_MODULUS:
+		if (info->hmodulus == 0) {
+			xtables_error(PARAMETER_PROBLEM,
+				      "xxx modulus 0 ? "
+				      "thats a div by 0");
+			info->hmodulus = 0xffffffff;
+		}
+		break;
+	case XT_HMARK_METHOD_L3:
+		if (strcmp(cb->arg, "L3") == 0) {
+			info->proto_mask = 0;
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		} else if (strcmp(cb->arg, "L3-4") == 0) {
+			cb->xflags &= ~XT_HMARK_FLAG(XT_HMARK_METHOD_L3);
+			cb->xflags |= XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4);
+		}
+		break;
+	}
+	info->flags = cb->xflags;
+}
+
+static void HMARK_ip4_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 32);
+}
+static void HMARK_ip6_parse(struct xt_option_call *cb)
+{
+	HMARK_parse(cb, 128);
+}
+
+static void HMARK_check(struct xt_fcheck_call *cb)
+{
+	if (!(cb->xflags & XT_HMARK_FLAG(XT_HMARK_MODULUS)))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: the --hmark-mod, "
+			      "is not set, or zero wich is a div by zero");
+	/* Check for invalid options */
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3) &&
+	   (cb->xflags & XT_F_HMARK_L4_OPTS))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-method L3, "
+			      "can not be combined by an Layer 4 options: "
+			      "port, spi or proto ");
+	/* Check invalid mix of spi and ports since thye share data */
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK) &&
+	    (cb->xflags & (XT_HMARK_FLAG(XT_HMARK_SPORT_MASK) |
+			   XT_HMARK_FLAG(XT_HMARK_DPORT_MASK))))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-spi-mask, "
+			      "can not be combined with port mask options ");
+
+	if (cb->xflags & XT_HMARK_FLAG(XT_HMARK_SPI) &&
+	    (cb->xflags & (XT_HMARK_FLAG(XT_HMARK_SPORT) |
+			   XT_HMARK_FLAG(XT_HMARK_DPORT))))
+		xtables_error(PARAMETER_PROBLEM, "HMARK: --hmark-spi-set, "
+			      "can not be combined with port set options ");
+}
+/*
+ * Common print for IPv4 & IPv6
+ */
+static void HMARK_print(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf("method L3 ");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf("method L3-4 ");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_MASK))
+			printf("sport-mask 0x%x ",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_MASK))
+			printf("dport-mask 0x%x ",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK))
+			printf("spi-mask 0x%x ", htonl(info->port_mask.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT))
+			printf("sport-set 0x%x ",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT))
+			printf("dport-set 0x%x ",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI))
+			printf("spi-set 0x%x ", htonl(info->port_set.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_MASK))
+			printf("proto-mask 0x%x ", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf("rnd 0x%x ", info->hashrnd);
+
+}
+
+static void HMARK_ip6_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+			(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf("src-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf("dst-mask %s ",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_print(info);
+}
+static void HMARK_ip4_print(const void *ip,
+			    const struct xt_entry_target *target, int numeric)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	printf(" HMARK ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf("%% 0x%x ", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf("+ 0x%x ", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf("ct, ");
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf("src-mask %s ",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf("dst-mask %s ",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_print(info);
+}
+static void HMARK_save(const struct xt_hmark_info *info)
+{
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3)) {
+		printf(" --hmark-method L3");
+	} else {
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3_4))
+			printf(" --hmark-method L3-4");
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT_MASK))
+			printf(" --hmark-sport-mask 0x%x",
+			       htons(info->port_mask.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT_MASK))
+			printf(" --hmark-dport-mask 0x%x",
+			       htons(info->port_mask.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK))
+			printf(" --hmark-spi-mask 0x%x",
+			       htonl(info->port_mask.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPORT))
+			printf(" --hmark-sport-set 0x%x",
+			       htons(info->port_set.p16.src));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_DPORT))
+			printf(" --hmark-dport-set 0x%x",
+			       htons(info->port_set.p16.dst));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI))
+			printf(" --hmark-spi-set 0x%x",
+			       htonl(info->port_set.v32));
+		if (info->flags & XT_HMARK_FLAG(XT_HMARK_PROTO_MASK))
+			printf(" --hmark-proto-mask 0x%x", info->proto_mask);
+	}
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_RND))
+		printf(" --hmark-rnd 0x%x", info->hashrnd);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_MODULUS))
+		printf(" --hmark-mod 0x%x", info->hmodulus);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_OFFSET))
+		printf(" --hmark-offset 0x%x", info->hoffset);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_CT))
+		printf(" --hmark-ct");
+}
+
+static void HMARK_ip6_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf(" --hmark-src-mask %s",
+		       xtables_ip6mask_to_numeric(&info->src_mask.in6) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ip6mask_to_numeric(&info->dst_mask.in6) + 1);
+	HMARK_save(info);
+}
+
+static void HMARK_ip4_save(const void *ip, const struct xt_entry_target *target)
+{
+	const struct xt_hmark_info *info =
+		(const struct xt_hmark_info *)target->data;
+
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_SADR_MASK))
+		printf(" --hmark-src-mask %s",
+		       xtables_ipmask_to_numeric(&info->src_mask.in) + 1);
+	if (info->flags & XT_HMARK_FLAG(XT_HMARK_DADR_MASK))
+		printf(" --hmark-dst-mask %s",
+		       xtables_ipmask_to_numeric(&info->dst_mask.in) + 1);
+	HMARK_save(info);
+}
+
+static struct xtables_target mark_tg_reg[] = {
+	{
+		.family        = NFPROTO_IPV4,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip4_print,
+		.save          = HMARK_ip4_save,
+		.x6_parse      = HMARK_ip4_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+	{
+		.family        = NFPROTO_IPV6,
+		.name          = "HMARK",
+		.version       = XTABLES_VERSION,
+		.revision      = 0,
+		.size          = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_hmark_info)),
+		.help          = HMARK_help,
+		.print         = HMARK_ip6_print,
+		.save          = HMARK_ip6_save,
+		.x6_parse      = HMARK_ip6_parse,
+		.x6_fcheck     = HMARK_check,
+		.x6_options    = HMARK_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_targets(mark_tg_reg, ARRAY_SIZE(mark_tg_reg));
+}
diff --git a/extensions/libxt_HMARK.man b/extensions/libxt_HMARK.man
new file mode 100644
index 0000000..92bd1ed
--- /dev/null
+++ b/extensions/libxt_HMARK.man
@@ -0,0 +1,84 @@
+This module does the same as MARK, i.e. set an fwmark, but the mark is based on a hash value.
+The hash is based on src-addr, dst-addr, sport, dport and proto. The same mark will be produced independent of direction if no masks is set or the same masks is used for src and dest.
+The hash mark could be adjusted by modulus and finally an offset could be added, i.e the final mark will be within a range.
+ICMP error will use the the original message for hash calculation not the icmp it self.
+
+Note: IPv4 packets with nf_defrag_ipv4 loaded will be defragmented before they reach hmark,
+      IPv6 nf_defrag is not implemented this way, hence fragmented ipv6 packets will reach hmark.
+      Default behavior is to completely ignore any fragment if it reach hmark.
+      --hmark-method L3 is fragment safe since neither ports or L4 protocol field is used.
+      None of the parameters effect the packet it self only the calculated hash value.
+
+.PP
+Parameters:
+Short hand methods
+.TP
+\fB\-\-hmark\-method\fP \fIL3\fP
+Do not use L4 protocol field, ports or spi, only Layer 3 addresses, mask length
+of L3 addresses can still be used. Fragment or not does not matter in
+this case since only L3 address can be used in calc. of hash value.
+.TP
+\fB\-\-hmark\-method\fP \fIL3-4\fP (Default)
+Include L4 in calculation. of hash value i.e. all masks below are valid.
+Fragments will be ignored. (i.e no hash value produced)
+.PP
+For all masks default is all "1:s", to disable a field use mask 0
+.TP
+\fB\-\-hmark\-src\-mask\fP \fIlength\fP
+The length of the mask to AND the source address with (saddr & value).
+.TP
+\fB\-\-hmark\-dst\-mask\fP \fIlength\fP
+The length of the mask to AND the dest. address with (daddr & value).
+.TP
+\fB\-\-hmark\-sport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the src port with (sport & value).
+.TP
+\fB\-\-hmark\-dport\-mask\fP \fIvalue\fP
+A 16 bit value to AND the dest port with (dport & value).
+.TP
+\fB\-\-hmark\-sport\-set\fP \fIvalue\fP
+A 16 bit value to OR the src port with (sport | value).
+.TP
+\fB\-\-hmark\-dport\-set\fP \fIvalue\fP
+A 16 bit value to OR the dest port with (dport | value).
+.TP
+\fB\-\-hmark\-spi\-mask\fP \fIvalue\fP
+Value to AND the spi field with (spi & value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-spi\-set\fP \fIvalue\fP
+Value to OR the spi field with (spi | value) valid for proto esp or ah.
+.TP
+\fB\-\-hmark\-proto\-mask\fP \fIvalue\fP
+An 8 bit value to AND the L4 proto field with (proto & value).
+.TP
+\fB\-\-hmark\-ct\fP
+When flag is set, conntrack data should be used. Useful when NAT internal addressed should be used in calculation.
+Be careful when using DNAT since mangle table is handled before nat table. I.e it will not work as expected to put HMARK in table mangle and PREROUTING chain. The initial packet will have it's hash based on the original address, while the rest of the flow will use the NAT:ed address.
+.TP
+\fB\-\-hmark\-rnd\fP \fIvalue\fP
+A 32 bit initial value for hash calc, default is 0xc175a3b8.
+.PP
+Final processing of the mark in order of execution.
+.TP
+\fB\-\-hmark\-mod\fP \fIvalue (must be > 0)\fP
+The easiest way to describe this is:  hash = hash mod <value>
+.TP
+\fB\-\-hmark\-offset\fP \fIvalue\fP
+The easiest way to describe this is:  hash = hash + <value>
+.PP
+\fIExamples:\fP
+.PP
+Default rule handles all TCP, UDP, SCTP, ESP & AH
+.IP
+iptables \-t mangle \-A PREROUTING \-m state \-\-state NEW,ESTABLISHED,RELATED
+ \-j HMARK \-\-hmark-offs 10000 \-\-hmark-mod 10
+.PP
+Handle SCTP and hash dest port only and produce a nfmark between 100-119.
+.IP
+iptables \-t mangle \-A PREROUTING -p SCTP \-j HMARK \-\-src\-mask 0 \-\-dst\-mask 0
+ \-\-sp\-mask 0 \-\-offset 100 \-\-mod 20
+.PP
+Fragment safe Layer 3 only that keep a class C network flow together
+.IP
+iptables \-t mangle \-A PREROUTING \-j HMARK \-\-method L3 \-\-src\-mask 24 \-\-mod 20 \-\-offset 100
+
-- 
1.7.2.3


^ permalink raw reply related

* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: David Miller @ 2012-05-08  7:31 UTC (permalink / raw)
  To: joe; +Cc: johannes, eric.dumazet, netdev
In-Reply-To: <1336458936.29640.2.camel@joe2Laptop>

From: Joe Perches <joe@perches.com>
Date: Mon, 07 May 2012 23:35:36 -0700

> On Tue, 2012-05-08 at 02:26 -0400, David Miller wrote:
>> From: Johannes Berg <johannes@sipsolutions.net>
>> Date: Tue, 08 May 2012 07:25:44 +0200
>> 
>> > I suppose I could fix those first and then later change the type, but I
>> > think having a "compare_ether_addr" function that returns *false* when
>> > they *match* would be rather confusing. I'd rather have
>> > "equal_ether_addr()" that returns *true* when they match.
>> > 
>> > I guess we could introduce equal_ether_addr() though and slowly convert,
>> > keeping compare_ether_addr() as a sort of wrapper around it.
>> 
>> Indeed, this is one way to proceed.
> 
> perhaps is_equal_ether_addr or is_same_ether_addr instead?

Hmmm, my first choice would have been "eth_addr_equal()"

^ permalink raw reply

* Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
From: Pablo Neira Ayuso @ 2012-05-08  7:32 UTC (permalink / raw)
  To: Simon Horman; +Cc: David Miller, netfilter-devel, netdev
In-Reply-To: <20120508031545.GB9027@verge.net.au>

On Tue, May 08, 2012 at 12:15:47PM +0900, Simon Horman wrote:
> On Mon, May 07, 2012 at 10:16:17PM -0400, David Miller wrote:
> > From: Simon Horman <horms@verge.net.au>
> > Date: Tue, 8 May 2012 11:08:44 +0900
> > 
> > > can I fix this up as a subsequent patch?
> > 
> > Pablo's tree needs to get respun to address the other feedback
> > I gave, so no reason for him or you to not fix this as well.
> 
> Understood.

Yes, I'll fix it myself.

Expect a new batch in a couple of minutes.

^ permalink raw reply

* Re: [PATCH 02/25] netfilter: nf_ct_helper: allow to disable automatic helper assignment
From: Pablo Neira Ayuso @ 2012-05-08  7:31 UTC (permalink / raw)
  To: David Miller; +Cc: netfilter-devel, netdev
In-Reply-To: <20120507.213418.1633667612399402231.davem@davemloft.net>

On Mon, May 07, 2012 at 09:34:18PM -0400, David Miller wrote:
> From: pablo@netfilter.org
> Date: Tue,  8 May 2012 02:21:56 +0200
> 
> > +	if (!net->ct.helper_sysctl_header) {
> > +		printk(KERN_ERR "nf_conntrack_helper: can't register to sysctl.\n");
> > +		goto out_register;
> > +	}
> 
> Please use pr_err().
> 
> > +			printk(KERN_INFO "nf_conntrack: automatic helper "
> > +				"assignment is deprecated. Please, read "
> > +				"http://www.netfilter.org/news.html#2012-04-03\n");
> > +			net->ct.auto_assign_helper_warned = true;
> 
> Please use pr_info().
> 
> Pointers to web sites to explain a problem is absolutely not
> appropriate in kernel log messages, nor commit messages.
> 
> Either add a document to the kernel tree, or explain things
> fully both in the kernel log message and the commit message.

I'll fix those. Thanks for spotting this issue.

^ permalink raw reply

* Re: [PATCH v2] RPS: Sparse connection optimizations - v2
From: Deng-Cheng Zhu @ 2012-05-08  6:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev
In-Reply-To: <1336379153.3752.2273.camel@edumazet-glaptop>

On 05/07/2012 04:25 PM, Eric Dumazet wrote:
> On Mon, 2012-05-07 at 16:01 +0800, Deng-Cheng Zhu wrote:
>
>> Did you really read my patch and understand what I commented? When I was
>> talking about using rps_sparse_flow (initially cpu_flow), neither
>> rps_sock_flow_table nor rps_dev_flow_table is activated (number of
>> entries: 0).
>
> I read your patch and am concerned of performance issues when handling
> typical workload. Say between 0.1 and 20 Mpps on current hardware.

Thanks for reading. For the performance concern, that's why I posted
the test data in the patch description.

>
> The argument "oh its only selected when
> CONFIG_RPS_SPARSE_FLOW_OPTIMIZATION is set" is wrong.

Certainly, as stated before, I admit (in fact, it's obvious) that this
patch is nothing more than a tradeoff between the throughput of sparse
flows and that of many. In the tests, less than 4% performance loss was
observed as the number of connections went higher. If people (in most
cases I understand) don't care about the sparse flow case, then leave
this feature not selected (the default). In the real world, there are
many tradeoffs, right?

>
> CONFIG_NR_RPS_MAP_LOOPS is wrong.

To balance the overhead and the sparse flow throughput.

>
> Your HZ timeout is yet another dark side of your patch.

Certainly it can be changed to something more reasonable.

>
> Your (flow->dev == skb->dev) test is wrong.

Please let me know why, thanks. The tests showed it's possible that 2
correlated flows could have the same hash value but from different
devices.

>
> Your : flow->ts = now; is wrong (dirtying memory for each packet)

It doesn't touch the packet, does it? It only records the last time when
the flow is active. And the flow entries are only managed by this
feature.

>
> Really I dont like your patch.

Regretfully.

>
> You are kindly asked to find another way to solve your problem, a
> generic mechanism that can help others, not only you.

This is the meat of the problem. And it's why I'm still saying something
about this patch. We do have the platform where SMP irq affinity is not
available to NICs and, according to tests, RPS does take effect with a
bit imperfection: relatively low and inconsistent throughput in the case
of sparse connections. What you are saying is not the linux way, IMHO:
Provided the incoming code doesn't do harm to the kernel, we should
offer options to users. My case can be a rare one, but it would be good
to have the kernel support.

>
> We do think activating RFS is the way to go. Its the standard layer we
> added below RPS, its configurable and scales. It can be expanded at will
> with configurable plugins.
>
> For example, using single queue NICS, it makes sense to select cpu on
> the output device only, not on the rxhash by itself (a modulo or
> something), to reduce false sharing and qdisc/device lock on tx path.
>
> If your machine has 4 cpus, and 4 nics, you can instruct RFS table to
> prefer cpu on the NIC that packet will use for output.

I understand what you say above, but it does not apply in my case.

Thanks,

Deng-Cheng

^ permalink raw reply

* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: Joe Perches @ 2012-05-08  6:35 UTC (permalink / raw)
  To: David Miller; +Cc: johannes, eric.dumazet, netdev
In-Reply-To: <20120508.022647.1186809783650560801.davem@davemloft.net>

On Tue, 2012-05-08 at 02:26 -0400, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Tue, 08 May 2012 07:25:44 +0200
> 
> > I suppose I could fix those first and then later change the type, but I
> > think having a "compare_ether_addr" function that returns *false* when
> > they *match* would be rather confusing. I'd rather have
> > "equal_ether_addr()" that returns *true* when they match.
> > 
> > I guess we could introduce equal_ether_addr() though and slowly convert,
> > keeping compare_ether_addr() as a sort of wrapper around it.
> 
> Indeed, this is one way to proceed.

perhaps is_equal_ether_addr or is_same_ether_addr instead?

^ permalink raw reply

* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: David Miller @ 2012-05-08  6:26 UTC (permalink / raw)
  To: johannes; +Cc: eric.dumazet, netdev
In-Reply-To: <1336454744.4328.2.camel@jlt3.sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Tue, 08 May 2012 07:25:44 +0200

> I suppose I could fix those first and then later change the type, but I
> think having a "compare_ether_addr" function that returns *false* when
> they *match* would be rather confusing. I'd rather have
> "equal_ether_addr()" that returns *true* when they match.
> 
> I guess we could introduce equal_ether_addr() though and slowly convert,
> keeping compare_ether_addr() as a sort of wrapper around it.

Indeed, this is one way to proceed.

^ permalink raw reply

* Re: [PATCH] net: compare_ether_addr[_64bits]() has no ordering
From: Johannes Berg @ 2012-05-08  5:25 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20120507.192052.181899101154654170.davem@davemloft.net>

On Mon, 2012-05-07 at 19:20 -0400, David Miller wrote:

> >> > Neither compare_ether_addr() nor compare_ether_addr_64bits()
> >> > (as it can fall back to the former) have comparison semantics
> >> > like memcmp() where the sign of the return value indicates sort
> >> > order. We had a bug in the wireless code due to a blind memcmp
> >> > replacement because of this.
> >> > 
> >> > A cursory look suggests that the wireless bug was the only one
> >> > due to this semantic difference.
> >> > 
> >> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> >> > ---
> >> >  include/linux/etherdevice.h |   11 ++++++-----
> >> >  1 file changed, 6 insertions(+), 5 deletions(-)
> >> 
> >> The right way to avoid this kind of problems is to change these
> >> functions to return a bool
> > 
> > Well, I guess so, but that'd be a weird thing for a compare_ function...
> > should probably be named equal_... then, but I'm not really able to do
> > such a huge change on the first day after my vacation :-)
> 
> It's true the name could be improved, but changing the name is quite
> a large undertaking even with automated scripts.
> 
> Even the bool change is slightly painful, since all of the explicit
> tests against integers (%99.999 of these are in wireless BTW :-) would
> need to be adjusted.

I suppose I could fix those first and then later change the type, but I
think having a "compare_ether_addr" function that returns *false* when
they *match* would be rather confusing. I'd rather have
"equal_ether_addr()" that returns *true* when they match.

I guess we could introduce equal_ether_addr() though and slowly convert,
keeping compare_ether_addr() as a sort of wrapper around it.

johannes

^ permalink raw reply

* Re: SO_TIMESTAMP on tcp sockets?
From: Eric Dumazet @ 2012-05-08  4:37 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Network Development
In-Reply-To: <CALCETrVUjLz63uzr+UTf-G1jq9QDcX6Gf9D6AWK8_GmunsJGnw@mail.gmail.com>

On Mon, 2012-05-07 at 18:39 -0700, Andy Lutomirski wrote:
> I've been using SO_TIMESTAMPNS to good effect on udp sockets.  I'd
> like to do the same thing for tcp.  I realize that this is
> semantically strange [1], but I don't think there's a real issue for
> my use case.  We have very thin streams -- we are likely to process
> each incoming segment as it is received, and I want the most precise
> timestamp possible on each segment.
> 
> A simple approach (I think) would be for a recvmsg on a tcp socket
> with SO_TIMESTAMP(NS) to return at most one skb worth of data along
> with the timestamp associated with that skb.  This could be a little
> strange if multiple segments overlap or if lro is involved, but
> neither of those cases seems like a major problem.
> 
> Is there any interest in something like this?
> 

LRO/GRO is not really a problem, buffers are merged because they are
received in a very short time period. If you want nanosec timestamping
on TCP, just cancel the whole idea.

TCP can 'collapse' several buffers onto single ones (to reduce memory
overhead). Which timestamp would be chosen at collapse time ?

net-next also has tcp coalescing, wich also merge buffers as soon as
they enter receive or ofo queue.

Another problem with SO_TIMESTAMPNS is it globally enables time stamping
on all skbs on the host, adding some latencies. (ktime_get() can be
slowed down when time keeping triggers and hold xtime seqlock)

^ permalink raw reply

* [PATCH V3 Resend 09/12] net/stmmac: Remove conditional compilation of clk code
From: Viresh Kumar @ 2012-05-08  3:52 UTC (permalink / raw)
  To: akpm
  Cc: spear-devel, viresh.linux, linux-kernel, linux-arm-kernel,
	mturquette, sshtylyov, jgarzik, linux, w.sang, LW, andrew,
	Viresh Kumar, Giuseppe Cavallaro, David S. Miller, netdev
In-Reply-To: <cover.1336448639.git.viresh.kumar@st.com>

With addition of dummy clk_*() calls for non CONFIG_HAVE_CLK cases in clk.h,
there is no need to have clk code enclosed in #ifdef CONFIG_HAVE_CLK, #endif
macros.

This also fixes error paths of probe(), as a goto is required in this patch.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |   41 ---------------------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |   33 +++++++++--------
 2 files changed, 17 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index db2de9a..7f85895 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -81,9 +81,7 @@ struct stmmac_priv {
 	struct stmmac_counters mmc;
 	struct dma_features dma_cap;
 	int hw_cap_support;
-#ifdef CONFIG_HAVE_CLK
 	struct clk *stmmac_clk;
-#endif
 	int clk_csr;
 };
 
@@ -103,42 +101,3 @@ int stmmac_dvr_remove(struct net_device *ndev);
 struct stmmac_priv *stmmac_dvr_probe(struct device *device,
 				     struct plat_stmmacenet_data *plat_dat,
 				     void __iomem *addr);
-
-#ifdef CONFIG_HAVE_CLK
-static inline int stmmac_clk_enable(struct stmmac_priv *priv)
-{
-	if (!IS_ERR(priv->stmmac_clk))
-		return clk_enable(priv->stmmac_clk);
-
-	return 0;
-}
-
-static inline void stmmac_clk_disable(struct stmmac_priv *priv)
-{
-	if (IS_ERR(priv->stmmac_clk))
-		return;
-
-	clk_disable(priv->stmmac_clk);
-}
-static inline int stmmac_clk_get(struct stmmac_priv *priv)
-{
-	priv->stmmac_clk = clk_get(priv->device, NULL);
-
-	if (IS_ERR(priv->stmmac_clk))
-		return PTR_ERR(priv->stmmac_clk);
-
-	return 0;
-}
-#else
-static inline int stmmac_clk_enable(struct stmmac_priv *priv)
-{
-	return 0;
-}
-static inline void stmmac_clk_disable(struct stmmac_priv *priv)
-{
-}
-static inline int stmmac_clk_get(struct stmmac_priv *priv)
-{
-	return 0;
-}
-#endif /* CONFIG_HAVE_CLK */
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1a4cf81..bbdcb55 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -28,6 +28,7 @@
 	https://bugzilla.stlinux.com/
 *******************************************************************************/
 
+#include <linux/clk.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
 #include <linux/ip.h>
@@ -165,12 +166,8 @@ static void stmmac_verify_args(void)
 
 static void stmmac_clk_csr_set(struct stmmac_priv *priv)
 {
-#ifdef CONFIG_HAVE_CLK
 	u32 clk_rate;
 
-	if (IS_ERR(priv->stmmac_clk))
-		return;
-
 	clk_rate = clk_get_rate(priv->stmmac_clk);
 
 	/* Platform provided default clk_csr would be assumed valid
@@ -192,7 +189,6 @@ static void stmmac_clk_csr_set(struct stmmac_priv *priv)
 	   * we can not estimate the proper divider as it is not known
 	   * the frequency of clk_csr_i. So we do not change the default
 	   * divider. */
-#endif
 }
 
 #if defined(STMMAC_XMIT_DEBUG) || defined(STMMAC_RX_DEBUG)
@@ -971,7 +967,7 @@ static int stmmac_open(struct net_device *dev)
 	} else
 		priv->tm->enable = 1;
 #endif
-	stmmac_clk_enable(priv);
+	clk_enable(priv->stmmac_clk);
 
 	stmmac_check_ether_addr(priv);
 
@@ -1075,7 +1071,7 @@ open_error:
 	if (priv->phydev)
 		phy_disconnect(priv->phydev);
 
-	stmmac_clk_disable(priv);
+	clk_disable(priv->stmmac_clk);
 
 	return ret;
 }
@@ -1128,7 +1124,7 @@ static int stmmac_release(struct net_device *dev)
 #ifdef CONFIG_STMMAC_DEBUG_FS
 	stmmac_exit_fs();
 #endif
-	stmmac_clk_disable(priv);
+	clk_disable(priv->stmmac_clk);
 
 	return 0;
 }
@@ -1922,11 +1918,14 @@ struct stmmac_priv *stmmac_dvr_probe(struct device *device,
 	ret = register_netdev(ndev);
 	if (ret) {
 		pr_err("%s: ERROR %i registering the device\n", __func__, ret);
-		goto error;
+		goto error_netdev_register;
 	}
 
-	if (stmmac_clk_get(priv))
+	priv->stmmac_clk = clk_get(priv->device, NULL);
+	if (IS_ERR(priv->stmmac_clk)) {
 		pr_warning("%s: warning: cannot get CSR clock\n", __func__);
+		goto error_clk_get;
+	}
 
 	/* If a specific clk_csr value is passed from the platform
 	 * this means that the CSR Clock Range selection cannot be
@@ -1944,15 +1943,17 @@ struct stmmac_priv *stmmac_dvr_probe(struct device *device,
 	if (ret < 0) {
 		pr_debug("%s: MDIO bus (id: %d) registration failed",
 			 __func__, priv->plat->bus_id);
-		goto error;
+		goto error_mdio_register;
 	}
 
 	return priv;
 
-error:
-	netif_napi_del(&priv->napi);
-
+error_mdio_register:
+	clk_put(priv->stmmac_clk);
+error_clk_get:
 	unregister_netdev(ndev);
+error_netdev_register:
+	netif_napi_del(&priv->napi);
 	free_netdev(ndev);
 
 	return NULL;
@@ -2020,7 +2021,7 @@ int stmmac_suspend(struct net_device *ndev)
 	else {
 		stmmac_set_mac(priv->ioaddr, false);
 		/* Disable clock in case of PWM is off */
-		stmmac_clk_disable(priv);
+		clk_disable(priv->stmmac_clk);
 	}
 	spin_unlock(&priv->lock);
 	return 0;
@@ -2044,7 +2045,7 @@ int stmmac_resume(struct net_device *ndev)
 		priv->hw->mac->pmt(priv->ioaddr, 0);
 	else
 		/* enable the clk prevously disabled */
-		stmmac_clk_enable(priv);
+		clk_enable(priv->stmmac_clk);
 
 	netif_device_attach(ndev);
 
-- 
1.7.9

^ permalink raw reply related

* Re: [PATCH 0/3] smsc75xx: more minor fixes
From: David Miller @ 2012-05-08  3:44 UTC (permalink / raw)
  To: steve.glendinning; +Cc: netdev
In-Reply-To: <1336129033-15826-1-git-send-email-steve.glendinning@shawell.net>

From: Steve Glendinning <steve.glendinning@shawell.net>
Date: Fri,  4 May 2012 11:57:10 +0100

> 3 more minor patches for smsc75xx
> 
> Steve Glendinning (3):
>   smsc75xx: replace 0xffff with PHY_INT_SRC_CLEAR_ALL
>   smsc75xx: eliminate unnecessary phy register read
>   smsc75xx: let EEPROM determine GPIO/LED settings

All applied to net-next.  Please be specific in the future about where
you want patches applied, these wouldn't apply cleanly until I merged
'net' into 'net-next' to propagate recent bug fixes first.

^ permalink raw reply

* [README] net --> net-next merge
From: David Miller @ 2012-05-08  3:41 UTC (permalink / raw)
  To: netdev; +Cc: sfr, linville, jeffrey.t.kirsher

I just merged net into net-next, there were two major conflicts.

1) The known iwlwifi conflict due to the skb->truesize bug fix in
   'net'.

   I used the conflict resolution posted by John and later by Stephen
   when he resolved it for -next.

   John, please double-check my work.

2) drivers/net/ethernet/intel/e1000e/param.c got a conflict in
   the code which validates adapter->itr settings.

   The net-next side looked more sophisticated so that's what I
   used.

   Jeff, please double-check and send me fixup patches if necessary.

Thanks.

^ permalink raw reply

* Re: [PATCH RESEND 2/5] net: fec: adopt pinctrl support
From: Dong Aisheng @ 2012-05-08  3:44 UTC (permalink / raw)
  To: Shawn Guo
  Cc: linux-arm-kernel, Arnd Bergmann, netdev, Sascha Hauer,
	Olof Johansson, Dong Aisheng, David S. Miller
In-Reply-To: <1336352040-28447-3-git-send-email-shawn.guo@linaro.org>

On Mon, May 07, 2012 at 08:53:57AM +0800, Shawn Guo wrote:
> Cc: netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>

Acked-by: Dong Aisheng <dong.aisheng@linaro.org>

Regards
Dong Aisheng

^ permalink raw reply

* Re: [PATCH 13/25] ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest
From: Simon Horman @ 2012-05-08  3:15 UTC (permalink / raw)
  To: David Miller; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <20120507.221617.296952751857969791.davem@davemloft.net>

On Mon, May 07, 2012 at 10:16:17PM -0400, David Miller wrote:
> From: Simon Horman <horms@verge.net.au>
> Date: Tue, 8 May 2012 11:08:44 +0900
> 
> > can I fix this up as a subsequent patch?
> 
> Pablo's tree needs to get respun to address the other feedback
> I gave, so no reason for him or you to not fix this as well.

Understood.

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the ceph tree
From: Stephen Rothwell @ 2012-05-08  3:09 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Eric Dumazet, Sage Weil, Sage Weil

[-- Attachment #1: Type: text/plain, Size: 1328 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in
net/ceph/osdmap.c between commit 8b3932690084 ("crush: warn on do_rule
failure") from the ceph tree and commit 95c961747284 ("net: cleanup
unsigned to unsigned int") from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc net/ceph/osdmap.c
index 2592f3c,56e561a..0000000
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@@ -991,11 -998,12 +991,11 @@@ int ceph_calc_object_layout(struct ceph
  			    struct ceph_file_layout *fl,
  			    struct ceph_osdmap *osdmap)
  {
- 	unsigned num, num_mask;
+ 	unsigned int num, num_mask;
  	struct ceph_pg pgid;
 -	s32 preferred = (s32)le32_to_cpu(fl->fl_pg_preferred);
  	int poolid = le32_to_cpu(fl->fl_pg_pool);
  	struct ceph_pg_pool_info *pool;
- 	unsigned ps;
+ 	unsigned int ps;
  
  	BUG_ON(!osdmap);
  
@@@ -1027,7 -1045,8 +1027,7 @@@ static int *calc_pg_raw(struct ceph_osd
  	struct ceph_pg_mapping *pg;
  	struct ceph_pg_pool_info *pool;
  	int ruleno;
- 	unsigned poolid, ps, pps, t, r;
 -	unsigned int poolid, ps, pps, t;
 -	int preferred;
++	unsigned int poolid, ps, pps, t, r;
  
  	poolid = le32_to_cpu(pgid.pool);
  	ps = le16_to_cpu(pgid.ps);

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PULL net-next] macvtap, vhost and virtio tools updates
From: David Miller @ 2012-05-08  3:06 UTC (permalink / raw)
  To: mst; +Cc: netdev, kvm, jasowang
In-Reply-To: <20120507205537.GA27950@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 7 May 2012 23:55:39 +0300

> There are mostly bugfixes here.
> I hope to merge some more patches by 3.5, in particular
> vlan support fixes are waiting for Eric's ack,
> and a version of tracepoint patch might be
> ready in time, but let's merge what's ready so it's testable.
> 
> This includes a ton of zerocopy fixes by Jason -
> good stuff but too intrusive for 3.4 and zerocopy is experimental
> anyway.
> 
> virtio supported delayed interrupt for a while now
> so adding support to the virtio tool made sense
> 
> Please pull into net-next and merge for 3.5.

Pulled, please give a review to:

http://patchwork.ozlabs.org/patch/156821/
http://patchwork.ozlabs.org/patch/156822/

Thanks.

^ permalink raw reply

* Re: [patch net-next] net: IP_MULTICAST_IF setsockopt now recognizes struct mreq
From: David Miller @ 2012-05-08  3:03 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, eric.dumazet, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1336120665-954-1-git-send-email-jpirko@redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Fri,  4 May 2012 10:37:45 +0200

> Until now, struct mreq has not been recognized and it was worked with
> as with struct in_addr. That means imr_multiaddr was copied to
> imr_address. So do recognize struct mreq here and copy that correctly.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Fair enough, applied.

^ permalink raw reply

* Re: [PATCH v6 3/3] netdev/of/phy: Add MDIO bus multiplexer driven by GPIO lines.
From: David Miller @ 2012-05-08  2:59 UTC (permalink / raw)
  To: ddaney.cavm
  Cc: ralf, grant.likely, rob.herring, devicetree-discuss, netdev,
	linux-kernel, linux-mips, afleming, galak, david.daney
In-Reply-To: <1336007799-31016-4-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <ddaney.cavm@gmail.com>
Date: Wed,  2 May 2012 18:16:39 -0700

> From: David Daney <david.daney@cavium.com>
> 
> The GPIO pins select which sub bus is connected to the master.
> 
> Initially tested with an sn74cbtlv3253 switch device wired into the
> MDIO bus.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>

Applied.

^ permalink raw reply

* Re: [PATCH v6 2/3] netdev/of/phy: Add MDIO bus multiplexer support.
From: David Miller @ 2012-05-08  2:58 UTC (permalink / raw)
  To: ddaney.cavm
  Cc: ralf, grant.likely, rob.herring, devicetree-discuss, netdev,
	linux-kernel, linux-mips, afleming, galak, david.daney
In-Reply-To: <1336007799-31016-3-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <ddaney.cavm@gmail.com>
Date: Wed,  2 May 2012 18:16:38 -0700

> From: David Daney <david.daney@cavium.com>
> 
> This patch adds a somewhat generic framework for MDIO bus
> multiplexers.  It is modeled on the I2C multiplexer.
> 
> The multiplexer is needed if there are multiple PHYs with the same
> address connected to the same MDIO bus adepter, or if there is
> insufficient electrical drive capability for all the connected PHY
> devices.
> 
> Conceptually it could look something like this:
 ...
> This framework configures the bus topology from device tree data.  The
> mechanics of switching the multiplexer is left to device specific
> drivers.
> 
> The follow-on patch contains a multiplexer driven by GPIO lines.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>

Applied.

^ permalink raw reply

* Re: [PATCH v6 1/3] netdev/of/phy: New function: of_mdio_find_bus().
From: David Miller @ 2012-05-08  2:58 UTC (permalink / raw)
  To: ddaney.cavm
  Cc: ralf, grant.likely, rob.herring, devicetree-discuss, netdev,
	linux-kernel, linux-mips, afleming, galak, david.daney
In-Reply-To: <1336007799-31016-2-git-send-email-ddaney.cavm@gmail.com>

From: David Daney <ddaney.cavm@gmail.com>
Date: Wed,  2 May 2012 18:16:37 -0700

> From: David Daney <david.daney@cavium.com>
> 
> Add of_mdio_find_bus() which allows an mii_bus to be located given its
> associated the device tree node.
> 
> This is needed by the follow-on patch to add a driver for MDIO bus
> multiplexers.
> 
> The of_mdiobus_register() function is modified so that the device tree
> node is recorded in the mii_bus.  Then we can find it again by
> iterating over all mdio_bus_class devices.
> 
> Because the OF device tree has now become an integral part of the
> kernel, this can live in mdio_bus.c (which contains the needed
> mdio_bus_class structure) instead of of_mdio.c.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>

Applied.

^ permalink raw reply

* Re: [PATCH] [IPV6] remove sysctl accept_source_route
From: David Miller @ 2012-05-08  2:56 UTC (permalink / raw)
  To: eldad; +Cc: kuznet, jmorris, yoshfuji, kaber, eric.dumazet, linux-kernel,
	netdev
In-Reply-To: <1335695830-19176-1-git-send-email-eldad@fogrefinery.com>

From: Eldad Zack <eldad@fogrefinery.com>
Date: Sun, 29 Apr 2012 12:37:10 +0200

> The only place where the accpet_source_route flag is checked is when we
> are processing the type 2 routing header. In that case we only allow it if
> it (1) has only segments left = 1 and (2) if it matches our home address,
> which is the behavior required by RFC 6275 (see sections 8.5, 11.3.3), and
> it doesn't make sense to block rh2 when we're a mobile node.
> 
> Signed-off-by: Eldad Zack <eldad@fogrefinery.com>

Considering commits:

commit c382bb9d32a55029fb13b118858e25908fab4617
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jul 10 22:47:58 2007 -0700

    [IPV6]: Restore semantics of Routing Header processing.
    
    The "fix" for emerging security threat was overkill and it broke
    basic semantic of IPv6 routing header processing.  We should assume
    RT0 (or even RT2, depends on configuration) as "unknown" RH type so
    that we
    - silently ignore the routing header if segleft == 0
    - send ICMPv6 Parameter Problem message back to the sender,
      otherwise.
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

and:

commit bb4dbf9e61d0801927e7df2569bb3dd8287ea301
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jul 10 22:55:49 2007 -0700

    [IPV6]: Do not send RH0 anymore.
    
    Based on <draft-ietf-ipv6-deprecate-rh0-00.txt>.
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

the current behavior seems very much intentional.

Secondly, we cannot just delete sysctls like this, if someone
depends upon whatever current behavior is we will break them.

Therefore, on either account, I cannot apply this patch.

Sorry.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox