Netdev List
 help / color / mirror / Atom feed
* Re: tc ipt action
From: Jan Engelhardt @ 2012-12-16 17:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
	netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <50CDA5BE.2080800@mojatatu.com>


On Sunday 2012-12-16 11:43, Jamal Hadi Salim wrote:

> On 12-12-15 07:59 PM, Jan Engelhardt wrote:
>>
>
>> For the C level, there is XTABLES_VERSION_CODE.
>>
>> #if XTABLES_VERSION_CODE >= 6
>> 	if (m != NULL && m->x6_parse != NULL)
>> 		m->x6_parse(...)
>> #else
>> 	else if (m != NULL && m->parse != NULL)
>> 		m->parse(...)
>> 	...
>>
>
> I think you are suggesting this to be done in tc. That would make it easier to
> fix.
> IMO, it is easier to keep backward compat if you left the old
> APIs around for a period of time

As you can see, the old ->parse() that goes back to libxtables.so.0
still remains. It's just that... only 5 out of 99 plugins still come
with an old parse function.

	[m_xt] -> [libxtables] <- (plugins: libxt_*.so)

> and maybe log a warning that they
> will be deprecated over a period of time (sort of like kernel approach to
> changing APIs).

old parse has not entered any deprecation stage yet, since there are still
plugins out there (both the 5 and external ones) that make use of it.
Right now, both parse and x6_parse are valid.

> BTW: another interface that seems to have changed that we
> need is m->final_check().

Yes, all those with an x6_ prefix are new.
Mh, I already dream of plans reducing m_xt to something that
does not require libxtables.so anymore.

^ permalink raw reply

* Re: [PATCH] build: unbreak linkage of m_xt.so
From: Jan Engelhardt @ 2012-12-16 17:43 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: stephen.hemminger, vapier, netdev, urykhy, shemonc, pablo,
	netfilter-devel
In-Reply-To: <50CDFEFB.2070208@mojatatu.com>

On Sunday 2012-12-16 18:03, Jamal Hadi Salim wrote:

> On 12-12-16 05:30 AM, Jamal Hadi Salim wrote:
>
>> I can confirm it builds fine for me now if i take out the hack I had and
>> use this patch.
>
>
> Sorry, I take what i said back and went back to explicitly adding -l xtables.
> The problem is still the intepretation of tc/Makefile. Here's the compile
> output.
> ----
> gcc -Wall -Wstrict-prototypes -O2 -I../include -DRESOLVE_HOSTNAMES
> -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -DCONFIG_GACT
> -DCONFIG_GACT_PROB -DIPT_LIB_DIR=\"/lib/xtables\" -DYY_NO_INPUT
> -Wl,-export-dynamic -shared -fpic -o m_xt.so m_xt.c $(pkg-config xtables
> --cflags --libs)
> ----

I saw the same during make, _but_, on running `ldd tc/m_xt.so`, I got a
libxtables.so entry, so I thought I was fine.



"$() "is something for the shell to expand, not make. See this testcase.

$ make
echo $(pkg-config xtables --cflags --libs)
-I/usr/include/iptables-1.4.16.3 -lxtables
$ cat Makefile 
a:
        echo $$(pkg-config xtables --cflags --libs)

^ permalink raw reply

* Re: tc ipt action
From: Jamal Hadi Salim @ 2012-12-16 17:47 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
	netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <alpine.LNX.2.01.1212161815310.27614@nerf07.vanv.qr>

On 12-12-16 12:21 PM, Jan Engelhardt wrote:
>
>
> As you can see, the old ->parse() that goes back to libxtables.so.0
> still remains. It's just that... only 5 out of 99 plugins still come
> with an old parse function.
>

I see.
So calling m->XXX may not be a wise long term solution.
Hasan's patch has a call to xtables_option_tpcall(), if that is the 
right interface I hope that going forward if any of the m->parseXX
changes you will take care of hiding all that stuff.

> old parse has not entered any deprecation stage yet, since there are still
> plugins out there (both the 5 and external ones) that make use of it.
> Right now, both parse and x6_parse are valid.
>

True - but we are getting broken because we are using a call that only 5 
or so users provide. It would have saved us time if we got the
a good log message instead of checking for if !m->parse()

> Yes, all those with an x6_ prefix are new.
> Mh, I already dream of plans reducing m_xt to something that
> does not require libxtables.so anymore.
>

That would be nice - but someone is going to have to link to libxtables, no?

cheers,
jamal

^ permalink raw reply

* Re: [PATCH] build: unbreak linkage of m_xt.so
From: Jamal Hadi Salim @ 2012-12-16 18:05 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, vapier, netdev, urykhy, shemonc, pablo,
	netfilter-devel
In-Reply-To: <alpine.LNX.2.01.1212161841070.27614@nerf07.vanv.qr>

On 12-12-16 12:43 PM, Jan Engelhardt wrote:
> On Sunday 2012-12-16 18:03, Jamal Hadi Salim wrote:
>

>
> I saw the same during make, _but_, on running `ldd tc/m_xt.so`, I got a
> libxtables.so entry, so I thought I was fine.
>

Sorry, you are right. Without your patch that doesnt happen. I had 
removed the global dlopen while debugging, so it was using the wrong
m_xt.so

So my Ack is back on;->

cheers,
jamal

^ permalink raw reply

* Re: tc ipt action
From: Jamal Hadi Salim @ 2012-12-16 18:59 UTC (permalink / raw)
  To: Hasan Chowdhury
  Cc: Jan Engelhardt, Yury Stankevich, netdev@vger.kernel.org, pablo,
	netfilter-devel
In-Reply-To: <50CDFB6A.3090806@mojatatu.com>

Hasan,

I can confirm that xtables_options_xfrm() works.
Just suspicious of that call, "xfrm" seems too specific to xfrm
subsystem but generic enough.
Dont bother resending the patch, it works right now, I am just
waiting for confirmation from Pablo/Jan and i will proceed from there.

I am also still struggling with whether to add your intermediate
solution to rename the xt->ipt.
I think i will go ahead and add a kernel change.

cheers,
jamal

^ permalink raw reply

* Re: tc ipt action
From: Jan Engelhardt @ 2012-12-16 18:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
	netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <50CE0921.7060306@mojatatu.com>


On Sunday 2012-12-16 18:47, Jamal Hadi Salim wrote:
>
>> old parse has not entered any deprecation stage yet, since there are still
>> plugins out there (both the 5 and external ones) that make use of it.
>> Right now, both parse and x6_parse are valid.
>
> True - but we are getting broken because we are using a call that only 5 or so
> users provide. It would have saved us time if we got the
> a good log message instead of checking for if !m->parse()

A certainly safe bet would be to write, at the top of tc/m_xt.c,

#if XTABLES_VERSION_CODE > 9
#	error Someone call the guy who changed iptables and \
		make him fix it^W^W^W^W say you need help.
#endif

Then I would automatically notify myself of "oh I need fix that too" when I
feed any new releases of {iptables, iproute} through the Open Build Service.

>> Yes, all those with an x6_ prefix are new.
>> Mh, I already dream of plans reducing m_xt to something that
>> does not require libxtables.so anymore.
>
> That would be nice - but someone is going to have to link to libxtables, no?

I hope the complete opposite.

The following is a rough, it-compiles, untested never-run, draft of a
module. The vision here is that userspace only ever sends a chain
name to the kernel. The git tree/branch for it is

	git://git.inai.de/linux xt2-pktsched

commit 42c559c148cbbc22bf2cc29f2ad08bc330891838

    net_sched: act: new action to call into Xtables2 chains

 include/net/netfilter/xt_core.h    |    8 ++
 include/uapi/linux/tc_act/tc_ipt.h |    2 +
 net/netfilter/xt_core.c            |    3 +-
 net/sched/Kconfig                  |    9 ++
 net/sched/Makefile                 |    1 +
 net/sched/act_xtables.c            |  238 ++++++++++++++++++++++++++++++++++++
 6 files changed, 260 insertions(+), 1 deletion(-)

^ permalink raw reply

* Re: tc ipt action
From: Jan Engelhardt @ 2012-12-16 19:13 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Hasan Chowdhury, Yury Stankevich, netdev@vger.kernel.org, pablo,
	netfilter-devel
In-Reply-To: <50CE1A04.1000405@mojatatu.com>

On Sunday 2012-12-16 19:59, Jamal Hadi Salim wrote:

> Hasan,
>
> I can confirm that xtables_options_xfrm() works.
> Just suspicious of that call, "xfrm" seems too specific to xfrm
> subsystem but generic enough.

Heh, nah.

"xfrm" is one of these pictogram-based abbreviations like "xing" (the 
thing they paint on roads/signs), apparently standing for "trans"form 
and "cross"ing, respectively, though 'x' is ambiguous and open to a lot 
of other interpretations.

^ permalink raw reply

* Re: tc ipt action
From: Jamal Hadi Salim @ 2012-12-16 20:35 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
	netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <alpine.LNX.2.01.1212161850530.27614@nerf07.vanv.qr>

On 12-12-16 01:59 PM, Jan Engelhardt wrote:
>
>
> A certainly safe bet would be to write, at the top of tc/m_xt.c,
>
> #if XTABLES_VERSION_CODE > 9
> #	error Someone call the guy who changed iptables and \
> 		make him fix it^W^W^W^W say you need help.
> #endif
>

Excellent idea ;->


> The following is a rough, it-compiles, untested never-run, draft of a
> module. The vision here is that userspace only ever sends a chain
> name to the kernel. The git tree/branch for it is
>
> 	git://git.inai.de/linux xt2-pktsched
>
> commit 42c559c148cbbc22bf2cc29f2ad08bc330891838
>


I'll look at it later - very slow connection at the moment so cloning 
will take a while.

cheers,
jamal

^ permalink raw reply

* Re: tc ipt action
From: Jamal Hadi Salim @ 2012-12-16 20:36 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Hasan Chowdhury, Yury Stankevich, netdev@vger.kernel.org, pablo,
	netfilter-devel
In-Reply-To: <alpine.LNX.2.01.1212162003340.27614@nerf07.vanv.qr>

On 12-12-16 02:13 PM, Jan Engelhardt wrote:
>
> "xfrm" is one of these pictogram-based abbreviations like "xing" (the
> thing they paint on roads/signs), apparently standing for "trans"form
> and "cross"ing, respectively, though 'x' is ambiguous and open to a lot
> of other interpretations.

Ok, In that case i'll push half of Hasan's patch. I have a kernel change
that works with it.

cheers,
jamal

^ permalink raw reply

* [PATCH] iproute2:  act_ipt fix xtables breakage
From: Jamal Hadi Salim @ 2012-12-16 20:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hasan Chowdhury, Jan Engelhardt, Yury Stankevich,
	netdev@vger.kernel.org, pablo, netfilter-devel
In-Reply-To: <alpine.LNX.2.01.1212162003340.27614@nerf07.vanv.qr>

[-- Attachment #1: Type: text/plain, Size: 307 bytes --]


Attached.

I am going to send a kernel patch as well.
There is an "intermediate solution" from Hasan which doesnt require
the kernel change. It changes the kernel endpoint to "ipt". I am
conflicted because it is a quick hack while otoh forcing people to
upgrade kernel is a usability issue.

cheers,
jamal

[-- Attachment #2: patch-xt --]
[-- Type: text/plain, Size: 4163 bytes --]

commit ff707eaa1fd51435958ae2afcd09d4b3600160b4
Author: Hasan Chowdhury <shemonc@gmail.com>
Date:   Sun Dec 16 14:09:38 2012 -0500

    Fixes breakage with xtables API starting with version 1.4.10
    
    Signed-off-by: Hasan Chowdhury <shemonc@gmail.com>
    Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

diff --git a/tc/m_xt.c b/tc/m_xt.c
index bcc4d75..e1c3f38 100644
--- a/tc/m_xt.c
+++ b/tc/m_xt.c
@@ -118,6 +118,7 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 	struct xtables_target *m = NULL;
 	struct ipt_entry fw;
 	struct rtattr *tail;
+
 	int c;
 	int rargc = *argc_p;
 	char **argv = *argv_p;
@@ -126,6 +127,7 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 	int size = 0;
 	int iok = 0, ok = 0;
 	__u32 hook = 0, index = 0;
+	struct option *opts = NULL;
 
 	xtables_init_all(&tcipt_globals, NFPROTO_IPV4);
 	set_lib_dir();
@@ -158,14 +160,22 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 					printf(" %s error \n", m->name);
 					return -1;
 				}
-				tcipt_globals.opts =
-				    xtables_merge_options(
 #if (XTABLES_VERSION_CODE >= 6)
-				        tcipt_globals.orig_opts,
+			opts = xtables_options_xfrm(tcipt_globals.orig_opts,
+						    tcipt_globals.opts, 
+						    m->x6_options,
+						    &m->option_offset);
+#else                                   
+			opts = xtables_merge_options(tcipt_globals.orig_opts,
+						     tcipt_globals.opts,
+						     m->extra_opts,
+						     &m->option_offset); 
 #endif
-				        tcipt_globals.opts,
-				        m->extra_opts,
-				        &m->option_offset);
+			if (opts == NULL) {
+				fprintf(stderr, " failed to find aditional options for target %s\n\n", optarg);
+				return -1;
+			} else
+				tcipt_globals.opts = opts;
 			} else {
 				fprintf(stderr," failed to find target %s\n\n", optarg);
 				return -1;
@@ -175,17 +185,21 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 
 		default:
 			memset(&fw, 0, sizeof (fw));
-			if (m) {
-				m->parse(c - m->option_offset, argv, 0,
-					 &m->tflags, NULL, &m->t);
+#if (XTABLES_VERSION_CODE >= 6)
+		if (m != NULL && m->x6_parse != NULL ) {
+			xtables_option_tpcall(c, argv, 0 , m, NULL);
+#else
+		if (m != NULL && m->parse != NULL ) {
+			m->parse(c - m->option_offset, argv, 0, &m->tflags,
+				 NULL, &m->t);
+#endif
 			} else {
-				fprintf(stderr," failed to find target %s\n\n", optarg);
+				fprintf(stderr,"failed to find target %s\n\n", optarg);
 				return -1;
 
 			}
 			ok++;
 			break;
-
 		}
 	}
 
@@ -208,8 +222,13 @@ static int parse_ipt(struct action_util *a,int *argc_p,
 	}
 
 	/* check that we passed the correct parameters to the target */
+#if (XTABLES_VERSION_CODE >= 6)
+	if (m)
+		xtables_option_tfcall(m);
+#else
 	if (m && m->final_check)
 		m->final_check(m->tflags);
+#endif
 
 	{
 		struct tcmsg *t = NLMSG_DATA(n);
@@ -271,6 +290,7 @@ print_ipt(struct action_util *au,FILE * f, struct rtattr *arg)
 {
 	struct rtattr *tb[TCA_IPT_MAX + 1];
 	struct xt_entry_target *t = NULL;
+	struct option *opts = NULL;
 
 	if (arg == NULL)
 		return -1;
@@ -309,14 +329,22 @@ print_ipt(struct action_util *au,FILE * f, struct rtattr *arg)
 				return -1;
 			}
 
-			tcipt_globals.opts =
-			    xtables_merge_options(
 #if (XTABLES_VERSION_CODE >= 6)
-				                  tcipt_globals.orig_opts,
+		opts = xtables_options_xfrm(tcipt_globals.orig_opts,
+					    tcipt_globals.opts,
+					    m->x6_options,
+					    &m->option_offset);
+#else                                   
+		opts = xtables_merge_options(tcipt_globals.orig_opts,
+					     tcipt_globals.opts,
+					     m->extra_opts,
+					     &m->option_offset); 
 #endif
-				                  tcipt_globals.opts,
-			                          m->extra_opts,
-			                          &m->option_offset);
+	if (opts == NULL) {
+		fprintf(stderr, " failed to find aditional options for target %s\n\n", optarg);
+		return -1;
+	} else
+		tcipt_globals.opts = opts;
 		} else {
 			fprintf(stderr, " failed to find target %s\n\n",
 				t->u.user.name);
@@ -355,4 +383,3 @@ struct action_util xt_action_util = {
         .parse_aopt = parse_ipt,
         .print_aopt = print_ipt,
 };
-

^ permalink raw reply related

* Re: tc ipt action
From: Jan Engelhardt @ 2012-12-16 21:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Pablo Neira Ayuso, Yury Stankevich, shemonc,
	netdev@vger.kernel.org, netfilter-devel
In-Reply-To: <50CE307E.40304@mojatatu.com>

On Sunday 2012-12-16 21:35, Jamal Hadi Salim wrote:

>> 	git://git.inai.de/linux xt2-pktsched
>> commit 42c559c148cbbc22bf2cc29f2ad08bc330891838
>
> I'll look at it later - very slow connection at the moment so cloning will take
> a while.

If you have a preexisting clone of any linux tree, you can utilize
`git remote add ...` to only grab the deltas.

^ permalink raw reply

* Re: [PATCH] add a `make dist` helper
From: Mike Frysinger @ 2012-12-16 21:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20121214090935.7d1706fd@nehalam.linuxnetplumber.net>

[-- Attachment #1: Type: Text/Plain, Size: 380 bytes --]

On Friday 14 December 2012 12:09:35 Stephen Hemminger wrote:
> On Thu, 13 Dec 2012 23:16:10 -0800 Stephen Hemminger wrote:
> > I appreciate the effort but there are a number of more steps to doing a
> > release and I need to script them all together.

np

> The tarball's have been rebased, and I built a iproute2-release script for
> next time.

commit it to the tree ? :)
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH] build: unbreak linkage of m_xt.so
From: Mike Frysinger @ 2012-12-16 22:02 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: stephen.hemminger, netdev, jhs, urykhy, shemonc, pablo,
	netfilter-devel
In-Reply-To: <1355617968-26138-1-git-send-email-jengelh@inai.de>

[-- Attachment #1: Type: Text/Plain, Size: 792 bytes --]

On Saturday 15 December 2012 19:32:48 Jan Engelhardt wrote:
> --- a/configure
> +++ b/configure
> @@ -4,7 +4,6 @@
>  INCLUDE=${1:-"$PWD/include"}
> 
>  : ${PKG_CONFIG:=pkg-config}
>  : ${CC=gcc}
> 
> -echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
> 
>  # Make a temp directory in build tree.
>  TMPDIR=$(mktemp -d config.XXXXXX)
> @@ -224,6 +223,7 @@ rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
>  }
> 
>  echo "# Generated config based on" $INCLUDE >Config
> +echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
> 
>  echo "TC schedulers"

the use of un-indented shell functions makes the code read in a way it doesn't 
actually execute.  i'd suggest moving this logic into a function to match 
existing style rather than simply moving the Config write.  i'll post a patch.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH 1/3] configure: move toolchain init to a function
From: Mike Frysinger @ 2012-12-16 22:09 UTC (permalink / raw)
  To: stephen.hemminger, netdev; +Cc: jengelh

The layout of this file uses functions to update Config.  Move the
toolchain logic to the same style to fix setting the vars in Config.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 configure | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 9912114..ea1038d 100755
--- a/configure
+++ b/configure
@@ -2,14 +2,19 @@
 # This is not an autconf generated configure
 #
 INCLUDE=${1:-"$PWD/include"}
-: ${PKG_CONFIG:=pkg-config}
-: ${CC=gcc}
-echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
 
 # Make a temp directory in build tree.
 TMPDIR=$(mktemp -d config.XXXXXX)
 trap 'status=$?; rm -rf $TMPDIR; exit $status' EXIT HUP INT QUIT TERM
 
+check_toolchain()
+{
+: ${PKG_CONFIG:=pkg-config}
+: ${CC=gcc}
+echo "CC:=${CC}" >>Config
+echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
+}
+
 check_atm()
 {
 cat >$TMPDIR/atmtest.c <<EOF
@@ -224,6 +229,7 @@ rm -f $TMPDIR/ipsettest.c $TMPDIR/ipsettest
 }
 
 echo "# Generated config based on" $INCLUDE >Config
+check_toolchain
 
 echo "TC schedulers"
 
-- 
1.8.0

^ permalink raw reply related

* [PATCH 2/3] lib: include the Config file too
From: Mike Frysinger @ 2012-12-16 22:09 UTC (permalink / raw)
  To: stephen.hemminger, netdev; +Cc: jengelh
In-Reply-To: <1355695757-9957-1-git-send-email-vapier@gentoo.org>

The lib makefile doesn't include Config which means it misses
setting up toolchain vars that it includes.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 lib/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/Makefile b/lib/Makefile
index bfbe672..a42b885 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -1,3 +1,5 @@
+include ../Config
+
 CFLAGS += -fPIC
 
 UTILOBJ=utils.o rt_names.o ll_types.o ll_proto.o ll_addr.o inet_proto.o
-- 
1.8.0

^ permalink raw reply related

* [PATCH 3/3] configure: pull AR from the env too
From: Mike Frysinger @ 2012-12-16 22:09 UTC (permalink / raw)
  To: stephen.hemminger, netdev; +Cc: jengelh
In-Reply-To: <1355695757-9957-1-git-send-email-vapier@gentoo.org>

This matches the existing CC behavior.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index ea1038d..7c2db9b 100755
--- a/configure
+++ b/configure
@@ -10,7 +10,9 @@ trap 'status=$?; rm -rf $TMPDIR; exit $status' EXIT HUP INT QUIT TERM
 check_toolchain()
 {
 : ${PKG_CONFIG:=pkg-config}
+: ${AR=ar}
 : ${CC=gcc}
+echo "AR:=${AR}" >>Config
 echo "CC:=${CC}" >>Config
 echo "PKG_CONFIG:=${PKG_CONFIG}" >>Config
 }
-- 
1.8.0

^ permalink raw reply related

* Re: [PATCH v2] netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE
From: Pablo Neira Ayuso @ 2012-12-16 22:33 UTC (permalink / raw)
  To: Jozsef Kadlecsik; +Cc: Andrew Collins, netfilter-devel, netdev
In-Reply-To: <alpine.DEB.2.00.1212130918410.2048@blackhole.kfki.hu>

On Thu, Dec 13, 2012 at 09:19:27AM +0100, Jozsef Kadlecsik wrote:
> On Wed, 12 Dec 2012, Andrew Collins wrote:
> 
> > The MASQUERADE target now handles routing changes which affect
> > the output interface of a connection, but only for ESTABLISHED
> > connections.  It is also possible for NEW connections which
> > already have a conntrack entry to be affected by routing changes.
> > 
> > This adds a check to drop entries in the NEW+conntrack state
> > when the oif has changed.
> > 
> > Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
> 
> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

Applied, thanks guys.

^ permalink raw reply

* Re: [PATCH] Fix comment for packets without data
From: Pablo Neira Ayuso @ 2012-12-16 22:38 UTC (permalink / raw)
  To: Rick Jones; +Cc: Florent Fourcot, yoshfuji, netdev, netfilter-devel
In-Reply-To: <50CB9A16.1090006@hp.com>

On Fri, Dec 14, 2012 at 01:28:54PM -0800, Rick Jones wrote:
> On 12/14/2012 02:53 AM, Florent Fourcot wrote:
> >Remove ambiguity of double negation
> >
> >Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
> >---
> >  net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> >index 00ee17c..137e245 100644
> >--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> >+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
> >@@ -81,8 +81,8 @@ static int ipv6_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
> >  	}
> >  	protoff = ipv6_skip_exthdr(skb, extoff, &nexthdr, &frag_off);
> >  	/*
> >-	 * (protoff == skb->len) mean that the packet doesn't have no data
> >-	 * except of IPv6 & ext headers. but it's tracked anyway. - YK
> >+	 * (protoff == skb->len) means the packet has not data, just
> >+	 * IPv6 and possibly extensions headers, but it is tracked anyway
> >  	 */
> >  	if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
> >  		pr_debug("ip6_conntrack_core: can't find proto in pkt\n");
> >
> 
> Acked-by: Rick Jones <rick.jones2@hp.com>

Applied, thanks.

That was many discussion for a patch to fix a comment, nice indeed :-)

^ permalink raw reply

* Re: [PATCH v3] netfilter: nf_conntrack_sip: Handle Cisco 7941/7945 IP phones
From: David Woodhouse @ 2012-12-17  0:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kevin Cernekee, Patrick McHardy, David S. Miller,
	Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, netfilter-devel, netfilter, coreteam,
	linux-kernel, netdev
In-Reply-To: <1290412334.2756.141.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 6988 bytes --]

On Mon, 2010-11-22 at 08:52 +0100, Eric Dumazet wrote:
> Le dimanche 21 novembre 2010 à 18:40 -0800, Kevin Cernekee a écrit :
> > [v3:
> >   Only activate the new forced_dport logic if the IP matches, but the
> >   port does not. ]
> > 
> > Most SIP devices use a source port of 5060/udp on SIP requests, so the
> > response automatically comes back to port 5060:
> > 
> > phone_ip:5060 -> proxy_ip:5060   REGISTER
> > proxy_ip:5060 -> phone_ip:5060   100 Trying
> > 
> > The newer Cisco IP phones, however, use a randomly chosen high source
> > port for the SIP request but expect the response on port 5060:
> > 
> > phone_ip:49173 -> proxy_ip:5060  REGISTER
> > proxy_ip:5060 -> phone_ip:5060   100 Trying
> > 
> > Standard Linux NAT, with or without nf_nat_sip, will send the reply back
> > to port 49173, not 5060:
> > 
> > phone_ip:49173 -> proxy_ip:5060  REGISTER
> > proxy_ip:5060 -> phone_ip:49173  100 Trying
> > 
> > But the phone is not listening on 49173, so it will never see the reply.
> > 
> > This patch modifies nf_*_sip to work around this quirk by extracting
> > the SIP response port from the Via: header, iff the source IP in the
> > packet header matches the source IP in the SIP request.
> > 
> > Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
> > ---
> 
> Thanks for doing this work Keven !
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

What happened to this? OpenWRT is still carrying it, and it broke in
3.7. Here's a completely untested update...

 +              if (!skb_make_writable(skb, skb->len))
 +                      return NF_DROP;
 +
-+              uh = (struct udphdr *)(skb->data + ip_hdrlen(skb));
++              uh = (void *)skb->data + protoff;
 +              uh->dest = ct_sip_info->forced_dport;
 +
-+              if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, 0, 0, NULL, 0))
++              if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, protoff, 0, 0, NU
 +                      return NF_DROP;
 +      }
 +
        return NF_ACCEPT;
  }
  


diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h
index 387bdd0..ba7f571 100644
--- a/include/linux/netfilter/nf_conntrack_sip.h
+++ b/include/linux/netfilter/nf_conntrack_sip.h
@@ -4,12 +4,15 @@
 
 #include <net/netfilter/nf_conntrack_expect.h>
 
+#include <linux/types.h>
+
 #define SIP_PORT	5060
 #define SIP_TIMEOUT	3600
 
 struct nf_ct_sip_master {
 	unsigned int	register_cseq;
 	unsigned int	invite_cseq;
+	__be16		forced_dport;
 };
 
 enum sip_expectation_classes {
diff --git a/net/netfilter/nf_conntrack_sip.c b/net/netfilter/nf_conntrack_sip.c
index df8f4f2..72a67bb 100644
--- a/net/netfilter/nf_conntrack_sip.c
+++ b/net/netfilter/nf_conntrack_sip.c
@@ -1440,8 +1440,25 @@ static int process_sip_request(struct sk_buff *skb, unsigned int protoff,
 {
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
+	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
 	unsigned int matchoff, matchlen;
 	unsigned int cseq, i;
+	union nf_inet_addr addr;
+	__be16 port;
+
+	/* Many Cisco IP phones use a high source port for SIP requests, but
+	 * listen for the response on port 5060.  If we are the local
+	 * router for one of these phones, save the port number from the
+	 * Via: header so that nf_nat_sip can redirect the responses to
+	 * the correct port.
+	 */
+	if (ct_sip_parse_header_uri(ct, *dptr, NULL, *datalen,
+				    SIP_HDR_VIA_UDP, NULL, &matchoff,
+				    &matchlen, &addr, &port) > 0 &&
+	    port != ct->tuplehash[dir].tuple.src.u.udp.port &&
+	    nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.src.u3))
+		ct_sip_info->forced_dport = port;
 
 	for (i = 0; i < ARRAY_SIZE(sip_handlers); i++) {
 		const struct sip_handler *handler;
diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c
index 16303c7..552e270 100644
--- a/net/netfilter/nf_nat_sip.c
+++ b/net/netfilter/nf_nat_sip.c
@@ -95,6 +95,7 @@ static int map_addr(struct sk_buff *skb, unsigned int protoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	char buffer[INET6_ADDRSTRLEN + sizeof("[]:nnnnn")];
 	unsigned int buflen;
 	union nf_inet_addr newaddr;
@@ -107,7 +108,8 @@ static int map_addr(struct sk_buff *skb, unsigned int protoff,
 	} else if (nf_inet_addr_cmp(&ct->tuplehash[dir].tuple.dst.u3, addr) &&
 		   ct->tuplehash[dir].tuple.dst.u.udp.port == port) {
 		newaddr = ct->tuplehash[!dir].tuple.src.u3;
-		newport = ct->tuplehash[!dir].tuple.src.u.udp.port;
+		newport = ct_sip_info->forced_dport ? ct_sip_info->forced_dport :
+			  ct->tuplehash[!dir].tuple.src.u.udp.port;
 	} else
 		return 1;
 
@@ -144,6 +146,7 @@ static unsigned int nf_nat_sip(struct sk_buff *skb, unsigned int protoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	unsigned int coff, matchoff, matchlen;
 	enum sip_header_types hdr;
 	union nf_inet_addr addr;
@@ -258,6 +261,20 @@ next:
 	    !map_sip_addr(skb, protoff, dataoff, dptr, datalen, SIP_HDR_TO))
 		return NF_DROP;
 
+	/* Mangle destination port for Cisco phones, then fix up checksums */
+	if (dir == IP_CT_DIR_REPLY && ct_sip_info->forced_dport) {
+		struct udphdr *uh;
+
+		if (!skb_make_writable(skb, skb->len))
+			return NF_DROP;
+
+		uh = (void *)skb->data + protoff;
+		uh->dest = ct_sip_info->forced_dport;
+
+		if (!nf_nat_mangle_udp_packet(skb, ct, ctinfo, protoff, 0, 0, NULL, 0))
+			return NF_DROP;
+	}
+
 	return NF_ACCEPT;
 }
 
@@ -311,8 +328,10 @@ static unsigned int nf_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo);
+	struct nf_ct_sip_master *ct_sip_info = nfct_help_data(ct);
 	union nf_inet_addr newaddr;
 	u_int16_t port;
+	__be16 srcport;
 	char buffer[INET6_ADDRSTRLEN + sizeof("[]:nnnnn")];
 	unsigned int buflen;
 
@@ -326,8 +345,9 @@ static unsigned int nf_nat_sip_expect(struct sk_buff *skb, unsigned int protoff,
 	/* If the signalling port matches the connection's source port in the
 	 * original direction, try to use the destination port in the opposite
 	 * direction. */
-	if (exp->tuple.dst.u.udp.port ==
-	    ct->tuplehash[dir].tuple.src.u.udp.port)
+	srcport = ct_sip_info->forced_dport ? ct_sip_info->forced_dport :
+		  ct->tuplehash[dir].tuple.src.u.udp.port;
+	if (exp->tuple.dst.u.udp.port == srcport)
 		port = ntohs(ct->tuplehash[!dir].tuple.dst.u.udp.port);
 	else
 		port = ntohs(exp->tuple.dst.u.udp.port);

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 6171 bytes --]

^ permalink raw reply related

* Re: [PATCH v3] netfilter: nf_conntrack_sip: Handle Cisco 7941/7945 IP phones
From: Pablo Neira Ayuso @ 2012-12-17  0:44 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Eric Dumazet, Kevin Cernekee, Patrick McHardy, David S. Miller,
	Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, netfilter-devel, netfilter, coreteam,
	linux-kernel, netdev
In-Reply-To: <1355703441.18919.6.camel@shinybook.infradead.org>

Hi David,

On Mon, Dec 17, 2012 at 12:17:21AM +0000, David Woodhouse wrote:
> On Mon, 2010-11-22 at 08:52 +0100, Eric Dumazet wrote:
> > Le dimanche 21 novembre 2010 à 18:40 -0800, Kevin Cernekee a écrit :
> > > [v3:
> > >   Only activate the new forced_dport logic if the IP matches, but the
> > >   port does not. ]
> > > 
> > > Most SIP devices use a source port of 5060/udp on SIP requests, so the
> > > response automatically comes back to port 5060:
> > > 
> > > phone_ip:5060 -> proxy_ip:5060   REGISTER
> > > proxy_ip:5060 -> phone_ip:5060   100 Trying
> > > 
> > > The newer Cisco IP phones, however, use a randomly chosen high source
> > > port for the SIP request but expect the response on port 5060:
> > > 
> > > phone_ip:49173 -> proxy_ip:5060  REGISTER
> > > proxy_ip:5060 -> phone_ip:5060   100 Trying
> > > 
> > > Standard Linux NAT, with or without nf_nat_sip, will send the reply back
> > > to port 49173, not 5060:
> > > 
> > > phone_ip:49173 -> proxy_ip:5060  REGISTER
> > > proxy_ip:5060 -> phone_ip:49173  100 Trying
> > > 
> > > But the phone is not listening on 49173, so it will never see the reply.
> > > 
> > > This patch modifies nf_*_sip to work around this quirk by extracting
> > > the SIP response port from the Via: header, iff the source IP in the
> > > packet header matches the source IP in the SIP request.
> > > 
> > > Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
> > > ---
> > 
> > Thanks for doing this work Keven !
> > 
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> What happened to this? OpenWRT is still carrying it, and it broke in
> 3.7. Here's a completely untested update...

I requested Kevin to resend a new version based on the current kernel
tree while spinning on old pending patches since I have no access to
that hardware, but no luck.

So I'll review this and, since OpenWRT is carrying, I guess we can get
this into net-next merge window.

Thanks for the reminder.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* openconnect triggers soft lockup in __skb_get_rxhash
From: Kirill A. Shutemov @ 2012-12-17  0:56 UTC (permalink / raw)
  To: Maxim Krasnyansky, David S. Miller; +Cc: netdev, David Woodhouse

Hi,

In few minutes after starting openconnect it starts consume 100% CPU and I
can see soft lockup report in dmesg:

[  231.684591] BUG: soft lockup - CPU#3 stuck for 22s! [openconnect:3537]
[  231.684595] Modules linked in: rfcomm bnep iwldvm btusb iwlwifi bluetooth acpi_cpufreq container mperf thermal battery ac processor
[  231.684607] CPU 3 
[  231.684610] Pid: 3537, comm: openconnect Not tainted 3.7.0-08585-g2a74dbb #165 Hewlett-Packard HP EliteBook 8440p/172A
[  231.684612] RIP: 0010:[<ffffffff815f3f62>]  [<ffffffff815f3f62>] skb_flow_dissect+0x52/0x3b0
[  231.684621] RSP: 0018:ffff88012ebbdc48  EFLAGS: 00000246
[  231.684622] RAX: 0000000000000004 RBX: ffffffff8173b360 RCX: ff040404ff040404
[  231.684623] RDX: 0000000000000000 RSI: ffff88012ebbdcc8 RDI: ffff880122770e00
[  231.684624] RBP: ffff88012ebbdcb8 R08: 00000000000004f2 R09: ffff88012faa6000
[  231.684626] R10: ffff880122770e00 R11: 0000000000000000 R12: ffffffff8173b476
[  231.684627] R13: 0000000010000002 R14: ffff88012ebbc000 R15: 0000000000000004
[  231.684628] FS:  00007f31249d9740(0000) GS:ffff880132e00000(0000) knlGS:0000000000000000
[  231.684630] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  231.684631] CR2: 00007fed86cca000 CR3: 0000000122489000 CR4: 00000000000007e0
[  231.684632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  231.684633] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  231.684635] Process openconnect (pid: 3537, threadinfo ffff88012ebbc000, task ffff8801305be4a0)
[  231.684636] Stack:
[  231.684637]  ffff88012ebbdce8 ffffffff815f0072 ffff8801305be4a0 00000000000004f2
[  231.684640]  0000000000000000 00000000000004f2 00000000000004f2 0000000000000000
[  231.684643]  ffff88012ebbdce8 00000000748aa9b5 ffff880122770e00 ffff88012ebbddf0
[  231.684646] Call Trace:
[  231.684651]  [<ffffffff815f0072>] ? memcpy_fromiovecend+0x72/0xc0
[  231.684654]  [<ffffffff815f71fa>] __skb_get_rxhash+0x1a/0xd0
[  231.684659]  [<ffffffff814a04a8>] tun_get_user+0x5b8/0x7b0
[  231.684662]  [<ffffffff8149ea79>] ? __tun_get+0x59/0x80
[  231.684664]  [<ffffffff814a07b1>] tun_chr_aio_write+0x81/0xb0
[  231.684670]  [<ffffffff810e342e>] ? put_lock_stats.isra.15+0xe/0x40
[  231.684675]  [<ffffffff811ac857>] do_sync_write+0xa7/0xe0
[  231.684678]  [<ffffffff811acf7b>] vfs_write+0xab/0x190
[  231.684681]  [<ffffffff811ad2f5>] sys_write+0x55/0xb0
[  231.684684]  [<ffffffff81742906>] system_call_fastpath+0x1a/0x1f
[  231.684685] Code: 31 c0 44 8b a7 a0 00 00 00 44 0f b7 6f 76 4c 03 a7 b0 00 00 00 44 2b a7 b8 00 00 00 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 <66> 41 81 fd 81 00 0f 84 72 01 00 00 77 70 66 41 83 fd 08 74 29 

-- 
 Kirill A. Shutemov

^ permalink raw reply

* Re: request to queue patches for stable
From: CAI Qian @ 2012-12-17  1:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121214.153825.1087482329989146130.davem@davemloft.net>



----- Original Message -----
> From: "David Miller" <davem@davemloft.net>
> To: caiqian@redhat.com
> Cc: netdev@vger.kernel.org
> Sent: Saturday, December 15, 2012 4:38:25 AM
> Subject: Re: request to queue patches for stable
> 
> From: CAI Qian <caiqian@redhat.com>
> Date: Mon, 10 Dec 2012 21:03:15 -0500 (EST)
> 
> > 
> > 
> > ----- Original Message -----
> >> From: "David Miller" <davem@davemloft.net>
> >> To: caiqian@redhat.com
> >> Cc: greg@kroah.com, stable@vger.kernel.org, mbizon@freebox.fr,
> >> ja@ssi.bg
> >> Sent: Friday, December 7, 2012 11:23:21 AM
> >> Subject: Re: [PATCH] ipv4: do not cache looped multicasts
> >> 
> >> From: CAI Qian <caiqian@redhat.com>
> >> Date: Thu, 6 Dec 2012 21:56:35 -0500 (EST)
> >> 
> >> > OK, I have a few network patches in the queue that looks
> >> > applicable
> >> > to
> >> > the stable as well. I think I'll send them out here too to seek
> >> > their
> >> > ACKs. David, please let me know if I should stop doing this.
> >> 
> >> Please stop doing this.
> >> 
> >> If you want networking patches to reach stable, first
> >> consult:
> >> 
> >> http://patchwork.ozlabs.org/bundle/davem/stable/
> >> 
> >> to see if the patch you want isn't queued up already.
> >> 
> >> If it is not, ask me to queue it up on netdev@vger.kernel.org
> >> 
> >> But note that I like to let networking patches "cook" upstream
> >> in Linus's tree for a certain amount of time before I submit
> >> them to -stable.  There can be up to even a week or two.
> > Dave, the following patches looks applicable for the stable
> > releases. Please queue them up if you agree.
> > 
> > 0e376bd0b791ac6ac6bdb051492df0769c840848 (for 3.0.x, 3.4.x and
> > 3.6.x)
> > e196c0e579902f42cf72414461fb034e5a1ffbf7 (for 3.0.x, 3.4.x and
> > 3.6.x)
> > 6e51fe7572590d8d86e93b547fab6693d305fd0d (for 3.0.x, 3.4.x and
> > 3.6.x)
> > e1a676424c290b1c8d757e3860170ac7ecd89af4 (for 3.6.x)
> > 636174219b52b5a8bc51bc23bbcba97cd30a65e3 (for 3.6.x)
> 
> What is the point of my publishing the pending networking -stable
> queue if you're not even going to check it?  Those last two patches
> were already queued up.
Dave, Yes, I did check the link you gave to me. However, it was empty
(no patch there) when emailing you, so I thought none of those been
queued up yet. It is also empty now. If I clicked the "patches" link,
it pointed me to http://patchwork.ozlabs.org/project/netdev/list/
which it has patches but I believe it is not for the stable. Please
let me know if I am missing anything.
> 
> Furthermore, it is erroneous to suggest the -ENOMEM SCTP fix without
> the memory leak fix that happens in the commit right before it.
Thanks for the reviewing.
> 
> I've queued things up appropriately, but I really don't appreciate
> how you've handled this at all.  It makes a lot more work for me than
> necessary.
OK, thanks for letting me know.

CAI Qian
> 
> 

^ permalink raw reply

* Re: openconnect triggers soft lockup in __skb_get_rxhash
From: David Miller @ 2012-12-17  1:22 UTC (permalink / raw)
  To: kirill; +Cc: maxk, netdev, dwmw2
In-Reply-To: <20121217005616.GA23029@shutemov.name>


Already fixed in Linus's tree by:

>From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 12 Dec 2012 19:22:57 +0000
Subject: [PATCH 18/19] tuntap: dont use skb after netif_rx_ni(skb)

On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote:
> Since todays net merge, I see this when I start openvpn..
>
> general protection fault: 0000 [#1] PREEMPT SMP
> Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy
> CPU 0
> Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14                  /D975XBX
> RIP: 0010:[<ffffffff815b54a4>]  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
> RSP: 0018:ffff88007d0d9c48  EFLAGS: 00010206
> RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a
> RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80
> RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80
> R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80
> FS:  00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60)
> Stack:
>  ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000
>  ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831
>  ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000
> Call Trace:
>  [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0
>  [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290
>  [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0
>  [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun]
>  [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0
>  [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0
>  [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun]
>  [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50
>  [<ffffffff811db917>] do_sync_write+0xa7/0xe0
>  [<ffffffff811dc01f>] vfs_write+0xaf/0x190
>  [<ffffffff811dc375>] sys_write+0x55/0xa0
>  [<ffffffff81705540>] tracesys+0xdd/0xe2
> Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48
> RIP  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
>  RSP <ffff88007d0d9c48>
> ---[ end trace 6d42c834c72c002e ]---
>
>
> Faulting instruction is
>
>    0:	8b 03                	mov    (%rbx),%eax
>
> rbx is slab poison (-20) so this looks like a use-after-free here...
>
>                         flow->ports = *ports;
>  314:   8b 03                   mov    (%rbx),%eax
>  316:   41 89 46 08             mov    %eax,0x8(%r14)
>
> in the inlined skb_header_pointer in skb_flow_dissect
>
> 	Dave
>

commit 96442e4242 (tuntap: choose the txq based on rxq) added
a use after free.

Cache rxhash in a temp variable before calling netif_rx_ni()

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/tun.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2ac2164..40b426e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -297,13 +297,12 @@ static void tun_flow_cleanup(unsigned long data)
 	spin_unlock_bh(&tun->lock);
 }
 
-static void tun_flow_update(struct tun_struct *tun, struct sk_buff *skb,
+static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
 			    u16 queue_index)
 {
 	struct hlist_head *head;
 	struct tun_flow_entry *e;
 	unsigned long delay = tun->ageing_time;
-	u32 rxhash = skb_get_rxhash(skb);
 
 	if (!rxhash)
 		return;
@@ -1010,6 +1009,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	int copylen;
 	bool zerocopy = false;
 	int err;
+	u32 rxhash;
 
 	if (!(tun->flags & TUN_NO_PI)) {
 		if ((len -= sizeof(pi)) > total_len)
@@ -1162,12 +1162,13 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 	}
 
+	rxhash = skb_get_rxhash(skb);
 	netif_rx_ni(skb);
 
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
 
-	tun_flow_update(tun, skb, tfile->queue_index);
+	tun_flow_update(tun, rxhash, tfile->queue_index);
 	return total_len;
 }
 
-- 
1.7.11.7

^ permalink raw reply related

* Re: request to queue patches for stable
From: David Miller @ 2012-12-17  1:22 UTC (permalink / raw)
  To: caiqian; +Cc: netdev
In-Reply-To: <1687740004.1489340.1355706322262.JavaMail.root@redhat.com>

From: CAI Qian <caiqian@redhat.com>
Date: Sun, 16 Dec 2012 20:05:22 -0500 (EST)

> it was empty

It's empty because you didn't change the filter to not
filter patches already applied upstream.

^ permalink raw reply

* Re: openconnect triggers soft lockup in __skb_get_rxhash
From: Kirill A. Shutemov @ 2012-12-17  1:46 UTC (permalink / raw)
  To: David Miller; +Cc: maxk, netdev, dwmw2
In-Reply-To: <20121216.172214.687979484434537200.davem@davemloft.net>

On Sun, Dec 16, 2012 at 05:22:14PM -0800, David Miller wrote:
> 
> Already fixed in Linus's tree by:
> 
> From 499744209b2cbca66c42119226e5470da3bb7040 Mon Sep 17 00:00:00 2001

No, it's not. I use up-to-date (2a74dbb) Linus tree with the patch in and
still see the issue.

-- 
 Kirill A. Shutemov

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox