Netdev List
* Re: TCP connection issues against Amazon S3
From: Rick Jones @ 2015-01-06 19:48 UTC (permalink / raw)
  To: Yuchung Cheng, Erik Grinaker
  Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <54AC348B.4030900@hp.com>

On 01/06/2015 11:16 AM, Rick Jones wrote:
> I'm assuming one incident starts at XX:41:24.748265 in the trace?  That
> does look like it is slowly slogging its way through a bunch of lost
> traffic, which was I think part of the problem I was seeing with the
> middlebox I stepped in, but I don't think I see the reset where I would
> have expected it.  Still, it looks like the sender has an increasing TCP
> RTO as it is going through the slog (as it likely must since there are
> no TCP timestamps?), to the point it gets larger than I'm guessing curl
> was willing to wait, so the FIN at XX:41:53.269534 after a ten second or
> so gap.

Should the receiver's autotuning be advertising an ever larger window 
the way it is while going through the slog of lost traffic?

rick
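
A note on the growing RTO mentioned in the quoted text: under RFC 6298 a
sender doubles its retransmission timeout after each expiry, up to a cap.
The sketch below only illustrates that arithmetic. The 1 s starting RTO
and the 120 s cap are assumptions for illustration, not values measured
in this trace, and rto_schedule is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch of exponential RTO backoff: each retransmission timeout doubles
# the RTO until it reaches a cap. Starting value and cap are assumptions.

# rto_schedule INITIAL_S MAX_S RETRIES
# Prints "attempt rto elapsed" for each retransmission, where "elapsed"
# is the time since the original (lost) transmission.
rto_schedule() {
	rto=$1
	max=$2
	retries=$3
	elapsed=0
	i=1
	while [ "$i" -le "$retries" ]; do
		elapsed=$((elapsed + rto))
		echo "$i $rto $elapsed"
		rto=$((rto * 2))
		[ "$rto" -gt "$max" ] && rto=$max
		i=$((i + 1))
	done
}

rto_schedule 1 120 6
```

With these assumed numbers the fifth retransmission of the same segment
would not go out until 31 s after the original transmission, which is
already past a 30 second client timeout.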


* [PATCH v4 12/20] selftests/net: add install target to enable test install
From: Shuah Khan @ 2015-01-06 19:43 UTC (permalink / raw)
  To: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, mpe, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i,
	masami.hiramatsu.pt
  Cc: Shuah Khan, linux-kbuild, linux-kernel, linux-api, netdev
In-Reply-To: <cover.1420571615.git.shuahkh@osg.samsung.com>

Add a new make target to enable installing the test. This target
installs the test in the kselftest install location and adds a command
to the kselftest script to run the test. The install target can be run
only from the top-level kernel source directory.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
---
 tools/testing/selftests/net/Makefile | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 62f22cc..a1a8253 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -6,14 +6,28 @@ CFLAGS = -Wall -O2 -g
 CFLAGS += -I../../../../usr/include/
 
 NET_PROGS = socket psock_fanout psock_tpacket
+INSTALL_PROGS = run_netsocktests run_afpackettests test_bpf.sh $(NET_PROGS)
+NETSOCK_TEST_STR = /bin/sh ./run_netsocktests || echo 'sockettests: [FAIL]'
+AFPKT_TEST_STR = /bin/sh ./run_afpackettests || echo 'afpackettests: [FAIL]'
+BFP_TEST_STR = ./test_bpf.sh
 
 all: $(NET_PROGS)
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^
 
+install:
+ifdef INSTALL_KSFT_PATH
+	install $(INSTALL_PROGS) $(INSTALL_KSFT_PATH)
+	@echo "$(NETSOCK_TEST_STR)" >> $(KSELFTEST)
+	@echo "$(AFPKT_TEST_STR)" >> $(KSELFTEST)
+	@echo "$(BFP_TEST_STR)" >> $(KSELFTEST)
+else
+	@echo "Run make kselftest_install in top level source directory"
+endif
+
 run_tests: all
-	@/bin/sh ./run_netsocktests || echo "sockettests: [FAIL]"
-	@/bin/sh ./run_afpackettests || echo "afpackettests: [FAIL]"
-	./test_bpf.sh
+	@$(NETSOCK_TEST_STR)
+	@$(AFPKT_TEST_STR)
+	@$(BFP_TEST_STR)
 clean:
 	$(RM) $(NET_PROGS)
-- 
2.1.0


* Re: TCP connection issues against Amazon S3
From: Erik Grinaker @ 2015-01-06 19:50 UTC (permalink / raw)
  To: Rick Jones
  Cc: Yuchung Cheng, Eric Dumazet, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <54AC348B.4030900@hp.com>

On 06 Jan 2015, at 19:16, Rick Jones <rick.jones2@hp.com> wrote:
> 
>>>>>>> A packet dump [1] shows repeated ACK retransmits for some of the
>> TCP does not retransmit ACK ... do you mean DUPACKs sent by the receiver?
>> 
>> I am trying to understand the problem. Could you confirm whether it's the
>> HTTP responses sent from Amazon S3 that got stalled, or the HTTP requests
>> sent from the receiver (your host)?
>> 
>> btw I suspect some middleboxes are stripping SACKOK options from your
>> SYNs (or Amazon SYN-ACKs) assuming Amazon supports SACK.
> 
> The TCP Timestamp option too it seems.
> 
> Speaking of middleboxes...  It is probably a fish that is red, but a while back I stepped in a middle box (a load balancer) which decided that if it saw "too many" retransmissions in a given TCP window that something was seriously wrong and it would toast the connection.  I thought though that was an active reset on the part of the middlebox. (And the client was the active sender not the back-end server)

It’s looking increasingly probable that it’s something like that, since the sender (S3) appears to disable SACKs on the failing clients, while it enables SACKs on other functioning clients.

> I'm assuming one incident starts at XX:41:24.748265 in the trace?  That does look like it is slowly slogging its way through a bunch of lost traffic, which was I think part of the problem I was seeing with the middlebox I stepped in, but I don't think I see the reset where I would have expected it.  Still, it looks like the sender has an increasing TCP RTO as it is going through the slog (as it likely must since there are no TCP timestamps?), to the point it gets larger than I'm guessing curl was willing to wait, so the FIN at XX:41:53.269534 after a ten second or so gap.

Yes, there is one incident starting at XX:41:23. All the RSTs are sent at the end though, at the 30s Curl timeout. I’ve put up a stripped down pcap of a single request here:

http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2


* [PATCH v4 02/20] selftests/cpu-hotplug: add install target to enable test install
From: Shuah Khan @ 2015-01-06 19:43 UTC (permalink / raw)
  To: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, mpe, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i,
	masami.hiramatsu.pt
  Cc: Shuah Khan, linux-kbuild, linux-kernel, linux-api, netdev
In-Reply-To: <cover.1420571615.git.shuahkh@osg.samsung.com>

Add a new make target to enable installing the test. This target
installs the test in the kselftest install location and adds a command
to the kselftest script to run the test. The install target can be run
only from the top-level kernel source directory.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
---
 tools/testing/selftests/cpu-hotplug/Makefile               | 14 ++++++++++++--
 .../cpu-hotplug/{on-off-test.sh => cpu-on-off-test.sh}     |  0
 2 files changed, 12 insertions(+), 2 deletions(-)
 rename tools/testing/selftests/cpu-hotplug/{on-off-test.sh => cpu-on-off-test.sh} (100%)

diff --git a/tools/testing/selftests/cpu-hotplug/Makefile b/tools/testing/selftests/cpu-hotplug/Makefile
index e9c28d8..c168033 100644
--- a/tools/testing/selftests/cpu-hotplug/Makefile
+++ b/tools/testing/selftests/cpu-hotplug/Makefile
@@ -1,9 +1,19 @@
+TEST_STR=/bin/bash ./cpu-on-off-test.sh || echo 'cpu-hotplug selftests: [FAIL]'
+
 all:
 
+install:
+ifdef INSTALL_KSFT_PATH
+	install ./cpu-on-off-test.sh $(INSTALL_KSFT_PATH)/cpu-on-off-test.sh
+	@echo "$(TEST_STR)" >> $(KSELFTEST)
+else
+	@echo "Run make kselftest_install in top level source directory"
+endif
+
 run_tests:
-	@/bin/bash ./on-off-test.sh || echo "cpu-hotplug selftests: [FAIL]"
+	@$(TEST_STR)
 
 run_full_test:
-	@/bin/bash ./on-off-test.sh -a || echo "cpu-hotplug selftests: [FAIL]"
+	@/bin/bash ./cpu-on-off-test.sh -a || echo "cpu-hotplug selftests: [FAIL]"
 
 clean:
diff --git a/tools/testing/selftests/cpu-hotplug/on-off-test.sh b/tools/testing/selftests/cpu-hotplug/cpu-on-off-test.sh
similarity index 100%
rename from tools/testing/selftests/cpu-hotplug/on-off-test.sh
rename to tools/testing/selftests/cpu-hotplug/cpu-on-off-test.sh
-- 
2.1.0


* [PATCH v4 00/20] kselftest install target feature
From: Shuah Khan @ 2015-01-06 19:43 UTC (permalink / raw)
  To: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, mpe, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i,
	masami.hiramatsu.pt
  Cc: Shuah Khan, linux-kbuild, linux-kernel, linux-api, netdev

This patch series adds a new kselftest_install make target
to enable selftest install. When make kselftest_install is
run, selftests are installed on the system. A new install
target is added to selftests Makefile which will install
targets for the tests that are specified in INSTALL_TARGETS.
During install, a script is generated to run tests that are
installed. This script will be installed in the selftest install
directory. Individual test Makefiles are changed to add to the
script. This will allow new tests to add install and run test
commands to the generated kselftest script. kselftest target
now depends on kselftest_install and runs the generated kselftest
script to reduce duplicate work and for common look and feel when
running tests.

This approach leverages and extends the existing framework that
uses makefile targets to implement run_tests, and adds an install
target. This will scale well as new tests get added, and makes
it easier for test writers to add an install target at the same
time a new test gets added.
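
The install flow described above can be sketched in plain shell. This is
only an illustration of the idea (each test appends its run command to
one generated script), not the Makefile code from the series. The helper
ksft_add_test, the temporary directory and the two stand-in tests are
hypothetical; KSELFTEST, INSTALL_KSFT_PATH and kselftest.sh are the
names used in this series.

```sh
#!/bin/sh
# Sketch: every test "installs" by appending its run command to a single
# generated script, which is later run from the install directory.

# ksft_add_test SCRIPT COMMAND: append one run command to the script.
ksft_add_test() {
	echo "$2" >> "$1"
}

INSTALL_KSFT_PATH=$(mktemp -d)		# stand-in for the install directory
KSELFTEST="$INSTALL_KSFT_PATH/kselftest.sh"

echo "#!/bin/sh" > "$KSELFTEST"
# 'true' and 'false' stand in for real test programs.
ksft_add_test "$KSELFTEST" "true || echo 'passing_test: [FAIL]'"
ksft_add_test "$KSELFTEST" "false || echo 'failing_test: [FAIL]'"

# Run the generated script from the install directory. The "|| echo" form
# reports a failure without stopping the tests that come after it.
RESULT=$(cd "$INSTALL_KSFT_PATH" && /bin/sh ./kselftest.sh)
echo "$RESULT"
```

Only the failing stand-in test produces a line of output. Writing the
test string with single quotes inside lets it be passed through a
double-quoted echo unchanged.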

This series is uploaded to the following experimental branch
for anybody that is interested in playing with it:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git devel

Some benchmark numbers (unrelated to this patch series): I just
happened to collect some timing numbers, and they are nice
and low.
kselftest full run from install dir using kselftest.sh
9.41user 3.55system 0:24.86elapsed

This v4 series fixes echo strings to add quotes around them.
There is no change to the 20/20 kbuild patch in this series.
Other patches in the series are updated to fix the echo strings.

v3 series: reduced duplicate code to generate the script
in individual test Makefiles and consolidated support in the
selftests main Makefile. The main Makefile does
minimal work to set and export the install path.

v2 series: addressed the duplicate code in install and
run_tests targets in individual test Makefiles.
Reference: https://lkml.org/lkml/2014/11/4/707

Shuah Khan (20):
  selftests/breakpoints: add install target to enable test install
  selftests/cpu-hotplug: add install target to enable test install
  selftests/efivarfs: add install target to enable test install
  selftests/firmware: add install target to enable test install
  selftests/ftrace: add install target to enable test install
  selftests/ipc: add install target to enable test install
  selftests/kcmp: add install target to enable test install
  selftests/memfd: add install target to enable test install
  selftests/memory-hotplug: add install target to enable test install
  selftests/mount: add install target to enable test install
  selftests/mqueue: add install target to enable test install
  selftests/net: add install target to enable test install
  selftests/ptrace: add install target to enable test install
  selftests/size: add install target to enable test install
  selftests/sysctl: add install target to enable test install
  selftests/timers: add install target to enable test install
  selftests/user: add install target to enable test install
  selftests/vm: add install target to enable test install
  selftests: add install target to enable test install
  kbuild: add a new kselftest_install make target to install selftests

 Makefile                                           | 14 +++++-
 tools/testing/selftests/Makefile                   | 54 +++++++++++++++++++++-
 tools/testing/selftests/breakpoints/Makefile       | 19 +++++++-
 tools/testing/selftests/cpu-hotplug/Makefile       | 14 +++++-
 .../{on-off-test.sh => cpu-on-off-test.sh}         |  0
 tools/testing/selftests/efivarfs/Makefile          | 16 ++++++-
 tools/testing/selftests/firmware/Makefile          | 43 ++++++++++-------
 tools/testing/selftests/ftrace/Makefile            | 13 +++++-
 tools/testing/selftests/ipc/Makefile               | 19 +++++++-
 tools/testing/selftests/kcmp/Makefile              | 13 +++++-
 tools/testing/selftests/memfd/Makefile             | 17 +++++--
 tools/testing/selftests/memory-hotplug/Makefile    | 14 +++++-
 .../{on-off-test.sh => mem-on-off-test.sh}         |  0
 tools/testing/selftests/mount/Makefile             | 12 ++++-
 tools/testing/selftests/mqueue/Makefile            | 18 ++++++--
 tools/testing/selftests/net/Makefile               | 20 ++++++--
 tools/testing/selftests/ptrace/Makefile            | 16 +++++--
 tools/testing/selftests/size/Makefile              | 12 ++++-
 tools/testing/selftests/sysctl/Makefile            | 17 ++++++-
 tools/testing/selftests/timers/Makefile            | 12 ++++-
 tools/testing/selftests/user/Makefile              | 12 ++++-
 tools/testing/selftests/vm/Makefile                | 11 ++++-
 22 files changed, 317 insertions(+), 49 deletions(-)
 rename tools/testing/selftests/cpu-hotplug/{on-off-test.sh => cpu-on-off-test.sh} (100%)
 rename tools/testing/selftests/memory-hotplug/{on-off-test.sh => mem-on-off-test.sh} (100%)

-- 
2.1.0


* [PATCH v4 04/20] selftests/firmware: add install target to enable test install
From: Shuah Khan @ 2015-01-06 19:43 UTC (permalink / raw)
  To: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, mpe, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i,
	masami.hiramatsu.pt
  Cc: Shuah Khan, linux-kbuild, linux-kernel, linux-api, netdev
In-Reply-To: <cover.1420571615.git.shuahkh@osg.samsung.com>

Add a new make target to enable installing the test. This target
installs the test in the kselftest install location and adds a command
to the kselftest script to run the test. The install target can be run
only from the top-level kernel source directory.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
---
 tools/testing/selftests/firmware/Makefile | 43 +++++++++++++++++++------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/firmware/Makefile b/tools/testing/selftests/firmware/Makefile
index e23cce0..0bdc25b 100644
--- a/tools/testing/selftests/firmware/Makefile
+++ b/tools/testing/selftests/firmware/Makefile
@@ -1,25 +1,34 @@
 # Makefile for firmware loading selftests
 
 # No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
+
+__fw_filesystem:
+fw_filesystem  = if /bin/sh ./fw_filesystem.sh ; then
+fw_filesystem += echo 'fw_filesystem: ok';
+fw_filesystem += else echo 'fw_filesystem: [FAIL]';
+fw_filesystem += fi
+
+__fw_userhelper:
+fw_userhelper  = if /bin/sh ./fw_userhelper.sh ; then
+fw_userhelper += echo 'fw_userhelper: ok';
+fw_userhelper += else
+fw_userhelper += echo 'fw_userhelper: [FAIL]';
+fw_userhelper += fi
+
 all:
 
-fw_filesystem:
-	@if /bin/sh ./fw_filesystem.sh ; then \
-                echo "fw_filesystem: ok"; \
-        else \
-                echo "fw_filesystem: [FAIL]"; \
-                exit 1; \
-        fi
-
-fw_userhelper:
-	@if /bin/sh ./fw_userhelper.sh ; then \
-                echo "fw_userhelper: ok"; \
-        else \
-                echo "fw_userhelper: [FAIL]"; \
-                exit 1; \
-        fi
-
-run_tests: all fw_filesystem fw_userhelper
+install:
+ifdef INSTALL_KSFT_PATH
+	install ./fw_filesystem.sh ./fw_userhelper.sh $(INSTALL_KSFT_PATH)
+	@echo "$(fw_filesystem)" >> $(KSELFTEST)
+	@echo "$(fw_userhelper)" >> $(KSELFTEST)
+else
+	@echo "Run make kselftest_install in top level source directory"
+endif
+
+run_tests:
+	@$(fw_filesystem)
+	@$(fw_userhelper)
 
 # Nothing to clean up.
 clean:
-- 
2.1.0


* [PATCH v4 05/20] selftests/ftrace: add install target to enable test install
From: Shuah Khan @ 2015-01-06 19:43 UTC (permalink / raw)
  To: mmarek, gregkh, akpm, rostedt, mingo, davem, keescook,
	tranmanphong, mpe, cov, dh.herrmann, hughd, bobby.prani,
	serge.hallyn, ebiederm, tim.bird, josh, koct9i,
	masami.hiramatsu.pt
  Cc: Shuah Khan, linux-kbuild, linux-kernel, linux-api, netdev
In-Reply-To: <cover.1420571615.git.shuahkh@osg.samsung.com>

Add a new make target to enable installing the test. This target
installs the test in the kselftest install location and adds a command
to the kselftest script to run the test. The install target can be run
only from the top-level kernel source directory.

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 tools/testing/selftests/ftrace/Makefile | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index 76cc9f1..c5c77584 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,18 @@
+TEST_STR = /bin/sh ./ftracetest || echo 'ftrace selftests: [FAIL]'
+
 all:
 
+install:
+ifdef INSTALL_KSFT_PATH
+	install ./ftracetest $(INSTALL_KSFT_PATH)
+	@cp -r test.d $(INSTALL_KSFT_PATH)
+	echo "$(TEST_STR)" >> $(KSELFTEST)
+else
+	@echo "Run make kselftest_install in top level source directory"
+endif
+
 run_tests:
-	@/bin/sh ./ftracetest || echo "ftrace selftests: [FAIL]"
+	@$(TEST_STR)
 
 clean:
 	rm -rf logs/*
-- 
2.1.0


* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
From: Hannes Frederic Sowa @ 2015-01-06 19:59 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu
In-Reply-To: <CAE4R7bCOnYL1bat39UivcyAG-S1hx-EJpwC_hWNv8prjZqV3fg@mail.gmail.com>

On Tue, 2015-01-06 at 09:51 -0800, Scott Feldman wrote:
> On Tue, Jan 6, 2015 at 5:58 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > At this point I would like to start the discussion about handling of the
> > table ids/vrfs (again :) ): as I can see it, this version just passes
> > table ids down to the driver layer and the rocker driver filters them by
> > local/main table? This seems to be mostly fine for a first version but
> > does not feel like it will integrate well with the rest of the linux
> > networking ecosystem.
> >
> > Will hardware have the capabilities to do programmable matches like "ip
> > rule" is currently capable to do? Should we plan for that? Do we want to
> > support hardware which does support multiple tables/VRFs?
> 
> Good questions, thanks for bringing these up.
> >
> > I would like to present a first suggestion:
> > My take on this would be strive towards an integration with ip-rule, so
> > we add tables which will be offloaded to hardware. This happens only in
> > situations where those tables will be the first match for incoming
> > packets specified with an in-interface filter which has the capability
> > to do the offloading (for example). The determination if the table is
> > capable for hardware offloading should be done automatically, so if
> > later hardware will be capable of doing ip rule like matches, we can
> > just expand the check which flags the tables accordingly.
> 
> Sounds like a good suggestion to me.  We need to think about what the
> swdev API looks like to the switch device driver.  Could you take a
> stab at defining what integration with ip-rule looks like, code-wise,
> at the swdev API layer?
> 
> With the rocker device we're prototyping with, the standard LPM on IP
> dst is the normal L3 routing table structure.  Within that, table
> priorities could be handled, so routes in one table take precedence
> over routes in another table.  If we want to do policy routing, then
> we'd need to use the ACL table in rocker to match on other fields
> besides just IP dst.

Sorry, I haven't fully understood this. Does rocker first do an L3
routing table lookup and only *after* that decide which nexthop to choose
based on preferences in the action-set found at the leaf? My gut tells
me that we cannot build a semantic equivalent to ip rules then; we
would have to use ACLs. Hmm...

For the first idea, I'll try to make an example:

Initial setup:
# ip rule ls
0:	from all lookup local 
32766:	from all lookup main 
32767:	from all lookup default 

# ip rule add pref 100 iif swdev0 table 5
# ip rule ls
0:	from all lookup local 
100:	from all iif swdev0 [detached] lookup 5
> maybe we can show which rules can be offloaded here
32766:	from all lookup main 
32767:	from all lookup default 

Table 5 should be the table into which we can insert routes that are
offloaded to hardware.

During table modifications we linearly scan the rules for selectors
which cannot be represented by hardware.

If we have an iif selector, we can simply use this table and
synthesize it into the particular interface.

An ip rule "from" selector would need all the hardware to be capable of
matching source addresses; otherwise we cannot offload any routing table
with a higher preference. The same holds for a to/tos rule. If we
encounter a fwmark rule, we certainly cannot represent it in hardware,
so we skip it (here we could think about entangling those with ACLs, but
it feels hard to do).

If rules are inserted or changed we must again validate the complete
list of rules and decide if we need to flush all the routes and install
a slow path via the kernel.

What do you think? Does that make sense? I could try to come up with an
API for that. ;)

Bye,
Hannes
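
The linear rule scan proposed above could look roughly like the sketch
below. It is one possible reading of the proposal, not an agreed design:
it assumes that a "from" selector other than "all", or a to/tos selector,
blocks offload for that rule and every rule after it, and that a fwmark
rule is skipped without blocking later rules. classify_rules and the
verdict strings are hypothetical names, and the input is simplified
"ip rule ls"-style text.

```sh
#!/bin/sh
# classify_rules: read "ip rule ls"-style lines on stdin and print
# "<pref> <table> <verdict>" for each rule, scanning in list order.
classify_rules() {
	blocked=0
	while read -r pref rest; do
		pref=${pref%:}			# "100:" -> "100"
		table=${rest##*lookup }		# text after "lookup "
		table=${table%% *}
		case " $rest " in
		*" fwmark "*)
			# Not representable in hardware: skip this rule only.
			echo "$pref $table skip"
			continue
			;;
		*" to "*|*" tos "*)
			blocked=1
			;;
		*)
			# Any source selector other than "from all" blocks.
			case "$rest" in
			"from all"*) ;;
			*) blocked=1 ;;
			esac
			;;
		esac
		if [ "$blocked" -eq 1 ]; then
			echo "$pref $table kernel"
		else
			echo "$pref $table offload"
		fi
	done
}

classify_rules <<'EOF'
0: from all lookup local
100: from all iif swdev0 lookup 5
200: from all fwmark 0x1 lookup 6
300: from 10.0.0.0/8 lookup 7
32766: from all lookup main
EOF
```

In this sample, table 5 stays offloadable, the fwmark rule is skipped,
and the source-prefix rule at preference 300 pushes itself and the main
table back to the kernel path. Rerunning the scan after every rule
change corresponds to the revalidation step described above.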


* Re: next-timestamp build failure with 3.19-rc2
From: Cong Wang @ 2015-01-06 20:04 UTC (permalink / raw)
  To: Vinson Lee
  Cc: Jonathan Corbet, David S. Miller, Willem de Bruijn, linux-doc,
	LKML, Netdev
In-Reply-To: <CAHTgTXUGVkJX26bFSOC9EH2gUynSKhLNr7D8mN8cSy7Km_+f9Q@mail.gmail.com>

On Mon, Jan 5, 2015 at 11:55 AM, Vinson Lee <vlee@twopensource.com> wrote:
> Hi.
>
> I'm hitting the following build error with 3.19-rc2 on CentOS 5 with
> glibc-headers-2.5-123.
>
>   HOSTCC  Documentation/networking/timestamping/txtimestamp
> Documentation/networking/timestamping/txtimestamp.c:64:8: error:
> redefinition of ‘struct in6_pktinfo’
>  struct in6_pktinfo {
>         ^
> In file included from /usr/include/arpa/inet.h:23:0,
>                  from Documentation/networking/timestamping/txtimestamp.c:33:
> /usr/include/netinet/in.h:456:8: note: originally defined here
>  struct in6_pktinfo
>         ^

We need the same workaround as we did for in6_addr etc.
I am working on a patch now.


* Re: route/max_size sysctl in ipv4
From: Ani Sinha @ 2015-01-06 20:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Pádraig Brady, David Miller, netdev@vger.kernel.org
In-Reply-To: <1420567742.5947.1.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, Jan 6, 2015 at 10:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2015-01-06 at 09:11 -0800, Ani Sinha wrote:
>
>> Why can't we simply change the documentation to reflect the fact that
>> this sysctl is no longer in operation?
>
> It is still in operation for IPv6
>
> Looks like you propose to update the documentation.
>
> This is great !
>
> Why don't you send an official patch ?

Just did.


* Re: TCP connection issues against Amazon S3
From: Eric Dumazet @ 2015-01-06 20:13 UTC (permalink / raw)
  To: Erik Grinaker; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <76DC89D1-CFCE-44B7-994E-4349FEEDEFA6@bengler.no>

On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:

> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
> 
> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
> 
> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
> 

Although sender seems to be reluctant to retransmit, this 'failure' is
caused by receiver closing the connection too soon.

Are you sure you are not asking curl to set up a very small completion
timer?

12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0

-> Here receiver sends a FIN, because application closed the socket (or died)
12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0

Anyway, getting decent speed without SACK is going to be hard.
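
To put a number on how slowly the holes are being filled, the gaps
between the retransmitted segments in the trace above can be computed
with a small awk helper. This is a sketch only: retransmit_gaps is a
hypothetical name, the input lines are trimmed copies of the three
retransmissions shown above, and the arithmetic assumes all timestamps
fall on the same day.

```sh
#!/bin/sh
# retransmit_gaps: read tcpdump-style lines on stdin (first field must be
# HH:MM:SS.usec) and print the seconds elapsed between consecutive lines.
retransmit_gaps() {
	awk '{
		split($1, t, ":")
		now = t[1] * 3600 + t[2] * 60 + t[3]
		if (NR > 1)
			printf "%.3f\n", now - prev
		prev = now
	}'
}

# The three retransmitted segments from the trace above (lines trimmed).
retransmit_gaps <<'EOF'
12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: seq 736561:738021
12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: seq 745321:746781
12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: seq 746781:748241
EOF
```

For these lines the helper prints 9.030 and 9.033, that is, roughly nine
seconds pass before each successive hole is retransmitted.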


* Re: [PATCH net-next v3 0/5]: ixgbevf: Allow querying VFs RSS indirection table and key
From: Vlad Zolotarov @ 2015-01-06 20:13 UTC (permalink / raw)
  To: Greg Rose; +Cc: Gleb Natapov, netdev, Avi Kivity, jeffrey.t.kirsher
In-Reply-To: <CALgkqUr2tAprqmPVSwA3up9CtkPzgrci-0H05divHhT2NC5_kA@mail.gmail.com>


On 01/06/15 20:22, Greg Rose wrote:
> I accidentally replied just to Vlad - here is a reply to all.
>
> On Tue, Jan 6, 2015 at 9:30 AM, Vlad Zolotarov
> <vladz@cloudius-systems.com>  wrote:
>> On 01/06/15 18:59, Greg Rose wrote:
> [snip]
>
>
>>> I don't have any examples and that is not my area of expertise.  But
>>> just because we can't think of a security risk or attack example
>>> doesn't mean there isn't one.
>>>
>>> Just add a policy hook so that the system admin can decide whether
>>> this information should be shared with the VFs and then we're covered
>>> for cases of both known and unknown exploits, risks, etc.
>> I absolutely disagree with u in regard of defining an RSS redirection table
>> and RSS hash key as a security sensitive data. I don't know how u got to
>> this conclusion.
> I have not reached any such conclusion - let me reiterate:  I have no
> idea.  It is not my area of expertise.  However, to take the lowest
> risk route just add a policy hook so that a system admin can turn the
> feature on through the PF driver (which is acknowledged as secure) if
> they wish then there is no worry.

NP. Let's move on.

>> However I don't want to argue about it any longer. Let's move on.
>>
>> Let's clarify one thing about this "hook". Do u agree that it should cover
>> only the cases when VF shares the mentioned above data with PF - namely for
>> all devices but x550?
> Look at how spoof checking is turned off/on for each VF using the "ip
> link set" commands.  That's what I'm envisioning - some way to decide
> on a per VF basis which VFs should be allowed to perform the query.

I will, but let's agree that x550 VFs should be out of this since their
RSS indirection table and key belong to the specific domain and don't
pose even a theoretical threat.

thanks,
vlad

> Thanks,
>
> - Greg


* Re: TCP connection issues against Amazon S3
From: Erik Grinaker @ 2015-01-06 20:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <1420575216.5947.12.camel@edumazet-glaptop2.roam.corp.google.com>


> On 06 Jan 2015, at 20:13, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:
> 
>> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
>> 
>> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
>> 
>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
>> 
> 
> Although sender seems to be reluctant to retransmit, this 'failure' is
> caused by receiver closing the connection too soon.
> 
> Are you sure you are not asking curl to set up a very small completion
> timer?

For testing, I am using Curl with a 30 second timeout. This may well be a bit short, but the point is that with the older kernel I could run thousands of requests without a single failure (generally the requests would finish within seconds), while with the newer kernel about 5% of requests will time out (the rest complete within seconds).

> 12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
> 12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
> 12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
> 12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
> 12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
> 12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
> 12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
> 12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
> 12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
> 12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
> 12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
> 12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
> 12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
> 12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0
> 
> -> Here receiver sends a FIN, because application closed the socket (or died)
> 12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
> 12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
> 12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0
> 
> Anyway, getting decent speed without SACK is going to be hard.

Yes, I am not sure why the sender (S3) disables SACK on my Norwegian servers (across ISPs), while it enables SACK on my server in the Netherlands. They run the same kernel and configuration. I will have to look into it more closely tomorrow.


* Re: [PATCH net-next 1/3] net: add IPv4 routing FIB support for swdev
From: Hannes Frederic Sowa @ 2015-01-06 20:26 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Netdev, Jiří Pírko, john fastabend, Thomas Graf,
	Jamal Hadi Salim, Andy Gospodarek, Roopa Prabhu
In-Reply-To: <1420574353.15181.19.camel@stressinduktion.org>

On Di, 2015-01-06 at 20:59 +0100, Hannes Frederic Sowa wrote:
> Sorry, I haven't fully understood this. Does rocker first do an L3
> routing table lookup and only *after* that decide which nexthop to choose
> based on preferences in the action-set found at the leaf? My gut tells
> me that we cannot do a semantic equivalent of ip rules then; we
> would have to use ACLs instead. Hmm...

Does rocker drop the packet if no match is found or can it pass the
packet onto the slowpath to the kernel?

^ permalink raw reply

* [PATCH net-next] tg3: move init/deinit from open/close to probe/remove
From: Ivan Vecera @ 2015-01-06 20:28 UTC (permalink / raw)
  To: netdev; +Cc: prashant, mchan

Move init and deinit of PTP support from open/close functions
to probe/remove funcs to avoid removing/re-adding of associated PTP
device(s) during ifup/ifdown.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 553dcd8..e86bee4 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -11681,13 +11681,6 @@ static int tg3_open(struct net_device *dev)
 		pci_set_power_state(tp->pdev, PCI_D3hot);
 	}
 
-	if (tg3_flag(tp, PTP_CAPABLE)) {
-		tp->ptp_clock = ptp_clock_register(&tp->ptp_info,
-						   &tp->pdev->dev);
-		if (IS_ERR(tp->ptp_clock))
-			tp->ptp_clock = NULL;
-	}
-
 	return err;
 }
 
@@ -11701,8 +11694,6 @@ static int tg3_close(struct net_device *dev)
 		return -EAGAIN;
 	}
 
-	tg3_ptp_fini(tp);
-
 	tg3_stop(tp);
 
 	/* Clear stats across close / open calls */
@@ -17880,6 +17871,13 @@ static int tg3_init_one(struct pci_dev *pdev,
 		goto err_out_apeunmap;
 	}
 
+	if (tg3_flag(tp, PTP_CAPABLE)) {
+		tp->ptp_clock = ptp_clock_register(&tp->ptp_info,
+						   &tp->pdev->dev);
+		if (IS_ERR(tp->ptp_clock))
+			tp->ptp_clock = NULL;
+	}
+
 	netdev_info(dev, "Tigon3 [partno(%s) rev %04x] (%s) MAC address %pM\n",
 		    tp->board_part_number,
 		    tg3_chip_rev_id(tp),
@@ -17955,6 +17953,8 @@ static void tg3_remove_one(struct pci_dev *pdev)
 	if (dev) {
 		struct tg3 *tp = netdev_priv(dev);
 
+		tg3_ptp_fini(tp);
+
 		release_firmware(tp->fw);
 
 		tg3_reset_task_cancel(tp);
-- 
2.0.5

^ permalink raw reply related

* Does the ordering of the fib_table_dump or /proc/net/fib_trie matter?
From: Alexander Duyck @ 2015-01-06 20:30 UTC (permalink / raw)
  To: miller >> David Miller, stephen, NetDev

I am considering reversing the order of any non-lookup traversal of the
fib_trie so that it starts at the last node and works its way up toward
the first node.  This would make it so that all walks using the parent
pointer all go in the same direction.

The problem right now is that leaf_walk_rcu and a couple of other
iterators traverse the trie in one direction grabbing the next child
(child++) of the parent, while fib_table_lookup is traversing the list
grabbing a previous child (child & (child - 1)) of the parent.  It makes
things a bit ugly for RCU as we have to have the node fully populated
before we can start updating the parent pointers on the children.

I want to have them both moving in the same direction so the
fib_table_lookup would remain the same, but the leaf_walk_rcu and others
would walk from the last child to the first (child--) and as a result
when I assemble a tnode in inflate or halve I would be able to populate
children from 0 to ((1 << tn->bits) - 1) without having to worry about
any iterators walking into uninitialized memory.
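
As a rough illustration of the direction issue, here is a stand-alone
user-space sketch. It is not the fib_trie code, and the function names
are invented for the example. It only shows that the backtracking step
(child & (child - 1)) clears the lowest set bit of an index, so it always
moves toward lower indexes, the opposite direction from a child++ walk.

```c
/* Backtracking step: clear the lowest set bit of the child index. */
static unsigned long step_back(unsigned long child)
{
	return child & (child - 1);
}

/*
 * Record the indexes visited when backtracking from 'start' down to 0.
 * Returns the number of entries written to 'out' (at most 'max').
 */
static int backtrack_path(unsigned long start, unsigned long *out, int max)
{
	unsigned long child = start;
	int n = 0;

	while (n < max) {
		out[n++] = child;
		if (!child)
			break;
		child = step_back(child);
	}
	return n;
}
```

Starting from index 11 (binary 1011), backtrack_path() records 11, 10, 8, 0:
every step lands on a lower index than the one before it.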

The question I have is if that would screw up any user-space apps.  I
know ip route can dump the list via "ip route show".  I'm just wondering
if there would be any problem with default being the last entry instead
of the first entry?

- Alex

^ permalink raw reply

* Re: [PATCH] net: ethernet: cpsw: ignore VLAN ID 1
From: Felipe Balbi @ 2015-01-06 20:31 UTC (permalink / raw)
  To: David Miller; +Cc: balbi, netdev, linux-omap, stable, mugunthanvnm
In-Reply-To: <20150106.141323.2091288413667564444.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

Hi,

On Tue, Jan 06, 2015 at 02:13:23PM -0500, David Miller wrote:
> From: Felipe Balbi <balbi@ti.com>
> Date: Tue, 6 Jan 2015 11:43:32 -0600
> 
> > CPSW completely hangs if we add, and later remove,
> > VLAN ID #1. What happens is that after removing
> > VLAN ID #1, no packets will be received by CPSW
> > rendering network unusable.
> > 
> > In order to "fix" the issue, we're returning -EINVAL
> > if anybody tries to add VLAN ID #1. While at that,
> > also filter out any ID > 4095 because we only have
> > 12 bits for VLAN IDs.
> > 
> > Fixes: 3b72c2f (drivers: net:ethernet: cpsw: add support for VLAN)
> > Cc: <stable@vger.kernel.org> # v3.9+
> > Cc: Mugunthan V N <mugunthanvnm@ti.com>
> > Tested-by: Schuyler Patton <spatton@ti.com>
> > Signed-off-by: Felipe Balbi <balbi@ti.com>
> 
> You can't just unilaterally make one VLAN ID unusable.
> 
> A better way to handle this situation must be found,
> and if that means turning off hw VLAN support completely,
> that's a much better alternative to this.
> 
> I'm not applying this patch, sorry.

All other IDs work alright, it's just ID 1 which seems to be quirky. In
fact when trying to add VLAN ID 1, vconfig itself dumps out a warning
that VLAN ID 1 doesn't work on most switches.

What you're saying here is that you prefer to drop a feature that works
for all other 1023 IDs because 1 ID is quirky. Sounds like overkill
to me.

-- 
balbi

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: 3.12.33 - BUG xfrm_selector_match+0x25/0x2f6
From: Julian Anastasov @ 2015-01-06 20:46 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Smart Weblications GmbH - Florian Wiessner, Steffen Klassert,
	netdev, LKML, stable, Simon Horman, lvs-devel
In-Reply-To: <54ABDB98.9060504@suse.cz>


	Hello,

On Tue, 6 Jan 2015, Jiri Slaby wrote:

> So what should be done to fix the issue in stable 3.12? Are those
> patches needed in the upstream kernel too? In that case I suppose it
> will propagate to me through upstream. Otherwise, could you send "3.12
> only" patches to stable@ so that I can apply them?

	I asked Pablo for the old fix for IPVS-FTP:

http://www.spinics.net/lists/lvs-devel/msg03879.html

	The new fix for the xfrm crash is not applied yet:

http://www.spinics.net/lists/lvs-devel/msg03877.html

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: Daniele Di Proietto @ 2015-01-06 20:51 UTC (permalink / raw)
  To: netdev; +Cc: pshelar, Daniele Di Proietto

This commit introduces netdev_vport_index() to prevent datapath.c from directly accessing the 'dev' member of 'struct netdev_vport'.
This fix is needed to allow possible alternative netdev_vport implementations.

Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
---
 net/openvswitch/datapath.c     | 2 +-
 net/openvswitch/vport-netdev.h | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 4e9a5f0..d632535 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -186,7 +186,7 @@ static int get_dpifindex(const struct datapath *dp)
 
 	local = ovs_vport_rcu(dp, OVSP_LOCAL);
 	if (local)
-		ifindex = netdev_vport_priv(local)->dev->ifindex;
+		ifindex = netdev_vport_index(local);
 	else
 		ifindex = 0;
 
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 6f7038e..ecfcbd5 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -38,6 +38,12 @@ netdev_vport_priv(const struct vport *vport)
 	return vport_priv(vport);
 }
 
+static inline int
+netdev_vport_index(const struct vport *vport)
+{
+	return netdev_vport_priv(vport)->dev->ifindex;
+}
+
 const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
-- 
2.1.4

^ permalink raw reply related

* Re: TCP connection issues against Amazon S3
From: Erik Grinaker @ 2015-01-06 21:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Yuchung Cheng, linux-kernel@vger.kernel.org, netdev
In-Reply-To: <4DA8529D-4EEC-42DA-89B0-DC7746DB2B10@bengler.no>


> On 06 Jan 2015, at 20:26, Erik Grinaker <erik@bengler.no> wrote:
> 
>> 
>> On 06 Jan 2015, at 20:13, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> 
>> On Tue, 2015-01-06 at 19:42 +0000, Erik Grinaker wrote:
>> 
>>> The transfer on the functioning Netherlands server does indeed use SACKs, while the Norway servers do not.
>>> 
>>> For what it’s worth, I have made stripped down pcaps for a single failing transfer as well as a single functioning transfer in the Netherlands:
>>> 
>>> http://abstrakt.bengler.no/tcp-issues-s3-failure.pcap.bz2
>>> http://abstrakt.bengler.no/tcp-issues-s3-success-netherlands.pcap.bz2
>>> 
>> 
>> Although sender seems to be reluctant to retransmit, this 'failure' is
>> caused by receiver closing the connection too soon.
>> 
>> Are you sure you do not ask curl to setup a very small completion
>> timer ?
> 
> For testing, I am using Curl with a 30 second timeout. This may well be a bit short, but the point is that with the older kernel I could run thousands of requests without a single failure (generally the requests would finish within seconds), while with the newer kernel about 5% of requests will time out (the rest complete within seconds).
> 
>> 12:41:00.738336 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 767221:768681, ack 154, win 127, length 1460
>> 12:41:00.738346 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 736561, win 1877, length 0
>> 12:41:05.227150 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 736561:738021, ack 154, win 127, length 1460
>> 12:41:05.227250 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1882, length 0
>> 12:41:05.278287 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 768681:770141, ack 154, win 127, length 1460
>> 12:41:05.278354 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1888, length 0
>> 12:41:05.278421 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 770141:771601, ack 154, win 127, length 1460
>> 12:41:05.278429 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 745321, win 1894, length 0
>> 12:41:14.257102 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 745321:746781, ack 154, win 127, length 1460
>> 12:41:14.257154 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1900, length 0
>> 12:41:14.308117 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 771601:773061, ack 154, win 127, length 1460
>> 12:41:14.308227 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1905, length 0
>> 12:41:14.308387 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 773061:774521, ack 154, win 127, length 1460
>> 12:41:14.308397 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [.], ack 746781, win 1911, length 0
>> 
>> -> Here receiver sends a FIN, because application closed the socket (or died)
>> 12:41:23.237156 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [F.], seq 154, ack 746781, win 1911, length 0
>> 12:41:23.289805 IP 54.231.132.98.80 > 195.159.221.106.48837: Flags [.], seq 746781:748241, ack 155, win 127, length 1460
>> 12:41:23.289882 IP 195.159.221.106.48837 > 54.231.132.98.80: Flags [R], seq 505782802, win 0, length 0
>> 
>> Anyway, getting decent speed without SACK is going to be hard.
> 
> Yes, I am not sure why the sender (S3) disables SACK on my Norwegian servers (across ISPs), while it enables SACK on my server in the Netherlands. They run the same kernel and configuration. I will have to look into it more closely tomorrow.

It turns out the Norway and Netherlands servers were resolving different loadbalancers. The ones I reached in Norway did not support SACKs, while the ones in the Netherlands did. Going directly to a SACK-enabled IP fixes the problem.

This still doesn’t explain why it works with older kernels, but not newer ones. I’m thinking it’s probably some minor change, which gets amplified by the lack of SACKs on the loadbalancer. Anyway, I’ll bring it up with Amazon.

Many thanks for your help, everyone.

^ permalink raw reply

* Re: [PATCH net-next v3 0/5]: ixgbevf: Allow querying VFs RSS indirection table and key
From: Greg Rose @ 2015-01-06 21:13 UTC (permalink / raw)
  To: Vlad Zolotarov; +Cc: Gleb Natapov, netdev, Avi Kivity, jeffrey.t.kirsher
In-Reply-To: <54AC4206.4030006@cloudius-systems.com>

On Tue, Jan 6, 2015 at 12:13 PM, Vlad Zolotarov
<vladz@cloudius-systems.com> wrote:
>
> On 01/06/15 20:22, Greg Rose wrote:
>>

[snip]

>> I have not reached any such conclusion - let me reiterate:  I have no
>> idea.  It is not my area of expertise.  However, to take the lowest
>> risk route just add a policy hook so that a system admin can turn the
>> feature on through the PF driver (which is acknowledged as secure) if
>> they wish then there is no worry.
>
>
> NP. Let's move on.
>
>>> However, I don't want to argue about it any longer. Let's move on.
>>>
>>> Let's clarify one thing about this "hook". Do you agree that it should
>>> cover only the cases when the VF shares the above-mentioned data with
>>> the PF, namely for all devices but x550?
>>
>> Look at how spoof checking is turned off/on for each VF using the "ip
>> link set" commands.  That's what I'm envisioning - some way to decide
>> on a per VF basis which VFs should be allowed to perform the query.
>
>
> I will, but let's agree that x550 VFs should be out of this since their RSS
> indirection table and key belong to the specific domain and don't pose
> even a theoretical threat.

Sounds good to me.

Thanks!

- Greg

>
>
> thanks,
> vlad
>
>> Thanks,
>>
>> - Greg
>
>

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Do not use private netdev_vport fields
From: Pravin Shelar @ 2015-01-06 21:16 UTC (permalink / raw)
  To: Daniele Di Proietto; +Cc: netdev
In-Reply-To: <1420577481-20238-1-git-send-email-daniele.di.proietto@gmail.com>

On Tue, Jan 6, 2015 at 12:51 PM, Daniele Di Proietto
<daniele.di.proietto@gmail.com> wrote:
> This commit introduces netdev_vport_index() to prevent datapath.c from directly accessing the 'dev' member of 'struct netdev_vport'.
> This fix is needed to allow possible alternative netdev_vport implementations.
>
> Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
> ---
>  net/openvswitch/datapath.c     | 2 +-
>  net/openvswitch/vport-netdev.h | 6 ++++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
...
>
> diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
> index 6f7038e..ecfcbd5 100644
> --- a/net/openvswitch/vport-netdev.h
> +++ b/net/openvswitch/vport-netdev.h
> @@ -38,6 +38,12 @@ netdev_vport_priv(const struct vport *vport)
>         return vport_priv(vport);
>  }
>
> +static inline int
> +netdev_vport_index(const struct vport *vport)
> +{
> +       return netdev_vport_priv(vport)->dev->ifindex;
> +}
> +
Function return type and function name should be on same line,
otherwise looks good.

>  const char *ovs_netdev_get_name(const struct vport *);
>  void ovs_netdev_detach_dev(struct vport *);
>
> --
> 2.1.4
>

^ permalink raw reply

* [PATCH] qla3xxx: don't allow never end busy loop
From: Andy Shevchenko @ 2015-01-06 21:17 UTC (permalink / raw)
  To: netdev, linux-driver, David S . Miller; +Cc: Andy Shevchenko

The counter variable was never incremented, so the loop could spin
forever under certain circumstances.
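
For illustration only, the bounded retry pattern can be sketched in user
space as below. The names wait_for_lock(), try_lock_fn and lock_after_n()
are hypothetical and not part of the driver; the callback stands in for
ql_sem_lock(), and the one-second sleep is left out so the sketch returns
immediately.

```c
#include <stdbool.h>

#define MAX_ATTEMPTS 10

/* Hypothetical stand-in for ql_sem_lock(): true once the lock is taken. */
typedef bool (*try_lock_fn)(void *ctx);

/*
 * Try to take the lock at most MAX_ATTEMPTS times.
 * Returns 1 on success and 0 on timeout; '*attempts' (if non-NULL)
 * reports how many tries were made.
 */
int wait_for_lock(try_lock_fn try_lock, void *ctx, int *attempts)
{
	int i = 0;

	do {
		if (attempts)
			*attempts = i + 1;
		if (try_lock(ctx))
			return 1;
		/* the driver would call ssleep(1) here before retrying */
	} while (++i < MAX_ATTEMPTS);

	return 0;
}

/* Example lock: fails until the countdown stored in *ctx reaches zero. */
bool lock_after_n(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

Because the counter is advanced in the loop condition itself, there is no
path through the body that can skip the increment.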

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
---
 drivers/net/ethernet/qlogic/qla3xxx.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index c2f09af..4847713 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -146,10 +146,7 @@ static int ql_wait_for_drvr_lock(struct ql3_adapter *qdev)
 {
 	int i = 0;
 
-	while (i < 10) {
-		if (i)
-			ssleep(1);
-
+	do {
 		if (ql_sem_lock(qdev,
 				QL_DRVR_SEM_MASK,
 				(QL_RESOURCE_BITS_BASE_CODE | (qdev->mac_index)
@@ -158,7 +155,8 @@ static int ql_wait_for_drvr_lock(struct ql3_adapter *qdev)
 				      "driver lock acquired\n");
 			return 1;
 		}
-	}
+		ssleep(1);
+	} while (++i < 10);
 
 	netdev_err(qdev->ndev, "Timed out waiting for driver lock...\n");
 	return 0;
-- 
1.8.3.101.g727a46b

^ permalink raw reply related

* [PATCH net] ipv6: Prevent ipv6_find_hdr() from returning ENOENT for valid non-first fragments
From: Rahul Sharma @ 2015-01-06 21:33 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, hannes

ipv6_find_hdr() currently assumes that the next-header field in the
fragment header of a non-first fragment is the "protocol number of
the last header" (where "last header" excludes any extension header
protocol numbers), which is incorrect as per RFC 2460. The next-header
value is the first header of the fragmentable part of the original
packet (which can be an extension header as well).
This can create reassembly problems. For example, consider fragmented
authenticated OSPFv3 packets (where an AH header is inserted before the
protocol header). For the second fragment, the next-header value in
the fragment header will be NEXTHDR_AUTH, which is correct, but
ipv6_find_hdr() will return -ENOENT since AH is an extension header,
resulting in the second fragment getting dropped. This check for the
presence of a non-extension header needs to be removed.

Signed-off-by: Rahul Sharma <rsharma@arista.com>
---
--- linux-3.18.1/net/ipv6/exthdrs_core.c.orig   2015-01-06 10:25:36.411419863 -0800
+++ linux-3.18.1/net/ipv6/exthdrs_core.c        2015-01-06 10:51:45.819364986 -0800
@@ -171,10 +171,11 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
  * If the first fragment doesn't contain the final protocol header or
  * NEXTHDR_NONE it is considered invalid.
  *
- * Note that non-1st fragment is special case that "the protocol number
- * of last header" is "next header" field in Fragment header. In this case,
- * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
- * isn't NULL.
+ * Note that non-1st fragment is special case that "the protocol number of the
+ * first header of the fragmentable part of the original packet" is
+ * "next header" field in the Fragment header. In this case, *offset is
+ * meaningless and fragment offset is stored in *fragoff if fragoff isn't
+ * NULL.
  *
  * if flags is not NULL and it's a fragment, then the frag flag
  * IP6_FH_F_FRAG will be set. If it's an AH header, the
@@ -250,9 +251,7 @@ int ipv6_find_hdr(const struct sk_buff *

                        _frag_off = ntohs(*fp) & ~0x7;
                        if (_frag_off) {
-                               if (target < 0 &&
-                                   ((!ipv6_ext_hdr(hp->nexthdr)) ||
-                                    hp->nexthdr == NEXTHDR_NONE)) {
+                               if (target < 0) {
                                        if (fragoff)
                                                *fragoff = _frag_off;
                                        return hp->nexthdr;

^ permalink raw reply

* Re: Possible BUG in ipv6_find_hdr function for fragmented packets
From: Rahul Sharma @ 2015-01-06 21:43 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: netdev
In-Reply-To: <1420551094.32369.34.camel@stressinduktion.org>

Hi Hannes

On Tue, Jan 6, 2015 at 7:01 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Rahul,
>
> On Mi, 2014-12-31 at 12:33 +0530, Rahul Sharma wrote:
>> I have observed a problem when I added an AH header before protocol
>> header (OSPFv3) while implementing authentication support for OSPFv3.
>>
>> Problem: Fragmented packets which include authentication header don't
>> get reassembled in the kernel. This was because ipv6_find_hdr returns
>> ENOENT for the non-first fragment since AH is an extension header.
>>
>> Firstly, this comment  "Note that non-1st fragment is special case
>> that "the protocol number of last header" is "next header" field in
>> Fragment header" ('last header' doesn't include AH or other extension
>> headers) before ipv6_find_hdr looks incorrect as per the description
>> of the fragmentation process in RFC2460. The rfc clearly states that
>> next header value in the fragments will be the first header of the
>> Fragmentable part of the original packet which could be AH (51) as in
>> our case.
>>
>> This code looks like a problem:
>> if (_frag_off) {
>>         if (target < 0 &&
>>             ((!ipv6_ext_hdr(hp->nexthdr)) ||
>>              hp->nexthdr == NEXTHDR_NONE)) {
>>                 if (fragoff)
>>                         *fragoff = _frag_off;
>>                 return hp->nexthdr;
>>         }
>>         return -ENOENT;
>> }
>> ENOENT for the non-first fragment in such cases.
>>
>> Solution: I suggest we do away with this check
>> ((!ipv6_ext_hdr(hp->nexthdr)) || hp->nexthdr == NEXTHDR_NONE) and
>> simply return hp->nexthdr if _frag_off is non-zero. I tested it on
>> my machine and it works. Adding a special case for NEXTHDR_AUTH also
>> works for me.
>
> Do the packets get dropped in netfilter code? Do you have any idea where
> specifically?
>
> Your suggestion seems correct to me, can you provide a patch to fix
> this?
>
> Thanks,
> Hannes
>
>

Yes, the packets get dropped in the netfilter code. ip6table_raw_hook
was returning NF_DROP for the second fragment.
This was because of xt_action_param structure's hotdrop flag being set
to true for this fragment when ip6t_do_table tries to call
ip6_packet_match which in turn calls ipv6_find_hdr which was returning
ENOENT.
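
To make the difference concrete, here is a simplified user-space sketch of
the non-first-fragment branch before and after the change. It is not the
kernel code: struct frag_hdr_sketch, is_ext_hdr() and the two
nonfirst_result_*() helpers are names made up for this example, the
*fragoff bookkeeping is omitted, and is_ext_hdr() is only an approximation
of ipv6_ext_hdr().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

#define NEXTHDR_HOP       0
#define NEXTHDR_ROUTING  43
#define NEXTHDR_FRAGMENT 44
#define NEXTHDR_AUTH     51
#define NEXTHDR_NONE     59
#define NEXTHDR_DEST     60

/* Approximation of ipv6_ext_hdr(): is this an extension header type? */
static bool is_ext_hdr(uint8_t nexthdr)
{
	return nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
	       nexthdr == NEXTHDR_FRAGMENT || nexthdr == NEXTHDR_AUTH ||
	       nexthdr == NEXTHDR_NONE || nexthdr == NEXTHDR_DEST;
}

/* Fragment header layout; frag_off is kept in network byte order. */
struct frag_hdr_sketch {
	uint8_t  nexthdr;
	uint8_t  reserved;
	uint16_t frag_off;
	uint32_t identification;
};

/* Byte offset of this fragment: mask off the two reserved bits and M flag. */
unsigned int frag_byte_offset(const struct frag_hdr_sketch *fh)
{
	return ntohs(fh->frag_off) & ~0x7;
}

/* Non-first fragment, old check: extension headers yield -ENOENT. */
int nonfirst_result_old(const struct frag_hdr_sketch *fh, int target)
{
	if (target < 0 &&
	    (!is_ext_hdr(fh->nexthdr) || fh->nexthdr == NEXTHDR_NONE))
		return fh->nexthdr;
	return -ENOENT;
}

/* Non-first fragment, patched check: report whatever next header is there. */
int nonfirst_result_new(const struct frag_hdr_sketch *fh, int target)
{
	if (target < 0)
		return fh->nexthdr;
	return -ENOENT;
}
```

With nexthdr set to NEXTHDR_AUTH and a non-zero offset, the old variant
returns -ENOENT while the patched one returns 51, which matches what I saw
with the second fragment.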

I have also emailed the patch.

Thanks

^ permalink raw reply

