Netdev List
 help / color / mirror / Atom feed
* Re: [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
From: Helge Deller @ 2012-05-17 21:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Cong Wang, Octavian Purdila, netdev, David Miller, Andrew Morton,
	Frank Danapfel, Laszlo Ersek, shemminger
In-Reply-To: <m18vi3w7zd.fsf@fess.ebiederm.org>

On 04/11/2012 12:13 AM, Eric W. Biederman wrote:
> Helge Deller <deller@gmx.de> writes:
> 
>> On 04/09/2012 10:43 AM, Cong Wang wrote:
>>> On Wed, 2012-04-04 at 22:24 +0200, Helge Deller wrote:
>>>> I would like to follow up on my last patch series to be able to modify
>>>> the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
>>>> from userspace.
>>>>
>>>> My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
>>>> modifications to the proc interface, which - based on the feedback here
>>>> on the list - seemed to not be the right way to go (although I personally
>>>> still like the idea very much :-)).
>>>>
>>>> Anyway, with this RFC I would like to get feedback about a new proposed
>>>> API and attached kernel patch.
>>>>
>>>> The idea is to introduce a new<optname>  value for get/setsockopt()
>>>> named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
>>>> bitmap via standard get/setsockopt() syscalls.
>>>> As far as I understand this seems to be similiar to how iptables works.
>>>>
>>>> An untested kernel patch for review and feedback is attached below.
>>>>
>>>> In userspace it then would be possible to write a new tool or to extend
>>>> for example the "ip" tool to accept commands like:
>>>> $>  ip reserved_ports add 100-2000
>>>> $>  ip reserved_ports remove 50-60
>>>> $>  ip reserved_ports list     (to show current reserved port list)
>>>>
>>>> This userspace tool could then read the port bitmap from kernel via
>>>> a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
>>>> b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>>>> and write back the results after modification via
>>>> c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>>>>
>>>> Would that be an acceptable solution?
>>> Hmm, it is indeed that bitmap fits for syscall rather than /proc file.
>>>
>>> But it seems that using getsockopt()/setsockopt() makes it like it is a
>>> per-socket setting, actually it is a system-wide setting.
>> Yes, that's the reason why I used SOL_SOCKET which configures at least
>> a few system-wide settings too.
>>
>>> So I am
>>> wondering if exporting a binary /proc file for this is a better
>>> solution.
>> Yeah - that's another solution, but (65536 ports)/(8 bits per byte) = 8 KByte,
>> so we
>> may again hit the 4k limit of /proc (unless you do binary reads which should
>> be done with a binary /proc-entry anyway).
>>
>> Again, I'm open to develop any kind of solution which would get an OK
>> here.
> 
> Just looking at proc_do_large_bitmap, it does appear that there is a
> very local 4k limit on writes.
> 
> Can you please just modify proc_do_large_bitmap so that there is not a
> 4k limit on writes.  Ideally the code would just read another 4k from
> userspace when it is getting close to the end of it's 4k buffer, or
> perhaps we just read everything directly from userspace and run slowly.

Hi Eric,

sorry for the very late reply.
Yes, you are right- this is only a local 4K limit. Increasing it allowed me 
to write more ports at once.

With your tips I was now able to build a simple solution which fits my needs.
Based on standard tools like echo and dd (with the seek option) I can
block all ports which I need.

Nevertheless, the current kernel interface is not very flexible.
So, my proposal for a new interface (with tools) still stands. I just need
and advise what would be acceptable. Without any advise I will just leave
everything as is (since I'm now fine with it).

Helge

^ permalink raw reply

* Re: [PATCH 2/5] drivers/net: delete all code/drivers depending on CONFIG_MCA
From: Paul Gortmaker @ 2012-05-17 21:20 UTC (permalink / raw)
  To: linux-kernel, JBottomley, David S. Miller, netdev
In-Reply-To: <4FB52C99.6010303@windriver.com>

[Re: [PATCH 2/5] drivers/net: delete all code/drivers depending on CONFIG_MCA] On 17/05/2012 (Thu 12:51) Paul Gortmaker wrote:

> On 12-05-17 12:40 PM, Paul Gortmaker wrote:
> 
> [..]
> 
> >  18 files changed, 18 insertions(+), 5288 deletions(-)
> >  delete mode 100644 drivers/net/ethernet/8390/ne2.c
> >  delete mode 100644 drivers/net/ethernet/8390/smc-mca.c
> >  delete mode 100644 drivers/net/ethernet/i825xx/3c523.c
> >  delete mode 100644 drivers/net/ethernet/i825xx/3c523.h
> >  delete mode 100644 drivers/net/ethernet/i825xx/3c527.c
> 
> I can also obviously delete 3c527.h too.  Not sure why I didn't
> notice that until just now...  :(

I've layered the above commit (with 3c527.h deleted) onto today's
net-next, did a quick re-test of x86_64 allmodconfig and put it
on a branch as per below (vs sending a 5000+ line patch).

Thanks,
Paul.
---

The following changes since commit bc6a4744b827c5a78ca591acca81809bddb8b2db:

  net/mlx4_en: num cores tx rings for every UP (2012-05-17 16:17:50 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git delete-mca-net-drivers

for you to fetch changes up to a5e371f61ad33c07b28e7c9b60c78d71fdd34e2a:

  drivers/net: delete all code/drivers depending on CONFIG_MCA (2012-05-17 16:37:41 -0400)

----------------------------------------------------------------
Paul Gortmaker (1):
      drivers/net: delete all code/drivers depending on CONFIG_MCA

 Documentation/networking/3c509.txt     |    1 -
 Documentation/networking/fore200e.txt  |    6 +-
 drivers/net/Space.c                    |   16 +-
 drivers/net/ethernet/3com/3c509.c      |  123 +---
 drivers/net/ethernet/8390/Kconfig      |   24 -
 drivers/net/ethernet/8390/Makefile     |    1 -
 drivers/net/ethernet/8390/ne2.c        |  798 ---------------
 drivers/net/ethernet/8390/smc-mca.c    |  575 -----------
 drivers/net/ethernet/amd/depca.c       |  210 +----
 drivers/net/ethernet/fujitsu/at1700.c  |  120 +---
 drivers/net/ethernet/i825xx/3c523.c    | 1312 -------------------------
 drivers/net/ethernet/i825xx/3c523.h    |  355 -------
 drivers/net/ethernet/i825xx/3c527.c    | 1660 --------------------------------
 drivers/net/ethernet/i825xx/3c527.h    |   81 --
 drivers/net/ethernet/i825xx/Kconfig    |   22 -
 drivers/net/ethernet/i825xx/Makefile   |    2 -
 drivers/net/ethernet/i825xx/eexpress.c |   60 +--
 drivers/net/ethernet/natsemi/Kconfig   |   20 +-
 drivers/net/ethernet/natsemi/Makefile  |    1 -
 19 files changed, 18 insertions(+), 5369 deletions(-)
 delete mode 100644 drivers/net/ethernet/8390/ne2.c
 delete mode 100644 drivers/net/ethernet/8390/smc-mca.c
 delete mode 100644 drivers/net/ethernet/i825xx/3c523.c
 delete mode 100644 drivers/net/ethernet/i825xx/3c523.h
 delete mode 100644 drivers/net/ethernet/i825xx/3c527.c
 delete mode 100644 drivers/net/ethernet/i825xx/3c527.h

^ permalink raw reply

* Re: [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
From: Stephen Hemminger @ 2012-05-17 21:22 UTC (permalink / raw)
  To: Helge Deller
  Cc: Eric W. Biederman, Cong Wang, Octavian Purdila, netdev,
	David Miller, Andrew Morton, Frank Danapfel, Laszlo Ersek
In-Reply-To: <4FB56B1A.6010208@gmx.de>

On Thu, 17 May 2012 23:18:18 +0200
Helge Deller <deller@gmx.de> wrote:

> On 04/11/2012 12:13 AM, Eric W. Biederman wrote:
> > Helge Deller <deller@gmx.de> writes:
> > 
> >> On 04/09/2012 10:43 AM, Cong Wang wrote:
> >>> On Wed, 2012-04-04 at 22:24 +0200, Helge Deller wrote:
> >>>> I would like to follow up on my last patch series to be able to modify
> >>>> the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
> >>>> from userspace.
> >>>>
> >>>> My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
> >>>> modifications to the proc interface, which - based on the feedback here
> >>>> on the list - seemed to not be the right way to go (although I personally
> >>>> still like the idea very much :-)).
> >>>>
> >>>> Anyway, with this RFC I would like to get feedback about a new proposed
> >>>> API and attached kernel patch.
> >>>>
> >>>> The idea is to introduce a new<optname>  value for get/setsockopt()
> >>>> named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
> >>>> bitmap via standard get/setsockopt() syscalls.
> >>>> As far as I understand this seems to be similiar to how iptables works.
> >>>>
> >>>> An untested kernel patch for review and feedback is attached below.
> >>>>
> >>>> In userspace it then would be possible to write a new tool or to extend
> >>>> for example the "ip" tool to accept commands like:
> >>>> $>  ip reserved_ports add 100-2000
> >>>> $>  ip reserved_ports remove 50-60
> >>>> $>  ip reserved_ports list     (to show current reserved port list)
> >>>>
> >>>> This userspace tool could then read the port bitmap from kernel via
> >>>> a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
> >>>> b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
> >>>> and write back the results after modification via
> >>>> c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
> >>>>
> >>>> Would that be an acceptable solution?
> >>> Hmm, it is indeed that bitmap fits for syscall rather than /proc file.
> >>>
> >>> But it seems that using getsockopt()/setsockopt() makes it like it is a
> >>> per-socket setting, actually it is a system-wide setting.
> >> Yes, that's the reason why I used SOL_SOCKET which configures at least
> >> a few system-wide settings too.
> >>
> >>> So I am
> >>> wondering if exporting a binary /proc file for this is a better
> >>> solution.
> >> Yeah - that's another solution, but (65536 ports)/(8 bits per byte) = 8 KByte,
> >> so we
> >> may again hit the 4k limit of /proc (unless you do binary reads which should
> >> be done with a binary /proc-entry anyway).
> >>
> >> Again, I'm open to develop any kind of solution which would get an OK
> >> here.
> > 
> > Just looking at proc_do_large_bitmap, it does appear that there is a
> > very local 4k limit on writes.
> > 
> > Can you please just modify proc_do_large_bitmap so that there is not a
> > 4k limit on writes.  Ideally the code would just read another 4k from
> > userspace when it is getting close to the end of it's 4k buffer, or
> > perhaps we just read everything directly from userspace and run slowly.
> 
> Hi Eric,
> 
> sorry for the very late reply.
> Yes, you are right- this is only a local 4K limit. Increasing it allowed me 
> to write more ports at once.
> 
> With your tips I was now able to build a simple solution which fits my needs.
> Based on standard tools like echo and dd (with the seek option) I can
> block all ports which I need.
> 
> Nevertheless, the current kernel interface is not very flexible.
> So, my proposal for a new interface (with tools) still stands. I just need
> and advise what would be acceptable. Without any advise I will just leave
> everything as is (since I'm now fine with it).
> 
> Helge

Sounds like an ideal case for providing an mmap interface to the
bitmap?

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Eric Dumazet @ 2012-05-17 21:40 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <20120517211414.GP14498@1wt.eu>

On Thu, 2012-05-17 at 23:14 +0200, Willy Tarreau wrote:

> I understand and that could be possible indeed. Still, this precise code
> has been used for a few years now in prod at 10+ Gbps, so while that does
> not mean it's exempt from any race condition or bug, we have not observed
> this behaviour earlier. In fact, what I've not tested much was the small
> ARM based machine which is just a convenient development system to try to
> optimize network performance. Among the differences I see with usual
> deployments is that the NIC doesn't support TSO, while I've been used to
> enable splicing only where TSO was supported, because before your recent
> optimizations, it was less performant than recv/send.
> 
> > You drain bytes from fd 0xe to pipe buffers, but I dont see you check
> > write ability on destination socket prior the splice(pipe -> socket)
> 
> I don't, I only rely on EAGAIN to re-enable polling for write (otherwise
> it becomes a real mess of epoll_ctl which sensibly hurts performance). I
> only re-enable polling if FDs can't move anymore.
> 
> Before doing a splice(read), if any data are left pending in the pipe, I
> first try a splice(write) to try to flush the pipe, then I perform the
> splice(read) then try to flush the pipe again using a splice(write).
> Then polling is enabled if we block on EAGAIN.
> 
> I could fix the issue here by reworking my first patch. I think I was
> too much conservative, because if we leave do_tcp_sendpages() on OOM
> with copied == 0, in my opinion we never push. My first attempt tried
> to call tcp_push() only once but it seems like this is a wrong idea
> because it doesn't allow new attempts if for example tcp_write_xmit()
> cannot send upon first attempt.
> 
> After calling tcp_push() inconditionnally on OOM, I cannot reproduce
> the issue at all, and the traffic reaches a steady 950 Mbps in each
> direction.
> 
> I'm appending the patch, you'll know better than me if it's correct or
> not.
> 
> Best regards,
> Willy
> 
> ---
> 
> From 39c3f73176118a274ec9dea9c22c83d97a7fbfe0 Mon Sep 17 00:00:00 2001
> From: Willy Tarreau <w@1wt.eu>
> Date: Thu, 17 May 2012 22:43:20 +0200
> Subject: [PATCH] tcp: do_tcp_sendpages() must try to push data out on oom conditions
> 
> Since recent changes on TCP splicing (starting with commits 2f533844
> and 35f9c09f), I started seeing massive stalls when forwarding traffic
> between two sockets using splice() when pipe buffers were larger than
> socket buffers.
> 
> Latest changes (net: netdev_alloc_skb() use build_skb()) made the
> problem even more apparent.
> 
> The reason seems to be that if do_tcp_sendpages() fails on out of memory
> condition without being able to send at least one byte, tcp_push() is not
> called and the buffers cannot be flushed.
> 
> After applying the attached patch, I cannot reproduce the stalls at all
> and the data rate it perfectly stable and steady under any condition
> which previously caused the problem to be permanent.
> 
> The issue seems to have been there since before the kernel migrated to
> git, which makes me think that the stalls I occasionally experienced
> with tux during stress-tests years ago were probably related to the
> same issue.
> 
> This issue was first encountered on 3.0.31 and 3.2.17, so please backport
> to -stable.
> 
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> Cc: <stable@vger.kernel.org>
> ---
>  net/ipv4/tcp.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 63ddaee..231fe53 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -917,8 +917,7 @@ new_segment:
>  wait_for_sndbuf:
>  		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
>  wait_for_memory:
> -		if (copied)
> -			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
> +		tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
>  
>  		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
>  			goto do_error;


I dont understand why we should tcp_push() if we sent 0 bytes in this
splice() call.

The push() should have be done already by prior splice() call, dont you
think ?

out:
	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
		tcp_push(sk, flags, mss_now, tp->nonagle);

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 21:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <20120517211414.GP14498@1wt.eu>

On Thu, May 17, 2012 at 11:14:14PM +0200, Willy Tarreau wrote:
> On Thu, May 17, 2012 at 10:41:19PM +0200, Eric Dumazet wrote:
> > You drain bytes from fd 0xe to pipe buffers, but I dont see you check
> > write ability on destination socket prior the splice(pipe -> socket)
> 
> I don't, I only rely on EAGAIN to re-enable polling for write (otherwise
> it becomes a real mess of epoll_ctl which sensibly hurts performance). I
> only re-enable polling if FDs can't move anymore.
> 
> Before doing a splice(read), if any data are left pending in the pipe, I
> first try a splice(write) to try to flush the pipe, then I perform the
> splice(read) then try to flush the pipe again using a splice(write).
> Then polling is enabled if we block on EAGAIN.

Just for the sake of documentation, I have rebuilt an up-to-date strace
so that we get the details of the epoll_ctl(). I've restarted with only
the classical epoll (without the events cache) so that it's easier to
trace event status with strace.

Here we clearly see that the output FD (6) was not reported as being
writable for a long period, until haproxy's timeout struck :

15:07:50.130499 gettimeofday({1324652870, 130772}, NULL) = 0
15:07:50.130937 gettimeofday({1324652870, 131071}, NULL) = 0
15:07:50.131195 epoll_wait(3, {}, 6, 1000) = 0
15:07:51.132529 gettimeofday({1324652871, 132654}, NULL) = 0
15:07:51.132777 gettimeofday({1324652871, 132895}, NULL) = 0
15:07:51.133013 epoll_wait(3, {{EPOLLIN, {u32=4, u64=4}}}, 6, 1000) = 1
15:07:51.561594 gettimeofday({1324652871, 561723}, NULL) = 0
15:07:51.561866 accept(4, {sa_family=AF_INET, sin_port=htons(56441), sin_addr=inet_addr("192.168.0.31")}, [16]) = 6
15:07:51.562249 fcntl64(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
15:07:51.562582 setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
15:07:51.562844 accept(4, 0x7ea65a64, [128]) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.563142 epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, u64=6}}) = 0
15:07:51.563421 gettimeofday({1324652871, 563540}, NULL) = 0
15:07:51.563657 epoll_wait(3, {{EPOLLIN, {u32=6, u64=6}}}, 7, 1000) = 1
15:07:51.563923 gettimeofday({1324652871, 564041}, NULL) = 0
15:07:51.564180 recv(6, "GET /?s=1g HTTP/1.0\r\nUser-Agent:"..., 7000, 0) = 126
15:07:51.564574 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
15:07:51.564817 fcntl64(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
15:07:51.565040 setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
15:07:51.565277 setsockopt(7, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
15:07:51.565524 connect(7, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("192.168.0.31")}, 16) = -1 EINPROGRESS (Operation now in progress)
15:07:51.565955 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLOUT, {u32=7, u64=7}}) = 0
15:07:51.566235 epoll_ctl(3, EPOLL_CTL_DEL, 6, {0, {u32=6, u64=6}}) = 0
15:07:51.566494 gettimeofday({1324652871, 566613}, NULL) = 0
15:07:51.566730 epoll_wait(3, {{EPOLLOUT, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.566994 gettimeofday({1324652871, 567114}, NULL) = 0
15:07:51.567246 send(7, "GET /?s=1g HTTP/1.0\r\nUser-Agent:"..., 107, MSG_DONTWAIT|MSG_NOSIGNAL) = 107
15:07:51.567589 epoll_ctl(3, EPOLL_CTL_MOD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
15:07:51.567907 gettimeofday({1324652871, 568031}, NULL) = 0
15:07:51.568148 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.568471 gettimeofday({1324652871, 568593}, NULL) = 0
15:07:51.568715 recv(7, "HTTP/1.1 200\r\nConnection: close\r"..., 7000, 0) = 4380
15:07:51.569057 recv(7, "89.12345678\n.123456789.123456789"..., 2620, 0) = 2620
15:07:51.569425 epoll_ctl(3, EPOLL_CTL_DEL, 7, {0, {u32=7, u64=7}}) = 0
15:07:51.569745 epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLOUT, {u32=6, u64=6}}) = 0
15:07:51.570013 gettimeofday({1324652871, 570133}, NULL) = 0
15:07:51.570251 epoll_wait(3, {{EPOLLOUT, {u32=6, u64=6}}}, 8, 1000) = 1
15:07:51.570514 gettimeofday({1324652871, 570634}, NULL) = 0
15:07:51.570754 send(6, "HTTP/1.1 200\r\nConnection: close\r"..., 7000, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = 7000
15:07:51.571244 epoll_ctl(3, EPOLL_CTL_DEL, 6, {0, {u32=6, u64=6}}) = 0
15:07:51.571518 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
15:07:51.571825 gettimeofday({1324652871, 571947}, NULL) = 0
15:07:51.572064 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.572355 gettimeofday({1324652871, 572476}, NULL) = 0
15:07:51.572617 pipe([8, 9])            = 0
15:07:51.572863 fcntl64(8, 0x407 /* F_??? */, 0x40000) = 262144
15:07:51.573107 splice(7, NULL, 9, NULL, 1073734988, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 9060
15:07:51.573408 splice(7, NULL, 9, NULL, 1073725928, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 1460
15:07:51.573696 splice(7, NULL, 9, NULL, 1073724468, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.573942 splice(8, NULL, 6, NULL, 10520, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 10520
15:07:51.574328 gettimeofday({1324652871, 574452}, NULL) = 0
15:07:51.574572 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.574988 gettimeofday({1324652871, 575112}, NULL) = 0
15:07:51.575235 splice(7, NULL, 9, NULL, 1073724468, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 16060
15:07:51.575513 splice(8, NULL, 6, NULL, 16060, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 16060
15:07:51.575878 gettimeofday({1324652871, 576003}, NULL) = 0
15:07:51.576142 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.576510 gettimeofday({1324652871, 576631}, NULL) = 0
15:07:51.576753 splice(7, NULL, 9, NULL, 1073708408, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 11680
15:07:51.577027 splice(8, NULL, 6, NULL, 11680, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 11680
15:07:51.577358 gettimeofday({1324652871, 577480}, NULL) = 0
15:07:51.577614 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.577959 gettimeofday({1324652871, 578080}, NULL) = 0
15:07:51.578201 splice(7, NULL, 9, NULL, 1073696728, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 13140
15:07:51.578492 splice(8, NULL, 6, NULL, 13140, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 13140
15:07:51.578799 gettimeofday({1324652871, 578921}, NULL) = 0
15:07:51.579057 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.579413 gettimeofday({1324652871, 579534}, NULL) = 0
15:07:51.579654 splice(7, NULL, 9, NULL, 1073683588, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 14600
15:07:51.579927 splice(8, NULL, 6, NULL, 14600, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 14600
15:07:51.580260 gettimeofday({1324652871, 580389}, NULL) = 0
15:07:51.580526 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.580952 gettimeofday({1324652871, 581073}, NULL) = 0
15:07:51.581195 splice(7, NULL, 9, NULL, 1073668988, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 16060
15:07:51.581491 splice(8, NULL, 6, NULL, 16060, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 16060
15:07:51.581801 gettimeofday({1324652871, 581930}, NULL) = 0
15:07:51.582173 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.582475 gettimeofday({1324652871, 582597}, NULL) = 0
15:07:51.582719 splice(7, NULL, 9, NULL, 1073652928, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 17520
15:07:51.582998 splice(8, NULL, 6, NULL, 17520, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 17520
15:07:51.583345 gettimeofday({1324652871, 583476}, NULL) = 0
15:07:51.583725 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.584001 gettimeofday({1324652871, 584131}, NULL) = 0
15:07:51.584329 splice(7, NULL, 9, NULL, 1073635408, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 29200
15:07:51.584599 splice(8, NULL, 6, NULL, 29200, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 24820
15:07:51.584968 epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLOUT, {u32=6, u64=6}}) = 0

Here FD6 was enabled on output since some data remained in the pipe.

15:07:51.585244 gettimeofday({1324652871, 585374}, NULL) = 0
15:07:51.585583 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.585852 gettimeofday({1324652871, 585972}, NULL) = 0
15:07:51.586093 splice(7, NULL, 9, NULL, 1073606208, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 45260
15:07:51.586384 splice(8, NULL, 6, NULL, 49640, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.586668 gettimeofday({1324652871, 586789}, NULL) = 0
15:07:51.586908 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.587246 gettimeofday({1324652871, 587367}, NULL) = 0
15:07:51.587488 splice(7, NULL, 9, NULL, 1073560948, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 20440
15:07:51.587764 splice(8, NULL, 6, NULL, 70080, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.588045 gettimeofday({1324652871, 588166}, NULL) = 0
15:07:51.588284 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.588641 gettimeofday({1324652871, 588762}, NULL) = 0
15:07:51.588882 splice(7, NULL, 9, NULL, 1073540508, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 23360
15:07:51.589157 splice(8, NULL, 6, NULL, 93440, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.589438 gettimeofday({1324652871, 589558}, NULL) = 0
15:07:51.589677 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.590041 gettimeofday({1324652871, 590163}, NULL) = 0
15:07:51.590284 splice(7, NULL, 9, NULL, 1073517148, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.590533 splice(8, NULL, 6, NULL, 93440, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.590781 epoll_ctl(3, EPOLL_CTL_DEL, 7, {0, {u32=7, u64=7}}) = 0
15:07:51.591054 gettimeofday({1324652871, 591173}, NULL) = 0
15:07:51.591290 epoll_wait(3, {{EPOLLOUT, {u32=6, u64=6}}}, 8, 1000) = 1

write reported on FD6

15:07:51.623172 gettimeofday({1324652871, 623304}, NULL) = 0
15:07:51.623430 splice(8, NULL, 6, NULL, 93440, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 74460

74kB sent, 18980 bytes remaining, FD status not changed.

15:07:51.623739 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
15:07:51.624009 gettimeofday({1324652871, 624128}, NULL) = 0
15:07:51.624246 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.624510 gettimeofday({1324652871, 624629}, NULL) = 0
15:07:51.624749 splice(7, NULL, 9, NULL, 1073517148, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 59860
15:07:51.625067 splice(8, NULL, 6, NULL, 78840, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.625357 gettimeofday({1324652871, 625479}, NULL) = 0
15:07:51.625597 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.625970 gettimeofday({1324652871, 626092}, NULL) = 0
15:07:51.626213 splice(7, NULL, 9, NULL, 1073457288, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = 14600
15:07:51.626490 splice(8, NULL, 6, NULL, 93440, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.626773 gettimeofday({1324652871, 626894}, NULL) = 0
15:07:51.627013 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 1000) = 1
15:07:51.627387 gettimeofday({1324652871, 627509}, NULL) = 0
15:07:51.627630 splice(7, NULL, 9, NULL, 1073442688, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.627883 splice(8, NULL, 6, NULL, 93440, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:51.628131 epoll_ctl(3, EPOLL_CTL_DEL, 7, {0, {u32=7, u64=7}}) = 0

And here we're blocked on both FDs. We disable polling for reads on the input
FD since the pipe is full. We have still not disabled events on FD6, which is
not reported anymore.

15:07:51.628389 gettimeofday({1324652871, 628508}, NULL) = 0
15:07:51.628625 epoll_wait(3, {}, 8, 1000) = 0
15:07:52.629938 gettimeofday({1324652872, 630103}, NULL) = 0
15:07:52.630232 gettimeofday({1324652872, 630364}, NULL) = 0
15:07:52.630481 epoll_wait(3, {}, 8, 1000) = 0
15:07:53.631760 gettimeofday({1324652873, 631895}, NULL) = 0
15:07:53.632018 gettimeofday({1324652873, 632146}, NULL) = 0
15:07:53.632289 epoll_wait(3, {}, 8, 1000) = 0
15:07:54.633641 gettimeofday({1324652874, 633770}, NULL) = 0
15:07:54.633896 gettimeofday({1324652874, 634017}, NULL) = 0
15:07:54.634135 epoll_wait(3, {}, 8, 1000) = 0
15:07:55.635457 gettimeofday({1324652875, 635582}, NULL) = 0
15:07:55.635705 gettimeofday({1324652875, 635824}, NULL) = 0
15:07:55.635943 epoll_wait(3, {}, 8, 930) = 0
15:07:56.567197 gettimeofday({1324652876, 567322}, NULL) = 0
15:07:56.567448 gettimeofday({1324652876, 567568}, NULL) = 0
15:07:56.567685 epoll_wait(3, {}, 8, 1000) = 0
15:07:57.568855 gettimeofday({1324652877, 569012}, NULL) = 0
15:07:57.569145 gettimeofday({1324652877, 569278}, NULL) = 0
15:07:57.569396 epoll_wait(3, {}, 8, 1000) = 0
15:07:58.570760 gettimeofday({1324652878, 570899}, NULL) = 0
15:07:58.571023 gettimeofday({1324652878, 571154}, NULL) = 0
15:07:58.571270 epoll_wait(3, {}, 8, 999) = 0
15:07:59.571418 gettimeofday({1324652879, 571546}, NULL) = 0

Timeout strikes, we're about to close. It happens that the FD is re-enabled
at this point. That doesn't really matter anyway.

15:07:59.571688 epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
15:07:59.571964 gettimeofday({1324652879, 572084}, NULL) = 0
15:07:59.572227 epoll_wait(3, {{EPOLLIN, {u32=7, u64=7}}}, 8, 57) = 1
15:07:59.572499 gettimeofday({1324652879, 572619}, NULL) = 0
15:07:59.572742 splice(7, NULL, 9, NULL, 1073442688, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
15:07:59.573010 splice(8, NULL, 6, NULL, 93440, SPLICE_F_MOVE|SPLICE_F_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)

The FD 6 was still blocked.

15:07:59.573269 epoll_ctl(3, EPOLL_CTL_DEL, 7, {0, {u32=7, u64=7}}) = 0
15:07:59.573529 gettimeofday({1324652879, 573649}, NULL) = 0
15:07:59.573766 epoll_wait(3, {}, 8, 56) = 0
15:07:59.630081 gettimeofday({1324652879, 630202}, NULL) = 0
15:07:59.630327 setsockopt(6, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
15:07:59.630594 close(6)                = 0
15:07:59.630909 close(7)                = 0

FDs are closed.

15:07:59.631424 close(9)                = 0
15:07:59.631746 close(8)                = 0

And pipe is killed because it contains remaining data.

Regards,
Willy

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Eric Dumazet @ 2012-05-17 21:50 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <1337290822.3403.47.camel@edumazet-glaptop>

On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:

> I dont understand why we should tcp_push() if we sent 0 bytes in this
> splice() call.
> 
> The push() should have be done already by prior splice() call, dont you
> think ?
> 
> out:
> 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> 		tcp_push(sk, flags, mss_now, tp->nonagle);
> 

I think I now understand

One splice() syscall actually calls do_tcp_sendpages() several times
(N).

Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set

And last one with MSG_SENDPAGE_NOTLAST unset

So maybe we should replace this test by :

	if (!(flags & MSG_SENDPAGE_NOTLAST))
		tcp_push(...);

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 21:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1337290822.3403.47.camel@edumazet-glaptop>

On Thu, May 17, 2012 at 11:40:22PM +0200, Eric Dumazet wrote:
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 63ddaee..231fe53 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -917,8 +917,7 @@ new_segment:
> >  wait_for_sndbuf:
> >  		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> >  wait_for_memory:
> > -		if (copied)
> > -			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
> > +		tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
> >  
> >  		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
> >  			goto do_error;
> 
> 
> I dont understand why we should tcp_push() if we sent 0 bytes in this
> splice() call.
> 
> The push() should have be done already by prior splice() call, dont you
> think ?
> 
> out:
> 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> 		tcp_push(sk, flags, mss_now, tp->nonagle);
> 

I do think so, but my guess is that something fails below us at some point.
I tracked the tcp_push() chain down to tcp_write_xmit() which I *think* might
fail. I'm not certain about the conditions which can make this happen though.
However what I'm almost sure is that if we fail one tcp_write_xmit() from a
call to tcp_push(), next time we come here in do_tcp_sendpages(), the buffers
won't have moved and this second splice() will immediately go to out with
copied == 0.

Maybe I just worked around the consequence of a deeper issue and something
else needs to enable tcp_push() when we fail, but it's not very clear to me.
I felt like the call to tcp_check_probe_timer() in __tcp_push_pending_frames()
is there for this purpose, but I'm not sure.

Willy

^ permalink raw reply

* Re: [IPROUTE2 0/2] Add ECN support to tc-netem
From: Stephen Hemminger @ 2012-05-17 21:54 UTC (permalink / raw)
  To: Vijay Subramanian; +Cc: netdev, Eric Dumazet
In-Reply-To: <1337212318-2100-1-git-send-email-subramanian.vijay@gmail.com>

On Wed, 16 May 2012 16:51:56 -0700
Vijay Subramanian <subramanian.vijay@gmail.com> wrote:

> Recent patch to net-next kernel from Eric Dumazet (e4ae004b84b netem: add ECN
> capability) made it possible for netem to mark packets with ECN instead of
> dropping them. These two patches add support to iproute2/tc and update the
> manpage.
> 
> Vijay Subramanian (2):
>   Update tc-netem manpage to add ecn capability
>   tc-netem: Add support for ECN packet marking
> 
>  include/linux/pkt_sched.h |    1 +
>  man/man8/tc-netem.8       |    8 ++++++--
>  tc/q_netem.c              |   25 +++++++++++++++++++++++++
>  3 files changed, 32 insertions(+), 2 deletions(-)
> 


I am holding these until 3.5 merge window, since they depend
on the feature still in net-next

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 21:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1337291410.3403.52.camel@edumazet-glaptop>

On Thu, May 17, 2012 at 11:50:10PM +0200, Eric Dumazet wrote:
> On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:
> 
> > I dont understand why we should tcp_push() if we sent 0 bytes in this
> > splice() call.
> > 
> > The push() should have be done already by prior splice() call, dont you
> > think ?
> > 
> > out:
> > 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > 		tcp_push(sk, flags, mss_now, tp->nonagle);
> > 
> 
> I think I now understand
> 
> One splice() syscall actually calls do_tcp_sendpages() several times
> (N).
> 
> Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set
> 
> And last one with MSG_SENDPAGE_NOTLAST unset
> 
> So maybe we should replace this test by :
> 
> 	if (!(flags & MSG_SENDPAGE_NOTLAST))
> 		tcp_push(...);

Just tested, and it unfortunately does not fix the issue :-(

I'll see if I can log errors from lower layers.
Willy

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Eric Dumazet @ 2012-05-17 22:01 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <1337291410.3403.52.camel@edumazet-glaptop>

On Thu, 2012-05-17 at 23:50 +0200, Eric Dumazet wrote:
> On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:
> 
> > I dont understand why we should tcp_push() if we sent 0 bytes in this
> > splice() call.
> > 
> > The push() should have be done already by prior splice() call, dont you
> > think ?
> > 
> > out:
> > 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > 		tcp_push(sk, flags, mss_now, tp->nonagle);
> > 
> 
> I think I now understand
> 
> One splice() syscall actually calls do_tcp_sendpages() several times
> (N).
> 
> Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set
> 
> And last one with MSG_SENDPAGE_NOTLAST unset
> 
> So maybe we should replace this test by :
> 
> 	if (!(flags & MSG_SENDPAGE_NOTLAST))
> 		tcp_push(...);
> 
> 

Its more tricky than that.

If we return 0 from do_tcp_sendpages(), __splice_from_pipe() wont call
us again, but we need to flush().  (This bug is indeed very old)

So maybe use :

if ((copied && !(flags & MSG_SENDPAGE_NOTLAST)) ||
    !copied)
	tcp_push(...);

^ permalink raw reply

* [RFC PATCH net-next] hp100: delete VG/AnyLAN hp100
From: Joe Perches @ 2012-05-17 22:09 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: linux-kernel, JBottomley, David S. Miller, netdev
In-Reply-To: <20120517212039.GC27789@windriver.com>

On Thu, 2012-05-17 at 17:20 -0400, Paul Gortmaker wrote:
> [Re: [PATCH 2/5] drivers/net: delete all code/drivers depending on CONFIG_MCA

If we're removing really old and unused stuff,
how about the VG/AnyLAN driver too?
---
 drivers/net/ethernet/Kconfig     |    1 -
 drivers/net/ethernet/Makefile    |    1 -
 drivers/net/ethernet/hp/Kconfig  |   32 -
 drivers/net/ethernet/hp/Makefile |    5 -
 drivers/net/ethernet/hp/hp100.c  | 3066 --------------------------------------
 drivers/net/ethernet/hp/hp100.h  |  615 --------
 6 files changed, 0 insertions(+), 3720 deletions(-)
 delete mode 100644 drivers/net/ethernet/hp/Kconfig
 delete mode 100644 drivers/net/ethernet/hp/Makefile
 delete mode 100644 drivers/net/ethernet/hp/hp100.c
 delete mode 100644 drivers/net/ethernet/hp/hp100.h

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index a11af5c..0896c8f 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -52,7 +52,6 @@ source "drivers/net/ethernet/neterion/Kconfig"
 source "drivers/net/ethernet/faraday/Kconfig"
 source "drivers/net/ethernet/freescale/Kconfig"
 source "drivers/net/ethernet/fujitsu/Kconfig"
-source "drivers/net/ethernet/hp/Kconfig"
 source "drivers/net/ethernet/ibm/Kconfig"
 source "drivers/net/ethernet/intel/Kconfig"
 source "drivers/net/ethernet/i825xx/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index 878ad32..756ebd3 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -27,7 +27,6 @@ obj-$(CONFIG_NET_VENDOR_EXAR) += neterion/
 obj-$(CONFIG_NET_VENDOR_FARADAY) += faraday/
 obj-$(CONFIG_NET_VENDOR_FREESCALE) += freescale/
 obj-$(CONFIG_NET_VENDOR_FUJITSU) += fujitsu/
-obj-$(CONFIG_NET_VENDOR_HP) += hp/
 obj-$(CONFIG_NET_VENDOR_IBM) += ibm/
 obj-$(CONFIG_NET_VENDOR_INTEL) += intel/
 obj-$(CONFIG_NET_VENDOR_I825XX) += i825xx/

^ permalink raw reply related

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Eric Dumazet @ 2012-05-17 22:10 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <1337292119.3403.60.camel@edumazet-glaptop>

On Fri, 2012-05-18 at 00:02 +0200, Eric Dumazet wrote:
> On Thu, 2012-05-17 at 23:50 +0200, Eric Dumazet wrote:
> > On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:
> > 
> > > I dont understand why we should tcp_push() if we sent 0 bytes in this
> > > splice() call.
> > > 
> > > The push() should have be done already by prior splice() call, dont you
> > > think ?
> > > 
> > > out:
> > > 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > > 		tcp_push(sk, flags, mss_now, tp->nonagle);
> > > 
> > 
> > I think I now understand
> > 
> > One splice() syscall actually calls do_tcp_sendpages() several times
> > (N).
> > 
> > Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set
> > 
> > And last one with MSG_SENDPAGE_NOTLAST unset
> > 
> > So maybe we should replace this test by :
> > 
> > 	if (!(flags & MSG_SENDPAGE_NOTLAST))
> > 		tcp_push(...);
> > 
> > 
> 
> Its more tricky than that.
> 
> If we return 0 from do_tcp_sendpages(), __splice_from_pipe() wont call
> us again, but we need to flush().  (This bug is indeed very old)
> 
> So maybe use :
> 
> if ((copied && !(flags & MSG_SENDPAGE_NOTLAST)) ||
>     !copied)
> 	tcp_push(...);


But then your patch is equivalentm so I'm going to Ack it ;)

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Eric Dumazet @ 2012-05-17 22:14 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <20120517211414.GP14498@1wt.eu>

On Thu, 2012-05-17 at 23:14 +0200, Willy Tarreau wrote:

> ---
> 
> From 39c3f73176118a274ec9dea9c22c83d97a7fbfe0 Mon Sep 17 00:00:00 2001
> From: Willy Tarreau <w@1wt.eu>
> Date: Thu, 17 May 2012 22:43:20 +0200
> Subject: [PATCH] tcp: do_tcp_sendpages() must try to push data out on oom conditions
> 
> Since recent changes on TCP splicing (starting with commits 2f533844
> and 35f9c09f), I started seeing massive stalls when forwarding traffic
> between two sockets using splice() when pipe buffers were larger than
> socket buffers.
> 
> Latest changes (net: netdev_alloc_skb() use build_skb()) made the
> problem even more apparent.
> 
> The reason seems to be that if do_tcp_sendpages() fails on out of memory
> condition without being able to send at least one byte, tcp_push() is not
> called and the buffers cannot be flushed.
> 
> After applying the attached patch, I cannot reproduce the stalls at all
> and the data rate it perfectly stable and steady under any condition
> which previously caused the problem to be permanent.
> 
> The issue seems to have been there since before the kernel migrated to
> git, which makes me think that the stalls I occasionally experienced
> with tux during stress-tests years ago were probably related to the
> same issue.
> 
> This issue was first encountered on 3.0.31 and 3.2.17, so please backport
> to -stable.
> 
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> Cc: <stable@vger.kernel.org>
> ---
>  net/ipv4/tcp.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 63ddaee..231fe53 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -917,8 +917,7 @@ new_segment:
>  wait_for_sndbuf:
>  		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
>  wait_for_memory:
> -		if (copied)
> -			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
> +		tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
>  
>  		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
>  			goto do_error;



Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 22:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1337292119.3403.60.camel@edumazet-glaptop>

On Fri, May 18, 2012 at 12:01:59AM +0200, Eric Dumazet wrote:
> On Thu, 2012-05-17 at 23:50 +0200, Eric Dumazet wrote:
> > On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:
> > 
> > > I dont understand why we should tcp_push() if we sent 0 bytes in this
> > > splice() call.
> > > 
> > > The push() should have be done already by prior splice() call, dont you
> > > think ?
> > > 
> > > out:
> > > 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > > 		tcp_push(sk, flags, mss_now, tp->nonagle);
> > > 
> > 
> > I think I now understand
> > 
> > One splice() syscall actually calls do_tcp_sendpages() several times
> > (N).
> > 
> > Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set
> > 
> > And last one with MSG_SENDPAGE_NOTLAST unset
> > 
> > So maybe we should replace this test by :
> > 
> > 	if (!(flags & MSG_SENDPAGE_NOTLAST))
> > 		tcp_push(...);
> > 
> > 
> 
> Its more tricky than that.
> 
> If we return 0 from do_tcp_sendpages(), __splice_from_pipe() wont call
> us again, but we need to flush().  (This bug is indeed very old)
> 
> So maybe use :
> 
> if ((copied && !(flags & MSG_SENDPAGE_NOTLAST)) ||
>     !copied)
> 	tcp_push(...);

That's what I wanted to do at first but was still worried about the
situations where we fail upon first call due to lack of memory. And
indeed that did not fix the issue either :-(

At least now I've checked that we fail here in do_tcp_sendpages() :

	if (!sk_stream_memory_free(sk)) {
		printk(KERN_DEBUG "%s:%d copied=%d\n", __FUNCTION__, __LINE__, copied);
		goto wait_for_sndbuf;
	}

Well, finally I just put the same test as yours above for the call to
tcp_push() upon out of memory and it fixed it too. I have simplified
the expression, since ((A && !B) || !A) == !(A & B).

With the updated patch here it works for me. I don't know if it's
better.

Willy

-----

>From 1f26263bd4f8827bbca4cef77a99fac128cf8ccc Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Thu, 17 May 2012 22:43:20 +0200
Subject: [PATCH] tcp: do_tcp_sendpages() must try to push data out on oom conditions

Since recent changes on TCP splicing (starting with commits 2f533844
and 35f9c09f), I started seeing massive stalls when forwarding traffic
between two sockets using splice() when pipe buffers were larger than
socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Cc: <stable@vger.kernel.org>
---
 net/ipv4/tcp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 63ddaee..cfe47c1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -917,7 +917,7 @@ new_segment:
 wait_for_sndbuf:
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
-		if (copied)
+		if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
 			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
 
 		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
@@ -927,7 +927,7 @@ wait_for_memory:
 	}
 
 out:
-	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
+	if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
 		tcp_push(sk, flags, mss_now, tp->nonagle);
 	return copied;
 
-- 
1.7.2.1.45.g54fbc

^ permalink raw reply related

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Eric Dumazet @ 2012-05-17 22:22 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev
In-Reply-To: <20120517221641.GS14498@1wt.eu>

On Fri, 2012-05-18 at 00:16 +0200, Willy Tarreau wrote:
> On Fri, May 18, 2012 at 12:01:59AM +0200, Eric Dumazet wrote:
> > On Thu, 2012-05-17 at 23:50 +0200, Eric Dumazet wrote:
> > > On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:
> > > 
> > > > I dont understand why we should tcp_push() if we sent 0 bytes in this
> > > > splice() call.
> > > > 
> > > > The push() should have be done already by prior splice() call, dont you
> > > > think ?
> > > > 
> > > > out:
> > > > 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > > > 		tcp_push(sk, flags, mss_now, tp->nonagle);
> > > > 
> > > 
> > > I think I now understand
> > > 
> > > One splice() syscall actually calls do_tcp_sendpages() several times
> > > (N).
> > > 
> > > Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set
> > > 
> > > And last one with MSG_SENDPAGE_NOTLAST unset
> > > 
> > > So maybe we should replace this test by :
> > > 
> > > 	if (!(flags & MSG_SENDPAGE_NOTLAST))
> > > 		tcp_push(...);
> > > 
> > > 
> > 
> > Its more tricky than that.
> > 
> > If we return 0 from do_tcp_sendpages(), __splice_from_pipe() wont call
> > us again, but we need to flush().  (This bug is indeed very old)
> > 
> > So maybe use :
> > 
> > if ((copied && !(flags & MSG_SENDPAGE_NOTLAST)) ||
> >     !copied)
> > 	tcp_push(...);
> 
> That's what I wanted to do at first but was still worried about the
> situations where we fail upon first call due to lack of memory. And
> indeed that did not fix the issue either :-(
> 
> At least now I've checked that we fail here in do_tcp_sendpages() :
> 
> 	if (!sk_stream_memory_free(sk)) {
> 		printk(KERN_DEBUG "%s:%d copied=%d\n", __FUNCTION__, __LINE__, copied);
> 		goto wait_for_sndbuf;
> 	}
> 
> Well, finally I just put the same test as yours above for the call to
> tcp_push() upon out of memory and it fixed it too. I have simplified
> the expression, since ((A && !B) || !A) == !(A & B).
> 
> With the updated patch here it works for me. I don't know if it's
> better.
> 
> Willy
> 
> -----
> 
> From 1f26263bd4f8827bbca4cef77a99fac128cf8ccc Mon Sep 17 00:00:00 2001
> From: Willy Tarreau <w@1wt.eu>
> Date: Thu, 17 May 2012 22:43:20 +0200
> Subject: [PATCH] tcp: do_tcp_sendpages() must try to push data out on oom conditions
> 
> Since recent changes on TCP splicing (starting with commits 2f533844
> and 35f9c09f), I started seeing massive stalls when forwarding traffic
> between two sockets using splice() when pipe buffers were larger than
> socket buffers.
> 
> Latest changes (net: netdev_alloc_skb() use build_skb()) made the
> problem even more apparent.
> 
> The reason seems to be that if do_tcp_sendpages() fails on out of memory
> condition without being able to send at least one byte, tcp_push() is not
> called and the buffers cannot be flushed.
> 
> After applying the attached patch, I cannot reproduce the stalls at all
> and the data rate it perfectly stable and steady under any condition
> which previously caused the problem to be permanent.
> 
> The issue seems to have been there since before the kernel migrated to
> git, which makes me think that the stalls I occasionally experienced
> with tux during stress-tests years ago were probably related to the
> same issue.
> 
> This issue was first encountered on 3.0.31 and 3.2.17, so please backport
> to -stable.
> 
> Signed-off-by: Willy Tarreau <w@1wt.eu>
> Cc: <stable@vger.kernel.org>
> ---
>  net/ipv4/tcp.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 63ddaee..cfe47c1 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -917,7 +917,7 @@ new_segment:
>  wait_for_sndbuf:
>  		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
>  wait_for_memory:
> -		if (copied)
> +		if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
>  			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
>  
>  		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
> @@ -927,7 +927,7 @@ wait_for_memory:
>  	}
>  
>  out:
> -	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> +	if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
>  		tcp_push(sk, flags, mss_now, tp->nonagle);
>  	return copied;
>  


Hmm... I believe I prefer your prior patch ( the one I Acked)

^ permalink raw reply

* Re: [PATCH 2/5] drivers/net: delete all code/drivers depending on CONFIG_MCA
From: David Miller @ 2012-05-17 22:23 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: linux-kernel, JBottomley, netdev
In-Reply-To: <20120517212039.GC27789@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Thu, 17 May 2012 17:20:39 -0400

> I've layered the above commit (with 3c527.h deleted) onto today's
> net-next, did a quick re-test of x86_64 allmodconfig and put it
> on a branch as per below (vs sending a 5000+ line patch).
 ...
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git delete-mca-net-drivers

Pulled, thanks Paul.

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 22:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1337293351.3403.67.camel@edumazet-glaptop>

On Fri, May 18, 2012 at 12:22:31AM +0200, Eric Dumazet wrote:
> On Fri, 2012-05-18 at 00:16 +0200, Willy Tarreau wrote:
> > On Fri, May 18, 2012 at 12:01:59AM +0200, Eric Dumazet wrote:
> > > On Thu, 2012-05-17 at 23:50 +0200, Eric Dumazet wrote:
> > > > On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:
> > > > 
> > > > > I dont understand why we should tcp_push() if we sent 0 bytes in this
> > > > > splice() call.
> > > > > 
> > > > > The push() should have be done already by prior splice() call, dont you
> > > > > think ?
> > > > > 
> > > > > out:
> > > > > 	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > > > > 		tcp_push(sk, flags, mss_now, tp->nonagle);
> > > > > 
> > > > 
> > > > I think I now understand
> > > > 
> > > > One splice() syscall actually calls do_tcp_sendpages() several times
> > > > (N).
> > > > 
> > > > Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set
> > > > 
> > > > And last one with MSG_SENDPAGE_NOTLAST unset
> > > > 
> > > > So maybe we should replace this test by :
> > > > 
> > > > 	if (!(flags & MSG_SENDPAGE_NOTLAST))
> > > > 		tcp_push(...);
> > > > 
> > > > 
> > > 
> > > Its more tricky than that.
> > > 
> > > If we return 0 from do_tcp_sendpages(), __splice_from_pipe() wont call
> > > us again, but we need to flush().  (This bug is indeed very old)
> > > 
> > > So maybe use :
> > > 
> > > if ((copied && !(flags & MSG_SENDPAGE_NOTLAST)) ||
> > >     !copied)
> > > 	tcp_push(...);
> > 
> > That's what I wanted to do at first but was still worried about the
> > situations where we fail upon first call due to lack of memory. And
> > indeed that did not fix the issue either :-(
> > 
> > At least now I've checked that we fail here in do_tcp_sendpages() :
> > 
> > 	if (!sk_stream_memory_free(sk)) {
> > 		printk(KERN_DEBUG "%s:%d copied=%d\n", __FUNCTION__, __LINE__, copied);
> > 		goto wait_for_sndbuf;
> > 	}
> > 
> > Well, finally I just put the same test as yours above for the call to
> > tcp_push() upon out of memory and it fixed it too. I have simplified
> > the expression, since ((A && !B) || !A) == !(A & B).
> > 
> > With the updated patch here it works for me. I don't know if it's
> > better.
> > 
> > Willy
> > 
> > -----
> > 
> > From 1f26263bd4f8827bbca4cef77a99fac128cf8ccc Mon Sep 17 00:00:00 2001
> > From: Willy Tarreau <w@1wt.eu>
> > Date: Thu, 17 May 2012 22:43:20 +0200
> > Subject: [PATCH] tcp: do_tcp_sendpages() must try to push data out on oom conditions
> > 
> > Since recent changes on TCP splicing (starting with commits 2f533844
> > and 35f9c09f), I started seeing massive stalls when forwarding traffic
> > between two sockets using splice() when pipe buffers were larger than
> > socket buffers.
> > 
> > Latest changes (net: netdev_alloc_skb() use build_skb()) made the
> > problem even more apparent.
> > 
> > The reason seems to be that if do_tcp_sendpages() fails on out of memory
> > condition without being able to send at least one byte, tcp_push() is not
> > called and the buffers cannot be flushed.
> > 
> > After applying the attached patch, I cannot reproduce the stalls at all
> > and the data rate it perfectly stable and steady under any condition
> > which previously caused the problem to be permanent.
> > 
> > The issue seems to have been there since before the kernel migrated to
> > git, which makes me think that the stalls I occasionally experienced
> > with tux during stress-tests years ago were probably related to the
> > same issue.
> > 
> > This issue was first encountered on 3.0.31 and 3.2.17, so please backport
> > to -stable.
> > 
> > Signed-off-by: Willy Tarreau <w@1wt.eu>
> > Cc: <stable@vger.kernel.org>
> > ---
> >  net/ipv4/tcp.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 63ddaee..cfe47c1 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -917,7 +917,7 @@ new_segment:
> >  wait_for_sndbuf:
> >  		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> >  wait_for_memory:
> > -		if (copied)
> > +		if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
> >  			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
> >  
> >  		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
> > @@ -927,7 +927,7 @@ wait_for_memory:
> >  	}
> >  
> >  out:
> > -	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > +	if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
> >  		tcp_push(sk, flags, mss_now, tp->nonagle);
> >  	return copied;
> >  
> 
> 
> Hmm... I believe I prefer your prior patch ( the one I Acked)

I too, less changes are better :-)

Thanks,
Willy

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: David Miller @ 2012-05-17 22:25 UTC (permalink / raw)
  To: w; +Cc: eric.dumazet, netdev
In-Reply-To: <20120517222431.GT14498@1wt.eu>

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 18 May 2012 00:24:31 +0200

> On Fri, May 18, 2012 at 12:22:31AM +0200, Eric Dumazet wrote:
>> Hmm... I believe I prefer your prior patch ( the one I Acked)
> 
> I too, less changes are better :-)

And I think that's the one I'm going to apply after I audit this
code a little bit myself.

Thanks!

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Joe Perches @ 2012-05-17 22:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, netdev
In-Reply-To: <1337293351.3403.67.camel@edumazet-glaptop>

On Fri, 2012-05-18 at 00:22 +0200, Eric Dumazet wrote:
> On Fri, 2012-05-18 at 00:16 +0200, Willy Tarreau wrote:
> > I have simplified the expression,
> > since ((A && !B) || !A) == !(A & B).

Perhaps it's clearer for a reader the first way and
any decent compiler (even gcc) could probably do that
optimization.

^ permalink raw reply

* Re: [RFC PATCH net-next] hp100: delete VG/AnyLAN hp100
From: Francois Romieu @ 2012-05-17 22:28 UTC (permalink / raw)
  To: Joe Perches
  Cc: Paul Gortmaker, linux-kernel, JBottomley, David S. Miller, netdev,
	Joel Soete
In-Reply-To: <1337292547.8872.20.camel@joe2Laptop>

Joe Perches <joe@perches.com> :
> On Thu, 2012-05-17 at 17:20 -0400, Paul Gortmaker wrote:
> > [Re: [PATCH 2/5] drivers/net: delete all code/drivers depending on CONFIG_MCA
> 
> If we're removing really old and unused stuff,
> how about the VG/AnyLAN driver too?

Joel Soete appeared to actively use a PCI one with recent kernels back
in 2007.

-- 
Ueimor

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 22:29 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev
In-Reply-To: <1337292894.3403.66.camel@edumazet-glaptop>

David,

After several tests, we finally agreed on this one. I can't make the
transfers fail anymore. And they're damn fast thanks to Eric's recent
work!

Best regards,
Willy

---

>From 39c3f73176118a274ec9dea9c22c83d97a7fbfe0 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Thu, 17 May 2012 22:43:20 +0200
Subject: [PATCH] tcp: do_tcp_sendpages() must try to push data out on oom conditions

Since recent changes on TCP splicing (starting with commits 2f533844
and 35f9c09f), I started seeing massive stalls when forwarding traffic
between two sockets using splice() when pipe buffers were larger than
socket buffers.

Latest changes (net: netdev_alloc_skb() use build_skb()) made the
problem even more apparent.

The reason seems to be that if do_tcp_sendpages() fails on out of memory
condition without being able to send at least one byte, tcp_push() is not
called and the buffers cannot be flushed.

After applying the attached patch, I cannot reproduce the stalls at all
and the data rate it perfectly stable and steady under any condition
which previously caused the problem to be permanent.

The issue seems to have been there since before the kernel migrated to
git, which makes me think that the stalls I occasionally experienced
with tux during stress-tests years ago were probably related to the
same issue.

This issue was first encountered on 3.0.31 and 3.2.17, so please backport
to -stable.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Cc: <stable@vger.kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 63ddaee..231fe53 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -917,8 +917,7 @@ new_segment:
 wait_for_sndbuf:
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 wait_for_memory:
-		if (copied)
-			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
+		tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
 
 		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
 			goto do_error;

^ permalink raw reply related

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: Willy Tarreau @ 2012-05-17 22:30 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20120517.182545.91312407467980757.davem@davemloft.net>

On Thu, May 17, 2012 at 06:25:45PM -0400, David Miller wrote:
> From: Willy Tarreau <w@1wt.eu>
> Date: Fri, 18 May 2012 00:24:31 +0200
> 
> > On Fri, May 18, 2012 at 12:22:31AM +0200, Eric Dumazet wrote:
> >> Hmm... I believe I prefer your prior patch ( the one I Acked)
> > 
> > I too, less changes are better :-)
> 
> And I think that's the one I'm going to apply after I audit this
> code a little bit myself.
> 
> Thanks!

Sorry for the double mail, I didn't know if you were following the thread
when I forwarded it to you.

Cheers,
Willy

^ permalink raw reply

* Re: [net] e1000: Prevent reset task killing itself.
From: David Miller @ 2012-05-17 22:33 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: tushar.n.dave, netdev, gospo, sassmann, stable
In-Reply-To: <1337252690-29687-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 17 May 2012 04:04:50 -0700

> From: Tushar Dave <tushar.n.dave@intel.com>
> 
> Killing reset task while adapter is resetting causes deadlock.
> Only kill reset task if adapter is not resetting.
> Ref bug #43132 on bugzilla.kernel.org
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
From: David Miller @ 2012-05-17 22:35 UTC (permalink / raw)
  To: w; +Cc: eric.dumazet, netdev
In-Reply-To: <20120517223028.GV14498@1wt.eu>

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 18 May 2012 00:30:28 +0200

> On Thu, May 17, 2012 at 06:25:45PM -0400, David Miller wrote:
>> From: Willy Tarreau <w@1wt.eu>
>> Date: Fri, 18 May 2012 00:24:31 +0200
>> 
>> > On Fri, May 18, 2012 at 12:22:31AM +0200, Eric Dumazet wrote:
>> >> Hmm... I believe I prefer your prior patch ( the one I Acked)
>> > 
>> > I too, less changes are better :-)
>> 
>> And I think that's the one I'm going to apply after I audit this
>> code a little bit myself.
>> 
>> Thanks!
> 
> Sorry for the double mail, I didn't know if you were following the thread
> when I forwarded it to you.

No problem.

BTW, this issue goes back all the way to the original implementation
of do_tcp_sendpages().

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2012-05-17 22:44 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's
   been present in do_tcp_sendpages() since that function was written in
   2002.

   When we block to wait for memory we have to unconditionally try and
   push out pending TCP data, otherwise we can block for an
   unreasonably long amount of time.

2) Fix deadlock in e1000, fixes kernel bugzilla 43132

   From Tushar Dave.

Please pull, thanks a lot!

The following changes since commit 1be5f0b7575e090fd100a98b303860879b5800de:

  Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma (2012-05-17 09:57:13 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net master

for you to fetch changes up to 8ce6909f77ba1b7bcdea65cc2388fd1742b6d669:

  e1000: Prevent reset task killing itself. (2012-05-17 18:32:41 -0400)

----------------------------------------------------------------
Tushar Dave (1):
      e1000: Prevent reset task killing itself.

Willy Tarreau (1):
      tcp: do_tcp_sendpages() must try to push data out on oom conditions

 drivers/net/ethernet/intel/e1000/e1000_main.c |    6 +++++-
 net/ipv4/tcp.c                                |    3 +--
 2 files changed, 6 insertions(+), 3 deletions(-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox