All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Neal Cardwell <ncardwell@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [ 25/41] tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una
Date: Fri, 16 Mar 2012 16:38:35 -0700	[thread overview]
Message-ID: <20120316233812.834856254@linuxfoundation.org> (raw)
In-Reply-To: <20120316233829.GA14022@kroah.com>

3.2-stable review patch.  If anyone has any objections, please let me know.

------------------


From: Neal Cardwell <ncardwell@google.com>

[ Upstream commit 4648dc97af9d496218a05353b0e442b3dfa6aaab ]

This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.

This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).

Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41)
tcp_shift_skb_data() had been shifting SACKed ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.

Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.

Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9
and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.

After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:

(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
    then shifts a prefix of that skb that is below snd_una

(2) tcp_shifted_skb() increments the packet count of the
    already-SACKed prev sk_buff

(3) tcp_sacktag_one() sees the end of the new SACKed range is below
    snd_una, so it short-circuits and doesn't increase tp->sacked_out

(5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
    decrements tp->sacked_out by this "inflated" pcount that was
    missing a matching increase in tp->sacked_out, and hence
    tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
    to s32 is negative.

(6) this leads to the warnings seen in the recent "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
    tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);

More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.

This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/tcp_input.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1588,6 +1588,10 @@ static struct sk_buff *tcp_shift_skb_dat
 		}
 	}
 
+	/* tcp_sacktag_one() won't SACK-tag ranges below snd_una */
+	if (!after(TCP_SKB_CB(skb)->seq + len, tp->snd_una))
+		goto fallback;
+
 	if (!skb_shift(prev, skb, len))
 		goto fallback;
 	if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss, dup_sack))



  parent reply	other threads:[~2012-03-16 23:41 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-16 23:38 [ 00/41] 3.2.12-stable review Greg KH
2012-03-16 23:38 ` [ 01/41] ASoC: neo1973: fix neo1973 wm8753 initialization Greg KH
2012-03-16 23:38 ` [ 02/41] ALSA: hda/realtek - Apply the coef-setup only to ALC269VB Greg KH
2012-03-16 23:38 ` [ 03/41] aio: fix io_setup/io_destroy race Greg KH
2012-03-16 23:38 ` [ 04/41] aio: fix the "too late munmap()" race Greg KH
2012-03-16 23:38 ` [ 05/41] x86: Derandom delay_tsc for 64 bit Greg KH
2012-03-16 23:38 ` [ 06/41] PCI: ignore pre-1.1 ASPM quirking when ASPM is disabled Greg KH
2012-03-31  3:23   ` Ken Moffat
2012-03-31  3:33     ` Jonathan Nieder
2012-03-31 18:20       ` Linus Torvalds
2012-03-31 18:20         ` Linus Torvalds
2012-03-31 18:32         ` Matthew Garrett
2012-03-31 18:32           ` Matthew Garrett
2012-04-19 23:21           ` Ken Moffat
2012-04-01 16:11     ` Ken Moffat
2012-04-01 16:11       ` Ken Moffat
2012-04-01 16:59       ` Linus Torvalds
2012-04-01 16:59         ` Linus Torvalds
2012-04-01 17:10         ` Greg KH
2012-04-01 17:10           ` Greg KH
2012-04-02 20:27       ` Ken Moffat
2012-04-02 20:27         ` Ken Moffat
2012-03-16 23:38 ` [ 07/41] [media] omap3isp: ccdc: Fix crash in HS/VS interrupt handler Greg KH
2012-03-16 23:38 ` [ 08/41] rt2x00: fix random stalls Greg KH
2012-03-16 23:38 ` [ 09/41] perf/x86: Fix local vs remote memory events for NHM/WSM Greg KH
2012-03-16 23:38 ` [ 10/41] CIFS: Do not kmalloc under the flocks spinlock Greg KH
2012-03-17  2:37   ` Ben Hutchings
2012-03-17  6:14     ` Pavel Shilovsky
2012-03-17  6:14       ` Pavel Shilovsky
2012-03-17  7:32       ` Ben Hutchings
2012-03-17  7:52         ` Pavel Shilovsky
2012-03-17  7:52           ` Pavel Shilovsky
2012-03-19 15:50           ` Greg KH
2012-03-19 19:11             ` Pavel Shilovsky
2012-03-19 19:11               ` Pavel Shilovsky
2012-03-19 19:24               ` Greg KH
2012-03-23 17:52                 ` Greg KH
2012-03-16 23:38 ` [ 11/41] vfs: fix return value from do_last() Greg KH
2012-03-16 23:38 ` [ 12/41] vfs: fix double put after complete_walk() Greg KH
2012-03-16 23:38 ` [ 13/41] acer-wmi: No wifi rfkill on Lenovo machines Greg KH
2012-03-16 23:38 ` [ 14/41] atl1c: dont use highprio tx queue Greg KH
2012-03-16 23:38 ` [ 15/41] neighbour: Fixed race condition at tbl->nht Greg KH
2012-03-16 23:38 ` [ 16/41] ipsec: be careful of non existing mac headers Greg KH
2012-03-16 23:38   ` Greg KH
2012-03-16 23:38 ` [ 17/41] ppp: fix ppp_mp_reconstruct bad seq errors Greg KH
2012-03-16 23:38 ` [ 18/41] sfc: Fix assignment of ip_summed for pre-allocated skbs Greg KH
2012-03-16 23:38 ` [ 19/41] tcp: fix false reordering signal in tcp_shifted_skb Greg KH
2012-03-16 23:38 ` [ 20/41] vmxnet3: Fix transport header size Greg KH
2012-03-16 23:38 ` [ 21/41] packetengines: fix config default Greg KH
2012-03-16 23:38 ` [ 22/41] r8169: corrupted IP fragments fix for large mtu Greg KH
2012-03-16 23:38   ` Greg KH
2012-03-16 23:38 ` [ 23/41] tcp: dont fragment SACKed skbs in tcp_mark_head_lost() Greg KH
2012-03-16 23:38   ` Greg KH
2012-03-16 23:38 ` [ 24/41] bridge: check return value of ipv6_dev_get_saddr() Greg KH
2012-03-16 23:38 ` Greg KH [this message]
2012-03-16 23:38 ` [ 26/41] IPv6: Fix not join all-router mcast group when forwarding set Greg KH
2012-03-16 23:38 ` [ 27/41] usb: asix: Patch for Sitecom LN-031 Greg KH
2012-03-16 23:38 ` [ 28/41] regulator: Fix setting selector in tps6524x set_voltage function Greg KH
2012-03-16 23:38 ` [ 29/41] block: Fix NULL pointer dereference in sd_revalidate_disk Greg KH
2012-03-16 23:38 ` [ 30/41] block, sx8: fix pointer math issue getting fw version Greg KH
2012-03-16 23:38 ` [ 31/41] block: fix __blkdev_get and add_disk race condition Greg KH
2012-03-16 23:38 ` [ 32/41] Block: use a freezable workqueue for disk-event polling Greg KH
2012-03-16 23:38 ` [ 33/41] sparc32: Add -Av8 to assembler command line Greg KH
2012-03-16 23:38 ` [ 34/41] hwmon: (w83627ehf) Fix writing into fan_stop_time for NCT6775F/NCT6776F Greg KH
2012-03-16 23:38 ` [ 35/41] hwmon: (w83627ehf) Fix memory leak in probe function Greg KH
2012-03-16 23:38 ` [ 36/41] hwmon: (w83627ehf) Fix temp2 source for W83627UHG Greg KH
2012-03-16 23:38 ` [ 37/41] rapidio/tsi721: fix bug in register offset definitions Greg KH
2012-03-16 23:38 ` [ 38/41] i2c-algo-bit: Fix spurious SCL timeouts under heavy load Greg KH
2012-03-16 23:38 ` [ 39/41] iscsi-target: Fix reservation conflict -EBUSY response handling bug Greg KH
2012-03-16 23:38 ` [ 40/41] target: Fix compatible reservation handling (CRH=1) with legacy RESERVE/RELEASE Greg KH
2012-03-16 23:38 ` [ 41/41] hwmon: (zl6100) Enable interval between chip accesses for all chips Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120316233812.834856254@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ncardwell@google.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.