From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Neal Cardwell <ncardwell@google.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [ 25/41] tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una
Date: Fri, 16 Mar 2012 16:38:35 -0700
Message-ID: <20120316233812.834856254@linuxfoundation.org>
In-Reply-To: <20120316233829.GA14022@kroah.com>
3.2-stable review patch. If anyone has any objections, please let me know.
------------------
From: Neal Cardwell <ncardwell@google.com>
[ Upstream commit 4648dc97af9d496218a05353b0e442b3dfa6aaab ]
This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.
This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).
Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41)
tcp_shift_skb_data() had been shifting SACKed ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.
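To make the difference between the two conditions concrete, here is a minimal user-space sketch. It is not the kernel code: struct demo_skb and the function names are hypothetical, and seq_before()/seq_after() only mirror the wraparound-safe idea behind the kernel's before()/after() helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe 32-bit sequence comparison (same idea as before()/after()). */
bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
bool seq_after(uint32_t a, uint32_t b)  { return seq_before(b, a); }

/* Hypothetical stand-in for the two sequence fields that matter here. */
struct demo_skb {
	uint32_t seq;     /* first sequence number covered by the skb */
	uint32_t end_seq; /* one past the last sequence number */
};

/* Pre-fix condition: only the end of the whole skb is compared to snd_una. */
bool old_check_allows_shift(const struct demo_skb *skb, uint32_t snd_una)
{
	return seq_after(skb->end_seq, snd_una);
}

/* Post-fix condition: the end of the shifted prefix must be above snd_una too. */
bool new_check_allows_shift(const struct demo_skb *skb, uint32_t len,
			    uint32_t snd_una)
{
	return old_check_allows_shift(skb, snd_una) &&
	       seq_after(skb->seq + len, snd_una);
}

/* Illustrative numbers only: the skb straddles snd_una, the prefix does not. */
void demo_checks(void)
{
	struct demo_skb skb = { .seq = 900, .end_seq = 1100 };

	assert(old_check_allows_shift(&skb, 1000));      /* 1100 is above 1000 */
	assert(!new_check_allows_shift(&skb, 50, 1000)); /* 950 is not */
}
```

With these made-up values, an skb covering [900, 1100) passes the old test against snd_una = 1000, while a 50-byte prefix ending at 950 is rejected only by the added test.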
Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.
Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9
and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.
After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:
(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
then shifts a prefix of that skb that is below snd_una
(2) tcp_shifted_skb() increments the packet count of the
already-SACKed prev sk_buff
(3) tcp_sacktag_one() sees the end of the new SACKed range is below
snd_una, so it short-circuits and doesn't increase tp->sacked_out
(4) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
decrements tp->sacked_out by this "inflated" pcount that was
missing a matching increase in tp->sacked_out, and hence
tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which, cast
to s32, is negative.
(5) this leads to the warnings seen in the recent "WARNING: at
net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0);
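The accounting mismatch in the sequence above can be reproduced with a toy model. This is only a sketch under simplifying assumptions: struct demo_sock, struct demo_prev and the demo_*() functions are hypothetical and model nothing but the two counters, not the real tcp_shifted_skb(), tcp_sacktag_one() or tcp_clean_rtx_queue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of the state involved; not the kernel structs. */
struct demo_sock {
	uint32_t snd_una;
	uint32_t sacked_out; /* packets currently counted as SACKed */
};

struct demo_prev {
	uint32_t pcount;     /* packets held by the already-SACKed prev skb */
};

/* True if a is above b in wraparound-safe sequence space. */
static bool seq_after(uint32_t a, uint32_t b) { return (int32_t)(b - a) < 0; }

/* Steps (2) and (3): grow prev's pcount, then try to account the new range. */
void demo_shift(struct demo_sock *tp, struct demo_prev *prev,
		uint32_t range_end_seq, uint32_t pcount)
{
	prev->pcount += pcount;              /* happens unconditionally */
	if (!seq_after(range_end_seq, tp->snd_una))
		return;                      /* short-circuit: nothing counted */
	tp->sacked_out += pcount;
}

/* Cumulative ACK covers prev: its whole pcount is subtracted. */
void demo_clean(struct demo_sock *tp, const struct demo_prev *prev)
{
	tp->sacked_out -= prev->pcount;
}

/* Runs the buggy sequence and returns sacked_out viewed as a signed value. */
int32_t demo_underflow(void)
{
	struct demo_sock tp = { .snd_una = 1000, .sacked_out = 1 };
	struct demo_prev prev = { .pcount = 1 };

	demo_shift(&tp, &prev, 950, 1); /* shifted range ends below snd_una */
	demo_clean(&tp, &prev);         /* subtracts 2 from a counter holding 1 */
	return (int32_t)tp.sacked_out;
}
```

In this model demo_underflow() returns -1, since the counter wraps to 0xFFFFFFFF; if the shifted range ends above snd_una instead, the increment and the decrement balance and the counter returns to 0.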
More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.
This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1588,6 +1588,10 @@ static struct sk_buff *tcp_shift_skb_dat
 		}
 	}
 
+	/* tcp_sacktag_one() won't SACK-tag ranges below snd_una */
+	if (!after(TCP_SKB_CB(skb)->seq + len, tp->snd_una))
+		goto fallback;
+
 	if (!skb_shift(prev, skb, len))
 		goto fallback;
 	if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss, dup_sack))