From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Neil Horman <nhorman@tuxdriver.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	"David S. Miller" <davem@davemloft.net>,
	netem@lists.linux-foundation.org, eric.dumazet@gmail.com,
	stephen@networkplumber.org, Eric Dumazet <edumazet@google.com>
Subject: [PATCH 4.5 030/101] netem: Segment GSO packets on enqueue
Date: Mon, 16 May 2016 18:20:35 -0700
Message-ID: <20160517011507.478632526@linuxfoundation.org>
In-Reply-To: <20160517011506.359924439@linuxfoundation.org>

4.5-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Neil Horman <nhorman@tuxdriver.com>

[ Upstream commit 6071bd1aa13ed9e41824bafad845b7b7f4df5cfd ]

This was recently reported to me, and reproduced on the latest net kernel,
when attempting to run netperf from a host that had a netem qdisc attached
to the egress interface:

[  788.073771] ---------------------[ cut here ]---------------------------
[  788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda()
[  788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962
data_len=0 gso_size=1448 gso_type=1 ip_summed=3
[  788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif
ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul
glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si
i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter
pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt
i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci
crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp
serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log
dm_mod
[  788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G        W
------------   3.10.0-327.el7.x86_64 #1
[  788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012
[  788.542260]  ffff880437c036b8 f7afc56532a53db9 ffff880437c03670
ffffffff816351f1
[  788.576332]  ffff880437c036a8 ffffffff8107b200 ffff880633e74200
ffff880231674000
[  788.611943]  0000000000000001 0000000000000003 0000000000000000
ffff880437c03710
[  788.647241] Call Trace:
[  788.658817]  <IRQ>  [<ffffffff816351f1>] dump_stack+0x19/0x1b
[  788.686193]  [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0
[  788.713803]  [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80
[  788.741314]  [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100
[  788.767018]  [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda
[  788.796117]  [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190
[  788.823392]  [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem]
[  788.854487]  [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570
[  788.880870]  [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0
...

The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).
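
As a rough illustration of why this skb is a "megapacket", the fields in the
warning line can be turned into a segment count. The sketch below is not
kernel code. gso_type=1 corresponds to SKB_GSO_TCPV4 and ip_summed=3 to
CHECKSUM_PARTIAL; the 66-byte header length (14 Ethernet + 20 IPv4 + 32 TCP
with timestamps) is an assumption, since the trace does not print it, and
gso_segment_count() is a hypothetical helper name.

```c
#include <assert.h>

/* Values copied from the skb_warn_bad_offload line in the trace. */
enum { TRACE_LEN = 2962, TRACE_GSO_SIZE = 1448 };

/* ASSUMPTION: header bytes in front of the TCP payload. The trace does
 * not report this; 66 = 14 (Ethernet) + 20 (IPv4) + 32 (TCP + timestamps).
 */
enum { ASSUMED_HDR_LEN = 66 };

/* Number of wire packets a GSO skb of total length len stands for:
 * the payload is cut into chunks of at most gso_size bytes, rounding up.
 */
static unsigned int gso_segment_count(unsigned int len, unsigned int hdr_len,
				      unsigned int gso_size)
{
	assert(gso_size != 0 && len >= hdr_len);
	return (len - hdr_len + gso_size - 1) / gso_size;
}
```

Under that header-length assumption,
gso_segment_count(TRACE_LEN, ASSUMED_HDR_LEN, TRACE_GSO_SIZE) evaluates to 2:
the skb carries 2896 payload bytes, i.e. two full 1448-byte segments, and a
single software checksum over it would not match either packet that
eventually goes on the wire.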

The solution, I think, is to simply segment the skb in a similar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
the first segment, and enqueue the remaining ones.
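
The segment/corrupt-first/enqueue-rest flow described above can be sketched in
user space. This is only a simplified model under stated assumptions, not the
patch itself: struct pkt is a hypothetical stand-in for struct sk_buff,
segment() loosely imitates the NULL-terminated list that skb_gso_segment()
returns, and headers, checksums and qdisc accounting are ignored entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { SEG_MAX = 64 };	/* hypothetical per-segment capacity */

/* Hypothetical stand-in for struct sk_buff: one node of a segment list. */
struct pkt {
	struct pkt *next;
	size_t len;
	unsigned char data[SEG_MAX];
};

static void free_list(struct pkt *p)
{
	while (p) {
		struct pkt *next = p->next;

		free(p);
		p = next;
	}
}

static unsigned int count_segs(const struct pkt *p)
{
	unsigned int n = 0;

	for (; p; p = p->next)
		n++;
	return n;
}

/* Split buf into a NULL-terminated list of segments of at most seg_size
 * bytes each. Returns NULL on bad arguments or allocation failure.
 */
static struct pkt *segment(const unsigned char *buf, size_t len, size_t seg_size)
{
	struct pkt *head = NULL, **tail = &head;

	if (seg_size == 0 || seg_size > SEG_MAX)
		return NULL;

	while (len > 0) {
		size_t n = len < seg_size ? len : seg_size;
		struct pkt *p = calloc(1, sizeof(*p));

		if (!p) {
			free_list(head);
			return NULL;
		}
		memcpy(p->data, buf, n);
		p->len = n;
		*tail = p;
		tail = &p->next;
		buf += n;
		len -= n;
	}
	return head;
}

/* Detach the first segment, flip one bit in it, and hand the untouched
 * remainder back through *rest so the caller can enqueue it separately.
 */
static struct pkt *corrupt_first(struct pkt *segs, struct pkt **rest,
				 size_t byte, unsigned int bit)
{
	struct pkt *first = segs;

	assert(first && first->len > 0);
	*rest = first->next;
	first->next = NULL;
	first->data[byte % first->len] ^= 1u << (bit % 8);
	return first;
}
```

A caller would run segment() on the oversized buffer, pass the result to
corrupt_first(), and then walk *rest one node at a time, clearing each node's
next pointer before queueing it, much as the finish_segs loop in the patch
does with the real skb list.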

Tested successfully by myself on the latest net kernel, to which this applies.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netem@lists.linux-foundation.org
CC: eric.dumazet@gmail.com
CC: stephen@networkplumber.org
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/sched/sch_netem.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -395,6 +395,25 @@ static void tfifo_enqueue(struct sk_buff
 	sch->q.qlen++;
 }
 
+/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
+ * when we statistically choose to corrupt one, we instead segment it, returning
+ * the first packet to be corrupted, and re-enqueue the remaining frames
+ */
+static struct sk_buff *netem_segment(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct sk_buff *segs;
+	netdev_features_t features = netif_skb_features(skb);
+
+	segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
+
+	if (IS_ERR_OR_NULL(segs)) {
+		qdisc_reshape_fail(skb, sch);
+		return NULL;
+	}
+	consume_skb(skb);
+	return segs;
+}
+
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -407,7 +426,11 @@ static int netem_enqueue(struct sk_buff
 	/* We don't fill cb now as skb_unshare() may invalidate it */
 	struct netem_skb_cb *cb;
 	struct sk_buff *skb2;
+	struct sk_buff *segs = NULL;
+	unsigned int len = 0, last_len, prev_len = qdisc_pkt_len(skb);
+	int nb = 0;
 	int count = 1;
+	int rc = NET_XMIT_SUCCESS;
 
 	/* Random duplication */
 	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor))
@@ -453,10 +476,23 @@ static int netem_enqueue(struct sk_buff
 	 * do it now in software before we mangle it.
 	 */
 	if (q->corrupt && q->corrupt >= get_crandom(&q->corrupt_cor)) {
+		if (skb_is_gso(skb)) {
+			segs = netem_segment(skb, sch);
+			if (!segs)
+				return NET_XMIT_DROP;
+		} else {
+			segs = skb;
+		}
+
+		skb = segs;
+		segs = segs->next;
+
 		if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
 		    (skb->ip_summed == CHECKSUM_PARTIAL &&
-		     skb_checksum_help(skb)))
-			return qdisc_drop(skb, sch);
+		     skb_checksum_help(skb))) {
+			rc = qdisc_drop(skb, sch);
+			goto finish_segs;
+		}
 
 		skb->data[prandom_u32() % skb_headlen(skb)] ^=
 			1<<(prandom_u32() % 8);
@@ -516,6 +552,27 @@ static int netem_enqueue(struct sk_buff
 		sch->qstats.requeues++;
 	}
 
+finish_segs:
+	if (segs) {
+		while (segs) {
+			skb2 = segs->next;
+			segs->next = NULL;
+			qdisc_skb_cb(segs)->pkt_len = segs->len;
+			last_len = segs->len;
+			rc = qdisc_enqueue(segs, sch);
+			if (rc != NET_XMIT_SUCCESS) {
+				if (net_xmit_drop_count(rc))
+					qdisc_qstats_drop(sch);
+			} else {
+				nb++;
+				len += last_len;
+			}
+			segs = skb2;
+		}
+		sch->q.qlen += nb;
+		if (nb > 1)
+			qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
+	}
 	return NET_XMIT_SUCCESS;
 }
 
