From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Willem de Bruijn <willemb@google.com>,
Neil Horman <nhorman@tuxdriver.com>,
Matteo Croce <mcroce@redhat.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.9 052/102] af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Date: Mon, 8 Jul 2019 17:12:45 +0200 [thread overview]
Message-ID: <20190708150529.134836691@linuxfoundation.org> (raw)
In-Reply-To: <20190708150525.973820964@linuxfoundation.org>
From: Neil Horman <nhorman@tuxdriver.com>
[ Upstream commit 89ed5b519004a7706f50b70f611edbd3aaacff2c ]
When an application is run that:
a) Sets its scheduler to be SCHED_FIFO
and
b) Opens a memory mapped AF_PACKET socket, and sends frames with the
MSG_DONTWAIT flag cleared, its possible for the application to hang
forever in the kernel. This occurs because when waiting, the code in
tpacket_snd calls schedule, which under normal circumstances allows
other tasks to run, including ksoftirqd, which in some cases is
responsible for freeing the transmitted skb (which in AF_PACKET calls a
destructor that flips the status bit of the transmitted frame back to
available, allowing the transmitting task to complete).
However, when the calling application is SCHED_FIFO, its priority is
such that the schedule call immediately places the task back on the cpu,
preventing ksoftirqd from freeing the skb, which in turn prevents the
transmitting task from detecting that the transmission is complete.
We can fix this by converting the schedule call to a completion
mechanism. By using a completion queue, we force the calling task, when
it detects there are no more frames to send, to schedule itself off the
cpu until such time as the last transmitted skb is freed, allowing
forward progress to be made.
Tested by myself and the reporter, with good results
Change Notes:
V1->V2:
Enhance the sleep logic to support being interruptible and
allowing for honoring to SK_SNDTIMEO (Willem de Bruijn)
V2->V3:
Rearrage the point at which we wait for the completion queue, to
avoid needing to check for ph/skb being null at the end of the loop.
Also move the complete call to the skb destructor to avoid needing to
modify __packet_set_status. Also gate calling complete on
packet_read_pending returning zero to avoid multiple calls to complete.
(Willem de Bruijn)
Move timeo computation within loop, to re-fetch the socket
timeout since we also use the timeo variable to record the return code
from the wait_for_complete call (Neil Horman)
V3->V4:
Willem has requested that the control flow be restored to the
previous state. Doing so lets us eliminate the need for the
po->wait_on_complete flag variable, and lets us get rid of the
packet_next_frame function, but introduces another complexity.
Specifically, but using the packet pending count, we can, if an
applications calls sendmsg multiple times with MSG_DONTWAIT set, each
set of transmitted frames, when complete, will cause
tpacket_destruct_skb to issue a complete call, for which there will
never be a wait_on_completion call. This imbalance will lead to any
future call to wait_for_completion here to return early, when the frames
they sent may not have completed. To correct this, we need to re-init
the completion queue on every call to tpacket_snd before we enter the
loop so as to ensure we wait properly for the frames we send in this
iteration.
Change the timeout and interrupted gotos to out_put rather than
out_status so that we don't try to free a non-existant skb
Clean up some extra newlines (Willem de Bruijn)
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/packet/af_packet.c | 20 +++++++++++++++++---
net/packet/internal.h | 1 +
2 files changed, 18 insertions(+), 3 deletions(-)
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2399,6 +2399,9 @@ static void tpacket_destruct_skb(struct
ts = __packet_set_timestamp(po, ph, skb);
__packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts);
+
+ if (!packet_read_pending(&po->tx_ring))
+ complete(&po->skb_completion);
}
sock_wfree(skb);
@@ -2629,7 +2632,7 @@ static int tpacket_parse_header(struct p
static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
{
- struct sk_buff *skb;
+ struct sk_buff *skb = NULL;
struct net_device *dev;
struct virtio_net_hdr *vnet_hdr = NULL;
struct sockcm_cookie sockc;
@@ -2644,6 +2647,7 @@ static int tpacket_snd(struct packet_soc
int len_sum = 0;
int status = TP_STATUS_AVAILABLE;
int hlen, tlen, copylen = 0;
+ long timeo = 0;
mutex_lock(&po->pg_vec_lock);
@@ -2690,12 +2694,21 @@ static int tpacket_snd(struct packet_soc
if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr)
size_max = dev->mtu + reserve + VLAN_HLEN;
+ reinit_completion(&po->skb_completion);
+
do {
ph = packet_current_frame(po, &po->tx_ring,
TP_STATUS_SEND_REQUEST);
if (unlikely(ph == NULL)) {
- if (need_wait && need_resched())
- schedule();
+ if (need_wait && skb) {
+ timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
+ timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
+ if (timeo <= 0) {
+ err = !timeo ? -ETIMEDOUT : -ERESTARTSYS;
+ goto out_put;
+ }
+ }
+ /* check for additional frames */
continue;
}
@@ -3249,6 +3262,7 @@ static int packet_create(struct net *net
sock_init_data(sock, sk);
po = pkt_sk(sk);
+ init_completion(&po->skb_completion);
sk->sk_family = PF_PACKET;
po->num = proto;
po->xmit = dev_queue_xmit;
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -125,6 +125,7 @@ struct packet_sock {
unsigned int tp_hdrlen;
unsigned int tp_reserve;
unsigned int tp_tstamp;
+ struct completion skb_completion;
struct net_device __rcu *cached_dev;
int (*xmit)(struct sk_buff *skb);
struct packet_type prot_hook ____cacheline_aligned_in_smp;
next prev parent reply other threads:[~2019-07-08 15:20 UTC|newest]
Thread overview: 108+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-08 15:11 [PATCH 4.9 000/102] 4.9.185-stable review Greg Kroah-Hartman
2019-07-08 15:11 ` [PATCH 4.9 001/102] tracing: Silence GCC 9 array bounds warning Greg Kroah-Hartman
2019-07-08 15:11 ` [PATCH 4.9 002/102] gcc-9: silence address-of-packed-member warning Greg Kroah-Hartman
2019-07-08 15:11 ` [PATCH 4.9 003/102] scsi: ufs: Avoid runtime suspend possibly being blocked forever Greg Kroah-Hartman
2019-07-08 15:11 ` [PATCH 4.9 004/102] usb: chipidea: udc: workaround for endpoint conflict issue Greg Kroah-Hartman
2019-07-08 15:11 ` [PATCH 4.9 005/102] IB/hfi1: Silence txreq allocation warnings Greg Kroah-Hartman
2019-07-08 15:11 ` [PATCH 4.9 006/102] Input: uinput - add compat ioctl number translation for UI_*_FF_UPLOAD Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 007/102] apparmor: enforce nullbyte at end of tag string Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 008/102] ARC: fix build warnings with !CONFIG_KPROBES Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 009/102] parport: Fix mem leak in parport_register_dev_model Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 010/102] parisc: Fix compiler warnings in float emulation code Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 011/102] IB/rdmavt: Fix alloc_qpn() WARN_ON() Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 012/102] IB/hfi1: Insure freeze_work work_struct is canceled on shutdown Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 013/102] IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 014/102] MIPS: uprobes: remove set but not used variable epc Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 015/102] net: dsa: mv88e6xxx: avoid error message on remove from VLAN 0 Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 016/102] net: hns: Fix loopback test failed at copper ports Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 017/102] sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 018/102] net: ethernet: mediatek: Use hw_feature to judge if HWLRO is supported Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 019/102] net: ethernet: mediatek: Use NET_IP_ALIGN to judge if HW RX_2BYTE_OFFSET is enabled Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 020/102] drm/arm/hdlcd: Allow a bit of clock tolerance Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 021/102] scripts/checkstack.pl: Fix arm64 wrong or unknown architecture Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 022/102] scsi: ufs: Check that space was properly alloced in copy_query_response Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 023/102] s390/qeth: fix VLAN attribute in bridge_hostnotify udev event Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 024/102] hwmon: (pmbus/core) Treat parameters as paged if on multiple pages Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 025/102] nvme: Fix u32 overflow in the number of namespace list calculation Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 026/102] btrfs: start readahead also in seed devices Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 027/102] can: flexcan: fix timeout when set small bitrate Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 028/102] can: purge socket error queue on sock destruct Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 029/102] powerpc/bpf: use unsigned division instruction for 64-bit operations Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 030/102] ARM: imx: cpuidle-imx6sx: Restrict the SW2ISO increase to i.MX6SX Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 031/102] Bluetooth: Align minimum encryption key size for LE and BR/EDR connections Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 032/102] Bluetooth: Fix regression with minimum encryption key size alignment Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 033/102] cfg80211: fix memory leak of wiphy device name Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 034/102] mac80211: drop robust management frames from unknown TA Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 035/102] mac80211: Do not use stack memory with scatterlist for GMAC Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 036/102] IB/hfi1: Avoid hardlockup with flushlist_lock Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 037/102] perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 038/102] perf help: Remove needless use of strncpy() Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 039/102] perf header: Fix unchecked usage " Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 040/102] 9p/rdma: do not disconnect on down_interruptible EAGAIN Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 041/102] 9p: acl: fix uninitialized iattr access Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 042/102] 9p/rdma: remove useless check in cm_event_handler Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 043/102] 9p: p9dirent_read: check network-provided name length Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 044/102] net/9p: include trans_common.h to fix missing prototype warning Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 045/102] fs/proc/array.c: allow reporting eip/esp for all coredumping threads Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 046/102] fs/binfmt_flat.c: make load_flat_shared_library() work Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 047/102] mm/page_idle.c: fix oops because end_pfn is larger than max_pfn Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 048/102] scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck() Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 049/102] x86/speculation: Allow guests to use SSBD even if host does not Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 050/102] NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 051/102] cpu/speculation: Warn on unsupported mitigations= parameter Greg Kroah-Hartman
2019-07-08 15:12 ` Greg Kroah-Hartman [this message]
2019-07-08 15:12 ` [PATCH 4.9 053/102] net: stmmac: fixed new system time seconds value calculation Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 054/102] sctp: change to hold sk after auth shkey is created successfully Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 055/102] tipc: change to use register_pernet_device Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 056/102] tipc: check msg->req data len in tipc_nl_compat_bearer_disable Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 057/102] tun: wake up waitqueues after IFF_UP is set Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 058/102] team: Always enable vlan tx offload Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 059/102] bonding: " Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 060/102] ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 061/102] net: check before dereferencing netdev_ops during busy poll Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 062/102] bpf: udp: Avoid calling reuseports bpf_prog from udp_gro Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 063/102] bpf: udp: ipv6: Avoid running reuseports bpf_prog from __udp6_lib_err Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 064/102] tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 065/102] Bluetooth: Fix faulty expression for minimum encryption key size check Greg Kroah-Hartman
2019-07-08 15:12 ` [PATCH 4.9 066/102] ASoC : cs4265 : readable register too low Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 067/102] ASoC: soc-pcm: BE dai needs prepare when pause release after resume Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 068/102] spi: bitbang: Fix NULL pointer dereference in spi_unregister_master Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 069/102] drm/mediatek: fix unbind functions Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 070/102] ASoC: max98090: remove 24-bit format support if RJ is 0 Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 071/102] usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i] Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 072/102] usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 073/102] scsi: hpsa: correct ioaccel2 chaining Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 074/102] scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 075/102] mm/mlock.c: change count_mm_mlocked_page_nr return type Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 076/102] MIPS: math-emu: do not use bools for arithmetic Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 077/102] MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send() Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 078/102] mfd: omap-usb-tll: Fix register offsets Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 079/102] ARC: fix allnoconfig build warning Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 080/102] bug.h: work around GCC PR82365 in BUG() Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 081/102] ARC: handle gcc generated __builtin_trap for older compiler Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 082/102] clk: sunxi: fix uninitialized access Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 083/102] KVM: x86: degrade WARN to pr_warn_ratelimited Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 084/102] drm/i915/dmc: protect against reading random memory Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 085/102] MIPS: Workaround GCC __builtin_unreachable reordering bug Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 086/102] ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 087/102] crypto: user - prevent operating on larval algorithms Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 088/102] ALSA: seq: fix incorrect order of dest_client/dest_ports arguments Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 089/102] ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 090/102] ALSA: line6: Fix write on zero-sized buffer Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 091/102] ALSA: usb-audio: fix sign unintended sign extension on left shifts Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 092/102] lib/mpi: Fix karactx leak in mpi_powm Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 093/102] drm/imx: notify drm core before sending event during crtc disable Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 094/102] drm/imx: only send event on crtc disable if kept disabled Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 095/102] btrfs: Ensure replaced device doesnt have pending chunk allocation Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 096/102] tty: rocket: fix incorrect forward declaration of rp_init() Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 097/102] arm64, vdso: Define vdso_{start,end} as array Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 098/102] KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 099/102] IB/hfi1: Close PSM sdma_progress sleep window Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 100/102] MIPS: Add missing EHB in mtc0 -> mfc0 sequence Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 101/102] dmaengine: imx-sdma: remove BD_INTR for channel0 Greg Kroah-Hartman
2019-07-08 15:13 ` [PATCH 4.9 102/102] arm64: kaslr: keep modules inside module region when KASAN is enabled Greg Kroah-Hartman
2019-07-08 19:12 ` [PATCH 4.9 000/102] 4.9.185-stable review kernelci.org bot
2019-07-09 2:10 ` Naresh Kamboju
2019-07-09 2:27 ` shuah
2019-07-09 18:47 ` Guenter Roeck
2019-07-10 6:11 ` Jon Hunter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190708150529.134836691@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=mcroce@redhat.com \
--cc=nhorman@tuxdriver.com \
--cc=stable@vger.kernel.org \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).