* [PATCH AUTOSEL 5.4 059/266] scsi: vhost: Notify TCM about the maximum sg entries supported per command
From: Sasha Levin @ 2020-06-18 1:13 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sudhakar Panneerselvam, Michael S . Tsirkin, Jason Wang,
Paolo Bonzini, Stefan Hajnoczi, Mike Christie,
Martin K . Petersen, Sasha Levin, virtualization, kvm, netdev
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
[ Upstream commit 5ae6a6a915033bfee79e76e0c374d4f927909edc ]
vhost-scsi pre-allocates the maximum sg entries per command and if a
command requires more than VHOST_SCSI_PREALLOC_SGLS entries, then that
command is failed by it. This patch lets vhost communicate the max sg limit
when it registers vhost_scsi_ops with TCM. With this change, TCM would
report the max sg entries through "Block Limits" VPD page which will be
typically queried by the SCSI initiator during device discovery. By knowing
this limit, the initiator could ensure the maximum transfer length is less
than or equal to what is reported by vhost-scsi.
Link: https://lore.kernel.org/r/1590166317-953-1-git-send-email-sudhakar.panneerselvam@oracle.com
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/vhost/scsi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index a9caf1bc3c3e..88ce114790d7 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -2290,6 +2290,7 @@ static struct configfs_attribute *vhost_scsi_wwn_attrs[] = {
static const struct target_core_fabric_ops vhost_scsi_ops = {
.module = THIS_MODULE,
.fabric_name = "vhost",
+ .max_data_sg_nents = VHOST_SCSI_PREALLOC_SGLS,
.tpg_get_wwn = vhost_scsi_get_fabric_wwn,
.tpg_get_tag = vhost_scsi_get_tpgt,
.tpg_check_demo_mode = vhost_scsi_check_true,
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 071/266] yam: fix possible memory leak in yam_init_driver
From: Sasha Levin @ 2020-06-18 1:13 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Wang Hai, Hulk Robot, David S . Miller, Sasha Levin, linux-hams,
netdev
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Wang Hai <wanghai38@huawei.com>
[ Upstream commit 98749b7188affbf2900c2aab704a8853901d1139 ]
If register_netdev(dev) fails, free_netdev(dev) needs
to be called, otherwise a memory leak will occur.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/hamradio/yam.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
index 71cdef9fb56b..5ab53e9942f3 100644
--- a/drivers/net/hamradio/yam.c
+++ b/drivers/net/hamradio/yam.c
@@ -1133,6 +1133,7 @@ static int __init yam_init_driver(void)
err = register_netdev(dev);
if (err) {
printk(KERN_WARNING "yam: cannot register net device %s\n", dev->name);
+ free_netdev(dev);
goto error;
}
yam_devs[i] = dev;
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 080/266] bpf, sockhash: Fix memory leak when unlinking sockets in sock_hash_free
From: Sasha Levin @ 2020-06-18 1:13 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Sitnicki, Alexei Starovoitov, John Fastabend, Sasha Levin,
netdev, bpf
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Jakub Sitnicki <jakub@cloudflare.com>
[ Upstream commit 33a7c831565c43a7ee2f38c7df4c4a40e1dfdfed ]
When sockhash gets destroyed while sockets are still linked to it, we will
walk the bucket lists and delete the links. However, we are not freeing the
list elements after processing them, leaking the memory.
The leak can be triggered by close()'ing a sockhash map when it still
contains sockets, and observed with kmemleak:
unreferenced object 0xffff888116e86f00 (size 64):
comm "race_sock_unlin", pid 223, jiffies 4294731063 (age 217.404s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
81 de e8 41 00 00 00 00 c0 69 2f 15 81 88 ff ff ...A.....i/.....
backtrace:
[<00000000dd089ebb>] sock_hash_update_common+0x4ca/0x760
[<00000000b8219bd5>] sock_hash_update_elem+0x1d2/0x200
[<000000005e2c23de>] __do_sys_bpf+0x2046/0x2990
[<00000000d0084618>] do_syscall_64+0xad/0x9a0
[<000000000d96f263>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
Fix it by freeing the list element when we're done with it.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200607205229.2389672-2-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/sock_map.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 8291568b707f..ba65c608c228 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -879,6 +879,7 @@ static void sock_hash_free(struct bpf_map *map)
sock_map_unref(elem->sk, elem);
rcu_read_unlock();
release_sock(elem->sk);
+ sock_hash_free_elem(htab, elem);
}
}
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 201/266] net: sunrpc: Fix off-by-one issues in 'rpc_ntop6'
From: Sasha Levin @ 2020-06-18 1:15 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Fedor Tokarev, Anna Schumaker, Sasha Levin, linux-nfs, netdev
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Fedor Tokarev <ftokarev@gmail.com>
[ Upstream commit 118917d696dc59fd3e1741012c2f9db2294bed6f ]
Fix off-by-one issues in 'rpc_ntop6':
- 'snprintf' returns the number of characters which would have been
written if enough space had been available, excluding the terminating
null byte. Thus, a return value of 'sizeof(scopebuf)' means that the
last character was dropped.
- 'strcat' adds a terminating null byte to the string, thus if len ==
buflen, the null byte is written past the end of the buffer.
Signed-off-by: Fedor Tokarev <ftokarev@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/addr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
index d024af4be85e..105d17af4abc 100644
--- a/net/sunrpc/addr.c
+++ b/net/sunrpc/addr.c
@@ -82,11 +82,11 @@ static size_t rpc_ntop6(const struct sockaddr *sap,
rc = snprintf(scopebuf, sizeof(scopebuf), "%c%u",
IPV6_SCOPE_DELIMITER, sin6->sin6_scope_id);
- if (unlikely((size_t)rc > sizeof(scopebuf)))
+ if (unlikely((size_t)rc >= sizeof(scopebuf)))
return 0;
len += rc;
- if (unlikely(len > buflen))
+ if (unlikely(len >= buflen))
return 0;
strcat(buf, scopebuf);
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 213/266] rxrpc: Adjust /proc/net/rxrpc/calls to display call->debug_id not user_ID
From: Sasha Levin @ 2020-06-18 1:15 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: David Howells, Sasha Levin, linux-afs, netdev
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: David Howells <dhowells@redhat.com>
[ Upstream commit 32f71aa497cfb23d37149c2ef16ad71fce2e45e2 ]
The user ID value isn't actually much use - and leaks a kernel pointer or a
userspace value - so replace it with the call debug ID, which appears in trace
points.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/rxrpc/proc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 8b179e3c802a..543afd9bd664 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -68,7 +68,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
"Proto Local "
" Remote "
" SvID ConnID CallID End Use State Abort "
- " UserID TxSeq TW RxSeq RW RxSerial RxTimo\n");
+ " DebugId TxSeq TW RxSeq RW RxSerial RxTimo\n");
return 0;
}
@@ -100,7 +100,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
rx_hard_ack = READ_ONCE(call->rx_hard_ack);
seq_printf(seq,
"UDP %-47.47s %-47.47s %4x %08x %08x %s %3u"
- " %-8.8s %08x %lx %08x %02x %08x %02x %08x %06lx\n",
+ " %-8.8s %08x %08x %08x %02x %08x %02x %08x %06lx\n",
lbuff,
rbuff,
call->service_id,
@@ -110,7 +110,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
atomic_read(&call->usage),
rxrpc_call_states[call->state],
call->abort_code,
- call->user_call_ID,
+ call->debug_id,
tx_hard_ack, READ_ONCE(call->tx_top) - tx_hard_ack,
rx_hard_ack, READ_ONCE(call->rx_top) - rx_hard_ack,
call->rx_serial,
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 228/266] geneve: change from tx_error to tx_dropped on missing metadata
From: Sasha Levin @ 2020-06-18 1:15 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jiri Benc, David S . Miller, Sasha Levin, netdev
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Jiri Benc <jbenc@redhat.com>
[ Upstream commit 9d149045b3c0e44c049cdbce8a64e19415290017 ]
If the geneve interface is in collect_md (external) mode, it can't send any
packets submitted directly to its net interface, as such packets won't have
metadata attached. This is expected.
However, the kernel itself sends some packets to the interface, most
notably, IPv6 DAD, IPv6 multicast listener reports, etc. This is not wrong,
as tunnel metadata can be specified in routing table (although technically,
that has never worked for IPv6, but hopefully will be fixed eventually) and
then the interface must correctly participate in IPv6 housekeeping.
The problem is that any such attempt increases the tx_error counter. Just
bringing up a geneve interface with IPv6 enabled is enough to see a number
of tx_errors. That causes confusion among users, prompting them to find
a network error where there is none.
Change the counter used to tx_dropped. That better conveys the meaning
(there's nothing wrong going on, just some packets are getting dropped) and
hopefully will make admins panic less.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/geneve.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index aa101f72d405..cac75c7d1d01 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -987,9 +987,10 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
if (geneve->collect_md) {
info = skb_tunnel_info(skb);
if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
- err = -EINVAL;
netdev_dbg(dev, "no tunnel metadata\n");
- goto tx_error;
+ dev_kfree_skb(skb);
+ dev->stats.tx_dropped++;
+ return NETDEV_TX_OK;
}
} else {
info = &geneve->info;
@@ -1006,7 +1007,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
if (likely(!err))
return NETDEV_TX_OK;
-tx_error:
+
dev_kfree_skb(skb);
if (err == -ELOOP)
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 236/266] net: marvell: Fix OF_MDIO config check
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Dan Murphy, Florian Fainelli, David S . Miller, Sasha Levin,
netdev
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Dan Murphy <dmurphy@ti.com>
[ Upstream commit 5cd119d9a05f1c1a08778a7305b4ca0f16bc1e20 ]
When CONFIG_OF_MDIO is set to be a module the code block is not
compiled. Use the IS_ENABLED macro that checks for both built in as
well as module.
Fixes: cf41a51db8985 ("of/phylib: Use device tree properties to initialize Marvell PHYs.")
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/phy/marvell.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index a7796134e3be..91cf1d167263 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -358,7 +358,7 @@ static int m88e1101_config_aneg(struct phy_device *phydev)
return marvell_config_aneg(phydev);
}
-#ifdef CONFIG_OF_MDIO
+#if IS_ENABLED(CONFIG_OF_MDIO)
/* Set and/or override some configuration registers based on the
* marvell,reg-init property stored in the of_node for the phydev.
*
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 254/266] bpf/sockmap: Fix kernel panic at __tcp_bpf_recvmsg
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: dihu, Alexei Starovoitov, John Fastabend, Jakub Sitnicki,
Sasha Levin, netdev, bpf
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: dihu <anny.hu@linux.alibaba.com>
[ Upstream commit 487082fb7bd2a32b66927d2b22e3a81b072b44f0 ]
When user application calls read() with MSG_PEEK flag to read data
of bpf sockmap socket, kernel panic happens at
__tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
queue after read out under MSG_PEEK flag is set. Because it's not
judged whether sk_msg is the last msg of ingress_msg queue, the next
sk_msg may be the head of ingress_msg queue, whose memory address of
sg page is invalid. So it's necessary to add check codes to prevent
this problem.
[20759.125457] BUG: kernel NULL pointer dereference, address:
0000000000000008
[20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
5.4.32 #1
[20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
4.1.12 06/18/2017
[20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
[20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
[20759.276099] tcp_bpf_recvmsg+0x113/0x370
[20759.281137] inet_recvmsg+0x55/0xc0
[20759.285734] __sys_recvfrom+0xc8/0x130
[20759.290566] ? __audit_syscall_entry+0x103/0x130
[20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
[20759.301700] ? __audit_syscall_exit+0x1e4/0x290
[20759.307235] __x64_sys_recvfrom+0x24/0x30
[20759.312226] do_syscall_64+0x55/0x1b0
[20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: dihu <anny.hu@linux.alibaba.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200605084625.9783-1-anny.hu@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/tcp_bpf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 69b025408390..ad9f38202731 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -96,6 +96,9 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
} while (i != msg_rx->sg.end);
if (unlikely(peek)) {
+ if (msg_rx == list_last_entry(&psock->ingress_msg,
+ struct sk_msg, list))
+ break;
msg_rx = list_next_entry(msg_rx, list);
continue;
}
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 256/266] tracing/probe: Fix bpf_task_fd_query() for kprobes and uprobes
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jean-Philippe Brucker, Alexei Starovoitov, Yonghong Song,
Masami Hiramatsu, Sasha Levin, netdev, bpf
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Jean-Philippe Brucker <jean-philippe@linaro.org>
[ Upstream commit 22d5bd6867364b41576a712755271a7d6161abd6 ]
Commit 60d53e2c3b75 ("tracing/probe: Split trace_event related data from
trace_probe") removed the trace_[ku]probe structure from the
trace_event_call->data pointer. As bpf_get_[ku]probe_info() were
forgotten in that change, fix them now. These functions are currently
only used by the bpf_task_fd_query() syscall handler to collect
information about a perf event.
Fixes: 60d53e2c3b75 ("tracing/probe: Split trace_event related data from trace_probe")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20200608124531.819838-1-jean-philippe@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/trace/trace_kprobe.c | 2 +-
kernel/trace/trace_uprobe.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index fba4b48451f6..26de9c654956 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1464,7 +1464,7 @@ int bpf_get_kprobe_info(const struct perf_event *event, u32 *fd_type,
if (perf_type_tracepoint)
tk = find_trace_kprobe(pevent, group);
else
- tk = event->tp_event->data;
+ tk = trace_kprobe_primary_from_call(event->tp_event);
if (!tk)
return -EINVAL;
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2619bc5ed520..5294843de6ef 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1405,7 +1405,7 @@ int bpf_get_uprobe_info(const struct perf_event *event, u32 *fd_type,
if (perf_type_tracepoint)
tu = find_probe_event(pevent, group);
else
- tu = event->tp_event->data;
+ tu = trace_uprobe_primary_from_call(event->tp_event);
if (!tu)
return -EINVAL;
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 255/266] bpf, sockhash: Synchronize delete from bucket list on map free
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Sitnicki, Eric Dumazet, Alexei Starovoitov, John Fastabend,
Sasha Levin, netdev, bpf
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Jakub Sitnicki <jakub@cloudflare.com>
[ Upstream commit 75e68e5bf2c7fa9d3e874099139df03d5952a3e1 ]
We can end up modifying the sockhash bucket list from two CPUs when a
sockhash is being destroyed (sock_hash_free) on one CPU, while a socket
that is in the sockhash is unlinking itself from it on another CPU
it (sock_hash_delete_from_link).
This results in accessing a list element that is in an undefined state as
reported by KASAN:
| ==================================================================
| BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280
| Write of size 8 at addr dead000000000122 by task kworker/2:1/95
|
| CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: events bpf_map_free_deferred
| Call Trace:
| dump_stack+0x97/0xe0
| ? sock_hash_free+0x13c/0x280
| __kasan_report.cold+0x5/0x40
| ? mark_lock+0xbc1/0xc00
| ? sock_hash_free+0x13c/0x280
| kasan_report+0x38/0x50
| ? sock_hash_free+0x152/0x280
| sock_hash_free+0x13c/0x280
| bpf_map_free_deferred+0xb2/0xd0
| ? bpf_map_charge_finish+0x50/0x50
| ? rcu_read_lock_sched_held+0x81/0xb0
| ? rcu_read_lock_bh_held+0x90/0x90
| process_one_work+0x59a/0xac0
| ? lock_release+0x3b0/0x3b0
| ? pwq_dec_nr_in_flight+0x110/0x110
| ? rwlock_bug.part.0+0x60/0x60
| worker_thread+0x7a/0x680
| ? _raw_spin_unlock_irqrestore+0x4c/0x60
| kthread+0x1cc/0x220
| ? process_one_work+0xac0/0xac0
| ? kthread_create_on_node+0xa0/0xa0
| ret_from_fork+0x24/0x30
| ==================================================================
Fix it by reintroducing spin-lock protected critical section around the
code that removes the elements from the bucket on sockhash free.
To do that we also need to defer processing of removed elements, until out
of atomic context so that we can unlink the socket from the map when
holding the sock lock.
Fixes: 90db6d772f74 ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200607205229.2389672-3-jakub@cloudflare.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/sock_map.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index ba65c608c228..b22e9f119180 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -861,6 +861,7 @@ static void sock_hash_free(struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct bpf_htab_bucket *bucket;
+ struct hlist_head unlink_list;
struct bpf_htab_elem *elem;
struct hlist_node *node;
int i;
@@ -872,13 +873,31 @@ static void sock_hash_free(struct bpf_map *map)
synchronize_rcu();
for (i = 0; i < htab->buckets_num; i++) {
bucket = sock_hash_select_bucket(htab, i);
- hlist_for_each_entry_safe(elem, node, &bucket->head, node) {
- hlist_del_rcu(&elem->node);
+
+ /* We are racing with sock_hash_delete_from_link to
+ * enter the spin-lock critical section. Every socket on
+ * the list is still linked to sockhash. Since link
+ * exists, psock exists and holds a ref to socket. That
+ * lets us to grab a socket ref too.
+ */
+ raw_spin_lock_bh(&bucket->lock);
+ hlist_for_each_entry(elem, &bucket->head, node)
+ sock_hold(elem->sk);
+ hlist_move_list(&bucket->head, &unlink_list);
+ raw_spin_unlock_bh(&bucket->lock);
+
+ /* Process removed entries out of atomic context to
+ * block for socket lock before deleting the psock's
+ * link to sockhash.
+ */
+ hlist_for_each_entry_safe(elem, node, &unlink_list, node) {
+ hlist_del(&elem->node);
lock_sock(elem->sk);
rcu_read_lock();
sock_map_unref(elem->sk, elem);
rcu_read_unlock();
release_sock(elem->sk);
+ sock_put(elem->sk);
sock_hash_free_elem(htab, elem);
}
}
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 258/266] libbpf: Handle GCC noreturn-turned-volatile quirk
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Andrii Nakryiko, Jean-Philippe Brucker, Daniel Borkmann,
Sasha Levin, netdev, bpf, clang-built-linux
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Andrii Nakryiko <andriin@fb.com>
[ Upstream commit 32022fd97ed34f6812802bf1288db27c313576f4 ]
Handle a GCC quirk of emitting extra volatile modifier in DWARF (and
subsequently preserved in BTF by pahole) for function pointers marked as
__attribute__((noreturn)). This was the way to mark such functions before GCC
2.5 added noreturn attribute. Drop such func_proto modifiers, similarly to how
it's done for array (also to handle GCC quirk/bug).
Such volatile attribute is emitted by GCC only, so existing selftests can't
express such test. Simple repro is like this (compiled with GCC + BTF
generated by pahole):
struct my_struct {
void __attribute__((noreturn)) (*fn)(int);
};
struct my_struct a;
Without this fix, output will be:
struct my_struct {
voidvolatile (*fn)(int);
};
With the fix:
struct my_struct {
void (*fn)(int);
};
Fixes: 351131b51c7a ("libbpf: add btf_dump API for BTF-to-C conversion")
Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/20200610052335.2862559-1-andriin@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/lib/bpf/btf_dump.c | 33 ++++++++++++++++++++++++---------
1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 87f27e2664c5..d9e386b8f47e 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -1141,6 +1141,20 @@ static void btf_dump_emit_mods(struct btf_dump *d, struct id_stack *decl_stack)
}
}
+static void btf_dump_drop_mods(struct btf_dump *d, struct id_stack *decl_stack)
+{
+ const struct btf_type *t;
+ __u32 id;
+
+ while (decl_stack->cnt) {
+ id = decl_stack->ids[decl_stack->cnt - 1];
+ t = btf__type_by_id(d->btf, id);
+ if (!btf_is_mod(t))
+ return;
+ decl_stack->cnt--;
+ }
+}
+
static void btf_dump_emit_name(const struct btf_dump *d,
const char *name, bool last_was_ptr)
{
@@ -1239,14 +1253,7 @@ static void btf_dump_emit_type_chain(struct btf_dump *d,
* a const/volatile modifier for array, so we are
* going to silently skip them here.
*/
- while (decls->cnt) {
- next_id = decls->ids[decls->cnt - 1];
- next_t = btf__type_by_id(d->btf, next_id);
- if (btf_is_mod(next_t))
- decls->cnt--;
- else
- break;
- }
+ btf_dump_drop_mods(d, decls);
if (decls->cnt == 0) {
btf_dump_emit_name(d, fname, last_was_ptr);
@@ -1274,7 +1281,15 @@ static void btf_dump_emit_type_chain(struct btf_dump *d,
__u16 vlen = btf_vlen(t);
int i;
- btf_dump_emit_mods(d, decls);
+ /*
+ * GCC emits extra volatile qualifier for
+ * __attribute__((noreturn)) function pointers. Clang
+ * doesn't do it. It's a GCC quirk for backwards
+ * compatibility with code written for GCC <2.5. So,
+ * similarly to extra qualifiers for array, just drop
+ * them, instead of handling them.
+ */
+ btf_dump_drop_mods(d, decls);
if (decls->cnt) {
btf_dump_printf(d, " (");
btf_dump_emit_type_chain(d, decls, fname, lvl);
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 264/266] xdp: Fix xsk_generic_xmit errno
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Li RongQing, Daniel Borkmann, Björn Töpel, Sasha Levin,
netdev, bpf
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Li RongQing <lirongqing@baidu.com>
[ Upstream commit aa2cad0600ed2ca6a0ab39948d4db1666b6c962b ]
Propagate sock_alloc_send_skb error code, not set it to
EAGAIN unconditionally, when fail to allocate skb, which
might cause that user space unnecessary loops.
Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1591852266-24017-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/xdp/xsk.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 7181a30666b4..f9eb5efb237c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -362,10 +362,8 @@ static int xsk_generic_xmit(struct sock *sk)
len = desc.len;
skb = sock_alloc_send_skb(sk, len, 1, &err);
- if (unlikely(!skb)) {
- err = -EAGAIN;
+ if (unlikely(!skb))
goto out;
- }
skb_put(skb, len);
addr = desc.addr;
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 261/266] net/filter: Permit reading NET in load_bytes_relative when MAC not set
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: YiFei Zhu, YiFei Zhu, Daniel Borkmann, Stanislav Fomichev,
Sasha Levin, netdev, bpf
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: YiFei Zhu <zhuyifei1999@gmail.com>
[ Upstream commit 0f5d82f187e1beda3fe7295dfc500af266a5bd80 ]
Added a check in the switch case on start_header that checks for
the existence of the header, and in the case that MAC is not set
and the caller requests for MAC, -EFAULT. If the caller requests
for NET then MAC's existence is completely ignored.
There is no function to check NET header's existence and as far
as cgroup_skb/egress is concerned it should always be set.
Removed for ptr >= the start of header, considering offset is
bounded unsigned and should always be true. len <= end - mac is
redundant to ptr + len <= end.
Fixes: 3eee1f75f2b9 ("bpf: fix bpf_skb_load_bytes_relative pkt length check")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/76bb820ddb6a95f59a772ecbd8c8a336f646b362.1591812755.git.zhuyifei@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/filter.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index f1f2304822e3..a0a492f7cf9c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1766,25 +1766,27 @@ BPF_CALL_5(bpf_skb_load_bytes_relative, const struct sk_buff *, skb,
u32, offset, void *, to, u32, len, u32, start_header)
{
u8 *end = skb_tail_pointer(skb);
- u8 *net = skb_network_header(skb);
- u8 *mac = skb_mac_header(skb);
- u8 *ptr;
+ u8 *start, *ptr;
- if (unlikely(offset > 0xffff || len > (end - mac)))
+ if (unlikely(offset > 0xffff))
goto err_clear;
switch (start_header) {
case BPF_HDR_START_MAC:
- ptr = mac + offset;
+ if (unlikely(!skb_mac_header_was_set(skb)))
+ goto err_clear;
+ start = skb_mac_header(skb);
break;
case BPF_HDR_START_NET:
- ptr = net + offset;
+ start = skb_network_header(skb);
break;
default:
goto err_clear;
}
- if (likely(ptr >= mac && ptr + len <= end)) {
+ ptr = start + offset;
+
+ if (likely(ptr + len <= end)) {
memcpy(to, ptr, len);
return 0;
}
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 5.4 266/266] bpf: Fix memlock accounting for sock_hash
From: Sasha Levin @ 2020-06-18 1:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Andrey Ignatov, Alexei Starovoitov, Sasha Levin, netdev, bpf
In-Reply-To: <20200618011631.604574-1-sashal@kernel.org>
From: Andrey Ignatov <rdna@fb.com>
[ Upstream commit 60e5ca8a64bad8f3e2e20a1e57846e497361c700 ]
Add missed bpf_map_charge_init() in sock_hash_alloc() and
correspondingly bpf_map_charge_finish() on ENOMEM.
It was found accidentally while working on unrelated selftest that
checks "map->memory.pages > 0" is true for all map types.
Before:
# bpftool m l
...
3692: sockhash name m_sockhash flags 0x0
key 4B value 4B max_entries 8 memlock 0B
After:
# bpftool m l
...
84: sockmap name m_sockmap flags 0x0
key 4B value 4B max_entries 8 memlock 4096B
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200612000857.2881453-1-rdna@fb.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/sock_map.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index b22e9f119180..6bbc118bf00e 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -837,11 +837,15 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
err = -EINVAL;
goto free_htab;
}
+ err = bpf_map_charge_init(&htab->map.memory, cost);
+ if (err)
+ goto free_htab;
htab->buckets = bpf_map_area_alloc(htab->buckets_num *
sizeof(struct bpf_htab_bucket),
htab->map.numa_node);
if (!htab->buckets) {
+ bpf_map_charge_finish(&htab->map.memory);
err = -ENOMEM;
goto free_htab;
}
--
2.25.1
^ permalink raw reply related
* [PATCH] [net/sched]: Remove redundant condition in qdisc_graft
From: Gaurav Singh @ 2020-06-18 1:23 UTC (permalink / raw)
To: gaurav1086, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
David S. Miller, Jakub Kicinski, open list:TC subsystem,
open list
In-Reply-To: <20200618005526.27101-1-gaurav1086@gmail.com>
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Fix build errors
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
---
net/sched/sch_api.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 9a3449b56bd6..be93ebcdb18d 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1094,7 +1094,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
/* Only support running class lockless if parent is lockless */
if (new && (new->flags & TCQ_F_NOLOCK) &&
- parent && !(parent->flags & TCQ_F_NOLOCK))
+ !(parent->flags & TCQ_F_NOLOCK))
qdisc_clear_nolock(new);
if (!cops || !cops->graft)
--
2.17.1
^ permalink raw reply related
* [PATCH AUTOSEL 4.19 040/172] yam: fix possible memory leak in yam_init_driver
From: Sasha Levin @ 2020-06-18 1:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Wang Hai, Hulk Robot, David S . Miller, Sasha Levin, linux-hams,
netdev
In-Reply-To: <20200618012218.607130-1-sashal@kernel.org>
From: Wang Hai <wanghai38@huawei.com>
[ Upstream commit 98749b7188affbf2900c2aab704a8853901d1139 ]
If register_netdev(dev) fails, free_netdev(dev) needs
to be called, otherwise a memory leak will occur.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/hamradio/yam.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
index ba9df430fca6..fdab49872587 100644
--- a/drivers/net/hamradio/yam.c
+++ b/drivers/net/hamradio/yam.c
@@ -1148,6 +1148,7 @@ static int __init yam_init_driver(void)
err = register_netdev(dev);
if (err) {
printk(KERN_WARNING "yam: cannot register net device %s\n", dev->name);
+ free_netdev(dev);
goto error;
}
yam_devs[i] = dev;
--
2.25.1
^ permalink raw reply related
* Re: [PATCH net-next v7 1/6] dt-bindings: net: Add tx and rx internal delays
From: Andrew Lunn @ 2020-06-18 2:01 UTC (permalink / raw)
To: Dan Murphy
Cc: f.fainelli, hkallweit1, davem, robh, netdev, linux-kernel,
devicetree
In-Reply-To: <20200617182019.6790-2-dmurphy@ti.com>
On Wed, Jun 17, 2020 at 01:20:14PM -0500, Dan Murphy wrote:
> tx-internal-delays and rx-internal-delays are a common setting for RGMII
> capable devices.
>
> These properties are used when the phy-mode or phy-controller is set to
> rgmii-id, rgmii-rxid or rgmii-txid. These modes indicate to the
> controller that the PHY will add the internal delay for the connection.
>
> Signed-off-by: Dan Murphy <dmurphy@ti.com>
> ---
> .../devicetree/bindings/net/ethernet-phy.yaml | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> index 9b1f1147ca36..b2887476fe6a 100644
> --- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> +++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> @@ -162,6 +162,17 @@ properties:
> description:
> Specifies a reference to a node representing a SFP cage.
>
> +
> + rx-internal-delay-ps:
> + description: |
> + RGMII Receive PHY Clock Delay defined in pico seconds. This is used for
> + PHY's that have configurable RX internal delays.
> +
> + tx-internal-delay-ps:
> + description: |
> + RGMII Transmit PHY Clock Delay defined in pico seconds. This is used for
> + PHY's that have configurable TX internal delays.
> +
So in a later patch you have:
default: 2000
That seems to apply that these values only apply when the phy mode
indicates a delay is needed. It would be good to document that here,
when each of these properties will be used. Also, that they default to
2000 when not present.
Andrew
^ permalink raw reply
* [PATCH AUTOSEL 4.19 126/172] net: sunrpc: Fix off-by-one issues in 'rpc_ntop6'
From: Sasha Levin @ 2020-06-18 1:21 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Fedor Tokarev, Anna Schumaker, Sasha Levin, linux-nfs, netdev
In-Reply-To: <20200618012218.607130-1-sashal@kernel.org>
From: Fedor Tokarev <ftokarev@gmail.com>
[ Upstream commit 118917d696dc59fd3e1741012c2f9db2294bed6f ]
Fix off-by-one issues in 'rpc_ntop6':
- 'snprintf' returns the number of characters which would have been
written if enough space had been available, excluding the terminating
null byte. Thus, a return value of 'sizeof(scopebuf)' means that the
last character was dropped.
- 'strcat' adds a terminating null byte to the string, thus if len ==
buflen, the null byte is written past the end of the buffer.
Signed-off-by: Fedor Tokarev <ftokarev@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/addr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
index 2e0a6f92e563..8391c2785550 100644
--- a/net/sunrpc/addr.c
+++ b/net/sunrpc/addr.c
@@ -81,11 +81,11 @@ static size_t rpc_ntop6(const struct sockaddr *sap,
rc = snprintf(scopebuf, sizeof(scopebuf), "%c%u",
IPV6_SCOPE_DELIMITER, sin6->sin6_scope_id);
- if (unlikely((size_t)rc > sizeof(scopebuf)))
+ if (unlikely((size_t)rc >= sizeof(scopebuf)))
return 0;
len += rc;
- if (unlikely(len > buflen))
+ if (unlikely(len >= buflen))
return 0;
strcat(buf, scopebuf);
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 4.19 135/172] rxrpc: Adjust /proc/net/rxrpc/calls to display call->debug_id not user_ID
From: Sasha Levin @ 2020-06-18 1:21 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: David Howells, Sasha Levin, linux-afs, netdev
In-Reply-To: <20200618012218.607130-1-sashal@kernel.org>
From: David Howells <dhowells@redhat.com>
[ Upstream commit 32f71aa497cfb23d37149c2ef16ad71fce2e45e2 ]
The user ID value isn't actually much use - and leaks a kernel pointer or a
userspace value - so replace it with the call debug ID, which appears in trace
points.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/rxrpc/proc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 9805e3b85c36..81a765dd8c9b 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -72,7 +72,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
"Proto Local "
" Remote "
" SvID ConnID CallID End Use State Abort "
- " UserID TxSeq TW RxSeq RW RxSerial RxTimo\n");
+ " DebugId TxSeq TW RxSeq RW RxSerial RxTimo\n");
return 0;
}
@@ -104,7 +104,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
rx_hard_ack = READ_ONCE(call->rx_hard_ack);
seq_printf(seq,
"UDP %-47.47s %-47.47s %4x %08x %08x %s %3u"
- " %-8.8s %08x %lx %08x %02x %08x %02x %08x %06lx\n",
+ " %-8.8s %08x %08x %08x %02x %08x %02x %08x %06lx\n",
lbuff,
rbuff,
call->service_id,
@@ -114,7 +114,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
atomic_read(&call->usage),
rxrpc_call_states[call->state],
call->abort_code,
- call->user_call_ID,
+ call->debug_id,
tx_hard_ack, READ_ONCE(call->tx_top) - tx_hard_ack,
rx_hard_ack, READ_ONCE(call->rx_top) - rx_hard_ack,
call->rx_serial,
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 4.19 144/172] geneve: change from tx_error to tx_dropped on missing metadata
From: Sasha Levin @ 2020-06-18 1:21 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jiri Benc, David S . Miller, Sasha Levin, netdev
In-Reply-To: <20200618012218.607130-1-sashal@kernel.org>
From: Jiri Benc <jbenc@redhat.com>
[ Upstream commit 9d149045b3c0e44c049cdbce8a64e19415290017 ]
If the geneve interface is in collect_md (external) mode, it can't send any
packets submitted directly to its net interface, as such packets won't have
metadata attached. This is expected.
However, the kernel itself sends some packets to the interface, most
notably, IPv6 DAD, IPv6 multicast listener reports, etc. This is not wrong,
as tunnel metadata can be specified in routing table (although technically,
that has never worked for IPv6, but hopefully will be fixed eventually) and
then the interface must correctly participate in IPv6 housekeeping.
The problem is that any such attempt increases the tx_error counter. Just
bringing up a geneve interface with IPv6 enabled is enough to see a number
of tx_errors. That causes confusion among users, prompting them to find
a network error where there is none.
Change the counter used to tx_dropped. That better conveys the meaning
(there's nothing wrong going on, just some packets are getting dropped) and
hopefully will make admins panic less.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/geneve.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 36444de701cd..817c290b78cd 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -911,9 +911,10 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
if (geneve->collect_md) {
info = skb_tunnel_info(skb);
if (unlikely(!info || !(info->mode & IP_TUNNEL_INFO_TX))) {
- err = -EINVAL;
netdev_dbg(dev, "no tunnel metadata\n");
- goto tx_error;
+ dev_kfree_skb(skb);
+ dev->stats.tx_dropped++;
+ return NETDEV_TX_OK;
}
} else {
info = &geneve->info;
@@ -930,7 +931,7 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
if (likely(!err))
return NETDEV_TX_OK;
-tx_error:
+
dev_kfree_skb(skb);
if (err == -ELOOP)
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 4.19 171/172] net/filter: Permit reading NET in load_bytes_relative when MAC not set
From: Sasha Levin @ 2020-06-18 1:22 UTC (permalink / raw)
To: linux-kernel, stable
Cc: YiFei Zhu, YiFei Zhu, Daniel Borkmann, Stanislav Fomichev,
Sasha Levin, netdev, bpf
In-Reply-To: <20200618012218.607130-1-sashal@kernel.org>
From: YiFei Zhu <zhuyifei1999@gmail.com>
[ Upstream commit 0f5d82f187e1beda3fe7295dfc500af266a5bd80 ]
Added a check in the switch case on start_header that checks for
the existence of the header, and in the case that MAC is not set
and the caller requests for MAC, -EFAULT. If the caller requests
for NET then MAC's existence is completely ignored.
There is no function to check NET header's existence and as far
as cgroup_skb/egress is concerned it should always be set.
Removed for ptr >= the start of header, considering offset is
bounded unsigned and should always be true. len <= end - mac is
redundant to ptr + len <= end.
Fixes: 3eee1f75f2b9 ("bpf: fix bpf_skb_load_bytes_relative pkt length check")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/76bb820ddb6a95f59a772ecbd8c8a336f646b362.1591812755.git.zhuyifei@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/filter.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 40b3af05c883..b5521b60a2d4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1730,25 +1730,27 @@ BPF_CALL_5(bpf_skb_load_bytes_relative, const struct sk_buff *, skb,
u32, offset, void *, to, u32, len, u32, start_header)
{
u8 *end = skb_tail_pointer(skb);
- u8 *net = skb_network_header(skb);
- u8 *mac = skb_mac_header(skb);
- u8 *ptr;
+ u8 *start, *ptr;
- if (unlikely(offset > 0xffff || len > (end - mac)))
+ if (unlikely(offset > 0xffff))
goto err_clear;
switch (start_header) {
case BPF_HDR_START_MAC:
- ptr = mac + offset;
+ if (unlikely(!skb_mac_header_was_set(skb)))
+ goto err_clear;
+ start = skb_mac_header(skb);
break;
case BPF_HDR_START_NET:
- ptr = net + offset;
+ start = skb_network_header(skb);
break;
default:
goto err_clear;
}
- if (likely(ptr >= mac && ptr + len <= end)) {
+ ptr = start + offset;
+
+ if (likely(ptr + len <= end)) {
memcpy(to, ptr, len);
return 0;
}
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 4.19 172/172] xdp: Fix xsk_generic_xmit errno
From: Sasha Levin @ 2020-06-18 1:22 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Li RongQing, Daniel Borkmann, Björn Töpel, Sasha Levin,
netdev, bpf
In-Reply-To: <20200618012218.607130-1-sashal@kernel.org>
From: Li RongQing <lirongqing@baidu.com>
[ Upstream commit aa2cad0600ed2ca6a0ab39948d4db1666b6c962b ]
Propagate sock_alloc_send_skb error code, not set it to
EAGAIN unconditionally, when fail to allocate skb, which
might cause that user space unnecessary loops.
Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1591852266-24017-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/xdp/xsk.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 72caa4fb13f4..9ff2ab63e639 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -233,10 +233,8 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
len = desc.len;
skb = sock_alloc_send_skb(sk, len, 1, &err);
- if (unlikely(!skb)) {
- err = -EAGAIN;
+ if (unlikely(!skb))
goto out;
- }
skb_put(skb, len);
addr = desc.addr;
--
2.25.1
^ permalink raw reply related
* [PATCH AUTOSEL 4.14 030/108] yam: fix possible memory leak in yam_init_driver
From: Sasha Levin @ 2020-06-18 1:24 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Wang Hai, Hulk Robot, David S . Miller, Sasha Levin, linux-hams,
netdev
In-Reply-To: <20200618012600.608744-1-sashal@kernel.org>
From: Wang Hai <wanghai38@huawei.com>
[ Upstream commit 98749b7188affbf2900c2aab704a8853901d1139 ]
If register_netdev(dev) fails, free_netdev(dev) needs
to be called, otherwise a memory leak will occur.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/hamradio/yam.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
index 16a6e1193912..b74c735a423d 100644
--- a/drivers/net/hamradio/yam.c
+++ b/drivers/net/hamradio/yam.c
@@ -1162,6 +1162,7 @@ static int __init yam_init_driver(void)
err = register_netdev(dev);
if (err) {
printk(KERN_WARNING "yam: cannot register net device %s\n", dev->name);
+ free_netdev(dev);
goto error;
}
yam_devs[i] = dev;
--
2.25.1
^ permalink raw reply related
* Re: [PATCH net-next v7 2/6] net: phy: Add a helper to return the index for of the internal delay
From: Andrew Lunn @ 2020-06-18 1:51 UTC (permalink / raw)
To: Dan Murphy
Cc: f.fainelli, hkallweit1, davem, robh, netdev, linux-kernel,
devicetree
In-Reply-To: <20200617182019.6790-3-dmurphy@ti.com>
On Wed, Jun 17, 2020 at 01:20:15PM -0500, Dan Murphy wrote:
> Add a helper function that will return the index in the array for the
> passed in internal delay value. The helper requires the array, size and
> delay value.
>
> The helper will then return the index for the exact match or return the
> index for the index to the closest smaller value.
>
> Signed-off-by: Dan Murphy <dmurphy@ti.com>
> ---
> drivers/net/phy/phy_device.c | 68 ++++++++++++++++++++++++++++++++++++
> include/linux/phy.h | 4 +++
> 2 files changed, 72 insertions(+)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 04946de74fa0..611d4e68e3c6 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -31,6 +31,7 @@
> #include <linux/mdio.h>
> #include <linux/io.h>
> #include <linux/uaccess.h>
> +#include <linux/property.h>
>
> MODULE_DESCRIPTION("PHY library");
> MODULE_AUTHOR("Andy Fleming");
> @@ -2657,6 +2658,73 @@ void phy_get_pause(struct phy_device *phydev, bool *tx_pause, bool *rx_pause)
> }
> EXPORT_SYMBOL(phy_get_pause);
>
> +/**
> + * phy_get_delay_index - returns the index of the internal delay
> + * @phydev: phy_device struct
> + * @dev: pointer to the devices device struct
> + * @delay_values: array of delays the PHY supports
> + * @size: the size of the delay array
> + * @is_rx: boolean to indicate to get the rx internal delay
> + *
> + * Returns the index within the array of internal delay passed in.
> + * Or if size == 0 then the delay read from the firmware is returned.
> + * The array must be in ascending order.
> + * Return errno if the delay is invalid or cannot be found.
> + */
> +s32 phy_get_internal_delay(struct phy_device *phydev, struct device *dev,
> + const int *delay_values, int size, bool is_rx)
> +{
> + int ret;
> + int i;
> + s32 delay;
> +
> + if (is_rx)
> + ret = device_property_read_u32(dev, "rx-internal-delay-ps",
> + &delay);
> + else
> + ret = device_property_read_u32(dev, "tx-internal-delay-ps",
> + &delay);
> + if (ret) {
> + phydev_err(phydev, "internal delay not defined\n");
This is an optional property. So printing an error message seems heavy
handed.
Maybe it would be better to default to 0 if the property is not found,
and continue with the lookup in the table to find what value should be
written for a 0ps delay?
Andrew
^ permalink raw reply
* [PATCH AUTOSEL 4.14 082/108] net: sunrpc: Fix off-by-one issues in 'rpc_ntop6'
From: Sasha Levin @ 2020-06-18 1:25 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Fedor Tokarev, Anna Schumaker, Sasha Levin, linux-nfs, netdev
In-Reply-To: <20200618012600.608744-1-sashal@kernel.org>
From: Fedor Tokarev <ftokarev@gmail.com>
[ Upstream commit 118917d696dc59fd3e1741012c2f9db2294bed6f ]
Fix off-by-one issues in 'rpc_ntop6':
- 'snprintf' returns the number of characters which would have been
written if enough space had been available, excluding the terminating
null byte. Thus, a return value of 'sizeof(scopebuf)' means that the
last character was dropped.
- 'strcat' adds a terminating null byte to the string, thus if len ==
buflen, the null byte is written past the end of the buffer.
Signed-off-by: Fedor Tokarev <ftokarev@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/addr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/addr.c b/net/sunrpc/addr.c
index 2e0a6f92e563..8391c2785550 100644
--- a/net/sunrpc/addr.c
+++ b/net/sunrpc/addr.c
@@ -81,11 +81,11 @@ static size_t rpc_ntop6(const struct sockaddr *sap,
rc = snprintf(scopebuf, sizeof(scopebuf), "%c%u",
IPV6_SCOPE_DELIMITER, sin6->sin6_scope_id);
- if (unlikely((size_t)rc > sizeof(scopebuf)))
+ if (unlikely((size_t)rc >= sizeof(scopebuf)))
return 0;
len += rc;
- if (unlikely(len > buflen))
+ if (unlikely(len >= buflen))
return 0;
strcat(buf, scopebuf);
--
2.25.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox