* Re: [PATCH v3] tools: bpftool: fix reading from /proc/config.gz
From: Quentin Monnet @ 2019-08-09 17:44 UTC (permalink / raw)
To: Peter Wu, Alexei Starovoitov, Daniel Borkmann
Cc: netdev, Stanislav Fomichev, Jakub Kicinski
In-Reply-To: <20190809003911.7852-1-peter@lekensteyn.nl>
2019-08-09 01:39 UTC+0100 ~ Peter Wu <peter@lekensteyn.nl>
> /proc/config has never existed as far as I can see, but /proc/config.gz
> is present on Arch Linux. Add support for decompressing config.gz using
> zlib which is a mandatory dependency of libelf. Replace existing stdio
> functions with gzFile operations since the latter transparently handles
> uncompressed and gzip-compressed files.
>
> Cc: Quentin Monnet <quentin.monnet@netronome.com>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
> v3: replace popen(gunzip) by linking directly to zlib. Reword commit
> message, remove "Fixes" line. (this patch)
> v2: fix style (reorder vars as reverse xmas tree, rename function,
> braces), fallback to /proc/config.gz if uname() fails.
> https://lkml.kernel.org/r/20190806010702.3303-1-peter@lekensteyn.nl
> v1: https://lkml.kernel.org/r/20190805001541.8096-1-peter@lekensteyn.nl
>
> Hi,
>
> Thanks to Jakub for observing that zlib is already used by libelf, this
> simplifies the patch tremendously as the same API can be used for both
> compressed and uncompressed files. No special case exists anymore for
> fclose/pclose.
>
> According to configure.ac in elfutils, zlib is mandatory, so I just
> assume it to be available. For simplicity I also silently assume lines
> to be less than 4096 characters. If that is not the case, then lines
> will appear truncated, but that should not be an issue for the
> CONFIG_xyz lines that we are scanning for.
>
> Jakub requested the handle leak fix to be posted separately against the
> bpf tree, but since the whole code is rewritten I am not sure if it is
> worth it. It is an unusual edge case: /boot/config-$(uname -r) could be
> opened, but starts with unexpected data.
>
> Kind regards,
> Peter
This version seems good to me, thanks!
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
^ permalink raw reply
* Re: [PATCH v3 net-next 0/3] net: batched receive in GRO path
From: Edward Cree @ 2019-08-09 17:32 UTC (permalink / raw)
To: Ioana Ciocoi Radulescu
Cc: David Miller, netdev, Eric Dumazet,
linux-net-drivers@solarflare.com
In-Reply-To: <AM0PR04MB4994C1A8F32FB6C9A7EE057E94D60@AM0PR04MB4994.eurprd04.prod.outlook.com>
On 09/08/2019 18:14, Ioana Ciocoi Radulescu wrote:
> Hi Edward,
>
> I'm probably missing a lot of context here, but is there a reason
> this change targets only the napi_gro_frags() path and not the
> napi_gro_receive() one?
> I'm trying to understand what drivers that don't call napi_gro_frags()
> should do in order to benefit from this batching feature.
The sfc driver (which is what I have lots of hardware for, so I can
test it) uses napi_gro_frags().
It should be possible to do a similar patch to napi_gro_receive(),
if someone wants to put in the effort of writing and testing it.
However, there are many more callers, so more effort required to
make sure none of them care whether the return value is GRO_DROP
or GRO_NORMAL (since the listified version cannot give that
indication).
Also, the guidance from Eric is that drivers seeking high performance
should use napi_gro_frags(), as this allows GRO to recycle the SKB.
All of this together means I don't plan to submit such a patch; I
would however be happy to review a patch if someone else writes one.
HTH,
-Ed
^ permalink raw reply
* Re: KASAN: use-after-free Read in tomoyo_socket_sendmsg_permission
From: Cong Wang @ 2019-08-09 17:29 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Tetsuo Handa, syzbot, syzkaller-bugs, Ralf Baechle, linux-hams,
LKML, netdev
In-Reply-To: <CACT4Y+Y7d29kA1fpS13QvSopknuChPANRc9evxeWiJd-zkyNug@mail.gmail.com>
On Fri, Aug 9, 2019 at 1:53 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Fri, Aug 9, 2019 at 12:08 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > On 2019/08/09 1:45, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit: 107e47cc vrf: make sure skb->data contains ip header to ma..
> > > git tree: net
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=139506d8600000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=4dba67bf8b8c9ad7
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=b91501546ab4037f685f
> > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > This is not TOMOYO's bug. LSM modules expect that "struct sock" does not go away.
> >
> > Also, another use-after-free (presumably on the same "struct sock") was concurrently
> > inflight at nr_insert_socket() in net/netrom/af_netrom.c . Thus, suspecting netrom's bug.
>
> There is a number of UAFs/refcount bugs in nr sockets lately. Most
> likely it's the same issue them. Most of them were bisected to:
>
> commit c8c8218ec5af5d2598381883acbefbf604e56b5e
> Date: Thu Jun 27 21:30:58 2019 +0000
> netrom: fix a memory leak in nr_rx_frame()
The UAF introduced by this commit has been fixed. There is
another UAF in netrom which exists long before the above commit,
it is not fixed. The last time I looked at it, it seems related to the
state machine used by netrom sockets, so it is not easy.
Thanks,
^ permalink raw reply
* [PATCH] brcmfmac: remove redundant assignment to pointer hash
From: Colin King @ 2019-08-09 17:22 UTC (permalink / raw)
To: Arend van Spriel, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
Kalle Valo, David S . Miller, linux-wireless,
brcm80211-dev-list.pdl, brcm80211-dev-list, netdev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
The pointer hash is being initialized with a value that is never read
and is being re-assigned a little later on. The assignment is
redundant and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c
index 8428be8b8d43..e3dd8623be4e 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/msgbuf.c
@@ -1468,7 +1468,6 @@ static int brcmf_msgbuf_stats_read(struct seq_file *seq, void *data)
seq_printf(seq, "\nh2d_flowrings: depth %u\n",
BRCMF_H2D_TXFLOWRING_MAX_ITEM);
seq_puts(seq, "Active flowrings:\n");
- hash = msgbuf->flow->hash;
for (i = 0; i < msgbuf->flow->nrofrings; i++) {
if (!msgbuf->flow->rings[i])
continue;
--
2.20.1
^ permalink raw reply related
* KASAN: use-after-free Read in rxrpc_send_keepalive
From: syzbot @ 2019-08-09 17:22 UTC (permalink / raw)
To: davem, dhowells, linux-afs, linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: b678c568 Merge tag 'nfs-for-5.3-2' of git://git.linux-nfs...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10ea5e36600000
kernel config: https://syzkaller.appspot.com/x/.config?x=a4c9e9f08e9e8960
dashboard link: https://syzkaller.appspot.com/bug?extid=d850c266e3df14da1d31
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
Unfortunately, I don't have any reproducer for this crash yet.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d850c266e3df14da1d31@syzkaller.appspotmail.com
==================================================================
BUG: KASAN: use-after-free in rxrpc_send_keepalive+0x8a2/0x940
net/rxrpc/output.c:635
Read of size 8 at addr ffff888064219698 by task kworker/0:3/11077
CPU: 0 PID: 11077 Comm: kworker/0:3 Not tainted 5.3.0-rc3+ #96
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
__kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
kasan_report+0x12/0x17 mm/kasan/common.c:612
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
rxrpc_send_keepalive+0x8a2/0x940 net/rxrpc/output.c:635
rxrpc_peer_keepalive_dispatch net/rxrpc/peer_event.c:369 [inline]
rxrpc_peer_keepalive_worker+0x7be/0xd02 net/rxrpc/peer_event.c:430
process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Allocated by task 20465:
save_stack+0x23/0x90 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_kmalloc mm/kasan/common.c:487 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:460
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:501
kmem_cache_alloc_trace+0x158/0x790 mm/slab.c:3550
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:748 [inline]
rxrpc_alloc_local net/rxrpc/local_object.c:79 [inline]
rxrpc_lookup_local+0x64c/0x1b70 net/rxrpc/local_object.c:279
rxrpc_sendmsg+0x379/0x5f0 net/rxrpc/af_rxrpc.c:566
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
__sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 0:
save_stack+0x23/0x90 mm/kasan/common.c:69
set_track mm/kasan/common.c:77 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:449
kasan_slab_free+0xe/0x10 mm/kasan/common.c:457
__cache_free mm/slab.c:3425 [inline]
kfree+0x10a/0x2c0 mm/slab.c:3756
rxrpc_local_rcu+0x62/0x80 net/rxrpc/local_object.c:471
__rcu_reclaim kernel/rcu/rcu.h:222 [inline]
rcu_do_batch kernel/rcu/tree.c:2114 [inline]
rcu_core+0x67f/0x1580 kernel/rcu/tree.c:2314
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
__do_softirq+0x262/0x98c kernel/softirq.c:292
The buggy address belongs to the object at ffff888064219680
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
1024-byte region [ffff888064219680, ffff888064219a80)
The buggy address belongs to the page:
page:ffffea0001908600 refcount:1 mapcount:0 mapping:ffff8880aa400c40
index:0xffff888064218480 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea00025f5a08 ffffea00028fca08 ffff8880aa400c40
raw: ffff888064218480 ffff888064218000 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888064219580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888064219600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff888064219680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888064219700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888064219780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply
* Re: [PATCH ethtool] ethtool: dump nested registers
From: Michal Kubecek @ 2019-08-09 17:21 UTC (permalink / raw)
To: netdev
Cc: John W. Linville, Vivien Didelot, f.fainelli, andrew, davem,
linville, cphealy
In-Reply-To: <20190809162336.GB8016@tuxdriver.com>
On Fri, Aug 09, 2019 at 12:23:36PM -0400, John W. Linville wrote:
> On Fri, Aug 02, 2019 at 03:34:54PM -0400, Vivien Didelot wrote:
> > Usually kernel drivers set the regs->len value to the same length as
> > info->regdump_len, which was used for the allocation. In case where
> > regs->len is smaller than the allocated info->regdump_len length,
> > we may assume that the dump contains a nested set of registers.
> >
> > This becomes handy for kernel drivers to expose registers of an
> > underlying network conduit unfortunately not exposed to userspace,
> > as found in network switching equipment for example.
> >
> > This patch adds support for recursing into the dump operation if there
> > is enough room for a nested ethtool_drvinfo structure containing a
> > valid driver name, followed by a ethtool_regs structure like this:
> >
> > 0 regs->len info->regdump_len
> > v v v
> > +--------------+-----------------+--------------+-- - --+
> > | ethtool_regs | ethtool_drvinfo | ethtool_regs | |
> > +--------------+-----------------+--------------+-- - --+
> >
> > Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
>
> I wasn't sure what to do with this one, given the disucssion that
> followed. But since Dave merged "net: dsa: dump CPU port regs through
> master" for net-next, I went ahead and queued this one for the next
> release. If that was the wrong thing to do, speak-up now!
I'm certainly not happy about it as I'm afraid it's going to give me
a headache one day. But as long as the driver packs CPU port registers
into the device's register dump, it doesn't make sense not to allow
ethtool to parse them. And I'm not sure my objections are strong enough
to request a revert of the kernel part as I'm not sure the idea of
transforming the register dump into a structured format (an actual list
of registers rather than an opaque data block) is feasible at all.
So let's keep the patch in.
Michal
^ permalink raw reply
* RE: [PATCH v3 net-next 0/3] net: batched receive in GRO path
From: Ioana Ciocoi Radulescu @ 2019-08-09 17:14 UTC (permalink / raw)
To: Edward Cree, David Miller
Cc: netdev, Eric Dumazet, linux-net-drivers@solarflare.com
In-Reply-To: <c6e2474e-2c8a-5881-86bf-59c66bdfc34f@solarflare.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Edward Cree
> Sent: Tuesday, August 6, 2019 4:52 PM
> To: David Miller <davem@davemloft.net>
> Cc: netdev <netdev@vger.kernel.org>; Eric Dumazet
> <eric.dumazet@gmail.com>; linux-net-drivers@solarflare.com
> Subject: [PATCH v3 net-next 0/3] net: batched receive in GRO path
>
> This series listifies part of GRO processing, in a manner which allows those
> packets which are not GROed (i.e. for which dev_gro_receive returns
> GRO_NORMAL) to be passed on to the listified regular receive path.
> dev_gro_receive() itself is not listified, nor the per-protocol GRO
> callback, since GRO's need to hold packets on lists under napi->gro_hash
> makes keeping the packets on other lists awkward, and since the GRO control
> block state of held skbs can refer only to one 'new' skb at a time.
> Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb
> onto a list in the napi struct, which is received at the end of the napi
> poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch.
Hi Edward,
I'm probably missing a lot of context here, but is there a reason
this change targets only the napi_gro_frags() path and not the
napi_gro_receive() one?
I'm trying to understand what drivers that don't call napi_gro_frags()
should do in order to benefit from this batching feature.
Thanks,
Ioana
^ permalink raw reply
* Re: [PATCH v2 08/17] nfp: no need to check return value of debugfs_create functions
From: Jakub Kicinski @ 2019-08-09 17:10 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: netdev, David S. Miller, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Edwin Peer, Yangtao Li,
Simon Horman, oss-drivers
In-Reply-To: <20190809123108.27065-9-gregkh@linuxfoundation.org>
On Fri, 9 Aug 2019 14:30:59 +0200, Greg Kroah-Hartman wrote:
> When calling debugfs functions, there is no need to ever check the
> return value. The function can work or not, but the code logic should
> never do something different based on this.
>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Edwin Peer <edwin.peer@netronome.com>
> Cc: Yangtao Li <tiny.windzz@gmail.com>
> Cc: Simon Horman <simon.horman@netronome.com>
> Cc: oss-drivers@netronome.com
> Cc: netdev@vger.kernel.org
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Still
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
:) Thanks!
^ permalink raw reply
* [PATCH][net-next] rxrpc: fix uninitialized return value in variable err
From: Colin King @ 2019-08-09 17:02 UTC (permalink / raw)
To: David Howells, David S . Miller, linux-afs, netdev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
An earlier commit removed the setting of err to -ENOMEM so currently
the skb_shinfo(skb)->nr_frags > 16 check returns with an uninitialized
bogus return code. Fix this by setting err to -ENOMEM to restore
the original behaviour.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: b214b2d8f277 ("rxrpc: Don't use skb_cow_data() in rxkad")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
net/rxrpc/rxkad.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 8b4cddd8b673..c810a7c43b0f 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -248,8 +248,10 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,
crypto_skcipher_encrypt(req);
/* we want to encrypt the skbuff in-place */
- if (skb_shinfo(skb)->nr_frags > 16)
+ if (skb_shinfo(skb)->nr_frags > 16) {
+ err = -ENOMEM;
goto out;
+ }
len = data_size + call->conn->size_align - 1;
len &= ~(call->conn->size_align - 1);
--
2.20.1
^ permalink raw reply related
* AW: tcan4x5x on a Raspberry Pi
From: FIXED-TERM Buecheler Konstantin (ETAS-SEC/ECT-Mu) @ 2019-08-09 16:46 UTC (permalink / raw)
To: Dan Murphy, linux-can@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <d1badcdb-7635-705d-35d5-448297e8fafa@ti.com>
> Konstantin
>> On 7/29/19 6:19 AM, FIXED-TERM Buecheler Konstantin (ETAS-SEC/ECT-Mu) wrote:
>> Hi all,
>>
>> I am currently working on a project where I am trying to use the tcan4550 chip with a Raspberry PI 3B.
>> I am struggling to create a working device tree overlay file for the Raspberry Pi.
>> Has anyone here tried this already? I would appreciate any help.
> Are you using the driver from net-next?
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/net/can/m_can
Yes, I am using the driver from net-next.
> DT documentation here
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/Documentation/devicetree/bindings/net/can/tcan4x5x.txt
I saw this documentation but it didn’t help much (As I said, I don’t have much experience with device trees) . My dts file currently looks like this:
/dts-v1/;
/plugin/;
/ {
compatible = "brcm,bcm2835", "brcm,bcm2836", "brcm,bcm2708", "brcm,bcm2709";
fragment@0 {
target = <&spi0>;
__overlay__ {
status = "okay";
spidev@0{
status = "disabled";
};
};
};
fragment@2 {
compatible = "bosch, m_can";
target = <&spi0>;
__overlay__ {
tcan4x5x: tcan4x5x@0 {
compatible = "ti,tcan4x5x";
reg = <0>;
#address-cells = <1>;
#size-cells = <1>;
spi-max-frequency = <10000000>;
bosch,mram-cfg = <0x0 0 0 32 0 0 1 1>;
data-ready-gpios = <&gpio 23 0>;
device-wake-gpios = <&gpio 24 1>;
};
};
};
};
Checking dmesg I always see these errors:
[ 5.409051] tcan4x5x spi0.0: no clock found
[ 5.409064] tcan4x5x spi0.0: no CAN clock source defined
[ 5.409125] tcan4x5x spi0.0: data-ready gpio not defined
[ 5.409135] tcan4x5x spi0.0: Probe failed, err=-22
I already fixed the clock issue once by doing something like this:
clocks = <&can0_osc>,
<&can0_osc>;
clock-names = "hclk", "cclk";
But that didn’t fix the " data-ready gpio not defined" error.
> I did the development on a BeagleBone Black.
> Dan
> Thanks,
> Konstantin
>
^ permalink raw reply
* Re: [PATCH v2 09/13] net: lpc-enet: fix printk format strings
From: Joe Perches @ 2019-08-09 16:30 UTC (permalink / raw)
To: Arnd Bergmann, soc
Cc: Vladimir Zapolskiy, Sylvain Lemieux, linux-arm-kernel,
linux-kernel, kbuild test robot, David S. Miller, netdev
In-Reply-To: <20190809144043.476786-10-arnd@arndb.de>
On Fri, 2019-08-09 at 16:40 +0200, Arnd Bergmann wrote:
> compile-testing this driver on other architectures showed
> multiple warnings:
>
> drivers/net/ethernet/nxp/lpc_eth.c: In function 'lpc_eth_drv_probe':
> drivers/net/ethernet/nxp/lpc_eth.c:1337:19: warning: format '%d' expects argument of type 'int', but argument 4 has type 'resource_size_t {aka long long unsigned int}' [-Wformat=]
>
> drivers/net/ethernet/nxp/lpc_eth.c:1342:19: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'dma_addr_t {aka long long unsigned int}' [-Wformat=]
>
> Use format strings that work on all architectures.
[]
> diff --git a/drivers/net/ethernet/nxp/lpc_eth.c b/drivers/net/ethernet/nxp/lpc_eth.c
[]
> @@ -1333,13 +1333,14 @@ static int lpc_eth_drv_probe(struct platform_device *pdev)
> pldat->dma_buff_base_p = dma_handle;
>
> netdev_dbg(ndev, "IO address space :%pR\n", res);
> - netdev_dbg(ndev, "IO address size :%d\n", resource_size(res));
> + netdev_dbg(ndev, "IO address size :%zd\n",
> + (size_t)resource_size(res));
Ideally all these would use %zu not %zd
^ permalink raw reply
* Re: [PATCH ethtool] ethtool: dump nested registers
From: John W. Linville @ 2019-08-09 16:23 UTC (permalink / raw)
To: Vivien Didelot; +Cc: netdev, f.fainelli, andrew, davem, linville, cphealy
In-Reply-To: <20190802193455.17126-1-vivien.didelot@gmail.com>
On Fri, Aug 02, 2019 at 03:34:54PM -0400, Vivien Didelot wrote:
> Usually kernel drivers set the regs->len value to the same length as
> info->regdump_len, which was used for the allocation. In case where
> regs->len is smaller than the allocated info->regdump_len length,
> we may assume that the dump contains a nested set of registers.
>
> This becomes handy for kernel drivers to expose registers of an
> underlying network conduit unfortunately not exposed to userspace,
> as found in network switching equipment for example.
>
> This patch adds support for recursing into the dump operation if there
> is enough room for a nested ethtool_drvinfo structure containing a
> valid driver name, followed by a ethtool_regs structure like this:
>
> 0 regs->len info->regdump_len
> v v v
> +--------------+-----------------+--------------+-- - --+
> | ethtool_regs | ethtool_drvinfo | ethtool_regs | |
> +--------------+-----------------+--------------+-- - --+
>
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
I wasn't sure what to do with this one, given the disucssion that
followed. But since Dave merged "net: dsa: dump CPU port regs through
master" for net-next, I went ahead and queued this one for the next
release. If that was the wrong thing to do, speak-up now!
Thanks,
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* Re: [PATCH ethtool] gitignore: ignore vim swapfiles and patches
From: John W. Linville @ 2019-08-09 16:21 UTC (permalink / raw)
To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20190729131003.1E301E0E3B@unicorn.suse.cz>
On Mon, Jul 29, 2019 at 03:10:03PM +0200, Michal Kubecek wrote:
> The .*.swp files are created by vim to hold the undo/redo log. Add them to
> .gitignore to prevent "git status" or "git gui" from showing them whenever
> some file is open in editor.
>
> Add also *.patch to hide patches created by e.g. "git format-patch".
>
> Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Queued for next release...thanks!
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: David Ahern @ 2019-08-09 16:14 UTC (permalink / raw)
To: Roopa Prabhu, Jiri Pirko
Cc: netdev, David Miller, Jakub Kicinski, Stephen Hemminger, dcbw,
Michal Kubecek, Andrew Lunn, parav, Saeed Mahameed, mlxsw
In-Reply-To: <CAJieiUj7nzHdRUjBpnfL5bKPszJL0b_hKjxpjM0RGd9ocF3EoA@mail.gmail.com>
On 8/9/19 9:40 AM, Roopa Prabhu wrote:
>>>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>>>> index ce2a623abb75..b36cfd83eb76 100644
>>>> --- a/include/uapi/linux/rtnetlink.h
>>>> +++ b/include/uapi/linux/rtnetlink.h
>>>> @@ -164,6 +164,13 @@ enum {
>>>> RTM_GETNEXTHOP,
>>>> #define RTM_GETNEXTHOP RTM_GETNEXTHOP
>>>>
>>>> + RTM_NEWALTIFNAME = 108,
>>>> +#define RTM_NEWALTIFNAME RTM_NEWALTIFNAME
>>>> + RTM_DELALTIFNAME,
>>>> +#define RTM_DELALTIFNAME RTM_DELALTIFNAME
>>>> + RTM_GETALTIFNAME,
>>>> +#define RTM_GETALTIFNAME RTM_GETALTIFNAME
>>>> +
>>>
>>> I might have missed the prior discussion, why do we need new commands
>>> ?. can't this simply be part of RTM_*LINK and we use RTM_SETLINK to
>>> set alternate names ?
>>
>> How? This is to add/remove. How do you suggest to to add/remove by
>> setlink?
>
> to that point, I am also not sure why we have a new API For multiple
> names. I mean why support more than two names (existing old name and
> a new name to remove the length limitation) ?
>
> Your patch series addresses a very important problem (we run into this
> limitation all the time and its hard to explain it to network
> operators) and
> its already unfortunate that we have to have more than one name
> because we cannot resize the existing one.
>
> The best we can do for simpler transition/management from user-space
> is to keep the api simple..
> ie keep it close to the management of existing link attributes. Hence
> the question.
>
> I assumed this would be like alias. A single new field that can be
> referenced in lieu of the old one.
>
> Your series is very useful to many of us...but when i think about
> changing our network manager to accommodate this, I am worried about
> how many apps will have to change.
> I agree they have to change regardless but now they will have to
> listen to yet another notification and msg format for names ?
>
> (apologies for joining the thread late and if i missed prior discussion on this)
I agree with Roopa. I do not understand why new RTM commands are needed.
The existing IFLA + ifinfomsg struct give more than enough ways to id
the device for adding / deleting an alternate name.
^ permalink raw reply
* Re: [PATCH v2 bpf-next] xdp: xdp_umem: fix umem pages mapping for 32bits systems
From: Daniel Borkmann @ 2019-08-09 16:12 UTC (permalink / raw)
To: Ivan Khoronzhuk, bjorn.topel, magnus.karlsson
Cc: davem, ast, john.fastabend, hawk, netdev, bpf, xdp-newbies,
linux-kernel
In-Reply-To: <20190808093803.4918-1-ivan.khoronzhuk@linaro.org>
On 8/8/19 11:38 AM, Ivan Khoronzhuk wrote:
> Use kmap instead of page_address as it's not always in low memory.
>
> Acked-by: Björn Töpel <bjorn.topel@intel.com>
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Applied, thanks!
^ permalink raw reply
* [PATCH bpf-next v2 4/4] selftests/bpf: add sockopt clone/inheritance test
From: Stanislav Fomichev @ 2019-08-09 16:10 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
Yonghong Song
In-Reply-To: <20190809161038.186678-1-sdf@google.com>
Add a test that calls setsockopt on the listener socket which triggers
BPF program. This BPF program writes to the sk storage and sets
clone flag. Make sure that sk storage is cloned for a newly
accepted connection.
We have two cloned maps in the tests to make sure we hit both cases
in bpf_sk_storage_clone: first element (sk_storage_alloc) and
non-first element(s) (selem_link_map).
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/progs/sockopt_inherit.c | 97 +++++++
.../selftests/bpf/test_sockopt_inherit.c | 253 ++++++++++++++++++
4 files changed, 353 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c
diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 90f70d2c7c22..60c9338cd9b4 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -42,4 +42,5 @@ xdping
test_sockopt
test_sockopt_sk
test_sockopt_multi
+test_sockopt_inherit
test_tcp_rtt
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3bd0f4a0336a..c875763a851a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_cgroup_storage test_select_reuseport test_section_names \
test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
- test_sockopt_multi test_tcp_rtt
+ test_sockopt_multi test_sockopt_inherit test_tcp_rtt
BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -110,6 +110,7 @@ $(OUTPUT)/test_cgroup_attach: cgroup_helpers.c
$(OUTPUT)/test_sockopt: cgroup_helpers.c
$(OUTPUT)/test_sockopt_sk: cgroup_helpers.c
$(OUTPUT)/test_sockopt_multi: cgroup_helpers.c
+$(OUTPUT)/test_sockopt_inherit: cgroup_helpers.c
$(OUTPUT)/test_tcp_rtt: cgroup_helpers.c
.PHONY: force
diff --git a/tools/testing/selftests/bpf/progs/sockopt_inherit.c b/tools/testing/selftests/bpf/progs/sockopt_inherit.c
new file mode 100644
index 000000000000..dede0fcd6102
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sockopt_inherit.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
+
+#define SOL_CUSTOM 0xdeadbeef
+#define CUSTOM_INHERIT1 0
+#define CUSTOM_INHERIT2 1
+#define CUSTOM_LISTENER 2
+
+struct sockopt_inherit {
+ __u8 val;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+ __type(key, int);
+ __type(value, struct sockopt_inherit);
+} cloned1_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+ __type(key, int);
+ __type(value, struct sockopt_inherit);
+} cloned2_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, struct sockopt_inherit);
+} listener_only_map SEC(".maps");
+
+static __inline struct sockopt_inherit *get_storage(struct bpf_sockopt *ctx)
+{
+ if (ctx->optname == CUSTOM_INHERIT1)
+ return bpf_sk_storage_get(&cloned1_map, ctx->sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ else if (ctx->optname == CUSTOM_INHERIT2)
+ return bpf_sk_storage_get(&cloned2_map, ctx->sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ else
+ return bpf_sk_storage_get(&listener_only_map, ctx->sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+}
+
+SEC("cgroup/getsockopt")
+int _getsockopt(struct bpf_sockopt *ctx)
+{
+ __u8 *optval_end = ctx->optval_end;
+ struct sockopt_inherit *storage;
+ __u8 *optval = ctx->optval;
+
+ if (ctx->level != SOL_CUSTOM)
+ return 1; /* only interested in SOL_CUSTOM */
+
+ if (optval + 1 > optval_end)
+ return 0; /* EPERM, bounds check */
+
+ storage = get_storage(ctx);
+ if (!storage)
+ return 0; /* EPERM, couldn't get sk storage */
+
+ ctx->retval = 0; /* Reset system call return value to zero */
+
+ optval[0] = storage->val;
+ ctx->optlen = 1;
+
+ return 1;
+}
+
+SEC("cgroup/setsockopt")
+int _setsockopt(struct bpf_sockopt *ctx)
+{
+ __u8 *optval_end = ctx->optval_end;
+ struct sockopt_inherit *storage;
+ __u8 *optval = ctx->optval;
+
+ if (ctx->level != SOL_CUSTOM)
+ return 1; /* only interested in SOL_CUSTOM */
+
+ if (optval + 1 > optval_end)
+ return 0; /* EPERM, bounds check */
+
+ storage = get_storage(ctx);
+ if (!storage)
+ return 0; /* EPERM, couldn't get sk storage */
+
+ storage->val = optval[0];
+ ctx->optlen = -1;
+
+ return 1;
+}
diff --git a/tools/testing/selftests/bpf/test_sockopt_inherit.c b/tools/testing/selftests/bpf/test_sockopt_inherit.c
new file mode 100644
index 000000000000..1bf699815b9b
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_sockopt_inherit.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <error.h>
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <pthread.h>
+
+#include <linux/filter.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#define CG_PATH "/sockopt_inherit"
+#define SOL_CUSTOM 0xdeadbeef
+#define CUSTOM_INHERIT1 0
+#define CUSTOM_INHERIT2 1
+#define CUSTOM_LISTENER 2
+
+static int connect_to_server(int server_fd)
+{
+ struct sockaddr_storage addr;
+ socklen_t len = sizeof(addr);
+ int fd;
+
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ if (fd < 0) {
+ log_err("Failed to create client socket");
+ return -1;
+ }
+
+ if (getsockname(server_fd, (struct sockaddr *)&addr, &len)) {
+ log_err("Failed to get server addr");
+ goto out;
+ }
+
+ if (connect(fd, (const struct sockaddr *)&addr, len) < 0) {
+ log_err("Fail to connect to server");
+ goto out;
+ }
+
+ return fd;
+
+out:
+ close(fd);
+ return -1;
+}
+
+static int verify_sockopt(int fd, int optname, const char *msg, char expected)
+{
+ socklen_t optlen = 1;
+ char buf = 0;
+ int err;
+
+ err = getsockopt(fd, SOL_CUSTOM, optname, &buf, &optlen);
+ if (err) {
+ log_err("%s: failed to call getsockopt", msg);
+ return 1;
+ }
+
+ printf("%s %d: got=0x%x ? expected=0x%x\n", msg, optname, buf, expected);
+
+ if (buf != expected) {
+ log_err("%s: unexpected getsockopt value %d != %d", msg,
+ buf, expected);
+ return 1;
+ }
+
+ return 0;
+}
+
+static void *server_thread(void *arg)
+{
+ struct sockaddr_storage addr;
+ socklen_t len = sizeof(addr);
+ int fd = *(int *)arg;
+ int client_fd;
+ int err = 0;
+
+ if (listen(fd, 1) < 0)
+ error(1, errno, "Failed to listed on socket");
+
+ err += verify_sockopt(fd, CUSTOM_INHERIT1, "listen", 1);
+ err += verify_sockopt(fd, CUSTOM_INHERIT2, "listen", 1);
+ err += verify_sockopt(fd, CUSTOM_LISTENER, "listen", 1);
+
+ client_fd = accept(fd, (struct sockaddr *)&addr, &len);
+ if (client_fd < 0)
+ error(1, errno, "Failed to accept client");
+
+ err += verify_sockopt(client_fd, CUSTOM_INHERIT1, "accept", 1);
+ err += verify_sockopt(client_fd, CUSTOM_INHERIT2, "accept", 1);
+ err += verify_sockopt(client_fd, CUSTOM_LISTENER, "accept", 0);
+
+ close(client_fd);
+
+ return (void *)(long)err;
+}
+
+static int start_server(void)
+{
+ struct sockaddr_in addr = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
+ };
+ char buf;
+ int err;
+ int fd;
+ int i;
+
+ fd = socket(AF_INET, SOCK_STREAM, 0);
+ if (fd < 0) {
+ log_err("Failed to create server socket");
+ return -1;
+ }
+
+ for (i = CUSTOM_INHERIT1; i <= CUSTOM_LISTENER; i++) {
+ buf = 0x01;
+ err = setsockopt(fd, SOL_CUSTOM, i, &buf, 1);
+ if (err) {
+ log_err("Failed to call setsockopt(%d)", i);
+ close(fd);
+ return -1;
+ }
+ }
+
+ if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
+ log_err("Failed to bind socket");
+ close(fd);
+ return -1;
+ }
+
+ return fd;
+}
+
+static int prog_attach(struct bpf_object *obj, int cgroup_fd, const char *title)
+{
+ enum bpf_attach_type attach_type;
+ enum bpf_prog_type prog_type;
+ struct bpf_program *prog;
+ int err;
+
+ err = libbpf_prog_type_by_name(title, &prog_type, &attach_type);
+ if (err) {
+ log_err("Failed to deduct types for %s BPF program", title);
+ return -1;
+ }
+
+ prog = bpf_object__find_program_by_title(obj, title);
+ if (!prog) {
+ log_err("Failed to find %s BPF program", title);
+ return -1;
+ }
+
+ err = bpf_prog_attach(bpf_program__fd(prog), cgroup_fd,
+ attach_type, 0);
+ if (err) {
+ log_err("Failed to attach %s BPF program", title);
+ return -1;
+ }
+
+ return 0;
+}
+
+static int run_test(int cgroup_fd)
+{
+ struct bpf_prog_load_attr attr = {
+ .file = "./sockopt_inherit.o",
+ };
+ int server_fd = -1, client_fd;
+ struct bpf_object *obj;
+ void *server_err;
+ pthread_t tid;
+ int ignored;
+ int err;
+
+ err = bpf_prog_load_xattr(&attr, &obj, &ignored);
+ if (err) {
+ log_err("Failed to load BPF object");
+ return -1;
+ }
+
+ err = prog_attach(obj, cgroup_fd, "cgroup/getsockopt");
+ if (err)
+ goto close_bpf_object;
+
+ err = prog_attach(obj, cgroup_fd, "cgroup/setsockopt");
+ if (err)
+ goto close_bpf_object;
+
+ server_fd = start_server();
+ if (server_fd < 0) {
+ err = -1;
+ goto close_bpf_object;
+ }
+
+ pthread_create(&tid, NULL, server_thread, (void *)&server_fd);
+
+ client_fd = connect_to_server(server_fd);
+ if (client_fd < 0) {
+ err = -1;
+ goto close_server_fd;
+ }
+
+ err += verify_sockopt(client_fd, CUSTOM_INHERIT1, "connect", 0);
+ err += verify_sockopt(client_fd, CUSTOM_INHERIT2, "connect", 0);
+ err += verify_sockopt(client_fd, CUSTOM_LISTENER, "connect", 0);
+
+ pthread_join(tid, &server_err);
+
+ err += (int)(long)server_err;
+
+ close(client_fd);
+
+close_server_fd:
+ close(server_fd);
+close_bpf_object:
+ bpf_object__close(obj);
+ return err;
+}
+
+int main(int args, char **argv)
+{
+ int cgroup_fd;
+ int err = EXIT_SUCCESS;
+
+ if (setup_cgroup_environment())
+ return err;
+
+ cgroup_fd = create_and_get_cgroup(CG_PATH);
+ if (cgroup_fd < 0)
+ goto cleanup_cgroup_env;
+
+ if (join_cgroup(CG_PATH))
+ goto cleanup_cgroup;
+
+ if (run_test(cgroup_fd))
+ err = EXIT_FAILURE;
+
+ printf("test_sockopt_inherit: %s\n",
+ err == EXIT_SUCCESS ? "PASSED" : "FAILED");
+
+cleanup_cgroup:
+ close(cgroup_fd);
+cleanup_cgroup_env:
+ cleanup_cgroup_environment();
+ return err;
+}
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH bpf-next v2 3/4] bpf: sync bpf.h to tools/
From: Stanislav Fomichev @ 2019-08-09 16:10 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
Yonghong Song
In-Reply-To: <20190809161038.186678-1-sdf@google.com>
Sync new sk storage clone flag.
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/include/uapi/linux/bpf.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..0ef594ac3899 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -337,6 +337,9 @@ enum bpf_attach_type {
#define BPF_F_RDONLY_PROG (1U << 7)
#define BPF_F_WRONLY_PROG (1U << 8)
+/* Clone map from listener for newly accepted socket */
+#define BPF_F_CLONE (1U << 9)
+
/* flags for BPF_PROG_QUERY */
#define BPF_F_QUERY_EFFECTIVE (1U << 0)
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH bpf-next v2 2/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-09 16:10 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
Yonghong Song
In-Reply-To: <20190809161038.186678-1-sdf@google.com>
Add new helper bpf_sk_storage_clone which optionally clones sk storage
and call it from sk_clone_lock.
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/net/bpf_sk_storage.h | 10 ++++
include/uapi/linux/bpf.h | 3 ++
net/core/bpf_sk_storage.c | 100 +++++++++++++++++++++++++++++++++--
net/core/sock.c | 9 ++--
4 files changed, 116 insertions(+), 6 deletions(-)
diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index b9dcb02e756b..8e4f831d2e52 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
extern const struct bpf_func_proto bpf_sk_storage_get_proto;
extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
+#else
+static inline int bpf_sk_storage_clone(const struct sock *sk,
+ struct sock *newsk)
+{
+ return 0;
+}
+#endif
+
#endif /* _BPF_SK_STORAGE_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..0ef594ac3899 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -337,6 +337,9 @@ enum bpf_attach_type {
#define BPF_F_RDONLY_PROG (1U << 7)
#define BPF_F_WRONLY_PROG (1U << 8)
+/* Clone map from listener for newly accepted socket */
+#define BPF_F_CLONE (1U << 9)
+
/* flags for BPF_PROG_QUERY */
#define BPF_F_QUERY_EFFECTIVE (1U << 0)
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 94c7f77ecb6b..584e08ee0ca3 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -12,6 +12,9 @@
static atomic_t cache_idx;
+#define SK_STORAGE_CREATE_FLAG_MASK \
+ (BPF_F_NO_PREALLOC | BPF_F_CLONE)
+
struct bucket {
struct hlist_head list;
raw_spinlock_t lock;
@@ -209,7 +212,6 @@ static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
kfree_rcu(sk_storage, rcu);
}
-/* sk_storage->lock must be held and sk_storage->list cannot be empty */
static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
struct bpf_sk_storage_elem *selem)
{
@@ -509,7 +511,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
return 0;
}
-/* Called by __sk_destruct() */
+/* Called by __sk_destruct() & bpf_sk_storage_clone() */
void bpf_sk_storage_free(struct sock *sk)
{
struct bpf_sk_storage_elem *selem;
@@ -557,6 +559,11 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
smap = (struct bpf_sk_storage_map *)map;
+ /* Note that this map might be concurrently cloned from
+ * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
+ * RCU read section to finish before proceeding. New RCU
+ * read sections should be prevented via bpf_map_inc_not_zero.
+ */
synchronize_rcu();
/* bpf prog and the userspace can no longer access this map
@@ -601,7 +608,8 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
{
- if (attr->map_flags != BPF_F_NO_PREALLOC || attr->max_entries ||
+ if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
+ attr->max_entries ||
attr->key_size != sizeof(int) || !attr->value_size ||
/* Enforce BTF for userspace sk dumping */
!attr->btf_key_type_id || !attr->btf_value_type_id)
@@ -739,6 +747,92 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
return err;
}
+static struct bpf_sk_storage_elem *
+bpf_sk_storage_clone_elem(struct sock *newsk,
+ struct bpf_sk_storage_map *smap,
+ struct bpf_sk_storage_elem *selem)
+{
+ struct bpf_sk_storage_elem *copy_selem;
+
+ copy_selem = selem_alloc(smap, newsk, NULL, true);
+ if (!copy_selem)
+ return NULL;
+
+ if (map_value_has_spin_lock(&smap->map))
+ copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
+ SDATA(selem)->data, true);
+ else
+ copy_map_value(&smap->map, SDATA(copy_selem)->data,
+ SDATA(selem)->data);
+
+ return copy_selem;
+}
+
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
+{
+ struct bpf_sk_storage *new_sk_storage = NULL;
+ struct bpf_sk_storage *sk_storage;
+ struct bpf_sk_storage_elem *selem;
+ int ret;
+
+ RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
+
+ rcu_read_lock();
+ sk_storage = rcu_dereference(sk->sk_bpf_storage);
+
+ if (!sk_storage || hlist_empty(&sk_storage->list))
+ goto out;
+
+ hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
+ struct bpf_sk_storage_elem *copy_selem;
+ struct bpf_sk_storage_map *smap;
+ struct bpf_map *map;
+ int refold;
+
+ smap = rcu_dereference(SDATA(selem)->smap);
+ if (!(smap->map.map_flags & BPF_F_CLONE))
+ continue;
+
+ map = bpf_map_inc_not_zero(&smap->map, false);
+ if (IS_ERR(map))
+ continue;
+
+ copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
+ if (!copy_selem) {
+ ret = -ENOMEM;
+ bpf_map_put(map);
+ goto err;
+ }
+
+ if (new_sk_storage) {
+ selem_link_map(smap, copy_selem);
+ __selem_link_sk(new_sk_storage, copy_selem);
+ } else {
+ ret = sk_storage_alloc(newsk, smap, copy_selem);
+ if (ret) {
+ kfree(copy_selem);
+ atomic_sub(smap->elem_size,
+ &newsk->sk_omem_alloc);
+ bpf_map_put(map);
+ goto err;
+ }
+
+ new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+ }
+ bpf_map_put(map);
+ }
+
+out:
+ rcu_read_unlock();
+ return 0;
+
+err:
+ rcu_read_unlock();
+
+ bpf_sk_storage_free(newsk);
+ return ret;
+}
+
BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
void *, value, u64, flags)
{
diff --git a/net/core/sock.c b/net/core/sock.c
index d57b0cc995a0..f5e801a9cea4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
goto out;
}
RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
-#ifdef CONFIG_BPF_SYSCALL
- RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
-#endif
+
+ if (bpf_sk_storage_clone(sk, newsk)) {
+ sk_free_unlock_clone(newsk);
+ newsk = NULL;
+ goto out;
+ }
newsk->sk_err = 0;
newsk->sk_err_soft = 0;
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH bpf-next v2 1/4] bpf: export bpf_map_inc_not_zero
From: Stanislav Fomichev @ 2019-08-09 16:10 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
Yonghong Song
In-Reply-To: <20190809161038.186678-1-sdf@google.com>
Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to
indicate that it's caller's responsibility to do proper locking.
Create and export bpf_map_inc_not_zero wrapper that properly
locks map_idr_lock. Will be used in the next commit to
hold a map while cloning a socket.
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/linux/bpf.h | 2 ++
kernel/bpf/syscall.c | 16 +++++++++++++---
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f9a506147c8a..15ae49862b82 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -647,6 +647,8 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
struct bpf_map *__bpf_map_get(struct fd f);
struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
+struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map,
+ bool uref);
void bpf_map_put_with_uref(struct bpf_map *map);
void bpf_map_put(struct bpf_map *map);
int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5d141f16f6fa..cf8052b016e7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -683,8 +683,8 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
}
/* map_idr_lock should have been held */
-static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
- bool uref)
+static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map,
+ bool uref)
{
int refold;
@@ -704,6 +704,16 @@ static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
return map;
}
+struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
+{
+ spin_lock_bh(&map_idr_lock);
+ map = __bpf_map_inc_not_zero(map, uref);
+ spin_unlock_bh(&map_idr_lock);
+
+ return map;
+}
+EXPORT_SYMBOL_GPL(bpf_map_inc_not_zero);
+
int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
{
return -ENOTSUPP;
@@ -2177,7 +2187,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
spin_lock_bh(&map_idr_lock);
map = idr_find(&map_idr, id);
if (map)
- map = bpf_map_inc_not_zero(map, true);
+ map = __bpf_map_inc_not_zero(map, true);
else
map = ERR_PTR(-ENOENT);
spin_unlock_bh(&map_idr_lock);
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply related
* [PATCH bpf-next v2 0/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-09 16:10 UTC (permalink / raw)
To: netdev, bpf
Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
Yonghong Song
Currently there is no way to propagate sk storage from the listener
socket to a newly accepted one. Consider the following use case:
fd = socket();
setsockopt(fd, SOL_IP, IP_TOS,...);
/* ^^^ setsockopt BPF program triggers here and saves something
* into sk storage of the listener.
*/
listen(fd, ...);
while (client = accept(fd)) {
/* At this point all association between listener
* socket and newly accepted one is gone. New
* socket will not have any sk storage attached.
*/
}
Let's add new BPF_F_CLONE flag that can be specified when creating
a socket storage map. This new flag indicates that map contents
should be cloned when the socket is cloned.
v2:
* remove spinlocks around selem_link_map/sk (Martin KaFai Lau)
* BPF_F_CLONE on a map, not selem (Martin KaFai Lau)
* hold a map while cloning (Martin KaFai Lau)
* use BTF maps in selftests (Yonghong Song)
* do proper cleanup selftests; don't call close(-1) (Yonghong Song)
* export bpf_map_inc_not_zero
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Stanislav Fomichev (4):
bpf: export bpf_map_inc_not_zero
bpf: support cloning sk storage on accept()
bpf: sync bpf.h to tools/
selftests/bpf: add sockopt clone/inheritance test
include/linux/bpf.h | 2 +
include/net/bpf_sk_storage.h | 10 +
include/uapi/linux/bpf.h | 3 +
kernel/bpf/syscall.c | 16 +-
net/core/bpf_sk_storage.c | 100 ++++++-
net/core/sock.c | 9 +-
tools/include/uapi/linux/bpf.h | 3 +
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/progs/sockopt_inherit.c | 97 +++++++
.../selftests/bpf/test_sockopt_inherit.c | 253 ++++++++++++++++++
11 files changed, 487 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c
--
2.23.0.rc1.153.gdeed80330f-goog
^ permalink raw reply
* Re: [bpf-next v3 PATCH 0/3] bpf: improvements to xdp_fwd sample
From: Daniel Borkmann @ 2019-08-09 16:08 UTC (permalink / raw)
To: Jesper Dangaard Brouer, netdev, Alexei Starovoitov
Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
ys114321
In-Reply-To: <156528102557.22124.261409336813472418.stgit@firesoul>
On 8/8/19 6:17 PM, Jesper Dangaard Brouer wrote:
> V3: Hopefully fixed all issues point out by Yonghong Song
>
> V2: Addressed issues point out by Yonghong Song
> - Please ACK patch 2/3 again
> - Added ACKs and reviewed-by to other patches
>
> This patchset is focused on improvements for XDP forwarding sample
> named xdp_fwd, which leverage the existing FIB routing tables as
> described in LPC2018[1] talk by David Ahern.
>
> The primary motivation is to illustrate how Toke's recent work
> improves usability of XDP_REDIRECT via lookups in devmap. The other
> patches are to help users understand the sample.
>
> I have more improvements to xdp_fwd, but those might requires changes
> to libbpf. Thus, sending these patches first as they are isolated.
>
> [1] http://vger.kernel.org/lpc-networking2018.html#session-1
>
> ---
>
> Jesper Dangaard Brouer (3):
> samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
> samples/bpf: make xdp_fwd more practically usable via devmap lookup
> samples/bpf: xdp_fwd explain bpf_fib_lookup return codes
>
>
> samples/bpf/xdp_fwd_kern.c | 39 ++++++++++++++++++++++++++++++---------
> samples/bpf/xdp_fwd_user.c | 35 +++++++++++++++++++++++------------
> 2 files changed, 53 insertions(+), 21 deletions(-)
>
> --
>
Applied, thanks!
^ permalink raw reply
* [PATCH net 2/2] rxrpc: Don't bother generating maxSkew in the ACK packet
From: David Howells @ 2019-08-09 16:06 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156536674651.17478.15139844428920800280.stgit@warthog.procyon.org.uk>
Don't bother generating maxSkew in the ACK packet as it has been obsolete
since AFS 3.1.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
---
net/rxrpc/af_rxrpc.c | 2 +-
net/rxrpc/ar-internal.h | 3 +--
net/rxrpc/call_event.c | 15 ++++++---------
net/rxrpc/input.c | 43 ++++++++++++++++---------------------------
net/rxrpc/output.c | 3 +--
net/rxrpc/recvmsg.c | 6 +++---
6 files changed, 28 insertions(+), 44 deletions(-)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 8c9bd3ae9edf..0dbbfd1b6487 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -402,7 +402,7 @@ EXPORT_SYMBOL(rxrpc_kernel_check_life);
*/
void rxrpc_kernel_probe_life(struct socket *sock, struct rxrpc_call *call)
{
- rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, 0, true, false,
+ rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, true, false,
rxrpc_propose_ack_ping_for_check_life);
rxrpc_send_ack_packet(call, true, NULL);
}
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 9796c45d2f6a..145335611af6 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -650,7 +650,6 @@ struct rxrpc_call {
/* receive-phase ACK management */
u8 ackr_reason; /* reason to ACK */
- u16 ackr_skew; /* skew on packet being ACK'd */
rxrpc_serial_t ackr_serial; /* serial of packet being ACK'd */
rxrpc_serial_t ackr_first_seq; /* first sequence number received */
rxrpc_seq_t ackr_prev_seq; /* previous sequence number received */
@@ -744,7 +743,7 @@ int rxrpc_reject_call(struct rxrpc_sock *);
/*
* call_event.c
*/
-void rxrpc_propose_ACK(struct rxrpc_call *, u8, u16, u32, bool, bool,
+void rxrpc_propose_ACK(struct rxrpc_call *, u8, u32, bool, bool,
enum rxrpc_propose_ack_trace);
void rxrpc_process_call(struct work_struct *);
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index bc2adeb3acb9..c767679bfa5d 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -43,8 +43,7 @@ static void rxrpc_propose_ping(struct rxrpc_call *call,
* propose an ACK be sent
*/
static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
- u16 skew, u32 serial, bool immediate,
- bool background,
+ u32 serial, bool immediate, bool background,
enum rxrpc_propose_ack_trace why)
{
enum rxrpc_propose_ack_outcome outcome = rxrpc_propose_ack_use;
@@ -69,14 +68,12 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
if (RXRPC_ACK_UPDATEABLE & (1 << ack_reason)) {
outcome = rxrpc_propose_ack_update;
call->ackr_serial = serial;
- call->ackr_skew = skew;
}
if (!immediate)
goto trace;
} else if (prior > rxrpc_ack_priority[call->ackr_reason]) {
call->ackr_reason = ack_reason;
call->ackr_serial = serial;
- call->ackr_skew = skew;
} else {
outcome = rxrpc_propose_ack_subsume;
}
@@ -137,11 +134,11 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
* propose an ACK be sent, locking the call structure
*/
void rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
- u16 skew, u32 serial, bool immediate, bool background,
+ u32 serial, bool immediate, bool background,
enum rxrpc_propose_ack_trace why)
{
spin_lock_bh(&call->lock);
- __rxrpc_propose_ACK(call, ack_reason, skew, serial,
+ __rxrpc_propose_ACK(call, ack_reason, serial,
immediate, background, why);
spin_unlock_bh(&call->lock);
}
@@ -239,7 +236,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
ack_ts = ktime_sub(now, call->acks_latest_ts);
if (ktime_to_ns(ack_ts) < call->peer->rtt)
goto out;
- rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, 0, true, false,
+ rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, true, false,
rxrpc_propose_ack_ping_for_lost_ack);
rxrpc_send_ack_packet(call, true, NULL);
goto out;
@@ -372,7 +369,7 @@ void rxrpc_process_call(struct work_struct *work)
if (time_after_eq(now, t)) {
trace_rxrpc_timer(call, rxrpc_timer_exp_keepalive, now);
cmpxchg(&call->keepalive_at, t, now + MAX_JIFFY_OFFSET);
- rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, 0, true, true,
+ rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, true, true,
rxrpc_propose_ack_ping_for_keepalive);
set_bit(RXRPC_CALL_EV_PING, &call->events);
}
@@ -407,7 +404,7 @@ void rxrpc_process_call(struct work_struct *work)
send_ack = NULL;
if (test_and_clear_bit(RXRPC_CALL_EV_ACK_LOST, &call->events)) {
call->acks_lost_top = call->tx_top;
- rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, 0, true, false,
+ rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, true, false,
rxrpc_propose_ack_ping_for_lost_ack);
send_ack = &call->acks_lost_ping;
}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index ee95d1cd1cdf..dd47d465d1d3 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -196,15 +196,14 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
* Ping the other end to fill our RTT cache and to retrieve the rwind
* and MTU parameters.
*/
-static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb,
- int skew)
+static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb)
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
ktime_t now = skb->tstamp;
if (call->peer->rtt_usage < 3 ||
ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), now))
- rxrpc_propose_ACK(call, RXRPC_ACK_PING, skew, sp->hdr.serial,
+ rxrpc_propose_ACK(call, RXRPC_ACK_PING, sp->hdr.serial,
true, true,
rxrpc_propose_ack_ping_for_params);
}
@@ -419,8 +418,7 @@ static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
/*
* Process a DATA packet, adding the packet to the Rx ring.
*/
-static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
- u16 skew)
+static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
enum rxrpc_call_state state;
@@ -600,11 +598,11 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
ack:
if (ack)
- rxrpc_propose_ACK(call, ack, skew, ack_serial,
+ rxrpc_propose_ACK(call, ack, ack_serial,
immediate_ack, true,
rxrpc_propose_ack_input_data);
else
- rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, skew, serial,
+ rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, serial,
false, true,
rxrpc_propose_ack_input_data);
@@ -822,8 +820,7 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
* soft-ACK means that the packet may be discarded and retransmission
* requested. A phase is complete when all packets are hard-ACK'd.
*/
-static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
- u16 skew)
+static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
{
struct rxrpc_ack_summary summary = { 0 };
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
@@ -867,11 +864,11 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
if (buf.ack.reason == RXRPC_ACK_PING) {
_proto("Rx ACK %%%u PING Request", sp->hdr.serial);
rxrpc_propose_ACK(call, RXRPC_ACK_PING_RESPONSE,
- skew, sp->hdr.serial, true, true,
+ sp->hdr.serial, true, true,
rxrpc_propose_ack_respond_to_ping);
} else if (sp->hdr.flags & RXRPC_REQUEST_ACK) {
rxrpc_propose_ACK(call, RXRPC_ACK_REQUESTED,
- skew, sp->hdr.serial, true, true,
+ sp->hdr.serial, true, true,
rxrpc_propose_ack_respond_to_ack);
}
@@ -948,7 +945,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
RXRPC_TX_ANNO_LAST &&
summary.nr_acks == call->tx_top - hard_ack &&
rxrpc_is_client_call(call))
- rxrpc_propose_ACK(call, RXRPC_ACK_PING, skew, sp->hdr.serial,
+ rxrpc_propose_ACK(call, RXRPC_ACK_PING, sp->hdr.serial,
false, true,
rxrpc_propose_ack_ping_for_lost_reply);
@@ -1004,7 +1001,7 @@ static void rxrpc_input_abort(struct rxrpc_call *call, struct sk_buff *skb)
* Process an incoming call packet.
*/
static void rxrpc_input_call_packet(struct rxrpc_call *call,
- struct sk_buff *skb, u16 skew)
+ struct sk_buff *skb)
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
unsigned long timo;
@@ -1023,11 +1020,11 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
switch (sp->hdr.type) {
case RXRPC_PACKET_TYPE_DATA:
- rxrpc_input_data(call, skb, skew);
+ rxrpc_input_data(call, skb);
break;
case RXRPC_PACKET_TYPE_ACK:
- rxrpc_input_ack(call, skb, skew);
+ rxrpc_input_ack(call, skb);
break;
case RXRPC_PACKET_TYPE_BUSY:
@@ -1181,7 +1178,6 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
struct rxrpc_peer *peer = NULL;
struct rxrpc_sock *rx = NULL;
unsigned int channel;
- int skew = 0;
_enter("%p", udp_sk);
@@ -1309,15 +1305,8 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
goto out;
}
- /* Note the serial number skew here */
- skew = (int)sp->hdr.serial - (int)conn->hi_serial;
- if (skew >= 0) {
- if (skew > 0)
- conn->hi_serial = sp->hdr.serial;
- } else {
- skew = -skew;
- skew = min(skew, 65535);
- }
+ if ((int)sp->hdr.serial - (int)conn->hi_serial > 0)
+ conn->hi_serial = sp->hdr.serial;
/* Call-bound packets are routed by connection channel. */
channel = sp->hdr.cid & RXRPC_CHANNELMASK;
@@ -1380,11 +1369,11 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
call = rxrpc_new_incoming_call(local, rx, skb);
if (!call)
goto reject_packet;
- rxrpc_send_ping(call, skb, skew);
+ rxrpc_send_ping(call, skb);
mutex_unlock(&call->user_mutex);
}
- rxrpc_input_call_packet(call, skb, skew);
+ rxrpc_input_call_packet(call, skb);
goto discard;
discard:
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 948e3fe249ec..369e516c4bdf 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -87,7 +87,7 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn,
*_top = top;
pkt->ack.bufferSpace = htons(8);
- pkt->ack.maxSkew = htons(call->ackr_skew);
+ pkt->ack.maxSkew = htons(0);
pkt->ack.firstPacket = htonl(hard_ack + 1);
pkt->ack.previousPacket = htonl(call->ackr_prev_seq);
pkt->ack.serial = htonl(serial);
@@ -228,7 +228,6 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, bool ping,
if (ping)
clear_bit(RXRPC_CALL_PINGING, &call->flags);
rxrpc_propose_ACK(call, pkt->ack.reason,
- ntohs(pkt->ack.maxSkew),
ntohl(pkt->ack.serial),
false, true,
rxrpc_propose_ack_retry_tx);
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 5abf46cf9e6c..9a7e1bc9791d 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -141,7 +141,7 @@ static void rxrpc_end_rx_phase(struct rxrpc_call *call, rxrpc_serial_t serial)
ASSERTCMP(call->rx_hard_ack, ==, call->rx_top);
if (call->state == RXRPC_CALL_CLIENT_RECV_REPLY) {
- rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, serial, false, true,
+ rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, serial, false, true,
rxrpc_propose_ack_terminal_ack);
//rxrpc_send_ack_packet(call, false, NULL);
}
@@ -159,7 +159,7 @@ static void rxrpc_end_rx_phase(struct rxrpc_call *call, rxrpc_serial_t serial)
call->state = RXRPC_CALL_SERVER_ACK_REQUEST;
call->expect_req_by = jiffies + MAX_JIFFY_OFFSET;
write_unlock_bh(&call->state_lock);
- rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, 0, serial, false, true,
+ rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, serial, false, true,
rxrpc_propose_ack_processing_op);
break;
default:
@@ -212,7 +212,7 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
if (after_eq(hard_ack, call->ackr_consumed + 2) ||
after_eq(top, call->ackr_seen + 2) ||
(hard_ack == top && after(hard_ack, call->ackr_consumed)))
- rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, 0, serial,
+ rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, serial,
true, true,
rxrpc_propose_ack_rotate_rx);
if (call->ackr_reason && call->ackr_reason != RXRPC_ACK_DELAY)
^ permalink raw reply related
* [PATCH net 1/2] rxrpc: Fix local endpoint refcounting
From: David Howells @ 2019-08-09 16:05 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156536674651.17478.15139844428920800280.stgit@warthog.procyon.org.uk>
The object lifetime management on the rxrpc_local struct is broken in that
the rxrpc_local_processor() function is expected to clean up and remove an
object - but it may get requeued by packets coming in on the backing UDP
socket once it starts running.
This may result in the assertion in rxrpc_local_rcu() firing because the
memory has been scheduled for RCU destruction whilst still queued:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:468!
Note that if the processor comes around before the RCU free function, it
will just do nothing because ->dead is true.
Fix this by adding a separate refcount to count active users of the
endpoint that causes the endpoint to be destroyed when it reaches 0.
The original refcount can then be used to refcount objects through the work
processor and cause the memory to be rcu freed when that reaches 0.
Fixes: 4f95dd78a77e ("rxrpc: Rework local endpoint management")
Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/af_rxrpc.c | 4 +-
net/rxrpc/ar-internal.h | 5 ++-
net/rxrpc/input.c | 16 ++++++---
net/rxrpc/local_object.c | 86 +++++++++++++++++++++++++++++-----------------
4 files changed, 72 insertions(+), 39 deletions(-)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index d09eaf153544..8c9bd3ae9edf 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -193,7 +193,7 @@ static int rxrpc_bind(struct socket *sock, struct sockaddr *saddr, int len)
service_in_use:
write_unlock(&local->services_lock);
- rxrpc_put_local(local);
+ rxrpc_unuse_local(local);
ret = -EADDRINUSE;
error_unlock:
release_sock(&rx->sk);
@@ -901,7 +901,7 @@ static int rxrpc_release_sock(struct sock *sk)
rxrpc_queue_work(&rxnet->service_conn_reaper);
rxrpc_queue_work(&rxnet->client_conn_reaper);
- rxrpc_put_local(rx->local);
+ rxrpc_unuse_local(rx->local);
rx->local = NULL;
key_put(rx->key);
rx->key = NULL;
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 822f45386e31..9796c45d2f6a 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -254,7 +254,8 @@ struct rxrpc_security {
*/
struct rxrpc_local {
struct rcu_head rcu;
- atomic_t usage;
+ atomic_t active_users; /* Number of users of the local endpoint */
+ atomic_t usage; /* Number of references to the structure */
struct rxrpc_net *rxnet; /* The network ns in which this resides */
struct list_head link;
struct socket *socket; /* my UDP socket */
@@ -1002,6 +1003,8 @@ struct rxrpc_local *rxrpc_lookup_local(struct net *, const struct sockaddr_rxrpc
struct rxrpc_local *rxrpc_get_local(struct rxrpc_local *);
struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *);
void rxrpc_put_local(struct rxrpc_local *);
+struct rxrpc_local *rxrpc_use_local(struct rxrpc_local *);
+void rxrpc_unuse_local(struct rxrpc_local *);
void rxrpc_queue_local(struct rxrpc_local *);
void rxrpc_destroy_all_locals(struct rxrpc_net *);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 5bd6f1546e5c..ee95d1cd1cdf 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1108,8 +1108,12 @@ static void rxrpc_post_packet_to_local(struct rxrpc_local *local,
{
_enter("%p,%p", local, skb);
- skb_queue_tail(&local->event_queue, skb);
- rxrpc_queue_local(local);
+ if (rxrpc_get_local_maybe(local)) {
+ skb_queue_tail(&local->event_queue, skb);
+ rxrpc_queue_local(local);
+ } else {
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ }
}
/*
@@ -1119,8 +1123,12 @@ static void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb)
{
CHECK_SLAB_OKAY(&local->usage);
- skb_queue_tail(&local->reject_queue, skb);
- rxrpc_queue_local(local);
+ if (rxrpc_get_local_maybe(local)) {
+ skb_queue_tail(&local->reject_queue, skb);
+ rxrpc_queue_local(local);
+ } else {
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ }
}
/*
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index b1c71bad510b..9798159ee65f 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -79,6 +79,7 @@ static struct rxrpc_local *rxrpc_alloc_local(struct rxrpc_net *rxnet,
local = kzalloc(sizeof(struct rxrpc_local), GFP_KERNEL);
if (local) {
atomic_set(&local->usage, 1);
+ atomic_set(&local->active_users, 1);
local->rxnet = rxnet;
INIT_LIST_HEAD(&local->link);
INIT_WORK(&local->processor, rxrpc_local_processor);
@@ -266,11 +267,8 @@ struct rxrpc_local *rxrpc_lookup_local(struct net *net,
* bind the transport socket may still fail if we're attempting
* to use a local address that the dying object is still using.
*/
- if (!rxrpc_get_local_maybe(local)) {
- cursor = cursor->next;
- list_del_init(&local->link);
+ if (!rxrpc_use_local(local))
break;
- }
age = "old";
goto found;
@@ -284,7 +282,10 @@ struct rxrpc_local *rxrpc_lookup_local(struct net *net,
if (ret < 0)
goto sock_error;
- list_add_tail(&local->link, cursor);
+ if (cursor != &rxnet->local_endpoints)
+ list_replace(cursor, &local->link);
+ else
+ list_add_tail(&local->link, cursor);
age = "new";
found:
@@ -342,7 +343,8 @@ struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *local)
}
/*
- * Queue a local endpoint.
+ * Queue a local endpoint unless it has become unreferenced and pass the
+ * caller's reference to the work item.
*/
void rxrpc_queue_local(struct rxrpc_local *local)
{
@@ -351,15 +353,8 @@ void rxrpc_queue_local(struct rxrpc_local *local)
if (rxrpc_queue_work(&local->processor))
trace_rxrpc_local(local, rxrpc_local_queued,
atomic_read(&local->usage), here);
-}
-
-/*
- * A local endpoint reached its end of life.
- */
-static void __rxrpc_put_local(struct rxrpc_local *local)
-{
- _enter("%d", local->debug_id);
- rxrpc_queue_work(&local->processor);
+ else
+ rxrpc_put_local(local);
}
/*
@@ -375,10 +370,45 @@ void rxrpc_put_local(struct rxrpc_local *local)
trace_rxrpc_local(local, rxrpc_local_put, n, here);
if (n == 0)
- __rxrpc_put_local(local);
+ call_rcu(&local->rcu, rxrpc_local_rcu);
}
}
+/*
+ * Start using a local endpoint.
+ */
+struct rxrpc_local *rxrpc_use_local(struct rxrpc_local *local)
+{
+ unsigned int au;
+
+ local = rxrpc_get_local_maybe(local);
+ if (!local)
+ return NULL;
+
+ au = atomic_fetch_add_unless(&local->active_users, 1, 0);
+ if (au == 0) {
+ rxrpc_put_local(local);
+ return NULL;
+ }
+
+ return local;
+}
+
+/*
+ * Cease using a local endpoint. Once the number of active users reaches 0, we
+ * start the closure of the transport in the work processor.
+ */
+void rxrpc_unuse_local(struct rxrpc_local *local)
+{
+ unsigned int au;
+
+ au = atomic_dec_return(&local->active_users);
+ if (au == 0)
+ rxrpc_queue_local(local);
+ else
+ rxrpc_put_local(local);
+}
+
/*
* Destroy a local endpoint's socket and then hand the record to RCU to dispose
* of.
@@ -393,16 +423,6 @@ static void rxrpc_local_destroyer(struct rxrpc_local *local)
_enter("%d", local->debug_id);
- /* We can get a race between an incoming call packet queueing the
- * processor again and the work processor starting the destruction
- * process which will shut down the UDP socket.
- */
- if (local->dead) {
- _leave(" [already dead]");
- return;
- }
- local->dead = true;
-
mutex_lock(&rxnet->local_mutex);
list_del_init(&local->link);
mutex_unlock(&rxnet->local_mutex);
@@ -422,13 +442,11 @@ static void rxrpc_local_destroyer(struct rxrpc_local *local)
*/
rxrpc_purge_queue(&local->reject_queue);
rxrpc_purge_queue(&local->event_queue);
-
- _debug("rcu local %d", local->debug_id);
- call_rcu(&local->rcu, rxrpc_local_rcu);
}
/*
- * Process events on an endpoint
+ * Process events on an endpoint. The work item carries a ref which
+ * we must release.
*/
static void rxrpc_local_processor(struct work_struct *work)
{
@@ -441,8 +459,10 @@ static void rxrpc_local_processor(struct work_struct *work)
do {
again = false;
- if (atomic_read(&local->usage) == 0)
- return rxrpc_local_destroyer(local);
+ if (atomic_read(&local->active_users) == 0) {
+ rxrpc_local_destroyer(local);
+ break;
+ }
if (!skb_queue_empty(&local->reject_queue)) {
rxrpc_reject_packets(local);
@@ -454,6 +474,8 @@ static void rxrpc_local_processor(struct work_struct *work)
again = true;
}
} while (again);
+
+ rxrpc_put_local(local);
}
/*
^ permalink raw reply related
* [PATCH net 0/2] rxrpc: Fixes
From: David Howells @ 2019-08-09 16:05 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
Here's a couple of fixes for rxrpc:
(1) Fix refcounting of the local endpoint.
(2) Don't calculate or report packet skew information. This has been
obsolete since AFS 3.1 and so is a waste of resources.
The patches are tagged here:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20190809
and can also be found on the following branch:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
David
---
David Howells (2):
rxrpc: Fix local endpoint refcounting
rxrpc: Don't bother generating maxSkew in the ACK packet
net/rxrpc/af_rxrpc.c | 6 ++-
net/rxrpc/ar-internal.h | 8 +++-
net/rxrpc/call_event.c | 15 +++-----
net/rxrpc/input.c | 59 +++++++++++++++-----------------
net/rxrpc/local_object.c | 86 +++++++++++++++++++++++++++++-----------------
net/rxrpc/output.c | 3 +-
net/rxrpc/recvmsg.c | 6 ++-
7 files changed, 100 insertions(+), 83 deletions(-)
^ permalink raw reply
* Re: [PATCH] perf trace: Fix segmentation fault when access syscall info
From: Arnaldo Carvalho de Melo @ 2019-08-09 16:03 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
Namhyung Kim, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, linux-kernel, netdev, bpf
In-Reply-To: <20190809134431.GE8313@leoy-ThinkPad-X240s>
Em Fri, Aug 09, 2019 at 09:44:31PM +0800, Leo Yan escreveu:
> On Fri, Aug 09, 2019 at 10:25:22AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Aug 09, 2019 at 06:47:52PM +0800, Leo Yan escreveu:
> > > 'perf trace' reports the segmentation fault as below on Arm64:
> > >
> > > # perf trace -e string -e augmented_raw_syscalls.c
> > > LLVM: dumping tools/perf/examples/bpf/augmented_raw_syscalls.o
> > > perf: Segmentation fault
> > > Obtained 12 stack frames.
> > > perf(sighandler_dump_stack+0x47) [0xaaaaac96ac87]
> > > linux-vdso.so.1(+0x5b7) [0xffffadbeb5b7]
> > > /lib/aarch64-linux-gnu/libc.so.6(strlen+0x10) [0xfffface7d5d0]
> > > /lib/aarch64-linux-gnu/libc.so.6(_IO_vfprintf+0x1ac7) [0xfffface49f97]
> > > /lib/aarch64-linux-gnu/libc.so.6(__vsnprintf_chk+0xc7) [0xffffacedfbe7]
> > > perf(scnprintf+0x97) [0xaaaaac9ca3ff]
> > > perf(+0x997bb) [0xaaaaac8e37bb]
> > > perf(cmd_trace+0x28e7) [0xaaaaac8ec09f]
> > > perf(+0xd4a13) [0xaaaaac91ea13]
> > > perf(main+0x62f) [0xaaaaac8a147f]
> > > /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe3) [0xfffface22d23]
> > > perf(+0x57723) [0xaaaaac8a1723]
> > > Segmentation fault
> > >
> > > This issue is introduced by commit 30a910d7d3e0 ("perf trace:
> > > Preallocate the syscall table"), it allocates trace->syscalls.table[]
> > > array and the element count is 'trace->sctbl->syscalls.nr_entries';
> > > but on Arm64, the system call number is not continuously used; e.g. the
> > > syscall maximum id is 436 but the real entries is only 281. So the
> > > table is allocated with 'nr_entries' as the element count, but it
> > > accesses the table with the syscall id, which might be out of the bound
> > > of the array and cause the segmentation fault.
> > >
> > > This patch allocates trace->syscalls.table[] with the element count is
> > > 'trace->sctbl->syscalls.max_id + 1', this allows any id to access the
> > > table without out of the bound.
> >
> > Thanks a lot!
>
> You are welcome, Arnaldo.
>
> > My bad, that is why we have that max_id there, I forgot
> > about it and since I tested so far only on x86_64... applied to
> > perf/core, since it is only on:
> >
> > [acme@quaco perf]$ git tag --contains 30a910d7d3e0
> > perf-core-for-mingo-5.4-20190729
> > [acme@quaco perf]$
>
> Thanks! Yes, I am working on perf/core branch and hit this issue.
>
> Just in case Ingo has not merged your PR, if could save your efforts
> it's quite fine for me to merge this change in your original patch.
This got already merged into tip/perf/core, so no way to combine both by
now, unfortunately, would be good for bisection purposes on ARM64,
agreed, but not possible.
Thanks again,
- Arnaldo
> Thanks,
> Leo Yan
>
> >
> > - Arnaldo
> >
> > > Fixes: 30a910d7d3e0 ("perf trace: Preallocate the syscall table")
> > > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > > ---
> > > tools/perf/builtin-trace.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> > > index 75eb3811e942..d553d06a9aeb 100644
> > > --- a/tools/perf/builtin-trace.c
> > > +++ b/tools/perf/builtin-trace.c
> > > @@ -1492,7 +1492,7 @@ static int trace__read_syscall_info(struct trace *trace, int id)
> > > const char *name = syscalltbl__name(trace->sctbl, id);
> > >
> > > if (trace->syscalls.table == NULL) {
> > > - trace->syscalls.table = calloc(trace->sctbl->syscalls.nr_entries, sizeof(*sc));
> > > + trace->syscalls.table = calloc(trace->sctbl->syscalls.max_id + 1, sizeof(*sc));
> > > if (trace->syscalls.table == NULL)
> > > return -ENOMEM;
> > > }
> > > --
> > > 2.17.1
> >
> > --
> >
> > - Arnaldo
--
- Arnaldo
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox