* Re: [PATCH net] r8169: fix NAPI handling under high load
From: David Miller @ 2018-10-18 5:21 UTC (permalink / raw)
To: romieu; +Cc: hkallweit1, nic_swsd, netdev
In-Reply-To: <20181017233044.GA8478@electric-eye.fr.zoreil.com>
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 18 Oct 2018 01:30:45 +0200
> Heiner Kallweit <hkallweit1@gmail.com> :
> [...]
>> This issue has been there more or less forever (at least it exists in
>> 3.16 already), so I can't provide a "Fixes" tag.
>
> Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.
I don't see exactly how that can be true.
That commit didn't change the parts of the NAPI poll processing which
are relevant here, mainly the guarding of the RX and TX work using
the status bits which are cleared.
Maybe I'm missing something? If so, indeed it would be nice to add
a proper Fixes: tag here.
Thanks!
^ permalink raw reply
* Re: [PATCH net-next 0/2] tcp_bbr: TCP BBR changes for EDT pacing model
From: David Miller @ 2018-10-18 5:24 UTC (permalink / raw)
To: ncardwell; +Cc: netdev
In-Reply-To: <20181017001645.261770-1-ncardwell@google.com>
From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 16 Oct 2018 20:16:43 -0400
> Two small patches for TCP BBR to follow up with Eric's recent work to change
> the TCP and fq pacing machinery to an "earliest departure time" (EDT) model:
>
> - The first patch adjusts the TCP BBR logic to work with the new
> "earliest departure time" (EDT) pacing model.
>
> - The second patch adjusts the TCP BBR logic to centralize the setting
> of gain values, to simplify the code and prepare for future changes.
Series applied, thanks Neal.
^ permalink raw reply
* Re: [PATCH net] mlxsw: core: Fix use-after-free when flashing firmware during init
From: David Miller @ 2018-10-18 5:27 UTC (permalink / raw)
To: idosch; +Cc: netdev, jiri, petrm, alexpe, mlxsw
In-Reply-To: <20181017080525.21856-1-idosch@mellanox.com>
From: Ido Schimmel <idosch@mellanox.com>
Date: Wed, 17 Oct 2018 08:05:45 +0000
> When the switch driver (e.g., mlxsw_spectrum) determines it needs to
> flash a new firmware version it resets the ASIC after the flashing
> process. The bus driver (e.g., mlxsw_pci) then registers itself again
> with mlxsw_core which means (among other things) that the device
> registers itself again with the hwmon subsystem.
>
> Since the device was registered with the hwmon subsystem using
> devm_hwmon_device_register_with_groups(), then the old hwmon device
> (registered before the flashing) was never unregistered and was
> referencing stale data, resulting in a use-after-free.
>
> Fix by removing reliance on device managed APIs in mlxsw_hwmon_init().
>
> Fixes: c86d62cc410c ("mlxsw: spectrum: Reset FW after flash")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
> Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
> Reviewed-by: Petr Machata <petrm@mellanox.com>
Applied.
^ permalink raw reply
* Re: [PATCH net] udp6: fix encap return code for resubmitting
From: David Miller @ 2018-10-18 5:27 UTC (permalink / raw)
To: pabeni; +Cc: netdev
In-Reply-To: <1e792b80ae514e944f868c904c30e797e02b418c.1539769397.git.pabeni@redhat.com>
From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 17 Oct 2018 11:44:04 +0200
> The commit eb63f2964dbe ("udp6: add missing checks on edumux packet
> processing") used the same return code convention of the ipv4 counterpart,
> but ipv6 uses the opposite one: positive values mean resubmit.
>
> This change addresses the issue, using positive return value for
> resubmitting. Also update the related comment, which was broken, too.
>
> Fixes: eb63f2964dbe ("udp6: add missing checks on edumux packet processing")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note: I could not find any in-kernel udp6 encap using the above
> feature, that would explain why nobody complained so far...
Applied.
^ permalink raw reply
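The return-code mismatch described in the udp6 patch above can be illustrated with a small stand-alone model. This is only a sketch: the four function names are hypothetical and are not kernel symbols, and the model assumes (as the commit message implies) that the IPv4 path treats a negative return value as "resubmit to protocol -ret" while the IPv6 path treats a positive return value as "resubmit to protocol ret".

```c
#include <assert.h>

/* Hypothetical helpers: how an encap handler would ask for the packet
 * to be resubmitted to protocol `proto` under each convention. */
static int ipv4_resubmit_code(int proto)
{
	assert(proto > 0);
	return -proto;	/* IPv4 convention (assumed): negative means resubmit */
}

static int ipv6_resubmit_code(int proto)
{
	assert(proto > 0);
	return proto;	/* IPv6 convention: positive means resubmit */
}

/* Hypothetical helpers: how each input path would interpret the handler's
 * return value. The result is the protocol to resubmit to, or 0 if the
 * packet is not resubmitted. */
static int ipv4_next_proto(int ret)
{
	return ret < 0 ? -ret : 0;
}

static int ipv6_next_proto(int ret)
{
	return ret > 0 ? ret : 0;
}
```

In this model, feeding an IPv4-style code into the IPv6 interpretation yields 0, that is, the resubmit request is silently lost, which matches the kind of breakage the patch describes.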
* Re: [PATCH V2 net-next] net: ena: Fix Kconfig dependency on X86
From: David Miller @ 2018-10-18 5:28 UTC (permalink / raw)
To: netanel
Cc: netdev, akiyano, alisaidi, dwmw, zorik, matua, saeedb, msw,
aliguori, nafea, gtzalik
In-Reply-To: <20181017100421.56045-1-netanel@amazon.com>
From: <netanel@amazon.com>
Date: Wed, 17 Oct 2018 10:04:21 +0000
> From: Netanel Belgazal <netanel@amazon.com>
>
> The Kconfig limitation of X86 is too wide.
> The ENA driver only requires a little endian dependency.
>
> Change the dependency to be on little endian CPU.
>
> Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Applied.
^ permalink raw reply
* Re: [PATCH V1 net-next] net: ena: enable Low Latency Queues
From: David Miller @ 2018-10-18 5:31 UTC (permalink / raw)
To: akiyano
Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, gtzalik,
netanel, alisaidi
In-Reply-To: <1539779603-17895-1-git-send-email-akiyano@amazon.com>
From: <akiyano@amazon.com>
Date: Wed, 17 Oct 2018 15:33:23 +0300
> From: Arthur Kiyanovski <akiyano@amazon.com>
>
> Use the new API to enable usage of LLQ.
>
> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Applied.
^ permalink raw reply
* Re: [PATCH net] sctp: fix the data size calculation in sctp_data_size
From: David Miller @ 2018-10-18 5:33 UTC (permalink / raw)
To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <c407b682a424f0441d9b9a29ef6296c8e9824b64.1539781887.git.lucien.xin@gmail.com>
From: Xin Long <lucien.xin@gmail.com>
Date: Wed, 17 Oct 2018 21:11:27 +0800
> sctp data size should be calculated by subtracting data chunk header's
> length from chunk_hdr->length, not just data header.
>
> Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Applied and queued up for -stable.
^ permalink raw reply
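The size calculation discussed in the sctp patch above can be sketched outside the kernel as follows. The struct and function names here are simplified, hypothetical stand-ins rather than the kernel's definitions, the length is taken in host byte order, and the sketch assumes a plain DATA chunk (4-byte common chunk header followed by a 12-byte DATA header) whose length field is at least 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the common chunk header (4 bytes).
 * The length field covers the whole chunk, headers included. */
struct chunk_hdr {
	uint8_t  type;
	uint8_t  flags;
	uint16_t length;
};

/* Hypothetical stand-in for the DATA-specific header (12 bytes). */
struct data_hdr {
	uint32_t tsn;
	uint16_t stream;
	uint16_t ssn;
	uint32_t ppid;
};

static_assert(sizeof(struct chunk_hdr) == 4, "common chunk header is 4 bytes");
static_assert(sizeof(struct data_hdr) == 12, "DATA header is 12 bytes");

/* Subtracts only the DATA header, so the result is 4 bytes too large. */
static size_t data_size_data_hdr_only(uint16_t chunk_len)
{
	return chunk_len - sizeof(struct data_hdr);
}

/* Subtracts the full data chunk header (common header plus DATA header). */
static size_t data_size_full_hdr(uint16_t chunk_len)
{
	return chunk_len - (sizeof(struct chunk_hdr) + sizeof(struct data_hdr));
}
```

For a chunk whose length field is 116, the second function yields 100 bytes of user data, while the first overstates it as 104.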
* Re: [PATCH net] net/sched: properly init chain in case of multiple control actions
From: Cong Wang @ 2018-10-18 5:35 UTC (permalink / raw)
To: Davide Caratti
Cc: Jiri Pirko, Jamal Hadi Salim, David Miller,
Linux Kernel Network Developers
In-Reply-To: <ddbde7ffec27783a65a63e89962bf9db7565f0e0.camel@redhat.com>
On Tue, Oct 16, 2018 at 10:38 AM Davide Caratti <dcaratti@redhat.com> wrote:
>
> On Mon, 2018-10-15 at 11:31 -0700, Cong Wang wrote:
> > On Sat, Oct 13, 2018 at 8:23 AM Davide Caratti <dcaratti@redhat.com> wrote:
> > >
> > > On Fri, 2018-10-12 at 13:57 -0700, Cong Wang wrote:
> > > > Why not just validate the fallback action in each action init()?
> > > > For example, checking tcfg_paction in tcf_gact_init().
> > > >
> > > > I don't see the need of making it generic.
> ...
> > > A (legal?) trick is to let tcf_action store the fallback action when it
> > > contains a 'goto chain' command, I just posted a proposal for gact. If you
> > > think it's ok, I will test and post the same for act_police.
> >
> > Do we really need to support TC_ACT_GOTO_CHAIN for
> > gact->tcfg_paction etc.? I mean, is it useful in practice or is it just for
> > completeness?
> >
> > If we don't need to support it, we can just make it invalid without needing
> > to initialize it in ->init() at all.
> >
> > If we do, however, we really need to move it into each ->init(), because
> > we have to lock each action if we are modifying an existing one. With
> > your patch, tcf_action_goto_chain_init() is still called without the per-action
> > lock.
> >
> > What's more, if we support two different actions in gact, that is, tcfg_paction
> > and tcf_action, how could you still only have one a->goto_chain pointer?
> > There should be two pointers for each of them. :)
>
> whatever fixes the NULL dereference is OK for me.
> I thought that the proposal made with
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg251933.html
>
> (i.e., letting init() copy tcfg_paction to tcf_action in case it contained
> 'goto chain x') was smart enough to preserve the current behavior, and
> also let 'goto chain' work in case it was configured *only* for the
> fallback action.
> When the action is modified, the change to tcfg_paction is done with the
> same spinlock as tcf_action, so I didn't notice anything worse than the
> current locking layout.
>
> (well, after some more thinking I looked again at that patch and yes, it
> lacked the most important thing:)
Hmm, as I said, I am not sure the logic is correct: if we have two different
goto actions, we must have two pointers.
I will think about it again tomorrow. (I am at a conference, so I don't have
much time for reviewing this.)
Thanks.
^ permalink raw reply
* Re: [PATCH net] net: ipmr: fix unresolved entry dumps
From: David Miller @ 2018-10-18 5:36 UTC (permalink / raw)
To: nikolay; +Cc: netdev
In-Reply-To: <20181017193434.11383-1-nikolay@cumulusnetworks.com>
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Wed, 17 Oct 2018 22:34:34 +0300
> If the skb space ends in an unresolved entry while dumping we'll miss
> some unresolved entries. The reason is due to zeroing the entry counter
> between dumping resolved and unresolved mfc entries. We should just
> keep counting until the whole table is dumped and zero when we move to
> the next as we have a separate table counter.
>
> Reported-by: Colin Ian King <colin.king@canonical.com>
> Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Applied and queued up for -stable.
^ permalink raw reply
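The counting scheme described in the ipmr patch above (one entry counter that keeps running across the resolved and unresolved lists of a table) can be modelled in user space as below. This is a sketch under stated assumptions, not the kernel code: `struct table`, `dump_table()`, the `room` parameter (standing in for the remaining skb space) and the `resume` cursor are all hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one table holding two lists of entries. */
struct table {
	const int *resolved;
	size_t n_resolved;
	const int *unresolved;
	size_t n_unresolved;
};

/* Emit at most `room` entries into `out`, starting at entry index *resume.
 * The counter `e` is NOT reset between the two lists, so *resume always
 * identifies a unique position in the whole table.
 * Returns the number of entries written and updates *resume. */
static size_t dump_table(const struct table *t, size_t *resume,
			 int *out, size_t room)
{
	size_t e = 0, s_e, written = 0, i;

	assert(t && resume && out);
	s_e = *resume;

	for (i = 0; i < t->n_resolved; i++, e++) {
		if (e < s_e)
			continue;	/* already dumped by an earlier call */
		if (written == room)
			goto out;	/* out of space: resume at entry e */
		out[written++] = t->resolved[i];
	}

	/* No "e = 0" here: keep counting into the unresolved list. */
	for (i = 0; i < t->n_unresolved; i++, e++) {
		if (e < s_e)
			continue;
		if (written == room)
			goto out;
		out[written++] = t->unresolved[i];
	}
out:
	*resume = e;
	return written;
}
```

Calling `dump_table()` repeatedly with a small `room` until it returns 0 visits every entry exactly once, including the case where a call runs out of space in the middle of the unresolved list.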
* [PATCH v3 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-18 5:39 UTC (permalink / raw)
To: netdev; +Cc: ast, daniel, kernel-team, Song Liu
In-Reply-To: <20181018053949.4064426-1-songliubraving@fb.com>
Tests are added to make sure CGROUP_SKB cannot access:
tc_classid, data_meta, flow_keys
and can read and write:
mark, priority, and cb[0-4]
and can read other fields.
To make selftest with skb->sk work, a dummy sk is added in
bpf_prog_test_run_skb().
Signed-off-by: Song Liu <songliubraving@fb.com>
---
net/bpf/test_run.c | 4 +
tools/testing/selftests/bpf/test_verifier.c | 170 ++++++++++++++++++++
2 files changed, 174 insertions(+)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0c423b8cd75c..c7210e2f1ae9 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -10,6 +10,7 @@
#include <linux/etherdevice.h>
#include <linux/filter.h>
#include <linux/sched/signal.h>
+#include <net/sock.h>
static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
@@ -115,6 +116,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
u32 retval, duration;
int hh_len = ETH_HLEN;
struct sk_buff *skb;
+ struct sock sk;
void *data;
int ret;
@@ -142,6 +144,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
kfree(data);
return -ENOMEM;
}
+ sock_init_data(NULL, &sk);
+ skb->sk = &sk;
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
__skb_put(skb, size);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cf4cd32b6772..5bfba7e8afd7 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4862,6 +4862,176 @@ static struct bpf_test tests[] = {
.result = REJECT,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
+ {
+ "direct packet read test#1 for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+ offsetof(struct __sk_buff, data)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+ offsetof(struct __sk_buff, data_end)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+ offsetof(struct __sk_buff, len)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+ offsetof(struct __sk_buff, pkt_type)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, mark)),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+ offsetof(struct __sk_buff, mark)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+ offsetof(struct __sk_buff, queue_mapping)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+ offsetof(struct __sk_buff, protocol)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+ offsetof(struct __sk_buff, vlan_present)),
+ BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+ BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
+ {
+ "direct packet read test#2 for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+ offsetof(struct __sk_buff, vlan_tci)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+ offsetof(struct __sk_buff, vlan_proto)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, priority)),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+ offsetof(struct __sk_buff, priority)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+ offsetof(struct __sk_buff, ingress_ifindex)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+ offsetof(struct __sk_buff, tc_index)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+ offsetof(struct __sk_buff, hash)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
+ {
+ "direct packet read test#3 for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+ offsetof(struct __sk_buff, cb[0])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+ offsetof(struct __sk_buff, cb[1])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, cb[2])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+ offsetof(struct __sk_buff, cb[3])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+ offsetof(struct __sk_buff, cb[4])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+ offsetof(struct __sk_buff, napi_id)),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_4,
+ offsetof(struct __sk_buff, cb[0])),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_5,
+ offsetof(struct __sk_buff, cb[1])),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+ offsetof(struct __sk_buff, cb[2])),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_7,
+ offsetof(struct __sk_buff, cb[3])),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_8,
+ offsetof(struct __sk_buff, cb[4])),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
+ {
+ "direct packet read test#4 for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+ offsetof(struct __sk_buff, family)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+ offsetof(struct __sk_buff, remote_ip4)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+ offsetof(struct __sk_buff, local_ip4)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+ offsetof(struct __sk_buff, remote_ip6[0])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+ offsetof(struct __sk_buff, remote_ip6[1])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+ offsetof(struct __sk_buff, remote_ip6[2])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+ offsetof(struct __sk_buff, remote_ip6[3])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, local_ip6[0])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, local_ip6[1])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, local_ip6[2])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+ offsetof(struct __sk_buff, local_ip6[3])),
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+ offsetof(struct __sk_buff, remote_port)),
+ BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+ offsetof(struct __sk_buff, local_port)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
+ {
+ "invalid access of tc_classid for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct __sk_buff, tc_classid)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = REJECT,
+ .errstr = "invalid bpf_context access",
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
+ {
+ "invalid access of data_meta for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct __sk_buff, data_meta)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = REJECT,
+ .errstr = "invalid bpf_context access",
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
+ {
+ "invalid access of flow_keys for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+ offsetof(struct __sk_buff, flow_keys)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = REJECT,
+ .errstr = "invalid bpf_context access",
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
+ {
+ "invalid write access to napi_id for CGROUP_SKB",
+ .insns = {
+ BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+ offsetof(struct __sk_buff, napi_id)),
+ BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_9,
+ offsetof(struct __sk_buff, napi_id)),
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ },
+ .result = REJECT,
+ .errstr = "invalid bpf_context access",
+ .prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+ },
{
"valid cgroup storage access",
.insns = {
--
2.17.1
^ permalink raw reply related
* [PATCH v3 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Song Liu @ 2018-10-18 5:39 UTC (permalink / raw)
To: netdev; +Cc: ast, daniel, kernel-team, Song Liu
Changes v2 -> v3:
1. Added helper function bpf_compute_and_save_data_pointers() and
bpf_restore_data_pointers().
Changes v1 -> v2:
1. Updated the list of read-only fields, and read-write fields.
2. Added dummy sk to bpf_prog_test_run_skb().
This set enables BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to access
some __sk_buff data directly.
Song Liu (2):
bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
bpf: add tests for direct packet access from CGROUP_SKB
include/linux/filter.h | 24 +++
kernel/bpf/cgroup.c | 6 +
net/bpf/test_run.c | 4 +
net/core/filter.c | 36 ++++-
tools/testing/selftests/bpf/test_verifier.c | 170 ++++++++++++++++++++
5 files changed, 239 insertions(+), 1 deletion(-)
^ permalink raw reply
* [PATCH v3 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Song Liu @ 2018-10-18 5:39 UTC (permalink / raw)
To: netdev; +Cc: ast, daniel, kernel-team, Song Liu
In-Reply-To: <20181018053949.4064426-1-songliubraving@fb.com>
BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
skb. This patch enables direct access of skb for these programs.
Two helper functions bpf_compute_and_save_data_pointers() and
bpf_restore_data_pointers() are introduced. They are used in
__cgroup_bpf_run_filter_skb(), to compute proper data_end for the
BPF program, and restore original data afterwards.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
include/linux/filter.h | 24 ++++++++++++++++++++++++
kernel/bpf/cgroup.c | 6 ++++++
net/core/filter.c | 36 +++++++++++++++++++++++++++++++++++-
3 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5771874bc01e..96b3ee7f14c9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -548,6 +548,30 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
cb->data_end = skb->data + skb_headlen(skb);
}
+/* Similar to bpf_compute_data_pointers(), except that it saves the original
+ * cb->data_meta and cb->data_end in saved_pointers[] for later restore.
+ */
+static inline void bpf_compute_and_save_data_pointers(
+ struct sk_buff *skb, void *saved_pointers[2])
+{
+ struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+ saved_pointers[0] = cb->data_meta;
+ saved_pointers[1] = cb->data_end;
+ cb->data_meta = skb->data - skb_metadata_len(skb);
+ cb->data_end = skb->data + skb_headlen(skb);
+}
+
+/* Restore data saved by bpf_compute_and_save_data_pointers(). */
+static inline void bpf_restore_data_pointers(
+ struct sk_buff *skb, void *saved_pointers[2])
+{
+ struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+ cb->data_meta = saved_pointers[0];
+ cb->data_end = saved_pointers[1];
+}
+
static inline u8 *bpf_skb_cb(struct sk_buff *skb)
{
/* eBPF programs may read/write skb->cb[] area to transfer meta
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 00f6ed2e4f9a..5f5180104ddc 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -554,6 +554,7 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
unsigned int offset = skb->data - skb_network_header(skb);
struct sock *save_sk;
struct cgroup *cgrp;
+ void *saved_pointers[2];
int ret;
if (!sk || !sk_fullsock(sk))
@@ -566,8 +567,13 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
save_sk = skb->sk;
skb->sk = sk;
__skb_push(skb, offset);
+
+ /* compute pointers for the bpf prog */
+ bpf_compute_and_save_data_pointers(skb, saved_pointers);
+
ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
bpf_prog_run_save_cb);
+ bpf_restore_data_pointers(skb, saved_pointers);
__skb_pull(skb, offset);
skb->sk = save_sk;
return ret == 1 ? 0 : -EPERM;
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..e3ca30bd6840 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5346,6 +5346,40 @@ static bool sk_filter_is_valid_access(int off, int size,
return bpf_skb_is_valid_access(off, size, type, prog, info);
}
+static bool cg_skb_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ const struct bpf_prog *prog,
+ struct bpf_insn_access_aux *info)
+{
+ switch (off) {
+ case bpf_ctx_range(struct __sk_buff, tc_classid):
+ case bpf_ctx_range(struct __sk_buff, data_meta):
+ case bpf_ctx_range(struct __sk_buff, flow_keys):
+ return false;
+ }
+ if (type == BPF_WRITE) {
+ switch (off) {
+ case bpf_ctx_range(struct __sk_buff, mark):
+ case bpf_ctx_range(struct __sk_buff, priority):
+ case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+ break;
+ default:
+ return false;
+ }
+ }
+
+ switch (off) {
+ case bpf_ctx_range(struct __sk_buff, data):
+ info->reg_type = PTR_TO_PACKET;
+ break;
+ case bpf_ctx_range(struct __sk_buff, data_end):
+ info->reg_type = PTR_TO_PACKET_END;
+ break;
+ }
+
+ return bpf_skb_is_valid_access(off, size, type, prog, info);
+}
+
static bool lwt_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
@@ -7038,7 +7072,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
const struct bpf_verifier_ops cg_skb_verifier_ops = {
.get_func_proto = cg_skb_func_proto,
- .is_valid_access = sk_filter_is_valid_access,
+ .is_valid_access = cg_skb_is_valid_access,
.convert_ctx_access = bpf_convert_ctx_access,
};
--
2.17.1
^ permalink raw reply related
* Re: [net-next PATCH] net: sched: cls_flower: Classify packets using port ranges
From: Cong Wang @ 2018-10-18 5:41 UTC (permalink / raw)
To: David Miller
Cc: amritha.nambiar, Linux Kernel Network Developers, Jakub Kicinski,
sridhar.samudrala, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20181017.214233.695288699109520404.davem@davemloft.net>
On Wed, Oct 17, 2018 at 9:42 PM David Miller <davem@davemloft.net> wrote:
>
> From: Amritha Nambiar <amritha.nambiar@intel.com>
> Date: Fri, 12 Oct 2018 06:53:30 -0700
>
> > Added support in tc flower for filtering based on port ranges.
> > This is a rework of the RFC patch at:
> > https://patchwork.ozlabs.org/patch/969595/
>
> You never addressed Cong's feedback asking you to explain why this
> can't be simply built using existing generic filtering facilities that
> exist already.
>
> I appreciate that you addressed Jiri's feedback, but Cong's feedback is
> just as, if not more, important.
>
My objection was to introducing a new filter just for port ranges. Now that it
is built on top of the flower filter, it is much better.
The u32 filter can do nearly the same, but it requires a power-of-two range,
so this is not a complete duplicate.
Therefore, I think the idea of building it on top of flower is fine. But I have
not read the code, only the description.
Thanks!
^ permalink raw reply
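The power-of-two remark in the message above can be made concrete with a small comparison between value/mask matching and min/max range matching. This is only an illustrative sketch: `mask_match()` and `range_match()` are hypothetical helpers, not code from the u32 or flower classifiers, and the sketch assumes that a value/mask match is what limits a single u32 key to aligned power-of-two blocks of ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value/mask match: with a contiguous prefix mask, this can only describe
 * an aligned block whose size is a power of two. */
static bool mask_match(uint16_t port, uint16_t value, uint16_t mask)
{
	return (port & mask) == (value & mask);
}

/* Range match: describes an arbitrary inclusive range [min, max]. */
static bool range_match(uint16_t port, uint16_t min, uint16_t max)
{
	assert(min <= max);
	return port >= min && port <= max;
}
```

For example, value 1024 with mask 0xFFF8 covers exactly ports 1024 to 1031, whereas a range such as 1000 to 1100 has no single value/mask equivalent and would need several mask entries to cover it.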
* Re: bond: take rcu lock in netpoll_send_skb_on_dev
From: Cong Wang @ 2018-10-18 5:46 UTC (permalink / raw)
To: eranbe
Cc: Dave Jones, Linux Kernel Network Developers, Tariq Toukan,
Saeed Mahameed
In-Reply-To: <b80a8984-2100-a7c7-008d-2768c09ef0f8@mellanox.com>
On Mon, Oct 15, 2018 at 4:36 AM Eran Ben Elisha <eranbe@mellanox.com> wrote:
> Hi,
>
> This suggested fix introduced a regression while using netconsole module
> with mlx5_core module loaded.
It is already reported here:
https://marc.info/?l=linux-kernel&m=153917359528669&w=2
>
> During irq handling, we hit a warning that this rcu_read_lock_bh cannot
> be taken inside an IRQ.
Yes, I mentioned the same even before this patch was sent out:
https://marc.info/?l=linux-netdev&m=153816136624679&w=2
Thanks.
^ permalink raw reply
* Re: [PATCH net] r8169: fix NAPI handling under high load
From: Heiner Kallweit @ 2018-10-18 5:58 UTC (permalink / raw)
To: David Miller, romieu; +Cc: nic_swsd, netdev
In-Reply-To: <20181017.222149.1241280289340067644.davem@davemloft.net>
On 18.10.2018 07:21, David Miller wrote:
> From: Francois Romieu <romieu@fr.zoreil.com>
> Date: Thu, 18 Oct 2018 01:30:45 +0200
>
>> Heiner Kallweit <hkallweit1@gmail.com> :
>> [...]
>>> This issue has been there more or less forever (at least it exists in
>>> 3.16 already), so I can't provide a "Fixes" tag.
>>
>> Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.
>
> I don't see exactly how that can be true.
>
> That commit didn't change the parts of the NAPI poll processing which
> are relevant here, mainly the guarding of the RX and TX work using
> the status bits which are cleared.
>
AFAICS Francois is right and patch da78dbff2e05 ("r8169: remove work
from irq handler") introduced the guarding of RX and TX work.
I just checked back to 3.16 as oldest LTS kernel version.
> Maybe I'm missing something? If so, indeed it would be nice to add
> a proper Fixes: tag here.
>
Shall I submit a v2 including the Fixes line?
> Thanks!
>
^ permalink raw reply
* Re: [PATCH net] r8169: fix NAPI handling under high load
From: Heiner Kallweit @ 2018-10-18 6:03 UTC (permalink / raw)
To: Jonathan Woithe
Cc: Francois Romieu, Holger Hoffstätte, David Miller,
Realtek linux nic maintainers, netdev@vger.kernel.org
In-Reply-To: <20181018055835.GE2487@marvin.atrad.com.au>
On 18.10.2018 07:58, Jonathan Woithe wrote:
> On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
>> Holger Hoffstätte <holger@applied-asynchrony.com> :
>> [...]
>>> I continued to use the BQL patch in my private tree after it was reverted
>>> and also had occasional timeouts, but *only* after I started playing
>>> with ethtool to change offload settings. Without offloads or the BQL patch
>>> everything has been rock-solid since then.
>>> The other weird problem was that timeouts would occur on an otherwise
>>> *completely idle* system. Since that occasionally borked my NFS server
>>> over night I ultimately removed BQL as well. Rock-solid since then.
>>
>> The bug will induce delayed rx processing when a spike of "load" is
>> followed by an idle period.
>
> If this is the case, I wonder whether this bug might also be the cause of
> the long reception delays we've observed at times when a period of high
> network load is followed by almost nothing[1]. That thread[2] details the
> investigations subsequently done. A git bisect showed that commit
> da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
> we were observing.
>
> We still see the problem when we test with recent kernels. It would be
> great if the underlying problem has now been identified.
>
> I can possibly scrape some hardware together to test any proposed fix under
> our workload if there was interest.
>
Proposed fix is here:
https://patchwork.ozlabs.org/patch/985014/
Would be good if you could test it. Thanks!
Heiner
> Regards
> jonathan
>
> [1] https://marc.info/?l=linux-netdev&m=136281333207734&w=2
> [2] https://marc.info/?t=136281339500002&r=1&w=2
>
^ permalink raw reply
* Re: [PATCH net] r8169: fix NAPI handling under high load
From: Jonathan Woithe @ 2018-10-18 5:58 UTC (permalink / raw)
To: Francois Romieu
Cc: Holger Hoffstätte, Heiner Kallweit, David Miller,
Realtek linux nic maintainers, netdev@vger.kernel.org
In-Reply-To: <20181017233051.GB8478@electric-eye.fr.zoreil.com>
On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
> Holger Hoffstätte <holger@applied-asynchrony.com> :
> [...]
> > I continued to use the BQL patch in my private tree after it was reverted
> > and also had occasional timeouts, but *only* after I started playing
> > with ethtool to change offload settings. Without offloads or the BQL patch
> > everything has been rock-solid since then.
> > The other weird problem was that timeouts would occur on an otherwise
> > *completely idle* system. Since that occasionally borked my NFS server
> > over night I ultimately removed BQL as well. Rock-solid since then.
>
> The bug will induce delayed rx processing when a spike of "load" is
> followed by an idle period.
If this is the case, I wonder whether this bug might also be the cause of
the long reception delays we've observed at times when a period of high
network load is followed by almost nothing[1]. That thread[2] details the
investigations subsequently done. A git bisect showed that commit
da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
we were observing.
We still see the problem when we test with recent kernels. It would be
great if the underlying problem has now been identified.
I can possibly scrape some hardware together to test any proposed fix under
our workload if there was interest.
Regards
jonathan
[1] https://marc.info/?l=linux-netdev&m=136281333207734&w=2
[2] https://marc.info/?t=136281339500002&r=1&w=2
^ permalink raw reply
* Re: [PATCH net] r8169: fix NAPI handling under high load
From: Jonathan Woithe @ 2018-10-18 6:15 UTC (permalink / raw)
To: Heiner Kallweit
Cc: Francois Romieu, Holger Hoffstätte, David Miller,
Realtek linux nic maintainers, netdev@vger.kernel.org
In-Reply-To: <8beda4fa-5d04-49e6-eb3e-5656897a301f@gmail.com>
On Thu, Oct 18, 2018 at 08:03:32AM +0200, Heiner Kallweit wrote:
> On 18.10.2018 07:58, Jonathan Woithe wrote:
> > On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
> >> Holger Hoffstätte <holger@applied-asynchrony.com> :
> >> [...]
> >> The bug will induce delayed rx processing when a spike of "load" is
> >> followed by an idle period.
> >
> > If this is the case, I wonder whether this bug might also be the cause of
> > the long reception delays we've observed at times when a period of high
> > network load is followed by almost nothing[1]. That thread[2] details the
> > investigations subsequently done. A git bisect showed that commit
> > da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
> > we were observing.
> >
> > We still see the problem when we test with recent kernels. It would be
> > great if the underlying problem has now been identified.
> >
> > I can possibly scrape some hardware together to test any proposed fix under
> > our workload if there was interest.
> >
> Proposed fix is here:
> https://patchwork.ozlabs.org/patch/985014/
> Would be good if you could test it. Thanks!
I should be able to do so tomorrow. Which kernel would you like me to apply
the patch to?
Regards
jonathan
^ permalink raw reply
* Re: [PATCH bpf] bpf: fix doc of bpf_skb_adjust_room() in uapi
From: Nicolas Dichtel @ 2018-10-18 6:21 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: ast, daniel, davem, netdev, Quentin Monnet
In-Reply-To: <20181018044900.fvsz27v76ptgelfj@ast-mbp.dhcp.thefacebook.com>
On 18/10/2018 at 06:49, Alexei Starovoitov wrote:
> On Wed, Oct 17, 2018 at 04:24:48PM +0200, Nicolas Dichtel wrote:
>> len_diff is signed.
>>
>> Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)")
>> CC: Quentin Monnet <quentin.monnet@netronome.com>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
>> include/uapi/linux/bpf.h | 2 +-
>> tools/include/uapi/linux/bpf.h | 2 +-
>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 66917a4eba27..c4ffe91d5598 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1430,7 +1430,7 @@ union bpf_attr {
>> * Return
>> * 0 on success, or a negative error in case of failure.
>> *
>> - * int bpf_skb_adjust_room(struct sk_buff *skb, u32 len_diff, u32 mode, u64 flags)
>> + * int bpf_skb_adjust_room(struct sk_buff *skb, s32 len_diff, u32 mode, u64 flags)
>
> Thanks. Applied to bpf-next, since we're very late into release cycle.
>
Yep, I was also wondering if it was not too late for the bpf tree ;-)
Thank you,
Nicolas
^ permalink raw reply
* Re: [PATCH net] r8169: fix NAPI handling under high load
From: David Miller @ 2018-10-18 6:24 UTC (permalink / raw)
To: hkallweit1; +Cc: romieu, nic_swsd, netdev
In-Reply-To: <d54112bb-2f02-bd59-5b5b-e211f070b20a@gmail.com>
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Thu, 18 Oct 2018 07:58:52 +0200
> On 18.10.2018 07:21, David Miller wrote:
>> From: Francois Romieu <romieu@fr.zoreil.com>
>> Date: Thu, 18 Oct 2018 01:30:45 +0200
>>
>>> Heiner Kallweit <hkallweit1@gmail.com> :
>>> [...]
>>>> This issue has been there more or less forever (at least it exists in
>>>> 3.16 already), so I can't provide a "Fixes" tag.
>>>
>>> Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.
>>
>> I don't see exactly how that can be true.
>>
>> That commit didn't change the parts of the NAPI poll processing which
>> are relevant here, mainly the guarding of the RX and TX work using
>> the status bits which are cleared.
>>
>
> AFAICS Francois is right and patch da78dbff2e05 ("r8169: remove work
> from irq handler") introduced the guarding of RX and TX work.
> I just checked back to 3.16 as oldest LTS kernel version.
Aha, now I see it.
>> Maybe I'm missing something? If so, indeed it would be nice to add
>> a proper Fixes: tag here.
>>
> Shall I submit a v2 including the Fixes line?
Yes, please do!
^ permalink raw reply
* Re: [PATCH v3 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Alexei Starovoitov @ 2018-10-18 6:25 UTC (permalink / raw)
To: Song Liu; +Cc: netdev, ast, daniel, kernel-team
In-Reply-To: <20181018053949.4064426-3-songliubraving@fb.com>
On Wed, Oct 17, 2018 at 10:39:49PM -0700, Song Liu wrote:
> Tests are added to make sure CGROUP_SKB cannot access:
> tc_classid, data_meta, flow_keys
>
> and can read and write:
> mark, priority, and cb[0-4]
>
> and can read other fields.
>
> To make selftest with skb->sk work, a dummy sk is added in
> bpf_prog_test_run_skb().
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
> net/bpf/test_run.c | 4 +
> tools/testing/selftests/bpf/test_verifier.c | 170 ++++++++++++++++++++
> 2 files changed, 174 insertions(+)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 0c423b8cd75c..c7210e2f1ae9 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -10,6 +10,7 @@
> #include <linux/etherdevice.h>
> #include <linux/filter.h>
> #include <linux/sched/signal.h>
> +#include <net/sock.h>
>
> static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
> struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> @@ -115,6 +116,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> u32 retval, duration;
> int hh_len = ETH_HLEN;
> struct sk_buff *skb;
> + struct sock sk;
> void *data;
> int ret;
>
> @@ -142,6 +144,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> kfree(data);
> return -ENOMEM;
> }
> + sock_init_data(NULL, &sk);
> + skb->sk = &sk;
I was about to apply it, but it crashes as:
[ 16.830822] BUG: unable to handle kernel paging request at 000000014427b974
[ 16.831363] PGD 8000000135ecf067 P4D 8000000135ecf067 PUD 0
[ 16.831792] Oops: 0000 [#1] SMP PTI
[ 16.832061] CPU: 1 PID: 1965 Comm: test_verifier Not tainted 4.19.0-rc7-02550-ga76dee97ff12 #1153
[ 16.832712] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 16.833358] RIP: 0010:cmp_map_id+0x10/0x50
[ 16.835036] RSP: 0018:ffffc9000080faf8 EFLAGS: 00010246
[ 16.835429] RAX: 00000000ffffffff RBX: 0000000036069ee8 RCX: 0000000000000000
[ 16.835958] RDX: 000000014427b970 RSI: 000000014427b970 RDI: ffffc9000080fb44
[ 16.836496] RBP: 000000000000000c R08: ffffffff810f7330 R09: 0000000036069ee8
[ 16.837026] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 16.837554] R13: ffffffff810f7330 R14: 000000014427b970 R15: 000000001b034f74
[ 16.838083] FS: 00007fae50663700(0000) GS:ffff88013ba80000(0000) knlGS:0000000000000000
[ 16.838677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.839105] CR2: 000000014427b974 CR3: 0000000135934005 CR4: 00000000003606e0
[ 16.839632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 16.840157] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 16.840682] Call Trace:
[ 16.840897] bsearch+0x50/0x90
[ 16.841144] map_id_range_down+0x81/0xa0
[ 16.841438] make_kuid+0xf/0x10
[ 16.841677] sock_init_data+0x24f/0x260
[ 16.841979] bpf_prog_test_run_skb+0x9e/0x270
I suspect sock_net_set(sk, &init_net) is necessary before sock_init_data() call.
^ permalink raw reply
* [PATCH] net: socket: fix a missing-check bug
From: Wenwen Wang @ 2018-10-18 14:36 UTC (permalink / raw)
To: Wenwen Wang
Cc: Kangjie Lu, David S. Miller, open list:NETWORKING [GENERAL],
open list
In ethtool_ioctl(), the ioctl command 'ethcmd' is checked through a switch
statement to see whether it is necessary to pre-process the ethtool
structure, because, as mentioned in the comment, the structure
ethtool_rxnfc is defined with padding. If yes, a user-space buffer 'rxnfc'
is allocated through compat_alloc_user_space(). One thing to note here is
that, if 'ethcmd' is ETHTOOL_GRXCLSRLALL, the size of the buffer 'rxnfc' is
partially determined by 'rule_cnt', which is actually acquired from the
user-space buffer 'compat_rxnfc', i.e., 'compat_rxnfc->rule_cnt', through
get_user(). After 'rxnfc' is allocated, the data in the original user-space
buffer 'compat_rxnfc' is then copied to 'rxnfc' through copy_in_user(),
including the 'rule_cnt' field. However, after this copy, 'rxnfc->rule_cnt'
is not checked again. A malicious user can therefore race to change the
value of 'compat_rxnfc->rule_cnt' between these two fetches. In this way,
the attacker can bypass the earlier check on 'rule_cnt' and inject malicious
data, which can cause undefined behavior in the kernel and introduce a
potential security risk.
This patch avoids the issue by copying the value acquired by get_user() to
'rxnfc->rule_cnt' if 'ethcmd' is ETHTOOL_GRXCLSRLALL.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
---
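Note (illustration only, not part of the patch): the double-fetch pattern
described in the changelog can be modelled in user space. The sketch below
is a minimal model under stated assumptions: compat_req, native_req,
fetch_u32(), MAX_RULES, race_hook and grow_count() are all hypothetical
names. fetch_u32() stands in for get_user(), and the hook simulates a
concurrent writer deterministically so the effect is reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RULES 16u	/* hypothetical limit, stands in for the size check */

/* Hypothetical stand-ins for the caller-owned and kernel-side structures. */
struct compat_req { uint32_t rule_cnt; };
struct native_req { uint32_t rule_cnt; };

/* Simulates a concurrent writer that runs between two fetches. */
typedef void (*race_hook)(struct compat_req *src);

/* Models a single fetch from memory the callee does not control. */
static uint32_t fetch_u32(const volatile uint32_t *p)
{
	return *p;
}

/* Racy: the count is validated, then fetched again without a check. */
static int convert_double_fetch(struct compat_req *src, struct native_req *dst,
				race_hook hook)
{
	uint32_t cnt;

	assert(src != NULL && dst != NULL);
	cnt = fetch_u32(&src->rule_cnt);
	if (cnt > MAX_RULES)
		return -1;
	if (hook)
		hook(src);	/* the writer may change the value here */
	dst->rule_cnt = fetch_u32(&src->rule_cnt);	/* unchecked second fetch */
	return 0;
}

/* Fixed: the value that was validated is the value that gets stored. */
static int convert_single_fetch(struct compat_req *src, struct native_req *dst,
				race_hook hook)
{
	uint32_t cnt;

	assert(src != NULL && dst != NULL);
	cnt = fetch_u32(&src->rule_cnt);
	if (cnt > MAX_RULES)
		return -1;
	if (hook)
		hook(src);	/* a late change no longer matters */
	dst->rule_cnt = cnt;
	return 0;
}

/* Example writer: raises the count past the limit after validation. */
static void grow_count(struct compat_req *src)
{
	src->rule_cnt = 1000;
}
```

With grow_count() as the hook, the racy variant stores the unvalidated
value while the single-fetch variant stores the validated one. This is the
same idea as writing the already-checked 'rule_cnt' with put_user() rather
than copying the field a second time.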
net/socket.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index 01f3f8f..390a8ec 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2875,9 +2875,14 @@ static int ethtool_ioctl(struct net *net, struct compat_ifreq __user *ifr32)
copy_in_user(&rxnfc->fs.ring_cookie,
&compat_rxnfc->fs.ring_cookie,
(void __user *)(&rxnfc->fs.location + 1) -
- (void __user *)&rxnfc->fs.ring_cookie) ||
- copy_in_user(&rxnfc->rule_cnt, &compat_rxnfc->rule_cnt,
- sizeof(rxnfc->rule_cnt)))
+ (void __user *)&rxnfc->fs.ring_cookie))
+ return -EFAULT;
+ if (ethcmd == ETHTOOL_GRXCLSRLALL) {
+ if (put_user(rule_cnt, &rxnfc->rule_cnt))
+ return -EFAULT;
+ } else if (copy_in_user(&rxnfc->rule_cnt,
+ &compat_rxnfc->rule_cnt,
+ sizeof(rxnfc->rule_cnt)))
return -EFAULT;
}
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v3 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-18 6:53 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
Kernel Team
In-Reply-To: <20181018062557.zyeoiker7jigcv6q@ast-mbp.dhcp.thefacebook.com>
> On Oct 17, 2018, at 11:25 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Oct 17, 2018 at 10:39:49PM -0700, Song Liu wrote:
>> Tests are added to make sure CGROUP_SKB cannot access:
>> tc_classid, data_meta, flow_keys
>>
>> and can read and write:
>> mark, priority, and cb[0-4]
>>
>> and can read other fields.
>>
>> To make selftest with skb->sk work, a dummy sk is added in
>> bpf_prog_test_run_skb().
>>
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>> net/bpf/test_run.c | 4 +
>> tools/testing/selftests/bpf/test_verifier.c | 170 ++++++++++++++++++++
>> 2 files changed, 174 insertions(+)
>>
>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>> index 0c423b8cd75c..c7210e2f1ae9 100644
>> --- a/net/bpf/test_run.c
>> +++ b/net/bpf/test_run.c
>> @@ -10,6 +10,7 @@
>> #include <linux/etherdevice.h>
>> #include <linux/filter.h>
>> #include <linux/sched/signal.h>
>> +#include <net/sock.h>
>>
>> static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
>> struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
>> @@ -115,6 +116,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>> u32 retval, duration;
>> int hh_len = ETH_HLEN;
>> struct sk_buff *skb;
>> + struct sock sk;
>> void *data;
>> int ret;
>>
>> @@ -142,6 +144,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>> kfree(data);
>> return -ENOMEM;
>> }
>> + sock_init_data(NULL, &sk);
>> + skb->sk = &sk;
>
> I was about to apply it, but it crashes as:
> [ 16.830822] BUG: unable to handle kernel paging request at 000000014427b974
> [ 16.831363] PGD 8000000135ecf067 P4D 8000000135ecf067 PUD 0
> [ 16.831792] Oops: 0000 [#1] SMP PTI
> [ 16.832061] CPU: 1 PID: 1965 Comm: test_verifier Not tainted 4.19.0-rc7-02550-ga76dee97ff12 #1153
> [ 16.832712] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> [ 16.833358] RIP: 0010:cmp_map_id+0x10/0x50
> [ 16.835036] RSP: 0018:ffffc9000080faf8 EFLAGS: 00010246
> [ 16.835429] RAX: 00000000ffffffff RBX: 0000000036069ee8 RCX: 0000000000000000
> [ 16.835958] RDX: 000000014427b970 RSI: 000000014427b970 RDI: ffffc9000080fb44
> [ 16.836496] RBP: 000000000000000c R08: ffffffff810f7330 R09: 0000000036069ee8
> [ 16.837026] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
> [ 16.837554] R13: ffffffff810f7330 R14: 000000014427b970 R15: 000000001b034f74
> [ 16.838083] FS: 00007fae50663700(0000) GS:ffff88013ba80000(0000) knlGS:0000000000000000
> [ 16.838677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 16.839105] CR2: 000000014427b974 CR3: 0000000135934005 CR4: 00000000003606e0
> [ 16.839632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 16.840157] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 16.840682] Call Trace:
> [ 16.840897] bsearch+0x50/0x90
> [ 16.841144] map_id_range_down+0x81/0xa0
> [ 16.841438] make_kuid+0xf/0x10
> [ 16.841677] sock_init_data+0x24f/0x260
> [ 16.841979] bpf_prog_test_run_skb+0x9e/0x270
>
> I suspect sock_net_set(sk, &init_net) is necessary before sock_init_data() call.
I am not able to repro this, even with CONFIG_KASAN and CONFIG_PAGE_POISONING.
Let me try a better approach on this.
Thanks,
Song
^ permalink raw reply
* [PATCH net-next v8 00/28] WireGuard: Secure Network Tunnel
From: Jason A. Donenfeld @ 2018-10-18 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh; +Cc: Jason A. Donenfeld
Changes v7->v8, along with who suggested it.
--------------------------------------------
- Implementations that fail the selftests are now disabled, after a warning
is printed. This way users don't make wrong calculations, even in the face
of a rather grave bug.
- [Sultan Alsawaf] When assigning to a boolean, prefer "BIT(i) & a" to "(a >>
i) & 1".
- [Andrew Lunn] Avoid control statements inside macros.
- [Jiri Pirko] Prefix functions used in callbacks with wg_.
- [Jiri Pirko] Rename struct wireguard_peer and struct wireguard_device to
struct wg_peer and struct wg_device.
- [Eugene Syromiatnikov] Do not use nla type field as an index, and actually
don't use an index at all, because it has no meaning or relevance at all.
- [Joe Perches] Do not place a space between for_each iterators and
parentheses.
- Innumerable style cleanups and nits.
- [Arnd Bergmann] Swap endianness in allowedips early on in case optimizer is
able to look a bit further in but not too far, resulting in a warning from
-Wmaybe-uninitialized.
- [Jiri Pirko] Use textual error labels instead of numerical ones.
- [Jiri Pirko] Better module description string.
- [Eric Biggers] In poly1305 port to crypto API, account for short inputs in
final function, in which case -ENOKEY should be returned.
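
As an aside on the BIT() item in the list above, here is a minimal
user-space sketch of why the two forms agree when the result is a boolean.
BIT() is defined locally to mirror the usual kernel shape, and the helper
names (bit_set_mask, bit_set_shift, mask_as_int) are hypothetical, chosen
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local definition for this sketch, mirroring the usual kernel macro shape. */
#define BIT(n) (1UL << (n))

/* Conversion to bool maps any nonzero mask result to true. */
static bool bit_set_mask(unsigned long a, unsigned int i)
{
	return BIT(i) & a;
}

/* The shift form yields exactly 0 or 1 before the conversion. */
static bool bit_set_shift(unsigned long a, unsigned int i)
{
	return (a >> i) & 1;
}

/* Caveat: with an int target the mask form yields the mask value, not 1. */
static int mask_as_int(unsigned long a, unsigned int i)
{
	return (int)(BIT(i) & a);
}
```

The equivalence relies on the target being a real boolean type. As
mask_as_int() shows, the same expression stored in an int keeps the raw
mask value, which is why the item is scoped to boolean assignments. The
index is assumed to be smaller than the bit width of unsigned long.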
-----------------------------------------------------------
This patchset is available on git.kernel.org in this branch, where it may be
pulled directly for inclusion into net-next:
* https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=jd/wireguard
-----------------------------------------------------------
WireGuard is a secure network tunnel written especially for Linux, which
has faced around three years of serious development, deployment, and
scrutiny. It delivers excellent performance and is extremely easy to
use and configure. It has been designed with the primary goal of being
both easy to audit by virtue of being small and highly secure from a
cryptography and systems security perspective. WireGuard is used by some
massive companies pushing enormous amounts of traffic, and likely
already today you've consumed bytes that at some point transited through
a WireGuard tunnel. Even as an out-of-tree module, WireGuard has been
integrated into various userspace tools, Linux distributions, mobile
phones, and data centers. There are ports in several languages to
several operating systems, and even commercial hardware and services
sold integrating WireGuard. It is time, therefore, for WireGuard to be
properly integrated into Linux.
Ample information, including documentation, installation instructions,
and project details, is available at:
* https://www.wireguard.com/
* https://www.wireguard.com/papers/wireguard.pdf
As it is currently an out-of-tree module, it lives in its own git repo
and has its own mailing list, and every commit for the module is tested
against every stable kernel since 3.10 on a variety of architectures
using an extensive test suite:
* https://git.zx2c4.com/WireGuard
https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/WireGuard.git/
* https://lists.zx2c4.com/mailman/listinfo/wireguard
* https://www.wireguard.com/build-status/
The project has been broadly discussed at conferences, and was presented
to the Netdev developers in Seoul last November, where a paper was
released detailing some interesting aspects of the project. Dave asked
me after the talk if I would consider sending in a v1 "sooner rather
than later", hence this patchset. Zinc was presented at Kernel Recipes
in September, and a video is available online. Both Zinc and WireGuard
will be presented at the conference in Vancouver in November.
* https://www.wireguard.com/presentations/
* https://www.wireguard.com/papers/wireguard-netdev22.pdf
* Zinc talk: https://www.youtube.com/watch?v=bFhdln8aJ_U
* Netdev talk: https://www.youtube.com/watch?v=54orFwtQ1XY
The cryptography in the protocol itself has been formally verified by
several independent academic teams with positive results, and I know of
two additional efforts on their way to further corroborate those
findings. The version 1 protocol is "complete", and so the purpose of
this review is to assess the implementation of the protocol. However, it
still may be of interest to know that the thing you're reviewing uses a
protocol with various nice security properties:
* https://www.wireguard.com/formal-verification/
This patchset is divided into four segments. The first introduces a very
simple helper for working with the FPU state for the purposes of amortizing
SIMD operations. The second segment is a small collection of cryptographic
primitives, split up into several commits by primitive and by hardware. The
third shows usage of Zinc within the existing crypto API and as a replacement
to the existing crypto API. The last is WireGuard itself, presented as an
unintrusive and self-contained virtual network driver.
It is intended that this entire patch series enter the kernel through
DaveM's net-next tree. Subsequently, WireGuard patches will go through
DaveM's net-next tree, while Zinc patches will go through Greg KH's tree in
cases when an entire development cycle has no relationships with existing code
in crypto/; however, if there are any relationships with code in crypto/, then
pull requests will be sent to Herbert instead in case there are merge
conflicts.
Enjoy,
Jason
^ permalink raw reply
* [PATCH net-next v8 01/28] ARM: makefile: use ARMv3M mode for RiscPC
From: Jason A. Donenfeld @ 2018-10-18 14:56 UTC (permalink / raw)
To: linux-kernel, netdev, linux-crypto, davem, gregkh; +Cc: Jason A. Donenfeld
In-Reply-To: <20181018145712.7538-1-Jason@zx2c4.com>
The purpose of CONFIG_CPU_32v3 is to avoid ldrh/strh on the RiscPC,
which is pretty much an ARMv4 device, except its bus will choke on the
half-words. The way to make the C compiler not output ldrh/strh is with
-march=armv3, which doesn't support them in the ISA. However, this
prevents certain cryptography code from working that uses instructions
like umull. Fortunately there's also -march=armv3m that does support
those, making it possible to continue assembling optimized cryptography
routines for our beloved RiscPC.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Notes:
This commit has been submitted to the proper ARM tree and is working its
way upstream. It's included in this series here so that kbuild 0-day bot
doesn't get too nervous about RiscPC, but is already entering the tree
through arm-next.
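
For readers unfamiliar with the instruction in question, the following is a
minimal C sketch of the operation that umull performs: an unsigned 32x32
multiply producing a 64-bit result. The function name mul32x32_64 is
hypothetical and not taken from this series. Whether a compiler lowers it
to a single umull depends on the toolchain and the -march setting, so treat
this as an illustration of the arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widening multiply: both operands are 32 bits, the product is 64 bits.
 * Casting one operand first makes the multiplication happen in 64 bits,
 * so the high half of the product is not lost.
 */
static uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}
```

Hand-written assembly that uses umull directly cannot be assembled for a
target without the long-multiply instructions, which is the situation the
changelog above describes for -march=armv3.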
arch/arm/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index d1516f85f25d..7fd4bcaf0721 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -74,7 +74,7 @@ endif
arch-$(CONFIG_CPU_32v5) =-D__LINUX_ARM_ARCH__=5 $(call cc-option,-march=armv5te,-march=armv4t)
arch-$(CONFIG_CPU_32v4T) =-D__LINUX_ARM_ARCH__=4 -march=armv4t
arch-$(CONFIG_CPU_32v4) =-D__LINUX_ARM_ARCH__=4 -march=armv4
-arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3
+arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3m
# Evaluate arch cc-option calls now
arch-y := $(arch-y)
--
2.19.1
^ permalink raw reply related