* [PATCH] rtlwifi: rtl8821ae: fix spelling mistake: "faill" -> "failed"
From: Colin King @ 2017-08-22 12:59 UTC (permalink / raw)
To: Chaoming Li, Kalle Valo, Ping-Ke Shih, Joe Perches, yuan linyu,
linux-wireless, netdev
Cc: linux-kernel
From: Colin Ian King <colin.king@canonical.com>
Trivial fix to spelling mistake in RT_TRACE message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/fw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/fw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/fw.c
index b84b4fa7b71c..f2b2c549e5b2 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/fw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/fw.c
@@ -98,7 +98,7 @@ static int _rtl8821ae_fw_free_to_go(struct ieee80211_hw *hw)
if (counter >= FW_8821AE_POLLING_TIMEOUT_COUNT) {
RT_TRACE(rtlpriv, COMP_ERR, DBG_LOUD,
- "chksum report faill ! REG_MCUFWDL:0x%08x .\n",
+ "chksum report fail! REG_MCUFWDL:0x%08x .\n",
value32);
goto exit;
}
--
2.14.1
^ permalink raw reply related
* Re: Something hitting my total number of connections to the server
From: Eric Dumazet @ 2017-08-22 13:02 UTC (permalink / raw)
To: Akshat Kakkar; +Cc: David Laight, netdev
In-Reply-To: <CAA5aLPgtfrA053gAbsXOZTdrJKtsOZODUmF5cPGmnOmsO3PbOA@mail.gmail.com>
On Tue, 2017-08-22 at 11:12 +0530, Akshat Kakkar wrote:
> There are multiple hosts/clients. All are mainly windows based.
>
> Timestamp is not used as my clients mainly are windows based and in
> that it tcp timestamp is by defauly disabled.
>
> sysctl is as follows:
>
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> kernel.pid_max=4194303
> vm.max_map_count=131072
> kernel.sem=250 32000 32 250
>
> net.netfilter.nf_conntrack_generic_timeout = 300
> net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 60
> net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 30
> net.netfilter.nf_conntrack_tcp_timeout_established = 7200
> net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close_wait = 30
> net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
> net.netfilter.nf_conntrack_tcp_timeout_time_wait = 60
> net.netfilter.nf_conntrack_tcp_timeout_close = 10
> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
> net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
> net.netfilter.nf_conntrack_udp_timeout = 30
> net.netfilter.nf_conntrack_udp_timeout_stream = 180
> net.netfilter.nf_conntrack_icmp_timeout = 30
> net.netfilter.nf_conntrack_events_retry_timeout = 15
> net.core.rmem_max = 8388608
> net.core.wmem_max = 8388608
>
> net.ipv4.tcp_tw_reuse=1
> net.ipv4.tcp_tw_recycle=1
This is exactly what I feared.
We do not support tcp_tw_reuse = 1 AND tcp_tw_recycle = 1
This is a very well known bad combination.
> net.ipv4.tcp_fin_timeout=30
> net.ipv4.tcp_keepalive_time=1800
> net.ipv4.tcp_keepalive_intvl=60
> net.ipv4.tcp_keepalive_probes=20
> net.ipv4.tcp_max_syn_backlog=4096
> net.ipv4.tcp_syncookies=1
> net.ipv4.tcp_sack=1
> net.ipv4.tcp_dsack=1
> net.ipv4.tcp_window_scaling=1
> net.ipv4.tcp_syn_retries=3
> net.ipv4.tcp_synack_retries=3
> net.ipv4.tcp_retries1=3
> net.ipv4.tcp_retries2=15
> net.ipv4.ip_local_port_range=1024 65535
>
> net.ipv4.tcp_timestamps=0
>
> net.core.netdev_max_backlog=10000
This is an insane backlog.
> net.core.somaxconn=100000
> net.core.optmem_max=81920
>
> net.netfilter.nf_conntrack_max=524288
> net.nf_conntrack_max=524288
> net.ipv6.conf.all.disable_ipv6 = 1
> fs.file-max=1000000
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_max_syn_backlog = 10240
> net.ipv4.tcp_congestion_control=htcp
>
> net.ipv4.tcp_rfc1337 = 1
> net.core.netdev_max_backlog = 65536
This is a crazy backlog. Do not do that.
> net.ipv4.tcp_max_tw_buckets = 1440000
>
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
>
>
>
It looks like your sysctls have been set to unreasonable values.
^ permalink raw reply
* Re: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
From: Jorgen S. Hansen @ 2017-08-22 13:07 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Dexuan Cui, davem@davemloft.net, netdev@vger.kernel.org,
gregkh@linuxfoundation.org, devel@linuxdriverproject.org,
KY Srinivasan, Haiyang Zhang, Stephen Hemminger, George Zhang,
Michal Kubecek, Asias He, Vitaly Kuznetsov, Cathy Avery,
jasowang@redhat.com, Rolf Neugebauer, Dave Scott, Marcelo Cerri,
apw@canonical.com
In-Reply-To: <20170822095437.GB16799@stefanha-x1.localdomain>
> On Aug 22, 2017, at 11:54 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Fri, Aug 18, 2017 at 11:07:37PM +0000, Dexuan Cui wrote:
>>> Not all features need to be supported. For example, VMCI supports
>>> SOCK_DGRAM while Hyper-V and virtio do not. But features that are
>>> available should behave identically.
>> I totally agree, though I'm afraid Hyper-V may have a little more limitations
>> compared to VMware/KVM duo to the <VM_ID, ServiceID> <--> <cid, port>
>> mapping.
>>
>>>> Can we use the 'protocol' parameter in the socket() function:
>>>> int socket(int domain, int type, int protocol)
>>>>
>>>> IMO currently the 'protocol' is not really used.
>>>> I think we can modify __vsock_core_init() to allow multiple transport layers
>>> to
>>>> be registered, and we can define different 'protocol' numbers for
>>>> VMware/KVM/Hyper-V, and ask the application to explicitly specify what
>>> should
>>>> be used. Considering compatibility, we can use the default transport in a
>>> given
>>>> VM depending on the underlying hypervisor.
>>>
>>> I think AF_VSOCK should hide the transport from users/applications.
>> Ideally yes, but let's consider the KVM-on-KVM nested scenario: when
>> an application in the Level-1 VM creates an AF_VSOCK socket and call
>> connect() for it, how can we know if the app is trying to connect to
>> the Level-0 host, or connect to the Level-2 VM? We can't.
>
> We *can* by looking at the destination CID. Please take a look at
> drivers/misc/vmw_vmci/vmci_route.c:vmci_route() to see how VMCI handles
> nested virt.
>
> It boils down to something like this:
>
> static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
> int addr_len, int flags)
> {
> ...
> if (remote_addr.svm_cid == VMADDR_CID_HOST)
> transport = host_transport;
> else
> transport = guest_transport;
>
> It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
> because the socket would need to listen on both transports. We define
> two new constants VMADDR_CID_LISTEN_FROM_GUEST and
> VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can decide
> which side to listen on.
If a socket is bound to VMADDR_CID_HOST, we would consider that socket as bound to the host side transport, so that would be the same as VMADDR_CID_LISTEN_FROM_GUEST. For the guest, we have IOCTL_VM_SOCKETS_GET_LOCAL_CID, so that could be used to get and bind a socket to the guest transport (VMCI will always return the guest CID as the local one, if the VMCI driver is used in a guest, and it looks like virtio will do the same). We could treat VMADDR_CID_ANY as always being the guest transport, since that is the use case where you don’t know upfront what your CID is, if we don’t want to listen on all transports. So we would use the host transport, if a socket is bound to VMADDR_CID_HOST, or if there is no guest transport, and in all other cases use the guest transport. However, having a couple of symbolic names like you suggest certainly makes it more obvious, and could be used in combination with this. It would be a plus if existing applications would function as intended in most cases.
> Or the listen socket could simply listen to
> both sides.
The only problem here would be the potential for a guest and a host app to have a conflict wrt port numbers, even though they would be able to operate fine, if restricted to their appropriate transport.
Thanks,
Jorgen
^ permalink raw reply
* Re: [PATCH] net: mdio-gpio: make mdiobb_ops const
From: Andrew Lunn @ 2017-08-22 13:15 UTC (permalink / raw)
To: Bhumika Goyal; +Cc: julia.lawall, f.fainelli, netdev, linux-kernel
In-Reply-To: <1503389609-17490-1-git-send-email-bhumirks@gmail.com>
On Tue, Aug 22, 2017 at 01:43:29PM +0530, Bhumika Goyal wrote:
> Make this const as it is only stored in a const field of a
> mdiobb_ctrl structure.
>
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* [PATCH net-next 0/4] bpf: verifier fixes
From: Edward Cree @ 2017-08-22 13:17 UTC (permalink / raw)
To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
Cc: netdev, iovisor-dev
Fix a couple of bugs introduced in my recent verifier patches.
Patch #3 does slightly increase the insn count on bpf_lxc.o, but only by
about a hundred insns (i.e. 0.2%).
Alexei Starovoitov (1):
selftests/bpf: add a test for a pruning bug in the verifier
Edward Cree (3):
bpf/verifier: remove varlen_map_value_access flag
bpf/verifier: when pruning a branch, ignore its write marks
bpf/verifier: document liveness analysis
include/linux/bpf_verifier.h | 14 +++++-
kernel/bpf/verifier.c | 78 +++++++++++++++++------------
tools/testing/selftests/bpf/test_verifier.c | 28 +++++++++++
3 files changed, 87 insertions(+), 33 deletions(-)
^ permalink raw reply
* [PATCH net-next 1/4] selftests/bpf: add a test for a pruning bug in the verifier
From: Edward Cree via iovisor-dev @ 2017-08-22 13:26 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
Alexei Starovoitov, Daniel Borkmann
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <f25f46bd-27cb-2182-2551-2887757c1e0f-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
From: Alexei Starovoitov <ast-b10kYP2dOMg@public.gmane.org>
The test makes a read through a map value pointer, then considers pruning
a branch where the register holds an adjusted map value pointer. It
should not prune, but currently it does.
Signed-off-by: Alexei Starovoitov <ast-b10kYP2dOMg@public.gmane.org>
[ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org: added test-name and patch description]
Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
---
tools/testing/selftests/bpf/test_verifier.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index c03542c..6ca1487 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6487,6 +6487,34 @@ static struct bpf_test tests[] = {
.result = REJECT,
.prog_type = BPF_PROG_TYPE_LWT_IN,
},
+ {
+ "varlen_map_value_access pruning",
+ .insns = {
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 8),
+ BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+ BPF_MOV32_IMM(BPF_REG_2, MAX_ENTRIES),
+ BPF_JMP_REG(BPF_JSGT, BPF_REG_2, BPF_REG_1, 1),
+ BPF_MOV32_IMM(BPF_REG_1, 0),
+ BPF_ALU32_IMM(BPF_LSH, BPF_REG_1, 2),
+ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+ BPF_JMP_IMM(BPF_JA, 0, 0, 0),
+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0,
+ offsetof(struct test_val, foo)),
+ BPF_EXIT_INSN(),
+ },
+ .fixup_map2 = { 3 },
+ .errstr_unpriv = "R0 leaks addr",
+ .errstr = "R0 unbounded memory access",
+ .result_unpriv = REJECT,
+ .result = REJECT,
+ .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
+ },
};
static int probe_filter_length(const struct bpf_insn *fp)
^ permalink raw reply related
* [PATCH net-next 2/4] bpf/verifier: remove varlen_map_value_access flag
From: Edward Cree via iovisor-dev @ 2017-08-22 13:26 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
Alexei Starovoitov, Daniel Borkmann
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <f25f46bd-27cb-2182-2551-2887757c1e0f-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
The optimisation it does is broken when the 'new' register value has a
variable offset and the 'old' was constant. I broke it with my pointer
types unification (see Fixes tag below), before which the 'new' value
would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
other changes in that patch mean that its original behaviour (ignore
min/max values) cannot be restored.
Tests on a sample set of cilium programs show no change in count of
processed instructions.
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
---
include/linux/bpf_verifier.h | 1 -
kernel/bpf/verifier.c | 41 ++++++++++++-----------------------------
2 files changed, 12 insertions(+), 30 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 91d07ef..d8f131a 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -125,7 +125,6 @@ struct bpf_verifier_env {
u32 id_gen; /* used to generate unique reg IDs */
bool allow_ptr_leaks;
bool seen_direct_write;
- bool varlen_map_value_access;
struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
};
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e42c096..1e3f56c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -832,11 +832,6 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
*/
if (log_level)
print_verifier_state(state);
- /* If the offset is variable, we will need to be stricter in state
- * pruning from now on.
- */
- if (!tnum_is_const(reg->var_off))
- env->varlen_map_value_access = true;
/* The minimum value is only important with signed
* comparisons where we can't assume the floor of a
* value is 0. If we are using signed variables for our
@@ -3247,9 +3242,8 @@ static bool check_ids(u32 old_id, u32 cur_id, struct idpair *idmap)
}
/* Returns true if (rold safe implies rcur safe) */
-static bool regsafe(struct bpf_reg_state *rold,
- struct bpf_reg_state *rcur,
- bool varlen_map_access, struct idpair *idmap)
+static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur,
+ struct idpair *idmap)
{
if (!(rold->live & REG_LIVE_READ))
/* explored state didn't use this */
@@ -3281,22 +3275,14 @@ static bool regsafe(struct bpf_reg_state *rold,
tnum_is_unknown(rold->var_off);
}
case PTR_TO_MAP_VALUE:
- if (varlen_map_access) {
- /* If the new min/max/var_off satisfy the old ones and
- * everything else matches, we are OK.
- * We don't care about the 'id' value, because nothing
- * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
- */
- return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
- range_within(rold, rcur) &&
- tnum_in(rold->var_off, rcur->var_off);
- } else {
- /* If the ranges/var_off were not the same, but
- * everything else was and we didn't do a variable
- * access into a map then we are a-ok.
- */
- return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
- }
+ /* If the new min/max/var_off satisfy the old ones and
+ * everything else matches, we are OK.
+ * We don't care about the 'id' value, because nothing
+ * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
+ */
+ return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
+ range_within(rold, rcur) &&
+ tnum_in(rold->var_off, rcur->var_off);
case PTR_TO_MAP_VALUE_OR_NULL:
/* a PTR_TO_MAP_VALUE could be safe to use as a
* PTR_TO_MAP_VALUE_OR_NULL into the same map.
@@ -3380,7 +3366,6 @@ static bool states_equal(struct bpf_verifier_env *env,
struct bpf_verifier_state *old,
struct bpf_verifier_state *cur)
{
- bool varlen_map_access = env->varlen_map_value_access;
struct idpair *idmap;
bool ret = false;
int i;
@@ -3391,8 +3376,7 @@ static bool states_equal(struct bpf_verifier_env *env,
return false;
for (i = 0; i < MAX_BPF_REG; i++) {
- if (!regsafe(&old->regs[i], &cur->regs[i], varlen_map_access,
- idmap))
+ if (!regsafe(&old->regs[i], &cur->regs[i], idmap))
goto out_free;
}
@@ -3412,7 +3396,7 @@ static bool states_equal(struct bpf_verifier_env *env,
continue;
if (!regsafe(&old->spilled_regs[i / BPF_REG_SIZE],
&cur->spilled_regs[i / BPF_REG_SIZE],
- varlen_map_access, idmap))
+ idmap))
/* when explored and current stack slot are both storing
* spilled registers, check that stored pointers types
* are the same as well.
@@ -3550,7 +3534,6 @@ static int do_check(struct bpf_verifier_env *env)
init_reg_state(regs);
state->parent = NULL;
insn_idx = 0;
- env->varlen_map_value_access = false;
for (;;) {
struct bpf_insn *insn;
u8 class;
^ permalink raw reply related
* [PATCH net-next 3/4] bpf/verifier: when pruning a branch, ignore its write marks
From: Edward Cree @ 2017-08-22 13:27 UTC (permalink / raw)
To: davem, Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann
Cc: netdev, iovisor-dev
In-Reply-To: <f25f46bd-27cb-2182-2551-2887757c1e0f@solarflare.com>
The fact that writes occurred in reaching the continuation state does
not screen off its reads from us, because we're not really its parent.
So detect 'not really the parent' in do_propagate_liveness, and ignore
write marks in that case.
Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
Signed-off-by: Edward Cree <ecree@solarflare.com>
---
kernel/bpf/verifier.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1e3f56c..711bdbd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3420,6 +3420,7 @@ static bool states_equal(struct bpf_verifier_env *env,
static bool do_propagate_liveness(const struct bpf_verifier_state *state,
struct bpf_verifier_state *parent)
{
+ bool writes = parent == state->parent; /* Observe write marks */
bool touched = false; /* any changes made? */
int i;
@@ -3431,7 +3432,9 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
for (i = 0; i < BPF_REG_FP; i++) {
if (parent->regs[i].live & REG_LIVE_READ)
continue;
- if (state->regs[i].live == REG_LIVE_READ) {
+ if (writes && (state->regs[i].live & REG_LIVE_WRITTEN))
+ continue;
+ if (state->regs[i].live & REG_LIVE_READ) {
parent->regs[i].live |= REG_LIVE_READ;
touched = true;
}
@@ -3444,7 +3447,9 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
continue;
if (parent->spilled_regs[i].live & REG_LIVE_READ)
continue;
- if (state->spilled_regs[i].live == REG_LIVE_READ) {
+ if (writes && (state->spilled_regs[i].live & REG_LIVE_WRITTEN))
+ continue;
+ if (state->spilled_regs[i].live & REG_LIVE_READ) {
parent->spilled_regs[i].live |= REG_LIVE_READ;
touched = true;
}
^ permalink raw reply related
* [PATCH net-next 4/4] bpf/verifier: document liveness analysis
From: Edward Cree via iovisor-dev @ 2017-08-22 13:27 UTC (permalink / raw)
To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
Alexei Starovoitov, Daniel Borkmann
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <f25f46bd-27cb-2182-2551-2887757c1e0f-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
The liveness tracking algorithm is quite subtle; add comments to explain it.
Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
---
include/linux/bpf_verifier.h | 13 +++++++++++++
kernel/bpf/verifier.c | 28 +++++++++++++++++++++++++++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index d8f131a..b8d200f 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -21,6 +21,19 @@
*/
#define BPF_MAX_VAR_SIZ INT_MAX
+/* Liveness marks, used for registers and spilled-regs (in stack slots).
+ * Read marks propagate upwards until they find a write mark; they record that
+ * "one of this state's descendants read this reg" (and therefore the reg is
+ * relevant for states_equal() checks).
+ * Write marks collect downwards and do not propagate; they record that "the
+ * straight-line code that reached this state (from its parent) wrote this reg"
+ * (and therefore that reads propagated from this state or its descendants
+ * should not propagate to its parent).
+ * A state with a write mark can receive read marks; it just won't propagate
+ * them to its parent, since the write mark is a property, not of the state,
+ * but of the link between it and its parent. See mark_reg_read() and
+ * mark_stack_slot_read() in kernel/bpf/verifier.c.
+ */
enum bpf_reg_liveness {
REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
REG_LIVE_READ, /* reg was read, so we're sensitive to initial value */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 711bdbd..5fc350e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3417,6 +3417,12 @@ static bool states_equal(struct bpf_verifier_env *env,
return ret;
}
+/* A write screens off any subsequent reads; but write marks come from the
+ * straight-line code between a state and its parent. When we arrive at a
+ * jump target (in the first iteration of the propagate_liveness() loop),
+ * we didn't arrive by the straight-line code, so read marks in state must
+ * propagate to parent regardless of state's write marks.
+ */
static bool do_propagate_liveness(const struct bpf_verifier_state *state,
struct bpf_verifier_state *parent)
{
@@ -3457,6 +3463,15 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
return touched;
}
+/* "parent" is "a state from which we reach the current state", but initially
+ * it is not the state->parent (i.e. "the state whose straight-line code leads
+ * to the current state"), instead it is the state that happened to arrive at
+ * a (prunable) equivalent of the current state. See comment above
+ * do_propagate_liveness() for consequences of this.
+ * This function is just a more efficient way of calling mark_reg_read() or
+ * mark_stack_slot_read() on each reg in "parent" that is read in "state", so
+ * long as parent != state->parent.
+ */
static void propagate_liveness(const struct bpf_verifier_state *state,
struct bpf_verifier_state *parent)
{
@@ -3485,6 +3500,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
/* reached equivalent register/stack state,
* prune the search.
* Registers read by the continuation are read by us.
+ * If we have any write marks in env->cur_state, they
+ * will prevent corresponding reads in the continuation
+ * from reaching our parent (an explored_state). Our
+ * own state will get the read marks recorded, but
+ * they'll be immediately forgotten as we're pruning
+ * this state and will pop a new one.
*/
propagate_liveness(&sl->state, &env->cur_state);
return 1;
@@ -3508,7 +3529,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
env->explored_states[insn_idx] = new_sl;
/* connect new state to parentage chain */
env->cur_state.parent = &new_sl->state;
- /* clear liveness marks in current state */
+ /* clear write marks in current state: the writes we did are not writes
+ * our child did, so they don't screen off its reads from us.
+ * (There are no read marks in current state, because reads always mark
+ * their parent and current state never has children yet. Only
+ * explored_states can get read marks.)
+ */
for (i = 0; i < BPF_REG_FP; i++)
env->cur_state.regs[i].live = REG_LIVE_NONE;
for (i = 0; i < MAX_BPF_STACK / BPF_REG_SIZE; i++)
^ permalink raw reply related
* [PATCH net] macsec: add genl family module alias
From: Sabrina Dubroca @ 2017-08-22 13:36 UTC (permalink / raw)
To: netdev; +Cc: Sabrina Dubroca
This helps tools such as wpa_supplicant can start even if the macsec
module isn't loaded yet.
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
I think this should go to stable as well.
drivers/net/macsec.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 5e1ab1160856..98e4deaa3a6a 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3521,6 +3521,7 @@ module_init(macsec_init);
module_exit(macsec_exit);
MODULE_ALIAS_RTNL_LINK("macsec");
+MODULE_ALIAS_GENL_FAMILY("macsec");
MODULE_DESCRIPTION("MACsec IEEE 802.1AE");
MODULE_LICENSE("GPL v2");
--
2.14.1
^ permalink raw reply related
* Re: [PATCH v2 3/4] net: phy: realtek: add disable RX internal delay mode
From: Andrew Lunn @ 2017-08-22 13:39 UTC (permalink / raw)
To: Icenowy Zheng
Cc: Maxime Ripard, Chen-Yu Tsai, Florian Fainelli, Corentin Labbe,
devicetree-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <20170822040400.12166-4-icenowy-h8G6r0blFSE@public.gmane.org>
On Tue, Aug 22, 2017 at 12:03:59PM +0800, Icenowy Zheng wrote:
> From: Icenowy Zheng <icenowy-ymACFijhrKM@public.gmane.org>
>
> Some RTL8211E chips have broken GbE function, which needs a hack to
> fix. It's said that this fix will affect the performance on not-buggy
> PHYs, so it should only be enabled on boards with the broken PHY.
I would not call this a broken PHY. It is a question of board
design. If you have shorter tracks, you need a delay. If you have long
tracks, you don't. Hence the four RGMII modes, so you can select if
the delay is needed or not, depending on your board design.
> Currently only some Pine64+ boards are known to have this issue.
Design feature.
Please remove all reference to bug.
> This hack is said to disable RX relay for RTL8211E according to Realtek.
> So implement it as RGMII-TXID mode.
Do we know that the TX lines do have a delay after making this change?
We need to be careful here. We will get into a mess if this is
actually RGMII not RGMII_TXID, and somebody designs a board which
needs RGMII-TXID.
> +#include <linux/of.h>
> #include <linux/phy.h>
> #include <linux/module.h>
> +static int rtl8211e_config_init(struct phy_device *phydev)
> +{
> + struct device *dev = &phydev->mdio.dev;
> + struct device_node *of_node = dev->of_node;
You don't appear to use of_node, nor dev. Please remove, along with
the header file.
Andrew
^ permalink raw reply
* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Koichiro Den @ 2017-08-22 13:42 UTC (permalink / raw)
To: Willem de Bruijn, Jason Wang
Cc: Network Development, virtualization, Michael S. Tsirkin
In-Reply-To: <CAF=yD-L-9QnxV2kM_j2BxVk=h7i3wvMZxG-3_=0Q=jDPgeT=VQ@mail.gmail.com>
On Mon, 2017-08-21 at 23:10 -0400, Willem de Bruijn wrote:
> > > > Interesting, deadlock could be treated as a a radical case of the
> > > > discussion
> > > > here https://patchwork.kernel.org/patch/3787671/.
> > > >
> > > > git grep tells more similar skb_orphan() cases. Do we need to change
> > > > them
> > > > all (or part)?
> > >
> > > Most skb_orphan calls are not relevant to the issue of transmit delay.
> >
> >
> > Yes, but at least we should audit the ones in drivers/net.
>
> Do you mean other virtual device driver transmit paths, like xen,
> specifically?
>
> > > > Actually, we may meet similar issues at many other places (e.g netem).
> > >
> > > Netem is an interesting case. Because it is intended to mimic network
> > > delay, at least in the case where it calls skb_orphan, it may make
> > > sense to release all references, including calling skb_zcopy_clear.
> > >
> > > In general, zerocopy reverts to copy on all paths that may cause
> > > unbounded delay due to another process. Guarding against delay
> > > induced by the administrator is infeasible. It is always possible to
> > > just pause the nic. Netem is one instance of that, and not unbounded.
> >
> >
> > The problem is, admin may only delay the traffic in e.g one interface, but
> > it actually delay or stall all traffic inside a VM.
>
> Understood. Ideally we can remove the HoL blocking cause of this,
> itself.
>
> > > > Need
> > > > to consider a complete solution for this. Figuring out all places that
> > > > could
> > > > delay a packet is a method.
> > >
> > > The issue described in the referenced patch seems like head of line
> > > blocking between two flows. If one flow delays zerocopy descriptor
> > > release from the vhost-net pool, it blocks all subsequent descriptors
> > > in that pool from being released, also delaying other flows that use
> > > the same descriptor pool. If the pool is empty, all transmission stopped.
> > >
> > > Reverting to copy tx when the pool reaches a low watermark, as the
> > > patch does, fixes this.
> >
> >
> > An issue of the referenced patch is that sndbuf could be smaller than low
> > watermark.
We cannot determine the low watermark properly because of not only sndbuf size
issue but also the fact that the upper vhost-net cannot directly see how much
descriptor is currently available at the virtio-net tx queue. It depends on
multiqueue settings or other senders which are also using the same tx queue.
Note that in the latter case if they constantly transmitting, the deadlock could
not occur(*), however if it has just temporarily fulfill some portion of the
pool in the mean time, then the low watermark cannot be helpful.
(*: That is because it's reliable enough in the sense I mention below.)
Keep in this in mind, let me briefly describe the possible deadlock I mentioned:
(1). vhost-net on L1 guest has nothing to do sendmsg until the upper layer sets
new descriptors, which depends only on the vhost-net zcopy callback and adding
newly used descriptors.
(2). vhost-net callback depends on the skb freeing on the xmit path only.
(3). the xmit path depends (possibly only) on the vhost-net sendmsg.
As you see, it's enough to bring about the situation above that L1 virtio-net
reaches its limit earlier than the L0 host processing. The vhost-net pool could
be almost full or empty, whatever.
Also note that relaxing the restriction (2) (and thus remove the dependency (3)
-> (1)) seems quite natural but it needs to be reliable enough to never miss the
last trigger if it's based on interruption. We might need to do some
modification on virtio interruption mitigation in none tx napi case but it's too
costly and not fair to cover the upper layer peculiarity.
So I think drivers which holds characteristics such as (2) is worth checking
whether we should revise.
> >
> > > Perhaps the descriptor pool should also be
> > > revised to allow out of order completions. Then there is no need to
> > > copy zerocopy packets whenever they may experience delay.
> >
> >
> > Yes, but as replied in the referenced thread, windows driver may treat out
> > of order completion as a bug.
>
> Interesting. I missed that. Perhaps the zerocopy optimization
> could be gated on guest support for out of order completions.
>
> > > On the point of counting copy vs zerocopy: the new msg_zerocopy
> > > variant of ubuf_info has a field to record whether a deep copy was
> > > made. This can be used with vhost-net zerocopy, too.
> >
> >
> > Just to make sure I understand. It's still not clear to me how to reuse this
> > for vhost-net, e.g zerocopy flag is in a union which is not used by
> > vhost_net.
>
> True, but that is not set in stone. I went back and forth on that when
> preparing fix 0a4a060bb204 ("sock: fix zerocopy_success regression
> with msg_zerocopy"). The field can be moved outside the union and
> initialized in the other zerocopy paths.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Koichiro Den @ 2017-08-22 14:04 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: Michael S. Tsirkin, virtualization, Network Development
In-Reply-To: <CAF=yD-LO0rP1A_hrxqx_1HYEv6qyk=Sc=D7iH7cPh2YXDm=cYQ@mail.gmail.com>
On Tue, 2017-08-22 at 08:11 -0400, Willem de Bruijn wrote:
> On Sun, Aug 20, 2017 at 4:49 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> > On Sat, Aug 19, 2017 at 2:38 AM, Koichiro Den <den@klaipeden.com> wrote:
> > > Facing the possible unbounded delay relying on freeing on xmit path,
> > > we also better to invoke and clear the upper layer zerocopy callback
> > > beforehand to keep them from waiting for unbounded duration in vain.
> >
> > Good point.
> >
> > > For instance, this removes the possible deadlock in the case that the
> > > upper layer is a zerocopy-enabled vhost-net.
> > > This does not apply if napi_tx is enabled since it will be called in
> > > reasonale time.
> >
> > Indeed. Btw, I am gathering data to eventually make napi the default
> > mode. But that is taking some time.
> >
> > >
> > > Signed-off-by: Koichiro Den <den@klaipeden.com>
> > > ---
> > > drivers/net/virtio_net.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 4302f313d9a7..f7deaa5b7b50 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1290,6 +1290,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb,
> > > struct net_device *dev)
> > >
> > > /* Don't wait up for transmitted skbs to be freed. */
> > > if (!use_napi) {
> > > + if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
> > > + struct ubuf_info *uarg;
> > > + uarg = skb_shinfo(skb)->destructor_arg;
> > > + if (uarg->callback)
> > > + uarg->callback(uarg, true);
> > > + skb_shinfo(skb)->destructor_arg = NULL;
> > > + skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
> > > + }
> >
> > Instead of open coding, this can use skb_zcopy_clear.
>
> It is not correct to only send the zerocopy completion here, as
> the skb will still be shared with the nic. The only safe approach
> to notifying early is to create a copy with skb_orphan_frags_rx.
> That will call skb_zcopy_clear(skb, false) internally. But the
> copy will be very expensive.
I noticed this email after my last post. I cannot not imagine how it could be
unsafe in virtio case. Sorry could you explain a bit more?
>
> Is the deadlock you refer to the case where tx interrupts are
> disabled, so transmit completions are only handled in start_xmit
> and somehow the socket(s) are unable to send new data? The
> question is what is blocking them. If it is zerocopy, it may be a
> too low optmem_max or hitting the per-user locked pages limit.
> We may have to keep interrupts enabled when zerocopy skbs
> are in flight.
Even if we keep interrupts enabled, We miss the completion without start_xmit.
And it's also likely that the next start_xmit depends on the completion itself
as I described in my last post.
^ permalink raw reply
* Re: [PATCH 1/2] vhost: remove the possible fruitless search on iotlb prefetch
From: Koichiro Den @ 2017-08-22 14:09 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, kvm, virtualization
In-Reply-To: <20170821223039-mutt-send-email-mst@kernel.org>
On Mon, 2017-08-21 at 22:45 +0300, Michael S. Tsirkin wrote:
> On Sat, Aug 19, 2017 at 03:41:14PM +0900, Koichiro Den wrote:
> > Signed-off-by: Koichiro Den <den@klaipeden.com>
> > ---
> > drivers/vhost/vhost.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index e4613a3c362d..93e909afc1c3 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -1184,7 +1184,7 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
> > while (len > s) {
> > node = vhost_umem_interval_tree_iter_first(&umem-
> > >umem_tree,
> > addr,
> > - addr + len - 1);
> > + addr + len - s -
> > 1);
> > if (node == NULL || node->start > addr) {
> > vhost_iotlb_miss(vq, addr, access);
> > return false;
>
> This works but it probably makes sense to just refactor the code to make end
> of
> range a variable. I posted a patch like this, pls take a look.
>
> > --
> > 2.9.4
> >
Ok, thanks.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next 03/18] net: mvpp2: set the SMI PHY address when connecting to the PHY
From: Antoine Tenart @ 2017-08-22 14:41 UTC (permalink / raw)
To: Andrew Lunn
Cc: Antoine Tenart, davem, jason, gregory.clement,
sebastian.hesselbarth, thomas.petazzoni, nadavh, linux, mw,
stefanc, netdev, linux-arm-kernel
In-Reply-To: <20170728042153.GH18666@lunn.ch>
[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]
Hi Andrew,
On Fri, Jul 28, 2017 at 06:21:53AM +0200, Andrew Lunn wrote:
> On Thu, Jul 27, 2017 at 06:49:05PM -0700, Antoine Tenart wrote:
> > On Wed, Jul 26, 2017 at 06:08:06PM +0200, Andrew Lunn wrote:
> > > On Mon, Jul 24, 2017 at 03:48:33PM +0200, Antoine Tenart wrote:
> > > >
> > > > + if (priv->hw_version != MVPP22)
> > > > + return 0;
> > > > +
> > > > + /* Set the SMI PHY address */
> > > > + if (of_property_read_u32(port->phy_node, "reg", &phy_addr)) {
> > > > + netdev_err(port->dev, "cannot find the PHY address\n");
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + writel(phy_addr, priv->iface_base + MVPP22_SMI_PHY_ADDR(port->gop_id));
> > > > return 0;
> > > > }
> > >
> > > Why does the MAC need to know this address? The phylib and PHY driver
> > > should be the only thing accessing the PHY, otherwise you are asking
> > > for trouble.
> >
> > This is part of the SMI/xSMI interface. I added into the mvpp2 driver
> > and not in the mvmdio one because the GoP port number must be known to
> > set this register (so that would be even less clean to do it).
>
> It is still not clear to my why you need to program the address into
> the hardware. Is the hardware talking to the PHY?
Sorry for the answer delay, I was out of the office...
This PHY address configuration should be done in the mvmdio driver as
this is not directly related to the PPv2 (well, the mvmdio driver is
only an abstraction to reuse the mdio code, using registers exposed by
PPv2 in this case anyway). But two values must be known in order to do
this: the PHY address and the GoP port number. Getting the last one from
the mvmdio driver would be really ugly as we would need to read the PPv2
dt node. This is why this patch adds it in the PPv2 driver, but I know
it's not perfect.
I'll resend a series very soon, with this patch still included. We can
continue the discussion there I guess, if needed.
Thanks!
Antoine
--
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH] vhost: fix end of range for access_ok
From: Koichiro Den @ 2017-08-22 14:49 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel
Cc: netdev, David Miller, kvm, virtualization
In-Reply-To: <1503344576-8141-1-git-send-email-mst@redhat.com>
On Mon, 2017-08-21 at 22:45 +0300, Michael S. Tsirkin wrote:
> During access_ok checks, addr increases as we iterate over the data
> structure, thus addr + len - 1 will point beyond the end of region we
> are translating. Harmless since we then verify that the region covers
> addr, but let's not waste cpu cycles.
>
> Reported-by: Koichiro Den <den@klaipeden.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> Lightly tested, would appreciate an ack from reporter.
>
> drivers/vhost/vhost.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e4613a3..ecd70e4 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1176,7 +1176,7 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
> {
> const struct vhost_umem_node *node;
> struct vhost_umem *umem = vq->iotlb;
> - u64 s = 0, size, orig_addr = addr;
> + u64 s = 0, size, orig_addr = addr, last = addr + len - 1;
>
> if (vhost_vq_meta_fetch(vq, addr, len, type))
> return true;
> @@ -1184,7 +1184,7 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
> while (len > s) {
> node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
> addr,
> - addr + len - 1);
> + last);
> if (node == NULL || node->start > addr) {
> vhost_iotlb_miss(vq, addr, access);
> return false;
Michael, Thank you for this one.
Acked-by: Koichiro Den <den@klaipeden.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH net-next 03/18] net: mvpp2: set the SMI PHY address when connecting to the PHY
From: Andrew Lunn @ 2017-08-22 14:50 UTC (permalink / raw)
To: Antoine Tenart
Cc: davem, jason, gregory.clement, sebastian.hesselbarth,
thomas.petazzoni, nadavh, linux, mw, stefanc, netdev,
linux-arm-kernel
In-Reply-To: <20170822144101.GC14942@kwain>
> > It is still not clear to my why you need to program the address into
> > the hardware. Is the hardware talking to the PHY?
>
> Sorry for the answer delay, I was out of the office...
>
> This PHY address configuration should be done in the mvmdio driver as
> this is not directly related to the PPv2 (well, the mvmdio driver is
> only an abstraction to reuse the mdio code, using registers exposed by
> PPv2 in this case anyway). But two values must be known in order to do
> this: the PHY address and the GoP port number. Getting the last one from
> the mvmdio driver would be really ugly as we would need to read the PPv2
> dt node. This is why this patch adds it in the PPv2 driver, but I know
> it's not perfect.
>
> I'll resend a series very soon, with this patch still included. We can
> continue the discussion there I guess, if needed.
>
> Thanks!
Hi Antoine
You have still not explained why PPv2 needs to know the PHY address.
What i'm worried about is that the PPv2 is actually talking to the
PHY. Is it polling link state? Does it take the PHY mutex when it does
this poll?
Andrew
^ permalink raw reply
* Re: [PATCH v4 net-next] arm: eBPF JIT compiler
From: Daniel Borkmann @ 2017-08-22 15:08 UTC (permalink / raw)
To: Shubham Bansal, linux, davem
Cc: netdev, ast, linux-arm-kernel, linux-kernel, keescook
In-Reply-To: <1503383772-5788-1-git-send-email-illusionist.neo@gmail.com>
On 08/22/2017 08:36 AM, Shubham Bansal wrote:
[...]
> +
> +static int out_offset = -1; /* initialized on the first pass of build_body() */
Hm, why is this a global var actually? There can be
multiple parallel calls to bpf_int_jit_compile(), we
don't take a global lock on this. Unless I'm missing
something this should really reside in jit_ctx, no?
Given this is on emit_bpf_tail_call(), did you get
tail calls working the way I suggested to test?
> +static int emit_bpf_tail_call(struct jit_ctx *ctx)
> {
[...]
> + const int idx0 = ctx->idx;
> +#define cur_offset (ctx->idx - idx0)
> +#define jmp_offset (out_offset - (cur_offset))
[...]
> +
> + /* out: */
> + if (out_offset == -1)
> + out_offset = cur_offset;
> + if (cur_offset != out_offset) {
> + pr_err_once("tail_call out_offset = %d, expected %d!\n",
> + cur_offset, out_offset);
> + return -1;
> + }
> + return 0;
> +#undef cur_offset
> +#undef jmp_offset
> }
^ permalink raw reply
* Re: [PATCH net-next 2/4] bpf/verifier: remove varlen_map_value_access flag
From: Alexei Starovoitov via iovisor-dev @ 2017-08-22 15:17 UTC (permalink / raw)
To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
Daniel Borkmann
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <6174b732-0bf9-19a3-09c4-f6a62be24a0c-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
On 8/22/17 6:26 AM, Edward Cree wrote:
> The optimisation it does is broken when the 'new' register value has a
> variable offset and the 'old' was constant. I broke it with my pointer
> types unification (see Fixes tag below), before which the 'new' value
> would have type PTR_TO_MAP_VALUE_ADJ and would thus not compare equal;
> other changes in that patch mean that its original behaviour (ignore
> min/max values) cannot be restored.
> Tests on a sample set of cilium programs show no change in count of
> processed instructions.
>
> Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
^ permalink raw reply
* Re: [PATCH net-next 3/4] bpf/verifier: when pruning a branch, ignore its write marks
From: Alexei Starovoitov via iovisor-dev @ 2017-08-22 15:24 UTC (permalink / raw)
To: Edward Cree, davem-fT/PcQaiUtIeIZ0/mPfg9Q, Alexei Starovoitov,
Daniel Borkmann
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, iovisor-dev
In-Reply-To: <d1a5b78a-8f5c-50d9-08a2-eb7c4829a478-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
On 8/22/17 6:27 AM, Edward Cree wrote:
> The fact that writes occurred in reaching the continuation state does
> not screen off its reads from us, because we're not really its parent.
> So detect 'not really the parent' in do_propagate_liveness, and ignore
> write marks in that case.
>
> Fixes: dc503a8ad984 ("bpf/verifier: track liveness for pruning")
> Signed-off-by: Edward Cree <ecree-s/n/eUQHGBpZroRs9YW3xA@public.gmane.org>
> ---
> kernel/bpf/verifier.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 1e3f56c..711bdbd 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3420,6 +3420,7 @@ static bool states_equal(struct bpf_verifier_env *env,
> static bool do_propagate_liveness(const struct bpf_verifier_state *state,
> struct bpf_verifier_state *parent)
> {
> + bool writes = parent == state->parent; /* Observe write marks */
> bool touched = false; /* any changes made? */
> int i;
>
> @@ -3431,7 +3432,9 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
> for (i = 0; i < BPF_REG_FP; i++) {
> if (parent->regs[i].live & REG_LIVE_READ)
> continue;
> - if (state->regs[i].live == REG_LIVE_READ) {
> + if (writes && (state->regs[i].live & REG_LIVE_WRITTEN))
> + continue;
> + if (state->regs[i].live & REG_LIVE_READ) {
makes sense to me.
if i understand correctly it not only should make the liveness marking
correct, but improve the numbers, since smaller number of states
will have READ marks.
Do you have a test case for this by any chance?
Acked-by: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
^ permalink raw reply
* Re: [PATCH v4 net-next] arm: eBPF JIT compiler
From: Daniel Borkmann @ 2017-08-22 15:25 UTC (permalink / raw)
To: Shubham Bansal, linux, davem
Cc: netdev, ast, linux-arm-kernel, linux-kernel, keescook
In-Reply-To: <599C48E7.3000409@iogearbox.net>
On 08/22/2017 05:08 PM, Daniel Borkmann wrote:
> On 08/22/2017 08:36 AM, Shubham Bansal wrote:
> [...]
>> +
>> +static int out_offset = -1; /* initialized on the first pass of build_body() */
>
> Hm, why is this a global var actually? There can be
> multiple parallel calls to bpf_int_jit_compile(), we
> don't take a global lock on this. Unless I'm missing
> something this should really reside in jit_ctx, no?
Hm, okay, it's for generating the out jmp offsets in
tail call emission which are supposed to always be the
same relative offsets; should be fine then.
> Given this is on emit_bpf_tail_call(), did you get
> tail calls working the way I suggested to test?
>
>> +static int emit_bpf_tail_call(struct jit_ctx *ctx)
>> {
> [...]
>> + const int idx0 = ctx->idx;
>> +#define cur_offset (ctx->idx - idx0)
>> +#define jmp_offset (out_offset - (cur_offset))
> [...]
>> +
>> + /* out: */
>> + if (out_offset == -1)
>> + out_offset = cur_offset;
>> + if (cur_offset != out_offset) {
>> + pr_err_once("tail_call out_offset = %d, expected %d!\n",
>> + cur_offset, out_offset);
>> + return -1;
>> + }
>> + return 0;
>> +#undef cur_offset
>> +#undef jmp_offset
>> }
^ permalink raw reply
* Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac
From: Chen-Yu Tsai @ 2017-08-22 15:39 UTC (permalink / raw)
To: Andrew Lunn
Cc: Maxime Ripard, Chen-Yu Tsai, Corentin Labbe, Rob Herring,
Mark Rutland, Russell King, Giuseppe Cavallaro, Alexandre Torgue,
devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20170821142321.GE1703-g2DYL2Zd6BY@public.gmane.org>
On Mon, Aug 21, 2017 at 10:23 PM, Andrew Lunn <andrew-g2DYL2Zd6BY@public.gmane.org> wrote:
>> All muxes are mostly always represented the same way afaik, or do you
>> want to simply introduce a new compatible / property?
>
> + mdio-mux {
> + compatible = "allwinner,sun8i-h3-mdio-switch";
> + mdio-parent-bus = <&mdio_parent>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + internal_mdio: mdio@1 {
> reg = <1>;
> - clocks = <&ccu CLK_BUS_EPHY>;
> - resets = <&ccu RST_BUS_EPHY>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + int_mii_phy: ethernet-phy@1 {
> + compatible = "ethernet-phy-ieee802.3-c22";
> + reg = <1>;
> + clocks = <&ccu CLK_BUS_EPHY>;
> + resets = <&ccu RST_BUS_EPHY>;
> + phy-is-integrated;
> + };
> + };
> + mdio: mdio@0 {
> + reg = <0>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> };
>
> Hi Maxim
>
> Anybody who knows the MDIO-mux code/binding, knows that it is a run
> time mux. You swap the mux per MDIO transaction. You can access all
> the PHY and switches on the mux'ed MDIO bus.
>
> However here, it is effectively a boot-time MUX. You cannot change it
> on the fly. What happens when somebody has a phandle to a PHY on the
> internal and a phandle to a phy on the external? Does the driver at
> least return -EINVAL, or -EBUSY? Is there a representation which
> eliminates this possibility?
There is only one controller. Either you use the internal PHY, which
is then directly coupled (no magnetics needed) to the RJ45 port, or
you use an external PHY over MII/RMII/RGMII. You could supposedly
have both on a board, and let the user choose one. But why bother
with the extra complexity and cost? Either you use the internal PHY
at 100M, or an external RGMII PHY for gigabit speeds.
So I think what you are saying is either impossible or engineering-wise
a very stupid design, like using an external MAC with a discrete PHY
connected to the internal MAC's MDIO bus, while using the internal MAC
with the internal PHY.
Now can we please decide on something? We're a week and a half from
the 4.13 release. If mdio-mux is wrong, then we could have two mdio
nodes (internal-mdio & external-mdio).
Regards
ChenYu
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH net-next] udp: remove unreachable ufo branches
From: Willem de Bruijn @ 2017-08-22 15:39 UTC (permalink / raw)
To: netdev; +Cc: davem, Willem de Bruijn
From: Willem de Bruijn <willemb@google.com>
Remove two references to ufo in the udp send path that are no longer
reachable now that ufo has been removed.
Commit 85f1bd9a7b5a ("udp: consistently apply ufo or fragmentation")
is a fix to ufo. It is safe to revert what remains of it.
Also, no skb can enter ip_append_page with skb_is_gso true now that
skb_shinfo(skb)->gso_type is no longer set in ip_append_page/_data.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
net/ipv4/ip_output.c | 12 ++++--------
net/ipv4/udp.c | 2 +-
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 73b0b15245b6..e8e675be60ec 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1223,15 +1223,11 @@ ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
cork->length += size;
while (size > 0) {
- if (skb_is_gso(skb)) {
- len = size;
- } else {
+ /* Check if the remaining data fits into current packet. */
+ len = mtu - skb->len;
+ if (len < size)
+ len = maxfraglen - skb->len;
- /* Check if the remaining data fits into current packet. */
- len = mtu - skb->len;
- if (len < size)
- len = maxfraglen - skb->len;
- }
if (len <= 0) {
struct sk_buff *skb_prev;
int alloclen;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 25fb14490d6a..bf6c406bf5e7 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -809,7 +809,7 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4)
if (is_udplite) /* UDP-Lite */
csum = udplite_csum(skb);
- else if (sk->sk_no_check_tx && !skb_is_gso(skb)) { /* UDP csum off */
+ else if (sk->sk_no_check_tx) { /* UDP csum off */
skb->ip_summed = CHECKSUM_NONE;
goto send;
--
2.14.1.480.gb18f417b89-goog
^ permalink raw reply related
* Re: [PATCH net-next 4/4] bpf/verifier: document liveness analysis
From: Alexei Starovoitov @ 2017-08-22 15:42 UTC (permalink / raw)
To: Edward Cree, davem, Alexei Starovoitov, Daniel Borkmann
Cc: netdev, iovisor-dev
In-Reply-To: <60838927-7286-60c9-ad69-3b97350a05a4@solarflare.com>
On 8/22/17 6:27 AM, Edward Cree wrote:
> The liveness tracking algorithm is quite subtle; add comments to explain it.
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
> ---
> include/linux/bpf_verifier.h | 13 +++++++++++++
> kernel/bpf/verifier.c | 28 +++++++++++++++++++++++++++-
> 2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index d8f131a..b8d200f 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -21,6 +21,19 @@
> */
> #define BPF_MAX_VAR_SIZ INT_MAX
>
> +/* Liveness marks, used for registers and spilled-regs (in stack slots).
> + * Read marks propagate upwards until they find a write mark; they record that
> + * "one of this state's descendants read this reg" (and therefore the reg is
> + * relevant for states_equal() checks).
> + * Write marks collect downwards and do not propagate; they record that "the
> + * straight-line code that reached this state (from its parent) wrote this reg"
> + * (and therefore that reads propagated from this state or its descendants
> + * should not propagate to its parent).
> + * A state with a write mark can receive read marks; it just won't propagate
> + * them to its parent, since the write mark is a property, not of the state,
> + * but of the link between it and its parent. See mark_reg_read() and
> + * mark_stack_slot_read() in kernel/bpf/verifier.c.
> + */
+1
> enum bpf_reg_liveness {
> REG_LIVE_NONE = 0, /* reg hasn't been read or written this branch */
> REG_LIVE_READ, /* reg was read, so we're sensitive to initial value */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 711bdbd..5fc350e 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3417,6 +3417,12 @@ static bool states_equal(struct bpf_verifier_env *env,
> return ret;
> }
>
> +/* A write screens off any subsequent reads; but write marks come from the
> + * straight-line code between a state and its parent. When we arrive at a
> + * jump target (in the first iteration of the propagate_liveness() loop),
> + * we didn't arrive by the straight-line code, so read marks in state must
> + * propagate to parent regardless of state's write marks.
> + */
+1
> static bool do_propagate_liveness(const struct bpf_verifier_state *state,
> struct bpf_verifier_state *parent)
> {
> @@ -3457,6 +3463,15 @@ static bool do_propagate_liveness(const struct bpf_verifier_state *state,
> return touched;
> }
>
> +/* "parent" is "a state from which we reach the current state", but initially
> + * it is not the state->parent (i.e. "the state whose straight-line code leads
> + * to the current state"), instead it is the state that happened to arrive at
> + * a (prunable) equivalent of the current state. See comment above
> + * do_propagate_liveness() for consequences of this.
> + * This function is just a more efficient way of calling mark_reg_read() or
> + * mark_stack_slot_read() on each reg in "parent" that is read in "state", so
> + * long as parent != state->parent.
> + */
i'm confused with 'so long as parent != state->parent' which implies
looping and multiple iterations, whereas 'parent != state->parent'
condition is true only for the first iteration of
'while (do_propagate_liveness(state, parent))' loop.
right ?
> static void propagate_liveness(const struct bpf_verifier_state *state,
> struct bpf_verifier_state *parent)
> {
> @@ -3485,6 +3500,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> /* reached equivalent register/stack state,
> * prune the search.
> * Registers read by the continuation are read by us.
> + * If we have any write marks in env->cur_state, they
> + * will prevent corresponding reads in the continuation
> + * from reaching our parent (an explored_state). Our
> + * own state will get the read marks recorded, but
> + * they'll be immediately forgotten as we're pruning
> + * this state and will pop a new one.
> */
+1
> propagate_liveness(&sl->state, &env->cur_state);
> return 1;
> @@ -3508,7 +3529,12 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
> env->explored_states[insn_idx] = new_sl;
> /* connect new state to parentage chain */
> env->cur_state.parent = &new_sl->state;
> - /* clear liveness marks in current state */
> + /* clear write marks in current state: the writes we did are not writes
> + * our child did, so they don't screen off its reads from us.
> + * (There are no read marks in current state, because reads always mark
> + * their parent and current state never has children yet. Only
> + * explored_states can get read marks.)
> + */
+1
> for (i = 0; i < BPF_REG_FP; i++)
> env->cur_state.regs[i].live = REG_LIVE_NONE;
> for (i = 0; i < MAX_BPF_STACK / BPF_REG_SIZE; i++)
>
^ permalink raw reply
* Re: [PATCHv3 net-next] gre: introduce native tunnel support for ERSPAN
From: William Tu @ 2017-08-22 15:44 UTC (permalink / raw)
To: David Miller
Cc: Linux Kernel Network Developers, Meenakshi Vohra,
Alexey Kuznetsov, Hideaki YOSHIFUJI
In-Reply-To: <20170821.110226.325505016722205417.davem@davemloft.net>
>> + struct metadata_dst *tun_dst = NULL;
>> + const struct iphdr *iph;
>> + struct erspanhdr *ershdr;
>> + __be32 index;
>> + __be32 session_id;
>> + int len;
>
> Please order local variables from longest to shortest line, ie. reverse
> christmas tree format.
>
thanks for the review, I will fix the ordering in next version.
>> +
>> + itn = net_generic(net, erspan_net_id);
>> + iph = ip_hdr(skb);
>> + len = iph->ihl * 4 + gre_hdr_len + sizeof(*ershdr);
>> +
>> + if (unlikely(!pskb_may_pull(skb, len)))
>> + return -ENOMEM;
>
> I think the len passed here is wrong, it should be
> "gre_hdr_len + sizeof(*ershdr)".
yes, thanks.
The skb->data at this point points to the beginning of GRE header. So
I should check pull of gre + erspan header.
Regards,
William
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox