* [pull request][net 0/3] Mellanox, mlx5 fixes 2018-05-10
From: Saeed Mahameed @ 2018-05-10 23:19 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Saeed Mahameed
Hi Dave,
the following series includes some fixes for mlx5 core driver.
Please pull and let me know if there's any problem.
For -stable v4.5
("net/mlx5: E-Switch, Include VF RDMA stats in vport statistics")
For -stable v4.10
("net/mlx5e: Err if asked to offload TC match on frag being first")
Thanks,
Saeed.
---
The following changes since commit ca3943c4aaff083bc25419f04e549e293590258e:
Merge tag 'linux-can-fixes-for-4.17-20180510' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can (2018-05-10 17:57:11 -0400)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2018-05-10
for you to fetch changes up to f85900c3e13fdb61f040c9feecbcda601e0cdcfb:
net/mlx5e: Err if asked to offload TC match on frag being first (2018-05-10 16:10:13 -0700)
----------------------------------------------------------------
mlx5-fixes-2018-05-10
----------------------------------------------------------------
Adi Nissim (1):
net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
Daniel Jurgens (1):
net/mlx5: Free IRQs in shutdown path
Roi Dayan (1):
net/mlx5e: Err if asked to offload TC match on frag being first
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 ++++
drivers/net/ethernet/mellanox/mlx5/core/eq.c | 28 ++++++++++++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 11 ++++++++-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 8 +++++++
.../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 2 ++
5 files changed, 52 insertions(+), 1 deletion(-)
^ permalink raw reply
* Re: [PATCH] coredump: rename umh_pipe_setup() to coredump_pipe_setup()
From: Alexei Starovoitov @ 2018-05-10 23:19 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: viro, ast, linux-fsdevel, linux-kernel, David S. Miller, netdev
In-Reply-To: <20180507233002.5571-1-mcgrof@kernel.org>
On Mon, May 07, 2018 at 04:30:02PM -0700, Luis R. Rodriguez wrote:
> This makes it clearer this code is part of the coredump code, and
> is not an exported generic helper from kernel/umh.c.
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> fs/coredump.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index 1e2c87acac9b..566504781683 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -508,7 +508,7 @@ static void wait_for_dump_helpers(struct file *file)
> }
>
> /*
> - * umh_pipe_setup
> + * coredump_pipe_setup
> * helper function to customize the process used
> * to collect the core in userspace. Specifically
> * it sets up a pipe and installs it as fd 0 (stdin)
> @@ -518,7 +518,7 @@ static void wait_for_dump_helpers(struct file *file)
> * is a special value that we use to trap recursive
> * core dumps
> */
> -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
> +static int coredump_pipe_setup(struct subprocess_info *info, struct cred *new)
I think this renaming makes sense.
How do we want to proceed?
I can take it as part of my series and get the whole thing through net-next
or folks want to apply this separately?
^ permalink raw reply
* [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
From: Paul Burton @ 2018-05-10 23:16 UTC (permalink / raw)
To: netdev; +Cc: linux-mips, David S . Miller, Andrew Lunn, Paul Burton
In-Reply-To: <20180510231657.28503-1-paul.burton@mips.com>
From: Andrew Lunn <andrew@lunn.ch>
On some boards, this PHY has a problem when it hibernates. Export this
function to a board can register a PHY fixup to disable hibernation.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-mips@linux-mips.org
Cc: netdev@vger.kernel.org
---
Changes in v6:
- New patch.
Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None
drivers/net/phy/at803x.c | 5 +++--
include/linux/at803x_phy.h | 16 ++++++++++++++++
2 files changed, 19 insertions(+), 2 deletions(-)
create mode 100644 include/linux/at803x_phy.h
diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index 411cf1072bae..5aede5708abf 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -18,6 +18,7 @@
#include <linux/etherdevice.h>
#include <linux/of_gpio.h>
#include <linux/gpio/consumer.h>
+#include <linux/at803x_phy.h>
#define AT803X_INTR_ENABLE 0x12
#define AT803X_INTR_ENABLE_AUTONEG_ERR BIT(15)
@@ -93,8 +94,8 @@ static int at803x_debug_reg_read(struct phy_device *phydev, u16 reg)
return phy_read(phydev, AT803X_DEBUG_DATA);
}
-static int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
- u16 clear, u16 set)
+int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
+ u16 clear, u16 set)
{
u16 val;
int ret;
diff --git a/include/linux/at803x_phy.h b/include/linux/at803x_phy.h
new file mode 100644
index 000000000000..2460c17d56ec
--- /dev/null
+++ b/include/linux/at803x_phy.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _AT803X_PHY_H
+#define _PHY_AT803X_PHY
+
+#define ATH8030_PHY_ID 0x004dd076
+#define ATH8031_PHY_ID 0x004dd074
+#define ATH8035_PHY_ID 0x004dd072
+#define AT803X_PHY_ID_MASK 0xffffffef
+
+#define AT8031_HIBERNATE 0x0B
+#define AT8031_PS_HIB_EN 0x8000 /* Hibernate enable */
+
+int at803x_debug_reg_mask(struct phy_device *phydev, u16 reg,
+ u16 clear, u16 set);
+
+#endif /* _AT803X_PHY_H */
--
2.17.0
^ permalink raw reply related
* [PATCH v6 0/6] net: pch_gbe: MIPS support
From: Paul Burton @ 2018-05-10 23:16 UTC (permalink / raw)
To: netdev; +Cc: linux-mips, David S . Miller, Andrew Lunn, Paul Burton
The Intel EG20T Platform Controller Hub is used on the MIPS Boston
development board to provide various peripherals including ethernet.
This series migrates the pch_gbe driver's PHY support to use phylib,
implements support for device tree which we use to provide the PHY reset
GPIO, and allows the driver to be built for MIPS.
Applies atop v4.17-rc4.
Please note that I don't have access to the Intel systems (eg.
MinnowBoard v1) that make use of this driver, so am unable to test on
those. If anyone with such a system could test the series that would be
much appreciated.
v6 of the series is later than I'd hoped, but we had a product
release[1] that kept me busy. My apologies!
The series is significantly different to earlier versions - Andrew did
the legwork of converting to phylib and that simplified things
significantly. v5 contained further fixes to the driver which I've
removed from v6 such that this series is just enough to get the driver
running on the MIPS Boston platform, despite a few bugs in the interest
of a simpler & more focused patch series. I'll submit those fixes
separately.
Thanks,
Paul
[1] https://www.mips.com/press/new-mips-i7200-processor-core-delivers-unmatched-performance-and-efficiency-for-advanced-lte5g-communications-and-networking-ic-designs/
Andrew Lunn (2):
net: phy: at803x: Export at803x_debug_reg_mask()
net: ethernet: pch_gbe: Convert to mdiobus and phylib
Paul Burton (4):
net: pch_gbe: Support DeviceTree for MDIO/PHY description
ptp: pch: Allow build on MIPS platforms
net: pch_gbe: Allow build on MIPS platforms
MIPS: Boston: Adjust DT for pch_gbe PHY support
arch/mips/boot/dts/img/boston.dts | 13 +-
drivers/net/ethernet/oki-semi/pch_gbe/Kconfig | 5 +-
.../net/ethernet/oki-semi/pch_gbe/Makefile | 2 +-
.../net/ethernet/oki-semi/pch_gbe/pch_gbe.h | 35 +-
.../ethernet/oki-semi/pch_gbe/pch_gbe_api.c | 118 ------
.../ethernet/oki-semi/pch_gbe/pch_gbe_api.h | 8 +-
.../oki-semi/pch_gbe/pch_gbe_ethtool.c | 89 +---
.../ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 382 +++++++++---------
.../ethernet/oki-semi/pch_gbe/pch_gbe_param.c | 265 ------------
.../ethernet/oki-semi/pch_gbe/pch_gbe_phy.c | 377 -----------------
.../ethernet/oki-semi/pch_gbe/pch_gbe_phy.h | 37 --
drivers/net/phy/at803x.c | 5 +-
drivers/ptp/Kconfig | 2 +-
include/linux/at803x_phy.h | 16 +
14 files changed, 249 insertions(+), 1105 deletions(-)
delete mode 100644 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_phy.c
delete mode 100644 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_phy.h
create mode 100644 include/linux/at803x_phy.h
--
2.17.0
^ permalink raw reply
* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Alexei Starovoitov @ 2018-05-10 23:16 UTC (permalink / raw)
To: Kees Cook
Cc: Luis R. Rodriguez, Alexei Starovoitov, David S. Miller,
Daniel Borkmann, Linus Torvalds, Greg KH, Andy Lutomirski,
Network Development, LKML, kernel-team, Al Viro, David Howells,
Mimi Zohar, Andrew Morton, Dominik Brodowski, Hugh Dickins,
Jani Nikula, Joonas Lahtinen, Rodrigo Vivi <rodrigo.
In-Reply-To: <CAGXu5j+1p5ccFBjcKa3FDR4cQnbQn3cokbbhrwizoUyAAqFSTQ@mail.gmail.com>
On Thu, May 10, 2018 at 03:27:24PM -0700, Kees Cook wrote:
> On Fri, May 4, 2018 at 12:56 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > What a mighty short list of reviewers. Adding some more. My review below.
> > I'd appreciate a Cc on future versions of these patches.
>
> Me too, please. And likely linux-security-module@ and Jessica too.
>
> > On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
> >> Introduce helper:
> >> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
> >> struct umh_info {
> >> struct file *pipe_to_umh;
> >> struct file *pipe_from_umh;
> >> pid_t pid;
> >> };
> >>
> >> that GPLed kernel modules (signed or unsigned) can use it to execute part
> >> of its own data as swappable user mode process.
> >>
> >> The kernel will do:
> >> - mount "tmpfs"
> >> - allocate a unique file in tmpfs
> >> - populate that file with [data, data + len] bytes
> >> - user-mode-helper code will do_execve that file and, before the process
> >> starts, the kernel will create two unix pipes for bidirectional
> >> communication between kernel module and umh
> >> - close tmpfs file, effectively deleting it
> >> - the fork_usermode_blob will return zero on success and populate
> >> 'struct umh_info' with two unix pipes and the pid of the user process
>
> I'm trying to think how LSMs can successfully reason about the
> resulting exec(). In the past, we've replaced "blob" style interfaces
> with file-based interfaces (e.g. init_module() -> finit_module(),
> kexec_load() -> kexec_file_load()) to better let the kernel understand
> the origin of executable content. Here the intent is fine: we're
> getting the exec from an already-loaded module, etc, etc. I'm trying
> to think specifically about the interface.
>
> How can the ultimate exec get tied back to the kernel module in a way
> that the LSM can query? Right now the hooks hit during exec are:
> kernel_read_file() and kernel_post_read_file() of tmpfs file,
> bprm_set_creds(), bprm_check(), bprm_commiting_creds(),
> bprm_commited_creds(). It seems silly to me for an LSM to perform
> these checks at all since I would expect the _meaningful_ check to be
> finit_module() of the module itself. Having a way for an LSM to know
> the exec is tied to a kernel module would let them skip the nonsense
> checks.
>
> Since the process for doing the usermode_blob is defined by the kernel
> module build/link/objcopy process, could we tighten the
> fork_usermode_blob() interface to point to the kernel module itself,
> rather than leaving it an open-ended "blob" interface? Given our
> history of needing to replace blob interfaces with file interfaces,
> I'm cautious to add a new blob interface. Maybe just pull all the
> blob-finding/loading into the interface, and just make it something
> like fork_usermode_kmod(struct module *mod, struct umh_info *info) ?
I don't think it will work, since Andy and others pointed out that
bpfilter needs to work as builtin as well. There is no 'struct module'
in such case, but fork-ing of the user process still needs to happen.
^ permalink raw reply
* [PATCH net 5/5] rxrpc: Trace UDP transmission failure
From: David Howells @ 2018-05-10 22:45 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <152599231687.26376.15020977491573449830.stgit@warthog.procyon.org.uk>
Add a tracepoint to log transmission failure from the UDP transport socket
being used by AF_RXRPC.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/trace/events/rxrpc.h | 55 ++++++++++++++++++++++++++++++++++++++++++
net/rxrpc/conn_event.c | 11 ++++++--
net/rxrpc/local_event.c | 3 ++
net/rxrpc/output.c | 23 ++++++++++++++++--
net/rxrpc/rxkad.c | 6 +++--
5 files changed, 90 insertions(+), 8 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 497d0b67f421..077e664ac9a2 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -211,6 +211,20 @@ enum rxrpc_congest_change {
rxrpc_cong_saw_nack,
};
+enum rxrpc_tx_fail_trace {
+ rxrpc_tx_fail_call_abort,
+ rxrpc_tx_fail_call_ack,
+ rxrpc_tx_fail_call_data_frag,
+ rxrpc_tx_fail_call_data_nofrag,
+ rxrpc_tx_fail_call_final_resend,
+ rxrpc_tx_fail_conn_abort,
+ rxrpc_tx_fail_conn_challenge,
+ rxrpc_tx_fail_conn_response,
+ rxrpc_tx_fail_reject,
+ rxrpc_tx_fail_version_keepalive,
+ rxrpc_tx_fail_version_reply,
+};
+
#endif /* end __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY */
/*
@@ -438,6 +452,19 @@ enum rxrpc_congest_change {
EM(RXRPC_CALL_LOCAL_ERROR, "LocalError") \
E_(RXRPC_CALL_NETWORK_ERROR, "NetError")
+#define rxrpc_tx_fail_traces \
+ EM(rxrpc_tx_fail_call_abort, "CallAbort") \
+ EM(rxrpc_tx_fail_call_ack, "CallAck") \
+ EM(rxrpc_tx_fail_call_data_frag, "CallDataFrag") \
+ EM(rxrpc_tx_fail_call_data_nofrag, "CallDataNofrag") \
+ EM(rxrpc_tx_fail_call_final_resend, "CallFinalResend") \
+ EM(rxrpc_tx_fail_conn_abort, "ConnAbort") \
+ EM(rxrpc_tx_fail_conn_challenge, "ConnChall") \
+ EM(rxrpc_tx_fail_conn_response, "ConnResp") \
+ EM(rxrpc_tx_fail_reject, "Reject") \
+ EM(rxrpc_tx_fail_version_keepalive, "VerKeepalive") \
+ E_(rxrpc_tx_fail_version_reply, "VerReply")
+
/*
* Export enum symbols via userspace.
*/
@@ -461,6 +488,7 @@ rxrpc_propose_ack_traces;
rxrpc_propose_ack_outcomes;
rxrpc_congest_modes;
rxrpc_congest_changes;
+rxrpc_tx_fail_traces;
/*
* Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -1404,6 +1432,33 @@ TRACE_EVENT(rxrpc_rx_icmp,
&__entry->srx.transport)
);
+TRACE_EVENT(rxrpc_tx_fail,
+ TP_PROTO(unsigned int debug_id, rxrpc_serial_t serial, int ret,
+ enum rxrpc_tx_fail_trace what),
+
+ TP_ARGS(debug_id, serial, ret, what),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, debug_id )
+ __field(rxrpc_serial_t, serial )
+ __field(int, ret )
+ __field(enum rxrpc_tx_fail_trace, what )
+ ),
+
+ TP_fast_assign(
+ __entry->debug_id = debug_id;
+ __entry->serial = serial;
+ __entry->ret = ret;
+ __entry->what = what;
+ ),
+
+ TP_printk("c=%08x r=%x ret=%d %s",
+ __entry->debug_id,
+ __entry->serial,
+ __entry->ret,
+ __print_symbolic(__entry->what, rxrpc_tx_fail_traces))
+ );
+
#endif /* _TRACE_RXRPC_H */
/* This part must be outside protection */
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index c717152070df..1350f1be8037 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -40,7 +40,7 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
} __attribute__((packed)) pkt;
struct rxrpc_ackinfo ack_info;
size_t len;
- int ioc;
+ int ret, ioc;
u32 serial, mtu, call_id, padding;
_enter("%d", conn->debug_id);
@@ -135,10 +135,13 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
break;
}
- kernel_sendmsg(conn->params.local->socket, &msg, iov, ioc, len);
+ ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, ioc, len);
conn->params.peer->last_tx_at = ktime_get_real();
+ if (ret < 0)
+ trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
+ rxrpc_tx_fail_call_final_resend);
+
_leave("");
- return;
}
/*
@@ -236,6 +239,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn,
ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, 2, len);
if (ret < 0) {
+ trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
+ rxrpc_tx_fail_conn_abort);
_debug("sendmsg failed: %d", ret);
return -EAGAIN;
}
diff --git a/net/rxrpc/local_event.c b/net/rxrpc/local_event.c
index 93b5d910b4a1..8325f1b86840 100644
--- a/net/rxrpc/local_event.c
+++ b/net/rxrpc/local_event.c
@@ -71,7 +71,8 @@ static void rxrpc_send_version_request(struct rxrpc_local *local,
ret = kernel_sendmsg(local->socket, &msg, iov, 2, len);
if (ret < 0)
- _debug("sendmsg failed: %d", ret);
+ trace_rxrpc_tx_fail(local->debug_id, 0, ret,
+ rxrpc_tx_fail_version_reply);
_leave("");
}
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 6b9d27f0d7ec..f03de1c59ba3 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -210,6 +210,9 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, bool ping,
if (ping)
call->ping_time = now;
conn->params.peer->last_tx_at = ktime_get_real();
+ if (ret < 0)
+ trace_rxrpc_tx_fail(call->debug_id, serial, ret,
+ rxrpc_tx_fail_call_ack);
if (call->state < RXRPC_CALL_COMPLETE) {
if (ret < 0) {
@@ -294,6 +297,10 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call)
ret = kernel_sendmsg(conn->params.local->socket,
&msg, iov, 1, sizeof(pkt));
conn->params.peer->last_tx_at = ktime_get_real();
+ if (ret < 0)
+ trace_rxrpc_tx_fail(call->debug_id, serial, ret,
+ rxrpc_tx_fail_call_abort);
+
rxrpc_put_connection(conn);
return ret;
@@ -387,6 +394,9 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb,
conn->params.peer->last_tx_at = ktime_get_real();
up_read(&conn->params.local->defrag_sem);
+ if (ret < 0)
+ trace_rxrpc_tx_fail(call->debug_id, serial, ret,
+ rxrpc_tx_fail_call_data_nofrag);
if (ret == -EMSGSIZE)
goto send_fragmentable;
@@ -476,6 +486,10 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb,
#endif
}
+ if (ret < 0)
+ trace_rxrpc_tx_fail(call->debug_id, serial, ret,
+ rxrpc_tx_fail_call_data_frag);
+
up_write(&conn->params.local->defrag_sem);
goto done;
}
@@ -493,6 +507,7 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
struct kvec iov[2];
size_t size;
__be32 code;
+ int ret;
_enter("%d", local->debug_id);
@@ -527,7 +542,10 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
whdr.flags ^= RXRPC_CLIENT_INITIATED;
whdr.flags &= RXRPC_CLIENT_INITIATED;
- kernel_sendmsg(local->socket, &msg, iov, 2, size);
+ ret = kernel_sendmsg(local->socket, &msg, iov, 2, size);
+ if (ret < 0)
+ trace_rxrpc_tx_fail(local->debug_id, 0, ret,
+ rxrpc_tx_fail_reject);
}
rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
@@ -578,7 +596,8 @@ void rxrpc_send_keepalive(struct rxrpc_peer *peer)
ret = kernel_sendmsg(peer->local->socket, &msg, iov, 2, len);
if (ret < 0)
- _debug("sendmsg failed: %d", ret);
+ trace_rxrpc_tx_fail(peer->debug_id, 0, ret,
+ rxrpc_tx_fail_version_keepalive);
peer->last_tx_at = ktime_get_real();
_leave("");
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 588fea0dd362..6c0ae27fff84 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -664,7 +664,8 @@ static int rxkad_issue_challenge(struct rxrpc_connection *conn)
ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, 2, len);
if (ret < 0) {
- _debug("sendmsg failed: %d", ret);
+ trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
+ rxrpc_tx_fail_conn_challenge);
return -EAGAIN;
}
@@ -719,7 +720,8 @@ static int rxkad_send_response(struct rxrpc_connection *conn,
ret = kernel_sendmsg(conn->params.local->socket, &msg, iov, 3, len);
if (ret < 0) {
- _debug("sendmsg failed: %d", ret);
+ trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
+ rxrpc_tx_fail_conn_response);
return -EAGAIN;
}
^ permalink raw reply related
* [PATCH net 4/5] rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
From: David Howells @ 2018-05-10 22:45 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <152599231687.26376.15020977491573449830.stgit@warthog.procyon.org.uk>
Add a tracepoint to log received ICMP/ICMP6 events and other error
messages.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/trace/events/rxrpc.h | 30 +++++++++++++++++++++++++++
net/rxrpc/peer_event.c | 46 +++++++++++++++++++++---------------------
2 files changed, 53 insertions(+), 23 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 9e96c2fe2793..497d0b67f421 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -15,6 +15,7 @@
#define _TRACE_RXRPC_H
#include <linux/tracepoint.h>
+#include <linux/errqueue.h>
/*
* Define enums for tracing information.
@@ -1374,6 +1375,35 @@ TRACE_EVENT(rxrpc_resend,
__entry->anno)
);
+TRACE_EVENT(rxrpc_rx_icmp,
+ TP_PROTO(struct rxrpc_peer *peer, struct sock_extended_err *ee,
+ struct sockaddr_rxrpc *srx),
+
+ TP_ARGS(peer, ee, srx),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, peer )
+ __field_struct(struct sock_extended_err, ee )
+ __field_struct(struct sockaddr_rxrpc, srx )
+ ),
+
+ TP_fast_assign(
+ __entry->peer = peer->debug_id;
+ memcpy(&__entry->ee, ee, sizeof(__entry->ee));
+ memcpy(&__entry->srx, srx, sizeof(__entry->srx));
+ ),
+
+ TP_printk("P=%08x o=%u t=%u c=%u i=%u d=%u e=%d %pISp",
+ __entry->peer,
+ __entry->ee.ee_origin,
+ __entry->ee.ee_type,
+ __entry->ee.ee_code,
+ __entry->ee.ee_info,
+ __entry->ee.ee_data,
+ __entry->ee.ee_errno,
+ &__entry->srx.transport)
+ );
+
#endif /* _TRACE_RXRPC_H */
/* This part must be outside protection */
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 78c2f95d1f22..0ed8b651cec2 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -28,39 +28,39 @@ static void rxrpc_store_error(struct rxrpc_peer *, struct sock_exterr_skb *);
* Find the peer associated with an ICMP packet.
*/
static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
- const struct sk_buff *skb)
+ const struct sk_buff *skb,
+ struct sockaddr_rxrpc *srx)
{
struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
- struct sockaddr_rxrpc srx;
_enter("");
- memset(&srx, 0, sizeof(srx));
- srx.transport_type = local->srx.transport_type;
- srx.transport_len = local->srx.transport_len;
- srx.transport.family = local->srx.transport.family;
+ memset(srx, 0, sizeof(*srx));
+ srx->transport_type = local->srx.transport_type;
+ srx->transport_len = local->srx.transport_len;
+ srx->transport.family = local->srx.transport.family;
/* Can we see an ICMP4 packet on an ICMP6 listening socket? and vice
* versa?
*/
- switch (srx.transport.family) {
+ switch (srx->transport.family) {
case AF_INET:
- srx.transport.sin.sin_port = serr->port;
+ srx->transport.sin.sin_port = serr->port;
switch (serr->ee.ee_origin) {
case SO_EE_ORIGIN_ICMP:
_net("Rx ICMP");
- memcpy(&srx.transport.sin.sin_addr,
+ memcpy(&srx->transport.sin.sin_addr,
skb_network_header(skb) + serr->addr_offset,
sizeof(struct in_addr));
break;
case SO_EE_ORIGIN_ICMP6:
_net("Rx ICMP6 on v4 sock");
- memcpy(&srx.transport.sin.sin_addr,
+ memcpy(&srx->transport.sin.sin_addr,
skb_network_header(skb) + serr->addr_offset + 12,
sizeof(struct in_addr));
break;
default:
- memcpy(&srx.transport.sin.sin_addr, &ip_hdr(skb)->saddr,
+ memcpy(&srx->transport.sin.sin_addr, &ip_hdr(skb)->saddr,
sizeof(struct in_addr));
break;
}
@@ -68,25 +68,25 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
#ifdef CONFIG_AF_RXRPC_IPV6
case AF_INET6:
- srx.transport.sin6.sin6_port = serr->port;
+ srx->transport.sin6.sin6_port = serr->port;
switch (serr->ee.ee_origin) {
case SO_EE_ORIGIN_ICMP6:
_net("Rx ICMP6");
- memcpy(&srx.transport.sin6.sin6_addr,
+ memcpy(&srx->transport.sin6.sin6_addr,
skb_network_header(skb) + serr->addr_offset,
sizeof(struct in6_addr));
break;
case SO_EE_ORIGIN_ICMP:
_net("Rx ICMP on v6 sock");
- srx.transport.sin6.sin6_addr.s6_addr32[0] = 0;
- srx.transport.sin6.sin6_addr.s6_addr32[1] = 0;
- srx.transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
- memcpy(srx.transport.sin6.sin6_addr.s6_addr + 12,
+ srx->transport.sin6.sin6_addr.s6_addr32[0] = 0;
+ srx->transport.sin6.sin6_addr.s6_addr32[1] = 0;
+ srx->transport.sin6.sin6_addr.s6_addr32[2] = htonl(0xffff);
+ memcpy(srx->transport.sin6.sin6_addr.s6_addr + 12,
skb_network_header(skb) + serr->addr_offset,
sizeof(struct in_addr));
break;
default:
- memcpy(&srx.transport.sin6.sin6_addr,
+ memcpy(&srx->transport.sin6.sin6_addr,
&ipv6_hdr(skb)->saddr,
sizeof(struct in6_addr));
break;
@@ -98,7 +98,7 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
BUG();
}
- return rxrpc_lookup_peer_rcu(local, &srx);
+ return rxrpc_lookup_peer_rcu(local, srx);
}
/*
@@ -146,6 +146,7 @@ static void rxrpc_adjust_mtu(struct rxrpc_peer *peer, struct sock_exterr_skb *se
void rxrpc_error_report(struct sock *sk)
{
struct sock_exterr_skb *serr;
+ struct sockaddr_rxrpc srx;
struct rxrpc_local *local = sk->sk_user_data;
struct rxrpc_peer *peer;
struct sk_buff *skb;
@@ -166,7 +167,7 @@ void rxrpc_error_report(struct sock *sk)
}
rcu_read_lock();
- peer = rxrpc_lookup_peer_icmp_rcu(local, skb);
+ peer = rxrpc_lookup_peer_icmp_rcu(local, skb, &srx);
if (peer && !rxrpc_get_peer_maybe(peer))
peer = NULL;
if (!peer) {
@@ -176,6 +177,8 @@ void rxrpc_error_report(struct sock *sk)
return;
}
+ trace_rxrpc_rx_icmp(peer, &serr->ee, &srx);
+
if ((serr->ee.ee_origin == SO_EE_ORIGIN_ICMP &&
serr->ee.ee_type == ICMP_DEST_UNREACH &&
serr->ee.ee_code == ICMP_FRAG_NEEDED)) {
@@ -209,9 +212,6 @@ static void rxrpc_store_error(struct rxrpc_peer *peer,
ee = &serr->ee;
- _net("Rx Error o=%d t=%d c=%d e=%d",
- ee->ee_origin, ee->ee_type, ee->ee_code, ee->ee_errno);
-
err = ee->ee_errno;
switch (ee->ee_origin) {
^ permalink raw reply related
* [PATCH net 3/5] rxrpc: Fix the min security level for kernel calls
From: David Howells @ 2018-05-10 22:45 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <152599231687.26376.15020977491573449830.stgit@warthog.procyon.org.uk>
Fix the kernel call initiation to set the minimum security level for kernel
initiated calls (such as from kAFS) from the sockopt value.
Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info")
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/af_rxrpc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 9a2c8e7c000e..2b463047dd7b 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -313,7 +313,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
memset(&cp, 0, sizeof(cp));
cp.local = rx->local;
cp.key = key;
- cp.security_level = 0;
+ cp.security_level = rx->min_sec_level;
cp.exclusive = false;
cp.upgrade = upgrade;
cp.service_id = srx->srx_service;
^ permalink raw reply related
* [PATCH net 2/5] rxrpc: Fix error reception on AF_INET6 sockets
From: David Howells @ 2018-05-10 22:45 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <152599231687.26376.15020977491573449830.stgit@warthog.procyon.org.uk>
AF_RXRPC tries to turn on IP_RECVERR and IP_MTU_DISCOVER on the UDP socket
it just opened for communications with the outside world, regardless of the
type of socket. Unfortunately, this doesn't work with an AF_INET6 socket.
Fix this by turning on IPV6_RECVERR and IPV6_MTU_DISCOVER instead if the
socket is of the AF_INET6 family.
Without this, kAFS server and address rotation doesn't work correctly
because the algorithm doesn't detect received network errors.
Fixes: 75b54cb57ca3 ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/local_object.c | 57 ++++++++++++++++++++++++++++++++++------------
1 file changed, 42 insertions(+), 15 deletions(-)
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 8b54e9531d52..b493e6b62740 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -134,22 +134,49 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net)
}
}
- /* we want to receive ICMP errors */
- opt = 1;
- ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR,
- (char *) &opt, sizeof(opt));
- if (ret < 0) {
- _debug("setsockopt failed");
- goto error;
- }
+ switch (local->srx.transport.family) {
+ case AF_INET:
+ /* we want to receive ICMP errors */
+ opt = 1;
+ ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR,
+ (char *) &opt, sizeof(opt));
+ if (ret < 0) {
+ _debug("setsockopt failed");
+ goto error;
+ }
- /* we want to set the don't fragment bit */
- opt = IP_PMTUDISC_DO;
- ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER,
- (char *) &opt, sizeof(opt));
- if (ret < 0) {
- _debug("setsockopt failed");
- goto error;
+ /* we want to set the don't fragment bit */
+ opt = IP_PMTUDISC_DO;
+ ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER,
+ (char *) &opt, sizeof(opt));
+ if (ret < 0) {
+ _debug("setsockopt failed");
+ goto error;
+ }
+ break;
+
+ case AF_INET6:
+ /* we want to receive ICMP errors */
+ opt = 1;
+ ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_RECVERR,
+ (char *) &opt, sizeof(opt));
+ if (ret < 0) {
+ _debug("setsockopt failed");
+ goto error;
+ }
+
+ /* we want to set the don't fragment bit */
+ opt = IPV6_PMTUDISC_DO;
+ ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_MTU_DISCOVER,
+ (char *) &opt, sizeof(opt));
+ if (ret < 0) {
+ _debug("setsockopt failed");
+ goto error;
+ }
+ break;
+
+ default:
+ BUG();
}
/* set the socket up */
^ permalink raw reply related
* [PATCH net 1/5] rxrpc: Fix missing start of call timeout
From: David Howells @ 2018-05-10 22:45 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <152599231687.26376.15020977491573449830.stgit@warthog.procyon.org.uk>
The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point. This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet. Without this, the call may wait forever if
the server doesn't deign to reply.
Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet. The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.
Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/input.c | 2 +-
net/rxrpc/output.c | 11 +++++++++++
net/rxrpc/sendmsg.c | 10 ++++++++++
4 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 90d7079e0aa9..19975d2ca9a2 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -476,6 +476,7 @@ enum rxrpc_call_flag {
RXRPC_CALL_SEND_PING, /* A ping will need to be sent */
RXRPC_CALL_PINGING, /* Ping in process */
RXRPC_CALL_RETRANS_TIMEOUT, /* Retransmission due to timeout occurred */
+ RXRPC_CALL_BEGAN_RX_TIMER, /* We began the expect_rx_by timer */
};
/*
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 0410d2277ca2..b5fd6381313d 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -971,7 +971,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
if (timo) {
unsigned long now = jiffies, expect_rx_by;
- expect_rx_by = jiffies + timo;
+ expect_rx_by = now + timo;
WRITE_ONCE(call->expect_rx_by, expect_rx_by);
rxrpc_reduce_call_timer(call, expect_rx_by, now,
rxrpc_timer_set_for_normal);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 7f1fc04775b3..6b9d27f0d7ec 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -414,6 +414,17 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb,
rxrpc_timer_set_for_lost_ack);
}
}
+
+ if (sp->hdr.seq == 1 &&
+ !test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER,
+ &call->flags)) {
+ unsigned long nowj = jiffies, expect_rx_by;
+
+ expect_rx_by = nowj + call->next_rx_timo;
+ WRITE_ONCE(call->expect_rx_by, expect_rx_by);
+ rxrpc_reduce_call_timer(call, expect_rx_by, nowj,
+ rxrpc_timer_set_for_normal);
+ }
}
rxrpc_set_keepalive(call);
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 206e802ccbdc..be01f9c5d963 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -223,6 +223,15 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
ret = rxrpc_send_data_packet(call, skb, false);
if (ret < 0) {
+ switch (ret) {
+ case -ENETUNREACH:
+ case -EHOSTUNREACH:
+ case -ECONNREFUSED:
+ rxrpc_set_call_completion(call,
+ RXRPC_CALL_LOCAL_ERROR,
+ 0, ret);
+ goto out;
+ }
_debug("need instant resend %d", ret);
rxrpc_instant_resend(call, ix);
} else {
@@ -241,6 +250,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
rxrpc_timer_set_for_send);
}
+out:
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
_leave("");
}
^ permalink raw reply related
* [PATCH net 0/5] rxrpc: Fixes
From: David Howells @ 2018-05-10 22:45 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
Here are three fixes for AF_RXRPC and two tracepoints that were useful for
finding them:
(1) Fix missing start of expect-Rx-by timeout on initial packet
transmission so that calls will time out if the peer doesn't respond.
(2) Fix error reception on AF_INET6 sockets by using the correct family of
sockopts on the UDP transport socket.
(3) Fix setting the minimum security level on kernel calls so that they
can be encrypted.
(4) Add a tracepoint to log ICMP/ICMP6 and other error reports from the
transport socket.
(5) Add a tracepoint to log UDP sendmsg failure so that we can find out if
transmission failure occurred on the UDP socket.
The patches are tagged here:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20180510
and can also be found on the following branch:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
David
---
David Howells (5):
rxrpc: Fix missing start of call timeout
rxrpc: Fix error reception on AF_INET6 sockets
rxrpc: Fix the min security level for kernel calls
rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
rxrpc: Trace UDP transmission failure
include/trace/events/rxrpc.h | 85 ++++++++++++++++++++++++++++++++++++++++++
net/rxrpc/af_rxrpc.c | 2 -
net/rxrpc/ar-internal.h | 1
net/rxrpc/conn_event.c | 11 ++++-
net/rxrpc/input.c | 2 -
net/rxrpc/local_event.c | 3 +
net/rxrpc/local_object.c | 57 +++++++++++++++++++++-------
net/rxrpc/output.c | 34 ++++++++++++++++-
net/rxrpc/peer_event.c | 46 +++++++++++------------
net/rxrpc/rxkad.c | 6 ++-
net/rxrpc/sendmsg.c | 10 +++++
11 files changed, 209 insertions(+), 48 deletions(-)
^ permalink raw reply
* Re: [PATCH v6 0/5] PCI: Improve PCIe link status reporting
From: Bjorn Helgaas @ 2018-05-10 22:29 UTC (permalink / raw)
To: Jeff Kirsher, Ganesh Goudar, Michael Chan, Ariel Elior
Cc: linux-pci, everest-linux-l2, intel-wired-lan, netdev,
linux-kernel, Tal Gilboa, Tariq Toukan, Jacob Keller,
Jakub Kicinski
In-Reply-To: <152537719056.62474.2571390812509425478.stgit@bhelgaas-glaptop.roam.corp.google.com>
On Thu, May 03, 2018 at 03:00:07PM -0500, Bjorn Helgaas wrote:
> This is based on Tal's recent work to unify the approach for reporting PCIe
> link speed/width and whether the device is being limited by a slower
> upstream link.
>
> The new pcie_print_link_status() interface appeared in v4.17-rc1; see
> 9e506a7b5147 ("PCI: Add pcie_print_link_status() to log link speed and
> whether it's limited").
>
> That's a good way to replace use of pcie_get_minimum_link(), which gives
> misleading results when a path contains both a fast, narrow link and a
> slow, wide link: it reports the equivalent of a slow, narrow link.
>
> This series removes the remaining uses of pcie_get_minimum_link() and then
> removes the interface itself. I'd like to merge them all through the PCI
> tree to make the removal easy.
>
> This does change the dmesg reporting of link speeds, and in the ixgbe case,
> it changes the reporting from KERN_WARN level to KERN_INFO. If that's an
> issue, let's talk about it. I'm hoping the reduce code size, improved
> functionality, and consistency across drivers is enough to make this
> worthwhile.
>
> ---
>
> Bjorn Helgaas (5):
> bnx2x: Report PCIe link properties with pcie_print_link_status()
> bnxt_en: Report PCIe link properties with pcie_print_link_status()
> cxgb4: Report PCIe link properties with pcie_print_link_status()
> ixgbe: Report PCIe link properties with pcie_print_link_status()
> PCI: Remove unused pcie_get_minimum_link()
Jeff has acked the ixgbe patch.
Any comments on the bnx2x, bnxt_en, or cxgb4 patches?
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 23 ++-----
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 ------
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 75 ----------------------
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 47 --------------
> drivers/pci/pci.c | 43 -------------
> include/linux/pci.h | 2 -
> 6 files changed, 9 insertions(+), 200 deletions(-)
^ permalink raw reply
* Re: [PATCH] bpf, doc: clarification for the meaning of 'id'
From: Daniel Borkmann @ 2018-05-10 22:29 UTC (permalink / raw)
To: Wang YanQing, ast, corbet, netdev, linux-kernel, linux-doc
In-Reply-To: <20180510030921.GA25941@udknight>
On 05/10/2018 05:09 AM, Wang YanQing wrote:
> For me, as a reader whose mother language isn't English, the
> old words bring a little difficulty to catch the meaning, this
> patch rewords the subsection in a more clarificatory way.
>
> This patch also add blank lines as separator at two places
> to improve readability.
>
> Signed-off-by: Wang YanQing <udknight@gmail.com>
Applied to bpf-next, thanks Wang!
^ permalink raw reply
* Re: [PATCH bpf] tools: bpf: handle NULL return in bpf_prog_load_xattr()
From: Daniel Borkmann @ 2018-05-10 22:28 UTC (permalink / raw)
To: Jakub Kicinski, alexei.starovoitov; +Cc: oss-drivers, netdev
In-Reply-To: <20180510170934.15283-1-jakub.kicinski@netronome.com>
On 05/10/2018 07:09 PM, Jakub Kicinski wrote:
> bpf_object__open() can return error pointer as well as NULL.
> Fix error handling in bpf_prog_load_xattr() (and indirectly
> bpf_prog_load()).
>
> Fixes: 6f6d33f3b3d0 ("bpf: selftests add sockmap tests")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Applied to bpf tree, thanks Jakub!
^ permalink raw reply
* Re: [PATCH v2 net-next 1/4] umh: introduce fork_usermode_blob() helper
From: Kees Cook @ 2018-05-10 22:27 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
Linus Torvalds, Greg KH, Andy Lutomirski, Network Development,
LKML, kernel-team, Al Viro, David Howells, Mimi Zohar,
Andrew Morton, Dominik Brodowski, Hugh Dickins, Jani Nikula,
Joonas Lahtinen, Rodrigo Vivi, David Airlie
In-Reply-To: <20180504195642.GB12838@wotan.suse.de>
On Fri, May 4, 2018 at 12:56 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> What a mighty short list of reviewers. Adding some more. My review below.
> I'd appreciate a Cc on future versions of these patches.
Me too, please. And likely linux-security-module@ and Jessica too.
> On Wed, May 02, 2018 at 09:36:01PM -0700, Alexei Starovoitov wrote:
>> Introduce helper:
>> int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
>> struct umh_info {
>> struct file *pipe_to_umh;
>> struct file *pipe_from_umh;
>> pid_t pid;
>> };
>>
>> that GPLed kernel modules (signed or unsigned) can use it to execute part
>> of its own data as swappable user mode process.
>>
>> The kernel will do:
>> - mount "tmpfs"
>> - allocate a unique file in tmpfs
>> - populate that file with [data, data + len] bytes
>> - user-mode-helper code will do_execve that file and, before the process
>> starts, the kernel will create two unix pipes for bidirectional
>> communication between kernel module and umh
>> - close tmpfs file, effectively deleting it
>> - the fork_usermode_blob will return zero on success and populate
>> 'struct umh_info' with two unix pipes and the pid of the user process
I'm trying to think how LSMs can successfully reason about the
resulting exec(). In the past, we've replaced "blob" style interfaces
with file-based interfaces (e.g. init_module() -> finit_module(),
kexec_load() -> kexec_file_load()) to better let the kernel understand
the origin of executable content. Here the intent is fine: we're
getting the exec from an already-loaded module, etc, etc. I'm trying
to think specifically about the interface.
How can the ultimate exec get tied back to the kernel module in a way
that the LSM can query? Right now the hooks hit during exec are:
kernel_read_file() and kernel_post_read_file() of tmpfs file,
bprm_set_creds(), bprm_check(), bprm_commiting_creds(),
bprm_commited_creds(). It seems silly to me for an LSM to perform
these checks at all since I would expect the _meaningful_ check to be
finit_module() of the module itself. Having a way for an LSM to know
the exec is tied to a kernel module would let them skip the nonsense
checks.
Since the process for doing the usermode_blob is defined by the kernel
module build/link/objcopy process, could we tighten the
fork_usermode_blob() interface to point to the kernel module itself,
rather than leaving it an open-ended "blob" interface? Given our
history of needing to replace blob interfaces with file interfaces,
I'm cautious to add a new blob interface. Maybe just pull all the
blob-finding/loading into the interface, and just make it something
like fork_usermode_kmod(struct module *mod, struct umh_info *info) ?
-Kees
--
Kees Cook
Pixel Security
^ permalink raw reply
* [PATCH bpf-next] selftests/bpf: Fix bash reference in Makefile
From: Joe Stringer @ 2018-05-10 22:26 UTC (permalink / raw)
To: daniel; +Cc: netdev
'|& ...' is a bash 4.0+ construct which is not guaranteed to be available
when using '$(shell ...)' in a Makefile. Fall back to the more portable
'2>&1 | ...'.
Fixes the following warning during compilation:
/bin/sh: 1: Syntax error: "&" unexpected
Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
tools/testing/selftests/bpf/Makefile | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 9d762184b805..79d29d6cc719 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -90,9 +90,9 @@ CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
$(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline
$(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline
-BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help |& grep dwarfris)
-BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help |& grep BTF)
-BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --version |& grep LLVM)
+BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
+BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
+BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --version 2>&1 | grep LLVM)
ifneq ($(BTF_LLC_PROBE),)
ifneq ($(BTF_PAHOLE_PROBE),)
--
2.14.1
^ permalink raw reply related
* [PATCH net v2] rps: Correct wrong skb_flow_limit check when enable RPS
From: gfree.wind @ 2018-05-10 22:09 UTC (permalink / raw)
To: davem, daniel, edumazet, willemb, jakub.kicinski, ktkhai, ast,
linux, john.fastabend, brouer, dsahern, netdev
Cc: Gao Feng
From: Gao Feng <gfree.wind@vip.163.com>
The skb flow limit is implemented for each CPU independently. In the
current codes, the function skb_flow_limit gets the softnet_data by
this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
the current cpu when enable RPS. As the result, the skb_flow_limit checks
the stats of current CPU, while the skb is going to append the queue of
another CPU. It isn't the expected behavior.
Now pass the softnet_data as a param to make consistent.
Fixes: 99bbc7074190 ("rps: selective flow shedding during softnet overflow")
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
---
v2: Add Fixes tag per Eric, and enhance the commit log
v1: intial version
net/core/dev.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index af0558b..0f98eff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3883,18 +3883,15 @@ static int rps_ipi_queued(struct softnet_data *sd)
int netdev_flow_limit_table_len __read_mostly = (1 << 12);
#endif
-static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
+static bool skb_flow_limit(struct softnet_data *sd, struct sk_buff *skb, unsigned int qlen)
{
#ifdef CONFIG_NET_FLOW_LIMIT
struct sd_flow_limit *fl;
- struct softnet_data *sd;
unsigned int old_flow, new_flow;
if (qlen < (netdev_max_backlog >> 1))
return false;
- sd = this_cpu_ptr(&softnet_data);
-
rcu_read_lock();
fl = rcu_dereference(sd->flow_limit);
if (fl) {
@@ -3938,7 +3935,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
if (!netif_running(skb->dev))
goto drop;
qlen = skb_queue_len(&sd->input_pkt_queue);
- if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
+ if (qlen <= netdev_max_backlog && !skb_flow_limit(sd, skb, qlen)) {
if (qlen) {
enqueue:
__skb_queue_tail(&sd->input_pkt_queue, skb);
--
1.9.1
^ permalink raw reply related
* Re: [PATCH net-next] tcp: switch pacing timer to softirq based hrtimer
From: David Miller @ 2018-05-10 22:00 UTC (permalink / raw)
To: eric.dumazet; +Cc: edumazet, netdev
In-Reply-To: <604c253d-a6d9-99ff-317b-b2ca2ffff54a@gmail.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 10 May 2018 13:55:00 -0700
>
>
> On 05/10/2018 12:49 PM, Eric Dumazet wrote:
>> linux-4.16 got support for softirq based hrtimers.
>> TCP can switch its pacing hrtimer to this variant, since this
>> avoids going through a tasklet and some atomic operations.
>>
>
> I need to send a V2, adding a test of hrtimer_cancel() return value
> in tcp_clear_xmit_timers() to eventually release the socket reference.
Ok.
^ permalink raw reply
* [PATCH v2 net-next] tcp: switch pacing timer to softirq based hrtimer
From: Eric Dumazet @ 2018-05-10 21:59 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet
linux-4.16 got support for softirq based hrtimers.
TCP can switch its pacing hrtimer to this variant, since this
avoids going through a tasklet and some atomic operations.
pacing timer logic looks like other (jiffies based) tcp timers.
v2: use hrtimer_try_to_cancel() in tcp_clear_xmit_timers()
to correctly release reference on socket if needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/tcp.h | 4 ++-
net/ipv4/tcp_output.c | 69 ++++++++++++++++---------------------------
net/ipv4/tcp_timer.c | 2 +-
3 files changed, 29 insertions(+), 46 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index cf803fe0fb86b6a0fb1876a9f775a9c6e6a28ac4..3b1d617b01109b133b4ecafa9ee46173851083f8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -557,7 +557,9 @@ void tcp_fin(struct sock *sk);
void tcp_init_xmit_timers(struct sock *);
static inline void tcp_clear_xmit_timers(struct sock *sk)
{
- hrtimer_cancel(&tcp_sk(sk)->pacing_timer);
+ if (hrtimer_try_to_cancel(&tcp_sk(sk)->pacing_timer) == 1)
+ sock_put(sk);
+
inet_csk_clear_xmit_timers(sk);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d07c0dcc99aaa55c4da963599c8286c8baa1f783..0d8f950a9006598c70dbf51e281a3fe32dfaa234 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -772,7 +772,7 @@ struct tsq_tasklet {
};
static DEFINE_PER_CPU(struct tsq_tasklet, tsq_tasklet);
-static void tcp_tsq_handler(struct sock *sk)
+static void tcp_tsq_write(struct sock *sk)
{
if ((1 << sk->sk_state) &
(TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_CLOSING |
@@ -789,6 +789,16 @@ static void tcp_tsq_handler(struct sock *sk)
0, GFP_ATOMIC);
}
}
+
+static void tcp_tsq_handler(struct sock *sk)
+{
+ bh_lock_sock(sk);
+ if (!sock_owned_by_user(sk))
+ tcp_tsq_write(sk);
+ else if (!test_and_set_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags))
+ sock_hold(sk);
+ bh_unlock_sock(sk);
+}
/*
* One tasklet per cpu tries to send more skbs.
* We run in tasklet context but need to disable irqs when
@@ -816,16 +826,7 @@ static void tcp_tasklet_func(unsigned long data)
smp_mb__before_atomic();
clear_bit(TSQ_QUEUED, &sk->sk_tsq_flags);
- if (!sk->sk_lock.owned &&
- test_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags)) {
- bh_lock_sock(sk);
- if (!sock_owned_by_user(sk)) {
- clear_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags);
- tcp_tsq_handler(sk);
- }
- bh_unlock_sock(sk);
- }
-
+ tcp_tsq_handler(sk);
sk_free(sk);
}
}
@@ -853,9 +854,10 @@ void tcp_release_cb(struct sock *sk)
nflags = flags & ~TCP_DEFERRED_ALL;
} while (cmpxchg(&sk->sk_tsq_flags, flags, nflags) != flags);
- if (flags & TCPF_TSQ_DEFERRED)
- tcp_tsq_handler(sk);
-
+ if (flags & TCPF_TSQ_DEFERRED) {
+ tcp_tsq_write(sk);
+ __sock_put(sk);
+ }
/* Here begins the tricky part :
* We are called from release_sock() with :
* 1) BH disabled
@@ -929,7 +931,7 @@ void tcp_wfree(struct sk_buff *skb)
if (!(oval & TSQF_THROTTLED) || (oval & TSQF_QUEUED))
goto out;
- nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCPF_TSQ_DEFERRED;
+ nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED;
nval = cmpxchg(&sk->sk_tsq_flags, oval, nval);
if (nval != oval)
continue;
@@ -948,37 +950,17 @@ void tcp_wfree(struct sk_buff *skb)
sk_free(sk);
}
-/* Note: Called under hard irq.
- * We can not call TCP stack right away.
+/* Note: Called under soft irq.
+ * We can call TCP stack right away, unless socket is owned by user.
*/
enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
{
struct tcp_sock *tp = container_of(timer, struct tcp_sock, pacing_timer);
struct sock *sk = (struct sock *)tp;
- unsigned long nval, oval;
- for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
- struct tsq_tasklet *tsq;
- bool empty;
+ tcp_tsq_handler(sk);
+ sock_put(sk);
- if (oval & TSQF_QUEUED)
- break;
-
- nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED | TCPF_TSQ_DEFERRED;
- nval = cmpxchg(&sk->sk_tsq_flags, oval, nval);
- if (nval != oval)
- continue;
-
- if (!refcount_inc_not_zero(&sk->sk_wmem_alloc))
- break;
- /* queue this socket to tasklet queue */
- tsq = this_cpu_ptr(&tsq_tasklet);
- empty = list_empty(&tsq->head);
- list_add(&tp->tsq_node, &tsq->head);
- if (empty)
- tasklet_schedule(&tsq->tasklet);
- break;
- }
return HRTIMER_NORESTART;
}
@@ -1011,7 +993,8 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
do_div(len_ns, rate);
hrtimer_start(&tcp_sk(sk)->pacing_timer,
ktime_add_ns(ktime_get(), len_ns),
- HRTIMER_MODE_ABS_PINNED);
+ HRTIMER_MODE_ABS_PINNED_SOFT);
+ sock_hold(sk);
}
static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
@@ -1078,7 +1061,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
/* if no packet is in qdisc/device queue, then allow XPS to select
* another queue. We can be called from tcp_tsq_handler()
- * which holds one reference to sk_wmem_alloc.
+ * which holds one reference to sk.
*
* TODO: Ideally, in-flight pure ACK packets should not matter here.
* One way to get this would be to set skb->truesize = 2 on them.
@@ -2185,7 +2168,7 @@ static int tcp_mtu_probe(struct sock *sk)
static bool tcp_pacing_check(const struct sock *sk)
{
return tcp_needs_internal_pacing(sk) &&
- hrtimer_active(&tcp_sk(sk)->pacing_timer);
+ hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
}
/* TCP Small Queues :
@@ -2365,8 +2348,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
skb, limit, mss_now, gfp)))
break;
- if (test_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags))
- clear_bit(TCP_TSQ_DEFERRED, &sk->sk_tsq_flags);
if (tcp_small_queue_check(sk, skb, 0))
break;
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index f7d944855f8ebd0a312fe73a53a56ab8d451ee44..92bdf64fffae3a5be291ca419eb21276b4c8cbae 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -713,6 +713,6 @@ void tcp_init_xmit_timers(struct sock *sk)
inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
&tcp_keepalive_timer);
hrtimer_init(&tcp_sk(sk)->pacing_timer, CLOCK_MONOTONIC,
- HRTIMER_MODE_ABS_PINNED);
+ HRTIMER_MODE_ABS_PINNED_SOFT);
tcp_sk(sk)->pacing_timer.function = tcp_pace_kick;
}
--
2.17.0.441.gb46fe60e1d-goog
^ permalink raw reply related
* Re: pull-request: can 2018-05-10
From: David Miller @ 2018-05-10 21:58 UTC (permalink / raw)
To: mkl; +Cc: netdev, linux-can, kernel
In-Reply-To: <20180510164749.20481-1-mkl@pengutronix.de>
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Thu, 10 May 2018 18:47:47 +0200
> this is a pull request for net/master consisting of 2 patches.
>
> Both patches are from Lukas Wunner and fix two problems found in the hi311x CAN
> driver under high load situations.
Applied.
^ permalink raw reply
* Re: [PATCH] qed: fix spelling mistake: "taskelt" -> "tasklet"
From: David Miller @ 2018-05-10 21:56 UTC (permalink / raw)
To: colin.king
Cc: Ariel.Elior, everest-linux-l2, netdev, kernel-janitors,
linux-kernel
In-Reply-To: <20180510140327.15991-1-colin.king@canonical.com>
From: Colin King <colin.king@canonical.com>
Date: Thu, 10 May 2018 15:03:27 +0100
> From: Colin Ian King <colin.king@canonical.com>
>
> Trivial fix to spelling mistake in DP_VERBOSE message text
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next] rocker: Postpone filtering of !added_by_user FDB
From: David Miller @ 2018-05-10 21:55 UTC (permalink / raw)
To: petrm; +Cc: netdev, jiri, vivien.didelot
In-Reply-To: <6fbb39711485d7abbe7cdaa8042a222b9588a6b6.1525957759.git.petrm@mellanox.com>
From: Petr Machata <petrm@mellanox.com>
Date: Thu, 10 May 2018 15:29:46 +0200
> Breaking out of the switch in rocker_switchdev_event() still ends up
> scheduling work, except an ill-defined one. This leads to an OOPS cited
> below. Fix by postponing the check until rocker_switchdev_event_work().
...
> Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications")
> Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> Signed-off-by: Petr Machata <petrm@mellanox.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next] tls: Fix tls_device initialization
From: David Miller @ 2018-05-10 21:54 UTC (permalink / raw)
To: borisp; +Cc: netdev, davejwatson
In-Reply-To: <1525958845-72260-1-git-send-email-borisp@mellanox.com>
From: Boris Pismenny <borisp@mellanox.com>
Date: Thu, 10 May 2018 16:27:25 +0300
> Add sg table initialization to fix a BUG_ON encountered when enabling
> CONFIG_DEBUG_SG.
>
> Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Applied.
^ permalink raw reply
* Re: [PATCH][next] net: aquantia: fix unsigned numvecs comparison with less than zero
From: David Miller @ 2018-05-10 21:53 UTC (permalink / raw)
To: colin.king
Cc: igor.russkikh, pavel.belous, weiyongjun1, netdev, kernel-janitors,
linux-kernel
In-Reply-To: <20180510125201.19707-1-colin.king@canonical.com>
From: Colin King <colin.king@canonical.com>
Date: Thu, 10 May 2018 13:52:01 +0100
> From: Colin Ian King <colin.king@canonical.com>
>
> The comparison of numvecs < 0 is always false because numvecs is a u32
> and hence the error return from a failed call to pci_alloc_irq_vectores
> is never detected. Fix this by using the signed int ret to handle the
> error return and assign numvecs to err.
>
> Detected by CoverityScan, CID#1468650 ("Unsigned compared against 0")
>
> Fixes: a09bd81b5413 ("net: aquantia: Limit number of vectors to actually allocated irqs")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
This doesn't apply to net-next.
^ permalink raw reply
* Re: [PATCH net-next] cxgb4: fix the wrong conversion of Mbps to Kbps
From: David Miller @ 2018-05-10 21:52 UTC (permalink / raw)
To: ganeshgr; +Cc: netdev, nirranjan, indranil, venkatesh
In-Reply-To: <1525948643-13034-1-git-send-email-ganeshgr@chelsio.com>
From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Thu, 10 May 2018 16:07:23 +0530
> fix the wrong conversion where 1 Mbps was converted to
> 1024 Kbps.
>
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Applied, thanks.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox