* [PATCH net-next] selftests: use posix-style redirection in ip_defrag.sh
From: Paolo Abeni @ 2018-10-11 9:17 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Author : Peter Oskolkov
The ip_defrag.sh script requires bash-style output redirection but
use the default shell. This may cause random failures if the default
shell is not bash.
Address the above using posix compliant output redirection.
Fixes: 02c7f38b7ace ("selftests/net: add ip_defrag selftest")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
tools/testing/selftests/net/ip_defrag.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/ip_defrag.sh b/tools/testing/selftests/net/ip_defrag.sh
index 3a4042d918f0..f34672796044 100755
--- a/tools/testing/selftests/net/ip_defrag.sh
+++ b/tools/testing/selftests/net/ip_defrag.sh
@@ -11,10 +11,10 @@ readonly NETNS="ns-$(mktemp -u XXXXXX)"
setup() {
ip netns add "${NETNS}"
ip -netns "${NETNS}" link set lo up
- ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_high_thresh=9000000 &> /dev/null
- ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_low_thresh=7000000 &> /dev/null
- ip netns exec "${NETNS}" sysctl -w net.ipv6.ip6frag_high_thresh=9000000 &> /dev/null
- ip netns exec "${NETNS}" sysctl -w net.ipv6.ip6frag_low_thresh=7000000 &> /dev/null
+ ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_high_thresh=9000000 >/dev/null 2>&1
+ ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_low_thresh=7000000 >/dev/null 2>&1
+ ip netns exec "${NETNS}" sysctl -w net.ipv6.ip6frag_high_thresh=9000000 >/dev/null 2>&1
+ ip netns exec "${NETNS}" sysctl -w net.ipv6.ip6frag_low_thresh=7000000 >/dev/null 2>&1
}
cleanup() {
--
2.17.2
^ permalink raw reply related
* [PATCH net-next 0/3] veth: XDP stats improvement
From: Toshiaki Makita @ 2018-10-11 9:36 UTC (permalink / raw)
To: David S. Miller; +Cc: Toshiaki Makita, netdev, Jesper Dangaard Brouer
ndo_xdp_xmit in veth did not update packet counters as described in [1].
Also, current implementation only updates counters on tx side so rx side
events like XDP_DROP were not collected.
This series implements the missing accounting as well as support for
ethtool per-queue stats in veth.
Patch 1: Update drop counter in ndo_xdp_xmit.
Patch 2: Update packet and byte counters for all XDP path, and drop
counter on XDP_DROP.
Patch 3: Support per-queue ethtool stats for XDP counters.
Note that counters are maintained on per-queue basis for XDP but not
otherwise (per-cpu and atomic as before). This is because 1) tx path in
veth is essentially lockless so we cannot update per-queue stats on tx,
and 2) rx path is net core routine (process_backlog) which cannot update
per-queue based stats when XDP is disabled. On the other hand there are
real rxqs and napi handlers for veth XDP, so update per-queue stats on
rx for XDP packets, and use them to calculate tx counters as well,
contrary to the existing non-XDP counters.
[1] https://patchwork.ozlabs.org/cover/953071/#1967449
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Toshiaki Makita (3):
veth: Account for packet drops in ndo_xdp_xmit
veth: Account for XDP packet statistics on rx side
veth: Add ethtool statistics support for XDP
drivers/net/veth.c | 175 ++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 147 insertions(+), 28 deletions(-)
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next 1/3] veth: Account for packet drops in ndo_xdp_xmit
From: Toshiaki Makita @ 2018-10-11 9:36 UTC (permalink / raw)
To: David S. Miller; +Cc: Toshiaki Makita, netdev, Jesper Dangaard Brouer
In-Reply-To: <1539250610-2557-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>
Use existing atomic drop counter. Since drop path is really an
exceptional case here, I'm thinking atomic ops would not hurt the
performance.
XDP packets and bytes are not counted in ndo_xdp_xmit, but will be
accounted on rx side by the following commit.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
drivers/net/veth.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 224c56a..452193f2 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -308,16 +308,20 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
{
struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
struct net_device *rcv;
+ int i, ret, drops = n;
unsigned int max_len;
struct veth_rq *rq;
- int i, drops = 0;
- if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
- return -EINVAL;
+ if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK)) {
+ ret = -EINVAL;
+ goto drop;
+ }
rcv = rcu_dereference(priv->peer);
- if (unlikely(!rcv))
- return -ENXIO;
+ if (unlikely(!rcv)) {
+ ret = -ENXIO;
+ goto drop;
+ }
rcv_priv = netdev_priv(rcv);
rq = &rcv_priv->rq[veth_select_rxq(rcv)];
@@ -325,9 +329,12 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
* side. This means an XDP program is loaded on the peer and the peer
* device is up.
*/
- if (!rcu_access_pointer(rq->xdp_prog))
- return -ENXIO;
+ if (!rcu_access_pointer(rq->xdp_prog)) {
+ ret = -ENXIO;
+ goto drop;
+ }
+ drops = 0;
max_len = rcv->mtu + rcv->hard_header_len + VLAN_HLEN;
spin_lock(&rq->xdp_ring.producer_lock);
@@ -346,7 +353,14 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
if (flags & XDP_XMIT_FLUSH)
__veth_xdp_flush(rq);
- return n - drops;
+ if (likely(!drops))
+ return n;
+
+ ret = n - drops;
+drop:
+ atomic64_add(drops, &priv->dropped);
+
+ return ret;
}
static void veth_xdp_flush(struct net_device *dev)
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 2/3] veth: Account for XDP packet statistics on rx side
From: Toshiaki Makita @ 2018-10-11 9:36 UTC (permalink / raw)
To: David S. Miller; +Cc: Toshiaki Makita, netdev, Jesper Dangaard Brouer
In-Reply-To: <1539250610-2557-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>
On XDP path veth has napi handler so we can collect statistics on
per-queue basis for XDP.
By this change now we can collect XDP_DROP drop count as well as packets
and bytes coming through ndo_xdp_xmit. Packet counters shown by
"ip -s link", sysfs stats or /proc/net/dev is now correct for XDP.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
drivers/net/veth.c | 97 ++++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 79 insertions(+), 18 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 452193f2..68bb93d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -37,11 +37,19 @@
#define VETH_XDP_TX BIT(0)
#define VETH_XDP_REDIR BIT(1)
+struct veth_rq_stats {
+ u64 xdp_packets;
+ u64 xdp_bytes;
+ u64 xdp_drops;
+ struct u64_stats_sync syncp;
+};
+
struct veth_rq {
struct napi_struct xdp_napi;
struct net_device *dev;
struct bpf_prog __rcu *xdp_prog;
struct xdp_mem_info xdp_mem;
+ struct veth_rq_stats stats;
bool rx_notify_masked;
struct ptr_ring xdp_ring;
struct xdp_rxq_info xdp_rxq;
@@ -211,12 +219,14 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
skb_tx_timestamp(skb);
if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) {
- struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
+ if (!rcv_xdp) {
+ struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
- u64_stats_update_begin(&stats->syncp);
- stats->bytes += length;
- stats->packets++;
- u64_stats_update_end(&stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ stats->bytes += length;
+ stats->packets++;
+ u64_stats_update_end(&stats->syncp);
+ }
} else {
drop:
atomic64_inc(&priv->dropped);
@@ -230,7 +240,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}
-static u64 veth_stats_one(struct pcpu_lstats *result, struct net_device *dev)
+static u64 veth_stats_tx(struct pcpu_lstats *result, struct net_device *dev)
{
struct veth_priv *priv = netdev_priv(dev);
int cpu;
@@ -253,23 +263,58 @@ static u64 veth_stats_one(struct pcpu_lstats *result, struct net_device *dev)
return atomic64_read(&priv->dropped);
}
+static void veth_stats_rx(struct veth_rq_stats *result, struct net_device *dev)
+{
+ struct veth_priv *priv = netdev_priv(dev);
+ int i;
+
+ result->xdp_packets = 0;
+ result->xdp_bytes = 0;
+ result->xdp_drops = 0;
+ for (i = 0; i < dev->num_rx_queues; i++) {
+ struct veth_rq_stats *stats = &priv->rq[i].stats;
+ u64 packets, bytes, drops;
+ unsigned int start;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&stats->syncp);
+ packets = stats->xdp_packets;
+ bytes = stats->xdp_bytes;
+ drops = stats->xdp_drops;
+ } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+ result->xdp_packets += packets;
+ result->xdp_bytes += bytes;
+ result->xdp_drops += drops;
+ }
+}
+
static void veth_get_stats64(struct net_device *dev,
struct rtnl_link_stats64 *tot)
{
struct veth_priv *priv = netdev_priv(dev);
struct net_device *peer;
- struct pcpu_lstats one;
+ struct veth_rq_stats rx;
+ struct pcpu_lstats tx;
+
+ tot->tx_dropped = veth_stats_tx(&tx, dev);
+ tot->tx_bytes = tx.bytes;
+ tot->tx_packets = tx.packets;
- tot->tx_dropped = veth_stats_one(&one, dev);
- tot->tx_bytes = one.bytes;
- tot->tx_packets = one.packets;
+ veth_stats_rx(&rx, dev);
+ tot->rx_dropped = rx.xdp_drops;
+ tot->rx_bytes = rx.xdp_bytes;
+ tot->rx_packets = rx.xdp_packets;
rcu_read_lock();
peer = rcu_dereference(priv->peer);
if (peer) {
- tot->rx_dropped = veth_stats_one(&one, peer);
- tot->rx_bytes = one.bytes;
- tot->rx_packets = one.packets;
+ tot->rx_dropped += veth_stats_tx(&tx, peer);
+ tot->rx_bytes += tx.bytes;
+ tot->rx_packets += tx.packets;
+
+ veth_stats_rx(&rx, peer);
+ tot->tx_bytes += rx.xdp_bytes;
+ tot->tx_packets += rx.xdp_packets;
}
rcu_read_unlock();
}
@@ -609,28 +654,42 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq, struct sk_buff *skb,
static int veth_xdp_rcv(struct veth_rq *rq, int budget, unsigned int *xdp_xmit)
{
- int i, done = 0;
+ int i, done = 0, drops = 0, bytes = 0;
for (i = 0; i < budget; i++) {
void *ptr = __ptr_ring_consume(&rq->xdp_ring);
+ unsigned int xdp_xmit_one = 0;
struct sk_buff *skb;
if (!ptr)
break;
if (veth_is_xdp_frame(ptr)) {
- skb = veth_xdp_rcv_one(rq, veth_ptr_to_xdp(ptr),
- xdp_xmit);
+ struct xdp_frame *frame = veth_ptr_to_xdp(ptr);
+
+ bytes += frame->len;
+ skb = veth_xdp_rcv_one(rq, frame, &xdp_xmit_one);
} else {
- skb = veth_xdp_rcv_skb(rq, ptr, xdp_xmit);
+ skb = ptr;
+ bytes += skb->len;
+ skb = veth_xdp_rcv_skb(rq, skb, &xdp_xmit_one);
}
+ *xdp_xmit |= xdp_xmit_one;
if (skb)
napi_gro_receive(&rq->xdp_napi, skb);
+ else if (!xdp_xmit_one)
+ drops++;
done++;
}
+ u64_stats_update_begin(&rq->stats.syncp);
+ rq->stats.xdp_packets += done;
+ rq->stats.xdp_bytes += bytes;
+ rq->stats.xdp_drops += drops;
+ u64_stats_update_end(&rq->stats.syncp);
+
return done;
}
@@ -821,8 +880,10 @@ static int veth_alloc_queues(struct net_device *dev)
if (!priv->rq)
return -ENOMEM;
- for (i = 0; i < dev->num_rx_queues; i++)
+ for (i = 0; i < dev->num_rx_queues; i++) {
priv->rq[i].dev = dev;
+ u64_stats_init(&priv->rq[i].stats.syncp);
+ }
return 0;
}
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH net-next v2] net: mscc: allow extracting the FCS into the skb
From: David Miller @ 2018-10-11 17:05 UTC (permalink / raw)
To: antoine.tenart
Cc: netdev, linux-kernel, thomas.petazzoni, alexandre.belloni,
quentin.schulz, allan.nielsen, f.fainelli, andrew
In-Reply-To: <20181011071224.28646-1-antoine.tenart@bootlin.com>
From: Antoine Tenart <antoine.tenart@bootlin.com>
Date: Thu, 11 Oct 2018 09:12:24 +0200
> This patch adds support for the NETIF_F_RXFCS feature in the Mscc
> Ethernet driver. This feature is disabled by default and allow a user
> to request the driver not to drop the FCS and to extract it into the skb
> for debugging purposes.
>
> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
> ---
>
> Since v1:
> - Rebased on top of the latest net-next.
Applied.
^ permalink raw reply
* [PATCH net-next 3/3] veth: Add ethtool statistics support for XDP
From: Toshiaki Makita @ 2018-10-11 9:36 UTC (permalink / raw)
To: David S. Miller; +Cc: Toshiaki Makita, netdev, Jesper Dangaard Brouer
In-Reply-To: <1539250610-2557-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>
Expose per-queue stats for ethtool -S.
As there are only rx queues, and rx queues are used only when XDP is
used, per-queue counters are only rx XDP ones.
Example:
$ ethtool -S veth0
NIC statistics:
peer_ifindex: 11
rx_queue_0_xdp_packets: 28601434
rx_queue_0_xdp_bytes: 1716086040
rx_queue_0_xdp_drops: 28601434
rx_queue_1_xdp_packets: 17873050
rx_queue_1_xdp_bytes: 1072383000
rx_queue_1_xdp_drops: 17873050
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
drivers/net/veth.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 68bb93d..890fa5b 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -67,6 +67,21 @@ struct veth_priv {
* ethtool interface
*/
+struct veth_q_stat_desc {
+ char desc[ETH_GSTRING_LEN];
+ size_t offset;
+};
+
+#define VETH_RQ_STAT(m) offsetof(struct veth_rq_stats, m)
+
+static const struct veth_q_stat_desc veth_rq_stats_desc[] = {
+ { "xdp_packets", VETH_RQ_STAT(xdp_packets) },
+ { "xdp_bytes", VETH_RQ_STAT(xdp_bytes) },
+ { "xdp_drops", VETH_RQ_STAT(xdp_drops) },
+};
+
+#define VETH_RQ_STATS_LEN ARRAY_SIZE(veth_rq_stats_desc)
+
static struct {
const char string[ETH_GSTRING_LEN];
} ethtool_stats_keys[] = {
@@ -91,9 +106,20 @@ static void veth_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *inf
static void veth_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
{
+ char *p = (char *)buf;
+ int i, j;
+
switch(stringset) {
case ETH_SS_STATS:
- memcpy(buf, ðtool_stats_keys, sizeof(ethtool_stats_keys));
+ memcpy(p, ðtool_stats_keys, sizeof(ethtool_stats_keys));
+ p += sizeof(ethtool_stats_keys);
+ for (i = 0; i < dev->real_num_rx_queues; i++) {
+ for (j = 0; j < VETH_RQ_STATS_LEN; j++) {
+ snprintf(p, ETH_GSTRING_LEN, "rx_queue_%u_%s",
+ i, veth_rq_stats_desc[j].desc);
+ p += ETH_GSTRING_LEN;
+ }
+ }
break;
}
}
@@ -102,7 +128,8 @@ static int veth_get_sset_count(struct net_device *dev, int sset)
{
switch (sset) {
case ETH_SS_STATS:
- return ARRAY_SIZE(ethtool_stats_keys);
+ return ARRAY_SIZE(ethtool_stats_keys) +
+ VETH_RQ_STATS_LEN * dev->real_num_rx_queues;
default:
return -EOPNOTSUPP;
}
@@ -113,8 +140,25 @@ static void veth_get_ethtool_stats(struct net_device *dev,
{
struct veth_priv *priv = netdev_priv(dev);
struct net_device *peer = rtnl_dereference(priv->peer);
+ int i, j, idx;
data[0] = peer ? peer->ifindex : 0;
+ idx = 1;
+ for (i = 0; i < dev->real_num_rx_queues; i++) {
+ const struct veth_rq_stats *rq_stats = &priv->rq[i].stats;
+ const void *stats_base = (void *)rq_stats;
+ unsigned int start;
+ size_t offset;
+
+ do {
+ start = u64_stats_fetch_begin_irq(&rq_stats->syncp);
+ for (j = 0; j < VETH_RQ_STATS_LEN; j++) {
+ offset = veth_rq_stats_desc[j].offset;
+ data[idx + j] = *(u64 *)(stats_base + offset);
+ }
+ } while (u64_stats_fetch_retry_irq(&rq_stats->syncp, start));
+ idx += VETH_RQ_STATS_LEN;
+ }
}
static int veth_get_ts_info(struct net_device *dev,
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH] r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips
From: Heiner Kallweit @ 2018-10-11 17:12 UTC (permalink / raw)
To: Maciej S. Szmigiero, David S. Miller, Chris Clayton
Cc: Azat Khuzhin, Realtek linux nic maintainers,
netdev@vger.kernel.org, linux-kernel
In-Reply-To: <73ee1d68-2a27-aee9-d8b5-79ba3eda8187@maciej.szmigiero.name>
On 11.10.2018 16:02, Maciej S. Szmigiero wrote:
> It has been reported that since
> commit 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
> at least RTL_GIGA_MAC_VER_38 NICs work erratically after a resume from
> suspend.
> The problem has been traced to a missing RX_MULTI_EN bit in the RxConfig
> register.
> We already set this bit for RTL_GIGA_MAC_VER_35 NICs of the same 8168F
> chip family so let's do it also for its other siblings: RTL_GIGA_MAC_VER_36
> and RTL_GIGA_MAC_VER_38.
>
> Curiously, the NIC seems to work fine after a system boot without having
> this bit set as long as the system isn't suspended and resumed.
>
> Fixes: 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
> ---
> drivers/net/ethernet/realtek/r8169.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 7d3f671e1bb3..b68e32186d67 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -4269,8 +4269,8 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
> RTL_W32(tp, RxConfig, RX_FIFO_THRESH | RX_DMA_BURST);
> break;
> case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
> - case RTL_GIGA_MAC_VER_34:
> - case RTL_GIGA_MAC_VER_35:
> + case RTL_GIGA_MAC_VER_34 ... RTL_GIGA_MAC_VER_36:
> + case RTL_GIGA_MAC_VER_38:
> RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
> break;
> case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>
Patch should have been flagged as "net". Apart from that:
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
^ permalink raw reply
* Re: [PATCH net-next V3] virtio_net: ethtool tx napi configuration
From: Jason Wang @ 2018-10-11 9:47 UTC (permalink / raw)
To: David Miller; +Cc: mst, virtualization, netdev, linux-kernel, willemb
In-Reply-To: <20181010.223458.915344551656019248.davem@davemloft.net>
On 2018年10月11日 13:34, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Tue, 9 Oct 2018 10:06:26 +0800
>
>> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
>> Interrupt moderation is currently not supported, so these accept and
>> display the default settings of 0 usec and 1 frame.
>>
>> Toggle tx napi through setting tx-frames. So as to not interfere
>> with possible future interrupt moderation, value 1 means tx napi while
>> value 0 means not.
>>
>> Only allow the switching when device is down for simplicity.
>>
>> Link: https://patchwork.ozlabs.org/patch/948149/
>> Suggested-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> Changes from V2:
>> - only allow the switching when device is done
>> - remove unnecessary global variable and initialization
>> Changes from V1:
>> - try to synchronize with datapath to allow changing mode when
>> interface is up.
>> - use tx-frames 0 as to disable tx napi while tx-frames 1 to enable tx napi
> Applied, with...
>
>> + bool running = netif_running(dev);
> this unused variable removed.
My bad, thanks for the fixup.
^ permalink raw reply
* Re: [PATCH net-next] cxgb4: fix thermal configuration dependencies
From: David Miller @ 2018-10-11 17:21 UTC (permalink / raw)
To: arnd
Cc: ganeshgr, rahul.lakkireddy, kumaras, rajur, leedom, arjun,
atul.gupta, netdev, linux-kernel
In-Reply-To: <20181011085835.2567446-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 11 Oct 2018 10:57:57 +0200
> With CONFIG_THERMAL=m, we get a build error:
>
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c: In function 'cxgb4_thermal_get_trip_type':
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.c:48:11: error: 'struct adapter' has no member named 'ch_thermal'
>
> Once that is fixed by using IS_ENABLED() checks, we get a link error
> against the thermal subsystem when cxgb4 is built-in:
>
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o: In function `cxgb4_thermal_init':
> cxgb4_thermal.c:(.text+0x180): undefined reference to `thermal_zone_device_register'
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_thermal.o: In function `cxgb4_thermal_remove':
> cxgb4_thermal.c:(.text+0x1e0): undefined reference to `thermal_zone_device_unregister'
>
> Finally, since CONFIG_THERMAL can be =m, the Makefile fails to pick up the
> extra file into built-in.a, and we get another link failure against the
> cxgb4_thermal_init/cxgb4_thermal_remove files, so the Makefile has to
> be adapted as well to work for both CONFIG_THERMAL=y and =m.
>
> Fixes: b18719157762 ("cxgb4: Add thermal zone support")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied, thanks Arnd.
^ permalink raw reply
* Re: [PATCH net] tipc: eliminate possible recursive locking detected by LOCKDEP
From: David Miller @ 2018-10-11 17:26 UTC (permalink / raw)
To: ying.xue
Cc: jon.maloy, dvyukov, parthasarathy.bhuvaragan, netdev,
linux-kernel, tipc-discussion
In-Reply-To: <1539259076-8562-1-git-send-email-ying.xue@windriver.com>
From: Ying Xue <ying.xue@windriver.com>
Date: Thu, 11 Oct 2018 19:57:56 +0800
> When booting kernel with LOCKDEP option, below warning info was found:
>
> WARNING: possible recursive locking detected
> 4.19.0-rc7+ #14 Not tainted
...
> The reason why the noise above was complained by LOCKDEP is because we
> nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset
> function. In fact it's unnecessary to move skb buffer from l->wakeupq
> queue to l->inputq queue while holding the two locks at the same time.
> Instead, we can move skb buffers in l->wakeupq queue to a temporary
> list first and then move the buffers of the temporary list to l->inputq
> queue, which is also safe for us.
>
> Fixes: 3f32d0be6c16 ("tipc: lock wakeup & inputq at tipc_link_reset()")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
Applied.
^ permalink raw reply
* Re: [PATCH] r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips
From: Chris Clayton @ 2018-10-11 18:14 UTC (permalink / raw)
To: Maciej S. Szmigiero, David S. Miller
Cc: Heiner Kallweit, Azat Khuzhin, Realtek linux nic maintainers,
netdev@vger.kernel.org, linux-kernel
In-Reply-To: <73ee1d68-2a27-aee9-d8b5-79ba3eda8187@maciej.szmigiero.name>
On 11/10/2018 15:02, Maciej S. Szmigiero wrote:
> It has been reported that since
> commit 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
> at least RTL_GIGA_MAC_VER_38 NICs work erratically after a resume from
> suspend.
> The problem has been traced to a missing RX_MULTI_EN bit in the RxConfig
> register.
> We already set this bit for RTL_GIGA_MAC_VER_35 NICs of the same 8168F
> chip family so let's do it also for its other siblings: RTL_GIGA_MAC_VER_36
> and RTL_GIGA_MAC_VER_38.
>
> Curiously, the NIC seems to work fine after a system boot without having
> this bit set as long as the system isn't suspended and resumed.
>
> Fixes: 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
> ---
> drivers/net/ethernet/realtek/r8169.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 7d3f671e1bb3..b68e32186d67 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -4269,8 +4269,8 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
> RTL_W32(tp, RxConfig, RX_FIFO_THRESH | RX_DMA_BURST);
> break;
> case RTL_GIGA_MAC_VER_18 ... RTL_GIGA_MAC_VER_24:
> - case RTL_GIGA_MAC_VER_34:
> - case RTL_GIGA_MAC_VER_35:
> + case RTL_GIGA_MAC_VER_34 ... RTL_GIGA_MAC_VER_36:
> + case RTL_GIGA_MAC_VER_38:
> RTL_W32(tp, RxConfig, RX128_INT_EN | RX_MULTI_EN | RX_DMA_BURST);
> break;
> case RTL_GIGA_MAC_VER_40 ... RTL_GIGA_MAC_VER_51:
>
Once this is merged in 4.19, a backport for 4.18 stable will be needed because the same problem was introduced at 4.18.8.
Tested-by: Chris Clayton <chris2553@googlemail.com>
^ permalink raw reply
* Re: [RFC 0/2] net: sched: indirect/remote setup tc block cb registering
From: John Hurley @ 2018-10-11 11:03 UTC (permalink / raw)
To: Or Gerlitz
Cc: Jakub Kicinski, Linux Netdev List, Jiri Pirko, oss-drivers, ozsh,
avivh, Simon Horman
In-Reply-To: <CAJ3xEMj8rcWo5HMS7OtUJbkoN3M_irbE=CO31mnSNuM5c9=9Gw@mail.gmail.com>
On Wed, Oct 10, 2018 at 2:38 PM Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Thu, Oct 4, 2018 at 8:19 PM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> > On Thu, 4 Oct 2018 17:20:43 +0100, John Hurley wrote:
> > > > > In this case the hw driver will receive the rules from the tunnel device directly.
> > > > > The driver can then offload them as it sees fit.
> > > >
> > > > if both instances of the hw drivers (uplink0, uplink1) register to get
> > > > the rules installed on the block of the tunnel device we have exactly
> > > > what we want, isn't that?
> > > >
> > >
> > > The design here is that each hw driver should only need to register
> > > for callbacks on a 'higher level' device's block once.
> > > When a callback is triggered the driver receives one instance of the
> > > rule and can make its own decision about what to do.
> > > This is slightly different from registering ingress devs where each
> > > uplink registers for its own block.
> > > It is probably more akin to the egdev setup in that if a rule on a
> > > block egresses to an uplink, the driver receives 1 callback on the
> > > rule, irrespective of how may underlying netdevs are on the block.
> >
> > Right, though nothing stops the driver from registering multiple
> > callbacks for the same device, if its somehow useful.
>
> I must be missing something.. put uplink bonding a side. If the user
> is setting tc ingress rule
> on a tunnel device (vxlan0/gre0) over a system with multiple unrelated
> NICs/uplinks that support
> TC decap offload, wouldn't each of these netdevs want to install the
> rule into HW? why do we want
> the HW driver to duplicate the rule between the potential candidates
> among the netdev instances they created?
> and not each of them to get the callback and decide??
>
> we want each netdev instance of these NIC
Hi Or,
It depends on how we want to offload tunnels.
In the case of the NFP, we offload 1 instance of a tunnel rule, not
one instance per uplink.
With this, it makes sense to have 1 callback per tunnel netdev (and
per driver) rather that per uplink (although as Jakub pointed out, the
option is there to register more callbacks).
If we consider the egdev model for offload, we only got a single
callback per rule if the egress device was registered and did not know
the ingress dev - is this not a similar in that the driver gets 1
callback for the rule and decides what to do with it?
John
^ permalink raw reply
* Re: [PATCH] net: cdc_ncm: use tasklet_init() for tasklet_struct init
From: David Miller @ 2018-10-11 19:06 UTC (permalink / raw)
To: ben.dooks; +Cc: oliver, linux-usb, netdev, linux-kernel, linux-kernel
In-Reply-To: <20181011130332.25710-1-ben.dooks@codethink.co.uk>
From: Ben Dooks <ben.dooks@codethink.co.uk>
Date: Thu, 11 Oct 2018 14:03:32 +0100
> The tasklet initialisation would be better done by tasklet_init()
> instead of assuming all the fields are in an ok state by default.
>
> This does not fix any actual know bug.
>
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH] r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips
From: David Miller @ 2018-10-11 19:08 UTC (permalink / raw)
To: mail; +Cc: chris2553, hkallweit1, a3at.mail, nic_swsd, netdev, linux-kernel
In-Reply-To: <73ee1d68-2a27-aee9-d8b5-79ba3eda8187@maciej.szmigiero.name>
From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Date: Thu, 11 Oct 2018 16:02:10 +0200
> It has been reported that since
> commit 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
> at least RTL_GIGA_MAC_VER_38 NICs work erratically after a resume from
> suspend.
> The problem has been traced to a missing RX_MULTI_EN bit in the RxConfig
> register.
> We already set this bit for RTL_GIGA_MAC_VER_35 NICs of the same 8168F
> chip family so let's do it also for its other siblings: RTL_GIGA_MAC_VER_36
> and RTL_GIGA_MAC_VER_38.
>
> Curiously, the NIC seems to work fine after a system boot without having
> this bit set as long as the system isn't suspended and resumed.
>
> Fixes: 05212ba8132b42 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
> Reported-by: Chris Clayton <chris2553@googlemail.com>
> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Applied and queued up for -stable.
^ permalink raw reply
* [PATCH net] net/mlx4_core: Fix warnings during boot on driverinit param set failures
From: Tariq Toukan @ 2018-10-11 12:01 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Moshe Shemesh, Tariq Toukan
From: Moshe Shemesh <moshe@mellanox.com>
During boot, mlx4_core sets the driverinit configuration parameters and
updates the devlink module on the initial values calling
devlink_param_driverinit_value_set().
If devlink_param_driverinit_value_set() returns an error mlx4_core
reports kernel module warning.
This caused false alarm during boot in case kernel was compiled with
CONFIG_NET_DEVLINK off.
Fix by removing warning reported in case
devlink_param_driverinit_value_set() fails.
This actually makes the function mlx4_devlink_set_init_value()
redundant to using directly devlink_param_driverinit_value_set() and so
removed.
It fixes the following kernel trace:
mlx4_core 0000:00:06.0: devlink set parameter 0 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 1 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 4 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 5 value failed (err = -95)
mlx4_core 0000:00:06.0: devlink set parameter 3 value failed (err = -95)
Fixes: bd1b51dc66df ("mlx4: Add mlx4 initial parameters table and register it")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 43 +++++++++++--------------------
1 file changed, 15 insertions(+), 28 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index d2d59444f562..6a046030e873 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -260,47 +260,34 @@ enum mlx4_devlink_param_id {
NULL, NULL, NULL),
};
-static void mlx4_devlink_set_init_value(struct devlink *devlink, u32 param_id,
- union devlink_param_value init_val)
-{
- struct mlx4_priv *priv = devlink_priv(devlink);
- struct mlx4_dev *dev = &priv->dev;
- int err;
-
- err = devlink_param_driverinit_value_set(devlink, param_id, init_val);
- if (err)
- mlx4_warn(dev,
- "devlink set parameter %u value failed (err = %d)",
- param_id, err);
-}
-
static void mlx4_devlink_set_params_init_values(struct devlink *devlink)
{
union devlink_param_value value;
value.vbool = !!mlx4_internal_err_reset;
- mlx4_devlink_set_init_value(devlink,
- DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET,
- value);
+ devlink_param_driverinit_value_set(devlink,
+ DEVLINK_PARAM_GENERIC_ID_INT_ERR_RESET,
+ value);
value.vu32 = 1UL << log_num_mac;
- mlx4_devlink_set_init_value(devlink,
- DEVLINK_PARAM_GENERIC_ID_MAX_MACS, value);
+ devlink_param_driverinit_value_set(devlink,
+ DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
+ value);
value.vbool = enable_64b_cqe_eqe;
- mlx4_devlink_set_init_value(devlink,
- MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE,
- value);
+ devlink_param_driverinit_value_set(devlink,
+ MLX4_DEVLINK_PARAM_ID_ENABLE_64B_CQE_EQE,
+ value);
value.vbool = enable_4k_uar;
- mlx4_devlink_set_init_value(devlink,
- MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR,
- value);
+ devlink_param_driverinit_value_set(devlink,
+ MLX4_DEVLINK_PARAM_ID_ENABLE_4K_UAR,
+ value);
value.vbool = false;
- mlx4_devlink_set_init_value(devlink,
- DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
- value);
+ devlink_param_driverinit_value_set(devlink,
+ DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+ value);
}
static inline void mlx4_set_num_reserved_uars(struct mlx4_dev *dev,
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH iproute 2/2] utils: fix get_rtnl_link_stats_rta stats parsing
From: Lorenzo Bianconi @ 2018-10-11 12:24 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20181010121139.335ff091@xeon-e3>
> > iproute2 walks through the list of available tunnels using netlink
> > protocol in order to get device info instead of reading
> > them from proc filesystem. However the kernel reports device statistics
> > using IFLA_INET6_STATS/IFLA_INET6_ICMP6STATS attributes nested in
> > IFLA_PROTINFO one but iproutes expects these info in
> > IFLA_STATS64/IFLA_STATS attributes.
> > The issue can be triggered with the following reproducer:
> >
> > $ip link add ip6d0 type ip6tnl mode ip6ip6 local 1111::1 remote 2222::1
> > $ip -6 -d -s tunnel show ip6d0
> > ip6d0: ipv6/ipv6 remote 2222::1 local 1111::1 encaplimit 4 hoplimit 64
> > tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
> > Dump terminated
> >
> > Fix the issue introducing IFLA_INET6_STATS attribute parsing
> >
> > Fixes: 3e953938717f ("iptunnel/ip6tunnel: Use netlink to walk through
> > tunnels list")
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
>
> Can't we fix the kernel to report statistics properly, rather than
> starting iproute2 doing more /proc interfaces.
>
Hi Stephen,
sorry, I did not get what you mean. Current iproute implementation
walks through tunnels list using netlink protocol and parses device
statistics in the kernel netlink message. However it does not take
into account the actual netlink message layout since the statistic
attribute is nested in IFLA_PROTINFO one.
Moreover AFAIU the related kernel code has not changed since iproute
commit 3e953938717f, so I guess we should fix the issue in iproute code
instead in the kernel one. Do you agree?
Regards,
Lorenzo
^ permalink raw reply
* [PATCH 1/1] net: socionext: clear rx irq correctly
From: Ilias Apalodimas @ 2018-10-11 12:28 UTC (permalink / raw)
To: netdev, jaswinder.singh
Cc: ard.biesheuvel, masami.hiramatsu, Ilias Apalodimas
commit 63ae7949e94a ("net: socionext: Use descriptor info instead of MMIO reads on Rx")
removed constant mmio reads from the driver and started using a descriptor
field to check if packet should be processed.
This lead the napi rx handler being constantly called while no packets
needed processing and ksoftirq getting 100% cpu usage. Issue one mmio read
to clear the irq correcty after processing packets
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
drivers/net/ethernet/socionext/netsec.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 7aa5ebb..4289ccb 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -735,8 +735,11 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
u16 idx = dring->tail;
struct netsec_de *de = dring->vaddr + (DESC_SZ * idx);
- if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD))
+ if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) {
+ /* reading the register clears the irq */
+ netsec_read(priv, NETSEC_REG_NRM_RX_PKTCNT);
break;
+ }
/* This barrier is needed to keep us from reading
* any other fields out of the netsec_de until we have
--
2.7.4
^ permalink raw reply related
* [PATCH net-next] hv_netvsc: fix vf serial matching with pci slot info
From: Haiyang Zhang @ 2018-10-11 20:14 UTC (permalink / raw)
To: davem, netdev; +Cc: olaf, sthemmin, haiyangz, linux-kernel, devel, vkuznets
From: Haiyang Zhang <haiyangz@microsoft.com>
The VF device's serial number is saved as a string in PCI slot's
kobj name, not the slot->number. This patch corrects the netvsc
driver, so the VF device can be successfully paired with synthetic
NIC.
Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
drivers/net/hyperv/netvsc_drv.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 9bcaf204a7d4..8121ce34a39f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2030,14 +2030,15 @@ static void netvsc_vf_setup(struct work_struct *w)
rtnl_unlock();
}
-/* Find netvsc by VMBus serial number.
- * The PCI hyperv controller records the serial number as the slot.
+/* Find netvsc by VF serial number.
+ * The PCI hyperv controller records the serial number as the slot kobj name.
*/
static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
{
struct device *parent = vf_netdev->dev.parent;
struct net_device_context *ndev_ctx;
struct pci_dev *pdev;
+ u32 serial;
if (!parent || !dev_is_pci(parent))
return NULL; /* not a PCI device */
@@ -2048,16 +2049,22 @@ static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
return NULL;
}
+ if (kstrtou32(pdev->slot->kobj.name, 10, &serial)) {
+ netdev_notice(vf_netdev, "Invalid vf serial:%s\n",
+ pdev->slot->kobj.name);
+ return NULL;
+ }
+
list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
if (!ndev_ctx->vf_alloc)
continue;
- if (ndev_ctx->vf_serial == pdev->slot->number)
+ if (ndev_ctx->vf_serial == serial)
return hv_get_drvdata(ndev_ctx->device_ctx);
}
netdev_notice(vf_netdev,
- "no netdev found for slot %u\n", pdev->slot->number);
+ "no netdev found for vf serial:%u\n", serial);
return NULL;
}
--
2.18.0
^ permalink raw reply related
* RE: [PATCH net] tipc: eliminate possible recursive locking detected by LOCKDEP
From: Jon Maloy @ 2018-10-11 13:30 UTC (permalink / raw)
To: Ying Xue, dvyukov@google.com
Cc: davem@davemloft.net, parthasarathy.bhuvaragan@ericsson.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
tipc-discussion@lists.sourceforge.net
In-Reply-To: <1539259076-8562-1-git-send-email-ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
///jon
> -----Original Message-----
> From: Ying Xue <ying.xue@windriver.com>
> Sent: October 11, 2018 7:58 AM
> To: Jon Maloy <jon.maloy@ericsson.com>; dvyukov@google.com
> Cc: davem@davemloft.net; parthasarathy.bhuvaragan@ericsson.com;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; tipc-
> discussion@lists.sourceforge.net
> Subject: [PATCH net] tipc: eliminate possible recursive locking detected by
> LOCKDEP
>
> When booting kernel with LOCKDEP option, below warning info was found:
>
> WARNING: possible recursive locking detected 4.19.0-rc7+ #14 Not tainted
> --------------------------------------------
> swapper/0/1 is trying to acquire lock:
> 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
> include/linux/spinlock.h:334 [inline]
> 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at:
> tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
>
> but task is already holding lock:
> 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
> include/linux/spinlock.h:334 [inline]
> 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
> tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(&list->lock)->rlock#4);
> lock(&(&list->lock)->rlock#4);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 2 locks held by swapper/0/1:
> #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at:
> register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051
> #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
> spin_lock_bh include/linux/spinlock.h:334 [inline]
> #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
> tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
>
> stack backtrace:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14 Hardware name:
> QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1af/0x295 lib/dump_stack.c:113 print_deadlock_bug
> kernel/locking/lockdep.c:1759 [inline] check_deadlock
> kernel/locking/lockdep.c:1803 [inline] validate_chain
> kernel/locking/lockdep.c:2399 [inline]
> __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411
> lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900
> __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
> _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh
> include/linux/spinlock.h:334 [inline]
> tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
> tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526
> tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521
> tipc_init_net+0x472/0x610 net/tipc/core.c:82
> ops_init+0xf7/0x520 net/core/net_namespace.c:129
> __register_pernet_operations net/core/net_namespace.c:940 [inline]
> register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011
> register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052
> tipc_init+0x83/0x104 net/tipc/core.c:140 do_one_initcall+0x109/0x70a
> init/main.c:885 do_initcall_level init/main.c:953 [inline] do_initcalls
> init/main.c:961 [inline] do_basic_setup init/main.c:979 [inline]
> kernel_init_freeable+0x4bd/0x57f init/main.c:1144
> kernel_init+0x13/0x180 init/main.c:1063
> ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
>
> The reason why the noise above was complained by LOCKDEP is because we
> nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset
> function. In fact it's unnecessary to move skb buffer from l->wakeupq queue
> to l->inputq queue while holding the two locks at the same time.
> Instead, we can move skb buffers in l->wakeupq queue to a temporary list
> first and then move the buffers of the temporary list to l->inputq queue,
> which is also safe for us.
>
> Fixes: 3f32d0be6c16 ("tipc: lock wakeup & inputq at tipc_link_reset()")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> ---
> net/tipc/link.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c index fb886b5..1d21ae4 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -843,14 +843,21 @@ static void link_prepare_wakeup(struct tipc_link *l)
>
> void tipc_link_reset(struct tipc_link *l) {
> + struct sk_buff_head list;
> +
> + __skb_queue_head_init(&list);
> +
> l->in_session = false;
> l->session++;
> l->mtu = l->advertised_mtu;
> +
> spin_lock_bh(&l->wakeupq.lock);
> + skb_queue_splice_init(&l->wakeupq, &list);
> + spin_unlock_bh(&l->wakeupq.lock);
> +
> spin_lock_bh(&l->inputq->lock);
> - skb_queue_splice_init(&l->wakeupq, l->inputq);
> + skb_queue_splice_init(&list, l->inputq);
> spin_unlock_bh(&l->inputq->lock);
> - spin_unlock_bh(&l->wakeupq.lock);
>
> __skb_queue_purge(&l->transmq);
> __skb_queue_purge(&l->deferdq);
> --
> 2.7.4
^ permalink raw reply
* Re: [PATCH net 2/2] selftests: udpgso_bench.sh explicitly requires bash
From: Willem de Bruijn @ 2018-10-11 13:48 UTC (permalink / raw)
To: Paolo Abeni
Cc: Network Development, David Miller, Florian Westphal,
Willem de Bruijn
In-Reply-To: <6d93e3bc5920b3f6f8b121fa0969ee42ef33ed5c.1539247467.git.pabeni@redhat.com>
On Thu, Oct 11, 2018 at 4:58 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> The udpgso_bench.sh script requires several bash-only features. This
> may cause random failures if the default shell is not bash.
> Address the above explicitly requiring bash as the script interpreter
>
> Fixes: 3a687bef148d ("selftests: udp gso benchmark")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
^ permalink raw reply
* Re: [PATCH 1/1] net: socionext: clear rx irq correctly
From: Ard Biesheuvel @ 2018-10-11 13:49 UTC (permalink / raw)
To: Ilias Apalodimas
Cc: <netdev@vger.kernel.org>, Jaswinder Singh, Masami Hiramatsu
In-Reply-To: <1539260906-29596-1-git-send-email-ilias.apalodimas@linaro.org>
On 11 October 2018 at 14:28, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
> commit 63ae7949e94a ("net: socionext: Use descriptor info instead of MMIO reads on Rx")
> removed constant mmio reads from the driver and started using a descriptor
> field to check if packet should be processed.
> This lead the napi rx handler being constantly called while no packets
> needed processing and ksoftirq getting 100% cpu usage. Issue one mmio read
> to clear the irq correcty after processing packets
>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> drivers/net/ethernet/socionext/netsec.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 7aa5ebb..4289ccb 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -735,8 +735,11 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
> u16 idx = dring->tail;
> struct netsec_de *de = dring->vaddr + (DESC_SZ * idx);
>
> - if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD))
> + if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) {
> + /* reading the register clears the irq */
> + netsec_read(priv, NETSEC_REG_NRM_RX_PKTCNT);
> break;
> + }
>
> /* This barrier is needed to keep us from reading
> * any other fields out of the netsec_de until we have
> --
> 2.7.4
>
^ permalink raw reply
* [PATCH bpf-next 0/2] Two libbpf RB walk improvements
From: Daniel Borkmann @ 2018-10-11 14:02 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: netdev, Daniel Borkmann
Two small improvements to libbpf's perf RB walk: first one fixes
used barriers and second one simplifies ring walking code. For
details see individual patches.
Thanks!
Daniel Borkmann (2):
bpf, libbpf: use proper barriers in perf RB walk
bpf, libbpf: simplify perf RB walk and do incremental updates
tools/bpf/bpftool/map_perf_ring.c | 10 ++-
tools/lib/bpf/libbpf.c | 117 ++++++++++++++++++----------
tools/lib/bpf/libbpf.h | 14 ++--
tools/testing/selftests/bpf/trace_helpers.c | 7 +-
4 files changed, 93 insertions(+), 55 deletions(-)
--
2.9.5
^ permalink raw reply
* [PATCH bpf-next 1/2] bpf, libbpf: use proper barriers in perf RB walk
From: Daniel Borkmann @ 2018-10-11 14:02 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: netdev, Daniel Borkmann
In-Reply-To: <20181011140207.27602-1-daniel@iogearbox.net>
User proper CPU barrier instead of just a compile barrier when fetching
ring's data_head in bpf_perf_event_read_simple() which is not correct.
Also, add two small helpers bpf_perf_read_head() and bpf_perf_write_tail()
to make used barriers more obvious and a comment to what they pair to.
Fixes: d0cabbb021be ("tools: bpf: move the event reading loop to libbpf")
Fixes: 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
tools/lib/bpf/libbpf.c | 52 ++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 176cf55..1ac8856 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -20,6 +20,7 @@
#include <fcntl.h>
#include <errno.h>
#include <asm/unistd.h>
+#include <asm/barrier.h>
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/bpf.h>
@@ -27,6 +28,7 @@
#include <linux/list.h>
#include <linux/limits.h>
#include <linux/perf_event.h>
+#include <linux/compiler.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/vfs.h>
@@ -2404,18 +2406,58 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
return 0;
}
+/*
+ * Comment from kernel/events/ring_buffer.c:
+ *
+ * Since the mmap() consumer (userspace) can run on a different CPU:
+ *
+ * kernel user
+ *
+ * if (LOAD ->data_tail) { LOAD ->data_head
+ * (A) smp_rmb() (C)
+ * STORE $data LOAD $data
+ * smp_wmb() (B) smp_mb() (D)
+ * STORE ->data_head STORE ->data_tail
+ * }
+ *
+ * Where A pairs with D, and B pairs with C.
+ *
+ * In our case (A) is a control dependency that separates the load of
+ * the ->data_tail and the stores of $data. In case ->data_tail
+ * indicates there is no room in the buffer to store $data we do not.
+ *
+ * D needs to be a full barrier since it separates the data READ
+ * from the tail WRITE.
+ *
+ * For B a WMB is sufficient since it separates two WRITEs, and for C
+ * an RMB is sufficient since it separates two READs.
+ */
+static __u64 bpf_perf_read_head(struct perf_event_mmap_page *header)
+{
+ __u64 data_head = READ_ONCE(header->data_head);
+
+ rmb();
+ return data_head;
+}
+
+static void bpf_perf_write_tail(struct perf_event_mmap_page *header,
+ __u64 data_tail)
+{
+ mb();
+ header->data_tail = data_tail;
+}
+
enum bpf_perf_event_ret
bpf_perf_event_read_simple(void *mem, unsigned long size,
unsigned long page_size, void **buf, size_t *buf_len,
bpf_perf_event_print_t fn, void *priv)
{
- volatile struct perf_event_mmap_page *header = mem;
+ struct perf_event_mmap_page *header = mem;
+ __u64 data_head = bpf_perf_read_head(header);
__u64 data_tail = header->data_tail;
- __u64 data_head = header->data_head;
int ret = LIBBPF_PERF_EVENT_ERROR;
void *base, *begin, *end;
- asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
if (data_head == data_tail)
return LIBBPF_PERF_EVENT_CONT;
@@ -2458,8 +2500,6 @@ bpf_perf_event_read_simple(void *mem, unsigned long size,
data_tail += ehdr->size;
}
- __sync_synchronize(); /* smp_mb() */
- header->data_tail = data_tail;
-
+ bpf_perf_write_tail(header, data_tail);
return ret;
}
--
2.9.5
^ permalink raw reply related
* [PATCH bpf-next 2/2] bpf, libbpf: simplify perf RB walk and do incremental updates
From: Daniel Borkmann @ 2018-10-11 14:02 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: netdev, Daniel Borkmann
In-Reply-To: <20181011140207.27602-1-daniel@iogearbox.net>
Clean up and improve bpf_perf_event_read_simple() ring walk a bit
to use similar tail update scheme as in perf and bcc allowing the
kernel to make forward progress not only after full timely walk.
Also few other improvements to use realloc() instead of free() and
malloc() combination and for the callback use proper perf_event_header
instead of void pointer, so that real applications can use container_of()
macro with proper type checking.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
tools/bpf/bpftool/map_perf_ring.c | 10 ++--
tools/lib/bpf/libbpf.c | 77 +++++++++++++----------------
tools/lib/bpf/libbpf.h | 14 +++---
tools/testing/selftests/bpf/trace_helpers.c | 7 +--
4 files changed, 53 insertions(+), 55 deletions(-)
diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 6d41323..bdaf406 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -50,15 +50,17 @@ static void int_exit(int signo)
stop = true;
}
-static enum bpf_perf_event_ret print_bpf_output(void *event, void *priv)
+static enum bpf_perf_event_ret
+print_bpf_output(struct perf_event_header *event, void *private_data)
{
- struct event_ring_info *ring = priv;
- struct perf_event_sample *e = event;
+ struct perf_event_sample *e = container_of(event, struct perf_event_sample,
+ header);
+ struct event_ring_info *ring = private_data;
struct {
struct perf_event_header header;
__u64 id;
__u64 lost;
- } *lost = event;
+ } *lost = (typeof(lost))event;
if (json_output) {
jsonw_start_object(json_wtr);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1ac8856..35f4a77 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2448,58 +2448,51 @@ static void bpf_perf_write_tail(struct perf_event_mmap_page *header,
}
enum bpf_perf_event_ret
-bpf_perf_event_read_simple(void *mem, unsigned long size,
- unsigned long page_size, void **buf, size_t *buf_len,
- bpf_perf_event_print_t fn, void *priv)
-{
- struct perf_event_mmap_page *header = mem;
- __u64 data_head = bpf_perf_read_head(header);
- __u64 data_tail = header->data_tail;
- int ret = LIBBPF_PERF_EVENT_ERROR;
- void *base, *begin, *end;
-
- if (data_head == data_tail)
- return LIBBPF_PERF_EVENT_CONT;
-
- base = ((char *)header) + page_size;
-
- begin = base + data_tail % size;
- end = base + data_head % size;
-
- while (begin != end) {
- struct perf_event_header *ehdr;
-
- ehdr = begin;
- if (begin + ehdr->size > base + size) {
- long len = base + size - begin;
-
- if (*buf_len < ehdr->size) {
- free(*buf);
- *buf = malloc(ehdr->size);
- if (!*buf) {
+bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
+ void **copy_mem, size_t *copy_size,
+ bpf_perf_event_print_t fn, void *private_data)
+{
+ struct perf_event_mmap_page *header = mmap_mem;
+ void *base = ((__u8 *)header) + page_size;
+ int ret = LIBBPF_PERF_EVENT_CONT;
+ struct perf_event_header *ehdr;
+ __u64 data_head, data_tail;
+ size_t ehdr_size;
+
+ for (data_tail = header->data_tail,
+ data_head = bpf_perf_read_head(header);
+ data_head != data_tail;
+ data_head = bpf_perf_read_head(header)) {
+ ehdr = base + (data_tail & (mmap_size - 1));
+ ehdr_size = ehdr->size;
+ if (((void *)ehdr) + ehdr_size > base + mmap_size) {
+ void *copy_start = ehdr;
+ size_t len_first = base + mmap_size - copy_start;
+ size_t len_secnd = ehdr_size - len_first;
+
+ if (*copy_size < ehdr_size) {
+ void *ptr = realloc(*copy_mem, ehdr_size);
+
+ if (!ptr) {
ret = LIBBPF_PERF_EVENT_ERROR;
break;
}
- *buf_len = ehdr->size;
+ *copy_mem = ptr;
+ *copy_size = ehdr_size;
}
- memcpy(*buf, begin, len);
- memcpy(*buf + len, base, ehdr->size - len);
- ehdr = (void *)*buf;
- begin = base + ehdr->size - len;
- } else if (begin + ehdr->size == base + size) {
- begin = base;
- } else {
- begin += ehdr->size;
+ memcpy(*copy_mem, copy_start, len_first);
+ memcpy(*copy_mem + len_first, base, len_secnd);
+ ehdr = *copy_mem;
}
- ret = fn(ehdr, priv);
+ ret = fn(ehdr, private_data);
+
+ data_tail += ehdr_size;
+ bpf_perf_write_tail(header, data_tail);
if (ret != LIBBPF_PERF_EVENT_CONT)
break;
-
- data_tail += ehdr->size;
}
- bpf_perf_write_tail(header, data_tail);
return ret;
}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 8af8d36..16b5c95 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -284,12 +284,14 @@ enum bpf_perf_event_ret {
LIBBPF_PERF_EVENT_CONT = -2,
};
-typedef enum bpf_perf_event_ret (*bpf_perf_event_print_t)(void *event,
- void *priv);
-int bpf_perf_event_read_simple(void *mem, unsigned long size,
- unsigned long page_size,
- void **buf, size_t *buf_len,
- bpf_perf_event_print_t fn, void *priv);
+struct perf_event_header;
+typedef enum bpf_perf_event_ret
+ (*bpf_perf_event_print_t)(struct perf_event_header *hdr,
+ void *private_data);
+enum bpf_perf_event_ret
+bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
+ void **copy_mem, size_t *copy_size,
+ bpf_perf_event_print_t fn, void *private_data);
struct nlattr;
typedef int (*libbpf_dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index cabe2a3..1764093 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -124,10 +124,11 @@ struct perf_event_sample {
char data[];
};
-static enum bpf_perf_event_ret bpf_perf_event_print(void *event, void *priv)
+static enum bpf_perf_event_ret
+bpf_perf_event_print(struct perf_event_header *hdr, void *private_data)
{
- struct perf_event_sample *e = event;
- perf_event_print_fn fn = priv;
+ struct perf_event_sample *e = (struct perf_event_sample *)hdr;
+ perf_event_print_fn fn = private_data;
int ret;
if (e->header.type == PERF_RECORD_SAMPLE) {
--
2.9.5
^ permalink raw reply related
* [PATCH net] rxrpc: Fix an uninitialised variable
From: David Howells @ 2018-10-11 21:32 UTC (permalink / raw)
Cc: netdev, dhowells, linux-afs, linux-kernel
Fix an uninitialised variable introduced by the last patch. This can cause
a crash when a new call comes in to a local service, such as when an AFS
fileserver calls back to the local cache manager.
Fixes: c1e15b4944c9 ("rxrpc: Fix the packet reception routine")
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/call_accept.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 652e314de38e..8079aacaecac 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -337,7 +337,7 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local,
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct rxrpc_connection *conn;
- struct rxrpc_peer *peer;
+ struct rxrpc_peer *peer = NULL;
struct rxrpc_call *call;
_enter("");
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox