* Re: [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Sami Tolvanen @ 2019-09-11 21:07 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Björn Töpel, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Kees Cook, Martin Lau, Song Liu,
netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org, Jesper Dangaard Brouer
In-Reply-To: <87impzt4pu.fsf@toke.dk>
On Wed, Sep 11, 2019 at 5:09 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Björn Töpel <bjorn.topel@intel.com> writes:
> > I ran the "xdp_rxq_info" sample with and without Sami's patch:
>
> Thanks for doing this!
Yes, thanks for testing this Björn!
> Or (1/22998700 - 1/23923874) * 10**9 == 1.7 nanoseconds of overhead.
>
> I guess that is not *too* bad; but it's still chipping away at
> performance; anything we could do to lower the overhead?
The check is already rather minimal, but I could move this to a static
inline function to help ensure the compiler doesn't generate an
additional function call for this. I'm also fine with gating this
behind a separate config option, but I'm not sure if that's worth it.
Any thoughts?
Sami
^ permalink raw reply
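For reference, the 1.7 ns figure quoted in the message above is the difference between the per-packet times implied by the two measured packet rates. A minimal sketch of that arithmetic in user-space C follows; the helper name is illustrative and does not come from the patch.

```c
/*
 * Per-packet overhead in nanoseconds, given the packet rate (packets
 * per second) measured with the extra check and without it.
 * Hypothetical helper, not part of the kernel or of the patch.
 */
static double per_packet_overhead_ns(double pps_with, double pps_without)
{
	/* Time per packet is the reciprocal of the rate; 1e9 converts s to ns. */
	return (1.0 / pps_with - 1.0 / pps_without) * 1e9;
}
```

Calling per_packet_overhead_ns(22998700.0, 23923874.0) gives a value a little under 1.7, which matches the rounded number in the thread.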
* Re: [PATCH v2 0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card
From: Doug Anderson @ 2019-09-11 21:26 UTC (permalink / raw)
To: Ulf Hansson
Cc: Kalle Valo, Adrian Hunter, Ganapathi Bhat, linux-wireless,
Andreas Fenkart, Brian Norris, Amitkumar Karwar,
open list:ARM/Rockchip SoC..., Wolfram Sang, Nishant Sarmukadam,
netdev, Avri Altman, linux-mmc@vger.kernel.org, David S. Miller,
Xinming Hu, Linux Kernel Mailing List, Thomas Gleixner,
Kate Stewart
In-Reply-To: <CAPDyKFoND5Kaam72zxO4wChO0z_1XL2KWX6oNjVcMUGA7G8RFg@mail.gmail.com>
Hi,
On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <dianders@chromium.org> wrote:
> >
> > As talked about in the thread at:
> >
> > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> >
> > ...when the Marvell WiFi card tries to reset itself it kills
> > Bluetooth. It was observed that we could re-init the card properly by
> > unbinding / rebinding the host controller. It was also observed that
> > in the downstream Chrome OS codebase the solution used was
> > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > in this series.
> >
> > So far I've only done testing of this series using the reset test
> > source that can be simulated via sysfs. Specifically I ran this test:
> >
> >   for i in $(seq 1000); do
> >     echo "LOOP $i --------"
> >     echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> >
> >     while true; do
> >       if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> >         fail=$(( fail + 1 ))
> >         echo "Fail WiFi ${fail}"
> >         if [[ ${fail} == 3 ]]; then
> >           exit 1
> >         fi
> >       else
> >         fail=0
> >         break
> >       fi
> >     done
> >
> >     hciconfig hci0 down
> >     sleep 1
> >     if ! hciconfig hci0 up; then
> >       echo "Fail BT"
> >       exit 1
> >     fi
> >   done
> >
> > I ran this several times and got several hundred iterations each
> > before a failure. When I saw failures:
> >
> > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > I didn't give it time to see if it would have detected this
> > automatically.
> > * Once I saw the ping fail because (for some reason) my device only
> > got an IPv6 address from my router and the IPv4 ping failed. I
> > changed my script to use 'ping6' to see if that would help.
> > * Once I saw the ping fail because the higher level network stack
> > ("shill" in my case) seemed to crash. A few minutes later the
> > system recovered itself automatically. https://crbug.com/984593 if
> > you want more details.
> > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > transitory failure. Usually this was an association failure, but in
> > one case I saw the device do "Firmware wakeup failed" after I
> > triggered the reset. This caused the driver to trigger a re-reset
> > of itself which eventually recovered things. This was good because
> > it was an actual test of the normal reset flow (not the one
> > triggered via sysfs).
> >
> > Changes in v2:
> > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > - s/contining/containing (Matthias Kaehlcke).
> > - Add Matthias Reviewed-by tag.
> > - Removed clear_bit() calls and old comment (Brian Norris).
> > - Explicit CC of Andreas Fenkart.
> > - Explicit CC of Brian Norris.
> > - Add "Fixes" pointing at the commit Brian talked about.
> > - Add Brian's Reviewed-by tag.
> >
> > Douglas Anderson (2):
> > mmc: core: Add sdio_trigger_replug() API
> > mwifiex: Make use of the new sdio_trigger_replug() API to reset
> >
> > drivers/mmc/core/core.c | 28 +++++++++++++++++++--
> > drivers/mmc/core/sdio_io.c | 20 +++++++++++++++
> > drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > include/linux/mmc/host.h | 15 ++++++++++-
> > include/linux/mmc/sdio_func.h | 2 ++
> > 5 files changed, 63 insertions(+), 18 deletions(-)
> >
>
> Doug, thanks for sending this!
>
> As you know, I have been working on additional changes for SDIO
> suspend/resume (still WIP and not ready for sharing) and this series
> is related.
>
> The thing is, that even during system suspend/resume, synchronizations
> are needed between the different layers (mmc host, mmc core and
> sdio-funcs), which is common to the problem you want to solve.
>
> That said, I need to scratch my head a bit more before I can provide
> you some feedback on $subject series. Moreover, it's vacation period
> at my side so things are moving a bit slower. Please be patient.
I had kinda forgotten about this series after we landed it locally in
Chrome OS, but I realized that it never got resolved upstream. Any
chance your head has been sufficiently scratched and you're now happy
with $subject series? ;-)
-Doug
^ permalink raw reply
* Re: [Patch net] sch_sfb: fix a crash in sfb_destroy()
From: Eric Dumazet @ 2019-09-11 21:36 UTC (permalink / raw)
To: Cong Wang, netdev
Cc: syzbot+d5870a903591faaca4ae, Linus Torvalds, Jamal Hadi Salim,
Jiri Pirko
In-Reply-To: <20190911183445.32547-1-xiyou.wangcong@gmail.com>
On 9/11/19 8:34 PM, Cong Wang wrote:
> When tcf_block_get() fails in sfb_init(), q->qdisc is still a NULL
> pointer which leads to a crash in sfb_destroy().
>
> Linus suggested three solutions for this problem, the simplest fix
> is just moving the noop_qdisc assignment before tcf_block_get()
> so that qdisc_put() would become a nop.
>
> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
> Reported-by: syzbot+d5870a903591faaca4ae@syzkaller.appspotmail.com
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> net/sched/sch_sfb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
> index 1dff8506a715..db1c8eb521a2 100644
> --- a/net/sched/sch_sfb.c
> +++ b/net/sched/sch_sfb.c
> @@ -552,11 +552,11 @@ static int sfb_init(struct Qdisc *sch, struct nlattr *opt,
> struct sfb_sched_data *q = qdisc_priv(sch);
> int err;
>
> + q->qdisc = &noop_qdisc;
> +
> err = tcf_block_get(&q->block, &q->filter_list, sch, extack);
> if (err)
> return err;
> -
> - q->qdisc = &noop_qdisc;
> return sfb_change(sch, opt, extack);
> }
>
>
It seems a similar fix would be needed in net/sched/sch_dsmark.c ?
^ permalink raw reply
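The ordering issue in the patch above can be modelled outside the kernel. The sketch below is a simplified user-space illustration, assuming that the destroy path runs after a failed init and dereferences the child pointer; every name in it is hypothetical and only stands in for the kernel objects discussed in the thread.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the child qdisc and the private data. */
struct child_model { int refcnt; };
struct sched_model { struct child_model *child; };

static struct child_model noop_model = { 1 };

/*
 * Models the init path. 'assign_first' selects the patched ordering
 * (child set before the step that may fail) or the original ordering.
 */
static int init_model(struct sched_model *q, int step_fails, int assign_first)
{
	if (assign_first)
		q->child = &noop_model;

	if (step_fails)
		return -1;	/* early return, as when the block setup fails */

	if (!assign_first)
		q->child = &noop_model;
	return 0;
}

/*
 * Models the destroy path. Returns -1 at the point where real code
 * would dereference a NULL child, 0 when the child is safe to release.
 */
static int destroy_model(const struct sched_model *q)
{
	if (q->child == NULL)
		return -1;
	return 0;
}
```

With the original ordering, a failed init leaves the child pointer NULL and destroy_model() reports the unsafe case; with the patched ordering the pointer already refers to the no-op object, so the destroy path has something valid to release.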
* [PATCH v3 1/2] tcp: Add TCP_INFO counter for packets received out-of-order
From: Thomas Higdon @ 2019-09-11 22:31 UTC (permalink / raw)
To: netdev@vger.kernel.org
Cc: Jonathan Lemon, Dave Jones, Eric Dumazet, Neal Cardwell
For receive-heavy cases on the server-side, we want to track the
connection quality for individual client IPs. This counter, similar to
the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
tracks out-of-order packet reception. By providing this counter in
TCP_INFO, it will allow understanding to what degree receive-heavy
sockets are experiencing out-of-order delivery and packet drops
indicating congestion.
Please note that this is similar to the counter in NetBSD TCP_INFO, and
has the same name.
Signed-off-by: Thomas Higdon <tph@fb.com>
---
include/linux/tcp.h | 2 ++
include/uapi/linux/tcp.h | 2 ++
net/ipv4/tcp.c | 2 ++
net/ipv4/tcp_input.c | 1 +
4 files changed, 7 insertions(+)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f3a85a7fb4b1..a01dc78218f1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -393,6 +393,8 @@ struct tcp_sock {
*/
struct request_sock *fastopen_rsk;
u32 *saved_syn;
+
+ u32 rcv_ooopack; /* Received out-of-order packets, for tcpinfo */
};
enum tsq_enum {
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index b3564f85a762..20237987ccc8 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -270,6 +270,8 @@ struct tcp_info {
__u64 tcpi_bytes_retrans; /* RFC4898 tcpEStatsPerfOctetsRetrans */
__u32 tcpi_dsack_dups; /* RFC4898 tcpEStatsStackDSACKDups */
__u32 tcpi_reord_seen; /* reordering events seen */
+
+ __u32 tcpi_rcv_ooopack; /* Out-of-order packets received */
};
/* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 94df48bcecc2..4cf58208270e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2653,6 +2653,7 @@ int tcp_disconnect(struct sock *sk, int flags)
tp->rx_opt.saw_tstamp = 0;
tp->rx_opt.dsack = 0;
tp->rx_opt.num_sacks = 0;
+ tp->rcv_ooopack = 0;
/* Clean up fastopen related fields */
@@ -3295,6 +3296,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_bytes_retrans = tp->bytes_retrans;
info->tcpi_dsack_dups = tp->dsack_dups;
info->tcpi_reord_seen = tp->reord_seen;
+ info->tcpi_rcv_ooopack = tp->rcv_ooopack;
unlock_sock_fast(sk, slow);
}
EXPORT_SYMBOL_GPL(tcp_get_info);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 706cbb3b2986..2ef333354026 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4555,6 +4555,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
tp->pred_flags = 0;
inet_csk_schedule_ack(sk);
+ tp->rcv_ooopack += max_t(u16, 1, skb_shinfo(skb)->gso_segs);
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
seq = TCP_SKB_CB(skb)->seq;
end_seq = TCP_SKB_CB(skb)->end_seq;
--
2.17.1
^ permalink raw reply related
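The increment in the tcp_input.c hunk above counts packets rather than buffers. A small user-space sketch of the same rule, assuming the segment count can be zero for a buffer that carries a single packet; the function name is illustrative only.

```c
#include <stdint.h>

/*
 * Number of packets one out-of-order buffer adds to the counter:
 * its segment count, or 1 when the segment count is unset (0).
 * Mirrors the max-of-1-and-segments expression in the patch.
 */
static uint32_t ooo_packets_for_buffer(uint16_t gso_segs)
{
	return gso_segs > 1 ? gso_segs : 1;
}
```

A buffer that aggregates five segments therefore adds five to the counter, while a plain buffer adds one.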
* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: David Miller @ 2019-09-11 22:32 UTC (permalink / raw)
To: ap420073
Cc: netdev, j.vosburgh, vfalico, andy, jiri, sd, roopa, saeedm,
manishc, rahulv, kys, haiyangz, sthemmin, sashal, hare, varun,
ubraun, kgraul, jay.vosburgh
In-Reply-To: <20190907134532.31975-1-ap420073@gmail.com>
From: Taehee Yoo <ap420073@gmail.com>
Date: Sat, 7 Sep 2019 22:45:32 +0900
> Current code doesn't limit the number of nested devices.
> Nested devices would be handled recursively and this needs huge stack
> memory. So, unlimited nested devices could make stack overflow.
...
> Splat looks like:
> [ 140.483124] BUG: looking up invalid subclass: 8
> [ 140.483505] turning off the locking correctness validator.
The limit here is not stack memory, but a limit in the lockdep
validator, which can probably be fixed by other means.
This was the feedback I saw given for the previous version of
this series as well.
^ permalink raw reply
* [PATCH v3 2/2] tcp: Add rcv_wnd to TCP_INFO
From: Thomas Higdon @ 2019-09-11 22:31 UTC (permalink / raw)
To: netdev@vger.kernel.org
Cc: Jonathan Lemon, Dave Jones, Eric Dumazet, Neal Cardwell
In-Reply-To: <20190911223148.89808-1-tph@fb.com>
Neal Cardwell mentioned that rcv_wnd would be useful for helping
diagnose whether a flow is receive-window-limited at a given instant.
This serves the purpose of adding an additional __u32 to avoid the
would-be hole caused by the addition of the tcpi_rcv_ooopack field.
Signed-off-by: Thomas Higdon <tph@fb.com>
---
include/uapi/linux/tcp.h | 1 +
net/ipv4/tcp.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 20237987ccc8..8a0d1d1af622 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -272,6 +272,7 @@ struct tcp_info {
__u32 tcpi_reord_seen; /* reordering events seen */
__u32 tcpi_rcv_ooopack; /* Out-of-order packets received */
+ __u32 tcpi_rcv_wnd; /* Receive window size */
};
/* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4cf58208270e..c980145c4247 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3297,6 +3297,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_dsack_dups = tp->dsack_dups;
info->tcpi_reord_seen = tp->reord_seen;
info->tcpi_rcv_ooopack = tp->rcv_ooopack;
+ info->tcpi_rcv_wnd = tp->rcv_wnd;
unlock_sock_fast(sk, slow);
}
EXPORT_SYMBOL_GPL(tcp_get_info);
--
2.17.1
^ permalink raw reply related
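The "would-be hole" mentioned in the commit message above can be seen with a reduced structure. The sketch below uses two hypothetical structs that copy only the tail of the layout shown in the diff; on ABIs where a 64-bit integer is 8-byte aligned, a single trailing 32-bit field is followed by 4 bytes of padding, and a second 32-bit field fills that space.

```c
#include <stddef.h>
#include <stdint.h>

/* Tail of the layout with only the first new field (illustrative). */
struct tail_one_field {
	uint64_t bytes_retrans;
	uint32_t dsack_dups;
	uint32_t reord_seen;
	uint32_t rcv_ooopack;
};

/* Same tail with the second new field added (illustrative). */
struct tail_two_fields {
	uint64_t bytes_retrans;
	uint32_t dsack_dups;
	uint32_t reord_seen;
	uint32_t rcv_ooopack;
	uint32_t rcv_wnd;
};

/* Bytes of padding after the last member of each struct. */
static size_t tail_padding_one(void)
{
	return sizeof(struct tail_one_field) -
	       (offsetof(struct tail_one_field, rcv_ooopack) + sizeof(uint32_t));
}

static size_t tail_padding_two(void)
{
	return sizeof(struct tail_two_fields) -
	       (offsetof(struct tail_two_fields, rcv_wnd) + sizeof(uint32_t));
}
```

On a typical 64-bit build, tail_padding_one() is 4 and tail_padding_two() is 0, and both structs have the same size. This is a sketch of the alignment argument only, not the real struct tcp_info.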
* Re: [PATCH v4 1/2] net: phy: dp83867: Add documentation for SGMII mode type
From: David Miller @ 2019-09-11 22:37 UTC (permalink / raw)
To: vitaly.gaiduk
Cc: robh+dt, f.fainelli, mark.rutland, andrew, tpiepho, netdev,
devicetree, linux-kernel
In-Reply-To: <1568049566-16708-2-git-send-email-vitaly.gaiduk@cloudbear.ru>
From: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
Date: Mon, 9 Sep 2019 20:19:25 +0300
> Add documentation of ti,sgmii-ref-clock-output-enable
> which can be used to select SGMII mode type (4 or 6-wire).
>
> Signed-off-by: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
Applied.
^ permalink raw reply
* Re: [PATCH v4 2/2] net: phy: dp83867: Add SGMII mode type switching
From: David Miller @ 2019-09-11 22:37 UTC (permalink / raw)
To: vitaly.gaiduk
Cc: robh+dt, f.fainelli, mark.rutland, andrew, hkallweit1, tpiepho,
netdev, devicetree, linux-kernel
In-Reply-To: <1568049566-16708-1-git-send-email-vitaly.gaiduk@cloudbear.ru>
From: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
Date: Mon, 9 Sep 2019 20:19:24 +0300
> This patch adds the ability to switch between two PHY SGMII modes.
> Some hardware, for example, FPGA IP designs may use 6-wire mode
> which enables differential SGMII clock to MAC.
>
> Signed-off-by: Vitaly Gaiduk <vitaly.gaiduk@cloudbear.ru>
Applied.
^ permalink raw reply
* Re: [PATCH net-next] net: stmmac: pci: Add HAPS support using GMAC5
From: David Miller @ 2019-09-11 22:50 UTC (permalink / raw)
To: Jose.Abreu
Cc: netdev, Joao.Pinto, peppe.cavallaro, alexandre.torgue,
mcoquelin.stm32, linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <c37a55225e1ef66233b47c02b1441b91abeb3b76.1568047994.git.joabreu@synopsys.com>
From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Mon, 9 Sep 2019 18:54:26 +0200
> Add the support for Synopsys HAPS board that uses GMAC5.
>
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next] ipv6: Don't use dst gateway directly in ip6_confirm_neigh()
From: David Miller @ 2019-09-11 22:52 UTC (permalink / raw)
To: sbrivio; +Cc: gnault, ja, nicolas.dichtel, dsahern, netdev
In-Reply-To: <938b711c35ce3fa2b6f057cc23919e897a1e5c2b.1568061608.git.sbrivio@redhat.com>
From: Stefano Brivio <sbrivio@redhat.com>
Date: Mon, 9 Sep 2019 22:44:06 +0200
> This is the equivalent of commit 2c6b55f45d53 ("ipv6: fix neighbour
> resolution with raw socket") for ip6_confirm_neigh(): we can send a
> packet with MSG_CONFIRM on a raw socket for a connected route, so the
> gateway would be :: here, and we should pick the next hop using
> rt6_nexthop() instead.
>
> This was found by code review and, to the best of my knowledge, doesn't
> actually fix a practical issue: the destination address from the packet
> is not considered while confirming a neighbour, as ip6_confirm_neigh()
> calls choose_neigh_daddr() without passing the packet, so there are no
> similar issues as the one fixed by said commit.
>
> A possible source of issues with the existing implementation might come
> from the fact that, if we have a cached dst, we won't consider it,
> while rt6_nexthop() takes care of that. I might just not be creative
> enough to find a practical problem here: the only way to affect this
> with cached routes is to have one coming from an ICMPv6 redirect, but
> if the next hop is a directly connected host, there should be no
> topology for which a redirect applies here, and tests with redirected
> routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on
> raw sockets destined to a directly connected host.
>
> However, directly using the dst gateway here is not consistent anymore
> with neighbour resolution, and, in general, as we want the next hop,
> using rt6_nexthop() looks like the only sane way to fetch it.
>
> Reported-by: Guillaume Nault <gnault@redhat.com>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH 1/7] net/dsa: configure autoneg for CPU port
From: Andrew Lunn @ 2019-09-11 22:52 UTC (permalink / raw)
To: Robert Beckett; +Cc: Florian Fainelli, netdev, Vivien Didelot, David S. Miller
In-Reply-To: <ad302835a98ca5abc7ac88b3caad64867e33ee70.camel@collabora.com>
> It is not just for broadcast storm protection. The original issue that
> made me look into all of this turned out to be rx descriptor ring
> buffer exhaustion due to the CPU not being able to keep up with packet
> reception.
Pause frames do not really solve this problem. The switch will at
some point fill its buffers, and start throwing packets away. Or it
needs to send pause packets to its peers. And then your whole switch
throughput goes down. Packets will always get thrown away, so you need
QoS in your network to give the network hints about which frames it
should throw away first.
..
> Fundamentally, with a phy to phy CPU connection, the CPU MAC may well
> wish to enable pause frames for various reasons, so we should strive to
> handle that I think.
It actually has nothing to do with PHY to PHY connections. You can use
pause frames with direct MAC to MAC connections. PHY auto-negotiation
is one way to indicate both ends support it, but there are also other
ways. e.g.
ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]
on the SoC you could do
ethtool --pause eth0 autoneg off rx on tx on
to force the SoC to send and process pause frames. Ideally i would
prefer a solution like this, since it is not a change of behaviour for
everybody else.
Andrew
^ permalink raw reply
* Re: [PATCH net] tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
From: David Miller @ 2019-09-11 22:54 UTC (permalink / raw)
To: ncardwell; +Cc: netdev, ycheng, soheil, edumazet
In-Reply-To: <20190909205602.248472-1-ncardwell@google.com>
From: Neal Cardwell <ncardwell@google.com>
Date: Mon, 9 Sep 2019 16:56:02 -0400
> Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
> TCP_ECN_QUEUE_CWR.
>
> Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
> the behavior of data receivers, and deciding whether to reflect
> incoming IP ECN CE marks as outgoing TCP th->ece marks. The
> TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
> and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
> is only called from tcp_undo_cwnd_reduction() by data senders during
> an undo, so it should zero the sender-side state,
> TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
> incoming CE bits on incoming data packets just because outgoing
> packets were spuriously retransmitted.
>
> The bug has been reproduced with packetdrill to manifest in a scenario
> with RFC3168 ECN, with an incoming data packet with CE bit set and
> carrying a TCP timestamp value that causes cwnd undo. Before this fix,
> the IP CE bit was ignored and not reflected in the TCP ECE header bit,
> and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
> even though the cwnd reduction had been undone. After this fix, the
> sender properly reflects the CE bit and does not set the W bit.
>
> Note: the bug actually predates 2005 git history; this Fixes footer is
> chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
> the patch applies cleanly (since before this commit the code was in a
> .h file).
>
> Fixes: bdf1ee5d3bd3 ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
Applied and queued up for -stable, thanks Neal.
^ permalink raw reply
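The rationale in the commit message above comes down to which bit of a flags field gets cleared. A user-space sketch of the difference follows; the flag values are written as I recall them from the kernel headers and should be treated as assumptions, and both function names are illustrative.

```c
#include <stdint.h>

/* Assumed flag values; check include/net/tcp.h before relying on them. */
#define ECN_OK_MODEL         1	/* ECN negotiated */
#define ECN_QUEUE_CWR_MODEL  2	/* sender side: a CWR is queued to be sent */
#define ECN_DEMAND_CWR_MODEL 4	/* receiver side: keep echoing ECE until CWR */

/* Behaviour after the fix: only the sender-side state is cleared. */
static uint8_t withdraw_cwr_fixed(uint8_t flags)
{
	return (uint8_t)(flags & ~ECN_QUEUE_CWR_MODEL);
}

/* Behaviour described as the bug: the receiver-side bit is cleared instead. */
static uint8_t withdraw_cwr_buggy(uint8_t flags)
{
	return (uint8_t)(flags & ~ECN_DEMAND_CWR_MODEL);
}
```

Starting with all three bits set, the fixed variant leaves the receiver-side bit in place so CE marks keep being reflected, while the buggy variant drops it and leaves the queued CWR pending.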
* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Andrew Lunn @ 2019-09-11 22:58 UTC (permalink / raw)
To: Robert Beckett
Cc: Ido Schimmel, Florian Fainelli, netdev@vger.kernel.org,
Vivien Didelot, David S. Miller, Jiri Pirko
In-Reply-To: <3f50ee51ec04a2d683a5338a68607824a3f45711.camel@collabora.com>
> We have a setup as follows:
>
> Marvell 88E6240 switch chip, accepting traffic from 4 ports. Port 1
> (P1) is critical priority, no dropped packets allowed, all others can
> be best effort.
>
> CPU port of switch chip is connected via phy to phy of intel i210 (igb
> driver).
>
> i210 is connected via pcie switch to imx6.
>
> When too many small packets attempt to be delivered to CPU port (e.g.
> during broadcast flood) we saw dropped packets.
>
> The packets were being received by i210 in to rx descriptor buffer
> fine, but the CPU could not keep up with the load. We saw
> rx_fifo_errors increasing rapidly and ksoftirqd at ~100% CPU.
>
>
> With this in mind, I am wondering whether any amount of tc traffic
> shaping would help?
Hi Robert
The model in linux is that you start with a software TC filter, and
then offload it to the hardware. So the user configures TC just as
normal, and then that is used to program the hardware to do the same
thing as what would happen in software. This is exactly the same as we
do with bridging. You create a software bridge and add interfaces to
the bridge. This then gets offloaded to the hardware and it does the
bridging for you.
So think about how you can model the Marvell switch capabilities
using TC, and implement offload support for it.
Andrew
^ permalink raw reply
* Re: [PATCH net-next] nfp: read chip model from the PluDevice register
From: David Miller @ 2019-09-11 23:01 UTC (permalink / raw)
To: simon.horman; +Cc: jakub.kicinski, netdev, oss-drivers, dirk.vandermerwe
In-Reply-To: <20190911152118.30698-1-simon.horman@netronome.com>
From: Simon Horman <simon.horman@netronome.com>
Date: Wed, 11 Sep 2019 16:21:18 +0100
> From: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
>
> The PluDevice register provides the authoritative chip model/revision.
>
> Since the model number is purely used for reporting purposes, follow
> the hardware team convention of subtracting 0x10 from the PluDevice
> register to obtain the chip model/revision number.
>
> Suggested-by: Francois H. Theron <francois.theron@netronome.com>
> Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
Applied.
^ permalink raw reply
* Re: [PATCH 0/7] net: dsa: mv88e6xxx: features to handle network storms
From: Andrew Lunn @ 2019-09-11 23:01 UTC (permalink / raw)
To: Robert Beckett; +Cc: Vivien Didelot, netdev, Florian Fainelli, David S. Miller
In-Reply-To: <3f265c5afcb2eea48410ec607d65e8f4e6a20373.camel@collabora.com>
> > Feature series targeting netdev must be prefixed "PATCH net-next". As
>
> Thanks for the info. Out of curiosity, where should I have gleaned this
> info from? This is my first contribution to netdev, so I wasn't familiar
> with the etiquette.
It is also a good idea to 'lurk' in a mailing list for a while,
reading emails flying around, getting to know how things work. This
subject of "PATCH net-next" comes up maybe once a week. The idea of
offloads gets discussed once every couple of weeks etc.
Andrew
^ permalink raw reply
* Re: [net 0/2][pull request] Intel Wired LAN Driver Updates 2019-09-11
From: David Miller @ 2019-09-11 23:08 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20190911164955.10644-1-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 11 Sep 2019 09:49:53 -0700
> This series contains fixes to ixgbe.
>
> Alex fixes up the adaptive ITR scheme for ixgbe which could result in a
> value that was either 0 or something less than 10 which was causing
> issues with hardware features, like RSC, that do not function well with
> ITR values that low.
>
> Ilya Maximets fixes the ixgbe driver to limit the number of transmit
> descriptors to clean by the number of transmit descriptors used in the
> transmit ring, so that the driver does not try to "double" clean the
> same descriptors.
Pulled, thanks Jeff.
^ permalink raw reply
* [PATCH][PATCH net-next] hv_sock: Add the support of hibernation
From: Dexuan Cui @ 2019-09-11 23:37 UTC (permalink / raw)
To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, davem@davemloft.net,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Michael Kelley
Cc: Dexuan Cui
Add the necessary dummy callbacks for hibernation.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
This patch is basically a pure Hyper-V specific change and it has a
build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus: Implement
suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
Hyper-V tree's hyperv-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next
I request this patch should go through Sasha's tree rather than the
net-next tree.
net/vmw_vsock/hyperv_transport.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index f2084e3..e91a884 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -930,6 +930,24 @@ static int hvs_remove(struct hv_device *hdev)
return 0;
}
+/* hv_sock connections can not persist across hibernation, and all the hv_sock
+ * channels are forced to be rescinded before hibernation: see
+ * vmbus_bus_suspend(). Here the dummy hvs_suspend() and hvs_resume()
+ * are only needed because hibernation requires that every device's driver
+ * should have a .suspend and .resume callback: see vmbus_suspend().
+ */
+static int hvs_suspend(struct hv_device *hv_dev)
+{
+ /* Dummy */
+ return 0;
+}
+
+static int hvs_resume(struct hv_device *dev)
+{
+ /* Dummy */
+ return 0;
+}
+
/* This isn't really used. See vmbus_match() and vmbus_probe() */
static const struct hv_vmbus_device_id id_table[] = {
{},
@@ -941,6 +959,8 @@ static int hvs_remove(struct hv_device *hdev)
.id_table = id_table,
.probe = hvs_probe,
.remove = hvs_remove,
+ .suspend = hvs_suspend,
+ .resume = hvs_resume,
};
static int __init hvs_init(void)
--
1.8.3.1
^ permalink raw reply related
* [PATCH][PATCH net-next] hv_netvsc: Add the support of hibernation
From: Dexuan Cui @ 2019-09-11 23:37 UTC (permalink / raw)
To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
sashal@kernel.org, davem@davemloft.net,
linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, Michael Kelley
Cc: Dexuan Cui
The existing netvsc_detach() and netvsc_attach() APIs make it easy to
implement the suspend/resume callbacks.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
This patch is basically a pure Hyper-V specific change and it has a
build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus: Implement
suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
Hyper-V tree's hyperv-next branch:
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next
I request this patch should go through Sasha's tree rather than the
net-next tree.
drivers/net/hyperv/hyperv_net.h | 3 +++
drivers/net/hyperv/netvsc_drv.c | 59 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index ecc9af0..b8763ee 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -952,6 +952,9 @@ struct net_device_context {
u32 vf_alloc;
/* Serial number of the VF to team with */
u32 vf_serial;
+
+ /* Used to temporarily save the config info across hibernation */
+ struct netvsc_device_info *saved_netvsc_dev_info;
};
/* Per channel data */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index afdcc56..f920959 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2392,6 +2392,63 @@ static int netvsc_remove(struct hv_device *dev)
return 0;
}
+static int netvsc_suspend(struct hv_device *dev)
+{
+ struct net_device_context *ndev_ctx;
+ struct net_device *vf_netdev, *net;
+ struct netvsc_device *nvdev;
+ int ret;
+
+ net = hv_get_drvdata(dev);
+
+ ndev_ctx = netdev_priv(net);
+ cancel_delayed_work_sync(&ndev_ctx->dwork);
+
+ rtnl_lock();
+
+ nvdev = rtnl_dereference(ndev_ctx->nvdev);
+ if (nvdev == NULL) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ cancel_work_sync(&nvdev->subchan_work);
+
+ vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
+ if (vf_netdev)
+ netvsc_unregister_vf(vf_netdev);
+
+ /* Save the current config info */
+ ndev_ctx->saved_netvsc_dev_info = netvsc_devinfo_get(nvdev);
+
+ ret = netvsc_detach(net, nvdev);
+out:
+ rtnl_unlock();
+
+ return ret;
+}
+
+static int netvsc_resume(struct hv_device *dev)
+{
+ struct net_device *net = hv_get_drvdata(dev);
+ struct net_device_context *net_device_ctx;
+ struct netvsc_device_info *device_info;
+ int ret;
+
+ rtnl_lock();
+
+ net_device_ctx = netdev_priv(net);
+ device_info = net_device_ctx->saved_netvsc_dev_info;
+
+ ret = netvsc_attach(net, device_info);
+
+ rtnl_unlock();
+
+ kfree(device_info);
+ net_device_ctx->saved_netvsc_dev_info = NULL;
+
+ return ret;
+}
static const struct hv_vmbus_device_id id_table[] = {
/* Network guid */
{ HV_NIC_GUID, },
@@ -2406,6 +2463,8 @@ static int netvsc_remove(struct hv_device *dev)
.id_table = id_table,
.probe = netvsc_probe,
.remove = netvsc_remove,
+ .suspend = netvsc_suspend,
+ .resume = netvsc_resume,
.driver = {
.probe_type = PROBE_FORCE_SYNCHRONOUS,
},
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v3 2/2] tcp: Add rcv_wnd to TCP_INFO
From: Neal Cardwell @ 2019-09-12 0:49 UTC (permalink / raw)
To: Thomas Higdon
Cc: netdev@vger.kernel.org, Jonathan Lemon, Dave Jones, Eric Dumazet,
Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20190911223148.89808-2-tph@fb.com>
On Wed, Sep 11, 2019 at 6:32 PM Thomas Higdon <tph@fb.com> wrote:
>
> Neal Cardwell mentioned that rcv_wnd would be useful for helping
> diagnose whether a flow is receive-window-limited at a given instant.
>
> This serves the purpose of adding an additional __u32 to avoid the
> would-be hole caused by the addition of the tcpi_rcv_ooopack field.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
Thanks, Thomas.
I know that when I mentioned this before I mentioned the idea of both
tp->snd_wnd (send-side receive window) and tp->rcv_wnd (receive-side
receive window) in tcp_info, and did not express a preference between
the two. Now that we are faced with a decision between the two,
personally I think it would be a little more useful to start with
tp->snd_wnd. :-)
Two main reasons:
(1) Usually when we're diagnosing TCP performance problems, we do so
from the sender, since the sender makes most of the
performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
From the sender-side the thing that would be most useful is to see
tp->snd_wnd, the receive window that the receiver has advertised to
the sender.
(2) From the receiver side, "ss" can already show a fair amount of
info about receive-side buffer/window limits, like:
info->tcpi_rcv_ssthresh, info->tcpi_rcv_space,
skmeminfo[SK_MEMINFO_RMEM_ALLOC], skmeminfo[SK_MEMINFO_RCVBUF]. Often
the rwin can be approximated by combining those.
Hopefully Eric, Yuchung, and Soheil can weigh in on the question of
snd_wnd vs rcv_wnd. Or we can perhaps think of another field, and add
the tcpi_rcv_ooopack, snd_wnd, rcv_wnd, and that final field, all
together.
thanks,
neal
^ permalink raw reply
* Re: [PATCH V2 net-next 4/7] net: hns3: fix port setting handle for fibre port
From: tanhuazhong @ 2019-09-12 0:56 UTC (permalink / raw)
To: Sergei Shtylyov, davem
Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
jakub.kicinski
In-Reply-To: <7f914173-a2fc-08d8-e2b1-48fa3da4e29c@cogentembedded.com>
On 2019/9/11 18:16, Sergei Shtylyov wrote:
> Hello!
>
> On 11.09.2019 5:40, Huazhong Tan wrote:
>
>> From: Guangbin Huang <huangguangbin2@huawei.com>
>>
>> For hardware doesn't support use specified speed and duplex
>
> Can't parse that. "For hardware that does not support using", perhaps?
Yes, thanks. Will check the grammar more carefully next time.
>
>> to negotiate, it's unnecessary to check and modify the port
>> speed and duplex for fibre port when autoneg is on.
>>
>> Fixes: 22f48e24a23d ("net: hns3: add autoneg and change speed support
>> for fibre port")
>> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>> ---
>> drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 15 +++++++++++++++
>> 1 file changed, 15 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> index f5a681d..680c350 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> @@ -726,6 +726,12 @@ static int hns3_check_ksettings_param(const
>> struct net_device *netdev,
>> u8 duplex;
>> int ret;
>> + /* hw doesn't support use specified speed and duplex to negotiate,
>
> I can't parse that, did you mean "using"?
yes, thanks.
>
>> + * unnecessary to check them when autoneg on.
>> + */
>> + if (cmd->base.autoneg)
>> + return 0;
>> +
>> if (ops->get_ksettings_an_result) {
>> ops->get_ksettings_an_result(handle, &autoneg, &speed,
>> &duplex);
>> if (cmd->base.autoneg == autoneg && cmd->base.speed == speed &&
>> @@ -787,6 +793,15 @@ static int hns3_set_link_ksettings(struct
>> net_device *netdev,
>> return ret;
>> }
>> + /* hw doesn't support use specified speed and duplex to negotiate,
>
> Here too...
>
yes, thanks.
>> + * ignore them when autoneg on.
>> + */
>> + if (cmd->base.autoneg) {
>> + netdev_info(netdev,
>> + "autoneg is on, ignore the speed and duplex\n");
>> + return 0;
>> + }
>> +
>> if (ops->cfg_mac_speed_dup_h)
>> ret = ops->cfg_mac_speed_dup_h(handle, cmd->base.speed,
>> cmd->base.duplex);
>
> MBR, Sergei
>
> .
>
^ permalink raw reply
* Re: [Patch net] sch_sfb: fix a crash in sfb_destroy()
From: Cong Wang @ 2019-09-12 1:10 UTC (permalink / raw)
To: Eric Dumazet
Cc: Linux Kernel Network Developers, syzbot, Linus Torvalds,
Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <7b5b69a9-7ace-2d21-f187-7a81fb1dae5a@gmail.com>
On Wed, Sep 11, 2019 at 2:36 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> It seems a similar fix would be needed in net/sched/sch_dsmark.c ?
>
Yeah, or just add a NULL check in dsmark_destroy().
Anyway, I will send a separate patch for it.
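For readers following along, the NULL-check idea can be sketched in plain userspace C. Everything here is a hypothetical stand-in (struct sched_priv, child_put(), sched_init(), sched_destroy()), not the actual sch_dsmark.c code, and it assumes, as in the sfb crash, that destroy can run after init has failed part-way.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct child {
	int refcnt;
};

struct sched_priv {
	struct child *q;	/* stays NULL if init fails early */
};

static int child_puts;		/* counts releases, for illustration only */

/* Drops one reference. Like the crash under discussion, this
 * dereferences its argument, so it must never see NULL.
 */
static void child_put(struct child *c)
{
	assert(c != NULL);
	if (--c->refcnt == 0)
		free(c);
	child_puts++;
}

/* Init that may fail before the child is attached. */
static int sched_init(struct sched_priv *p, int fail_early)
{
	p->q = NULL;
	if (fail_early)
		return -1;

	p->q = calloc(1, sizeof(*p->q));
	if (!p->q)
		return -1;
	p->q->refcnt = 1;
	return 0;
}

/* Destroy may run after a failed init, so guard the pointer. */
static void sched_destroy(struct sched_priv *p)
{
	if (p->q)
		child_put(p->q);
	p->q = NULL;
}
```

The alternative taken for sfb, pointing the field at a valid placeholder early in init, avoids the branch in destroy; the NULL check is the smaller change.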
Thanks.
^ permalink raw reply
* Re: [PATCH v1 net-next 12/15] net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload
From: Vladimir Oltean @ 2019-09-12 1:30 UTC (permalink / raw)
To: Vinicius Costa Gomes
Cc: f.fainelli, vivien.didelot, andrew, davem, vedang.patel,
richardcochran, weifeng.voon, jiri, m-karicheri2, Jose.Abreu,
ilias.apalodimas, jhs, xiyou.wangcong, kurt.kanzenbach, netdev
In-Reply-To: <87woeeipm8.fsf@linux.intel.com>
Hi Vinicius,
On 11/09/2019, Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote:
> Hi,
>
> Vladimir Oltean <olteanv@gmail.com> writes:
>
>> This qdisc offload is the closest thing to what the SJA1105 supports in
>> hardware for time-based egress shaping. The switch core really is built
>> around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
>> operate similarly to IEEE 802.1Qbv with some constraints:
>>
>> - The gate control list is a global list for all ports. There are 8
>> execution threads that iterate through this global list in parallel.
>> I don't know why 8, there are only 4 front-panel ports.
>>
>> - Care must be taken by the user to make sure that two execution threads
>> never get to execute a GCL entry simultaneously. I created an O(n^4)
>> checker for this hardware limitation, prior to accepting a taprio
>> offload configuration as valid.
>>
>> - The spec says that if a GCL entry's interval is shorter than the frame
>> length, you shouldn't send it (and end up in head-of-line blocking).
>> Well, this switch does anyway.
>>
>> - The switch has no concept of ADMIN and OPER configurations. Because
>> it's so simple, the TAS settings are loaded through the static config
>> tables interface, so there isn't even place for any discussion about
>> 'graceful switchover between ADMIN and OPER'. You just reset the
>> switch and upload a new OPER config.
>>
>> - The switch accepts multiple time sources for the gate events. Right
>> now I am using the standalone clock source as opposed to PTP. So the
>> base time parameter doesn't really do much. Support for the PTP clock
>> source will be added in the next patch.
>>
>> Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
>> ---
>> Changes since RFC:
>> - Removed the sja1105_tas_config_work workqueue.
>> - Allocating memory with GFP_KERNEL.
>> - Made the ASCII art drawing fit in < 80 characters.
>> - Made most of the time-holding variables s64 instead of u64 (for fear
>> of them not holding the result of signed arithmetics properly).
>>
>> drivers/net/dsa/sja1105/Kconfig | 8 +
>> drivers/net/dsa/sja1105/Makefile | 4 +
>> drivers/net/dsa/sja1105/sja1105.h | 5 +
>> drivers/net/dsa/sja1105/sja1105_main.c | 19 +-
>> drivers/net/dsa/sja1105/sja1105_tas.c | 420 +++++++++++++++++++++++++
>> drivers/net/dsa/sja1105/sja1105_tas.h | 42 +++
>> 6 files changed, 497 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
>> create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h
>>
>> diff --git a/drivers/net/dsa/sja1105/Kconfig
>> b/drivers/net/dsa/sja1105/Kconfig
>> index 770134a66e48..55424f39cb0d 100644
>> --- a/drivers/net/dsa/sja1105/Kconfig
>> +++ b/drivers/net/dsa/sja1105/Kconfig
>> @@ -23,3 +23,11 @@ config NET_DSA_SJA1105_PTP
>> help
>> This enables support for timestamping and PTP clock manipulations in
>> the SJA1105 DSA driver.
>> +
>> +config NET_DSA_SJA1105_TAS
>> + bool "Support for the Time-Aware Scheduler on NXP SJA1105"
>> + depends on NET_DSA_SJA1105
>> + help
>> + This enables support for the TTEthernet-based egress scheduling
>> + engine in the SJA1105 DSA driver, which is controlled using a
>> + hardware offload of the tc-taprio qdisc.
>> diff --git a/drivers/net/dsa/sja1105/Makefile
>> b/drivers/net/dsa/sja1105/Makefile
>> index 4483113e6259..66161e874344 100644
>> --- a/drivers/net/dsa/sja1105/Makefile
>> +++ b/drivers/net/dsa/sja1105/Makefile
>> @@ -12,3 +12,7 @@ sja1105-objs := \
>> ifdef CONFIG_NET_DSA_SJA1105_PTP
>> sja1105-objs += sja1105_ptp.o
>> endif
>> +
>> +ifdef CONFIG_NET_DSA_SJA1105_TAS
>> +sja1105-objs += sja1105_tas.o
>> +endif
>> diff --git a/drivers/net/dsa/sja1105/sja1105.h
>> b/drivers/net/dsa/sja1105/sja1105.h
>> index 3ca0b87aa3e4..d95f9ce3b4f9 100644
>> --- a/drivers/net/dsa/sja1105/sja1105.h
>> +++ b/drivers/net/dsa/sja1105/sja1105.h
>> @@ -21,6 +21,7 @@
>> #define SJA1105_AGEING_TIME_MS(ms) ((ms) / 10)
>>
>> #include "sja1105_ptp.h"
>> +#include "sja1105_tas.h"
>>
>> /* Keeps the different addresses between E/T and P/Q/R/S */
>> struct sja1105_regs {
>> @@ -96,6 +97,7 @@ struct sja1105_private {
>> struct mutex mgmt_lock;
>> struct sja1105_tagger_data tagger_data;
>> struct sja1105_ptp_data ptp_data;
>> + struct sja1105_tas_data tas_data;
>> };
>>
>> #include "sja1105_dynamic_config.h"
>> @@ -111,6 +113,9 @@ typedef enum {
>> SPI_WRITE = 1,
>> } sja1105_spi_rw_mode_t;
>>
>> +/* From sja1105_main.c */
>> +int sja1105_static_config_reload(struct sja1105_private *priv);
>> +
>> /* From sja1105_spi.c */
>> int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
>> sja1105_spi_rw_mode_t rw, u64 reg_addr,
>> diff --git a/drivers/net/dsa/sja1105/sja1105_main.c
>> b/drivers/net/dsa/sja1105/sja1105_main.c
>> index 8b930cc2dabc..4b393782cc84 100644
>> --- a/drivers/net/dsa/sja1105/sja1105_main.c
>> +++ b/drivers/net/dsa/sja1105/sja1105_main.c
>> @@ -22,6 +22,7 @@
>> #include <linux/if_ether.h>
>> #include <linux/dsa/8021q.h>
>> #include "sja1105.h"
>> +#include "sja1105_tas.h"
>>
>> static void sja1105_hw_reset(struct gpio_desc *gpio, unsigned int
>> pulse_len,
>> unsigned int startup_delay)
>> @@ -1382,7 +1383,7 @@ static void sja1105_bridge_leave(struct dsa_switch
>> *ds, int port,
>> * modify at runtime (currently only MAC) and restore them after
>> uploading,
>> * such that this operation is relatively seamless.
>> */
>> -static int sja1105_static_config_reload(struct sja1105_private *priv)
>> +int sja1105_static_config_reload(struct sja1105_private *priv)
>> {
>> struct ptp_system_timestamp ptp_sts_before;
>> struct ptp_system_timestamp ptp_sts_after;
>> @@ -1761,6 +1762,7 @@ static void sja1105_teardown(struct dsa_switch *ds)
>> {
>> struct sja1105_private *priv = ds->priv;
>>
>> + sja1105_tas_teardown(priv);
>> cancel_work_sync(&priv->tagger_data.rxtstamp_work);
>> skb_queue_purge(&priv->tagger_data.skb_rxtstamp_queue);
>> sja1105_ptp_clock_unregister(priv);
>> @@ -2088,6 +2090,18 @@ static bool sja1105_port_txtstamp(struct dsa_switch
>> *ds, int port,
>> return true;
>> }
>>
>> +static int sja1105_port_setup_tc(struct dsa_switch *ds, int port,
>> + enum tc_setup_type type,
>> + void *type_data)
>> +{
>> + switch (type) {
>> + case TC_SETUP_QDISC_TAPRIO:
>> + return sja1105_setup_tc_taprio(ds, port, type_data);
>> + default:
>> + return -EOPNOTSUPP;
>> + }
>> +}
>> +
>> static const struct dsa_switch_ops sja1105_switch_ops = {
>> .get_tag_protocol = sja1105_get_tag_protocol,
>> .setup = sja1105_setup,
>> @@ -2120,6 +2134,7 @@ static const struct dsa_switch_ops
>> sja1105_switch_ops = {
>> .port_hwtstamp_set = sja1105_hwtstamp_set,
>> .port_rxtstamp = sja1105_port_rxtstamp,
>> .port_txtstamp = sja1105_port_txtstamp,
>> + .port_setup_tc = sja1105_port_setup_tc,
>> };
>>
>> static int sja1105_check_device_id(struct sja1105_private *priv)
>> @@ -2229,6 +2244,8 @@ static int sja1105_probe(struct spi_device *spi)
>> }
>> mutex_init(&priv->mgmt_lock);
>>
>> + sja1105_tas_setup(priv);
>> +
>> return dsa_register_switch(priv->ds);
>> }
>>
>> diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c
>> b/drivers/net/dsa/sja1105/sja1105_tas.c
>> new file mode 100644
>> index 000000000000..769e1d8e5e8f
>> --- /dev/null
>> +++ b/drivers/net/dsa/sja1105/sja1105_tas.c
>> @@ -0,0 +1,420 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
>> + */
>> +#include "sja1105.h"
>> +
>> +#define SJA1105_TAS_CLKSRC_DISABLED 0
>> +#define SJA1105_TAS_CLKSRC_STANDALONE 1
>> +#define SJA1105_TAS_CLKSRC_AS6802 2
>> +#define SJA1105_TAS_CLKSRC_PTP 3
>> +#define SJA1105_GATE_MASK GENMASK_ULL(SJA1105_NUM_TC - 1, 0)
>> +#define SJA1105_TAS_MAX_DELTA BIT(19)
>> +
>> +/* This is not a preprocessor macro because the "ns" argument may or may
>> not be
>> + * s64 at caller side. This ensures it is properly type-cast before
>> div_s64.
>> + */
>> +static s64 ns_to_sja1105_delta(s64 ns)
>> +{
>> + return div_s64(ns, 200);
>> +}
>> +
>> +/* Lo and behold: the egress scheduler from hell.
>> + *
>> + * At the hardware level, the Time-Aware Shaper holds a global linear
>> array of
>> + * all schedule entries for all ports. These are the Gate Control List
>> (GCL)
>> + * entries, let's call them "timeslots" for short. This linear array of
>> + * timeslots is held in BLK_IDX_SCHEDULE.
>> + *
>> + * Then there are a maximum of 8 "execution threads" inside the switch,
>> which
>> + * iterate cyclically through the "schedule". Each "cycle" has an entry
>> point
>> + * and an exit point, both being timeslot indices in the schedule table.
>> The
>> + * hardware calls each cycle a "subschedule".
>> + *
>> + * Subschedule (cycle) i starts when
>> + * ptpclkval >= ptpschtm + BLK_IDX_SCHEDULE_ENTRY_POINTS[i].delta.
>> + *
>> + * The hardware scheduler iterates BLK_IDX_SCHEDULE with a k ranging
>> from
>> + * k = BLK_IDX_SCHEDULE_ENTRY_POINTS[i].address to
>> + * k = BLK_IDX_SCHEDULE_PARAMS.subscheind[i]
>> + *
>> + * For each schedule entry (timeslot) k, the engine executes the gate
>> control
>> + * list entry for the duration of BLK_IDX_SCHEDULE[k].delta.
>> + *
>> + * +---------+
>> + * | | BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS
>> + * +---------+
>> + * |
>> + * +-----------------+
>> + * | .actsubsch
>> + * BLK_IDX_SCHEDULE_ENTRY_POINTS v
>> + * +-------+-------+
>> + * |cycle 0|cycle 1|
>> + * +-------+-------+
>> + * | | | |
>> + * +----------------+ | |
>> +-------------------------------------+
>> + * | .subschindx | | .subschindx
>> |
>> + * | | +---------------+
>> |
>> + * | .address | .address |
>> |
>> + * | | |
>> |
>> + * | | |
>> |
>> + * | BLK_IDX_SCHEDULE v v
>> |
>> + * | +-------+-------+-------+-------+-------+------+
>> |
>> + * | |entry 0|entry 1|entry 2|entry 3|entry 4|entry5|
>> |
>> + * | +-------+-------+-------+-------+-------+------+
>> |
>> + * | ^ ^ ^ ^
>> |
>> + * | | | | |
>> |
>> + * | +-------------------------+ | | |
>> |
>> + * | | +-------------------------------+ | |
>> |
>> + * | | | +-------------------+ |
>> |
>> + * | | | | |
>> |
>> + * | +---------------------------------------------------------------+
>> |
>> + * | |subscheind[0]<=subscheind[1]<=subscheind[2]<=...<=subscheind[7]|
>> |
>> + * | +---------------------------------------------------------------+
>> |
>> + * | ^ ^ BLK_IDX_SCHEDULE_PARAMS
>> |
>> + * | | |
>> |
>> + * +--------+
>> +-------------------------------------------+
>> + *
>> + * In the above picture there are two subschedules (cycles):
>> + *
>> + * - cycle 0: iterates the schedule table from 0 to 2 (and back)
>> + * - cycle 1: iterates the schedule table from 3 to 5 (and back)
>> + *
>> + * All other possible execution threads must be marked as unused by
>> making
>> + * their "subschedule end index" (subscheind) equal to the last valid
>> + * subschedule's end index (in this case 5).
>> + */
>> +static int sja1105_init_scheduling(struct sja1105_private *priv)
>> +{
>> + struct sja1105_schedule_entry_points_entry *schedule_entry_points;
>> + struct sja1105_schedule_entry_points_params_entry
>> + *schedule_entry_points_params;
>> + struct sja1105_schedule_params_entry *schedule_params;
>> + struct sja1105_tas_data *tas_data = &priv->tas_data;
>> + struct sja1105_schedule_entry *schedule;
>> + struct sja1105_table *table;
>> + int subscheind[8] = {0};
>> + int schedule_start_idx;
>> + s64 entry_point_delta;
>> + int schedule_end_idx;
>> + int num_entries = 0;
>> + int num_cycles = 0;
>> + int cycle = 0;
>> + int i, k = 0;
>> + int port;
>> +
>> + /* Discard previous Schedule Table */
>> + table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
>> + if (table->entry_count) {
>> + kfree(table->entries);
>> + table->entry_count = 0;
>> + }
>> +
>> + /* Discard previous Schedule Entry Points Parameters Table */
>> + table =
>> &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS];
>> + if (table->entry_count) {
>> + kfree(table->entries);
>> + table->entry_count = 0;
>> + }
>> +
>> + /* Discard previous Schedule Parameters Table */
>> + table = &priv->static_config.tables[BLK_IDX_SCHEDULE_PARAMS];
>> + if (table->entry_count) {
>> + kfree(table->entries);
>> + table->entry_count = 0;
>> + }
>> +
>> + /* Discard previous Schedule Entry Points Table */
>> + table = &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS];
>> + if (table->entry_count) {
>> + kfree(table->entries);
>> + table->entry_count = 0;
>> + }
>> +
>> + /* Figure out the dimensioning of the problem */
>> + for (port = 0; port < SJA1105_NUM_PORTS; port++) {
>> + if (tas_data->config[port]) {
>> + num_entries += tas_data->config[port]->num_entries;
>> + num_cycles++;
>> + }
>> + }
>> +
>> + /* Nothing to do */
>> + if (!num_cycles)
>> + return 0;
>> +
>> + /* Pre-allocate space in the static config tables */
>> +
>> + /* Schedule Table */
>> + table = &priv->static_config.tables[BLK_IDX_SCHEDULE];
>> + table->entries = kcalloc(num_entries, table->ops->unpacked_entry_size,
>> + GFP_KERNEL);
>> + if (!table->entries)
>> + return -ENOMEM;
>> + table->entry_count = num_entries;
>> + schedule = table->entries;
>> +
>> + /* Schedule Points Parameters Table */
>> + table =
>> &priv->static_config.tables[BLK_IDX_SCHEDULE_ENTRY_POINTS_PARAMS];
>> + table->entries =
>> kcalloc(SJA1105_MAX_SCHEDULE_ENTRY_POINTS_PARAMS_COUNT,
>> + table->ops->unpacked_entry_size, GFP_KERNEL);
>> + if (!table->entries)
>> + return -ENOMEM;
>
> Should this free the previous allocation, in case this one fails?
> (also applies to the statements below)
>
I had to take a look at the overall driver code again, since it's
already been a while since I added it and I couldn't remember exactly.
All memory is freed automagically in sja1105_static_config_free from
sja1105_static_config.c. That simplifies driver code considerably,
although it's so generic that I forgot that it's there.
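The pattern is roughly the following. This is a simplified userspace sketch with hypothetical names (struct table, struct static_config, NUM_TABLES, table_alloc()), not the code from sja1105_static_config.c: each table records its own entry count, and a single generic teardown walks all tables, so an allocation failure half-way through needs no local unwinding.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_TABLES 4	/* hypothetical; the real driver has more tables */

struct table {
	void *entries;
	size_t entry_count;
};

struct static_config {
	struct table tables[NUM_TABLES];
};

/* Allocate one table. On failure, earlier tables are left in place
 * for the generic teardown below to release.
 */
static int table_alloc(struct table *t, size_t count, size_t entry_size)
{
	assert(count > 0);
	t->entries = calloc(count, entry_size);
	if (!t->entries)
		return -1;
	t->entry_count = count;
	return 0;
}

/* Walk every table and free whatever was allocated.
 * Returns the number of tables released.
 */
static size_t static_config_free(struct static_config *cfg)
{
	size_t freed = 0;

	for (int i = 0; i < NUM_TABLES; i++) {
		struct table *t = &cfg->tables[i];

		if (t->entry_count) {
			free(t->entries);
			t->entries = NULL;
			t->entry_count = 0;
			freed++;
		}
	}
	return freed;
}
```

The trade-off is that a failed setup keeps partially built tables around until teardown, in exchange for error paths that are a plain return.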
Thanks,
-Vladimir
^ permalink raw reply
* Re: [PATCH net 1/2] sctp: remove redundant assignment when call sctp_get_port_local
From: maowenan @ 2019-09-12 2:05 UTC (permalink / raw)
To: Marcelo Ricardo Leitner, Dan Carpenter
Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
kernel-janitors
In-Reply-To: <20190911143923.GE3499@localhost.localdomain>
On 2019/9/11 22:39, Marcelo Ricardo Leitner wrote:
> On Wed, Sep 11, 2019 at 11:30:08AM -0300, Marcelo Ricardo Leitner wrote:
>> On Wed, Sep 11, 2019 at 11:30:38AM +0300, Dan Carpenter wrote:
>>> On Wed, Sep 11, 2019 at 09:30:47AM +0800, maowenan wrote:
>>>>
>>>>
>>>> On 2019/9/11 3:22, Dan Carpenter wrote:
>>>>> On Tue, Sep 10, 2019 at 09:57:10PM +0300, Dan Carpenter wrote:
>>>>>> On Tue, Sep 10, 2019 at 03:13:42PM +0800, Mao Wenan wrote:
>>>>>>> There are more parentheses in if clause when call sctp_get_port_local
>>>>>>> in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
>>>>>>> do cleanup.
>>>>>>>
>>>>>>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>>>>>>> ---
>>>>>>> net/sctp/socket.c | 3 +--
>>>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>>>>> index 9d1f83b10c0a..766b68b55ebe 100644
>>>>>>> --- a/net/sctp/socket.c
>>>>>>> +++ b/net/sctp/socket.c
>>>>>>> @@ -399,9 +399,8 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len)
>>>>>>> * detection.
>>>>>>> */
>>>>>>> addr->v4.sin_port = htons(snum);
>>>>>>> - if ((ret = sctp_get_port_local(sk, addr))) {
>>>>>>> + if (sctp_get_port_local(sk, addr))
>>>>>>> return -EADDRINUSE;
>>>>>>
>>>>>> sctp_get_port_local() returns a long which is either 0,1 or a pointer
>>>>>> casted to long. It's not documented what it means and neither of the
>>>>>> callers use the return since commit 62208f12451f ("net: sctp: simplify
>>>>>> sctp_get_port").
>>>>>
>>>>> Actually it was commit 4e54064e0a13 ("sctp: Allow only 1 listening
>>>>> socket with SO_REUSEADDR") from 11 years ago. That patch fixed a bug,
>>>>> because before the code assumed that a pointer casted to an int was the
>>>>> same as a pointer casted to a long.
>>>>
>>>> commit 4e54064e0a13 treated non-zero return value as unexpected, so the current
>>>> cleanup is ok?
>>>
>>> Yeah. It's fine, I was just confused why we weren't preserving the
>>> error code and then I saw that we didn't return errors at all and got
>>> confused.
>>
>> But please lets seize the moment and do the change Dean suggested.
>
> *Dan*, sorry.
>
>> This was the last place saving this return value somewhere. It makes
>> sense to cleanup sctp_get_port_local() now and remove that masked
>> pointer return.
>>
>> Then you may also cleanup:
>> socket.c: return !!sctp_get_port_local(sk, &addr);
>> as it will be a direct map.
Thanks Marcelo, shall I post a separate patch for the cleanup, as you suggest?
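To make the suggested cleanup concrete, here is a small userspace sketch of the two shapes being compared. All names (struct bucket, get_port_masked(), get_port_old(), get_port_clean()) are hypothetical and only mirror the description in this thread: a long that is 0, 1, or a pointer cast to long, versus a plain 0/1 result.

```c
#include <assert.h>
#include <stddef.h>

struct bucket {
	int port;
};	/* hypothetical stand-in */

/* Old shape: 0 on success, otherwise 1 or a pointer squeezed into a long. */
static long get_port_masked(struct bucket *conflict, int in_use)
{
	assert(in_use == 0 || in_use == 1);
	if (conflict)
		return (long)conflict;
	return in_use;
}

/* Callers that only need success/failure collapse the value with "!!". */
static int get_port_old(struct bucket *conflict, int in_use)
{
	return !!get_port_masked(conflict, in_use);
}

/* Cleaned-up shape: return 0 or 1 directly, so callers can use the
 * result as-is and the "!!" becomes unnecessary.
 */
static int get_port_clean(struct bucket *conflict, int in_use)
{
	return (conflict || in_use) ? 1 : 0;
}
```

With the second shape, a caller such as the one quoted above could return the value directly, which is the "direct map" Marcelo refers to.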
>>
>> Marcelo
>>
>
> .
>
^ permalink raw reply