* Re: [PATCH v7 03/11] net: pch_gbe: Probe PHY ID & initialize only once
From: Paul Burton @ 2018-06-27 17:31 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev, David S . Miller
In-Reply-To: <20180627172131.GB885@lunn.ch>
Hi Andrew,
On Wed, Jun 27, 2018 at 07:21:31PM +0200, Andrew Lunn wrote:
> > [1] Please, someone patent PHY hotplugging & rigorously enforce said
> > patent such that nobody can do it. At least not with an EG20T MAC.
>
> Hi Paul
>
> It is already possible, and probably patented. SFP cages are usually
> used for fibre optical modules. But it is also possible to have copper
> modules, which contain a standard PHY. And SFP modules are
> hot-plugable...
D'oh, but at least not relevant to the EG20T/pch_gbe :)
> > @@ -2577,6 +2579,8 @@ static int pch_gbe_probe(struct pci_dev *pdev,
> > if (ret)
> > goto err_free_netdev;
> >
> > + pch_gbe_check_options(adapter);
> > +
> > /* Initialize PHY */
> > ret = pch_gbe_init_phy(adapter);
> > if (ret) {
> > @@ -2606,8 +2610,6 @@ static int pch_gbe_probe(struct pci_dev *pdev,
> >
> > INIT_WORK(&adapter->reset_task, pch_gbe_reset_task);
> >
> > - pch_gbe_check_options(adapter);
> > -
> > /* initialize the wol settings based on the eeprom settings */
> > adapter->wake_up_evt = PCH_GBE_WL_INIT_SETTING;
> > dev_info(&pdev->dev, "MAC address : %pM\n", netdev->dev_addr);
>
> But these two changes seem unrelated. Should they be in a different
> patch?
This is actually needed because pch_gbe_check_options() sets up, amongst
other things, the autoneg_advertised field in struct pch_gbe_phy_info
and that needs to happen before pch_gbe_phy_init_setting() is called.
Thanks,
Paul
^ permalink raw reply
* Re: [PATCH v7 07/11] net: pch_gbe: Remove AR8031 PHY hibernation disable
From: Andrew Lunn @ 2018-06-27 17:30 UTC (permalink / raw)
To: Paul Burton; +Cc: netdev, David S . Miller
In-Reply-To: <20180627000612.27263-8-paul.burton@mips.com>
On Tue, Jun 26, 2018 at 05:06:08PM -0700, Paul Burton wrote:
> We should now be able to cope with the PHY entering hibernation, ie.
> ceasing to provide the RX clock, whilst the ethernet link is down.
>
> Remove the code responsible for disabling the AR8031 PHY's hibernation
> feature, allowing the PHY to enter its low power hibernation state.
>
> Signed-off-by: Paul Burton <paul.burton@mips.com>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH v7 06/11] net: pch_gbe: Only enable MAC when PHY link is active
From: Andrew Lunn @ 2018-06-27 17:30 UTC (permalink / raw)
To: Paul Burton; +Cc: netdev, David S . Miller
In-Reply-To: <20180627000612.27263-7-paul.burton@mips.com>
On Tue, Jun 26, 2018 at 05:06:07PM -0700, Paul Burton wrote:
> When using a PHY connected via RGMII, as the pch_gbe driver presumes is
> the case, the RX clock is provided by the PHY to the MAC. Various PHYs,
> including both the AR8031 used by the Minnowboard & the RTL8211E used by
> the MIPS Boston development board, will stop generating the RX clock
> when the ethernet link is down (eg. the ethernet cable is unplugged).
>
> Various pieces of functionality in the EG20T MAC, ranging from basics
> like completing a MAC reset to programming MAC addresses, rely upon the
> RX clock being provided. When the clock is not provided these pieces of
> functionality simply never complete, and the busy bits that indicate
> they're in progress remain set indefinitely.
>
> The pch_gbe driver currently requires that the RX clock is always
> provided, and attempts to enforce this by disabling the hibernation
> feature of the AR8031 PHY to keep it generating the RX clock. This patch
> moves us away from this model by only configuring the MAC when the PHY
> indicates that the ethernet link is up. When the link is up we should be
> able to safely expect that the RX clock is being provided, and therefore
> safely reset & configure the MAC.
Hi Paul
I like the concept, but the implementation is not clear. Maybe it just
needs more details in the commit message. What has the watchdog got to
do with link up?
And what happens on link down? Does the MAC need shutting down? I
don't see such code here.
Andrew
^ permalink raw reply
* Re: [PATCH v7 05/11] net: pch_gbe: Move pch_gbe_watchdog lower in pch_gbe_main.c
From: Andrew Lunn @ 2018-06-27 17:23 UTC (permalink / raw)
To: Paul Burton; +Cc: netdev, David S . Miller
In-Reply-To: <20180627000612.27263-6-paul.burton@mips.com>
On Tue, Jun 26, 2018 at 05:06:06PM -0700, Paul Burton wrote:
> This patch moves the pch_gbe_watchdog() function lower in pch_gbe_main.c
> in order to allow use of other functions in the next patch, without
> requiring lots of forward declarations. Doing this as a separate patch
> makes it clearer what actually changed in the next patch.
>
> The function is unmodified except for whitespace changes to satisfy
> checkpatch.
>
> Signed-off-by: Paul Burton <paul.burton@mips.com>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH v7 03/11] net: pch_gbe: Probe PHY ID & initialize only once
From: Andrew Lunn @ 2018-06-27 17:21 UTC (permalink / raw)
To: Paul Burton; +Cc: netdev, David S . Miller
In-Reply-To: <20180627000612.27263-4-paul.burton@mips.com>
On Tue, Jun 26, 2018 at 05:06:04PM -0700, Paul Burton wrote:
> The pch_gbe driver currently probes for the PHY ID & configures the PHY
> every time the MAC is reset, even though we know that the PHY won't have
> changed since the last MAC reset [1].
>
> This patch moves the PHY probe to instead happen only once when the
> driver is probed, saving time & moving us closer to the behavior we'll
> have with phylib.
>
> [1] Please, someone patent PHY hotplugging & rigorously enforce said
> patent such that nobody can do it. At least not with an EG20T MAC.
Hi Paul
It is already possible, and probably patented. SFP cages are usually
used for fibre optical modules. But it is also possible to have copper
modules, which contain a standard PHY. And SFP modules are
hot-plugable...
>
> Signed-off-by: Paul Burton <paul.burton@mips.com>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> ---
>
> Changes in v7: New patch
>
> .../ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 26 ++++++++++---------
> 1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> index 9651fa02d4bb..5157cea16773 100644
> --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> @@ -617,8 +617,10 @@ static void pch_gbe_init_stats(struct pch_gbe_adapter *adapter)
> static int pch_gbe_init_phy(struct pch_gbe_adapter *adapter)
> {
> struct net_device *netdev = adapter->netdev;
> + struct pch_gbe_hw *hw = &adapter->hw;
> u32 addr;
> u16 bmcr, stat;
> + s32 ret_val;
>
> /* Discover phy addr by searching addrs in order {1,0,2,..., 31} */
> for (addr = 0; addr < PCH_GBE_PHY_REGS_LEN; addr++) {
> @@ -652,6 +654,16 @@ static int pch_gbe_init_phy(struct pch_gbe_adapter *adapter)
> adapter->mii.mdio_read = pch_gbe_mdio_read;
> adapter->mii.mdio_write = pch_gbe_mdio_write;
> adapter->mii.supports_gmii = mii_check_gmii_support(&adapter->mii);
> +
> + ret_val = pch_gbe_phy_get_id(hw);
> + if (ret_val) {
> + netdev_err(adapter->netdev, "pch_gbe_phy_get_id error\n");
> + return -EIO;
> + }
> + pch_gbe_phy_init_setting(hw);
> + /* Setup Mac interface option RGMII */
> + pch_gbe_phy_set_rgmii(hw);
> +
> return 0;
> }
>
> @@ -721,22 +733,12 @@ void pch_gbe_reset(struct pch_gbe_adapter *adapter)
> {
> struct net_device *netdev = adapter->netdev;
> struct pch_gbe_hw *hw = &adapter->hw;
> - s32 ret_val;
>
> pch_gbe_mac_reset_hw(hw);
> /* reprogram multicast address register after reset */
> pch_gbe_set_multi(netdev);
> /* Setup the receive address. */
> pch_gbe_mac_init_rx_addrs(hw, PCH_GBE_MAR_ENTRIES);
> -
> - ret_val = pch_gbe_phy_get_id(hw);
> - if (ret_val) {
> - netdev_err(adapter->netdev, "pch_gbe_phy_get_id error\n");
> - return;
> - }
> - pch_gbe_phy_init_setting(hw);
> - /* Setup Mac interface option RGMII */
> - pch_gbe_phy_set_rgmii(hw);
> }
>
> /**
Up to here, everything looks O.K.
> @@ -2577,6 +2579,8 @@ static int pch_gbe_probe(struct pci_dev *pdev,
> if (ret)
> goto err_free_netdev;
>
> + pch_gbe_check_options(adapter);
> +
> /* Initialize PHY */
> ret = pch_gbe_init_phy(adapter);
> if (ret) {
> @@ -2606,8 +2610,6 @@ static int pch_gbe_probe(struct pci_dev *pdev,
>
> INIT_WORK(&adapter->reset_task, pch_gbe_reset_task);
>
> - pch_gbe_check_options(adapter);
> -
> /* initialize the wol settings based on the eeprom settings */
> adapter->wake_up_evt = PCH_GBE_WL_INIT_SETTING;
> dev_info(&pdev->dev, "MAC address : %pM\n", netdev->dev_addr);
But these two changes seem unrelated. Should they be in a different
patch?
Andrew
^ permalink raw reply
* Re: [patch net-next 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Cong Wang @ 2018-06-27 17:04 UTC (permalink / raw)
To: sridhar.samudrala
Cc: Jiri Pirko, Jakub Kicinski, Linux Kernel Network Developers,
David Miller, Jamal Hadi Salim, Simon Horman, john.hurley,
David Ahern, mlxsw
In-Reply-To: <0272f671-802b-30b6-6ca2-2ffc1e205664@intel.com>
On Wed, Jun 27, 2018 at 9:46 AM Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
> On 6/27/2018 12:50 AM, Jiri Pirko wrote:
> > if you don't like "tc filter template add dev dummy0 ingress", how
> > about:
> > "tc template add dev dummy0 ingress ..."
> > "tc template add dev dummy0 ingress chain 22 ..."
> > that makes more sense I think.
Better than 'tc filter template', but this doesn't reflect 'template'
is a template of tc filter, it could be an action etc., since it is in the
same position with 'tc action/filter/qdisc'.
>
> Isn't it possible to avoid introducing another keyword 'template',
>
> Can't we just do
> tc chain add dev dummy0 ingress flower chain_index 0
> to create a chain that takes any types of flower rules with index 0
> and
> tc chain add dev dummy0 ingress flower chain_index 22
> dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
> tc chain add dev dummy0 ingress flower chain_index 23
> dst_ip 192.168.0.0/16
> to create 2 chains 22 and 23 that allow rules with specific fields.
Sounds good too. Since filter chain can be shared by qdiscs,
a 'tc chain' sub-command makes sense, and would probably make
it easier to be shared.
^ permalink raw reply
* Re: [bpf-next PATCH 1/2] samples/bpf: extend xdp_rxq_info to read packet payload
From: Song Liu @ 2018-06-27 17:03 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Networking, Daniel Borkmann, Toke Høiland-Jørgensen,
Alexei Starovoitov
In-Reply-To: <20180627132312.0640448b@redhat.com>
On Wed, Jun 27, 2018 at 4:23 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Tue, 26 Jun 2018 16:53:15 -0700
> Song Liu <liu.song.a23@gmail.com> wrote:
>
>> > +static char* options2str(enum cfg_options_flags flag)
>> > +{
>> > + if (flag == NO_TOUCH)
>> > + return "no_touch";
>> > + if (flag & READ_MEM)
>> > + return "read";
>> > + fprintf(stderr, "ERR: Unknown config option flags");
>> > + exit(EXIT_FAIL);
>> > +}
>> > +
>>
>> enum cfg_options_flags is used as a bitmap in other parts of the sample.
>> So this function is a little weird (with more flags added).
>
> Sure, and do I handle this correctly in the next patch.
>
> I'm uncertain what you want me to change?
> Do you want me to drop the enum, and use #define instead?
I think it is good as-is for sample code.
Acked-by: Song Liu <songliubraving@fb.com>
^ permalink raw reply
* Re: [bpf-next PATCH 2/2] samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrs
From: Song Liu @ 2018-06-27 17:02 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Networking, Daniel Borkmann, Toke Høiland-Jørgensen,
Alexei Starovoitov
In-Reply-To: <20180627132026.79190a42@redhat.com>
On Wed, Jun 27, 2018 at 4:20 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Tue, 26 Jun 2018 17:09:01 -0700
> Song Liu <liu.song.a23@gmail.com> wrote:
>
>> On Mon, Jun 25, 2018 at 7:27 AM, Jesper Dangaard Brouer
>> <brouer@redhat.com> wrote:
>> > XDP_TX requires also changing the MAC-addrs, else some hardware
>> > may drop the TX packet before reaching the wire. This was
>> > observed with driver mlx5.
>> >
>> > If xdp_rxq_info select --action XDP_TX the swapmac functionality
>> > is activated. It is also possible to manually enable via cmdline
>> > option --swapmac. This is practical if wanting to measure the
>> > overhead of writing/updating payload for other action types.
>> >
>> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> > Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>> > ---
> [...]
>> >
>> > diff --git a/samples/bpf/xdp_rxq_info_kern.c b/samples/bpf/xdp_rxq_info_kern.c
>> > index 61af6210df2f..222a83eed1cb 100644
>> > --- a/samples/bpf/xdp_rxq_info_kern.c
>> > +++ b/samples/bpf/xdp_rxq_info_kern.c
>> > @@ -21,6 +21,7 @@ struct config {
>> > enum cfg_options_flags {
>> > NO_TOUCH = 0x0U,
>> > READ_MEM = 0x1U,
>> > + SWAP_MAC = 0x2U,
>> > };
> [...]
>> > @@ -98,7 +116,7 @@ int xdp_prognum0(struct xdp_md *ctx)
>> > rxq_rec->issue++;
>> >
>> > /* Default: Don't touch packet data, only count packets */
>> > - if (unlikely(config->options & READ_MEM)) {
>> > + if (unlikely(config->options & (READ_MEM|SWAP_MAC))) {
>> > struct ethhdr *eth = data;
>> >
>> > if (eth + 1 > data_end)
> [...]
>
>> > diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
>> > index 435485d4f49e..248a7eab9531 100644
> [...]
>> > @@ -119,6 +121,8 @@ static char* options2str(enum cfg_options_flags flag)
>> > {
>> > if (flag == NO_TOUCH)
>> > return "no_touch";
>> > + if (flag & SWAP_MAC)
>> > + return "swapmac";
>> > if (flag & READ_MEM)
>> > return "read";
>>
>> I guess SWAP_MAC also reads the memory, so it "includes" READ_MEM?
>
> True (see _kern side)
>
>> It is OK for now. We may need to refactor this part when adding other
>> flags in the future.
>
> Sure, do remember that this is only a 'sample' program.
Agreed.
Acked-by: Song Liu <songliubraving@fb.com>
^ permalink raw reply
* Re: [PATCH net-next v2] tcp: force cwnd at least 2 in tcp_cwnd_reduction
From: Yuchung Cheng @ 2018-06-27 16:57 UTC (permalink / raw)
To: Neal Cardwell
Cc: Lawrence Brakmo, Matt Mathis, Netdev, Kernel Team, Blake Matheny,
Alexei Starovoitov, Eric Dumazet, Wei Wang
In-Reply-To: <CADVnQymQc0Z5q0p=Cs3nJ1D+QrO7Pc78n287XYLDC_y5vBEN+Q@mail.gmail.com>
On Wed, Jun 27, 2018 at 8:24 AM, Neal Cardwell <ncardwell@google.com> wrote:
> On Tue, Jun 26, 2018 at 10:34 PM Lawrence Brakmo <brakmo@fb.com> wrote:
>> The only issue is if it is safe to always use 2 or if it is better to
>> use min(2, snd_ssthresh) (which could still trigger the problem).
>
> Always using 2 SGTM. I don't think we need min(2, snd_ssthresh), as
> that should be the same as just 2, since:
>
> (a) RFCs mandate ssthresh should not be below 2, e.g.
> https://tools.ietf.org/html/rfc5681 page 7:
>
> ssthresh = max (FlightSize / 2, 2*SMSS) (4)
>
> (b) The main loss-based CCs used in Linux (CUBIC, Reno, DCTCP) respect
> that constraint, and always have an ssthresh of at least 2.
>
> And if some CC misbehaves and uses a lower ssthresh, then taking
> min(2, snd_ssthresh) will trigger problems, as you note.
>
>> + tp->snd_cwnd = max((int)tcp_packets_in_flight(tp) + sndcnt, 2);
>
> AFAICT this does seem like it will make the sender behavior more
> aggressive in cases with high loss and/or a very low per-flow
> fair-share.
>
> Old:
>
> o send N packets
> o receive SACKs for last 3 packets
> o fast retransmit packet 1
> o using ACKs, slow-start upward
>
> New:
>
> o send N packets
> o receive SACKs for last 3 packets
> o fast retransmit packets 1 and 2
> o using ACKs, slow-start upward
>
> In the extreme case, if the available fair share is less than 2
> packets, whereas inflight would have oscillated between 1 packet and 2
> packets with the existing code, it now seems like with this commit the
> inflight will now hover at 2. It seems like this would have
> significantly higher losses than we had with the existing code.
I share similar concern. Note that this function is used by most
existing congestion control modules beside DCTCP so I am more cautious
of changing this to address DCTCP issue.
One problem that DCTCP paper notices when cwnd = 1 is still too big
when the bottleneck
is shared by many flows (e.g. incast). It specifically suggest
changing the lower-bound of 2 in the spec to 1. (Section 8.2).
https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-judd.pdf
I am curious about the differences you observe in 4.11 and 4.16. I
wasn't aware of any (significant) change in tcp_cwnd_reduction / PRR
algorithm between 4.11 and 4.16. Also the receiver should not delay
ACKs if it has out-of-order packet or receiving CE data packets. This
means the delayed ACK is by tail losses and the last data received
carries no CE mark: seems a less common scenario?
If delayed-ACK is the problem, we probably should fix the receiver to
delay ACK more intelligently, not the sender. weiwan@google.com is
working on it.
>
> This may or may not be OK in practice, but IMHO it is worth mentioning
> and discussing.
>
> neal
^ permalink raw reply
* Re: [patch net-next 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Samudrala, Sridhar @ 2018-06-27 16:46 UTC (permalink / raw)
To: Jiri Pirko, Jakub Kicinski
Cc: Linux Netdev List, David Miller, Jamal Hadi Salim, Cong Wang,
Simon Horman, John Hurley, David Ahern, mlxsw
In-Reply-To: <20180627075017.GA2007@nanopsycho>
On 6/27/2018 12:50 AM, Jiri Pirko wrote:
> Tue, Jun 26, 2018 at 11:18:58PM CEST, jakub.kicinski@netronome.com wrote:
>> On Tue, 26 Jun 2018 09:12:17 +0200, Jiri Pirko wrote:
>>> Tue, Jun 26, 2018 at 09:00:45AM CEST, jakub.kicinski@netronome.com wrote:
>>>> On Mon, Jun 25, 2018 at 11:43 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Tue, Jun 26, 2018 at 06:58:50AM CEST, jakub.kicinski@netronome.com wrote:
>>>>>> On Mon, 25 Jun 2018 23:01:39 +0200, Jiri Pirko wrote:
>>>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>>>
>>>>>>> For the TC clsact offload these days, some of HW drivers need
>>>>>>> to hold a magic ball. The reason is, with the first inserted rule inside
>>>>>>> HW they need to guess what fields will be used for the matching. If
>>>>>>> later on this guess proves to be wrong and user adds a filter with a
>>>>>>> different field to match, there's a problem. Mlxsw resolves it now with
>>>>>>> couple of patterns. Those try to cover as many match fields as possible.
>>>>>>> This aproach is far from optimal, both performance-wise and scale-wise.
>>>>>>> Also, there is a combination of filters that in certain order won't
>>>>>>> succeed.
>>>>>>>
>>>>>>> Most of the time, when user inserts filters in chain, he knows right away
>>>>>>> how the filters are going to look like - what type and option will they
>>>>>>> have. For example, he knows that he will only insert filters of type
>>>>>>> flower matching destination IP address. He can specify a template that
>>>>>>> would cover all the filters in the chain.
>>>>>> Perhaps it's lack of sleep, but this paragraph threw me a little off
>>>>>> the track. IIUC the goal of this set is to provide a way to inform the
>>>>>> HW about expected matches before any rule is programmed into the HW.
>>>>>> Not before any rule is added to a particular chain. One can just use
>>>>>> the first rule in the chain to make a guess about the chain, but thanks
>>>>>> to this set user can configure *all* chains before any rules are added.
>>>>> The template is per-chain. User can use template for chain x and
>>>>> not-use it for chain y. Up to him.
>>>> Makes sense.
>>>>
>>>> I can't help but wonder if it'd be better to associate the
>>>> constraints/rules with chains instead of creating a new "template"
>>>> object. It seems more natural to create a chain with specific
>>>> constraints in place than add and delete template of which there can
>>>> be at most one to a chain... Perhaps that's more about the user space
>>>> tc command line. Anyway, not a strong objection, just a thought.
>>> Hmm. I don't think it is good idea. User should see the template in a
>>> "show" command per chain. We would have to have 2 show commands, one to
>>> list the template objects and one to list templates per chains. It makes
>>> things more complicated for no good reason. I think that this simple
>>> chain-lock is easier and serves the purpose.
>> Hm, I think the dump is fine, what I was thinking about was:
>>
>> # tc chain add dev dummy0 ingress chain_index 22 \
>> ^^^^^
>> template proto ip \
>> ^^^^^^^^
>> flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
> Okay, I got it. I see 2 issues.
> 1) user might expect to add a chain without the template. But that does
> not make sense really. Chains are created/deleted implicitly
> according to refcount.
> 2) there is not chain object like this available to user. Adding it just
> for template looks odd. Also, the "filter" and "template" are very
> much alike. They both are added to a chain, they both implicitly
> create chain if it does not exist, etc.
>
> if you don't like "tc filter template add dev dummy0 ingress", how
> about:
> "tc template add dev dummy0 ingress ..."
> "tc template add dev dummy0 ingress chain 22 ..."
> that makes more sense I think.
Isn't it possible to avoid introducing another keyword 'template',
Can't we just do
tc chain add dev dummy0 ingress flower chain_index 0
to create a chain that takes any types of flower rules with index 0
and
tc chain add dev dummy0 ingress flower chain_index 22
dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
tc chain add dev dummy0 ingress flower chain_index 23
dst_ip 192.168.0.0/16
to create 2 chains 22 and 23 that allow rules with specific fields.
To create a chain that takes decap rules like
filter protocol ip pref 2 flower chain 25 handle 0x2
dst_mac fe:24:9a:23:4c:5c
src_mac ce:1d:58:eb:a0:35
eth_type ipv4
enc_dst_ip 192.168.100.20
enc_src_ip 192.168.100.22
enc_key_id 100
enc_dst_port 4789
can we create a chain using this format
tc chain add dev dummy0 ingress flower chain_index 25
dst_mac 00:00:00:00:00:00/FF:FF:FF:FF:FF:FF
src_mac 00:00:00:00:00:00/FF:FF:FF:FF:FF:FF
enc_dst_ip 192.168.100.0/24
enc_src_ip 192.168.100.0/24
eth_type ipv4/arp
enc_key_id 0/ffff
enc_dst_port 0/ffff
When adding a filter to a pre-defined chain, does the filter need
to specify all the fields that are included in the chain's template?
>
>
>> instead of:
>>
>> # tc filter template add dev dummy0 ingress \
>> ^^^^^^^^^^^^^^^
>> proto ip chain_index 22 \
>> flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
>>
>> And then delete becomes:
>>
>> # tc chain del dev dummy0 ingress chain_index 22
>> Error: The chain is not empty.
>>
>> The fact that template is very much like a filter is sort of an
>> implementation detail, from user perspective it may be more intuitive
>> to model template as an attribute of the chain, not a filter object
>> added to a chain.
>>
>> But I could well be the only person who feels that way :)
^ permalink raw reply
* RE: [RFC bpf-next 0/6] XDP RX device meta data acceleration (WIP)
From: Parikh, Neerav @ 2018-06-27 16:42 UTC (permalink / raw)
To: Saeed Mahameed, Jesper Dangaard Brouer, Alexei Starovoitov,
Daniel Borkmann
Cc: pjwaskiewicz@gmail.com, ttoukan.linux@gmail.com, Tariq Toukan,
Duyck, Alexander H, Waskiewicz Jr, Peter, Opher Reviv,
Rony Efraim, netdev@vger.kernel.org, Saeed Mahameed
In-Reply-To: <20180627024615.17856-1-saeedm@mellanox.com>
Thanks Saeed for starting this thread :)
My comments inline.
> -----Original Message-----
> From: Saeed Mahameed [mailto:saeedm@dev.mellanox.co.il]
> Sent: Tuesday, June 26, 2018 7:46 PM
> To: Jesper Dangaard Brouer <brouer@redhat.com>; Alexei Starovoitov
> <alexei.starovoitov@gmail.com>; Daniel Borkmann
> <borkmann@iogearbox.net>
> Cc: Parikh, Neerav <neerav.parikh@intel.com>; pjwaskiewicz@gmail.com;
> ttoukan.linux@gmail.com; Tariq Toukan <tariqt@mellanox.com>; Duyck,
> Alexander H <alexander.h.duyck@intel.com>; Waskiewicz Jr, Peter
> <peter.waskiewicz.jr@intel.com>; Opher Reviv <opher@mellanox.com>; Rony
> Efraim <ronye@mellanox.com>; netdev@vger.kernel.org; Saeed Mahameed
> <saeedm@mellanox.com>
> Subject: [RFC bpf-next 0/6] XDP RX device meta data acceleration (WIP)
>
> Hello,
>
> Although it is far from being close to completion, this series provides
> the means and infrastructure to enable device drivers to share packets
> meta data information and accelerations with XDP programs,
> such as hash, stripped vlan, checksum, flow mark, packet header types,
> etc ..
>
> The series is still work in progress and sending it now as RFC in order
> to get early feedback and to discuss the design, structures and UAPI.
>
> For now the general idea is to help XDP programs accelerate, and provide
> them with meta data that already available from the HW or device driver
> to save cpu cycles on packet headers and data processing and decision
> making, aside of that we want to avoid having a fixed size meta data
> structure that wastes a lot of buffer space for stuff that the xdp program
> might not need, like the current "elephant" SKB fields, kidding :) ..
>
> So my idea here is to provide a dynamic mechanism to allow XDP programs
> on xdp load to request only the specific meta data they need, and
> the device driver or netdevice will provide them in a predefined order
> in the xdp_buff/xdp_md data meta section.
>
> And here is how it is done and what i would like to discuss:
>
> 1. In the first patch: added the infrastructure to request predefined meta data
> flags on xdp load indicating that the XDP program is going to need them.
>
> 1.1) In this patch I am using the current u32 bit IFLA_XDP_FLAGS,
> TODO: this needs to be improved in order to allow more meta data flags,
> maybe use a new dedicated flags ?
>
> 1.2) Device driver that want to support xdp meta data should implement
> new XDP command XDP_QUERY_META_FLAGS and report the meta data flags
> they
> can support.
>
> 1.3) the kernel will cross check the requested flags with the device's
> supported flags and will fail the xdp load in case of discrepancy.
>
> Question : do we want this ? or is it better to return to the program
> the actual supported flags and let it decide ?
>
>
The work we are doing in this direction does not assume any specific flags but
instead the XDP program requests for certain "meta data" that it needs and
the driver (or HW) if can provide that then allow loading of that program.
If we put the flags and capabilities in kernel then it will depend on the control
program loading the program to pass on that information. If the XDP program
has that built-in via say ELF "section" (similar to maps) then the program
can be loaded independently and knows what kind of meta data it wants
and receives.
If the meta data is not supported by the device (driver or software mode)
then that would perhaps fail the program load.
> 2. This is the interesting part: in the 2nd patch Added the meta data
> set/get infrastructure to allow device drivers populate them into xdp_buff
> data meta in a well defined and structured format and let the xdp program
> read them according to the predefined format/structure, the idea here is
> that the xdp program and the device driver will share a static information
> offsets array that will define where each meta data will sit inside
> xdp_buff data meta, such structure will be decided once on xdp load.
>
> Enters struct xdp_md_info and xdp_md_info_arr:
>
> struct xdp_md_info {
> __u16 present:1;
> __u16 offset:15; /* offset from data_meta in xdp_md buffer */
> };
We were trying to not define a generic structure if possible and were working
towards a model similar to current usage where the XDP program produces
a meta data that is consumed by the eBPF program on the TC classifier without
hard coding any structure. It's purely a producer-consumer model with the
kernel helping transfer that data from one end to another instead of providing
a structure.
So, the NIC produces the meta data requested by the XDP program v/s producing
some meta data then that gets translated in drivers into a generic structure.
>
> /* XDP meta data offsets info array
> * present bit describes if a meta data is or will be present in xdp_md buff
> * offset describes where a meta data is or should be placed in xdp_md buff
> *
> * Kernel builds this array using xdp_md_info_build helper on demand.
> * User space builds it statically in the xdp program.
> */
> typedef struct xdp_md_info xdp_md_info_arr[XDP_DATA_META_MAX];
>
> Offsets in xdp_md_info_arr are always in ascending order and only for
> requested meta data:
> example : for XDP prgram that requested the following flags:
> flags = XDP_FLAGS_META_HASH | XDP_FLAGS_META_VLAN;
>
> the offsets array will be:
>
> xdp_md_info_arr mdi = {
> [XDP_DATA_META_HASH] = {.offset = 0, .present = 1},
> [XDP_DATA_META_MARK] = {.offset = 0, .present = 0},
> [XDP_DATA_META_VLAN] = {.offset = sizeof(struct xdp_md_hash), .present
> = 1},
> [XDP_DATA_META_CSUM] = {.offset = 0, .present = 0},
> }
>
> For this example: hash fields will always appear first then the vlan for every
> xdp_md.
>
> Once requested to provide xdp meta data, device driver will use a pre-built
> xdp_md_info_arr which was built via xdp_md_info_build on xdp setup,
> xdp_md_info_arr will tell the driver what is the offset of each meta data.
> The user space XDP program will use a similar xdp_md_info_arr to
> statically know what is the offset of each meta data.
>
> *For future meta data they will be added to the end of the array with
> higher flags value.
>
> This patch also provides helper functions for the device drivers to store
> meta data into xdp_buff, and helper function for XDP programs to fetch
> specific xdp meta data from xdp_md buffer.
>
> Question: currently the XDP program is responsible to build the static
> meta data offsets array "xdp_md_info_arr" and the kernel will build it
> for the netdevice, if we are going to choose this direction, should we
> somehow share the same xdp_md_info_arr built by the kernel with the xdp
> program ?
>
> 3. The last mlx5e patch is the actual show case of how the device driver
> will add the support for xdp meta data, it is pretty simple and straight
> forward ! The first two mlx5e patches are just small refactoring to make
> the xdp_md_info_arr and packet completion information available in the rx
> xdp handlers data path.
>
> 4. Just added a small example to demonstrate how XDP program can request
> meta data on xdp load and how it can read them on the critical path.
> of course more examples are needed and some performance numbers.
> Exciting use cases such as:
> - using flow mark to allow fast white/black listing lookups.
> - flow mark to accelerate DDos prevention.
> - Generic Data path: Use the meta data from the xdp_buff to build SKBs
> !(Jesper's Idea)
> - using hash type to know the packet headers and encapsulation without
> touching the packet data at all.
> - using packet hash to do RPS and XPS like cpu redirecting.
> - and many more.
>
>
> More ideas:
>
> From Jesper:
> =========================
> Give XDP more rich information about the hash,
> By reducing the 'ht' value to the PKT_HASH_TYPE's we are loosing information.
>
> I happen to know your hardware well; CQE_RSS_HTYPE_IP tell us if this
> is IPv4 or IPv6. And CQE_RSS_HTYPE_L4 tell us if this is TCP, UDP or
> IPSEC. (And you have another bit telling of this is IPv6 with extension
> headers).
>
Yes the commodity NICs have lot of rich information about packet parsing and
protocol information. It would be worth extending this for the XDP programs so
that they can take advantage of as much work done by the NIC already v/s
redoing all that in the XDP program.
So, basically that will allow XDP programs to put more focus on the
"business logic" of whatever it determines to do with the packet.
> If we don't want to invent our own xdp_hash_types, we can simply adopt
> the RSS Hashing Types defined by Microsoft:
> https://docs.microsoft.com/en-us/windows-hardware/drivers/network/rss-
> hashing-types
>
> An interesting part of the RSS standard, is that the hash type can help
> identify if this is a fragment. (XDP could use this info to avoid
> touching payload and react, e.g. drop fragments, or redirect all
> fragments to another CPU, or skip parsing in XDP and defer to network
> stack via XDP_PASS).
>
Yes as well it would be interesting if there is use if XDP program would like to
monitor any packet errors that were captured by the NIC to monitor.
While traditionally the driver may decide to drop/consume such packets but
it would be a good use-case for debugging.
> By using the RSS standard, we do e.g. loose the ability to say this is
> IPSEC traffic, even-though your HW supports this.
>
> I have tried to implemented my own (non-dynamic) XDP RX-types UAPI here:
> https://marc.info/?l=linux-netdev&m=149512213531769
> https://marc.info/?l=linux-netdev&m=149512213631774
> =========================
>
> Thanks,
> Saeed.
>
> Saeed Mahameed (6):
> net: xdp: Add support for meta data flags requests
> net: xdp: RX meta data infrastructure
> net/mlx5e: Store xdp flags and meta data info
> net/mlx5e: Pass CQE to RX handlers
> net/mlx5e: Add XDP RX meta data support
> samples/bpf: Add meta data hash example to xdp_redirect_cpu
>
> drivers/net/ethernet/mellanox/mlx5/core/en.h | 19 ++-
> .../net/ethernet/mellanox/mlx5/core/en_main.c | 58 +++++----
> .../net/ethernet/mellanox/mlx5/core/en_rx.c | 47 ++++++--
> include/linux/netdevice.h | 10 ++
> include/net/xdp.h | 6 +
> include/uapi/linux/bpf.h | 114 ++++++++++++++++++
> include/uapi/linux/if_link.h | 20 ++-
> net/core/dev.c | 41 +++++++
> samples/bpf/xdp_redirect_cpu_kern.c | 67 ++++++++++
> samples/bpf/xdp_redirect_cpu_user.c | 7 ++
> 10 files changed, 354 insertions(+), 35 deletions(-)
>
> --
> 2.17.0
^ permalink raw reply
* Re: [RFC PATCH v2 net-next 06/12] net: core: propagate SKB lists through packet_type lookup
From: Edward Cree @ 2018-06-27 16:34 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: linux-net-drivers, Network Development, David Miller
In-Reply-To: <CAF=yD-KSJRO9SFE+bNNAWXihTi6VvD3pvUC36OJzSv9vNXS6vg@mail.gmail.com>
On 27/06/18 17:00, Willem de Bruijn wrote:
> On Wed, Jun 27, 2018 at 10:49 AM Edward Cree <ecree@solarflare.com> wrote:
>> On 27/06/18 15:36, Willem de Bruijn wrote:
>>> Also, this function does more than just process network taps.
>> This is true, but naming things is hard, and I couldn't think of either a
>> better new name for this function or a name that could fit in between
>> __netif_receive_skb() and __netif_receive_skb_core() for the new function
>> in my patch named __netif_receive_skb_core(). Any suggestions?
> ____netif_receive_skb_core? Not that four underscores is particularly
> readable. Perhaps __netif_receive_skb_core_inner. It's indeed tricky (and
> not the most important, I didn't mean to bikeshed).
I've gone with __netif_receive_skb_one_core() (by contrast to ..._list_core())
for the outer function. And I don't mind when people shed bikes :)
> Come to think of it, from your fast path assumptions, we could perhaps wrap
> ptype_all and rx_handler logic in a static_branch similar to tc and netfilter
> (and sk_memalloc_socks). Remaining branches like skip_classify, pfmemalloc
> and deliver_exact can also not be reached if all these are off, so this entire
> section can be skipped. Then it could become __netif_receive_skb_slow,
> taken only on the static branch or for vlan packets. I do not suggest it as
> part of this patchset. it would be a pretty complex change on its own.
That is an interesting idea, but agreed that it'd be quite complex.
^ permalink raw reply
* Re: [net] bpfilter: include bpfilter_umh in assembly instead of using objcopy
From: Guenter Roeck @ 2018-06-27 16:18 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: David S . Miller, daniel, torvalds, mcroce, yamada.masahiro,
netdev, kernel-team
In-Reply-To: <20180627031348.285964-1-ast@kernel.org>
On Tue, Jun 26, 2018 at 08:13:48PM -0700, Alexei Starovoitov wrote:
> From: Masahiro Yamada <yamada.masahiro@socionext.com>
>
> What we want here is to embed a user-space program into the kernel.
> Instead of the complex ELF magic, let's simply wrap it in the assembly
> with the '.incbin' directive.
>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
> I think this patch should 'fix' bpfilter build issue on all archs.
> cflags for cross CC may still be incorrect and embedded blob
> may fail to execute via fork_usermode_blob()
> (like in case of 'make ARCH=i386 net/bpfilter/' CC will build and link 64-bit
> binary that will be included into bpfilter.o or vmlinux and that binary
> will fail to run on 32-bit kernel),
> but that is separate issue that will be addressed in net-next time frame.
> Long term we've discussed to switch to something like klibc and keep it
> as part of the kernel to avoid relying on glibc and cc-can-link.sh.
>
At the very least it fixes the build issue when building i386 images
on x86_64 systems.
(Build-)Tested-by: Guenter Roeck <linux@roeck-us.net>
> net/bpfilter/Makefile | 17 ++---------------
> net/bpfilter/bpfilter_kern.c | 11 +++++------
> net/bpfilter/bpfilter_umh_blob.S | 7 +++++++
> 3 files changed, 14 insertions(+), 21 deletions(-)
> create mode 100644 net/bpfilter/bpfilter_umh_blob.S
>
> diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
> index 051dc18b8ccb..39c6980b5d99 100644
> --- a/net/bpfilter/Makefile
> +++ b/net/bpfilter/Makefile
> @@ -15,20 +15,7 @@ ifeq ($(CONFIG_BPFILTER_UMH), y)
> HOSTLDFLAGS += -static
> endif
>
> -# a bit of elf magic to convert bpfilter_umh binary into a binary blob
> -# inside bpfilter_umh.o elf file referenced by
> -# _binary_net_bpfilter_bpfilter_umh_start symbol
> -# which bpfilter_kern.c passes further into umh blob loader at run-time
> -quiet_cmd_copy_umh = GEN $@
> - cmd_copy_umh = echo ':' > $(obj)/.bpfilter_umh.o.cmd; \
> - $(OBJCOPY) -I binary \
> - `LC_ALL=C $(OBJDUMP) -f net/bpfilter/bpfilter_umh \
> - |awk -F' |,' '/file format/{print "-O",$$NF} \
> - /^architecture:/{print "-B",$$2}'` \
> - --rename-section .data=.init.rodata $< $@
> -
> -$(obj)/bpfilter_umh.o: $(obj)/bpfilter_umh
> - $(call cmd,copy_umh)
> +$(obj)/bpfilter_umh_blob.o: $(obj)/bpfilter_umh
>
> obj-$(CONFIG_BPFILTER_UMH) += bpfilter.o
> -bpfilter-objs += bpfilter_kern.o bpfilter_umh.o
> +bpfilter-objs += bpfilter_kern.o bpfilter_umh_blob.o
> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
> index 09522573f611..f0fc182d3db7 100644
> --- a/net/bpfilter/bpfilter_kern.c
> +++ b/net/bpfilter/bpfilter_kern.c
> @@ -10,11 +10,8 @@
> #include <linux/file.h>
> #include "msgfmt.h"
>
> -#define UMH_start _binary_net_bpfilter_bpfilter_umh_start
> -#define UMH_end _binary_net_bpfilter_bpfilter_umh_end
> -
> -extern char UMH_start;
> -extern char UMH_end;
> +extern char bpfilter_umh_start;
> +extern char bpfilter_umh_end;
>
> static struct umh_info info;
> /* since ip_getsockopt() can run in parallel, serialize access to umh */
> @@ -93,7 +90,9 @@ static int __init load_umh(void)
> int err;
>
> /* fork usermode process */
> - err = fork_usermode_blob(&UMH_start, &UMH_end - &UMH_start, &info);
> + err = fork_usermode_blob(&bpfilter_umh_start,
> + &bpfilter_umh_end - &bpfilter_umh_start,
> + &info);
> if (err)
> return err;
> pr_info("Loaded bpfilter_umh pid %d\n", info.pid);
> diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
> new file mode 100644
> index 000000000000..40311d10d2f2
> --- /dev/null
> +++ b/net/bpfilter/bpfilter_umh_blob.S
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> + .section .init.rodata, "a"
> + .global bpfilter_umh_start
> +bpfilter_umh_start:
> + .incbin "net/bpfilter/bpfilter_umh"
> + .global bpfilter_umh_end
> +bpfilter_umh_end:
^ permalink raw reply
* Re: [Patch net-next v3 1/3] net: introduce helper dev_change_tx_queue_len()
From: Eric Dumazet @ 2018-06-27 16:14 UTC (permalink / raw)
To: Cong Wang, netdev; +Cc: john.fastabend
In-Reply-To: <20180126022624.20442-2-xiyou.wangcong@gmail.com>
On 01/25/2018 06:26 PM, Cong Wang wrote:
> This patch promotes the local change_tx_queue_len() to a core
> helper function, dev_change_tx_queue_len(), so that rtnetlink
> and net-sysfs could share the code. This also prepares for the
> following patch.
>
> Note, the -EFAULT in the original code doesn't make sense,
> we should propagate the errno from notifiers.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> include/linux/netdevice.h | 1 +
> net/core/dev.c | 28 ++++++++++++++++++++++++++++
> net/core/net-sysfs.c | 25 +------------------------
> net/core/rtnetlink.c | 18 +++++-------------
> 4 files changed, 35 insertions(+), 37 deletions(-)
>
Hi Cong
What about using dev_change_tx_queue_len() helper from SIOCSIFTXQLEN path in
net/core/dev_ioctl.c ?
This would make sure we call dev_qdisc_change_tx_queue_len() in this case.
Thanks !
^ permalink raw reply
* Re: [2/4] libertas_tf: use irqsave() in USB's complete callback
From: Kalle Valo @ 2018-06-27 16:13 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, linux-usb, tglx, David S. Miller, linux-wireless,
Sebastian Andrzej Siewior
In-Reply-To: <20180620193648.25136-3-bigeasy@linutronix.de>
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> The USB completion callback does not disable interrupts while acquiring
> the lock. We want to remove the local_irq_disable() invocation from
> __usb_hcd_giveback_urb() and therefore it is required for the callback
> handler to disable the interrupts while acquiring the lock.
> The callback may be invoked either in IRQ or BH context depending on the
> USB host controller.
> Use the _irqsave() variant of the locking primitives.
>
> I am removing the
> BUG_ON(!in_interrupt());
>
> check because it serves no purpose. Running the completion callback in
> BH context makes in_interrupt() still return true but the interrupts
> could be enabled. The important part is that ->driver_lock is acquired
> with disabled interrupts which is the case now.
>
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-wireless@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
3 patches applied to wireless-drivers-next.git, thanks.
fc75122fabb5 libertas_tf: use irqsave() in USB's complete callback
a3128feef6d5 libertas: use irqsave() in USB's complete callback
81454b8405f2 zd1211rw: use irqsave() in USB's complete callback
--
https://patchwork.kernel.org/patch/10478609/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [2/4] zd1211rw: stop using deprecated get_seconds()
From: Kalle Valo @ 2018-06-27 16:10 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Arnd Bergmann, y2038, netdev, linux-wireless, linux-kernel,
Ulrich Kunitz, Daniel Drake, David S. Miller
In-Reply-To: <20180618151142.1214422-2-arnd@arndb.de>
Arnd Bergmann <arnd@arndb.de> wrote:
> The get_seconds() function is deprecated because of the y2038 overflow.
> In zd1211rw we don't even care about the absolute value, so this is
> not a problem, but it's equally trivial to change to the non-deprecated
> ktime_get_seconds().
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2 patches applied to wireless-drivers-next.git, thanks.
71e140b57151 zd1211rw: stop using deprecated get_seconds()
3cade2f3d98a ipw2x00: track time using boottime
--
https://patchwork.kernel.org/patch/10471961/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038
^ permalink raw reply
* Re: [2/4] zd1211rw: stop using deprecated get_seconds()
From: Kalle Valo @ 2018-06-27 16:10 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Daniel Drake, Ulrich Kunitz, David S. Miller, y2038,
Arnd Bergmann, linux-wireless, netdev, linux-kernel
In-Reply-To: <20180618151142.1214422-2-arnd@arndb.de>
Arnd Bergmann <arnd@arndb.de> wrote:
> The get_seconds() function is deprecated because of the y2038 overflow.
> In zd1211rw we don't even care about the absolute value, so this is
> not a problem, but it's equally trivial to change to the non-deprecated
> ktime_get_seconds().
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2 patches applied to wireless-drivers-next.git, thanks.
71e140b57151 zd1211rw: stop using deprecated get_seconds()
3cade2f3d98a ipw2x00: track time using boottime
--
https://patchwork.kernel.org/patch/10471961/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [PATCH net] tcp: add one more quick ack after after ECN events
From: Neal Cardwell @ 2018-06-27 16:07 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Netdev, Eric Dumazet, Lawrence Brakmo
In-Reply-To: <20180627154721.195722-1-edumazet@google.com>
On Wed, Jun 27, 2018 at 11:47 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Larry Brakmo proposal ( https://patchwork.ozlabs.org/patch/935233/
> tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
> about our recent patch removing ~16 quick acks after ECN events.
>
> tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
> but in the case the sender cwnd was lowered to 1, we do not want
> to have a delayed ack for the next packet we will receive.
>
> Fixes: 522040ea5fdd ("tcp: do not aggressively quick ack after ECN events")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Neal Cardwell <ncardwell@google.com>
> Cc: Lawrence Brakmo <brakmo@fb.com>
> ---
> net/ipv4/tcp_input.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Acked-by: Neal Cardwell <ncardwell@google.com>
Thanks, Eric!
neal
^ permalink raw reply
* Re: atmel: use memdup_user to simplify the code
From: Kalle Valo @ 2018-06-27 16:06 UTC (permalink / raw)
To: YueHaibing; +Cc: davem, netdev, linux-kernel, linux-wireless, YueHaibing
In-Reply-To: <20180604103210.23984-1-yuehaibing@huawei.com>
YueHaibing <yuehaibing@huawei.com> wrote:
> use existing memdup_user() helper function instead of open-coding
>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Patch applied to wireless-drivers-next.git, thanks.
8668f9a57c8c atmel: use memdup_user to simplify the code
--
https://patchwork.kernel.org/patch/10446299/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: net: ipw2x00: Replace NULL comparison with !priv
From: Kalle Valo @ 2018-06-27 16:05 UTC (permalink / raw)
To: Varsha Rao
Cc: Stanislav Yakovlev, David S. Miller, linux-wireless, netdev,
linux-kernel, Nicholas Mc Guire, Lukas Bulwahn, Varsha Rao
In-Reply-To: <20180603111136.4262-1-rvarsha016@gmail.com>
Varsha Rao <rvarsha016@gmail.com> wrote:
> Remove extra parentheses and replace NULL comparison with !priv, to fix
> clang warning of extraneous parentheses and check patch issue. Following
> coccinelle script is used to fix it.
>
> @disable is_null,paren@
> expression e;
> statement s;
> @@
> if (
> - (e==NULL)
> +!e
> )
> s
>
> Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Patch applied to wireless-drivers-next.git, thanks.
4e5f881d430a net: ipw2x00: Replace NULL comparison with !priv
--
https://patchwork.kernel.org/patch/10445241/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: [RFC PATCH v2 net-next 06/12] net: core: propagate SKB lists through packet_type lookup
From: Willem de Bruijn @ 2018-06-27 16:00 UTC (permalink / raw)
To: ecree; +Cc: linux-net-drivers, Network Development, David Miller
In-Reply-To: <4477db98-5f14-caa8-eb34-d0b1478877f4@solarflare.com>
On Wed, Jun 27, 2018 at 10:49 AM Edward Cree <ecree@solarflare.com> wrote:
>
> On 27/06/18 15:36, Willem de Bruijn wrote:
> > On Tue, Jun 26, 2018 at 8:19 PM Edward Cree <ecree@solarflare.com> wrote:
> >> __netif_receive_skb_taps() does a depressingly large amount of per-packet
> >> work that can't easily be listified, because the another_round looping
> >> makes it nontrivial to slice up into smaller functions.
> >> Fortunately, most of that work disappears in the fast path:
> >> * Hardware devices generally don't have an rx_handler
> >> * Unless you're tcpdumping or something, there is usually only one ptype
> >> * VLAN processing comes before the protocol ptype lookup, so doesn't force
> >> a pt_prev deliver
> >> so normally, __netif_receive_skb_taps() will run straight through and return
> >> the one ptype found in ptype_base[hash of skb->protocol].
> >>
> >> Signed-off-by: Edward Cree <ecree@solarflare.com>
> >> ---
> >> -static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc)
> >> +static int __netif_receive_skb_taps(struct sk_buff *skb, bool pfmemalloc,
> >> + struct packet_type **pt_prev)
> > A lot of code churn can be avoided by keeping local variable pt_prev and
> > calling this ppt_prev or so, then assigning just before returning on success.
> Good idea, I'll try that.
>
> > Also, this function does more than just process network taps.
> This is true, but naming things is hard, and I couldn't think of either a
> better new name for this function or a name that could fit in between
> __netif_receive_skb() and __netif_receive_skb_core() for the new function
> in my patch named __netif_receive_skb_core(). Any suggestions?
____netif_receive_skb_core? Not that four underscores is particularly
readable. Perhaps __netif_receive_skb_core_inner. It's indeed tricky (and
not the most important, I didn't mean to bikeshed).
Come to think of it, from your fast path assumptions, we could perhaps wrap
ptype_all and rx_handler logic in a static_branch similar to tc and netfilter
(and sk_memalloc_socks). Remaining branches like skip_classify, pfmemalloc
and deliver_exact can also not be reached if all these are off, so this entire
section can be skipped. Then it could become __netif_receive_skb_slow,
taken only on the static branch or for vlan packets. I do not suggest it as
part of this patchset. it would be a pretty complex change on its own.
^ permalink raw reply
* [PATCH net 1/1] net/smc: rebuild nonblocking connect
From: Ursula Braun @ 2018-06-27 15:59 UTC (permalink / raw)
To: davem
Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun,
xiyou.wangcong, eric.dumazet
The recent poll change may lead to stalls for non-blocking connecting
SMC sockets, since sock_poll_wait is no longer performed on the
internal CLC socket, but on the outer SMC socket. kernel_connect() on
the internal CLC socket returns with -EINPROGRESS, but the wake up
logic does not work in all cases. If the internal CLC socket is still
in state TCP_SYN_SENT when polled, sock_poll_wait() from sock_poll()
does not sleep. It is supposed to sleep till the state of the internal
CLC socket switches to TCP_ESTABLISHED.
This problem triggered a redesign of the SMC nonblocking connect logic.
This patch introduces a connect worker covering all connect steps
followed by a wake up of socket waiters. It allows to get rid of all
delays and locks in smc_poll().
Fixes: c0129a061442 ("smc: convert to ->poll_mask")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
net/smc/af_smc.c | 91 +++++++++++++++++++++++++++++++++++++++-----------------
net/smc/smc.h | 8 +++++
2 files changed, 71 insertions(+), 28 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index da7f02edcd37..4d1d4ddb2f49 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -45,6 +45,7 @@ static DEFINE_MUTEX(smc_create_lgr_pending); /* serialize link group
*/
static void smc_tcp_listen_work(struct work_struct *);
+static void smc_connect_work(struct work_struct *);
static void smc_set_keepalive(struct sock *sk, int val)
{
@@ -122,6 +123,12 @@ static int smc_release(struct socket *sock)
goto out;
smc = smc_sk(sk);
+
+ /* cleanup for a dangling non-blocking connect */
+ flush_work(&smc->connect_work);
+ kfree(smc->connect_info);
+ smc->connect_info = NULL;
+
if (sk->sk_state == SMC_LISTEN)
/* smc_close_non_accepted() is called and acquires
* sock lock for child sockets again
@@ -186,6 +193,7 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
sk->sk_protocol = protocol;
smc = smc_sk(sk);
INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
+ INIT_WORK(&smc->connect_work, smc_connect_work);
INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
INIT_LIST_HEAD(&smc->accept_q);
spin_lock_init(&smc->accept_q_lock);
@@ -576,6 +584,35 @@ static int __smc_connect(struct smc_sock *smc)
return 0;
}
+static void smc_connect_work(struct work_struct *work)
+{
+ struct smc_sock *smc = container_of(work, struct smc_sock,
+ connect_work);
+ int rc;
+
+ lock_sock(&smc->sk);
+ rc = kernel_connect(smc->clcsock, &smc->connect_info->addr,
+ smc->connect_info->alen, smc->connect_info->flags);
+ if (smc->clcsock->sk->sk_err) {
+ smc->sk.sk_err = smc->clcsock->sk->sk_err;
+ goto out;
+ }
+ if (rc < 0) {
+ smc->sk.sk_err = -rc;
+ goto out;
+ }
+
+ rc = __smc_connect(smc);
+ if (rc < 0)
+ smc->sk.sk_err = -rc;
+
+out:
+ smc->sk.sk_state_change(&smc->sk);
+ kfree(smc->connect_info);
+ smc->connect_info = NULL;
+ release_sock(&smc->sk);
+}
+
static int smc_connect(struct socket *sock, struct sockaddr *addr,
int alen, int flags)
{
@@ -605,15 +642,32 @@ static int smc_connect(struct socket *sock, struct sockaddr *addr,
smc_copy_sock_settings_to_clc(smc);
tcp_sk(smc->clcsock->sk)->syn_smc = 1;
- rc = kernel_connect(smc->clcsock, addr, alen, flags);
- if (rc)
- goto out;
+ if (flags & O_NONBLOCK) {
+ if (smc->connect_info) {
+ rc = -EALREADY;
+ goto out;
+ }
+ smc->connect_info = kzalloc(alen + 2 * sizeof(int), GFP_KERNEL);
+ if (!smc->connect_info) {
+ rc = -ENOMEM;
+ goto out;
+ }
+ smc->connect_info->alen = alen;
+ smc->connect_info->flags = flags ^ O_NONBLOCK;
+ memcpy(&smc->connect_info->addr, addr, alen);
+ schedule_work(&smc->connect_work);
+ rc = -EINPROGRESS;
+ } else {
+ rc = kernel_connect(smc->clcsock, addr, alen, flags);
+ if (rc)
+ goto out;
- rc = __smc_connect(smc);
- if (rc < 0)
- goto out;
- else
- rc = 0; /* success cases including fallback */
+ rc = __smc_connect(smc);
+ if (rc < 0)
+ goto out;
+ else
+ rc = 0; /* success cases including fallback */
+ }
out:
release_sock(sk);
@@ -1278,34 +1332,17 @@ static __poll_t smc_poll_mask(struct socket *sock, __poll_t events)
struct sock *sk = sock->sk;
__poll_t mask = 0;
struct smc_sock *smc;
- int rc;
if (!sk)
return EPOLLNVAL;
smc = smc_sk(sock->sk);
- sock_hold(sk);
- lock_sock(sk);
if ((sk->sk_state == SMC_INIT) || smc->use_fallback) {
/* delegate to CLC child sock */
- release_sock(sk);
mask = smc->clcsock->ops->poll_mask(smc->clcsock, events);
- lock_sock(sk);
sk->sk_err = smc->clcsock->sk->sk_err;
- if (sk->sk_err) {
+ if (sk->sk_err)
mask |= EPOLLERR;
- } else {
- /* if non-blocking connect finished ... */
- if (sk->sk_state == SMC_INIT &&
- mask & EPOLLOUT &&
- smc->clcsock->sk->sk_state != TCP_CLOSE) {
- rc = __smc_connect(smc);
- if (rc < 0)
- mask |= EPOLLERR;
- /* success cases including fallback */
- mask |= EPOLLOUT | EPOLLWRNORM;
- }
- }
} else {
if (sk->sk_err)
mask |= EPOLLERR;
@@ -1334,8 +1371,6 @@ static __poll_t smc_poll_mask(struct socket *sock, __poll_t events)
mask |= EPOLLPRI;
}
- release_sock(sk);
- sock_put(sk);
return mask;
}
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 51ae1f10d81a..d7ca26570482 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -187,11 +187,19 @@ struct smc_connection {
struct work_struct close_work; /* peer sent some closing */
};
+struct smc_connect_info {
+ int flags;
+ int alen;
+ struct sockaddr addr;
+};
+
struct smc_sock { /* smc sock container */
struct sock sk;
struct socket *clcsock; /* internal tcp socket */
struct smc_connection conn; /* smc connection */
struct smc_sock *listen_smc; /* listen parent */
+ struct smc_connect_info *connect_info; /* connect address & flags */
+ struct work_struct connect_work; /* handle non-blocking connect*/
struct work_struct tcp_listen_work;/* handle tcp socket accepts */
struct work_struct smc_listen_work;/* prepare new accept socket */
struct list_head accept_q; /* sockets to be accepted */
--
2.16.4
^ permalink raw reply related
* Re: [PATCH net-next v2] net: vhost: improve performance when enable busyloop
From: Michael S. Tsirkin @ 2018-06-27 15:58 UTC (permalink / raw)
To: Jason Wang; +Cc: xiangxia.m.yue, virtualization, netdev, Tonghao Zhang
In-Reply-To: <369bea44-6ebd-337a-b20b-a28a604fa2e9@redhat.com>
On Wed, Jun 27, 2018 at 10:24:43PM +0800, Jason Wang wrote:
>
>
> On 2018年06月26日 13:17, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch improves the guest receive performance from
> > host. On the handle_tx side, we poll the sock receive
> > queue at the same time. handle_rx do that in the same way.
> >
> > For avoiding deadlock, change the code to lock the vq one
> > by one and use the VHOST_NET_VQ_XX as a subclass for
> > mutex_lock_nested. With the patch, qemu can set differently
> > the busyloop_timeout for rx or tx queue.
> >
> > We set the poll-us=100us and use the iperf3 to test
> > its throughput. The iperf3 command is shown as below.
> >
> > on the guest:
> > iperf3 -s -D
> >
> > on the host:
> > iperf3 -c 192.168.1.100 -i 1 -P 10 -t 10 -M 1400
> >
> > * With the patch: 23.1 Gbits/sec
> > * Without the patch: 12.7 Gbits/sec
> >
> > Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
>
> Thanks a lot for the patch. Looks good generally, but please split this big
> patch into separate ones like:
>
> patch 1: lock vqs one by one
> patch 2: replace magic number of lock annotation
> patch 3: factor out generic busy polling logic to vhost_net_busy_poll()
> patch 4: add rx busy polling in tx path.
>
> And please cc Michael in v3.
>
> Thanks
Pls include host CPU utilization numbers. You can get them e.g. using
vmstat. I suspect we also want the polling controllable e.g. through
an ioctl.
--
MST
^ permalink raw reply
* Re: brcmsmac: make function wlc_phy_workarounds_nphy_rev1 static
From: Kalle Valo @ 2018-06-27 15:58 UTC (permalink / raw)
To: Colin Ian King
Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
Wright Feng, David S . Miller, linux-wireless,
brcm80211-dev-list.pdl, brcm80211-dev-list, netdev,
kernel-janitors, linux-kernel
In-Reply-To: <20180623221531.6396-2-colin.king@canonical.com>
Colin Ian King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> The function wlc_phy_workarounds_nphy_rev1 is local to the source and
> does not need to be in global scope, so make it static.
>
> Cleans up sparse warning:
> symbol 'wlc_phy_workarounds_nphy_rev1' was not declared. Should it
> be static?
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Patch applied to wireless-drivers-next.git, thanks.
ab8d904654e2 brcmsmac: make function wlc_phy_workarounds_nphy_rev1 static
--
https://patchwork.kernel.org/patch/10483927/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
* Re: brcmsmac: fix wrap around in conversion from constant to s16
From: Kalle Valo @ 2018-06-27 15:57 UTC (permalink / raw)
To: Stefan Agner
Cc: Stefan Agner, Tobias Regnery, Arend van Spriel, Franky Lin,
Hante Meuleman, Chi-Hsien Lin, Wright Feng, David S. Miller,
linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
netdev, linux-kernel
In-Reply-To: <20180617103407.27819-1-stefan@agner.ch>
Stefan Agner <stefan@agner.ch> wrote:
> The last value in the log_table wraps around to a negative value
> since s16 has a value range of -32768 to 32767. This is not what
> the table intends to represent. Use the closest positive value
> 32767.
>
> This fixes a warning seen with clang:
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_qmath.c:216:2: warning:
> implicit conversion from 'int' to 's16' (aka 'short') changes
> value from 32768
> to -32768 [-Wconstant-conversion]
> 32768
> ^~~~~
> 1 warning generated.
>
> Fixes: 4c0bfeaae9f9 ("brcmsmac: fix array out-of-bounds access in qm_log10")
> Cc: Tobias Regnery <tobias.regnery@gmail.com>
> Signed-off-by: Stefan Agner <stefan@agner.ch>
Patch applied to wireless-drivers-next.git, thanks.
c9a61469fc97 brcmsmac: fix wrap around in conversion from constant to s16
--
https://patchwork.kernel.org/patch/10468755/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox