Netdev List
* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Dmitry Vyukov @ 2018-06-26 14:48 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: David Miller, netdev, rds-devel, Santosh Shilimkar,
	syzkaller-bugs
In-Reply-To: <20180626144454.GF20575@oracle.com>

On Tue, Jun 26, 2018 at 4:44 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (06/26/18 23:29), David Miller wrote:
>>
>> I think there is a way to ask syzbot to test a patch in an
>> email.
>
> Dmitry/syzkaller-bugs, can you clarify?
>
> This is for the cluster of dup reports like
>  https://groups.google.com/forum/#!topic/syzkaller-bugs/zBph8Vu-q2U
> and (most recently)
>  https://www.spinics.net/lists/linux-rdma/msg66020.html
>
> as I understand it, if there is no reproducer, you cannot really
> have a pass/fail test to confirm the fix.

This bug has a reproducer as far as I see:

https://syzkaller.appspot.com/bug?id=f4ef381349e100280193c25f24e01d9d364132d9

It seems to be a subtle race since syzbot did not progress with
minimization too much:

https://syzkaller.appspot.com/text?tag=ReproSyz&x=16cbfeaf800000

it probably hit the race by a pure luck of the large program, but then
never had the same luck when tried to remove any syscalls.
So it can make sense to submit several test requests to get more testing.


* Re: [PATCH v2] fib_rules: match rules based on suppress_* properties too
From: Roopa Prabhu @ 2018-06-26 14:51 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: netdev
In-Reply-To: <20180625233932.11531-1-Jason@zx2c4.com>

On Mon, Jun 25, 2018 at 4:39 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Two rules with different values of suppress_prefix or suppress_ifgroup
> are not the same. This fixes an -EEXIST when running:
>
>    $ ip -4 rule add table main suppress_prefixlength 0
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
> ---
> This adds the new condition you mentioned. I'm not sure what you make of
> DaveM's remark about this not being in the original code, but here is
> nonetheless the requested change.

I just saw DaveM's comment and agree the new rule_find is different,
but that was intentional: it merged the finding of the rule in the
newlink and dellink paths. I did port each of the conditions from the
previous rule_exists to the new rule_find, but forgot to add the new
keys, which now became necessary. I replied with details on your other
bug report thread. Also pasting that response here:

So the previous rule_exists code did not check for attribute matches
correctly. It would ignore a rule at the first non-existent attribute
mismatch, and rule_find will always be called with a valid key. E.g.,
in your case, it would return at the pref mismatch... and never match
an existing rule.

$ ip -4 rule add table main suppress_prefixlength 0
$ ip -4 rule add table main suppress_prefixlength 0
$ ip -4 rule add table main suppress_prefixlength 0

$ ip rule show
0:      from all lookup local
32763:  from all lookup main suppress_prefixlength 0
32764:  from all lookup main suppress_prefixlength 0
32765:  from all lookup main suppress_prefixlength 0
32766:  from all lookup main
32767:  from all lookup default

With your patch, you should get the proper EEXIST check:
$ ip -4 rule add table main suppress_prefixlength 0
$ ip -4 rule add table main suppress_prefixlength 0

RTNETLINK answers: File exists
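The matching rule described above can be modelled in a few lines of C. This is a simplified sketch, not the kernel's struct fib_rule or rule_find(): the struct, the 0 / -1 "not specified" conventions, the check_suppress switch and the helper names are all assumptions made for illustration. It shows why, without the suppress_* checks, a key for "table main suppress_prefixlength 0" collides with the default "lookup main" rule, and why, with them, only a genuine duplicate is reported.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct fib_rule.
 * 0 means "not specified" for table/pref, -1 for the suppress_* keys. */
struct rule_key {
	unsigned int table;
	unsigned int pref;
	int suppress_prefixlen;
	int suppress_ifgroup;
};

/* Same shape as the rule_find() loop body: a field only disqualifies a
 * candidate when the lookup key actually sets it. check_suppress selects
 * pre-patch (false) or post-patch (true) behaviour. */
static bool key_matches(const struct rule_key *key, const struct rule_key *r,
			bool check_suppress)
{
	if (key->table && r->table != key->table)
		return false;
	if (key->pref && r->pref != key->pref)
		return false;
	if (check_suppress) {
		if (key->suppress_ifgroup != -1 &&
		    r->suppress_ifgroup != key->suppress_ifgroup)
			return false;
		if (key->suppress_prefixlen != -1 &&
		    r->suppress_prefixlen != key->suppress_prefixlen)
			return false;
	}
	return true;
}

/* Returns the first existing rule the key matches (the rule that would
 * make the add fail with -EEXIST), or NULL if there is none. */
static const struct rule_key *find_rule(const struct rule_key *rules, size_t n,
					const struct rule_key *key,
					bool check_suppress)
{
	for (size_t i = 0; i < n; i++)
		if (key_matches(key, &rules[i], check_suppress))
			return &rules[i];
	return NULL;
}
```

With the default main rule (table 254, no suppress keys) present, the pre-patch variant reports a false duplicate for the suppress_prefixlength key, while the post-patch variant only matches once an identical suppress rule exists.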

Dave, pls let me know if this is acceptable. If not
I can easily restore the previous rule_exists func. Will also submit a
patch to cover this in self-tests.

thanks.



>
>  net/core/fib_rules.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index 126ffc5bc630..bc8425d81022 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -416,6 +416,14 @@ static struct fib_rule *rule_find(struct fib_rules_ops *ops,
>                 if (rule->mark && r->mark != rule->mark)
>                         continue;
>
> +               if (rule->suppress_ifgroup != -1 &&
> +                   r->suppress_ifgroup != rule->suppress_ifgroup)
> +                       continue;
> +
> +               if (rule->suppress_prefixlen != -1 &&
> +                   r->suppress_prefixlen != rule->suppress_prefixlen)
> +                       continue;
> +
>                 if (rule->mark_mask && r->mark_mask != rule->mark_mask)
>                         continue;
>
> --


* Re: [PATCH net-next 1/1] tc-testing: initial version of tunnel_key unit tests
From: Davide Caratti @ 2018-06-26 14:51 UTC (permalink / raw)
  To: Keara Leibovitz, davem; +Cc: netdev, jhs, xiyou.wangcong, jiri, lucasb
In-Reply-To: <1530019039-20519-1-git-send-email-kleib@mojatatu.com>

On Tue, 2018-06-26 at 09:17 -0400, Keara Leibovitz wrote:
> Create unittests for the tc tunnel_key action.
> 
> 
> Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
> ---
>  .../tc-testing/tc-tests/actions/tunnel_key.json    | 676 +++++++++++++++++++++
>  1 file changed, 676 insertions(+)
>  create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> 
> diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> new file mode 100644
> index 000000000000..bfe522ac8177

hello Keara!

I think the 'teardown' stage in some of these tests should be reviewed.
Those that are meant to test invalid configurations (like dc6b) should
allow non-zero exit codes in the teardown stage when the wrong
configuration is caught by the userspace TC tool before it talks to the
kernel.

Otherwise, those tests will fail when they are invoked one by one with the
act_tunnel_key module unloaded.

> --- /dev/null
> +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> @@ -0,0 +1,676 @@
> 
...

> +    {
> +        "id": "dc6b",
> +        "name": "Add tunnel_key set action with missing mandatory src_ip parameter",
> +        "category": [
> +            "actions",
> +            "tunnel_key"
> +        ],
> +        "setup": [
> +            [
> +                "$TC actions flush action tunnel_key",
> +                0,
> +                1,
> +                255
> +            ]
> +        ],
> +        "cmdUnderTest": "$TC actions add action tunnel_key set dst_ip 20.20.20.2 id 100",
> +        "expExitCode": "255",
> +        "verifyCmd": "$TC actions list action tunnel_key",
> +        "matchPattern": "action order [0-9]+: tunnel_key set.*dst_ip 20.20.20.2.*key_id 100",
> +        "matchCount": "0",
> +        "teardown": [
> +            "$TC actions flush action tunnel_key"
> +        ]
> +    },

example: try the test above as follows:

[root@rhel tc-testing]# modprobe  act_tunnel_key
[root@rhel tc-testing]# ./tdc.py -e dc6b
Test dc6b: Add tunnel_key set action with missing mandatory src_ip parameter
All test results: 

1..1
ok 1 - dc6b # Add tunnel_key set action with missing mandatory src_ip parameter
about to flush the tap output if tests need to be skipped
done flushing skipped test tap output

[root@rhel tc-testing]# modprobe -r act_tunnel_key ; ./tdc.py -p /usr/local/src/iproute2/tc/tc -e dc6b
Test dc6b: Add tunnel_key set action with missing mandatory src_ip parameter

-----> teardown stage *** Could not execute: "$TC actions flush action tunnel_key"

-----> teardown stage *** Error message: "Error: Cannot flush unknown TC action.
We have an error flushing
"
[...]
---------------
accumulated output for this test:
---------------
All test results: 

1..1
about to flush the tap output if tests need to be skipped
ok 1 - dc6b # skipped - previous teardown failed 1 dc6b
done flushing skipped test tap output

(BTW: I'm fixing the bpf test suite for a similar problem; I forgot to fix
it when I posted commit f7017cafcdd ("tc-testing: fix tdc tests for 'bpf'
action"). Sorry for that.)


WDYT?

regards,
-- 
davide


* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-26 14:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, rds-devel, santosh.shilimkar
In-Reply-To: <20180626.232956.1181108479532700313.davem@davemloft.net>

On (06/26/18 23:29), David Miller wrote:
> >> 
> >> Since this probably fixes syzbot reports, this can be targetted
> >> at 'net' instead?
> > 
> > that thought occurred to me but I wanted to be conservative and have
> > it in net-next first, have the syzkaller-bugs team confirm the
> > the fixes and then backport to earlier kernels (if needed)..
> 
> I think there is a way to ask syzbot to test a patch in an
> email.

And just to add: the fix itself is logically correct, so it belongs in
net-next. What I don't have (and therefore why I did not target net) is
official confirmation that the syzbot failures are root-caused to the
absence of this patch (since there is no reproducer for many of these,
and no crash dumps available from syzbot).

--Sowmini


* Re: Fwd: [PATCH 0/6] offload Linux LAG devices to the TC datapath
From: Or Gerlitz @ 2018-06-26 14:57 UTC (permalink / raw)
  To: John Hurley, Jakub Kicinski, Jiri Pirko
  Cc: netdev, ASAP_Direct_Dev, simon.horman, Andy Gospodarek
In-Reply-To: <8f406548-8f90-b658-fcd1-342d702b3445@mellanox.com>

> -------- Forwarded Message --------
> Subject: [PATCH 0/6] offload Linux LAG devices to the TC datapath
> Date: Thu, 21 Jun 2018 14:35:55 +0100
> From: John Hurley <john.hurley@netronome.com>
> To: dev@openvswitch.org, roid@mellanox.com, gavi@mellanox.com, paulb@mellanox.com, fbl@sysclose.org, simon.horman@netronome.com
> CC: John Hurley <john.hurley@netronome.com>
> 
> This patchset extends OvS TC and the linux-netdev implementation to
> support the offloading of Linux Link Aggregation devices (LAG) and their
> slaves. TC blocks are used to provide this offload. Blocks, in TC, group
> together a series of qdiscs. If a filter is added to one of these qdiscs
> then it applied to all. Similarly, if a packet is matched on one of the
> grouped qdiscs then the stats for the entire block are increased. The
> basis of the LAG offload is that the LAG master (attached to the OvS
> bridge) and slaves that may exist outside of OvS are all added to the same
> TC block. OvS can then control the filters and collect the stats on the
> slaves via its interaction with the LAG master.
> 
> The TC API is extended within OvS to allow the addition of a block id to
> ingress qdisc adds. Block ids are then assigned to each LAG master that is
> attached to the OvS bridge. The linux netdev netlink socket is used to
> monitor slave devices. If a LAG slave is found whose master is on the bridge
> then it is added to the same block as its master. If the underlying slaves
> belong to an offloadable device then the Linux LAG device can be offloaded
> to hardware.
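The shared-block behaviour described above (a filter added through one qdisc applies to every qdisc in the block, and a match on any of them updates block-wide stats) can be modelled as qdiscs that all point at one block object. This is a hypothetical data-structure sketch, not the kernel's TC block code or the OvS implementation; the struct and function names and the bond0/eth0/eth1 device names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS 8

/* Hypothetical model of a TC shared block: filters and stats live in the
 * block, not in the individual qdiscs bound to it. */
struct tc_block {
	int filters[MAX_FILTERS];        /* simplified: filter ids only */
	size_t nfilters;
	unsigned long long matched_pkts; /* one counter for the whole block */
};

struct qdisc {
	const char *dev;
	struct tc_block *block;          /* every member points at the same block */
};

/* Adding a filter through any bound qdisc installs it on the block. */
static int qdisc_add_filter(struct qdisc *q, int filter_id)
{
	struct tc_block *b = q->block;

	assert(b != NULL);
	if (b->nfilters == MAX_FILTERS)
		return -1;
	b->filters[b->nfilters++] = filter_id;
	return 0;
}

/* Any member qdisc sees every filter of the block. */
static int qdisc_has_filter(const struct qdisc *q, int filter_id)
{
	for (size_t i = 0; i < q->block->nfilters; i++)
		if (q->block->filters[i] == filter_id)
			return 1;
	return 0;
}

/* A match on any bound qdisc bumps the block-wide stats. */
static void qdisc_count_match(struct qdisc *q)
{
	q->block->matched_pkts++;
}
```

In this model, a filter installed via the LAG master is visible on the slaves, and hits counted on the slaves can be read back via the master, which is the property the offload relies on.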

Guys (J/J/J), 

Doing this here b/c

a. this has impact on the kernel side of things

b. I am more of a netdev and not openvswitch citizen..

some comments, 

1. This + Jakub's patch for the reply are really a great design.

2. Re the egress side of things: some NIC HWs can't just use the LAG
as the egress port destination of an ACL (tc rule), and the HW rule
needs to be duplicated to both HW ports. So in that case, do you
see the HW driver doing the duplication (:() or can we somehow
make it happen from user space?

3. For the case of overlay networks, e.g. an OVS-based vxlan tunnel, the
ingress (decap) rule is set on the vxlan device. Jakub, you mentioned
a possible kernel patch to the HW (nfp, mlx5) drivers to have them bind
to the tunnel device for ingress rules. If we have an agreed way to identify
uplink representors, can we do that from OVS too? Does it matter if we are
bonding + encapsulating or just encapsulating? Note that under the encap
scheme the bond is typically not part of the OVS bridge.

Or.


* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-26 15:04 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, netdev, rds-devel, Santosh Shilimkar,
	syzkaller-bugs
In-Reply-To: <CACT4Y+YFX3jiGtugGmO+_taxJgfn9eo1YOz_jufeBmkv_V72uA@mail.gmail.com>

On (06/26/18 16:48), Dmitry Vyukov wrote:
> it probably hit the race by a pure luck of the large program, but then
> never had the same luck when tried to remove any syscalls.
> So it can make sense to submit several test requests to get more testing.

How does one submit test requests by email? 

the last time I asked this question, the answer was a pointer to
https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ

Thanks
--Sowmini


* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-26 15:08 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Michael S. Tsirkin,
	Jakub Kicinski, Samudrala, Sridhar, konrad.wilk, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <CADGSJ21HNd4VYNcCt4H0gJ_CCx1GUFpHrDof2N=4WqhD24Zc2A@mail.gmail.com>

On Fri, 22 Jun 2018 17:05:04 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > I suspect the diveregence will be lost on most users though
> > simply because they don't even care about vfio. They just
> > want things to go fast.  
> 
> Like Jason said, VF isn't faster than virtio-net in all cases. It
> depends on the workload and performance metrics: throughput, latency,
> or packet per second.

So, will it then be guest/admin-controllable where the traffic flows?
Just because we have a VF available after negotiation of the feature
bit does not necessarily mean we want to use it. Do we (the guest)
even want to make it visible in that case?


* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-26 15:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Siwei Liu, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180626044650-mutt-send-email-mst@kernel.org>

On Tue, 26 Jun 2018 04:50:25 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
> > > > > > Might not neccessarily be something wrong, but it's very limited to
> > > > > > prohibit the MAC of VF from changing when enslaved by failover.  
> > > > > You mean guest changing MAC? I'm not sure why we prohibit that.  
> > > > I think Sridhar and Jiri might be better person to answer it. My
> > > > impression was that sync'ing the MAC address change between all 3
> > > > devices is challenging, as the failover driver uses MAC address to
> > > > match net_device internally.  
> > 
> > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
> > of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
> > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
> > don't allow changing guest MAC unless it is a trusted VF.  
> 
> OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> For example I can see host just
> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> 

So, what I get from this is that QEMU needs to be able to control all
of standby, uuid, and mac to accommodate the different setups
(or rather, have libvirt/management software set them up). Is the host
able to find out, or define, whether a VF is trusted?


* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Dmitry Vyukov @ 2018-06-26 15:21 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: David Miller, netdev, rds-devel, Santosh Shilimkar,
	syzkaller-bugs
In-Reply-To: <20180626150442.GI20575@oracle.com>

On Tue, Jun 26, 2018 at 5:04 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (06/26/18 16:48), Dmitry Vyukov wrote:
>> it probably hit the race by a pure luck of the large program, but then
>> never had the same luck when tried to remove any syscalls.
>> So it can make sense to submit several test requests to get more testing.
>
> How does one submit test requests by email?

https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches

> the last time I asked this question, the answer was a pointer to
> https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ

You probably asked about applying an unsubmitted patch to the syzbot git
tree. That's the question I gave that link for. It's now also
documented here:

https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches


* Re: [PATCH v3,net-next] vlan: implement vlan id and protocol changes
From: Ido Schimmel @ 2018-06-26 15:29 UTC (permalink / raw)
  To: Chas Williams; +Cc: dsa, David S. Miller, netdev, Roopa Prabhu, idosch
In-Reply-To: <CAG2-GkmUJCc2bvOpaXsnUsEeJCLjWeYrs4Xe2kF_9M48FMRTzA@mail.gmail.com>

On Tue, Jun 26, 2018 at 09:33:40AM -0400, Chas Williams wrote:
> On Tue, Jun 26, 2018 at 6:32 AM Ido Schimmel <idosch@idosch.org> wrote:
> 
> > On Mon, Jun 25, 2018 at 02:45:24PM -0600, David Ahern wrote:
> > > On 6/25/18 4:30 AM, Chas Williams wrote:
> > > > vlan_changelink silently ignores attempts to change the vlan id
> > > > or protocol id of an existing vlan interface.  Implement by adding
> > > > the new vlan id and protocol to the interface's vlan group and then
> > > > removing the old vlan id and protocol from the vlan group.
> > > >
> > > > Signed-off-by: Chas Williams <3chas3@gmail.com>
> > > > ---
> > > >  include/linux/netdevice.h |  1 +
> > > >  net/8021q/vlan.c          |  4 ++--
> > > >  net/8021q/vlan.h          |  2 ++
> > > >  net/8021q/vlan_netlink.c  | 38 ++++++++++++++++++++++++++++++++++++++
> > > >  net/core/dev.c            |  1 +
> > > >  5 files changed, 44 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index 3ec9850c7936..a95ae238addf 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -2409,6 +2409,7 @@ enum netdev_cmd {
> > > >     NETDEV_CVLAN_FILTER_DROP_INFO,
> > > >     NETDEV_SVLAN_FILTER_PUSH_INFO,
> > > >     NETDEV_SVLAN_FILTER_DROP_INFO,
> > > > +   NETDEV_CHANGEVLAN,
> > > >  };
> > > >  const char *netdev_cmd_to_name(enum netdev_cmd cmd);
> > > >
> > >
> > > you add the new notifier, but do not add any hooks to catch and process
> > it.
> > >
> > > Personally, I think it is a bit sketchy to change the vlan id on an
> > > existing device and I suspect it will cause latent errors.
> >
> > +1
> >
> > >
> > > What's your use case for trying to implement the change versus causing
> > > it to generate an unsupported error?
> > >
> > > If this patch does get accepted, I believe the mlxsw switchdev driver
> > > will be impacted.
> >
> > Yes, at minimum we need to return an error for NETDEV_CHANGEVLAN, but
> > looking at the code it seems that there's no proper rollback.
> >
> 
> I would prefer not to bother with error handling on the notification.  If
> something misses the notification, something misses the notification.
> It happens.

The notification is used so that relevant users in the kernel can
potentially veto the operation and refuse it. See other notifications
such as NETDEV_PRECHANGEUPPER.

The driver David mentioned is one existing user that needs to refuse the
VLAN change as it can't support it.
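A userspace sketch of the veto pattern described above: listeners are asked before the change is applied, and one that cannot cope returns an error through the notifier chain, so nothing needs to be rolled back. NOTIFY_DONE, NOTIFY_OK, NOTIFY_STOP_MASK and the notifier_from_errno()/notifier_to_errno() helpers are modelled on include/linux/notifier.h; the event id, struct vlan_dev, vlan_change_vid() and the listener functions are hypothetical and do not correspond to the code in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Modelled on include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

static int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical event id, standing in for a "pre-change" VLAN event. */
#define EVT_PRECHANGEVLAN 1UL

struct vlan_dev {                /* hypothetical */
	unsigned short vid;
};

struct vlan_change {             /* payload handed to listeners */
	const struct vlan_dev *dev;
	unsigned short new_vid;
};

typedef int (*notifier_fn)(unsigned long event, void *data);

/* Like notifier_call_chain(): stop at the first callback that sets
 * NOTIFY_STOP_MASK and hand its verdict back to the caller. */
static int call_chain(notifier_fn const *chain, size_t n,
		      unsigned long event, void *data)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret = chain[i](event, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Ask first, apply only if nobody objected: a veto leaves the device
 * untouched, so there is nothing to roll back. */
static int vlan_change_vid(struct vlan_dev *dev, unsigned short new_vid,
			   notifier_fn const *chain, size_t n)
{
	struct vlan_change info = { .dev = dev, .new_vid = new_vid };
	int err = notifier_to_errno(call_chain(chain, n, EVT_PRECHANGEVLAN,
					       &info));

	if (err)
		return err;
	dev->vid = new_vid;
	return 0;
}

/* A listener that cannot follow VID changes (think of a switch driver). */
static int refuse_vlan_change(unsigned long event, void *data)
{
	(void)data;
	if (event == EVT_PRECHANGEVLAN)
		return notifier_from_errno(-EOPNOTSUPP);
	return NOTIFY_DONE;
}

static int accept_all(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return NOTIFY_OK;
}
```

Encoding a specific errno (rather than a bare "bad" verdict) lets the caller propagate exactly why the change was refused.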


* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-26 15:38 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Samudrala, Sridhar, Siwei Liu, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180626171732.5038f53f.cohuck@redhat.com>

On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote:
> On Tue, 26 Jun 2018 04:50:25 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
> > > > > > > Might not neccessarily be something wrong, but it's very limited to
> > > > > > > prohibit the MAC of VF from changing when enslaved by failover.  
> > > > > > You mean guest changing MAC? I'm not sure why we prohibit that.  
> > > > > I think Sridhar and Jiri might be better person to answer it. My
> > > > > impression was that sync'ing the MAC address change between all 3
> > > > > devices is challenging, as the failover driver uses MAC address to
> > > > > match net_device internally.  
> > > 
> > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
> > > of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
> > > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
> > > don't allow changing guest MAC unless it is a trusted VF.  
> > 
> > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> > For example I can see host just
> > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> > 
> 
> So, what I get from this is that QEMU needs to be able to control all
> of standby, uuid, and mac to accommodate the different setups
> (respectively have libvirt/management software set it up). Is the host
> able to find out respectively define whether a VF is trusted?

You do it with ip link, I think, but QEMU doesn't normally do this;
it relies on libvirt to poke at the host kernel and supply the info.

-- 
MST


* [PATCH 0/3] xdp: don't mix XDP_TX and XDP_REDIRECT flush ops
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, BjörnTöpel,
	Alexei Starovoitov

Fix driver logic that combines XDP_TX flushing and XDP_REDIRECT map
flushing. These are two different XDP xmit modes, and it is clearly
wrong to invoke both types of flush operations when only one of the
XDP xmit modes is used.

---
I'm unsure which git tree to send this against, so I'll leave it up to
the patchwork assigner ;-)


Jesper Dangaard Brouer (3):
      ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
      i40e: split XDP_TX tail and XDP_REDIRECT map flushing
      virtio_net: split XDP_TX kick and XDP_REDIRECT map flushing


 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   24 +++++++++++++-------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 ++++++++++++--------
 drivers/net/virtio_net.c                      |   30 ++++++++++++++++---------
 3 files changed, 48 insertions(+), 30 deletions(-)
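The core idea of the series, accumulating a per-mode bitmask during the poll loop and then flushing only the modes that were actually used, can be sketched in plain C. The verdict enum, the XMIT_* flags, rx_one(), poll_once() and the counters that stand in for xdp_do_flush_map() and the TX tail write / virtqueue_kick() are hypothetical; no real driver or XDP API is called.

```c
#include <assert.h>

#define BIT(n) (1U << (n))

/* Hypothetical flag names, modelled on VIRTIO_XDP_TX / VIRTIO_XDP_REDIR. */
#define XMIT_TX    BIT(0)
#define XMIT_REDIR BIT(1)

enum verdict { V_PASS, V_DROP, V_TX, V_REDIRECT };

struct flush_counts {
	int map_flushes; /* would be xdp_do_flush_map() */
	int tx_kicks;    /* would be the tail write or virtqueue_kick() */
};

/* Per-packet path: record which xmit mode was used, do not flush yet. */
static void rx_one(enum verdict v, unsigned int *xdp_xmit)
{
	if (v == V_TX)
		*xdp_xmit |= XMIT_TX;
	else if (v == V_REDIRECT)
		*xdp_xmit |= XMIT_REDIR;
}

/* End of the poll: flush each mode only if it was actually used. */
static void poll_once(const enum verdict *pkts, int n, struct flush_counts *fc)
{
	unsigned int xdp_xmit = 0;

	for (int i = 0; i < n; i++)
		rx_one(pkts[i], &xdp_xmit);

	if (xdp_xmit & XMIT_REDIR)
		fc->map_flushes++;
	if (xdp_xmit & XMIT_TX)
		fc->tx_kicks++;
}
```

With a single bool, a poll that only did XDP_TX would also have paid for a map flush (and vice versa); the bitmask keeps the two decisions independent while still batching each flush to once per poll.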


* [PATCH 1/3] ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, BjörnTöpel,
	Alexei Starovoitov
In-Reply-To: <153002741940.15389.10466368482771753300.stgit@firesoul>

The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map). This is suboptimal; these two
flush operations should be kept separate.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4929f7265598..5f8a969638b2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2186,9 +2186,10 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
-#define IXGBE_XDP_PASS 0
-#define IXGBE_XDP_CONSUMED 1
-#define IXGBE_XDP_TX 2
+#define IXGBE_XDP_PASS		0
+#define IXGBE_XDP_CONSUMED	BIT(0)
+#define IXGBE_XDP_TX		BIT(1)
+#define IXGBE_XDP_REDIR		BIT(2)
 
 static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
 			       struct xdp_frame *xdpf);
@@ -2225,7 +2226,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
 		if (!err)
-			result = IXGBE_XDP_TX;
+			result = IXGBE_XDP_REDIR;
 		else
 			result = IXGBE_XDP_CONSUMED;
 		break;
@@ -2285,7 +2286,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	unsigned int mss = 0;
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
-	bool xdp_xmit = false;
+	unsigned int xdp_xmit = 0;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
@@ -2328,8 +2329,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 		if (IS_ERR(skb)) {
-			if (PTR_ERR(skb) == -IXGBE_XDP_TX) {
-				xdp_xmit = true;
+			unsigned int xdp_res = -PTR_ERR(skb);
+
+			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
 				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
@@ -2401,7 +2404,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		total_rx_packets++;
 	}
 
-	if (xdp_xmit) {
+	if (xdp_xmit & IXGBE_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & IXGBE_XDP_TX) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 
 		/* Force memory writes to complete before letting h/w
@@ -2409,8 +2415,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 */
 		wmb();
 		writel(ring->next_to_use, ring->tail);
-
-		xdp_do_flush_map();
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);

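The ixgbe change relies on the XDP verdict travelling back through the skb pointer, which ixgbe_clean_rx_irq() decodes with -PTR_ERR(skb). A userspace sketch of that round trip follows, assuming ixgbe_run_xdp() returns ERR_PTR(-result) as the decoding in the diff implies. ERR_PTR/PTR_ERR/IS_ERR are simplified re-implementations of the kernel helpers, and run_xdp_model()/decode_verdict() and the RES_* names are hypothetical stand-ins for the driver code. It shows that the new bit flags survive the encoding and that PASS (0) becomes NULL rather than an error pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel's err.h helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define BIT(n) (1U << (n))
#define RES_PASS     0
#define RES_CONSUMED BIT(0)
#define RES_TX       BIT(1)
#define RES_REDIR    BIT(2)

/* Hypothetical stand-in for the tail of ixgbe_run_xdp(). */
static void *run_xdp_model(unsigned int result)
{
	return ERR_PTR(-(long)result);
}

/* Decode as the patched clean_rx_irq loop does. Returns the bits to OR
 * into xdp_xmit and sets *flip when the rx buffer must be flipped. */
static unsigned int decode_verdict(const void *skb, bool *flip)
{
	*flip = false;
	if (!IS_ERR(skb))
		return 0;        /* NULL for PASS: the driver builds a real skb */

	unsigned int xdp_res = (unsigned int)-PTR_ERR(skb);

	if (xdp_res & (RES_TX | RES_REDIR)) {
		*flip = true;
		return xdp_res;
	}
	return 0;                /* CONSUMED: pagecnt_bias++ instead */
}
```

One constraint this makes visible: IS_ERR() only recognises the top MAX_ERRNO (4095) pointer values, so verdict flags encoded this way must not exceed 4095; BIT(2) is far from that limit.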

* [PATCH 2/3] i40e: split XDP_TX tail and XDP_REDIRECT map flushing
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, BjörnTöpel,
	Alexei Starovoitov
In-Reply-To: <153002741940.15389.10466368482771753300.stgit@firesoul>

The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map). This is suboptimal; these two
flush operations should be kept separate.

It looks like the mistake was copy-pasted from ixgbe.

Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8ffb7454e67c..c1c027743159 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2200,9 +2200,10 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
 	return true;
 }
 
-#define I40E_XDP_PASS 0
-#define I40E_XDP_CONSUMED 1
-#define I40E_XDP_TX 2
+#define I40E_XDP_PASS		0
+#define I40E_XDP_CONSUMED	BIT(0)
+#define I40E_XDP_TX		BIT(1)
+#define I40E_XDP_REDIR		BIT(2)
 
 static int i40e_xmit_xdp_ring(struct xdp_frame *xdpf,
 			      struct i40e_ring *xdp_ring);
@@ -2249,7 +2250,7 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 		break;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		result = !err ? I40E_XDP_TX : I40E_XDP_CONSUMED;
+		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2312,7 +2313,8 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	struct sk_buff *skb = rx_ring->skb;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
-	bool failure = false, xdp_xmit = false;
+	unsigned int xdp_xmit = 0;
+	bool failure = false;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
@@ -2373,8 +2375,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		}
 
 		if (IS_ERR(skb)) {
-			if (PTR_ERR(skb) == -I40E_XDP_TX) {
-				xdp_xmit = true;
+			unsigned int xdp_res = -PTR_ERR(skb);
+
+			if (xdp_res & (I40E_XDP_TX | I40E_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
 				i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
@@ -2428,12 +2432,14 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		total_rx_packets++;
 	}
 
-	if (xdp_xmit) {
+	if (xdp_xmit & I40E_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & I40E_XDP_TX) {
 		struct i40e_ring *xdp_ring =
 			rx_ring->vsi->xdp_rings[rx_ring->queue_index];
 
 		i40e_xdp_ring_update_tail(xdp_ring);
-		xdp_do_flush_map();
 	}
 
 	rx_ring->skb = skb;


* [PATCH 3/3] virtio_net: split XDP_TX kick and XDP_REDIRECT map flushing
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, BjörnTöpel,
	Alexei Starovoitov
In-Reply-To: <153002741940.15389.10466368482771753300.stgit@firesoul>

The driver was combining XDP_TX virtqueue_kick and XDP_REDIRECT
map flushing (xdp_do_flush_map). This is suboptimal; these two
flush operations should be kept separate.

The suboptimal behavior was introduced in commit 9267c430c6b6
("virtio-net: add missing virtqueue kick when flushing packets").

Fixes: 9267c430c6b6 ("virtio-net: add missing virtqueue kick when flushing packets")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/virtio_net.c |   30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1619ee3070b6..ae47ecf80c2d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -53,6 +53,10 @@ module_param(napi_tx, bool, 0644);
 /* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
 #define VIRTIO_XDP_HEADROOM 256
 
+/* Separating two types of XDP xmit */
+#define VIRTIO_XDP_TX		BIT(0)
+#define VIRTIO_XDP_REDIR	BIT(1)
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -582,7 +586,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 				     struct receive_queue *rq,
 				     void *buf, void *ctx,
 				     unsigned int len,
-				     bool *xdp_xmit)
+				     unsigned int *xdp_xmit)
 {
 	struct sk_buff *skb;
 	struct bpf_prog *xdp_prog;
@@ -654,14 +658,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 				goto err_xdp;
 			}
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_TX;
 			rcu_read_unlock();
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			err = xdp_do_redirect(dev, &xdp, xdp_prog);
 			if (err)
 				goto err_xdp;
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_REDIR;
 			rcu_read_unlock();
 			goto xdp_xmit;
 		default:
@@ -723,7 +727,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 void *buf,
 					 void *ctx,
 					 unsigned int len,
-					 bool *xdp_xmit)
+					 unsigned int *xdp_xmit)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
@@ -818,7 +822,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					put_page(xdp_page);
 				goto err_xdp;
 			}
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_TX;
 			if (unlikely(xdp_page != page))
 				put_page(page);
 			rcu_read_unlock();
@@ -830,7 +834,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					put_page(xdp_page);
 				goto err_xdp;
 			}
-			*xdp_xmit = true;
+			*xdp_xmit |= VIRTIO_XDP_REDIR;
 			if (unlikely(xdp_page != page))
 				put_page(page);
 			rcu_read_unlock();
@@ -939,7 +943,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 }
 
 static int receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
-		       void *buf, unsigned int len, void **ctx, bool *xdp_xmit)
+		       void *buf, unsigned int len, void **ctx,
+		       unsigned int *xdp_xmit)
 {
 	struct net_device *dev = vi->dev;
 	struct sk_buff *skb;
@@ -1232,7 +1237,8 @@ static void refill_work(struct work_struct *work)
 	}
 }
 
-static int virtnet_receive(struct receive_queue *rq, int budget, bool *xdp_xmit)
+static int virtnet_receive(struct receive_queue *rq, int budget,
+			   unsigned int *xdp_xmit)
 {
 	struct virtnet_info *vi = rq->vq->vdev->priv;
 	unsigned int len, received = 0, bytes = 0;
@@ -1321,7 +1327,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	struct virtnet_info *vi = rq->vq->vdev->priv;
 	struct send_queue *sq;
 	unsigned int received, qp;
-	bool xdp_xmit = false;
+	unsigned int xdp_xmit = 0;
 
 	virtnet_poll_cleantx(rq);
 
@@ -1331,12 +1337,14 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	if (received < budget)
 		virtqueue_napi_complete(napi, rq->vq, received);
 
-	if (xdp_xmit) {
+	if (xdp_xmit & VIRTIO_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & VIRTIO_XDP_TX) {
 		qp = vi->curr_queue_pairs - vi->xdp_queue_pairs +
 		     smp_processor_id();
 		sq = &vi->sq[qp];
 		virtqueue_kick(sq->vq);
-		xdp_do_flush_map();
 	}
 
 	return received;

^ permalink raw reply related
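The virtio-net patch above replaces the single `bool xdp_xmit` with a bitmask. This lets `virtnet_poll()` call `xdp_do_flush_map()` only when frames were redirected, and kick the XDP TX virtqueue only when `XDP_TX` queued something. Below is a minimal user-space sketch of that dispatch logic. `fake_flush_map()`, `fake_kick()`, `enum fake_act` and the local `BIT()` macro are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n) (1U << (n))

/* Same encoding as the patch: one bit per kind of XDP transmit. */
#define VIRTIO_XDP_TX    BIT(0)
#define VIRTIO_XDP_REDIR BIT(1)

/* Hypothetical counters standing in for xdp_do_flush_map() and
 * virtqueue_kick(). */
static int flush_calls;
static int kick_calls;

static void fake_flush_map(void) { flush_calls++; }
static void fake_kick(void)      { kick_calls++; }

enum fake_act { ACT_PASS, ACT_TX, ACT_REDIRECT };

/* Per-packet step: record which kind of transmit happened. */
static void handle_packet(enum fake_act act, unsigned int *xdp_xmit)
{
	switch (act) {
	case ACT_TX:
		*xdp_xmit |= VIRTIO_XDP_TX;
		break;
	case ACT_REDIRECT:
		*xdp_xmit |= VIRTIO_XDP_REDIR;
		break;
	default:
		break;
	}
}

/* End of the poll: flush and kick independently of each other. */
static void finish_poll(unsigned int xdp_xmit)
{
	if (xdp_xmit & VIRTIO_XDP_REDIR)
		fake_flush_map();
	if (xdp_xmit & VIRTIO_XDP_TX)
		fake_kick();
}

/* One NAPI-style batch: accumulate flags, then act once. */
static void run_batch(const enum fake_act *acts, int n)
{
	unsigned int xdp_xmit = 0;
	int i;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		handle_packet(acts[i], &xdp_xmit);
	finish_poll(xdp_xmit);
}
```

With the old `bool`, a poll that only redirected frames still kicked the TX virtqueue. Likewise, a poll that only transmitted frames still flushed the redirect maps. Tracking the two actions separately avoids that extra work.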

* [PATCH net-next] sh_eth: fix *enum* {A|M}PR_BIT
From: Sergei Shtylyov @ 2018-06-26 15:42 UTC (permalink / raw)
  To: netdev, David S. Miller; +Cc: linux-renesas-soc
In-Reply-To: <2809eba8-4c9a-1d5f-a47d-8125777e365b@cogentembedded.com>

The *enum* {A|M}PR_BIT were declared in the commit 86a74ff21a7a ("net:
sh_eth: add support for  Renesas SuperH Ethernet") adding SH771x support,
however the SH771x manual  doesn't have the APR/MPR registers described
and the code writing to them for SH7710 was later removed by the commit
380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct
sh_eth_cpu_data""). All the newer SoC manuals have these registers
documented as having a 16-bit TIME parameter of the PAUSE frame, not
1-bit -- update the *enum* accordingly, fixing up the APR/MPR writes...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
This patch is against DaveM's 'net-next.git' repo.

 drivers/net/ethernet/renesas/sh_eth.c |    4 ++--
 drivers/net/ethernet/renesas/sh_eth.h |    4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -1521,9 +1521,9 @@ static int sh_eth_dev_init(struct net_de
 
 	/* mask reset */
 	if (mdp->cd->apr)
-		sh_eth_write(ndev, APR_AP, APR);
+		sh_eth_write(ndev, 1, APR);
 	if (mdp->cd->mpr)
-		sh_eth_write(ndev, MPR_MP, MPR);
+		sh_eth_write(ndev, 1, MPR);
 	if (mdp->cd->tpauser)
 		sh_eth_write(ndev, TPAUSER_UNLIMITED, TPAUSER);
 
Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -383,12 +383,12 @@ enum ECSIPR_STATUS_MASK_BIT {
 
 /* APR */
 enum APR_BIT {
-	APR_AP = 0x00000001,
+	APR_AP = 0x0000ffff,
 };
 
 /* MPR */
 enum MPR_BIT {
-	MPR_MP = 0x00000001,
+	MPR_MP = 0x0000ffff,
 };
 
 /* TRSCER */

^ permalink raw reply
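After the sh_eth change, `APR_AP` and `MPR_MP` describe the full 16-bit TIME field rather than a single bit. That is why the init code now writes the literal 1 instead of the enum value. The following small sketch treats such an enum as a field mask. `pause_time_field()` is a hypothetical helper for illustration only and does not exist in sh_eth.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks as redefined by the patch: a 16-bit TIME parameter. */
enum APR_BIT { APR_AP = 0x0000ffff };
enum MPR_BIT { MPR_MP = 0x0000ffff };

/* Hypothetical helper: build a register value for the PAUSE-frame TIME
 * field, clamping to the field width instead of silently truncating. */
static uint32_t pause_time_field(uint32_t time, uint32_t mask)
{
	assert(mask != 0);
	if (time > mask)
		time = mask;
	return time & mask;
}
```

Writing `APR_AP` itself after this change would program a PAUSE TIME of 0xffff. The old code only meant to write 1.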

* [PATCH net-next] tcp: remove one indentation level in tcp_create_openreq_child
From: Eric Dumazet @ 2018-06-26 15:45 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_minisocks.c | 223 ++++++++++++++++++++-------------------
 1 file changed, 113 insertions(+), 110 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 1dda1341a223937580b4efdbedb21ae50b221ff7..dac5893a52b4520d86ed2fcadbfb561a559fcd3d 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -449,119 +449,122 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 				      struct sk_buff *skb)
 {
 	struct sock *newsk = inet_csk_clone_lock(sk, req, GFP_ATOMIC);
-
-	if (newsk) {
-		const struct inet_request_sock *ireq = inet_rsk(req);
-		struct tcp_request_sock *treq = tcp_rsk(req);
-		struct inet_connection_sock *newicsk = inet_csk(newsk);
-		struct tcp_sock *newtp = tcp_sk(newsk);
-		struct tcp_sock *oldtp = tcp_sk(sk);
-
-		smc_check_reset_syn_req(oldtp, req, newtp);
-
-		/* Now setup tcp_sock */
-		newtp->pred_flags = 0;
-
-		newtp->rcv_wup = newtp->copied_seq =
-		newtp->rcv_nxt = treq->rcv_isn + 1;
-		newtp->segs_in = 1;
-
-		newtp->snd_sml = newtp->snd_una =
-		newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
-
-		INIT_LIST_HEAD(&newtp->tsq_node);
-		INIT_LIST_HEAD(&newtp->tsorted_sent_queue);
-
-		tcp_init_wl(newtp, treq->rcv_isn);
-
-		newtp->srtt_us = 0;
-		newtp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
-		minmax_reset(&newtp->rtt_min, tcp_jiffies32, ~0U);
-		newicsk->icsk_rto = TCP_TIMEOUT_INIT;
-		newicsk->icsk_ack.lrcvtime = tcp_jiffies32;
-
-		newtp->packets_out = 0;
-		newtp->retrans_out = 0;
-		newtp->sacked_out = 0;
-		newtp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
-		newtp->tlp_high_seq = 0;
-		newtp->lsndtime = tcp_jiffies32;
-		newsk->sk_txhash = treq->txhash;
-		newtp->last_oow_ack_time = 0;
-		newtp->total_retrans = req->num_retrans;
-
-		/* So many TCP implementations out there (incorrectly) count the
-		 * initial SYN frame in their delayed-ACK and congestion control
-		 * algorithms that we must have the following bandaid to talk
-		 * efficiently to them.  -DaveM
-		 */
-		newtp->snd_cwnd = TCP_INIT_CWND;
-		newtp->snd_cwnd_cnt = 0;
-
-		/* There's a bubble in the pipe until at least the first ACK. */
-		newtp->app_limited = ~0U;
-
-		tcp_init_xmit_timers(newsk);
-		newtp->write_seq = newtp->pushed_seq = treq->snt_isn + 1;
-
-		newtp->rx_opt.saw_tstamp = 0;
-
-		newtp->rx_opt.dsack = 0;
-		newtp->rx_opt.num_sacks = 0;
-
-		newtp->urg_data = 0;
-
-		if (sock_flag(newsk, SOCK_KEEPOPEN))
-			inet_csk_reset_keepalive_timer(newsk,
-						       keepalive_time_when(newtp));
-
-		newtp->rx_opt.tstamp_ok = ireq->tstamp_ok;
-		newtp->rx_opt.sack_ok = ireq->sack_ok;
-		newtp->window_clamp = req->rsk_window_clamp;
-		newtp->rcv_ssthresh = req->rsk_rcv_wnd;
-		newtp->rcv_wnd = req->rsk_rcv_wnd;
-		newtp->rx_opt.wscale_ok = ireq->wscale_ok;
-		if (newtp->rx_opt.wscale_ok) {
-			newtp->rx_opt.snd_wscale = ireq->snd_wscale;
-			newtp->rx_opt.rcv_wscale = ireq->rcv_wscale;
-		} else {
-			newtp->rx_opt.snd_wscale = newtp->rx_opt.rcv_wscale = 0;
-			newtp->window_clamp = min(newtp->window_clamp, 65535U);
-		}
-		newtp->snd_wnd = (ntohs(tcp_hdr(skb)->window) <<
-				  newtp->rx_opt.snd_wscale);
-		newtp->max_window = newtp->snd_wnd;
-
-		if (newtp->rx_opt.tstamp_ok) {
-			newtp->rx_opt.ts_recent = req->ts_recent;
-			newtp->rx_opt.ts_recent_stamp = get_seconds();
-			newtp->tcp_header_len = sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED;
-		} else {
-			newtp->rx_opt.ts_recent_stamp = 0;
-			newtp->tcp_header_len = sizeof(struct tcphdr);
-		}
-		newtp->tsoffset = treq->ts_off;
+	const struct inet_request_sock *ireq = inet_rsk(req);
+	struct tcp_request_sock *treq = tcp_rsk(req);
+	struct inet_connection_sock *newicsk;
+	struct tcp_sock *oldtp, *newtp;
+
+	if (!newsk)
+		return NULL;
+
+	newicsk = inet_csk(newsk);
+	newtp = tcp_sk(newsk);
+	oldtp = tcp_sk(sk);
+
+	smc_check_reset_syn_req(oldtp, req, newtp);
+
+	/* Now setup tcp_sock */
+	newtp->pred_flags = 0;
+
+	newtp->rcv_wup = newtp->copied_seq =
+	newtp->rcv_nxt = treq->rcv_isn + 1;
+	newtp->segs_in = 1;
+
+	newtp->snd_sml = newtp->snd_una =
+	newtp->snd_nxt = newtp->snd_up = treq->snt_isn + 1;
+
+	INIT_LIST_HEAD(&newtp->tsq_node);
+	INIT_LIST_HEAD(&newtp->tsorted_sent_queue);
+
+	tcp_init_wl(newtp, treq->rcv_isn);
+
+	newtp->srtt_us = 0;
+	newtp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
+	minmax_reset(&newtp->rtt_min, tcp_jiffies32, ~0U);
+	newicsk->icsk_rto = TCP_TIMEOUT_INIT;
+	newicsk->icsk_ack.lrcvtime = tcp_jiffies32;
+
+	newtp->packets_out = 0;
+	newtp->retrans_out = 0;
+	newtp->sacked_out = 0;
+	newtp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
+	newtp->tlp_high_seq = 0;
+	newtp->lsndtime = tcp_jiffies32;
+	newsk->sk_txhash = treq->txhash;
+	newtp->last_oow_ack_time = 0;
+	newtp->total_retrans = req->num_retrans;
+
+	/* So many TCP implementations out there (incorrectly) count the
+	 * initial SYN frame in their delayed-ACK and congestion control
+	 * algorithms that we must have the following bandaid to talk
+	 * efficiently to them.  -DaveM
+	 */
+	newtp->snd_cwnd = TCP_INIT_CWND;
+	newtp->snd_cwnd_cnt = 0;
+
+	/* There's a bubble in the pipe until at least the first ACK. */
+	newtp->app_limited = ~0U;
+
+	tcp_init_xmit_timers(newsk);
+	newtp->write_seq = newtp->pushed_seq = treq->snt_isn + 1;
+
+	newtp->rx_opt.saw_tstamp = 0;
+
+	newtp->rx_opt.dsack = 0;
+	newtp->rx_opt.num_sacks = 0;
+
+	newtp->urg_data = 0;
+
+	if (sock_flag(newsk, SOCK_KEEPOPEN))
+		inet_csk_reset_keepalive_timer(newsk,
+					       keepalive_time_when(newtp));
+
+	newtp->rx_opt.tstamp_ok = ireq->tstamp_ok;
+	newtp->rx_opt.sack_ok = ireq->sack_ok;
+	newtp->window_clamp = req->rsk_window_clamp;
+	newtp->rcv_ssthresh = req->rsk_rcv_wnd;
+	newtp->rcv_wnd = req->rsk_rcv_wnd;
+	newtp->rx_opt.wscale_ok = ireq->wscale_ok;
+	if (newtp->rx_opt.wscale_ok) {
+		newtp->rx_opt.snd_wscale = ireq->snd_wscale;
+		newtp->rx_opt.rcv_wscale = ireq->rcv_wscale;
+	} else {
+		newtp->rx_opt.snd_wscale = newtp->rx_opt.rcv_wscale = 0;
+		newtp->window_clamp = min(newtp->window_clamp, 65535U);
+	}
+	newtp->snd_wnd = ntohs(tcp_hdr(skb)->window) << newtp->rx_opt.snd_wscale;
+	newtp->max_window = newtp->snd_wnd;
+
+	if (newtp->rx_opt.tstamp_ok) {
+		newtp->rx_opt.ts_recent = req->ts_recent;
+		newtp->rx_opt.ts_recent_stamp = get_seconds();
+		newtp->tcp_header_len = sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED;
+	} else {
+		newtp->rx_opt.ts_recent_stamp = 0;
+		newtp->tcp_header_len = sizeof(struct tcphdr);
+	}
+	newtp->tsoffset = treq->ts_off;
 #ifdef CONFIG_TCP_MD5SIG
-		newtp->md5sig_info = NULL;	/*XXX*/
-		if (newtp->af_specific->md5_lookup(sk, newsk))
-			newtp->tcp_header_len += TCPOLEN_MD5SIG_ALIGNED;
+	newtp->md5sig_info = NULL;	/*XXX*/
+	if (newtp->af_specific->md5_lookup(sk, newsk))
+		newtp->tcp_header_len += TCPOLEN_MD5SIG_ALIGNED;
 #endif
-		if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len)
-			newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len;
-		newtp->rx_opt.mss_clamp = req->mss;
-		tcp_ecn_openreq_child(newtp, req);
-		newtp->fastopen_req = NULL;
-		newtp->fastopen_rsk = NULL;
-		newtp->syn_data_acked = 0;
-		newtp->rack.mstamp = 0;
-		newtp->rack.advanced = 0;
-		newtp->rack.reo_wnd_steps = 1;
-		newtp->rack.last_delivered = 0;
-		newtp->rack.reo_wnd_persist = 0;
-		newtp->rack.dsack_seen = 0;
+	if (skb->len >= TCP_MSS_DEFAULT + newtp->tcp_header_len)
+		newicsk->icsk_ack.last_seg_size = skb->len - newtp->tcp_header_len;
+	newtp->rx_opt.mss_clamp = req->mss;
+	tcp_ecn_openreq_child(newtp, req);
+	newtp->fastopen_req = NULL;
+	newtp->fastopen_rsk = NULL;
+	newtp->syn_data_acked = 0;
+	newtp->rack.mstamp = 0;
+	newtp->rack.advanced = 0;
+	newtp->rack.reo_wnd_steps = 1;
+	newtp->rack.last_delivered = 0;
+	newtp->rack.reo_wnd_persist = 0;
+	newtp->rack.dsack_seen = 0;
+
+	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
 
-		__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
-	}
 	return newsk;
 }
 EXPORT_SYMBOL(tcp_create_openreq_child);
-- 
2.18.0.rc2.346.g013aa6912e-goog

^ permalink raw reply related
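Eric's tcp_minisocks patch is a pure refactor. It removes the `if (newsk) { ... }` nesting by returning NULL early when `inet_csk_clone_lock()` fails, then performs the setup one indentation level out. The following minimal sketch shows that guard-clause pattern and checks that both shapes behave the same. `struct conn` and `clone_conn()` are hypothetical stand-ins for the socket and the clone call.

```c
#include <assert.h>
#include <stdlib.h>

struct conn {
	unsigned int rcv_nxt;
	unsigned int snd_nxt;
};

/* Hypothetical stand-in for inet_csk_clone_lock(): may fail. */
static struct conn *clone_conn(int fail)
{
	return fail ? NULL : calloc(1, sizeof(struct conn));
}

/* Before: all setup nested inside "if (newc) { ... }". */
static struct conn *create_child_nested(int fail, unsigned int rcv_isn,
					unsigned int snt_isn)
{
	struct conn *newc = clone_conn(fail);

	if (newc) {
		newc->rcv_nxt = rcv_isn + 1;
		newc->snd_nxt = snt_isn + 1;
	}
	return newc;
}

/* After: bail out early, then do the setup at the outer level. */
static struct conn *create_child_flat(int fail, unsigned int rcv_isn,
				      unsigned int snt_isn)
{
	struct conn *newc = clone_conn(fail);

	if (!newc)
		return NULL;

	newc->rcv_nxt = rcv_isn + 1;
	newc->snd_nxt = snt_isn + 1;
	return newc;
}
```

The patch also computes `ireq` and `treq` before the NULL check. That is safe because both are derived from `req`, not from `newsk`.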

* Re: [PATCH v3,net-next] vlan: implement vlan id and protocol changes
From: Ido Schimmel @ 2018-06-26 15:57 UTC (permalink / raw)
  To: Chas Williams; +Cc: dsa, David S. Miller, netdev, Roopa Prabhu
In-Reply-To: <CAG2-Gkm0u3Od64nAMpUzq+=M+cj3VS0J1VQ8L5BChbo7vig+kA@mail.gmail.com>

On Tue, Jun 26, 2018 at 09:31:55AM -0400, Chas Williams wrote:
> On Mon, Jun 25, 2018 at 4:45 PM David Ahern <dsa@cumulusnetworks.com> wrote:
> 
> > On 6/25/18 4:30 AM, Chas Williams wrote:
> > > vlan_changelink silently ignores attempts to change the vlan id
> > > or protocol id of an existing vlan interface.  Implement by adding
> > > the new vlan id and protocol to the interface's vlan group and then
> > > removing the old vlan id and protocol from the vlan group.
> > >
> > > Signed-off-by: Chas Williams <3chas3@gmail.com>
> > > ---
> > >  include/linux/netdevice.h |  1 +
> > >  net/8021q/vlan.c          |  4 ++--
> > >  net/8021q/vlan.h          |  2 ++
> > >  net/8021q/vlan_netlink.c  | 38 ++++++++++++++++++++++++++++++++++++++
> > >  net/core/dev.c            |  1 +
> > >  5 files changed, 44 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 3ec9850c7936..a95ae238addf 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2409,6 +2409,7 @@ enum netdev_cmd {
> > >       NETDEV_CVLAN_FILTER_DROP_INFO,
> > >       NETDEV_SVLAN_FILTER_PUSH_INFO,
> > >       NETDEV_SVLAN_FILTER_DROP_INFO,
> > > +     NETDEV_CHANGEVLAN,
> > >  };
> > >  const char *netdev_cmd_to_name(enum netdev_cmd cmd);
> > >
> >
> > you add the new notifier, but do not add any hooks to catch and process it.
> >
> 
> I can remove it.  I thought it would be prudent to add it now.
> This could also really be NETDEV_CHANGE.  I wasn't sure
> which would be more acceptable.
> 
> 
> > Personally, I think it is a bit sketchy to change the vlan id on an
> > existing device and I suspect it will cause latent errors.
> >
> 
> It's not any different than changing any other layer 2 property on a device.
> If you change the MTU or the MAC address, you are potentially going to
> cause latent errors.

It is different in switch ASICs, at least. The MTU and MAC don't have
any state associated with them. The VLAN does.

For example, when you assign an IP address to a VLAN device configured
on top of an mlxsw port (e.g., swp1.10), then you are basically creating
a router interface (RIF) that is able to route packets. This RIF is
bound to the port and the VLAN {1, 10} which cannot be changed during
the lifetime of the RIF (at least w/o impacting traffic). The MAC and
the MTU can be easily changed and are changed following
NETDEV_CHANGEADDR and NETDEV_CHANGEMTU events.

Similar problems exist in bridged VLAN devices.

> 
> 
> >
> > What's your use case for trying to implement the change versus causing
> > it to generate an unsupported error?
> >
> 
> It's far more convenient to be able to change the VLAN ID and proto
> instead of having to delete the link and put it back.  That's a lot of
> churn (netlink messages, kernel calls) for something relatively simple.
> 
> 
> >
> > If this patch does get accepted, I believe the mlxsw switchdev driver
> > will be impacted.
> >
> 
> How so?  It was relying on the fact that VLAN changes were ignored?

It is relying on existing kernel behavior, which doesn't allow changing
the VLAN.

tl;dr - I'm still not convinced this is actually needed, but if you're
going to allow such behavior, then please also include a notification
that enables existing in-kernel users to refuse the operation.

Thanks

^ permalink raw reply
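Ido requests a notification that lets in-kernel users such as mlxsw refuse a VLAN ID change. One way to picture this is as a veto step that runs before the add-new-then-remove-old update outlined in the patch description. The sketch below is a user-space model only. `change_vid()`, `struct vlan_group` and the `SK_NOTIFY_*` codes are hypothetical, loosely modeled on the kernel's NOTIFY_OK/NOTIFY_BAD convention. The posted patch itself only adds `NETDEV_CHANGEVLAN` without any hooks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VIDS 8

/* Hypothetical return codes, loosely modeled on NOTIFY_OK / NOTIFY_BAD. */
enum { SK_NOTIFY_OK = 0, SK_NOTIFY_BAD = 1 };

struct vlan_change {
	unsigned short old_vid;
	unsigned short new_vid;
};

typedef int (*vlan_notifier_fn)(const struct vlan_change *chg);

/* Hypothetical vlan group: a small set of active VLAN IDs. */
struct vlan_group {
	unsigned short vids[MAX_VIDS];
	size_t n;
};

static bool group_has(const struct vlan_group *g, unsigned short vid)
{
	for (size_t i = 0; i < g->n; i++)
		if (g->vids[i] == vid)
			return true;
	return false;
}

static bool group_add(struct vlan_group *g, unsigned short vid)
{
	if (g->n == MAX_VIDS || group_has(g, vid))
		return false;
	g->vids[g->n++] = vid;
	return true;
}

static void group_del(struct vlan_group *g, unsigned short vid)
{
	for (size_t i = 0; i < g->n; i++) {
		if (g->vids[i] == vid) {
			g->vids[i] = g->vids[--g->n];
			return;
		}
	}
}

/* Ask every listener first; any one of them can veto the change.
 * Only then add the new ID and remove the old one. */
static int change_vid(struct vlan_group *g, unsigned short old_vid,
		      unsigned short new_vid,
		      const vlan_notifier_fn *chain, size_t nchain)
{
	struct vlan_change chg = { old_vid, new_vid };

	assert(chain != NULL || nchain == 0);
	if (old_vid == new_vid)
		return 0;
	for (size_t i = 0; i < nchain; i++)
		if (chain[i](&chg) == SK_NOTIFY_BAD)
			return -1;	/* refused: group left untouched */

	if (!group_add(g, new_vid))
		return -1;
	group_del(g, old_vid);
	return 0;
}

/* Example listener that refuses, like a driver with a RIF bound to the
 * old {port, VLAN}; and one that accepts. */
static int refuse_all(const struct vlan_change *chg)
{
	(void)chg;
	return SK_NOTIFY_BAD;
}

static int accept_all(const struct vlan_change *chg)
{
	(void)chg;
	return SK_NOTIFY_OK;
}
```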

* [PATCH v5 net-next] net:sched: add action inheritdsfield to skbedit
From: Fu, Qiaobin @ 2018-06-26 15:58 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: Marcelo Ricardo Leitner, Davide Caratti, Michel Machado,
	netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com
In-Reply-To: <B84B92F9-B872-4430-B7E2-FBF23E543632@bu.edu>

The new action inheritdsfield copies the DSCP bits of the
DS field of IPv4 and IPv6 packets into skb->priority. This
enables later classification of packets based on the DS field.

v5:
*Update the drop counter for TC_ACT_SHOT

v4:
*Do not allow setting flags other than the expected ones.

*Allow dumping the pure flags.

v3:
*Use optional flags, so that it won't break old versions of tc.

*Allow users to set both SKBEDIT_F_PRIORITY and SKBEDIT_F_INHERITDSFIELD flags.

v2:
*Fix the style issue

*Move the code from skbmod to skbedit

Original idea by Jamal Hadi Salim <jhs@mojatatu.com>

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
---

Note that the motivation for this patch is found in the following discussion:
https://www.spinics.net/lists/netdev/msg501061.html
---
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index fbcfe27a4e6c..6de6071ebed6 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -30,6 +30,7 @@
#define SKBEDIT_F_MARK			0x4
#define SKBEDIT_F_PTYPE			0x8
#define SKBEDIT_F_MASK			0x10
+#define SKBEDIT_F_INHERITDSFIELD	0x20

struct tc_skbedit {
	tc_gen;
@@ -45,6 +46,7 @@ enum {
	TCA_SKBEDIT_PAD,
	TCA_SKBEDIT_PTYPE,
	TCA_SKBEDIT_MASK,
+	TCA_SKBEDIT_FLAGS,
	__TCA_SKBEDIT_MAX
};
#define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 6138d1d71900..dfaf5d8028dd 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -23,6 +23,9 @@
#include <linux/rtnetlink.h>
#include <net/netlink.h>
#include <net/pkt_sched.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/dsfield.h>

#include <linux/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_skbedit.h>
@@ -41,6 +44,25 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,

	if (d->flags & SKBEDIT_F_PRIORITY)
		skb->priority = d->priority;
+	if (d->flags & SKBEDIT_F_INHERITDSFIELD) {
+		int wlen = skb_network_offset(skb);
+
+		switch (tc_skb_protocol(skb)) {
+		case htons(ETH_P_IP):
+			wlen += sizeof(struct iphdr);
+			if (!pskb_may_pull(skb, wlen))
+				goto err;
+			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
+			break;
+
+		case htons(ETH_P_IPV6):
+			wlen += sizeof(struct ipv6hdr);
+			if (!pskb_may_pull(skb, wlen))
+				goto err;
+			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
+			break;
+		}
+	}
	if (d->flags & SKBEDIT_F_QUEUE_MAPPING &&
	    skb->dev->real_num_tx_queues > d->queue_mapping)
		skb_set_queue_mapping(skb, d->queue_mapping);
@@ -53,6 +75,11 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,

	spin_unlock(&d->tcf_lock);
	return d->tcf_action;
+
+err:
+	d->tcf_qstats.drops++;
+	spin_unlock(&d->tcf_lock);
+	return TC_ACT_SHOT;
}

static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
@@ -62,6 +89,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
	[TCA_SKBEDIT_MARK]		= { .len = sizeof(u32) },
	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
+	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
};

static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -114,6 +142,13 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
		mask = nla_data(tb[TCA_SKBEDIT_MASK]);
	}

+	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
+		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
+
+		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
+			flags |= SKBEDIT_F_INHERITDSFIELD;
+	}
+
	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);

	exists = tcf_idr_check(tn, parm->index, a, bind);
@@ -178,6 +213,7 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
		.action  = d->tcf_action,
	};
	struct tcf_t t;
+	u64 pure_flags = 0;

	if (nla_put(skb, TCA_SKBEDIT_PARMS, sizeof(opt), &opt))
		goto nla_put_failure;
@@ -196,6 +232,11 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
	if ((d->flags & SKBEDIT_F_MASK) &&
	    nla_put_u32(skb, TCA_SKBEDIT_MASK, d->mask))
		goto nla_put_failure;
+	if (d->flags & SKBEDIT_F_INHERITDSFIELD)
+		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+	if (pure_flags != 0 &&
+	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
+		goto nla_put_failure;

	tcf_tm_dump(&t, &d->tcf_tm);
	if (nla_put_64bit(skb, TCA_SKBEDIT_TM, sizeof(t), &t, TCA_SKBEDIT_PAD))

^ permalink raw reply related
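The `>> 2` in the skbedit patch drops the two ECN bits of the DS field, so `skb->priority` receives the 6-bit DSCP. For IPv4, the DS field is the second header byte. For IPv6, it is the Traffic Class, which straddles the first two bytes. The following user-space sketch works on raw header bytes and assumes a contiguous header. `ipv4_dsfield()`, `ipv6_dsfield()` and `dscp_priority()` are hypothetical helpers; the patch itself uses the kernel's `ipv4_get_dsfield()` and `ipv6_get_dsfield()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4: the DS field is byte 1 of the header (the former TOS byte). */
static uint8_t ipv4_dsfield(const uint8_t *hdr)
{
	return hdr[1];
}

/* IPv6: Traffic Class sits between the 4-bit version and the 20-bit
 * flow label, so it straddles bytes 0 and 1. */
static uint8_t ipv6_dsfield(const uint8_t *hdr)
{
	return (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
}

/* Mirror of "dsfield >> 2": drop the ECN bits and keep the DSCP.
 * Returns -1 if the header is too short, analogous to the
 * pskb_may_pull() failure path that returns TC_ACT_SHOT. */
static int dscp_priority(const uint8_t *hdr, size_t len, int is_ipv6)
{
	size_t need = is_ipv6 ? 40 : 20;

	assert(hdr != NULL);
	if (len < need)
		return -1;
	return (is_ipv6 ? ipv6_dsfield(hdr) : ipv4_dsfield(hdr)) >> 2;
}
```

For example, an Expedited Forwarding packet (DS byte 0xb8) yields priority 46 on both address families.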

* Re: [RFC] net: Add new LoRaWAN subsystem
From: Jian-Hong Pan @ 2018-06-26 16:02 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Marcel Holtmann, David S. Miller, Alexander Aring, Stefan Schmidt,
	linux-wpan - ML, netdev, linux-kernel
In-Reply-To: <5e6b2d6a-f413-7547-6c03-c41dff21453c@suse.de>

Hi Andreas,

2018-06-24 23:49 GMT+08:00 Andreas Färber <afaerber@suse.de>:
> Hi Jian-Hong Pan,
>
> Am 13.05.2018 um 04:42 schrieb Jian-Hong Pan:
>> Hi Jiri and Marcel,
>>
>> 2018-05-11 23:39 GMT+08:00 Marcel Holtmann <marcel@holtmann.org>:
>>> Hi Jian-Hong,
>>>
>>>> A Low-Power Wide-Area Network (LPWAN) is a type of wireless
>>>> telecommunication wide area network designed to allow long range
>>>> communications at a low bit rate among things (connected objects), such
>>>> as sensors operated on a battery.  It can be used widely in the IoT area.
>>>> LoRaWAN, which is one kind of implementation of LPWAN, is a medium
>>>> access control (MAC) layer protocol for managing communication between
>>>> LPWAN gateways and end-node devices, maintained by the LoRa Alliance.
>>>> LoRaWAN™ Specification could be downloaded at:
>>>> https://lora-alliance.org/lorawan-for-developers
>>>>
>>>> However, LoRaWAN is not implemented in Linux kernel right now, so I am
>>>> trying to develop it.  Here is my repository:
>>>> https://github.com/starnight/LoRa/tree/lorawan-ndo/LoRaWAN
>>>>
>>>> Because it is a kind of network, the ideal usage in a user space
>>>> program should be like "socket(PF_LORAWAN, SOCK_DGRAM, 0)" and with
>>>> other socket APIs.  Therefore, the definitions like AF_LORAWAN,
>>>> PF_LORAWAN ..., must be listed in the header files of glibc.
>>>> For the driver in kernel space, the definitions also must be listed in
>>>> the corresponding Linux socket header files.
>>>> In particular, both are needed for the testing programs.
>>>>
>>>> Back to the mentioned "LoRaWAN is not implemented in Linux kernel now".
>>>> Could or should we add the definitions into corresponding kernel header
>>>> files now, if LoRaWAN will be accepted as a subsystem in Linux?
>>>
>>> when you submit your LoRaWAN subsystem to netdev for review, include a patch that adds these new address family definitions. Just pick the next one available. There will be no pre-allocation of numbers until your work has been accepted upstream. Meaning, that the number might change if other address families get merged before yours. So you have to keep updating. glibc will eventually follow the number assigned by the kernel.
>>
>> Thanks for your guidance.  I will follow the steps.
>
> I have been working on a similar thing on and off since proposing it at
> FOSDEM 2017:

Wow!  Great!  I get new friends :)

> At https://github.com/afaerber/lora-modules you will find my proof of
> concept of PF_LORA with SOCK_DGRAM and stub drivers for various modules.
> My idea was to layer LoRaWAN on top of LoRa later.

We have the same idea here.

> The way I have developed this was to simply reuse numbers unused in our
> distro kernel and built my modules against the distro kernel, to avoid
> frequent reboots and full kernel builds.

I use the AF_MAX number as AF_LORAWAN, and the new AF_MAX will be
the old AF_MAX + 1, and so on.

> Not having looked at your code yet, do you think our implementations are
> fairly independent at this point, or do you see conflicts apart from
> number allocation? Like, I am currently using lora0 as name - are you
> planning to use lorawan0 or rather something more generic like lpwan0?

The interface name I created is loraX, X will be 0, 1, 2 ...

4: lora0: <NOARP,UP,LOWER_UP> mtu 20 qdisc noqueue state UNKNOWN group default qlen 1000
    link/[830] 01:02:03:04 brd ff:ff:ff:ff

> We might place your code in net/lora/lorawan/ and mine in net/lora/?

My implementation is:
LoRaWAN class module: net/lorawan/
LoRa device drivers: drivers/net/lorawan/sx127X ...

> More problematic would be the actual device drivers, where some devices
> would support both modes - some with soft MAC, others with full MAC. Do
> you have any ideas how to handle that in a sane way?

Let me guess!  You have the LoRa "chips" and "modules", am I correct?

If I am right, here is my opinion:
Most of the LoRa chips go with the SPI interface.  These are okay.

However, the LoRa modules go with their own protocols (like AT
commands) over the serial port.
The modules are also a combination of an MCU and a LoRa chip, and users
can flash the MCU firmware directly on their own.  I prefer having user
space applications deal with these modules until there is a formal spec
for this kind of module.

Regards,
Jian-Hong Pan

> Please keep me CC'ed on any follow-ups.
>
> Regards,
> Andreas
>
> --
> SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply
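Marcel suggests picking the next free address family number, and Jian-Hong uses AF_MAX as AF_LORAWAN for out-of-tree testing. A user-space test program could guard against headers that do not define the new family yet. Note that AF_LORAWAN and PF_LORAWAN are not defined in any kernel or glibc header. The fallback below is purely a local-testing assumption, and its value must match whatever the out-of-tree kernel code registers. `open_lorawan_socket()` is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption for local testing only: reuse the libc's AF_MAX as the new
 * family number, as described in the thread. This only works if the
 * running kernel was built with the same numbering. */
#ifndef AF_LORAWAN
#define AF_LORAWAN AF_MAX
#endif
#ifndef PF_LORAWAN
#define PF_LORAWAN AF_LORAWAN
#endif

/* Try to open a LoRaWAN datagram socket and report why it failed,
 * e.g. EAFNOSUPPORT when nothing has registered the family. */
static int open_lorawan_socket(void)
{
	int fd = socket(PF_LORAWAN, SOCK_DGRAM, 0);

	if (fd < 0)
		fprintf(stderr, "socket(PF_LORAWAN): %s\n", strerror(errno));
	return fd;
}
```

On a stock kernel without the LoRaWAN code, the call is expected to fail. Once the family number is assigned upstream, the fallback `#define` becomes unnecessary.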

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-26 16:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Siwei Liu, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180626183706-mutt-send-email-mst@kernel.org>

On Tue, 26 Jun 2018 18:38:51 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote:
> > On Tue, 26 Jun 2018 04:50:25 +0300
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:  
> > > > > > > > Might not necessarily be something wrong, but it's very limited to
> > > > > > > > prohibit the MAC of VF from changing when enslaved by failover.    
> > > > > > > You mean guest changing MAC? I'm not sure why we prohibit that.    
> > > > > > I think Sridhar and Jiri might be better person to answer it. My
> > > > > > impression was that sync'ing the MAC address change between all 3
> > > > > > devices is challenging, as the failover driver uses MAC address to
> > > > > > match net_device internally.    
> > > > 
> > > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
> > > > of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
> > > > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
> > > > don't allow changing guest MAC unless it is a trusted VF.    
> > > 
> > > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> > > For example I can see host just
> > > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> > > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> > >   
> > 
> > So, what I get from this is that QEMU needs to be able to control all
> > of standby, uuid, and mac to accommodate the different setups
> > (or have libvirt/management software set it up). Is the host
> > able to find out, or define, whether a VF is trusted?
> 
> You do it with ip link I think but QEMU doesn't normally do this,
> it relies on libvirt to poke at host kernel and supply the info.
> 

Ok, that makes me conclude that we definitely need to involve the
libvirt folks before we proceed further with defining QEMU interfaces.

^ permalink raw reply

* Re: [PATCH net-next] liquidio: fix kernel panic when NIC firmware is older than 1.7.2
From: Shannon Nelson @ 2018-06-26 16:03 UTC (permalink / raw)
  To: Felix Manlunas, davem
  Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla,
	ricardo.farrington
In-Reply-To: <20180626115807.GA7089@felix-thinkpad.cavium.com>

On 6/26/2018 4:58 AM, Felix Manlunas wrote:
> From: Rick Farrington <ricardo.farrington@cavium.com>
> 
> Pre-1.7.2 NIC firmware does not support (and does not respond to) the "get
> speed" command which is sent by the 1.7.2 driver during modprobe.  Due to a
> bug in older firmware (with respect to unknown commands), this unsupported
> command causes a cascade of errors that ends in a kernel panic.
> 
> Fix it by making the sending of the "get speed" command conditional on the
> firmware version.
> 
> Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
> Acked-by: Derek Chickles <derek.chickles@cavium.com>
> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
> ---
> Note: To avoid checkpatch.pl "WARNING: line over 80 characters", the comma
>        that separates the arguments in the call to strcmp() was placed one
>        line below the usual spot.
> 
>   drivers/net/ethernet/cavium/liquidio/lio_main.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> index 7cb4e75..f83f884 100644
> --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
> +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
> @@ -3671,7 +3671,16 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
>   			OCTEON_CN2350_25GB_SUBSYS_ID ||
>   		    octeon_dev->subsystem_id ==
>   			OCTEON_CN2360_25GB_SUBSYS_ID) {
> -			liquidio_get_speed(lio);
> +			/* speed control unsupported in f/w older than 1.7.2 */
> +			if (strcmp(octeon_dev->fw_info.liquidio_firmware_version
> +			   , "1.7.2") < 0) {

Will the liquidio_firmware_version ever end up as something like 1.7.10?
If so, this strcmp() may not do what you want.

sln

> +				dev_info(&octeon_dev->pci_dev->dev,
> +					 "speed setting not supported by f/w.");
> +				octeon_dev->speed_setting = 25;
> +				octeon_dev->no_speed_setting = 1;
> +			} else {
> +				liquidio_get_speed(lio);
> +			}
>   
>   			if (octeon_dev->speed_setting == 0) {
>   				octeon_dev->speed_setting = 25;
> 

^ permalink raw reply
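Shannon's concern can be checked directly. `strcmp()` compares character by character, so "1.7.10" sorts before "1.7.2", and the patch would wrongly treat such firmware as too old. One possible alternative is to parse the components and compare them numerically. This sketch assumes the firmware version is a plain dotted "major.minor.patch" string. `fw_version_cmp()` is a hypothetical helper and is not part of the liquidio driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compare dotted "major.minor.patch" version strings numerically.
 * Missing components are treated as 0; returns <0, 0 or >0 like
 * strcmp(). Trailing suffixes (e.g. "-rc1") are ignored. */
static int fw_version_cmp(const char *a, const char *b)
{
	unsigned int va[3] = { 0, 0, 0 }, vb[3] = { 0, 0, 0 };
	int i;

	sscanf(a, "%u.%u.%u", &va[0], &va[1], &va[2]);
	sscanf(b, "%u.%u.%u", &vb[0], &vb[1], &vb[2]);

	for (i = 0; i < 3; i++) {
		if (va[i] != vb[i])
			return va[i] < vb[i] ? -1 : 1;
	}
	return 0;
}
```

With such a helper, the check would read `fw_version_cmp(version, "1.7.2") < 0`, and "1.7.10" would correctly compare as newer.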

* Re: [PATCH net] netfilter: nf_log: fix uninit read in nf_log_proc_dostring
From: Pablo Neira Ayuso @ 2018-06-26 16:05 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jozsef Kadlecsik, Florian Westphal, netfilter-devel, coreteam,
	David S. Miller, netdev, linux-kernel
In-Reply-To: <20180620163345.212776-1-jannh@google.com>

On Wed, Jun 20, 2018 at 06:33:45PM +0200, Jann Horn wrote:
> When proc_dostring() is called with a non-zero offset in strict mode, it
> doesn't just write to the ->data buffer, it also reads. Make sure it
> doesn't read uninitialized data.

Applied, thanks.

^ permalink raw reply
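Jann's first fix relies on the fact that a strict-mode string write at a non-zero offset first measures the string already in the buffer. The following hypothetical user-space model, `strict_write_at()` (not the kernel's `proc_dostring()`), makes that read explicit. If the destination buffer were left uninitialized, the `strnlen()` call would read indeterminate bytes.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a strict-mode string write at offset 'off':
 * to decide whether 'off' is valid, it first measures the string
 * already in 'buf'. That is a read, so 'buf' must hold initialized
 * data even on the write path. */
static int strict_write_at(char *buf, size_t buflen, size_t off,
			   const char *src, size_t srclen)
{
	size_t cur;

	assert(buflen > 0);
	cur = strnlen(buf, buflen);	/* reads the existing contents */
	if (off > cur || off >= buflen)
		return -1;		/* strict: no writing past the end */
	if (srclen > buflen - off - 1)
		srclen = buflen - off - 1;	/* keep room for the NUL */
	memcpy(buf + off, src, srclen);
	buf[off + srclen] = '\0';
	return (int)srclen;
}
```

Declaring the buffer as `char buf[16] = "";` (or filling it with the current value) before the call is what keeps the offset check well-defined in this model.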

* Re: [PATCH net] netfilter: nf_log: don't hold nf_log_mutex during user access
From: Pablo Neira Ayuso @ 2018-06-26 16:05 UTC (permalink / raw)
  To: Jann Horn
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel, security
In-Reply-To: <20180625152200.200145-1-jannh@google.com>

On Mon, Jun 25, 2018 at 05:22:00PM +0200, Jann Horn wrote:
> The old code would indefinitely block other users of nf_log_mutex if
> a userspace access in proc_dostring() blocked e.g. due to a userfaultfd
> region. Fix it by moving proc_dostring() out of the locked region.
> 
> This is a followup to commit 266d07cb1c9a ("netfilter: nf_log: fix
> sleeping function called from invalid context"), which changed this code
> from using rcu_read_lock() to taking nf_log_mutex.

Applied.

^ permalink raw reply
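The second fix follows a common pattern. Hold the lock only while touching the shared state, and perform any access that may block indefinitely with the lock released. Here, that access is user memory that could be backed by userfaultfd. The following is a minimal pthread sketch of the read side. `cfg_lock`, `cfg_name`, `read_config()` and the copier callback are hypothetical stand-ins for `nf_log_mutex`, the logger name and `proc_dostring()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 64

/* Hypothetical shared state guarded by a mutex. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static char cfg_name[NAME_LEN] = "nf_log_ipv4";

/* Models a user-memory access that may block for an unbounded time. */
typedef void (*slow_copy_fn)(const char *src, char *dst, size_t len);

static void read_config(char *user_dst, size_t len, slow_copy_fn slow_copy_out)
{
	char snapshot[NAME_LEN];

	/* Hold the lock only long enough to snapshot the shared data. */
	pthread_mutex_lock(&cfg_lock);
	memcpy(snapshot, cfg_name, sizeof(snapshot));
	pthread_mutex_unlock(&cfg_lock);

	/* The potentially blocking access runs with the lock released,
	 * so other users of cfg_lock are never stuck behind it. */
	slow_copy_out(snapshot, user_dst, len);
}

/* Example copier: asserts that cfg_lock is free while it runs
 * (trylock fails with EBUSY if anyone, including us, holds it). */
static void copy_checking_unlocked(const char *src, char *dst, size_t len)
{
	assert(len > 0);
	assert(pthread_mutex_trylock(&cfg_lock) == 0);
	pthread_mutex_unlock(&cfg_lock);
	strncpy(dst, src, len - 1);
	dst[len - 1] = '\0';
}
```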

* Re: [PATCH net-next] sh_eth: fix *enum* {A|M}PR_BIT
From: Geert Uytterhoeven @ 2018-06-26 16:18 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, David S. Miller, Linux-Renesas
In-Reply-To: <e67e6256-4ae9-4527-d482-cf3bb50921cf@cogentembedded.com>

On Tue, Jun 26, 2018 at 5:43 PM Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> The *enum* {A|M}PR_BIT were declared in the commit 86a74ff21a7a ("net:
> sh_eth: add support for  Renesas SuperH Ethernet") adding SH771x support,
> however the SH771x manual  doesn't have the APR/MPR registers described
> and the code writing to them for SH7710 was later removed by the commit
> 380af9e390ec ("net: sh_eth: CPU dependency code collect to "struct
> sh_eth_cpu_data""). All the newer SoC manuals have these registers
> documented as having a 16-bit TIME parameter of the PAUSE frame, not
> 1-bit -- update the *enum* accordingly, fixing up the APR/MPR writes...
>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

