Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH mlx5-next 05/12] net/mlx5: Rate limit errors in command interface
From: Leon Romanovsky @ 2018-06-27  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: RDMA mailing list, Hadar Hen Zion, Matan Barak, Michael J Ruhl,
	Noa Osherovich, Raed Salem, Yishai Hadas, Saeed Mahameed,
	linux-netdev
In-Reply-To: <20180624082353.16138-6-leon@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 952 bytes --]

On Sun, Jun 24, 2018 at 11:23:46AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
>
> Any error status returned by FW will trigger similar
> to the following error message in the dmesg.
>
> [   55.884355] mlx5_core 0000:00:04.0: mlx5_cmd_check:712:(pid 555):
> ALLOC_UAR(0x802) op_mod(0x0) failed, status limits exceeded(0x8),
> syndrome (0x0)
>
> Those prints are extremely valuable to diagnose issues with running
> system and it is important to keep them. However, not-so-careful user
> can trigger endless number of such prints by depleting HW resources
> and will spam dmesg.
>
> Rate limiting of such messages solves this issue.
>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c       | 11 ++++-------
>  drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h |  6 ++++++
>  2 files changed, 10 insertions(+), 7 deletions(-)
>

Thanks, applied to mlx5-next.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Jiri Pirko @ 2018-06-27  6:05 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jakub Kicinski, Simon Horman, john.hurley, David Ahern, mlxsw
In-Reply-To: <CAM_iQpVzuX4P_C=APtWcTMGtuFZiS_ncAaJvtF6Chz19pG_oJg@mail.gmail.com>

Wed, Jun 27, 2018 at 02:04:31AM CEST, xiyou.wangcong@gmail.com wrote:
>On Tue, Jun 26, 2018 at 1:01 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> Create dummy device with clsact first:
>> # ip link add type dummy
>> # tc qdisc add dev dummy0 clsact
>>
>> There is no template assigned by default:
>> # tc filter template show dev dummy0 ingress
>>
>> Add a template of type flower allowing to insert rules matching on last
>> 2 bytes of destination mac address:
>> # tc filter template add dev dummy0 ingress proto ip flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
>
>Now you are extending 'tc filter' command with a new
>subcommand 'template', which looks weird.
>
>Why not make it a new property of filter like you did for chain?
>Like:
>
>tc filter add dev dummy0 ingress proto ip template flower

But unlike chain, this is not a filter property. For chain, when you add
filter, you add it to a specific chain. That makes sense.
But for template, you need to add the template first. Then, later on,
you add filters which either match or does not match the template.
Does not make sense to have "template" the filter property as you
suggest.

>
>which is much better IMHO.

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-27  6:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Cornelia Huck, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180627032825-mutt-send-email-mst@kernel.org>

On Tue, Jun 26, 2018 at 5:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote:
>> On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
>> >> > > > > Might not neccessarily be something wrong, but it's very limited to
>> >> > > > > prohibit the MAC of VF from changing when enslaved by failover.
>> >> > > > You mean guest changing MAC? I'm not sure why we prohibit that.
>> >> > > I think Sridhar and Jiri might be better person to answer it. My
>> >> > > impression was that sync'ing the MAC address change between all 3
>> >> > > devices is challenging, as the failover driver uses MAC address to
>> >> > > match net_device internally.
>> >>
>> >> Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
>> >> of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
>> >> synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
>> >> don't allow changing guest MAC unless it is a trusted VF.
>> >
>> > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
>> > For example I can see host just
>> > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
>> > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
>>
>> That's why I think pairing using MAC is fragile IMHO. When VF's MAC
>> got changed before virtio attempts to match and pair the device, it
>> ends up with no pairing found out at all.
>
> Guest seems to match on the hardware mac and ignore whatever
> is set by user. Makes sense to me and should not be fragile.
Host can change the hardware mac for VF any time.

-Siwei
>
>
>> UUID is better.
>>
>> -Siwei
>>
>> >
>> > --
>> > MST

^ permalink raw reply

* Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Samudrala, Sridhar @ 2018-06-27  6:34 UTC (permalink / raw)
  To: Jiri Pirko, Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jakub Kicinski, Simon Horman, john.hurley, David Ahern, mlxsw
In-Reply-To: <20180627060504.GS2161@nanopsycho>


On 6/26/2018 11:05 PM, Jiri Pirko wrote:
> Wed, Jun 27, 2018 at 02:04:31AM CEST, xiyou.wangcong@gmail.com wrote:
>> On Tue, Jun 26, 2018 at 1:01 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>> Create dummy device with clsact first:
>>> # ip link add type dummy
>>> # tc qdisc add dev dummy0 clsact
>>>
>>> There is no template assigned by default:
>>> # tc filter template show dev dummy0 ingress
>>>
>>> Add a template of type flower allowing to insert rules matching on last
>>> 2 bytes of destination mac address:
>>> # tc filter template add dev dummy0 ingress proto ip flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
>> Now you are extending 'tc filter' command with a new
>> subcommand 'template', which looks weird.
>>
>> Why not make it a new property of filter like you did for chain?
>> Like:
>>
>> tc filter add dev dummy0 ingress proto ip template flower
> But unlike chain, this is not a filter property. For chain, when you add
> filter, you add it to a specific chain. That makes sense.
> But for template, you need to add the template first. Then, later on,
> you add filters which either match or does not match the template.

So can we say that template defines the types of rules(match fields/masks) that
can be added to a specific chain and there is 1-1 relationship between a template
and a chain?

Without attaching a template to a chain, i guess it is possible to add different
types of rules to a chain?


> Does not make sense to have "template" the filter property as you
> suggest.

template seems to a chain property.

>> which is much better IMHO.

^ permalink raw reply

* Re: [PATCH v2 net] nfp: cast sizeof() to int when comparing with error code
From: David Miller @ 2018-06-27  6:38 UTC (permalink / raw)
  To: cgxu519; +Cc: jakub.kicinski, oss-drivers, netdev
In-Reply-To: <20180626011631.22717-1-cgxu519@gmx.com>

From: Chengguang Xu <cgxu519@gmx.com>
Date: Tue, 26 Jun 2018 09:16:31 +0800

> sizeof() will return unsigned value so in the error check
> negative error code will be always larger than sizeof().
> 
> Fixes: a0d8e02c35ff ("nfp: add support for reading nffw info")
> 
> Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> v2:
> - Add more information to patch subject and commit log.

I guess the:

	if (x < 0 || x < sizeof(foo))

better self-documents the situation, but this patch is fine too
so I have applied it.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] neighbour: force neigh_invalidate when NUD_FAILED update is from admin
From: David Miller @ 2018-06-27  6:41 UTC (permalink / raw)
  To: roopa; +Cc: netdev
In-Reply-To: <1529983973-5508-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Mon, 25 Jun 2018 20:32:53 -0700

> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> In systems where neigh gc thresh holds are set to high values,
> admin deleted neigh entries (eg ip neigh flush or ip neigh del) can
> linger around in NUD_FAILED state for a long time until periodic gc kicks
> in. This patch forces neigh_invalidate when NUD_FAILED neigh_update is
> from an admin.
> 
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>

Applied.

^ permalink raw reply

* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-27  6:49 UTC (permalink / raw)
  To: Siwei Liu, Michael S. Tsirkin
  Cc: Cornelia Huck, Alexander Duyck, virtio-dev, aaron.f.brown,
	Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel, virtualization,
	konrad.wilk, boris.ostrovsky, Joao Martins, Venu Busireddy,
	vijay.balakrishna
In-Reply-To: <CADGSJ22Sv_AFUkDou6hdq9++RNC40BTB-puS9209tOqzGLFJ-g@mail.gmail.com>

On 6/26/2018 11:21 PM, Siwei Liu wrote:
> On Tue, Jun 26, 2018 at 5:29 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote:
>>> On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>>> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
>>>>>>>>> Might not neccessarily be something wrong, but it's very limited to
>>>>>>>>> prohibit the MAC of VF from changing when enslaved by failover.
>>>>>>>> You mean guest changing MAC? I'm not sure why we prohibit that.
>>>>>>> I think Sridhar and Jiri might be better person to answer it. My
>>>>>>> impression was that sync'ing the MAC address change between all 3
>>>>>>> devices is challenging, as the failover driver uses MAC address to
>>>>>>> match net_device internally.
>>>>> Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
>>>>> of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
>>>>> synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
>>>>> don't allow changing guest MAC unless it is a trusted VF.
>>>> OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
>>>> For example I can see host just
>>>> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
>>>> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
>>> That's why I think pairing using MAC is fragile IMHO. When VF's MAC
>>> got changed before virtio attempts to match and pair the device, it
>>> ends up with no pairing found out at all.
>> Guest seems to match on the hardware mac and ignore whatever
>> is set by user. Makes sense to me and should not be fragile.
> Host can change the hardware mac for VF any time.

Live migration is initiated and controlled by the Host,  So the Source Host will
reset the MAC during live migration after unplugging the VF. This is to redirect the
VMs frames towards PF so that they can be received via virtio-net standby interface.
The destination host will set the VF MAC and plug the VF after live migration is
completed.

Allowing the guest to change the MAC will require the qemu/libvirt/mgmt layers to
track the MAC changes and replay that change after live migration.


>
> -Siwei
>>
>>> UUID is better.
>>>
>>> -Siwei
>>>
>>>> --
>>>> MST

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-27  7:03 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Michael S. Tsirkin, Jakub Kicinski, Netdev, Cornelia Huck,
	qemu-devel, virtualization, Venu Busireddy, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <19654612-a321-2ce9-9c1c-bcbae3a10e2f@intel.com>

On Tue, Jun 26, 2018 at 11:49 PM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
> On 6/26/2018 11:21 PM, Siwei Liu wrote:
>>
>> On Tue, Jun 26, 2018 at 5:29 PM, Michael S. Tsirkin <mst@redhat.com>
>> wrote:
>>>
>>> On Tue, Jun 26, 2018 at 04:38:26PM -0700, Siwei Liu wrote:
>>>>
>>>> On Mon, Jun 25, 2018 at 6:50 PM, Michael S. Tsirkin <mst@redhat.com>
>>>> wrote:
>>>>>
>>>>> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
>>>>>>>>>>
>>>>>>>>>> Might not neccessarily be something wrong, but it's very limited
>>>>>>>>>> to
>>>>>>>>>> prohibit the MAC of VF from changing when enslaved by failover.
>>>>>>>>>
>>>>>>>>> You mean guest changing MAC? I'm not sure why we prohibit that.
>>>>>>>>
>>>>>>>> I think Sridhar and Jiri might be better person to answer it. My
>>>>>>>> impression was that sync'ing the MAC address change between all 3
>>>>>>>> devices is challenging, as the failover driver uses MAC address to
>>>>>>>> match net_device internally.
>>>>>>
>>>>>> Yes. The MAC address is assigned by the hypervisor and it needs to
>>>>>> manage the movement
>>>>>> of the MAC between the PF and VF.  Allowing the guest to change the
>>>>>> MAC will require
>>>>>> synchronization between the hypervisor and the PF/VF drivers. Most of
>>>>>> the VF drivers
>>>>>> don't allow changing guest MAC unless it is a trusted VF.
>>>>>
>>>>> OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
>>>>> For example I can see host just
>>>>> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
>>>>> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
>>>>
>>>> That's why I think pairing using MAC is fragile IMHO. When VF's MAC
>>>> got changed before virtio attempts to match and pair the device, it
>>>> ends up with no pairing found out at all.
>>>
>>> Guest seems to match on the hardware mac and ignore whatever
>>> is set by user. Makes sense to me and should not be fragile.
>>
>> Host can change the hardware mac for VF any time.
>
>
> Live migration is initiated and controlled by the Host,  So the Source Host
> will
> reset the MAC during live migration after unplugging the VF. This is to
> redirect the
> VMs frames towards PF so that they can be received via virtio-net standby
> interface.
> The destination host will set the VF MAC and plug the VF after live
> migration is
> completed.
>
> Allowing the guest to change the MAC will require the qemu/libvirt/mgmt
> layers to
> track the MAC changes and replay that change after live migration.
>
If the failover's MAC is kept in sync with VF's MAC address change,
the VF on destination host can be paired using the permanent address
after plugging in, while failover interface will resync the MAC to the
current one in use when enslaving the VF. I think similar is done for
multicast and unicast address list on VF's registration, right? No
need of QEMU or mgmt software keep track of MAC changes.

-Siwei

>
>
>>
>> -Siwei
>>>
>>>
>>>> UUID is better.
>>>>
>>>> -Siwei
>>>>
>>>>> --
>>>>> MST
>
>

^ permalink raw reply

* Re: [patch net-next v2 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Jiri Pirko @ 2018-06-27  7:03 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Cong Wang, Linux Kernel Network Developers, David Miller,
	Jamal Hadi Salim, Jakub Kicinski, Simon Horman, john.hurley,
	David Ahern, mlxsw
In-Reply-To: <4b2e7870-b225-3855-f3ac-183126142c1c@intel.com>

Wed, Jun 27, 2018 at 08:34:46AM CEST, sridhar.samudrala@intel.com wrote:
>
>On 6/26/2018 11:05 PM, Jiri Pirko wrote:
>> Wed, Jun 27, 2018 at 02:04:31AM CEST, xiyou.wangcong@gmail.com wrote:
>> > On Tue, Jun 26, 2018 at 1:01 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> > > Create dummy device with clsact first:
>> > > # ip link add type dummy
>> > > # tc qdisc add dev dummy0 clsact
>> > > 
>> > > There is no template assigned by default:
>> > > # tc filter template show dev dummy0 ingress
>> > > 
>> > > Add a template of type flower allowing to insert rules matching on last
>> > > 2 bytes of destination mac address:
>> > > # tc filter template add dev dummy0 ingress proto ip flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
>> > Now you are extending 'tc filter' command with a new
>> > subcommand 'template', which looks weird.
>> > 
>> > Why not make it a new property of filter like you did for chain?
>> > Like:
>> > 
>> > tc filter add dev dummy0 ingress proto ip template flower
>> But unlike chain, this is not a filter property. For chain, when you add
>> filter, you add it to a specific chain. That makes sense.
>> But for template, you need to add the template first. Then, later on,
>> you add filters which either match or does not match the template.
>
>So can we say that template defines the types of rules(match fields/masks) that
>can be added to a specific chain and there is 1-1 relationship between a template
>and a chain?

yes

>
>Without attaching a template to a chain, i guess it is possible to add different
>types of rules to a chain?

yes

>
>
>> Does not make sense to have "template" the filter property as you
>> suggest.
>
>template seems to a chain property.

yes

>
>> > which is much better IMHO.
>

^ permalink raw reply

* Re: [patch net-next 0/9] net: sched: introduce chain templates support with offloading to mlxsw
From: Jiri Pirko @ 2018-06-27  7:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux Netdev List, David Miller, Jamal Hadi Salim, Cong Wang,
	Simon Horman, John Hurley, David Ahern, mlxsw
In-Reply-To: <20180626141858.7f18730f@cakuba.netronome.com>

Tue, Jun 26, 2018 at 11:18:58PM CEST, jakub.kicinski@netronome.com wrote:
>On Tue, 26 Jun 2018 09:12:17 +0200, Jiri Pirko wrote:
>> Tue, Jun 26, 2018 at 09:00:45AM CEST, jakub.kicinski@netronome.com wrote:
>> >On Mon, Jun 25, 2018 at 11:43 PM, Jiri Pirko <jiri@resnulli.us> wrote:  
>> >> Tue, Jun 26, 2018 at 06:58:50AM CEST, jakub.kicinski@netronome.com wrote:  
>> >>>On Mon, 25 Jun 2018 23:01:39 +0200, Jiri Pirko wrote:  
>> >>>> From: Jiri Pirko <jiri@mellanox.com>
>> >>>>
>> >>>> For the TC clsact offload these days, some of HW drivers need
>> >>>> to hold a magic ball. The reason is, with the first inserted rule inside
>> >>>> HW they need to guess what fields will be used for the matching. If
>> >>>> later on this guess proves to be wrong and user adds a filter with a
>> >>>> different field to match, there's a problem. Mlxsw resolves it now with
>> >>>> couple of patterns. Those try to cover as many match fields as possible.
>> >>>> This aproach is far from optimal, both performance-wise and scale-wise.
>> >>>> Also, there is a combination of filters that in certain order won't
>> >>>> succeed.
>> >>>>
>> >>>> Most of the time, when user inserts filters in chain, he knows right away
>> >>>> how the filters are going to look like - what type and option will they
>> >>>> have. For example, he knows that he will only insert filters of type
>> >>>> flower matching destination IP address. He can specify a template that
>> >>>> would cover all the filters in the chain.  
>> >>>
>> >>>Perhaps it's lack of sleep, but this paragraph threw me a little off
>> >>>the track.  IIUC the goal of this set is to provide a way to inform the
>> >>>HW about expected matches before any rule is programmed into the HW.
>> >>>Not before any rule is added to a particular chain.  One can just use
>> >>>the first rule in the chain to make a guess about the chain, but thanks
>> >>>to this set user can configure *all* chains before any rules are added.  
>> >>
>> >> The template is per-chain. User can use template for chain x and
>> >> not-use it for chain y. Up to him.  
>> >
>> >Makes sense.
>> >
>> >I can't help but wonder if it'd be better to associate the
>> >constraints/rules with chains instead of creating a new "template"
>> >object.  It seems more natural to create a chain with specific
>> >constraints in place than add and delete template of which there can
>> >be at most one to a chain...  Perhaps that's more about the user space
>> >tc command line.  Anyway, not a strong objection, just a thought.  
>> 
>> Hmm. I don't think it is good idea. User should see the template in a
>> "show" command per chain. We would have to have 2 show commands, one to
>> list the template objects and one to list templates per chains. It makes
>> things more complicated for no good reason. I think that this simple
>> chain-lock is easier and serves the purpose.
>
>Hm, I think the dump is fine, what I was thinking about was:
>
># tc chain add dev dummy0 ingress chain_index 22 \
>     ^^^^^
>	template proto ip \
>	^^^^^^^^
>	flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF

Okay, I got it. I see 2 issues.
1) user might expect to add a chain without the template. But that does
   not make sense really. Chains are created/deleted implicitly
   according to refcount.
2) there is not chain object like this available to user. Adding it just
   for template looks odd. Also, the "filter" and "template" are very
   much alike. They both are added to a chain, they both implicitly
   create chain if it does not exist, etc.

if you don't like "tc filter template add dev dummy0 ingress", how
about:
"tc template add dev dummy0 ingress ..."
"tc template add dev dummy0 ingress chain 22 ..."
that makes more sense I think.


>
>instead of:
>
># tc filter template add dev dummy0 ingress \
>     ^^^^^^^^^^^^^^^
>	proto ip chain_index 22 \
>	flower dst_mac 00:00:00:00:00:00/00:00:00:00:FF:FF
>
>And then delete becomes:
>
># tc chain del dev dummy0 ingress chain_index 22
>Error: The chain is not empty.
>
>The fact that template is very much like a filter is sort of an
>implementation detail, from user perspective it may be more intuitive
>to model template as an attribute of the chain, not a filter object
>added to a chain.
>
>But I could well be the only person who feels that way :)

^ permalink raw reply

* Re: [PATCH net-next] net: qmi_wwan: Add pass through mode
From: Bjørn Mork @ 2018-06-27  8:00 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan
  Cc: dnlplm, dcbw, davem, netdev, Aleksander Morgado
In-Reply-To: <1530066614-24995-1-git-send-email-subashab@codeaurora.org>

Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> writes:

> Pass through mode is to allow packets in MAP format to be passed
> on to the stack. rmnet driver can be used to process and demultiplex
> these packets. Note that pass through mode can be enabled when the
> device is in raw ip mode only.

The concepte looks fine to me, but I have a few comments to the
implementation below.

First: I missed the last part of the discussion around automatic
detection of passthrough mode.  Could you give us a short summary of the
alternatives you tried and why they were dropped?

IIUC, userspace will be responsible for doing something like this to set
up a rmnet map interface:

 1) set the qmi_wwan netdev mode to raw-ip using sysfs
 2) set the qmi_wwan netdev mode to pass-through using syfs
 3) bind an rmnet netdev to the qmi_wwan netdev using netlink
 4) configure the device for raw-ip using qmi
 5) configure the device for map using qmi

It would be good to have some sanity check by the ones having to deal
with all that in userspace before carving it in stone.  Note that I'm
not looking for a commitment to actually implement anything :-)

I know Dan already is involved, so I am sure this is taken care of. But
I'm including Aleksander in the CC as well just in case he sees any
issues the rest of us fail to see.  The above procedure will probably
not scare any of you?  Most of it is due to the driver being completely
control protocol agnostic.  And I don't think the order matters much in
practice, except for the raw-ip before pass-through enforced by the
below patch?

> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
>  drivers/net/usb/qmi_wwan.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>
> diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> index 8fac8e1..6eeec92 100644
> --- a/drivers/net/usb/qmi_wwan.c
> +++ b/drivers/net/usb/qmi_wwan.c
> @@ -59,6 +59,7 @@ struct qmi_wwan_state {
>  enum qmi_wwan_flags {
>  	QMI_WWAN_FLAG_RAWIP = 1 << 0,
>  	QMI_WWAN_FLAG_MUX = 1 << 1,
> +	QMI_WWAN_FLAG_PASS_THROUGH = 1 << 2,
>  };
>  
>  enum qmi_wwan_quirks {
> @@ -425,14 +426,82 @@ static ssize_t del_mux_store(struct device *d,  struct device_attribute *attr, c
>  	return ret;
>  }
>  
> +static ssize_t pass_through_show(struct device *d,
> +				 struct device_attribute *attr,
> +				 char *buf)
> +{
> +	struct usbnet *dev = netdev_priv(to_net_dev(d));
> +	struct qmi_wwan_state *info;
> +
> +	info = (void *)&dev->data;
> +	return sprintf(buf, "%c\n",
> +		       info->flags & QMI_WWAN_FLAG_PASS_THROUGH ? 'Y' : 'N');
> +}
> +
> +static ssize_t pass_through_store(struct device *d,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t len)
> +{

I've just had a quick glance at this, but this function looks like an
almost exact copy of the raw_ip_store().  Why not share that code
instead of copying it?

And while you're at it:  There is nothing preventing us from turning on
raw-ip here instead of failing if it is off, us there?  I.e. why not
set QMI_WWAN_FLAG_RAWIP when QMI_WWAN_FLAG_PASS_THROUGH is being set,
and clear QMI_WWAN_FLAG_PASS_THROUGH when QMI_WWAN_FLAG_RAWIP is being
cleared?

> +	struct usbnet *dev = netdev_priv(to_net_dev(d));
> +	struct qmi_wwan_state *info;
> +	bool enable;
> +	int ret;
> +
> +	if (strtobool(buf, &enable))
> +		return -EINVAL;
> +
> +	info = (void *)&dev->data;
> +
> +	/* no change? */
> +	if (enable == (info->flags & QMI_WWAN_FLAG_PASS_THROUGH))
> +		return len;
> +
> +	/* pass through mode can be set for raw ip devices only */
> +	if (!(info->flags & QMI_WWAN_FLAG_RAWIP)) {
> +		netdev_err(dev->net, "Cannot set pass through mode on non ip device\n");
> +		return -EINVAL;
> +	}

You're missing the inverse relationship, aren't you?  There is nothing
preventing the user from turning off raw-ip again after setting
pass-through. 

> +
> +	if (!rtnl_trylock())
> +		return restart_syscall();
> +
> +	/* we don't want to modify a running netdev */
> +	if (netif_running(dev->net)) {
> +		netdev_err(dev->net, "Cannot change a running device\n");
> +		ret = -EBUSY;
> +		goto err;
> +	}
> +
> +	/* let other drivers deny the change */
> +	ret = call_netdevice_notifiers(NETDEV_PRE_TYPE_CHANGE, dev->net);
> +	ret = notifier_to_errno(ret);
> +	if (ret) {
> +		netdev_err(dev->net, "Type change was refused\n");
> +		goto err;
> +	}
> +
> +	if (enable)
> +		info->flags |= QMI_WWAN_FLAG_PASS_THROUGH;
> +	else
> +		info->flags &= ~QMI_WWAN_FLAG_PASS_THROUGH;
> +	qmi_wwan_netdev_setup(dev->net);
> +	call_netdevice_notifiers(NETDEV_POST_TYPE_CHANGE, dev->net);

Do we really need all the notifier stuff here? You don't change the
qmi_wwan netdev since that's already taken care of when setting
QMI_WWAN_FLAG_RAWIP.

AFAICS, QMI_WWAN_FLAG_PASS_THROUGH can be changed without any locking,
notifications or netdev state restrictions. All it does is change the
behaviour on rx, and there is no reason that can't be applied from the
next packet even on a running interface.

We could make pass_through_store just call raw_ip_store (or the part
of it which matters, factored out into a separate function), if
necessary. The rest of pass_through_store just sets or clears the flag.
There is no need to do more than that, is there?

And as noted above, raw_ip_store must also clear
QMI_WWAN_FLAG_PASS_THROUGH if clearing QMI_WWAN_FLAG_RAWIP.

> +	ret = len;
> +err:
> +	rtnl_unlock();
> +	return ret;
> +}
> +
>  static DEVICE_ATTR_RW(raw_ip);
>  static DEVICE_ATTR_RW(add_mux);
>  static DEVICE_ATTR_RW(del_mux);
> +static DEVICE_ATTR_RW(pass_through);
>  
>  static struct attribute *qmi_wwan_sysfs_attrs[] = {
>  	&dev_attr_raw_ip.attr,
>  	&dev_attr_add_mux.attr,
>  	&dev_attr_del_mux.attr,
> +	&dev_attr_pass_through.attr,
>  	NULL,
>  };
>  
> @@ -479,6 +548,11 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>  	if (info->flags & QMI_WWAN_FLAG_MUX)
>  		return qmimux_rx_fixup(dev, skb);
>  
> +	if (rawip && (info->flags & QMI_WWAN_FLAG_PASS_THROUGH)) {

There is no need testing for rawip here, since you enforce that in
pass_through_store().

> +		skb->protocol = htons(ETH_P_MAP);
> +		return (netif_rx(skb) == NET_RX_SUCCESS);
> +	}
> +
>  	switch (skb->data[0] & 0xf0) {
>  	case 0x40:
>  		proto = htons(ETH_P_IP);

Bjørn

^ permalink raw reply

* Re: [PATCH net-next] net: qmi_wwan: Add pass through mode
From: Daniele Palmas @ 2018-06-27  8:51 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: Subash Abhinov Kasiviswanathan, Dan Williams, David Miller,
	netdev, Aleksander Morgado
In-Reply-To: <87a7rg8xqv.fsf@miraculix.mork.no>

Hi Bjørn,

Il giorno mer 27 giu 2018 alle ore 10:01 Bjørn Mork <bjorn@mork.no> ha scritto:
>
> Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> writes:
>
> > Pass through mode is to allow packets in MAP format to be passed
> > on to the stack. rmnet driver can be used to process and demultiplex
> > these packets. Note that pass through mode can be enabled when the
> > device is in raw ip mode only.
>
> The concepte looks fine to me, but I have a few comments to the
> implementation below.
>
> First: I missed the last part of the discussion around automatic
> detection of passthrough mode.  Could you give us a short summary of the
> alternatives you tried and why they were dropped?
>
> IIUC, userspace will be responsible for doing something like this to set
> up a rmnet map interface:
>
>  1) set the qmi_wwan netdev mode to raw-ip using sysfs
>  2) set the qmi_wwan netdev mode to pass-through using syfs
>  3) bind an rmnet netdev to the qmi_wwan netdev using netlink
>  4) configure the device for raw-ip using qmi
>  5) configure the device for map using qmi
>
> It would be good to have some sanity check by the ones having to deal
> with all that in userspace before carving it in stone.  Note that I'm
> not looking for a commitment to actually implement anything :-)
>
> I know Dan already is involved, so I am sure this is taken care of. But
> I'm including Aleksander in the CC as well just in case he sees any
> issues the rest of us fail to see.  The above procedure will probably
> not scare any of you?  Most of it is due to the driver being completely
> control protocol agnostic.  And I don't think the order matters much in
> practice, except for the raw-ip before pass-through enforced by the
> below patch?
>

For your information this is how I'm testing (patch without the
pass_through sysfs attribute):

qmicli -d /dev/cdc-wdm0 --set-expected-data-format=raw-ip
qmicli -d /dev/cdc-wdm0
--wda-set-data-format=link-layer-protocol=raw-ip,ul-protocol=qmap,dl-protocol=qmap
ip link set wwp0s20u5i2 up

ip link add link wwp0s20u5i2 name rmnet0 type rmnet mux_id 1
ip link add link wwp0s20u5i2 name rmnet1 type rmnet mux_id 2

qmicli -d /dev/cdc-wdm0 --wds-noop --client-no-release-cid
qmicli -d /dev/cdc-wdm0
--wds-bind-mux-data-port=mux-id=1,ep-iface-number=2
--client-no-release-cid --client-cid=${ccid1}
qmicli -p -d /dev/cdc-wdm0 --wds-start-network=apn=${apn1},ip-type=4
--client-no-release-cid --client-cid=${ccid1}
qmicli -p -d /dev/cdc-wdm0 --wds-get-current-settings
--client-no-release-cid --client-cid=${ccid1}

ip addr add ${ip1}/${bitmask1} dev rmnet0
ip link set rmnet0 up

qmicli -d /dev/cdc-wdm0 --wds-noop --client-no-release-cid
qmicli -d /dev/cdc-wdm0
--wds-bind-mux-data-port=mux-id=2,ep-iface-number=2
--client-no-release-cid --client-cid=${ccid2}
qmicli -p -d /dev/cdc-wdm0 --wds-start-network=apn=${apn2},ip-type=4
--client-no-release-cid --client-cid=${ccid2}
qmicli -p -d /dev/cdc-wdm0 --wds-get-current-settings
--client-no-release-cid --client-cid=${ccid2}

ip addr add ${ip2}/${bitmask2} dev rmnet1
ip link set rmnet1 up

I think that 3) implies 1) and 2), so maybe some steps could be merged.

Regards,
Daniele

>
> > Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> > ---
> >  drivers/net/usb/qmi_wwan.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 74 insertions(+)
> >
> > diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
> > index 8fac8e1..6eeec92 100644
> > --- a/drivers/net/usb/qmi_wwan.c
> > +++ b/drivers/net/usb/qmi_wwan.c
> > @@ -59,6 +59,7 @@ struct qmi_wwan_state {
> >  enum qmi_wwan_flags {
> >       QMI_WWAN_FLAG_RAWIP = 1 << 0,
> >       QMI_WWAN_FLAG_MUX = 1 << 1,
> > +     QMI_WWAN_FLAG_PASS_THROUGH = 1 << 2,
> >  };
> >
> >  enum qmi_wwan_quirks {
> > @@ -425,14 +426,82 @@ static ssize_t del_mux_store(struct device *d,  struct device_attribute *attr, c
> >       return ret;
> >  }
> >
> > +static ssize_t pass_through_show(struct device *d,
> > +                              struct device_attribute *attr,
> > +                              char *buf)
> > +{
> > +     struct usbnet *dev = netdev_priv(to_net_dev(d));
> > +     struct qmi_wwan_state *info;
> > +
> > +     info = (void *)&dev->data;
> > +     return sprintf(buf, "%c\n",
> > +                    info->flags & QMI_WWAN_FLAG_PASS_THROUGH ? 'Y' : 'N');
> > +}
> > +
> > +static ssize_t pass_through_store(struct device *d,
> > +                               struct device_attribute *attr,
> > +                               const char *buf, size_t len)
> > +{
>
>
> I've just had a quick glance at this, but this function looks like an
> almost exact copy of the raw_ip_store().  Why not share that code
> instead of copying it?
>
> And while you're at it:  There is nothing preventing us from turning on
> raw-ip here instead of failing if it is off, us there?  I.e. why not
> set QMI_WWAN_FLAG_RAWIP when QMI_WWAN_FLAG_PASS_THROUGH is being set,
> and clear QMI_WWAN_FLAG_PASS_THROUGH when QMI_WWAN_FLAG_RAWIP is being
> cleared?
>
>
> > +     struct usbnet *dev = netdev_priv(to_net_dev(d));
> > +     struct qmi_wwan_state *info;
> > +     bool enable;
> > +     int ret;
> > +
> > +     if (strtobool(buf, &enable))
> > +             return -EINVAL;
> > +
> > +     info = (void *)&dev->data;
> > +
> > +     /* no change? */
> > +     if (enable == (info->flags & QMI_WWAN_FLAG_PASS_THROUGH))
> > +             return len;
> > +
> > +     /* pass through mode can be set for raw ip devices only */
> > +     if (!(info->flags & QMI_WWAN_FLAG_RAWIP)) {
> > +             netdev_err(dev->net, "Cannot set pass through mode on non ip device\n");
> > +             return -EINVAL;
> > +     }
>
> You're missing the inverse relationship, aren't you?  There is nothing
> preventing the user from turning off raw-ip again after setting
> pass-through.
>
> > +
> > +     if (!rtnl_trylock())
> > +             return restart_syscall();
> > +
> > +     /* we don't want to modify a running netdev */
> > +     if (netif_running(dev->net)) {
> > +             netdev_err(dev->net, "Cannot change a running device\n");
> > +             ret = -EBUSY;
> > +             goto err;
> > +     }
> > +
> > +     /* let other drivers deny the change */
> > +     ret = call_netdevice_notifiers(NETDEV_PRE_TYPE_CHANGE, dev->net);
> > +     ret = notifier_to_errno(ret);
> > +     if (ret) {
> > +             netdev_err(dev->net, "Type change was refused\n");
> > +             goto err;
> > +     }
> > +
> > +     if (enable)
> > +             info->flags |= QMI_WWAN_FLAG_PASS_THROUGH;
> > +     else
> > +             info->flags &= ~QMI_WWAN_FLAG_PASS_THROUGH;
> > +     qmi_wwan_netdev_setup(dev->net);
> > +     call_netdevice_notifiers(NETDEV_POST_TYPE_CHANGE, dev->net);
>
>
> Do we really need all the notifier stuff here? You don't change the
> qmi_wwan netdev since that's already taken care of when setting
> QMI_WWAN_FLAG_RAWIP.
>
> AFAICS, QMI_WWAN_FLAG_PASS_THROUGH can be changed without any locking,
> notifications or netdev state restrictions. All it does is change the
> behaviour on rx, and there is no reason that can't be applied from the
> next packet even on a running interface.
>
> We could make pass_through_store just call raw_ip_store (or the part
> of it which matters, factored out into a separate function), if
> necessary. The rest of pass_through_store just sets or clears the flag.
> There is no need to do more than that, is there?
>
> And as noted above, raw_ip_store must also clear
> QMI_WWAN_FLAG_PASS_THROUGH if clearing QMI_WWAN_FLAG_RAWIP.
>
>
>
> > +     ret = len;
> > +err:
> > +     rtnl_unlock();
> > +     return ret;
> > +}
> > +
> >  static DEVICE_ATTR_RW(raw_ip);
> >  static DEVICE_ATTR_RW(add_mux);
> >  static DEVICE_ATTR_RW(del_mux);
> > +static DEVICE_ATTR_RW(pass_through);
> >
> >  static struct attribute *qmi_wwan_sysfs_attrs[] = {
> >       &dev_attr_raw_ip.attr,
> >       &dev_attr_add_mux.attr,
> >       &dev_attr_del_mux.attr,
> > +     &dev_attr_pass_through.attr,
> >       NULL,
> >  };
> >
> > @@ -479,6 +548,11 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
> >       if (info->flags & QMI_WWAN_FLAG_MUX)
> >               return qmimux_rx_fixup(dev, skb);
> >
> > +     if (rawip && (info->flags & QMI_WWAN_FLAG_PASS_THROUGH)) {
>
>
> There is no need testing for rawip here, since you enforce that in
> pass_through_store().
>
>
>
> > +             skb->protocol = htons(ETH_P_MAP);
> > +             return (netif_rx(skb) == NET_RX_SUCCESS);
> > +     }
> > +
> >       switch (skb->data[0] & 0xf0) {
> >       case 0x40:
> >               proto = htons(ETH_P_IP);
>
>
>
>
>
> Bjørn

^ permalink raw reply

* Re: [PATCH bpf-next 2/7] lib: reciprocal_div: implement the improved algorithm on the paper mentioned
From: Daniel Borkmann @ 2018-06-27  8:54 UTC (permalink / raw)
  To: Jakub Kicinski, Song Liu
  Cc: Alexei Starovoitov, oss-drivers, Networking, Jiong Wang
In-Reply-To: <20180626135258.7e80bead@cakuba.netronome.com>

On 06/26/2018 10:52 PM, Jakub Kicinski wrote:
> On Mon, 25 Jun 2018 23:21:10 -0700, Song Liu wrote:
>>> +struct reciprocal_value_adv reciprocal_value_adv(u32 d, u8 prec)
>>> +{
>>> +       struct reciprocal_value_adv R;
>>> +       u32 l, post_shift;
>>> +       u64 mhigh, mlow;
>>> +
>>> +       l = fls(d - 1);
>>> +       post_shift = l;
>>> +       /* NOTE: mlow/mhigh could overflow u64 when l == 32 which means d has
>>> +        * MSB set. This case needs to be handled before calling
>>> +        * "reciprocal_value_adv", please see the comment at
>>> +        * include/linux/reciprocal_div.h.
>>> +        */  
>>
>> Shall we handle l == 32 case better? I guess the concern here is extra
>> handling may slow down the fast path? If that's the case, we should
>> at least add a WARNING on the slow path.
> 
> Agreed, I think Jiong is travelling, hence no response.  We'll respin.

Ok, since there's going to be a respin, I've tossed the current series from
patchwork in that case.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf-next] nfp: bpf: allow source ptr type be map ptr in memcpy optimization
From: Daniel Borkmann @ 2018-06-27  8:58 UTC (permalink / raw)
  To: Jakub Kicinski, alexei.starovoitov; +Cc: netdev, oss-drivers, Jiong Wang
In-Reply-To: <20180627024852.4437-1-jakub.kicinski@netronome.com>

On 06/27/2018 04:48 AM, Jakub Kicinski wrote:
> From: Jiong Wang <jiong.wang@netronome.com>
> 
> Map read has been supported on NFP, this patch enables optimization
> for memcpy from map to packet.
> 
> This patch also fixed one latent bug which will cause copying from
> unexpected address once memcpy for map pointer enabled.  The fixed
> code path was not exercised before.
> 
> Reported-by: Mary Pham <mary.pham@netronome.com>
> Reported-by: David Beckett <david.beckett@netronome.com>
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Acked-by: Song Liu <songliubraving@fb.com>

Applied to bpf-next, thanks guys!

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-27  9:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, qemu-devel, virtualization,
	Siwei Liu, Venu Busireddy, Netdev, boris.ostrovsky, aaron.f.brown,
	Joao Martins
In-Reply-To: <20180626204312-mutt-send-email-mst@kernel.org>

On Tue, 26 Jun 2018 20:50:20 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Jun 26, 2018 at 05:08:13PM +0200, Cornelia Huck wrote:
> > On Fri, 22 Jun 2018 17:05:04 -0700
> > Siwei Liu <loseweigh@gmail.com> wrote:
> >   
> > > On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:  
> > > > I suspect the diveregence will be lost on most users though
> > > > simply because they don't even care about vfio. They just
> > > > want things to go fast.    
> > > 
> > > Like Jason said, VF isn't faster than virtio-net in all cases. It
> > > depends on the workload and performance metrics: throughput, latency,
> > > or packet per second.  
> > 
> > So, will it be guest/admin-controllable then where the traffic flows
> > through? Just because we do have a vf available after negotiation of
> > the feature bit, it does not necessarily mean we want to use it? Do we
> > (the guest) even want to make it visible in that case?  
> 
> I think these ideas belong to what Alex Duyck wanted to do:
> some kind of advanced device that isn't tied to
> any network interfaces and allows workload and performance
> specific tuning.
> 
> Way out of scope for a simple failover, and more importantly,
> no one is looking at even enumerating the problems involved,
> much less solving them.

So, for simplicity's sake, we need to rely on the host admin
configuring the vm for its guest's intended use case. Sounds fair, but
probably needs a note somewhere.

^ permalink raw reply

* Re: [PATCH RESEND bpf-next v6 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events
From: Daniel Borkmann @ 2018-06-27  9:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, netdev
In-Reply-To: <152992950237.15897.9894201421576854943.stgit@alrua-kau>

On 06/25/2018 02:25 PM, Toke Høiland-Jørgensen wrote:
> Add two new helper functions to trace_helpers that supports polling
> multiple perf file descriptors for events. These are used to the XDP
> perf_event_output example, which needs to work with one perf fd per CPU.
> 
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>

Both applied to bpf-next, thanks Toke!

^ permalink raw reply

* Re: [PATCH v3] net: ethernet: stmmac: dwmac-rk: Add GMAC support for px30
From: David Miller @ 2018-06-27  9:17 UTC (permalink / raw)
  To: david.wu
  Cc: heiko, robh+dt, mark.rutland, huangtao, netdev, linux-arm-kernel,
	linux-rockchip, linux-kernel
In-Reply-To: <1530001173-21935-1-git-send-email-david.wu@rock-chips.com>

From: David Wu <david.wu@rock-chips.com>
Date: Tue, 26 Jun 2018 16:19:33 +0800

> Add constants and callback functions for the dwmac on px30 Soc.
> The base structure is the same, but registers and the bits in
> them are moved slightly, and add the clk_mac_speed for selecting
> mac speed.
> 
> Signed-off-by: David Wu <david.wu@rock-chips.com>

This patch doesn't apply to net-next.

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: Test sys_connect BPF hooks with TFO
From: Daniel Borkmann @ 2018-06-27  9:28 UTC (permalink / raw)
  To: Andrey Ignatov, netdev; +Cc: ast, kernel-team
In-Reply-To: <20180626212241.3298872-1-rdna@fb.com>

On 06/26/2018 11:22 PM, Andrey Ignatov wrote:
> TCP Fast Open is triggered by sys_sendmsg with MSG_FASTOPEN flag for
> SOCK_STREAM socket.
> 
> Even though it's sys_sendmsg, it eventually calls __inet_stream_connect
> the same way sys_connect does for TCP.  __inet_stream_connect, in turn,
> already has BPF hooks for sys_connect.
> 
> That means TFO is already covered by BPF_CGROUP_INET{4,6}_CONNECT and
> the only missing piece is selftest. The patch adds selftest for TFO.
> 
> Signed-off-by: Andrey Ignatov <rdna@fb.com>

Applied to bpf-next, thanks Andrey!

^ permalink raw reply

* [PATCH] hinic: reset irq affinity before freeing irq
From: Wei Yongjun @ 2018-06-27  9:47 UTC (permalink / raw)
  To: Aviad Krawczyk; +Cc: Wei Yongjun, netdev

Following warning is seen when rmmod hinic. This is because affinity
value is not reset before calling free_irq(). This patch fixes it.

[   55.181232] WARNING: CPU: 38 PID: 19589 at kernel/irq/manage.c:1608
__free_irq+0x2aa/0x2c0

Fixes: 352f58b0d9f2 ("net-next/hinic: Set Rxq irq to specific cpu for NUMA")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_rx.c b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
index e2e5cdc..4c0f7ed 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_rx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_rx.c
@@ -439,6 +439,7 @@ static void rx_free_irq(struct hinic_rxq *rxq)
 {
 	struct hinic_rq *rq = rxq->rq;
 
+	irq_set_affinity_hint(rq->irq, NULL);
 	free_irq(rq->irq, rxq);
 	rx_del_napi(rxq);
 }

^ permalink raw reply related

* s390x BPF JIT failures with test_bpf
From: Kleber Souza @ 2018-06-27  9:40 UTC (permalink / raw)
  To: linux-s390, netdev; +Cc: Alexei Starovoitov, Daniel Borkmann

Hi,

When I load the test_bpf module from mainline (v4.18-rc2) with
CONFIG_BPF_JIT_ALWAYS_ON=y on a s390x system I get the following errors:

test_bpf: #289 BPF_MAXINSNS: Ctx heavy transformations FAIL to
prog_create err=-524 len=4096
test_bpf: #290 BPF_MAXINSNS: Call heavy transformations FAIL to
prog_create err=-524 len=4096
[...]
test_bpf: #296 BPF_MAXINSNS: exec all MSH FAIL to prog_create err=-524
len=4096
test_bpf: #297 BPF_MAXINSNS: ld_abs+get_processor_id FAIL to prog_create
err=-524 len=4096

>From a quick look at the code it seems that
arch/s390/net/bpf_jit_comp.c:bpf_int_jit_compile() is failing to JIT
compile the test code.

Are those failures expected and could be flagged with FLAG_EXPECTED_FAIL
on lib/test_bpf.c or are those caused by some issue with the s390x JIT
compiler that needs to be fixed?

Thanks,
Kleber

^ permalink raw reply

* Re: [PATCH 00/14] ARM: davinci: step towards removing at24_platform_data
From: Bartosz Golaszewski @ 2018-06-27  9:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Sekhar Nori, Kevin Hilman, Russell King, Grygorii Strashko,
	David S . Miller, Srinivas Kandagatla, Lukas Wunner, Rob Herring,
	Florian Fainelli, Dan Carpenter, Ivan Khoronzhuk, David Lechner,
	Greg Kroah-Hartman, Linux ARM, Linux Kernel Mailing List,
	linux-omap, netdev, Bartosz Golaszewski
In-Reply-To: <CAMRc=McCk6fz5x5vJmjYkYx-i6XfGPAZPm9ctg2fuCCfkvKpLQ@mail.gmail.com>

2018-06-26 15:38 GMT+02:00 Bartosz Golaszewski <brgl@bgdev.pl>:
> 2018-06-26 15:21 GMT+02:00 Andrew Lunn <andrew@lunn.ch>:
>>> I see. I see it this way: the setup callback comes from the time when
>>> we didn't have nvmem and should go away. I will protest loud whenever
>>> someone will try to use it again and will work towards removing it as
>>> soon as possible.
>>
>> The setup() callback could be moved into the nvmem framework, rather
>> than in the at24 driver. Make the call when the cells have been
>> connected to the backing store.
>>
>
> This would at least make it more generic. And maybe I could also get
> rid of the setup callback from the mityomapl138 board file.
>
> Bart
>
>>> I will give your problem a thought and will try to get back with some
>>> proposals - maybe we should, as you suggested, extend nvmem even
>>> further to allow to remove nvmem info entries etc.
>>
>> That does not help me too much. I have the same problem with i2c and
>> MDIO. So i actually prefer to keep this the same as all others.
>>
>>         Andrew

Hi Andrew,

MTD subsystem has the struct mtd_notifier and register_mtd_user(). I'm
wondering if we could implement something like this as a generic nvmem
functionality.

What do you think?

Bart

^ permalink raw reply

* Re: [PATCH net-next v2 3/4] net: check tunnel option type in tunnel flags
From: Daniel Borkmann @ 2018-06-27  9:49 UTC (permalink / raw)
  To: Jakub Kicinski, davem, jbenc
  Cc: Roopa Prabhu, jiri, jhs, xiyou.wangcong, oss-drivers, netdev,
	Pieter Jansen van Vuuren
In-Reply-To: <20180627043937.25431-4-jakub.kicinski@netronome.com>

On 06/27/2018 06:39 AM, Jakub Kicinski wrote:
> From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
> 
> Check the tunnel option type stored in tunnel flags when creating options
> for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel
> options on interfaces that are not associated with them.
> 
> Make sure all users of the infrastructure set correct flags, for the BPF
> helper we have to set all bits to keep backward compatibility.
> 
> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> CC: Daniel Borkmann <daniel@iogearbox.net>
> 
> v2:
>  - use __be16 for dst_opt_type in net/openvswitch/flow_netlink.c (build bot).

Looks good to me, and yes in BPF case a mask like TUNNEL_OPTIONS_PRESENT is
right approach since this is opaque info and solely defined by the BPF prog
that is using the generic helper.

Thanks,
Daniel

^ permalink raw reply

* Re: s390x BPF JIT failures with test_bpf
From: Daniel Borkmann @ 2018-06-27 10:01 UTC (permalink / raw)
  To: Kleber Souza, linux-s390, netdev; +Cc: Alexei Starovoitov
In-Reply-To: <6a515d91-e831-e240-1ae6-ff0d6ead6cf2@canonical.com>

Hi Kleber,

On 06/27/2018 11:40 AM, Kleber Souza wrote:
[...]
> When I load the test_bpf module from mainline (v4.18-rc2) with
> CONFIG_BPF_JIT_ALWAYS_ON=y on a s390x system I get the following errors:
> 
> test_bpf: #289 BPF_MAXINSNS: Ctx heavy transformations FAIL to
> prog_create err=-524 len=4096
> test_bpf: #290 BPF_MAXINSNS: Call heavy transformations FAIL to
> prog_create err=-524 len=4096
> [...]
> test_bpf: #296 BPF_MAXINSNS: exec all MSH FAIL to prog_create err=-524
> len=4096
> test_bpf: #297 BPF_MAXINSNS: ld_abs+get_processor_id FAIL to prog_create
> err=-524 len=4096
> 
> From a quick look at the code it seems that
> arch/s390/net/bpf_jit_comp.c:bpf_int_jit_compile() is failing to JIT
> compile the test code.
> 
> Are those failures expected and could be flagged with FLAG_EXPECTED_FAIL
> on lib/test_bpf.c or are those caused by some issue with the s390x JIT
> compiler that needs to be fixed?

JIT doesn't guarantee in general to map really all programs to native insns,
so some, mostly crafted corner cases could fail. E.g. x86-64 JIT doesn't converge
on some programs in test_bpf.c and thus falls back to interpreter or simply
rejects the program in case of CONFIG_BPF_JIT_ALWAYS_ON=y. Above would seem
likely that it's hitting the BPF_SIZE_MAX that s390 would do. I think it might
make sense to either have the FLAG_EXPECTED_FAIL in lib/test_bpf.c more fine
grained as a flag per arch, so we could say it's expected to fail on e.g. s390
but not on x86 and the like, or just denote it as 'could potentially fail but
doesn't have to be the case everywhere'.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net-next 2/3] rds: Enable RDS IPv6 support
From: Ka-Cheong Poon @ 2018-06-27 10:07 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <20180626130844.GD20575@oracle.com>

On 06/26/2018 09:08 PM, Sowmini Varadhan wrote:
> On (06/26/18 21:02), Ka-Cheong Poon wrote:
>>
>> In this case, RFC 6724 prefers link local address as source.
> 
> the keyword is "prefers".

There is a reason for that.  It is the way folks expect
how IPv6 addresses are being used.

>> While using non-link local address (say ULA) is not forbidden,
>> doing this can easily cause inter-operability issues (does the
>> app really know that the non-link local source and the link
>> local destination addresses are really on the same link?).  I
>> think it is prudent to disallow this in RDS unless there is a
>> very clear and important reason to do so.
> 
> I remember the issues that triggered 6724. The "interop" issue
> is that when you send from Link-local to global, and need forwarding,
> it may not work.

It is not just forwarding.  The simple case is that one
picks a global address in a different link and then
use it to send to a link local address in another link.
This does not work.  And the RDS connection created will
be stuck forever.  I don't think this is a good idea to
have such stuck connections.

> but I dont think an RDS application today expects to deal with
> the case that "oh I got back and error when I tried to send to
> address X on rds socket rs1, let me go and check what I am bound
> to, and maybe create another socket, and bind it to link-local"

I don't expect RDS apps will want to use link local address
in the first place.  In fact, most normal network apps don't.

> You're not doing this for IPv4 and RDS today (you dont have to do this
> for UDP, afaik)

Do you know of any IPv4 RDS app which uses IPv4 link local
address?  In fact, IPv4 link local address is explicitly
disallowed for active active bonding.

> This is especially true if "X" is a hostname that got resovled using DNS

Can you explain why DNS name resolution will return an IPv6
link local address?  I'm surprised if it actually does.

>> BTW, if it is really > needed, it can be added in future.
> 
> shrug. You are introducing a new error return.

An error needs to be returned because it is not allowed.

-- 
K. Poon
ka-cheong.poon@oracle.com

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-27 10:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180623003545-mutt-send-email-mst@kernel.org>

On Sat, 23 Jun 2018 00:43:24 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Jun 22, 2018 at 05:09:55PM +0200, Cornelia Huck wrote:
> > Would it be more helpful to focus on generic
> > migration support for vfio instead of going about it device by device?  
> 
> Just to note this approach is actually device by device *type*.  It's
> mostly device agnostic for a given device type so you can migrate
> between hosts with very different hardware.

This enables heterogeneous environments, yes.

But one drawback of that is that you cannot exploit any hardware
specialities - it seems you're limited to what the paravirtual device
supports. This is limiting for more homogeneous environments.

> 
> And support for more PV device types has other advantages
> such as security and forward compatibility to future hosts.

But again the drawback is that we can't exploit new capabilities
easily, can we?

> 
> Finally, it all can happen mostly within QEMU. User is currently
> required to enable it but it's pretty lightweight.
> 
> OTOH vfio migration generally requires actual device-specific work, and
> only works when hosts are mostly identical. When they aren't it's easy
> to blame the user, but tools for checking host compatiblity are
> currently non-existent. Upper layer management will also have to learn
> about host and device compatibility wrt migration. At the moment they
> can't even figure it out wrt software versions of vhost in kernel and
> dpdk so I won't hold my breath for all of this happening quickly.

Yes, that's a real problem.

I think one issue here is that we want to support really different
environments. For the case here, we have a lot of different networking
adapters, but the guests are basically interested in one thing: doing
network traffic. On the other hand, I'm thinking of the mainframe
environment, where we have a very limited set of devices to support,
but at the same time want to exploit their specialities, so the pv
approach is limiting. For that use case, generic migration looks more
useful.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox