Netdev List
 help / color / mirror / Atom feed
* RE: [Intel-wired-lan] [PATCH net v2] i40e: Fix limit imprecise of the number of MAC/VLAN that can be added for VFs
From: wangyunjian @ 2017-09-29  9:13 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Jeff Kirsher, Sergei Shtylyov, Netdev, caihe,
	intel-wired-lan
In-Reply-To: <CAKgT0UcsnLUWU8qedNrv+bhbgJ6SC9zWyc7oR9gObcYo2atN5g@mail.gmail.com>



> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Thursday, September 28, 2017 11:44 PM
> To: wangyunjian <wangyunjian@huawei.com>
> Cc: David Miller <davem@davemloft.net>; Jeff Kirsher
> <jeffrey.t.kirsher@intel.com>; Sergei Shtylyov
> <sergei.shtylyov@cogentembedded.com>; Netdev
> <netdev@vger.kernel.org>; caihe <caihe@huawei.com>; intel-wired-lan
> <intel-wired-lan@lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH net v2] i40e: Fix limit imprecise of the
> number of MAC/VLAN that can be added for VFs
> 
> On Wed, Sep 27, 2017 at 7:01 PM, w00273186 <wangyunjian@huawei.com>
> wrote:
> > From: Yunjian Wang <wangyunjian@huawei.com>
> >
> > Now it doesn't limit the number of MAC/VLAN strictly. When there is more
> > elements in the virtchnl MAC/VLAN list, it can still add successfully.
> 
> You could still add but should you. I'm not clear from this patch
> description what this is supposed to be addressing. If you enable the
> "trust" flag for a VF via the "ip link set dev <iface> vf <vfnum>
> trust on" it can make use of any resources on the device, but without
> that there is an upper limit that is supposed to be enforced to
> prevent the VF from making use of an excessive amount of resources.
> That is what is being enforced by the code you are moving out of the
> way below.

I don't enable the "trust" flag for a VF. But this script can successfully add
MACs more than I40E_VC_MAX_MAC_ADDR_PER_VF(12) in VM. It has
same problem with VLAN.

Test script:

for((i=10;i<50;i++))
do
    ipmaddr add 01:00:5e:01:02:$i  dev eth0
done

for ((i=1;i<40;i++))
do
    ip link add link eth0 name eth0.$i type vlan id $i
done

> 
> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> > ---
> >  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 27 +++++++++++++--
> -------
> >  1 file changed, 16 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> > index 4d1e670..285b96a 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> > @@ -2065,11 +2065,6 @@ static inline int i40e_check_vf_permission(struct
> i40e_vf *vf, u8 *macaddr)
> >                 dev_err(&pf->pdev->dev,
> >                         "VF attempting to override administratively set MAC address,
> reload the VF driver to resume normal operation\n");
> >                 ret = -EPERM;
> > -       } else if ((vf->num_mac >= I40E_VC_MAX_MAC_ADDR_PER_VF) &&
> > -                  !test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps)) {
> > -               dev_err(&pf->pdev->dev,
> > -                       "VF is not trusted, switch the VF to trusted to add more
> functionality\n");
> > -               ret = -EPERM;
> >         }
> >         return ret;
> >  }
> > @@ -2128,6 +2123,15 @@ static int i40e_vc_add_mac_addr_msg(struct
> i40e_vf *vf, u8 *msg, u16 msglen)
> >                 } else {
> >                         vf->num_mac++;
> >                 }
> > +
> > +               if ((vf->num_mac >= I40E_VC_MAX_MAC_ADDR_PER_VF) &&
> > +                   !test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps)) {
> > +                       dev_err(&pf->pdev->dev,
> > +                               "VF is not trusted, switch the VF to trusted to add more
> functionality\n");
> > +                       ret = -EPERM;
> > +                       spin_unlock_bh(&vsi->mac_filter_hash_lock);
> > +                       goto error_param;
> > +               }
> >         }
> >         spin_unlock_bh(&vsi->mac_filter_hash_lock);
> >
> 
> This doesn't make any sense. You are doing the checks after you have
> already added the MAC. The only part you aren't doing is sending the
> message to the VF indicating that the request was successful.
> 
> > @@ -2221,12 +2225,6 @@ static int i40e_vc_add_vlan_msg(struct i40e_vf
> *vf, u8 *msg, u16 msglen)
> >         i40e_status aq_ret = 0;
> >         int i;
> >
> > -       if ((vf->num_vlan >= I40E_VC_MAX_VLAN_PER_VF) &&
> > -           !test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps)) {
> > -               dev_err(&pf->pdev->dev,
> > -                       "VF is not trusted, switch the VF to trusted to add more VLAN
> addresses\n");
> > -               goto error_param;
> > -       }
> >         if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states) ||
> >             !i40e_vc_isvalid_vsi_id(vf, vsi_id)) {
> >                 aq_ret = I40E_ERR_PARAM;
> > @@ -2269,6 +2267,13 @@ static int i40e_vc_add_vlan_msg(struct i40e_vf
> *vf, u8 *msg, u16 msglen)
> >                         dev_err(&pf->pdev->dev,
> >                                 "Unable to add VLAN filter %d for VF %d, error %d\n",
> >                                 vfl->vlan_id[i], vf->vf_id, ret);
> > +               if ((vf->num_vlan >= I40E_VC_MAX_VLAN_PER_VF) &&
> > +                   !test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps)) {
> > +                       dev_err(&pf->pdev->dev,
> > +                               "VF is not trusted, switch the VF to trusted to add more
> VLAN addresses\n");
> > +                       aq_ret = -EPERM;
> > +                       goto error_param;
> > +               }
> >         }
> >
> >  error_param:
> 
> Same here. You are doing this after the call to i40e_vsi_add_vlan. The
> code makes no sense here. This bit of code is supposed to be
> preventing a VF from abusing resources if the VF is not privelaged.

^ permalink raw reply

* Re: [patch net-next 2/7] ipv4: ipmr: Add the parent ID field to VIF struct
From: Nikolay Aleksandrov @ 2017-09-29  9:29 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, mlxsw, andrew, dsa, edumazet, willemb,
	johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-3-jiri@resnulli.us>

On 28/09/17 20:34, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> In order to allow the ipmr module to do partial multicast forwarding
> according to the device parent ID, add the device parent ID field to the
> VIF struct. This way, the forwarding path can use the parent ID field
> without invoking switchdev calls, which requires the RTNL lock.
> 
> When a new VIF is added, set the device parent ID field in it by invoking
> the switchdev_port_attr_get call.
> 
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/linux/mroute.h | 2 ++
>  net/ipv4/ipmr.c        | 9 +++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/linux/mroute.h b/include/linux/mroute.h
> index b072a84..a46577f 100644
> --- a/include/linux/mroute.h
> +++ b/include/linux/mroute.h
> @@ -57,6 +57,8 @@ static inline bool ipmr_rule_default(const struct fib_rule *rule)
>  
>  struct vif_device {
>  	struct net_device 	*dev;			/* Device we are using */
> +	struct netdev_phys_item_id dev_parent_id;	/* Device parent ID    */
> +	bool		dev_parent_id_valid;
>  	unsigned long	bytes_in,bytes_out;
>  	unsigned long	pkt_in,pkt_out;		/* Statistics 			*/
>  	unsigned long	rate_limit;		/* Traffic shaping (NI) 	*/
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 292a8e8..4566c54 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -67,6 +67,7 @@
>  #include <net/fib_rules.h>
>  #include <linux/netconf.h>
>  #include <net/nexthop.h>
> +#include <net/switchdev.h>
>  
>  struct ipmr_rule {
>  	struct fib_rule		common;
> @@ -868,6 +869,9 @@ static int vif_add(struct net *net, struct mr_table *mrt,
>  		   struct vifctl *vifc, int mrtsock)
>  {
>  	int vifi = vifc->vifc_vifi;
> +	struct switchdev_attr attr = {
> +		.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
> +	};
>  	struct vif_device *v = &mrt->vif_table[vifi];
>  	struct net_device *dev;
>  	struct in_device *in_dev;
> @@ -942,6 +946,11 @@ static int vif_add(struct net *net, struct mr_table *mrt,
>  
>  	/* Fill in the VIF structures */
>  
> +	attr.orig_dev = dev;
> +	if (!switchdev_port_attr_get(dev, &attr)) {
> +		v->dev_parent_id_valid = true;
> +		memcpy(v->dev_parent_id.id, attr.u.ppid.id, attr.u.ppid.id_len);

Hmm, shouldn't you set dev_parent_id.id_len too ? It would seem netdev_phys_item_id_same()
uses it in the comparison and without the len it would always look like they're the same
because memcmp will simply return 0 with count = 0.

> +	}
>  	v->rate_limit = vifc->vifc_rate_limit;
>  	v->local = vifc->vifc_lcl_addr.s_addr;
>  	v->remote = vifc->vifc_rmt_addr.s_addr;
> 

^ permalink raw reply

* Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Paolo Abeni @ 2017-09-29  9:37 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, netdev, jakub.kicinski,
	Michael S. Tsirkin, Jason Wang, mchan, John Fastabend,
	peter.waskiewicz.jr, Daniel Borkmann, Andy Gospodarek, edumazet
In-Reply-To: <8737760wg5.fsf@stressinduktion.org>

On Fri, 2017-09-29 at 09:56 +0200, Hannes Frederic Sowa wrote:
> [adding Paolo, Eric]
> 
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Thu, Sep 28, 2017 at 02:57:08PM +0200, Jesper Dangaard Brouer wrote:
> 
> [...]
> 
> > > +	wake_up_process(rcpu->kthread);
> > 
> > In general the whole thing looks like 'threaded NAPI' that Hannes was
> > proposing some time back. I liked it back then and I like it now.
> > I don't remember what were the objections back then.
> > Something scheduler related?
> > Adding Hannes.

Beyond the added scheduling complexity, the threaded NAPI
implementation proposed some time ago also possibly introduced OoO
packet delivery, because the NAPI threads were left unbound to any CPU.

Cheers,

Paolo

^ permalink raw reply

* Re: [net-next PATCH 1/5] bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP
From: Hannes Frederic Sowa @ 2017-09-29  9:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Alexei Starovoitov, Jesper Dangaard Brouer, netdev,
	jakub.kicinski, Michael S. Tsirkin, Jason Wang, mchan,
	John Fastabend, peter.waskiewicz.jr, Daniel Borkmann,
	Andy Gospodarek, edumazet
In-Reply-To: <1506677857.2478.5.camel@redhat.com>

Paolo Abeni <pabeni@redhat.com> writes:

> On Fri, 2017-09-29 at 09:56 +0200, Hannes Frederic Sowa wrote:
>> [adding Paolo, Eric]
>> 
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> 
>> > On Thu, Sep 28, 2017 at 02:57:08PM +0200, Jesper Dangaard Brouer wrote:
>> 
>> [...]
>> 
>> > > +	wake_up_process(rcpu->kthread);
>> > 
>> > In general the whole thing looks like 'threaded NAPI' that Hannes was
>> > proposing some time back. I liked it back then and I like it now.
>> > I don't remember what were the objections back then.
>> > Something scheduler related?
>> > Adding Hannes.
>
> Beyond the added scheduling complexity, the threaded NAPI
> implementation proposed some time ago also possibly introduced OoO
> packet delivery, because the NAPI threads were left unbound to any CPU.

Right, yes, but that can be resolved. The problem was just in that
particular patch.

^ permalink raw reply

* Re: [PATCH net-next 0/3] support changing steering policies in tuntap
From: Jason Wang @ 2017-09-29  9:41 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Michael S. Tsirkin, Network Development, LKML
In-Reply-To: <CAF=yD-KuDZK0-YUfYde=dk_6M6fgj_opuiK2VTfCKDM_=MqDcw@mail.gmail.com>



On 2017年09月29日 00:09, Willem de Bruijn wrote:
> On Thu, Sep 28, 2017 at 3:23 AM, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2017年09月28日 07:25, Willem de Bruijn wrote:
>>>>> In the future, both simple and sophisticated policy like RSS or other
>>>>> guest
>>>>> driven steering policies could be done on top.
>>>> IMHO there should be a more practical example before adding all this
>>>> indirection. And it would be nice to understand why this queue selection
>>>> needs to be tun specific.
>>> I was thinking the same and this reminds me of the various strategies
>>> implemented in packet fanout. tun_cpu_select_queue is analogous to
>>> fanout_demux_cpu though it is tun-specific in that it requires
>>> tun->numqueues.
>>
>> Right, the main idea is to introduce a way to change flow steering policy
>> for tun. I think fanout policy could be implemented through the API
>> introduced in this series. (Current flow caches based automatic steering
>> method is tun specific).
>>
>>> Fanout accrued various strategies until it gained an eBPF variant. Just
>>> supporting BPF is probably sufficient here, too.
>>
>> Technically yes, but for tun, it also serve for virt. We probably still need
>> some hard coded policy which could be changed by guest until we can accept
>> an BPF program from guest I think?
> When would a guest choose the policy? As long as this is under control
> of a host user, possibly unprivileged, allowing BPF here is moot, as any
> user can run socket filter BPF already. Programming from the guest is
> indeed different. I don't fully understand that use case.

The problem is userspace (qemu) know little about what kind of workloads 
will be done by guest, so we need guest controllable method here since 
it knows the best steering policy. Rethink about this, instead of 
passing eBPF from guest, qemu can have some pre-defined sets of polices. 
I will change the cpu id based to eBPF based in V2.

Thanks

^ permalink raw reply

* Re: [PATCH net-next RFC 0/9] net: dsa: PTP timestamping for mv88e6xxx
From: Richard Cochran @ 2017-09-29  9:43 UTC (permalink / raw)
  To: Brandon Streiff
  Cc: netdev, linux-kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot, Erik Hons
In-Reply-To: <1506612341-18061-1-git-send-email-brandon.streiff@ni.com>

Brandon,

On Thu, Sep 28, 2017 at 10:25:32AM -0500, Brandon Streiff wrote:
> - Patch #2: We expose the switch time as a PTP clock but don't support
>   adjustment (max_adj=0).

The driver should implement a cyclecounter/timecounter.

> Our platform adjusted a systemwide oscillator
>   from userspace, so we didn't need adjustment at this layer, but other
>   PTP clock drivers support this and we probably should too.

We don't currently have any way to support this kind of HW or anything
like an external VCO.  I would like to find a way to do this, but that
is a different kettle of fish as it might require changes in the PHC
subsystem.  For this driver, I think we should get it merged using the
cyclecounter/timecounter (as that will benefit lots of users) and
worry about the external oscillator later.
 
> Feedback is appreciated.

I happy to see this series.  I just finished porting an out-of-tree
PHC driver for the Marvell mv88e635x, and I want to mainline it, but I
also have a few uglies.

Unfortunately I am in the middle of a move right now, and so my review
of this series might have to wait a bit.  However, I am looking
forward to comparing notes, and then getting this support in.

Thanks,
Richard

^ permalink raw reply

* Re: [patch net-next 2/7] ipv4: ipmr: Add the parent ID field to VIF struct
From: Nikolay Aleksandrov @ 2017-09-29  9:45 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, mlxsw, andrew, dsa, edumazet, willemb,
	johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <7618b2e8-e7f9-d2c6-b13c-53aef0f50de0@cumulusnetworks.com>

On 29/09/17 12:29, Nikolay Aleksandrov wrote:
> On 28/09/17 20:34, Jiri Pirko wrote:
>> From: Yotam Gigi <yotamg@mellanox.com>
>>
>> In order to allow the ipmr module to do partial multicast forwarding
>> according to the device parent ID, add the device parent ID field to the
>> VIF struct. This way, the forwarding path can use the parent ID field
>> without invoking switchdev calls, which requires the RTNL lock.
>>
>> When a new VIF is added, set the device parent ID field in it by invoking
>> the switchdev_port_attr_get call.
>>
>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/linux/mroute.h | 2 ++
>>  net/ipv4/ipmr.c        | 9 +++++++++
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/include/linux/mroute.h b/include/linux/mroute.h
>> index b072a84..a46577f 100644
>> --- a/include/linux/mroute.h
>> +++ b/include/linux/mroute.h
>> @@ -57,6 +57,8 @@ static inline bool ipmr_rule_default(const struct fib_rule *rule)
>>  
>>  struct vif_device {
>>  	struct net_device 	*dev;			/* Device we are using */
>> +	struct netdev_phys_item_id dev_parent_id;	/* Device parent ID    */
>> +	bool		dev_parent_id_valid;
>>  	unsigned long	bytes_in,bytes_out;
>>  	unsigned long	pkt_in,pkt_out;		/* Statistics 			*/
>>  	unsigned long	rate_limit;		/* Traffic shaping (NI) 	*/
>> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
>> index 292a8e8..4566c54 100644
>> --- a/net/ipv4/ipmr.c
>> +++ b/net/ipv4/ipmr.c
>> @@ -67,6 +67,7 @@
>>  #include <net/fib_rules.h>
>>  #include <linux/netconf.h>
>>  #include <net/nexthop.h>
>> +#include <net/switchdev.h>
>>  
>>  struct ipmr_rule {
>>  	struct fib_rule		common;
>> @@ -868,6 +869,9 @@ static int vif_add(struct net *net, struct mr_table *mrt,
>>  		   struct vifctl *vifc, int mrtsock)
>>  {
>>  	int vifi = vifc->vifc_vifi;
>> +	struct switchdev_attr attr = {
>> +		.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
>> +	};
>>  	struct vif_device *v = &mrt->vif_table[vifi];
>>  	struct net_device *dev;
>>  	struct in_device *in_dev;
>> @@ -942,6 +946,11 @@ static int vif_add(struct net *net, struct mr_table *mrt,
>>  
>>  	/* Fill in the VIF structures */
>>  
>> +	attr.orig_dev = dev;
>> +	if (!switchdev_port_attr_get(dev, &attr)) {
>> +		v->dev_parent_id_valid = true;
>> +		memcpy(v->dev_parent_id.id, attr.u.ppid.id, attr.u.ppid.id_len);
> 
> Hmm, shouldn't you set dev_parent_id.id_len too ? It would seem netdev_phys_item_id_same()
> uses it in the comparison and without the len it would always look like they're the same
> because memcmp will simply return 0 with count = 0.

Also maybe we can use the non-zero id_len as a signal that it was set and drop the dev_parent_id_valid
field altogether, it would seem there's no valid reason to have id_len == 0 and yet expect a valid
parent_id.

> 
>> +	}
>>  	v->rate_limit = vifc->vifc_rate_limit;
>>  	v->local = vifc->vifc_lcl_addr.s_addr;
>>  	v->remote = vifc->vifc_rmt_addr.s_addr;
>>
> 

^ permalink raw reply

* Re: [net-next PATCH 3/5] bpf: cpumap xdp_buff to skb conversion and allocation
From: Jason Wang @ 2017-09-29  9:49 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev
  Cc: jakub.kicinski, Michael S. Tsirkin, mchan, John Fastabend,
	peter.waskiewicz.jr, Daniel Borkmann, Alexei Starovoitov,
	Andy Gospodarek
In-Reply-To: <150660343811.2808.7680200486950101509.stgit@firesoul>



On 2017年09月28日 20:57, Jesper Dangaard Brouer wrote:
> +};
> +
> +/* Convert xdp_buff to xdp_pkt */
> +static struct xdp_pkt *convert_to_xdp_pkt(struct xdp_buff *xdp)
> +{
> +	struct xdp_pkt *xdp_pkt;
> +	int headroom;
> +
> +	/* Assure headroom is available for storing info */
> +	headroom = xdp->data - xdp->data_hard_start;
> +	if (headroom < sizeof(*xdp_pkt))
> +		return NULL;

Hi Jesper:

Do you consider this as a trick or a long term solution? Is it better to 
store XDP in a circular buffer? (I'm asking since I meet similar issue 
when doing xdp_xmit for tun).

> +
> +	/* Store info in top of packet */
> +	xdp_pkt = xdp->data_hard_start;
> +
> +	xdp_pkt->data = xdp->data;
> +	xdp_pkt->len  = xdp->data_end - xdp->data;
> +	xdp_pkt->headroom = headroom - sizeof(*xdp_pkt);
> +

Is wmb() needed here?

> +	return xdp_pkt;
> +}

Thanks

^ permalink raw reply

* Re: [patch net-next 2/7] ipv4: ipmr: Add the parent ID field to VIF struct
From: Nikolay Aleksandrov @ 2017-09-29  9:50 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, mlxsw, andrew, dsa, edumazet, willemb,
	johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-3-jiri@resnulli.us>

On 28/09/17 20:34, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> In order to allow the ipmr module to do partial multicast forwarding
> according to the device parent ID, add the device parent ID field to the
> VIF struct. This way, the forwarding path can use the parent ID field
> without invoking switchdev calls, which requires the RTNL lock.
> 
> When a new VIF is added, set the device parent ID field in it by invoking
> the switchdev_port_attr_get call.
> 
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/linux/mroute.h | 2 ++
>  net/ipv4/ipmr.c        | 9 +++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/linux/mroute.h b/include/linux/mroute.h
> index b072a84..a46577f 100644
> --- a/include/linux/mroute.h
> +++ b/include/linux/mroute.h
> @@ -57,6 +57,8 @@ static inline bool ipmr_rule_default(const struct fib_rule *rule)
>  
>  struct vif_device {
>  	struct net_device 	*dev;			/* Device we are using */
> +	struct netdev_phys_item_id dev_parent_id;	/* Device parent ID    */
> +	bool		dev_parent_id_valid;
>  	unsigned long	bytes_in,bytes_out;
>  	unsigned long	pkt_in,pkt_out;		/* Statistics 			*/
>  	unsigned long	rate_limit;		/* Traffic shaping (NI) 	*/
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index 292a8e8..4566c54 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -67,6 +67,7 @@
>  #include <net/fib_rules.h>
>  #include <linux/netconf.h>
>  #include <net/nexthop.h>
> +#include <net/switchdev.h>
>  
>  struct ipmr_rule {
>  	struct fib_rule		common;
> @@ -868,6 +869,9 @@ static int vif_add(struct net *net, struct mr_table *mrt,
>  		   struct vifctl *vifc, int mrtsock)
>  {
>  	int vifi = vifc->vifc_vifi;
> +	struct switchdev_attr attr = {
> +		.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
> +	};
>  	struct vif_device *v = &mrt->vif_table[vifi];
>  	struct net_device *dev;
>  	struct in_device *in_dev;
> @@ -942,6 +946,11 @@ static int vif_add(struct net *net, struct mr_table *mrt,
>  
>  	/* Fill in the VIF structures */
>  
> +	attr.orig_dev = dev;
> +	if (!switchdev_port_attr_get(dev, &attr)) {
> +		v->dev_parent_id_valid = true;
> +		memcpy(v->dev_parent_id.id, attr.u.ppid.id, attr.u.ppid.id_len);
> +	}
>  	v->rate_limit = vifc->vifc_rate_limit;
>  	v->local = vifc->vifc_lcl_addr.s_addr;
>  	v->remote = vifc->vifc_rmt_addr.s_addr;
> 

One more thing - what happens on vif delete, then add with the same vif index of another
device that doesn't have a parent id ? I think the vif will be stuck with its parent_id
when it gets set.

^ permalink raw reply

* RE: [PATCH net-next 03/10] sctp: factor out stream->in allocation
From: David Laight @ 2017-09-29 10:04 UTC (permalink / raw)
  To: 'Marcelo Ricardo Leitner', netdev@vger.kernel.org
  Cc: linux-sctp@vger.kernel.org, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <a2d837095740147003164fe573794c140001ec71.1506536044.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner
> Sent: 28 September 2017 21:25
> Same idea as previous patch.

That needs a proper description.

	David

^ permalink raw reply

* netlink backwards compatibility in userspace tools
From: Jason A. Donenfeld @ 2017-09-29 10:22 UTC (permalink / raw)
  To: Netdev, LKML; +Cc: Daniel Kahn Gillmor

Hi guys,

One handy aspect of Netlink is that it's backwards compatible. This
means that you can run old userspace utilities on new kernels, even if
the new kernel supports new features and netlink attributes. The wire
format is stable enough that the data marshaled can be extended
without breaking compat. Neat.

I was wondering, though, what you think the best stance is toward
these old userspace utilities. What should they do if the kernel sends
it netlink attributes that it does not recognize? At the moment, I'm
doing something like this:

static void warn_unrecognized(void)
{
    static bool once = false;
    if (once)
        return;
    once = true;
    fprintf(stderr,
        "Warning: this program received from your kernel one or more\n"
        "attributes that it did not recognize. It is possible that\n"
        "this version of wg(8) is older than your kernel. You may\n"
        "want to update this program.\n");
}

This seems like a somewhat sensible warning, but then I wonder about
distributions like Debian, which has a long stable life cycle, so it
frequently has very old tools (ancient iproute2 for example). Then,
VPS providers have these Debian images run on top of newer kernels.
People in this situation would undoubtedly see the above warning a lot
and not be able to do anything about it. Not horrible, but a bit
annoying. Is this an okay annoyance? Or is it advised to just have no
warning at all? One idea would be to put it behind an environment
variable flag, but I don't like too many nobs.

I'm generally wondering about attitudes toward this kind of userspace
program behavior in response to newer kernels.

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH net] ipv6: fix net.ipv6.conf.all interface DAD handlers
From: Matteo Croce @ 2017-09-29 10:47 UTC (permalink / raw)
  To: Erik Kline; +Cc: David Miller, netdev, linux-doc
In-Reply-To: <CAAedzxrLurc=oGy64ePO30j3QQ_sL5kJCEz+RnTk3LMSn_yvyQ@mail.gmail.com>

On Thu, Sep 28, 2017 at 1:22 PM, Erik Kline <ek@google.com> wrote:
> Upon further reflection, doesn't the whole premise of this change
> means that it's no longer possible to selectively disable these
> features if they are set on "all"?  Or are we saying that this mode is
> only support with "default" enable + "ifname" disable?

Hi Erik, thanks for the review.
Yes the behaviour seems wrong when writing all.accept_dad respect what
the documentation says.

BTW the previous behaviour was not defined, I put them in OR because
that's what other handlers do, eg. send_redirects.
If you think that it's better to put them in AND we can change the
documentation accordingly.
What do you think?

-- 
Matteo Croce
per aspera ad upstream

^ permalink raw reply

* Re: [PATCH v2 net-next 0/2] net/sched: support tunnel options in cls_flower and act_tunnel_key
From: David Miller @ 2017-09-29  4:54 UTC (permalink / raw)
  To: simon.horman; +Cc: jiri, jhs, xiyou.wangcong, netdev, oss-drivers
In-Reply-To: <1506500194-17637-1-git-send-email-simon.horman@netronome.com>

From: Simon Horman <simon.horman@netronome.com>
Date: Wed, 27 Sep 2017 10:16:32 +0200

> Users of options:
> 
> * There are eBPF hooks to allow getting on and setting tunnel metadata:
>   bpf_skb_set_tunnel_opt, bpf_skb_get_tunnel_opt.
> 
> * Open vSwitch is able to match and set Geneve and VXLAN-GBP options.
> 
> Neither of the above appear to assume any structure for the data.

I really worry about this.

These metadata option blobs are internal kernel datastructure which we
could change at any point in time.  They are not exported to
userspace as a UAPI.

It's kinda OK for eBPF programs to access this stuff since they are
expected to cope with changes to internal data-structures.

But for anything user facing, this really doesn't work.

^ permalink raw reply

* Re: [PATCH net-next] net: bridge: add per-port group_fwd_mask with less restrictions
From: David Miller @ 2017-09-29  5:04 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, roopa, bridge, stephen
In-Reply-To: <1506517964-17479-1-git-send-email-nikolay@cumulusnetworks.com>

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Wed, 27 Sep 2017 16:12:44 +0300

> We need to be able to transparently forward most link-local frames via
> tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
> mask which restricts the forwarding of STP and LACP, but we need to be able
> to forward these over tunnels and control that forwarding on a per-port
> basis thus add a new per-port group_fwd_mask option which only disallows
> mac pause frames to be forwarded (they're always dropped anyway).
> The patch does not change the current default situation - all of the others
> are still restricted unless configured for forwarding.
> We have successfully tested this patch with LACP and STP forwarding over
> VxLAN and qinq tunnels.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Applied, thanks.

^ permalink raw reply

* Re: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
From: Davide Caratti @ 2017-09-29 11:14 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
	willemb, johannes.berg, pabeni, daniel, f.fainelli, fw,
	gfree.wind
In-Reply-To: <20170928173415.15551-2-jiri@resnulli.us>

hello Jiri and Yotam,

On Thu, 2017-09-28 at 19:34 +0200, Jiri Pirko wrote:
> From: Yotam Gigi <yotamg@mellanox.com>
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 19e64bf..ada8214 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -772,6 +772,7 @@ struct sk_buff {
>  	__u8			remcsum_offload:1;
>  #ifdef CONFIG_NET_SWITCHDEV
>  	__u8			offload_fwd_mark:1;
> +	__u8			offload_mr_fwd_mark:1;

I had a look at the pahole output:

$ make allyesconfig
$ make net/core/skbuff.o
$ pahole net/core/skbuff.o | grep -C7 tc_from_ingress


        __u8                       ipvs_property:1;      /*   147: 7  1 */
        __u8                       inner_protocol_type:1; /*   147: 6  1 */
        __u8                       remcsum_offload:1;    /*   147: 5  1 */
        __u8                       offload_fwd_mark:1;   /*   147: 4  1 */
        __u8                       tc_skip_classify:1;   /*   147: 3  1 */
        __u8                       tc_at_ingress:1;      /*   147: 2  1 */
        __u8                       tc_redirected:1;      /*   147: 1  1 */
        __u8                       tc_from_ingress:1;    /*   147: 0  1 */
        __u16                      tc_index;             /*   148     2 */

        /* XXX 2 bytes hole, try to pack */

        union {
                __wsum             csum;                 /*           4 */
                struct {

apparently there are no more spare bits to use at that offset: therefore,
adding 'offload_mr_fwd_mark' before 'tc_skip_classify' will make
'tc_from_ingress' slip at offset 148, and tc_index at offset 150.
I think you can use that 2-bytes hole below tc_index, and also move the
offload_fwd_mark bit there, as we use both when CONFIG_NET_SWITCHDEV is
enabled. This way we will also gain one spare bit, without changing the
struct size or worsening the cacheline alignments.

what do you think?

regards,
-- 
davide

^ permalink raw reply

* Re: [PATCH net-next] net: ipv4: remove fib_info arg to fib_check_nh
From: David Miller @ 2017-09-29  5:20 UTC (permalink / raw)
  To: dsahern; +Cc: netdev
In-Reply-To: <1506570119-17088-1-git-send-email-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Wed, 27 Sep 2017 20:41:59 -0700

> fib_check_nh does not use the fib_info arg; remove t.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: ipv4: remove fib_weight
From: David Miller @ 2017-09-29  5:20 UTC (permalink / raw)
  To: dsahern; +Cc: netdev
In-Reply-To: <1506564480-20374-1-git-send-email-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Wed, 27 Sep 2017 19:08:00 -0700

> fib_weight in fib_info is set but not used. Remove it and the
> helpers for setting it.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

^ permalink raw reply

* [PATCH net-next] net: core: decouple ifalias get/set from rtnl lock
From: Florian Westphal @ 2017-09-29 11:21 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, Florian Westphal

Device alias can be set by either rtnetlink (rtnl is held) or sysfs.

rtnetlink hold the rtnl mutex, sysfs acquires it for this purpose.
Add an extra mutex for it plus a seqcount to get a consistent snapshot
of the alias buffer.

This allows the sysfs path to not take rtnl and would later allow
to not hold it when dumping ifalias.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netdevice.h |  3 +-
 net/core/dev.c            | 70 +++++++++++++++++++++++++++++++++++++++--------
 net/core/net-sysfs.c      | 14 ++++------
 net/core/rtnetlink.c      | 13 +++++++--
 4 files changed, 77 insertions(+), 23 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1..0bcedb498f6f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1632,7 +1632,7 @@ enum netdev_priv_flags {
 struct net_device {
 	char			name[IFNAMSIZ];
 	struct hlist_node	name_hlist;
-	char 			*ifalias;
+	char			__rcu *ifalias;
 	/*
 	 *	I/O specific fields
 	 *	FIXME: Merge these and struct ifmap into one
@@ -3275,6 +3275,7 @@ void __dev_notify_flags(struct net_device *, unsigned int old_flags,
 			unsigned int gchanges);
 int dev_change_name(struct net_device *, const char *);
 int dev_set_alias(struct net_device *, const char *, size_t);
+int dev_get_alias(const struct net_device *, char *, size_t);
 int dev_change_net_namespace(struct net_device *, struct net *, const char *);
 int __dev_set_mtu(struct net_device *, int);
 int dev_set_mtu(struct net_device *, int);
diff --git a/net/core/dev.c b/net/core/dev.c
index e350c768d4b5..a7ac2902d702 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -188,6 +188,9 @@ static struct napi_struct *napi_by_id(unsigned int napi_id);
 DEFINE_RWLOCK(dev_base_lock);
 EXPORT_SYMBOL(dev_base_lock);
 
+static DEFINE_MUTEX(ifalias_mutex);
+static seqcount_t ifalias_rename_seq;
+
 /* protects napi_hash addition/deletion and napi_gen_id */
 static DEFINE_SPINLOCK(napi_hash_lock);
 
@@ -1265,29 +1268,72 @@ int dev_change_name(struct net_device *dev, const char *newname)
  */
 int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
 {
-	char *new_ifalias;
-
-	ASSERT_RTNL();
+	char *new_ifalias, *old_ifalias;
 
 	if (len >= IFALIASZ)
 		return -EINVAL;
 
+	mutex_lock(&ifalias_mutex);
+
+	old_ifalias = rcu_dereference_protected(dev->ifalias,
+						mutex_is_locked(&ifalias_mutex));
 	if (!len) {
-		kfree(dev->ifalias);
-		dev->ifalias = NULL;
-		return 0;
+		RCU_INIT_POINTER(dev->ifalias, NULL);
+		goto out;
 	}
 
-	new_ifalias = krealloc(dev->ifalias, len + 1, GFP_KERNEL);
-	if (!new_ifalias)
+	new_ifalias = __krealloc(old_ifalias, len + 1, GFP_KERNEL);
+	if (!new_ifalias) {
+		mutex_unlock(&ifalias_mutex);
 		return -ENOMEM;
-	dev->ifalias = new_ifalias;
-	memcpy(dev->ifalias, alias, len);
-	dev->ifalias[len] = 0;
+	}
 
+	if (new_ifalias == old_ifalias) {
+		write_seqcount_begin(&ifalias_rename_seq);
+		memcpy(new_ifalias, alias, len);
+		new_ifalias[len] = 0;
+		write_seqcount_end(&ifalias_rename_seq);
+		mutex_unlock(&ifalias_mutex);
+		return len;
+	}
+
+	memcpy(new_ifalias, alias, len);
+	new_ifalias[len] = 0;
+
+	rcu_assign_pointer(dev->ifalias, new_ifalias);
+out:
+	mutex_unlock(&ifalias_mutex);
+	if (old_ifalias) {
+		synchronize_net();
+		kfree(old_ifalias);
+	}
 	return len;
 }
 
+int dev_get_alias(const struct net_device *dev, char *alias, size_t len)
+{
+	unsigned int seq;
+	int ret;
+
+	for (;;) {
+		const char *name;
+
+		ret = 0;
+		rcu_read_lock();
+		name = rcu_dereference(dev->ifalias);
+		seq = raw_seqcount_begin(&ifalias_rename_seq);
+		if (name)
+			ret = snprintf(alias, len, "%s", name);
+		rcu_read_unlock();
+
+		if (!read_seqcount_retry(&ifalias_rename_seq, seq))
+			break;
+
+		cond_resched();
+	}
+
+	return ret;
+}
 
 /**
  *	netdev_features_change - device changes features
@@ -8770,6 +8816,8 @@ static int __init net_dev_init(void)
 				       NULL, dev_cpu_dead);
 	WARN_ON(rc < 0);
 	rc = 0;
+
+	seqcount_init(&ifalias_rename_seq);
 out:
 	return rc;
 }
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 927a6dcbad96..3d85119d3104 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -391,10 +391,7 @@ static ssize_t ifalias_store(struct device *dev, struct device_attribute *attr,
 	if (len >  0 && buf[len - 1] == '\n')
 		--count;
 
-	if (!rtnl_trylock())
-		return restart_syscall();
 	ret = dev_set_alias(netdev, buf, count);
-	rtnl_unlock();
 
 	return ret < 0 ? ret : len;
 }
@@ -403,13 +400,12 @@ static ssize_t ifalias_show(struct device *dev,
 			    struct device_attribute *attr, char *buf)
 {
 	const struct net_device *netdev = to_net_dev(dev);
+	char tmp[IFALIASZ];
 	ssize_t ret = 0;
 
-	if (!rtnl_trylock())
-		return restart_syscall();
-	if (netdev->ifalias)
-		ret = sprintf(buf, "%s\n", netdev->ifalias);
-	rtnl_unlock();
+	ret = dev_get_alias(netdev, tmp, sizeof(tmp));
+	if (ret > 0)
+		ret = sprintf(buf, "%s\n", tmp);
 	return ret;
 }
 static DEVICE_ATTR_RW(ifalias);
@@ -1488,7 +1484,7 @@ static void netdev_release(struct device *d)
 
 	BUG_ON(dev->reg_state != NETREG_RELEASED);
 
-	kfree(dev->ifalias);
+	kfree(rcu_access_pointer(dev->ifalias));
 	netdev_freemem(dev);
 }
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e6955da0d58d..3961f87cdc76 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1366,6 +1366,16 @@ static int nla_put_iflink(struct sk_buff *skb, const struct net_device *dev)
 	return nla_put_u32(skb, IFLA_LINK, ifindex);
 }
 
+static noinline_for_stack int nla_put_ifalias(struct sk_buff *skb,
+					      struct net_device *dev)
+{
+	char buf[IFALIASZ];
+	int ret;
+
+	ret = dev_get_alias(dev, buf, sizeof(buf));
+	return ret > 0 ? nla_put_string(skb, IFLA_IFALIAS, buf) : 0;
+}
+
 static int rtnl_fill_link_netnsid(struct sk_buff *skb,
 				  const struct net_device *dev)
 {
@@ -1425,8 +1435,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	    nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||
 	    (dev->qdisc &&
 	     nla_put_string(skb, IFLA_QDISC, dev->qdisc->ops->id)) ||
-	    (dev->ifalias &&
-	     nla_put_string(skb, IFLA_IFALIAS, dev->ifalias)) ||
+	    nla_put_ifalias(skb, dev) ||
 	    nla_put_u32(skb, IFLA_CARRIER_CHANGES,
 			atomic_read(&dev->carrier_changes)) ||
 	    nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down))
-- 
2.13.5

^ permalink raw reply related

* Re: [PATCH v3] ebtables: fix race condition in frame_filter_net_init()
From: Pablo Neira Ayuso @ 2017-09-29 11:29 UTC (permalink / raw)
  To: Artem Savkov; +Cc: Florian Westphal, netdev, linux-kernel, netfilter-devel
In-Reply-To: <20170926163545.3668-1-asavkov@redhat.com>

On Tue, Sep 26, 2017 at 06:35:45PM +0200, Artem Savkov wrote:
> It is possible for ebt_in_hook to be triggered before ebt_table is assigned
> resulting in a NULL-pointer dereference. Make sure hooks are
> registered as the last step.

Applied, thanks.

^ permalink raw reply

* RE: [patch net-next 1/7] skbuff: Add the offload_mr_fwd_mark field
From: Yuval Mintz @ 2017-09-29 11:36 UTC (permalink / raw)
  To: Davide Caratti, Jiri Pirko, netdev@vger.kernel.org
  Cc: davem@davemloft.net, Yotam Gigi, Ido Schimmel, mlxsw,
	nikolay@cumulusnetworks.com, andrew@lunn.ch,
	dsa@cumulusnetworks.com, edumazet@google.com, willemb@google.com,
	johannes.berg@intel.com, pabeni@redhat.com, daniel@iogearbox.net,
	f.fainelli@gmail.com, fw@strlen.de, gfree.wind@vip.163.com
In-Reply-To: <1506683686.2980.44.camel@redhat.com>

> hello Jiri and Yotam,
> 
> On Thu, 2017-09-28 at 19:34 +0200, Jiri Pirko wrote:
> > From: Yotam Gigi <yotamg@mellanox.com>
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 19e64bf..ada8214 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -772,6 +772,7 @@ struct sk_buff {
> >  	__u8			remcsum_offload:1;
> >  #ifdef CONFIG_NET_SWITCHDEV
> >  	__u8			offload_fwd_mark:1;
> > +	__u8			offload_mr_fwd_mark:1;
> 
> I had a look at the pahole output:
> 
> $ make allyesconfig
> $ make net/core/skbuff.o
> $ pahole net/core/skbuff.o | grep -C7 tc_from_ingress
> 
> 
>         __u8                       ipvs_property:1;      /*   147: 7  1 */
>         __u8                       inner_protocol_type:1; /*   147: 6  1 */
>         __u8                       remcsum_offload:1;    /*   147: 5  1 */
>         __u8                       offload_fwd_mark:1;   /*   147: 4  1 */
>         __u8                       tc_skip_classify:1;   /*   147: 3  1 */
>         __u8                       tc_at_ingress:1;      /*   147: 2  1 */
>         __u8                       tc_redirected:1;      /*   147: 1  1 */
>         __u8                       tc_from_ingress:1;    /*   147: 0  1 */
>         __u16                      tc_index;             /*   148     2 */
> 
>         /* XXX 2 bytes hole, try to pack */
> 
>         union {
>                 __wsum             csum;                 /*           4 */
>                 struct {
> 
> apparently there are no more spare bits to use at that offset: therefore,
> adding 'offload_mr_fwd_mark' before 'tc_skip_classify' will make
> 'tc_from_ingress' slip at offset 148, and tc_index at offset 150.
> I think you can use that 2-bytes hole below tc_index, and also move the
> offload_fwd_mark bit there, as we use both when
> CONFIG_NET_SWITCHDEV is
> enabled. This way we will also gain one spare bit, without changing the
> struct size or worsening the cacheline alignments.
> 
> what do you think?

Your pahole output still shows a 2B hole until the following union
which is 4B-aligned.
While it's true tc_index moves to offset 150, the union will not move 
[I.e., stay at offset 152] so the layout doesn't really change [greatly]
nor the size of the struct. And we have the benefit of all the bits
remaining consecutive.


^ permalink raw reply

* [PATCH iproute2] ip xfrm: use correct key length for netlink message
From: Michal Kubecek @ 2017-09-29 11:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

When SA is added manually using "ip xfrm state add", xfrm_state_modify()
uses alg_key_len field of struct xfrm_algo for the length of key passed to
kernel in the netlink message. However alg_key_len is bit length of the key
while we need byte length here. This is usually harmless as kernel ignores
the excess data but when the bit length of the key exceeds 512
(XFRM_ALGO_KEY_BUF_SIZE), it can result in buffer overflow.

We can simply divide by 8 here as the only place setting alg_key_len is in
xfrm_algo_parse() where it is always set to a multiple of 8 (and there are
already multiple places using "algo->alg_key_len / 8").

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 ip/xfrm_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 4483fb8f71d2..99fdec2325ec 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -539,7 +539,7 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
 
 				xfrm_algo_parse((void *)&alg, type, name, key,
 						buf, sizeof(alg.buf));
-				len += alg.u.alg.alg_key_len;
+				len += alg.u.alg.alg_key_len / 8;
 
 				addattr_l(&req.n, sizeof(req.buf), type,
 					  (void *)&alg, len);
-- 
2.14.2

^ permalink raw reply related

* Re: [PATCH] net: ethernet: sfc: There is a typo, modify curent to current.
From: Hao Zhang @ 2017-09-29 11:51 UTC (permalink / raw)
  To: davem, bkenward, ecree, netdev, linux-net-drivers
In-Reply-To: <20170929101937.GA8963@arx-kt>

Resend to maintainer.



Regards :)

2017-09-29 18:19 GMT+08:00 Hao Zhang <hao5781286@gmail.com>:
> There is a typo, fix it by modify curent to current.
>
> Signed-off-by: Hao Zhang <hao5781286@gmail.com>
> ---
>  drivers/net/ethernet/sfc/mcdi_pcol.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/mcdi_pcol.h b/drivers/net/ethernet/sfc/mcdi_pcol.h
> index 91fb54f..8bfc8d4 100644
> --- a/drivers/net/ethernet/sfc/mcdi_pcol.h
> +++ b/drivers/net/ethernet/sfc/mcdi_pcol.h
> @@ -11505,7 +11505,7 @@
>
>  /***********************************/
>  /* MC_CMD_GET_SNAPSHOT_LENGTH
> - * Obtain the curent range of allowable values for the SNAPSHOT_LENGTH
> + * Obtain the current range of allowable values for the SNAPSHOT_LENGTH
>   * parameter to MC_CMD_INIT_RXQ.
>   */
>  #define MC_CMD_GET_SNAPSHOT_LENGTH 0x101
> --
> 2.7.4
>

^ permalink raw reply

* Fwd: [PATCH] net: ethernet: sfc: There is a typo, modify curent to current.
From: Hao Zhang @ 2017-09-29 11:53 UTC (permalink / raw)
  To: netdev, davem
In-Reply-To: <20170929101937.GA8963@arx-kt>

Resend to maintainer.



Regards :)


---------- Forwarded message ----------
From: Hao Zhang <hao5781286@gmail.com>
Date: 2017-09-29 18:19 GMT+08:00
Subject: [PATCH] net: ethernet: sfc: There is a typo, modify curent to current.
To: linux-net-drivers@solarflare.com, ecree@solarflare.com,
bkenward@solarflare.com
抄送: linux-kernel@vger.kernel.org, hao5781286@gmail.com


There is a typo, fix it by modify curent to current.

Signed-off-by: Hao Zhang <hao5781286@gmail.com>
---
 drivers/net/ethernet/sfc/mcdi_pcol.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/mcdi_pcol.h
b/drivers/net/ethernet/sfc/mcdi_pcol.h
index 91fb54f..8bfc8d4 100644
--- a/drivers/net/ethernet/sfc/mcdi_pcol.h
+++ b/drivers/net/ethernet/sfc/mcdi_pcol.h
@@ -11505,7 +11505,7 @@

 /***********************************/
 /* MC_CMD_GET_SNAPSHOT_LENGTH
- * Obtain the curent range of allowable values for the SNAPSHOT_LENGTH
+ * Obtain the current range of allowable values for the SNAPSHOT_LENGTH
  * parameter to MC_CMD_INIT_RXQ.
  */
 #define MC_CMD_GET_SNAPSHOT_LENGTH 0x101

^ permalink raw reply related

* Re: [PATCH RFC 3/5] Add KSZ8795 switch driver
From: Andrew Lunn @ 2017-09-29 12:12 UTC (permalink / raw)
  To: David Laight
  Cc: Tristram.Ha@microchip.com, muvarov@gmail.com, pavel@ucw.cz,
	nathan.leigh.conrad@gmail.com,
	vivien.didelot@savoirfairelinux.com, f.fainelli@gmail.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Woojung.Huh@microchip.com
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD0084EDC@AcuExch.aculab.com>

On Fri, Sep 29, 2017 at 09:14:26AM +0000, David Laight wrote:
> From: Andrew Lunn
> > Sent: 28 September 2017 20:34
> ...
> > > There are 34 counters.  In normal case using generic bus I/O or PCI to read them
> > > is very quick, but the switch is mostly accessed using SPI, or even I2C.  As the SPI
> > > access is very slow.
> > 
> > How slow is it? The Marvell switches all use MDIO. It is probably a
> > bit faster than I2C, but it is a lot slower than MMIO or PCI.
> > 
> > ethtool -S lan0 takes about 25ms.
> 
> Is the SPI access software bit-banged?

That will depend on the board design. I've used mdio bit banging, and
that was painfully slow for stats.

But we should primarily think about average hardware. It is going to
have hardware SPI or I2C. If statistics reading with hardware I2C is
reasonable, i would avoid caching, and just ensure other accesses are
permitted between individual statistic reads.

It also requires Microchip also post new code. They have been very
silent for quite a while....

	  Andrew

^ permalink raw reply

* [PATCH v2 net] net: mvpp2: Fix clock resource by adding an optional bus clock
From: Gregory CLEMENT @ 2017-09-29 12:27 UTC (permalink / raw)
  To: David S. Miller, linux-kernel, netdev
  Cc: Jason Cooper, Andrew Lunn, Sebastian Hesselbarth, Gregory CLEMENT,
	Thomas Petazzoni, linux-arm-kernel, Antoine Tenart,
	Miquèl Raynal, Nadav Haklai, Shadi Ammouri, Yehuda Yitschak,
	Omri Itach, Hanna Hawa, Igal Liberman, Marcin Wojtas

On Armada 7K/8K we need to explicitly enable the bus clock. The bus clock
is optional because not all the SoCs need them but at least for Armada
7K/8K it is actually mandatory.

The binding documentation is updating accordingly.

Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
Changelog:
v1 -> v2:
 - manage the -EPROBE_DEFER case
 - fix typos in documentation
 - remove useless test before clk_disable_unprepare()

 Documentation/devicetree/bindings/net/marvell-pp2.txt | 10 ++++++----
 drivers/net/ethernet/marvell/mvpp2.c                  | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/marvell-pp2.txt b/Documentation/devicetree/bindings/net/marvell-pp2.txt
index 7e2dad08a12e..1814fa13f6ab 100644
--- a/Documentation/devicetree/bindings/net/marvell-pp2.txt
+++ b/Documentation/devicetree/bindings/net/marvell-pp2.txt
@@ -21,8 +21,9 @@ Required properties:
 	- main controller clock (for both armada-375-pp2 and armada-7k-pp2)
 	- GOP clock (for both armada-375-pp2 and armada-7k-pp2)
 	- MG clock (only for armada-7k-pp2)
-- clock-names: names of used clocks, must be "pp_clk", "gop_clk" and
-  "mg_clk" (the latter only for armada-7k-pp2).
+	- AXI clock (only for armada-7k-pp2)
+- clock-names: names of used clocks, must be "pp_clk", "gop_clk", "mg_clk"
+  and "axi_clk" (the 2 latter only for armada-7k-pp2).
 
 The ethernet ports are represented by subnodes. At least one port is
 required.
@@ -78,8 +79,9 @@ Example for marvell,armada-7k-pp2:
 cpm_ethernet: ethernet@0 {
 	compatible = "marvell,armada-7k-pp22";
 	reg = <0x0 0x100000>, <0x129000 0xb000>;
-	clocks = <&cpm_syscon0 1 3>, <&cpm_syscon0 1 9>, <&cpm_syscon0 1 5>;
-	clock-names = "pp_clk", "gop_clk", "gp_clk";
+	clocks = <&cpm_syscon0 1 3>, <&cpm_syscon0 1 9>,
+		 <&cpm_syscon0 1 5>, <&cpm_syscon0 1 18>;
+	clock-names = "pp_clk", "gop_clk", "gp_clk", "axi_clk";
 
 	eth0: eth0 {
 		interrupts = <ICU_GRP_NSR 39 IRQ_TYPE_LEVEL_HIGH>,
diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index dd0ee2691c86..f2656112986b 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -792,6 +792,7 @@ struct mvpp2 {
 	struct clk *pp_clk;
 	struct clk *gop_clk;
 	struct clk *mg_clk;
+	struct clk *axi_clk;
 
 	/* List of pointers to port structures */
 	struct mvpp2_port **port_list;
@@ -7963,6 +7964,18 @@ static int mvpp2_probe(struct platform_device *pdev)
 		err = clk_prepare_enable(priv->mg_clk);
 		if (err < 0)
 			goto err_gop_clk;
+
+		priv->axi_clk = devm_clk_get(&pdev->dev, "axi_clk");
+		if (IS_ERR(priv->axi_clk)) {
+			err = PTR_ERR(priv->axi_clk);
+			if (err == -EPROBE_DEFER)
+				goto err_gop_clk;
+			priv->axi_clk = NULL;
+		} else {
+			err = clk_prepare_enable(priv->axi_clk);
+			if (err < 0)
+				goto err_gop_clk;
+		}
 	}
 
 	/* Get system's tclk rate */
@@ -8015,6 +8028,7 @@ static int mvpp2_probe(struct platform_device *pdev)
 	return 0;
 
 err_mg_clk:
+	clk_disable_unprepare(priv->axi_clk);
 	if (priv->hw_version == MVPP22)
 		clk_disable_unprepare(priv->mg_clk);
 err_gop_clk:
@@ -8052,6 +8066,7 @@ static int mvpp2_remove(struct platform_device *pdev)
 				  aggr_txq->descs_dma);
 	}
 
+	clk_disable_unprepare(priv->axi_clk);
 	clk_disable_unprepare(priv->mg_clk);
 	clk_disable_unprepare(priv->pp_clk);
 	clk_disable_unprepare(priv->gop_clk);
-- 
2.14.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox