Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e
From: Jiri Pirko @ 2017-10-11 20:38 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Amritha Nambiar, intel-wired-lan, Jeff Kirsher,
	Duyck, Alexander H, Netdev, Jamal Hadi Salim, Cong Wang
In-Reply-To: <CAKgT0UcP8refj2KLMT258xPNS2h3KhLqKD5sJkNLXRdNJrRCVg@mail.gmail.com>

Wed, Oct 11, 2017 at 07:46:27PM CEST, alexander.duyck@gmail.com wrote:
>On Wed, Oct 11, 2017 at 5:56 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Wed, Oct 11, 2017 at 02:24:12AM CEST, amritha.nambiar@intel.com wrote:
>>>This patch series enables configuring cloud filters in i40e
>>>using the tc-flower classifier. The classification function
>>>of the filter is to match a packet to a class. cls_flower is
>>>extended to offload classid to hardware. The offloaded classid
>>>is used direct matched packets to a traffic class on the device.
>>>The approach here is similar to the tc 'prio' qdisc which uses
>>>the classid for band selection. The ingress qdisc is called ffff:0,
>>>so traffic classes are ffff:1 to ffff:8 (i40e has max of 8 TCs).
>>
>>
>> NACK. This clearly looks like abuse of classid to something
>> else. Classid is here to identify qdisc instance. However, you use it
>> for hw tclass identification. This is mixing of apples and oranges.
>>
>> Why?
>>
>> Please don't try to abuse things! This is not nice.
>
>This isn't an abuse. This is reproducing in hardware what is already
>the behavior for software. Isn't that how offloads are supposed to
>work?

What is meaning of classid in HW? Classid is SW only identification of
qdisc instances. No relation to HW instances = abuse.


>
>This is exactly how prio currently handles this. We are essentially
>doing the exact same thing in the hardware where we are choosing a
>queueing group based on the class ID. You could setup a prio qdisc. If
>you are offloading a qdisc behavior into hardware how are you supposed
>to emulate the behavior if you aren't allowing the offload to use the
>same mechanism?

I believe that the correct way to solve is to introduce SETTCLASS
termination action (similar to TRAP, GOTO_CHAIN, etc), which would allow
user to instruct the destination tclass and driver can offload it.

^ permalink raw reply

* Re: [PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update
From: Jiri Pirko @ 2017-10-11 20:41 UTC (permalink / raw)
  To: Cong Wang
  Cc: Manish Kurup, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers, LKML, Alexander Aring,
	Roman Mashak, manish.kurup
In-Reply-To: <CAM_iQpUYGUzvvM3Weaw5ertgVpwn0p9j6B5xd5Ar5ysJwrdVvg@mail.gmail.com>

Wed, Oct 11, 2017 at 06:27:07PM CEST, xiyou.wangcong@gmail.com wrote:
>On Tue, Oct 10, 2017 at 7:33 PM, Manish Kurup <kurup.manish@gmail.com> wrote:

[...]


>> @@ -187,16 +196,33 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
>>
>>         v = to_vlan(*a);
>>
>> -       spin_lock_bh(&v->tcf_lock);
>> -
>> -       v->tcfv_action = action;
>> -       v->tcfv_push_vid = push_vid;
>> -       v->tcfv_push_prio = push_prio;
>> -       v->tcfv_push_proto = push_proto;
>> +       ASSERT_RTNL();
>> +       p = kzalloc(sizeof(*p), GFP_KERNEL);
>> +       if (unlikely(!p)) {
>> +               if (ovr)
>> +                       tcf_idr_release(*a, bind);
>> +               return -ENOMEM;
>> +       }
>>
>>         v->tcf_action = parm->action;
>>
>> -       spin_unlock_bh(&v->tcf_lock);
>> +       p_old = rtnl_dereference(v->vlan_p);
>> +
>> +       if (ovr)
>> +               spin_lock_bh(&v->tcf_lock);
>
>Why still take spinlock when you already have RTNL lock?
>What's the point?

Yeah, I believe this is copy&paste bug from act_skbmod

^ permalink raw reply

* Re: [PATCH net-next] tcp: fix tcp_unlink_write_queue()
From: David Miller @ 2017-10-11 20:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ynorov, netdev, catalin.marinas, weiwan, ncardwell
In-Reply-To: <1507753649.31614.36.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 11 Oct 2017 13:27:29 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> Yury reported crash with this signature :
> 
> [  554.034021] [<ffff80003ccd5a58>] 0xffff80003ccd5a58
> [  554.034156] [<ffff00000888fd34>] skb_release_all+0x14/0x30
> [  554.034288] [<ffff00000888fd64>] __kfree_skb+0x14/0x28
> [  554.034409] [<ffff0000088ece6c>] tcp_sendmsg_locked+0x4dc/0xcc8
> [  554.034541] [<ffff0000088ed68c>] tcp_sendmsg+0x34/0x58
> [  554.034659] [<ffff000008919fd4>] inet_sendmsg+0x2c/0xf8
> [  554.034783] [<ffff0000088842e8>] sock_sendmsg+0x18/0x30
> [  554.034928] [<ffff0000088861fc>] SyS_sendto+0x84/0xf8
> 
> Problem is that skb->destructor contains garbage, and this is
> because I accidentally removed tcp_skb_tsorted_anchor_cleanup()
> from tcp_unlink_write_queue()
> 
> This would trigger with a write(fd, <invalid_memory>, len) attempt,
> and we will add to packetdrill this capability to avoid future
> regressions.
> 
> Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue")
> Reported-by: Yury Norov <ynorov@caviumnetworks.com>
> Tested-by: Yury Norov <ynorov@caviumnetworks.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next v3 5/5] selinux: bpf: Add addtional check for bpf object file receive
From: Chenbo Feng @ 2017-10-11 20:43 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Chenbo Feng, linux-security-module, netdev, SELinux,
	Daniel Borkmann, Alexei Starovoitov, Lorenzo Colitti
In-Reply-To: <1507726470.15898.3.camel@tycho.nsa.gov>

On Wed, Oct 11, 2017 at 5:54 AM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> On Tue, 2017-10-10 at 17:09 -0700, Chenbo Feng wrote:
>> From: Chenbo Feng <fengc@google.com>
>>
>> Introduce a bpf object related check when sending and receiving files
>> through unix domain socket as well as binder. It checks if the
>> receiving
>> process have privilege to read/write the bpf map or use the bpf
>> program.
>> This check is necessary because the bpf maps and programs are using a
>> anonymous inode as their shared inode so the normal way of checking
>> the
>> files and sockets when passing between processes cannot work properly
>> on
>> eBPF object. This check only works when the BPF_SYSCALL is
>> configured.
>> The information stored inside the file security struct is the same as
>> the information in bpf object security struct.
>>
>> Signed-off-by: Chenbo Feng <fengc@google.com>
>> ---
>>  include/linux/lsm_hooks.h         | 17 ++++++++++
>>  include/linux/security.h          |  9 ++++++
>>  kernel/bpf/syscall.c              | 27 ++++++++++++++--
>>  security/security.c               |  8 +++++
>>  security/selinux/hooks.c          | 67
>> +++++++++++++++++++++++++++++++++++++++
>>  security/selinux/include/objsec.h |  9 ++++++
>>  6 files changed, 135 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>> index 7161d8e7ee79..517dea60b87b 100644
>> --- a/include/linux/lsm_hooks.h
>> +++ b/include/linux/lsm_hooks.h
>> @@ -1385,6 +1385,19 @@
>>   * @bpf_prog_free_security:
>>   *   Clean up the security information stored inside bpf prog.
>>   *
>> + * @bpf_map_file:
>> + *   When creating a bpf map fd, set up the file security
>> information with
>> + *   the bpf security information stored in the map struct. So
>> when the map
>> + *   fd is passed between processes, the security module can
>> directly read
>> + *   the security information from file security struct rather
>> than the bpf
>> + *   security struct.
>> + *
>> + * @bpf_prog_file:
>> + *   When creating a bpf prog fd, set up the file security
>> information with
>> + *   the bpf security information stored in the prog struct. So
>> when the prog
>> + *   fd is passed between processes, the security module can
>> directly read
>> + *   the security information from file security struct rather
>> than the bpf
>> + *   security struct.
>>   */
>>  union security_list_options {
>>       int (*binder_set_context_mgr)(struct task_struct *mgr);
>> @@ -1726,6 +1739,8 @@ union security_list_options {
>>       void (*bpf_map_free_security)(struct bpf_map *map);
>>       int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
>>       void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
>> +     void (*bpf_map_file)(struct bpf_map *map, struct file
>> *file);
>> +     void (*bpf_prog_file)(struct bpf_prog_aux *aux, struct file
>> *file);
>>  #endif /* CONFIG_BPF_SYSCALL */
>>  };
>>
>> @@ -1954,6 +1969,8 @@ struct security_hook_heads {
>>       struct list_head bpf_map_free_security;
>>       struct list_head bpf_prog_alloc_security;
>>       struct list_head bpf_prog_free_security;
>> +     struct list_head bpf_map_file;
>> +     struct list_head bpf_prog_file;
>>  #endif /* CONFIG_BPF_SYSCALL */
>>  } __randomize_layout;
>>
>> diff --git a/include/linux/security.h b/include/linux/security.h
>> index 18800b0911e5..57573b794e2d 100644
>> --- a/include/linux/security.h
>> +++ b/include/linux/security.h
>> @@ -1740,6 +1740,8 @@ extern int security_bpf_map_alloc(struct
>> bpf_map *map);
>>  extern void security_bpf_map_free(struct bpf_map *map);
>>  extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
>>  extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
>> +extern void security_bpf_map_file(struct bpf_map *map, struct file
>> *file);
>> +extern void security_bpf_prog_file(struct bpf_prog_aux *aux, struct
>> file *file);
>>  #else
>>  static inline int security_bpf(int cmd, union bpf_attr *attr,
>>                                            unsigned int size)
>> @@ -1772,6 +1774,13 @@ static inline int
>> security_bpf_prog_alloc(struct bpf_prog_aux *aux)
>>
>>  static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
>>  { }
>> +
>> +static inline void security_bpf_map_file(struct bpf_map *map, struct
>> file *file)
>> +{ }
>> +
>> +static inline void security_bpf_prog_file(struct bpf_prog_aux *aux,
>> +                                       struct file *file)
>> +{ }
>>  #endif /* CONFIG_SECURITY */
>>  #endif /* CONFIG_BPF_SYSCALL */
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 1cf31ddd7616..aee69e564c50 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -324,11 +324,22 @@ static const struct file_operations
>> bpf_map_fops = {
>>
>>  int bpf_map_new_fd(struct bpf_map *map, int flags)
>>  {
>> +     int fd;
>> +     struct fd f;
>>       if (security_bpf_map(map, OPEN_FMODE(flags)))
>>               return -EPERM;
>>
>> -     return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
>> +     fd = anon_inode_getfd("bpf-map", &bpf_map_fops, map,
>>                               flags | O_CLOEXEC);
>> +     if (fd < 0)
>> +             return fd;
>> +
>> +     f = fdget(fd);
>> +     if (!f.file)
>> +             return -EBADF;
>
> This seems convoluted and unnecessarily inefficient, since
> anon_inode_getfd() has the struct file and could have directly returned
> it instead of having to go through fdget() on a fd we just installed.
> Also, couldn't the fd->file mapping have changed underneath us between
> fd_install() and fdget()?
> I would think it would be safer and more efficient to create an
> anon_inode_getfdandfile() or similar interface and use that, so that we
> can just pass the file it set up to the hook.  Obviously that would
> need to be reviewed by the vfs folks.
>
Do you mean create a anonymous inode interface specifically for eBPF
object? Is it okay that we add the hooks inside anon_inode_getfd and
pass the file to the hook before fd install.
>> +     security_bpf_map_file(map, f.file);
>> +     fdput(f);
>> +     return fd;
>>  }
>>
>>  int bpf_get_file_flag(int flags)
>> @@ -975,11 +986,23 @@ static const struct file_operations
>> bpf_prog_fops = {
>>
>>  int bpf_prog_new_fd(struct bpf_prog *prog)
>>  {
>> +     int fd;
>> +     struct fd f;
>> +
>>       if (security_bpf_prog(prog))
>>               return -EPERM;
>>
>> -     return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
>> +     fd =  anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
>>                               O_RDWR | O_CLOEXEC);
>> +     if (fd < 0)
>> +             return fd;
>> +
>> +     f = fdget(fd);
>> +     if (!f.file)
>> +             return -EBADF;
>> +     security_bpf_prog_file(prog->aux, f.file);
>> +     fdput(f);
>> +     return fd;
>>  }
>>
>>  static struct bpf_prog *____bpf_prog_get(struct fd f)
>> diff --git a/security/security.c b/security/security.c
>> index 1cd8526cb0b7..dacf649b8cfa 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -1734,4 +1734,12 @@ void security_bpf_prog_free(struct
>> bpf_prog_aux *aux)
>>  {
>>       call_void_hook(bpf_prog_free_security, aux);
>>  }
>> +void security_bpf_map_file(struct bpf_map *map, struct file *file)
>> +{
>> +     call_void_hook(bpf_map_file, map, file);
>> +}
>> +void security_bpf_prog_file(struct bpf_prog_aux *aux, struct file
>> *file)
>> +{
>> +     call_void_hook(bpf_prog_file, aux, file);
>> +}
>>  #endif /* CONFIG_BPF_SYSCALL */
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 94e473b9c884..0a6ef20513b0 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -1815,6 +1815,10 @@ static inline int file_path_has_perm(const
>> struct cred *cred,
>>       return inode_has_perm(cred, file_inode(file), av, &ad);
>>  }
>>
>> +#ifdef CONFIG_BPF_SYSCALL
>> +static int bpf_file_check(struct file *file, u32 sid);
>> +#endif
>> +
>>  /* Check whether a task can use an open file descriptor to
>>     access an inode in a given way.  Check access to the
>>     descriptor itself, and then use dentry_has_perm to
>> @@ -1845,6 +1849,14 @@ static int file_has_perm(const struct cred
>> *cred,
>>                       goto out;
>>       }
>>
>> +#ifdef CONFIG_BPF_SYSCALL
>> +     if (fsec->bpf_type) {
>> +             rc = bpf_file_check(file, cred_sid(cred));
>> +             if (rc)
>> +                     goto out;
>> +     }
>> +#endif
>> +
>>       /* av is zero if only checking access to the descriptor. */
>>       rc = 0;
>>       if (av)
>> @@ -2165,6 +2177,14 @@ static int selinux_binder_transfer_file(struct
>> task_struct *from,
>>                       return rc;
>>       }
>>
>> +#ifdef CONFIG_BPF_SYSCALL
>> +     if (fsec->bpf_type) {
>> +             rc = bpf_file_check(file, sid);
>> +             if (rc)
>> +                     return rc;
>> +     }
>> +#endif
>> +
>>       if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
>>               return 0;
>>
>> @@ -6288,6 +6308,33 @@ static u32 bpf_map_fmode_to_av(fmode_t fmode)
>>       return av;
>>  }
>>
>> +/* This function will check the file pass through unix socket or
>> binder to see
>> + * if it is a bpf related object. And apply correspinding checks on
>> the bpf
>> + * object based on the type. The bpf maps and programs, not like
>> other files and
>> + * socket, are using a shared anonymous inode inside the kernel as
>> their inode.
>> + * So checking that inode cannot identify if the process have
>> privilege to
>> + * access the bpf object and that's why we have to add this
>> additional check in
>> + * selinux_file_receive and selinux_binder_transfer_files.
>> + */
>> +static int bpf_file_check(struct file *file, u32 sid)
>> +{
>> +     struct file_security_struct *fsec = file->f_security;
>> +     int ret;
>> +
>> +     if (fsec->bpf_type == BPF_MAP) {
>> +             ret = avc_has_perm(sid, fsec->bpf_sid, SECCLASS_BPF,
>> +                                bpf_map_fmode_to_av(file-
>> >f_mode), NULL);
>> +             if (ret)
>> +                     return ret;
>> +     } else if (fsec->bpf_type == BPF_PROG) {
>> +             ret = avc_has_perm(sid, fsec->bpf_sid, SECCLASS_BPF,
>> +                                BPF__PROG_USE, NULL);
>> +             if (ret)
>> +                     return ret;
>> +     }
>> +     return 0;
>> +}
>> +
>>  static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
>>  {
>>       u32 sid = current_sid();
>> @@ -6351,6 +6398,24 @@ static void selinux_bpf_prog_free(struct
>> bpf_prog_aux *aux)
>>       aux->security = NULL;
>>       kfree(bpfsec);
>>  }
>> +
>> +static void selinux_bpf_map_file(struct bpf_map *map, struct file
>> *file)
>> +{
>> +     struct bpf_security_struct *bpfsec = map->security;
>> +     struct file_security_struct *fsec = file->f_security;
>> +
>> +     fsec->bpf_type = BPF_MAP;
>> +     fsec->bpf_sid = bpfsec->sid;
>> +}
>> +
>> +static void selinux_bpf_prog_file(struct bpf_prog_aux *aux, struct
>> file *file)
>> +{
>> +     struct bpf_security_struct *bpfsec = aux->security;
>> +     struct file_security_struct *fsec = file->f_security;
>> +
>> +     fsec->bpf_type = BPF_PROG;
>> +     fsec->bpf_sid = bpfsec->sid;
>> +}
>>  #endif
>>
>>  static struct security_hook_list selinux_hooks[] __lsm_ro_after_init
>> = {
>> @@ -6581,6 +6646,8 @@ static struct security_hook_list
>> selinux_hooks[] __lsm_ro_after_init = {
>>       LSM_HOOK_INIT(bpf_prog_alloc_security,
>> selinux_bpf_prog_alloc),
>>       LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
>>       LSM_HOOK_INIT(bpf_prog_free_security,
>> selinux_bpf_prog_free),
>> +     LSM_HOOK_INIT(bpf_map_file, selinux_bpf_map_file),
>> +     LSM_HOOK_INIT(bpf_prog_file, selinux_bpf_prog_file),
>>  #endif
>>  };
>>
>> diff --git a/security/selinux/include/objsec.h
>> b/security/selinux/include/objsec.h
>> index 3d54468ce334..0162648761f9 100644
>> --- a/security/selinux/include/objsec.h
>> +++ b/security/selinux/include/objsec.h
>> @@ -67,11 +67,20 @@ struct inode_security_struct {
>>       spinlock_t lock;
>>  };
>>
>> +enum bpf_obj_type {
>> +     BPF_MAP = 1,
>> +     BPF_PROG,
>> +};
>> +
>>  struct file_security_struct {
>>       u32 sid;                /* SID of open file description */
>>       u32 fown_sid;           /* SID of file owner (for
>> SIGIO) */
>>       u32 isid;               /* SID of inode at the time of file
>> open */
>>       u32 pseqno;             /* Policy seqno at the time of
>> file open */
>> +#ifdef CONFIG_BPF_SYSCALL
>> +     unsigned char bpf_type;
>> +     u32 bpf_sid;
>> +#endif
>>  };
>
> Can you check how this impacts the size of the file_security_cache
> objects, and thus the memory overhead imposed on all open files?
>
> If it is significant, do we need to cache the bpf_sid here or could we
> just store the bpf_type and then dereference the bpfsec if it is a map
> or prog?
>
>From proc/slabinfo I find the number of object and the object size
grows a lot after adding this two field. I will try to dereference the
bpfsec instead to see if it helps.
>>
>>  struct superblock_security_struct {

^ permalink raw reply

* Re: [patch net-next 1/4] net: sched: make tc_action_ops->get_dev return dev and avoid passing net
From: Jiri Pirko @ 2017-10-11 20:43 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Saeed Mahameed, matanb, leonro, mlxsw
In-Reply-To: <CAM_iQpU5MP546FjwdD_PtOUFiAKb_TiRBEY3b+Kt9mVcCY7MTQ@mail.gmail.com>

Wed, Oct 11, 2017 at 06:34:51PM CEST, xiyou.wangcong@gmail.com wrote:
>On Tue, Oct 10, 2017 at 2:19 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Oct 10, 2017 at 07:44:53PM CEST, xiyou.wangcong@gmail.com wrote:
>>>On Tue, Oct 10, 2017 at 12:30 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> -static int tcf_mirred_device(const struct tc_action *a, struct net *net,
>>>> -                            struct net_device **mirred_dev)
>>>> +static struct net_device *tcf_mirred_get_dev(const struct tc_action *a)
>>>>  {
>>>> -       int ifindex = tcf_mirred_ifindex(a);
>>>> +       struct tcf_mirred *m = to_mirred(a);
>>>>
>>>> -       *mirred_dev = __dev_get_by_index(net, ifindex);
>>>> -       if (!*mirred_dev)
>>>> -               return -EINVAL;
>>>> -       return 0;
>>>> +       return __dev_get_by_index(m->net, m->tcfm_ifindex);
>>>
>>>Hmm, why not just return m->tcfm_dev?
>>
>> I just follow the existing code. The change you suggest should be a
>> separate follow-up patch.
>
>Why?

I try to do small contained changes per patch. The resulting code is
doing the same thing as the original, therefore reducing possible bug
appearance. 

>
>Your goal is "make tc_action_ops->get_dev return dev and avoid passing net",
>using m->tcfm_dev is simpler and could save you from adding a net pointer
>to struct tcf_mirred too.

^ permalink raw reply

* Re: Ethtool question
From: John W. Linville @ 2017-10-11 20:44 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <492e57b2-ac0a-ff01-2698-e048e97d8e37@candelatech.com>

On Wed, Oct 11, 2017 at 09:51:56AM -0700, Ben Greear wrote:
> I noticed today that setting some ethtool settings to the same value
> returns an error code.  I would think this should silently return
> success instead?  Makes it easier to call it from scripts this way:
> 
> [root@lf0313-6477 lanforge]# ethtool -L eth3 combined 1
> combined unmodified, ignoring
> no channel parameters changed, aborting
> current values: tx 0 rx 0 other 1 combined 1
> [root@lf0313-6477 lanforge]# echo $?
> 1

I just had this discussion a couple of months ago with someone. My
initial feeling was like you, a no-op is not a failure. But someone
convinced me otherwise...I will now endeavour to remember who that
was and how they convinced me...

Anyone else have input here?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH v1 RFC 7/7] Modify tag_ksz.c so that tail tag code can be used by other KSZ switch drivers
From: Pavel Machek @ 2017-10-11 20:45 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Andrew Lunn, Florian Fainelli, Ruediger Schmitt, muvarov,
	nathan.leigh.conrad, vivien.didelot, UNGLinuxDriver, netdev,
	linux-kernel
In-Reply-To: <1507321985-15097-8-git-send-email-Tristram.Ha@microchip.com>

[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]

Hi!

> +#define	KSZ_INGRESS_TAG_LEN		1

This define is now (or should be) unused, so you can delete it, no?

> _#define	KSZ_EGRESS_TAG_LEN	1

And I'd delete this define, too. Having constant for something that's
variable is quite confusing :-).

Plus you are really doing too much inside single patch.

> + * For Egress (KSZ9477 -> Host), 1 byte is added before FCS.
> + * ---------------------------------------------------------------------------
> + * DA(6bytes)|SA(6bytes)|....|Data(nbytes)|tag0(1byte)|FCS(4bytes)
> + * ---------------------------------------------------------------------------
> + * tag0 : zero-based value represents port
> + *	  (eg, 0x00=port1, 0x02=port3, 0x06=port7)
> + */
> +
> +#define KSZ9477_INGRESS_TAG_LEN		2
> +#define KSZ9477_PTP_TAG_LEN		4
> +#define KSZ9477_PTP_TAG_INDICATION	0x80
> +
> +#define KSZ9477_TAIL_TAG_OVERRIDE	BIT(9)
> +#define KSZ9477_TAIL_TAG_LOOKUP		BIT(10)
> +
> +static int ksz9477_get_tag(u8 *tag, int *port)
> +{
> +	int len = KSZ_EGRESS_TAG_LEN;
> +
> +	/* Extra 4-bytes PTP timestamp */
> +	if (tag[0] & KSZ9477_PTP_TAG_INDICATION)
> +		len += KSZ9477_PTP_TAG_LEN;
> +	*port = tag[0] & 7;
> +	return len;
> +}
> +
> +static void ksz9477_set_tag(void *ptr, u8 *addr, int p)
> +{
> +	u16 *tag = (u16 *)ptr;
> +
> +	*tag = 1 << p;
> +	if (!memcmp(addr, special_mult_addr, ETH_ALEN))
> +		*tag |= KSZ9477_TAIL_TAG_OVERRIDE;
> +	*tag = cpu_to_be16(*tag);
> +}

These are new features that were not there before, right?
								Pavel
								
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH RFC] Add other KSZ switch support so that patch check does not complain
From: Pavel Machek @ 2017-10-11 20:46 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Andrew Lunn, Florian Fainelli, Ruediger Schmitt, muvarov,
	nathan.leigh.conrad, vivien.didelot, UNGLinuxDriver, netdev,
	linux-kernel
In-Reply-To: <1507322009-15142-1-git-send-email-Tristram.Ha@microchip.com>

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

On Fri 2017-10-06 13:33:29, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Add other KSZ switch support so that patch check does not complain.
> 
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>

Reviewed-by: Pavel Machek <pavel@ucw.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e
From: David Miller @ 2017-10-11 20:46 UTC (permalink / raw)
  To: jiri
  Cc: alexander.duyck, amritha.nambiar, intel-wired-lan,
	jeffrey.t.kirsher, alexander.h.duyck, netdev, jhs, xiyou.wangcong
In-Reply-To: <20171011203832.GA9297@nanopsycho>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 11 Oct 2017 22:38:32 +0200

> Wed, Oct 11, 2017 at 07:46:27PM CEST, alexander.duyck@gmail.com wrote:
>>On Wed, Oct 11, 2017 at 5:56 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Wed, Oct 11, 2017 at 02:24:12AM CEST, amritha.nambiar@intel.com wrote:
>>>>This patch series enables configuring cloud filters in i40e
>>>>using the tc-flower classifier. The classification function
>>>>of the filter is to match a packet to a class. cls_flower is
>>>>extended to offload classid to hardware. The offloaded classid
>>>>is used direct matched packets to a traffic class on the device.
>>>>The approach here is similar to the tc 'prio' qdisc which uses
>>>>the classid for band selection. The ingress qdisc is called ffff:0,
>>>>so traffic classes are ffff:1 to ffff:8 (i40e has max of 8 TCs).
>>>
>>>
>>> NACK. This clearly looks like abuse of classid to something
>>> else. Classid is here to identify qdisc instance. However, you use it
>>> for hw tclass identification. This is mixing of apples and oranges.
>>>
>>> Why?
>>>
>>> Please don't try to abuse things! This is not nice.
>>
>>This isn't an abuse. This is reproducing in hardware what is already
>>the behavior for software. Isn't that how offloads are supposed to
>>work?
> 
> What is meaning of classid in HW? Classid is SW only identification of
> qdisc instances. No relation to HW instances = abuse.

Jiri I really don't see what the problem is.

As long as the driver does the right thing when changes are made to the
qdisc, it doesn't really matter what "key" they use to refer to it.

It could have just as easily used the qdisc pointer and then internally
use some IDR allocated ID to refer to it in the driver and hardware.

But that's such a waste, we have a unique handle already so why can't
the driver just use that?

^ permalink raw reply

* Re: Ethtool question
From: David Miller @ 2017-10-11 20:49 UTC (permalink / raw)
  To: linville; +Cc: greearb, netdev
In-Reply-To: <20171011204406.GC30940@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 11 Oct 2017 16:44:07 -0400

> On Wed, Oct 11, 2017 at 09:51:56AM -0700, Ben Greear wrote:
>> I noticed today that setting some ethtool settings to the same value
>> returns an error code.  I would think this should silently return
>> success instead?  Makes it easier to call it from scripts this way:
>> 
>> [root@lf0313-6477 lanforge]# ethtool -L eth3 combined 1
>> combined unmodified, ignoring
>> no channel parameters changed, aborting
>> current values: tx 0 rx 0 other 1 combined 1
>> [root@lf0313-6477 lanforge]# echo $?
>> 1
> 
> I just had this discussion a couple of months ago with someone. My
> initial feeling was like you, a no-op is not a failure. But someone
> convinced me otherwise...I will now endeavour to remember who that
> was and how they convinced me...
> 
> Anyone else have input here?

I guess this usually happens when drivers don't support changing the
settings at all.  So they just make their ethtool operation for the
'set' always return an error.

We could have a generic ethtool helper that does "get" and then if the
"set" request is identical just return zero.

But from another perspective, the error returned from the "set" in this
situation also indicates to the user that the driver does not support
the "set" operation which has value and meaning in and of itself.  And
we'd lose that with the given suggestion.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/2] lan9303: Add basic offloading of unicast traffic
From: David Miller @ 2017-10-11 20:53 UTC (permalink / raw)
  To: privat; +Cc: andrew, vivien.didelot, f.fainelli, netdev, linux-kernel
In-Reply-To: <20171010124953.386-1-privat@egil-hjelmeland.no>

From: Egil Hjelmeland <privat@egil-hjelmeland.no>
Date: Tue, 10 Oct 2017 14:49:51 +0200

> This series add basic offloading of unicast traffic to the lan9303
> DSA driver.
> 
> Review welcome!
>  
> Changes v1 -> v2:
>  - Patch 1: Codestyle linting.
>  - Patch 2: Remember SWE_PORT_STATE while not bridged.
>             Added constant LAN9303_SWE_PORT_MIRROR_DISABLED.

Series applied, thanks.

^ permalink raw reply

* Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e
From: Jiri Pirko @ 2017-10-11 20:58 UTC (permalink / raw)
  To: David Miller
  Cc: alexander.duyck, amritha.nambiar, intel-wired-lan,
	jeffrey.t.kirsher, alexander.h.duyck, netdev, jhs, xiyou.wangcong
In-Reply-To: <20171011.134652.1653141099248918341.davem@davemloft.net>

Wed, Oct 11, 2017 at 10:46:52PM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Wed, 11 Oct 2017 22:38:32 +0200
>
>> Wed, Oct 11, 2017 at 07:46:27PM CEST, alexander.duyck@gmail.com wrote:
>>>On Wed, Oct 11, 2017 at 5:56 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>>>> Wed, Oct 11, 2017 at 02:24:12AM CEST, amritha.nambiar@intel.com wrote:
>>>>>This patch series enables configuring cloud filters in i40e
>>>>>using the tc-flower classifier. The classification function
>>>>>of the filter is to match a packet to a class. cls_flower is
>>>>>extended to offload classid to hardware. The offloaded classid
>>>>>is used direct matched packets to a traffic class on the device.
>>>>>The approach here is similar to the tc 'prio' qdisc which uses
>>>>>the classid for band selection. The ingress qdisc is called ffff:0,
>>>>>so traffic classes are ffff:1 to ffff:8 (i40e has max of 8 TCs).
>>>>
>>>>
>>>> NACK. This clearly looks like abuse of classid to something
>>>> else. Classid is here to identify qdisc instance. However, you use it
>>>> for hw tclass identification. This is mixing of apples and oranges.
>>>>
>>>> Why?
>>>>
>>>> Please don't try to abuse things! This is not nice.
>>>
>>>This isn't an abuse. This is reproducing in hardware what is already
>>>the behavior for software. Isn't that how offloads are supposed to
>>>work?
>> 
>> What is meaning of classid in HW? Classid is SW only identification of
>> qdisc instances. No relation to HW instances = abuse.
>
>Jiri I really don't see what the problem is.
>
>As long as the driver does the right thing when changes are made to the
>qdisc, it doesn't really matter what "key" they use to refer to it.
>
>It could have just as easily used the qdisc pointer and then internally
>use some IDR allocated ID to refer to it in the driver and hardware.
>
>But that's such a waste, we have a unique handle already so why can't
>the driver just use that?

Well if I see classid, I expect it should refer to qdisc instance. So
far, this has been always a case. But for some drivers, this would mean
something totally different and unrelated. So what should I think?
What's next? Classid could be abused to identify something else. I don't
understand why.

classid in kernel and tclass in hw are 2 completely unrelated things.
Why they should share the same userspace api? What am I missing that
indicates this is not an abuse?

There should be clean and well-defined userspace api:
1) classid to identify qdisc instances
2) something else to identify HW tclasses

^ permalink raw reply

* Re: [PATCH v1 RFC 1/1] Add Microchip KSZ8795 DSA driver
From: Pavel Machek @ 2017-10-11 20:59 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Andrew Lunn, Florian Fainelli, Ruediger Schmitt, muvarov,
	nathan.leigh.conrad, vivien.didelot, UNGLinuxDriver, netdev,
	linux-kernel
In-Reply-To: <1507322023-15182-2-git-send-email-Tristram.Ha@microchip.com>

[-- Attachment #1: Type: text/plain, Size: 3869 bytes --]

Hi!

> +static void ksz8795_set_prio_queue(struct ksz_device *dev, int port, int queue)
> +{
> +	u8 hi;
> +	u8 lo;
> +
> +	/* Number of queues can only be 1, 2, or 4. */
> +	switch (queue) {
> +	case 4:
> +	case 3:
> +		queue = PORT_QUEUE_SPLIT_4;
> +		break;
> +	case 2:
> +		queue = PORT_QUEUE_SPLIT_2;
> +		break;
> +	default:
> +		queue = PORT_QUEUE_SPLIT_1;
> +	}

If only 1, 2 and 4 are valid, it probably should not accept other
values?

> +static void ksz8795_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
> +			      u64 *cnt)
> +{
> +	u32 data;
> +	u16 ctrl_addr;
> +	u8 check;
> +	int loop;
> +
> +	ctrl_addr = addr + SWITCH_COUNTER_NUM * port;
> +	ctrl_addr |= IND_ACC_TABLE(TABLE_MIB | TABLE_READ);
> +
> +	mutex_lock(&dev->alu_mutex);
> +	ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> +
> +	/* It is almost guaranteed to always read the valid bit because of
> +	 * slow SPI speed.
> +	 */
> +	for (loop = 2; loop > 0; loop--) {
> +		ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> +
> +		if (check & MIB_COUNTER_VALID) {
> +			ksz_read32(dev, REG_IND_DATA_LO, &data);
> +			if (check & MIB_COUNTER_OVERFLOW)
> +				*cnt += MIB_COUNTER_VALUE + 1;
> +			*cnt += data & MIB_COUNTER_VALUE;
> +			break;
> +		}
> +	}

Hmm. Maybe, but should not this at least warn if if it can not get
valid counter?

> +	/* It is almost guaranteed to always read the valid bit because of
> +	 * slow SPI speed.
> +	 */
> +	for (loop = 2; loop > 0; loop--) {
> +		ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> +
> +		if (check & MIB_COUNTER_VALID) {
> +			ksz_read32(dev, REG_IND_DATA_LO, &data);
> +			if (addr < 2) {
> +				u64 total;
> +
> +				total = check & MIB_TOTAL_BYTES_H;
> +				total <<= 32;
> +				*cnt += total;
> +				*cnt += data;
> +				if (check & MIB_COUNTER_OVERFLOW) {
> +					total = MIB_TOTAL_BYTES_H + 1;
> +					total <<= 32;
> +					*cnt += total;
> +				}
> +			} else {
> +				if (check & MIB_COUNTER_OVERFLOW)
> +					*cnt += MIB_PACKET_DROPPED + 1;
> +				*cnt += data & MIB_PACKET_DROPPED;
> +			}
> +			break;
> +		}
> +	}

Same here. Plus, is overflow handling correct? There may be more than
MIB_PACKET_DROPPED + 1 packets dropped between the checks. 

> +static void ksz8795_r_table(struct ksz_device *dev, int table, u16 addr,
> +			    u64 *data)
> +{
> +	u16 ctrl_addr;
> +
> +	ctrl_addr = IND_ACC_TABLE(table | TABLE_READ) | addr;
> +
> +	mutex_lock(&dev->alu_mutex);
> +	ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> +	ksz_get(dev, REG_IND_DATA_HI, data, sizeof(u64));
> +	mutex_unlock(&dev->alu_mutex);
> +	*data = be64_to_cpu(*data);
> +}

It would be a tiny bit nicer to have be64 temporary variable and use
it; having *data change endianness at runtime is "interesting".

> +static int ksz8795_valid_dyn_entry(struct ksz_device *dev, u8 *data)
> +{
> +	int timeout = 100;
> +
> +	do {
> +		ksz_read8(dev, REG_IND_DATA_CHECK, data);
> +		timeout--;
> +	} while ((*data & DYNAMIC_MAC_TABLE_NOT_READY) && timeout);
> +
> +	/* Entry is not ready for accessing. */
> +	if (*data & DYNAMIC_MAC_TABLE_NOT_READY) {
> +		return -EAGAIN;
> +	/* Entry is ready for accessing. */
> +	} else {
> +		ksz_read8(dev, REG_IND_DATA_8, data);
> +
> +		/* There is no valid entry in the table. */
> +		if (*data & DYNAMIC_MAC_TABLE_MAC_EMPTY)
> +			return -ENXIO;
> +	}

You can drop else and one indentation level.

> +	/* At least one valid entry in the table. */
> +	} else {
> +		u64 buf;
> +		int cnt;
> +
> +		ksz_get(dev, REG_IND_DATA_HI, &buf, sizeof(buf));
> +		buf = be64_to_cpu(buf);

Would it make sense to convert endianness inside ksz_get?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: [PATCH 3/4] dpaa_eth: change device used
From: David Miller @ 2017-10-11 21:02 UTC (permalink / raw)
  To: madalin.bucur
  Cc: netdev, f.fainelli, andrew, vivien.didelot, junote, linux-kernel
In-Reply-To: <1507644618-32006-4-git-send-email-madalin.bucur@nxp.com>

From: Madalin Bucur <madalin.bucur@nxp.com>
Date: Tue, 10 Oct 2017 17:10:17 +0300

> @@ -2696,7 +2681,13 @@ static int dpaa_eth_probe(struct platform_device *pdev)
>  	int err = 0, i, channel;
>  	struct device *dev;
>  
> -	dev = &pdev->dev;
> +	/* device used for DMA mapping */
> +	dev = pdev->dev.parent;
> +	err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
> +	if (err) {
> +		dev_err(dev, "dma_coerce_mask_and_coherent() failed\n");
> +		goto dev_mask_failed;
> +	}
>  
>  	/* Allocate this early, so we can store relevant information in
>  	 * the private area

Since you are moving this code up before the netdev allocation, you must
adjust the failure path goto label used.

Your change as-is will cause an OOPS because we'll pass a NULL pointer
to free_netdev().

^ permalink raw reply

* Re: [PATCH RFC] Add Microchip KSZ8895 DSA driver
From: Pavel Machek @ 2017-10-11 21:03 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: Andrew Lunn, Florian Fainelli, muvarov, nathan.leigh.conrad,
	vivien.didelot, UNGLinuxDriver, netdev, linux-kernel
In-Reply-To: <1507322033-15222-2-git-send-email-Tristram.Ha@microchip.com>

[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]

Hi!

> +static void ksz8895_set_prio_queue(struct ksz_device *dev, int port, int queue)
> +{
> +	u8 hi;
> +	u8 lo;
> +
> +	/* Number of queues can only be 1, 2, or 4. */
> +	switch (queue) {
> +	case 4:
> +	case 3:
> +		queue = PORT_QUEUE_SPLIT_4;
> +		break;
> +	case 2:
> +		queue = PORT_QUEUE_SPLIT_2;
> +		break;
> +	default:
> +		queue = PORT_QUEUE_SPLIT_1;
> +	}
> +	ksz_pread8(dev, port, REG_PORT_CTRL_0, &lo);
> +	ksz_pread8(dev, port, P_DROP_TAG_CTRL, &hi);
> +	lo &= ~PORT_QUEUE_SPLIT_L;
> +	if (queue & PORT_QUEUE_SPLIT_2)
> +		lo |= PORT_QUEUE_SPLIT_L;
> +	hi &= ~PORT_QUEUE_SPLIT_H;
> +	if (queue & PORT_QUEUE_SPLIT_4)
> +		hi |= PORT_QUEUE_SPLIT_H;
> +	ksz_pwrite8(dev, port, REG_PORT_CTRL_0, lo);
> +	ksz_pwrite8(dev, port, P_DROP_TAG_CTRL, hi);
> +
> +	/* Default is port based for egress rate limit. */
> +	if (queue != PORT_QUEUE_SPLIT_1)
> +		ksz_cfg(dev, REG_SW_CTRL_19, SW_OUT_RATE_LIMIT_QUEUE_BASED,
> +			true);
> +}

This is same as the other driver, right? Same comments apply here, and
please find a way to make it shared.


> +static void ksz8895_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
> +			      u64 *cnt)
> +{
> +	u32 data;
> +	u16 ctrl_addr;
> +	u8 check;
> +	int loop;
> +
> +	ctrl_addr = addr + SWITCH_COUNTER_NUM * port;
> +	ctrl_addr |= IND_ACC_TABLE(TABLE_MIB | TABLE_READ);
> +
> +	mutex_lock(&dev->alu_mutex);
> +	ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> +
> +	/* It is almost guaranteed to always read the valid bit because of
> +	 * slow SPI speed.
> +	 */
> +	for (loop = 2; loop > 0; loop--) {
> +		ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> +
> +		if (check & MIB_COUNTER_VALID) {
> +			ksz_read32(dev, REG_IND_DATA_LO, &data);
> +			if (check & MIB_COUNTER_OVERFLOW)
> +				*cnt += MIB_COUNTER_VALUE + 1;
> +			*cnt += data & MIB_COUNTER_VALUE;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&dev->alu_mutex);

Again, same function, same review comments.

> +static void ksz8895_r_table(struct ksz_device *dev, int table, u16 addr,
> +			    u64 *data)
> +{
> +	u16 ctrl_addr;
> +
> +	ctrl_addr = IND_ACC_TABLE(table | TABLE_READ) | addr;
> +
> +	mutex_lock(&dev->alu_mutex);
> +	ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> +	ksz_get(dev, REG_IND_DATA_HI, data, sizeof(u64));
> +	mutex_unlock(&dev->alu_mutex);
> +	*data = be64_to_cpu(*data);
> +}

I've seen this before; this is duplicated code, it does not make sense
to review.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: BUG:af_packet fails to TX TSO frames
From: Willem de Bruijn @ 2017-10-11 21:05 UTC (permalink / raw)
  To: Anton Ivanov; +Cc: Anton Ivanov, Network Development, David Miller
In-Reply-To: <d6c1cda2-f7df-9132-304d-075c62d4d83d@kot-begemot.co.uk>

On Wed, Oct 11, 2017 at 3:39 PM, Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> On 11/10/17 19:57, Willem de Bruijn wrote:
>> On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov
>> <anton.ivanov@kot-begemot.co.uk> wrote:
>>> The check as now insists that the actual driver supports GSO_ROBUST, because
>>> we have marked the skb dodgy.
>>>
>>> The specific bit which does this check is in net_gso_ok()
>>>
>>> Now, lets's see how many Ethernet drivers set GSO_ROBUST.
>>>
>>> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST
>>> {} \;
>>>
>>> That returns nothing in 4.x
>>>
>>> IMHO - af_packet allocates the skb, does all checks (and extra may be added)
>>> on the gso, why is this set dodgy in the first place?
>> It is set when the header has to be validated.
>>
>> The segmentation logic will validate and fixup gso_segs. See for
>> instance tcp_gso_segment:
>>
>>         if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>>                 /* Packet is from an untrusted source, reset gso_segs. */
>>
>>                 skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
>>
>>                 segs = NULL;
>>                 goto out;
>>         }
>>
>> If the device would have the robust bit set and otherwise supports the
>> required features, fix up gso_segs and pass the large packet to the
>> device.
>>
>> Else it continues to the software gso path.
>>
>> Large packets generated with psock_txring_vnet.c pass this test. I
>
> That test is indeed a different path

The test can be run both with and without ring:

  psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v
  psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N

both with and without qdisc bypass ('-q').

>  - this goes via the tpacket_snd
> which allocs via sock_alloc_send_skb. That results in a non-fragged skb
> as it calls pskb after that with data_len = 0 asking for a contiguous one.

but attached the ring slot as fragments in tpacket_fill_skb.

> My stuff is using sendmmsg which ends up via packet_snd which allocs
> via  sock_alloc_send_pskb which is invoked in a way which always creates
> 2 segments - one for the linear section and one for the rest (and more
> if needed). It is faster than tpacket by the way (several times).
>
> As a comparison tap and other virtual drivers use sock_alloc_send_pskb
> with non-zero data length which results in multiple frags. The code in
> packet_snd is in fact identical with tap (+/- some cosmetic differences).
>
> That is the difference between the tests and that is why your test works
> and mine fails.

All the above test cases work for me, including those that build skbs
with fragments. Could you try those.

^ permalink raw reply

* Re: [PATCH net-next 1/1] net/smc: add SMC rendezvous protocol
From: David Miller @ 2017-10-11 21:06 UTC (permalink / raw)
  To: ubraun; +Cc: netdev, linux-s390, jwi, schwidefsky, heiko.carstens, raspl,
	hwippel
In-Reply-To: <20171010141419.88190-1-ubraun@linux.vnet.ibm.com>

From: Ursula Braun <ubraun@linux.vnet.ibm.com>
Date: Tue, 10 Oct 2017 16:14:19 +0200

> The goal of this patch is to leave common TCP code unmodified. Thus,
> it uses netfilter hooks to intercept TCP SYN and SYN/ACK
> packets. For outgoing packets originating from SMC sockets, the
> experimental option is added. For inbound packets destined for SMC
> sockets, the experimental option is checked.

I think this really isn't going to pass.

It's a user experience nightmare when the kernel inserts and
deletes filtering rules outside of what the user configures
on their system.

This approach was also considerd for ipv6 ILA, and the same
pushback was given.

Why not add support for these new options as a normal TCP
socket option based feature?  Then normal userspace as well
as the SMC stack can make use of it.

^ permalink raw reply

* Re: [PATCH net] macsec: fix memory leaks when skb_to_sgvec fails
From: David Miller @ 2017-10-11 21:07 UTC (permalink / raw)
  To: sd; +Cc: netdev, Jason
In-Reply-To: <5d21bccd731e3c54f2216fcf3b12cd40ba708f7b.1507639962.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 10 Oct 2017 17:07:12 +0200

> Fixes: cda7ea690350 ("macsec: check return value of skb_to_sgvec always")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [RFC net-next 1/4] net: ipv6: Make inet6addr_validator a blocking notifier
From: David Miller @ 2017-10-11 21:13 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, jiri, idosch, kjlx
In-Reply-To: <1507653665-20540-2-git-send-email-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Tue, 10 Oct 2017 09:41:02 -0700

> +	/* validator notifier needs to be blocking;
> +	 * do not call in softirq context
> +	 */
> +	if (!in_softirq()) {

I think we can test this better.

You should be able to audit the call sites and for each one set the
value of a new boolean argument properly, and this way you can also
give the boolean argument a descriptive name.

Furthermore, we can also then pull the inet6_addr allocation out of
the locking paths and thus use GFP_KERNEL when possible.

^ permalink raw reply

* Re: [PATCH v5 1/2] net: phy: DP83822 initial driver submission
From: David Miller @ 2017-10-11 21:14 UTC (permalink / raw)
  To: dmurphy; +Cc: andrew, f.fainelli, netdev, Woojung.Huh, afd
In-Reply-To: <20171010174256.21930-1-dmurphy@ti.com>

From: Dan Murphy <dmurphy@ti.com>
Date: Tue, 10 Oct 2017 12:42:55 -0500

> Add support for the TI  DP83822 10/100Mbit ethernet phy.
> 
> The DP83822 provides flexibility to connect to a MAC through a
> standard MII, RMII or RGMII interface.
> 
> In addition the DP83822 needs to be removed from the DP83848 driver
> as the WoL support is added here for this device.
> 
> Datasheet:
> http://www.ti.com/product/DP83822I/datasheet
> 
> Signed-off-by: Dan Murphy <dmurphy@ti.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v5 2/2] net: phy: at803x: Change error to EINVAL for invalid MAC
From: David Miller @ 2017-10-11 21:14 UTC (permalink / raw)
  To: dmurphy; +Cc: andrew, f.fainelli, netdev, Woojung.Huh, afd
In-Reply-To: <20171010174256.21930-2-dmurphy@ti.com>

From: Dan Murphy <dmurphy@ti.com>
Date: Tue, 10 Oct 2017 12:42:56 -0500

> Change the return error code to EINVAL if the MAC
> address is not valid in the set_wol function.
> 
> Signed-off-by: Dan Murphy <dmurphy@ti.com>

Applied to net-next.

^ permalink raw reply

* [PATCH] rtlwifi: Remove unused cur_rfstate variables
From: Christos Gkekas @ 2017-10-11 21:15 UTC (permalink / raw)
  To: Larry Finger, Chaoming Li, Kalle Valo, linux-wireless, netdev,
	linux-kernel
  Cc: Christos Gkekas

Clean up unused cur_rfstate variables in rtl8188ee, rtl8723ae, rtl8723be
and rtl8821ae.

Signed-off-by: Christos Gkekas <chris.gekas@gmail.com>
---
 drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c | 4 +---
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c | 4 +---
 drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c | 4 +---
 drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c | 4 +---
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c
index 0ba26d2..4d843e6 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c
@@ -2235,7 +2235,7 @@ bool rtl88ee_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 {
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
-	enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+	enum rf_pwrstate e_rfpowerstate_toset;
 	u32 u4tmp;
 	bool b_actuallyset = false;
 
@@ -2254,8 +2254,6 @@ bool rtl88ee_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 		spin_unlock(&rtlpriv->locks.rf_ps_lock);
 	}
 
-	cur_rfstate = ppsc->rfpwr_state;
-
 	u4tmp = rtl_read_dword(rtlpriv, REG_GPIO_OUTPUT);
 	e_rfpowerstate_toset = (u4tmp & BIT(31)) ? ERFON : ERFOFF;
 
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
index 5ac7b81..4e1e1f8 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
@@ -2103,7 +2103,7 @@ bool rtl8723e_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
 	struct rtl_phy *rtlphy = &(rtlpriv->phy);
-	enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+	enum rf_pwrstate e_rfpowerstate_toset;
 	u8 u1tmp;
 	bool b_actuallyset = false;
 
@@ -2122,8 +2122,6 @@ bool rtl8723e_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 		spin_unlock(&rtlpriv->locks.rf_ps_lock);
 	}
 
-	cur_rfstate = ppsc->rfpwr_state;
-
 	rtl_write_byte(rtlpriv, REG_GPIO_IO_SEL_2,
 		       rtl_read_byte(rtlpriv, REG_GPIO_IO_SEL_2)&~(BIT(1)));
 
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c
index 4d47b97..2ad1013 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c
@@ -2486,7 +2486,7 @@ bool rtl8723be_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
 	struct rtl_phy *rtlphy = &(rtlpriv->phy);
-	enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+	enum rf_pwrstate e_rfpowerstate_toset;
 	u8 u1tmp;
 	bool b_actuallyset = false;
 
@@ -2505,8 +2505,6 @@ bool rtl8723be_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 		spin_unlock(&rtlpriv->locks.rf_ps_lock);
 	}
 
-	cur_rfstate = ppsc->rfpwr_state;
-
 	rtl_write_byte(rtlpriv, REG_GPIO_IO_SEL_2,
 		       rtl_read_byte(rtlpriv, REG_GPIO_IO_SEL_2) & ~(BIT(1)));
 
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
index 1d431d4..e756f94 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
@@ -3845,7 +3845,7 @@ bool rtl8821ae_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 	struct rtl_priv *rtlpriv = rtl_priv(hw);
 	struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
 	struct rtl_phy *rtlphy = &rtlpriv->phy;
-	enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+	enum rf_pwrstate e_rfpowerstate_toset;
 	u8 u1tmp = 0;
 	bool b_actuallyset = false;
 
@@ -3864,8 +3864,6 @@ bool rtl8821ae_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 		spin_unlock(&rtlpriv->locks.rf_ps_lock);
 	}
 
-	cur_rfstate = ppsc->rfpwr_state;
-
 	rtl_write_byte(rtlpriv, REG_GPIO_IO_SEL_2,
 			rtl_read_byte(rtlpriv,
 					REG_GPIO_IO_SEL_2) & ~(BIT(1)));
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 1/3] atm: idt77105: Drop needless setup_timer()
From: David Miller @ 2017-10-11 21:15 UTC (permalink / raw)
  To: keescook; +Cc: tglx, 3chas3, linux-atm-general, netdev, linux-kernel
In-Reply-To: <1507663550-13343-2-git-send-email-keescook@chromium.org>

From: Kees Cook <keescook@chromium.org>
Date: Tue, 10 Oct 2017 12:25:48 -0700

> Calling setup_timer() is redundant when DEFINE_TIMER() has been used.
> 
> Cc: Chas Williams <3chas3@gmail.com>
> Cc: linux-atm-general@lists.sourceforge.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>

Applied to net-next, thanks.

^ permalink raw reply

* [PATCH net-next 0/3] sched: act: ife: UAPI checks and performance tweaks
From: Alexander Aring @ 2017-10-11 21:16 UTC (permalink / raw)
  To: jhs; +Cc: xiyou.wangcong, jiri, netdev, eric.dumazet, bjb, Alexander Aring

Hi,

this patch series contains at first a patch which adds a check for
IFE_ENCODE and IFE_DECODE when a ife act gets created or updated and adding
handling of these cases only inside the act callback only.

The second patch use per-cpu counters and move the spinlock around so that
the spinlock is less being held in act callback.

The last patch use rcu for update parameters and also move the spinlock for
the same purpose as in patch 2.

Notes:
 - There is still a spinlock around for protecting the metalist and a
   rw-lock for another list. Should be migrated to a rcu list, ife
   possible.

 - I use still dereference in dump callback, so I think what I didn't
   got was what happened when rcu_assign_pointer will do when rcu read
   lock is held. I suppose the pointer will be updated, then we don't
   have any issue here.

Alexander Aring (3):
  sched: act: ife: move encode/decode check to init
  sched: act: ife: migrate to use per-cpu counters
  sched: act: ife: update parameters via rcu handling

 include/net/tc_act/tc_ife.h |  10 +++-
 net/sched/act_ife.c         | 135 +++++++++++++++++++++++++-------------------
 2 files changed, 86 insertions(+), 59 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH net-next 1/3] sched: act: ife: move encode/decode check to init
From: Alexander Aring @ 2017-10-11 21:16 UTC (permalink / raw)
  To: jhs; +Cc: xiyou.wangcong, jiri, netdev, eric.dumazet, bjb, Alexander Aring
In-Reply-To: <20171011211608.22692-1-aring@mojatatu.com>

This patch adds the check of the two possible ife handlings encode
and decode to the init callback. The decode value is for usability
aspect and used in userspace code only. The current code offers encode
else decode only. This patch avoids any other option than this.

Signed-off-by: Alexander Aring <aring@mojatatu.com>
---
 net/sched/act_ife.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 8ccd35825b6b..efac8a32c30a 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -450,6 +450,13 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 
 	parm = nla_data(tb[TCA_IFE_PARMS]);
 
+	/* IFE_DECODE is 0 and indicates the opposite of IFE_ENCODE because
+	 * they cannot run as the same time. Check on all other values which
+	 * are not supported right now.
+	 */
+	if (parm->flags & ~IFE_ENCODE)
+		return -EINVAL;
+
 	exists = tcf_idr_check(tn, parm->index, a, bind);
 	if (exists && bind)
 		return 0;
@@ -772,17 +779,7 @@ static int tcf_ife_act(struct sk_buff *skb, const struct tc_action *a,
 	if (ife->flags & IFE_ENCODE)
 		return tcf_ife_encode(skb, a, res);
 
-	if (!(ife->flags & IFE_ENCODE))
-		return tcf_ife_decode(skb, a, res);
-
-	pr_info_ratelimited("unknown failure(policy neither de/encode\n");
-	spin_lock(&ife->tcf_lock);
-	bstats_update(&ife->tcf_bstats, skb);
-	tcf_lastuse_update(&ife->tcf_tm);
-	ife->tcf_qstats.drops++;
-	spin_unlock(&ife->tcf_lock);
-
-	return TC_ACT_SHOT;
+	return tcf_ife_decode(skb, a, res);
 }
 
 static int tcf_ife_walker(struct net *net, struct sk_buff *skb,
-- 
2.11.0

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox