Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Permissions for eBPF objects
From: Alexei Starovoitov @ 2017-08-29  1:15 UTC (permalink / raw)
  To: Chenbo Feng
  Cc: Daniel Borkmann, Jeffrey Vander Stoep, Stephen Smalley, netdev,
	SELinux, Mickaël Salaün
In-Reply-To: <CAMOXUJ=kiZDEpuBfys0Me4o8wqSymCz+Eu_qdQdOH5+Czzfj8g@mail.gmail.com>

On Mon, Aug 28, 2017 at 05:47:19PM -0700, Chenbo Feng wrote:
> On Fri, Aug 25, 2017 at 6:03 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Fri, Aug 25, 2017 at 10:07:27PM +0200, Daniel Borkmann wrote:
> >> On 08/25/2017 09:52 PM, Chenbo Feng wrote:
> >> > On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep <jeffv@google.com> wrote:
> >> > > On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
> >> > > > On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
> >> > > > wrote:
> >> > > > > I’d like to get your thoughts on adding LSM permission checks on BPF
> >> > > > > objects.
> >
> > before reinventing the wheel please take a look at landlock work.
> > Everything that was discussed in this thread is covered by it.
> > The patches have been in development for more than a year and most of the early
> > issues have been resolved.
> > It will be presented again during security summit in LA in September.
> >
> I am not very familiar with landlock lsm, isn't this module also
> depend on the lsm hooks to do
> the landlock check? If so then adding lsm hooks for eBPF object seems
> not conflict with the
> work on progress.

I see. I got it the other way around. What lsm checks are you proposing?
and why unprivileged_bpf_disabled is not enough?
you want to allow unpriv only for specific user(s) ?

^ permalink raw reply

* RE: [patch net-next 2/3] net/sched: Change cls_flower to use IDR
From: Chris Mi @ 2017-08-29  1:34 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev@vger.kernel.org
  Cc: xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	mawilcox@microsoft.com
In-Reply-To: <a562fbec-797f-1cb2-8af5-cce36126aa23@mojatatu.com>

> -----Original Message-----
> From: Jamal Hadi Salim [mailto:jhs@mojatatu.com]
> Sent: Tuesday, August 29, 2017 5:56 AM
> To: Chris Mi <chrism@mellanox.com>; netdev@vger.kernel.org
> Cc: xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net;
> mawilcox@microsoft.com
> Subject: Re: [patch net-next 2/3] net/sched: Change cls_flower to use IDR
> 
> On 17-08-28 02:41 AM, Chris Mi wrote:
> > Currently, all filters with the same priority are linked in a doubly
> > linked list. Every filter should have a unique handle. To make the
> > handle unique, we need to iterate the list every time to see if the
> > handle exists or not when inserting a new filter. It is time-consuming.
> > For example, it takes about 5m3.169s to insert 64K rules.
> >
> > This patch changes cls_flower to use IDR. With this patch, it takes
> > about 0m1.127s to insert 64K rules. The improvement is huge.
> >
> > But please note that in this testing, all filters share the same action.
> > If every filter has a unique action, that is another bottleneck.
> > Follow-up patch in this patchset addresses that.
> >
> > Signed-off-by: Chris Mi <chrism@mellanox.com>
> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> 
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
> 
> As Cong asked last time - any plans to add to other classifiers?
I think if other classifiers don't need so many items, list is enough for them.
If we change all of them, we need spend a lot of time to test them to make sure
there is no regression. But the benefit is not very big. If a certain classifier
need to change in the future, flower is an example for reference.

-Chris
> 
> cheers,
> jamal

^ permalink raw reply

* Re: Permissions for eBPF objects
From: Chenbo Feng @ 2017-08-29  1:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Jeffrey Vander Stoep, Stephen Smalley, netdev,
	SELinux, Mickaël Salaün
In-Reply-To: <20170829011545.hjhpmyaerd44r5xo@ast-mbp>

On Mon, Aug 28, 2017 at 6:15 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Mon, Aug 28, 2017 at 05:47:19PM -0700, Chenbo Feng wrote:
>> On Fri, Aug 25, 2017 at 6:03 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Fri, Aug 25, 2017 at 10:07:27PM +0200, Daniel Borkmann wrote:
>> >> On 08/25/2017 09:52 PM, Chenbo Feng wrote:
>> >> > On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep <jeffv@google.com> wrote:
>> >> > > On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
>> >> > > > On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
>> >> > > > wrote:
>> >> > > > > I’d like to get your thoughts on adding LSM permission checks on BPF
>> >> > > > > objects.
>> >
>> > before reinventing the wheel please take a look at landlock work.
>> > Everything that was discussed in this thread is covered by it.
>> > The patches have been in development for more than a year and most of the early
>> > issues have been resolved.
>> > It will be presented again during security summit in LA in September.
>> >
>> I am not very familiar with landlock lsm, isn't this module also
>> depend on the lsm hooks to do
>> the landlock check? If so then adding lsm hooks for eBPF object seems
>> not conflict with the
>> work on progress.
>
> I see. I got it the other way around. What lsm checks are you proposing?
> and why unprivileged_bpf_disabled is not enough?
> you want to allow unpriv only for specific user(s) ?
>
Exactly, the proposal patch I am currently working on will add checks
before map creation,
map read,  and map modify, since all these functionalities will be
available to all users when
unprivileged_bpf_disabled is turned off. And eBPF prog_load may also
need a check as well
since loading some types of program is not restricted either.

^ permalink raw reply

* Hooking on L4 Level with process information
From: Ravish Kumar @ 2017-08-29  2:04 UTC (permalink / raw)
  To: Networking

Hi,

I want to hook tcp/udp packets on L4 Layer and based on process
information , content want to deny or allow packets.

Netfilter provides pre/post Routing hooks but not sure that will be
right place so thought of asking whether my approach is right.
Also how i can get process information whether this packet is send by
this process.

Thoughts /source code reference would be appreciated.

Regards,
Ravish

^ permalink raw reply

* Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
From: David Ahern @ 2017-08-29  2:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev, daniel, ast, tj, davem, luto, David Ahern (gmail)
In-Reply-To: <20170829011213.suddt5hkptaxd4rp@ast-mbp>

On 8/28/17 7:12 PM, Alexei Starovoitov wrote:
>>>> To consider what happens on doubling back and changing programs in the
>>>> hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
>>>> recursive on 'b' and recursive on 'c') for each of the following cases:
>>>>
>>>> 1. Program attached to 'b' is detached, recursive flag is reset in the
>>>> request. Attempt fails EINVAL because the recursion flag has to be set.
>>>
>>> didn't get this point. you mean 'detach' will fail?
>>
>> Yes, because it tried to reset the flag in the process.
>>
>> This is something we can make user friendly - the detach succeeds, but
>> the recurse flag is ignored and the recurse flag in the group is not
>> reset unless it is the base group with the recurse flag (i.e., the
>> parent is marked non-recurse).
> 
> if we don't reset group flags to default it will be even more difficult
> for users to use, since attach with recursive flag + immediate detach sets
> some internal flag on the cgroup and user space has no way of
> either querying this flag or deleting it.

We have discussed this before -- the need to know which cgroup has a
program and now what is the status of flags. That need is a different
problem than this patch set.

I'll address the reset of the flags below to keep that discussion together.

> 
>>>
>>>> 2. Program attached to 'b' is detached, recursive flag is set. Allowed.
>>>
>>> meaing that detach from 'b' has to pass recurse flag to be detached?
>>> That's also odd.
>>> imo detach should always succeed and the process doing detach
>>> shouldn't need to know what flags were used in attach.
>>
>> Then, we agree to make it user friendly and handle resetting the recurse
>> flag automatically.
> 
> in that sense yes attach/delete pair should be side-effect free.
> 
>>>> Process in 'b' opens a socket. No program attached to 'b' so no program
>>>> is run. Recursive flag is set to program from 'a' is run. Stop.
>>>>
>>>> We should allow the recursive flag to be reset if the parent is not
>>>> recursive allowing an unwind of settings applied. I'll add that change.
>>>
>>> I don't get this part.
>>> Anyway looking forward to the next patch set with tests and comments like above.
>>
>> Per above discussion, you don't want 'a' run since it is not marked
>> recurse. My last sentence is the user friendly part in resetting the
>> flag in the cgroup.
> 
> I'm still not grasping fully the semantics of what you're proposing.
> You keep saying that override and recurse flags are indepedent, but
> the more we talk the more it's clear that there is a complicated
> relationship between them. Like no_override overrules everything, etc.

yes, I have said that a few times. Override should block everything in
terms of installing programs. If it is not enabled, the status of the
recurse flag is not relevant at attach / detach time as the call should
fail. So installing a program with the recurse flag only works if
override is allowed.


> I'm looking for the simplest to use logic. Not implementation.
> Implementation can be complex, but uapi should be as simple to
> explain and as simple to understand as possible.
> So how about allowing recurse+overide combination only?
> All descendents must be recurse+override too and
> no program allowed to be set on parent unless it's recurse+override
> as well. Then detach anywhere is simple, since all programs in
> such chain are always recurse+override.


Let's walk through examples based on the new ground rule - recursion
stops at last cgroup with flag set.

Assuming override is allowed ...

${MNT}/a/b/c/d

- 'a' has no program

- 'b' has a program, override allowed, recurse set

- 'c' and 'd' inherit the program from 'b' by recursion, not inheritance
(ie., bpf.effective is not updated with the program from 'b', but the
recurse flag is set on 'c' and 'd').

At this point 'c' and 'd' can ONLY take programs that are recursive.

- 'c' gets a program installed
- 'd' gets a program installed.


Process in 'd' has programs run in this order: 'd', 'c', 'b'


Now, program 'c' is detached. It is in the middle of the recursive set.
It MUST keep the recurse flag set as it inherited the restriction from
'b'. The recurse flag on 'c' can ONLY be reset when the program is
detached from 'b' as it is the start of the recursive chain.

I'll stop here to make sure we agree on the above. Considering all
permutations is a maze.

###

Also, let's agree on this intention. Based on the new ground rule, I
want to point out this example:

If 'a' gets a program installed with no recurse flag set, ONLY processes
in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd'
all stop at cgroup 'b' program.

To me this is counterintuitive and why I said programs are run up to the
first parent without the recurse flag. They are all part of the same tree.

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2017-08-29  2:25 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Antoine Tenart, Thomas Petazzoni

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/marvell/mvpp2.c

between commit:

  4c2286826451 ("net: mvpp2: fix the mac address used when using PPv2.2")

from the net tree and commits:

  09f8397553a2 ("net: mvpp2: introduce per-port nrxqs/ntxqs variables")
  213f428f5056 ("net: mvpp2: add support for TX interrupts and RX queue distribution modes")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/marvell/mvpp2.c
index 4d598ca8503a,fea9ae5b70ba..000000000000
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@@ -6504,7 -7248,9 +7248,9 @@@ static int mvpp2_port_probe(struct plat
  	struct resource *res;
  	const char *dt_mac_addr;
  	const char *mac_from;
 -	char hw_mac_addr[ETH_ALEN];
 +	char hw_mac_addr[ETH_ALEN] = {0};
+ 	unsigned int ntxqs, nrxqs;
+ 	bool has_tx_irqs;
  	u32 id;
  	int features;
  	int phy_mode;

^ permalink raw reply

* RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
From: Dexuan Cui @ 2017-08-29  2:36 UTC (permalink / raw)
  To: 'Jorgen S. Hansen', 'Stefan Hajnoczi'
  Cc: 'davem@davemloft.net', 'netdev@vger.kernel.org',
	'gregkh@linuxfoundation.org',
	'devel@linuxdriverproject.org', KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, 'George Zhang',
	'Michal Kubecek', 'Asias He',
	'Vitaly Kuznetsov', 'Cathy Avery',
	'jasowang@redhat.com', 'Rolf Neugebauer',
	'Dave Scott', 'Marcelo Cerri',
	'apw@canonical.com', "'
In-Reply-To: <KL1P15301MB000812AD1EFB8A3986EC6277BF850@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>

> From: Dexuan Cui
> Sent: Tuesday, August 22, 2017 21:21
> > ...
> > ...
> > The only problem here would be the potential for a guest and a host app
> to
> > have a conflict wrt port numbers, even though they would be able to
> > operate fine, if restricted to their appropriate transport.
> >
> > Thanks,
> > Jorgen
> 
> Hi Jorgen, Stefan,
> Thank you for the detailed analysis!
> You have a much better understanding than me about the complex
> scenarios. Can you please work out a patch? :-)

Hi Jorgen, Stefan,
May I know your plan for this? 
 
> IMO Linux driver of Hyper-V sockets is the simplest case, as we only have
> the "to host" option (the host side driver of Hyper-V sockets runs on
> Windows kernel and I don't think the other hypervisors emulate
> the full Hyper-V VMBus 4.0, which is required to support Hyper-V sockets).
> 
> -- Dexuan

Thanks,
-- Dexuan

^ permalink raw reply

* [net-next:master 1466/1469] drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:48:10: error: implicit declaration of function 'bnxt_vf_rep_get_fid'
From: kbuild test robot @ 2017-08-29  2:49 UTC (permalink / raw)
  To: Sathya Perla; +Cc: kbuild-all, netdev, Michael Chan

[-- Attachment #1: Type: text/plain, Size: 6897 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   acae4b48856838d71d548ab6610a99d8e32653e4
commit: 2ae7408fedfee979e01ed3801223c632bb124c46 [1466/1469] bnxt_en: bnxt: add TC flower filter offload support
config: x86_64-randconfig-b0-08290613 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        git checkout 2ae7408fedfee979e01ed3801223c632bb124c46
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c: In function 'bnxt_flow_get_dst_fid':
>> drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:48:10: error: implicit declaration of function 'bnxt_vf_rep_get_fid' [-Werror=implicit-function-declaration]
      return bnxt_vf_rep_get_fid(dev);
             ^~~~~~~~~~~~~~~~~~~
   In file included from include/uapi/linux/stddef.h:1:0,
                    from include/linux/stddef.h:4,
                    from include/uapi/linux/posix_types.h:4,
                    from include/uapi/linux/types.h:13,
                    from include/linux/types.h:5,
                    from include/linux/list.h:4,
                    from include/linux/timer.h:4,
                    from include/linux/netdevice.h:28,
                    from drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:10:
   drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c: At top level:
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:390:2: note: in expansion of macro 'if'
     if (p_size == (size_t)-1 && q_size == (size_t)-1)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'kmemdup' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:380:2: note: in expansion of macro 'if'
     if (p_size < size)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'kmemdup' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:378:2: note: in expansion of macro 'if'
     if (__builtin_constant_p(size) && p_size < size)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'memchr_inv' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:369:2: note: in expansion of macro 'if'
     if (p_size < size)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'memchr_inv' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:367:2: note: in expansion of macro 'if'
     if (__builtin_constant_p(size) && p_size < size)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'memchr' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:358:2: note: in expansion of macro 'if'
     if (p_size < size)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'memchr' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:356:2: note: in expansion of macro 'if'
     if (__builtin_constant_p(size) && p_size < size)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'memcmp' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:348:2: note: in expansion of macro 'if'
     if (p_size < size || q_size < size)
     ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'memcmp' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:345:3: note: in expansion of macro 'if'
      if (q_size < size)
      ^~
   include/linux/compiler.h:162:4: warning: '______f' is static but declared in inline function 'memcmp' which is not static
       ______f = {     \
       ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
    #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
                          ^~~~~~~~~~
   include/linux/string.h:343:3: note: in expansion of macro 'if'

vim +/bnxt_vf_rep_get_fid +48 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c

    30	
    31	/* Return the dst fid of the func for flow forwarding
    32	 * For PFs: src_fid is the fid of the PF
    33	 * For VF-reps: src_fid the fid of the VF
    34	 */
    35	static u16 bnxt_flow_get_dst_fid(struct bnxt *pf_bp, struct net_device *dev)
    36	{
    37		struct bnxt *bp;
    38	
    39		/* check if dev belongs to the same switch */
    40		if (!switchdev_port_same_parent_id(pf_bp->dev, dev)) {
    41			netdev_info(pf_bp->dev, "dev(ifindex=%d) not on same switch",
    42				    dev->ifindex);
    43			return BNXT_FID_INVALID;
    44		}
    45	
    46		/* Is dev a VF-rep? */
    47		if (dev != pf_bp->dev)
  > 48			return bnxt_vf_rep_get_fid(dev);
    49	
    50		bp = netdev_priv(dev);
    51		return bp->pf.fw_fid;
    52	}
    53	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29938 bytes --]

^ permalink raw reply

* Re: [patch net-next 11/12] mlxsw: spectrum_dpipe: Add support for IPv4 host table dump
From: David Ahern @ 2017-08-29  2:57 UTC (permalink / raw)
  To: Arkadi Sharshevsky, Jiri Pirko, netdev
  Cc: davem, idosch, mlxsw, roopa, Shrijeet Mukherjee
In-Reply-To: <ee012822-1080-fb83-bd8a-72053162d359@mellanox.com>

On 8/27/17 2:31 AM, Arkadi Sharshevsky wrote:
>> Also, this dpipe capability seems to be just dumping data structures
>> maintained by the driver. ie., you can compare the mlxsw view of
>> networking state to IPv4 and IPv6 level tables. Any plans to offer a
>> command that reads data from the h/w and passes that back to the user?
>> i.e, a command to compare kernel tables to h/w state?
>>
> 
> So this infra should provide several things-
> 
> 1) Reveal the interactions between various hardware tables
> 2) Counters for this tables
> 3) Debugabillity
> 
> The first two can be achieved right now. Regarding debugabillity, which
> is a bit vague, the current assumption is that the drivers internal data
> structures are synced with hardware (which is no always true), and maybe
> are not synced with the kernel, so this can be achieved right now by
> dumping the internal state of the driver. Furthermore, the counters are
> dumped from the hardware and give the user additional indication.
> 
> I completely agree that the hardware should be dumped in order to
> validate the internal data structures are really synced with HW. This
> could be usable for observing data corruptions inside the ASIC and
> various complex bugs.
> 
> In order to address that I though about maybe add a flag called
> "validate_hw" so that during the dump the driver<-->hw state could be
> validated.
> 
> What do you think about it?

It is not just a matter of dumping hardware state. The data returned by
dump needs to be consistent across platforms and vendors.

If the intent is validating hardware state matches kernel state (ie.,
h/w forwarding matches s/w forwarding), then the hardware state should
be dumped by the driver in a form that parallels kernel state. e.g.,
dump h/w routes, neighbor entries, fdb's in a form and granularity
similar to what is done for kernel tables.

With the recent dpipe changes that allows kernel to driver cache and
kernel to h/w state comparisons.

^ permalink raw reply

* Re: [PATCH net-next 1/3 v9] net: ether: Add support for multiplexing and aggregation type
From: Subash Abhinov Kasiviswanathan @ 2017-08-29  3:00 UTC (permalink / raw)
  To: Dan Williams
  Cc: netdev, davem, fengguang.wu, jiri, stephen, David.Laight, marcel,
	andrew
In-Reply-To: <1503958953.20027.5.camel@redhat.com>

On 2017-08-28 16:22, Dan Williams wrote:
> On Thu, 2017-08-24 at 22:39 -0600, Subash Abhinov Kasiviswanathan
> wrote:
>> Define the multiplexing and aggregation (MAP) ether type 0x00F9. This
>> is needed for receiving data in the MAP protocol like RMNET. This is
>> not an officially registered ID.
>> 
>> Signed-off-by: Subash Abhinov Kasiviswanathan <subash
>> +#define ETH_P_MAP	0x00F9		/* Multiplex &
>> aggregation proto*/
> 
> Any chance you could name this QUALCOMM_MAP or something like that?  Or
> at least update the comment to include that fact.
> 
> Dan
> 

Hi Dan

Sure, I can add a comment for it.

^ permalink raw reply

* [net-next] be2net: use shift instead of expensive divide
From: Zhang Shengju @ 2017-08-29  3:18 UTC (permalink / raw)
  To: sathya.perla, ajit.khaparde, sriharsha.basavapatna, netdev

Replace shift instead of expensive divide.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 319eee3..e06094f 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2455,7 +2455,7 @@ static struct be_rx_compl_info *be_rx_compl_get(struct be_rx_obj *rxo)
 
 	/* For checking the valid bit it is Ok to use either definition as the
 	 * valid bit is at the same position in both v0 and v1 Rx compl */
-	if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) / 32] == 0)
+	if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) >> 5] == 0)
 		return NULL;
 
 	rmb();
@@ -2486,7 +2486,7 @@ static struct be_rx_compl_info *be_rx_compl_get(struct be_rx_obj *rxo)
 	}
 
 	/* As the compl has been parsed, reset it; we wont touch it again */
-	compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) / 32] = 0;
+	compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) >> 5] = 0;
 
 	queue_tail_inc(&rxo->cq);
 	return rxcp;
@@ -2590,7 +2590,7 @@ static struct be_tx_compl_info *be_tx_compl_get(struct be_tx_obj *txo)
 	struct be_tx_compl_info *txcp = &txo->txcp;
 	struct be_eth_tx_compl *compl = queue_tail_node(tx_cq);
 
-	if (compl->dw[offsetof(struct amap_eth_tx_compl, valid) / 32] == 0)
+	if (compl->dw[offsetof(struct amap_eth_tx_compl, valid) >> 5] == 0)
 		return NULL;
 
 	/* Ensure load ordering of valid bit dword and other dwords below */
@@ -2600,7 +2600,7 @@ static struct be_tx_compl_info *be_tx_compl_get(struct be_tx_obj *txo)
 	txcp->status = GET_TX_COMPL_BITS(status, compl);
 	txcp->end_index = GET_TX_COMPL_BITS(wrb_index, compl);
 
-	compl->dw[offsetof(struct amap_eth_tx_compl, valid) / 32] = 0;
+	compl->dw[offsetof(struct amap_eth_tx_compl, valid) >> 5] = 0;
 	queue_tail_inc(tx_cq);
 	return txcp;
 }
-- 
1.8.3.1

^ permalink raw reply related

* Re: Hooking on L4 Level with process information
From: Stephen Hemminger @ 2017-08-29  3:19 UTC (permalink / raw)
  To: Ravish Kumar; +Cc: Networking
In-Reply-To: <CAEDnJmJZ1-4DFtjmzDO1US2HCuHYDkcsa+tk8mN3CZ6vcDLbVw@mail.gmail.com>

On Tue, 29 Aug 2017 07:34:51 +0530
Ravish Kumar <ravishk2004@gmail.com> wrote:

> Hi,
> 
> I want to hook tcp/udp packets on L4 Layer and based on process
> information , content want to deny or allow packets.
> 
> Netfilter provides pre/post Routing hooks but not sure that will be
> right place so thought of asking whether my approach is right.
> Also how i can get process information whether this packet is send by
> this process.
> 
> Thoughts /source code reference would be appreciated.
> 
> Regards,
> Ravish

There is not a 1:1 relationship between sockets/files and processes.

^ permalink raw reply

* RE: [patch net-next 2/3] net/sched: Change cls_flower to use IDR
From: Chris Mi @ 2017-08-29  3:25 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	mawilcox@microsoft.com
In-Reply-To: <20170828113721.GA14697@vergenet.net>



> -----Original Message-----
> From: Simon Horman [mailto:simon.horman@netronome.com]
> Sent: Monday, August 28, 2017 7:37 PM
> To: Chris Mi <chrism@mellanox.com>
> Cc: netdev@vger.kernel.org; jhs@mojatatu.com;
> xiyou.wangcong@gmail.com; jiri@resnulli.us; davem@davemloft.net;
> mawilcox@microsoft.com
> Subject: Re: [patch net-next 2/3] net/sched: Change cls_flower to use IDR
> 
> On Mon, Aug 28, 2017 at 02:41:16AM -0400, Chris Mi wrote:
> > Currently, all filters with the same priority are linked in a doubly
> > linked list. Every filter should have a unique handle. To make the
> > handle unique, we need to iterate the list every time to see if the
> > handle exists or not when inserting a new filter. It is time-consuming.
> > For example, it takes about 5m3.169s to insert 64K rules.
> >
> > This patch changes cls_flower to use IDR. With this patch, it takes
> > about 0m1.127s to insert 64K rules. The improvement is huge.
> 
> Very nice :)
> 
> > But please note that in this testing, all filters share the same action.
> > If every filter has a unique action, that is another bottleneck.
> > Follow-up patch in this patchset addresses that.
> >
> > Signed-off-by: Chris Mi <chrism@mellanox.com>
> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> > ---
> >  net/sched/cls_flower.c | 55
> > +++++++++++++++++++++-----------------------------
> >  1 file changed, 23 insertions(+), 32 deletions(-)
> >
> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index
> > bd9dab4..3d041d2 100644
> > --- a/net/sched/cls_flower.c
> > +++ b/net/sched/cls_flower.c
> 
> ...
> 
> > @@ -890,6 +870,7 @@ static int fl_change(struct net *net, struct sk_buff
> *in_skb,
> >  	struct cls_fl_filter *fnew;
> >  	struct nlattr **tb;
> >  	struct fl_flow_mask mask = {};
> > +	unsigned long idr_index;
> >  	int err;
> >
> >  	if (!tca[TCA_OPTIONS])
> > @@ -920,13 +901,21 @@ static int fl_change(struct net *net, struct sk_buff
> *in_skb,
> >  		goto errout;
> >
> >  	if (!handle) {
> > -		handle = fl_grab_new_handle(tp, head);
> > -		if (!handle) {
> > -			err = -EINVAL;
> > +		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
> > +				    1, 0x80000000, GFP_KERNEL);
> > +		if (err)
> >  			goto errout;
> > -		}
> > +		fnew->handle = idr_index;
> > +	}
> > +
> > +	/* user specifies a handle and it doesn't exist */
> > +	if (handle && !fold) {
> > +		err = idr_alloc_ext(&head->handle_idr, fnew, &idr_index,
> > +				    handle, handle + 1, GFP_KERNEL);
> > +		if (err)
> > +			goto errout;
> > +		fnew->handle = idr_index;
> >  	}
> > -	fnew->handle = handle;
> >
> >  	if (tb[TCA_FLOWER_FLAGS]) {
> >  		fnew->flags = nla_get_u32(tb[TCA_FLOWER_FLAGS]);
> > @@ -980,6 +969,8 @@ static int fl_change(struct net *net, struct sk_buff
> *in_skb,
> >  	*arg = fnew;
> >
> >  	if (fold) {
> > +		fnew->handle = handle;
> 
> Can it be the case that fold is non-NULL and handle is zero?
> The handling of that case seem to have changed in this patch.
I don't think that could happen.  In function tc_ctl_tfilter(),

fl_get() will be called.  If handle is zero, fl_get() will return NULL.
That means fold is NULL.

> 
> > +		idr_replace_ext(&head->handle_idr, fnew, fnew->handle);
> >  		list_replace_rcu(&fold->list, &fnew->list);
> >  		tcf_unbind_filter(tp, &fold->res);
> >  		call_rcu(&fold->rcu, fl_destroy_filter);
> > --
> > 1.8.3.1
> >

^ permalink raw reply

* Re: Hooking on L4 Level with process information
From: Ravish Kumar @ 2017-08-29  3:36 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Networking
In-Reply-To: <20170828201900.625a10ed@xeon-e3>

Not convinced with this .

A process open a socket and that socket is associated with that
particular process unless it shares the file descriptors.
Can you explain why it is not related , at a time a socket will be
opened by a particular process.

On Tue, Aug 29, 2017 at 8:49 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Tue, 29 Aug 2017 07:34:51 +0530
> Ravish Kumar <ravishk2004@gmail.com> wrote:
>
>> Hi,
>>
>> I want to hook tcp/udp packets on L4 Layer and based on process
>> information , content want to deny or allow packets.
>>
>> Netfilter provides pre/post Routing hooks but not sure that will be
>> right place so thought of asking whether my approach is right.
>> Also how i can get process information whether this packet is send by
>> this process.
>>
>> Thoughts /source code reference would be appreciated.
>>
>> Regards,
>> Ravish
>
> There is not a 1:1 relationship between sockets/files and processes.

^ permalink raw reply

* Re: [net-next] be2net: use shift instead of expensive divide
From: Joe Perches @ 2017-08-29  3:44 UTC (permalink / raw)
  To: Zhang Shengju, sathya.perla, ajit.khaparde, sriharsha.basavapatna,
	netdev
In-Reply-To: <1503976719-3063-1-git-send-email-zhangshengju@cmss.chinamobile.com>

On Tue, 2017-08-29 at 11:18 +0800, Zhang Shengju wrote:
> Replace shift instead of expensive divide.

This change is mostly pointless.
Any half-way decent compiler should produce the same object.
gcc emits the same object with the old code.

The AMAP_GET_BITS macro uses the "offsetof(struct, member) / 32"
style too.

> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
[]
> @@ -2455,7 +2455,7 @@ static struct be_rx_compl_info *be_rx_compl_get(struct be_rx_obj *rxo)
>  
>  	/* For checking the valid bit it is Ok to use either definition as the
>  	 * valid bit is at the same position in both v0 and v1 Rx compl */
> -	if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) / 32] == 0)
> +	if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) >> 5] == 0)

etc...

^ permalink raw reply

* net.ipv4.tcp_max_syn_backlog implementation
From: Harsha Chenji @ 2017-08-29  3:47 UTC (permalink / raw)
  To: netdev

So I have ubuntu 12.04 x32 in a VM with syncookies turned off. I tried
to do a syn flood (with netwox) on 3 different processes. Each of them
returns a different value with netstat -na | grep -c RECV :

nc -l 5555 returns 16 (netcat-traditional)
apache2 port 80 returns 256
vsftpd on 21 returns 64.
net.ipv4.tcp_max_syn_backlog is 512.

Why do these different processes on different ports have different
queue lengths for incomplete connections? Where exactly in the kernel
is this decided?

^ permalink raw reply

* Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
From: Alexei Starovoitov @ 2017-08-29  4:11 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, daniel, ast, tj, davem, luto
In-Reply-To: <32706501-5fc3-7f59-9210-1898e896d384@gmail.com>

On Mon, Aug 28, 2017 at 08:22:31PM -0600, David Ahern wrote:
> On 8/28/17 7:12 PM, Alexei Starovoitov wrote:
> >>>> To consider what happens on doubling back and changing programs in the
> >>>> hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
> >>>> recursive on 'b' and recursive on 'c') for each of the following cases:
> >>>>
> >>>> 1. Program attached to 'b' is detached, recursive flag is reset in the
> >>>> request. Attempt fails EINVAL because the recursion flag has to be set.
> >>>
> >>> didn't get this point. you mean 'detach' will fail?
> >>
> >> Yes, because it tried to reset the flag in the process.
> >>
> >> This is something we can make user friendly - the detach succeeds, but
> >> the recurse flag is ignored and the recurse flag in the group is not
> >> reset unless it is the base group with the recurse flag (i.e., the
> >> parent is marked non-recurse).
> > 
> > if we don't reset group flags to default it will be even more difficult
> > for users to use, since attach with recursive flag + immediate detach sets
> > some internal flag on the cgroup and user space has no way of
> > either querying this flag or deleting it.
> 
> We have discussed this before -- the need to know which cgroup has a
> program and now what is the status of flags. That need is a different
> problem than this patch set.
> 
> I'll address the reset of the flags below to keep that discussion together.
> 
> > 
> >>>
> >>>> 2. Program attached to 'b' is detached, recursive flag is set. Allowed.
> >>>
> >>> meaing that detach from 'b' has to pass recurse flag to be detached?
> >>> That's also odd.
> >>> imo detach should always succeed and the process doing detach
> >>> shouldn't need to know what flags were used in attach.
> >>
> >> Then, we agree to make it user friendly and handle resetting the recurse
> >> flag automatically.
> > 
> > in that sense yes attach/delete pair should be side-effect free.
> > 
> >>>> Process in 'b' opens a socket. No program attached to 'b' so no program
> >>>> is run. Recursive flag is set to program from 'a' is run. Stop.
> >>>>
> >>>> We should allow the recursive flag to be reset if the parent is not
> >>>> recursive allowing an unwind of settings applied. I'll add that change.
> >>>
> >>> I don't get this part.
> >>> Anyway looking forward to the next patch set with tests and comments like above.
> >>
> >> Per above discussion, you don't want 'a' run since it is not marked
> >> recurse. My last sentence is the user friendly part in resetting the
> >> flag in the cgroup.
> > 
> > I'm still not grasping fully the semantics of what you're proposing.
> > You keep saying that override and recurse flags are indepedent, but
> > the more we talk the more it's clear that there is a complicated
> > relationship between them. Like no_override overrules everything, etc.
> 
> yes, I have said that a few times. Override should block everything in
> terms of installing programs. If it is not enabled, the status of the
> recurse flag is not relevant at attach / detach time as the call should
> fail. So installing a program with the recurse flag only works if
> override is allowed.
> 
> 
> > I'm looking for the simplest to use logic. Not implementation.
> > Implementation can be complex, but uapi should be as simple to
> > explain and as simple to understand as possible.
> > So how about allowing recurse+overide combination only?
> > All descendents must be recurse+override too and
> > no program allowed to be set on parent unless it's recurse+override
> > as well. Then detach anywhere is simple, since all programs in
> > such chain are always recurse+override.
> 
> 
> Let's walk through examples based on the new ground rule - recursion
> stops at last cgroup with flag set.
> 
> Assuming override is allowed ...
> 
> ${MNT}/a/b/c/d
> 
> - 'a' has no program
> 
> - 'b' has a program, override allowed, recurse set
> 
> - 'c' and 'd' inherit the program from 'b' by recursion, not inheritance
> (ie., bpf.effective is not updated with the program from 'b', but the
> recurse flag is set on 'c' and 'd').
> 
> At this point 'c' and 'd' can ONLY take programs that are recursive.
> 
> - 'c' gets a program installed
> - 'd' gets a program installed.
> 
> 
> Process in 'd' has programs run in this order: 'd', 'c', 'b'
> 
> 
> Now, program 'c' is detached. It is in the middle of the recursive set.
> It MUST keep the recurse flag set as it inherited the restriction from
> 'b'. The recurse flag on 'c' can ONLY be reset when the program is
> detached from 'b' as it is the start of the recursive chain.
> 
> I'll stop here to make sure we agree on the above. Considering all
> permutations is a maze.

Agree on the above, but you're mixing semantics of the new recurse
flag and implementation of it. Ex: we don't have to copy this flag
from prog->attr into cgroup. So this reset or non-reset discussion
only makes sense in the context of your current implementation.
We can implement the logic differently. Like don't copy that flag
at all and at attach time walk parent->parent->parent and see
what programs are attached. All of them should have prog->attr & recurse_bit set
In such implementation detach from 'b' is a nop from reset/non-reset
point of view. When socket creation in 'c' is invoked the program
'c' is called first then the code keeps walking parents until root
invoking 'a' along the way.
I'm not saying it will be an efficient implementation. The point
is to discuss UAPI independent of implementation.

> ###
> 
> Also, let's agree on this intention. Based on the new ground rule, I
> want to point out this example:
> 
> If 'a' gets a program installed with no recurse flag set, ONLY processes
> in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd'
> all stop at cgroup 'b' program.

I'm proposing that such situation should not be allowed to happen.
In a->b->c->d cgroup scenario if override+recurse prog attached to 'b'
then only the same override+recurse can be attached to c, d, a.
So at detach time there can be gaps (like only 'b' and 'd' have
override+recurse progs), but walking up until root from any point
will guarantee that only override+recurse programs are seen.

^ permalink raw reply

* Re: net.ipv4.tcp_max_syn_backlog implementation
From: Willy Tarreau @ 2017-08-29  4:12 UTC (permalink / raw)
  To: Harsha Chenji; +Cc: netdev
In-Reply-To: <CAMB9Wx+q9fidTnh3Tyias8KKdbRcKpoXM-ntSqjE_oW36F1W_A@mail.gmail.com>

On Mon, Aug 28, 2017 at 11:47:41PM -0400, Harsha Chenji wrote:
> So I have ubuntu 12.04 x32 in a VM with syncookies turned off. I tried
> to do a syn flood (with netwox) on 3 different processes. Each of them
> returns a different value with netstat -na | grep -c RECV :
> 
> nc -l 5555 returns 16 (netcat-traditional)
> apache2 port 80 returns 256
> vsftpd on 21 returns 64.
> net.ipv4.tcp_max_syn_backlog is 512.
> 
> Why do these different processes on different ports have different
> queue lengths for incomplete connections? Where exactly in the kernel
> is this decided?

The listening socket's backlog (second argument to the listen() syscall)
is considered as well. The code path to determine whether or not to start
to send SYN cookies is far from being trivial but makes sense once you
write it down completely. I never perfectly remember it, I regularly have
to recheck when I have a doubt.

Willy

^ permalink raw reply

* Re: net.ipv4.tcp_max_syn_backlog implementation
From: Eric Dumazet @ 2017-08-29  4:17 UTC (permalink / raw)
  To: Harsha Chenji; +Cc: netdev
In-Reply-To: <CAMB9Wx+q9fidTnh3Tyias8KKdbRcKpoXM-ntSqjE_oW36F1W_A@mail.gmail.com>

On Mon, 2017-08-28 at 23:47 -0400, Harsha Chenji wrote:
> So I have ubuntu 12.04 x32 in a VM with syncookies turned off. I tried
> to do a syn flood (with netwox) on 3 different processes. Each of them
> returns a different value with netstat -na | grep -c RECV :
> 
> nc -l 5555 returns 16 (netcat-traditional)
> apache2 port 80 returns 256
> vsftpd on 21 returns 64.
> net.ipv4.tcp_max_syn_backlog is 512.
> 
> Why do these different processes on different ports have different
> queue lengths for incomplete connections? Where exactly in the kernel
> is this decided?

See 2nd argument in listen() system call, ie backlog 

man listen

Without a synflood, just look at "ss -t state listening" 

The backlog is the 2nd column (Send)

^ permalink raw reply

* Re: [PATCH net] net: dsa: Don't dereference dst->cpu_dp->netdev
From: David Miller @ 2017-08-29  4:20 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot, dan.carpenter
In-Reply-To: <1503965451-59498-1-git-send-email-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Mon, 28 Aug 2017 17:10:51 -0700

> If we do not have a master network device attached dst->cpu_dp will be
> NULL and accessing cpu_dp->netdev will create a trace similar to the one
> below. The correct check is on dst->cpu_dp period.
 ...
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied, thanks Florian.

^ permalink raw reply

* Re: [PATCH] be2net: Fix some u16 fields appropriately
From: David Miller @ 2017-08-29  4:22 UTC (permalink / raw)
  To: yanhaishuang
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
	netdev, linux-kernel
In-Reply-To: <979AB2E6-7D90-4C6B-B43F-1D17EE621D05@cmss.chinamobile.com>

From: 严海双 <yanhaishuang@cmss.chinamobile.com>
Date: Tue, 29 Aug 2017 09:04:57 +0800

> The GET_TX_COMPL_BITS comes from amap_get which also returns a 32-bit value:

It never returns a value with more than 16-bits of significance for
this specific call.

Please stop trying to be semantically clever when arguing about this
change.

It's not about types, it's about what range of values the struct
member can actually hold.

^ permalink raw reply

* [PATCH net-next v2] bridge: fdb add and delete tracepoints
From: Roopa Prabhu @ 2017-08-29  4:22 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, f.fainelli, bridge

From: Roopa Prabhu <roopa@cumulusnetworks.com>

A few useful tracepoints to trace bridge forwarding
database updates.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
v2 - address comments from Florian

 include/trace/events/bridge.h |   98 +++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_fdb.c           |    7 +++
 net/core/net-traces.c         |    6 +++
 3 files changed, 111 insertions(+)
 create mode 100644 include/trace/events/bridge.h

diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
new file mode 100644
index 0000000..3a4ecc3
--- /dev/null
+++ b/include/trace/events/bridge.h
@@ -0,0 +1,98 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bridge
+
+#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_BRIDGE_H
+
+#include <linux/netdevice.h>
+#include <linux/tracepoint.h>
+
+#include "../../../net/bridge/br_private.h"
+
+TRACE_EVENT(br_fdb_add,
+
+	TP_PROTO(struct ndmsg *ndm, struct net_device *dev,
+		 const unsigned char *addr, u16 vid, u16 nlh_flags),
+
+	TP_ARGS(ndm, dev, addr, vid, nlh_flags),
+
+	TP_STRUCT__entry(
+		__field(u8, ndm_flags)
+		__string(dev, dev->name)
+		__array(unsigned char, addr, ETH_ALEN)
+		__field(u16, vid)
+		__field(u16, nlh_flags)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev, dev->name);
+		memcpy(__entry->addr, addr, ETH_ALEN);
+		__entry->vid = vid;
+		__entry->nlh_flags = nlh_flags;
+		__entry->ndm_flags = ndm->ndm_flags;
+	),
+
+	TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags %04x ndm_flags = %02x",
+		  __get_str(dev), __entry->addr[0], __entry->addr[1],
+		  __entry->addr[2], __entry->addr[3], __entry->addr[4],
+		  __entry->addr[5], __entry->vid,
+		  __entry->nlh_flags, __entry->ndm_flags)
+);
+
+TRACE_EVENT(br_fdb_external_learn_add,
+
+	TP_PROTO(struct net_bridge *br, struct net_bridge_port *p,
+		 const unsigned char *addr, u16 vid),
+
+	TP_ARGS(br, p, addr, vid),
+
+	TP_STRUCT__entry(
+		__string(br_dev, br->dev->name)
+		__string(dev, p->dev->name)
+		__array(unsigned char, addr, ETH_ALEN)
+		__field(u16, vid)
+	),
+
+	TP_fast_assign(
+		__assign_str(br_dev, br ? br->dev->name : "null");
+		__assign_str(dev, p ? p->dev->name : "null");
+		memcpy(__entry->addr, addr, ETH_ALEN);
+		__entry->vid = vid;
+	),
+
+	TP_printk("br_dev %s port %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
+		  __get_str(br_dev), __get_str(dev), __entry->addr[0],
+		  __entry->addr[1], __entry->addr[2], __entry->addr[3],
+		  __entry->addr[4], __entry->addr[5], __entry->vid)
+);
+
+TRACE_EVENT(fdb_delete,
+
+	TP_PROTO(struct net_bridge *br, struct net_bridge_fdb_entry *f),
+
+	TP_ARGS(br, f),
+
+	TP_STRUCT__entry(
+		__string(br_dev, br->dev->name)
+		__string(dev, f->dst ? f->dst->dev->name : "null")
+		__array(unsigned char, addr, ETH_ALEN)
+		__field(u16, vid)
+	),
+
+	TP_fast_assign(
+		__assign_str(br_dev, br ? br->dev->name : "null");
+		__assign_str(dev, f->dst ? f->dst->dev->name : "null");
+		memcpy(__entry->addr, f->addr.addr, ETH_ALEN);
+		__entry->vid = f->vlan_id;
+	),
+
+	TP_printk("br_dev %s dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
+		  __get_str(br_dev), __get_str(dev), __entry->addr[0],
+		  __entry->addr[1], __entry->addr[2], __entry->addr[3],
+		  __entry->addr[4], __entry->addr[5], __entry->vid)
+);
+
+#endif /* _TRACE_BRIDGE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a79b648..be5e1da 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -25,6 +25,7 @@
 #include <asm/unaligned.h>
 #include <linux/if_vlan.h>
 #include <net/switchdev.h>
+#include <trace/events/bridge.h>
 #include "br_private.h"
 
 static struct kmem_cache *br_fdb_cache __read_mostly;
@@ -171,6 +172,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
 
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
 {
+	trace_fdb_delete(br, f);
+
 	if (f->is_static)
 		fdb_del_hw_addr(br, f->addr.addr);
 
@@ -870,6 +873,8 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 	struct net_bridge *br = NULL;
 	int err = 0;
 
+	trace_br_fdb_add(ndm, dev, addr, vid, nlh_flags);
+
 	if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
 		pr_info("bridge: RTM_NEWNEIGH with invalid state %#x\n", ndm->ndm_state);
 		return -EINVAL;
@@ -1066,6 +1071,8 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
 	bool modified = false;
 	int err = 0;
 
+	trace_br_fdb_external_learn_add(br, p, addr, vid);
+
 	spin_lock_bh(&br->hash_lock);
 
 	head = &br->hash[br_mac_hash(addr, vid)];
diff --git a/net/core/net-traces.c b/net/core/net-traces.c
index 4f1468c..4a0292c 100644
--- a/net/core/net-traces.c
+++ b/net/core/net-traces.c
@@ -37,6 +37,12 @@
 #include <trace/events/fib6.h>
 EXPORT_TRACEPOINT_SYMBOL_GPL(fib6_table_lookup);
 #endif
+#if IS_ENABLED(CONFIG_BRIDGE)
+#include <trace/events/bridge.h>
+EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_add);
+EXPORT_TRACEPOINT_SYMBOL_GPL(br_fdb_external_learn_add);
+EXPORT_TRACEPOINT_SYMBOL_GPL(fdb_delete);
+#endif
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
 
-- 
1.7.10.4

^ permalink raw reply related

* Re: [net-next] be2net: use shift instead of expensive divide
From: David Miller @ 2017-08-29  4:25 UTC (permalink / raw)
  To: zhangshengju; +Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, netdev
In-Reply-To: <1503976719-3063-1-git-send-email-zhangshengju@cmss.chinamobile.com>

From: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Date: Tue, 29 Aug 2017 11:18:39 +0800

> Replace shift instead of expensive divide.
> 
> Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>

The divide is more easier to understand, and it costs the same
as a shift if the types are unsigned.

I'm not applying silly changes like this which actually make
the code worse off, sorry.

^ permalink raw reply

* Re: [PATCH net-next v2 00/10] net: dsa: add generic debugfs interface
From: David Miller @ 2017-08-29  4:38 UTC (permalink / raw)
  To: vivien.didelot
  Cc: netdev, linux-kernel, kernel, f.fainelli, andrew, privat, john,
	Woojung.Huh, sean.wang, nikita.yoush, cphealy
In-Reply-To: <20170828191748.19492-1-vivien.didelot@savoirfairelinux.com>

From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date: Mon, 28 Aug 2017 15:17:38 -0400

> This patch series adds a generic debugfs interface for the DSA
> framework, so that all switch devices benefit from it, e.g. Marvell,
> Broadcom, Microchip or any other DSA driver.

I've been thinking this over and I agree with the feedback given that
debugfs really isn't appropriate for this.

Please create a DSA device class, and hang these values under
appropriate sysfs device nodes that can be easily found via
/sys/class/dsa/ just as easily as they would be /sys/kernel/debug/dsa/

You really intend these values to be consistent across DSA devices,
and you don't intend to go willy-nilly changig these exported values
arbitrarily over time.  That's what debugfs is for, throw-away
stuff.

So please make these proper device sysfs attributes rather than
debugfs.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] hinic: don't build the module by default
From: David Miller @ 2017-08-29  4:40 UTC (permalink / raw)
  To: vkuznets; +Cc: netdev, aviad.krawczyk, linux-kernel
In-Reply-To: <20170828131605.3173-1-vkuznets@redhat.com>

From: Vitaly Kuznetsov <vkuznets@redhat.com>
Date: Mon, 28 Aug 2017 15:16:05 +0200

> We probably don't want to enable code supporting particular hardware by
> default e.g. when someone does 'make defconfig'. Other ethernet modules
> don't do it.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Applied, thanks.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox