Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Alexei Starovoitov @ 2016-09-15  4:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel@vger.kernel.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry
In-Reply-To: <CALCETrU=tGLx8s_eqji6SfXRi=3W8FkGC7wA6VMfD-_wAVb66w@mail.gmail.com>

On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >> >> >
> >> >> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> >> >> >> > don't see why building on top of cgroup v2 is a problem. Is there
> >> >> >> > security issues with delegation?
> >> >> >>
> >> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> >> >> Tejun says [1]:
> >> >> >>
> >> >> >> We haven't had to face this decision because cgroup has never properly
> >> >> >> supported delegating to applications and the in-use setups where this
> >> >> >> happens are custom configurations where there is no boundary between
> >> >> >> system and applications and adhoc trial-and-error is good enough a way
> >> >> >> to find a working solution.  That wiggle room goes away once we
> >> >> >> officially open this up to individual applications.
> >> >> >>
> >> >> >> Unless and until that changes, I think that landlock should stay away
> >> >> >> from cgroups.  Others could reasonably disagree with me.
> >> >> >
> >> >> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> >> >> > and not for sandboxing. So the above doesn't matter in such contexts.
> >> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
> >> >> > Please see checmate examples how it's used.
> >> >> >
> >> >>
> >> >> To be clear: I'm not arguing at all that there shouldn't be
> >> >> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> >> >> landlock interface shouldn't expose any cgroup integration, at least
> >> >> until the cgroup situation settles down a lot.
> >> >
> >> > ahh. yes. we're perfectly in agreement here.
> >> > I'm suggesting that the next RFC shouldn't include unpriv
> >> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> >> > argue about unpriv with cgroups and even unpriv as a whole,
> >> > since it's not a given. Seccomp integration is also questionable.
> >> > I'd rather not have seccomp as a gate keeper for this lsm.
> >> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> >> > don't have one to one relationship, so mixing them up is only
> >> > asking for trouble further down the road.
> >> > If we really need to carry some information from seccomp to lsm+bpf,
> >> > it's easier to add eBPF support to seccomp and let bpf side deal
> >> > with passing whatever information.
> >> >
> >>
> >> As an argument for keeping seccomp (or an extended seccomp) as the
> >> interface for an unprivileged bpf+lsm: seccomp already checks off most
> >> of the boxes for safely letting unprivileged programs sandbox
> >> themselves.
> >
> > you mean the attach part of seccomp syscall that deals with no_new_priv?
> > sure, that's reusable.
> >
> >> Furthermore, to the extent that there are use cases for
> >> unprivileged bpf+lsm that *aren't* expressible within the seccomp
> >> hierarchy, I suspect that syscall filters have exactly the same
> >> problem and that we should fix seccomp to cover it.
> >
> > not sure what you mean by 'seccomp hierarchy'. The normal process
> > hierarchy ?
> 
> Kind of.  I mean the filter layers that are inherited across fork(),
> the TSYNC mechanism, etc.
> 
> > imo the main deficiency of secccomp is inability to look into arguments.
> > One can argue that it's a blessing, since composite args
> > are not yet copied into the kernel memory.
> > But in a lot of cases the seccomp arguments are FDs pointing
> > to kernel objects and if programs could examine those objects
> > the sandboxing scope would be more precise.
> > lsm+bpf solves that part and I'd still argue that it's
> > orthogonal to seccomp's pass/reject flow.
> > I mean if seccomp says 'ok' the syscall should continue executing
> > as normal and whatever LSM hooks were triggered by it may have
> > their own lsm+bpf verdicts.
> 
> I agree with all of this...
> 
> > Furthermore in the process hierarchy different children
> > should be able to set their own lsm+bpf filters that are not
> > related to parallel seccomp+bpf hierarchy of programs.
> > seccomp syscall can be an interface to attach programs
> > to lsm hooks, but nothing more than that.
> 
> I'm not sure what you mean.  I mean that, logically, I think we should
> be able to do:
> 
> seccomp(attach a syscall filter);
> fork();
> child does seccomp(attach some lsm filters);
> 
> I think that they *should* be related to the seccomp+bpf hierarchy of
> programs in that they are entries in the same logical list of filter
> layers installed.  Some of those layers can be syscall filters and
> some of the layers can be lsm filters.  If we subsequently add a way
> to attach a removable seccomp filter or a way to attach a seccomp
> filter that logs failures to some fd watched by an outside monitor, I
> think that should work for lsm, too, with more or less the same
> interface.
> 
> If we need a way for a sandbox manager to opt different children into
> different subsets of fancy filters, then I think that syscall filters
> and lsm filters should use the same mechanism.
> 
> I think we might be on the same page here and just saying it different ways.

Sounds like it :)
All of the above makes sense to me.
The 'orthogonal' part is that the user should be able to use
this seccomp-managed hierarchy without actually enabling
TIF_SECCOMP for the task and syscalls should still go through
fast path and all the way till lsm hooks as normal.
I don't want to pay _any_ performance penalty for this feature
for lsm hooks (and all syscalls) that don't have bpf programs attached.


^ permalink raw reply

* RE: [RFC 02/11] Add RoCE driver framework
From: Mintz, Yuval @ 2016-09-15  5:11 UTC (permalink / raw)
  To: Leon Romanovsky, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
  Cc: Mark Bloch, Ram Amrani, David Miller, Ariel Elior,
	Michal Kalderon, Rajesh Borundia,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <20160915043716.GD26069-2ukJVAZIZ/Y@public.gmane.org>

> > > If you want dynamic prints, you have two options:
> > > 1. Add support of ethtool to whole RDMA stack.
> > > 2. Use dynamic tracing infrastructure.
> >
> > > Which option do you prefer?
> > Option 3 - continuing this discussion. :-)
> 
> Sorry,
> I was under impression that you want this driver to be merged, but it looks like It
> was incorrect assumption. Let's continue discussion.

No, this is an RFC - there's no chance for *this* to merge, but this is exactly
the right time to discuss this sort of stuff.

> > Perhaps I misread your intentions - I thought that by dynamic debug
> > you meant that all debug in RDMA should be pr_debug() based, and
> > therefore my objection regarding the ease with which users can
> > configure it.
> 
> It is not for all RDMA, but in your proposed driver. You are adding this "debug"
> module argument to your module.

I don't get your answer. 
I made a generic remark [and actually one in favor of your arguments],
and instead of saying something meaningful you bash the driver.

> > If all you meant was 'dynamically set' as opposed to 'statically set'
> > then I agree that having that sort of configurability is preferable
> > [Even though end-user would still probably prefer a module parameter
> > for reproductions; As the name implies, 'debug' isn't meant to be used
> > in other situations].
> 
> We are not adding code just for fun, but for a real reason, and especially
> interfaces which will be visible to user.
> 
> The overall expectation from the driver's authors that they are submitting driver
> which doesn't have bringup issues. For real life scenarios, where the bugs will be
> reveled after some time of usage, this global debug is useless.

This has nothing to do with bringup; Real drivers are experiencing issues years after
they're productized.

> > Do notice you would be harming user-experience of reproductions though
> > - as it would have to follow different mechanisms to open debug prints
> > of various qed* components.
> 
> I don't understand this point at all. Do you think that it is normal to ask user to
> debug your driver? Is this called "user-experience"?

No, I call this 'user involved in fixing the driver' - it has nothing to do with
user-experience. Sometimes user have specifics in his system that can't
be easily identified and thus lab reproductions fail, and the user assists
in the reproduction. While I never claimed this is good practice it does happen.

> As a summary, I didn't see in your responses any real life example where you will
> need global debug level for your driver.

Not sure what you you're expecting - a list of BZs /private e-mails where
user reproductions were needed?
You're basically ignoring my claims that such are used, instead wanting
"evidence". I'm not going to try and produce any such.

Doug - I think we need a definite answer from you here; Doesn't look like
this discussion would bear any fruit.
If a debug module parameter is completely unacceptable, we'd remove it
[regardless of what I think about it].

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH V2 2/3] net-next: dsa: add Qualcomm tag RX/TX handler
From: Florian Fainelli @ 2016-09-15  5:29 UTC (permalink / raw)
  To: John Crispin, David S. Miller, Andrew Lunn
  Cc: netdev, linux-kernel, qsdk-review
In-Reply-To: <1473849542-3298-3-git-send-email-john@phrozen.org>

On 09/14/2016 03:39 AM, John Crispin wrote:
> Add support for the 2-bytes Qualcomm tag that gigabit switches such as
> the QCA8337/N might insert when receiving packets, or that we need
> to insert while targeting specific switch ports. The tag is inserted
> directly behind the ethernet header.
> 
> Signed-off-by: John Crispin <john@phrozen.org>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
From: kbuild test robot @ 2016-09-15  5:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: kbuild-all, Andrew Morton, Tejun Heo, David S. Miller,
	Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
	linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-3-hannes@cmpxchg.org>

[-- Attachment #1: Type: text/plain, Size: 1465 bytes --]

Hi Johannes,

[auto build test ERROR on net/master]
[also build test ERROR on v4.8-rc6 next-20160914]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   net/built-in.o: In function `sk_alloc':
>> (.text+0x4076): undefined reference to `mem_cgroup_sk_alloc'
   net/built-in.o: In function `__sk_destruct':
>> sock.c:(.text+0x457e): undefined reference to `mem_cgroup_sk_free'
   net/built-in.o: In function `sk_clone_lock':
   (.text+0x4f1c): undefined reference to `mem_cgroup_sk_alloc'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 11447 bytes --]

^ permalink raw reply

* Re: [RFC 02/11] Add RoCE driver framework
From: Leon Romanovsky @ 2016-09-15  5:42 UTC (permalink / raw)
  To: Mintz, Yuval
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Mark Bloch,
	Ram Amrani, David Miller, Ariel Elior, Michal Kalderon,
	Rajesh Borundia,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <BL2PR07MB23060C776EDAE92B84FCFB768DF00-I6Fv6QFlT9L2NWYWB7JgfuFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

On Thu, Sep 15, 2016 at 05:11:03AM +0000, Mintz, Yuval wrote:
> > As a summary, I didn't see in your responses any real life example where you will
> > need global debug level for your driver.
>
> Not sure what you you're expecting - a list of BZs /private e-mails where
> user reproductions were needed?
> You're basically ignoring my claims that such are used, instead wanting
> "evidence". I'm not going to try and produce any such.

I asked an example and not evidence, where "modprobe your_driver
debug=1" will be superior to "modprobe your_driver dyndbg==pmf".

https://www.kernel.org/doc/Documentation/dynamic-debug-howto.txt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* RE: [RFC 00/11] QLogic RDMA Driver (qedr) RFC
From: Amrani, Ram @ 2016-09-15  5:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, David Miller,
	Yuval Mintz, Ariel Elior, Michal Kalderon, Rajesh Borundia,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <20160914171737.GH16014-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

> > What do you mean by "standard kernel names"?
> 
> By that I mean 'identical copies' do not copy the file and then randomly change
> it giving things different names or putting different content in structs.
> 
> You will want to submit your user provider to rdma-plumbing to get it into the
> distros, we are planning to set it as the vehicle for code targeting 4.9

Got it. Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 05/16] IB/pvrdma: Add functions for Verbs support
From: Christoph Hellwig @ 2016-09-15  6:15 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: Christoph Hellwig,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, pv-drivers,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <BLUPR0501MB836AB8BF5108322033C5DD6C5F00-84Rf5TRaNBMVDhIuTCx1aJLWcSx1hRipwIZJ9u9yWa8oOQlpcoRfSA@public.gmane.org>

On Thu, Sep 15, 2016 at 12:10:10AM +0000, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 05:49:50 -0700 Christoph Hellwig wrote:
> > > +	props->max_fmr = dev->dsr->caps.max_fmr;
> > > +	props->max_map_per_fmr = dev->dsr->caps.max_map_per_fmr;
> > 
> > Please don't add FMR support to any new drivers.
> 
> We don't and our device reports these as 0. If you want me to more explicit I
> can remove the zero'd out properties.

Oh, ok.  I'll withdraw my comment then.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v5 0/6] Add eBPF hooks for cgroups
From: Vincent Bernat @ 2016-09-15  6:36 UTC (permalink / raw)
  To: Daniel Mack
  Cc: htejun, daniel, ast, davem, kafai, fw, pablo, harald, netdev,
	sargun, cgroups
In-Reply-To: <1473696735-11269-1-git-send-email-daniel@zonque.org>

 ❦ 12 septembre 2016 18:12 CEST, Daniel Mack <daniel@zonque.org> :

> * The sample program learned to support both ingress and egress, and
>   can now optionally make the eBPF program drop packets by making it
>   return 0.

Ability to lock the eBPF program to avoid modification from a later
program or in a subcgroup would be pretty interesting from a security
perspective.
-- 
Use recursive procedures for recursively-defined data structures.
            - The Elements of Programming Style (Kernighan & Plauger)

^ permalink raw reply

* Re: [PATCH v4 00/16] Add Paravirtual RDMA Driver
From: Leon Romanovsky @ 2016-09-15  7:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Adit Ranadive, dledford@redhat.com, linux-rdma@vger.kernel.org,
	pv-drivers, netdev@vger.kernel.org, linux-pci@vger.kernel.org,
	Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <20160914173636.GA10309@obsidianresearch.com>

[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]

On Wed, Sep 14, 2016 at 11:36:36AM -0600, Jason Gunthorpe wrote:
> On Mon, Sep 12, 2016 at 10:43:00PM +0000, Adit Ranadive wrote:
> > On Mon, Sep 12, 2016 at 11:03:39 -0700, Jason Gunthorpe wrote:
> > > On Sun, Sep 11, 2016 at 09:49:10PM -0700, Adit Ranadive wrote:
> > > > [2] Libpvrdma User-level library -
> > > > http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary
> > >
> > > You will probably find that rdma-plumbing will be the best way to get
> > > your userspace component into the distributors.
> >
> > Hi Jason,
> >
> > Sorry I haven't paying attention to that discussion. Do you know how soon
> > distros will pick up the rdma-plumbing stuff?
>
> We desire to use this as the vehical for the userspace included with
> the 4.9 kernel.
>
> I anticipate the tree will be running by Oct 1.

+1

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v4 05/16] IB/pvrdma: Add functions for Verbs support
From: Leon Romanovsky @ 2016-09-15  7:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Adit Ranadive, dledford@redhat.com, linux-rdma@vger.kernel.org,
	pv-drivers, netdev@vger.kernel.org, linux-pci@vger.kernel.org,
	Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <20160915061537.GC4869@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 884 bytes --]

On Wed, Sep 14, 2016 at 11:15:37PM -0700, Christoph Hellwig wrote:
> On Thu, Sep 15, 2016 at 12:10:10AM +0000, Adit Ranadive wrote:
> > On Wed, Sep 14, 2016 at 05:49:50 -0700 Christoph Hellwig wrote:
> > > > +	props->max_fmr = dev->dsr->caps.max_fmr;
> > > > +	props->max_map_per_fmr = dev->dsr->caps.max_map_per_fmr;
> > >
> > > Please don't add FMR support to any new drivers.
> >
> > We don't and our device reports these as 0. If you want me to more explicit I
> > can remove the zero'd out properties.
>
> Oh, ok.  I'll withdraw my comment then.

I would suggest to remove zero assignments to struct which is zero from
the beginning. It will eliminate the confusions.

Thanks

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [RFC 08/11] Add support for data path
From: Leon Romanovsky @ 2016-09-15  7:24 UTC (permalink / raw)
  To: Ram Amrani
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	Yuval.Mintz-h88ZbnxC6KDQT0dZR+AlfA,
	Ariel.Elior-h88ZbnxC6KDQT0dZR+AlfA,
	Michal.Kalderon-h88ZbnxC6KDQT0dZR+AlfA,
	rajesh.borundia-h88ZbnxC6KDQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1473696465-27986-9-git-send-email-Ram.Amrani-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

On Mon, Sep 12, 2016 at 07:07:42PM +0300, Ram Amrani wrote:

> +++ b/drivers/infiniband/hw/qedr/qedr_hsi_rdma.h
> @@ -150,6 +150,12 @@ struct rdma_rq_sge {
>  	struct regpair addr;
>  	__le32 length;
>  	__le32 flags;
> +#define RDMA_RQ_SGE_L_KEY_MASK      0x3FFFFFF
> +#define RDMA_RQ_SGE_L_KEY_SHIFT     0
> +#define RDMA_RQ_SGE_NUM_SGES_MASK   0x7
> +#define RDMA_RQ_SGE_NUM_SGES_SHIFT  26
> +#define RDMA_RQ_SGE_RESERVED0_MASK  0x7
> +#define RDMA_RQ_SGE_RESERVED0_SHIFT 29
>  };

It is interesting twist to mix defines and structs together.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v4 16/16] MAINTAINERS: Update for PVRDMA driver
From: Leon Romanovsky @ 2016-09-15  7:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Adit Ranadive, dledford, linux-rdma, pv-drivers, netdev,
	linux-pci, jhansen, asarwade, georgezhang, bryantan
In-Reply-To: <20160912175222.GG5843@obsidianresearch.com>

[-- Attachment #1: Type: text/plain, Size: 291 bytes --]

On Mon, Sep 12, 2016 at 11:52:22AM -0600, Jason Gunthorpe wrote:
> On Sun, Sep 11, 2016 at 09:49:26PM -0700, Adit Ranadive wrote:
> > Add maintainer info for the PVRDMA driver.
>
> You can probably squash the last three patches.

It doesn't matter, Doug will squash the whole series anyway.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v4 09/16] IB/pvrdma: Add support for Completion Queues
From: Yuval Shaia @ 2016-09-15  7:36 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, pv-drivers,
	netdev@vger.kernel.org, linux-pci@vger.kernel.org,
	Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <BLUPR0501MB836640F5ECEAB3A99CD82BFC5F00@BLUPR0501MB836.namprd05.prod.outlook.com>

Hi Adit,
Please see my comments inline.

Besides that I have no more comment for this patch.

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

Yuval

On Thu, Sep 15, 2016 at 12:07:29AM +0000, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 05:43:37 -0700, Yuval Shaia wrote:
> > On Sun, Sep 11, 2016 at 09:49:19PM -0700, Adit Ranadive wrote:
> > > +
> > > +static int pvrdma_poll_one(struct pvrdma_cq *cq, struct pvrdma_qp
> > **cur_qp,
> > > +			   struct ib_wc *wc)
> > > +{
> > > +	struct pvrdma_dev *dev = to_vdev(cq->ibcq.device);
> > > +	int has_data;
> > > +	unsigned int head;
> > > +	bool tried = false;
> > > +	struct pvrdma_cqe *cqe;
> > > +
> > > +retry:
> > > +	has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
> > > +					    cq->ibcq.cqe, &head);
> > > +	if (has_data == 0) {
> > > +		if (tried)
> > > +			return -EAGAIN;
> > > +
> > > +		/* Pass down POLL to give physical HCA a chance to poll. */
> > > +		pvrdma_write_uar_cq(dev, cq->cq_handle |
> > PVRDMA_UAR_CQ_POLL);
> > > +
> > > +		tried = true;
> > > +		goto retry;
> > > +	} else if (has_data == PVRDMA_INVALID_IDX) {
> > 
> > I didn't went throw the entire life cycle of RX-ring's head and tail but you
> > need to make sure that PVRDMA_INVALID_IDX error is recoverable one, i.e
> > there is probability that in the next call to pvrdma_poll_one it will be fine.
> > Otherwise it is an endless loop.
> 
> We have never run into this issue internally but I don't think we can recover here 

I briefly reviewed the life cycle of RX-ring's head and tail and didn't
caught any suspicious place that might corrupt it.
So glad to see that you never encountered this case.

> in the driver. The only way to recover would be to destroy and recreate the CQ 
> which we shouldn't do since it could be used by multiple QPs. 

Agree.
But don't they hit the same problem too?

> We don't have a way yet to recover in the device. Once we add that this check
> should go away.

To be honest i have no idea how to do that - i was expecting driver's vendors
to come up with an ideas :)
I once came up with an idea to force restart of the driver but it was
rejected.

> 
> The reason I returned an error value from poll_cq in v3 was to break the possible 
> loop so that it might give clients a chance to recover. But since poll_cq is not expected
> to fail I just log the device error here. I can revert to that version if you want to break
> the possible loop.

Clients (ULPs) cannot recover from this case. They even do not check the
reason of the error and treats any error as -EAGAIN.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] nfp: fix error return code in nfp_net_netdev_open()
From: Jakub Kicinski @ 2016-09-15  7:43 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Dinan Gunawardena, Simon Horman, Wei Yongjun, oss-drivers, netdev
In-Reply-To: <1473911107-8427-1-git-send-email-weiyj.lk@gmail.com>

On Thu, 15 Sep 2016 03:45:07 +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Fixes: 73725d9dfd99 ("nfp: allocate ring SW structs dynamically")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

FWIW this is for net.  Thanks Wei!

^ permalink raw reply

* Re: [PATCHv3 net-next 05/15] bpf: enable non-core use of the verfier
From: Jakub Kicinski @ 2016-09-15  7:52 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici
In-Reply-To: <20160914230549.GB60248@ast-mbp.thefacebook.com>

On Wed, 14 Sep 2016 16:05:51 -0700, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 08:00:13PM +0100, Jakub Kicinski wrote:
> > Advanced JIT compilers and translators may want to use
> > eBPF verifier as a base for parsers or to perform custom
> > checks and validations.
> > 
> > Add ability for external users to invoke the verifier
> > and provide callbacks to be invoked for every intruction
> > checked.  For now only add most basic callback for
> > per-instruction pre-interpretation checks is added.  More
> > advanced users may also like to have per-instruction post
> > callback and state comparison callback.
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> >  include/linux/bpf_parser.h |  89 ++++++++++++++++++++++++++++++
> >  kernel/bpf/verifier.c      | 134 +++++++++++++++++++++++----------------------
> >  2 files changed, 158 insertions(+), 65 deletions(-)
> >  create mode 100644 include/linux/bpf_parser.h
> > 
> > diff --git a/include/linux/bpf_parser.h b/include/linux/bpf_parser.h
> > new file mode 100644
> > index 000000000000..daa53b204f4d
> > --- /dev/null
> > +++ b/include/linux/bpf_parser.h  
> 
> 'bpf parser' is a bit misleading name, since it can be interpreted
> as parser written in bpf.
> Also the header file containes verifier bits, therefore I think
> the better name would be bpf_verifier.h ?
> 
> > +#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
> > +
> > +struct verifier_env;
> > +struct bpf_ext_parser_ops {
> > +	int (*insn_hook)(struct verifier_env *env,
> > +			 int insn_idx, int prev_insn_idx);
> > +};  
> 
> How about calling this bpf_ext_analyzer_ops
> and main entry bpf_analyzer() ?
> I think it will better convey what it's doing.
> 
> > +
> > +/* single container for all structs
> > + * one verifier_env per bpf_check() call
> > + */
> > +struct verifier_env {
> > +	struct bpf_prog *prog;		/* eBPF program being verified */
> > +	struct verifier_stack_elem *head; /* stack of verifier states to be processed */
> > +	int stack_size;			/* number of states to be processed */
> > +	struct verifier_state cur_state; /* current verifier state */
> > +	struct verifier_state_list **explored_states; /* search pruning optimization */
> > +	const struct bpf_ext_parser_ops *pops; /* external parser ops */
> > +	void *ppriv; /* pointer to external parser's private data */  
> 
> a bit hard to review, since move and addition is in one patch.

Agreed, I'll do move+prefix with bpf_ to one patch since they're both
"no functional changes" and additions to a separate one.

> I think ppriv and pops are too obscure names.
> May be analyzer_ops and analyzer_priv ?

I'll rename everything as suggested.
 
> Conceptually looks good.

Thanks!

^ permalink raw reply

* Re: [PATCH v4 01/16] vmxnet3: Move PCI Id to pci_ids.h
From: Yuval Shaia @ 2016-09-15  7:55 UTC (permalink / raw)
  To: Adit Ranadive
  Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, pv-drivers,
	netdev@vger.kernel.org, linux-pci@vger.kernel.org,
	Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <DM2PR0501MB84422A1029EE72FC727F960C5F10@DM2PR0501MB844.namprd05.prod.outlook.com>

Besides that no more comments.

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

On Wed, Sep 14, 2016 at 07:36:34PM +0000, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 09:25:18 -0700, Yuval Shaia wrote:
> > On Wed, Sep 14, 2016 at 04:00:25PM +0000, Adit Ranadive wrote:
> > > On Wed, Sep 14, 2016 at 04:09:12 -0700, Yuval Shaia wrote:
> > > > Please update vmxnet3_drv.c accordingly.
> > >
> > > Any reason why? I don't think we need to. Vmxnet3 should just pick up
> > > the moved PCI device id from pci_ids.h file.
> > 
> > So now you need to include it from vmxnet3_drv.c.
> > Same with pvrdma_main.c
> 
> If you're asking me to include pci_ids.h in our drivers we already do that
> by including pci.h in both the drivers. 
> pci.h already includes pci_ids.h - 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/pci.h#n35
> 
> If that's going to change maybe someone from the PCI group can comment on.
> 
> Thanks,
> Adit
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCHv3 net-next 08/15] nfp: add BPF to NFP code translator
From: Jakub Kicinski @ 2016-09-15  7:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici, David Miller
In-Reply-To: <20160914231510.GC60248@ast-mbp.thefacebook.com>

On Wed, 14 Sep 2016 16:15:11 -0700, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 08:00:16PM +0100, Jakub Kicinski wrote:
> > Add translator for JITing eBPF to operations which
> > can be executed on NFP's programmable engines.
> > 
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> > v3:
> >  - don't clone the program for the verifier (no longer needed);
> >  - temporarily add a local copy of macros from bitfield.h.  
> 
> so what's the status of that other patch? which tree is it going through?

It's in wireless-drivers-next, Kalle says it should be landing in
net-next early next week.

> Does it mean we'd have to wait till after the merge window? :(
> That would be sad, since it looks like it almost ready.

If it's OK with everyone I was hoping I could have that small local copy
of the macros until bitfield.h gets propagated and then we don't have
to wait :S

^ permalink raw reply

* Re: [PATCH v5 0/6] Add eBPF hooks for cgroups
From: Daniel Mack @ 2016-09-15  8:11 UTC (permalink / raw)
  To: Vincent Bernat
  Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
	fw-HFFVJYpyMKqzQB+pC5nmwQ, pablo-Cap9r6Oaw4JrovVCs/uTlw,
	harald-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	sargun-GaZTRHToo+CzQB+pC5nmwQ, cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <m3y42tlldz.fsf-PiWSfznZvZU/eRriIvX0kg@public.gmane.org>

On 09/15/2016 08:36 AM, Vincent Bernat wrote:
>  ❦ 12 septembre 2016 18:12 CEST, Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org> :
> 
>> * The sample program learned to support both ingress and egress, and
>>   can now optionally make the eBPF program drop packets by making it
>>   return 0.
> 
> Ability to lock the eBPF program to avoid modification from a later
> program or in a subcgroup would be pretty interesting from a security
> perspective.

For now, you can achieve that by dropping CAP_NET_ADMIN after installing
a program between fork and exec. I think that should suffice for a first
version. Flags to further limit that could be be added later.


Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Richard Cochran @ 2016-09-15  8:13 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, WingMan Kwok
In-Reply-To: <20160914130231.3035-4-grygorii.strashko@ti.com>

On Wed, Sep 14, 2016 at 04:02:25PM +0300, Grygorii Strashko wrote:
> The current implementation CPTS initialization and deinitialization
> (represented by cpts_register/unregister()) is pretty entangled and
> has some issues, like:
> - ptp clock registered before spinlock, which is protecting it, and
> before timecounter and cyclecounter initialization;
> - CPTS ref_clk requested using devm API while cpts_register() is
> called from .ndo_open(), as result additional checks required;
> - CPTS ref_clk is prepared, but never unprepared;
> - CPTS is not disabled even when unregistered..

This list of four items is a clear sign that this one patch should be
broken into a series of four.

Thanks,
Richard

^ permalink raw reply

* [PATCH] iproute2: build nsid-name cache only for commands that need it
From: Anton Aksola @ 2016-09-15  8:23 UTC (permalink / raw)
  To: netdev

The calling of netns_map_init() before command parsing introduced
a performance issue with large number of namespaces.

As commands such as add, del and exec do not need to iterate through
/var/run/netns it would be good not no build the cache before executing
these commands.

Example:
unpatched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m16.832s
user    0m1.350s
sys    0m15.029s

patched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m3.859s
user    0m0.132s
sys    0m3.205s

Signed-off-by: Anton Aksola <aakso@iki.fi>
---
 ip/ipnetns.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index af87065..4546fe7 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -775,8 +775,6 @@ static int netns_monitor(int argc, char **argv)
 
 int do_netns(int argc, char **argv)
 {
-	netns_map_init();
-
 	if (argc < 1)
 		return netns_list(0, NULL);
 
@@ -784,8 +782,10 @@ int do_netns(int argc, char **argv)
 	    (matches(*argv, "lst") == 0))
 		return netns_list(argc-1, argv+1);
 
-	if ((matches(*argv, "list-id") == 0))
+	if ((matches(*argv, "list-id") == 0)) {
+		netns_map_init();
 		return netns_list_id(argc-1, argv+1);
+	}
 
 	if (matches(*argv, "help") == 0)
 		return usage();
-- 
1.8.3.1

^ permalink raw reply related

* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
From: Pavel Machek @ 2016-09-15  9:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
	Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
	David S . Miller, Elena Reshetova, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
	kernel-hardening, linux-api, linux-security-module, netdev
In-Reply-To: <1472121165-29071-1-git-send-email-mic@digikod.net>

Hi!

> This series is a proof of concept to fill some missing part of seccomp as the
> ability to check syscall argument pointers or creating more dynamic security
> policies. The goal of this new stackable Linux Security Module (LSM) called
> Landlock is to allow any process, including unprivileged ones, to create
> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
> OpenBSD Pledge. This kind of sandbox help to mitigate the security impact of
> bugs or unexpected/malicious behaviors in userland applications.
> 
> The first RFC [1] was focused on extending seccomp while staying at the syscall
> level. This brought a working PoC but with some (mitigated) ToCToU race
> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> syscall argument evaluation (hence the LSM hooks).

Long and nice description follows. Should it go to Documentation/
somewhere?

Because some documentation would be useful...
								Pavel

>  include/linux/bpf.h                   |  41 +++++
>  include/linux/lsm_hooks.h             |   5 +
>  include/linux/seccomp.h               |  54 ++++++-
>  include/uapi/asm-generic/errno-base.h |   1 +
>  include/uapi/linux/bpf.h              | 103 ++++++++++++
>  include/uapi/linux/seccomp.h          |   2 +
>  kernel/bpf/arraymap.c                 | 222 +++++++++++++++++++++++++
>  kernel/bpf/syscall.c                  |  18 ++-
>  kernel/bpf/verifier.c                 |  32 +++-
>  kernel/fork.c                         |  41 ++++-
>  kernel/seccomp.c                      | 211 +++++++++++++++++++++++-
>  samples/Makefile                      |   2 +-
>  samples/landlock/.gitignore           |   1 +
>  samples/landlock/Makefile             |  16 ++
>  samples/landlock/sandbox.c            | 295 ++++++++++++++++++++++++++++++++++
>  security/Kconfig                      |   1 +
>  security/Makefile                     |   2 +
>  security/landlock/Kconfig             |  19 +++
>  security/landlock/Makefile            |   3 +
>  security/landlock/checker_cgroup.c    |  96 +++++++++++
>  security/landlock/checker_cgroup.h    |  18 +++
>  security/landlock/checker_fs.c        | 183 +++++++++++++++++++++
>  security/landlock/checker_fs.h        |  20 +++
>  security/landlock/lsm.c               | 228 ++++++++++++++++++++++++++
>  security/security.c                   |   1 +
>  25 files changed, 1592 insertions(+), 23 deletions(-)
>  create mode 100644 samples/landlock/.gitignore
>  create mode 100644 samples/landlock/Makefile
>  create mode 100644 samples/landlock/sandbox.c
>  create mode 100644 security/landlock/Kconfig
>  create mode 100644 security/landlock/Makefile
>  create mode 100644 security/landlock/checker_cgroup.c
>  create mode 100644 security/landlock/checker_cgroup.h
>  create mode 100644 security/landlock/checker_fs.c
>  create mode 100644 security/landlock/checker_fs.h
>  create mode 100644 security/landlock/lsm.c
> 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply

* Re: [RFC 09/11] Add LL2 RoCE interface
From: Leon Romanovsky @ 2016-09-15 10:20 UTC (permalink / raw)
  To: Ram Amrani
  Cc: dledford, davem, Yuval.Mintz, Ariel.Elior, Michal.Kalderon,
	rajesh.borundia, linux-rdma, netdev
In-Reply-To: <1473696465-27986-10-git-send-email-Ram.Amrani@qlogic.com>

[-- Attachment #1: Type: text/plain, Size: 600 bytes --]

On Mon, Sep 12, 2016 at 07:07:43PM +0300, Ram Amrani wrote:
> Add light L2 interface for RoCE.
>
> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
> Signed-off-by: Ram Amrani <Ram.Amrani@qlogic.com>
> ---

<....>

> +		DP_ERR(cdev,
> +		       "QED RoCE set MAC filter failed - roce_info/ll2 NULL\n");
> +		return -EINVAL;
> +	}
> +
> +	p_ptt = qed_ptt_acquire(QED_LEADING_HWFN(cdev));
> +	if (!p_ptt) {
> +		DP_ERR(cdev,
> +		       "qed roce ll2 mac filter set: failed to acquire PTT\n");
> +		return -EINVAL;
> +	}

Please use single style for your debug prints QED RoCE vs. qed roce.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] net: phy: Add Edge-rate driver for Microsemi PHYs.
From: Raju Lakkaraju @ 2016-09-15 10:26 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, f.fainelli, Allan.Nielsen, robh+dt
In-Reply-To: <20160909131832.GB30871@lunn.ch>

Hi Andrew,

Thank you for review the code.

On Fri, Sep 09, 2016 at 03:18:32PM +0200, Andrew Lunn wrote:
> EXTERNAL EMAIL
> 
> 
> > > > +static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev,
> > > > +                                   u8     edge_rate)
> > >
> > > No spaces place.
> > >
> > I ran the checkpatch. I did not find any error. I created another workspace and
> > applied the same patch. It shows the correct alignement. I have used tabs (8 space width).
> > then some spaces to align braces.
> 
> Sorry, i worded that poorly. I was meaning between the u8 and edge. A
> single space is enough.
> 
I accepted your suggestion.

> > > > +#ifdef CONFIG_OF_MDIO
> > > > +static int vsc8531_of_init(struct phy_device *phydev)
> > > > +{
> > > > +     int rc;
> > > > +     struct vsc8531_private *vsc8531 = phydev->priv;
> > > > +     struct device *dev = &phydev->mdio.dev;
> > > > +     struct device_node *of_node = dev->of_node;
> > > > +
> > > > +     if (!of_node)
> > > > +             return -ENODEV;
> > > > +
> > > > +     rc = of_property_read_u8(of_node, "vsc8531,edge-rate",
> > > > +                              &vsc8531->edge_rate);
> > >
> > > Until you have written the Documentation, it is hard for me to tell,
> > > but device tree bindings should use real units, like seconds, Ohms,
> > > Farads, etc. Is the edge rate in nS? Or is it some magic value which
> > > just gets written into the register?
> > >
> >
> > This is some magic value which just gets written into the register.
> 
> Magic values are generally not accepted in device tree bindings. Both
> Micrel and Renesas define their clock skew in ps, for example. Since
> this is rise time, it should also be possible to define it in a unit
> of time.
> 

I accepted your comment. I had discussion with my hardware team and explained
the code review comments.
They asked me to define as picoseconds as units.

> > > >  static int vsc85xx_config_init(struct phy_device *phydev)
> > > >  {
> > > >       int rc;
> > > > +     struct vsc8531_private *vsc8531;
> > > > +
> > > > +     if (!phydev->priv) {
> > >
> > > How can this happen?
> > >
> >
> > VSC 8531 driver don't have any private structure assigned initially.
> > Allways priv points to NULL.
> 
> So if it cannot happen, don't check for it.
> 
> Also, by convention, you allocate memory in the .probe() function of a
> driver. Please do it there.
> 
I accepted your review comment. 
I will re-send the patch with updates.

>         Andrew

---
Thanks,
Raju.

^ permalink raw reply

* [PATCH net-next 1/2] net sched ife action: add 16 bit helpers
From: Jamal Hadi Salim @ 2016-09-15 10:49 UTC (permalink / raw)
  To: davem; +Cc: daniel, xiyou.wangcong, netdev, Jamal Hadi Salim

From: Jamal Hadi Salim <jhs@mojatatu.com>

encoder and checker for 16 bits metadata

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/net/tc_act/tc_ife.h |  2 ++
 net/sched/act_ife.c         | 26 ++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/net/tc_act/tc_ife.h b/include/net/tc_act/tc_ife.h
index 5164bd7..9fd2bea0 100644
--- a/include/net/tc_act/tc_ife.h
+++ b/include/net/tc_act/tc_ife.h
@@ -50,9 +50,11 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen,
 int ife_alloc_meta_u32(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
 int ife_alloc_meta_u16(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
 int ife_check_meta_u32(u32 metaval, struct tcf_meta_info *mi);
+int ife_check_meta_u16(u16 metaval, struct tcf_meta_info *mi);
 int ife_encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info *mi);
 int ife_validate_meta_u32(void *val, int len);
 int ife_validate_meta_u16(void *val, int len);
+int ife_encode_meta_u16(u16 metaval, void *skbdata, struct tcf_meta_info *mi);
 void ife_release_meta_gen(struct tcf_meta_info *mi);
 int register_ife_op(struct tcf_meta_ops *mops);
 int unregister_ife_op(struct tcf_meta_ops *mops);
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index e87cd81..ccf7b4b 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -63,6 +63,23 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
 }
 EXPORT_SYMBOL_GPL(ife_tlv_meta_encode);
 
+int ife_encode_meta_u16(u16 metaval, void *skbdata, struct tcf_meta_info *mi)
+{
+	u16 edata = 0;
+
+	if (mi->metaval)
+		edata = *(u16 *)mi->metaval;
+	else if (metaval)
+		edata = metaval;
+
+	if (!edata) /* will not encode */
+		return 0;
+
+	edata = htons(edata);
+	return ife_tlv_meta_encode(skbdata, mi->metaid, 2, &edata);
+}
+EXPORT_SYMBOL_GPL(ife_encode_meta_u16);
+
 int ife_get_meta_u32(struct sk_buff *skb, struct tcf_meta_info *mi)
 {
 	if (mi->metaval)
@@ -81,6 +98,15 @@ int ife_check_meta_u32(u32 metaval, struct tcf_meta_info *mi)
 }
 EXPORT_SYMBOL_GPL(ife_check_meta_u32);
 
+int ife_check_meta_u16(u16 metaval, struct tcf_meta_info *mi)
+{
+	if (metaval || mi->metaval)
+		return 8; /* T+L+(V) == 2+2+(2+2bytepad) */
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ife_check_meta_u16);
+
 int ife_encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info *mi)
 {
 	u32 edata = metaval;
-- 
1.9.1

^ permalink raw reply related

* [PATCH net-next 2/2] net sched ife action: Introduce skb tcindex metadata encap decap
From: Jamal Hadi Salim @ 2016-09-15 10:49 UTC (permalink / raw)
  To: davem; +Cc: daniel, xiyou.wangcong, netdev, Jamal Hadi Salim
In-Reply-To: <1473936594-5152-1-git-send-email-jhs@emojatatu.com>

From: Jamal Hadi Salim <jhs@mojatatu.com>

Sample use case of how this is encoded:
user space via tuntap (or a connected VM/Machine/container)
encodes the tcindex TLV.

Sample use case of decoding:
IFE action decodes it and the skb->tc_index is then used to classify.
So something like this for encoded ICMP packets:

.. first decode then reclassify... skb->tcindex will be set
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xbeef \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

...next match the decode icmp packet...
sudo $TC filter add dev $ETH parent ffff: prio 4 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue
... last classify it using the tcindex classifier and do someaction..
sudo $TC filter add dev $ETH parent ffff: prio 5 protocol ip \
handle 0x11 tcindex classid 1:1 \
action blah..

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 include/uapi/linux/tc_act/tc_ife.h |  3 +-
 net/sched/Kconfig                  |  5 +++
 net/sched/Makefile                 |  1 +
 net/sched/act_meta_skbtcindex.c    | 81 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 net/sched/act_meta_skbtcindex.c

diff --git a/include/uapi/linux/tc_act/tc_ife.h b/include/uapi/linux/tc_act/tc_ife.h
index 4ece02a..cd18360 100644
--- a/include/uapi/linux/tc_act/tc_ife.h
+++ b/include/uapi/linux/tc_act/tc_ife.h
@@ -32,8 +32,9 @@ enum {
 #define IFE_META_HASHID 2
 #define	IFE_META_PRIO 3
 #define	IFE_META_QMAP 4
+#define	IFE_META_TCINDEX 5
 /*Can be overridden at runtime by module option*/
-#define	__IFE_META_MAX 5
+#define	__IFE_META_MAX 6
 #define IFE_META_MAX (__IFE_META_MAX - 1)
 
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 7795d5a..87956a7 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -793,6 +793,11 @@ config NET_IFE_SKBPRIO
         depends on NET_ACT_IFE
         ---help---
 
+config NET_IFE_SKBTCINDEX
+        tristate "Support to encoding decoding skb tcindex on IFE action"
+        depends on NET_ACT_IFE
+        ---help---
+
 config NET_CLS_IND
 	bool "Incoming device classification"
 	depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 148ae0d..4bdda36 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)	+= act_meta_skbprio.o
+obj-$(CONFIG_NET_IFE_SKBTCINDEX)	+= act_meta_skbtcindex.o
 obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
 obj-$(CONFIG_NET_SCH_FIFO)	+= sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)	+= sch_cbq.o
diff --git a/net/sched/act_meta_skbtcindex.c b/net/sched/act_meta_skbtcindex.c
new file mode 100644
index 0000000..ec43327
--- /dev/null
+++ b/net/sched/act_meta_skbtcindex.c
@@ -0,0 +1,81 @@
+/*
+ * net/sched/act_meta_tc_index.c IFE skb->tc_index metadata module
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * copyright Jamal Hadi Salim (2016)
+ *
+*/
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <uapi/linux/tc_act/tc_ife.h>
+#include <net/tc_act/tc_ife.h>
+#include <linux/rtnetlink.h>
+
+static int skbtcindex_encode(struct sk_buff *skb, void *skbdata,
+			     struct tcf_meta_info *e)
+{
+	u32 ifetc_index = skb->tc_index;
+
+	return ife_encode_meta_u16(ifetc_index, skbdata, e);
+}
+
+static int skbtcindex_decode(struct sk_buff *skb, void *data, u16 len)
+{
+	u16 ifetc_index = *(u16 *)data;
+
+	skb->tc_index = ntohs(ifetc_index);
+	return 0;
+}
+
+static int skbtcindex_check(struct sk_buff *skb, struct tcf_meta_info *e)
+{
+	return ife_check_meta_u16(skb->tc_index, e);
+}
+
+static struct tcf_meta_ops ife_skbtcindex_ops = {
+	.metaid = IFE_META_TCINDEX,
+	.metatype = NLA_U16,
+	.name = "tc_index",
+	.synopsis = "skb tc_index 16 bit metadata",
+	.check_presence = skbtcindex_check,
+	.encode = skbtcindex_encode,
+	.decode = skbtcindex_decode,
+	.get = ife_get_meta_u16,
+	.alloc = ife_alloc_meta_u16,
+	.release = ife_release_meta_gen,
+	.validate = ife_validate_meta_u16,
+	.owner = THIS_MODULE,
+};
+
+static int __init ifetc_index_init_module(void)
+{
+	pr_emerg("Loaded IFE tc_index\n");
+	return register_ife_op(&ife_skbtcindex_ops);
+}
+
+static void __exit ifetc_index_cleanup_module(void)
+{
+	pr_emerg("Unloaded IFE tc_index\n");
+	unregister_ife_op(&ife_skbtcindex_ops);
+}
+
+module_init(ifetc_index_init_module);
+module_exit(ifetc_index_cleanup_module);
+
+MODULE_AUTHOR("Jamal Hadi Salim(2016)");
+MODULE_DESCRIPTION("Inter-FE skb tc_index metadata module");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_IFE_META(IFE_META_SKBTCINDEX);
-- 
1.9.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox