* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Alexei Starovoitov @ 2016-09-15 4:48 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Mickaël Salaün, linux-kernel@vger.kernel.org,
Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
Will Drewry
In-Reply-To: <CALCETrU=tGLx8s_eqji6SfXRi=3W8FkGC7wA6VMfD-_wAVb66w@mail.gmail.com>
On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >> >> >
> >> >> >> > This RFC handles both cgroup and seccomp approaches in a similar way. I
> >> >> >> > don't see why building on top of cgroup v2 is a problem. Are there
> >> >> >> > security issues with delegation?
> >> >> >>
> >> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> >> >> Tejun says [1]:
> >> >> >>
> >> >> >> We haven't had to face this decision because cgroup has never properly
> >> >> >> supported delegating to applications and the in-use setups where this
> >> >> >> happens are custom configurations where there is no boundary between
> >> >> >> system and applications and adhoc trial-and-error is good enough a way
> >> >> >> to find a working solution. That wiggle room goes away once we
> >> >> >> officially open this up to individual applications.
> >> >> >>
> >> >> >> Unless and until that changes, I think that landlock should stay away
> >> >> >> from cgroups. Others could reasonably disagree with me.
> >> >> >
> >> >> > Ours and Sargun's use cases for cgroup+lsm+bpf are not for security
> >> >> > and not for sandboxing. So the above doesn't matter in such contexts.
> >> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
> >> >> > Please see checmate examples how it's used.
> >> >> >
> >> >>
> >> >> To be clear: I'm not arguing at all that there shouldn't be
> >> >> bpf+lsm+cgroup integration. I'm arguing that the unprivileged
> >> >> landlock interface shouldn't expose any cgroup integration, at least
> >> >> until the cgroup situation settles down a lot.
> >> >
> >> > ahh. yes. we're perfectly in agreement here.
> >> > I'm suggesting that the next RFC shouldn't include unpriv
> >> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> >> > argue about unpriv with cgroups and even unpriv as a whole,
> >> > since it's not a given. Seccomp integration is also questionable.
> >> > I'd rather not have seccomp as a gate keeper for this lsm.
> >> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> >> > don't have one to one relationship, so mixing them up is only
> >> > asking for trouble further down the road.
> >> > If we really need to carry some information from seccomp to lsm+bpf,
> >> > it's easier to add eBPF support to seccomp and let bpf side deal
> >> > with passing whatever information.
> >> >
> >>
> >> As an argument for keeping seccomp (or an extended seccomp) as the
> >> interface for an unprivileged bpf+lsm: seccomp already checks off most
> >> of the boxes for safely letting unprivileged programs sandbox
> >> themselves.
> >
> > you mean the attach part of seccomp syscall that deals with no_new_priv?
> > sure, that's reusable.
> >
> >> Furthermore, to the extent that there are use cases for
> >> unprivileged bpf+lsm that *aren't* expressible within the seccomp
> >> hierarchy, I suspect that syscall filters have exactly the same
> >> problem and that we should fix seccomp to cover it.
> >
> > not sure what you mean by 'seccomp hierarchy'. The normal process
> > hierarchy ?
>
> Kind of. I mean the filter layers that are inherited across fork(),
> the TSYNC mechanism, etc.
>
> > imo the main deficiency of seccomp is the inability to look into arguments.
> > One can argue that it's a blessing, since composite args
> > are not yet copied into the kernel memory.
> > But in a lot of cases the seccomp arguments are FDs pointing
> > to kernel objects and if programs could examine those objects
> > the sandboxing scope would be more precise.
> > lsm+bpf solves that part and I'd still argue that it's
> > orthogonal to seccomp's pass/reject flow.
> > I mean if seccomp says 'ok' the syscall should continue executing
> > as normal and whatever LSM hooks were triggered by it may have
> > their own lsm+bpf verdicts.
>
> I agree with all of this...
>
> > Furthermore in the process hierarchy different children
> > should be able to set their own lsm+bpf filters that are not
> > related to parallel seccomp+bpf hierarchy of programs.
> > seccomp syscall can be an interface to attach programs
> > to lsm hooks, but nothing more than that.
>
> I'm not sure what you mean. I mean that, logically, I think we should
> be able to do:
>
> seccomp(attach a syscall filter);
> fork();
> child does seccomp(attach some lsm filters);
>
> I think that they *should* be related to the seccomp+bpf hierarchy of
> programs in that they are entries in the same logical list of filter
> layers installed. Some of those layers can be syscall filters and
> some of the layers can be lsm filters. If we subsequently add a way
> to attach a removable seccomp filter or a way to attach a seccomp
> filter that logs failures to some fd watched by an outside monitor, I
> think that should work for lsm, too, with more or less the same
> interface.
>
> If we need a way for a sandbox manager to opt different children into
> different subsets of fancy filters, then I think that syscall filters
> and lsm filters should use the same mechanism.
>
> I think we might be on the same page here and just saying it different ways.
Sounds like it :)
All of the above makes sense to me.
The 'orthogonal' part is that the user should be able to use
this seccomp-managed hierarchy without actually enabling
TIF_SECCOMP for the task, and syscalls should still go through
the fast path and all the way to the lsm hooks as normal.
I don't want to pay _any_ performance penalty for this feature
on lsm hooks (and all syscalls) that don't have bpf programs attached.
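
To make the two points above concrete (one inherited list of filter
layers that can hold either syscall or lsm filters, and no cost for hooks
with nothing attached), here is a rough user-space sketch. Every name in
it (struct filter_layer, attach_layer, run_lsm_hook, ...) is made up for
illustration and is not from the kernel or from the RFC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer kinds: a layer is a syscall filter or an lsm filter. */
enum layer_kind { LAYER_SYSCALL, LAYER_LSM };

/* One entry in a task's list of filter layers. A child created by fork()
 * keeps pointing at the parent's layers through 'prev' and may push more. */
struct filter_layer {
    enum layer_kind kind;
    int (*prog)(int arg);            /* returns 0 to allow, nonzero to deny */
    const struct filter_layer *prev; /* older, inherited layer */
};

struct task {
    const struct filter_layer *layers; /* newest layer first, NULL if none */
    unsigned int lsm_layers;           /* number of LAYER_LSM entries */
};

/* Push a new layer on top of whatever the task already inherited. */
static void attach_layer(struct task *t, struct filter_layer *l)
{
    assert(t != NULL && l != NULL);
    l->prev = t->layers;
    t->layers = l;
    if (l->kind == LAYER_LSM)
        t->lsm_layers++;
}

/* lsm hook entry point: a single counter test is the only cost when the
 * task has no lsm program attached, whatever syscall filters it has. */
static int run_lsm_hook(const struct task *t, int arg)
{
    const struct filter_layer *l;

    if (t->lsm_layers == 0)
        return 0; /* fast path */

    for (l = t->layers; l; l = l->prev) {
        if (l->kind == LAYER_LSM && l->prog(arg) != 0)
            return -1; /* any layer may deny */
    }
    return 0;
}

/* Example program used by a layer: deny negative arguments. */
static int deny_negative(int arg)
{
    return arg < 0;
}
```

In this sketch a parent that only attached a syscall filter never walks
the list from an lsm hook, while a forked child that later attaches an lsm
layer sees both layers in one list.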
^ permalink raw reply
* RE: [RFC 02/11] Add RoCE driver framework
From: Mintz, Yuval @ 2016-09-15 5:11 UTC (permalink / raw)
To: Leon Romanovsky, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: Mark Bloch, Ram Amrani, David Miller, Ariel Elior,
Michal Kalderon, Rajesh Borundia,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <20160915043716.GD26069-2ukJVAZIZ/Y@public.gmane.org>
> > > If you want dynamic prints, you have two options:
> > > 1. Add support of ethtool to whole RDMA stack.
> > > 2. Use dynamic tracing infrastructure.
> >
> > > Which option do you prefer?
> > Option 3 - continuing this discussion. :-)
>
> Sorry,
> I was under the impression that you wanted this driver to be merged, but it looks like that
> was an incorrect assumption. Let's continue the discussion.
No, this is an RFC - there's no chance for *this* to merge, but this is exactly
the right time to discuss this sort of stuff.
> > Perhaps I misread your intentions - I thought that by dynamic debug
> > you meant that all debug in RDMA should be pr_debug() based, and
> > therefore my objection regarding the ease with which users can
> > configure it.
>
> It is not for all RDMA, but in your proposed driver. You are adding this "debug"
> module argument to your module.
I don't get your answer.
I made a generic remark [and actually one in favor of your arguments],
and instead of saying something meaningful you bash the driver.
> > If all you meant was 'dynamically set' as opposed to 'statically set'
> > then I agree that having that sort of configurability is preferable
> > [Even though end-user would still probably prefer a module parameter
> > for reproductions; As the name implies, 'debug' isn't meant to be used
> > in other situations].
>
> We are not adding code just for fun, but for a real reason, and especially
> interfaces which will be visible to user.
>
> The overall expectation from the driver's authors is that they are submitting a driver
> which doesn't have bringup issues. For real life scenarios, where the bugs will be
> revealed after some time of usage, this global debug is useless.
This has nothing to do with bringup; real drivers experience issues years after
they're productized.
> > Do notice you would be harming user-experience of reproductions though
> > - as it would have to follow different mechanisms to open debug prints
> > of various qed* components.
>
> I don't understand this point at all. Do you think that it is normal to ask user to
> debug your driver? Is this called "user-experience"?
No, I call this 'user involved in fixing the driver' - it has nothing to do with
user-experience. Sometimes a user has specifics in their system that can't
be easily identified, so lab reproductions fail and the user assists
in the reproduction. While I never claimed this is good practice, it does happen.
> As a summary, I didn't see in your responses any real life example where you will
> need global debug level for your driver.
Not sure what you're expecting - a list of BZs / private e-mails where
user reproductions were needed?
You're basically ignoring my claims that such are used, instead wanting
"evidence". I'm not going to try and produce any such.
Doug - I think we need a definite answer from you here; Doesn't look like
this discussion would bear any fruit.
If a debug module parameter is completely unacceptable, we'd remove it
[regardless of what I think about it].
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH V2 2/3] net-next: dsa: add Qualcomm tag RX/TX handler
From: Florian Fainelli @ 2016-09-15 5:29 UTC (permalink / raw)
To: John Crispin, David S. Miller, Andrew Lunn
Cc: netdev, linux-kernel, qsdk-review
In-Reply-To: <1473849542-3298-3-git-send-email-john@phrozen.org>
On 09/14/2016 03:39 AM, John Crispin wrote:
> Add support for the 2-bytes Qualcomm tag that gigabit switches such as
> the QCA8337/N might insert when receiving packets, or that we need
> to insert while targeting specific switch ports. The tag is inserted
> directly behind the ethernet header.
>
> Signed-off-by: John Crispin <john@phrozen.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking
From: kbuild test robot @ 2016-09-15 5:34 UTC (permalink / raw)
To: Johannes Weiner
Cc: kbuild-all, Andrew Morton, Tejun Heo, David S. Miller,
Michal Hocko, Vladimir Davydov, linux-mm, cgroups, netdev,
linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-3-hannes@cmpxchg.org>
[-- Attachment #1: Type: text/plain, Size: 1465 bytes --]
Hi Johannes,
[auto build test ERROR on net/master]
[also build test ERROR on v4.8-rc6 next-20160914]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]
url: https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m68k
All errors (new ones prefixed by >>):
net/built-in.o: In function `sk_alloc':
>> (.text+0x4076): undefined reference to `mem_cgroup_sk_alloc'
net/built-in.o: In function `__sk_destruct':
>> sock.c:(.text+0x457e): undefined reference to `mem_cgroup_sk_free'
net/built-in.o: In function `sk_clone_lock':
(.text+0x4f1c): undefined reference to `mem_cgroup_sk_alloc'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 11447 bytes --]
^ permalink raw reply
* Re: [RFC 02/11] Add RoCE driver framework
From: Leon Romanovsky @ 2016-09-15 5:42 UTC (permalink / raw)
To: Mintz, Yuval
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Mark Bloch,
Ram Amrani, David Miller, Ariel Elior, Michal Kalderon,
Rajesh Borundia,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <BL2PR07MB23060C776EDAE92B84FCFB768DF00-I6Fv6QFlT9L2NWYWB7JgfuFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
On Thu, Sep 15, 2016 at 05:11:03AM +0000, Mintz, Yuval wrote:
> > As a summary, I didn't see in your responses any real life example where you will
> > need global debug level for your driver.
>
> Not sure what you're expecting - a list of BZs / private e-mails where
> user reproductions were needed?
> You're basically ignoring my claims that such are used, instead wanting
> "evidence". I'm not going to try and produce any such.
I asked for an example, not evidence, where "modprobe your_driver
debug=1" would be superior to "modprobe your_driver dyndbg==pmf".
https://www.kernel.org/doc/Documentation/dynamic-debug-howto.txt
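
For readers unfamiliar with the "=pmf" part: in a dynamic debug spec the
first character is an operator ('=' set, '+' add, '-' remove) and the
letters are per-callsite flags (p print, m module name, f function name,
l line number, t thread ID). The sketch below only illustrates that
notation in user space. The enum values and apply_flag_spec() are
hypothetical names, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, one per dynamic-debug flag character. */
enum {
    DBG_PRINT  = 1 << 0, /* 'p': enable the message */
    DBG_MODULE = 1 << 1, /* 'm': prefix with the module name */
    DBG_FUNC   = 1 << 2, /* 'f': prefix with the function name */
    DBG_LINE   = 1 << 3, /* 'l': prefix with the line number */
    DBG_TID    = 1 << 4  /* 't': prefix with the thread ID */
};

/* Map one flag character to its bit, or 0 if the character is unknown. */
static unsigned int flag_bit(char c)
{
    switch (c) {
    case 'p': return DBG_PRINT;
    case 'm': return DBG_MODULE;
    case 'f': return DBG_FUNC;
    case 'l': return DBG_LINE;
    case 't': return DBG_TID;
    default:  return 0;
    }
}

/* Apply a spec such as "=pmf", "+p" or "-p" to *flags.
 * Returns 0 on success and -1 on a malformed spec (flags untouched). */
static int apply_flag_spec(unsigned int *flags, const char *spec)
{
    unsigned int bits = 0;
    const char *c;
    char op;

    assert(flags != NULL && spec != NULL);
    op = spec[0];
    if (op != '=' && op != '+' && op != '-')
        return -1;

    for (c = spec + 1; *c; c++) {
        unsigned int b = flag_bit(*c);

        if (!b)
            return -1;
        bits |= b;
    }

    if (op == '=')
        *flags = bits;   /* override previous settings */
    else if (op == '+')
        *flags |= bits;
    else
        *flags &= ~bits;
    return 0;
}
```

The point of the comparison in the mail is that these flags apply per
callsite and can be changed at run time, whereas a "debug=1" module
parameter is one global switch fixed at load time.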
^ permalink raw reply
* RE: [RFC 00/11] QLogic RDMA Driver (qedr) RFC
From: Amrani, Ram @ 2016-09-15 5:55 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, David Miller,
Yuval Mintz, Ariel Elior, Michal Kalderon, Rajesh Borundia,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <20160914171737.GH16014-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
> > What do you mean by "standard kernel names"?
>
> By that I mean 'identical copies' do not copy the file and then randomly change
> it giving things different names or putting different content in structs.
>
> You will want to submit your user provider to rdma-plumbing to get it into the
> distros, we are planning to set it as the vehicle for code targeting 4.9
Got it. Thanks.
^ permalink raw reply
* Re: [PATCH v4 05/16] IB/pvrdma: Add functions for Verbs support
From: Christoph Hellwig @ 2016-09-15 6:15 UTC (permalink / raw)
To: Adit Ranadive
Cc: Christoph Hellwig,
dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, pv-drivers,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <BLUPR0501MB836AB8BF5108322033C5DD6C5F00-84Rf5TRaNBMVDhIuTCx1aJLWcSx1hRipwIZJ9u9yWa8oOQlpcoRfSA@public.gmane.org>
On Thu, Sep 15, 2016 at 12:10:10AM +0000, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 05:49:50 -0700 Christoph Hellwig wrote:
> > > + props->max_fmr = dev->dsr->caps.max_fmr;
> > > + props->max_map_per_fmr = dev->dsr->caps.max_map_per_fmr;
> >
> > Please don't add FMR support to any new drivers.
>
> We don't and our device reports these as 0. If you want me to be more explicit I
> can remove the zero'd out properties.
Oh, ok. I'll withdraw my comment then.
^ permalink raw reply
* Re: [PATCH v5 0/6] Add eBPF hooks for cgroups
From: Vincent Bernat @ 2016-09-15 6:36 UTC (permalink / raw)
To: Daniel Mack
Cc: htejun, daniel, ast, davem, kafai, fw, pablo, harald, netdev,
sargun, cgroups
In-Reply-To: <1473696735-11269-1-git-send-email-daniel@zonque.org>
❦ 12 septembre 2016 18:12 CEST, Daniel Mack <daniel@zonque.org> :
> * The sample program learned to support both ingress and egress, and
> can now optionally make the eBPF program drop packets by making it
> return 0.
Ability to lock the eBPF program to avoid modification from a later
program or in a subcgroup would be pretty interesting from a security
perspective.
--
Use recursive procedures for recursively-defined data structures.
- The Elements of Programming Style (Kernighan & Plauger)
^ permalink raw reply
* Re: [PATCH v4 00/16] Add Paravirtual RDMA Driver
From: Leon Romanovsky @ 2016-09-15 7:02 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Adit Ranadive, dledford@redhat.com, linux-rdma@vger.kernel.org,
pv-drivers, netdev@vger.kernel.org, linux-pci@vger.kernel.org,
Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <20160914173636.GA10309@obsidianresearch.com>
On Wed, Sep 14, 2016 at 11:36:36AM -0600, Jason Gunthorpe wrote:
> On Mon, Sep 12, 2016 at 10:43:00PM +0000, Adit Ranadive wrote:
> > On Mon, Sep 12, 2016 at 11:03:39 -0700, Jason Gunthorpe wrote:
> > > On Sun, Sep 11, 2016 at 09:49:10PM -0700, Adit Ranadive wrote:
> > > > [2] Libpvrdma User-level library -
> > > > http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary
> > >
> > > You will probably find that rdma-plumbing will be the best way to get
> > > your userspace component into the distributors.
> >
> > Hi Jason,
> >
> > Sorry I haven't been paying attention to that discussion. Do you know how soon
> > distros will pick up the rdma-plumbing stuff?
>
> We desire to use this as the vehicle for the userspace included with
> the 4.9 kernel.
>
> I anticipate the tree will be running by Oct 1.
+1
>
> Jason
^ permalink raw reply
* Re: [PATCH v4 05/16] IB/pvrdma: Add functions for Verbs support
From: Leon Romanovsky @ 2016-09-15 7:15 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Adit Ranadive, dledford@redhat.com, linux-rdma@vger.kernel.org,
pv-drivers, netdev@vger.kernel.org, linux-pci@vger.kernel.org,
Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <20160915061537.GC4869@infradead.org>
On Wed, Sep 14, 2016 at 11:15:37PM -0700, Christoph Hellwig wrote:
> On Thu, Sep 15, 2016 at 12:10:10AM +0000, Adit Ranadive wrote:
> > On Wed, Sep 14, 2016 at 05:49:50 -0700 Christoph Hellwig wrote:
> > > > + props->max_fmr = dev->dsr->caps.max_fmr;
> > > > + props->max_map_per_fmr = dev->dsr->caps.max_map_per_fmr;
> > >
> > > Please don't add FMR support to any new drivers.
> >
> > We don't and our device reports these as 0. If you want me to be more explicit I
> > can remove the zero'd out properties.
>
> Oh, ok. I'll withdraw my comment then.
I would suggest removing the zero assignments to a struct which is already
zeroed from the beginning. It will eliminate the confusion.
Thanks
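
A minimal sketch of what I mean, assuming the attribute struct is zeroed
before the driver fills it in. The struct and function names below are
hypothetical stand-ins, not the pvrdma code, and the memset() here merely
plays the role of whoever zeroes the struct in the real call chain.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical capabilities reported by the device. */
struct dev_caps {
    int max_qp;
    int max_fmr;
    int max_map_per_fmr;
};

/* Hypothetical attributes handed back to the caller. */
struct dev_attr {
    int max_qp;
    int max_fmr;
    int max_map_per_fmr;
};

static void query_device(const struct dev_caps *caps, struct dev_attr *props)
{
    assert(caps != NULL && props != NULL);

    /* Stand-in for the zeroing that happens before the driver runs. */
    memset(props, 0, sizeof(*props));

    /* Only supported features are copied over. */
    props->max_qp = caps->max_qp;

    /* max_fmr and max_map_per_fmr are deliberately not assigned:
     * FMR is unsupported, and the fields already read as 0. */
}
```

With the FMR lines gone there is nothing left that looks like FMR
support, which is what triggered the review comment in the first place.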
^ permalink raw reply
* Re: [RFC 08/11] Add support for data path
From: Leon Romanovsky @ 2016-09-15 7:24 UTC (permalink / raw)
To: Ram Amrani
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
Yuval.Mintz-h88ZbnxC6KDQT0dZR+AlfA,
Ariel.Elior-h88ZbnxC6KDQT0dZR+AlfA,
Michal.Kalderon-h88ZbnxC6KDQT0dZR+AlfA,
rajesh.borundia-h88ZbnxC6KDQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1473696465-27986-9-git-send-email-Ram.Amrani-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org>
On Mon, Sep 12, 2016 at 07:07:42PM +0300, Ram Amrani wrote:
> +++ b/drivers/infiniband/hw/qedr/qedr_hsi_rdma.h
> @@ -150,6 +150,12 @@ struct rdma_rq_sge {
> struct regpair addr;
> __le32 length;
> __le32 flags;
> +#define RDMA_RQ_SGE_L_KEY_MASK 0x3FFFFFF
> +#define RDMA_RQ_SGE_L_KEY_SHIFT 0
> +#define RDMA_RQ_SGE_NUM_SGES_MASK 0x7
> +#define RDMA_RQ_SGE_NUM_SGES_SHIFT 26
> +#define RDMA_RQ_SGE_RESERVED0_MASK 0x7
> +#define RDMA_RQ_SGE_RESERVED0_SHIFT 29
> };
It is an interesting twist to mix defines and structs together.
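
For context, mask/shift pairs like the ones quoted above are normally
consumed through generic get/set helpers keyed on the common name prefix.
The sketch below shows that usage in user space. SGE_GET, SGE_SET and
rq_sge_pack are hypothetical names, and it treats flags as a host-endian
uint32_t, whereas the real field is __le32 and would need byte-order
conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Mask/shift pairs copied from the quoted hunk. */
#define RDMA_RQ_SGE_L_KEY_MASK      0x3FFFFFF
#define RDMA_RQ_SGE_L_KEY_SHIFT     0
#define RDMA_RQ_SGE_NUM_SGES_MASK   0x7
#define RDMA_RQ_SGE_NUM_SGES_SHIFT  26

/* Hypothetical helpers; 'name' is the prefix shared by a _MASK/_SHIFT pair. */
#define SGE_GET(flags, name) \
    (((uint32_t)(flags) >> name##_SHIFT) & (uint32_t)name##_MASK)

#define SGE_SET(flags, name, val) \
    ((flags) = ((uint32_t)(flags) & \
                ~((uint32_t)name##_MASK << name##_SHIFT)) | \
               (((uint32_t)(val) & (uint32_t)name##_MASK) << name##_SHIFT))

/* Build a flags word from an L_Key and an SGE count. */
static inline uint32_t rq_sge_pack(uint32_t l_key, uint32_t num_sges)
{
    uint32_t flags = 0;

    assert(l_key <= RDMA_RQ_SGE_L_KEY_MASK);
    assert(num_sges <= RDMA_RQ_SGE_NUM_SGES_MASK);
    SGE_SET(flags, RDMA_RQ_SGE_L_KEY, l_key);
    SGE_SET(flags, RDMA_RQ_SGE_NUM_SGES, num_sges);
    return flags;
}
```

Keeping the defines next to the field they describe is what makes the
token-pasting helpers readable, which may be the reason for the layout.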
^ permalink raw reply
* Re: [PATCH v4 16/16] MAINTAINERS: Update for PVRDMA driver
From: Leon Romanovsky @ 2016-09-15 7:27 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Adit Ranadive, dledford, linux-rdma, pv-drivers, netdev,
linux-pci, jhansen, asarwade, georgezhang, bryantan
In-Reply-To: <20160912175222.GG5843@obsidianresearch.com>
On Mon, Sep 12, 2016 at 11:52:22AM -0600, Jason Gunthorpe wrote:
> On Sun, Sep 11, 2016 at 09:49:26PM -0700, Adit Ranadive wrote:
> > Add maintainer info for the PVRDMA driver.
>
> You can probably squash the last three patches.
It doesn't matter, Doug will squash the whole series anyway.
^ permalink raw reply
* Re: [PATCH v4 09/16] IB/pvrdma: Add support for Completion Queues
From: Yuval Shaia @ 2016-09-15 7:36 UTC (permalink / raw)
To: Adit Ranadive
Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, pv-drivers,
netdev@vger.kernel.org, linux-pci@vger.kernel.org,
Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <BLUPR0501MB836640F5ECEAB3A99CD82BFC5F00@BLUPR0501MB836.namprd05.prod.outlook.com>
Hi Adit,
Please see my comments inline.
Besides that I have no more comments for this patch.
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Yuval
On Thu, Sep 15, 2016 at 12:07:29AM +0000, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 05:43:37 -0700, Yuval Shaia wrote:
> > On Sun, Sep 11, 2016 at 09:49:19PM -0700, Adit Ranadive wrote:
> > > +
> > > +static int pvrdma_poll_one(struct pvrdma_cq *cq, struct pvrdma_qp
> > **cur_qp,
> > > + struct ib_wc *wc)
> > > +{
> > > + struct pvrdma_dev *dev = to_vdev(cq->ibcq.device);
> > > + int has_data;
> > > + unsigned int head;
> > > + bool tried = false;
> > > + struct pvrdma_cqe *cqe;
> > > +
> > > +retry:
> > > + has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
> > > + cq->ibcq.cqe, &head);
> > > + if (has_data == 0) {
> > > + if (tried)
> > > + return -EAGAIN;
> > > +
> > > + /* Pass down POLL to give physical HCA a chance to poll. */
> > > + pvrdma_write_uar_cq(dev, cq->cq_handle |
> > PVRDMA_UAR_CQ_POLL);
> > > +
> > > + tried = true;
> > > + goto retry;
> > > + } else if (has_data == PVRDMA_INVALID_IDX) {
> >
> > I didn't go through the entire life cycle of RX-ring's head and tail but you
> > need to make sure that the PVRDMA_INVALID_IDX error is a recoverable one, i.e.
> > there is a chance that in the next call to pvrdma_poll_one it will be fine.
> > Otherwise it is an endless loop.
>
> We have never run into this issue internally but I don't think we can recover here
I briefly reviewed the life cycle of RX-ring's head and tail and didn't
catch any suspicious place that might corrupt it.
So I'm glad to see that you never encountered this case.
> in the driver. The only way to recover would be to destroy and recreate the CQ
> which we shouldn't do since it could be used by multiple QPs.
Agree.
But don't they hit the same problem too?
> We don't have a way yet to recover in the device. Once we add that this check
> should go away.
To be honest I have no idea how to do that - I was expecting driver vendors
to come up with ideas :)
I once came up with an idea to force a restart of the driver but it was
rejected.
>
> The reason I returned an error value from poll_cq in v3 was to break the possible
> loop so that it might give clients a chance to recover. But since poll_cq is not expected
> to fail I just log the device error here. I can revert to that version if you want to break
> the possible loop.
Clients (ULPs) cannot recover from this case. They do not even check the
reason for the error and treat any error as -EAGAIN.
^ permalink raw reply
* Re: [PATCH] nfp: fix error return code in nfp_net_netdev_open()
From: Jakub Kicinski @ 2016-09-15 7:43 UTC (permalink / raw)
To: Wei Yongjun
Cc: Dinan Gunawardena, Simon Horman, Wei Yongjun, oss-drivers, netdev
In-Reply-To: <1473911107-8427-1-git-send-email-weiyj.lk@gmail.com>
On Thu, 15 Sep 2016 03:45:07 +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
>
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
>
> Fixes: 73725d9dfd99 ("nfp: allocate ring SW structs dynamically")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
FWIW this is for net. Thanks Wei!
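
For anyone skimming the archive, the class of bug being fixed looks
roughly like the sketch below: an error path reached through a goto where
the error variable must be set before jumping. This is a generic
user-space illustration, not the nfp code. dev_open(), ring_alloc() and
alloc_budget are hypothetical, and alloc_budget exists only to inject
allocation failures.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct rings {
    void *rx;
    void *tx;
};

/* Test knob: number of allocations that are still allowed to succeed. */
static int alloc_budget = 2;

static void *ring_alloc(size_t size)
{
    if (alloc_budget <= 0)
        return NULL;
    alloc_budget--;
    return malloc(size);
}

static int dev_open(struct rings *r)
{
    int err;

    assert(r != NULL);
    r->rx = NULL;
    r->tx = NULL;

    r->rx = ring_alloc(64);
    if (!r->rx) {
        err = -ENOMEM;
        goto err_out;
    }

    r->tx = ring_alloc(64);
    if (!r->tx) {
        /* Without this assignment err could still hold 0 from an
         * earlier successful step and the caller would see success. */
        err = -ENOMEM;
        goto err_free_rx;
    }
    return 0;

err_free_rx:
    free(r->rx);
    r->rx = NULL;
err_out:
    return err;
}
```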
^ permalink raw reply
* Re: [PATCHv3 net-next 05/15] bpf: enable non-core use of the verfier
From: Jakub Kicinski @ 2016-09-15 7:52 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici
In-Reply-To: <20160914230549.GB60248@ast-mbp.thefacebook.com>
On Wed, 14 Sep 2016 16:05:51 -0700, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 08:00:13PM +0100, Jakub Kicinski wrote:
> > Advanced JIT compilers and translators may want to use
> > eBPF verifier as a base for parsers or to perform custom
> > checks and validations.
> >
> > Add ability for external users to invoke the verifier
> > and provide callbacks to be invoked for every instruction
> > checked. For now only the most basic callback, for
> > per-instruction pre-interpretation checks, is added. More
> > advanced users may also like to have per-instruction post
> > callback and state comparison callback.
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> > include/linux/bpf_parser.h | 89 ++++++++++++++++++++++++++++++
> > kernel/bpf/verifier.c | 134 +++++++++++++++++++++++----------------------
> > 2 files changed, 158 insertions(+), 65 deletions(-)
> > create mode 100644 include/linux/bpf_parser.h
> >
> > diff --git a/include/linux/bpf_parser.h b/include/linux/bpf_parser.h
> > new file mode 100644
> > index 000000000000..daa53b204f4d
> > --- /dev/null
> > +++ b/include/linux/bpf_parser.h
>
> 'bpf parser' is a bit misleading name, since it can be interpreted
> as parser written in bpf.
> Also the header file contains verifier bits, therefore I think
> the better name would be bpf_verifier.h ?
>
> > +#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
> > +
> > +struct verifier_env;
> > +struct bpf_ext_parser_ops {
> > + int (*insn_hook)(struct verifier_env *env,
> > + int insn_idx, int prev_insn_idx);
> > +};
>
> How about calling this bpf_ext_analyzer_ops
> and main entry bpf_analyzer() ?
> I think it will better convey what it's doing.
>
> > +
> > +/* single container for all structs
> > + * one verifier_env per bpf_check() call
> > + */
> > +struct verifier_env {
> > + struct bpf_prog *prog; /* eBPF program being verified */
> > + struct verifier_stack_elem *head; /* stack of verifier states to be processed */
> > + int stack_size; /* number of states to be processed */
> > + struct verifier_state cur_state; /* current verifier state */
> > + struct verifier_state_list **explored_states; /* search pruning optimization */
> > + const struct bpf_ext_parser_ops *pops; /* external parser ops */
> > + void *ppriv; /* pointer to external parser's private data */
>
> a bit hard to review, since move and addition is in one patch.
Agreed, I'll put the move and the bpf_ prefixing into one patch, since they're
both "no functional changes", and the additions into a separate one.
> I think ppriv and pops are too obscure names.
> May be analyzer_ops and analyzer_priv ?
I'll rename everything as suggested.
> Conceptually looks good.
Thanks!
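
With the suggested names, the shape of the interface would be roughly as
in the sketch below. It is a user-space illustration only: the env struct
is reduced to a few made-up fields, instructions are plain ints, and
analyze() walks them linearly, whereas the real verifier follows control
flow. Only the insn_hook signature mirrors the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

struct analyzer_env;

/* Callbacks supplied by an external user of the verifier. */
struct analyzer_ops {
    int (*insn_hook)(struct analyzer_env *env,
                     int insn_idx, int prev_insn_idx);
};

/* Heavily reduced, hypothetical stand-in for the verifier environment. */
struct analyzer_env {
    const int *insns;                        /* fake instruction stream */
    int insn_cnt;
    const struct analyzer_ops *analyzer_ops; /* external analyzer ops */
    void *analyzer_priv;                     /* external analyzer's private data */
};

/* Visit each instruction, calling the hook before "interpreting" it.
 * A nonzero return from the hook aborts the walk with that error. */
static int analyze(struct analyzer_env *env)
{
    int prev = -1;
    int i, err;

    assert(env != NULL);
    for (i = 0; i < env->insn_cnt; i++) {
        if (env->analyzer_ops && env->analyzer_ops->insn_hook) {
            err = env->analyzer_ops->insn_hook(env, i, prev);
            if (err)
                return err;
        }
        prev = i;
    }
    return 0;
}

/* Example hook: count accepted instructions, reject fake opcode 0xff. */
static int count_hook(struct analyzer_env *env,
                      int insn_idx, int prev_insn_idx)
{
    int *count = env->analyzer_priv;

    (void)prev_insn_idx;
    if (env->insns[insn_idx] == 0xff)
        return -1;
    (*count)++;
    return 0;
}

static const struct analyzer_ops count_ops = {
    .insn_hook = count_hook,
};
```

An env without analyzer_ops set takes no callback at all, so existing
in-kernel callers are unaffected in this model.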
^ permalink raw reply
* Re: [PATCH v4 01/16] vmxnet3: Move PCI Id to pci_ids.h
From: Yuval Shaia @ 2016-09-15 7:55 UTC (permalink / raw)
To: Adit Ranadive
Cc: dledford@redhat.com, linux-rdma@vger.kernel.org, pv-drivers,
netdev@vger.kernel.org, linux-pci@vger.kernel.org,
Jorgen S. Hansen, Aditya Sarwade, George Zhang, Bryan Tan
In-Reply-To: <DM2PR0501MB84422A1029EE72FC727F960C5F10@DM2PR0501MB844.namprd05.prod.outlook.com>
Besides that no more comments.
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
On Wed, Sep 14, 2016 at 07:36:34PM +0000, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 09:25:18 -0700, Yuval Shaia wrote:
> > On Wed, Sep 14, 2016 at 04:00:25PM +0000, Adit Ranadive wrote:
> > > On Wed, Sep 14, 2016 at 04:09:12 -0700, Yuval Shaia wrote:
> > > > Please update vmxnet3_drv.c accordingly.
> > >
> > > Any reason why? I don't think we need to. Vmxnet3 should just pick up
> > > the moved PCI device id from pci_ids.h file.
> >
> > So now you need to include it from vmxnet3_drv.c.
> > Same with pvrdma_main.c
>
> If you're asking me to include pci_ids.h in our drivers we already do that
> by including pci.h in both the drivers.
> pci.h already includes pci_ids.h -
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/pci.h#n35
>
> If that's going to change maybe someone from the PCI group can comment on.
>
> Thanks,
> Adit
^ permalink raw reply
* Re: [PATCHv3 net-next 08/15] nfp: add BPF to NFP code translator
From: Jakub Kicinski @ 2016-09-15 7:53 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: netdev, ast, daniel, jiri, john.fastabend, kubakici, David Miller
In-Reply-To: <20160914231510.GC60248@ast-mbp.thefacebook.com>
On Wed, 14 Sep 2016 16:15:11 -0700, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 08:00:16PM +0100, Jakub Kicinski wrote:
> > Add translator for JITing eBPF to operations which
> > can be executed on NFP's programmable engines.
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > ---
> > v3:
> > - don't clone the program for the verifier (no longer needed);
> > - temporarily add a local copy of macros from bitfield.h.
>
> so what's the status of that other patch? which tree is it going through?
It's in wireless-drivers-next, Kalle says it should be landing in
net-next early next week.
> Does it mean we'd have to wait till after the merge window? :(
> That would be sad, since it looks like it is almost ready.
If it's OK with everyone I was hoping I could have that small local copy
of the macros until bitfield.h gets propagated and then we don't have
to wait :S
^ permalink raw reply
* Re: [PATCH v5 0/6] Add eBPF hooks for cgroups
From: Daniel Mack @ 2016-09-15 8:11 UTC (permalink / raw)
To: Vincent Bernat
Cc: htejun-b10kYP2dOMg, daniel-FeC+5ew28dpmcu3hnIyYJQ,
ast-b10kYP2dOMg, davem-fT/PcQaiUtIeIZ0/mPfg9Q, kafai-b10kYP2dOMg,
fw-HFFVJYpyMKqzQB+pC5nmwQ, pablo-Cap9r6Oaw4JrovVCs/uTlw,
harald-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
sargun-GaZTRHToo+CzQB+pC5nmwQ, cgroups-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <m3y42tlldz.fsf-PiWSfznZvZU/eRriIvX0kg@public.gmane.org>
On 09/15/2016 08:36 AM, Vincent Bernat wrote:
> ❦ 12 September 2016 18:12 CEST, Daniel Mack <daniel-cYrQPVfZoowdnm+yROfE0A@public.gmane.org>:
>
>> * The sample program learned to support both ingress and egress, and
>> can now optionally make the eBPF program drop packets by making it
>> return 0.
>
> Ability to lock the eBPF program to avoid modification from a later
> program or in a subcgroup would be pretty interesting from a security
> perspective.
For now, you can achieve that by dropping CAP_NET_ADMIN after installing
a program between fork and exec. I think that should suffice for a first
version. Flags to further limit that could be added later.
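A minimal userspace sketch of the capability drop suggested above, assuming raw capget/capset syscalls plus prctl(PR_CAPBSET_DROP). The helper names read_caps(), cap_is_effective() and drop_cap_everywhere() are hypothetical and are not part of this patch set. The bounding set is dropped as well because a root process would otherwise regain the capability on exec; that step needs CAP_SETPCAP and fails with EPERM without it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/capability.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read the capability sets of the calling thread (v3 ABI: two 32-bit words). */
static int read_caps(struct __user_cap_header_struct *hdr,
		     struct __user_cap_data_struct data[2])
{
	hdr->version = _LINUX_CAPABILITY_VERSION_3;
	hdr->pid = 0;	/* 0 means the calling thread */
	return (int)syscall(SYS_capget, hdr, data);
}

/* Returns 1 if cap is in the effective set, 0 if not, -1 on error. */
static int cap_is_effective(int cap)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[2];

	if (read_caps(&hdr, data) < 0)
		return -1;
	return (data[cap >> 5].effective >> (cap & 31)) & 1u;
}

/*
 * Hypothetical helper: remove cap from the bounding, effective, permitted
 * and inheritable sets of the calling thread.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int drop_cap_everywhere(int cap)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[2];
	uint32_t mask = 1u << (cap & 31);

	/* Bounding set first: this step itself requires CAP_SETPCAP. */
	if (prctl(PR_CAPBSET_DROP, (unsigned long)cap, 0UL, 0UL, 0UL) < 0)
		return -1;

	if (read_caps(&hdr, data) < 0)
		return -1;

	data[cap >> 5].effective &= ~mask;
	data[cap >> 5].permitted &= ~mask;
	data[cap >> 5].inheritable &= ~mask;

	return (int)syscall(SYS_capset, &hdr, data);
}
```

The intended call site would be the child after fork(): install the program, call drop_cap_everywhere(CAP_NET_ADMIN), then exec the workload. Note that capset() only affects the calling thread.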
Thanks,
Daniel
^ permalink raw reply
* Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Richard Cochran @ 2016-09-15 8:13 UTC (permalink / raw)
To: Grygorii Strashko
Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
linux-omap, WingMan Kwok
In-Reply-To: <20160914130231.3035-4-grygorii.strashko@ti.com>
On Wed, Sep 14, 2016 at 04:02:25PM +0300, Grygorii Strashko wrote:
> The current implementation of CPTS initialization and deinitialization
> (represented by cpts_register/unregister()) is pretty entangled and
> has some issues, like:
> - ptp clock registered before spinlock, which is protecting it, and
> before timecounter and cyclecounter initialization;
> - CPTS ref_clk requested using devm API while cpts_register() is
> called from .ndo_open(), as a result additional checks are required;
> - CPTS ref_clk is prepared, but never unprepared;
> - CPTS is not disabled even when unregistered.
This list of four items is a clear sign that this one patch should be
broken into a series of four.
Thanks,
Richard
^ permalink raw reply
* [PATCH] iproute2: build nsid-name cache only for commands that need it
From: Anton Aksola @ 2016-09-15 8:23 UTC (permalink / raw)
To: netdev
Calling netns_map_init() before command parsing introduced
a performance issue with a large number of namespaces.
As commands such as add, del and exec do not need to iterate through
/var/run/netns, it would be good not to build the cache before executing
these commands.
Example:
unpatched:
time seq 1 1000 | xargs -n 1 ip netns add
real 0m16.832s
user 0m1.350s
sys 0m15.029s
patched:
time seq 1 1000 | xargs -n 1 ip netns add
real 0m3.859s
user 0m0.132s
sys 0m3.205s
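The idea behind the change can be sketched as a stand-alone dispatcher that builds an expensive cache only for the subcommands that read it. This is an illustration, not the iproute2 code: cache_init(), prefix_matches() and dispatch() are hypothetical names, and the "build once" guard is an addition that the patch itself does not contain.

```c
#include <string.h>

static int cache_ready;		/* set once the cache has been built */
static int cache_builds;	/* counts how often the expensive build ran */

/* Stand-in for an expensive scan such as walking /var/run/netns. */
static void cache_init(void)
{
	if (cache_ready)
		return;
	cache_ready = 1;
	cache_builds++;
}

/* Returns 0 when cmd is a non-empty prefix of pattern. */
static int prefix_matches(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	if (len == 0 || len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}

/* Returns a small command id, or -1 for an unknown command. */
static int dispatch(const char *cmd)
{
	if (prefix_matches(cmd, "add") == 0)
		return 1;		/* does not read the cache */

	if (prefix_matches(cmd, "list-id") == 0) {
		cache_init();		/* built only on this path */
		return 2;
	}

	return -1;
}
```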
Signed-off-by: Anton Aksola <aakso@iki.fi>
---
ip/ipnetns.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index af87065..4546fe7 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -775,8 +775,6 @@ static int netns_monitor(int argc, char **argv)
int do_netns(int argc, char **argv)
{
- netns_map_init();
-
if (argc < 1)
return netns_list(0, NULL);
@@ -784,8 +782,10 @@ int do_netns(int argc, char **argv)
(matches(*argv, "lst") == 0))
return netns_list(argc-1, argv+1);
- if ((matches(*argv, "list-id") == 0))
+ if ((matches(*argv, "list-id") == 0)) {
+ netns_map_init();
return netns_list_id(argc-1, argv+1);
+ }
if (matches(*argv, "help") == 0)
return usage();
--
1.8.3.1
^ permalink raw reply related
* Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing
From: Pavel Machek @ 2016-09-15 9:19 UTC (permalink / raw)
To: Mickaël Salaün
Cc: linux-kernel, Alexei Starovoitov, Andy Lutomirski, Arnd Bergmann,
Casey Schaufler, Daniel Borkmann, Daniel Mack, David Drysdale,
David S . Miller, Elena Reshetova, James Morris, Kees Cook,
Paul Moore, Sargun Dhillon, Serge E . Hallyn, Will Drewry,
kernel-hardening, linux-api, linux-security-module, netdev
In-Reply-To: <1472121165-29071-1-git-send-email-mic@digikod.net>
Hi!
> This series is a proof of concept to fill some missing parts of seccomp, such as the
> ability to check syscall argument pointers or to create more dynamic security
> policies. The goal of this new stackable Linux Security Module (LSM) called
> Landlock is to allow any process, including unprivileged ones, to create
> powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or the
> OpenBSD Pledge. This kind of sandbox helps to mitigate the security impact of
> bugs or unexpected/malicious behaviors in userland applications.
>
> The first RFC [1] was focused on extending seccomp while staying at the syscall
> level. This brought a working PoC but with some (mitigated) ToCToU race
> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> syscall argument evaluation (hence the LSM hooks).
Long and nice description follows. Should it go to Documentation/
somewhere?
Because some documentation would be useful...
Pavel
> include/linux/bpf.h | 41 +++++
> include/linux/lsm_hooks.h | 5 +
> include/linux/seccomp.h | 54 ++++++-
> include/uapi/asm-generic/errno-base.h | 1 +
> include/uapi/linux/bpf.h | 103 ++++++++++++
> include/uapi/linux/seccomp.h | 2 +
> kernel/bpf/arraymap.c | 222 +++++++++++++++++++++++++
> kernel/bpf/syscall.c | 18 ++-
> kernel/bpf/verifier.c | 32 +++-
> kernel/fork.c | 41 ++++-
> kernel/seccomp.c | 211 +++++++++++++++++++++++-
> samples/Makefile | 2 +-
> samples/landlock/.gitignore | 1 +
> samples/landlock/Makefile | 16 ++
> samples/landlock/sandbox.c | 295 ++++++++++++++++++++++++++++++++++
> security/Kconfig | 1 +
> security/Makefile | 2 +
> security/landlock/Kconfig | 19 +++
> security/landlock/Makefile | 3 +
> security/landlock/checker_cgroup.c | 96 +++++++++++
> security/landlock/checker_cgroup.h | 18 +++
> security/landlock/checker_fs.c | 183 +++++++++++++++++++++
> security/landlock/checker_fs.h | 20 +++
> security/landlock/lsm.c | 228 ++++++++++++++++++++++++++
> security/security.c | 1 +
> 25 files changed, 1592 insertions(+), 23 deletions(-)
> create mode 100644 samples/landlock/.gitignore
> create mode 100644 samples/landlock/Makefile
> create mode 100644 samples/landlock/sandbox.c
> create mode 100644 security/landlock/Kconfig
> create mode 100644 security/landlock/Makefile
> create mode 100644 security/landlock/checker_cgroup.c
> create mode 100644 security/landlock/checker_cgroup.h
> create mode 100644 security/landlock/checker_fs.c
> create mode 100644 security/landlock/checker_fs.h
> create mode 100644 security/landlock/lsm.c
>
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* Re: [RFC 09/11] Add LL2 RoCE interface
From: Leon Romanovsky @ 2016-09-15 10:20 UTC (permalink / raw)
To: Ram Amrani
Cc: dledford, davem, Yuval.Mintz, Ariel.Elior, Michal.Kalderon,
rajesh.borundia, linux-rdma, netdev
In-Reply-To: <1473696465-27986-10-git-send-email-Ram.Amrani@qlogic.com>
On Mon, Sep 12, 2016 at 07:07:43PM +0300, Ram Amrani wrote:
> Add light L2 interface for RoCE.
>
> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
> Signed-off-by: Ram Amrani <Ram.Amrani@qlogic.com>
> ---
<....>
> + DP_ERR(cdev,
> + "QED RoCE set MAC filter failed - roce_info/ll2 NULL\n");
> + return -EINVAL;
> + }
> +
> + p_ptt = qed_ptt_acquire(QED_LEADING_HWFN(cdev));
> + if (!p_ptt) {
> + DP_ERR(cdev,
> + "qed roce ll2 mac filter set: failed to acquire PTT\n");
> + return -EINVAL;
> + }
Please use a single style for your debug prints: "QED RoCE" vs. "qed roce".
^ permalink raw reply
* Re: [PATCH v2 net-next 1/2] net: phy: Add Edge-rate driver for Microsemi PHYs.
From: Raju Lakkaraju @ 2016-09-15 10:26 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev, f.fainelli, Allan.Nielsen, robh+dt
In-Reply-To: <20160909131832.GB30871@lunn.ch>
Hi Andrew,
Thank you for reviewing the code.
On Fri, Sep 09, 2016 at 03:18:32PM +0200, Andrew Lunn wrote:
> > > > +static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev,
> > > > + u8 edge_rate)
> > >
> > > No spaces place.
> > >
> > I ran checkpatch and did not find any errors. I created another workspace and
> > applied the same patch. It shows the correct alignment. I have used tabs (8 space width),
> > then some spaces to align braces.
>
> Sorry, i worded that poorly. I was meaning between the u8 and edge. A
> single space is enough.
>
I accepted your suggestion.
> > > > +#ifdef CONFIG_OF_MDIO
> > > > +static int vsc8531_of_init(struct phy_device *phydev)
> > > > +{
> > > > + int rc;
> > > > + struct vsc8531_private *vsc8531 = phydev->priv;
> > > > + struct device *dev = &phydev->mdio.dev;
> > > > + struct device_node *of_node = dev->of_node;
> > > > +
> > > > + if (!of_node)
> > > > + return -ENODEV;
> > > > +
> > > > + rc = of_property_read_u8(of_node, "vsc8531,edge-rate",
> > > > + &vsc8531->edge_rate);
> > >
> > > Until you have written the Documentation, it is hard for me to tell,
> > > but device tree bindings should use real units, like seconds, Ohms,
> > > Farads, etc. Is the edge rate in nS? Or is it some magic value which
> > > just gets written into the register?
> > >
> >
> > This is some magic value which just gets written into the register.
>
> Magic values are generally not accepted in device tree bindings. Both
> Micrel and Renesas define their clock skew in ps, for example. Since
> this is rise time, it should also be possible to define it in a unit
> of time.
>
I accept your comment. I discussed the review comments with my hardware
team, and they asked me to define the value in picoseconds.
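One way to keep real units in the binding while still writing a register field is a small lookup table in the driver. The sketch below is only an illustration under stated assumptions: every picosecond value and register code in it is made up and does not come from any Microsemi datasheet, and edge_rate_ps_to_reg() is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

struct edge_rate_entry {
	uint32_t ps;		/* rise time as given in the device tree */
	uint8_t reg_val;	/* value written to the register field */
};

/* Hypothetical values for illustration only; not from a datasheet. */
static const struct edge_rate_entry edge_rate_table[] = {
	{   0, 0 },
	{ 100, 1 },
	{ 200, 2 },
	{ 300, 3 },
};

/*
 * Translate a rise time in picoseconds into a register value.
 * Returns 0 and stores the value on success, -1 if ps is not supported.
 */
static int edge_rate_ps_to_reg(uint32_t ps, uint8_t *reg_val)
{
	size_t i;

	for (i = 0; i < sizeof(edge_rate_table) / sizeof(edge_rate_table[0]); i++) {
		if (edge_rate_table[i].ps == ps) {
			*reg_val = edge_rate_table[i].reg_val;
			return 0;
		}
	}
	return -1;
}
```

Rejecting unknown values instead of rounding keeps a typo in the device tree from silently selecting a different edge rate.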
> > > > static int vsc85xx_config_init(struct phy_device *phydev)
> > > > {
> > > > int rc;
> > > > + struct vsc8531_private *vsc8531;
> > > > +
> > > > + if (!phydev->priv) {
> > >
> > > How can this happen?
> > >
> >
> > The VSC8531 driver doesn't have any private structure assigned initially.
> > priv always points to NULL.
>
> So if it cannot happen, don't check for it.
>
> Also, by convention, you allocate memory in the .probe() function of a
> driver. Please do it there.
>
I accepted your review comment.
I will re-send the patch with updates.
> Andrew
---
Thanks,
Raju.
^ permalink raw reply
* [PATCH net-next 1/2] net sched ife action: add 16 bit helpers
From: Jamal Hadi Salim @ 2016-09-15 10:49 UTC (permalink / raw)
To: davem; +Cc: daniel, xiyou.wangcong, netdev, Jamal Hadi Salim
From: Jamal Hadi Salim <jhs@mojatatu.com>
Add encoder and checker for 16-bit metadata.
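For readers unfamiliar with the layout, here is a userspace sketch of the 8-byte encoding that the checker accounts for: 2 bytes of type, 2 bytes of length, and a 2-byte value padded to 4 bytes. It is not the kernel code. tlv_encode() and encode_meta_u16() are hypothetical names, and the convention that the length field counts the 4-byte header plus the unpadded value is an assumption.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_HDRLEN 4u
#define TLV_ALIGN(len) (((len) + 3u) & ~3u)

/*
 * Write one TLV into buf: type and length in network byte order, then the
 * value zero-padded to a 4-byte boundary. Returns the bytes written.
 * Assumption: the length field covers the header plus the unpadded value.
 */
static size_t tlv_encode(uint8_t *buf, uint16_t type, uint16_t dlen,
			 const void *dval)
{
	size_t total = TLV_HDRLEN + TLV_ALIGN(dlen);
	uint16_t t = htons(type);
	uint16_t l = htons((uint16_t)(TLV_HDRLEN + dlen));

	memcpy(buf, &t, sizeof(t));
	memcpy(buf + 2, &l, sizeof(l));
	memset(buf + TLV_HDRLEN, 0, total - TLV_HDRLEN);
	memcpy(buf + TLV_HDRLEN, dval, dlen);

	return total;
}

/* Encode a 16-bit value; a zero value is skipped and 0 is returned. */
static size_t encode_meta_u16(uint8_t *buf, uint16_t type, uint16_t val)
{
	uint16_t be;

	if (!val)
		return 0;

	be = htons(val);
	return tlv_encode(buf, type, sizeof(be), &be);
}
```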
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
include/net/tc_act/tc_ife.h | 2 ++
net/sched/act_ife.c | 26 ++++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/include/net/tc_act/tc_ife.h b/include/net/tc_act/tc_ife.h
index 5164bd7..9fd2bea0 100644
--- a/include/net/tc_act/tc_ife.h
+++ b/include/net/tc_act/tc_ife.h
@@ -50,9 +50,11 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen,
int ife_alloc_meta_u32(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
int ife_alloc_meta_u16(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
int ife_check_meta_u32(u32 metaval, struct tcf_meta_info *mi);
+int ife_check_meta_u16(u16 metaval, struct tcf_meta_info *mi);
int ife_encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info *mi);
int ife_validate_meta_u32(void *val, int len);
int ife_validate_meta_u16(void *val, int len);
+int ife_encode_meta_u16(u16 metaval, void *skbdata, struct tcf_meta_info *mi);
void ife_release_meta_gen(struct tcf_meta_info *mi);
int register_ife_op(struct tcf_meta_ops *mops);
int unregister_ife_op(struct tcf_meta_ops *mops);
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index e87cd81..ccf7b4b 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -63,6 +63,23 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 dlen, const void *dval)
}
EXPORT_SYMBOL_GPL(ife_tlv_meta_encode);
+int ife_encode_meta_u16(u16 metaval, void *skbdata, struct tcf_meta_info *mi)
+{
+ u16 edata = 0;
+
+ if (mi->metaval)
+ edata = *(u16 *)mi->metaval;
+ else if (metaval)
+ edata = metaval;
+
+ if (!edata) /* will not encode */
+ return 0;
+
+ edata = htons(edata);
+ return ife_tlv_meta_encode(skbdata, mi->metaid, 2, &edata);
+}
+EXPORT_SYMBOL_GPL(ife_encode_meta_u16);
+
int ife_get_meta_u32(struct sk_buff *skb, struct tcf_meta_info *mi)
{
if (mi->metaval)
@@ -81,6 +98,15 @@ int ife_check_meta_u32(u32 metaval, struct tcf_meta_info *mi)
}
EXPORT_SYMBOL_GPL(ife_check_meta_u32);
+int ife_check_meta_u16(u16 metaval, struct tcf_meta_info *mi)
+{
+ if (metaval || mi->metaval)
+ return 8; /* T+L+(V) == 2+2+(2+2bytepad) */
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ife_check_meta_u16);
+
int ife_encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info *mi)
{
u32 edata = metaval;
--
1.9.1
^ permalink raw reply related
* [PATCH net-next 2/2] net sched ife action: Introduce skb tcindex metadata encap decap
From: Jamal Hadi Salim @ 2016-09-15 10:49 UTC (permalink / raw)
To: davem; +Cc: daniel, xiyou.wangcong, netdev, Jamal Hadi Salim
In-Reply-To: <1473936594-5152-1-git-send-email-jhs@emojatatu.com>
From: Jamal Hadi Salim <jhs@mojatatu.com>
Sample use case of how this is encoded:
user space via tuntap (or a connected VM/Machine/container)
encodes the tcindex TLV.
Sample use case of decoding:
IFE action decodes it and the skb->tc_index is then used to classify.
So something like this for encoded ICMP packets:
.. first decode then reclassify... skb->tcindex will be set
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xbeef \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify
...next match the decoded ICMP packet...
sudo $TC filter add dev $ETH parent ffff: prio 4 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue
... last classify it using the tcindex classifier and do some action..
sudo $TC filter add dev $ETH parent ffff: prio 5 protocol ip \
handle 0x11 tcindex classid 1:1 \
action blah..
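The decode side can be sketched in userspace as follows, assuming a TLV with a 2-byte type, a 2-byte length that counts the 4-byte header plus the value, and a 16-bit value in network byte order. tlv_decode_u16() is a hypothetical helper, not a function from this patch. The type 5 and value 0x11 used in the usage note mirror IFE_META_TCINDEX and the handle in the filter example above.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLV_HDRLEN 4u

/*
 * Parse one TLV that carries a 16-bit value.
 * Returns 0 and stores the host-order value in *out on success,
 * -1 if the buffer is too short or the type or length do not match.
 */
static int tlv_decode_u16(const uint8_t *buf, size_t buflen,
			  uint16_t want_type, uint16_t *out)
{
	uint16_t type, len, be_val;

	if (buflen < TLV_HDRLEN + sizeof(be_val))
		return -1;

	memcpy(&type, buf, sizeof(type));
	memcpy(&len, buf + 2, sizeof(len));

	if (ntohs(type) != want_type)
		return -1;
	if (ntohs(len) != TLV_HDRLEN + sizeof(be_val))
		return -1;

	memcpy(&be_val, buf + TLV_HDRLEN, sizeof(be_val));
	*out = ntohs(be_val);
	return 0;
}
```

With the bytes 00 05 00 06 00 11 00 00 and want_type 5, the helper stores 0x11, which a caller could then assign to its equivalent of skb->tc_index.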
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
include/uapi/linux/tc_act/tc_ife.h | 3 +-
net/sched/Kconfig | 5 +++
net/sched/Makefile | 1 +
net/sched/act_meta_skbtcindex.c | 81 ++++++++++++++++++++++++++++++++++++++
4 files changed, 89 insertions(+), 1 deletion(-)
create mode 100644 net/sched/act_meta_skbtcindex.c
diff --git a/include/uapi/linux/tc_act/tc_ife.h b/include/uapi/linux/tc_act/tc_ife.h
index 4ece02a..cd18360 100644
--- a/include/uapi/linux/tc_act/tc_ife.h
+++ b/include/uapi/linux/tc_act/tc_ife.h
@@ -32,8 +32,9 @@ enum {
#define IFE_META_HASHID 2
#define IFE_META_PRIO 3
#define IFE_META_QMAP 4
+#define IFE_META_TCINDEX 5
/*Can be overridden at runtime by module option*/
-#define __IFE_META_MAX 5
+#define __IFE_META_MAX 6
#define IFE_META_MAX (__IFE_META_MAX - 1)
#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 7795d5a..87956a7 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -793,6 +793,11 @@ config NET_IFE_SKBPRIO
depends on NET_ACT_IFE
---help---
+config NET_IFE_SKBTCINDEX
+ tristate "Support to encoding decoding skb tcindex on IFE action"
+ depends on NET_ACT_IFE
+ ---help---
+
config NET_CLS_IND
bool "Incoming device classification"
depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 148ae0d..4bdda36 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_NET_ACT_SKBMOD) += act_skbmod.o
obj-$(CONFIG_NET_ACT_IFE) += act_ife.o
obj-$(CONFIG_NET_IFE_SKBMARK) += act_meta_mark.o
obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
+obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
diff --git a/net/sched/act_meta_skbtcindex.c b/net/sched/act_meta_skbtcindex.c
new file mode 100644
index 0000000..ec43327
--- /dev/null
+++ b/net/sched/act_meta_skbtcindex.c
@@ -0,0 +1,81 @@
+/*
+ * net/sched/act_meta_skbtcindex.c IFE skb->tc_index metadata module
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * copyright Jamal Hadi Salim (2016)
+ *
+*/
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <uapi/linux/tc_act/tc_ife.h>
+#include <net/tc_act/tc_ife.h>
+#include <linux/rtnetlink.h>
+
+static int skbtcindex_encode(struct sk_buff *skb, void *skbdata,
+ struct tcf_meta_info *e)
+{
+ u16 ifetc_index = skb->tc_index;
+
+ return ife_encode_meta_u16(ifetc_index, skbdata, e);
+}
+
+static int skbtcindex_decode(struct sk_buff *skb, void *data, u16 len)
+{
+ u16 ifetc_index = *(u16 *)data;
+
+ skb->tc_index = ntohs(ifetc_index);
+ return 0;
+}
+
+static int skbtcindex_check(struct sk_buff *skb, struct tcf_meta_info *e)
+{
+ return ife_check_meta_u16(skb->tc_index, e);
+}
+
+static struct tcf_meta_ops ife_skbtcindex_ops = {
+ .metaid = IFE_META_TCINDEX,
+ .metatype = NLA_U16,
+ .name = "tc_index",
+ .synopsis = "skb tc_index 16 bit metadata",
+ .check_presence = skbtcindex_check,
+ .encode = skbtcindex_encode,
+ .decode = skbtcindex_decode,
+ .get = ife_get_meta_u16,
+ .alloc = ife_alloc_meta_u16,
+ .release = ife_release_meta_gen,
+ .validate = ife_validate_meta_u16,
+ .owner = THIS_MODULE,
+};
+
+static int __init ifetc_index_init_module(void)
+{
+ pr_emerg("Loaded IFE tc_index\n");
+ return register_ife_op(&ife_skbtcindex_ops);
+}
+
+static void __exit ifetc_index_cleanup_module(void)
+{
+ pr_emerg("Unloaded IFE tc_index\n");
+ unregister_ife_op(&ife_skbtcindex_ops);
+}
+
+module_init(ifetc_index_init_module);
+module_exit(ifetc_index_cleanup_module);
+
+MODULE_AUTHOR("Jamal Hadi Salim(2016)");
+MODULE_DESCRIPTION("Inter-FE skb tc_index metadata module");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_IFE_META(IFE_META_SKBTCINDEX);
--
1.9.1
^ permalink raw reply related