Netdev List
 help / color / mirror / Atom feed
* Re: ipv6: handle -EFAULT from skb_copy_bits
From: Dave Jones @ 2016-12-20  0:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20161219.144848.736886978545428136.davem@davemloft.net>

On Mon, Dec 19, 2016 at 02:48:48PM -0500, David Miller wrote:

 > One thing that's interesting is that if the user picks "IPPROTO_RAW"
 > as the value of 'protocol' we set inet->hdrincl to 1.
 > 
 > The user can also set inet->hdrincl to 1 or 0 via setsockopt().
 > 
 > I think this is part of the problem.  The test above means to check
 > for "RAW socket with hdrincl set" and is trying to do this more simply.
 > But because setsockopt() can set this arbitrarily, testing sk_protocol
 > alone isn't enough.
 > 
 > So changing:
 > 
 > 	sk->sk_protocol == IPPROTO_RAW
 > 
 > into something like:
 > 
 > 	(sk->sk_socket->type == SOCK_RAW && inet_sk(sk)->hdrincl)
 > 
 > should correct the test.
 >  ..
 > 
 > You can test if the change I suggest above works.

Unfortunately, this made no difference.  I spent some time today trying
to make a better reproducer, but failed. I'll revisit again tomorrow.

Maybe I need >1 process/thread to trigger this.  That would explain why
I can trigger it with Trinity.

	Dave

^ permalink raw reply

* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Andy Lutomirski @ 2016-12-20  0:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Daniel Mack, Mickaël Salaün, Kees Cook,
	Jann Horn, Tejun Heo, David Ahern, David S. Miller, Thomas Graf,
	Michael Kerrisk, Peter Zijlstra, Linux API,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Network Development
In-Reply-To: <20161220000254.GA58895-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>

On Mon, Dec 19, 2016 at 4:02 PM, Alexei Starovoitov
<alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Dec 19, 2016 at 01:23:50PM -0800, Andy Lutomirski wrote:
>> On Mon, Dec 19, 2016 at 12:56 PM, Alexei Starovoitov
>> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> > On Sat, Dec 17, 2016 at 10:18:44AM -0800, Andy Lutomirski wrote:
>> >> Hi all-
>> >>
>> >> I apologize for being rather late with this.  I didn't realize that
>> >> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
>> >> it on the linux-api list, so I missed the discussion.
>> >>
>> >> I think that the inet ingress, egress etc filters are a neat feature,
>> >> but I think the API has some issues that will bite us down the road
>> >> if it becomes stable in its current form.
>> >>
>> >> Most of the problems I see are summarized in this transcript:
>> >>
>> >> # mkdir cg2
>> >> # mount -t cgroup2 none cg2
>> >> # mkdir cg2/nosockets
>> >> # strace cgrp_socket_rule cg2/nosockets/ 0
>> >> ...
>> >> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
>> >>
>> >> ^^^^ You can modify a cgroup after opening it O_RDONLY?
>> >>
>> >> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
>> >> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
>> >> log_buf=0x6020c0, kern_version=0}, 48) = 4
>> >>
>> >> ^^^^ This is fine.  The bpf() syscall manipulates bpf objects.
>> >>
>> >> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
>> >>
>> >> ^^^^ This is not so good:
>> >> ^^^^
>> >> ^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
>> >> ^^^^    is manipulating a cgroup.  There's no reason that a socket creation
>> >> ^^^^    filter couldn't be written in a different language (new iptables
>> >> ^^^^    table?  Simple list of address families?), but if that happened,
>> >> ^^^^    then using bpf() to install it would be entirely nonsensical.
>> >
>> > I don't see why it's _modifing_ the cgroup. I'm looking at it as
>> > network stack is using cgroup as an application group that should
>> > invoke bpf program at the certain point in the stack.
>> > imo cgroup management is orthogonal.
>>
>> It is literally modifying the struct cgroup, and, as a practical
>> matter, it's causing membership in the cgroup to have a certain
>> effect.  But rather than pointless arguing, let me propose an
>> alternative API that I think solves most of the problems here.
>>
>> In my model, BPF_PROG_ATTACH and BPF_PROG_DETACH go away completely.
>> Instead, the cgroup gets three new control files:
>> "net.ingress_filter", "net.egress_filter", and
>> "net.socket_create_filter".  Initially, if you read these files, you
>> see "none\n".
>>
>> To attach a bpf filter, you open the file for write and do an ioctl on
>> it.  After doing the ioctl, if you read the file, you'll see
>> "bpf:[hash]\n" where "[hash]" is exactly what you'd see in fdinfo for
>> the bpf program.
>>
>> To detach any type of filter, bpf or otherwise, you open the file for
>> write and write "none\n" (or just "none").
>>
>> If you write anything else to the file, you get -EINVAL.  But, if
>> someone writes a new type of filter (perhaps a simple list of address
>> families), maybe you can enable the filter by writing something
>> appropriate to the file.
>
> I see no difference in what you're proposing vs what is already implemented
> from feature set point of view, but the file approach is very ugly, since
> it's a mismatch to FD style access that bpf is using everywhere.
> In your proposal you'd also need to add bpf prefix everywhere.
> So the control file names should be bpf_inet_ingress, bpf_inet_egress
> and bpf_socket_create.

I think we're still talking past each other.  A big part of the point
of changing it is that none of this is specific to bpf.  You could (in
theory -- I'm not proposing implementing these until there's demand)
have:

net.socket_create_filter = "none": no filter
net.socket_create_filter = "bpf:baadf00d": bpf filter
net.socket_create_filter = "disallow": no sockets created period
net.socket_create_filter = "iptables:foobar": some iptables thingy
net.socket_create_filter = "nft:blahblahblah": some nft thingy
net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and 3

See?  This API is not bpf-specific.  It's an API for filtering.  The
fact that struct cgroup currently contains a member called "bpf" is
purely an artifact of the fact that it currently only supports bpf.
Someone will want to rename it to "filters" some day, and
BPF_PROG_DETACH makes no sense whatsoever as a generic API to detach a
filter.

> If you want to prepare such patch for them, I don't mind,
> but you cannot kill syscall command, since it's more flexible
> and your control-file approach _will_ be obsolete pretty quickly.

BPF_PROG_ATTACH and BPF_PROC_DETACH should be removed regardless IMO.
If you really really want a syscall, make it a new syscall.

>
>> Now the API matches the effect.  A cgroup with something other than
>> "none" in one of its net.*_filter files is a cgroup that filters
>> network activity.  And you get CRIU compatibility for free: CRIU can
>> read the control file and use whatever mechanism is uses for BPF in
>> general to restore the cgroup filter state.   As an added bonus, you
>> get ACLs for free and the ugly multiplexer goes away.
>
> extended bpf is not supported by criu. only classic, so having
> control_file-style attachment doesn't buy us anything.

CRIU will support it some day.  We might as well put fewer obstacles
in their way.

>
>> >> # mkdir cg2/nosockets/sockets
>> >> # /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1
>> >>
>> >> ^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
>> >> ^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
>> >> ^^^^ there would be a chance to refine it.
>> >
>> > i don't see the problem with this behavior. bpf and cgroup are indepedent.
>> > Imange that socket accounting program is attached to cg2/nosockets.
>> > The program is readonly and carry no security meaning.
>> > Why cgroup should prevent creation of cg2/nosockets/foo directory ?
>>
>> I think you're misunderstanding me.  What I'm saying is that, if you
>> allow a cgroup and one of its descendents to both enable the same type
>> of filter, you have just committed to some particular semantics for
>> what happens.  And I think that the current semantics are the *wrong*
>> semantics for default long-term use, so you should either fix the
>> semantics or disable the problematic case.
>
> Are you saying that use cases I provided are also 'wrong'?
> If you insist on that we won't be able to make any forward progress.
> The current semantics is fine for what it's designed for.

Can you explain your use case more clearly?

>
>> >> # echo $$ >cg2/nosockets/sockets/cgroup.procs
>> >> # ping 127.0.0.1
>> >> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>> >> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
>> >
>> > hmm. in this case the admin created a subgroup with a group that allows
>> > ping, so? how's that a problem?
>>
>> I think you're forgetting about namespaces.  There are two different
>> admins here.  There's the admin who created the outer container and
>> said "no sockets here".  Then there's the admin inside the container
>> who said "muahaha, I can create a sub-container and allow sockets
>> *there*".
>
> container management frameworks should be doing sensible
> things. With any infra in the kernel there are many ways to
> create non-sensical setups. It's not a job of the kernel
> to interfere with user space.

What exactly isn't sensible about using cgroup_bpf for containers?

>
>> > In case of vrf I can imagine
>> > a set of application auto-binding to one vrf and smaller
>> > set of applications binding to a different vrf.
>> > Or broader set of applications monitoring all tcp traffic
>> > and subset of them monitoring udp instead.
>> > Those are valid use cases.
>>
>> > I guess you're advocating to run a link list of programs?
>> > That won't be useful for the above use cases, where there is no
>> > reason to run more than one program that control plane
>> > assigned to run for this cgroup.
>>
>> Yes there is, for both monitoring and policy.  If I want to monitor
>> all activity in a cgroup, I probably want to monitor descendents as
>> well.  If I want to restrict a cgroup, I want to restrict its
>> descendents.
>
> you're ignoring use cases I described earlier.
> In vrf case there is only one ifindex it needs to bind to.

I'm totally lost.  Can you explain what this has to do with the cgroup
hierarchy?

>> > At this stage we decided to allow only one program per cgroup per hook
>> > and later can extend it if necessary.
>>
>> No you can't.  Since you allow the problematic case and you gave it
>> the surprising "one program" semantics, you can't change it later.
>
> please show me why we cannot? As far as I can see nothing prevents
> that in the future. We can add any number of new fields to
> BPF_PROG_ATTACH command just like we did in the past with
> other commands, whereas control file interface is not extensible.

Because people are going to start using the old API, tools won't be
aware of the new semantics, you have no usable introspection
mechanism, and everyone is going to screw it up.  But it's even worse:
because global privilege is currently needed to set up these filters,
containers really can use it today, but once you switch to ns_capable,
then it suddenly becomes insecure.  And *that* is something that you
can't do.

>
>> > As you're pointing out, in case of security, we probably
>> > want to preserve original bpf program that should always be
>> > run first and only after it returned 'ok' (we'd need to define
>> > what 'ok' means in secruity context) run program attached to sub-hierarchy.
>>
>> It's already defined AFAICT.  1 means okay.  0 means not okay.
>
> sorry that doesn't make any sense. For seccomp we have a set of
> ranges that mean different things. Here you're proposing to
> hastily assign 1 and 0 ? How is that extensible?
> We need to carefully think through what should be the semantics
> of attaching multiple programs, consider performance implications,
> return codes and so on.

You already assigned it.  The return value of the bpf program, loaded
in Linus' tree today, tells the kernel whether to accept or reject.

>
>> > Another alternative is to disallow attaching programs in sub-hierarchy
>> > if parent has something already attached, but it's not useful
>> > for general case.
>> > All of these are possible future extensions.
>>
>> I like this last one, but IT'S NOT A POSSIBLE FUTURE EXTENSION.  You
>> have to do it now (or disable the feature for 4.10).  This is why I'm
>> bringing this whole thing up now.
>
> We don't have to touch user visible api here, so extensions are fine.

Huh?  My example in the original email attaches a program in a
sub-hierarchy.  Are you saying that 4.11 could make that example stop
working?

>
>> >> In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
>> >> programs attached that can do things if various events occur. (Right
>> >> now, this means socket operations, but there are plans in the works
>> >> to do this for LSM hooks too.) These bpf programs can say yes or no,
>> >> but they can also read out various data (including socket payloads!)
>> >> and save them away where an attacker can find them. This sounds a
>> >> lot like seccomp with a narrower scope but a much stronger ability to
>> >> exfiltrate private information.
>> >
>> > that sounds like a future problem to solve when bpf+lsm+cgroup are
>> > used for security.
>>
>> [...]
>>
>> >
>> > I disagree with the urgency. I see nothing that needs immediate action.
>> > bpf+lsm+cgroup is not in the tree yet.
>> >
>>
>> I disagree very strongly here.  The API in 4.10 is IMO quite ugly, but
>> the result if bpf+lsm+cgroup works *differently* will be far, far
>> uglier.
>
> again we're talking about the future here and 'ugly' is the matter of taste.
> I hear all the time that people say that netlink api is ugly, so?

Yeah, it's ugly, but that ship already sailed.  This ship hasn't.

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Francois Romieu @ 2016-12-20  0:05 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Lino Sanfilippo, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <20161219100215.GA6296@amd>

Pavel Machek <pavel@ucw.cz> :
[...]
> Considering the memory barriers... is something like this neccessary
> in the via-rhine ?

Yes.

> AFAICT... we need a barrier after making sure that descriptor is no
> longer owned by DMA (to make sure we don't get stale data in rest of
> descriptor)... and we need a barrier before giving the descriptor to
> the dma, to make sure DMA engine sees the complete update....?

I would not expect stale data while processing a single transmit
descriptor as the transmit completion does not use the rest of the
descriptor at all in the via-rhine driver. However I agree that transmit
descriptors should be read by the cpu with adequate ordering so the
dma_rmb() should stay.

Same kind of narrative for dma_wmb rhine_rx (s/read/written/ and
s/cpu/device/).

> diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
> index ba5c542..3806e72 100644
> --- a/drivers/net/ethernet/via/via-rhine.c
> +++ b/drivers/net/ethernet/via/via-rhine.c
[...]
> @@ -2061,6 +2062,7 @@ static int rhine_rx(struct net_device *dev, int limit)
>  
>  		if (desc_status & DescOwn)
>  			break;
> +		dma_rmb();
>  

I agree with your explanation for this one (late vlan processing in a 
different word from the same descriptor).

-- 
Ueimor

^ permalink raw reply

* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Alexei Starovoitov @ 2016-12-20  0:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Mack, Mickaël Salaün, Kees Cook, Jann Horn,
	Tejun Heo, David Ahern, David S. Miller, Thomas Graf,
	Michael Kerrisk, Peter Zijlstra, Linux API,
	linux-kernel@vger.kernel.org, Network Development
In-Reply-To: <CALCETrWr5XMkexdGp7HdkiLkQV=P9ycj+sNO7xWSRoCVxihVZA@mail.gmail.com>

On Mon, Dec 19, 2016 at 01:23:50PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 19, 2016 at 12:56 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Sat, Dec 17, 2016 at 10:18:44AM -0800, Andy Lutomirski wrote:
> >> Hi all-
> >>
> >> I apologize for being rather late with this.  I didn't realize that
> >> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
> >> it on the linux-api list, so I missed the discussion.
> >>
> >> I think that the inet ingress, egress etc filters are a neat feature,
> >> but I think the API has some issues that will bite us down the road
> >> if it becomes stable in its current form.
> >>
> >> Most of the problems I see are summarized in this transcript:
> >>
> >> # mkdir cg2
> >> # mount -t cgroup2 none cg2
> >> # mkdir cg2/nosockets
> >> # strace cgrp_socket_rule cg2/nosockets/ 0
> >> ...
> >> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
> >>
> >> ^^^^ You can modify a cgroup after opening it O_RDONLY?
> >>
> >> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
> >> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
> >> log_buf=0x6020c0, kern_version=0}, 48) = 4
> >>
> >> ^^^^ This is fine.  The bpf() syscall manipulates bpf objects.
> >>
> >> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
> >>
> >> ^^^^ This is not so good:
> >> ^^^^
> >> ^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
> >> ^^^^    is manipulating a cgroup.  There's no reason that a socket creation
> >> ^^^^    filter couldn't be written in a different language (new iptables
> >> ^^^^    table?  Simple list of address families?), but if that happened,
> >> ^^^^    then using bpf() to install it would be entirely nonsensical.
> >
> > I don't see why it's _modifing_ the cgroup. I'm looking at it as
> > network stack is using cgroup as an application group that should
> > invoke bpf program at the certain point in the stack.
> > imo cgroup management is orthogonal.
> 
> It is literally modifying the struct cgroup, and, as a practical
> matter, it's causing membership in the cgroup to have a certain
> effect.  But rather than pointless arguing, let me propose an
> alternative API that I think solves most of the problems here.
> 
> In my model, BPF_PROG_ATTACH and BPF_PROG_DETACH go away completely.
> Instead, the cgroup gets three new control files:
> "net.ingress_filter", "net.egress_filter", and
> "net.socket_create_filter".  Initially, if you read these files, you
> see "none\n".
> 
> To attach a bpf filter, you open the file for write and do an ioctl on
> it.  After doing the ioctl, if you read the file, you'll see
> "bpf:[hash]\n" where "[hash]" is exactly what you'd see in fdinfo for
> the bpf program.
> 
> To detach any type of filter, bpf or otherwise, you open the file for
> write and write "none\n" (or just "none").
> 
> If you write anything else to the file, you get -EINVAL.  But, if
> someone writes a new type of filter (perhaps a simple list of address
> families), maybe you can enable the filter by writing something
> appropriate to the file.

I see no difference in what you're proposing vs what is already implemented
from feature set point of view, but the file approach is very ugly, since
it's a mismatch to FD style access that bpf is using everywhere.
In your proposal you'd also need to add bpf prefix everywhere.
So the control file names should be bpf_inet_ingress, bpf_inet_egress
and bpf_socket_create.
If you want to prepare such patch for them, I don't mind,
but you cannot kill syscall command, since it's more flexible
and your control-file approach _will_ be obsolete pretty quickly.

> Now the API matches the effect.  A cgroup with something other than
> "none" in one of its net.*_filter files is a cgroup that filters
> network activity.  And you get CRIU compatibility for free: CRIU can
> read the control file and use whatever mechanism is uses for BPF in
> general to restore the cgroup filter state.   As an added bonus, you
> get ACLs for free and the ugly multiplexer goes away.

extended bpf is not supported by criu. only classic, so having
control_file-style attachment doesn't buy us anything.

> >> # mkdir cg2/nosockets/sockets
> >> # /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1
> >>
> >> ^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
> >> ^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
> >> ^^^^ there would be a chance to refine it.
> >
> > i don't see the problem with this behavior. bpf and cgroup are indepedent.
> > Imange that socket accounting program is attached to cg2/nosockets.
> > The program is readonly and carry no security meaning.
> > Why cgroup should prevent creation of cg2/nosockets/foo directory ?
> 
> I think you're misunderstanding me.  What I'm saying is that, if you
> allow a cgroup and one of its descendents to both enable the same type
> of filter, you have just committed to some particular semantics for
> what happens.  And I think that the current semantics are the *wrong*
> semantics for default long-term use, so you should either fix the
> semantics or disable the problematic case.

Are you saying that use cases I provided are also 'wrong'?
If you insist on that we won't be able to make any forward progress.
The current semantics is fine for what it's designed for.

> >> # echo $$ >cg2/nosockets/sockets/cgroup.procs
> >> # ping 127.0.0.1
> >> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> >> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
> >
> > hmm. in this case the admin created a subgroup with a group that allows
> > ping, so? how's that a problem?
> 
> I think you're forgetting about namespaces.  There are two different
> admins here.  There's the admin who created the outer container and
> said "no sockets here".  Then there's the admin inside the container
> who said "muahaha, I can create a sub-container and allow sockets
> *there*".

container management frameworks should be doing sensible
things. With any infra in the kernel there are many ways to
create non-sensical setups. It's not a job of the kernel
to interfere with user space.

> > In case of vrf I can imagine
> > a set of application auto-binding to one vrf and smaller
> > set of applications binding to a different vrf.
> > Or broader set of applications monitoring all tcp traffic
> > and subset of them monitoring udp instead.
> > Those are valid use cases.
> 
> > I guess you're advocating to run a link list of programs?
> > That won't be useful for the above use cases, where there is no
> > reason to run more than one program that control plane
> > assigned to run for this cgroup.
> 
> Yes there is, for both monitoring and policy.  If I want to monitor
> all activity in a cgroup, I probably want to monitor descendents as
> well.  If I want to restrict a cgroup, I want to restrict its
> descendents.

you're ignoring use cases I described earlier.
In vrf case there is only one ifindex it needs to bind to.

> In the case where I actually don't want to hook the descendents, I'd
> be find with having an option to turn off recursion.  Maybe
> net.egress_filter could also say "bpf(overridable):[hash]" or
> "bpf(nonrecursive):[hash]".  But you should have to opt in to allowing
> your filter to be overridden.

'opt in' only makes sense if you're implementing sandboxing environment.
It doesn't make sense as the default.

> > At this stage we decided to allow only one program per cgroup per hook
> > and later can extend it if necessary.
> 
> No you can't.  Since you allow the problematic case and you gave it
> the surprising "one program" semantics, you can't change it later.

please show me why we cannot? As far as I can see nothing prevents
that in the future. We can add any number of new fields to
BPF_PROG_ATTACH command just like we did in the past with
other commands, whereas control file interface is not extensible.

> > As you're pointing out, in case of security, we probably
> > want to preserve original bpf program that should always be
> > run first and only after it returned 'ok' (we'd need to define
> > what 'ok' means in secruity context) run program attached to sub-hierarchy.
> 
> It's already defined AFAICT.  1 means okay.  0 means not okay.

sorry that doesn't make any sense. For seccomp we have a set of
ranges that mean different things. Here you're proposing to
hastily assign 1 and 0 ? How is that extensible?
We need to carefully think through what should be the semantics
of attaching multiple programs, consider performance implications,
return codes and so on.

> > Another alternative is to disallow attaching programs in sub-hierarchy
> > if parent has something already attached, but it's not useful
> > for general case.
> > All of these are possible future extensions.
> 
> I like this last one, but IT'S NOT A POSSIBLE FUTURE EXTENSION.  You
> have to do it now (or disable the feature for 4.10).  This is why I'm
> bringing this whole thing up now.

We don't have to touch user visible api here, so extensions are fine.

> >> In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
> >> programs attached that can do things if various events occur. (Right
> >> now, this means socket operations, but there are plans in the works
> >> to do this for LSM hooks too.) These bpf programs can say yes or no,
> >> but they can also read out various data (including socket payloads!)
> >> and save them away where an attacker can find them. This sounds a
> >> lot like seccomp with a narrower scope but a much stronger ability to
> >> exfiltrate private information.
> >
> > that sounds like a future problem to solve when bpf+lsm+cgroup are
> > used for security.
> 
> [...]
> 
> >
> > I disagree with the urgency. I see nothing that needs immediate action.
> > bpf+lsm+cgroup is not in the tree yet.
> >
> 
> I disagree very strongly here.  The API in 4.10 is IMO quite ugly, but
> the result if bpf+lsm+cgroup works *differently* will be far, far
> uglier.

again we're talking about the future here and 'ugly' is the matter of taste.
I hear all the time that people say that netlink api is ugly, so?

> I think the right solution here is to clean up the API so that it'll
> work for future extensions that people are already imagining.  If this
> can happen for 4.10, great.  If not, then postpone this stuff
> entirely.  I've had code I've written for Linux postponed extensively
> until I've gotten the API right, and it's not so bad.  (Actually, I've
> even had API changes I've made reverted in -stable, IIRC.  This is
> much worse than postponing before a release.)

huh? 'not right api' because it's using bpf syscall instead
of cgroup control-file? I think the opposite is the truth.
syscall style is more extensible whereas control-file and text
based interfaces have proven to be huge pain to extend and maintain.
Again, I don't mind if you want to have both available.

^ permalink raw reply

* Re: mlx4: Bug in XDP_TX + 16 rx-queues
From: Martin KaFai Lau @ 2016-12-19 23:37 UTC (permalink / raw)
  To: Tariq Toukan; +Cc: Saeed Mahameed, netdev@vger.kernel.org, Alexei Starovoitov
In-Reply-To: <20161217101803.GB8732@kafai-mba.local>

Hi Tariq,

On Sat, Dec 17, 2016 at 02:18:03AM -0800, Martin KaFai Lau wrote:
> Hi All,
>
> I have been debugging with XDP_TX and 16 rx-queues.
>
> 1) When 16 rx-queues is used and an XDP prog is doing XDP_TX,
> it seems that the packet cannot be XDP_TX out if the pkt
> is received from some particular CPUs (/rx-queues).
>
> 2) If 8 rx-queues is used, it does not have problem.
>
> 3) The 16 rx-queues problem also went away after reverting these
> two patches:
> 15fca2c8eb41 net/mlx4_en: Add ethtool statistics for XDP cases
> 67f8b1dcb9ee net/mlx4_en: Refactor the XDP forwarding rings scheme
>
After taking a closer look at 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
and armed with the fact that '>8 rx-queues does not work', I have
made the attached change that fixed the issue.

Making change in mlx4_en_fill_qp_context() could be an easier fix
but I think this change will be easier for discussion purpose.

I don't want to lie that I know anything about how this variable
works in CX3.  If this change makes sense, I can cook up a diff.
Otherwise, can you shed some light on what could be happening
and hopefully can lead to a diff?

Thanks
--Martin


diff --git i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index bcd955339058..b3bfb987e493 100644
--- i/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ w/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1638,10 +1638,10 @@ int mlx4_en_start_port(struct net_device *dev)

 	/* Configure tx cq's and rings */
 	for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) {
-		u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1;
-
 		for (i = 0; i < priv->tx_ring_num[t]; i++) {
 			/* Configure cq */
+			int user_prio;
+
 			cq = priv->tx_cq[t][i];
 			err = mlx4_en_activate_cq(priv, cq, i);
 			if (err) {
@@ -1660,9 +1660,14 @@ int mlx4_en_start_port(struct net_device *dev)

 			/* Configure ring */
 			tx_ring = priv->tx_ring[t][i];
+			if (t != TX_XDP)
+				user_prio = i / priv->num_tx_rings_p_up;
+			else
+				user_prio = i & 0x07;
+
 			err = mlx4_en_activate_tx_ring(priv, tx_ring,
 						       cq->mcq.cqn,
-						       i / num_tx_rings_p_up);
+						       user_prio);
 			if (err) {
 				en_err(priv, "Failed allocating Tx ring\n");
 				mlx4_en_deactivate_cq(priv, cq);

^ permalink raw reply related

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Geoff Lansberry @ 2016-12-19 23:23 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-wireless, Lauro Ramos Venancio, Aloisio Almeida Jr,
	Samuel Ortiz, mark.rutland-5wv7dgnIgG8,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Mark Greer, Justin Bronder
In-Reply-To: <20161219223111.itu77g7wsqtj6cvs@rob-hp-laptop>

I can make that change, however, I worry that it may be a bit
misleading, since there are only two supported clock frequencies, but
a number like that to me implies that it could be set to any number
you want.   I'm new at this, and so I'll go ahead and change it as you
request, but I'd like to hear your thoughts on my concern.

Thanks
Geoff
Geoff Lansberry


Engineering Guy
Kuvée, Inc
125 Kingston St., 3rd Floor
Boston, MA 02111
1-617-290-1118 (m)
geoff.lansberry (skype)
http://www.kuvee.com



On Mon, Dec 19, 2016 at 5:31 PM, Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Thu, Dec 15, 2016 at 05:30:42PM -0500, Geoff Lansberry wrote:
>> From: Geoff Lansberry <geoff-R+k406RtEhcAvxtiuMwx3w@public.gmane.org>
>>
>> ---
>>  .../devicetree/bindings/net/nfc/trf7970a.txt       |  3 ++
>>  drivers/nfc/trf7970a.c                             | 42 ++++++++++++++++------
>>  2 files changed, 34 insertions(+), 11 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> index 32b35a0..9dda879 100644
>> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
>> @@ -21,6 +21,8 @@ Optional SoC Specific Properties:
>>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>>    where an extra byte is returned by Read Multiple Block commands issued
>>    to Type 5 tags.
>> +- crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
>> +
>
> Can't you use 'clock-frequency = "27000000";'?
>
>>
>>  Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>>
>> @@ -43,6 +45,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>>               irq-status-read-quirk;
>>               en2-rf-quirk;
>>               t5t-rmb-extra-byte-quirk;
>> +             crystal_27mhz;
>>               status = "okay";
>>       };
>>  };
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 2/3] Bluetooth: btusb: Add out-of-band wakeup support
From: Brian Norris @ 2016-12-19 23:10 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Rob Herring, Mark Rutland, Marcel Holtmann, Gustavo Padovan,
	Johan Hedberg, Amitkumar Karwar, Wei-Ning Huang, Xinming Hu,
	netdev, devicetree, linux-bluetooth, linux-kernel, rajatxjain
In-Reply-To: <1481916604-114279-2-git-send-email-rajatja@google.com>

Hi Rajat,

On Fri, Dec 16, 2016 at 11:30:03AM -0800, Rajat Jain wrote:
> Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that
> can be connected to a gpio on the CPU side, and can be used to wakeup
> the host out-of-band. This can be useful in situations where the
> in-band wakeup is not possible or not preferable (e.g. the in-band
> wakeup may require the USB host controller to remain active, and
> hence consuming more system power during system sleep).
> 
> The oob gpio interrupt to be used for wakeup on the CPU side, is
> read from the device tree node, (using standard interrupt descriptors).
> A devcie tree binding document is also added for the driver. The
> compatible string is in compliance with
> Documentation/devicetree/bindings/usb/usb-device.txt
> 
> Signed-off-by: Rajat Jain <rajatja@google.com>

A few small comments below, but other than those, for the whole series:

Reviewed-by: Brian Norris <briannorris@chromium.org>

> ---
> v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt.
>     * Leave it on device tree to specify IRQ flags (level /edge triggered)
>     * Mark the device as non wakeable on exit.
> 
>  Documentation/devicetree/bindings/net/btusb.txt | 40 ++++++++++++
>  drivers/bluetooth/btusb.c                       | 84 +++++++++++++++++++++++++
>  2 files changed, 124 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/btusb.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/btusb.txt b/Documentation/devicetree/bindings/net/btusb.txt
> new file mode 100644
> index 0000000..2c0355c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/btusb.txt
> @@ -0,0 +1,40 @@
> +Generic Bluetooth controller over USB (btusb driver)
> +---------------------------------------------------
> +
> +Required properties:
> +
> +  - compatible : should comply with the format "usbVID,PID" specified in
> +		 Documentation/devicetree/bindings/usb/usb-device.txt
> +		 At the time of writing, the only OF supported devices
> +		 (more may be added later) are:
> +
> +		  "usb1286,204e" (Marvell 8997)

I don't know if anyone cares about these sort of organizational aspects,
but in patch 3 you're extending this binding with device-specific
Marvell bindings. Seems like it'd be good to have some clear way to
correlate those 2. e.g., add a reference to marvell-bt-sd8xxx.txt (or
marvell-bt-8xxx.txt, once you rename it) in patch 3.

> +
> +Optional properties:
> +
> +  - interrupt-parent: phandle of the parent interrupt controller
> +  - interrupt-names: (see below)
> +  - interrupts : The interrupt specified by the name "wakeup" is the interrupt
> +		 that shall be used for out-of-band wake-on-bt. Driver will
> +		 request this interrupt for wakeup. During system suspend, the
> +		 irq will be enabled so that the bluetooth chip can wakeup host
> +		 platform out of band. During system resume, the irq will be
> +		 disabled to make sure unnecessary interrupt is not received.
> +
> +Example:
> +
> +Following example uses irq pin number 3 of gpio0 for out of band wake-on-bt:
> +
> +&usb_host1_ehci {
> +    status = "okay";
> +    #address-cells = <1>;
> +    #size-cells = <0>;
> +
> +    mvl_bt1: bt@1 {
> +	compatible = "usb1286,204e";
> +	reg = <1>;
> +	interrupt-parent = <&gpio0>;
> +	interrupt-name = "wakeup";
> +	interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
> +    };
> +};
> diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
> index ce22cef..beca4e9 100644
> --- a/drivers/bluetooth/btusb.c
> +++ b/drivers/bluetooth/btusb.c
> @@ -24,6 +24,8 @@
>  #include <linux/module.h>
>  #include <linux/usb.h>
>  #include <linux/firmware.h>
> +#include <linux/of_device.h>
> +#include <linux/of_irq.h>
>  #include <asm/unaligned.h>
>  
>  #include <net/bluetooth/bluetooth.h>
> @@ -369,6 +371,7 @@ static const struct usb_device_id blacklist_table[] = {
>  #define BTUSB_BOOTING		9
>  #define BTUSB_RESET_RESUME	10
>  #define BTUSB_DIAG_RUNNING	11
> +#define BTUSB_OOB_WAKE_DISABLED	12
>  
>  struct btusb_data {
>  	struct hci_dev       *hdev;
> @@ -416,6 +419,8 @@ struct btusb_data {
>  	int (*recv_bulk)(struct btusb_data *data, void *buffer, int count);
>  
>  	int (*setup_on_usb)(struct hci_dev *hdev);
> +
> +	int oob_wake_irq;   /* irq for out-of-band wake-on-bt */
>  };
>  
>  static inline void btusb_free_frags(struct btusb_data *data)
> @@ -2728,6 +2733,65 @@ static int btusb_bcm_set_diag(struct hci_dev *hdev, bool enable)
>  }
>  #endif
>  
> +#ifdef CONFIG_PM
> +static irqreturn_t btusb_oob_wake_handler(int irq, void *priv)
> +{
> +	struct btusb_data *data = priv;
> +
> +	/* Disable only if not already disabled (keep it balanced) */
> +	if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
> +		disable_irq_nosync(irq);
> +		disable_irq_wake(irq);
> +	}
> +	pm_wakeup_event(&data->udev->dev, 0);
> +	return IRQ_HANDLED;
> +}
> +
> +static const struct of_device_id btusb_match_table[] = {
> +	{ .compatible = "usb1286,204e" },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(of, btusb_match_table);

You define a match table here, but you also define essentially same
table for Marvell-specific additions in patch 3. It looks like maybe
it's legal to have more than one OF table in a module? But it seems like
it would get confusing, besides being somewhat strange to maintain. It
might also produce duplicate 'modinfo' output.

If you really want to independently opt into device-tree-specified
interrupts vs. Marvell-specific interrrupt configuration, then you
should probably just merge the latter into the former table, and
implement a Marvell/GPIO flag to stick in the .data field of this table.

Or it might be fine to drop one or both "match" checks. Particularly for
the Marvell-specific stuff, it's probably fair just to check if it has
an ->of_node and if 'id->driver_info & BTUSB_MARVELL'. Any other Marvell
device-specific quirks could probably be keyed off of the
(weirdly-named?) blacklist_table[], which already matches PID/VID.

> +
> +/* Use an oob wakeup pin? */
> +static int btusb_config_oob_wake(struct hci_dev *hdev)
> +{
> +	struct btusb_data *data = hci_get_drvdata(hdev);
> +	struct device *dev = &data->udev->dev;
> +	int irq, ret;
> +
> +	if (!of_match_device(btusb_match_table, dev))
> +		return 0;
> +
> +	/* Move on if no IRQ specified */
> +	irq = of_irq_get_byname(dev->of_node, "wakeup");
> +	if (irq <= 0) {
> +		bt_dev_dbg(hdev, "%s: no OOB Wakeup IRQ in DT", __func__);
> +		return 0;
> +	}
> +
> +	set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags);
> +
> +	ret = devm_request_irq(&hdev->dev, irq, btusb_oob_wake_handler,
> +			       0, "OOB Wake-on-BT", data);
> +	if (ret) {
> +		bt_dev_err(hdev, "%s: IRQ request failed", __func__);
> +		return ret;
> +	}
> +
> +	ret = device_init_wakeup(dev, true);
> +	if (ret) {
> +		bt_dev_err(hdev, "%s: failed to init_wakeup\n", __func__);
> +		return ret;
> +	}
> +
> +	data->oob_wake_irq = irq;
> +	disable_irq(irq);
> +	bt_dev_info(hdev, "OOB Wake-on-BT configured at IRQ %u\n", irq);
> +	return 0;
> +}
> +#endif
> +
>  static int btusb_probe(struct usb_interface *intf,
>  		       const struct usb_device_id *id)
>  {
> @@ -2849,6 +2913,11 @@ static int btusb_probe(struct usb_interface *intf,
>  	hdev->send   = btusb_send_frame;
>  	hdev->notify = btusb_notify;
>  
> +#ifdef CONFIG_PM
> +	err = btusb_config_oob_wake(hdev);
> +	if (err)
> +		goto out_free_dev;
> +#endif
>  	if (id->driver_info & BTUSB_CW6622)
>  		set_bit(HCI_QUIRK_BROKEN_STORED_LINK_KEY, &hdev->quirks);
>  
> @@ -3061,6 +3130,9 @@ static void btusb_disconnect(struct usb_interface *intf)
>  			usb_driver_release_interface(&btusb_driver, data->isoc);
>  	}
>  
> +	if (data->oob_wake_irq)
> +		device_init_wakeup(&data->udev->dev, false);
> +
>  	hci_free_dev(hdev);
>  }
>  
> @@ -3089,6 +3161,12 @@ static int btusb_suspend(struct usb_interface *intf, pm_message_t message)
>  	btusb_stop_traffic(data);
>  	usb_kill_anchored_urbs(&data->tx_anchor);
>  
> +	if (data->oob_wake_irq && device_may_wakeup(&data->udev->dev)) {
> +		clear_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags);
> +		enable_irq_wake(data->oob_wake_irq);
> +		enable_irq(data->oob_wake_irq);
> +	}
> +
>  	/* Optionally request a device reset on resume, but only when
>  	 * wakeups are disabled. If wakeups are enabled we assume the
>  	 * device will stay powered up throughout suspend.
> @@ -3126,6 +3204,12 @@ static int btusb_resume(struct usb_interface *intf)
>  	if (--data->suspend_count)

Not related to your patch exactly, but isn't this 'suspend_count' stuff
useless? I'd send a patch, but it might conflict with yours. Just wanted
to note it and see if I'm crazy...

Brian

>  		return 0;
>  
> +	/* Disable only if not already disabled (keep it balanced) */
> +	if (!test_and_set_bit(BTUSB_OOB_WAKE_DISABLED, &data->flags)) {
> +		disable_irq(data->oob_wake_irq);
> +		disable_irq_wake(data->oob_wake_irq);
> +	}
> +
>  	if (!test_bit(HCI_RUNNING, &hdev->flags))
>  		goto done;
>  
> -- 
> 2.8.0.rc3.226.g39d4020
> 

^ permalink raw reply

* [PATCH net 2/2] net: netcp: ethss: fix 10gbe host port tx pri map configuration
From: Murali Karicheri @ 2016-12-19 22:55 UTC (permalink / raw)
  To: netdev, davem, linux-kernel
In-Reply-To: <1482188157-24490-1-git-send-email-m-karicheri2@ti.com>

From: WingMan Kwok <w-kwok2@ti.com>

This patch adds the missing 10gbe host port tx priority map
configurations.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
 drivers/net/ethernet/ti/netcp_ethss.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index a31931c..7d9e36f 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -94,6 +94,7 @@
 
 /* offset relative to base of XGBE_SS_REG_INDEX */
 #define XGBE10_SGMII_MODULE_OFFSET	0x100
+#define IS_SS_ID_XGBE(d)		((d)->ss_version == XGBE_SS_VERSION_10)
 /* offset relative to base of XGBE_SM_REG_INDEX */
 #define XGBE10_HOST_PORT_OFFSET		0x34
 #define XGBE10_SLAVE_PORT_OFFSET	0x64
@@ -2322,7 +2323,7 @@ static void gbe_init_host_port(struct gbe_priv *priv)
 	int bypass_en = 1;
 
 	/* Host Tx Pri */
-	if (IS_SS_ID_NU(priv))
+	if (IS_SS_ID_NU(priv) || IS_SS_ID_XGBE(priv))
 		writel(HOST_TX_PRI_MAP_DEFAULT,
 		       GBE_REG_ADDR(priv, host_port_regs, tx_pri_map));
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net 1/2] net: netcp: ethss: fix errors in ethtool ops
From: Murali Karicheri @ 2016-12-19 22:55 UTC (permalink / raw)
  To: netdev, davem, linux-kernel

From: WingMan Kwok <w-kwok2@ti.com>

In ethtool ops, it needs to retrieve the corresponding
ethss module (gbe or xgbe) from the net_device structure.
Prior to this patch, the retrieving procedure only
checks for the gbe module.  This patch fixes the issue
by checking the xgbe module if the net_device structure
does not correspond to the gbe module.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
---
 drivers/net/ethernet/ti/netcp_ethss.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c b/drivers/net/ethernet/ti/netcp_ethss.c
index c7e547e..a31931c 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -1746,6 +1746,17 @@ static void keystone_set_msglevel(struct net_device *ndev, u32 value)
 	netcp->msg_enable = value;
 }
 
+static struct gbe_intf *keystone_get_intf_data(struct netcp_intf *netcp)
+{
+	struct gbe_intf *gbe_intf;
+
+	gbe_intf = netcp_module_get_intf_data(&gbe_module, netcp);
+	if (!gbe_intf)
+		gbe_intf = netcp_module_get_intf_data(&xgbe_module, netcp);
+
+	return gbe_intf;
+}
+
 static void keystone_get_stat_strings(struct net_device *ndev,
 				      uint32_t stringset, uint8_t *data)
 {
@@ -1754,7 +1765,7 @@ static void keystone_get_stat_strings(struct net_device *ndev,
 	struct gbe_priv *gbe_dev;
 	int i;
 
-	gbe_intf = netcp_module_get_intf_data(&gbe_module, netcp);
+	gbe_intf = keystone_get_intf_data(netcp);
 	if (!gbe_intf)
 		return;
 	gbe_dev = gbe_intf->gbe_dev;
@@ -1778,7 +1789,7 @@ static int keystone_get_sset_count(struct net_device *ndev, int stringset)
 	struct gbe_intf *gbe_intf;
 	struct gbe_priv *gbe_dev;
 
-	gbe_intf = netcp_module_get_intf_data(&gbe_module, netcp);
+	gbe_intf = keystone_get_intf_data(netcp);
 	if (!gbe_intf)
 		return -EINVAL;
 	gbe_dev = gbe_intf->gbe_dev;
@@ -1896,7 +1907,7 @@ static void keystone_get_ethtool_stats(struct net_device *ndev,
 	struct gbe_intf *gbe_intf;
 	struct gbe_priv *gbe_dev;
 
-	gbe_intf = netcp_module_get_intf_data(&gbe_module, netcp);
+	gbe_intf = keystone_get_intf_data(netcp);
 	if (!gbe_intf)
 		return;
 
@@ -1920,7 +1931,7 @@ static int keystone_get_link_ksettings(struct net_device *ndev,
 	if (!phy)
 		return -EINVAL;
 
-	gbe_intf = netcp_module_get_intf_data(&gbe_module, netcp);
+	gbe_intf = keystone_get_intf_data(netcp);
 	if (!gbe_intf)
 		return -EINVAL;
 
@@ -1953,7 +1964,7 @@ static int keystone_set_link_ksettings(struct net_device *ndev,
 	if (!phy)
 		return -EINVAL;
 
-	gbe_intf = netcp_module_get_intf_data(&gbe_module, netcp);
+	gbe_intf = keystone_get_intf_data(netcp);
 	if (!gbe_intf)
 		return -EINVAL;
 
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Lino Sanfilippo @ 2016-12-19 22:49 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <20161218183010.GA7886@amd>

Hi,

On 18.12.2016 19:30, Pavel Machek wrote:
> Hi!
> 
>> > - e1efa87241272104d6a12c8b9fcdc4f62634d447
>> 
>> Yep, a sync of the dma descriptors before the hardware gets ownership of the tx tail
>> idx is missing in the stmmac, too.
> 
> I can reproduce failure with 4.4 fairly easily. I tried with dma_
> variant of barriers, and it failed, too
> 
> [ 1018.410012] stmmac: early irq
> [ 1023.939702] fpga_cmd_read:wait_event timed out!
> [ 1033.128692] ------------[ cut here ]------------
> [ 1033.133329] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303
> dev_watchdog+0x264/0x284()
> [ 1033.141748] NETDEV WATCHDOG: eth0 (socfpga-dwmac): transmit queue 0
> timed out
> [ 1033.148861] Modules linked in:

This watchdog warning clearly says that for some reason the tx queue was stopped
but never woken up in a certain timespan (5 sec in our case, which is a lot).

Does this occur after the queue has been stopped and woken up again a few times or
is it already the first time the queue is stopped (and never woken up)? 

Regards,
Lino

^ permalink raw reply

* Re: [PATCH 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
From: Rob Herring @ 2016-12-19 22:35 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, lauro.venancio, aloisio.almeida, sameo,
	mark.rutland, netdev, devicetree, linux-kernel, mgreer, justin
In-Reply-To: <1481841044-4314-2-git-send-email-glansberry@gmail.com>

On Thu, Dec 15, 2016 at 05:30:43PM -0500, Geoff Lansberry wrote:
> From: Geoff Lansberry <geoff@kuvee.com>
> 
> ---
>  Documentation/devicetree/bindings/net/nfc/trf7970a.txt |  2 ++
>  drivers/nfc/trf7970a.c                                 | 13 ++++++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> index 9dda879..208f045 100644
> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> @@ -21,6 +21,7 @@ Optional SoC Specific Properties:
>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>    where an extra byte is returned by Read Multiple Block commands issued
>    to Type 5 tags.
> +- vdd_io_1v8: Set to specify that the trf7970a io voltage should be set to 1.8V

Use the regulator binding and provide a fixed 1.8V supply.

>  - crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
>  
>  
> @@ -45,6 +46,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>  		irq-status-read-quirk;
>  		en2-rf-quirk;
>  		t5t-rmb-extra-byte-quirk;
> +		vdd_io_1v8;
>  		crystal_27mhz;
>  		status = "okay";
>  	};

^ permalink raw reply

* Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
From: Rob Herring @ 2016-12-19 22:31 UTC (permalink / raw)
  To: Geoff Lansberry
  Cc: linux-wireless, lauro.venancio, aloisio.almeida, sameo,
	mark.rutland, netdev, devicetree, linux-kernel, mgreer, justin
In-Reply-To: <1481841044-4314-1-git-send-email-glansberry@gmail.com>

On Thu, Dec 15, 2016 at 05:30:42PM -0500, Geoff Lansberry wrote:
> From: Geoff Lansberry <geoff@kuvee.com>
> 
> ---
>  .../devicetree/bindings/net/nfc/trf7970a.txt       |  3 ++
>  drivers/nfc/trf7970a.c                             | 42 ++++++++++++++++------
>  2 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> index 32b35a0..9dda879 100644
> --- a/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/trf7970a.txt
> @@ -21,6 +21,8 @@ Optional SoC Specific Properties:
>  - t5t-rmb-extra-byte-quirk: Specify that the trf7970a has the erratum
>    where an extra byte is returned by Read Multiple Block commands issued
>    to Type 5 tags.
> +- crystal_27mhz: Set to specify that the input frequency to the trf7970a is 27.12MHz
> +

Can't you use 'clock-frequency = "27000000";'?

>  
>  Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>  
> @@ -43,6 +45,7 @@ Example (for ARM-based BeagleBone with TRF7970A on SPI1):
>  		irq-status-read-quirk;
>  		en2-rf-quirk;
>  		t5t-rmb-extra-byte-quirk;
> +		crystal_27mhz;
>  		status = "okay";
>  	};
>  };

^ permalink raw reply

* Re: [PATCH 3/3] Bluetooth: btusb: Configure Marvel to use one of the pins for oob wakeup
From: Rob Herring @ 2016-12-19 22:04 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Mark Rutland, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	Amitkumar Karwar, Wei-Ning Huang, Xinming Hu, netdev, devicetree,
	linux-bluetooth, Brian Norris, rajatxjain
In-Reply-To: <1481742779-15105-3-git-send-email-rajatja@google.com>

On Wed, Dec 14, 2016 at 11:12:59AM -0800, Rajat Jain wrote:
> The Marvell devices may have many gpio pins, and hence for wakeup
> on these out-of-band pins, the chip needs to be told which pin is
> to be used for wakeup, using an hci command.
> 
> Thus, we read the pin number etc from the device tree node and send
> a command to the chip.
> 
> Signed-off-by: Rajat Jain <rajatja@google.com>
> ---
> Note that while I would have liked to name the compatible string as more
> like "marvell, usb8997-bt", the devicetrees/bindings/usb/usb-device.txt
> requires the compatible property to be of the form "usbVID,PID".
> 
>  .../{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} | 25 ++++++++-

Acked-by: Rob Herring <robh@kernel.org>

>  drivers/bluetooth/btusb.c                          | 59 ++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 2 deletions(-)
>  rename Documentation/devicetree/bindings/net/{marvell-bt-sd8xxx.txt => marvell-bt-8xxx.txt} (76%)

^ permalink raw reply

* Re: [v3] net: ethernet: cavium: octeon: octeon_mgmt: Handle return NULL error from devm_ioremap
From: David Daney @ 2016-12-19 21:37 UTC (permalink / raw)
  To: David Miller, arvind.yadav.cs
  Cc: peter.chen, fw, david.daney, f.fainelli, netdev, linux-kernel
In-Reply-To: <20161219.110420.1570801461162159172.davem@davemloft.net>

On 12/19/2016 08:04 AM, David Miller wrote:
> From: Arvind Yadav <arvind.yadav.cs@gmail.com>
> Date: Thu, 15 Dec 2016 00:33:30 +0530
>
>> Here, If devm_ioremap will fail. It will return NULL.
>> Kernel can run into a NULL-pointer dereference.
>> This error check will avoid NULL pointer dereference.
>>
>> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
>
> Since ioremap() is in fact designed to possibly fail, we do
> need to always check it's return value.  So this change is
> correct and I have applied it to the 'net' tree.

Yes, I think that is fine, although I have not tested the patch.

In general I like to know if a patch fixes a problem that has occurred 
on a platform used by the patch author, or if the author just noticed an 
issue through code inspection or automated tool for a platform that they 
cannot test on.

This patch appears to fall into the second category, but attempts to 
determine this for sure were for the most part unsuccessful.

With respect to ioremap(), in general I agree that it is designed to 
possibly fail.  For mips64 however, I think the implementation can never 
fail.  Certainly testing for failure fits better with what we expect to 
see in Linux kernel code.


>
> Thanks.
>

^ permalink raw reply

* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Andy Lutomirski @ 2016-12-19 21:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Daniel Mack, Mickaël Salaün, Kees Cook,
	Jann Horn, Tejun Heo, David Ahern, David S. Miller, Thomas Graf,
	Michael Kerrisk, Peter Zijlstra, Linux API,
	linux-kernel@vger.kernel.org, Network Development
In-Reply-To: <20161219205631.GA31242@ast-mbp.thefacebook.com>

On Mon, Dec 19, 2016 at 12:56 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Sat, Dec 17, 2016 at 10:18:44AM -0800, Andy Lutomirski wrote:
>> Hi all-
>>
>> I apologize for being rather late with this.  I didn't realize that
>> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
>> it on the linux-api list, so I missed the discussion.
>>
>> I think that the inet ingress, egress etc filters are a neat feature,
>> but I think the API has some issues that will bite us down the road
>> if it becomes stable in its current form.
>>
>> Most of the problems I see are summarized in this transcript:
>>
>> # mkdir cg2
>> # mount -t cgroup2 none cg2
>> # mkdir cg2/nosockets
>> # strace cgrp_socket_rule cg2/nosockets/ 0
>> ...
>> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
>>
>> ^^^^ You can modify a cgroup after opening it O_RDONLY?
>>
>> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
>> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
>> log_buf=0x6020c0, kern_version=0}, 48) = 4
>>
>> ^^^^ This is fine.  The bpf() syscall manipulates bpf objects.
>>
>> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
>>
>> ^^^^ This is not so good:
>> ^^^^
>> ^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
>> ^^^^    is manipulating a cgroup.  There's no reason that a socket creation
>> ^^^^    filter couldn't be written in a different language (new iptables
>> ^^^^    table?  Simple list of address families?), but if that happened,
>> ^^^^    then using bpf() to install it would be entirely nonsensical.
>
> I don't see why it's _modifing_ the cgroup. I'm looking at it as
> network stack is using cgroup as an application group that should
> invoke bpf program at the certain point in the stack.
> imo cgroup management is orthogonal.

It is literally modifying the struct cgroup, and, as a practical
matter, it's causing membership in the cgroup to have a certain
effect.  But rather than pointless arguing, let me propose an
alternative API that I think solves most of the problems here.

In my model, BPF_PROG_ATTACH and BPF_PROG_DETACH go away completely.
Instead, the cgroup gets three new control files:
"net.ingress_filter", "net.egress_filter", and
"net.socket_create_filter".  Initially, if you read these files, you
see "none\n".

To attach a bpf filter, you open the file for write and do an ioctl on
it.  After doing the ioctl, if you read the file, you'll see
"bpf:[hash]\n" where "[hash]" is exactly what you'd see in fdinfo for
the bpf program.

To detach any type of filter, bpf or otherwise, you open the file for
write and write "none\n" (or just "none").

If you write anything else to the file, you get -EINVAL.  But, if
someone writes a new type of filter (perhaps a simple list of address
families), maybe you can enable the filter by writing something
appropriate to the file.

Now the API matches the effect.  A cgroup with something other than
"none" in one of its net.*_filter files is a cgroup that filters
network activity.  And you get CRIU compatibility for free: CRIU can
read the control file and use whatever mechanism is uses for BPF in
general to restore the cgroup filter state.   As an added bonus, you
get ACLs for free and the ugly multiplexer goes away.

>> # mkdir cg2/nosockets/sockets
>> # /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1
>>
>> ^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
>> ^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
>> ^^^^ there would be a chance to refine it.
>
> i don't see the problem with this behavior. bpf and cgroup are indepedent.
> Imange that socket accounting program is attached to cg2/nosockets.
> The program is readonly and carry no security meaning.
> Why cgroup should prevent creation of cg2/nosockets/foo directory ?

I think you're misunderstanding me.  What I'm saying is that, if you
allow a cgroup and one of its descendents to both enable the same type
of filter, you have just committed to some particular semantics for
what happens.  And I think that the current semantics are the *wrong*
semantics for default long-term use, so you should either fix the
semantics or disable the problematic case.

>
>> # echo $$ >cg2/nosockets/sockets/cgroup.procs
>> # ping 127.0.0.1
>> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms
>
> hmm. in this case the admin created a subgroup with a group that allows
> ping, so? how's that a problem?

I think you're forgetting about namespaces.  There are two different
admins here.  There's the admin who created the outer container and
said "no sockets here".  Then there's the admin inside the container
who said "muahaha, I can create a sub-container and allow sockets
*there*".

> In case of vrf I can imagine
> a set of application auto-binding to one vrf and smaller
> set of applications binding to a different vrf.
> Or broader set of applications monitoring all tcp traffic
> and subset of them monitoring udp instead.
> Those are valid use cases.

> I guess you're advocating to run a link list of programs?
> That won't be useful for the above use cases, where there is no
> reason to run more than one program that control plane
> assigned to run for this cgroup.

Yes there is, for both monitoring and policy.  If I want to monitor
all activity in a cgroup, I probably want to monitor descendents as
well.  If I want to restrict a cgroup, I want to restrict its
descendents.

In the case where I actually don't want to hook the descendents, I'd
be find with having an option to turn off recursion.  Maybe
net.egress_filter could also say "bpf(overridable):[hash]" or
"bpf(nonrecursive):[hash]".  But you should have to opt in to allowing
your filter to be overridden.

> At this stage we decided to allow only one program per cgroup per hook
> and later can extend it if necessary.

No you can't.  Since you allow the problematic case and you gave it
the surprising "one program" semantics, you can't change it later.

> As you're pointing out, in case of security, we probably
> want to preserve original bpf program that should always be
> run first and only after it returned 'ok' (we'd need to define
> what 'ok' means in secruity context) run program attached to sub-hierarchy.

It's already defined AFAICT.  1 means okay.  0 means not okay.

> Another alternative is to disallow attaching programs in sub-hierarchy
> if parent has something already attached, but it's not useful
> for general case.
> All of these are possible future extensions.

I like this last one, but IT'S NOT A POSSIBLE FUTURE EXTENSION.  You
have to do it now (or disable the feature for 4.10).  This is why I'm
bringing this whole thing up now.

>
>> In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
>> programs attached that can do things if various events occur. (Right
>> now, this means socket operations, but there are plans in the works
>> to do this for LSM hooks too.) These bpf programs can say yes or no,
>> but they can also read out various data (including socket payloads!)
>> and save them away where an attacker can find them. This sounds a
>> lot like seccomp with a narrower scope but a much stronger ability to
>> exfiltrate private information.
>
> that sounds like a future problem to solve when bpf+lsm+cgroup are
> used for security.

[...]

>
> I disagree with the urgency. I see nothing that needs immediate action.
> bpf+lsm+cgroup is not in the tree yet.
>

I disagree very strongly here.  The API in 4.10 is IMO quite ugly, but
the result if bpf+lsm+cgroup works *differently* will be far, far
uglier.

I think the right solution here is to clean up the API so that it'll
work for future extensions that people are already imagining.  If this
can happen for 4.10, great.  If not, then postpone this stuff
entirely.  I've had code I've written for Linux postponed extensively
until I've gotten the API right, and it's not so bad.  (Actually, I've
even had API changes I've made reverted in -stable, IIRC.  This is
much worse than postponing before a release.)

--Andy

^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-19 21:00 UTC (permalink / raw)
  To: Jean-Philippe Aumasson
  Cc: Theodore Ts'o, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds
In-Reply-To: <CAGiyFdduUNSGq24zfsk0ZU=hnOCmewAw8vw6XvDoS-3f+3UPKQ@mail.gmail.com>

Hi JP,

On Mon, Dec 19, 2016 at 9:49 PM, Jean-Philippe Aumasson
<jeanphilippe.aumasson@gmail.com> wrote:
>
> On Mon, Dec 19, 2016 at 6:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> Hi JP,
>>
>> With the threads getting confusing, I've been urged to try and keep
>> the topics and threads more closely constrained. Here's where we're
>> at, and here's the current pressing security concern. It'd be helpful
>> to have a definitive statement on what you think is best, so we can
>> just build on top of that, instead of getting lost in the chorus of
>> opinions.
>>
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial to
>> me. Seem okay to you?
>
>
>
> Right, since 2012 when we published SipHash many cryptanalysts attempted to
> break SipHash-2-4 with a 128-bit key, for various notions of "break", and
> nothing worth worrying was ever found. I'm totally confident that
> SipHash-2-4 will live up to its security promises.
>
> Don't use something weaker for things like TCP sequence numbers or RNGs. Use
> SipHash2-4 for those. That is the correct choice.
>
>>
>>
>> 2) People seem to want something competitive, performance-wise, with
>> jhash if it's going to replace jhash. The kernel community
>> instinctively pushes back on anything that could harm performance,
>> especially in networking and in critical data structures, so there
>> have been some calls for something faster than SipHash. So, questions
>> regarding this:
>>
>
> No free lunch I guess: either go with a cryptographically secure,
> time-proved keyed hash such as SipHash, or go with some simpler hash deemed
> secure cos its designer can't break it :) #DontRollYourOwnCrypto
>
>> 2a) George thinks that HalfSipHash on 32-bit systems will have roughly
>> comparable speed as SipHash on 64-bit systems, so the idea would be to
>> use HalfSipHash on 32-bit systems' hash tables and SipHash on 64-bit
>> systems' hash tables. The big obvious question is: does HalfSipHash
>> have a sufficient security margin for hashtable usage and hashtable
>> attacks? I'm not wondering about the security margin for other usages,
>> but just of the hashtable usage. In your opinion, does HalfSipHash cut
>> it?
>
>
> HalfSipHash takes its core function from Chaskey and uses the same
> construction as SipHash, so it *should* be secure. Nonetheless it hasn't
> received the same amount of attention as 64-bit SipHash did. So I'm less
> confident about its security than about SipHash's, but it obviously inspires
> a lot more confidence than non-crypto hashes.
>
> Too, HalfSipHash only has a 64-bit key, not a 128-bit key like SipHash, so
> only use this as a mitigation for hash-flooding attacks, where the output of
> the hash function is never directly shown to the caller. Do not use
> HalfSipHash for TCP sequence numbers or RNGs.
>
>
>>
>>
>> 2b) While I certainly wouldn't consider making the use case in
>> question (1) employ a weaker function, for this question (2), there
>> has been some discussion about using HalfSipHash1-3 (or SipHash1-3 on
>> 64-bit) instead of 2-4. So, the same question is therefore posed:
>> would using HalfSipHash1-3 give a sufficient security margin for
>> hashtable usage and hashtable attacks?
>
>
> My educated guess is that yes, it will, but that it may not withhold
> cryptanalysis as a pseudorandom function (PRF). For example I wouldn't be
> surprised if there were a "distinguishing attack" that detects non-random
> patterns in HalfSipHash-1-3's output. But most of the non-crypto hashes I've
> seen have obvious distinguishing attacks. So the upshot is that HSH will get
> you better security that AnyWeakHash even with 1 & 3 rounds.
>
> So, if you're willing to compromise on security, but still want something
> not completely unreasonable, you might be able to get away with using
> HalfSipHash1-3 as a replacement for jhash—in circumstances where the output
> of the hash function is kept secret—in order to mitigate hash-flooding
> attacks.
>

Thanks for the detailed response. I will continue exactly how you've specified.

1. SipHash2-4 for TCP sequence numbers, syncookies, and RNG. IOW, the
things that MD5 is used for now.

2. HalfSipHash1-3 for hash tables where the output is not revealed,
for jhash replacements. On 64-bit this will alias to SipHash1-3.

3. I will write Documentation/siphash.txt detailing this.

4. I'll continue to discourage other kernel developers from rolling
their own crypto or departing from the tried&true in substantial ways.

Thanks again,
Jason

^ permalink raw reply

* Re: Potential issues (security and otherwise) with the current cgroup-bpf API
From: Alexei Starovoitov @ 2016-12-19 20:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Mack, Mickaël Salaün, Kees Cook, Jann Horn,
	Tejun Heo, David Ahern, David S. Miller, Thomas Graf,
	Michael Kerrisk, Peter Zijlstra, Linux API,
	linux-kernel@vger.kernel.org, Network Development
In-Reply-To: <CALCETrV81oFwq2AgeRsN54HA1jR=b5cOZfAgve8H8zhx83DTyA@mail.gmail.com>

On Sat, Dec 17, 2016 at 10:18:44AM -0800, Andy Lutomirski wrote:
> Hi all-
> 
> I apologize for being rather late with this.  I didn't realize that
> cgroup-bpf was going to be submitted for Linux 4.10, and I didn't see
> it on the linux-api list, so I missed the discussion.
> 
> I think that the inet ingress, egress etc filters are a neat feature,
> but I think the API has some issues that will bite us down the road
> if it becomes stable in its current form.
> 
> Most of the problems I see are summarized in this transcript:
> 
> # mkdir cg2
> # mount -t cgroup2 none cg2
> # mkdir cg2/nosockets
> # strace cgrp_socket_rule cg2/nosockets/ 0
> ...
> open("cg2/nosockets/", O_RDONLY|O_DIRECTORY) = 3
> 
> ^^^^ You can modify a cgroup after opening it O_RDONLY?
> 
> bpf(BPF_PROG_LOAD, {prog_type=0x9 /* BPF_PROG_TYPE_??? */, insn_cnt=2,
> insns=0x7fffe3568c10, license="GPL", log_level=1, log_size=262144,
> log_buf=0x6020c0, kern_version=0}, 48) = 4
> 
> ^^^^ This is fine.  The bpf() syscall manipulates bpf objects.
> 
> bpf(0x8 /* BPF_??? */, 0x7fffe3568bf0, 48) = 0
> 
> ^^^^ This is not so good:
> ^^^^
> ^^^^ a) The bpf() syscall is supposed to manipulate bpf objects.  This
> ^^^^    is manipulating a cgroup.  There's no reason that a socket creation
> ^^^^    filter couldn't be written in a different language (new iptables
> ^^^^    table?  Simple list of address families?), but if that happened,
> ^^^^    then using bpf() to install it would be entirely nonsensical.

I don't see why it's _modifing_ the cgroup. I'm looking at it as
network stack is using cgroup as an application group that should
invoke bpf program at the certain point in the stack.
imo cgroup management is orthogonal.
I certainly don't mind adding 'cat cgroup.bpf' control file that will
print the program digest for debugging, just like we do for fdinfo,
but cgroup file interface shouldn't be used to control networking.
If there was something like networking-wide ioctl, I think it could
have been an alternative way to attach a program to a hook to a cgroup.
Adding a control file to a cgroup for the purpose of attaching bpf,
just doesn't make sense to me, since it's dragging bpf and networking
details into cgroup land. Imagine in the future somebody would want
to have nft to perform similar BPF_CGROUP_INET_INGRESS functionality.
I'd think that the position of the hook within the networking stack
would remain the same, but enum id will be different, since
other ids in that enum (like BPF_CGROUP_INET_SOCK_CREATE) are not
applicable to nft. Similarly the concept of program_fd doesn't exist
there either. So it would be using its own netlink based mechanism
and its own way of enumerating hook id. And cgroup could be passed
as a string into netlink message instead of fd.
So if somebody adds a control file to a cgroup api that takes bpf's hookid
and bpf's program_fd, such control file will only be applicable to bpf
and not usable for anything else. Hence I see no reason to go that route.

> ^^^^
> ^^^^ b) This is starting to be an excessively ugly multiplexer.  Among
> ^^^^    other things, it's very unfriendly to seccomp.
> 
> # echo $$ >cg2/nosockets/cgroup.procs
> # ping 127.0.0.1
> ping: socket: Operation not permitted
> # ls cg2/nosockets/
> cgroup.controllers  cgroup.events  cgroup.procs  cgroup.subtree_control
> # cat cg2/nosockets/cgroup.controllers
> 
> ^^^^ Something in cgroupfs should give an indication that this cgroup
> ^^^^ filters socket creation, but there's nothing there.  You should also
> ^^^^ be able to turn the filter off from cgroupfs.

Doing 'cat cg2/nosockets/cgroup.bpf' here would be useful for debugging
to see which program is attached to which hook in this cgroup.
Detaching programs via cgroup control file is a lot less clear,
since it will be confusing to a running application that attached the program
in the first place, since it won't have any indication that external entity
detached the program.
One can argue that it could be useful for curious human/admin
to detach all bpf progs from all hooks for this cgroup, but
then it probably needs dmesg warning and only usable in extreme cases
as debugging and not something control plane should ever use.

> # mkdir cg2/nosockets/sockets
> # /home/luto/apps/linux/samples/bpf/cgrp_socket_rule cg2/nosockets/sockets/ 1
> 
> ^^^^ This succeeded, which means that, if this feature is enabled in 4.10,
> ^^^^ then we're stuck with its semantics.  If it returned -EINVAL instead,
> ^^^^ there would be a chance to refine it.

i don't see the problem with this behavior. bpf and cgroup are indepedent.
Imange that socket accounting program is attached to cg2/nosockets.
The program is readonly and carry no security meaning.
Why cgroup should prevent creation of cg2/nosockets/foo directory ?

> # echo $$ >cg2/nosockets/sockets/cgroup.procs
> # ping 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms

hmm. in this case the admin created a subgroup with a group that allows
ping, so? how's that a problem? In case of vrf I can imagine
a set of application auto-binding to one vrf and smaller
set of applications binding to a different vrf.
Or broader set of applications monitoring all tcp traffic
and subset of them monitoring udp instead.
Those are valid use cases.
I guess you're advocating to run a link list of programs?
That won't be useful for the above use cases, where there is no
reason to run more than one program that control plane
assigned to run for this cgroup.
At this stage we decided to allow only one program per cgroup per hook
and later can extend it if necessary.
As you're pointing out, in case of security, we probably
want to preserve original bpf program that should always be
run first and only after it returned 'ok' (we'd need to define
what 'ok' means in secruity context) run program attached to sub-hierarchy.
Another alternative is to disallow attaching programs in sub-hierarchy
if parent has something already attached, but it's not useful
for general case.
All of these are possible future extensions.

> In 4.10 with With CONFIG_CGROUP_BPF=y, a cgroup can have bpf
> programs attached that can do things if various events occur. (Right
> now, this means socket operations, but there are plans in the works
> to do this for LSM hooks too.) These bpf programs can say yes or no,
> but they can also read out various data (including socket payloads!)
> and save them away where an attacker can find them. This sounds a
> lot like seccomp with a narrower scope but a much stronger ability to
> exfiltrate private information.

that sounds like a future problem to solve when bpf+lsm+cgroup are
used for security.

> This isn't much of a problem yet because you currently need
> CAP_NET_ADMIN to create a malicious sandbox in the first place.  I'm
> sure that, in the near future, someone will want to make this stuff
> work in containers with delegated cgroup hierarchies, and then there
> may be a real problem here.

I agree. Currently cgroup+bpf+hooks are for root only and have to
go long way before they can be used for security and sandboxing.

> I've included a few security people on this thread.  The current API
> looks abusable, and it would be nice to find all the holes before
> 4.10 comes out.

I disagree with the urgency. I see nothing that needs immediate action.
bpf+lsm+cgroup is not in the tree yet.

^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: Jean-Philippe Aumasson @ 2016-12-19 20:49 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Theodore Ts'o, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds
In-Reply-To: <CAHmME9rPmH=wP_eHYopt8ZPG9TSN7bos3fGOuqKL2HjQW-2SWA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4442 bytes --]

On Mon, Dec 19, 2016 at 6:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Hi JP,
>
> With the threads getting confusing, I've been urged to try and keep
> the topics and threads more closely constrained. Here's where we're
> at, and here's the current pressing security concern. It'd be helpful
> to have a definitive statement on what you think is best, so we can
> just build on top of that, instead of getting lost in the chorus of
> opinions.
>
> 1) Anything that requires actual long-term security will use
> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> things like TCP sequence numbers. This seems pretty uncontroversial to
> me. Seem okay to you?
>


Right, since 2012 when we published SipHash many cryptanalysts attempted to
break SipHash-2-4 with a 128-bit key, for various notions of "break", and
nothing worth worrying was ever found. I'm totally confident that
SipHash-2-4 will live up to its security promises.

Don't use something weaker for things like TCP sequence numbers or RNGs.
Use SipHash2-4 for those. That is the correct choice.


>
> 2) People seem to want something competitive, performance-wise, with
> jhash if it's going to replace jhash. The kernel community
> instinctively pushes back on anything that could harm performance,
> especially in networking and in critical data structures, so there
> have been some calls for something faster than SipHash. So, questions
> regarding this:
>
>
No free lunch I guess: either go with a cryptographically secure,
time-proved keyed hash such as SipHash, or go with some simpler hash deemed
secure cos its designer can't break it :) #DontRollYourOwnCrypto

2a) George thinks that HalfSipHash on 32-bit systems will have roughly
> comparable speed as SipHash on 64-bit systems, so the idea would be to
> use HalfSipHash on 32-bit systems' hash tables and SipHash on 64-bit
> systems' hash tables. The big obvious question is: does HalfSipHash
> have a sufficient security margin for hashtable usage and hashtable
> attacks? I'm not wondering about the security margin for other usages,
> but just of the hashtable usage. In your opinion, does HalfSipHash cut
> it?
>

HalfSipHash takes its core function from Chaskey and uses the same
construction as SipHash, so it *should* be secure. Nonetheless it hasn't
received the same amount of attention as 64-bit SipHash did. So I'm less
confident about its security than about SipHash's, but it obviously
inspires a lot more confidence than non-crypto hashes.

Too, HalfSipHash only has a 64-bit key, not a 128-bit key like SipHash, so
only use this as a mitigation for hash-flooding attacks, where the output
of the hash function is never directly shown to the caller. Do not use
HalfSipHash for TCP sequence numbers or RNGs.



>
> 2b) While I certainly wouldn't consider making the use case in
> question (1) employ a weaker function, for this question (2), there
> has been some discussion about using HalfSipHash1-3 (or SipHash1-3 on
> 64-bit) instead of 2-4. So, the same question is therefore posed:
> would using HalfSipHash1-3 give a sufficient security margin for
> hashtable usage and hashtable attacks?
>

My educated guess is that yes, it will, but that it may not withhold
cryptanalysis as a pseudorandom function (PRF). For example I wouldn't be
surprised if there were a "distinguishing attack" that detects non-random
patterns in HalfSipHash-1-3's output. But most of the non-crypto hashes
I've seen have obvious distinguishing attacks. So the upshot is that HSH
will get you better security that AnyWeakHash even with 1 & 3 rounds.

So, if you're willing to compromise on security, but still want something
not completely unreasonable, you might be able to get away with using
HalfSipHash1-3 as a replacement for jhash—in circumstances where the output
of the hash function is kept secret—in order to mitigate hash-flooding
attacks.



>
> My plan is essentially to implement things according to your security
> recommendation. The thread started with me pushing a heavy duty
> security solution -- SipHash2-4 -- for _everything_. I've received
> understandable push back on that notion for certain use cases. So now
> I'm trying to discover what the most acceptable compromise is. Your
> answers on (2a) and (2b) will direct that compromise.
>
> Thanks again,
> Jason


>

[-- Attachment #2: Type: text/html, Size: 6561 bytes --]

^ permalink raw reply

* RE: [upstream-release] [PATCH net v3 2/4] powerpc: fsl/fman: remove fsl, fman from of_device_ids[]
From: Madalin-Cristian Bucur @ 2016-12-19 20:43 UTC (permalink / raw)
  To: Scott Wood, netdev@vger.kernel.org
  Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net
In-Reply-To: <1482176783.32531.28.camel@buserror.net>

> From: Scott Wood [mailto:oss@buserror.net]
> Sent: Monday, December 19, 2016 9:46 PM
> 
> On Mon, 2016-12-19 at 18:13 +0200, Madalin Bucur wrote:
> > The fsl/fman drivers will use of_platform_populate() on all
> > supported platforms. Call of_platform_populate() to probe the
> > FMan sub-nodes.
> >
> > Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
> > Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
> > ---
> >  arch/powerpc/platforms/85xx/corenet_generic.c | 3 ---
> >  drivers/net/ethernet/freescale/fman/fman.c    | 8 ++++++++
> >  2 files changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c
> > b/arch/powerpc/platforms/85xx/corenet_generic.c
> > index 1179115..824b7f1 100644
> > --- a/arch/powerpc/platforms/85xx/corenet_generic.c
> > +++ b/arch/powerpc/platforms/85xx/corenet_generic.c
> > @@ -117,9 +117,6 @@ static const struct of_device_id of_device_ids[] = {
> >  	{
> >  		.compatible	= "fsl,qe",
> >  	},
> > -	{
> > -		.compatible    = "fsl,fman",
> > -	},
> >  	/* The following two are for the Freescale hypervisor */
> >  	{
> >  		.name		= "hypervisor",
> 
> For this part:
> 
> Acked-by: Scott Wood <oss@buserror.net>

Thank you, added to v4.

> > diff --git a/drivers/net/ethernet/freescale/fman/fman.c
> > b/drivers/net/ethernet/freescale/fman/fman.c
> > index dafd9e1..0b7f711 100644
> > --- a/drivers/net/ethernet/freescale/fman/fman.c
> > +++ b/drivers/net/ethernet/freescale/fman/fman.c
> > @@ -2868,6 +2868,14 @@ static struct fman *read_dts_node(struct
> > platform_device *of_dev)
> >
> >  	fman->dev = &of_dev->dev;
> >
> > +	/* call of_platform_populate in order to probe sub-nodes on arm64
> > */
> > +	err = of_platform_populate(fm_node, NULL, NULL, &of_dev->dev);
> > +	if (err) {
> > +		dev_err(&of_dev->dev, "%s: of_platform_populate()
> > failed\n",
> > +			__func__);
> > +		goto fman_free;
> > +	}
> 
> The "on arm64" comment is no longer accurate (and the rest of the comment
> seems unnecessary).
> 
> -Scott

Removed in v4.

^ permalink raw reply

* [PATCH net v4 3/4] fsl/fman: A007273 only applies to PPC SoCs
From: Madalin Bucur @ 2016-12-19 20:42 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, linux-kernel, davem, scott.wood, mpe
In-Reply-To: <1482180166-10677-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
---
 drivers/net/ethernet/freescale/fman/fman.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index 4b83263..f60845f 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -1890,6 +1890,7 @@ static int fman_reset(struct fman *fman)
 
 		goto _return;
 	} else {
+#ifdef CONFIG_PPC
 		struct device_node *guts_node;
 		struct ccsr_guts __iomem *guts_regs;
 		u32 devdisr2, reg;
@@ -1921,6 +1922,7 @@ static int fman_reset(struct fman *fman)
 
 		/* Enable all MACs */
 		iowrite32be(reg, &guts_regs->devdisr2);
+#endif
 
 		/* Perform FMan reset */
 		iowrite32be(FPM_RSTC_FM_RESET, &fman->fpm_regs->fm_rstc);
@@ -1932,25 +1934,31 @@ static int fman_reset(struct fman *fman)
 		} while (((ioread32be(&fman->fpm_regs->fm_rstc)) &
 			 FPM_RSTC_FM_RESET) && --count);
 		if (count == 0) {
+#ifdef CONFIG_PPC
 			iounmap(guts_regs);
 			of_node_put(guts_node);
+#endif
 			err = -EBUSY;
 			goto _return;
 		}
+#ifdef CONFIG_PPC
 
 		/* Restore devdisr2 value */
 		iowrite32be(devdisr2, &guts_regs->devdisr2);
 
 		iounmap(guts_regs);
 		of_node_put(guts_node);
+#endif
 
 		goto _return;
 
+#ifdef CONFIG_PPC
 guts_regs:
 		of_node_put(guts_node);
 guts_node:
 		dev_dbg(fman->dev, "%s: Didn't perform FManV3 reset due to Errata A007273!\n",
 			__func__);
+#endif
 	}
 _return:
 	return err;
-- 
2.1.0

^ permalink raw reply related

* [PATCH net v4 0/4] fsl/fman: fixes for ARM
From: Madalin Bucur @ 2016-12-19 20:42 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, linux-kernel, davem, scott.wood, mpe

The patch set fixes advertised speeds for QSGMII interfaces, disables
A007273 erratum workaround on non-PowerPC platforms where it does not
apply, enables compilation on ARM64 and addresses a probing issue on
non PPC platforms.

Changes from v3: removed redundant comment, added ack by Scott
Changes from v2: merged fsl/fman changes to avoid a point of failure
Changes from v1: unifying probing on all supported platforms

Madalin Bucur (4):
  fsl/fman: fix 1G support for QSGMII interfaces
  powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
  fsl/fman: A007273 only applies to PPC SoCs
  fsl/fman: enable compilation on ARM64

 arch/powerpc/platforms/85xx/corenet_generic.c |  3 ---
 drivers/net/ethernet/freescale/fman/Kconfig   |  2 +-
 drivers/net/ethernet/freescale/fman/fman.c    | 15 +++++++++++++++
 drivers/net/ethernet/freescale/fman/mac.c     |  1 +
 4 files changed, 17 insertions(+), 4 deletions(-)

-- 
2.1.0

^ permalink raw reply

* [PATCH net v4 4/4] fsl/fman: enable compilation on ARM64
From: Madalin Bucur @ 2016-12-19 20:42 UTC (permalink / raw)
  To: netdev; +Cc: scott.wood, linuxppc-dev, linux-kernel, davem
In-Reply-To: <1482180166-10677-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/fman/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig
index 79b7c84..dc0850b 100644
--- a/drivers/net/ethernet/freescale/fman/Kconfig
+++ b/drivers/net/ethernet/freescale/fman/Kconfig
@@ -1,6 +1,6 @@
 config FSL_FMAN
 	tristate "FMan support"
-	depends on FSL_SOC || COMPILE_TEST
+	depends on FSL_SOC || ARCH_LAYERSCAPE || COMPILE_TEST
 	select GENERIC_ALLOCATOR
 	select PHYLIB
 	default n
-- 
2.1.0

^ permalink raw reply related

* [PATCH net v4 2/4] powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
From: Madalin Bucur @ 2016-12-19 20:42 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, linux-kernel, davem, scott.wood, mpe
In-Reply-To: <1482180166-10677-1-git-send-email-madalin.bucur@nxp.com>

The fsl/fman drivers will use of_platform_populate() on all
supported platforms. Call of_platform_populate() to probe the
FMan sub-nodes.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Acked-by: Scott Wood <oss@buserror.net>
---
 arch/powerpc/platforms/85xx/corenet_generic.c | 3 ---
 drivers/net/ethernet/freescale/fman/fman.c    | 7 +++++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
index 1179115..824b7f1 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -117,9 +117,6 @@ static const struct of_device_id of_device_ids[] = {
 	{
 		.compatible	= "fsl,qe",
 	},
-	{
-		.compatible    = "fsl,fman",
-	},
 	/* The following two are for the Freescale hypervisor */
 	{
 		.name		= "hypervisor",
diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index dafd9e1..4b83263 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -2868,6 +2868,13 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
 
 	fman->dev = &of_dev->dev;
 
+	err = of_platform_populate(fm_node, NULL, NULL, &of_dev->dev);
+	if (err) {
+		dev_err(&of_dev->dev, "%s: of_platform_populate() failed\n",
+			__func__);
+		goto fman_free;
+	}
+
 	return fman;
 
 fman_node_put:
-- 
2.1.0

^ permalink raw reply related

* [PATCH net v4 1/4] fsl/fman: fix 1G support for QSGMII interfaces
From: Madalin Bucur @ 2016-12-19 20:42 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, linux-kernel, davem, scott.wood, mpe
In-Reply-To: <1482180166-10677-1-git-send-email-madalin.bucur@nxp.com>

QSGMII ports were not advertising 1G speed.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
---
 drivers/net/ethernet/freescale/fman/mac.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/freescale/fman/mac.c b/drivers/net/ethernet/freescale/fman/mac.c
index 69ca42c..0b31f85 100644
--- a/drivers/net/ethernet/freescale/fman/mac.c
+++ b/drivers/net/ethernet/freescale/fman/mac.c
@@ -594,6 +594,7 @@ static const u16 phy2speed[] = {
 	[PHY_INTERFACE_MODE_RGMII_RXID]	= SPEED_1000,
 	[PHY_INTERFACE_MODE_RGMII_TXID]	= SPEED_1000,
 	[PHY_INTERFACE_MODE_RTBI]		= SPEED_1000,
+	[PHY_INTERFACE_MODE_QSGMII]		= SPEED_1000,
 	[PHY_INTERFACE_MODE_XGMII]		= SPEED_10000
 };
 
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH RFC net-next 2/7] net: add dst_pending_confirm flag to skbuff
From: Julian Anastasov @ 2016-12-19 20:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-sctp, YueHaibing
In-Reply-To: <1482164252.1521.7.camel@edumazet-glaptop3.roam.corp.google.com>


	Hello,

On Mon, 19 Dec 2016, Eric Dumazet wrote:

> I am still digesting this awesome patch series ;)

	Thanks. I don't feel quite comfortable with some
of the changes (mostly XFRM, dst_confirm usage in CXGB) and
I hope the discussion can provide adequate solution.

> Not sure why you used an unlikely() here. TCP for example would hit this
> path quite often.

	I was not sure, may be because ACKs can come with lower
rate than the sent packets. Also because non-TCP rarely uses
MSG_CONFIRM. If you still think it is better without unlikely,
I'll remove it.

> So considering sk_dst_pending_confirm might be dirtied quite often,
> 
> I am not sure why you placed it in the cache line that contains
> sk_rx_dst (in 1st patch)

	I saw your recent changes and was worried if the
sk_dst_confirm() calling on RX can cause unwanted dirtying of
additional cache line. My preliminary analyze pointed
sk_omem_alloc as good candidate for moving to next cache
line. I know how critical is to properly place the new flags,
so I really need recommendations about this.

Regards

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox