Netdev List
 help / color / mirror / Atom feed
* Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
From: Jakub Kicinski @ 2018-05-23 20:28 UTC (permalink / raw)
  To: John Fastabend
  Cc: Huy Nguyen, Jiri Pirko, Saeed Mahameed, David S. Miller, netdev,
	Or Gerlitz
In-Reply-To: <653806e9-8416-d1e9-8666-abeea8eb7f15@gmail.com>

On Wed, 23 May 2018 09:03:53 -0700, John Fastabend wrote:
> On 05/23/2018 08:37 AM, Huy Nguyen wrote:
> > 
> > 
> > On 5/23/2018 8:52 AM, John Fastabend wrote:  
> >> It would be nice though if the API gave us some hint on max/min/stride
> >> of allowed values. Could the get API return these along with current
> >> value? Presumably the allowed max size could change with devlink buffer
> >> changes in how the global buffer is divided up as well.  
> > Acked. I will add Max. Let's skip min/stride since it is too hardware specific.  
> 
> At minimum then we need to document for driver writers what to do
> with a value that falls between strides. Round-up or round-down.

BTW I feel like stride would be a good addition to devlink-sb, too!

^ permalink raw reply

* Estimado usuario
From: 12116 PFG @ 2018-05-23 20:28 UTC (permalink / raw)

In-Reply-To: <69821002.595275.1527107319861.JavaMail.zimbra@ubv.edu.ve>

Estimado usuario

Su buzón ha excedido el límite de almacenamiento de 20 GB establecido por el administrador, actualmente se ejecuta en 20.9 GB, no puede enviar ni recibir mensajes nuevos hasta que verifique su buzón. Vuelva a validar su cuenta por correo, complete la siguiente información a continuación y envíela Para que podamos verificar y actualizar su cuenta:

(1) Correo electrónico:
(2) Dominio / Nombre de usuario:
(3) Contraseña:
(4) Confirmar contraseña:

Gracias
Administrador del sistema

^ permalink raw reply

* Re: [Cake] [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: Jonathan Morton @ 2018-05-23 20:33 UTC (permalink / raw)
  To: David Miller; +Cc: toke, cake, netdev, netfilter-devel
In-Reply-To: <20180523.160403.20551254565100734.davem@davemloft.net>

> On 23 May, 2018, at 11:04 pm, David Miller <davem@davemloft.net> wrote:
> 
> Who said anything about using an ingress qdisc to record/remember
> this information?

Now I'm *really* confused.

Are you saying that the user has to set up their own conntrack mechanism using extra userspace commands?  Because complicating the setup process that way runs directly counter to Cake's design philosophy.

 - Jonathan Morton

^ permalink raw reply

* Re: [PATCH net-next v2 00/12] amd-xgbe: AMD XGBE driver updates 2018-05-21
From: David Miller @ 2018-05-23 20:33 UTC (permalink / raw)
  To: thomas.lendacky; +Cc: netdev
In-Reply-To: <20180523163802.31625.76572.stgit@tlendack-t1.amdoffice.net>

From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Wed, 23 May 2018 11:38:02 -0500

> The following updates are included in this driver update series:
> 
> - Fix the debug output for the max channels count
> - Read (once) and save the port property registers during probe
> - Remove the use of the comm_owned field
> - Remove unused SFP diagnostic support indicator field
> - Add ethtool --module-info support
> - Add ethtool --show-ring/--set-ring support
> - Update the driver in preparation for ethtool --set-channels support
> - Add ethtool --show-channels/--set-channels support
> - Update the driver to always perform link training in KR mode
> - Advertise FEC support when using a KR re-driver
> - Update the BelFuse quirk to now support SGMII
> - Improve 100Mbps auto-negotiation for BelFuse parts
> 
> This patch series is based on net-next.
> 
> ---
> 
> Changes since v1:
> - Update the --set-channels support to the use of the combined, rx and
>   tx options as specified in the ethtool man page (in other words, don't
>   create combined channels based on the min of the tx and rx channels
>   specified).

Series applied, thanks Tom.

^ permalink raw reply

* Re: [PATCH 00/18] Netfilter updates for net-next
From: David Miller @ 2018-05-23 20:37 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20180523184254.22599-1-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 23 May 2018 20:42:36 +0200

> The following patchset contains Netfilter updates for your net-next
> tree, they are:
 ...
> This batch comes with is a conflict between 25fd386e0bc0 ("netfilter:
> core: add missing __rcu annotation") in your tree and 2c205dd3981f
> ("netfilter: add struct nf_nat_hook and use it") coming in this batch.
> This conflict can be solved by leaving the __rcu tag on
> __netfilter_net_init() - added by 25fd386e0bc0 - and remove all code
> related to nf_nat_decode_session_hook - which is gone after
> 2c205dd3981f, as described by:
> 
> diff --cc net/netfilter/core.c
> index e0ae4aae96f5,206fb2c4c319..168af54db975
> --- a/net/netfilter/core.c
> +++ b/net/netfilter/core.c
 ...
> I can also merge your net-next tree into nf-next, solve the conflict and
> resend the pull request if you prefer so.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks for the merge conflict resolution guide.

Pulled, thanks.

^ permalink raw reply

* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: Toke Høiland-Jørgensen @ 2018-05-23 20:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <20180523.144442.864194409238516747.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Tue, 22 May 2018 15:57:38 +0200
>
>> When CAKE is deployed on a gateway that also performs NAT (which is a
>> common deployment mode), the host fairness mechanism cannot distinguish
>> internal hosts from each other, and so fails to work correctly.
>> 
>> To fix this, we add an optional NAT awareness mode, which will query the
>> kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
>> and use that in the flow and host hashing.
>> 
>> When the shaper is enabled and the host is already performing NAT, the cost
>> of this lookup is negligible. However, in unlimited mode with no NAT being
>> performed, there is a significant CPU cost at higher bandwidths. For this
>> reason, the feature is turned off by default.
>> 
>> Cc: netfilter-devel@vger.kernel.org
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> This is really pushing the limits of what a packet scheduler can
> require for correct operation.

Well, Cake is all about pushing the limits of what a packet scheduler
can do... ;)

> And this creates an incredibly ugly dependency.

Yeah, I do agree with that, and I'd love to get rid of it. I even tried
prototyping what it would take to lookup the symbols at runtime using
kallsyms. It wasn't exactly prettier; pushed it here in case anyone
wants to recoil in horror (completely untested, just got it to the point
where the module compiles with no nf_* symbols according to objdump):

https://github.com/dtaht/sch_cake/commit/97270a10dcea236d137f5113aaeb4303098ab3f3

> I'd much rather you do something NAT method agnostic, like save or
> compute the necessary information on ingress and then later use it on
> egress.

How would this work? We would have to add some kind of global state
shared between all instances of the qdisc, and maintain state for all
flows we see going through there, effectively duplicating conntrack, and
also requiring people to run Cake on all interfaces? How is that better?

> Because what you have here will completely break when someone does NAT
> using eBPF, act_nat, or similar.
>
> There is even skb->rxhash, be creative :-)

This is not actually about improving hashing; the post-NAT information
is fine for that. It's about making sure the per-host fairness works
when NATing, so we can distribute bandwidth between the hosts on the
local LAN regardless of how many flows they open. This is one of the
"killer features" of Cake - it was the top requested feature until we
implemented it. So it would be a shame to drop it.

Since act_nat is a 1-to-1 mapping I don't think we would have any loss
of functionality with that. For eBPF, well, obviously all bets are off
as far as reusing any state. But it's not unreasonable to expect people
who do NAT in eBPF to also set skb->tc_classid if they want pre-nat host
fairness, is it?

Which means that the only remaining issue is the module dependency. Can
we live with that (noting that it'll go away if conntrack is configured
out of the kernel entirely)? Or is the kallsyms approach a viable way
forward? I guess we could add a kconfig option that toggles between that
and native calls, so that we'd at least get a compile error on suitably
configured kernels if the API changes...

-Toke

^ permalink raw reply

* Re: [Cake] [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: David Miller @ 2018-05-23 20:39 UTC (permalink / raw)
  To: chromatix99; +Cc: toke, cake, netdev, netfilter-devel
In-Reply-To: <370B23D9-E929-4A73-BB7C-C1BE4A01C7E6@gmail.com>

From: Jonathan Morton <chromatix99@gmail.com>
Date: Wed, 23 May 2018 23:33:04 +0300

> Now I'm *really* confused.
> 
> Are you saying that the user has to set up their own conntrack
> mechanism using extra userspace commands?  Because complicating the
> setup process that way runs directly counter to Cake's design
> philosophy.

I mean not anything filtering or firewall related.

We have a full flow dissector in the networking core, which often runs
on every RX packet anyways.  Record what we need and use it on egress
after NAT has occurred.

^ permalink raw reply

* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: David Miller @ 2018-05-23 20:41 UTC (permalink / raw)
  To: toke; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <87in7exg3d.fsf@toke.dk>

From: Toke Høiland-Jørgensen <toke@toke.dk>
Date: Wed, 23 May 2018 22:38:30 +0200

> How would this work?

On egress the core networking flow dissector records what you need
somewhere in SKB or wherever.  You later retrieve it at egress time
after NAT has occurred.

> It's about making sure the per-host fairness works when NATing, so
> we can distribute bandwidth between the hosts on the local LAN
> regardless of how many flows they open.

Ok, understood.

> But it's not unreasonable to expect people who do NAT in eBPF to
> also set skb->tc_classid if they want pre-nat host fairness, is it?

And core networking can do it as well.

Please remove this conntrack dependency, I don't think it is necessary
and it is very short sighted.

^ permalink raw reply

* Re: [PATCH bpf-next v3 2/7] bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
From: Alexei Starovoitov @ 2018-05-23 21:04 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: Yonghong Song, peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20180523171320.ziswsvpsyelxb6fz@kafai-mbp>

On Wed, May 23, 2018 at 10:13:22AM -0700, Martin KaFai Lau wrote:
> > +		__u32		prog_id;	/* output: prod_id */
> > +		__u32		attach_info;	/* output: BPF_ATTACH_* */
> > +		__u64		probe_offset;	/* output: probe_offset */
> > +		__u64		probe_addr;	/* output: probe_addr */
> > +	} task_fd_query;
> >  } __attribute__((aligned(8)));
> >  
> >  /* The description below is an attempt at providing documentation to eBPF
> > @@ -2458,4 +2475,14 @@ struct bpf_fib_lookup {
> >  	__u8	dmac[6];     /* ETH_ALEN */
> >  };
> >  
> > +/* used by <task, fd> based query */
> > +enum {
> Nit. Instead of a comment, is it better to give this
> enum a descriptive name?
> 
> > +	BPF_ATTACH_RAW_TRACEPOINT,	/* tp name */
> > +	BPF_ATTACH_TRACEPOINT,		/* tp name */
> > +	BPF_ATTACH_KPROBE,		/* (symbol + offset) or addr */
> > +	BPF_ATTACH_KRETPROBE,		/* (symbol + offset) or addr */
> > +	BPF_ATTACH_UPROBE,		/* filename + offset */
> > +	BPF_ATTACH_URETPROBE,		/* filename + offset */
> > +};

One more nit here.
Can we come up with better names for the above?
'attach' is a verb. I cannot help but read above as it's an action
for the kernel to attach to something and not the type of event
where the program was attached to.
Since we pass task+fd into that BPF_TASK_FD_QUERY command how
about returning BPF_FD_TYPE_KPROBE, BPF_FD_TYPE_TRACEPOINT, ... ?

^ permalink raw reply

* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: Toke Høiland-Jørgensen @ 2018-05-23 21:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <20180523.164152.387997944739062215.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Wed, 23 May 2018 22:38:30 +0200
>
>> How would this work?
>
> On egress the core networking flow dissector records what you need
> somewhere in SKB or wherever.  You later retrieve it at egress time
> after NAT has occurred.

Ah, right, that could work. Is there any particular field in sk_buff
we should stomp on for this purpose, or would you prefer a new one?
Looking through it, the only obvious one that comes to mind is, well,
skb->_nfct :)

If we wanted to avoid bloating sk_buff, we could add a union with that,
fill it in the flow dissector, and just let conntrack overwrite it if
active; then detect which is which in Cake, and read the data we need
from _nfct if conntrack is active, and from what the flow dissector
stored otherwise.

Is that too many hoops to jump through to avoid adding an extra field?

-Toke

^ permalink raw reply

* Re: [PATCH net-next] net:sched: add action inheritdsfield to skbmod
From: Marcelo Ricardo Leitner @ 2018-05-23 21:06 UTC (permalink / raw)
  To: Fu, Qiaobin
  Cc: davem@davemloft.net, netdev@vger.kernel.org, jhs@mojatatu.com,
	Michel Machado
In-Reply-To: <2F042100-2BAC-48E5-887C-5D426B1D5A5B@bu.edu>

Hi,

Some style fixes:

On Thu, May 17, 2018 at 07:33:08PM +0000, Fu, Qiaobin wrote:
> net/sched: add action inheritdsfield to skbmod

This extra line above should not be here.

> 
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->prioriry. This enables
                              typo -----^

> later classification of packets based on the DS field.
> 
> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
> 
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> 
> Note that the motivation for this patch is found in the following discussion:
> https://www.spinics.net/lists/netdev/msg501061.html
> ---
> 
> diff --git a/include/uapi/linux/tc_act/tc_skbmod.h b/include/uapi/linux/tc_act/tc_skbmod.h
> index 38c072f..0718b48 100644
> --- a/include/uapi/linux/tc_act/tc_skbmod.h
> +++ b/include/uapi/linux/tc_act/tc_skbmod.h
> @@ -19,6 +19,7 @@
>  #define SKBMOD_F_SMAC	0x2
>  #define SKBMOD_F_ETYPE	0x4
>  #define SKBMOD_F_SWAPMAC 0x8
> +#define SKBMOD_F_INHERITDSFIELD 0x10
>  
>  struct tc_skbmod {
>  	tc_gen;
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index ad050d7..21d5bec 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -16,6 +16,9 @@
>  #include <linux/rtnetlink.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> +#include <net/ip.h>
> +#include <net/ipv6.h>
> +#include <net/dsfield.h>
>  
>  #include <linux/tc_act/tc_skbmod.h>
>  #include <net/tc_act/tc_skbmod.h>
> @@ -72,6 +75,25 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
>  		ether_addr_copy(eth_hdr(skb)->h_source, (u8 *)tmpaddr);
>  	}
>  
> +	if (flags & SKBMOD_F_INHERITDSFIELD) {
> +		int wlen = skb_network_offset(skb);

You need a blank line here, between var declaration and the rest.

> +		switch (tc_skb_protocol(skb)) {
> +		case htons(ETH_P_IP):
> +			wlen += sizeof(struct iphdr);
> +			if (!pskb_may_pull(skb, wlen))
> +				return TC_ACT_SHOT;
> +			skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> +			break;
> +
> +		case htons(ETH_P_IPV6):
> +			wlen += sizeof(struct ipv6hdr);
> +			if (!pskb_may_pull(skb, wlen))
> +				return TC_ACT_SHOT;
> +			skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> +			break;
> +		}
> +	}
> +
>  	return action;
>  }
>  
> @@ -127,6 +149,9 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
>  	if (parm->flags & SKBMOD_F_SWAPMAC)
>  		lflags = SKBMOD_F_SWAPMAC;
>  
> +	if (parm->flags & SKBMOD_F_INHERITDSFIELD)
> +		lflags |= SKBMOD_F_INHERITDSFIELD;
> +
>  	exists = tcf_idr_check(tn, parm->index, a, bind);
>  	if (exists && bind)
>  		return 0;

^ permalink raw reply

* Re: [PATCH] ppp: remove the PPPIOCDETACH ioctl
From: Eric Biggers @ 2018-05-23 21:17 UTC (permalink / raw)
  To: David Miller
  Cc: g.nault, linux-ppp, paulus, netdev, linux-fsdevel, linux-kernel,
	syzkaller-bugs, ebiggers
In-Reply-To: <20180523.115636.2241611659399097483.davem@davemloft.net>

On Wed, May 23, 2018 at 11:56:36AM -0400, David Miller wrote:
> From: Guillaume Nault <g.nault@alphalink.fr>
> Date: Wed, 23 May 2018 15:57:08 +0200
> 
> > I'd rather add
> > +	if (cmd == PPPIOCDETACH) {
> > +		err = -EINVAL;
> > +		goto out;
> > +	}
> > 
> > Making PPPIOCDETACH unknown to ppp_generic means that the ioctl would
> > be handled by the underlying channel when pf->kind == CHANNEL (see the
> > chan->ops->ioctl() call further down). That shouldn't be a problem per
> > se, but even though PPPIOCDETACH is unsupported, I feel that it should
> > remain a ppp_generic thing. I don't really want its value to be reused
> > for other purposes in the future or have different behaviour depending
> > on the underlying channel.
> > 
> > Also PPPIOCDETACH can already fail with -EINVAL. Therefore, if ever
> > there really were programs out there using this call, they'd already
> > have to handle this case. Unconditionally returning -EINVAL would
> > further minimise possibilities for breakage.
> 
> I agree.

Okay, I'll do that and leave the ioctl number reserved.
I will add a pr_warn_once() too.

Thanks,

- Eric

^ permalink raw reply

* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: David Miller @ 2018-05-23 21:20 UTC (permalink / raw)
  To: toke; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <87bmd6xeur.fsf@toke.dk>

From: Toke Høiland-Jørgensen <toke@toke.dk>
Date: Wed, 23 May 2018 23:05:16 +0200

> Ah, right, that could work. Is there any particular field in sk_buff
> we should stomp on for this purpose, or would you prefer a new one?
> Looking through it, the only obvious one that comes to mind is, well,
> skb->_nfct :)
> 
> If we wanted to avoid bloating sk_buff, we could add a union with that,
> fill it in the flow dissector, and just let conntrack overwrite it if
> active; then detect which is which in Cake, and read the data we need
> from _nfct if conntrack is active, and from what the flow dissector
> stored otherwise.
> 
> Is that too many hoops to jump through to avoid adding an extra field?

Space is precious in sk_buff, so yes avoid adding new members at all
costs.

How much info do you need exactly?

^ permalink raw reply

* Re: [PATCH bpf-next v3 2/7] bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
From: Yonghong Song @ 2018-05-23 21:25 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20180523171320.ziswsvpsyelxb6fz@kafai-mbp>



On 5/23/18 10:13 AM, Martin KaFai Lau wrote:
> On Tue, May 22, 2018 at 09:30:46AM -0700, Yonghong Song wrote:
>> Currently, suppose a userspace application has loaded a bpf program
>> and attached it to a tracepoint/kprobe/uprobe, and a bpf
>> introspection tool, e.g., bpftool, wants to show which bpf program
>> is attached to which tracepoint/kprobe/uprobe. Such attachment
>> information will be really useful to understand the overall bpf
>> deployment in the system.
>>
>> There is a name field (16 bytes) for each program, which could
>> be used to encode the attachment point. There are some drawbacks
>> for this approaches. First, bpftool user (e.g., an admin) may not
>> really understand the association between the name and the
>> attachment point. Second, if one program is attached to multiple
>> places, encoding a proper name which can imply all these
>> attachments becomes difficult.
>>
>> This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
>> Given a pid and fd, if the <pid, fd> is associated with a
>> tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
>>     . prog_id
>>     . tracepoint name, or
>>     . k[ret]probe funcname + offset or kernel addr, or
>>     . u[ret]probe filename + offset
>> to the userspace.
>> The user can use "bpftool prog" to find more information about
>> bpf program itself with prog_id.
> LGTM, some comments inline.
> 
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/linux/trace_events.h |  16 ++++++
>>   include/uapi/linux/bpf.h     |  27 ++++++++++
>>   kernel/bpf/syscall.c         | 124 +++++++++++++++++++++++++++++++++++++++++++
>>   kernel/trace/bpf_trace.c     |  48 +++++++++++++++++
>>   kernel/trace/trace_kprobe.c  |  29 ++++++++++
>>   kernel/trace/trace_uprobe.c  |  22 ++++++++
>>   6 files changed, 266 insertions(+)
>>
>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>> index 2bde3ef..eab806d 100644
>> --- a/include/linux/trace_events.h
>> +++ b/include/linux/trace_events.h
>> @@ -473,6 +473,9 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>>   int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
>>   int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
>>   struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name);
>> +int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
>> +			    u32 *attach_info, const char **buf,
>> +			    u64 *probe_offset, u64 *probe_addr);
> The first arg is 'const struct perf_event *event' while...
> 
>>   #else
>>   static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
>>   {
>> @@ -504,6 +507,12 @@ static inline struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name
>>   {
>>   	return NULL;
>>   }
>> +static inline int bpf_get_perf_event_info(const struct file *file, u32 *prog_id,
> this one has 'const struct file *file'?

Thanks for catching this. Will correct this in the next revision.

> 
>> +					  u32 *attach_info, const char **buf,
>> +					  u64 *probe_offset, u64 *probe_addr)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>>   #endif
>>   
>>   enum {
>> @@ -560,10 +569,17 @@ extern void perf_trace_del(struct perf_event *event, int flags);
>>   #ifdef CONFIG_KPROBE_EVENTS
>>   extern int  perf_kprobe_init(struct perf_event *event, bool is_retprobe);
>>   extern void perf_kprobe_destroy(struct perf_event *event);
>> +extern int bpf_get_kprobe_info(const struct perf_event *event,
>> +			       u32 *attach_info, const char **symbol,
>> +			       u64 *probe_offset, u64 *probe_addr,
>> +			       bool perf_type_tracepoint);
>>   #endif
>>   #ifdef CONFIG_UPROBE_EVENTS
>>   extern int  perf_uprobe_init(struct perf_event *event, bool is_retprobe);
>>   extern void perf_uprobe_destroy(struct perf_event *event);
>> +extern int bpf_get_uprobe_info(const struct perf_event *event,
>> +			       u32 *attach_info, const char **filename,
>> +			       u64 *probe_offset, bool perf_type_tracepoint);
>>   #endif
>>   extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
>>   				     char *filter_str);
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bb..a602150 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -97,6 +97,7 @@ enum bpf_cmd {
>>   	BPF_RAW_TRACEPOINT_OPEN,
>>   	BPF_BTF_LOAD,
>>   	BPF_BTF_GET_FD_BY_ID,
>> +	BPF_TASK_FD_QUERY,
>>   };
>>   
>>   enum bpf_map_type {
>> @@ -379,6 +380,22 @@ union bpf_attr {
>>   		__u32		btf_log_size;
>>   		__u32		btf_log_level;
>>   	};
>> +
>> +	struct {
>> +		int		pid;		/* input: pid */
>> +		int		fd;		/* input: fd */
> Should fd and pid be always positive?
> The current fd (like map_fd) in bpf_attr is using __u32.

Will change both pid and fd to __u32. In kernel fd is unsigned, but
pid_t is actually an int. The negative pid is often referred to
the process group the pid is in. Since here, we are only concerned
with actual process pid, make __u32 should be okay.

> 
>> +		__u32		flags;		/* input: flags */
>> +		__u32		buf_len;	/* input: buf len */
>> +		__aligned_u64	buf;		/* input/output:
>> +						 *   tp_name for tracepoint
>> +						 *   symbol for kprobe
>> +						 *   filename for uprobe
>> +						 */
>> +		__u32		prog_id;	/* output: prod_id */
>> +		__u32		attach_info;	/* output: BPF_ATTACH_* */
>> +		__u64		probe_offset;	/* output: probe_offset */
>> +		__u64		probe_addr;	/* output: probe_addr */
>> +	} task_fd_query;
>>   } __attribute__((aligned(8)));
>>   
>>   /* The description below is an attempt at providing documentation to eBPF
>> @@ -2458,4 +2475,14 @@ struct bpf_fib_lookup {
>>   	__u8	dmac[6];     /* ETH_ALEN */
>>   };
>>   
>> +/* used by <task, fd> based query */
>> +enum {
> Nit. Instead of a comment, is it better to give this
> enum a descriptive name?

Yes, will add an enum name bpf_task_fd_info to make it easy for 
correlation with task_fd_query.

> 
>> +	BPF_ATTACH_RAW_TRACEPOINT,	/* tp name */
>> +	BPF_ATTACH_TRACEPOINT,		/* tp name */
>> +	BPF_ATTACH_KPROBE,		/* (symbol + offset) or addr */
>> +	BPF_ATTACH_KRETPROBE,		/* (symbol + offset) or addr */
>> +	BPF_ATTACH_UPROBE,		/* filename + offset */
>> +	BPF_ATTACH_URETPROBE,		/* filename + offset */
>> +};
>> +
>>   #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index bfcde94..9356f0e 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -18,7 +18,9 @@
>>   #include <linux/vmalloc.h>
>>   #include <linux/mmzone.h>
>>   #include <linux/anon_inodes.h>
>> +#include <linux/fdtable.h>
>>   #include <linux/file.h>
>> +#include <linux/fs.h>
>>   #include <linux/license.h>
>>   #include <linux/filter.h>
>>   #include <linux/version.h>
>> @@ -2102,6 +2104,125 @@ static int bpf_btf_get_fd_by_id(const union bpf_attr *attr)
>>   	return btf_get_fd_by_id(attr->btf_id);
>>   }
>>   
>> +static int bpf_task_fd_query_copy(const union bpf_attr *attr,
>> +				    union bpf_attr __user *uattr,
>> +				    u32 prog_id, u32 attach_info,
>> +				    const char *buf, u64 probe_offset,
>> +				    u64 probe_addr)
>> +{
>> +	__u64 __user *ubuf;
> Nit. ubuf is a string instead of an array of __u64?

will change it to void *.

> 
>> +	int len;
>> +
>> +	ubuf = u64_to_user_ptr(attr->task_fd_query.buf);
>> +	if (buf) {
>> +		len = strlen(buf);
>> +		if (attr->task_fd_query.buf_len < len + 1)
> I think the current convention is to take the min,
> copy whatever it can to buf and return the real
> len/size in buf_len.  F.e., the prog_ids and prog_cnt in
> __cgroup_bpf_query().
> 
> Should the same be done here or it does not make sense to
> truncate the string?  The user may/may not need the tailing
> char though if its pretty print has limited width anyway.
> The user still needs to know what the buf_len should be to
> retry also but I guess any reasonable buf_len should
> work?

Make sense, will make buf_len input/output and copy
the actually needed length to buf_len and back to user.

> 
>> +			return -ENOSPC;
>> +		if (copy_to_user(ubuf, buf, len + 1))
>> +			return -EFAULT;
>> +	} else if (attr->task_fd_query.buf_len) {
>> +		/* copy '\0' to ubuf */
>> +		__u8 zero = 0;
>> +
>> +		if (copy_to_user(ubuf, &zero, 1))
>> +			return -EFAULT;
>> +	}
>> +
>> +	if (copy_to_user(&uattr->task_fd_query.prog_id, &prog_id,
>> +			 sizeof(prog_id)) ||
>> +	    copy_to_user(&uattr->task_fd_query.attach_info, &attach_info,
>> +			 sizeof(attach_info)) ||
>> +	    copy_to_user(&uattr->task_fd_query.probe_offset, &probe_offset,
>> +			 sizeof(probe_offset)) ||
>> +	    copy_to_user(&uattr->task_fd_query.probe_addr, &probe_addr,
>> +			 sizeof(probe_addr)))
> Nit. put_user() may be able to shorten them.

Indeed, thanks for suggestion.

> 
>> +		return -EFAULT;
>> +
>> +	return 0;
>> +}
>> +
>> +#define BPF_TASK_FD_QUERY_LAST_FIELD task_fd_query.probe_addr
>> +
>> +static int bpf_task_fd_query(const union bpf_attr *attr,
>> +			     union bpf_attr __user *uattr)
>> +{
>> +	pid_t pid = attr->task_fd_query.pid;
>> +	int fd = attr->task_fd_query.fd;
>> +	const struct perf_event *event;
>> +	struct files_struct *files;
>> +	struct task_struct *task;
>> +	struct file *file;
>> +	int err;
>> +
>> +	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
>> +		return -EINVAL;
>> +
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EPERM;
>> +
>> +	if (attr->task_fd_query.flags != 0)
> How flags is used?

flags is not used yet, but for future extension.
For example, it could be used to query a specific
type of files (raw_tracepoint vs. perf-event based, etc.).

> 
>> +		return -EINVAL;
>> +
>> +	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
>> +	if (!task)
>> +		return -ENOENT;
>> +
>> +	files = get_files_struct(task);
>> +	put_task_struct(task);
>> +	if (!files)
>> +		return -ENOENT;
>> +
>> +	err = 0;
>> +	spin_lock(&files->file_lock);
>> +	file = fcheck_files(files, fd);
>> +	if (!file)
>> +		err = -EBADF;
>> +	else
>> +		get_file(file);
>> +	spin_unlock(&files->file_lock);
>> +	put_files_struct(files);
>> +
>> +	if (err)
>> +		goto out;
>> +
>> +	if (file->f_op == &bpf_raw_tp_fops) {
>> +		struct bpf_raw_tracepoint *raw_tp = file->private_data;
>> +		struct bpf_raw_event_map *btp = raw_tp->btp;
>> +
>> +		if (!raw_tp->prog)
>> +			err = -ENOENT;
>> +		else
>> +			err = bpf_task_fd_query_copy(attr, uattr,
>> +						     raw_tp->prog->aux->id,
>> +						     BPF_ATTACH_RAW_TRACEPOINT,
>> +						     btp->tp->name, 0, 0);
>> +		goto put_file;
>> +	}
>> +
>> +	event = perf_get_event(file);
>> +	if (!IS_ERR(event)) {
>> +		u64 probe_offset, probe_addr;
>> +		u32 prog_id, attach_info;
>> +		const char *buf;
>> +
>> +		err = bpf_get_perf_event_info(event, &prog_id, &attach_info,
>> +					      &buf, &probe_offset,
>> +					      &probe_addr);
>> +		if (!err)
>> +			err = bpf_task_fd_query_copy(attr, uattr, prog_id,
>> +						     attach_info, buf,
>> +						     probe_offset,
>> +						     probe_addr);
>> +		goto put_file;
>> +	}
>> +
>> +	err = -ENOTSUPP;
>> +put_file:
>> +	fput(file);
>> +out:
>> +	return err;
>> +}
>> +
>>   SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>>   {
>>   	union bpf_attr attr = {};
>> @@ -2188,6 +2309,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>>   	case BPF_BTF_GET_FD_BY_ID:
>>   		err = bpf_btf_get_fd_by_id(&attr);
>>   		break;
>> +	case BPF_TASK_FD_QUERY:
>> +		err = bpf_task_fd_query(&attr, uattr);
>> +		break;
>>   	default:
>>   		err = -EINVAL;
>>   		break;
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbf..323c80e 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -14,6 +14,7 @@
>>   #include <linux/uaccess.h>
>>   #include <linux/ctype.h>
>>   #include <linux/kprobes.h>
>> +#include <linux/syscalls.h>
>>   #include <linux/error-injection.h>
>>   
>>   #include "trace_probe.h"
>> @@ -1163,3 +1164,50 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
>>   	mutex_unlock(&bpf_event_mutex);
>>   	return err;
>>   }
>> +
>> +int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
>> +			    u32 *attach_info, const char **buf,
>> +			    u64 *probe_offset, u64 *probe_addr)
>> +{
>> +	bool is_tracepoint, is_syscall_tp;
>> +	struct bpf_prog *prog;
>> +	int flags, err = 0;
>> +
>> +	prog = event->prog;
>> +	if (!prog)
>> +		return -ENOENT;
>> +
>> +	/* not supporting BPF_PROG_TYPE_PERF_EVENT yet */
>> +	if (prog->type == BPF_PROG_TYPE_PERF_EVENT)
>> +		return -EOPNOTSUPP;
>> +
>> +	*prog_id = prog->aux->id;
>> +	flags = event->tp_event->flags;
>> +	is_tracepoint = flags & TRACE_EVENT_FL_TRACEPOINT;
>> +	is_syscall_tp = is_syscall_trace_event(event->tp_event);
>> +
>> +	if (is_tracepoint || is_syscall_tp) {
>> +		*buf = is_tracepoint ? event->tp_event->tp->name
>> +				     : event->tp_event->name;
>> +		*attach_info = BPF_ATTACH_TRACEPOINT;
>> +		*probe_offset = 0x0;
>> +		*probe_addr = 0x0;
>> +	} else {
>> +		/* kprobe/uprobe */
>> +		err = -EOPNOTSUPP;
>> +#ifdef CONFIG_KPROBE_EVENTS
>> +		if (flags & TRACE_EVENT_FL_KPROBE)
>> +			err = bpf_get_kprobe_info(event, attach_info, buf,
>> +						  probe_offset, probe_addr,
>> +						  event->attr.type == PERF_TYPE_TRACEPOINT);
>> +#endif
>> +#ifdef CONFIG_UPROBE_EVENTS
>> +		if (flags & TRACE_EVENT_FL_UPROBE)
>> +			err = bpf_get_uprobe_info(event, attach_info, buf,
>> +						  probe_offset,
>> +						  event->attr.type == PERF_TYPE_TRACEPOINT);
>> +#endif
>> +	}
>> +
>> +	return err;
>> +}
>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
>> index 02aed76..32e9190 100644
>> --- a/kernel/trace/trace_kprobe.c
>> +++ b/kernel/trace/trace_kprobe.c
>> @@ -1287,6 +1287,35 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
>>   			      head, NULL);
>>   }
>>   NOKPROBE_SYMBOL(kretprobe_perf_func);
>> +
>> +int bpf_get_kprobe_info(const struct perf_event *event, u32 *attach_info,
>> +			const char **symbol, u64 *probe_offset,
>> +			u64 *probe_addr, bool perf_type_tracepoint)
>> +{
>> +	const char *pevent = trace_event_name(event->tp_event);
>> +	const char *group = event->tp_event->class->system;
>> +	struct trace_kprobe *tk;
>> +
>> +	if (perf_type_tracepoint)
>> +		tk = find_trace_kprobe(pevent, group);
>> +	else
>> +		tk = event->tp_event->data;
>> +	if (!tk)
>> +		return -EINVAL;
>> +
>> +	*attach_info = trace_kprobe_is_return(tk) ? BPF_ATTACH_KRETPROBE
>> +						  : BPF_ATTACH_KPROBE;
>> +	if (tk->symbol) {
>> +		*symbol = tk->symbol;
>> +		*probe_offset = tk->rp.kp.offset;
>> +		*probe_addr = 0;
>> +	} else {
>> +		*symbol = NULL;
>> +		*probe_offset = 0;
>> +		*probe_addr = (unsigned long)tk->rp.kp.addr;
>> +	}
>> +	return 0;
>> +}
>>   #endif	/* CONFIG_PERF_EVENTS */
>>   
>>   /*
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index ac89287..12a3667 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -1161,6 +1161,28 @@ static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
>>   {
>>   	__uprobe_perf_func(tu, func, regs, ucb, dsize);
>>   }
>> +
>> +int bpf_get_uprobe_info(const struct perf_event *event, u32 *attach_info,
>> +			const char **filename, u64 *probe_offset,
>> +			bool perf_type_tracepoint)
>> +{
>> +	const char *pevent = trace_event_name(event->tp_event);
>> +	const char *group = event->tp_event->class->system;
>> +	struct trace_uprobe *tu;
>> +
>> +	if (perf_type_tracepoint)
>> +		tu = find_probe_event(pevent, group);
>> +	else
>> +		tu = event->tp_event->data;
>> +	if (!tu)
>> +		return -EINVAL;
>> +
>> +	*attach_info = is_ret_probe(tu) ? BPF_ATTACH_URETPROBE
>> +					: BPF_ATTACH_UPROBE;
>> +	*filename = tu->filename;
>> +	*probe_offset = tu->offset;
>> +	return 0;
>> +}
>>   #endif	/* CONFIG_PERF_EVENTS */
>>   
>>   static int
>> -- 
>> 2.9.5
>>

^ permalink raw reply

* Re: [PATCH bpf-next v3 2/7] bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
From: Yonghong Song @ 2018-05-23 21:27 UTC (permalink / raw)
  To: Alexei Starovoitov, Martin KaFai Lau
  Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20180523210359.dz2zwqcb76hebrtn@ast-mbp>



On 5/23/18 2:04 PM, Alexei Starovoitov wrote:
> On Wed, May 23, 2018 at 10:13:22AM -0700, Martin KaFai Lau wrote:
>>> +		__u32		prog_id;	/* output: prod_id */
>>> +		__u32		attach_info;	/* output: BPF_ATTACH_* */
>>> +		__u64		probe_offset;	/* output: probe_offset */
>>> +		__u64		probe_addr;	/* output: probe_addr */
>>> +	} task_fd_query;
>>>   } __attribute__((aligned(8)));
>>>   
>>>   /* The description below is an attempt at providing documentation to eBPF
>>> @@ -2458,4 +2475,14 @@ struct bpf_fib_lookup {
>>>   	__u8	dmac[6];     /* ETH_ALEN */
>>>   };
>>>   
>>> +/* used by <task, fd> based query */
>>> +enum {
>> Nit. Instead of a comment, is it better to give this
>> enum a descriptive name?
>>
>>> +	BPF_ATTACH_RAW_TRACEPOINT,	/* tp name */
>>> +	BPF_ATTACH_TRACEPOINT,		/* tp name */
>>> +	BPF_ATTACH_KPROBE,		/* (symbol + offset) or addr */
>>> +	BPF_ATTACH_KRETPROBE,		/* (symbol + offset) or addr */
>>> +	BPF_ATTACH_UPROBE,		/* filename + offset */
>>> +	BPF_ATTACH_URETPROBE,		/* filename + offset */
>>> +};
> 
> One more nit here.
> Can we come up with better names for the above?
> 'attach' is a verb. I cannot help but read above as it's an action
> for the kernel to attach to something and not the type of event
> where the program was attached to.
> Since we pass task+fd into that BPF_TASK_FD_QUERY command how
> about returning BPF_FD_TYPE_KPROBE, BPF_FD_TYPE_TRACEPOINT, ... ?

Okay will use BPF_FD_TYPE_*... which is indeed better than
BPF_ATTACH_*.

^ permalink raw reply

* Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
From: Jakub Kicinski @ 2018-05-23 21:32 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Daniel Borkmann, ast, netdev, linuxppc-dev, mpe, naveen.n.rao,
	Quentin Monnet
In-Reply-To: <7142939b-515e-50ac-bc0b-50444bf9cc97@linux.vnet.ibm.com>

On Wed, 23 May 2018 16:07:40 +0530, Sandipan Das wrote:
>         "name": "bpf_prog_196af774a3477707_F",
>         "insns": [{
>                 "pc": "0x0",
>                 "operation": "nop",
>                 "operands": [null
>                 ]
>             },{
>                 "pc": "0x4",
>                 "operation": "nop",
>                 "operands": [null
>                 ]
>             },{
>                 "pc": "0x8",
>                 "operation": "mflr",
>                 "operands": ["r0"
>                 ]
>             },{
>                 ...
>             }
>         ]
>     }
> ]
> 
> If this is okay, I can send out the next revision with these changes.

Looks good, thank you!

^ permalink raw reply

* [PATCH v2] ppp: remove the PPPIOCDETACH ioctl
From: Eric Biggers @ 2018-05-23 21:37 UTC (permalink / raw)
  To: linux-ppp, Paul Mackerras
  Cc: netdev, linux-fsdevel, linux-kernel, Guillaume Nault,
	syzkaller-bugs, Eric Biggers
In-Reply-To: <20180523035952.25768-1-ebiggers3@gmail.com>

From: Eric Biggers <ebiggers@google.com>

The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea.  It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference.  However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances.  As reported by syzbot, this can trivially
be used to cause a use-after-free.

Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83.  Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.

All pppd versions released in the last 15 years just close() the file
descriptor instead.

Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices.  Leave
a stub in place that prints a one-time warning and returns EINVAL.

Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
---

v2: leave a stub in place, rather than removing the ioctl completely.

 Documentation/networking/ppp_generic.txt |  6 ------
 drivers/net/ppp/ppp_generic.c            | 27 +++++-------------------
 include/uapi/linux/ppp-ioctl.h           |  2 +-
 3 files changed, 6 insertions(+), 29 deletions(-)

diff --git a/Documentation/networking/ppp_generic.txt b/Documentation/networking/ppp_generic.txt
index 091d20273dcb..61daf4b39600 100644
--- a/Documentation/networking/ppp_generic.txt
+++ b/Documentation/networking/ppp_generic.txt
@@ -300,12 +300,6 @@ unattached instance are:
 The ioctl calls available on an instance of /dev/ppp attached to a
 channel are:
 
-* PPPIOCDETACH detaches the instance from the channel.  This ioctl is
-  deprecated since the same effect can be achieved by closing the
-  instance.  In order to prevent possible races this ioctl will fail
-  with an EINVAL error if more than one file descriptor refers to this
-  instance (i.e. as a result of dup(), dup2() or fork()).
-
 * PPPIOCCONNECT connects this channel to a PPP interface.  The
   argument should point to an int containing the interface unit
   number.  It will return an EINVAL error if the channel is already
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index dc7c7ec43202..02ad03a2fab7 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -605,30 +605,13 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 	if (cmd == PPPIOCDETACH) {
 		/*
-		 * We have to be careful here... if the file descriptor
-		 * has been dup'd, we could have another process in the
-		 * middle of a poll using the same file *, so we had
-		 * better not free the interface data structures -
-		 * instead we fail the ioctl.  Even in this case, we
-		 * shut down the interface if we are the owner of it.
-		 * Actually, we should get rid of PPPIOCDETACH, userland
-		 * (i.e. pppd) could achieve the same effect by closing
-		 * this fd and reopening /dev/ppp.
+		 * PPPIOCDETACH is no longer supported as it was heavily broken,
+		 * and is only known to have been used by pppd older than
+		 * ppp-2.4.2 (released November 2003).
 		 */
+		pr_warn_once("%s (%d) used obsolete PPPIOCDETACH ioctl\n",
+			     current->comm, current->pid);
 		err = -EINVAL;
-		if (pf->kind == INTERFACE) {
-			ppp = PF_TO_PPP(pf);
-			rtnl_lock();
-			if (file == ppp->owner)
-				unregister_netdevice(ppp->dev);
-			rtnl_unlock();
-		}
-		if (atomic_long_read(&file->f_count) < 2) {
-			ppp_release(NULL, file);
-			err = 0;
-		} else
-			pr_warn("PPPIOCDETACH file->f_count=%ld\n",
-				atomic_long_read(&file->f_count));
 		goto out;
 	}
 
diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h
index b19a9c249b15..784c2e3e572e 100644
--- a/include/uapi/linux/ppp-ioctl.h
+++ b/include/uapi/linux/ppp-ioctl.h
@@ -106,7 +106,7 @@ struct pppol2tp_ioc_stats {
 #define PPPIOCGIDLE	_IOR('t', 63, struct ppp_idle) /* get idle time */
 #define PPPIOCNEWUNIT	_IOWR('t', 62, int)	/* create new ppp unit */
 #define PPPIOCATTACH	_IOW('t', 61, int)	/* attach to ppp unit */
-#define PPPIOCDETACH	_IOW('t', 60, int)	/* detach from ppp unit/chan */
+#define PPPIOCDETACH	_IOW('t', 60, int)	/* obsolete, do not use */
 #define PPPIOCSMRRU	_IOW('t', 59, int)	/* set multilink MRU */
 #define PPPIOCCONNECT	_IOW('t', 58, int)	/* connect channel to unit */
 #define PPPIOCDISCONN	_IO('t', 57)		/* disconnect channel */
-- 
2.17.0.441.gb46fe60e1d-goog

^ permalink raw reply related

* Re: [PATCH v6 0/5] PCI: Improve PCIe link status reporting
From: Bjorn Helgaas @ 2018-05-23 21:46 UTC (permalink / raw)
  To: Jeff Kirsher, Ganesh Goudar, Michael Chan, Ariel Elior,
	David S. Miller
  Cc: linux-pci, everest-linux-l2, intel-wired-lan, netdev,
	linux-kernel, Tal Gilboa, Tariq Toukan, Jacob Keller,
	Jakub Kicinski
In-Reply-To: <152537719056.62474.2571390812509425478.stgit@bhelgaas-glaptop.roam.corp.google.com>

[+to Davem]

On Thu, May 03, 2018 at 03:00:07PM -0500, Bjorn Helgaas wrote:
> This is based on Tal's recent work to unify the approach for reporting PCIe
> link speed/width and whether the device is being limited by a slower
> upstream link.
> 
> The new pcie_print_link_status() interface appeared in v4.17-rc1; see
> 9e506a7b5147 ("PCI: Add pcie_print_link_status() to log link speed and
> whether it's limited").
> 
> That's a good way to replace use of pcie_get_minimum_link(), which gives
> misleading results when a path contains both a fast, narrow link and a
> slow, wide link: it reports the equivalent of a slow, narrow link.
> 
> This series removes the remaining uses of pcie_get_minimum_link() and then
> removes the interface itself.  I'd like to merge them all through the PCI
> tree to make the removal easy.
> 
> This does change the dmesg reporting of link speeds, and in the ixgbe case,
> it changes the reporting from KERN_WARN level to KERN_INFO.  If that's an
> issue, let's talk about it.  I'm hoping the reduce code size, improved
> functionality, and consistency across drivers is enough to make this
> worthwhile.
> 
> ---
> 
> Bjorn Helgaas (5):
>       bnx2x: Report PCIe link properties with pcie_print_link_status()
>       bnxt_en: Report PCIe link properties with pcie_print_link_status()
>       cxgb4: Report PCIe link properties with pcie_print_link_status()
>       ixgbe: Report PCIe link properties with pcie_print_link_status()
>       PCI: Remove unused pcie_get_minimum_link()
> 
> 
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   23 ++-----
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c        |   19 ------
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |   75 ----------------------
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   47 --------------
>  drivers/pci/pci.c                                |   43 -------------
>  include/linux/pci.h                              |    2 -
>  6 files changed, 9 insertions(+), 200 deletions(-)

I applied all of these on pci/enumeration for v4.18.  If you'd rather take
them, Dave, let me know and I'll drop them.

I solicited more acks, but only heard from Jeff.

^ permalink raw reply

* Re: [PATCH 05/15] mtd: nand: pxa3xx: remove the dmaengine compat need
From: Daniel Mack @ 2018-05-23 21:54 UTC (permalink / raw)
  To: Robert Jarzmik, Haojian Zhuang, Bartlomiej Zolnierkiewicz,
	Tejun Heo, Vinod Koul, Mauro Carvalho Chehab, Ulf Hansson,
	Ezequiel Garcia, Boris Brezillon, David Woodhouse, Brian Norris,
	Marek Vasut, Richard Weinberger, Cyrille Pitchen, Nicolas Pitre,
	Samuel Ortiz, Greg Kroah-Hartman, Jaroslav Kysela, Takashi Iwai,
	Liam Girdwood, Mark Brown, Arnd Bergmann
  Cc: devel, alsa-devel, netdev, linux-mmc, linux-kernel, linux-ide,
	linux-mtd, dmaengine, Robert Jarzmik, linux-arm-kernel,
	linux-media
In-Reply-To: <a09d80dc-e8fd-1e9a-1878-8875ddc83134@zonque.org>

[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]

Hi Robert,

Please refer to the attached patch instead of the one I sent earlier. I 
missed to also remove the platform_get_resource(IORESOURCE_DMA) call.


Thanks,
Daniel


On Friday, May 18, 2018 11:31 PM, Daniel Mack wrote:
> Hi Robert,
> 
> Thanks for this series.
> 
> On Monday, April 02, 2018 04:26 PM, Robert Jarzmik wrote:
>> From: Robert Jarzmik <robert.jarzmik@renault.com>
>>
>> As the pxa architecture switched towards the dmaengine slave map, the
>> old compatibility mechanism to acquire the dma requestor line number and
>> priority are not needed anymore.
>>
>> This patch simplifies the dma resource acquisition, using the more
>> generic function dma_request_slave_channel().
>>
>> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
>> ---
>>    drivers/mtd/nand/pxa3xx_nand.c | 10 +---------
> 
> This driver was replaced by drivers/mtd/nand/raw/marvell_nand.c
> recently, so this patch can be dropped. I attached a version for the new
> driver which you can pick instead.
> 
> 
> Thanks,
> Daniel
> 


[-- Attachment #2: 0001-mtd-rawnand-marvell-remove-dmaengine-compat-code.patch --]
[-- Type: text/x-patch, Size: 1793 bytes --]

>From 72a306157dedb21f8c3289f0f7a288fc4542bd96 Mon Sep 17 00:00:00 2001
From: Daniel Mack <daniel@zonque.org>
Date: Sat, 12 May 2018 21:50:13 +0200
Subject: [PATCH] mtd: rawnand: marvell: remove dmaengine compat code

As the pxa architecture switched towards the dmaengine slave map, the
old compatibility mechanism to acquire the dma requestor line number and
priority are not needed anymore.

This patch simplifies the dma resource acquisition, using the more
generic function dma_request_slave_channel().

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 drivers/mtd/nand/raw/marvell_nand.c | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index ebb1d141b900..319fea77daf1 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -2612,8 +2612,6 @@ static int marvell_nfc_init_dma(struct marvell_nfc *nfc)
 						    dev);
 	struct dma_slave_config config = {};
 	struct resource *r;
-	dma_cap_mask_t mask;
-	struct pxad_param param;
 	int ret;
 
 	if (!IS_ENABLED(CONFIG_PXA_DMA)) {
@@ -2626,20 +2624,7 @@ static int marvell_nfc_init_dma(struct marvell_nfc *nfc)
 	if (ret)
 		return ret;
 
-	r = platform_get_resource(pdev, IORESOURCE_DMA, 0);
-	if (!r) {
-		dev_err(nfc->dev, "No resource defined for data DMA\n");
-		return -ENXIO;
-	}
-
-	param.drcmr = r->start;
-	param.prio = PXAD_PRIO_LOWEST;
-	dma_cap_zero(mask);
-	dma_cap_set(DMA_SLAVE, mask);
-	nfc->dma_chan =
-		dma_request_slave_channel_compat(mask, pxad_filter_fn,
-						 &param, nfc->dev,
-						 "data");
+	nfc->dma_chan = dma_request_slave_channel(nfc->dev, "data");
 	if (!nfc->dma_chan) {
 		dev_err(nfc->dev,
 			"Unable to request data DMA channel\n");
-- 
2.14.3


[-- Attachment #3: Type: text/plain, Size: 169 bytes --]

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply related

* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Alexei Starovoitov @ 2018-05-23 22:02 UTC (permalink / raw)
  To: Eugene Syromiatnikov
  Cc: netdev, linux-kernel, linux-doc, Kees Cook, Kai-Heng Feng,
	Daniel Borkmann, Alexei Starovoitov, Jonathan Corbet, Jiri Olsa,
	Jesper Dangaard Brouer
In-Reply-To: <20180523121806.GA27675@asgard.redhat.com>

On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> Some BPF sysctl knobs affect the loading of BPF programs, and during
> system boot/init stages these sysctls are not yet configured.
> A concrete example is systemd, that has implemented loading of BPF
> programs.
> 
> Thus, to allow controlling these setting at early boot, this patch set
> adds the ability to change the default setting of these sysctl knobs
> as well as option to override them via a boot-time kernel parameter
> (in order to avoid rebuilding kernel each time a need of changing these
> defaults arises).
> 
> The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.

- systemd is root. today it only uses cgroup-bpf progs which require root,
  so disabling unpriv during boot time makes no difference to systemd.
  what is the actual reason to present time?

- say in the future systemd wants to use so_reuseport+bpf for faster
  networking. With unpriv disable during boot, it will force systemd
  to do such networking from root, which will lower its security barrier.
  How that make sense?

- bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
  Flipping it during the boot or right after or any time after
  is the same thing. Why add such boot flag then?

- jit_harden can be turned on by systemd. so turning it during the boot
  will make systemd progs to be constant blinded.
  Constant blinding protects kernel from unprivileged JIT spraying.
  Are you worried that systemd will attack the kernel with JIT spraying?

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] net: phy: improve PHY suspend/resume
From: Andrew Lunn @ 2018-05-23 22:04 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <8fe93623-9d05-6182-fe5f-da0bd32bae0b@gmail.com>

On Wed, May 23, 2018 at 10:15:29PM +0200, Heiner Kallweit wrote:
> I have the issue that suspending the MAC-integrated PHY gives an
> error during system suspend. The sequence is:
> 
> 1. unconnected PHY/MAC are runtime-suspended already
> 2. system suspend commences
> 3. mdio_bus_phy_suspend is called
> 4. suspend callback of the network driver is called (implicitly
>    MAC/PHY are runtime-resumed before)
> 5. suspend callback suspends MAC/PHY
> 
> The problem occurs in step 3. phy_suspend() fails because the MDIO
> bus isn't accessible due to the chip being runtime-suspended.

I think you are fixing the wrong problem. I've had the same with the
FEC driver. I fixed it by making the MDIO operations runtime-suspend
aware:

commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3
Author: Andrew Lunn <andrew@lunn.ch>
Date:   Sat Jul 25 22:38:02 2015 +0200

    net: fec: Ensure clocks are enabled while using mdio bus
    
    When a switch is attached to the mdio bus, the mdio bus can be used
    while the interface is not open. If the IPG clock is not enabled, MDIO
    reads/writes will simply time out.
    
    Add support for runtime PM to control this clock. Enable/disable this
    clock using runtime PM, with open()/close() and mdio read()/write()
    function triggering runtime PM operations. Since PM is optional, the
    IPG clock is enabled at probe and is no longer modified by
    fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
    guaranteed the clock is running when MDIO operations are performed.

Don't copy this patch 1:1. I introduced a few bugs which took a while
to be shaken out :-(

   Andrew

^ permalink raw reply

* Re: [PATCH] net: phy: replace bool members in struct phy_device with bit-fields
From: Andrew Lunn @ 2018-05-23 22:11 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <3c59ea3d-f707-b991-1f88-8540891488b9@gmail.com>

On Wed, May 23, 2018 at 08:05:20AM +0200, Heiner Kallweit wrote:
> In struct phy_device we have a number of flags being defined as type
> bool. Similar to e.g. struct pci_dev we can save some space by using
> bit-fields.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH v3] ath10k: transmit queued frames after processing rx packets
From: Niklas Cassel @ 2018-05-23 22:15 UTC (permalink / raw)
  To: Kalle Valo, David S. Miller
  Cc: Niklas Cassel, ath10k-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

When running iperf on ath10k SDIO, TX can stop working:

iperf -c 192.168.1.1 -i 1 -t 20 -w 10K
[  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
[  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
[  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
[  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
[  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec

There are frames in the ieee80211_txq and there are frames that have
been removed from from this queue, but haven't yet been sent on the wire
(num_pending_tx).

When num_pending_tx reaches max_num_pending_tx, we will stop the queues
by calling ieee80211_stop_queues().

As frames that have previously been sent for transmission
(num_pending_tx) are completed, we will decrease num_pending_tx and wake
the queues by calling ieee80211_wake_queue(). ieee80211_wake_queue()
does not call wake_tx_queue, so we might still have frames in the
queue at this point.

While the queues were stopped, the socket buffer might have filled up,
and in order for user space to write more, we need to free the frames
in the queue, since they are accounted to the socket. In order to free
them, we first need to transmit them.

This problem cannot be reproduced on low-latency devices, e.g. pci,
since they call ath10k_mac_tx_push_pending() from
ath10k_htt_txrx_compl_task(). ath10k_htt_txrx_compl_task() is not called
on high-latency devices.
Fix the problem by calling ath10k_mac_tx_push_pending(), after
processing rx packets, just like for low-latency devices, also in the
SDIO case. Since we are calling ath10k_mac_tx_push_pending() directly,
we also need to export it.

Signed-off-by: Niklas Cassel <niklas.cassel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
Changes since V2:
Moved ath10k_mac_tx_push_pending() call to ath10k_sdio_irq_handler().

 drivers/net/wireless/ath/ath10k/mac.c  | 1 +
 drivers/net/wireless/ath/ath10k/sdio.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 487a7a7380fd..bfcd9705ed54 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4038,6 +4038,7 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar)
 	rcu_read_unlock();
 	spin_unlock_bh(&ar->txqs_lock);
 }
+EXPORT_SYMBOL(ath10k_mac_tx_push_pending);
 
 /************/
 /* Scanning */
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index d612ce8c9cff..2856c75f9011 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -30,6 +30,7 @@
 #include "debug.h"
 #include "hif.h"
 #include "htc.h"
+#include "mac.h"
 #include "targaddrs.h"
 #include "trace.h"
 #include "sdio.h"
@@ -1342,6 +1343,8 @@ static void ath10k_sdio_irq_handler(struct sdio_func *func)
 			break;
 	} while (time_before(jiffies, timeout) && !done);
 
+	ath10k_mac_tx_push_pending(ar);
+
 	sdio_claim_host(ar_sdio->func);
 
 	if (ret && ret != -ECANCELED)
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH 2/4] arcnet: com20020: bindings for smsc com20020
From: Andrea Greco @ 2018-05-23 22:17 UTC (permalink / raw)
  To: Rob Herring
  Cc: Tobin C. Harding, Andrea Greco, Mark Rutland, netdev, devicetree,
	linux-kernel
In-Reply-To: <20180523164931.GA3635@rob-hp-laptop>

On 05/23/2018 06:49 PM, Rob Herring wrote:
> One typo, otherwise:
>
> Reviewed-by: Rob Herring <robh@kernel.org>

Yes typo, Fixed over my branch, sorry for that...
I expect a comment about bps, Bit per Second, used in `bus-speed-bps`
You will add it by your self in property-units.txt, or required my patch?
If your confirm that, ready for: Reviewed-by

Regards, Andrea

^ permalink raw reply

* Re: [PATCH v3] ath10k: transmit queued frames after processing rx packets
From: Niklas Cassel @ 2018-05-23 22:20 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath10k, linux-wireless, netdev, linux-kernel
In-Reply-To: <20180523221508.26391-1-niklas.cassel@linaro.org>

On Thu, May 24, 2018 at 12:15:08AM +0200, Niklas Cassel wrote:
> This problem cannot be reproduced on low-latency devices, e.g. pci,
> since they call ath10k_mac_tx_push_pending() from
> ath10k_htt_txrx_compl_task(). ath10k_htt_txrx_compl_task() is not called
> on high-latency devices.
> Fix the problem by calling ath10k_mac_tx_push_pending(), after
> processing rx packets, just like for low-latency devices, also in the
> SDIO case. Since we are calling ath10k_mac_tx_push_pending() directly,
> we also need to export it.
> 

Even if we are now calling ath10k_mac_tx_push_pending each time we process rx packets,
the number of packets we actually queue from ath10k_mac_tx_push_pending are quite few:

>From running iperf for 20 seconds:

# grep ath10k_mac_tx_push_txq /sys/kernel/debug/tracing/trace | grep -v ath10k_mac_op_wake_tx_queue | wc -l
233
number of times ath10k_mac_tx_push_txq was called, but not from ath10k_mac_op_wake_tx_queue,
i.e. number of times ath10k_mac_tx_push_txq was called from ath10k_mac_tx_push_pending.


# grep ath10k_mac_tx_push_txq /sys/kernel/debug/tracing/trace | grep -v ath10k_mac_tx_push_pending | wc -l
28415
number of times ath10k_mac_tx_push_txq was called, but not from ath10k_mac_tx_push_pending,
i.e number of times ath10k_mac_tx_push_txq was called from ath10k_mac_op_wake_tx_queue.


Regards,
Niklas

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox