* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: Pablo Neira Ayuso @ 2018-05-23 22:46 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <87k1rvg4qt.fsf@toke.dk>
On Tue, May 22, 2018 at 04:11:06PM +0200, Toke Høiland-Jørgensen wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> writes:
>
> > Hi Toke,
> >
> > On Tue, May 22, 2018 at 03:57:38PM +0200, Toke Høiland-Jørgensen wrote:
> >> When CAKE is deployed on a gateway that also performs NAT (which is a
> >> common deployment mode), the host fairness mechanism cannot distinguish
> >> internal hosts from each other, and so fails to work correctly.
> >>
> >> To fix this, we add an optional NAT awareness mode, which will query the
> >> kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
> >> and use that in the flow and host hashing.
> >>
> >> When the shaper is enabled and the host is already performing NAT, the cost
> >> of this lookup is negligible. However, in unlimited mode with no NAT being
> >> performed, there is a significant CPU cost at higher bandwidths. For this
> >> reason, the feature is turned off by default.
> >>
> >> Cc: netfilter-devel@vger.kernel.org
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> >> ---
> >> net/sched/sch_cake.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 79 insertions(+)
> >>
> >> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
> >> index 68ac908470f1..6f7cae705c84 100644
> >> --- a/net/sched/sch_cake.c
> >> +++ b/net/sched/sch_cake.c
> >> @@ -71,6 +71,12 @@
> >> #include <net/tcp.h>
> >> #include <net/flow_dissector.h>
> >>
> >> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> >> +#include <net/netfilter/nf_conntrack_core.h>
> >> +#include <net/netfilter/nf_conntrack_zones.h>
> >> +#include <net/netfilter/nf_conntrack.h>
> >> +#endif
> >> +
> >> #define CAKE_SET_WAYS (8)
> >> #define CAKE_MAX_TINS (8)
> >> #define CAKE_QUEUES (1024)
> >> @@ -516,6 +522,60 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
> >> return drop;
> >> }
> >>
> >> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> >> +
> >> +static void cake_update_flowkeys(struct flow_keys *keys,
> >> + const struct sk_buff *skb)
> >> +{
> >> + const struct nf_conntrack_tuple *tuple;
> >> + enum ip_conntrack_info ctinfo;
> >> + struct nf_conn *ct;
> >> + bool rev = false;
> >> +
> >> + if (tc_skb_protocol(skb) != htons(ETH_P_IP))
> >> + return;
> >> +
> >> + ct = nf_ct_get(skb, &ctinfo);
> >> + if (ct) {
> >> + tuple = nf_ct_tuple(ct, CTINFO2DIR(ctinfo));
> >> + } else {
> >> + const struct nf_conntrack_tuple_hash *hash;
> >> + struct nf_conntrack_tuple srctuple;
> >> +
> >> + if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> >> + NFPROTO_IPV4, dev_net(skb->dev),
> >> + &srctuple))
> >> + return;
> >> +
> >> + hash = nf_conntrack_find_get(dev_net(skb->dev),
> >> + &nf_ct_zone_dflt,
> >> + &srctuple);
> >> + if (!hash)
> >> + return;
> >> +
> >> + rev = true;
> >> + ct = nf_ct_tuplehash_to_ctrack(hash);
> >> + tuple = nf_ct_tuple(ct, !hash->tuple.dst.dir);
> >> + }
> >> +
> >> + keys->addrs.v4addrs.src = rev ? tuple->dst.u3.ip : tuple->src.u3.ip;
> >> + keys->addrs.v4addrs.dst = rev ? tuple->src.u3.ip : tuple->dst.u3.ip;
> >> +
> >> + if (keys->ports.ports) {
> >> + keys->ports.src = rev ? tuple->dst.u.all : tuple->src.u.all;
> >> + keys->ports.dst = rev ? tuple->src.u.all : tuple->dst.u.all;
> >> + }
> >> + if (rev)
> >> + nf_ct_put(ct);
> >> +}
> >
> > This is going to pull in the nf_conntrack module, even if you may not
> > want it, as soon as cake is in place.
>
> Yeah, we are aware of that; we get a moddep on nf_conntrack. Our main
> deployment scenario has been home routers where conntrack is used
> anyway, so this has not been much of an issue. However, if there is a
> way to avoid this, and instead detect at runtime if conntrack is
> available, that would certainly be useful. Is there? :)
Yes, there is.
You place this function in net/netfilter/nf_conntrack_core.c, call it
nf_conntrack_get_tuple() which internally uses a rcu hook for this.
See nf_ct_attach() and ip_ct_attach() in net/netfilter/core.c for
instance.
This allows you to avoid the dependency with nf_conntrack (which would
be only called if the module has been explicitly loaded), which is
what you're searching for.
^ permalink raw reply
* Re: [PATCH v2] ath10k: transmit queued frames after waking queues
From: Niklas Cassel @ 2018-05-23 22:44 UTC (permalink / raw)
To: Erik Stromdahl
Cc: Rajkumar Manoharan, Kalle Valo, ath10k, linux-wireless, netdev,
linux-kernel, linux-wireless-owner
In-Reply-To: <c131da6e-6479-3a40-fbd3-9c61d6690ba8@gmail.com>
On Wed, May 23, 2018 at 06:25:49PM +0200, Erik Stromdahl wrote:
>
>
> On 05/22/2018 11:15 PM, Niklas Cassel wrote:
>
> <snip>
> > >
> > > Earlier we observed performance issues in calling push_pending from each
> > > tx completion. IMHO this change may introduce the same problem again.
> >
> > I prefer functional TX over performance issues,
> > but I agree that it is unfortunate that SDIO doesn't use
> > ath10k_htt_txrx_compl_task().
> > Erik, is there a reason for this?
> The reason is that the SDIO code has been derived mainly from qcacld and ath6kl
> and they don't implement napi.
>
> ath10k_htt_txrx_compl_task is currently only called from the napi poll function,
> and the sdio bus driver doesn't have such a function.
Ok, thanks for the explanation. Perhaps we can change the SDIO code so that it
uses NAPI in the future.
<snip>
> > Another solution might be to change so that we only call
> > ath10k_mac_tx_push_pending() from ath10k_txrx_tx_unref()
> > if (htt->num_pending_tx == 0). That should decrease the number
> > of calls to ath10k_mac_tx_push_pending(), while still avoiding
> > a "TX deadlock" scenario for SDIO.
> Just out of curiosity, where did the limit of 3 come from?
> If it works with a limit of 0, I think it should be used instead.
It came from mt76_txq_schedule():
if (hwq->swq_queued >= 4 || list_empty(&hwq->swq))
break;
len = mt76_txq_schedule_list(dev, hwq);
Since this used a break, I simply inverted the logic,
and called ath10k_mac_tx_push_pending() rather than
mt76_txq_schedule_list().
However, I've submitted a V4 now that mimics the behavior
in ath10k_htt_txrx_compl_task() instead, so now I call
ath10k_mac_tx_push_pending() regardless of num_pending_tx.
In most cases, ath10k_mac_tx_push_pending() will not dequeue
any frames, since the ar->txqs list will be empty, so this
shouldn't be so bad after all.
>
> Another intersting thing that I stumbled upon when looking into the
> code (while writing this email) is the *wake_up(&htt->empty_tx_wq);*
>
> For some reason I have considered it not to be applicable for HL devices.
>
> The queue is waited for in the flush op (*ath10k_flush*).
> I am unsure what it is used for, but I don't think it affects the TX
> deadlock scenario.
It seems to be called by mac80211 in certain scenarios, but like you said,
it doesn't help with this problem.
Regards,
Niklas
^ permalink raw reply
* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: Toke Høiland-Jørgensen @ 2018-05-23 22:40 UTC (permalink / raw)
To: David Miller; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <20180523.172008.1067759293733489715.davem@davemloft.net>
David Miller <davem@davemloft.net> writes:
> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Wed, 23 May 2018 23:05:16 +0200
>
>> Ah, right, that could work. Is there any particular field in sk_buff
>> we should stomp on for this purpose, or would you prefer a new one?
>> Looking through it, the only obvious one that comes to mind is, well,
>> skb->_nfct :)
>>
>> If we wanted to avoid bloating sk_buff, we could add a union with that,
>> fill it in the flow dissector, and just let conntrack overwrite it if
>> active; then detect which is which in Cake, and read the data we need
>> from _nfct if conntrack is active, and from what the flow dissector
>> stored otherwise.
>>
>> Is that too many hoops to jump through to avoid adding an extra field?
>
> Space is precious in sk_buff, so yes avoid adding new members at all
> costs.
>
> How much info do you need exactly?
We use a u32 hash (from flow_hash_from_keys()) on the source address.
Ideally we'd want that; but we could get away with less if we are
willing to accept more hash collisions; we just need to map the source
address into a hash bucket. We currently have 1024 of those, so 10 bits
would suffice if we just drop the set-associative hashing for source
hosts.
Or maybe 16 bits to be on the safe side? It really is a pretty
straight-forward tradeoff between space and collision probability.
Hmm, and we still have an issue with ingress filtering (where cake is
running on an ifb interface). That runs pre-NAT in the conntrack case,
and we can't do the RX trick. Here we do the lookup manually in
conntrack (and this part is actually what brings in most of the
dependencies). Any neat tricks up your sleeve for this case? :)
-Toke
^ permalink raw reply
* Re: [kbuild-all] [PATCH net-next 2/2] sctp: add sctp_make_op_error_limited and reuse inner functions
From: Marcelo Ricardo Leitner @ 2018-05-23 22:34 UTC (permalink / raw)
To: Ye Xiaolong
Cc: kbuild test robot, Xin Long, Neil Horman, netdev, Vlad Yasevich,
linux-sctp, kbuild-all
In-Reply-To: <20180515012308.GH17407@yexl-desktop>
On Tue, May 15, 2018 at 09:23:08AM +0800, Ye Xiaolong wrote:
> On 05/14, Marcelo Ricardo Leitner wrote:
> >On Mon, May 14, 2018 at 07:47:20PM +0800, Ye Xiaolong wrote:
> >> On 05/14, Marcelo Ricardo Leitner wrote:
> >> >On Mon, May 14, 2018 at 03:40:53PM +0800, Ye Xiaolong wrote:
> >> >> >> config: x86_64-randconfig-x006-201817 (attached as .config)
> >> >> >> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> >> >> >> reproduce:
> >> >> >> # save the attached .config to linux build tree
> >> >> >> make ARCH=x86_64
> >> >> >>
> >> >> >> All errors (new ones prefixed by >>):
> >> >> >>
> >> >> >> net//sctp/sm_make_chunk.c: In function 'sctp_make_op_error_limited':
> >> >> >> >> net//sctp/sm_make_chunk.c:1260:9: error: implicit declaration of function 'sctp_mtu_payload'; did you mean 'sctp_do_peeloff'? [-Werror=implicit-function-declaration]
> >> >> >> size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr));
> >> >> >> ^~~~~~~~~~~~~~~~
> >> >> >> sctp_do_peeloff
> >> >> >> cc1: some warnings being treated as errors
> >> >> >
> >> >> >Seems the test didn't pick up the MTU refactor patchset yet.
> >> >>
> >> >> Do you mean your patchset require MTU refactor patchset as prerequisites?
> >> >
> >> >Yes.
> >>
> >> Then it is recommended to use '--base' option of git format-patch, it would record
> >> the base tree info in the first patch or cover letter, 0day bot would apply your
> >> patchset to right base according to it.
> >
> >Nice. I wasn't aware of it. Thanks.
> >
> >Considering that the MTU refactor patchset was already applied on
> >net-next when the bot did the test, why should I have to specify the
> >base?
>
> Could you share me the subjects or commits of MTU refactor patcheset, I'll double
> check what was wrong.
https://www.mail-archive.com/netdev@vger.kernel.org/msg231756.html
this one.
Thanks,
Marcelo
>
> Thanks,
> Xiaolong
> >
> > Marcelo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [PATCH iproute2] Allow to configure /var/run/netns directory
From: Stephen Hemminger @ 2018-05-23 22:30 UTC (permalink / raw)
To: Pavel Maltsev; +Cc: netdev, lorenzo
In-Reply-To: <20180518224400.201511-1-pavelm@google.com>
On Fri, 18 May 2018 15:44:00 -0700
Pavel Maltsev <pavelm@google.com> wrote:
> Currently NETNS_RUN_DIR is hardcoded and refers to /var/run/netns.
> However, some systems (e.g. Android) doesn't have /var
> which results in error attempts to create network namespaces on these
> systems. This change makes NETNS_RUN_DIR configurable at build time
> by allowing to pass environment variable to make command.
> Also, this change makes /etc/netns directory configurable through
> NETNS_ETC_DIR environment variable.
>
> For example: ./configure && NETNS_RUN_DIR=/mnt/vendor/netns make
>
> Tested: verified that iproute2 with configuration mentioned above
> creates namespaces in /mnt/vendor/netns
>
> Signed-off-by: Pavel Maltsev <pavelm@google.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH iproute2] ip route: Print expires as signed int
From: Stephen Hemminger @ 2018-05-23 22:30 UTC (permalink / raw)
To: dsahern; +Cc: netdev, David Ahern
In-Reply-To: <20180523185001.32686-1-dsahern@kernel.org>
On Wed, 23 May 2018 11:50:01 -0700
dsahern@kernel.org wrote:
> From: David Ahern <dsahern@gmail.com>
>
> rta_expires is a signed int; print it as one.
>
> Fixes: 663c3cb23103f ("iproute: implement JSON and color output")
> Signed-off-by: David Ahern <dsahern@gmail.com>
Applied, thanks.
^ permalink raw reply
* [Patch net-next] net_sched: switch to rcu_work
From: Cong Wang @ 2018-05-23 22:26 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang, Tejun Heo, Paul E. McKenney, Jamal Hadi Salim
Commit 05f0fe6b74db ("RCU, workqueue: Implement rcu_work") introduces
new API's for dispatching work in a RCU callback. Now we can just
switch to the new API's for tc filters. This could get rid of a lot
of code.
Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
include/net/pkt_cls.h | 2 +-
net/sched/cls_api.c | 5 +++--
net/sched/cls_basic.c | 24 +++++++-----------------
net/sched/cls_bpf.c | 22 ++++++----------------
net/sched/cls_cgroup.c | 23 +++++------------------
net/sched/cls_flow.c | 24 +++++++-----------------
net/sched/cls_flower.c | 40 ++++++++++------------------------------
net/sched/cls_fw.c | 24 +++++++-----------------
net/sched/cls_matchall.c | 21 +++++----------------
net/sched/cls_route.c | 23 +++++++++--------------
net/sched/cls_rsvp.h | 20 +++++---------------
net/sched/cls_tcindex.c | 41 ++++++++++-------------------------------
net/sched/cls_u32.c | 37 ++++++++++---------------------------
13 files changed, 85 insertions(+), 221 deletions(-)
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0005f0b40fe9..f3ec43725724 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -33,7 +33,7 @@ struct tcf_block_ext_info {
};
struct tcf_block_cb;
-bool tcf_queue_work(struct work_struct *work);
+bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
#ifdef CONFIG_NET_CLS
struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 963e4bf0aab8..a4a5ace834c3 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -103,9 +103,10 @@ int unregister_tcf_proto_ops(struct tcf_proto_ops *ops)
}
EXPORT_SYMBOL(unregister_tcf_proto_ops);
-bool tcf_queue_work(struct work_struct *work)
+bool tcf_queue_work(struct rcu_work *rwork, work_func_t func)
{
- return queue_work(tc_filter_wq, work);
+ INIT_RCU_WORK(rwork, func);
+ return queue_rcu_work(tc_filter_wq, rwork);
}
EXPORT_SYMBOL(tcf_queue_work);
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 6b7ab3512f5b..95367f37098d 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -35,10 +35,7 @@ struct basic_filter {
struct tcf_result res;
struct tcf_proto *tp;
struct list_head link;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -97,21 +94,14 @@ static void __basic_delete_filter(struct basic_filter *f)
static void basic_delete_filter_work(struct work_struct *work)
{
- struct basic_filter *f = container_of(work, struct basic_filter, work);
-
+ struct basic_filter *f = container_of(to_rcu_work(work),
+ struct basic_filter,
+ rwork);
rtnl_lock();
__basic_delete_filter(f);
rtnl_unlock();
}
-static void basic_delete_filter(struct rcu_head *head)
-{
- struct basic_filter *f = container_of(head, struct basic_filter, rcu);
-
- INIT_WORK(&f->work, basic_delete_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void basic_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
{
struct basic_head *head = rtnl_dereference(tp->root);
@@ -122,7 +112,7 @@ static void basic_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
tcf_unbind_filter(tp, &f->res);
idr_remove(&head->handle_idr, f->handle);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, basic_delete_filter);
+ tcf_queue_work(&f->rwork, basic_delete_filter_work);
else
__basic_delete_filter(f);
}
@@ -140,7 +130,7 @@ static int basic_delete(struct tcf_proto *tp, void *arg, bool *last,
tcf_unbind_filter(tp, &f->res);
idr_remove(&head->handle_idr, f->handle);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, basic_delete_filter);
+ tcf_queue_work(&f->rwork, basic_delete_filter_work);
*last = list_empty(&head->flist);
return 0;
}
@@ -234,7 +224,7 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
list_replace_rcu(&fold->link, &fnew->link);
tcf_unbind_filter(tp, &fold->res);
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, basic_delete_filter);
+ tcf_queue_work(&fold->rwork, basic_delete_filter_work);
} else {
list_add_rcu(&fnew->link, &head->flist);
}
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index b07c1fa8bc0d..1aa7f6511065 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -49,10 +49,7 @@ struct cls_bpf_prog {
struct sock_filter *bpf_ops;
const char *bpf_name;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
@@ -275,21 +272,14 @@ static void __cls_bpf_delete_prog(struct cls_bpf_prog *prog)
static void cls_bpf_delete_prog_work(struct work_struct *work)
{
- struct cls_bpf_prog *prog = container_of(work, struct cls_bpf_prog, work);
-
+ struct cls_bpf_prog *prog = container_of(to_rcu_work(work),
+ struct cls_bpf_prog,
+ rwork);
rtnl_lock();
__cls_bpf_delete_prog(prog);
rtnl_unlock();
}
-static void cls_bpf_delete_prog_rcu(struct rcu_head *rcu)
-{
- struct cls_bpf_prog *prog = container_of(rcu, struct cls_bpf_prog, rcu);
-
- INIT_WORK(&prog->work, cls_bpf_delete_prog_work);
- tcf_queue_work(&prog->work);
-}
-
static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog,
struct netlink_ext_ack *extack)
{
@@ -300,7 +290,7 @@ static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog,
list_del_rcu(&prog->link);
tcf_unbind_filter(tp, &prog->res);
if (tcf_exts_get_net(&prog->exts))
- call_rcu(&prog->rcu, cls_bpf_delete_prog_rcu);
+ tcf_queue_work(&prog->rwork, cls_bpf_delete_prog_work);
else
__cls_bpf_delete_prog(prog);
}
@@ -526,7 +516,7 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
list_replace_rcu(&oldprog->link, &prog->link);
tcf_unbind_filter(tp, &oldprog->res);
tcf_exts_get_net(&oldprog->exts);
- call_rcu(&oldprog->rcu, cls_bpf_delete_prog_rcu);
+ tcf_queue_work(&oldprog->rwork, cls_bpf_delete_prog_work);
} else {
list_add_rcu(&prog->link, &head->plist);
}
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 762da5c0cf5e..3bc01bdde165 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -23,10 +23,7 @@ struct cls_cgroup_head {
struct tcf_exts exts;
struct tcf_ematch_tree ematches;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static int cls_cgroup_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -70,24 +67,14 @@ static void __cls_cgroup_destroy(struct cls_cgroup_head *head)
static void cls_cgroup_destroy_work(struct work_struct *work)
{
- struct cls_cgroup_head *head = container_of(work,
+ struct cls_cgroup_head *head = container_of(to_rcu_work(work),
struct cls_cgroup_head,
- work);
+ rwork);
rtnl_lock();
__cls_cgroup_destroy(head);
rtnl_unlock();
}
-static void cls_cgroup_destroy_rcu(struct rcu_head *root)
-{
- struct cls_cgroup_head *head = container_of(root,
- struct cls_cgroup_head,
- rcu);
-
- INIT_WORK(&head->work, cls_cgroup_destroy_work);
- tcf_queue_work(&head->work);
-}
-
static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base,
u32 handle, struct nlattr **tca,
@@ -134,7 +121,7 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
rcu_assign_pointer(tp->root, new);
if (head) {
tcf_exts_get_net(&head->exts);
- call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
+ tcf_queue_work(&head->rwork, cls_cgroup_destroy_work);
}
return 0;
errout:
@@ -151,7 +138,7 @@ static void cls_cgroup_destroy(struct tcf_proto *tp,
/* Head can still be NULL due to cls_cgroup_init(). */
if (head) {
if (tcf_exts_get_net(&head->exts))
- call_rcu(&head->rcu, cls_cgroup_destroy_rcu);
+ tcf_queue_work(&head->rwork, cls_cgroup_destroy_work);
else
__cls_cgroup_destroy(head);
}
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index cd5fe383afdd..2bb043cd436b 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -57,10 +57,7 @@ struct flow_filter {
u32 divisor;
u32 baseclass;
u32 hashrnd;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static inline u32 addr_fold(void *addr)
@@ -383,21 +380,14 @@ static void __flow_destroy_filter(struct flow_filter *f)
static void flow_destroy_filter_work(struct work_struct *work)
{
- struct flow_filter *f = container_of(work, struct flow_filter, work);
-
+ struct flow_filter *f = container_of(to_rcu_work(work),
+ struct flow_filter,
+ rwork);
rtnl_lock();
__flow_destroy_filter(f);
rtnl_unlock();
}
-static void flow_destroy_filter(struct rcu_head *head)
-{
- struct flow_filter *f = container_of(head, struct flow_filter, rcu);
-
- INIT_WORK(&f->work, flow_destroy_filter_work);
- tcf_queue_work(&f->work);
-}
-
static int flow_change(struct net *net, struct sk_buff *in_skb,
struct tcf_proto *tp, unsigned long base,
u32 handle, struct nlattr **tca,
@@ -563,7 +553,7 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
if (fold) {
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, flow_destroy_filter);
+ tcf_queue_work(&fold->rwork, flow_destroy_filter_work);
}
return 0;
@@ -583,7 +573,7 @@ static int flow_delete(struct tcf_proto *tp, void *arg, bool *last,
list_del_rcu(&f->list);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, flow_destroy_filter);
+ tcf_queue_work(&f->rwork, flow_destroy_filter_work);
*last = list_empty(&head->filters);
return 0;
}
@@ -608,7 +598,7 @@ static void flow_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
list_for_each_entry_safe(f, next, &head->filters, list) {
list_del_rcu(&f->list);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, flow_destroy_filter);
+ tcf_queue_work(&f->rwork, flow_destroy_filter_work);
else
__flow_destroy_filter(f);
}
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index eacaaf803914..4e74508515f4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -73,10 +73,7 @@ struct fl_flow_mask {
struct cls_fl_head {
struct rhashtable ht;
struct list_head masks;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
struct idr handle_idr;
};
@@ -90,10 +87,7 @@ struct cls_fl_filter {
struct list_head list;
u32 handle;
u32 flags;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
struct net_device *hw_dev;
};
@@ -235,21 +229,14 @@ static void __fl_destroy_filter(struct cls_fl_filter *f)
static void fl_destroy_filter_work(struct work_struct *work)
{
- struct cls_fl_filter *f = container_of(work, struct cls_fl_filter, work);
+ struct cls_fl_filter *f = container_of(to_rcu_work(work),
+ struct cls_fl_filter, rwork);
rtnl_lock();
__fl_destroy_filter(f);
rtnl_unlock();
}
-static void fl_destroy_filter(struct rcu_head *head)
-{
- struct cls_fl_filter *f = container_of(head, struct cls_fl_filter, rcu);
-
- INIT_WORK(&f->work, fl_destroy_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
struct netlink_ext_ack *extack)
{
@@ -327,7 +314,7 @@ static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
fl_hw_destroy_filter(tp, f, extack);
tcf_unbind_filter(tp, &f->res);
if (async)
- call_rcu(&f->rcu, fl_destroy_filter);
+ tcf_queue_work(&f->rwork, fl_destroy_filter_work);
else
__fl_destroy_filter(f);
@@ -336,20 +323,13 @@ static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
static void fl_destroy_sleepable(struct work_struct *work)
{
- struct cls_fl_head *head = container_of(work, struct cls_fl_head,
- work);
+ struct cls_fl_head *head = container_of(to_rcu_work(work),
+ struct cls_fl_head,
+ rwork);
kfree(head);
module_put(THIS_MODULE);
}
-static void fl_destroy_rcu(struct rcu_head *rcu)
-{
- struct cls_fl_head *head = container_of(rcu, struct cls_fl_head, rcu);
-
- INIT_WORK(&head->work, fl_destroy_sleepable);
- schedule_work(&head->work);
-}
-
static void fl_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
{
struct cls_fl_head *head = rtnl_dereference(tp->root);
@@ -365,7 +345,7 @@ static void fl_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
idr_destroy(&head->handle_idr);
__module_get(THIS_MODULE);
- call_rcu(&head->rcu, fl_destroy_rcu);
+ tcf_queue_work(&head->rwork, fl_destroy_sleepable);
}
static void *fl_get(struct tcf_proto *tp, u32 handle)
@@ -1036,7 +1016,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
list_replace_rcu(&fold->list, &fnew->list);
tcf_unbind_filter(tp, &fold->res);
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, fl_destroy_filter);
+ tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
} else {
list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
}
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 8b207723fbc2..29eeeaf3ea44 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -47,10 +47,7 @@ struct fw_filter {
#endif /* CONFIG_NET_CLS_IND */
struct tcf_exts exts;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static u32 fw_hash(u32 handle)
@@ -134,21 +131,14 @@ static void __fw_delete_filter(struct fw_filter *f)
static void fw_delete_filter_work(struct work_struct *work)
{
- struct fw_filter *f = container_of(work, struct fw_filter, work);
-
+ struct fw_filter *f = container_of(to_rcu_work(work),
+ struct fw_filter,
+ rwork);
rtnl_lock();
__fw_delete_filter(f);
rtnl_unlock();
}
-static void fw_delete_filter(struct rcu_head *head)
-{
- struct fw_filter *f = container_of(head, struct fw_filter, rcu);
-
- INIT_WORK(&f->work, fw_delete_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void fw_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
{
struct fw_head *head = rtnl_dereference(tp->root);
@@ -164,7 +154,7 @@ static void fw_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
rtnl_dereference(f->next));
tcf_unbind_filter(tp, &f->res);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, fw_delete_filter);
+ tcf_queue_work(&f->rwork, fw_delete_filter_work);
else
__fw_delete_filter(f);
}
@@ -193,7 +183,7 @@ static int fw_delete(struct tcf_proto *tp, void *arg, bool *last,
RCU_INIT_POINTER(*fp, rtnl_dereference(f->next));
tcf_unbind_filter(tp, &f->res);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, fw_delete_filter);
+ tcf_queue_work(&f->rwork, fw_delete_filter_work);
ret = 0;
break;
}
@@ -316,7 +306,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
rcu_assign_pointer(*fp, fnew);
tcf_unbind_filter(tp, &f->res);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, fw_delete_filter);
+ tcf_queue_work(&f->rwork, fw_delete_filter_work);
*arg = fnew;
return err;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 2ba721a590a7..47b207ef7762 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -21,10 +21,7 @@ struct cls_mall_head {
struct tcf_result res;
u32 handle;
u32 flags;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static int mall_classify(struct sk_buff *skb, const struct tcf_proto *tp,
@@ -53,22 +50,14 @@ static void __mall_destroy(struct cls_mall_head *head)
static void mall_destroy_work(struct work_struct *work)
{
- struct cls_mall_head *head = container_of(work, struct cls_mall_head,
- work);
+ struct cls_mall_head *head = container_of(to_rcu_work(work),
+ struct cls_mall_head,
+ rwork);
rtnl_lock();
__mall_destroy(head);
rtnl_unlock();
}
-static void mall_destroy_rcu(struct rcu_head *rcu)
-{
- struct cls_mall_head *head = container_of(rcu, struct cls_mall_head,
- rcu);
-
- INIT_WORK(&head->work, mall_destroy_work);
- tcf_queue_work(&head->work);
-}
-
static void mall_destroy_hw_filter(struct tcf_proto *tp,
struct cls_mall_head *head,
unsigned long cookie,
@@ -126,7 +115,7 @@ static void mall_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
mall_destroy_hw_filter(tp, head, (unsigned long) head, extack);
if (tcf_exts_get_net(&head->exts))
- call_rcu(&head->rcu, mall_destroy_rcu);
+ tcf_queue_work(&head->rwork, mall_destroy_work);
else
__mall_destroy(head);
}
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 21a03a8ee029..0404aa5fa7cb 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -57,10 +57,7 @@ struct route4_filter {
u32 handle;
struct route4_bucket *bkt;
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
#define ROUTE4_FAILURE ((struct route4_filter *)(-1L))
@@ -266,19 +263,17 @@ static void __route4_delete_filter(struct route4_filter *f)
static void route4_delete_filter_work(struct work_struct *work)
{
- struct route4_filter *f = container_of(work, struct route4_filter, work);
-
+ struct route4_filter *f = container_of(to_rcu_work(work),
+ struct route4_filter,
+ rwork);
rtnl_lock();
__route4_delete_filter(f);
rtnl_unlock();
}
-static void route4_delete_filter(struct rcu_head *head)
+static void route4_queue_work(struct route4_filter *f)
{
- struct route4_filter *f = container_of(head, struct route4_filter, rcu);
-
- INIT_WORK(&f->work, route4_delete_filter_work);
- tcf_queue_work(&f->work);
+ tcf_queue_work(&f->rwork, route4_delete_filter_work);
}
static void route4_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
@@ -304,7 +299,7 @@ static void route4_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
RCU_INIT_POINTER(b->ht[h2], next);
tcf_unbind_filter(tp, &f->res);
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, route4_delete_filter);
+ route4_queue_work(f);
else
__route4_delete_filter(f);
}
@@ -349,7 +344,7 @@ static int route4_delete(struct tcf_proto *tp, void *arg, bool *last,
/* Delete it */
tcf_unbind_filter(tp, &f->res);
tcf_exts_get_net(&f->exts);
- call_rcu(&f->rcu, route4_delete_filter);
+ tcf_queue_work(&f->rwork, route4_delete_filter_work);
/* Strip RTNL protected tree */
for (i = 0; i <= 32; i++) {
@@ -554,7 +549,7 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
if (fold) {
tcf_unbind_filter(tp, &fold->res);
tcf_exts_get_net(&fold->exts);
- call_rcu(&fold->rcu, route4_delete_filter);
+ tcf_queue_work(&fold->rwork, route4_delete_filter_work);
}
return 0;
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 4f1297657c27..e9ccf7daea7d 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -97,10 +97,7 @@ struct rsvp_filter {
u32 handle;
struct rsvp_session *sess;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
static inline unsigned int hash_dst(__be32 *dst, u8 protocol, u8 tunnelid)
@@ -294,21 +291,14 @@ static void __rsvp_delete_filter(struct rsvp_filter *f)
static void rsvp_delete_filter_work(struct work_struct *work)
{
- struct rsvp_filter *f = container_of(work, struct rsvp_filter, work);
-
+ struct rsvp_filter *f = container_of(to_rcu_work(work),
+ struct rsvp_filter,
+ rwork);
rtnl_lock();
__rsvp_delete_filter(f);
rtnl_unlock();
}
-static void rsvp_delete_filter_rcu(struct rcu_head *head)
-{
- struct rsvp_filter *f = container_of(head, struct rsvp_filter, rcu);
-
- INIT_WORK(&f->work, rsvp_delete_filter_work);
- tcf_queue_work(&f->work);
-}
-
static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
{
tcf_unbind_filter(tp, &f->res);
@@ -317,7 +307,7 @@ static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
* in cleanup() callback
*/
if (tcf_exts_get_net(&f->exts))
- call_rcu(&f->rcu, rsvp_delete_filter_rcu);
+ tcf_queue_work(&f->rwork, rsvp_delete_filter_work);
else
__rsvp_delete_filter(f);
}
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index b49cc990a000..32f4bbd82f35 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -28,20 +28,14 @@
struct tcindex_filter_result {
struct tcf_exts exts;
struct tcf_result res;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
struct tcindex_filter {
u16 key;
struct tcindex_filter_result result;
struct tcindex_filter __rcu *next;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
};
@@ -152,21 +146,14 @@ static void tcindex_destroy_rexts_work(struct work_struct *work)
{
struct tcindex_filter_result *r;
- r = container_of(work, struct tcindex_filter_result, work);
+ r = container_of(to_rcu_work(work),
+ struct tcindex_filter_result,
+ rwork);
rtnl_lock();
__tcindex_destroy_rexts(r);
rtnl_unlock();
}
-static void tcindex_destroy_rexts(struct rcu_head *head)
-{
- struct tcindex_filter_result *r;
-
- r = container_of(head, struct tcindex_filter_result, rcu);
- INIT_WORK(&r->work, tcindex_destroy_rexts_work);
- tcf_queue_work(&r->work);
-}
-
static void __tcindex_destroy_fexts(struct tcindex_filter *f)
{
tcf_exts_destroy(&f->result.exts);
@@ -176,23 +163,15 @@ static void __tcindex_destroy_fexts(struct tcindex_filter *f)
static void tcindex_destroy_fexts_work(struct work_struct *work)
{
- struct tcindex_filter *f = container_of(work, struct tcindex_filter,
- work);
+ struct tcindex_filter *f = container_of(to_rcu_work(work),
+ struct tcindex_filter,
+ rwork);
rtnl_lock();
__tcindex_destroy_fexts(f);
rtnl_unlock();
}
-static void tcindex_destroy_fexts(struct rcu_head *head)
-{
- struct tcindex_filter *f = container_of(head, struct tcindex_filter,
- rcu);
-
- INIT_WORK(&f->work, tcindex_destroy_fexts_work);
- tcf_queue_work(&f->work);
-}
-
static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
struct netlink_ext_ack *extack)
{
@@ -228,12 +207,12 @@ static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
*/
if (f) {
if (tcf_exts_get_net(&f->result.exts))
- call_rcu(&f->rcu, tcindex_destroy_fexts);
+ tcf_queue_work(&f->rwork, tcindex_destroy_fexts_work);
else
__tcindex_destroy_fexts(f);
} else {
if (tcf_exts_get_net(&r->exts))
- call_rcu(&r->rcu, tcindex_destroy_rexts);
+ tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work);
else
__tcindex_destroy_rexts(r);
}
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index bac47b5d18fd..fb861f90fde6 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -68,10 +68,7 @@ struct tc_u_knode {
u32 __percpu *pcpu_success;
#endif
struct tcf_proto *tp;
- union {
- struct work_struct work;
- struct rcu_head rcu;
- };
+ struct rcu_work rwork;
/* The 'sel' field MUST be the last field in structure to allow for
* tc_u32_keys allocated at end of structure.
*/
@@ -436,21 +433,14 @@ static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n,
*/
static void u32_delete_key_work(struct work_struct *work)
{
- struct tc_u_knode *key = container_of(work, struct tc_u_knode, work);
-
+ struct tc_u_knode *key = container_of(to_rcu_work(work),
+ struct tc_u_knode,
+ rwork);
rtnl_lock();
u32_destroy_key(key->tp, key, false);
rtnl_unlock();
}
-static void u32_delete_key_rcu(struct rcu_head *rcu)
-{
- struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
-
- INIT_WORK(&key->work, u32_delete_key_work);
- tcf_queue_work(&key->work);
-}
-
/* u32_delete_key_freepf_rcu is the rcu callback variant
* that free's the entire structure including the statistics
* percpu variables. Only use this if the key is not a copy
@@ -460,21 +450,14 @@ static void u32_delete_key_rcu(struct rcu_head *rcu)
*/
static void u32_delete_key_freepf_work(struct work_struct *work)
{
- struct tc_u_knode *key = container_of(work, struct tc_u_knode, work);
-
+ struct tc_u_knode *key = container_of(to_rcu_work(work),
+ struct tc_u_knode,
+ rwork);
rtnl_lock();
u32_destroy_key(key->tp, key, true);
rtnl_unlock();
}
-static void u32_delete_key_freepf_rcu(struct rcu_head *rcu)
-{
- struct tc_u_knode *key = container_of(rcu, struct tc_u_knode, rcu);
-
- INIT_WORK(&key->work, u32_delete_key_freepf_work);
- tcf_queue_work(&key->work);
-}
-
static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
{
struct tc_u_knode __rcu **kp;
@@ -491,7 +474,7 @@ static int u32_delete_key(struct tcf_proto *tp, struct tc_u_knode *key)
tcf_unbind_filter(tp, &key->res);
idr_remove(&ht->handle_idr, key->handle);
tcf_exts_get_net(&key->exts);
- call_rcu(&key->rcu, u32_delete_key_freepf_rcu);
+ tcf_queue_work(&key->rwork, u32_delete_key_freepf_work);
return 0;
}
}
@@ -611,7 +594,7 @@ static void u32_clear_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
u32_remove_hw_knode(tp, n, extack);
idr_remove(&ht->handle_idr, n->handle);
if (tcf_exts_get_net(&n->exts))
- call_rcu(&n->rcu, u32_delete_key_freepf_rcu);
+ tcf_queue_work(&n->rwork, u32_delete_key_freepf_work);
else
u32_destroy_key(n->tp, n, true);
}
@@ -995,7 +978,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
u32_replace_knode(tp, tp_c, new);
tcf_unbind_filter(tp, &n->res);
tcf_exts_get_net(&n->exts);
- call_rcu(&n->rcu, u32_delete_key_rcu);
+ tcf_queue_work(&n->rwork, u32_delete_key_work);
return 0;
}
--
2.13.0
^ permalink raw reply related
* Re: [PATCH v3] ath10k: transmit queued frames after processing rx packets
From: Niklas Cassel @ 2018-05-23 22:20 UTC (permalink / raw)
To: Kalle Valo; +Cc: ath10k, linux-wireless, netdev, linux-kernel
In-Reply-To: <20180523221508.26391-1-niklas.cassel@linaro.org>
On Thu, May 24, 2018 at 12:15:08AM +0200, Niklas Cassel wrote:
> This problem cannot be reproduced on low-latency devices, e.g. pci,
> since they call ath10k_mac_tx_push_pending() from
> ath10k_htt_txrx_compl_task(). ath10k_htt_txrx_compl_task() is not called
> on high-latency devices.
> Fix the problem by calling ath10k_mac_tx_push_pending(), after
> processing rx packets, just like for low-latency devices, also in the
> SDIO case. Since we are calling ath10k_mac_tx_push_pending() directly,
> we also need to export it.
>
Even if we are now calling ath10k_mac_tx_push_pending each time we process rx packets,
the number of packets we actually queue from ath10k_mac_tx_push_pending are quite few:
>From running iperf for 20 seconds:
# grep ath10k_mac_tx_push_txq /sys/kernel/debug/tracing/trace | grep -v ath10k_mac_op_wake_tx_queue | wc -l
233
number of times ath10k_mac_tx_push_txq was called, but not from ath10k_mac_op_wake_tx_queue,
i.e. number of times ath10k_mac_tx_push_txq was called from ath10k_mac_tx_push_pending.
# grep ath10k_mac_tx_push_txq /sys/kernel/debug/tracing/trace | grep -v ath10k_mac_tx_push_pending | wc -l
28415
number of times ath10k_mac_tx_push_txq was called, but not from ath10k_mac_tx_push_pending,
i.e number of times ath10k_mac_tx_push_txq was called from ath10k_mac_op_wake_tx_queue.
Regards,
Niklas
^ permalink raw reply
* Re: [PATCH 2/4] arcnet: com20020: bindings for smsc com20020
From: Andrea Greco @ 2018-05-23 22:17 UTC (permalink / raw)
To: Rob Herring
Cc: Tobin C. Harding, Andrea Greco, Mark Rutland, netdev, devicetree,
linux-kernel
In-Reply-To: <20180523164931.GA3635@rob-hp-laptop>
On 05/23/2018 06:49 PM, Rob Herring wrote:
> One typo, otherwise:
>
> Reviewed-by: Rob Herring <robh@kernel.org>
Yes typo, Fixed over my branch, sorry for that...
I expect a comment about bps, Bit per Second, used in `bus-speed-bps`
You will add it by your self in property-units.txt, or required my patch?
If your confirm that, ready for: Reviewed-by
Regards, Andrea
^ permalink raw reply
* [PATCH v3] ath10k: transmit queued frames after processing rx packets
From: Niklas Cassel @ 2018-05-23 22:15 UTC (permalink / raw)
To: Kalle Valo, David S. Miller
Cc: Niklas Cassel, ath10k-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
When running iperf on ath10k SDIO, TX can stop working:
iperf -c 192.168.1.1 -i 1 -t 20 -w 10K
[ 3] 0.0- 1.0 sec 2.00 MBytes 16.8 Mbits/sec
[ 3] 1.0- 2.0 sec 3.12 MBytes 26.2 Mbits/sec
[ 3] 2.0- 3.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 3.0- 4.0 sec 655 KBytes 5.36 Mbits/sec
[ 3] 4.0- 5.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 5.0- 6.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 6.0- 7.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 7.0- 8.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 8.0- 9.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 9.0-10.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 0.0-10.3 sec 9.01 MBytes 7.32 Mbits/sec
There are frames in the ieee80211_txq and there are frames that have
been removed from from this queue, but haven't yet been sent on the wire
(num_pending_tx).
When num_pending_tx reaches max_num_pending_tx, we will stop the queues
by calling ieee80211_stop_queues().
As frames that have previously been sent for transmission
(num_pending_tx) are completed, we will decrease num_pending_tx and wake
the queues by calling ieee80211_wake_queue(). ieee80211_wake_queue()
does not call wake_tx_queue, so we might still have frames in the
queue at this point.
While the queues were stopped, the socket buffer might have filled up,
and in order for user space to write more, we need to free the frames
in the queue, since they are accounted to the socket. In order to free
them, we first need to transmit them.
This problem cannot be reproduced on low-latency devices, e.g. pci,
since they call ath10k_mac_tx_push_pending() from
ath10k_htt_txrx_compl_task(). ath10k_htt_txrx_compl_task() is not called
on high-latency devices.
Fix the problem by calling ath10k_mac_tx_push_pending(), after
processing rx packets, just like for low-latency devices, also in the
SDIO case. Since we are calling ath10k_mac_tx_push_pending() directly,
we also need to export it.
Signed-off-by: Niklas Cassel <niklas.cassel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
Changes since V2:
Moved ath10k_mac_tx_push_pending() call to ath10k_sdio_irq_handler().
drivers/net/wireless/ath/ath10k/mac.c | 1 +
drivers/net/wireless/ath/ath10k/sdio.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 487a7a7380fd..bfcd9705ed54 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4038,6 +4038,7 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar)
rcu_read_unlock();
spin_unlock_bh(&ar->txqs_lock);
}
+EXPORT_SYMBOL(ath10k_mac_tx_push_pending);
/************/
/* Scanning */
diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
index d612ce8c9cff..2856c75f9011 100644
--- a/drivers/net/wireless/ath/ath10k/sdio.c
+++ b/drivers/net/wireless/ath/ath10k/sdio.c
@@ -30,6 +30,7 @@
#include "debug.h"
#include "hif.h"
#include "htc.h"
+#include "mac.h"
#include "targaddrs.h"
#include "trace.h"
#include "sdio.h"
@@ -1342,6 +1343,8 @@ static void ath10k_sdio_irq_handler(struct sdio_func *func)
break;
} while (time_before(jiffies, timeout) && !done);
+ ath10k_mac_tx_push_pending(ar);
+
sdio_claim_host(ar_sdio->func);
if (ret && ret != -ECANCELED)
--
2.17.0
^ permalink raw reply related
* Re: [PATCH] net: phy: replace bool members in struct phy_device with bit-fields
From: Andrew Lunn @ 2018-05-23 22:11 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Florian Fainelli, netdev@vger.kernel.org
In-Reply-To: <3c59ea3d-f707-b991-1f88-8540891488b9@gmail.com>
On Wed, May 23, 2018 at 08:05:20AM +0200, Heiner Kallweit wrote:
> In struct phy_device we have a number of flags being defined as type
> bool. Similar to e.g. struct pci_dev we can save some space by using
> bit-fields.
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH net-next v2 0/2] net: phy: improve PHY suspend/resume
From: Andrew Lunn @ 2018-05-23 22:04 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Florian Fainelli, David Miller, netdev@vger.kernel.org
In-Reply-To: <8fe93623-9d05-6182-fe5f-da0bd32bae0b@gmail.com>
On Wed, May 23, 2018 at 10:15:29PM +0200, Heiner Kallweit wrote:
> I have the issue that suspending the MAC-integrated PHY gives an
> error during system suspend. The sequence is:
>
> 1. unconnected PHY/MAC are runtime-suspended already
> 2. system suspend commences
> 3. mdio_bus_phy_suspend is called
> 4. suspend callback of the network driver is called (implicitly
> MAC/PHY are runtime-resumed before)
> 5. suspend callback suspends MAC/PHY
>
> The problem occurs in step 3. phy_suspend() fails because the MDIO
> bus isn't accessible due to the chip being runtime-suspended.
I think you are fixing the wrong problem. I've had the same with the
FEC driver. I fixed it by making the MDIO operations runtime-suspend
aware:
commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3
Author: Andrew Lunn <andrew@lunn.ch>
Date: Sat Jul 25 22:38:02 2015 +0200
net: fec: Ensure clocks are enabled while using mdio bus
When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the IPG clock is not enabled, MDIO
reads/writes will simply time out.
Add support for runtime PM to control this clock. Enable/disable this
clock using runtime PM, with open()/close() and mdio read()/write()
function triggering runtime PM operations. Since PM is optional, the
IPG clock is enabled at probe and is no longer modified by
fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
guaranteed the clock is running when MDIO operations are performed.
Don't copy this patch 1:1. I introduced a few bugs which took a while
to be shaken out :-(
Andrew
^ permalink raw reply
* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Alexei Starovoitov @ 2018-05-23 22:02 UTC (permalink / raw)
To: Eugene Syromiatnikov
Cc: netdev, linux-kernel, linux-doc, Kees Cook, Kai-Heng Feng,
Daniel Borkmann, Alexei Starovoitov, Jonathan Corbet, Jiri Olsa,
Jesper Dangaard Brouer
In-Reply-To: <20180523121806.GA27675@asgard.redhat.com>
On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> Some BPF sysctl knobs affect the loading of BPF programs, and during
> system boot/init stages these sysctls are not yet configured.
> A concrete example is systemd, that has implemented loading of BPF
> programs.
>
> Thus, to allow controlling these setting at early boot, this patch set
> adds the ability to change the default setting of these sysctl knobs
> as well as option to override them via a boot-time kernel parameter
> (in order to avoid rebuilding kernel each time a need of changing these
> defaults arises).
>
> The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.
- systemd is root. today it only uses cgroup-bpf progs which require root,
so disabling unpriv during boot time makes no difference to systemd.
what is the actual reason to present time?
- say in the future systemd wants to use so_reuseport+bpf for faster
networking. With unpriv disable during boot, it will force systemd
to do such networking from root, which will lower its security barrier.
How that make sense?
- bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
Flipping it during the boot or right after or any time after
is the same thing. Why add such boot flag then?
- jit_harden can be turned on by systemd. so turning it during the boot
will make systemd progs to be constant blinded.
Constant blinding protects kernel from unprivileged JIT spraying.
Are you worried that systemd will attack the kernel with JIT spraying?
^ permalink raw reply
* Re: [PATCH 05/15] mtd: nand: pxa3xx: remove the dmaengine compat need
From: Daniel Mack @ 2018-05-23 21:54 UTC (permalink / raw)
To: Robert Jarzmik, Haojian Zhuang, Bartlomiej Zolnierkiewicz,
Tejun Heo, Vinod Koul, Mauro Carvalho Chehab, Ulf Hansson,
Ezequiel Garcia, Boris Brezillon, David Woodhouse, Brian Norris,
Marek Vasut, Richard Weinberger, Cyrille Pitchen, Nicolas Pitre,
Samuel Ortiz, Greg Kroah-Hartman, Jaroslav Kysela, Takashi Iwai,
Liam Girdwood, Mark Brown, Arnd Bergmann
Cc: devel, alsa-devel, netdev, linux-mmc, linux-kernel, linux-ide,
linux-mtd, dmaengine, Robert Jarzmik, linux-arm-kernel,
linux-media
In-Reply-To: <a09d80dc-e8fd-1e9a-1878-8875ddc83134@zonque.org>
[-- Attachment #1: Type: text/plain, Size: 1026 bytes --]
Hi Robert,
Please refer to the attached patch instead of the one I sent earlier. I
missed to also remove the platform_get_resource(IORESOURCE_DMA) call.
Thanks,
Daniel
On Friday, May 18, 2018 11:31 PM, Daniel Mack wrote:
> Hi Robert,
>
> Thanks for this series.
>
> On Monday, April 02, 2018 04:26 PM, Robert Jarzmik wrote:
>> From: Robert Jarzmik <robert.jarzmik@renault.com>
>>
>> As the pxa architecture switched towards the dmaengine slave map, the
>> old compatibility mechanism to acquire the dma requestor line number and
>> priority are not needed anymore.
>>
>> This patch simplifies the dma resource acquisition, using the more
>> generic function dma_request_slave_channel().
>>
>> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
>> ---
>> drivers/mtd/nand/pxa3xx_nand.c | 10 +---------
>
> This driver was replaced by drivers/mtd/nand/raw/marvell_nand.c
> recently, so this patch can be dropped. I attached a version for the new
> driver which you can pick instead.
>
>
> Thanks,
> Daniel
>
[-- Attachment #2: 0001-mtd-rawnand-marvell-remove-dmaengine-compat-code.patch --]
[-- Type: text/x-patch, Size: 1793 bytes --]
>From 72a306157dedb21f8c3289f0f7a288fc4542bd96 Mon Sep 17 00:00:00 2001
From: Daniel Mack <daniel@zonque.org>
Date: Sat, 12 May 2018 21:50:13 +0200
Subject: [PATCH] mtd: rawnand: marvell: remove dmaengine compat code
As the pxa architecture switched towards the dmaengine slave map, the
old compatibility mechanism to acquire the dma requestor line number and
priority are not needed anymore.
This patch simplifies the dma resource acquisition, using the more
generic function dma_request_slave_channel().
Signed-off-by: Daniel Mack <daniel@zonque.org>
---
drivers/mtd/nand/raw/marvell_nand.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)
diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index ebb1d141b900..319fea77daf1 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -2612,8 +2612,6 @@ static int marvell_nfc_init_dma(struct marvell_nfc *nfc)
dev);
struct dma_slave_config config = {};
struct resource *r;
- dma_cap_mask_t mask;
- struct pxad_param param;
int ret;
if (!IS_ENABLED(CONFIG_PXA_DMA)) {
@@ -2626,20 +2624,7 @@ static int marvell_nfc_init_dma(struct marvell_nfc *nfc)
if (ret)
return ret;
- r = platform_get_resource(pdev, IORESOURCE_DMA, 0);
- if (!r) {
- dev_err(nfc->dev, "No resource defined for data DMA\n");
- return -ENXIO;
- }
-
- param.drcmr = r->start;
- param.prio = PXAD_PRIO_LOWEST;
- dma_cap_zero(mask);
- dma_cap_set(DMA_SLAVE, mask);
- nfc->dma_chan =
- dma_request_slave_channel_compat(mask, pxad_filter_fn,
- ¶m, nfc->dev,
- "data");
+ nfc->dma_chan = dma_request_slave_channel(nfc->dev, "data");
if (!nfc->dma_chan) {
dev_err(nfc->dev,
"Unable to request data DMA channel\n");
--
2.14.3
[-- Attachment #3: Type: text/plain, Size: 169 bytes --]
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
^ permalink raw reply related
* Re: [PATCH v6 0/5] PCI: Improve PCIe link status reporting
From: Bjorn Helgaas @ 2018-05-23 21:46 UTC (permalink / raw)
To: Jeff Kirsher, Ganesh Goudar, Michael Chan, Ariel Elior,
David S. Miller
Cc: linux-pci, everest-linux-l2, intel-wired-lan, netdev,
linux-kernel, Tal Gilboa, Tariq Toukan, Jacob Keller,
Jakub Kicinski
In-Reply-To: <152537719056.62474.2571390812509425478.stgit@bhelgaas-glaptop.roam.corp.google.com>
[+to Davem]
On Thu, May 03, 2018 at 03:00:07PM -0500, Bjorn Helgaas wrote:
> This is based on Tal's recent work to unify the approach for reporting PCIe
> link speed/width and whether the device is being limited by a slower
> upstream link.
>
> The new pcie_print_link_status() interface appeared in v4.17-rc1; see
> 9e506a7b5147 ("PCI: Add pcie_print_link_status() to log link speed and
> whether it's limited").
>
> That's a good way to replace use of pcie_get_minimum_link(), which gives
> misleading results when a path contains both a fast, narrow link and a
> slow, wide link: it reports the equivalent of a slow, narrow link.
>
> This series removes the remaining uses of pcie_get_minimum_link() and then
> removes the interface itself. I'd like to merge them all through the PCI
> tree to make the removal easy.
>
> This does change the dmesg reporting of link speeds, and in the ixgbe case,
> it changes the reporting from KERN_WARN level to KERN_INFO. If that's an
> issue, let's talk about it. I'm hoping the reduce code size, improved
> functionality, and consistency across drivers is enough to make this
> worthwhile.
>
> ---
>
> Bjorn Helgaas (5):
> bnx2x: Report PCIe link properties with pcie_print_link_status()
> bnxt_en: Report PCIe link properties with pcie_print_link_status()
> cxgb4: Report PCIe link properties with pcie_print_link_status()
> ixgbe: Report PCIe link properties with pcie_print_link_status()
> PCI: Remove unused pcie_get_minimum_link()
>
>
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 23 ++-----
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 ------
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 75 ----------------------
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 47 --------------
> drivers/pci/pci.c | 43 -------------
> include/linux/pci.h | 2 -
> 6 files changed, 9 insertions(+), 200 deletions(-)
I applied all of these on pci/enumeration for v4.18. If you'd rather take
them, Dave, let me know and I'll drop them.
I solicited more acks, but only heard from Jeff.
^ permalink raw reply
* [PATCH v2] ppp: remove the PPPIOCDETACH ioctl
From: Eric Biggers @ 2018-05-23 21:37 UTC (permalink / raw)
To: linux-ppp, Paul Mackerras
Cc: netdev, linux-fsdevel, linux-kernel, Guillaume Nault,
syzkaller-bugs, Eric Biggers
In-Reply-To: <20180523035952.25768-1-ebiggers3@gmail.com>
From: Eric Biggers <ebiggers@google.com>
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea. It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference. However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances. As reported by syzbot, this can trivially
be used to cause a use-after-free.
Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.
All pppd versions released in the last 15 years just close() the file
descriptor instead.
Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave
a stub in place that prints a one-time warning and returns EINVAL.
Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
v2: leave a stub in place, rather than removing the ioctl completely.
Documentation/networking/ppp_generic.txt | 6 ------
drivers/net/ppp/ppp_generic.c | 27 +++++-------------------
include/uapi/linux/ppp-ioctl.h | 2 +-
3 files changed, 6 insertions(+), 29 deletions(-)
diff --git a/Documentation/networking/ppp_generic.txt b/Documentation/networking/ppp_generic.txt
index 091d20273dcb..61daf4b39600 100644
--- a/Documentation/networking/ppp_generic.txt
+++ b/Documentation/networking/ppp_generic.txt
@@ -300,12 +300,6 @@ unattached instance are:
The ioctl calls available on an instance of /dev/ppp attached to a
channel are:
-* PPPIOCDETACH detaches the instance from the channel. This ioctl is
- deprecated since the same effect can be achieved by closing the
- instance. In order to prevent possible races this ioctl will fail
- with an EINVAL error if more than one file descriptor refers to this
- instance (i.e. as a result of dup(), dup2() or fork()).
-
* PPPIOCCONNECT connects this channel to a PPP interface. The
argument should point to an int containing the interface unit
number. It will return an EINVAL error if the channel is already
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index dc7c7ec43202..02ad03a2fab7 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -605,30 +605,13 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
if (cmd == PPPIOCDETACH) {
/*
- * We have to be careful here... if the file descriptor
- * has been dup'd, we could have another process in the
- * middle of a poll using the same file *, so we had
- * better not free the interface data structures -
- * instead we fail the ioctl. Even in this case, we
- * shut down the interface if we are the owner of it.
- * Actually, we should get rid of PPPIOCDETACH, userland
- * (i.e. pppd) could achieve the same effect by closing
- * this fd and reopening /dev/ppp.
+ * PPPIOCDETACH is no longer supported as it was heavily broken,
+ * and is only known to have been used by pppd older than
+ * ppp-2.4.2 (released November 2003).
*/
+ pr_warn_once("%s (%d) used obsolete PPPIOCDETACH ioctl\n",
+ current->comm, current->pid);
err = -EINVAL;
- if (pf->kind == INTERFACE) {
- ppp = PF_TO_PPP(pf);
- rtnl_lock();
- if (file == ppp->owner)
- unregister_netdevice(ppp->dev);
- rtnl_unlock();
- }
- if (atomic_long_read(&file->f_count) < 2) {
- ppp_release(NULL, file);
- err = 0;
- } else
- pr_warn("PPPIOCDETACH file->f_count=%ld\n",
- atomic_long_read(&file->f_count));
goto out;
}
diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h
index b19a9c249b15..784c2e3e572e 100644
--- a/include/uapi/linux/ppp-ioctl.h
+++ b/include/uapi/linux/ppp-ioctl.h
@@ -106,7 +106,7 @@ struct pppol2tp_ioc_stats {
#define PPPIOCGIDLE _IOR('t', 63, struct ppp_idle) /* get idle time */
#define PPPIOCNEWUNIT _IOWR('t', 62, int) /* create new ppp unit */
#define PPPIOCATTACH _IOW('t', 61, int) /* attach to ppp unit */
-#define PPPIOCDETACH _IOW('t', 60, int) /* detach from ppp unit/chan */
+#define PPPIOCDETACH _IOW('t', 60, int) /* obsolete, do not use */
#define PPPIOCSMRRU _IOW('t', 59, int) /* set multilink MRU */
#define PPPIOCCONNECT _IOW('t', 58, int) /* connect channel to unit */
#define PPPIOCDISCONN _IO('t', 57) /* disconnect channel */
--
2.17.0.441.gb46fe60e1d-goog
^ permalink raw reply related
* Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
From: Jakub Kicinski @ 2018-05-23 21:32 UTC (permalink / raw)
To: Sandipan Das
Cc: Daniel Borkmann, ast, netdev, linuxppc-dev, mpe, naveen.n.rao,
Quentin Monnet
In-Reply-To: <7142939b-515e-50ac-bc0b-50444bf9cc97@linux.vnet.ibm.com>
On Wed, 23 May 2018 16:07:40 +0530, Sandipan Das wrote:
> "name": "bpf_prog_196af774a3477707_F",
> "insns": [{
> "pc": "0x0",
> "operation": "nop",
> "operands": [null
> ]
> },{
> "pc": "0x4",
> "operation": "nop",
> "operands": [null
> ]
> },{
> "pc": "0x8",
> "operation": "mflr",
> "operands": ["r0"
> ]
> },{
> ...
> }
> ]
> }
> ]
>
> If this is okay, I can send out the next revision with these changes.
Looks good, thank you!
^ permalink raw reply
* Re: [PATCH bpf-next v3 2/7] bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
From: Yonghong Song @ 2018-05-23 21:27 UTC (permalink / raw)
To: Alexei Starovoitov, Martin KaFai Lau
Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20180523210359.dz2zwqcb76hebrtn@ast-mbp>
On 5/23/18 2:04 PM, Alexei Starovoitov wrote:
> On Wed, May 23, 2018 at 10:13:22AM -0700, Martin KaFai Lau wrote:
>>> + __u32 prog_id; /* output: prod_id */
>>> + __u32 attach_info; /* output: BPF_ATTACH_* */
>>> + __u64 probe_offset; /* output: probe_offset */
>>> + __u64 probe_addr; /* output: probe_addr */
>>> + } task_fd_query;
>>> } __attribute__((aligned(8)));
>>>
>>> /* The description below is an attempt at providing documentation to eBPF
>>> @@ -2458,4 +2475,14 @@ struct bpf_fib_lookup {
>>> __u8 dmac[6]; /* ETH_ALEN */
>>> };
>>>
>>> +/* used by <task, fd> based query */
>>> +enum {
>> Nit. Instead of a comment, is it better to give this
>> enum a descriptive name?
>>
>>> + BPF_ATTACH_RAW_TRACEPOINT, /* tp name */
>>> + BPF_ATTACH_TRACEPOINT, /* tp name */
>>> + BPF_ATTACH_KPROBE, /* (symbol + offset) or addr */
>>> + BPF_ATTACH_KRETPROBE, /* (symbol + offset) or addr */
>>> + BPF_ATTACH_UPROBE, /* filename + offset */
>>> + BPF_ATTACH_URETPROBE, /* filename + offset */
>>> +};
>
> One more nit here.
> Can we come up with better names for the above?
> 'attach' is a verb. I cannot help but read above as it's an action
> for the kernel to attach to something and not the type of event
> where the program was attached to.
> Since we pass task+fd into that BPF_TASK_FD_QUERY command how
> about returning BPF_FD_TYPE_KPROBE, BPF_FD_TYPE_TRACEPOINT, ... ?
Okay will use BPF_FD_TYPE_*... which is indeed better than
BPF_ATTACH_*.
^ permalink raw reply
* Re: [PATCH bpf-next v3 2/7] bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
From: Yonghong Song @ 2018-05-23 21:25 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20180523171320.ziswsvpsyelxb6fz@kafai-mbp>
On 5/23/18 10:13 AM, Martin KaFai Lau wrote:
> On Tue, May 22, 2018 at 09:30:46AM -0700, Yonghong Song wrote:
>> Currently, suppose a userspace application has loaded a bpf program
>> and attached it to a tracepoint/kprobe/uprobe, and a bpf
>> introspection tool, e.g., bpftool, wants to show which bpf program
>> is attached to which tracepoint/kprobe/uprobe. Such attachment
>> information will be really useful to understand the overall bpf
>> deployment in the system.
>>
>> There is a name field (16 bytes) for each program, which could
>> be used to encode the attachment point. There are some drawbacks
>> for this approaches. First, bpftool user (e.g., an admin) may not
>> really understand the association between the name and the
>> attachment point. Second, if one program is attached to multiple
>> places, encoding a proper name which can imply all these
>> attachments becomes difficult.
>>
>> This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
>> Given a pid and fd, if the <pid, fd> is associated with a
>> tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
>> . prog_id
>> . tracepoint name, or
>> . k[ret]probe funcname + offset or kernel addr, or
>> . u[ret]probe filename + offset
>> to the userspace.
>> The user can use "bpftool prog" to find more information about
>> bpf program itself with prog_id.
> LGTM, some comments inline.
>
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> include/linux/trace_events.h | 16 ++++++
>> include/uapi/linux/bpf.h | 27 ++++++++++
>> kernel/bpf/syscall.c | 124 +++++++++++++++++++++++++++++++++++++++++++
>> kernel/trace/bpf_trace.c | 48 +++++++++++++++++
>> kernel/trace/trace_kprobe.c | 29 ++++++++++
>> kernel/trace/trace_uprobe.c | 22 ++++++++
>> 6 files changed, 266 insertions(+)
>>
>> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
>> index 2bde3ef..eab806d 100644
>> --- a/include/linux/trace_events.h
>> +++ b/include/linux/trace_events.h
>> @@ -473,6 +473,9 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>> int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
>> int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
>> struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name);
>> +int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
>> + u32 *attach_info, const char **buf,
>> + u64 *probe_offset, u64 *probe_addr);
> The first arg is 'const struct perf_event *event' while...
>
>> #else
>> static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
>> {
>> @@ -504,6 +507,12 @@ static inline struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name
>> {
>> return NULL;
>> }
>> +static inline int bpf_get_perf_event_info(const struct file *file, u32 *prog_id,
> this one has 'const struct file *file'?
Thanks for catching this. Will correct this in the next revision.
>
>> + u32 *attach_info, const char **buf,
>> + u64 *probe_offset, u64 *probe_addr)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> #endif
>>
>> enum {
>> @@ -560,10 +569,17 @@ extern void perf_trace_del(struct perf_event *event, int flags);
>> #ifdef CONFIG_KPROBE_EVENTS
>> extern int perf_kprobe_init(struct perf_event *event, bool is_retprobe);
>> extern void perf_kprobe_destroy(struct perf_event *event);
>> +extern int bpf_get_kprobe_info(const struct perf_event *event,
>> + u32 *attach_info, const char **symbol,
>> + u64 *probe_offset, u64 *probe_addr,
>> + bool perf_type_tracepoint);
>> #endif
>> #ifdef CONFIG_UPROBE_EVENTS
>> extern int perf_uprobe_init(struct perf_event *event, bool is_retprobe);
>> extern void perf_uprobe_destroy(struct perf_event *event);
>> +extern int bpf_get_uprobe_info(const struct perf_event *event,
>> + u32 *attach_info, const char **filename,
>> + u64 *probe_offset, bool perf_type_tracepoint);
>> #endif
>> extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
>> char *filter_str);
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 97446bb..a602150 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -97,6 +97,7 @@ enum bpf_cmd {
>> BPF_RAW_TRACEPOINT_OPEN,
>> BPF_BTF_LOAD,
>> BPF_BTF_GET_FD_BY_ID,
>> + BPF_TASK_FD_QUERY,
>> };
>>
>> enum bpf_map_type {
>> @@ -379,6 +380,22 @@ union bpf_attr {
>> __u32 btf_log_size;
>> __u32 btf_log_level;
>> };
>> +
>> + struct {
>> + int pid; /* input: pid */
>> + int fd; /* input: fd */
> Should fd and pid be always positive?
> The current fd (like map_fd) in bpf_attr is using __u32.
Will change both pid and fd to __u32. In kernel fd is unsigned, but
pid_t is actually an int. The negative pid is often referred to
the process group the pid is in. Since here, we are only concerned
with actual process pid, make __u32 should be okay.
>
>> + __u32 flags; /* input: flags */
>> + __u32 buf_len; /* input: buf len */
>> + __aligned_u64 buf; /* input/output:
>> + * tp_name for tracepoint
>> + * symbol for kprobe
>> + * filename for uprobe
>> + */
>> + __u32 prog_id; /* output: prod_id */
>> + __u32 attach_info; /* output: BPF_ATTACH_* */
>> + __u64 probe_offset; /* output: probe_offset */
>> + __u64 probe_addr; /* output: probe_addr */
>> + } task_fd_query;
>> } __attribute__((aligned(8)));
>>
>> /* The description below is an attempt at providing documentation to eBPF
>> @@ -2458,4 +2475,14 @@ struct bpf_fib_lookup {
>> __u8 dmac[6]; /* ETH_ALEN */
>> };
>>
>> +/* used by <task, fd> based query */
>> +enum {
> Nit. Instead of a comment, is it better to give this
> enum a descriptive name?
Yes, will add an enum name bpf_task_fd_info to make it easy for
correlation with task_fd_query.
>
>> + BPF_ATTACH_RAW_TRACEPOINT, /* tp name */
>> + BPF_ATTACH_TRACEPOINT, /* tp name */
>> + BPF_ATTACH_KPROBE, /* (symbol + offset) or addr */
>> + BPF_ATTACH_KRETPROBE, /* (symbol + offset) or addr */
>> + BPF_ATTACH_UPROBE, /* filename + offset */
>> + BPF_ATTACH_URETPROBE, /* filename + offset */
>> +};
>> +
>> #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index bfcde94..9356f0e 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -18,7 +18,9 @@
>> #include <linux/vmalloc.h>
>> #include <linux/mmzone.h>
>> #include <linux/anon_inodes.h>
>> +#include <linux/fdtable.h>
>> #include <linux/file.h>
>> +#include <linux/fs.h>
>> #include <linux/license.h>
>> #include <linux/filter.h>
>> #include <linux/version.h>
>> @@ -2102,6 +2104,125 @@ static int bpf_btf_get_fd_by_id(const union bpf_attr *attr)
>> return btf_get_fd_by_id(attr->btf_id);
>> }
>>
>> +static int bpf_task_fd_query_copy(const union bpf_attr *attr,
>> + union bpf_attr __user *uattr,
>> + u32 prog_id, u32 attach_info,
>> + const char *buf, u64 probe_offset,
>> + u64 probe_addr)
>> +{
>> + __u64 __user *ubuf;
> Nit. ubuf is a string instead of an array of __u64?
will change it to void *.
>
>> + int len;
>> +
>> + ubuf = u64_to_user_ptr(attr->task_fd_query.buf);
>> + if (buf) {
>> + len = strlen(buf);
>> + if (attr->task_fd_query.buf_len < len + 1)
> I think the current convention is to take the min,
> copy whatever it can to buf and return the real
> len/size in buf_len. F.e., the prog_ids and prog_cnt in
> __cgroup_bpf_query().
>
> Should the same be done here or it does not make sense to
> truncate the string? The user may/may not need the tailing
> char though if its pretty print has limited width anyway.
> The user still needs to know what the buf_len should be to
> retry also but I guess any reasonable buf_len should
> work?
Make sense, will make buf_len input/output and copy
the actually needed length to buf_len and back to user.
>
>> + return -ENOSPC;
>> + if (copy_to_user(ubuf, buf, len + 1))
>> + return -EFAULT;
>> + } else if (attr->task_fd_query.buf_len) {
>> + /* copy '\0' to ubuf */
>> + __u8 zero = 0;
>> +
>> + if (copy_to_user(ubuf, &zero, 1))
>> + return -EFAULT;
>> + }
>> +
>> + if (copy_to_user(&uattr->task_fd_query.prog_id, &prog_id,
>> + sizeof(prog_id)) ||
>> + copy_to_user(&uattr->task_fd_query.attach_info, &attach_info,
>> + sizeof(attach_info)) ||
>> + copy_to_user(&uattr->task_fd_query.probe_offset, &probe_offset,
>> + sizeof(probe_offset)) ||
>> + copy_to_user(&uattr->task_fd_query.probe_addr, &probe_addr,
>> + sizeof(probe_addr)))
> Nit. put_user() may be able to shorten them.
Indeed, thanks for suggestion.
>
>> + return -EFAULT;
>> +
>> + return 0;
>> +}
>> +
>> +#define BPF_TASK_FD_QUERY_LAST_FIELD task_fd_query.probe_addr
>> +
>> +static int bpf_task_fd_query(const union bpf_attr *attr,
>> + union bpf_attr __user *uattr)
>> +{
>> + pid_t pid = attr->task_fd_query.pid;
>> + int fd = attr->task_fd_query.fd;
>> + const struct perf_event *event;
>> + struct files_struct *files;
>> + struct task_struct *task;
>> + struct file *file;
>> + int err;
>> +
>> + if (CHECK_ATTR(BPF_TASK_FD_QUERY))
>> + return -EINVAL;
>> +
>> + if (!capable(CAP_SYS_ADMIN))
>> + return -EPERM;
>> +
>> + if (attr->task_fd_query.flags != 0)
> How flags is used?
flags is not used yet, but for future extension.
For example, it could be used to query a specific
type of files (raw_tracepoint vs. perf-event based, etc.).
>
>> + return -EINVAL;
>> +
>> + task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
>> + if (!task)
>> + return -ENOENT;
>> +
>> + files = get_files_struct(task);
>> + put_task_struct(task);
>> + if (!files)
>> + return -ENOENT;
>> +
>> + err = 0;
>> + spin_lock(&files->file_lock);
>> + file = fcheck_files(files, fd);
>> + if (!file)
>> + err = -EBADF;
>> + else
>> + get_file(file);
>> + spin_unlock(&files->file_lock);
>> + put_files_struct(files);
>> +
>> + if (err)
>> + goto out;
>> +
>> + if (file->f_op == &bpf_raw_tp_fops) {
>> + struct bpf_raw_tracepoint *raw_tp = file->private_data;
>> + struct bpf_raw_event_map *btp = raw_tp->btp;
>> +
>> + if (!raw_tp->prog)
>> + err = -ENOENT;
>> + else
>> + err = bpf_task_fd_query_copy(attr, uattr,
>> + raw_tp->prog->aux->id,
>> + BPF_ATTACH_RAW_TRACEPOINT,
>> + btp->tp->name, 0, 0);
>> + goto put_file;
>> + }
>> +
>> + event = perf_get_event(file);
>> + if (!IS_ERR(event)) {
>> + u64 probe_offset, probe_addr;
>> + u32 prog_id, attach_info;
>> + const char *buf;
>> +
>> + err = bpf_get_perf_event_info(event, &prog_id, &attach_info,
>> + &buf, &probe_offset,
>> + &probe_addr);
>> + if (!err)
>> + err = bpf_task_fd_query_copy(attr, uattr, prog_id,
>> + attach_info, buf,
>> + probe_offset,
>> + probe_addr);
>> + goto put_file;
>> + }
>> +
>> + err = -ENOTSUPP;
>> +put_file:
>> + fput(file);
>> +out:
>> + return err;
>> +}
>> +
>> SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>> {
>> union bpf_attr attr = {};
>> @@ -2188,6 +2309,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>> case BPF_BTF_GET_FD_BY_ID:
>> err = bpf_btf_get_fd_by_id(&attr);
>> break;
>> + case BPF_TASK_FD_QUERY:
>> + err = bpf_task_fd_query(&attr, uattr);
>> + break;
>> default:
>> err = -EINVAL;
>> break;
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce2cbbf..323c80e 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -14,6 +14,7 @@
>> #include <linux/uaccess.h>
>> #include <linux/ctype.h>
>> #include <linux/kprobes.h>
>> +#include <linux/syscalls.h>
>> #include <linux/error-injection.h>
>>
>> #include "trace_probe.h"
>> @@ -1163,3 +1164,50 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
>> mutex_unlock(&bpf_event_mutex);
>> return err;
>> }
>> +
>> +int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
>> + u32 *attach_info, const char **buf,
>> + u64 *probe_offset, u64 *probe_addr)
>> +{
>> + bool is_tracepoint, is_syscall_tp;
>> + struct bpf_prog *prog;
>> + int flags, err = 0;
>> +
>> + prog = event->prog;
>> + if (!prog)
>> + return -ENOENT;
>> +
>> + /* not supporting BPF_PROG_TYPE_PERF_EVENT yet */
>> + if (prog->type == BPF_PROG_TYPE_PERF_EVENT)
>> + return -EOPNOTSUPP;
>> +
>> + *prog_id = prog->aux->id;
>> + flags = event->tp_event->flags;
>> + is_tracepoint = flags & TRACE_EVENT_FL_TRACEPOINT;
>> + is_syscall_tp = is_syscall_trace_event(event->tp_event);
>> +
>> + if (is_tracepoint || is_syscall_tp) {
>> + *buf = is_tracepoint ? event->tp_event->tp->name
>> + : event->tp_event->name;
>> + *attach_info = BPF_ATTACH_TRACEPOINT;
>> + *probe_offset = 0x0;
>> + *probe_addr = 0x0;
>> + } else {
>> + /* kprobe/uprobe */
>> + err = -EOPNOTSUPP;
>> +#ifdef CONFIG_KPROBE_EVENTS
>> + if (flags & TRACE_EVENT_FL_KPROBE)
>> + err = bpf_get_kprobe_info(event, attach_info, buf,
>> + probe_offset, probe_addr,
>> + event->attr.type == PERF_TYPE_TRACEPOINT);
>> +#endif
>> +#ifdef CONFIG_UPROBE_EVENTS
>> + if (flags & TRACE_EVENT_FL_UPROBE)
>> + err = bpf_get_uprobe_info(event, attach_info, buf,
>> + probe_offset,
>> + event->attr.type == PERF_TYPE_TRACEPOINT);
>> +#endif
>> + }
>> +
>> + return err;
>> +}
>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
>> index 02aed76..32e9190 100644
>> --- a/kernel/trace/trace_kprobe.c
>> +++ b/kernel/trace/trace_kprobe.c
>> @@ -1287,6 +1287,35 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
>> head, NULL);
>> }
>> NOKPROBE_SYMBOL(kretprobe_perf_func);
>> +
>> +int bpf_get_kprobe_info(const struct perf_event *event, u32 *attach_info,
>> + const char **symbol, u64 *probe_offset,
>> + u64 *probe_addr, bool perf_type_tracepoint)
>> +{
>> + const char *pevent = trace_event_name(event->tp_event);
>> + const char *group = event->tp_event->class->system;
>> + struct trace_kprobe *tk;
>> +
>> + if (perf_type_tracepoint)
>> + tk = find_trace_kprobe(pevent, group);
>> + else
>> + tk = event->tp_event->data;
>> + if (!tk)
>> + return -EINVAL;
>> +
>> + *attach_info = trace_kprobe_is_return(tk) ? BPF_ATTACH_KRETPROBE
>> + : BPF_ATTACH_KPROBE;
>> + if (tk->symbol) {
>> + *symbol = tk->symbol;
>> + *probe_offset = tk->rp.kp.offset;
>> + *probe_addr = 0;
>> + } else {
>> + *symbol = NULL;
>> + *probe_offset = 0;
>> + *probe_addr = (unsigned long)tk->rp.kp.addr;
>> + }
>> + return 0;
>> +}
>> #endif /* CONFIG_PERF_EVENTS */
>>
>> /*
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index ac89287..12a3667 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -1161,6 +1161,28 @@ static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
>> {
>> __uprobe_perf_func(tu, func, regs, ucb, dsize);
>> }
>> +
>> +int bpf_get_uprobe_info(const struct perf_event *event, u32 *attach_info,
>> + const char **filename, u64 *probe_offset,
>> + bool perf_type_tracepoint)
>> +{
>> + const char *pevent = trace_event_name(event->tp_event);
>> + const char *group = event->tp_event->class->system;
>> + struct trace_uprobe *tu;
>> +
>> + if (perf_type_tracepoint)
>> + tu = find_probe_event(pevent, group);
>> + else
>> + tu = event->tp_event->data;
>> + if (!tu)
>> + return -EINVAL;
>> +
>> + *attach_info = is_ret_probe(tu) ? BPF_ATTACH_URETPROBE
>> + : BPF_ATTACH_UPROBE;
>> + *filename = tu->filename;
>> + *probe_offset = tu->offset;
>> + return 0;
>> +}
>> #endif /* CONFIG_PERF_EVENTS */
>>
>> static int
>> --
>> 2.9.5
>>
^ permalink raw reply
* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: David Miller @ 2018-05-23 21:20 UTC (permalink / raw)
To: toke; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <87bmd6xeur.fsf@toke.dk>
From: Toke Høiland-Jørgensen <toke@toke.dk>
Date: Wed, 23 May 2018 23:05:16 +0200
> Ah, right, that could work. Is there any particular field in sk_buff
> we should stomp on for this purpose, or would you prefer a new one?
> Looking through it, the only obvious one that comes to mind is, well,
> skb->_nfct :)
>
> If we wanted to avoid bloating sk_buff, we could add a union with that,
> fill it in the flow dissector, and just let conntrack overwrite it if
> active; then detect which is which in Cake, and read the data we need
> from _nfct if conntrack is active, and from what the flow dissector
> stored otherwise.
>
> Is that too many hoops to jump through to avoid adding an extra field?
Space is precious in sk_buff, so yes avoid adding new members at all
costs.
How much info do you need exactly?
^ permalink raw reply
* Re: [PATCH] ppp: remove the PPPIOCDETACH ioctl
From: Eric Biggers @ 2018-05-23 21:17 UTC (permalink / raw)
To: David Miller
Cc: g.nault, linux-ppp, paulus, netdev, linux-fsdevel, linux-kernel,
syzkaller-bugs, ebiggers
In-Reply-To: <20180523.115636.2241611659399097483.davem@davemloft.net>
On Wed, May 23, 2018 at 11:56:36AM -0400, David Miller wrote:
> From: Guillaume Nault <g.nault@alphalink.fr>
> Date: Wed, 23 May 2018 15:57:08 +0200
>
> > I'd rather add
> > + if (cmd == PPPIOCDETACH) {
> > + err = -EINVAL;
> > + goto out;
> > + }
> >
> > Making PPPIOCDETACH unknown to ppp_generic means that the ioctl would
> > be handled by the underlying channel when pf->kind == CHANNEL (see the
> > chan->ops->ioctl() call further down). That shouldn't be a problem per
> > se, but even though PPPIOCDETACH is unsupported, I feel that it should
> > remain a ppp_generic thing. I don't really want its value to be reused
> > for other purposes in the future or have different behaviour depending
> > on the underlying channel.
> >
> > Also PPPIOCDETACH can already fail with -EINVAL. Therefore, if ever
> > there really were programs out there using this call, they'd already
> > have to handle this case. Unconditionally returning -EINVAL would
> > further minimise possibilities for breakage.
>
> I agree.
Okay, I'll do that and leave the ioctl number reserved.
I will add a pr_warn_once() too.
Thanks,
- Eric
^ permalink raw reply
* Re: [PATCH net-next] net:sched: add action inheritdsfield to skbmod
From: Marcelo Ricardo Leitner @ 2018-05-23 21:06 UTC (permalink / raw)
To: Fu, Qiaobin
Cc: davem@davemloft.net, netdev@vger.kernel.org, jhs@mojatatu.com,
Michel Machado
In-Reply-To: <2F042100-2BAC-48E5-887C-5D426B1D5A5B@bu.edu>
Hi,
Some style fixes:
On Thu, May 17, 2018 at 07:33:08PM +0000, Fu, Qiaobin wrote:
> net/sched: add action inheritdsfield to skbmod
This extra line above should not be here.
>
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->prioriry. This enables
typo -----^
> later classification of packets based on the DS field.
>
> Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
>
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
>
> Note that the motivation for this patch is found in the following discussion:
> https://www.spinics.net/lists/netdev/msg501061.html
> ---
>
> diff --git a/include/uapi/linux/tc_act/tc_skbmod.h b/include/uapi/linux/tc_act/tc_skbmod.h
> index 38c072f..0718b48 100644
> --- a/include/uapi/linux/tc_act/tc_skbmod.h
> +++ b/include/uapi/linux/tc_act/tc_skbmod.h
> @@ -19,6 +19,7 @@
> #define SKBMOD_F_SMAC 0x2
> #define SKBMOD_F_ETYPE 0x4
> #define SKBMOD_F_SWAPMAC 0x8
> +#define SKBMOD_F_INHERITDSFIELD 0x10
>
> struct tc_skbmod {
> tc_gen;
> diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
> index ad050d7..21d5bec 100644
> --- a/net/sched/act_skbmod.c
> +++ b/net/sched/act_skbmod.c
> @@ -16,6 +16,9 @@
> #include <linux/rtnetlink.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> +#include <net/ip.h>
> +#include <net/ipv6.h>
> +#include <net/dsfield.h>
>
> #include <linux/tc_act/tc_skbmod.h>
> #include <net/tc_act/tc_skbmod.h>
> @@ -72,6 +75,25 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
> ether_addr_copy(eth_hdr(skb)->h_source, (u8 *)tmpaddr);
> }
>
> + if (flags & SKBMOD_F_INHERITDSFIELD) {
> + int wlen = skb_network_offset(skb);
You need a blank line here, between var declaration and the rest.
> + switch (tc_skb_protocol(skb)) {
> + case htons(ETH_P_IP):
> + wlen += sizeof(struct iphdr);
> + if (!pskb_may_pull(skb, wlen))
> + return TC_ACT_SHOT;
> + skb->priority = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
> + break;
> +
> + case htons(ETH_P_IPV6):
> + wlen += sizeof(struct ipv6hdr);
> + if (!pskb_may_pull(skb, wlen))
> + return TC_ACT_SHOT;
> + skb->priority = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
> + break;
> + }
> + }
> +
> return action;
> }
>
> @@ -127,6 +149,9 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
> if (parm->flags & SKBMOD_F_SWAPMAC)
> lflags = SKBMOD_F_SWAPMAC;
>
> + if (parm->flags & SKBMOD_F_INHERITDSFIELD)
> + lflags |= SKBMOD_F_INHERITDSFIELD;
> +
> exists = tcf_idr_check(tn, parm->index, a, bind);
> if (exists && bind)
> return 0;
^ permalink raw reply
* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: Toke Høiland-Jørgensen @ 2018-05-23 21:05 UTC (permalink / raw)
To: David Miller; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <20180523.164152.387997944739062215.davem@davemloft.net>
David Miller <davem@davemloft.net> writes:
> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Wed, 23 May 2018 22:38:30 +0200
>
>> How would this work?
>
> On egress the core networking flow dissector records what you need
> somewhere in SKB or wherever. You later retrieve it at egress time
> after NAT has occurred.
Ah, right, that could work. Is there any particular field in sk_buff
we should stomp on for this purpose, or would you prefer a new one?
Looking through it, the only obvious one that comes to mind is, well,
skb->_nfct :)
If we wanted to avoid bloating sk_buff, we could add a union with that,
fill it in the flow dissector, and just let conntrack overwrite it if
active; then detect which is which in Cake, and read the data we need
from _nfct if conntrack is active, and from what the flow dissector
stored otherwise.
Is that too many hoops to jump through to avoid adding an extra field?
-Toke
^ permalink raw reply
* Re: [PATCH bpf-next v3 2/7] bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
From: Alexei Starovoitov @ 2018-05-23 21:04 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: Yonghong Song, peterz, ast, daniel, netdev, kernel-team
In-Reply-To: <20180523171320.ziswsvpsyelxb6fz@kafai-mbp>
On Wed, May 23, 2018 at 10:13:22AM -0700, Martin KaFai Lau wrote:
> > + __u32 prog_id; /* output: prod_id */
> > + __u32 attach_info; /* output: BPF_ATTACH_* */
> > + __u64 probe_offset; /* output: probe_offset */
> > + __u64 probe_addr; /* output: probe_addr */
> > + } task_fd_query;
> > } __attribute__((aligned(8)));
> >
> > /* The description below is an attempt at providing documentation to eBPF
> > @@ -2458,4 +2475,14 @@ struct bpf_fib_lookup {
> > __u8 dmac[6]; /* ETH_ALEN */
> > };
> >
> > +/* used by <task, fd> based query */
> > +enum {
> Nit. Instead of a comment, is it better to give this
> enum a descriptive name?
>
> > + BPF_ATTACH_RAW_TRACEPOINT, /* tp name */
> > + BPF_ATTACH_TRACEPOINT, /* tp name */
> > + BPF_ATTACH_KPROBE, /* (symbol + offset) or addr */
> > + BPF_ATTACH_KRETPROBE, /* (symbol + offset) or addr */
> > + BPF_ATTACH_UPROBE, /* filename + offset */
> > + BPF_ATTACH_URETPROBE, /* filename + offset */
> > +};
One more nit here.
Can we come up with better names for the above?
'attach' is a verb. I cannot help but read above as it's an action
for the kernel to attach to something and not the type of event
where the program was attached to.
Since we pass task+fd into that BPF_TASK_FD_QUERY command how
about returning BPF_FD_TYPE_KPROBE, BPF_FD_TYPE_TRACEPOINT, ... ?
^ permalink raw reply
* Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
From: David Miller @ 2018-05-23 20:41 UTC (permalink / raw)
To: toke; +Cc: netdev, cake, netfilter-devel
In-Reply-To: <87in7exg3d.fsf@toke.dk>
From: Toke Høiland-Jørgensen <toke@toke.dk>
Date: Wed, 23 May 2018 22:38:30 +0200
> How would this work?
On egress the core networking flow dissector records what you need
somewhere in SKB or wherever. You later retrieve it at egress time
after NAT has occurred.
> It's about making sure the per-host fairness works when NATing, so
> we can distribute bandwidth between the hosts on the local LAN
> regardless of how many flows they open.
Ok, understood.
> But it's not unreasonable to expect people who do NAT in eBPF to
> also set skb->tc_classid if they want pre-nat host fairness, is it?
And core networking can do it as well.
Please remove this conntrack dependency, I don't think it is necessary
and it is very short sighted.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox