From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: netdev@vger.kernel.org, Daniel Borkmann <borkmann@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
brouer@redhat.com
Subject: Re: [PATCH bpf-next] bpf: add skb->queue_mapping write access from tc clsact
Date: Tue, 19 Feb 2019 15:52:59 +0100
Message-ID: <20190219155259.677d195c@carbon>
In-Reply-To: <11d2572e-3ff5-2fc0-8f05-c50dd0fb1d6d@iogearbox.net>
On Tue, 19 Feb 2019 12:46:57 +0100
Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 02/19/2019 11:24 AM, Jesper Dangaard Brouer wrote:
> > The skb->queue_mapping already has read access, via __sk_buff->queue_mapping.
> >
> > This patch allows BPF tc qdisc clsact write access to the queue_mapping via
> > tc_cls_act_is_valid_access.
> >
> > It is already possible to change this via the TC filter action skbedit,
> > see tc-skbedit(8). Due to the lack of TC examples, let's show one:
> >
> > # tc qdisc add dev ixgbe1 handle ffff: ingress
> > # tc filter add dev ixgbe1 parent ffff: matchall action skbedit queue_mapping 5
> > # tc filter list dev ixgbe1 parent ffff:
>
> Using handles is from the old days; if we add examples, then let's do
> something more user friendly ;)
>
> # tc qdisc add dev ixgbe1 clsact
> # tc filter replace dev ixgbe1 ingress matchall action skbedit queue_mapping 5
> # tc filter list dev ixgbe1 ingress
>
> > The most common mistake is to forget that XPS (Transmit Packet Steering)
> > takes precedence over setting skb->queue_mapping. XPS is configured per
> > DEVICE via /sys/class/net/DEVICE/queues/tx-*/xps_cpus, using a CPU hex
> > mask. To disable XPS, set the mask to 00.
> >
> > The purpose of changing skb->queue_mapping is to influence the selection of
> > the net_device "txq" (struct netdev_queue), which influences selection of
> > the qdisc "root_lock" (via txq->qdisc->q.lock) and txq->_xmit_lock. When
> > using the MQ qdisc the txq->qdisc points to different qdiscs and associated
> > locks, and HARD_TX_LOCK (txq->_xmit_lock), allowing for CPU scalability.
> >
> > Due to the lack of TC examples, let's show how to attach clsact BPF programs:
> >
> > # tc qdisc add dev ixgbe2 clsact
> > # tc filter replace dev ixgbe2 egress bpf da obj XXX_kern.o sec tc_qmap2cpu
> > # tc filter list dev ixgbe2 egress
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> > net/core/filter.c | 14 +++++++++++---
> > 1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 353735575204..d05ae8d05397 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -6238,6 +6238,7 @@ static bool tc_cls_act_is_valid_access(int off, int size,
> > case bpf_ctx_range(struct __sk_buff, tc_classid):
> > case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
> > case bpf_ctx_range(struct __sk_buff, tstamp):
> > + case bpf_ctx_range(struct __sk_buff, queue_mapping):
> > break;
> > default:
> > return false;
> > @@ -6642,9 +6643,16 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
> > break;
> >
> > case offsetof(struct __sk_buff, queue_mapping):
> > - *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> > - bpf_target_off(struct sk_buff, queue_mapping, 2,
> > - target_size));
> > + if (type == BPF_WRITE)
> > + *insn++ = BPF_STX_MEM(BPF_H, si->dst_reg, si->src_reg,
> > + bpf_target_off(struct sk_buff,
> > + queue_mapping,
> > + 2, target_size));
> > + else
> > + *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
> > + bpf_target_off(struct sk_buff,
> > + queue_mapping,
> > + 2, target_size));
>
> One thing we should avoid is allowing the user to write NO_QUEUE_MAPPING
> into skb->queue_mapping, so we don't hit the warning in sk_tx_queue_set().
> I'd add this check into the ctx rewrite here.
Makes sense. I would really appreciate it if you could help me write
the needed BPF instructions, as I'm not an expert here.
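
To make the requested semantics concrete, here is a rough userspace model
of the guard suggested above. It is only a sketch and not the ctx rewrite
itself: struct fake_skb and guarded_queue_mapping_store() are hypothetical
names made up for illustration, and skipping the store (rather than, say,
clamping the value) is an assumption about the desired behaviour. In the
real rewrite this would presumably become a conditional jump emitted in
front of the BPF_STX_MEM(BPF_H, ...) store.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Same value the kernel uses in include/linux/netdevice.h. */
#define NO_QUEUE_MAPPING USHRT_MAX

/* Hypothetical stand-in for the single sk_buff field of interest. */
struct fake_skb {
	uint16_t queue_mapping;
};

/*
 * Model of a guarded 16-bit store. The write is skipped whenever the
 * value is >= NO_QUEUE_MAPPING, so the reserved value can never end up
 * in queue_mapping. Assumption: an out-of-range write is dropped and
 * the previous value is kept.
 */
static void guarded_queue_mapping_store(struct fake_skb *skb, uint32_t val)
{
	assert(skb != NULL);

	if (val >= NO_QUEUE_MAPPING)
		return;	/* leave the old value untouched */

	skb->queue_mapping = (uint16_t)val;
}
```

With this model, storing 5 updates the field, while storing
NO_QUEUE_MAPPING (or anything larger) leaves the previous value in place.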
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer