* Re: [PATCH bpf-next 1/2] bpf: switch most helper return values from 32-bit int to 64-bit long
From: John Fastabend @ 2020-06-18 23:30 UTC (permalink / raw)
To: Andrii Nakryiko, John Fastabend
Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
Daniel Borkmann, Kernel Team
In-Reply-To: <CAEf4BzZmWO=hO0kmtwkACEfHZm+H7+FZ+5moaLie2=13U3xU=g@mail.gmail.com>
Andrii Nakryiko wrote:
> On Thu, Jun 18, 2020 at 11:58 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Andrii Nakryiko wrote:
> > > On Wed, Jun 17, 2020 at 11:49 PM John Fastabend
> > > <john.fastabend@gmail.com> wrote:
> > > >
> > > > Andrii Nakryiko wrote:
> > > > > Switch most of BPF helper definitions from returning int to long. These
> > > > > definitions are coming from comments in BPF UAPI header and are used to
> > > > > generate bpf_helper_defs.h (under libbpf) to be later included and used from
> > > > > BPF programs.
> > > > >
> > > > > In actual in-kernel implementation, all the helpers are defined as returning
> > > > > u64, but due to some historical reasons, most of them are actually defined as
> > > > > returning int in UAPI (usually, to return 0 on success, and negative value on
> > > > > error).
> > > >
> > > > Could we change the helpers side to return correct types now? Meaning if the
> > > > UAPI claims its an int lets actually return the int.
> > >
> > > I'm not sure how exactly you see this being done. BPF ABI dictates
> > > that the helper's result is passed in a full 64-bit r0 register. Are
> > > you suggesting that in addition to RET_ANYTHING we should add
> > > RET_ANYTHING32 and teach verifier that higher 32 bits of r0 are
> > > guaranteed to be zero? And then make helpers actually return 32-bit
> > > values without up-casting them to u64?
> >
> > Yes this is what I was thinking, having a RET_ANYTHING32 but I would
> > assume the upper 32-bits could be anything not zeroed. For +alu32
> > and programmer using correct types I would expect clang to generate
> > good code here and mostly not need to zero upper bits.
> >
> > I think this discussion can be independent of your changes though and
> > its not at the top of my todo list so probably wont get to investigating
> > more for awhile.
>
> I'm confused. If the verifier doesn't make any assumptions about upper
> 32-bits for RET_ANYTHING32, how is it different from RET_ANYTHING and
> today's logic? What you described is exactly what is happening when
> bpf_helpers_def.h has BPF helpers defined as returning int.
>
Agreed. I recall it helping the 32-bit bounds on the verifier side
somewhere. But lets drop it maybe it really is not useful. I'll go
try and recall the details later.
[...] Aggressively pruning
> >
> > Agreed. Sorry for the confusion on my side. Poked at this a bit more this
> > morning trying to see why I don't hit the same pattern when we have many
> > cases very similar to above.
> >
> > In your C code you never check for negative return codes? Oops, this
> > means you can walk backwards off the front of payload? This is probably
> > not valid either from program logic side and/or verifier will probably
> > yell. Commented where I think you want a <0 check here,
>
> You are missing that I'm using unsigned u64. So (s64)-1 ==
> (u64)0xFFFFFFFFFFFFFFFF. So negative errors are effectively turned
> into too large length and I filter them out with the same (len >
> MAX_SZ) check. This allows to do just one comparison instead of two,
> and also helps avoid some Clang optimizations that Yonghong is trying
> to undo right now (if (a > X && a < Y) turned into if (x < Y - X),
> with assembly that verifier can't verify). So no bug there, very
> deliberate choice of types.
I caught it just after I sent above ;) In our codebase we do need to
handle errors and truncated strings differently so really do need the
two conditions. I guess we could find some clever way around it but
in practice on latest kernels we've not seen much trouble around
these with +alu32.
Interesting about the optimization I've not seen that one yet.
[...]
> See above. In practice (it might be no-ALU32-only thing, don't know),
> doing two ifs is both less efficient and quite often leads to
> unverifiable code. Users have to do hacks to complicate control flow
> enough to prevent Clang from doing Hi/Lo combining. I learned a new
> inlined assembly trick recently to prevent this, but either way it's
> unpleasant and unnecessary.
In the end we also run on ancient kernels so have lots of tricks.
[...] more pruning
> > > My point was that this int -> long switch doesn't degrade ALU32 and
> > > helps no-ALU32, and thus is good :)
> >
> > With the long vs int I do see worse code when using the <0 check.
> > Using C function below which I took from some real code and renamed
> > variables.
> >
> > int myfunc(void *a, void *b, void *c) {
> > void *payload = a;
> > int len;
> >
> > len = probe_read_str(payload, 1000, a);
> > if (len < 0) return len;
> > if (len <= 1000) {
> > payload += len;
> > }
> > len = probe_read_str(payload, 1000, b);
> > if (len <= 1000) {
> > payload += len;
> > }
> > return 1;
> > }
> >
> > Then here is the side-by-side of generated code, with +ALU32.
> >
> > int BPF_FUNC(probe_read, ... long BPF_FUNC(probe_read, ...
> > ------------------------------- ---------------------------------
> > 0: r6 = r2 0: r6 = r2
> > 1: r7 = r1 1: r7 = r1
> > 2: w2 = 1000 2: w2 = 1000
> > 3: r3 = r7 3: r3 = r7
> > 4: call 45 4: call 45
> > 5: if w0 s< 0 goto +9 <LBB0_4> 5: r2 = r0
> > 6: w2 = w0 6: if w0 s< 0 goto +10 <LBB0_4>
> > 7: r1 = r7 7: r2 <<= 32
> > 8: r1 += r2 8: r2 s>>= 32
> > 9: if w0 s< 1001 goto +1 <LBB0_3> 9: r1 = r7
> > 10: r1 = r7 10: r1 += r2
> > 11: w2 = 1000 11: if w0 s< 1001 goto +1 <LBB0_3>
> > 12: r3 = r6 12: r1 = r7
> > 13: call 45 13: w2 = 1000
> > 14: w0 = 1 14: r3 = r6
> > 15: exit 15: call 45
> > 16: w0 = 1
> > 17: exit
> >
> > So a couple extra instruction, but more concerning we created a
> > <<=,s>> pattern. I'll need to do some more tests but my concern
> > is that could break verifier for real programs we have. I guess
> > it didn't in the selftests? Surely, this thread has at least
> > pointed out some gaps in our test cases. I guess I _could_ make
> > len a u64 type to remove the sext but then <0 check on a u64?!
>
> I addressed <0 check above. As for <<=,s>>=, I wish Clang was a bit
> smarter and just did w2 = w2 or something like that, given we just
> checked that w0 is non-negative. But then again, I wouldn't do two ifs
> and wouldn't use signed int for len.
It is smart enough once you get all the types aligned. So after pulling
in int->long change ideally we would change codebase to use long types as
well. Maybe we should modify the tests in selftests as well OTOH
its nice to test what happens when folks leave the return types as int.
>
> >
> > >
> > > Overall, long as a return type matches reality and BPF ABI
> > > specification. BTW, one of the varlen programs from patch 2 doesn't
> > > even validate successfully on latest kernel with latest Clang right
> > > now, if helpers return int, even though it's completely correct code.
> > > That's a real problem we have to deal with in few major BPF
> > > applications right now, and we have to use inline assembly to enforce
> > > Clang to do the right thing. A bunch of those problems are simply
> > > avoided with correct return types for helpers.
> >
> > Do the real programs check <0? Did I miss something? I'll try
> > applying your patch to our real code base and see what happens.
>
> That would be great. Self-tests do work, but having more testing with
> real-world application would certainly help as well.
Thanks for all the follow up.
I ran the change through some CI on my side and it passed so I can
complain about a few shifts here and there or just update my code or
just not change the return types on my side but I'm convinced its OK
in most cases and helps in some so...
Acked-by: John Fastabend <john.fastabend@gmail.com>
^ permalink raw reply
* Re: af_decnet.c: missing semi-colon and/or indentation?
From: Eric Dumazet @ 2020-06-18 23:35 UTC (permalink / raw)
To: Randy Dunlap, netdev@vger.kernel.org, David Miller
In-Reply-To: <4649af05-ac31-4c57-a895-39866504b5fb@infradead.org>
On 6/18/20 4:19 PM, Randy Dunlap wrote:
>
> Please see lines 1250-1251.
>
>
> case TIOCINQ:
> lock_sock(sk);
> skb = skb_peek(&scp->other_receive_queue);
> if (skb) {
> amount = skb->len;
> } else {
> skb_queue_walk(&sk->sk_receive_queue, skb) <<<<<
> amount += skb->len; <<<<<
> }
> release_sock(sk);
> err = put_user(amount, (int __user *)arg);
> break;
>
>
>
> or is this some kind of GCC nested function magic?
>
I do not see a problem
for (bla; bla; bla)
amount += skb->len;
Seems good to me.
>
> commit bec571ec762a4cf855ad4446f833086fc154b60e
> Author: David S. Miller <davem@davemloft.net>
> Date: Thu May 28 16:43:52 2009 -0700
>
> decnet: Use SKB queue and list helpers instead of doing it by-hand.
>
>
>
> thanks.
>
Also decnet should not be any of our concerns in 2020 ?
^ permalink raw reply
* Re: [PATCH 2/5] net: hns3: pointer type of buffer should be void
From: Jakub Kicinski @ 2020-06-18 23:35 UTC (permalink / raw)
To: Barry Song
Cc: davem, yisen.zhuang, salil.mehta, netdev, linyunsheng,
linux-kernel, linuxarm
In-Reply-To: <20200618010211.75840-3-song.bao.hua@hisilicon.com>
On Thu, 18 Jun 2020 13:02:08 +1200 Barry Song wrote:
> Move the type of buffer address from unsigned char to void
>
> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> ---
> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 2 +-
> drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> index 1817d7f2e5f6..61b5a849b162 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> @@ -3070,7 +3070,7 @@ static int hns3_handle_rx_bd(struct hns3_enet_ring *ring)
> return -ENXIO;
>
> if (!skb)
> - ring->va = (unsigned char *)desc_cb->buf + desc_cb->page_offset;
> + ring->va = desc_cb->buf + desc_cb->page_offset;
>
> /* Prefetch first cache line of first page
> * Idea is to cache few bytes of the header of the packet. Our L1 Cache
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
> index 66cd4395f781..9f64077ee834 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
> @@ -407,7 +407,7 @@ struct hns3_enet_ring {
>
> u32 pull_len; /* head length for current packet */
> u32 frag_num;
> - unsigned char *va; /* first buffer address for current packet */
> + void *va; /* first buffer address for current packet */
>
> u32 flag; /* ring attribute */
>
I think void pointer arithmetic is questionable in the eyes of the C
standard. But I'm not sure about kernel C.
Otherwise series looks good to me.
^ permalink raw reply
* Re: [PATCH net-next v8 2/5] net: phy: Add a helper to return the index for of the internal delay
From: Dan Murphy @ 2020-06-18 23:42 UTC (permalink / raw)
To: kernel test robot, andrew, f.fainelli, hkallweit1, davem, robh
Cc: kbuild-all, netdev, linux-kernel, devicetree
In-Reply-To: <202006190753.uZWR7VTH%lkp@intel.com>
Hello
On 6/18/20 6:29 PM, kernel test robot wrote:
> Hi Dan,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url: https://github.com/0day-ci/linux/commits/Dan-Murphy/RGMII-Internal-delay-common-property/20200619-051238
> base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git cb8e59cc87201af93dfbb6c3dccc8fcad72a09c2
> config: i386-defconfig (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
> reproduce (this is a W=1 build):
> # save the attached .config to linux build tree
> make W=1 ARCH=i386
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> drivers/net/phy/phy_device.c: In function 'phy_get_int_delay_property':
>>> drivers/net/phy/phy_device.c:2678:1: error: expected ';' before '}' token
> 2678 | }
> | ^
>
> vim +2678 drivers/net/phy/phy_device.c
>
> 2660
> 2661 #if IS_ENABLED(CONFIG_OF_MDIO)
> 2662 static int phy_get_int_delay_property(struct device *dev, const char *name)
> 2663 {
> 2664 s32 int_delay;
> 2665 int ret;
> 2666
> 2667 ret = device_property_read_u32(dev, name, &int_delay);
> 2668 if (ret)
> 2669 return ret;
> 2670
> 2671 return int_delay;
> 2672 }
> 2673 #else
> 2674 static inline int phy_get_int_delay_property(struct device *dev,
> 2675 const char *name)
> 2676 {
> 2677 return -EINVAL
Ack missed compiling this use case.
Dan
>> 2678 }
> 2679 #endif
> 2680
>
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* Re: [PATCH net-next 1/3] net/sched: Introduce action hash
From: Daniel Borkmann @ 2020-06-18 23:42 UTC (permalink / raw)
To: Ariel Levkovich, netdev
Cc: jiri, kuba, jhs, xiyou.wangcong, ast, Jiri Pirko, bpf
In-Reply-To: <20200618221548.3805-2-lariel@mellanox.com>
On 6/19/20 12:15 AM, Ariel Levkovich wrote:
> Allow setting a hash value to a packet for a future match.
>
> The action will determine the packet's hash result according to
> the selected hash type.
>
> The first option is to select a basic asymmetric l4 hash calculation
> on the packet headers which will either use the skb->hash value as
> if such was already calculated and set by the device driver, or it
> will perform the kernel jenkins hash function on the packet which will
> generate the result otherwise.
>
> The other option is for user to provide an BPF program which is
> dedicated to calculate the hash. In such case the program is loaded
> and used by tc to perform the hash calculation and provide it to
> the hash action to be stored in skb->hash field.
The commit description is a bit misleading, since setting a custom skb->hash
from BPF program can be done already via {cls,act}_bpf from *within* BPF in
tc for ~3 years by now via [0].
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ded092cd73c2c56a394b936f86897f29b2e131c0
> The BPF option can be useful for future HW offload support of the hash
> calculation by emulating the HW hash function when it's different than
> the kernel's but yet we want to maintain consistency between the SW and
> the HW.
... use case should say 'dummy SW module for HW offload'. ;-/
> Usage is as follows:
>
> $ tc filter add dev ens1f0_0 ingress \
> prio 1 chain 0 proto ip \
> flower ip_proto tcp \
> action hash bpf object-file <bpf file> \
> action goto chain 2
[...]
>
> Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
> Reviewed-by: Jiri Pirko <jiri@mellanox.com>
[...]
> +static int tcf_hash_act(struct sk_buff *skb, const struct tc_action *a,
> + struct tcf_result *res)
> +{
> + struct tcf_hash *h = to_hash(a);
> + struct tcf_hash_params *p;
> + int action;
> + u32 hash;
> +
> + tcf_lastuse_update(&h->tcf_tm);
> + tcf_action_update_bstats(&h->common, skb);
> +
> + p = rcu_dereference_bh(h->hash_p);
> +
> + action = READ_ONCE(h->tcf_action);
> +
> + switch (p->alg) {
> + case TCA_HASH_ALG_L4:
> + hash = skb_get_hash(skb);
> + /* If a hash basis was provided, add it into
> + * hash calculation here and re-set skb->hash
> + * to the new result with sw_hash indication
> + * and keeping the l4 hash indication.
> + */
> + hash = jhash_1word(hash, p->basis);
> + __skb_set_sw_hash(skb, hash, skb->l4_hash);
> + break;
> + case TCA_HASH_ALG_BPF:
> + __skb_push(skb, skb->mac_len);
> + bpf_compute_data_pointers(skb);
> + hash = BPF_PROG_RUN(p->prog, skb);
> + __skb_pull(skb, skb->mac_len);
> + /* The BPF program hash function type is
> + * unknown so only the sw hash bit is set.
> + */
> + __skb_set_sw_hash(skb, hash, false);
Given this runs BPF_PROG_TYPE_SCHED_ACT-typed programs, act_hash.c now becomes
pretty much another clone of {cls,act}_bpf.c with the full feature set in terms
of what it can do with the program, like header mangling, encap/decap, setting
tunnel keys, cloning + redirecting skbs, etc. This is rather misleading given you
basically would only want the direct pkt access bits and plain insns for calc'ing
the hash, but none of the other features.
> + break;
> + }
> +
> + return action;
> +}
[...]
> +static int tcf_hash_bpf_init(struct nlattr **tb, struct tcf_hash_params *params)
> +{
> + struct bpf_prog *fp;
> + char *name = NULL;
> + u32 bpf_fd;
> +
> + bpf_fd = nla_get_u32(tb[TCA_HASH_BPF_FD]);
> +
> + fp = bpf_prog_get_type(bpf_fd, BPF_PROG_TYPE_SCHED_ACT);
> + if (IS_ERR(fp))
> + return PTR_ERR(fp);
> +
> + if (tb[TCA_HASH_BPF_NAME]) {
> + name = nla_memdup(tb[TCA_HASH_BPF_NAME], GFP_KERNEL);
> + if (!name) {
> + bpf_prog_put(fp);
> + return -ENOMEM;
> + }
> + }
> +
> + params->bpf_name = name;
> + params->prog = fp;
> +
> + return 0;
> +}
^ permalink raw reply
* Re: [net-next PATCH] net: Avoid overwriting valid skb->napi_id
From: Nambiar, Amritha @ 2020-06-18 23:44 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, David Miller, Alexander Duyck, Eliezer Tamir,
Samudrala, Sridhar
In-Reply-To: <CANn89i+3CZE1V5AQt0MA_ptsjfHEqUL+LV2VwiD41_3dyXq2pQ@mail.gmail.com>
On 6/18/2020 3:29 PM, Eric Dumazet wrote:
> On Thu, Jun 18, 2020 at 2:21 PM Amritha Nambiar
> <amritha.nambiar@intel.com> wrote:
>>
>> This will be useful to allow busy poll for tunneled traffic. In case of
>> busy poll for sessions over tunnels, the underlying physical device's
>> queues need to be polled.
>>
>> Tunnels schedule NAPI either via netif_rx() for backlog queue or
>> schedule the gro_cell_poll(). netif_rx() propagates the valid skb->napi_id
>> to the socket. OTOH, gro_cell_poll() stamps the skb->napi_id again by
>> calling skb_mark_napi_id() with the tunnel NAPI which is not a busy poll
>> candidate.
>
>
> Yes the tunnel NAPI id should be 0 really (look for NAPI_STATE_NO_BUSY_POLL)
>
>> This was preventing tunneled traffic to use busy poll. A valid
>> NAPI ID in the skb indicates it was already marked for busy poll by a
>> NAPI driver and hence needs to be copied into the socket.
>>
>> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
>> ---
>> include/net/busy_poll.h | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
>> index 86e028388bad..b001fa91c14e 100644
>> --- a/include/net/busy_poll.h
>> +++ b/include/net/busy_poll.h
>> @@ -114,7 +114,11 @@ static inline void skb_mark_napi_id(struct sk_buff *skb,
>> struct napi_struct *napi)
>> {
>> #ifdef CONFIG_NET_RX_BUSY_POLL
>> - skb->napi_id = napi->napi_id;
>> + /* If the skb was already marked with a valid NAPI ID, avoid overwriting
>> + * it.
>> + */
>> + if (skb->napi_id < MIN_NAPI_ID)
>> + skb->napi_id = napi->napi_id;
>
>
> It should be faster to not depend on MIN_NAPI_ID (aka NR_CPUS + 1)
>
> if (napi->napi_id)
> skb->napi_id = napi->napi_id;
>
> Probably not a big deal.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
>
Thanks for the review. Should I send a v2 with this change?
>
>
>
>
>
>>
>> #endif
>> }
>>
>>
^ permalink raw reply
* Re: [PATCH bpf 2/2] selftests/bpf: add variable-length data concatenation pattern test
From: John Fastabend @ 2020-06-18 23:48 UTC (permalink / raw)
To: Andrii Nakryiko, John Fastabend
Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
Daniel Borkmann, Kernel Team, Christoph Hellwig
In-Reply-To: <CAEf4BzYNFddhDxLAkOC+q_ZWAet42aHybDiJT9odrzF8n5BBig@mail.gmail.com>
Andrii Nakryiko wrote:
> On Thu, Jun 18, 2020 at 12:09 PM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Andrii Nakryiko wrote:
> > > Add selftest that validates variable-length data reading and concatentation
> > > with one big shared data array. This is a common pattern in production use for
> > > monitoring and tracing applications, that potentially can read a lot of data,
> > > but usually reads much less. Such pattern allows to determine precisely what
> > > amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.
> > >
> > > This is the first BPF selftest that at all looks at and tests
> > > bpf_probe_read_str()-like helper's return value, closing a major gap in BPF
> > > testing. It surfaced the problem with bpf_probe_read_kernel_str() returning
> > > 0 on success, instead of amount of bytes successfully read.
> > >
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > ---
> >
> > [...]
> >
> > > +/* .data */
> > > +int payload2_len1 = -1;
> > > +int payload2_len2 = -1;
> > > +int total2 = -1;
> > > +char payload2[MAX_LEN + MAX_LEN] = { 1 };
> > > +
> > > +SEC("raw_tp/sys_enter")
> > > +int handler64(void *regs)
> > > +{
> > > + int pid = bpf_get_current_pid_tgid() >> 32;
> > > + void *payload = payload1;
> > > + u64 len;
> > > +
> > > + /* ignore irrelevant invocations */
> > > + if (test_pid != pid || !capture)
> > > + return 0;
> > > +
> > > + len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in1[0]);
> > > + if (len <= MAX_LEN) {
> >
> > Took me a bit grok this. You are relying on the fact that in errors,
> > such as a page fault, will encode to a large u64 value and so you
> > verifier is happy. But most of my programs actually want to distinguish
> > between legitimate errors on the probe vs buffer overrun cases.
>
> What buffer overrun? bpf_probe_read_str() family cannot return higher
> value than MAX_LEN. If you want to detect truncated strings, then you
> can attempt reading MAX_LEN + 1 and then check that the return result
> is MAX_LEN exactly. But still, that would be something like:
> u64 len;
>
> len = bpf_probe_read_str(payload, MAX_LEN + 1, &buf);
> if (len > MAX_LEN)
> return -1;
> if (len == MAX_LEN) {
> /* truncated */
> } else {
> /* full string */
> }
+1
>
> >
> > Can we make these tests do explicit check for errors. For example,
> >
> > if (len < 0) goto abort;
> >
> > But this also breaks your types here. This is what I was trying to
> > point out in the 1/2 patch thread. Wanted to make the point here as
> > well in case it wasn't clear. Not sure I did the best job explaining.
> >
>
> I can write *a correct* C code in a lot of ways such that it will not
> pass verifier verification, not sure what that will prove, though.
>
> Have you tried using the pattern with two ifs with no-ALU32? Does it work?
Ran our CI on both mcpu=v2 and mcpu=v3 and the pattern with multiple
ifs exists in those tests. They both passed so everything seems OK.
In the real progs though things are a bit more complicated I didn't
check the exact generate code. Some how I missed the case below.
I put a compiler barrier in a few spots so I think this is blocking
the optimization below causing no-alu32 failures. I'll remove the
barriers after I wrap a few things reviews.. my own bug fixes ;) and
see if I can trigger the case below.
>
> Also you are cheating in your example (in patch #1 thread). You are
> exiting on the first error and do not attempt to read any more data
> after that. In practice, you want to get as much info as possible,
> even if some of string reads fail (e.g., because argv might not be
> paged in, but env is, or vice versa). So you'll end up doing this:
Sure.
>
> len = bpf_probe_read_str(...);
> if (len >= 0 && len <= MAX_LEN) {
> payload += len;
> }
> ...
>
> ... and of course it spectacularly fails in no-ALU32.
>
> To be completely fair, this is a result of Clang optimization and
> Yonghong is trying to deal with it as we speak. Switching int to long
> for helpers doesn't help it either. But there are better code patterns
> (unsigned len + single if check) that do work with both ALU32 and
> no-ALU32.
Great.
>
> And I just double-checked, this pattern keeps working for ALU32 with
> both int and long types, so maybe there are unnecessary bit shifts,
> but at least code is still verifiable.
>
> So my point stands. int -> long helps in some cases and doesn't hurt
> in others, so I argue that it's a good thing to do :)
Convinced me as well. I Acked the other patch thanks.
^ permalink raw reply
* Re: [net-next 13/15] iecm: Add ethtool
From: Jakub Kicinski @ 2020-06-18 23:50 UTC (permalink / raw)
To: Jeff Kirsher
Cc: davem, Alice Michael, netdev, nhorman, sassmann, Alan Brady,
Phani Burra, Joshua Hay, Madhu Chittim, Pavan Kumar Linga,
Donald Skidmore, Jesse Brandeburg, Sridhar Samudrala
In-Reply-To: <20200618051344.516587-14-jeffrey.t.kirsher@intel.com>
On Wed, 17 Jun 2020 22:13:42 -0700 Jeff Kirsher wrote:
> +static const struct ethtool_ops iecm_ethtool_ops = {
> + .get_drvinfo = iecm_get_drvinfo,
> + .get_msglevel = iecm_get_msglevel,
> + .set_msglevel = iecm_set_msglevel,
> + .get_coalesce = iecm_get_coalesce,
> + .set_coalesce = iecm_set_coalesce,
> + .get_per_queue_coalesce = iecm_get_per_q_coalesce,
> + .set_per_queue_coalesce = iecm_set_per_q_coalesce,
> + .get_ethtool_stats = iecm_get_ethtool_stats,
> + .get_strings = iecm_get_strings,
> + .get_sset_count = iecm_get_sset_count,
> + .get_rxnfc = iecm_get_rxnfc,
> + .get_rxfh_key_size = iecm_get_rxfh_key_size,
> + .get_rxfh_indir_size = iecm_get_rxfh_indir_size,
> + .get_rxfh = iecm_get_rxfh,
> + .set_rxfh = iecm_set_rxfh,
> + .get_channels = iecm_get_channels,
> + .set_channels = iecm_set_channels,
> + .get_ringparam = iecm_get_ringparam,
> + .set_ringparam = iecm_set_ringparam,
> + .get_link_ksettings = iecm_get_link_ksettings,
> +};
Oh wow. So you're upstreaming this driver based on at least a 3 month
old tree? This:
commit 9000edb71ab29d184aa33f5a77fa6e52d8812bb9
Author: Jakub Kicinski <kuba@kernel.org>
Date: Mon Mar 16 13:47:12 2020 -0700
+int ethtool_check_ops(const struct ethtool_ops *ops)
+{
+ if (WARN_ON(ops->set_coalesce && !ops->supported_coalesce_params))
+ return -EINVAL;
would have otherwise triggered.
^ permalink raw reply
* Re: [net-next PATCH] net: Avoid overwriting valid skb->napi_id
From: Eric Dumazet @ 2020-06-18 23:53 UTC (permalink / raw)
To: Nambiar, Amritha
Cc: netdev, David Miller, Alexander Duyck, Eliezer Tamir,
Samudrala, Sridhar
In-Reply-To: <b943b5d1-aa71-4eb2-ee77-d3c56cdf494a@intel.com>
On Thu, Jun 18, 2020 at 4:44 PM Nambiar, Amritha
<amritha.nambiar@intel.com> wrote:
>
> Thanks for the review. Should I send a v2 with this change?
No, this is really a matter of taste ;)
^ permalink raw reply
* Re: [PATCH] bpf: Allow small structs to be type of function argument
From: Andrii Nakryiko @ 2020-06-18 23:59 UTC (permalink / raw)
To: John Fastabend
Cc: Jiri Olsa, Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Networking, bpf, Yonghong Song, Martin KaFai Lau,
Jakub Kicinski, David Miller, Jesper Dangaard Brouer, KP Singh,
Masanori Misono
In-Reply-To: <5eebe552dddc1_6d292ad5e7a285b83f@john-XPS-13-9370.notmuch>
On Thu, Jun 18, 2020 at 3:50 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Jiri Olsa wrote:
> > On Wed, Jun 17, 2020 at 04:20:54PM -0700, John Fastabend wrote:
> > > Jiri Olsa wrote:
> > > > This way we can have trampoline on function
> > > > that has arguments with types like:
> > > >
> > > > kuid_t uid
> > > > kgid_t gid
> > > >
> > > > which unwind into small structs like:
> > > >
> > > > typedef struct {
> > > > uid_t val;
> > > > } kuid_t;
> > > >
> > > > typedef struct {
> > > > gid_t val;
> > > > } kgid_t;
> > > >
> > > > And we can use them in bpftrace like:
> > > > (assuming d_path changes are in)
> > > >
> > > > # bpftrace -e 'lsm:path_chown { printf("uid %d, gid %d\n", args->uid, args->gid) }'
> > > > Attaching 1 probe...
> > > > uid 0, gid 0
> > > > uid 1000, gid 1000
> > > > ...
> > > >
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > > kernel/bpf/btf.c | 12 +++++++++++-
> > > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index 58c9af1d4808..f8fee5833684 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -362,6 +362,14 @@ static bool btf_type_is_struct(const struct btf_type *t)
> > > > return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
> > > > }
> > > >
> > > > +/* type is struct and its size is within 8 bytes
> > > > + * and it can be value of function argument
> > > > + */
> > > > +static bool btf_type_is_struct_arg(const struct btf_type *t)
> > > > +{
> > > > + return btf_type_is_struct(t) && (t->size <= sizeof(u64));
> > >
> > > Can you comment on why sizeof(u64) here? The int types can be larger
> > > than 64 for example and don't have a similar check, maybe the should
> > > as well?
> > >
> > > Here is an example from some made up program I ran through clang and
> > > bpftool.
> > >
> > > [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED
> > >
> > > We also have btf_type_int_is_regular to decide if the int is of some
> > > "regular" size but I don't see it used in these paths.
> >
> > so this small structs are passed as scalars via function arguments,
> > so the size limit is to fit teir value into register size which holds
> > the argument
> >
> > I'm not sure how 128bit numbers are passed to function as argument,
> > but I think we can treat them separately if there's a need
> >
>
> Moving Andrii up to the TO field ;)
I've got an upgrade, thanks :)
>
> Andrii, do we also need a guard on the int type with sizeof(u64)?
> Otherwise the arg calculation might be incorrect? wdyt did I follow
> along correctly.
Yes, we probably do. I actually never used __int128 in practice, but
decided to look at what Clang does for a function accepting __int128.
Turns out it passed it in two consecutive registers. So:
__weak int bla(__int128 x) { return (int)(x + 1); }
The assembly is:
38: b7 01 00 00 fe ff ff ff r1 = -2
39: b7 02 00 00 ff ff ff ff r2 = -1
40: 85 10 00 00 ff ff ff ff call -1
41: bc 01 00 00 00 00 00 00 w1 = w0
So low 64-bits go into r1, high 64-bits into r2.
Which means the 1:1 mapping between registers and input arguments
breaks with __int128, at least for target BPF. I'm too lazy to check
for x86-64, though.
^ permalink raw reply
* Re: [PATCH bpf 2/2] selftests/bpf: add variable-length data concatenation pattern test
From: Andrii Nakryiko @ 2020-06-19 0:09 UTC (permalink / raw)
To: John Fastabend
Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
Daniel Borkmann, Kernel Team, Christoph Hellwig
In-Reply-To: <5eebfd54ec54f_27ce2adb0816a5b876@john-XPS-13-9370.notmuch>
On Thu, Jun 18, 2020 at 4:48 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Andrii Nakryiko wrote:
> > On Thu, Jun 18, 2020 at 12:09 PM John Fastabend
> > <john.fastabend@gmail.com> wrote:
> > >
> > > Andrii Nakryiko wrote:
> > > > Add selftest that validates variable-length data reading and concatentation
> > > > with one big shared data array. This is a common pattern in production use for
> > > > monitoring and tracing applications, that potentially can read a lot of data,
> > > > but usually reads much less. Such pattern allows to determine precisely what
> > > > amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.
> > > >
> > > > This is the first BPF selftest that at all looks at and tests
> > > > bpf_probe_read_str()-like helper's return value, closing a major gap in BPF
> > > > testing. It surfaced the problem with bpf_probe_read_kernel_str() returning
> > > > 0 on success, instead of amount of bytes successfully read.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > ---
> > >
> > > [...]
> > >
> > > > +/* .data */
> > > > +int payload2_len1 = -1;
> > > > +int payload2_len2 = -1;
> > > > +int total2 = -1;
> > > > +char payload2[MAX_LEN + MAX_LEN] = { 1 };
> > > > +
> > > > +SEC("raw_tp/sys_enter")
> > > > +int handler64(void *regs)
> > > > +{
> > > > + int pid = bpf_get_current_pid_tgid() >> 32;
> > > > + void *payload = payload1;
> > > > + u64 len;
> > > > +
> > > > + /* ignore irrelevant invocations */
> > > > + if (test_pid != pid || !capture)
> > > > + return 0;
> > > > +
> > > > + len = bpf_probe_read_kernel_str(payload, MAX_LEN, &buf_in1[0]);
> > > > + if (len <= MAX_LEN) {
> > >
> > > Took me a bit grok this. You are relying on the fact that in errors,
> > > such as a page fault, will encode to a large u64 value and so you
> > > verifier is happy. But most of my programs actually want to distinguish
> > > between legitimate errors on the probe vs buffer overrun cases.
> >
> > What buffer overrun? bpf_probe_read_str() family cannot return higher
> > value than MAX_LEN. If you want to detect truncated strings, then you
> > can attempt reading MAX_LEN + 1 and then check that the return result
> > is MAX_LEN exactly. But still, that would be something like:
> > u64 len;
> >
> > len = bpf_probe_read_str(payload, MAX_LEN + 1, &buf);
> > if (len > MAX_LEN)
> > return -1;
> > if (len == MAX_LEN) {
> > /* truncated */
> > } else {
> > /* full string */
> > }
>
> +1
>
> >
> > >
> > > Can we make these tests do explicit check for errors. For example,
> > >
> > > if (len < 0) goto abort;
> > >
> > > But this also breaks your types here. This is what I was trying to
> > > point out in the 1/2 patch thread. Wanted to make the point here as
> > > well in case it wasn't clear. Not sure I did the best job explaining.
> > >
> >
> > I can write *a correct* C code in a lot of ways such that it will not
> > pass verifier verification, not sure what that will prove, though.
> >
> > Have you tried using the pattern with two ifs with no-ALU32? Does it work?
>
> Ran our CI on both mcpu=v2 and mcpu=v3 and the pattern with multiple
> ifs exists in those tests. They both passed so everything seems OK.
> In the real progs though things are a bit more complicated I didn't
> check the exact generate code. Some how I missed the case below.
> I put a compiler barrier in a few spots so I think this is blocking
> the optimization below causing no-alu32 failures. I'll remove the
> barriers after I wrap a few things reviews.. my own bug fixes ;) and
> see if I can trigger the case below.
>
> >
> > Also you are cheating in your example (in patch #1 thread). You are
> > exiting on the first error and do not attempt to read any more data
> > after that. In practice, you want to get as much info as possible,
> > even if some of string reads fail (e.g., because argv might not be
> > paged in, but env is, or vice versa). So you'll end up doing this:
>
> Sure.
>
> >
> > len = bpf_probe_read_str(...);
> > if (len >= 0 && len <= MAX_LEN) {
> > payload += len;
> > }
> > ...
> >
> > ... and of course it spectacularly fails in no-ALU32.
> >
> > To be completely fair, this is a result of Clang optimization and
> > Yonghong is trying to deal with it as we speak. Switching int to long
> > for helpers doesn't help it either. But there are better code patterns
> > (unsigned len + single if check) that do work with both ALU32 and
> > no-ALU32.
>
> Great.
>
> >
> > And I just double-checked, this pattern keeps working for ALU32 with
> > both int and long types, so maybe there are unnecessary bit shifts,
> > but at least code is still verifiable.
> >
> > So my point stands. int -> long helps in some cases and doesn't hurt
> > in others, so I argue that it's a good thing to do :)
>
> Convinced me as well. I Acked the other patch thanks.
Awesome :) Thanks for extra testing and validation on your side!
^ permalink raw reply
* Re: af_decnet.c: missing semi-colon and/or indentation?
From: Randy Dunlap @ 2020-06-19 0:23 UTC (permalink / raw)
To: Eric Dumazet, netdev@vger.kernel.org, David Miller
In-Reply-To: <407fa160-aec6-3135-1579-f833bebe59a2@gmail.com>
On 6/18/20 4:35 PM, Eric Dumazet wrote:
>
>
> On 6/18/20 4:19 PM, Randy Dunlap wrote:
>>
>> Please see lines 1250-1251.
>>
>>
>> case TIOCINQ:
>> lock_sock(sk);
>> skb = skb_peek(&scp->other_receive_queue);
>> if (skb) {
>> amount = skb->len;
>> } else {
>> skb_queue_walk(&sk->sk_receive_queue, skb) <<<<<
>> amount += skb->len; <<<<<
>> }
>> release_sock(sk);
>> err = put_user(amount, (int __user *)arg);
>> break;
>>
>>
>>
>> or is this some kind of GCC nested function magic?
>>
>
> I do not see a problem
>
> for (bla; bla; bla)
> amount += skb->len;
>
> Seems good to me.
>
OK, I get it (now). Thanks.
>>
>> commit bec571ec762a4cf855ad4446f833086fc154b60e
>> Author: David S. Miller <davem@davemloft.net>
>> Date: Thu May 28 16:43:52 2009 -0700
>>
>> decnet: Use SKB queue and list helpers instead of doing it by-hand.
>>
>>
>>
>> thanks.
>>
>
> Also decnet should not be any of our concerns in 2020 ?
Ack.
--
~Randy
^ permalink raw reply
* Re: [PATCH] bpf: Allow small structs to be type of function argument
From: John Fastabend @ 2020-06-19 0:25 UTC (permalink / raw)
To: Andrii Nakryiko, John Fastabend
Cc: Jiri Olsa, Andrii Nakryiko, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Networking, bpf, Yonghong Song, Martin KaFai Lau,
Jakub Kicinski, David Miller, Jesper Dangaard Brouer, KP Singh,
Masanori Misono
In-Reply-To: <CAEf4Bzb+U+A9i0VfGUHLVt28WCob7pb-0iVQA8d1fcR8A27ZpA@mail.gmail.com>
Andrii Nakryiko wrote:
> On Thu, Jun 18, 2020 at 3:50 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Jiri Olsa wrote:
> > > On Wed, Jun 17, 2020 at 04:20:54PM -0700, John Fastabend wrote:
> > > > Jiri Olsa wrote:
> > > > > This way we can have trampoline on function
> > > > > that has arguments with types like:
> > > > >
> > > > > kuid_t uid
> > > > > kgid_t gid
> > > > >
> > > > > which unwind into small structs like:
> > > > >
> > > > > typedef struct {
> > > > > uid_t val;
> > > > > } kuid_t;
> > > > >
> > > > > typedef struct {
> > > > > gid_t val;
> > > > > } kgid_t;
> > > > >
> > > > > And we can use them in bpftrace like:
> > > > > (assuming d_path changes are in)
> > > > >
> > > > > # bpftrace -e 'lsm:path_chown { printf("uid %d, gid %d\n", args->uid, args->gid) }'
> > > > > Attaching 1 probe...
> > > > > uid 0, gid 0
> > > > > uid 1000, gid 1000
> > > > > ...
> > > > >
> > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > ---
> > > > > kernel/bpf/btf.c | 12 +++++++++++-
> > > > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > > index 58c9af1d4808..f8fee5833684 100644
> > > > > --- a/kernel/bpf/btf.c
> > > > > +++ b/kernel/bpf/btf.c
> > > > > @@ -362,6 +362,14 @@ static bool btf_type_is_struct(const struct btf_type *t)
> > > > > return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
> > > > > }
> > > > >
> > > > > +/* type is struct and its size is within 8 bytes
> > > > > + * and it can be value of function argument
> > > > > + */
> > > > > +static bool btf_type_is_struct_arg(const struct btf_type *t)
> > > > > +{
> > > > > + return btf_type_is_struct(t) && (t->size <= sizeof(u64));
> > > >
> > > > Can you comment on why sizeof(u64) here? The int types can be larger
> > > > than 64 for example and don't have a similar check, maybe the should
> > > > as well?
> > > >
> > > > Here is an example from some made up program I ran through clang and
> > > > bpftool.
> > > >
> > > > [2] INT '__int128' size=16 bits_offset=0 nr_bits=128 encoding=SIGNED
> > > >
> > > > We also have btf_type_int_is_regular to decide if the int is of some
> > > > "regular" size but I don't see it used in these paths.
> > >
> > > so this small structs are passed as scalars via function arguments,
> > > so the size limit is to fit teir value into register size which holds
> > > the argument
> > >
> > > I'm not sure how 128bit numbers are passed to function as argument,
> > > but I think we can treat them separately if there's a need
> > >
> >
> > Moving Andrii up to the TO field ;)
>
> I've got an upgrade, thanks :)
>
> >
> > Andrii, do we also need a guard on the int type with sizeof(u64)?
> > Otherwise the arg calculation might be incorrect? wdyt did I follow
> > along correctly.
>
> Yes, we probably do. I actually never used __int128 in practice, but
> decided to look at what Clang does for a function accepting __int128.
> Turns out it passed it in two consecutive registers. So:
>
> __weak int bla(__int128 x) { return (int)(x + 1); }
>
> The assembly is:
>
> 38: b7 01 00 00 fe ff ff ff r1 = -2
> 39: b7 02 00 00 ff ff ff ff r2 = -1
> 40: 85 10 00 00 ff ff ff ff call -1
> 41: bc 01 00 00 00 00 00 00 w1 = w0
>
> So low 64-bits go into r1, high 64-bits into r2.
>
> Which means the 1:1 mapping between registers and input arguments
> breaks with __int128, at least for target BPF. I'm too lazy to check
> for x86-64, though.
OK confirms what I suspected. For a fix we should bound int types
here to pointer word size which I think should be safe most everywhere.
I can draft a patch if you haven't done one already. For what its worth
RISC-V had some convention where it would use the even registers for
things. So
foo(int a, __int128 b)
would put a in r0 and b in r2 and r3 leaving a hole in r1. But that
was some old reference manual and might no longer be the case
in reality. Perhaps just spreading hearsay, but the point is we
should say something about what the BPF backend convention
is and write it down. We've started to bump into these things
lately.
^ permalink raw reply
* RE: [net-next 13/15] iecm: Add ethtool
From: Brady, Alan @ 2020-06-19 0:36 UTC (permalink / raw)
To: Michal Kubecek, netdev@vger.kernel.org
Cc: Kirsher, Jeffrey T, davem@davemloft.net, Michael, Alice,
nhorman@redhat.com, sassmann@redhat.com, Burra, Phani R,
Hay, Joshua A, Chittim, Madhu, Linga, Pavan Kumar,
Skidmore, Donald C, Brandeburg, Jesse, Samudrala, Sridhar
In-Reply-To: <20200618211726.ijsafmx52ha3lamz@lion.mk-sys.cz>
> -----Original Message-----
> From: Michal Kubecek <mkubecek@suse.cz>
> Sent: Thursday, June 18, 2020 2:17 PM
> To: netdev@vger.kernel.org
> Cc: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; davem@davemloft.net;
> Michael, Alice <alice.michael@intel.com>; nhorman@redhat.com;
> sassmann@redhat.com; Brady, Alan <alan.brady@intel.com>; Burra, Phani R
> <phani.r.burra@intel.com>; Hay, Joshua A <joshua.a.hay@intel.com>; Chittim,
> Madhu <madhu.chittim@intel.com>; Linga, Pavan Kumar
> <pavan.kumar.linga@intel.com>; Skidmore, Donald C
> <donald.c.skidmore@intel.com>; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: Re: [net-next 13/15] iecm: Add ethtool
>
> On Wed, Jun 17, 2020 at 10:13:42PM -0700, Jeff Kirsher wrote:
> > From: Alice Michael <alice.michael@intel.com>
> >
> > Implement ethtool interface for the common module.
> >
> > Signed-off-by: Alice Michael <alice.michael@intel.com>
> > Signed-off-by: Alan Brady <alan.brady@intel.com>
> > Signed-off-by: Phani Burra <phani.r.burra@intel.com>
> > Signed-off-by: Joshua Hay <joshua.a.hay@intel.com>
> > Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
> > Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> > Reviewed-by: Donald Skidmore <donald.c.skidmore@intel.com>
> > Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > ---
> [...]
> > +static int iecm_set_channels(struct net_device *netdev,
> > + struct ethtool_channels *ch)
> > +{
> > + struct iecm_vport *vport = iecm_netdev_to_vport(netdev);
> > + int num_req_q = ch->combined_count;
> > +
> > + if (num_req_q == max(vport->num_txq, vport->num_rxq))
> > + return 0;
> > +
> > + /* All of these should have already been checked by ethtool before this
> > + * even gets to us, but just to be sure.
> > + */
> > + if (num_req_q <= 0 || num_req_q > IECM_MAX_Q)
> > + return -EINVAL;
> > +
> > + if (ch->rx_count || ch->tx_count || ch->other_count !=
> IECM_MAX_NONQ)
> > + return -EINVAL;
>
> I don't see much sense in duplicating the checks. Having the checks in common
> code allows us to simplify driver callbacks.
>
You're right it's better to remove these. Will fix.
> > + vport->adapter->config_data.num_req_qs = num_req_q;
> > +
> > + return iecm_initiate_soft_reset(vport, __IECM_SR_Q_CHANGE); }
> [...]
> > +static int iecm_set_ringparam(struct net_device *netdev,
> > + struct ethtool_ringparam *ring) {
> > + struct iecm_vport *vport = iecm_netdev_to_vport(netdev);
> > + u32 new_rx_count, new_tx_count;
> > +
> > + if (ring->rx_mini_pending || ring->rx_jumbo_pending)
> > + return -EINVAL;
> > +
> > + new_tx_count = clamp_t(u32, ring->tx_pending,
> > + IECM_MIN_TXQ_DESC,
> > + IECM_MAX_TXQ_DESC);
> > + new_tx_count = ALIGN(new_tx_count, IECM_REQ_DESC_MULTIPLE);
> > +
> > + new_rx_count = clamp_t(u32, ring->rx_pending,
> > + IECM_MIN_RXQ_DESC,
> > + IECM_MAX_RXQ_DESC);
> > + new_rx_count = ALIGN(new_rx_count, IECM_REQ_DESC_MULTIPLE);
>
> Same here. This is actually a bit misleading as it seems that count exceeding
> maximum would be silently clamped to it but such value would be rejected by
> common code.
>
Also agreed, will fix in next version.
> > + /* if nothing to do return success */
> > + if (new_tx_count == vport->txq_desc_count &&
> > + new_rx_count == vport->rxq_desc_count)
> > + return 0;
> > +
> > + vport->adapter->config_data.num_req_txq_desc = new_tx_count;
> > + vport->adapter->config_data.num_req_rxq_desc = new_rx_count;
> > +
> > + return iecm_initiate_soft_reset(vport, __IECM_SR_Q_DESC_CHANGE); }
> [...]
> > +/* For now we have one and only one private flag and it is only
> > +defined
> > + * when we have support for the SKIP_CPU_SYNC DMA attribute. Instead
> > + * of leaving all this code sitting around empty we will strip it
> > +unless
> > + * our one private flag is actually available.
> > + */
>
> The code below will always return 1 for ETH_SS_PRIV_FLAGS in
> iecm_get_sset_count() and an array of one string in iecm_get_strings().
> This would e.g. result in "ethtool -i" saying "supports-priv-flags: yes"
> but then "ethtool --show-priv-flags" failing with -EOPNOTSUPP. IMHO you
> should not return bogus string set if private flags are not implemented.
>
> Michal
>
I'm embarrassed we added a nice comment about stripping this because it doesn't do anything and then failed to do just that. Agreed the priv flag stuff should not be here without any private flags existing, will fix. Thanks for the review.
Alan
^ permalink raw reply
* Re: [PATCH 01/11] bpf: Add btfid tool to resolve BTF IDs in ELF object
From: Andrii Nakryiko @ 2020-06-19 0:38 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, Song Liu,
Yonghong Song, Martin KaFai Lau, David Miller, John Fastabend,
Wenbo Zhang, KP Singh, Andrii Nakryiko, Brendan Gregg,
Florent Revest, Al Viro
In-Reply-To: <20200616100512.2168860-2-jolsa@kernel.org>
On Tue, Jun 16, 2020 at 3:06 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> The btfid tool scans Elf object for .BTF_ids section and
> resolves its symbols with BTF IDs.
naming is hard and subjective, I know. But given this actively
modifies ELF file it probably should indicate this in the name. So
something like patch_btfids or resolve_btfids would be a bit more
accurate and for people not in the know will still trigger the
"warning, tool can modify something" flag, if there are any problems.
>
> It will be used to during linking time to resolve arrays
> of BTF IDs used in verifier, so these IDs do not need to
> be resolved in runtime.
>
> The expected layout of .BTF_ids section is described
> in btfid.c header. Related kernel changes are coming in
> following changes.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> tools/bpf/btfid/Build | 26 ++
> tools/bpf/btfid/Makefile | 71 +++++
> tools/bpf/btfid/btfid.c | 627 +++++++++++++++++++++++++++++++++++++++
> 3 files changed, 724 insertions(+)
> create mode 100644 tools/bpf/btfid/Build
> create mode 100644 tools/bpf/btfid/Makefile
> create mode 100644 tools/bpf/btfid/btfid.c
>
[...]
> diff --git a/tools/bpf/btfid/btfid.c b/tools/bpf/btfid/btfid.c
> new file mode 100644
> index 000000000000..7cdf39bfb150
> --- /dev/null
> +++ b/tools/bpf/btfid/btfid.c
> @@ -0,0 +1,627 @@
> +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +#define _GNU_SOURCE
> +
> +/*
> + * btfid scans Elf object for .BTF_ids section and resolves
> + * its symbols with BTF IDs.
> + *
> + * Each symbol points to 4 bytes data and is expected to have
> + * following name syntax:
> + *
> + * __BTF_ID__<type>__<symbol>[__<id>]
This ___<id> thingy is just disambiguation between multiple places in
the code that could have BTF_ID macro, right? Or it has extra meaning?
> + *
> + * type is:
> + *
> + * func - lookup BTF_KIND_FUNC symbol with <symbol> name
> + * and put its ID into its data
> + *
> + * __BTF_ID__func__vfs_close__1:
> + * .zero 4
> + *
> + * struct - lookup BTF_KIND_STRUCT symbol with <symbol> name
> + * and put its ID into its data
> + *
> + * __BTF_ID__struct__sk_buff__1:
> + * .zero 4
> + *
> + * sort - put symbol size into data area and sort following
Oh, I finally got what "put symbol size" means :) It's quite unclear,
to be honest. Also, is this size in bytes or number of IDs? Clarifying
would be helpful (I'll probably get this from reading further down the
code, but still..)
> + * ID list
> + *
> + * __BTF_ID__sort__list:
> + * list_cnt:
> + * .zero 4
> + * list:
> + * __BTF_ID__func__vfs_getattr__3:
> + * .zero 4
> + * __BTF_ID__func__vfs_fallocate__4:
> + * .zero 4
> + */
> +
[...]
> +
> +static int symbols_collect(struct object *obj)
> +{
> + Elf_Scn *scn = NULL;
> + int n, i, err = 0;
> + GElf_Shdr sh;
> + char *name;
> +
> + scn = elf_getscn(obj->efile.elf, obj->efile.symbols_shndx);
> + if (!scn)
> + return -1;
> +
> + if (gelf_getshdr(scn, &sh) != &sh)
> + return -1;
> +
> + n = sh.sh_size / sh.sh_entsize;
> +
> + /*
> + * Scan symbols and look for the ones starting with
> + * __BTF_ID__* over .BTF_ids section.
> + */
> + for (i = 0; !err && i < n; i++) {
> + char *tmp, *prefix;
> + struct btf_id *id;
> + GElf_Sym sym;
> + int err = -1;
> +
> + if (!gelf_getsym(obj->efile.symbols, i, &sym))
> + return -1;
> +
> + if (sym.st_shndx != obj->efile.idlist_shndx)
> + continue;
> +
> + name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
> + sym.st_name);
> +
> + if (!is_btf_id(name))
> + continue;
> +
> + /*
> + * __BTF_ID__TYPE__vfs_truncate__0
> + * prefix = ^
> + */
> + prefix = name + sizeof(BTF_ID) - 1;
> +
> + if (!strncmp(prefix, BTF_STRUCT, sizeof(BTF_STRUCT) - 1)) {
> + id = add_struct(obj, prefix);
> + } else if (!strncmp(prefix, BTF_FUNC, sizeof(BTF_FUNC) - 1)) {
> + id = add_func(obj, prefix);
> + } else if (!strncmp(prefix, BTF_SORT, sizeof(BTF_SORT) - 1)) {
> + id = add_sort(obj, prefix);
> +
> + /*
> + * SORT objects store list's count, which is encoded
> + * in symbol's size.
> + */
> + if (id)
> + id->cnt = sym.st_size / sizeof(int);
doesn't sym.st_size also include extra 4 bytes for length prefix?
> + } else {
> + pr_err("FAILED unsupported prefix %s\n", prefix);
> + return -1;
> + }
> +
> + if (!id)
> + return -ENOMEM;
> +
> + if (id->addr_cnt >= ADDR_CNT) {
> + pr_err("FAILED symbol %s crossed the number of allowed lists",
> + id->name);
> + return -1;
> + }
> + id->addr[id->addr_cnt++] = sym.st_value;
> + }
> +
> + return 0;
> +}
> +
> +static int symbols_resolve(struct object *obj)
> +{
> + int nr_structs = obj->nr_structs;
> + int nr_funcs = obj->nr_funcs;
> + struct btf *btf;
> + int err, type_id;
> + __u32 nr;
> +
> + btf = btf__parse_elf(obj->path, NULL);
> + err = libbpf_get_error(btf);
> + if (err) {
> + pr_err("FAILED: load BTF from %s: %s",
> + obj->path, strerror(err));
> + return -1;
> + }
> +
> + nr = btf__get_nr_types(btf);
> +
> + /*
> + * Iterate all the BTF types and search for collected symbol IDs.
> + */
> + for (type_id = 0; type_id < nr; type_id++) {
common gotcha: type_id <= nr, you can also skip type_id == 0 (always VOID)
> + const struct btf_type *type;
> + struct rb_root *root = NULL;
> + struct btf_id *id;
> + const char *str;
> + int *nr;
> +
> + type = btf__type_by_id(btf, type_id);
> + if (!type)
> + continue;
This ought to be an error...
> +
> + /* We support func/struct types. */
> + if (BTF_INFO_KIND(type->info) == BTF_KIND_FUNC && nr_funcs) {
see libbpf's btf.h: btf_is_func(type)
> + root = &obj->funcs;
> + nr = &nr_funcs;
> + } else if (BTF_INFO_KIND(type->info) == BTF_KIND_STRUCT && nr_structs) {
same as above: btf_is_struct
But I think you also need to support unions?
Also what about typedefs? A lot of types are typedefs to struct/func_proto/etc.
> + root = &obj->structs;
> + nr = &nr_structs;
> + } else {
> + continue;
> + }
> +
> + str = btf__name_by_offset(btf, type->name_off);
> + if (!str)
> + continue;
error, shouldn't happen
> +
> + id = btf_id__find(root, str);
> + if (id) {
isn't it an error, if not found?
> + id->id = type_id;
> + (*nr)--;
> + }
> + }
> +
> + return 0;
> +}
> +
[...]
> +
> + /*
> + * We do proper cleanup and file close
> + * intentionally only on success.
> + */
> + if (elf_collect(&obj))
> + return -1;
> +
> + if (symbols_collect(&obj))
> + return -1;
> +
> + if (symbols_resolve(&obj))
> + return -1;
> +
> + if (symbols_patch(&obj))
> + return -1;
nit: should these elf_end/close properly on error?
> +
> + elf_end(obj.efile.elf);
> + close(obj.efile.fd);
> + return 0;
> +}
> --
> 2.25.4
>
^ permalink raw reply
* Re: [PATCH bpf-next 1/2] bpf: switch most helper return values from 32-bit int to 64-bit long
From: John Fastabend @ 2020-06-19 0:39 UTC (permalink / raw)
To: John Fastabend, Andrii Nakryiko, John Fastabend
Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
Daniel Borkmann, Kernel Team
In-Reply-To: <5eebf9321e11a_519a2abc9795c5bc21@john-XPS-13-9370.notmuch>
John Fastabend wrote:
> Andrii Nakryiko wrote:
> > On Thu, Jun 18, 2020 at 11:58 AM John Fastabend
> > <john.fastabend@gmail.com> wrote:
[...]
> > That would be great. Self-tests do work, but having more testing with
> > real-world application would certainly help as well.
>
> Thanks for all the follow up.
>
> I ran the change through some CI on my side and it passed so I can
> complain about a few shifts here and there or just update my code or
> just not change the return types on my side but I'm convinced its OK
> in most cases and helps in some so...
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
I'll follow this up with a few more selftests to capture a couple of our
patterns. These changes are subtle and I worry a bit that additional
<<,s>> pattern could have the potential to break something.
Another one we didn't discuss that I found in our code base is feeding
the output of a probe_* helper back into the size field (after some
alu ops) of subsequent probe_* call. Unfortunately, the tests I ran
today didn't cover that case.
I'll put it on the list tomorrow and encode these in selftests. I'll
let the mainainers decide if they want to wait for those or not.
^ permalink raw reply
* Re: [PATCH 02/11] bpf: Compile btfid tool at kernel compilation start
From: Andrii Nakryiko @ 2020-06-19 0:40 UTC (permalink / raw)
To: Jiri Olsa, Arnaldo Carvalho de Melo
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, Song Liu,
Yonghong Song, Martin KaFai Lau, David Miller, John Fastabend,
Wenbo Zhang, KP Singh, Andrii Nakryiko, Brendan Gregg,
Florent Revest, Al Viro
In-Reply-To: <20200616100512.2168860-3-jolsa@kernel.org>
On Tue, Jun 16, 2020 at 3:06 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> The btfid tool will be used during the vmlinux linking,
> so it's necessary it's ready for it.
>
Seeing troubles John runs into, I wonder if it maybe would be better
to add it to pahole instead? It's already a dependency for anything
BTF-related in the kernel. It has libelf, libbpf linked and set up.
WDYT? I've cc'ed Arnaldo as well for an opinion.
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> Makefile | 22 ++++++++++++++++++----
> tools/Makefile | 3 +++
> tools/bpf/Makefile | 5 ++++-
> 3 files changed, 25 insertions(+), 5 deletions(-)
>
[...]
^ permalink raw reply
* RE: [net-next 13/15] iecm: Add ethtool
From: Michael, Alice @ 2020-06-19 0:46 UTC (permalink / raw)
To: Jakub Kicinski, Kirsher, Jeffrey T
Cc: davem@davemloft.net, netdev@vger.kernel.org, nhorman@redhat.com,
sassmann@redhat.com, Brady, Alan, Burra, Phani R, Hay, Joshua A,
Chittim, Madhu, Linga, Pavan Kumar, Skidmore, Donald C,
Brandeburg, Jesse, Samudrala, Sridhar
In-Reply-To: <20200618165008.4d475087@kicinski-fedora-PC1C0HJN>
> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, June 18, 2020 4:50 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: davem@davemloft.net; Michael, Alice <alice.michael@intel.com>;
> netdev@vger.kernel.org; nhorman@redhat.com; sassmann@redhat.com;
> Brady, Alan <alan.brady@intel.com>; Burra, Phani R <phani.r.burra@intel.com>;
> Hay, Joshua A <joshua.a.hay@intel.com>; Chittim, Madhu
> <madhu.chittim@intel.com>; Linga, Pavan Kumar
> <pavan.kumar.linga@intel.com>; Skidmore, Donald C
> <donald.c.skidmore@intel.com>; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: Re: [net-next 13/15] iecm: Add ethtool
>
> On Wed, 17 Jun 2020 22:13:42 -0700 Jeff Kirsher wrote:
> > +static const struct ethtool_ops iecm_ethtool_ops = {
> > + .get_drvinfo = iecm_get_drvinfo,
> > + .get_msglevel = iecm_get_msglevel,
> > + .set_msglevel = iecm_set_msglevel,
> > + .get_coalesce = iecm_get_coalesce,
> > + .set_coalesce = iecm_set_coalesce,
> > + .get_per_queue_coalesce = iecm_get_per_q_coalesce,
> > + .set_per_queue_coalesce = iecm_set_per_q_coalesce,
> > + .get_ethtool_stats = iecm_get_ethtool_stats,
> > + .get_strings = iecm_get_strings,
> > + .get_sset_count = iecm_get_sset_count,
> > + .get_rxnfc = iecm_get_rxnfc,
> > + .get_rxfh_key_size = iecm_get_rxfh_key_size,
> > + .get_rxfh_indir_size = iecm_get_rxfh_indir_size,
> > + .get_rxfh = iecm_get_rxfh,
> > + .set_rxfh = iecm_set_rxfh,
> > + .get_channels = iecm_get_channels,
> > + .set_channels = iecm_set_channels,
> > + .get_ringparam = iecm_get_ringparam,
> > + .set_ringparam = iecm_set_ringparam,
> > + .get_link_ksettings = iecm_get_link_ksettings,
> > +};
>
> Oh wow. So you're upstreaming this driver based on at least a 3 month old tree?
> This:
>
> commit 9000edb71ab29d184aa33f5a77fa6e52d8812bb9
> Author: Jakub Kicinski <kuba@kernel.org>
> Date: Mon Mar 16 13:47:12 2020 -0700
>
> +int ethtool_check_ops(const struct ethtool_ops *ops) {
> + if (WARN_ON(ops->set_coalesce && !ops->supported_coalesce_params))
> + return -EINVAL;
>
> would have otherwise triggered.
I hadn't gotten any errors from a regular compile while I was preparing patches, and yes that got merged while we were doing final internal reviews and therefore missed in this first version.
I'm adding in the flags and they'll appear in the V3 that I'm preparing addressing all the comments that we have gotten thus far. It looks straightforward, thanks for pointing it out.
Alice
^ permalink raw reply
* Re: [PATCH 02/11] bpf: Compile btfid tool at kernel compilation start
From: Arnaldo Carvalho de Melo @ 2020-06-19 0:47 UTC (permalink / raw)
To: Andrii Nakryiko, Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, Song Liu,
Yonghong Song, Martin KaFai Lau, David Miller, John Fastabend,
Wenbo Zhang, KP Singh, Andrii Nakryiko, Brendan Gregg,
Florent Revest, Al Viro
In-Reply-To: <CAEf4BzaL3bc8Hmm20Y-qEqfr7kZS2s8-KeE8M6Mz9ni81CSu4w@mail.gmail.com>
On June 18, 2020 9:40:32 PM GMT-03:00, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>On Tue, Jun 16, 2020 at 3:06 AM Jiri Olsa <jolsa@kernel.org> wrote:
>>
>> The btfid tool will be used during the vmlinux linking,
>> so it's necessary it's ready for it.
>>
>
>Seeing troubles John runs into, I wonder if it maybe would be better
>to add it to pahole instead? It's already a dependency for anything
>BTF-related in the kernel. It has libelf, libbpf linked and set up.
>WDYT? I've cc'ed Arnaldo as well for an opinion.
I was reading this thread with a low prio, but my gut feeling was that yeah, since pahole is already there, why not have it do this?
I'll try to look at this tomorrow and see if this is more than just a hunch.
- Arnaldo
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
^ permalink raw reply
* Re: [PATCH v5 3/3] net: phy: mscc: handle the clkout control on some phy variants
From: kernel test robot @ 2020-06-19 0:50 UTC (permalink / raw)
To: Heiko Stuebner, davem, kuba
Cc: kbuild-all, clang-built-linux, robh+dt, andrew, f.fainelli,
hkallweit1, linux, netdev, devicetree, linux-kernel
In-Reply-To: <20200618121139.1703762-4-heiko@sntech.de>
[-- Attachment #1: Type: text/plain, Size: 2421 bytes --]
Hi Heiko,
I love your patch! Yet something to improve:
[auto build test ERROR on robh/for-next]
[also build test ERROR on sparc-next/master net-next/master net/master linus/master v5.8-rc1 next-20200618]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Heiko-Stuebner/add-clkout-support-to-mscc-phys/20200618-201732
base: https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 487ca07fcc75d52755c9fe2ee05bcb3b6eeeec44)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from drivers/net/phy/mscc/mscc_macsec.c:17:
>> drivers/net/phy/mscc/mscc.h:371:16: error: field has incomplete type 'struct clk_hw'
struct clk_hw clkout_hw;
^
drivers/net/phy/mscc/mscc.h:371:9: note: forward declaration of 'struct clk_hw'
struct clk_hw clkout_hw;
^
1 error generated.
vim +371 drivers/net/phy/mscc/mscc.h
354
355 struct vsc8531_private {
356 int rate_magic;
357 u16 supp_led_modes;
358 u32 leds_mode[MAX_LEDS];
359 u8 nleds;
360 const struct vsc85xx_hw_stat *hw_stats;
361 u64 *stats;
362 int nstats;
363 /* PHY address within the package. */
364 u8 addr;
365 /* For multiple port PHYs; the MDIO address of the base PHY in the
366 * package.
367 */
368 unsigned int base_addr;
369
370 #ifdef CONFIG_COMMON_CLK
> 371 struct clk_hw clkout_hw;
372 #endif
373 u32 clkout_rate;
374 int clkout_enabled;
375
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 75292 bytes --]
^ permalink raw reply
* Re: [PATCH 03/11] bpf: Add btf_ids object
From: Andrii Nakryiko @ 2020-06-19 0:56 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, Song Liu,
Yonghong Song, Martin KaFai Lau, David Miller, John Fastabend,
Wenbo Zhang, KP Singh, Andrii Nakryiko, Brendan Gregg,
Florent Revest, Al Viro
In-Reply-To: <20200616100512.2168860-4-jolsa@kernel.org>
On Tue, Jun 16, 2020 at 3:05 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding support to generate .BTF_ids section that would
> hold various BTF IDs list for verifier.
>
> Adding macros help to define lists of BTF IDs placed in
> .BTF_ids section. They are initially filled with zeros
> (during compilation) and resolved later during the
> linking phase by btfid tool.
>
> Following defines list of one BTF ID that is accessible
> within kernel code as bpf_skb_output_btf_ids array.
>
> extern int bpf_skb_output_btf_ids[];
>
> BTF_ID_LIST(bpf_skb_output_btf_ids)
> BTF_ID(struct, sk_buff)
>
> Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> include/asm-generic/vmlinux.lds.h | 4 ++
> kernel/bpf/Makefile | 2 +-
> kernel/bpf/btf_ids.c | 3 ++
> kernel/bpf/btf_ids.h | 70 +++++++++++++++++++++++++++++++
> 4 files changed, 78 insertions(+), 1 deletion(-)
> create mode 100644 kernel/bpf/btf_ids.c
> create mode 100644 kernel/bpf/btf_ids.h
>
[...]
> +/*
> + * Following macros help to define lists of BTF IDs placed
> + * in .BTF_ids section. They are initially filled with zeros
> + * (during compilation) and resolved later during the
> + * linking phase by btfid tool.
> + *
> + * Any change in list layout must be reflected in btfid
> + * tool logic.
> + */
> +
> +#define SECTION ".BTF_ids"
nit: SECTION is super generic and non-greppable. BTF_IDS_SECTION?
> +
> +#define ____BTF_ID(symbol) \
> +asm( \
> +".pushsection " SECTION ",\"a\"; \n" \
section should be also read-only? Either immediately here, of btfid
tool should mark it? Unless I missed that it's already doing it :)
> +".local " #symbol " ; \n" \
> +".type " #symbol ", @object; \n" \
> +".size " #symbol ", 4; \n" \
> +#symbol ": \n" \
> +".zero 4 \n" \
> +".popsection; \n");
> +
> +#define __BTF_ID(...) \
> + ____BTF_ID(__VA_ARGS__)
why varargs, if it's always a single argument? Or it's one of those
macro black magic things were it works only in this particular case,
but not others?
> +
> +#define __ID(prefix) \
> + __PASTE(prefix, __COUNTER__)
> +
> +
> +/*
> + * The BTF_ID defines unique symbol for each ID pointing
> + * to 4 zero bytes.
> + */
> +#define BTF_ID(prefix, name) \
> + __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__))
> +
> +
> +/*
> + * The BTF_ID_LIST macro defines pure (unsorted) list
> + * of BTF IDs, with following layout:
> + *
> + * BTF_ID_LIST(list1)
> + * BTF_ID(type1, name1)
> + * BTF_ID(type2, name2)
> + *
> + * list1:
> + * __BTF_ID__type1__name1__1:
> + * .zero 4
> + * __BTF_ID__type2__name2__2:
> + * .zero 4
> + *
> + */
> +#define BTF_ID_LIST(name) \
nit: btw, you call it a list here, but btfids tool talks about
"sorts". Maybe stick to consistent naming. Either "list" or "set"
seems to be appropriate. Set implies a sorted aspect a bit more, IMO.
> +asm( \
> +".pushsection " SECTION ",\"a\"; \n" \
> +".global " #name "; \n" \
I was expecting to see reserved 4 bytes for list size? I also couldn't
find where btfids tool prepends it. From what I could understand, it
just assumed the first 4 bytes are the length prefix? Sorry if I'm
slow...
> +#name ":; \n" \
> +".popsection; \n");
> +
> +#endif
> --
> 2.25.4
>
^ permalink raw reply
* Re: [PATCH 03/11] bpf: Add btf_ids object
From: Andrii Nakryiko @ 2020-06-19 1:02 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, Song Liu,
Yonghong Song, Martin KaFai Lau, David Miller, John Fastabend,
Wenbo Zhang, KP Singh, Andrii Nakryiko, Brendan Gregg,
Florent Revest, Al Viro
In-Reply-To: <20200616100512.2168860-4-jolsa@kernel.org>
On Tue, Jun 16, 2020 at 3:05 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding support to generate .BTF_ids section that would
> hold various BTF IDs list for verifier.
>
> Adding macros help to define lists of BTF IDs placed in
> .BTF_ids section. They are initially filled with zeros
> (during compilation) and resolved later during the
> linking phase by btfid tool.
>
> Following defines list of one BTF ID that is accessible
> within kernel code as bpf_skb_output_btf_ids array.
>
> extern int bpf_skb_output_btf_ids[];
>
> BTF_ID_LIST(bpf_skb_output_btf_ids)
> BTF_ID(struct, sk_buff)
>
> Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> include/asm-generic/vmlinux.lds.h | 4 ++
> kernel/bpf/Makefile | 2 +-
> kernel/bpf/btf_ids.c | 3 ++
> kernel/bpf/btf_ids.h | 70 +++++++++++++++++++++++++++++++
> 4 files changed, 78 insertions(+), 1 deletion(-)
> create mode 100644 kernel/bpf/btf_ids.c
> create mode 100644 kernel/bpf/btf_ids.h
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index db600ef218d7..0be2ee265931 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -641,6 +641,10 @@
> __start_BTF = .; \
> *(.BTF) \
> __stop_BTF = .; \
> + } \
> + . = ALIGN(4); \
> + .BTF_ids : AT(ADDR(.BTF_ids) - LOAD_OFFSET) { \
> + *(.BTF_ids) \
> }
> #else
> #define BTF
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 1131a921e1a6..21e4fc7c25ab 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -7,7 +7,7 @@ obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list
> obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
> obj-$(CONFIG_BPF_SYSCALL) += disasm.o
> obj-$(CONFIG_BPF_JIT) += trampoline.o
> -obj-$(CONFIG_BPF_SYSCALL) += btf.o
> +obj-$(CONFIG_BPF_SYSCALL) += btf.o btf_ids.o
> obj-$(CONFIG_BPF_JIT) += dispatcher.o
> ifeq ($(CONFIG_NET),y)
> obj-$(CONFIG_BPF_SYSCALL) += devmap.o
> diff --git a/kernel/bpf/btf_ids.c b/kernel/bpf/btf_ids.c
> new file mode 100644
> index 000000000000..e7f9d94ad293
> --- /dev/null
> +++ b/kernel/bpf/btf_ids.c
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include "btf_ids.h"
hm... what's the purpose of this btf_ids.c file?
> diff --git a/kernel/bpf/btf_ids.h b/kernel/bpf/btf_ids.h
> new file mode 100644
> index 000000000000..68aa5c38a37f
> --- /dev/null
> +++ b/kernel/bpf/btf_ids.h
[...]
^ permalink raw reply
* Re: [PATCH 03/11] bpf: Add btf_ids object
From: Andrii Nakryiko @ 2020-06-19 1:06 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, Song Liu,
Yonghong Song, Martin KaFai Lau, David Miller, John Fastabend,
Wenbo Zhang, KP Singh, Andrii Nakryiko, Brendan Gregg,
Florent Revest, Al Viro
In-Reply-To: <CAEf4BzZ=BN7zDU_8xMEEoF7khjC4bwGitU+iYf+6uFXPZ_=u-g@mail.gmail.com>
On Thu, Jun 18, 2020 at 5:56 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Jun 16, 2020 at 3:05 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding support to generate .BTF_ids section that would
> > hold various BTF IDs list for verifier.
> >
> > Adding macros help to define lists of BTF IDs placed in
> > .BTF_ids section. They are initially filled with zeros
> > (during compilation) and resolved later during the
> > linking phase by btfid tool.
> >
> > Following defines list of one BTF ID that is accessible
> > within kernel code as bpf_skb_output_btf_ids array.
> >
> > extern int bpf_skb_output_btf_ids[];
> >
> > BTF_ID_LIST(bpf_skb_output_btf_ids)
> > BTF_ID(struct, sk_buff)
> >
> > Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > include/asm-generic/vmlinux.lds.h | 4 ++
> > kernel/bpf/Makefile | 2 +-
> > kernel/bpf/btf_ids.c | 3 ++
> > kernel/bpf/btf_ids.h | 70 +++++++++++++++++++++++++++++++
> > 4 files changed, 78 insertions(+), 1 deletion(-)
> > create mode 100644 kernel/bpf/btf_ids.c
> > create mode 100644 kernel/bpf/btf_ids.h
> >
>
> [...]
>
> > +/*
> > + * Following macros help to define lists of BTF IDs placed
> > + * in .BTF_ids section. They are initially filled with zeros
> > + * (during compilation) and resolved later during the
> > + * linking phase by btfid tool.
> > + *
> > + * Any change in list layout must be reflected in btfid
> > + * tool logic.
> > + */
> > +
> > +#define SECTION ".BTF_ids"
>
> nit: SECTION is super generic and non-greppable. BTF_IDS_SECTION?
>
> > +
> > +#define ____BTF_ID(symbol) \
> > +asm( \
> > +".pushsection " SECTION ",\"a\"; \n" \
>
> section should be also read-only? Either immediately here, of btfid
> tool should mark it? Unless I missed that it's already doing it :)
>
> > +".local " #symbol " ; \n" \
> > +".type " #symbol ", @object; \n" \
> > +".size " #symbol ", 4; \n" \
> > +#symbol ": \n" \
> > +".zero 4 \n" \
> > +".popsection; \n");
> > +
> > +#define __BTF_ID(...) \
> > + ____BTF_ID(__VA_ARGS__)
>
> why varargs, if it's always a single argument? Or it's one of those
> macro black magic things were it works only in this particular case,
> but not others?
>
>
> > +
> > +#define __ID(prefix) \
> > + __PASTE(prefix, __COUNTER__)
> > +
> > +
> > +/*
> > + * The BTF_ID defines unique symbol for each ID pointing
> > + * to 4 zero bytes.
> > + */
> > +#define BTF_ID(prefix, name) \
> > + __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__))
> > +
> > +
> > +/*
> > + * The BTF_ID_LIST macro defines pure (unsorted) list
> > + * of BTF IDs, with following layout:
> > + *
> > + * BTF_ID_LIST(list1)
> > + * BTF_ID(type1, name1)
> > + * BTF_ID(type2, name2)
> > + *
> > + * list1:
> > + * __BTF_ID__type1__name1__1:
> > + * .zero 4
> > + * __BTF_ID__type2__name2__2:
> > + * .zero 4
> > + *
> > + */
> > +#define BTF_ID_LIST(name) \
>
> nit: btw, you call it a list here, but btfids tool talks about
> "sorts". Maybe stick to consistent naming. Either "list" or "set"
> seems to be appropriate. Set implies a sorted aspect a bit more, IMO.
>
> > +asm( \
> > +".pushsection " SECTION ",\"a\"; \n" \
> > +".global " #name "; \n" \
>
> I was expecting to see reserved 4 bytes for list size? I also couldn't
> find where btfids tool prepends it. From what I could understand, it
> just assumed the first 4 bytes are the length prefix? Sorry if I'm
> slow...
Never mind, this is different from whitelisting you do in patch #8.
But now I'm curious how this list symbol gets its size correctly
calculated?..
>
>
> > +#name ":; \n" \
> > +".popsection; \n");
> > +
> > +#endif
> > --
> > 2.25.4
> >
^ permalink raw reply
* Re: [PATCH 05/11] bpf: Remove btf_id helpers resolving
From: Andrii Nakryiko @ 2020-06-19 1:10 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Networking, bpf, Song Liu,
Yonghong Song, Martin KaFai Lau, David Miller, John Fastabend,
Wenbo Zhang, KP Singh, Andrii Nakryiko, Brendan Gregg,
Florent Revest, Al Viro
In-Reply-To: <20200616100512.2168860-6-jolsa@kernel.org>
On Tue, Jun 16, 2020 at 3:06 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Now when we moved the helpers btf_id into .BTF_ids section,
> we can remove the code that resolve those IDs in runtime.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
Nice! :)
BTW, have you looked at bpf_ctx_convert stuff? Would we be able to
replace it with your btfids thing as well?
> kernel/bpf/btf.c | 88 +++---------------------------------------------
> 1 file changed, 4 insertions(+), 84 deletions(-)
>
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 58c9af1d4808..aea7b2cc8d26 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -4049,96 +4049,16 @@ int btf_struct_access(struct bpf_verifier_log *log,
> return -EINVAL;
> }
>
[...]
> int btf_resolve_helper_id(struct bpf_verifier_log *log,
> const struct bpf_func_proto *fn, int arg)
> {
> - int *btf_id = &fn->btf_id[arg];
> - int ret;
> -
> if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID)
> return -EINVAL;
>
> - ret = READ_ONCE(*btf_id);
> - if (ret)
> - return ret;
> - /* ok to race the search. The result is the same */
> - ret = __btf_resolve_helper_id(log, fn->func, arg);
> - if (!ret) {
> - /* Function argument cannot be type 'void' */
> - bpf_log(log, "BTF resolution bug\n");
> - return -EFAULT;
> - }
> - WRITE_ONCE(*btf_id, ret);
> - return ret;
> + if (WARN_ON_ONCE(!fn->btf_id))
> + return -EINVAL;
> +
> + return fn->btf_id[arg];
It probably would be a good idea to add some sanity checking here,
making sure that btf_id is >0 (void is never a right type) and <=
nr_types in vmlinux_btf?
> }
>
> static int __get_type_size(struct btf *btf, u32 btf_id,
> --
> 2.25.4
>
^ permalink raw reply
* Re: RATE not being printed on tc -s class show dev XXXX
From: Cong Wang @ 2020-06-19 1:30 UTC (permalink / raw)
To: Roberto J. Blandino Cisneros; +Cc: Linux Kernel Network Developers
In-Reply-To: <d33998c7-f529-e1d1-31a5-626aa8dd44da@ibw.com.ni>
On Tue, Jun 16, 2020 at 7:06 AM Roberto J. Blandino Cisneros
<roberto.blandino@ibw.com.ni> wrote:
> I am seing "rate 0bit".
>
> But installing from debian package iproute2 i got nothing so i decide to
> compile iproute2 using version 5.7.0
>
> But my output is the same as below:
You either need to enable /sys/module/sch_htb/parameters/htb_rate_est
or specify a rate estimator when you create your HTB class.
Thanks.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox