* Re: [net-next v2 0/5][pull request] 1GbE Intel Wired LAN Driver Updates 2019-07-24
From: David Miller @ 2019-07-24 22:35 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann
In-Reply-To: <20190724212613.1580-1-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 24 Jul 2019 14:26:08 -0700
> This series contains updates to igc and e1000e client drivers only.
>
> Sasha provides a couple of cleanups to remove code that is not needed
> and reduce structure sizes. Updated the MAC reset flow to use the
> device reset flow instead of a port reset flow. Added addition device
> id's that will be supported.
>
> Kai-Heng Feng provides a workaround for a possible stalled packet issue
> in our ICH devices due to a clock recovery from the PCH being too slow.
>
> v2: removed the last patch in the series that supposedly fixed a MAC/PHY
> de-sync potential issue while waiting for additional information from
> hardware engineers.
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [PATCH v2 net-next] r8169: improve rtl_set_rx_mode
From: David Miller @ 2019-07-24 22:41 UTC (permalink / raw)
To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <2f0e6d3c-aa44-3334-aab0-f158f46e4aa9@gmail.com>
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 24 Jul 2019 23:34:45 +0200
> This patch improves and simplifies rtl_set_rx_mode a little.
> No functional change intended.
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
> v2:
> - change variable declarations to reverse xmas tree
Applied.
^ permalink raw reply
* Re: [PATCH RFC net-next] net: use helper skb_ensure_writable in skb_checksum_help and skb_crc32c_csum_help
From: David Miller @ 2019-07-24 22:43 UTC (permalink / raw)
To: hkallweit1; +Cc: netdev
In-Reply-To: <d5791755-5b1e-6dc6-fa9d-da3bb0353847@gmail.com>
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 24 Jul 2019 23:47:49 +0200
> Instead of open-coding making the checksum writable we can use an
> appropriate helper. skb_ensure_writable is a candidate, however we
> might also use skb_header_unclone. Hints welcome.
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
FWIW these conversions look accurate to me.
The only real difference is the perhaps unnecessary pskb_may_pull()
check.
^ permalink raw reply
* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Brian Vazquez @ 2019-07-24 22:44 UTC (permalink / raw)
To: Song Liu
Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
David S . Miller, Stanislav Fomichev, Willem de Bruijn,
Petar Penkov, open list, Networking, bpf
In-Reply-To: <CAPhsuW4HPjXE+zZGmPM9GVPgnVieRr0WOuXfM0W6ec3SB4imDw@mail.gmail.com>
On Wed, Jul 24, 2019 at 2:40 PM Song Liu <liu.song.a23@gmail.com> wrote:
>
> On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
> >
> > This introduces a new command to retrieve multiple number of entries
> > from a bpf map, wrapping the existing bpf methods:
> > map_get_next_key and map_lookup_elem
> >
> > To start dumping the map from the beginning you must specify NULL as
> > the prev_key.
> >
> > The new API returns 0 when it successfully copied all the elements
> > requested or it copied less because there weren't more elements to
> > retrieved (i.e err == -ENOENT). In last scenario err will be masked to 0.
> >
> > On a successful call buf and buf_len will contain correct data and in
> > case prev_key was provided (not for the first walk, since prev_key is
> > NULL) it will contain the last_key copied into the prev_key which will
> > simplify next call.
> >
> > Only when it can't find a single element it will return -ENOENT meaning
> > that the map has been entirely walked. When an error is return buf,
> > buf_len and prev_key shouldn't be read nor used.
> >
> > Because maps can be called from userspace and kernel code, this function
> > can have a scenario where the next_key was found but by the time we
> > try to retrieve the value the element is not there, in this case the
> > function continues and tries to get a new next_key value, skipping the
> > deleted key. If at some point the function find itself trap in a loop,
> > it will return -EINTR.
> >
> > The function will try to fit as much as possible in the buf provided and
> > will return -EINVAL if buf_len is smaller than elem_size.
> >
> > QUEUE and STACK maps are not supported.
> >
> > Note that map_dump doesn't guarantee that reading the entire table is
> > consistent since this function is always racing with kernel and user code
> > but the same behaviour is found when the entire table is walked using
> > the current interfaces: map_get_next_key + map_lookup_elem.
> > It is also important to note that with a locked map, the lock is grabbed
> > for 1 entry at the time, meaning that the returned buf might or might not
> > be consistent.
> >
> > Suggested-by: Stanislav Fomichev <sdf@google.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > ---
> > include/uapi/linux/bpf.h | 9 +++
> > kernel/bpf/syscall.c | 117 +++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 126 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index fa1c753dcdbc7..66dab5385170d 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -106,6 +106,7 @@ enum bpf_cmd {
> > BPF_TASK_FD_QUERY,
> > BPF_MAP_LOOKUP_AND_DELETE_ELEM,
> > BPF_MAP_FREEZE,
> > + BPF_MAP_DUMP,
> > };
> >
> > enum bpf_map_type {
> > @@ -388,6 +389,14 @@ union bpf_attr {
> > __u64 flags;
> > };
> >
> > + struct { /* struct used by BPF_MAP_DUMP command */
> > + __aligned_u64 prev_key;
> > + __aligned_u64 buf;
> > + __aligned_u64 buf_len; /* input/output: len of buf */
> > + __u64 flags;
>
> Please add explanation of flags here.
got it!
> Also, we need to update the
> comments of BPF_F_LOCK for BPF_MAP_DUMP.
What do you mean? I didn't get this part.
>
> > + __u32 map_fd;
> > + } dump;
> > +
> > struct { /* anonymous struct used by BPF_PROG_LOAD command */
> > __u32 prog_type; /* one of enum bpf_prog_type */
> > __u32 insn_cnt;
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 86cdc2f7bb56e..0c35505aa219f 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1097,6 +1097,120 @@ static int map_get_next_key(union bpf_attr *attr)
> > return err;
> > }
> >
> > +/* last field in 'union bpf_attr' used by this command */
> > +#define BPF_MAP_DUMP_LAST_FIELD dump.map_fd
> > +
> > +static int map_dump(union bpf_attr *attr)
> > +{
> > + void __user *ukey = u64_to_user_ptr(attr->dump.prev_key);
> > + void __user *ubuf = u64_to_user_ptr(attr->dump.buf);
> > + u32 __user *ubuf_len = u64_to_user_ptr(attr->dump.buf_len);
> > + int ufd = attr->dump.map_fd;
> > + struct bpf_map *map;
> > + void *buf, *prev_key, *key, *value;
> > + u32 value_size, elem_size, buf_len, cp_len;
> > + struct fd f;
> > + int err;
> > + bool first_key = false;
> > +
> > + if (CHECK_ATTR(BPF_MAP_DUMP))
> > + return -EINVAL;
> > +
> > + if (attr->dump.flags & ~BPF_F_LOCK)
> > + return -EINVAL;
> > +
> > + f = fdget(ufd);
> > + map = __bpf_map_get(f);
> > + if (IS_ERR(map))
> > + return PTR_ERR(map);
> > + if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
> > + err = -EPERM;
> > + goto err_put;
> > + }
> > +
> > + if ((attr->dump.flags & BPF_F_LOCK) &&
> > + !map_value_has_spin_lock(map)) {
> > + err = -EINVAL;
> > + goto err_put;
> > + }
>
> We can share these lines with map_lookup_elem(). Maybe
> add another helper function?
Which are the lines you are referring to? the dump.flags? It makes
sense so that way when a new flag is added you only need to modify
them in one spot.
> > +
> > + if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> > + map->map_type == BPF_MAP_TYPE_STACK) {
> > + err = -ENOTSUPP;
> > + goto err_put;
> > + }
> > +
> > + value_size = bpf_map_value_size(map);
> > +
> > + err = get_user(buf_len, ubuf_len);
> > + if (err)
> > + goto err_put;
> > +
> > + elem_size = map->key_size + value_size;
> > + if (buf_len < elem_size) {
> > + err = -EINVAL;
> > + goto err_put;
> > + }
> > +
> > + if (ukey) {
> > + prev_key = __bpf_copy_key(ukey, map->key_size);
> > + if (IS_ERR(prev_key)) {
> > + err = PTR_ERR(prev_key);
> > + goto err_put;
> > + }
> > + } else {
> > + prev_key = NULL;
> > + first_key = true;
> > + }
> > +
> > + err = -ENOMEM;
> > + buf = kmalloc(elem_size, GFP_USER | __GFP_NOWARN);
> > + if (!buf)
> > + goto err_put;
> > +
> > + key = buf;
> > + value = key + map->key_size;
> > + for (cp_len = 0; cp_len + elem_size <= buf_len;) {
> > + if (signal_pending(current)) {
> > + err = -EINTR;
> > + break;
> > + }
> > +
> > + rcu_read_lock();
> > + err = map->ops->map_get_next_key(map, prev_key, key);
>
> If prev_key is deleted before map_get_next_key(), we get the first key
> again. This is pretty weird.
Yes, I know. But note that the current scenario happens even for the
old interface (imagine you are walking a map from userspace and you
tried get_next_key the prev_key was removed, you will start again from
the beginning without noticing it).
I tried to sent a patch in the past but I was missing some context:
before NULL was used to get the very first_key the interface relied in
a random (non existent) key to retrieve the first_key in the map, and
I was told what we still have to support that scenario.
>
> > + rcu_read_unlock();
> > +
> > + if (err)
> > + break;
> > +
> > + err = bpf_map_copy_value(map, key, value, attr->dump.flags);
> > +
> > + if (err == -ENOENT)
> > + continue;
> > + if (err)
> > + goto free_buf;
> > +
> > + if (copy_to_user(ubuf + cp_len, buf, elem_size)) {
> > + err = -EFAULT;
> > + goto free_buf;
> > + }
> > +
> > + prev_key = key;
> > + cp_len += elem_size;
> > + }
> > +
> > + if (err == -ENOENT && cp_len)
> > + err = 0;
> > + if (!err && (copy_to_user(ubuf_len, &cp_len, sizeof(cp_len)) ||
> > + (!first_key && copy_to_user(ukey, key, map->key_size))))
> > + err = -EFAULT;
> > +free_buf:
> > + kfree(buf);
> > +err_put:
> > + fdput(f);
> > + return err;
> > +}
> > +
> > #define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
> >
> > static int map_lookup_and_delete_elem(union bpf_attr *attr)
> > @@ -2910,6 +3024,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> > case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
> > err = map_lookup_and_delete_elem(&attr);
> > break;
> > + case BPF_MAP_DUMP:
> > + err = map_dump(&attr);
> > + break;
> > default:
> > err = -EINVAL;
> > break;
> > --
> > 2.22.0.657.g960e92d24f-goog
> >
Thanks for reviewing it!!
^ permalink raw reply
* Re: [PATCH bpf-next 1/7] bpf/flow_dissector: pass input flags to BPF flow dissector program
From: Song Liu @ 2019-07-24 22:45 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Stanislav Fomichev, Networking, bpf, David S . Miller,
Alexei Starovoitov, Daniel Borkmann, Willem de Bruijn,
Petar Penkov
In-Reply-To: <20190724223210.GA3500@mini-arch>
On Wed, Jul 24, 2019 at 3:32 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 07/24, Song Liu wrote:
> > On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > C flow dissector supports input flags that tell it to customize parsing
> > > by either stopping early or trying to parse as deep as possible. Pass
> > > those flags to the BPF flow dissector so it can make the same
> > > decisions. In the next commits I'll add support for those flags to
> > > our reference bpf_flow.c
> > >
> > > Cc: Willem de Bruijn <willemb@google.com>
> > > Cc: Petar Penkov <ppenkov@google.com>
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > > include/linux/skbuff.h | 2 +-
> > > include/net/flow_dissector.h | 4 ----
> > > include/uapi/linux/bpf.h | 5 +++++
> > > net/bpf/test_run.c | 2 +-
> > > net/core/flow_dissector.c | 5 +++--
> > > 5 files changed, 10 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index 718742b1c505..9b7a8038beec 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
> > >
> > > struct bpf_flow_dissector;
> > > bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
> > > - __be16 proto, int nhoff, int hlen);
> > > + __be16 proto, int nhoff, int hlen, unsigned int flags);
> > >
> > > bool __skb_flow_dissect(const struct net *net,
> > > const struct sk_buff *skb,
> > > diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> > > index 90bd210be060..3e2642587b76 100644
> > > --- a/include/net/flow_dissector.h
> > > +++ b/include/net/flow_dissector.h
> > > @@ -253,10 +253,6 @@ enum flow_dissector_key_id {
> > > FLOW_DISSECTOR_KEY_MAX,
> > > };
> > >
> > > -#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG BIT(0)
> > > -#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL BIT(1)
> > > -#define FLOW_DISSECTOR_F_STOP_AT_ENCAP BIT(2)
> > > -
> > > struct flow_dissector_key {
> > > enum flow_dissector_key_id key_id;
> > > size_t offset; /* offset of struct flow_dissector_key_*
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index fa1c753dcdbc..b4ad19bd6aa8 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -3507,6 +3507,10 @@ enum bpf_task_fd_type {
> > > BPF_FD_TYPE_URETPROBE, /* filename + offset */
> > > };
> > >
> > > +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG (1U << 0)
> > > +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL (1U << 1)
> > > +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP (1U << 2)
> >
> > Do we have to move these?
> These have to be a part of UAPI one way or another. The easiest thing
> we can do is to call the existing flags a UAPI and move them into
> exported headers. Alternatively, we can introduce new set of UAPI flags
> and do some kind of conversion between exported and internal ones.
>
> Since it's pretty easy to add/deprecate them (see cover letter), I've
> decided to just move them instead of adding another set and doing
> conversion. I'm open to suggestions if you think otherwise.
I see. This looks good to me. Thanks for the explanation.
Acked-by: Song Liu <songliubraving@fb.com>
^ permalink raw reply
* Re: [PATCH] ovs: datapath: hide clang frame-overflow warnings
From: David Miller @ 2019-07-24 22:45 UTC (permalink / raw)
To: arnd
Cc: pshelar, xiangxia.m.yue, johannes.berg, kjlu, netdev, dev,
linux-kernel, clang-built-linux
In-Reply-To: <20190722150018.1156794-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Mon, 22 Jul 2019 17:00:01 +0200
> Some functions in the datapath code are factored out so that each
> one has a stack frame smaller than 1024 bytes with gcc. However,
> when compiling with clang, the functions are inlined more aggressively
> and combined again so we get
>
> net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]
>
> Marking both get_flow_actions() and ovs_nla_init_match_and_action()
> as 'noinline_for_stack' gives us the same behavior that we see with
> gcc, and no warning. Note that this does not mean we actually use
> less stack, as the functions call each other, and we still get
> three copies of the large 'struct sw_flow_key' type on the stack.
>
> The comment tells us that this was previously considered safe,
> presumably since the netlink parsing functions are called with
> a known backchain that does not also use a lot of stack space.
>
> Fixes: 9cc9a5cb176c ("datapath: Avoid using stack larger than 1024.")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied, thanks Arnd.
^ permalink raw reply
* Re: [PATCH net-next] qed: reduce maximum stack frame size
From: David Miller @ 2019-07-24 22:46 UTC (permalink / raw)
To: arnd
Cc: aelior, GR-everest-linux-l2, Yuval.Mintz, manishc,
michal.kalderon, skalluru, denis.bolotin, rverma, netdev,
linux-kernel, clang-built-linux
In-Reply-To: <20190722150133.1157096-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Mon, 22 Jul 2019 17:01:23 +0200
> clang warns about an overly large stack frame in one function
> when it decides to inline all __qed_get_vport_*() functions into
> __qed_get_vport_stats():
>
> drivers/net/ethernet/qlogic/qed/qed_l2.c:1889:13: error: stack frame size of 1128 bytes in function '_qed_get_vport_stats' [-Werror,-Wframe-larger-than=]
>
> Use a noinline_for_stack annotation to prevent clang from inlining
> these, which keeps the maximum stack usage at around half of that
> in the worst case, similar to what we get with gcc.
>
> Fixes: 86622ee75312 ("qed: Move statistics to L2 code")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH net-next] [net-next] mlx4: avoid large stack usage in mlx4_init_hca()
From: David Miller @ 2019-07-24 22:47 UTC (permalink / raw)
To: arnd
Cc: tariqt, ereza, jackm, eli, moshe, jiri, netdev, linux-rdma,
linux-kernel, clang-built-linux
In-Reply-To: <20190722150204.1157315-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Mon, 22 Jul 2019 17:01:55 +0200
> The mlx4_dev_cap and mlx4_init_hca_param are really too large
> to be put on the kernel stack, as shown by this clang warning:
>
> drivers/net/ethernet/mellanox/mlx4/main.c:3304:12: error: stack frame size of 1088 bytes in function 'mlx4_load_one' [-Werror,-Wframe-larger-than=]
>
> With gcc, the problem is the same, but it does not warn because
> it does not inline this function, and therefore stays just below
> the warning limit, while clang is just above it.
>
> Use kzalloc for dynamic allocation instead of putting them
> on stack. This gets the combined stack frame down to 424 bytes.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied.
^ permalink raw reply
* Re: [PATCH] net: sfc: falcon: convert to i2c_new_dummy_device
From: David Miller @ 2019-07-24 22:47 UTC (permalink / raw)
To: wsa+renesas
Cc: linux-i2c, linux-net-drivers, ecree, mhabets, netdev,
linux-kernel
In-Reply-To: <20190722172635.4535-1-wsa+renesas@sang-engineering.com>
From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Mon, 22 Jul 2019 19:26:35 +0200
> Move from i2c_new_dummy() to i2c_new_dummy_device(). So, we now get an
> ERRPTR which we use in error handling.
>
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Solarflare folks, please review/test.
Thank you.
^ permalink raw reply
* Re: [RFC PATCH net-next 00/12] drop_monitor: Capture dropped packets and metadata
From: David Miller @ 2019-07-24 22:48 UTC (permalink / raw)
To: toke
Cc: idosch, netdev, nhorman, dsahern, roopa, nikolay, jakub.kicinski,
andy, f.fainelli, andrew, vivien.didelot, mlxsw, idosch
In-Reply-To: <87imrt4zzg.fsf@toke.dk>
From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Mon, 22 Jul 2019 21:43:15 +0200
> I like this!
I definitely think this is going in the right direction as well.
^ permalink raw reply
* Re: [Patch net] netrom: hold sock when setting skb->destructor
From: David Miller @ 2019-07-24 22:49 UTC (permalink / raw)
To: xiyou.wangcong
Cc: netdev, syzbot+622bdabb128acc33427d, syzbot+6eaef7158b19e3fec3a0,
syzbot+9399c158fcc09b21d0d2, syzbot+a34e5f3d0300163f0c87, ralf
In-Reply-To: <20190723034122.23166-1-xiyou.wangcong@gmail.com>
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 22 Jul 2019 20:41:22 -0700
> sock_efree() releases the sock refcnt, if we don't hold this refcnt
> when setting skb->destructor to it, the refcnt would not be balanced.
> This leads to several bug reports from syzbot.
>
> I have checked other users of sock_efree(), all of them hold the
> sock refcnt.
>
> Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")
> Reported-and-tested-by: <syzbot+622bdabb128acc33427d@syzkaller.appspotmail.com>
> Reported-and-tested-by: <syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com>
> Reported-and-tested-by: <syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com>
> Reported-and-tested-by: <syzbot+a34e5f3d0300163f0c87@syzkaller.appspotmail.com>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH v2] e1000e: Make speed detection on hotplugging cable more reliable
From: Brown, Aaron F @ 2019-07-24 22:52 UTC (permalink / raw)
To: Kirsher, Jeffrey T, Kai-Heng Feng
Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20190715122555.11922-1-kai.heng.feng@canonical.com>
On Mon, 2019-07-15 at 20:25 +0800, Kai-Heng Feng wrote:
> After hotplugging an 1Gbps ethernet cable with 1Gbps link partner, the
> MII_BMSR may report 10Mbps, renders the network rather slow.
>
> The issue has much lower fail rate after commit 59653e6497d1 ("e1000e:
> Make watchdog use delayed work"), which essentially introduces some
> delay before running the watchdog task.
>
> But there's still a chance that the hotplugging event and the queued
> watchdog task gets run at the same time, then the original issue can be
> observed once again.
>
> So let's use mod_delayed_work() to add a deterministic 1 second delay
> before running watchdog task, after an interrupt.
>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
> drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
^ permalink raw reply
* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Song Liu @ 2019-07-24 23:04 UTC (permalink / raw)
To: Brian Vazquez
Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
David S . Miller, Stanislav Fomichev, Willem de Bruijn,
Petar Penkov, open list, Networking, bpf
In-Reply-To: <CABCgpaXz4hO=iGoswdqYBECWE5eu2AdUgms=hyfKnqz7E+ZgNg@mail.gmail.com>
On Wed, Jul 24, 2019 at 3:44 PM Brian Vazquez <brianvv.kernel@gmail.com> wrote:
>
> On Wed, Jul 24, 2019 at 2:40 PM Song Liu <liu.song.a23@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
> > >
> > > This introduces a new command to retrieve multiple number of entries
> > > from a bpf map, wrapping the existing bpf methods:
> > > map_get_next_key and map_lookup_elem
> > >
> > > To start dumping the map from the beginning you must specify NULL as
> > > the prev_key.
> > >
> > > The new API returns 0 when it successfully copied all the elements
> > > requested or it copied less because there weren't more elements to
> > > retrieved (i.e err == -ENOENT). In last scenario err will be masked to 0.
> > >
> > > On a successful call buf and buf_len will contain correct data and in
> > > case prev_key was provided (not for the first walk, since prev_key is
> > > NULL) it will contain the last_key copied into the prev_key which will
> > > simplify next call.
> > >
> > > Only when it can't find a single element it will return -ENOENT meaning
> > > that the map has been entirely walked. When an error is return buf,
> > > buf_len and prev_key shouldn't be read nor used.
> > >
> > > Because maps can be called from userspace and kernel code, this function
> > > can have a scenario where the next_key was found but by the time we
> > > try to retrieve the value the element is not there, in this case the
> > > function continues and tries to get a new next_key value, skipping the
> > > deleted key. If at some point the function find itself trap in a loop,
> > > it will return -EINTR.
> > >
> > > The function will try to fit as much as possible in the buf provided and
> > > will return -EINVAL if buf_len is smaller than elem_size.
> > >
> > > QUEUE and STACK maps are not supported.
> > >
> > > Note that map_dump doesn't guarantee that reading the entire table is
> > > consistent since this function is always racing with kernel and user code
> > > but the same behaviour is found when the entire table is walked using
> > > the current interfaces: map_get_next_key + map_lookup_elem.
> > > It is also important to note that with a locked map, the lock is grabbed
> > > for 1 entry at the time, meaning that the returned buf might or might not
> > > be consistent.
> > >
> > > Suggested-by: Stanislav Fomichev <sdf@google.com>
> > > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > > ---
> > > include/uapi/linux/bpf.h | 9 +++
> > > kernel/bpf/syscall.c | 117 +++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 126 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index fa1c753dcdbc7..66dab5385170d 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -106,6 +106,7 @@ enum bpf_cmd {
> > > BPF_TASK_FD_QUERY,
> > > BPF_MAP_LOOKUP_AND_DELETE_ELEM,
> > > BPF_MAP_FREEZE,
> > > + BPF_MAP_DUMP,
> > > };
> > >
> > > enum bpf_map_type {
> > > @@ -388,6 +389,14 @@ union bpf_attr {
> > > __u64 flags;
> > > };
> > >
> > > + struct { /* struct used by BPF_MAP_DUMP command */
> > > + __aligned_u64 prev_key;
> > > + __aligned_u64 buf;
> > > + __aligned_u64 buf_len; /* input/output: len of buf */
> > > + __u64 flags;
> >
> > Please add explanation of flags here.
>
> got it!
>
> > Also, we need to update the
> > comments of BPF_F_LOCK for BPF_MAP_DUMP.
>
> What do you mean? I didn't get this part.
I meant, current comment says BPF_F_LOCK is for BPF_MAP_UPDATE_ELEM.
But it is also used by BPF_MAP_LOOKUP_ELEM and BPF_MAP_DUMP. So
current comment is not accurate either.
Maybe fix it while you are on it?
>
> >
> > > + __u32 map_fd;
> > > + } dump;
> > > +
> > > struct { /* anonymous struct used by BPF_PROG_LOAD command */
> > > __u32 prog_type; /* one of enum bpf_prog_type */
> > > __u32 insn_cnt;
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index 86cdc2f7bb56e..0c35505aa219f 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -1097,6 +1097,120 @@ static int map_get_next_key(union bpf_attr *attr)
> > > return err;
> > > }
> > >
> > > +/* last field in 'union bpf_attr' used by this command */
> > > +#define BPF_MAP_DUMP_LAST_FIELD dump.map_fd
> > > +
> > > +static int map_dump(union bpf_attr *attr)
> > > +{
> > > + void __user *ukey = u64_to_user_ptr(attr->dump.prev_key);
> > > + void __user *ubuf = u64_to_user_ptr(attr->dump.buf);
> > > + u32 __user *ubuf_len = u64_to_user_ptr(attr->dump.buf_len);
> > > + int ufd = attr->dump.map_fd;
> > > + struct bpf_map *map;
> > > + void *buf, *prev_key, *key, *value;
> > > + u32 value_size, elem_size, buf_len, cp_len;
> > > + struct fd f;
> > > + int err;
> > > + bool first_key = false;
> > > +
> > > + if (CHECK_ATTR(BPF_MAP_DUMP))
> > > + return -EINVAL;
> > > +
> > > + if (attr->dump.flags & ~BPF_F_LOCK)
> > > + return -EINVAL;
> > > +
> > > + f = fdget(ufd);
> > > + map = __bpf_map_get(f);
> > > + if (IS_ERR(map))
> > > + return PTR_ERR(map);
> > > + if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
> > > + err = -EPERM;
> > > + goto err_put;
> > > + }
> > > +
> > > + if ((attr->dump.flags & BPF_F_LOCK) &&
> > > + !map_value_has_spin_lock(map)) {
> > > + err = -EINVAL;
> > > + goto err_put;
> > > + }
> >
> > We can share these lines with map_lookup_elem(). Maybe
> > add another helper function?
>
> Which are the lines you are referring to? the dump.flags? It makes
> sense so that way when a new flag is added you only need to modify
> them in one spot.
I think I misread it. attr->dump.flags is not same as attr->flags.
So never mind.
>
> > > +
> > > + if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> > > + map->map_type == BPF_MAP_TYPE_STACK) {
> > > + err = -ENOTSUPP;
> > > + goto err_put;
> > > + }
> > > +
> > > + value_size = bpf_map_value_size(map);
> > > +
> > > + err = get_user(buf_len, ubuf_len);
> > > + if (err)
> > > + goto err_put;
> > > +
> > > + elem_size = map->key_size + value_size;
> > > + if (buf_len < elem_size) {
> > > + err = -EINVAL;
> > > + goto err_put;
> > > + }
> > > +
> > > + if (ukey) {
> > > + prev_key = __bpf_copy_key(ukey, map->key_size);
> > > + if (IS_ERR(prev_key)) {
> > > + err = PTR_ERR(prev_key);
> > > + goto err_put;
> > > + }
> > > + } else {
> > > + prev_key = NULL;
> > > + first_key = true;
> > > + }
> > > +
> > > + err = -ENOMEM;
> > > + buf = kmalloc(elem_size, GFP_USER | __GFP_NOWARN);
> > > + if (!buf)
> > > + goto err_put;
> > > +
> > > + key = buf;
> > > + value = key + map->key_size;
> > > + for (cp_len = 0; cp_len + elem_size <= buf_len;) {
> > > + if (signal_pending(current)) {
> > > + err = -EINTR;
> > > + break;
> > > + }
> > > +
> > > + rcu_read_lock();
> > > + err = map->ops->map_get_next_key(map, prev_key, key);
> >
> > If prev_key is deleted before map_get_next_key(), we get the first key
> > again. This is pretty weird.
>
> Yes, I know. But note that the current scenario happens even for the
> old interface (imagine you are walking a map from userspace and you
> tried get_next_key the prev_key was removed, you will start again from
> the beginning without noticing it).
> I tried to sent a patch in the past but I was missing some context:
> before NULL was used to get the very first_key the interface relied in
> a random (non existent) key to retrieve the first_key in the map, and
> I was told what we still have to support that scenario.
BPF_MAP_DUMP is slightly different, as you may return the first key
multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
don't have to support legacy scenarios.
Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
to look up previous keys. Would something down this direction work?
Thanks,
Song
^ permalink raw reply
* Re: [PATCH bpf-next 3/6] bpf: keep bpf.h in sync with tools/
From: Andrii Nakryiko @ 2019-07-24 23:10 UTC (permalink / raw)
To: Brian Vazquez
Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
David S . Miller, Stanislav Fomichev, Willem de Bruijn,
Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190724165803.87470-4-brianvv@google.com>
On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
>
> Adds bpf_attr.dump structure to libbpf.
>
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> ---
> tools/include/uapi/linux/bpf.h | 9 +++++++++
> tools/lib/bpf/libbpf.map | 2 ++
> 2 files changed, 11 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4e455018da65f..e127f16e4e932 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -106,6 +106,7 @@ enum bpf_cmd {
> BPF_TASK_FD_QUERY,
> BPF_MAP_LOOKUP_AND_DELETE_ELEM,
> BPF_MAP_FREEZE,
> + BPF_MAP_DUMP,
> };
>
> enum bpf_map_type {
> @@ -388,6 +389,14 @@ union bpf_attr {
> __u64 flags;
> };
>
> + struct { /* struct used by BPF_MAP_DUMP command */
> + __aligned_u64 prev_key;
> + __aligned_u64 buf;
> + __aligned_u64 buf_len; /* input/output: len of buf */
> + __u64 flags;
> + __u32 map_fd;
> + } dump;
> +
> struct { /* anonymous struct used by BPF_PROG_LOAD command */
> __u32 prog_type; /* one of enum bpf_prog_type */
> __u32 insn_cnt;
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index f9d316e873d8d..cac3723d5c45c 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -183,4 +183,6 @@ LIBBPF_0.0.4 {
LIBBPF_0.0.4 is closed, this needs to go into LIBBPF_0.0.5.
> perf_buffer__new;
> perf_buffer__new_raw;
> perf_buffer__poll;
> + bpf_map_dump;
> + bpf_map_dump_flags;
As the general rule, please keep those lists of functions in alphabetical order.
> } LIBBPF_0.0.3;
> --
> 2.22.0.657.g960e92d24f-goog
>
^ permalink raw reply
* Re: BUG: spinlock recursion in release_sock
From: syzbot @ 2019-07-24 23:14 UTC (permalink / raw)
To: arvid.brodin, aviadye, borisp, daniel, davejwatson, davem,
jakub.kicinski, john.fastabend, john.hurley, linux-kernel, netdev,
simon.horman, syzkaller-bugs, willemb, xiyou.wangcong
In-Reply-To: <000000000000464b54058e722b54@google.com>
syzbot has bisected this bug to:
commit 8822e270d697010e6a4fd42a319dbefc33db91e1
Author: John Hurley <john.hurley@netronome.com>
Date: Sun Jul 7 14:01:54 2019 +0000
net: core: move push MPLS functionality from OvS to core helper
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13ca5a5c600000
start commit: 9e6dfe80 Add linux-next specific files for 20190724
git tree: linux-next
final crash: https://syzkaller.appspot.com/x/report.txt?x=102a5a5c600000
console output: https://syzkaller.appspot.com/x/log.txt?x=17ca5a5c600000
kernel config: https://syzkaller.appspot.com/x/.config?x=6cbb8fc2cf2842d7
dashboard link: https://syzkaller.appspot.com/bug?extid=e67cf584b5e6b35a8ffa
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13680594600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15b34144600000
Reported-by: syzbot+e67cf584b5e6b35a8ffa@syzkaller.appspotmail.com
Fixes: 8822e270d697 ("net: core: move push MPLS functionality from OvS to
core helper")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
^ permalink raw reply
* Re: [PATCH bpf-next 4/7] tools/bpf: sync bpf_flow_keys flags
From: Song Liu @ 2019-07-24 23:17 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-5-sdf@google.com>
On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Export bpf_flow_keys flags to tools/libbpf/selftests.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
> ---
> tools/include/uapi/linux/bpf.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4e455018da65..a0e1c891b56f 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3504,6 +3504,10 @@ enum bpf_task_fd_type {
> BPF_FD_TYPE_URETPROBE, /* filename + offset */
> };
>
> +#define FLOW_DISSECTOR_F_PARSE_1ST_FRAG (1U << 0)
> +#define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL (1U << 1)
> +#define FLOW_DISSECTOR_F_STOP_AT_ENCAP (1U << 2)
> +
> struct bpf_flow_keys {
> __u16 nhoff;
> __u16 thoff;
> @@ -3525,6 +3529,7 @@ struct bpf_flow_keys {
> __u32 ipv6_dst[4]; /* in6_addr; network order */
> };
> };
> + __u32 flags;
> };
>
> struct bpf_func_info {
> --
> 2.22.0.657.g960e92d24f-goog
>
^ permalink raw reply
* Re: [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
From: Song Liu @ 2019-07-24 23:21 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-6-sdf@google.com>
On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> bpf_flow.c: exit early unless FLOW_DISSECTOR_F_PARSE_1ST_FRAG is passed
> in flags. Also, set ip_proto earlier, this makes sure we have correct
> value with fragmented packets.
>
> Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
> tests that don't have FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.
>
> eth_get_headlen calls flow dissector with
> FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
> have different set of input flags against it.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
> .../selftests/bpf/prog_tests/flow_dissector.c | 129 ++++++++++++++++++
> tools/testing/selftests/bpf/progs/bpf_flow.c | 28 +++-
> 2 files changed, 151 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> index c938283ac232..966cb3b06870 100644
> --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> @@ -5,6 +5,10 @@
> #include <linux/if_tun.h>
> #include <sys/uio.h>
>
> +#ifndef IP_MF
> +#define IP_MF 0x2000
> +#endif
> +
> #define CHECK_FLOW_KEYS(desc, got, expected) \
> CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0, \
> desc, \
> @@ -49,6 +53,18 @@ struct ipv6_pkt {
> struct tcphdr tcp;
> } __packed;
>
> +struct ipv6_frag_pkt {
> + struct ethhdr eth;
> + struct ipv6hdr iph;
> + struct frag_hdr {
> + __u8 nexthdr;
> + __u8 reserved;
> + __be16 frag_off;
> + __be32 identification;
> + } ipf;
> + struct tcphdr tcp;
> +} __packed;
> +
> struct dvlan_ipv6_pkt {
> struct ethhdr eth;
> __u16 vlan_tci;
> @@ -65,9 +81,11 @@ struct test {
> struct ipv4_pkt ipv4;
> struct svlan_ipv4_pkt svlan_ipv4;
> struct ipv6_pkt ipv6;
> + struct ipv6_frag_pkt ipv6_frag;
> struct dvlan_ipv6_pkt dvlan_ipv6;
> } pkt;
> struct bpf_flow_keys keys;
> + __u32 flags;
> };
>
> #define VLAN_HLEN 4
> @@ -143,6 +161,102 @@ struct test tests[] = {
> .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> },
> },
> + {
> + .name = "ipv4-frag",
> + .pkt.ipv4 = {
> + .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> + .iph.ihl = 5,
> + .iph.protocol = IPPROTO_TCP,
> + .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> + .iph.frag_off = __bpf_constant_htons(IP_MF),
> + .tcp.doff = 5,
> + .tcp.source = 80,
> + .tcp.dest = 8080,
> + },
> + .keys = {
> + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> + .nhoff = ETH_HLEN,
> + .thoff = ETH_HLEN + sizeof(struct iphdr),
> + .addr_proto = ETH_P_IP,
> + .ip_proto = IPPROTO_TCP,
> + .n_proto = __bpf_constant_htons(ETH_P_IP),
> + .is_frag = true,
> + .is_first_frag = true,
> + .sport = 80,
> + .dport = 8080,
> + },
> + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> + },
> + {
> + .name = "ipv4-no-frag",
> + .pkt.ipv4 = {
> + .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> + .iph.ihl = 5,
> + .iph.protocol = IPPROTO_TCP,
> + .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> + .iph.frag_off = __bpf_constant_htons(IP_MF),
> + .tcp.doff = 5,
> + .tcp.source = 80,
> + .tcp.dest = 8080,
> + },
> + .keys = {
> + .nhoff = ETH_HLEN,
> + .thoff = ETH_HLEN + sizeof(struct iphdr),
> + .addr_proto = ETH_P_IP,
> + .ip_proto = IPPROTO_TCP,
> + .n_proto = __bpf_constant_htons(ETH_P_IP),
> + .is_frag = true,
> + .is_first_frag = true,
> + },
> + },
> + {
> + .name = "ipv6-frag",
> + .pkt.ipv6_frag = {
> + .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> + .iph.nexthdr = IPPROTO_FRAGMENT,
> + .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> + .ipf.nexthdr = IPPROTO_TCP,
> + .tcp.doff = 5,
> + .tcp.source = 80,
> + .tcp.dest = 8080,
> + },
> + .keys = {
> + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> + .nhoff = ETH_HLEN,
> + .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> + sizeof(struct frag_hdr),
> + .addr_proto = ETH_P_IPV6,
> + .ip_proto = IPPROTO_TCP,
> + .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> + .is_frag = true,
> + .is_first_frag = true,
> + .sport = 80,
> + .dport = 8080,
> + },
> + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> + },
> + {
> + .name = "ipv6-no-frag",
> + .pkt.ipv6_frag = {
> + .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> + .iph.nexthdr = IPPROTO_FRAGMENT,
> + .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> + .ipf.nexthdr = IPPROTO_TCP,
> + .tcp.doff = 5,
> + .tcp.source = 80,
> + .tcp.dest = 8080,
> + },
> + .keys = {
> + .nhoff = ETH_HLEN,
> + .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> + sizeof(struct frag_hdr),
> + .addr_proto = ETH_P_IPV6,
> + .ip_proto = IPPROTO_TCP,
> + .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> + .is_frag = true,
> + .is_first_frag = true,
> + },
> + },
> };
>
> static int create_tap(const char *ifname)
> @@ -225,6 +339,13 @@ void test_flow_dissector(void)
> .data_size_in = sizeof(tests[i].pkt),
> .data_out = &flow_keys,
> };
> + static struct bpf_flow_keys ctx = {};
> +
> + if (tests[i].flags) {
> + tattr.ctx_in = &ctx;
> + tattr.ctx_size_in = sizeof(ctx);
> + ctx.flags = tests[i].flags;
> + }
>
> err = bpf_prog_test_run_xattr(&tattr);
> CHECK_ATTR(tattr.data_size_out != sizeof(flow_keys) ||
> @@ -255,6 +376,14 @@ void test_flow_dissector(void)
> struct bpf_prog_test_run_attr tattr = {};
> __u32 key = 0;
>
> + /* Don't run tests that are not marked as
> + * FLOW_DISSECTOR_F_PARSE_1ST_FRAG; eth_get_headlen
> + * sets this flag.
> + */
> +
> + if (tests[i].flags != FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
> + continue;
Maybe test flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG == 0 instead?
It is not necessary now, but might be useful in the future.
Thanks,
Song
^ permalink raw reply
* Re: [PATCH v4 net-next 08/19] ionic: Add notifyq support
From: Saeed Mahameed @ 2019-07-24 23:21 UTC (permalink / raw)
To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-9-snelson@pensando.io>
On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> The AdminQ is fine for sending messages and requests to the NIC,
> but we also need to have events published from the NIC to the
> driver. The NotifyQ handles this for us, using the same interrupt
> as AdminQ.
>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
> .../ethernet/pensando/ionic/ionic_debugfs.c | 16 ++
> .../net/ethernet/pensando/ionic/ionic_lif.c | 181
> +++++++++++++++++-
> .../net/ethernet/pensando/ionic/ionic_lif.h | 4 +
> 3 files changed, 200 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
> b/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
> index 9af15c69b2a6..1d05b23de303 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
> @@ -126,6 +126,7 @@ int ionic_debugfs_add_qcq(struct lif *lif, struct
> qcq *qcq)
> struct debugfs_blob_wrapper *desc_blob;
> struct device *dev = lif->ionic->dev;
> struct intr *intr = &qcq->intr;
> + struct dentry *stats_dentry;
> struct queue *q = &qcq->q;
> struct cq *cq = &qcq->cq;
>
> @@ -219,6 +220,21 @@ int ionic_debugfs_add_qcq(struct lif *lif,
> struct qcq *qcq)
> intr_ctrl_regset);
> }
>
> + if (qcq->flags & QCQ_F_NOTIFYQ) {
> + stats_dentry = debugfs_create_dir("notifyblock",
> qcq_dentry);
> + if (IS_ERR_OR_NULL(stats_dentry))
> + return PTR_ERR(stats_dentry);
> +
> + debugfs_create_u64("eid", 0400, stats_dentry,
> + (u64 *)&lif->info->status.eid);
> + debugfs_create_u16("link_status", 0400, stats_dentry,
> + (u16 *)&lif->info-
> >status.link_status);
> + debugfs_create_u32("link_speed", 0400, stats_dentry,
> + (u32 *)&lif->info-
> >status.link_speed);
> + debugfs_create_u16("link_down_count", 0400,
> stats_dentry,
> + (u16 *)&lif->info-
> >status.link_down_count);
> + }
> +
you never write to these lif->info->status.xyz ..
and link state and speed are/should be available in "ethtool <ifname>"
so this looks redundant to me. you can also use ethtool -S to report
linkdown count.
> return 0;
> }
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> index 19c046502a26..01f9665611d4 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> @@ -12,6 +12,8 @@
> #include "ionic_lif.h"
> #include "ionic_debugfs.h"
>
> +static int ionic_notifyq_clean(struct lif *lif, int budget);
> +
> static bool ionic_adminq_service(struct cq *cq, struct cq_info
> *cq_info)
> {
> struct admin_comp *comp = cq_info->cq_desc;
> @@ -26,7 +28,90 @@ static bool ionic_adminq_service(struct cq *cq,
> struct cq_info *cq_info)
>
> static int ionic_adminq_napi(struct napi_struct *napi, int budget)
> {
> - return ionic_napi(napi, budget, ionic_adminq_service, NULL,
> NULL);
> + struct lif *lif = napi_to_cq(napi)->lif;
> + int n_work = 0;
> + int a_work = 0;
> +
> + if (likely(lif->notifyqcq && lif->notifyqcq->flags &
> QCQ_F_INITED))
> + n_work = ionic_notifyq_clean(lif, budget);
> + a_work = ionic_napi(napi, budget, ionic_adminq_service, NULL,
> NULL);
> +
> + return max(n_work, a_work);
> +}
> +
> +static bool ionic_notifyq_service(struct cq *cq, struct cq_info
> *cq_info)
> +{
> + union notifyq_comp *comp = cq_info->cq_desc;
> + struct net_device *netdev;
> + struct queue *q;
> + struct lif *lif;
> + u64 eid;
> +
> + q = cq->bound_q;
> + lif = q->info[0].cb_arg;
> + netdev = lif->netdev;
> + eid = le64_to_cpu(comp->event.eid);
> +
> + /* Have we run out of new completions to process? */
> + if (eid <= lif->last_eid)
> + return false;
> +
> + lif->last_eid = eid;
> +
> + dev_dbg(lif->ionic->dev, "notifyq event:\n");
> + dynamic_hex_dump("event ", DUMP_PREFIX_OFFSET, 16, 1,
> + comp, sizeof(*comp), true);
> +
> + switch (le16_to_cpu(comp->event.ecode)) {
> + case EVENT_OPCODE_LINK_CHANGE:
> + netdev_info(netdev, "Notifyq EVENT_OPCODE_LINK_CHANGE
> eid=%lld\n",
> + eid);
> + netdev_info(netdev,
> + " link_status=%d link_speed=%d\n",
> + le16_to_cpu(comp->link_change.link_status),
> + le32_to_cpu(comp->link_change.link_speed));
> + break;
> + case EVENT_OPCODE_RESET:
> + netdev_info(netdev, "Notifyq EVENT_OPCODE_RESET
> eid=%lld\n",
> + eid);
> + netdev_info(netdev, " reset_code=%d state=%d\n",
> + comp->reset.reset_code,
> + comp->reset.state);
> + break;
> + case EVENT_OPCODE_LOG:
> + netdev_info(netdev, "Notifyq EVENT_OPCODE_LOG
> eid=%lld\n", eid);
> + print_hex_dump(KERN_INFO, "notifyq ",
> DUMP_PREFIX_OFFSET, 16, 1,
> + comp->log.data, sizeof(comp->log.data),
> true);
So your device can generate log buffer dump into the kernel log ..
I am not sure how acceptable this is, maybe trace buffer is more
appropriate for this.
> + break;
> + default:
> + netdev_warn(netdev, "Notifyq unknown event ecode=%d
> eid=%lld\n",
> + comp->event.ecode, eid);
> + break;
> + }
> +
> + return true;
> +}
> +
> +static int ionic_notifyq_clean(struct lif *lif, int budget)
> +{
> + struct ionic_dev *idev = &lif->ionic->idev;
> + struct cq *cq = &lif->notifyqcq->cq;
> + u32 work_done;
> +
> + work_done = ionic_cq_service(cq, budget, ionic_notifyq_service,
> + NULL, NULL);
> + if (work_done)
> + ionic_intr_credits(idev->intr_ctrl, cq->bound_intr-
> >index,
> + work_done,
> IONIC_INTR_CRED_RESET_COALESCE);
> +
> + /* If we ran out of budget, there are more events
> + * to process and napi will reschedule us soon
> + */
> + if (work_done == budget)
> + goto return_to_napi;
> +
> +return_to_napi:
> + return work_done;
> }
>
> static irqreturn_t ionic_isr(int irq, void *data)
> @@ -62,6 +147,17 @@ static void ionic_intr_free(struct lif *lif, int
> index)
> clear_bit(index, lif->ionic->intrs);
> }
>
> +static void ionic_link_qcq_interrupts(struct qcq *src_qcq, struct
> qcq *n_qcq)
> +{
> + if (WARN_ON(n_qcq->flags & QCQ_F_INTR)) {
> + ionic_intr_free(n_qcq->cq.lif, n_qcq->intr.index);
> + n_qcq->flags &= ~QCQ_F_INTR;
> + }
> +
> + n_qcq->intr.vector = src_qcq->intr.vector;
> + n_qcq->intr.index = src_qcq->intr.index;
> +}
> +
> static int ionic_qcq_alloc(struct lif *lif, unsigned int type,
> unsigned int index,
> const char *name, unsigned int flags,
> @@ -236,11 +332,36 @@ static int ionic_qcqs_alloc(struct lif *lif)
> if (err)
> return err;
>
> + if (lif->ionic->nnqs_per_lif) {
> + flags = QCQ_F_NOTIFYQ;
> + err = ionic_qcq_alloc(lif, IONIC_QTYPE_NOTIFYQ, 0,
> "notifyq",
> + flags, IONIC_NOTIFYQ_LENGTH,
> + sizeof(struct notifyq_cmd),
> + sizeof(union notifyq_comp),
> + 0, lif->kern_pid, &lif-
> >notifyqcq);
> + if (err)
> + goto err_out_free_adminqcq;
> +
> + /* Let the notifyq ride on the adminq interrupt */
> + ionic_link_qcq_interrupts(lif->adminqcq, lif-
> >notifyqcq);
> + }
> +
> return 0;
> +
> +err_out_free_adminqcq:
> + ionic_qcq_free(lif, lif->adminqcq);
> + lif->adminqcq = NULL;
> +
> + return err;
> }
>
> static void ionic_qcqs_free(struct lif *lif)
> {
> + if (lif->notifyqcq) {
> + ionic_qcq_free(lif, lif->notifyqcq);
> + lif->notifyqcq = NULL;
> + }
> +
> if (lif->adminqcq) {
> ionic_qcq_free(lif, lif->adminqcq);
> lif->adminqcq = NULL;
> @@ -400,6 +521,7 @@ static void ionic_lif_deinit(struct lif *lif)
> clear_bit(LIF_INITED, lif->state);
>
> napi_disable(&lif->adminqcq->napi);
> + ionic_lif_qcq_deinit(lif, lif->notifyqcq);
> ionic_lif_qcq_deinit(lif, lif->adminqcq);
>
> ionic_lif_reset(lif);
> @@ -484,6 +606,55 @@ static int ionic_lif_adminq_init(struct lif
> *lif)
> return 0;
> }
>
> +static int ionic_lif_notifyq_init(struct lif *lif)
> +{
> + struct device *dev = lif->ionic->dev;
> + struct qcq *qcq = lif->notifyqcq;
> + struct queue *q = &qcq->q;
> + int err;
> +
> + struct ionic_admin_ctx ctx = {
> + .work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> + .cmd.q_init = {
> + .opcode = CMD_OPCODE_Q_INIT,
> + .lif_index = cpu_to_le16(lif->index),
> + .type = q->type,
> + .index = cpu_to_le32(q->index),
> + .flags = cpu_to_le16(IONIC_QINIT_F_IRQ |
> + IONIC_QINIT_F_ENA),
> + .intr_index = cpu_to_le16(lif->adminqcq-
> >intr.index),
> + .pid = cpu_to_le16(q->pid),
> + .ring_size = ilog2(q->num_descs),
> + .ring_base = cpu_to_le64(q->base_pa),
> + }
> + };
> +
> + dev_dbg(dev, "notifyq_init.pid %d\n", ctx.cmd.q_init.pid);
> + dev_dbg(dev, "notifyq_init.index %d\n", ctx.cmd.q_init.index);
> + dev_dbg(dev, "notifyq_init.ring_base 0x%llx\n",
> ctx.cmd.q_init.ring_base);
> + dev_dbg(dev, "notifyq_init.ring_size %d\n",
> ctx.cmd.q_init.ring_size);
> +
> + err = ionic_adminq_post_wait(lif, &ctx);
> + if (err)
> + return err;
> +
> + q->hw_type = ctx.comp.q_init.hw_type;
> + q->hw_index = le32_to_cpu(ctx.comp.q_init.hw_index);
> + q->dbval = IONIC_DBELL_QID(q->hw_index);
> +
> + dev_dbg(dev, "notifyq->hw_type %d\n", q->hw_type);
> + dev_dbg(dev, "notifyq->hw_index %d\n", q->hw_index);
> +
> + /* preset the callback info */
> + q->info[0].cb_arg = lif;
> +
> + qcq->flags |= QCQ_F_INITED;
> +
> + ionic_debugfs_add_qcq(lif, qcq);
> +
> + return 0;
> +}
> +
> static int ionic_lif_init(struct lif *lif)
> {
> struct ionic_dev *idev = &lif->ionic->idev;
> @@ -534,10 +705,18 @@ static int ionic_lif_init(struct lif *lif)
> if (err)
> goto err_out_adminq_deinit;
>
> + if (lif->ionic->nnqs_per_lif) {
> + err = ionic_lif_notifyq_init(lif);
> + if (err)
> + goto err_out_notifyq_deinit;
> + }
> +
> set_bit(LIF_INITED, lif->state);
>
> return 0;
>
> +err_out_notifyq_deinit:
> + ionic_lif_qcq_deinit(lif, lif->notifyqcq);
> err_out_adminq_deinit:
> ionic_lif_qcq_deinit(lif, lif->adminqcq);
> ionic_lif_reset(lif);
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> index 28ab92b43a64..80eec0778f40 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> @@ -7,6 +7,7 @@
> #include <linux/pci.h>
>
> #define IONIC_ADMINQ_LENGTH 16 /* must be a power of two */
> +#define IONIC_NOTIFYQ_LENGTH 64 /* must be a power of two */
>
> #define GET_NAPI_CNTR_IDX(work_done) (work_done)
> #define MAX_NUM_NAPI_CNTR (NAPI_POLL_WEIGHT + 1)
> @@ -26,6 +27,7 @@ struct rx_stats {
> #define QCQ_F_INITED BIT(0)
> #define QCQ_F_SG BIT(1)
> #define QCQ_F_INTR BIT(2)
> +#define QCQ_F_NOTIFYQ BIT(5)
>
> struct napi_stats {
> u64 poll_count;
> @@ -78,6 +80,8 @@ struct lif {
> u64 __iomem *kern_dbpage;
> spinlock_t adminq_lock; /* lock for AdminQ operations
> */
> struct qcq *adminqcq;
> + struct qcq *notifyqcq;
> + u64 last_eid;
> unsigned int neqs;
> unsigned int nxqs;
>
^ permalink raw reply
* Re: [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-24 23:23 UTC (permalink / raw)
To: Christoph Hellwig, john.hubbard
Cc: Andrew Morton, Alexander Viro, Anna Schumaker, David S . Miller,
Dominique Martinet, Eric Van Hensbergen, Jason Gunthorpe,
Jason Wang, Jens Axboe, Latchesar Ionkov, Michael S . Tsirkin,
Miklos Szeredi, Trond Myklebust, Christoph Hellwig,
Matthew Wilcox, linux-mm, LKML, ceph-devel, kvm, linux-block,
linux-cifs, linux-fsdevel, linux-nfs, linux-rdma, netdev,
samba-technical, v9fs-developer, virtualization
In-Reply-To: <20190724061750.GA19397@infradead.org>
On 7/23/19 11:17 PM, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 09:25:06PM -0700, john.hubbard@gmail.com wrote:
>> * Store, in the iov_iter, a "came from gup (get_user_pages)" parameter.
>> Then, use the new iov_iter_get_pages_use_gup() to retrieve it when
>> it is time to release the pages. That allows choosing between put_page()
>> and put_user_page*().
>>
>> * Pass in one more piece of information to bio_release_pages: a "from_gup"
>> parameter. Similar use as above.
>>
>> * Change the block layer, and several file systems, to use
>> put_user_page*().
>
> I think we can do this in a simple and better way. We have 5 ITER_*
> types. Of those ITER_DISCARD as the name suggests never uses pages, so
> we can skip handling it. ITER_PIPE is rejected іn the direct I/O path,
> which leaves us with three.
>
> Out of those ITER_BVEC needs a user page reference, so we want to call
^ ITER_IOVEC, I hope. Otherwise I'm hopeless lost. :)
> put_user_page* on it. ITER_BVEC always already has page reference,
> which means in the block direct I/O path path we alread don't take
> a page reference. We should extent that handling to all other calls
> of iov_iter_get_pages / iov_iter_get_pages_alloc. I think we should
> just reject ITER_KVEC for direct I/O as well as we have no users and
> it is rather pointless. Alternatively if we see a use for it the
> callers should always have a life page reference anyway (or might
> be on kmalloc memory), so we really should not take a reference either.
>
> In other words: the only time we should ever have to put a page in
> this patch is when they are user pages. We'll need to clean up
> various bits of code for that, but that can be done gradually before
> even getting to the actual put_user_pages conversion.
>
Sounds great. I'm part way into it and it doesn't look too bad. The main
question is where to scatter various checks and assertions, to keep
the kvecs out of direct I/0. Or at least keep the gups away from
direct I/0.
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply
* Re: [PATCH bpf-next 6/7] bpf/flow_dissector: support ipv6 flow_label and FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL
From: Song Liu @ 2019-07-24 23:28 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-7-sdf@google.com>
On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Add support for exporting ipv6 flow label via bpf_flow_keys.
> Export flow label from bpf_flow.c and also return early when
> FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL is passed.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
^ permalink raw reply
* Re: [PATCH bpf-next 7/7] selftests/bpf: support FLOW_DISSECTOR_F_STOP_AT_ENCAP
From: Song Liu @ 2019-07-24 23:29 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Networking, bpf, David S . Miller, Alexei Starovoitov,
Daniel Borkmann, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190724170018.96659-8-sdf@google.com>
On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Exit as soon as we found that packet is encapped when
> FLOW_DISSECTOR_F_STOP_AT_ENCAP is passed.
> Add appropriate selftest cases.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Petar Penkov <ppenkov@google.com>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
> ---
> .../selftests/bpf/prog_tests/flow_dissector.c | 60 +++++++++++++++++++
> tools/testing/selftests/bpf/progs/bpf_flow.c | 8 +++
> 2 files changed, 68 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> index 1ea921c4cdc0..e382264fbc40 100644
> --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> @@ -41,6 +41,13 @@ struct ipv4_pkt {
> struct tcphdr tcp;
> } __packed;
>
> +struct ipip_pkt {
> + struct ethhdr eth;
> + struct iphdr iph;
> + struct iphdr iph_inner;
> + struct tcphdr tcp;
> +} __packed;
> +
> struct svlan_ipv4_pkt {
> struct ethhdr eth;
> __u16 vlan_tci;
> @@ -82,6 +89,7 @@ struct test {
> union {
> struct ipv4_pkt ipv4;
> struct svlan_ipv4_pkt svlan_ipv4;
> + struct ipip_pkt ipip;
> struct ipv6_pkt ipv6;
> struct ipv6_frag_pkt ipv6_frag;
> struct dvlan_ipv6_pkt dvlan_ipv6;
> @@ -303,6 +311,58 @@ struct test tests[] = {
> },
> .flags = FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL,
> },
> + {
> + .name = "ipip-encap",
> + .pkt.ipip = {
> + .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> + .iph.ihl = 5,
> + .iph.protocol = IPPROTO_IPIP,
> + .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> + .iph_inner.ihl = 5,
> + .iph_inner.protocol = IPPROTO_TCP,
> + .iph_inner.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> + .tcp.doff = 5,
> + .tcp.source = 80,
> + .tcp.dest = 8080,
> + },
> + .keys = {
> + .nhoff = 0,
> + .nhoff = ETH_HLEN,
> + .thoff = ETH_HLEN + sizeof(struct iphdr) +
> + sizeof(struct iphdr),
> + .addr_proto = ETH_P_IP,
> + .ip_proto = IPPROTO_TCP,
> + .n_proto = __bpf_constant_htons(ETH_P_IP),
> + .is_encap = true,
> + .sport = 80,
> + .dport = 8080,
> + },
> + },
> + {
> + .name = "ipip-no-encap",
> + .pkt.ipip = {
> + .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> + .iph.ihl = 5,
> + .iph.protocol = IPPROTO_IPIP,
> + .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> + .iph_inner.ihl = 5,
> + .iph_inner.protocol = IPPROTO_TCP,
> + .iph_inner.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> + .tcp.doff = 5,
> + .tcp.source = 80,
> + .tcp.dest = 8080,
> + },
> + .keys = {
> + .flags = FLOW_DISSECTOR_F_STOP_AT_ENCAP,
> + .nhoff = ETH_HLEN,
> + .thoff = ETH_HLEN + sizeof(struct iphdr),
> + .addr_proto = ETH_P_IP,
> + .ip_proto = IPPROTO_IPIP,
> + .n_proto = __bpf_constant_htons(ETH_P_IP),
> + .is_encap = true,
> + },
> + .flags = FLOW_DISSECTOR_F_STOP_AT_ENCAP,
> + },
> };
>
> static int create_tap(const char *ifname)
> diff --git a/tools/testing/selftests/bpf/progs/bpf_flow.c b/tools/testing/selftests/bpf/progs/bpf_flow.c
> index 7d73b7bfe609..b6236cdf8564 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_flow.c
> +++ b/tools/testing/selftests/bpf/progs/bpf_flow.c
> @@ -167,9 +167,15 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
> return export_flow_keys(keys, BPF_OK);
> case IPPROTO_IPIP:
> keys->is_encap = true;
> + if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
> + return export_flow_keys(keys, BPF_OK);
> +
> return parse_eth_proto(skb, bpf_htons(ETH_P_IP));
> case IPPROTO_IPV6:
> keys->is_encap = true;
> + if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
> + return export_flow_keys(keys, BPF_OK);
> +
> return parse_eth_proto(skb, bpf_htons(ETH_P_IPV6));
> case IPPROTO_GRE:
> gre = bpf_flow_dissect_get_header(skb, sizeof(*gre), &_gre);
> @@ -189,6 +195,8 @@ static __always_inline int parse_ip_proto(struct __sk_buff *skb, __u8 proto)
> keys->thoff += 4; /* Step over sequence number */
>
> keys->is_encap = true;
> + if (keys->flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
> + return export_flow_keys(keys, BPF_OK);
>
> if (gre->proto == bpf_htons(ETH_P_TEB)) {
> eth = bpf_flow_dissect_get_header(skb, sizeof(*eth),
> --
> 2.22.0.657.g960e92d24f-goog
>
^ permalink raw reply
* Re: [PATCH v4 net-next 09/19] ionic: Add the basic NDO callbacks for netdev support
From: Saeed Mahameed @ 2019-07-24 23:45 UTC (permalink / raw)
To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-10-snelson@pensando.io>
On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> Set up the initial NDO structure and callbacks for netdev
> to use, and register the netdev. This will allow us to do
> a few basic operations on the device, but no traffic yet.
>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
> drivers/net/ethernet/pensando/ionic/ionic.h | 1 +
> .../ethernet/pensando/ionic/ionic_bus_pci.c | 9 +
> .../net/ethernet/pensando/ionic/ionic_dev.h | 2 +
> .../net/ethernet/pensando/ionic/ionic_lif.c | 348
> ++++++++++++++++++
> .../net/ethernet/pensando/ionic/ionic_lif.h | 5 +
> 5 files changed, 365 insertions(+)
>
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic.h
> b/drivers/net/ethernet/pensando/ionic/ionic.h
> index 87ab13aee89e..d7eee79b2a10 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic.h
> @@ -34,6 +34,7 @@ struct ionic {
> unsigned int num_bars;
> struct identity ident;
> struct list_head lifs;
> + struct lif *master_lif;
> unsigned int nnqs_per_lif;
> unsigned int neqs_per_lif;
> unsigned int ntxqs_per_lif;
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
> b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
> index 59d1ae7ce532..98c12b770c7f 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
> @@ -206,8 +206,16 @@ static int ionic_probe(struct pci_dev *pdev,
> const struct pci_device_id *ent)
> goto err_out_free_lifs;
> }
>
> + err = ionic_lifs_register(ionic);
> + if (err) {
> + dev_err(dev, "Cannot register LIFs: %d, aborting\n",
> err);
> + goto err_out_deinit_lifs;
> + }
> +
> return 0;
>
> +err_out_deinit_lifs:
> + ionic_lifs_deinit(ionic);
> err_out_free_lifs:
> ionic_lifs_free(ionic);
> err_out_free_irqs:
> @@ -239,6 +247,7 @@ static void ionic_remove(struct pci_dev *pdev)
> struct ionic *ionic = pci_get_drvdata(pdev);
>
> if (ionic) {
> + ionic_lifs_unregister(ionic);
> ionic_lifs_deinit(ionic);
> ionic_lifs_free(ionic);
> ionic_bus_free_irq_vectors(ionic);
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> index 8bd1501dd639..523927566925 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.h
> @@ -10,6 +10,8 @@
> #include "ionic_if.h"
> #include "ionic_regs.h"
>
> +#define IONIC_MIN_MTU ETH_MIN_MTU
> +#define IONIC_MAX_MTU 9194
> #define IONIC_LIFS_MAX 1024
>
> struct ionic_dev_bar {
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> index 01f9665611d4..005b1d908fa1 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> @@ -12,8 +12,74 @@
> #include "ionic_lif.h"
> #include "ionic_debugfs.h"
>
> +static int ionic_set_nic_features(struct lif *lif, netdev_features_t
> features);
> static int ionic_notifyq_clean(struct lif *lif, int budget);
>
> +int ionic_open(struct net_device *netdev)
> +{
> + struct lif *lif = netdev_priv(netdev);
> +
> + netif_carrier_off(netdev);
> +
> + set_bit(LIF_UP, lif->state);
> +
> + if (netif_carrier_ok(netdev))
always false ? you just invoked netif_carrier_off two lines ago..
> + netif_tx_wake_all_queues(netdev);
> +
> + return 0;
> +}
> +
> +static int ionic_lif_stop(struct lif *lif)
> +{
> + struct net_device *ndev = lif->netdev;
> + int err = 0;
> +
> + if (!test_bit(LIF_UP, lif->state)) {
> + dev_dbg(lif->ionic->dev, "%s: %s state=DOWN\n",
> + __func__, lif->name);
> + return 0;
> + }
> + dev_dbg(lif->ionic->dev, "%s: %s state=UP\n", __func__, lif-
> >name);
> + clear_bit(LIF_UP, lif->state);
> +
> + /* carrier off before disabling queues to avoid watchdog
> timeout */
> + netif_carrier_off(ndev);
> + netif_tx_stop_all_queues(ndev);
> + netif_tx_disable(ndev);
> + synchronize_rcu();
why synchronize_rcu ?
> +
> + return err;
> +}
> +
> +int ionic_stop(struct net_device *netdev)
> +{
> + struct lif *lif = netdev_priv(netdev);
> +
> + return ionic_lif_stop(lif);
> +}
> +
> +int ionic_reset_queues(struct lif *lif)
> +{
> + bool running;
> + int err = 0;
> +
> + /* Put off the next watchdog timeout */
> + netif_trans_update(lif->netdev);
this doesn't seem right to me also this won't help you if the next
while loop takes too long.. also netif_trans_update is marked to be
only used for legacy drivers.
> +
> + while (test_and_set_bit(LIF_QUEUE_RESET, lif->state))
> + usleep_range(100, 200);
> +
> + running = netif_running(lif->netdev);
> + if (running)
> + err = ionic_stop(lif->netdev);
> + if (!err && running)
> + ionic_open(lif->netdev);
> +
> + clear_bit(LIF_QUEUE_RESET, lif->state);
> +
> + return err;
> +}
> +
> static bool ionic_adminq_service(struct cq *cq, struct cq_info
> *cq_info)
> {
> struct admin_comp *comp = cq_info->cq_desc;
> @@ -114,6 +180,81 @@ static int ionic_notifyq_clean(struct lif *lif,
> int budget)
> return work_done;
> }
>
> +static int ionic_set_features(struct net_device *netdev,
> + netdev_features_t features)
> +{
> + struct lif *lif = netdev_priv(netdev);
> + int err;
> +
> + netdev_dbg(netdev, "%s: lif->features=0x%08llx
> new_features=0x%08llx\n",
> + __func__, (u64)lif->netdev->features,
> (u64)features);
> +
> + err = ionic_set_nic_features(lif, features);
> +
> + return err;
> +}
> +
> +static int ionic_set_mac_address(struct net_device *netdev, void
> *sa)
> +{
> + netdev_info(netdev, "%s: stubbed\n", __func__);
> + return 0;
> +}
> +
> +static int ionic_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> + struct lif *lif = netdev_priv(netdev);
> + struct ionic_admin_ctx ctx = {
> + .work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> + .cmd.lif_setattr = {
> + .opcode = CMD_OPCODE_LIF_SETATTR,
> + .index = cpu_to_le16(lif->index),
> + .attr = IONIC_LIF_ATTR_MTU,
> + .mtu = cpu_to_le32(new_mtu),
> + },
> + };
> + int err;
> +
> + err = ionic_adminq_post_wait(lif, &ctx);
> + if (err)
> + return err;
> +
> + netdev->mtu = new_mtu;
> + err = ionic_reset_queues(lif);
> +
> + return err;
> +}
> +
> +static void ionic_tx_timeout(struct net_device *netdev)
> +{
> + netdev_info(netdev, "%s: stubbed\n", __func__);
> +}
> +
> +static int ionic_vlan_rx_add_vid(struct net_device *netdev, __be16
> proto,
> + u16 vid)
> +{
> + netdev_info(netdev, "%s: stubbed\n", __func__);
> + return 0;
> +}
> +
> +static int ionic_vlan_rx_kill_vid(struct net_device *netdev, __be16
> proto,
> + u16 vid)
> +{
> + netdev_info(netdev, "%s: stubbed\n", __func__);
> + return 0;
> +}
> +
> +static const struct net_device_ops ionic_netdev_ops = {
> + .ndo_open = ionic_open,
> + .ndo_stop = ionic_stop,
> + .ndo_set_features = ionic_set_features,
> + .ndo_set_mac_address = ionic_set_mac_address,
> + .ndo_validate_addr = eth_validate_addr,
> + .ndo_tx_timeout = ionic_tx_timeout,
> + .ndo_change_mtu = ionic_change_mtu,
> + .ndo_vlan_rx_add_vid = ionic_vlan_rx_add_vid,
> + .ndo_vlan_rx_kill_vid = ionic_vlan_rx_kill_vid,
> +};
> +
> static irqreturn_t ionic_isr(int irq, void *data)
> {
> struct napi_struct *napi = data;
> @@ -388,6 +529,12 @@ static struct lif *ionic_lif_alloc(struct ionic
> *ionic, unsigned int index)
>
> lif = netdev_priv(netdev);
> lif->netdev = netdev;
> + ionic->master_lif = lif;
> + netdev->netdev_ops = &ionic_netdev_ops;
> +
> + netdev->watchdog_timeo = 2 * HZ;
> + netdev->min_mtu = IONIC_MIN_MTU;
> + netdev->max_mtu = IONIC_MAX_MTU;
>
> lif->neqs = ionic->neqs_per_lif;
> lif->nxqs = ionic->ntxqs_per_lif;
> @@ -655,6 +802,177 @@ static int ionic_lif_notifyq_init(struct lif
> *lif)
> return 0;
> }
>
> +static __le64 ionic_netdev_features_to_nic(netdev_features_t
> features)
> +{
> + u64 wanted = 0;
> +
> + if (features & NETIF_F_HW_VLAN_CTAG_TX)
> + wanted |= ETH_HW_VLAN_TX_TAG;
> + if (features & NETIF_F_HW_VLAN_CTAG_RX)
> + wanted |= ETH_HW_VLAN_RX_STRIP;
> + if (features & NETIF_F_HW_VLAN_CTAG_FILTER)
> + wanted |= ETH_HW_VLAN_RX_FILTER;
> + if (features & NETIF_F_RXHASH)
> + wanted |= ETH_HW_RX_HASH;
> + if (features & NETIF_F_RXCSUM)
> + wanted |= ETH_HW_RX_CSUM;
> + if (features & NETIF_F_SG)
> + wanted |= ETH_HW_TX_SG;
> + if (features & NETIF_F_HW_CSUM)
> + wanted |= ETH_HW_TX_CSUM;
> + if (features & NETIF_F_TSO)
> + wanted |= ETH_HW_TSO;
> + if (features & NETIF_F_TSO6)
> + wanted |= ETH_HW_TSO_IPV6;
> + if (features & NETIF_F_TSO_ECN)
> + wanted |= ETH_HW_TSO_ECN;
> + if (features & NETIF_F_GSO_GRE)
> + wanted |= ETH_HW_TSO_GRE;
> + if (features & NETIF_F_GSO_GRE_CSUM)
> + wanted |= ETH_HW_TSO_GRE_CSUM;
> + if (features & NETIF_F_GSO_IPXIP4)
> + wanted |= ETH_HW_TSO_IPXIP4;
> + if (features & NETIF_F_GSO_IPXIP6)
> + wanted |= ETH_HW_TSO_IPXIP6;
> + if (features & NETIF_F_GSO_UDP_TUNNEL)
> + wanted |= ETH_HW_TSO_UDP;
> + if (features & NETIF_F_GSO_UDP_TUNNEL_CSUM)
> + wanted |= ETH_HW_TSO_UDP_CSUM;
> +
> + return cpu_to_le64(wanted);
> +}
> +
> +static int ionic_set_nic_features(struct lif *lif, netdev_features_t
> features)
> +{
> + struct device *dev = lif->ionic->dev;
> + struct ionic_admin_ctx ctx = {
> + .work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> + .cmd.lif_setattr = {
> + .opcode = CMD_OPCODE_LIF_SETATTR,
> + .index = cpu_to_le16(lif->index),
> + .attr = IONIC_LIF_ATTR_FEATURES,
> + },
> + };
> + u64 vlan_flags = ETH_HW_VLAN_TX_TAG |
> + ETH_HW_VLAN_RX_STRIP |
> + ETH_HW_VLAN_RX_FILTER;
> + int err;
> +
> + ctx.cmd.lif_setattr.features =
> ionic_netdev_features_to_nic(features);
> + err = ionic_adminq_post_wait(lif, &ctx);
> + if (err)
> + return err;
> +
> + lif->hw_features = le64_to_cpu(ctx.cmd.lif_setattr.features &
> + ctx.comp.lif_setattr.features);
> +
> + if ((vlan_flags & features) &&
> + !(vlan_flags & le64_to_cpu(ctx.comp.lif_setattr.features)))
> + dev_info_once(lif->ionic->dev, "NIC is not supporting
> vlan offload, likely in SmartNIC mode\n");
> +
> + if (lif->hw_features & ETH_HW_VLAN_TX_TAG)
> + dev_dbg(dev, "feature ETH_HW_VLAN_TX_TAG\n");
> + if (lif->hw_features & ETH_HW_VLAN_RX_STRIP)
> + dev_dbg(dev, "feature ETH_HW_VLAN_RX_STRIP\n");
> + if (lif->hw_features & ETH_HW_VLAN_RX_FILTER)
> + dev_dbg(dev, "feature ETH_HW_VLAN_RX_FILTER\n");
> + if (lif->hw_features & ETH_HW_RX_HASH)
> + dev_dbg(dev, "feature ETH_HW_RX_HASH\n");
> + if (lif->hw_features & ETH_HW_TX_SG)
> + dev_dbg(dev, "feature ETH_HW_TX_SG\n");
> + if (lif->hw_features & ETH_HW_TX_CSUM)
> + dev_dbg(dev, "feature ETH_HW_TX_CSUM\n");
> + if (lif->hw_features & ETH_HW_RX_CSUM)
> + dev_dbg(dev, "feature ETH_HW_RX_CSUM\n");
> + if (lif->hw_features & ETH_HW_TSO)
> + dev_dbg(dev, "feature ETH_HW_TSO\n");
> + if (lif->hw_features & ETH_HW_TSO_IPV6)
> + dev_dbg(dev, "feature ETH_HW_TSO_IPV6\n");
> + if (lif->hw_features & ETH_HW_TSO_ECN)
> + dev_dbg(dev, "feature ETH_HW_TSO_ECN\n");
> + if (lif->hw_features & ETH_HW_TSO_GRE)
> + dev_dbg(dev, "feature ETH_HW_TSO_GRE\n");
> + if (lif->hw_features & ETH_HW_TSO_GRE_CSUM)
> + dev_dbg(dev, "feature ETH_HW_TSO_GRE_CSUM\n");
> + if (lif->hw_features & ETH_HW_TSO_IPXIP4)
> + dev_dbg(dev, "feature ETH_HW_TSO_IPXIP4\n");
> + if (lif->hw_features & ETH_HW_TSO_IPXIP6)
> + dev_dbg(dev, "feature ETH_HW_TSO_IPXIP6\n");
> + if (lif->hw_features & ETH_HW_TSO_UDP)
> + dev_dbg(dev, "feature ETH_HW_TSO_UDP\n");
> + if (lif->hw_features & ETH_HW_TSO_UDP_CSUM)
> + dev_dbg(dev, "feature ETH_HW_TSO_UDP_CSUM\n");
> +
> + return 0;
> +}
> +
> +static int ionic_init_nic_features(struct lif *lif)
> +{
> + struct net_device *netdev = lif->netdev;
> + netdev_features_t features;
> + int err;
> +
> + /* set up what we expect to support by default */
> + features = NETIF_F_HW_VLAN_CTAG_TX |
> + NETIF_F_HW_VLAN_CTAG_RX |
> + NETIF_F_HW_VLAN_CTAG_FILTER |
> + NETIF_F_RXHASH |
> + NETIF_F_SG |
> + NETIF_F_HW_CSUM |
> + NETIF_F_RXCSUM |
> + NETIF_F_TSO |
> + NETIF_F_TSO6 |
> + NETIF_F_TSO_ECN;
> +
> + err = ionic_set_nic_features(lif, features);
> + if (err)
> + return err;
> +
> + /* tell the netdev what we actually can support */
> + netdev->features |= NETIF_F_HIGHDMA;
> +
> + if (lif->hw_features & ETH_HW_VLAN_TX_TAG)
> + netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX;
> + if (lif->hw_features & ETH_HW_VLAN_RX_STRIP)
> + netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX;
> + if (lif->hw_features & ETH_HW_VLAN_RX_FILTER)
> + netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
> + if (lif->hw_features & ETH_HW_RX_HASH)
> + netdev->hw_features |= NETIF_F_RXHASH;
> + if (lif->hw_features & ETH_HW_TX_SG)
> + netdev->hw_features |= NETIF_F_SG;
> +
> + if (lif->hw_features & ETH_HW_TX_CSUM)
> + netdev->hw_enc_features |= NETIF_F_HW_CSUM;
> + if (lif->hw_features & ETH_HW_RX_CSUM)
> + netdev->hw_enc_features |= NETIF_F_RXCSUM;
> + if (lif->hw_features & ETH_HW_TSO)
> + netdev->hw_enc_features |= NETIF_F_TSO;
> + if (lif->hw_features & ETH_HW_TSO_IPV6)
> + netdev->hw_enc_features |= NETIF_F_TSO6;
> + if (lif->hw_features & ETH_HW_TSO_ECN)
> + netdev->hw_enc_features |= NETIF_F_TSO_ECN;
> + if (lif->hw_features & ETH_HW_TSO_GRE)
> + netdev->hw_enc_features |= NETIF_F_GSO_GRE;
> + if (lif->hw_features & ETH_HW_TSO_GRE_CSUM)
> + netdev->hw_enc_features |= NETIF_F_GSO_GRE_CSUM;
> + if (lif->hw_features & ETH_HW_TSO_IPXIP4)
> + netdev->hw_enc_features |= NETIF_F_GSO_IPXIP4;
> + if (lif->hw_features & ETH_HW_TSO_IPXIP6)
> + netdev->hw_enc_features |= NETIF_F_GSO_IPXIP6;
> + if (lif->hw_features & ETH_HW_TSO_UDP)
> + netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL;
> + if (lif->hw_features & ETH_HW_TSO_UDP_CSUM)
> + netdev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
> +
> + netdev->hw_features |= netdev->hw_enc_features;
> + netdev->features |= netdev->hw_features;
> +
> + netdev->priv_flags |= IFF_UNICAST_FLT;
> +
> + return 0;
> +}
> +
> static int ionic_lif_init(struct lif *lif)
> {
> struct ionic_dev *idev = &lif->ionic->idev;
> @@ -711,6 +1029,10 @@ static int ionic_lif_init(struct lif *lif)
> goto err_out_notifyq_deinit;
> }
>
> + err = ionic_init_nic_features(lif);
> + if (err)
> + goto err_out_notifyq_deinit;
> +
> set_bit(LIF_INITED, lif->state);
>
> return 0;
> @@ -745,6 +1067,32 @@ int ionic_lifs_init(struct ionic *ionic)
> return 0;
> }
>
> +int ionic_lifs_register(struct ionic *ionic)
> +{
> + int err;
> +
> + /* only register LIF0 for now */
> + err = register_netdev(ionic->master_lif->netdev);
> + if (err) {
> + dev_err(ionic->dev, "Cannot register net device,
> aborting\n");
> + return err;
> + }
> +
> + ionic->master_lif->registered = true;
> +
> + return 0;
> +}
> +
> +void ionic_lifs_unregister(struct ionic *ionic)
> +{
> + /* There is only one lif ever registered in the
> + * current model, so don't bother searching the
> + * ionic->lif for candidates to unregister
> + */
> + if (ionic->master_lif->netdev->reg_state == NETREG_REGISTERED)
> + unregister_netdev(ionic->master_lif->netdev);
> +}
> +
> int ionic_lif_identify(struct ionic *ionic, u8 lif_type,
> union lif_identity *lid)
> {
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> index 80eec0778f40..ef3f7340a277 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> @@ -61,6 +61,8 @@ struct qcq {
>
> enum lif_state_flags {
> LIF_INITED,
> + LIF_UP,
> + LIF_QUEUE_RESET,
>
> /* leave this as last */
> LIF_STATE_SIZE
> @@ -84,6 +86,7 @@ struct lif {
> u64 last_eid;
> unsigned int neqs;
> unsigned int nxqs;
> + u64 hw_features;
>
> struct lif_info *info;
> dma_addr_t info_pa;
> @@ -124,6 +127,8 @@ int ionic_lifs_alloc(struct ionic *ionic);
> void ionic_lifs_free(struct ionic *ionic);
> void ionic_lifs_deinit(struct ionic *ionic);
> int ionic_lifs_init(struct ionic *ionic);
> +int ionic_lifs_register(struct ionic *ionic);
> +void ionic_lifs_unregister(struct ionic *ionic);
> int ionic_lif_identify(struct ionic *ionic, u8 lif_type,
> union lif_identity *lif_ident);
> int ionic_lifs_size(struct ionic *ionic);
^ permalink raw reply
* Re: [PATCH net-next] netfilter: nf_table_offload: Fix zero prio of flow_cls_common_offload
From: Marcelo Ricardo Leitner @ 2019-07-24 23:51 UTC (permalink / raw)
To: wenxu; +Cc: pablo, davem, netfilter-devel, netdev
In-Reply-To: <1562832210-25981-1-git-send-email-wenxu@ucloud.cn>
On Thu, Jul 11, 2019 at 04:03:30PM +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
>
> The flow_cls_common_offload prio should be not zero
>
> It leads the invalid table prio in hw.
>
> # nft add table netdev firewall
> # nft add chain netdev firewall acl { type filter hook ingress device mlx_pf0vf0 priority - 300 \; }
> # nft add rule netdev firewall acl ip daddr 1.1.1.7 drop
> Error: Could not process rule: Invalid argument
>
> kernel log
> mlx5_core 0000:81:00.0: E-Switch: Failed to create FDB Table err -22 (table prio: 65535, level: 0, size: 4194304)
>
> Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
> Signed-off-by: wenxu <wenxu@ucloud.cn>
> ---
> net/netfilter/nf_tables_offload.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
> index 2c33028..01d8133 100644
> --- a/net/netfilter/nf_tables_offload.c
> +++ b/net/netfilter/nf_tables_offload.c
> @@ -7,6 +7,8 @@
> #include <net/netfilter/nf_tables_offload.h>
> #include <net/pkt_cls.h>
>
> +#define FLOW_OFFLOAD_DEFAUT_PRIO 1U
> +
> static struct nft_flow_rule *nft_flow_rule_alloc(int num_actions)
> {
> struct nft_flow_rule *flow;
> @@ -107,6 +109,7 @@ static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
> struct netlink_ext_ack *extack)
> {
> common->protocol = proto;
> + common->prio = TC_H_MAKE(FLOW_OFFLOAD_DEFAUT_PRIO << 16, 0);
Note that tc semantics for this is to auto-generate a priority in such
cases, instead of using a default.
@tc_new_tfilter():
if (prio == 0) {
/* If no priority is provided by the user,
* we allocate one.
*/
if (n->nlmsg_flags & NLM_F_CREATE) {
prio = TC_H_MAKE(0x80000000U, 0U);
prio_allocate = true;
...
if (prio_allocate)
prio = tcf_auto_prio(tcf_chain_tp_prev(chain,
&chain_info));
> common->extack = extack;
> }
>
> --
> 1.8.3.1
>
^ permalink raw reply
* Re: [PATCH v4 net-next 10/19] ionic: Add management of rx filters
From: Saeed Mahameed @ 2019-07-24 23:52 UTC (permalink / raw)
To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-11-snelson@pensando.io>
On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> Set up the infrastructure for managing Rx filters. We can't ask the
> hardware for what filters it has, so we keep a local list of filters
> that we've pushed into the HW.
>
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
> drivers/net/ethernet/pensando/ionic/Makefile | 4 +-
> .../net/ethernet/pensando/ionic/ionic_lif.c | 6 +
> .../net/ethernet/pensando/ionic/ionic_lif.h | 2 +
> .../ethernet/pensando/ionic/ionic_rx_filter.c | 143
> ++++++++++++++++++
> .../ethernet/pensando/ionic/ionic_rx_filter.h | 35 +++++
> 5 files changed, 188 insertions(+), 2 deletions(-)
> create mode 100644
> drivers/net/ethernet/pensando/ionic/ionic_rx_filter.c
> create mode 100644
> drivers/net/ethernet/pensando/ionic/ionic_rx_filter.h
>
>
[...]
> +#define RXQ_INDEX_ANY (0xFFFF)
> +struct rx_filter {
> + u32 flow_id;
> + u32 filter_id;
> + u16 rxq_index;
> + struct rx_filter_add_cmd cmd;
> + struct hlist_node by_hash;
> + struct hlist_node by_id;
> +};
> +
> +#define RX_FILTER_HASH_BITS 10
> +#define RX_FILTER_HLISTS BIT(RX_FILTER_HASH_BITS)
> +#define RX_FILTER_HLISTS_MASK (RX_FILTER_HLISTS - 1)
> +struct rx_filters {
> + spinlock_t lock; /* filter list lock
> */
> + struct hlist_head by_hash[RX_FILTER_HLISTS]; /* by skb
> hash */
> + struct hlist_head by_id[RX_FILTER_HLISTS]; /* by
> filter_id */
> +};
> +
>
Following Dave's comment on this, you use too generic struct and macro
/define names, i strongly recommend to add a unique prefix to this
driver.
^ permalink raw reply
* Re: [PATCH bpf-next 5/7] sefltests/bpf: support FLOW_DISSECTOR_F_PARSE_1ST_FRAG
From: Stanislav Fomichev @ 2019-07-24 23:52 UTC (permalink / raw)
To: Song Liu
Cc: Stanislav Fomichev, Networking, bpf, David S . Miller,
Alexei Starovoitov, Daniel Borkmann, Willem de Bruijn,
Petar Penkov
In-Reply-To: <CAPhsuW6Z2Bx66ZDOV-9jW+hsxKbZJxY-YFgP0rL_4QipAuptQA@mail.gmail.com>
On 07/24, Song Liu wrote:
> On Wed, Jul 24, 2019 at 10:11 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > bpf_flow.c: exit early unless FLOW_DISSECTOR_F_PARSE_1ST_FRAG is passed
> > in flags. Also, set ip_proto earlier, this makes sure we have correct
> > value with fragmented packets.
> >
> > Add selftest cases to test ipv4/ipv6 fragments and skip eth_get_headlen
> > tests that don't have FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag.
> >
> > eth_get_headlen calls flow dissector with
> > FLOW_DISSECTOR_F_PARSE_1ST_FRAG flag so we can't run tests that
> > have different set of input flags against it.
> >
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: Petar Penkov <ppenkov@google.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> > .../selftests/bpf/prog_tests/flow_dissector.c | 129 ++++++++++++++++++
> > tools/testing/selftests/bpf/progs/bpf_flow.c | 28 +++-
> > 2 files changed, 151 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > index c938283ac232..966cb3b06870 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/flow_dissector.c
> > @@ -5,6 +5,10 @@
> > #include <linux/if_tun.h>
> > #include <sys/uio.h>
> >
> > +#ifndef IP_MF
> > +#define IP_MF 0x2000
> > +#endif
> > +
> > #define CHECK_FLOW_KEYS(desc, got, expected) \
> > CHECK_ATTR(memcmp(&got, &expected, sizeof(got)) != 0, \
> > desc, \
> > @@ -49,6 +53,18 @@ struct ipv6_pkt {
> > struct tcphdr tcp;
> > } __packed;
> >
> > +struct ipv6_frag_pkt {
> > + struct ethhdr eth;
> > + struct ipv6hdr iph;
> > + struct frag_hdr {
> > + __u8 nexthdr;
> > + __u8 reserved;
> > + __be16 frag_off;
> > + __be32 identification;
> > + } ipf;
> > + struct tcphdr tcp;
> > +} __packed;
> > +
> > struct dvlan_ipv6_pkt {
> > struct ethhdr eth;
> > __u16 vlan_tci;
> > @@ -65,9 +81,11 @@ struct test {
> > struct ipv4_pkt ipv4;
> > struct svlan_ipv4_pkt svlan_ipv4;
> > struct ipv6_pkt ipv6;
> > + struct ipv6_frag_pkt ipv6_frag;
> > struct dvlan_ipv6_pkt dvlan_ipv6;
> > } pkt;
> > struct bpf_flow_keys keys;
> > + __u32 flags;
> > };
> >
> > #define VLAN_HLEN 4
> > @@ -143,6 +161,102 @@ struct test tests[] = {
> > .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > },
> > },
> > + {
> > + .name = "ipv4-frag",
> > + .pkt.ipv4 = {
> > + .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> > + .iph.ihl = 5,
> > + .iph.protocol = IPPROTO_TCP,
> > + .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> > + .iph.frag_off = __bpf_constant_htons(IP_MF),
> > + .tcp.doff = 5,
> > + .tcp.source = 80,
> > + .tcp.dest = 8080,
> > + },
> > + .keys = {
> > + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > + .nhoff = ETH_HLEN,
> > + .thoff = ETH_HLEN + sizeof(struct iphdr),
> > + .addr_proto = ETH_P_IP,
> > + .ip_proto = IPPROTO_TCP,
> > + .n_proto = __bpf_constant_htons(ETH_P_IP),
> > + .is_frag = true,
> > + .is_first_frag = true,
> > + .sport = 80,
> > + .dport = 8080,
> > + },
> > + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > + },
> > + {
> > + .name = "ipv4-no-frag",
> > + .pkt.ipv4 = {
> > + .eth.h_proto = __bpf_constant_htons(ETH_P_IP),
> > + .iph.ihl = 5,
> > + .iph.protocol = IPPROTO_TCP,
> > + .iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
> > + .iph.frag_off = __bpf_constant_htons(IP_MF),
> > + .tcp.doff = 5,
> > + .tcp.source = 80,
> > + .tcp.dest = 8080,
> > + },
> > + .keys = {
> > + .nhoff = ETH_HLEN,
> > + .thoff = ETH_HLEN + sizeof(struct iphdr),
> > + .addr_proto = ETH_P_IP,
> > + .ip_proto = IPPROTO_TCP,
> > + .n_proto = __bpf_constant_htons(ETH_P_IP),
> > + .is_frag = true,
> > + .is_first_frag = true,
> > + },
> > + },
> > + {
> > + .name = "ipv6-frag",
> > + .pkt.ipv6_frag = {
> > + .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> > + .iph.nexthdr = IPPROTO_FRAGMENT,
> > + .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> > + .ipf.nexthdr = IPPROTO_TCP,
> > + .tcp.doff = 5,
> > + .tcp.source = 80,
> > + .tcp.dest = 8080,
> > + },
> > + .keys = {
> > + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > + .nhoff = ETH_HLEN,
> > + .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> > + sizeof(struct frag_hdr),
> > + .addr_proto = ETH_P_IPV6,
> > + .ip_proto = IPPROTO_TCP,
> > + .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > + .is_frag = true,
> > + .is_first_frag = true,
> > + .sport = 80,
> > + .dport = 8080,
> > + },
> > + .flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG,
> > + },
> > + {
> > + .name = "ipv6-no-frag",
> > + .pkt.ipv6_frag = {
> > + .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
> > + .iph.nexthdr = IPPROTO_FRAGMENT,
> > + .iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
> > + .ipf.nexthdr = IPPROTO_TCP,
> > + .tcp.doff = 5,
> > + .tcp.source = 80,
> > + .tcp.dest = 8080,
> > + },
> > + .keys = {
> > + .nhoff = ETH_HLEN,
> > + .thoff = ETH_HLEN + sizeof(struct ipv6hdr) +
> > + sizeof(struct frag_hdr),
> > + .addr_proto = ETH_P_IPV6,
> > + .ip_proto = IPPROTO_TCP,
> > + .n_proto = __bpf_constant_htons(ETH_P_IPV6),
> > + .is_frag = true,
> > + .is_first_frag = true,
> > + },
> > + },
> > };
> >
> > static int create_tap(const char *ifname)
> > @@ -225,6 +339,13 @@ void test_flow_dissector(void)
> > .data_size_in = sizeof(tests[i].pkt),
> > .data_out = &flow_keys,
> > };
> > + static struct bpf_flow_keys ctx = {};
> > +
> > + if (tests[i].flags) {
> > + tattr.ctx_in = &ctx;
> > + tattr.ctx_size_in = sizeof(ctx);
> > + ctx.flags = tests[i].flags;
> > + }
> >
> > err = bpf_prog_test_run_xattr(&tattr);
> > CHECK_ATTR(tattr.data_size_out != sizeof(flow_keys) ||
> > @@ -255,6 +376,14 @@ void test_flow_dissector(void)
> > struct bpf_prog_test_run_attr tattr = {};
> > __u32 key = 0;
> >
> > + /* Don't run tests that are not marked as
> > + * FLOW_DISSECTOR_F_PARSE_1ST_FRAG; eth_get_headlen
> > + * sets this flag.
> > + */
> > +
> > + if (tests[i].flags != FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
> > + continue;
>
> Maybe test flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG == 0 instead?
> It is not necessary now, but might be useful in the future.
I'm not sure about this one. We want flags here to match flags
from eth_get_headlen:
const unsigned int flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
...
if (!skb_flow_dissect_flow_keys_basic(..., flags))
Otherwise the test might break unexpectedly. So I'd rather manually
adjust a test here if eth_get_headlen flags change.
Maybe I should clarify the comment to signify that dependency? Because
currently it might be read as if we only care about
FLOW_DISSECTOR_F_PARSE_1ST_FRAG, but we really care about all flags
in eth_get_headlen; it just happens that it only has one right now.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox