Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Ivan Vecera @ 2019-08-30  6:54 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Miller, idosch, andrew, horatiu.vultur, alexandre.belloni,
	UNGLinuxDriver, allan.nielsen, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190830063624.GN2312@nanopsycho>

On Fri, 30 Aug 2019 08:36:24 +0200
Jiri Pirko <jiri@resnulli.us> wrote:

> Fri, Aug 30, 2019 at 08:02:33AM CEST, davem@davemloft.net wrote:
> >From: Jiri Pirko <jiri@resnulli.us>
> >Date: Fri, 30 Aug 2019 07:39:40 +0200
> >  
> >> Because the "promisc mode" would gain another meaning. Now how the
> >> driver should guess which meaning the user ment when he setted it?
> >> filter or trap?
> >> 
> >> That is very confusing. If the flag is the way to do this, let's
> >> introduce another flag, like IFF_TRAPPING indicating that user
> >> wants exactly this.  
> >
> >I don't understand how the meaning of promiscuous mode for a
> >networking device has suddenly become ambiguous, when did this start
> >happening?  
> 
> The promiscuity is a way to setup the rx filter. So promics == rx
> filter off. For normal nics, where there is no hw fwd datapath,
> this coincidentally means all received packets go to cpu.
> But if there is hw fwd datapath, rx filter is still off, all rxed
> packets are processed. But that does not mean they should be trapped
> to cpu.

+1
Promisc is Rx filtering option and should not imply that offloaded
traffic is trapped to CPU.

> Simple example:
> I need to see slowpath packets, for example arps/stp/bgp/... that
> are going to cpu, I do:
> tcpdump -i swp1
> 
> I don't want to get all the traffic running over hw running this cmd.
> This is a valid usecase.
> 
> To cope with hw fwd datapath devices, I believe that tcpdump has to
> have notion of that. Something like:
> 
> tcpdump -i swp1 --hw-trapping-mode
> 
> The logic can be inverse:
> tcpdump -i swp1
> tcpdump -i swp1 --no-hw-trapping-mode
> 
> However, that would provide inconsistent behaviour between existing
> and patched tcpdump/kernel.
> 
> All I'm trying to say, there are 2 flags
> needed (if we don't use tc trap).


^ permalink raw reply

* Re: auto-split of commit. Was: [PATCH bpf-next 04/10] tools/bpf: add libbpf_prog_type_(from|to)_str helpers
From: Greg Kroah-Hartman @ 2019-08-30  6:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jakub Kicinski, Julia Kartseva, ast, Thomas Gleixner, rdna, bpf,
	daniel, netdev, kernel-team
In-Reply-To: <20190829171655.fww5qxtfusehcpds@ast-mbp.dhcp.thefacebook.com>

On Thu, Aug 29, 2019 at 10:16:56AM -0700, Alexei Starovoitov wrote:
> On Thu, Aug 29, 2019 at 08:51:51AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Aug 28, 2019 at 04:46:28PM -0700, Alexei Starovoitov wrote:
> > > On Wed, Aug 28, 2019 at 04:34:22PM -0700, Jakub Kicinski wrote:
> > > > 
> > > > Greg, Thomas, libbpf is extracted from the kernel sources and
> > > > maintained in a clone repo on GitHub for ease of packaging.
> > > > 
> > > > IIUC Alexei's concern is that since we are moving the commits from
> > > > the kernel repo to the GitHub one we have to preserve the commits
> > > > exactly as they are, otherwise SOB lines lose their power.
> > > > 
> > > > Can you provide some guidance on whether that's a valid concern, 
> > > > or whether it's perfectly fine to apply a partial patch?
> > > 
> > > Right. That's exactly the concern.
> > > 
> > > Greg, Thomas,
> > > could you please put your legal hat on and clarify the following.
> > > Say some developer does a patch that modifies
> > > include/uapi/linux/bpf.h
> > > ..some other kernel code...and
> > > tools/include/uapi/linux/bpf.h
> > > 
> > > That tools/include/uapi/linux/bpf.h is used by perf and by libbpf.
> > > We have automatic mirror of tools/libbpf into github/libbpf/
> > > so that external projects and can do git submodule of it,
> > > can build packages out of it, etc.
> > > 
> > > The question is whether it's ok to split tools/* part out of
> > > original commit, keep Author and SOB, create new commit out of it,
> > > and automatically push that auto-generated commit into github mirror.
> > 
> > Note, I am not a laywer, and am not _your_ lawyer either, only _your_
> > lawyer can answer questions as to what is best for you.
> > 
> > That being said, from a "are you keeping the correct authorship info",
> > yes, it sounds like you are doing the correct thing here.
> > 
> > Look at what I do for stable kernels, I take the original commit and add
> > it to "another tree" keeping the original author and s-o-b chain intact,
> > and adding a "this is the original git commit id" type message to the
> > changelog text so that people can link it back to the original.
> 
> I think you're describing 'git cherry-pick -x'.

Well, my scripts don't use git, but yes, it's much the same thing :)

> The question was about taking pieces of the original commit. Not the whole commit.
> Author field obviously stays, but SOB is questionable.

sob matters to the file the commit is touching, and if it is identical
to the original file (including same license), then it should be fine.

> If author meant to change X and Y and Z. Silently taking only Z chunk of the diff
> doesn't quite seem right.

It can be confusing, I agree.

> If we document that such commit split happens in Documentation/bpf/bpf_devel_QA.rst
> do you think it will be enough to properly inform developers?
> The main concern is the surprise factor when people start seeing their commits
> in the mirror, but not their full commits.

Personally, I wouldn't care, but maybe you should just enforce the fact
that the original patch should ONLY touch Z, and not X and Y in the same
patch, to make this a lot more obvious.

Patches should only be doing "one logical thing" in the first place, but
maybe you also need to touch other things when doing a change that you
can't do this, I really do not know, sorry.

greg k-h

^ permalink raw reply

* Re: [PATCH bpf-next 01/13] bpf: add bpf_map_value_size and bp_map_copy_value helper functions
From: Yonghong Song @ 2019-08-30  6:40 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Networking, Alexei Starovoitov, Brian Vazquez,
	Daniel Borkmann, Kernel Team
In-Reply-To: <2DB6B840-9EF6-483D-8570-4BB9EB74F3DA@fb.com>



On 8/29/19 3:04 PM, Song Liu wrote:
> 
> 
>> On Aug 28, 2019, at 11:45 PM, Yonghong Song <yhs@fb.com> wrote:
>>
>> From: Brian Vazquez <brianvv@google.com>
>>
>> Move reusable code from map_lookup_elem to helper functions to avoid code
>> duplication in kernel/bpf/syscall.c
>>
>> Suggested-by: Stanislav Fomichev <sdf@google.com>
>> Signed-off-by: Brian Vazquez <brianvv@google.com>
> 
> 
> Acked-by: Song Liu <songliubraving@fb.com>
> 
> Yonghong, we also need your SoB.

Thanks for reminding me of this. Will add my SoB in next revision.

^ permalink raw reply

* Re: [PATCH bpf-next 05/13] bpf: adding map batch processing support
From: Yonghong Song @ 2019-08-30  6:39 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team
In-Reply-To: <CAMzD94RuPs8_BHNRDjk6NDkyi=hJ+pCKeLb3ihACLYaOWz8sAA@mail.gmail.com>



On 8/29/19 4:01 PM, Brian Vazquez wrote:
> Hi Yonghong!
> 
> Thanks for sending this series of patches and starting the discussion.
> 
> On Wed, Aug 28, 2019 at 11:45 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Brian Vazquez has proposed BPF_MAP_DUMP command to look up more than one
>> map entries per syscall.
>>    https://lore.kernel.org/bpf/CABCgpaU3xxX6CMMxD+1knApivtc2jLBHysDXw-0E9bQEL0qC3A@mail.gmail.com/T/#t
>>
>> During discussion, we found more use cases can be supported in a similar
>> map operation batching framework. For example, batched map lookup and delete,
>> which can be really helpful for bcc.
>>    https://github.com/iovisor/bcc/blob/master/tools/tcptop.py#L233-L243
>>    https://github.com/iovisor/bcc/blob/master/tools/slabratetop.py#L129-L138
>>
>> Also, in bcc, we have API to delete all entries in a map.
>>    https://github.com/iovisor/bcc/blob/master/src/cc/api/BPFTable.h#L257-L264
>>
>> For map update, batched operations also useful as sometimes applications need
>> to populate initial maps with more than one entry. For example, the below
>> example is from kernel/samples/bpf/xdp_redirect_cpu_user.c:
>>    https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_redirect_cpu_user.c#L543-L550
>>
>> This patch addresses all the above use cases. To make uapi stable, it also
>> covers other potential use cases. Four bpf syscall subcommands are introduced:
>>          BPF_MAP_LOOKUP_BATCH
>>          BPF_MAP_LOOKUP_AND_DELETE_BATCH
>>          BPF_MAP_UPDATE_BATCH
>>          BPF_MAP_DELETE_BATCH
>>
>> The UAPI attribute structure looks like:
>>          struct { /* struct used by BPF_MAP_*_BATCH commands */
>>                  __aligned_u64   start_key;      /* input: storing start key,
>>                                                   * if NULL, starting from the beginning.
>>                                                   */
>>                  __aligned_u64   next_start_key; /* output: storing next batch start_key,
>>                                                   * if NULL, no next key.
>>                                                   */
>>                  __aligned_u64   keys;           /* input/output: key buffer */
>>                  __aligned_u64   values;         /* input/output: value buffer */
>>                  __u32           count;          /* input: # of keys/values in
>>                                                   *   or fits in keys[]/values[].
>>                                                   * output: how many successful
>>                                                   *   lookup/lookup_and_delete
>>                                                   *   /delete/update operations.
>>                                                   */
>>                  __u32           map_fd;
>>                  __u64           elem_flags;     /* BPF_MAP_{UPDATE,LOOKUP}_ELEM flags */
>>                  __u64           flags;          /* flags for batch operation */
>>          } batch;
>>
>> The 'start_key' and 'next_start_key' are used to BPF_MAP_LOOKUP_BATCH,
>> BPF_MAP_LOOKUP_AND_DELETE_BATCH and BPF_MAP_DELETE_BATCH
>> to start the operation on 'start_key' and also set the
>> next batch start key in 'next_start_key'.
>>
>> If 'count' is greater than 0 and the return code is 0,
>>    . the 'count' may be updated to be smaller if there is less
>>      elements than 'count' for the operation. In such cases,
>>      'next_start_key' will be set to NULL.
>>    . the 'count' remains the same. 'next_start_key' could be NULL
>>      or could point to next start_key for batch processing.
>>
>> If 'count' is greater than 0 and the return code is an error
>> other than -EFAULT, the kernel will try to overwrite 'count'
>> to contain the number of successful element-level (lookup,
>> lookup_and_delete, update and delete) operations. The following
>> attributes can be checked:
>>    . if 'count' value is modified, the new value will be
>>      the number of successful element-level operations.
>>    . if 'count' value is modified, the keys[]/values[] will
>>      contain correct results for new 'count' number of
>>      operations for lookup[_and_delete] and update.
>>
>> The implementation in this patch mimics what user space
>> did, e.g., for lookup_and_delete,
>>      while(bpf_get_next_key(map, keyp, next_keyp) == 0) {
>>         bpf_map_delete_elem(map, keyp);
>>         bpf_map_lookup_elem(map, next_keyp, &value, flags);
>>         keyp, next_keyp = next_keyp, keyp;
>>      }
>> The similar loop is implemented in the kernel, and
>> each operation, bpf_get_next_key(), bpf_map_delete_elem()
>> and bpf_map_lookup_elem(), uses existing kernel functions
>> each of which has its own rcu_read_lock region, bpf_prog_active
>> protection, etc.
>> Therefore, it is totally possible that after bpf_get_next_key(),
>> the bpf_map_delete_elem() or bpf_map_lookup_elem() may fail
>> as the key may be deleted concurrently by kernel or
>> other user space processes/threads.
>> By default, the current implementation permits the loop
>> to continue, just like typical user space handling. But
>> a flag, BPF_F_ENFORCE_ENOENT, can be specified, so kernel
>> will return an error if bpf_map_delete_elem() or
>> bpf_map_lookup_elem() failed.
>>
>> The high-level algorithm for BPF_MAP_LOOKUP_BATCH and
>> BPF_MAP_LOOKUP_AND_DELETE_BATCH:
>>          if (start_key == NULL and next_start_key == NULL) {
>>                  lookup up to 'count' keys in keys[] and fill
>>                  corresponding values[], and delete those
>>                  keys if BPF_MAP_LOOKUP_AND_DELETE_BATCH.
>>          } else if (start_key == NULL && next_start_key != NULL) {
>>                  lookup up to 'count' keys from the beginning
>>                  of the map and fill keys[]/values[], delete
>>                  those keys if BPF_MAP_LOOKUP_AND_DELETE_BATCH.
>>                  Set 'next_start_key' for next batch operation.
>>          } else if (start_key != NULL && next_start_key != NULL) {
>>                  lookup to 'count' keys from 'start_key', inclusive,
>>                  and fill keys[]/values[], delete those keys if
>>                  BPF_MAP_LOOKUP_AND_DELETE_BATCH.
>>                  Set 'next_start_key' for next batch operation.
>>          }
>>
>> The high-level algorithm for BPF_MAP_UPDATE_BATCH:
>>          if (count != 0) {
>>                  do 'count' number of updates on keys[]/values[].
>>          }
>>
>> The high-level algorithm for BPF_MAP_DELETE_BATCH:
>>          if (count == 0) {
>>                  if (start_key == NULL) {
>>                          delete all elements from map.
>>                  } else {
>>                          delete all elements from start_key to the end of map.
>>                  }
>>          } else {
>>                  if (start_key == NULL and next_start_key == NULL) {
>>                          delete 'count' number of keys in keys[].
>>                  } else if (start_key == NULL and next_start_key != NULL) {
>>                          delete 'count' number of keys from the
>>                          beginning of the map and set 'next_start_key'
>>                          properly.
>>                  } else if (start_key != NULL and next_start_keeey != NULL) {
>>                          delete 'count' number of keys from 'start_key',
>>                          and set 'next_start_key' properly.
>>                  }
>>          }
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/uapi/linux/bpf.h |  27 +++
>>   kernel/bpf/syscall.c     | 448 +++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 475 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 5d2fb183ee2d..576688f13e8c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -107,6 +107,10 @@ enum bpf_cmd {
>>          BPF_MAP_LOOKUP_AND_DELETE_ELEM,
>>          BPF_MAP_FREEZE,
>>          BPF_BTF_GET_NEXT_ID,
>> +       BPF_MAP_LOOKUP_BATCH,
>> +       BPF_MAP_LOOKUP_AND_DELETE_BATCH,
>> +       BPF_MAP_UPDATE_BATCH,
>> +       BPF_MAP_DELETE_BATCH,
>>   };
>>
>>   enum bpf_map_type {
>> @@ -347,6 +351,9 @@ enum bpf_attach_type {
>>   /* flags for BPF_PROG_QUERY */
>>   #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
>>
>> +/* flags for BPF_MAP_*_BATCH */
>> +#define BPF_F_ENFORCE_ENOENT   (1U << 0)
>> +
>>   enum bpf_stack_build_id_status {
>>          /* user space need an empty entry to identify end of a trace */
>>          BPF_STACK_BUILD_ID_EMPTY = 0,
>> @@ -396,6 +403,26 @@ union bpf_attr {
>>                  __u64           flags;
>>          };
>>
>> +       struct { /* struct used by BPF_MAP_*_BATCH commands */
>> +               __aligned_u64   start_key;      /* input: storing start key,
>> +                                                * if NULL, starting from the beginning.
>> +                                                */
>> +               __aligned_u64   next_start_key; /* output: storing next batch start_key,
>> +                                                * if NULL, no next key.
>> +                                                */
>> +               __aligned_u64   keys;           /* input/output: key buffer */
>> +               __aligned_u64   values;         /* input/output: value buffer */
>> +               __u32           count;          /* input: # of keys/values in
>> +                                                *   or fits in keys[]/values[].
>> +                                                * output: how many successful
>> +                                                *   lookup/lookup_and_delete
>> +                                                *   update/delete operations.
>> +                                                */
>> +               __u32           map_fd;
>> +               __u64           elem_flags;     /* BPF_MAP_*_ELEM flags */
>> +               __u64           flags;          /* flags for batch operation */
>> +       } batch;
>> +
>>          struct { /* anonymous struct used by BPF_PROG_LOAD command */
>>                  __u32           prog_type;      /* one of enum bpf_prog_type */
>>                  __u32           insn_cnt;
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 06308f0206e5..8746b55405f9 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -33,6 +33,17 @@
>>
>>   #define BPF_OBJ_FLAG_MASK   (BPF_F_RDONLY | BPF_F_WRONLY)
>>
>> +#define BPF_MAP_BATCH_SWAP_KEYS(key1, key2, buf1, buf2)        \
>> +       do {                                            \
>> +               if (key1 == (buf1)) {                   \
>> +                       key1 = buf2;                    \
>> +                       key2 = buf1;                    \
>> +               } else {                                \
>> +                       key1 = buf1;                    \
>> +                       key2 = buf2;                    \
>> +               }                                       \
>> +       } while (0)                                     \
>> +
>>   DEFINE_PER_CPU(int, bpf_prog_active);
>>   static DEFINE_IDR(prog_idr);
>>   static DEFINE_SPINLOCK(prog_idr_lock);
>> @@ -1183,6 +1194,431 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>>          return err;
>>   }
>>
>> +static void __map_batch_get_attrs(const union bpf_attr *attr,
>> +                                 void __user **skey, void __user **nskey,
>> +                                 void __user **keys, void __user **values,
>> +                                 u32 *max_count, u64 *elem_flags, u64 *flags)
>> +{
>> +       *skey = u64_to_user_ptr(attr->batch.start_key);
>> +       *nskey = u64_to_user_ptr(attr->batch.next_start_key);
>> +       *keys = u64_to_user_ptr(attr->batch.keys);
>> +       *values = u64_to_user_ptr(attr->batch.values);
>> +       *max_count = attr->batch.count;
>> +       *elem_flags = attr->batch.elem_flags;
>> +       *flags = attr->batch.flags;
>> +}
>> +
>> +static int
>> +__map_lookup_delete_batch_key_in_keys(struct bpf_map *map, void *key, void *value,
>> +                                     u32 max_count, u32 key_size, u32 value_size,
>> +                                     u64 elem_flags, void __user *keys,
>> +                                     void __user *values,
>> +                                     union bpf_attr __user *uattr,
>> +                                     bool ignore_enoent)
>> +{
>> +       u32 count, missed = 0;
>> +       int ret = 0, err;
>> +
>> +       for (count = 0; count < max_count; count++) {
>> +               if (copy_from_user(key, keys + count * key_size, key_size)) {
>> +                       ret = -EFAULT;
>> +                       break;
>> +               }
>> +
>> +               ret = bpf_map_copy_value(map, key, value, elem_flags);
>> +               if (ret) {
>> +                       if (ret != -ENOENT || !ignore_enoent)
>> +                               break;
>> +
>> +                       missed++;
>> +                       continue;
>> +               }
>> +
>> +
>> +               if (copy_to_user(values + count * value_size, value, value_size)) {
>> +                       ret = -EFAULT;
>> +                       break;
>> +               }
>> +
>> +               ret = bpf_map_delete_elem(map, key);
>> +               if (ret) {
>> +                       if (ret != -ENOENT || !ignore_enoent)
>> +                               break;
>> +
>> +                       missed++;
>> +               }
>> +       }
>> +
>> +       count -= missed;
>> +       if ((!ret && missed) || (ret && ret != -EFAULT)) {
>> +               err = put_user(count, &uattr->batch.count);
>> +               ret = err ? : ret;
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>> +static int map_lookup_and_delete_batch(struct bpf_map *map,
>> +                                      const union bpf_attr *attr,
>> +                                      union bpf_attr __user *uattr,
>> +                                      bool do_delete)
>> +{
>> +       u32 max_count, count = 0, key_size, roundup_key_size, value_size;
>> +       bool ignore_enoent, nextkey_is_null, copied;
>> +       void *buf = NULL, *key, *value, *next_key;
>> +       void __user *skey, *nskey, *keys, *values;
>> +       u64 elem_flags, flags, zero = 0;
>> +       int err, ret = 0;
>> +
>> +       if (map->map_type == BPF_MAP_TYPE_QUEUE ||
>> +           map->map_type == BPF_MAP_TYPE_STACK)
>> +               return -ENOTSUPP;
>> +
>> +       __map_batch_get_attrs(attr, &skey, &nskey, &keys, &values, &max_count,
>> +                             &elem_flags, &flags);
>> +
>> +       if (elem_flags & ~BPF_F_LOCK || flags & ~BPF_F_ENFORCE_ENOENT)
>> +               return -EINVAL;
>> +
>> +       if (!max_count)
>> +               return 0;
>> +
>> +       /* The following max_count/skey/nskey combinations are supported:
>> +        * max_count != 0 && !skey && !nskey: loop/delete max_count elements in keys[]/values[].
>> +        * max_count != 0 && !skey && nskey: loop/delete max_count elements starting from map start.
>> +        * max_count != 0 && skey && nskey: loop/delete max_count elements starting from skey.
>> +        */
>> +       if (skey && !nskey)
>> +               return -EINVAL;
>> +
>> +       /* allocate space for two keys and one value. */
>> +       key_size = map->key_size;
>> +       roundup_key_size = round_up(map->key_size, 8);
>> +       value_size = bpf_map_value_size(map);
>> +       buf = kmalloc(roundup_key_size * 2 + value_size, GFP_USER | __GFP_NOWARN);
>> +       if (!buf)
>> +               return -ENOMEM;
>> +
>> +       key = buf;
>> +       next_key = buf + roundup_key_size;
>> +       value = buf + roundup_key_size * 2;
>> +       ignore_enoent = !(flags & BPF_F_ENFORCE_ENOENT);
>> +
>> +       if (!skey && !nskey) {
>> +               /* handle cases where keys in keys[] */
>> +               ret = __map_lookup_delete_batch_key_in_keys(map, key, value, max_count,
>> +                                                           key_size, value_size,
>> +                                                           elem_flags, keys, values,
>> +                                                           uattr, ignore_enoent);
>> +               goto out;
>> +       }
>> +
>> +       /* Get the first key. */
>> +       if (!skey) {
>> +               ret = bpf_map_get_next_key(map, NULL, key);
>> +               if (ret) {
>> +                       nextkey_is_null = true;
>> +                       goto after_loop;
>> +               }
>> +       } else if (copy_from_user(key, skey, key_size)) {
>> +               ret = -EFAULT;
>> +               goto out;
>> +       }
>> +
>> +       /* Copy the first key/value pair */
>> +       ret = bpf_map_copy_value(map, key, value, elem_flags);
>> +       if (ret) {
>> +               if (skey)
>> +                       goto out;
>> +
>> +               nextkey_is_null = true;
>> +               goto after_loop;
>> +       }
>> +
>> +       if (copy_to_user(keys, key, key_size) ||
>> +           copy_to_user(values, value, value_size)) {
>> +               ret = -EFAULT;
>> +               goto out;
>> +       }
>> +
>> +       /* We will always try to get next_key first
>> +        * and then delete the current key.
>> +        */
>> +       ret = bpf_map_get_next_key(map, key, next_key);
> 
> One of the issues I see in this implementation is that is still
> relying on the existing functions and has the same consistency
> problems that my attempt had.

Right. The approach is very similar to what you proposed earlier.

> 
> The problem happens when you are trying to do batch lookup on a
> hashmap and when executing bpf_map_get_next_key(map, key, next_key)
> the key is removed, then that call will return the first key and you'd
> start iterating the map from the beginning again and retrieve
> duplicate information.

Right. Maybe we can have another bpf_map_ops callback function
like 'map_batch_get_next_key' which won't fall back to the
first key if the 'key' is not available in the hash table?

If 'map_batch_get_next_key' failed due to 'key' is gone, we just
return to user space with whatever results we got so far.

We can have a flag to control this behavior. Some users may not
care about duplication.

> 
> Note that sometimes you can also start from the same bucket when the
> key is updated while dumping it because it can be inserted on the head
> of the bucket so you could potentially revisit elements that you had
> already visited.

Do you think 'map_batch_get_next_key' approach could solve this as well?

> 
>  From previous discussion my understanding was that what we wanted was
> to pursue 'atomic' compounded operations first and after that, try to
> batch them. Although I don't think there's an easy way of batching and
> being consistent at the same time.

Yes, I thought about this 'atomic' compounded approach. First,
e.g., batch deletion, after some of elements are deleted, we found
one key is gone. we cannot really go back to the state where no
deletion happens.

Second, even we permit partial work, batched update/delete and element
update/delete concurrency (userspace or bpf program) still hard to 
manage. We cannot put giant batched update/delete in a spin
lock... (bpf program uses spin lock for may update/delete operations).

I agree with you as I also did not find an easy way to have both
batching and good consistency.

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Jiri Pirko @ 2019-08-30  6:36 UTC (permalink / raw)
  To: David Miller
  Cc: idosch, andrew, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829.230233.287975311556641534.davem@davemloft.net>

Fri, Aug 30, 2019 at 08:02:33AM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Fri, 30 Aug 2019 07:39:40 +0200
>
>> Because the "promisc mode" would gain another meaning. Now how the
>> driver should guess which meaning the user ment when he setted it?
>> filter or trap?
>> 
>> That is very confusing. If the flag is the way to do this, let's
>> introduce another flag, like IFF_TRAPPING indicating that user wants
>> exactly this.
>
>I don't understand how the meaning of promiscuous mode for a
>networking device has suddenly become ambiguous, when did this start
>happening?

The promiscuity is a way to setup the rx filter. So promics == rx filter
off. For normal nics, where there is no hw fwd datapath,
this coincidentally means all received packets go to cpu.
But if there is hw fwd datapath, rx filter is still off, all rxed packets
are processed. But that does not mean they should be trapped to cpu.

Simple example:
I need to see slowpath packets, for example arps/stp/bgp/... that
are going to cpu, I do:
tcpdump -i swp1

I don't want to get all the traffic running over hw running this cmd.
This is a valid usecase.

To cope with hw fwd datapath devices, I believe that tcpdump has to have
notion of that. Something like:

tcpdump -i swp1 --hw-trapping-mode

The logic can be inverse:
tcpdump -i swp1
tcpdump -i swp1 --no-hw-trapping-mode

However, that would provide inconsistent behaviour between existing and
patched tcpdump/kernel.

All I'm trying to say, there are 2 flags
needed (if we don't use tc trap).

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: David Miller @ 2019-08-30  6:18 UTC (permalink / raw)
  To: jiri
  Cc: andrew, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190830061327.GM2312@nanopsycho>

From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 30 Aug 2019 08:13:27 +0200

> In fact, I have usecase where I need to see only slow-path traffic by
> wireshark, not all packets going through hw.

This could be an attribute in the descriptor entries for the packets
provided to userspace, and BPF filters could act on these as well.

Switch network devices aren't special, promiscuous mode means all
packets on the wire that you are attached.

This talk about special handling of "trapping" is a strawman.

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Jiri Pirko @ 2019-08-30  6:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Horatiu Vultur, alexandre.belloni, UNGLinuxDriver, davem,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829143732.GB17864@lunn.ch>

Thu, Aug 29, 2019 at 04:37:32PM CEST, andrew@lunn.ch wrote:
>> Wait, I believe there has been some misundestanding. Promisc mode is NOT
>> about getting packets to the cpu. It's about setting hw filters in a way
>> that no rx packet is dropped.
>> 
>> If you want to get packets from the hw forwarding dataplane to cpu, you
>> should not use promisc mode for that. That would be incorrect.
>
>Hi Jiri
>
>I'm not sure a wireshark/tcpdump/pcap user would agree with you. They
>want to see packets on an interface, so they use these tools. The fact
>that the interface is a switch interface should not matter. The
>switchdev model is that we try to hide away the interface happens to
>be on a switch, you can just use it as normal. So why should promisc
>mode not work as normal?

It does, disables the rx filter. Why do you think it means the same
thing as "trap all to cpu"? Hw datapath was never considered by
wireshark.

In fact, I have usecase where I need to see only slow-path traffic by
wireshark, not all packets going through hw. So apparently, there is a
need of another wireshark option and perhaps another flag
IFF_HW_TRAPPING?.

tcpdump -i eth0
tcpdump -i eth0 --no-promiscuous-mode
tcpdump -i eth0 --hw-trapping-mode


> 
>> If you want to get packets from the hw forwarding dataplane to cpu, you
>> should use tc trap action. It is there exactly for this purpose.
>
>Do you really think a wireshark/tcpdump/pcap user should need to use
>tc trap for the special case the interface is a switch port? Doesn't that
>break the switchdev model?
>
>tc trap is more about fine grained selection of packets. Also, it
>seems like trapped packets are not forwarded, which is not what you
>would expect from wireshark/tcpdump/pcap.
>
>      Andrew

^ permalink raw reply

* [PATCH v3] ipv6: Not to probe neighbourless routes
From: Cheng Lin @ 2019-08-30  6:11 UTC (permalink / raw)
  To: davem
  Cc: kuznet, yoshfuji, netdev, linux-kernel, xue.zhihong, wang.yi59,
	wang.liang82, Cheng Lin

From: Cheng Lin <cheng.lin130@zte.com.cn>

Originally, Router Reachability Probing require a neighbour entry
existed. Commit 2152caea7196 ("ipv6: Do not depend on rt->n in
rt6_probe().") removed the requirement for a neighbour entry. And
commit f547fac624be ("ipv6: rate-limit probes for neighbourless
routes") adds rate-limiting for neighbourless routes.

And, the Neighbor Discovery for IP version 6 (IPv6)(rfc4861) says,
"
7.2.5.  Receipt of Neighbor Advertisements

When a valid Neighbor Advertisement is received (either solicited or
unsolicited), the Neighbor Cache is searched for the target's entry.
If no entry exists, the advertisement SHOULD be silently discarded.
There is no need to create an entry if none exists, since the
recipient has apparently not initiated any communication with the
target.
".

In rt6_probe(), just a Neighbor Solicitation message are transmited.
When receiving a Neighbor Advertisement, the node does nothing in a
Neighborless condition.

Not sure it's needed to create a neighbor entry in Router
Reachability Probing. And the Original way may be the right way.

This patch recover the requirement for a neighbour entry.

Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
---
 include/net/ip6_fib.h | 5 -----
 net/ipv6/route.c      | 6 +-----
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 4b5656c..8c2e022 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -124,11 +124,6 @@ struct rt6_exception {
 
 struct fib6_nh {
 	struct fib_nh_common	nh_common;
-
-#ifdef CONFIG_IPV6_ROUTER_PREF
-	unsigned long		last_probe;
-#endif
-
 	struct rt6_info * __percpu *rt6i_pcpu;
 	struct rt6_exception_bucket __rcu *rt6i_exception_bucket;
 };
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fd059e0..1839dd7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -639,12 +639,12 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
 	nh_gw = &fib6_nh->fib_nh_gw6;
 	dev = fib6_nh->fib_nh_dev;
 	rcu_read_lock_bh();
-	idev = __in6_dev_get(dev);
 	neigh = __ipv6_neigh_lookup_noref(dev, nh_gw);
 	if (neigh) {
 		if (neigh->nud_state & NUD_VALID)
 			goto out;
 
+		idev = __in6_dev_get(dev);
 		write_lock(&neigh->lock);
 		if (!(neigh->nud_state & NUD_VALID) &&
 		    time_after(jiffies,
@@ -654,13 +654,9 @@ static void rt6_probe(struct fib6_nh *fib6_nh)
 				__neigh_set_probe_once(neigh);
 		}
 		write_unlock(&neigh->lock);
-	} else if (time_after(jiffies, fib6_nh->last_probe +
-				       idev->cnf.rtr_probe_interval)) {
-		work = kmalloc(sizeof(*work), GFP_ATOMIC);
 	}
 
 	if (work) {
-		fib6_nh->last_probe = jiffies;
 		INIT_WORK(&work->work, rt6_probe_deferred);
 		work->target = *nh_gw;
 		dev_hold(dev);
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: David Miller @ 2019-08-30  6:02 UTC (permalink / raw)
  To: jiri
  Cc: idosch, andrew, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190830053940.GL2312@nanopsycho>

From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 30 Aug 2019 07:39:40 +0200

> Because the "promisc mode" would gain another meaning. Now how the
> driver should guess which meaning the user ment when he setted it?
> filter or trap?
> 
> That is very confusing. If the flag is the way to do this, let's
> introduce another flag, like IFF_TRAPPING indicating that user wants
> exactly this.

I don't understand how the meaning of promiscuous mode for a
networking device has suddenly become ambiguous, when did this start
happening?

^ permalink raw reply

* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Jiri Pirko @ 2019-08-30  5:39 UTC (permalink / raw)
  To: David Miller
  Cc: idosch, andrew, horatiu.vultur, alexandre.belloni, UNGLinuxDriver,
	allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829.151201.940681219080864052.davem@davemloft.net>

Fri, Aug 30, 2019 at 12:12:01AM CEST, davem@davemloft.net wrote:
>From: Ido Schimmel <idosch@idosch.org>
>Date: Thu, 29 Aug 2019 22:36:13 +0300
>
>> I fully agree that we should make it easy for users to capture offloaded
>> traffic, which is why I suggested patching libpcap. Add a flag to
>> capable netdevs that tells libpcap that in order to capture all the
>> traffic from this interface it needs to add a tc filter with a trap
>> action. That way zero familiarity with tc is required from users.
>
>Why not just make setting promisc mode on the device do this rather than
>require all of this tc indirection nonsense?

Because the "promisc mode" would gain another meaning. Now how the
driver should guess which meaning the user ment when he setted it?
filter or trap?

That is very confusing. If the flag is the way to do this, let's
introduce another flag, like IFF_TRAPPING indicating that user wants
exactly this.


>
>That's the whole point of this conversation I thought?

^ permalink raw reply

* [PATCH net] dev: Delay the free of the percpu refcount
From: Subash Abhinov Kasiviswanathan @ 2019-08-30  5:23 UTC (permalink / raw)
  To: eric.dumazet, davem, netdev
  Cc: Subash Abhinov Kasiviswanathan, Sean Tranchetti

While running stress-ng on an ARM64 kernel, the following oops
was observedi -

44837.761523:   <6> Unable to handle kernel paging request at
                     virtual address 0000004a88287000
44837.761651:   <2> pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654:   <2> lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393:   <2> Call trace:
44837.762398:   <2>  in_dev_finish_destroy+0x4c/0xc8
44837.762404:   <2>  in_dev_rcu_put+0x24/0x30
44837.762412:   <2>  rcu_nocb_kthread+0x43c/0x468
44837.762418:   <2>  kthread+0x118/0x128
44837.762424:   <2>  ret_from_fork+0x10/0x1c

Prior to this, it appeared as if some of the inet6_dev allocations
were failing. From the memory dump, the last operation performed
was dev_put(), however the pcpu_refcnt was NULL while the
reg_state = NETREG_RELEASED. Effectively, the refcount memory was
freed in free_netdev() before the last reference was dropped.

Fix this by freeing the memory after all references are dropped and
before the dev memory itself is freed.

Fixes: 29b4433d991c ("net: percpu net_device refcount")
Cc: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 net/core/dev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 49589ed..bce40d8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9128,6 +9128,9 @@ void netdev_freemem(struct net_device *dev)
 {
 	char *addr = (char *)dev - dev->padded;
 
+	free_percpu(dev->pcpu_refcnt);
+	dev->pcpu_refcnt = NULL;
+
 	kvfree(addr);
 }
 
@@ -9272,9 +9275,6 @@ void free_netdev(struct net_device *dev)
 	list_for_each_entry_safe(p, n, &dev->napi_list, dev_list)
 		netif_napi_del(p);
 
-	free_percpu(dev->pcpu_refcnt);
-	dev->pcpu_refcnt = NULL;
-
 	/*  Compatibility with error handling in drivers */
 	if (dev->reg_state == NETREG_UNINITIALIZED) {
 		netdev_freemem(dev);
-- 
1.9.1


^ permalink raw reply related

* [PATCH net-next] net_failover: get rid of the limitaion receive packet from standy dev when primary exist
From: wenxu @ 2019-08-30  5:07 UTC (permalink / raw)
  To: sridhar.samudrala; +Cc: netdev, davem

From: wenxu <wenxu@ucloud.cn>

For receive side the standby, primary and failover is the same one,
If the packet receive from standby or primary should can be deliver
to failover dev.

For example: there are VF and virtio device failover together.
When live migration the VF detached and send/recv packet through
virtio device. When VF attached again some ingress traffic may
receive from virtio device for cache reason(TC flower offload in
sw mode).

Signed-off-by: wenxu <wenxu@ucloud.cn>
---
 drivers/net/net_failover.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index b16a122..da3beb5 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -362,14 +362,6 @@ static rx_handler_result_t net_failover_handle_frame(struct sk_buff **pskb)
 {
 	struct sk_buff *skb = *pskb;
 	struct net_device *dev = rcu_dereference(skb->dev->rx_handler_data);
-	struct net_failover_info *nfo_info = netdev_priv(dev);
-	struct net_device *primary_dev, *standby_dev;
-
-	primary_dev = rcu_dereference(nfo_info->primary_dev);
-	standby_dev = rcu_dereference(nfo_info->standby_dev);
-
-	if (primary_dev && skb->dev == standby_dev)
-		return RX_HANDLER_EXACT;
 
 	skb->dev = dev;
 
-- 
1.8.3.1


^ permalink raw reply related

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2019-08-30  4:19 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Hayes Wang

[-- Attachment #1: Type: text/plain, Size: 1900 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/usb/r8152.c

between commits:

  49d4b14113ca ("Revert "r8152: napi hangup fix after disconnect"")
  973dc6cfc0e2 ("r8152: remove calling netif_napi_del")

from the net tree and commit:

  d2187f8e4454 ("r8152: divide the tx and rx bottom functions")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/usb/r8152.c
index 04137ac373b0,c6fa0c17c13d..000000000000
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@@ -4021,7 -4214,9 +4214,8 @@@ static int rtl8152_close(struct net_dev
  #ifdef CONFIG_PM_SLEEP
  	unregister_pm_notifier(&tp->pm_notifier);
  #endif
+ 	tasklet_disable(&tp->tx_tl);
 -	if (!test_bit(RTL8152_UNPLUG, &tp->flags))
 -		napi_disable(&tp->napi);
 +	napi_disable(&tp->napi);
  	clear_bit(WORK_ENABLE, &tp->flags);
  	usb_kill_urb(tp->intr_urb);
  	cancel_delayed_work_sync(&tp->schedule);
@@@ -5352,6 -5604,8 +5603,7 @@@ static int rtl8152_probe(struct usb_int
  	return 0;
  
  out1:
 -	netif_napi_del(&tp->napi);
+ 	tasklet_kill(&tp->tx_tl);
  	usb_set_intfdata(intf, NULL);
  out:
  	free_netdev(netdev);
@@@ -5366,7 -5620,9 +5618,8 @@@ static void rtl8152_disconnect(struct u
  	if (tp) {
  		rtl_set_unplug(tp);
  
 -		netif_napi_del(&tp->napi);
  		unregister_netdev(tp->netdev);
+ 		tasklet_kill(&tp->tx_tl);
  		cancel_delayed_work_sync(&tp->hw_phy_work);
  		tp->rtl_ops.unload(tp);
  		free_netdev(tp->netdev);

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/3] capability: introduce CAP_BPF and CAP_TRACING
From: Alexei Starovoitov @ 2019-08-30  4:16 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andy Lutomirski, David S. Miller,
	Peter Zijlstra, Steven Rostedt, Network Development, bpf,
	Kernel Team, Linux API
In-Reply-To: <536636ad-0baf-31e9-85fe-2591b65068df@iogearbox.net>

On Thu, Aug 29, 2019 at 8:47 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/29/19 7:12 AM, Alexei Starovoitov wrote:
> [...]
> >
> > +/*
> > + * CAP_BPF allows the following BPF operations:
> > + * - Loading all types of BPF programs
> > + * - Creating all types of BPF maps except:
> > + *    - stackmap that needs CAP_TRACING
> > + *    - devmap that needs CAP_NET_ADMIN
> > + *    - cpumap that needs CAP_SYS_ADMIN
> > + * - Advanced verifier features
> > + *   - Indirect variable access
> > + *   - Bounded loops
> > + *   - BPF to BPF function calls
> > + *   - Scalar precision tracking
> > + *   - Larger complexity limits
> > + *   - Dead code elimination
> > + *   - And potentially other features
> > + * - Use of pointer-to-integer conversions in BPF programs
> > + * - Bypassing of speculation attack hardening measures
> > + * - Loading BPF Type Format (BTF) data
> > + * - Iterate system wide loaded programs, maps, BTF objects
> > + * - Retrieve xlated and JITed code of BPF programs
> > + * - Access maps and programs via id
> > + * - Use bpf_spin_lock() helper
>
> This is still very wide.

'still very wide' ? you make it sound like it's a bad thing.
Covering all of bpf with single CAP_BPF is #1 goal of this set.

> Consider following example: app has CAP_BPF +> CAP_NET_ADMIN. Why can't we in this case *only* allow loading networking
> related [plus generic] maps and programs? If it doesn't have CAP_TRACING,
> what would be a reason to allow loading it? Same vice versa. There are
> some misc program types like the infraread stuff, but they could continue
> to live under [CAP_BPF +] CAP_SYS_ADMIN as fallback. I think categorizing
> a specific list of prog and map types might be more clear than disallowing
> some helpers like below (e.g. why choice of bpf_probe_read() but not
> bpf_probe_write_user() etc).

It kinda makes sense:
cap_bpf+cap_net_admin allows networking progs.
cap_bpf+cap_tracing allows tracing progs.
But what to do with cg_sysctl, cg_device, lirc ?
They are clearly neither.
Invent yet another cap_foo for them?
or let them under cap_bpf alone?
If cap_bpf alone is enough then why bother with bpf+net_admin for networking?
It's not making anything cleaner. Only confuses users.

Also bpf_trace_printk() is using ftrace and can print arbitrary memory.
In that sense it's no different than bpf_probe_read.
Both should be under CAP_TRACING.
But bpf_trace_printk() is available to all progs.
Even to socket filters under cap_sys_admin today.
With this patch set I'm allowing bpf_trace_printk() under CAP_TRACING.
Same goes to bpf_probe_read.

High level description:
cap_bpf alone allows loading of all progs except when
later cap_net_admin or cap_tracing will _not_ be able to
filter out the helper at attach time that shouldn't be there.

Example of how this patch set works:
- to load and attach networking progs
both cap_bpf and cap_net_admin are necessary.
- to load and attach tracing progs
both cap_bpf and cap_tracing are necessary.

when networking prog is using bpf_trace_printk
then cap_tracing is needed too.
And it's checked at load time.
If we do what you're proposing:
"lets allow load of all networking with bpf+net_admin"
then this won't work for bpf_trace_printk.
Per helper function capability check is still needed.
And since it's needed I see no reason to limit
networking progs to bpf+net_admin at load time.
Load time is still cap_bpf only.
And helpers will be filtered out at attach by net_admin.

^ permalink raw reply

* [PATCH net-next v2 10/22] bnxt_en: Discover firmware error recovery capabilities.
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

Call the new firmware API HWRM_ERROR_RECOVERY_QCFG if it is supported
to discover the firmware health and recovery capabilities and settings.
This feature allows the driver to reset the chip if firmware crashes and
becomes unresponsive.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 77 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 38 +++++++++++++++
 2 files changed, 115 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 303933b..825a7f9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6847,6 +6847,8 @@ static int __bnxt_hwrm_func_qcaps(struct bnxt *bp)
 		bp->fw_cap |= BNXT_FW_CAP_PCIE_STATS_SUPPORTED;
 	if (flags & FUNC_QCAPS_RESP_FLAGS_EXT_STATS_SUPPORTED)
 		bp->fw_cap |= BNXT_FW_CAP_EXT_STATS_SUPPORTED;
+	if (flags &  FUNC_QCAPS_RESP_FLAGS_ERROR_RECOVERY_CAPABLE)
+		bp->fw_cap |= BNXT_FW_CAP_ERROR_RECOVERY;
 
 	bp->tx_push_thresh = 0;
 	if (flags & FUNC_QCAPS_RESP_FLAGS_PUSH_MODE_SUPPORTED)
@@ -6948,6 +6950,74 @@ static int bnxt_hwrm_cfa_adv_flow_mgnt_qcaps(struct bnxt *bp)
 	return rc;
 }
 
+static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
+{
+	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	struct hwrm_error_recovery_qcfg_input req = {0};
+	int rc, i;
+
+	if (!(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
+		return 0;
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_ERROR_RECOVERY_QCFG, -1, -1);
+	mutex_lock(&bp->hwrm_cmd_lock);
+	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+	if (rc)
+		goto err_recovery_out;
+	if (!fw_health) {
+		fw_health = kzalloc(sizeof(*fw_health), GFP_KERNEL);
+		bp->fw_health = fw_health;
+		if (!fw_health) {
+			rc = -ENOMEM;
+			goto err_recovery_out;
+		}
+	}
+	fw_health->flags = le32_to_cpu(resp->flags);
+	if ((fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU) &&
+	    !(bp->fw_cap & BNXT_FW_CAP_KONG_MB_CHNL)) {
+		rc = -EINVAL;
+		goto err_recovery_out;
+	}
+	fw_health->polling_dsecs = le32_to_cpu(resp->driver_polling_freq);
+	fw_health->master_func_wait_dsecs =
+		le32_to_cpu(resp->master_func_wait_period);
+	fw_health->normal_func_wait_dsecs =
+		le32_to_cpu(resp->normal_func_wait_period);
+	fw_health->post_reset_wait_dsecs =
+		le32_to_cpu(resp->master_func_wait_period_after_reset);
+	fw_health->post_reset_max_wait_dsecs =
+		le32_to_cpu(resp->max_bailout_time_after_reset);
+	fw_health->regs[BNXT_FW_HEALTH_REG] =
+		le32_to_cpu(resp->fw_health_status_reg);
+	fw_health->regs[BNXT_FW_HEARTBEAT_REG] =
+		le32_to_cpu(resp->fw_heartbeat_reg);
+	fw_health->regs[BNXT_FW_RESET_CNT_REG] =
+		le32_to_cpu(resp->fw_reset_cnt_reg);
+	fw_health->regs[BNXT_FW_RESET_INPROG_REG] =
+		le32_to_cpu(resp->reset_inprogress_reg);
+	fw_health->fw_reset_inprog_reg_mask =
+		le32_to_cpu(resp->reset_inprogress_reg_mask);
+	fw_health->fw_reset_seq_cnt = resp->reg_array_cnt;
+	if (fw_health->fw_reset_seq_cnt >= 16) {
+		rc = -EINVAL;
+		goto err_recovery_out;
+	}
+	for (i = 0; i < fw_health->fw_reset_seq_cnt; i++) {
+		fw_health->fw_reset_seq_regs[i] =
+			le32_to_cpu(resp->reset_reg[i]);
+		fw_health->fw_reset_seq_vals[i] =
+			le32_to_cpu(resp->reset_reg_val[i]);
+		fw_health->fw_reset_seq_delay_msec[i] =
+			resp->delay_after_reset[i];
+	}
+err_recovery_out:
+	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (rc)
+		bp->fw_cap &= ~BNXT_FW_CAP_ERROR_RECOVERY;
+	return rc;
+}
+
 static int bnxt_hwrm_func_reset(struct bnxt *bp)
 {
 	struct hwrm_func_reset_input req = {0};
@@ -10058,6 +10128,11 @@ static int bnxt_fw_init_one_p2(struct bnxt *bp)
 		netdev_warn(bp->dev, "hwrm query adv flow mgnt failure rc: %d\n",
 			    rc);
 
+	rc = bnxt_hwrm_error_recovery_qcfg(bp);
+	if (rc)
+		netdev_warn(bp->dev, "hwrm query error recovery failure rc: %d\n",
+			    rc);
+
 	rc = bnxt_hwrm_func_drv_rgtr(bp);
 	if (rc)
 		return -ENODEV;
@@ -11238,6 +11313,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	bnxt_free_ctx_mem(bp);
 	kfree(bp->ctx);
 	bp->ctx = NULL;
+	kfree(bp->fw_health);
+	bp->fw_health = NULL;
 	bnxt_cleanup_pci(bp);
 
 init_err_free:
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3e0fcc2..ce535e5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1333,6 +1333,41 @@ struct bnxt_ctx_mem_info {
 	struct bnxt_ctx_pg_info *tqm_mem[9];
 };
 
+struct bnxt_fw_health {
+	u32 flags;
+	u32 polling_dsecs;
+	u32 master_func_wait_dsecs;
+	u32 normal_func_wait_dsecs;
+	u32 post_reset_wait_dsecs;
+	u32 post_reset_max_wait_dsecs;
+	u32 regs[4];
+	u32 mapped_regs[4];
+#define BNXT_FW_HEALTH_REG		0
+#define BNXT_FW_HEARTBEAT_REG		1
+#define BNXT_FW_RESET_CNT_REG		2
+#define BNXT_FW_RESET_INPROG_REG	3
+	u32 fw_reset_inprog_reg_mask;
+	u32 last_fw_heartbeat;
+	u32 last_fw_reset_cnt;
+	u8 enabled:1;
+	u8 master:1;
+	u8 tmr_multiplier;
+	u8 tmr_counter;
+	u8 fw_reset_seq_cnt;
+	u32 fw_reset_seq_regs[16];
+	u32 fw_reset_seq_vals[16];
+	u32 fw_reset_seq_delay_msec[16];
+};
+
+#define BNXT_FW_HEALTH_REG_TYPE_MASK	3
+#define BNXT_FW_HEALTH_REG_TYPE_CFG	0
+#define BNXT_FW_HEALTH_REG_TYPE_GRC	1
+#define BNXT_FW_HEALTH_REG_TYPE_BAR0	2
+#define BNXT_FW_HEALTH_REG_TYPE_BAR1	3
+
+#define BNXT_FW_HEALTH_REG_TYPE(reg)	((reg) & BNXT_FW_HEALTH_REG_TYPE_MASK)
+#define BNXT_FW_HEALTH_REG_OFF(reg)	((reg) & ~BNXT_FW_HEALTH_REG_TYPE_MASK)
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
@@ -1581,6 +1616,7 @@ struct bnxt {
 	#define BNXT_FW_CAP_KONG_MB_CHNL		0x00000080
 	#define BNXT_FW_CAP_OVS_64BIT_HANDLE		0x00000400
 	#define BNXT_FW_CAP_TRUSTED_VF			0x00000800
+	#define BNXT_FW_CAP_ERROR_RECOVERY		0x00002000
 	#define BNXT_FW_CAP_PKG_VER			0x00004000
 	#define BNXT_FW_CAP_CFA_ADV_FLOW		0x00008000
 	#define BNXT_FW_CAP_CFA_RFS_RING_TBL_IDX	0x00010000
@@ -1666,6 +1702,8 @@ struct bnxt {
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
 
+	struct bnxt_fw_health	*fw_health;
+
 	struct bnxt_hw_resc	hw_resc;
 	struct bnxt_pf_info	pf;
 	struct bnxt_ctx_mem_info	*ctx;
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 11/22] bnxt_en: Pre-map the firmware health monitoring registers.
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

Pre-map the GRC registers for periodic firmware health monitoring.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 29 +++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  6 ++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 825a7f9..8ec41d6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6950,6 +6950,33 @@ static int bnxt_hwrm_cfa_adv_flow_mgnt_qcaps(struct bnxt *bp)
 	return rc;
 }
 
+static int bnxt_map_fw_health_regs(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg_base = 0xffffffff;
+	int i;
+
+	/* Only pre-map the monitoring GRC registers using window 3 */
+	for (i = 0; i < 4; i++) {
+		u32 reg = fw_health->regs[i];
+
+		if (BNXT_FW_HEALTH_REG_TYPE(reg) != BNXT_FW_HEALTH_REG_TYPE_GRC)
+			continue;
+		if (reg_base == 0xffffffff)
+			reg_base = reg & BNXT_GRC_BASE_MASK;
+		if ((reg & BNXT_GRC_BASE_MASK) != reg_base)
+			return -ERANGE;
+		fw_health->mapped_regs[i] = BNXT_FW_HEALTH_WIN_BASE +
+					    (reg & BNXT_GRC_OFFSET_MASK);
+	}
+	if (reg_base == 0xffffffff)
+		return 0;
+
+	writel(reg_base, bp->bar0 + BNXT_GRCPF_REG_WINDOW_BASE_OUT +
+			 BNXT_FW_HEALTH_WIN_MAP_OFF);
+	return 0;
+}
+
 static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
 {
 	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
@@ -7013,6 +7040,8 @@ static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
 	}
 err_recovery_out:
 	mutex_unlock(&bp->hwrm_cmd_lock);
+	if (!rc)
+		rc = bnxt_map_fw_health_regs(bp);
 	if (rc)
 		bp->fw_cap &= ~BNXT_FW_CAP_ERROR_RECOVERY;
 	return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index ce535e5..78fd585 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1217,6 +1217,9 @@ struct bnxt_test_info {
 #define BNXT_GRCPF_REG_KONG_COMM		0xA00
 #define BNXT_GRCPF_REG_KONG_COMM_TRIGGER	0xB00
 
+#define BNXT_GRC_BASE_MASK			0xfffff000
+#define BNXT_GRC_OFFSET_MASK			0x00000ffc
+
 struct bnxt_tc_flow_stats {
 	u64		packets;
 	u64		bytes;
@@ -1368,6 +1371,9 @@ struct bnxt_fw_health {
 #define BNXT_FW_HEALTH_REG_TYPE(reg)	((reg) & BNXT_FW_HEALTH_REG_TYPE_MASK)
 #define BNXT_FW_HEALTH_REG_OFF(reg)	((reg) & ~BNXT_FW_HEALTH_REG_TYPE_MASK)
 
+#define BNXT_FW_HEALTH_WIN_BASE		0x3000
+#define BNXT_FW_HEALTH_WIN_MAP_OFF	8
+
 struct bnxt {
 	void __iomem		*bar0;
 	void __iomem		*bar1;
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 12/22] bnxt_en: Enable health monitoring.
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

Handle the async event from the firmware that enables firmware health
monitoring.  Store initial health metrics.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 66 ++++++++++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  9 +++++
 2 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8ec41d6..27fc047 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -254,6 +254,7 @@ static const u16 bnxt_async_events_arr[] = {
 	ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
 	ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
 	ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY,
 };
 
 static struct workqueue_struct *bnxt_pf_wq;
@@ -1896,6 +1897,33 @@ static int bnxt_force_rx_discard(struct bnxt *bp,
 	return bnxt_rx_pkt(bp, cpr, raw_cons, event);
 }
 
+u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg = fw_health->regs[reg_idx];
+	u32 reg_type, reg_off, val = 0;
+
+	reg_type = BNXT_FW_HEALTH_REG_TYPE(reg);
+	reg_off = BNXT_FW_HEALTH_REG_OFF(reg);
+	switch (reg_type) {
+	case BNXT_FW_HEALTH_REG_TYPE_CFG:
+		pci_read_config_dword(bp->pdev, reg_off, &val);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_GRC:
+		reg_off = fw_health->mapped_regs[reg_idx];
+		/* fall through */
+	case BNXT_FW_HEALTH_REG_TYPE_BAR0:
+		val = readl(bp->bar0 + reg_off);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_BAR1:
+		val = readl(bp->bar1 + reg_off);
+		break;
+	}
+	if (reg_idx == BNXT_FW_RESET_INPROG_REG)
+		val &= fw_health->fw_reset_inprog_reg_mask;
+	return val;
+}
+
 #define BNXT_GET_EVENT_PORT(data)	\
 	((data) &			\
 	 ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_PORT_ID_MASK)
@@ -1951,6 +1979,35 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
+	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
+		struct bnxt_fw_health *fw_health = bp->fw_health;
+		u32 data1 = le32_to_cpu(cmpl->event_data1);
+
+		if (!fw_health)
+			goto async_event_process_exit;
+
+		fw_health->enabled = EVENT_DATA1_RECOVERY_ENABLED(data1);
+		fw_health->master = EVENT_DATA1_RECOVERY_MASTER_FUNC(data1);
+		if (!fw_health->enabled)
+			break;
+
+		if (netif_msg_drv(bp))
+			netdev_info(bp->dev, "Error recovery info: error recovery[%d], master[%d], reset count[0x%x], health status: 0x%x\n",
+				    fw_health->enabled, fw_health->master,
+				    bnxt_fw_health_readl(bp,
+							 BNXT_FW_RESET_CNT_REG),
+				    bnxt_fw_health_readl(bp,
+							 BNXT_FW_HEALTH_REG));
+		fw_health->tmr_multiplier =
+			DIV_ROUND_UP(fw_health->polling_dsecs * HZ,
+				     bp->current_interval * 10);
+		fw_health->tmr_counter = fw_health->tmr_multiplier;
+		fw_health->last_fw_heartbeat =
+			bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+		fw_health->last_fw_reset_cnt =
+			bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+		goto async_event_process_exit;
+	}
 	default:
 		goto async_event_process_exit;
 	}
@@ -4310,9 +4367,14 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap,
 		cpu_to_le32(FUNC_DRV_RGTR_REQ_ENABLES_ASYNC_EVENT_FWD);
 
 	memset(async_events_bmap, 0, sizeof(async_events_bmap));
-	for (i = 0; i < ARRAY_SIZE(bnxt_async_events_arr); i++)
-		__set_bit(bnxt_async_events_arr[i], async_events_bmap);
+	for (i = 0; i < ARRAY_SIZE(bnxt_async_events_arr); i++) {
+		u16 event_id = bnxt_async_events_arr[i];
 
+		if (event_id == ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY &&
+		    !(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
+			continue;
+		__set_bit(bnxt_async_events_arr[i], async_events_bmap);
+	}
 	if (bmap && bmap_size) {
 		for (i = 0; i < bmap_size; i++) {
 			if (test_bit(i, bmap))
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 78fd585..eb55519 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -472,6 +472,14 @@ struct rx_tpa_end_cmp_ext {
 	((le32_to_cpu((rx_tpa_end_ext)->rx_tpa_end_cmp_dup_acks) &	\
 	 RX_TPA_END_CMP_AGG_BUFS_P5) >> RX_TPA_END_CMP_AGG_BUFS_SHIFT_P5)
 
+#define EVENT_DATA1_RECOVERY_MASTER_FUNC(data1)				\
+	!!((data1) &							\
+	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC)
+
+#define EVENT_DATA1_RECOVERY_ENABLED(data1)				\
+	!!((data1) &							\
+	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_RECOVERY_ENABLED)
+
 struct nqe_cn {
 	__le16	type;
 	#define NQ_CN_TYPE_MASK           0x3fUL
@@ -1914,6 +1922,7 @@ extern const u16 bnxt_lhint_arr[];
 int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 		       u16 prod, gfp_t gfp);
 void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons, void *data);
+u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx);
 void bnxt_set_tpa_flags(struct bnxt *bp);
 void bnxt_set_ring_params(struct bnxt *);
 int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode);
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 02/22] bnxt_en: Remove the -1 error return code from bnxt_hwrm_do_send_msg().
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

Replace the non-standard -1 code with -EBUSY when there is no firmware
response after waiting for the maximum timeout.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b9ad43d..c8550ca 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4162,7 +4162,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 		if (bp->hwrm_intr_seq_id != (u16)~seq_id) {
 			netdev_err(bp->dev, "Resp cmpl intr err msg: 0x%x\n",
 				   le16_to_cpu(req->req_type));
-			return -1;
+			return -EBUSY;
 		}
 		len = (le32_to_cpu(*resp_len) & HWRM_RESP_LEN_MASK) >>
 		      HWRM_RESP_LEN_SFT;
@@ -4190,7 +4190,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 				   HWRM_TOTAL_TIMEOUT(i),
 				   le16_to_cpu(req->req_type),
 				   le16_to_cpu(req->seq_id), len);
-			return -1;
+			return -EBUSY;
 		}
 
 		/* Last byte of resp contains valid bit */
@@ -4208,7 +4208,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 				   HWRM_TOTAL_TIMEOUT(i),
 				   le16_to_cpu(req->req_type),
 				   le16_to_cpu(req->seq_id), len, *valid);
-			return -1;
+			return -EBUSY;
 		}
 	}
 
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 15/22] bnxt_en: Handle RESET_NOTIFY async event from firmware.
From: Michael Chan @ 2019-08-30  3:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

This event from firmware signals a coordinated reset initiated by the
firmware.  It may be triggered by some error conditions encountered
in the firmware or other orderly reset conditions.

We store the parameters from this event.  Subsequent patches will
add logic to handle reset itself using devlink reporters.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 11 +++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  7 +++++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4caacab..d1d33f6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -254,6 +254,7 @@ static const u16 bnxt_async_events_arr[] = {
 	ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
 	ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
 	ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY,
 	ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY,
 };
 
@@ -1979,6 +1980,16 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
+	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+		bp->fw_reset_timestamp = jiffies;
+		bp->fw_reset_min_dsecs = cmpl->timestamp_lo;
+		if (!bp->fw_reset_min_dsecs)
+			bp->fw_reset_min_dsecs = BNXT_DFLT_FW_RST_MIN_DSECS;
+		bp->fw_reset_max_dsecs = le16_to_cpu(cmpl->timestamp_hi);
+		if (!bp->fw_reset_max_dsecs)
+			bp->fw_reset_max_dsecs = BNXT_DFLT_FW_RST_MAX_DSECS;
+		set_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event);
+		break;
 	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
 		struct bnxt_fw_health *fw_health = bp->fw_health;
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index a75fe16..858dc40 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1719,6 +1719,13 @@ struct bnxt {
 #define BNXT_FLOW_STATS_SP_EVENT	15
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
+#define BNXT_FW_RESET_NOTIFY_SP_EVENT	18
+
+	u16			fw_reset_min_dsecs;
+#define BNXT_DFLT_FW_RST_MIN_DSECS	20
+	u16			fw_reset_max_dsecs;
+#define BNXT_DFLT_FW_RST_MAX_DSECS	60
+	unsigned long		fw_reset_timestamp;
 
 	struct bnxt_fw_health	*fw_health;
 
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 17/22] bnxt_en: Add devlink health reset reporter.
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui, Jiri Pirko
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Add devlink health reporter for the firmware reset event.  Once we get
the notification from firmware about the impending reset, the driver
will report this to devlink and the call to bnxt_fw_reset() will be
initiated to complete the reset sequence.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         |  3 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  5 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 52 +++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  1 +
 4 files changed, 61 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 98b15551..cd20067 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10167,6 +10167,9 @@ static void bnxt_sp_task(struct work_struct *work)
 	if (test_and_clear_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event))
 		bnxt_reset(bp, true);
 
+	if (test_and_clear_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event))
+		bnxt_devlink_health_report(bp, BNXT_FW_RESET_NOTIFY_SP_EVENT);
+
 	smp_mb__before_atomic();
 	clear_bit(BNXT_STATE_IN_SP_TASK, &bp->state);
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index c78aa51..d65625f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1371,6 +1371,11 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_vals[16];
 	u32 fw_reset_seq_delay_msec[16];
 	struct devlink_health_reporter	*fw_reporter;
+	struct devlink_health_reporter *fw_reset_reporter;
+};
+
+struct bnxt_fw_reporter_ctx {
+	unsigned long sp_event;
 };
 
 #define BNXT_FW_HEALTH_REG_TYPE_MASK	3
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index e15d5dc..8512467 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -65,6 +65,24 @@ static const struct devlink_health_reporter_ops bnxt_dl_fw_reporter_ops = {
 	.diagnose = bnxt_fw_reporter_diagnose,
 };
 
+static int bnxt_fw_reset_recover(struct devlink_health_reporter *reporter,
+				 void *priv_ctx)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+
+	if (!priv_ctx)
+		return -EOPNOTSUPP;
+
+	bnxt_fw_reset(bp);
+	return 0;
+}
+
+static const
+struct devlink_health_reporter_ops bnxt_dl_fw_reset_reporter_ops = {
+	.name = "fw_reset",
+	.recover = bnxt_fw_reset_recover,
+};
+
 static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 {
 	struct bnxt_fw_health *health = bp->fw_health;
@@ -80,6 +98,16 @@ static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 			    PTR_ERR(health->fw_reporter));
 		health->fw_reporter = NULL;
 	}
+
+	health->fw_reset_reporter =
+		devlink_health_reporter_create(bp->dl,
+					       &bnxt_dl_fw_reset_reporter_ops,
+					       0, true, bp);
+	if (IS_ERR(health->fw_reset_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_reset_reporter));
+		health->fw_reset_reporter = NULL;
+	}
 }
 
 static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
@@ -91,6 +119,30 @@ static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
 
 	if (health->fw_reporter)
 		devlink_health_reporter_destroy(health->fw_reporter);
+
+	if (health->fw_reset_reporter)
+		devlink_health_reporter_destroy(health->fw_reset_reporter);
+}
+
+void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	struct bnxt_fw_reporter_ctx fw_reporter_ctx;
+
+	if (!fw_health)
+		return;
+
+	fw_reporter_ctx.sp_event = event;
+	switch (event) {
+	case BNXT_FW_RESET_NOTIFY_SP_EVENT:
+		if (!fw_health->fw_reset_reporter)
+			return;
+
+		devlink_health_report(fw_health->fw_reset_reporter,
+				      "FW non-fatal reset event received",
+				      &fw_reporter_ctx);
+		return;
+	}
 }
 
 static const struct devlink_ops bnxt_dl_ops = {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
index 5b6b2c7..b97e0ba 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -55,6 +55,7 @@ struct bnxt_dl_nvm_param {
 	u16 num_bits;
 };
 
+void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event);
 int bnxt_dl_register(struct bnxt *bp);
 void bnxt_dl_unregister(struct bnxt *bp);
 
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 18/22] bnxt_en: Retain user settings on a VF after RESET_NOTIFY event.
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Retain the VF MAC address, default VLAN, TX rate control, trust settings
of VFs after firmware reset.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 50 +++++++++++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h |  2 +-
 3 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index cd20067..ff911d6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9211,7 +9211,7 @@ static int bnxt_open(struct net_device *dev)
 			int n = pf->active_vfs;
 
 			if (n)
-				bnxt_cfg_hw_sriov(bp, &n);
+				bnxt_cfg_hw_sriov(bp, &n, true);
 		}
 		bnxt_hwmon_open(bp);
 	}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 4aacfc3..f6f3454 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -470,10 +470,43 @@ static int bnxt_hwrm_func_buf_rgtr(struct bnxt *bp)
 	return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
 }
 
+/* Caller holds bp->hwrm_cmd_lock mutex lock */
+static void __bnxt_set_vf_params(struct bnxt *bp, int vf_id)
+{
+	struct hwrm_func_cfg_input req = {0};
+	struct bnxt_vf_info *vf;
+
+	vf = &bp->pf.vf[vf_id];
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_CFG, -1, -1);
+	req.fid = cpu_to_le16(vf->fw_fid);
+	req.flags = cpu_to_le32(vf->func_flags);
+
+	if (is_valid_ether_addr(vf->mac_addr)) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_DFLT_MAC_ADDR);
+		memcpy(req.dflt_mac_addr, vf->mac_addr, ETH_ALEN);
+	}
+	if (vf->vlan) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_DFLT_VLAN);
+		req.dflt_vlan = cpu_to_le16(vf->vlan);
+	}
+	if (vf->max_tx_rate) {
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_MAX_BW);
+		req.max_bw = cpu_to_le32(vf->max_tx_rate);
+#ifdef HAVE_IFLA_TX_RATE
+		req.enables |= cpu_to_le32(FUNC_CFG_REQ_ENABLES_MIN_BW);
+		req.min_bw = cpu_to_le32(vf->min_tx_rate);
+#endif
+	}
+	if (vf->flags & BNXT_VF_TRUST)
+		req.flags |= cpu_to_le32(FUNC_CFG_REQ_FLAGS_TRUSTED_VF_ENABLE);
+
+	_hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
 /* Only called by PF to reserve resources for VFs, returns actual number of
  * VFs configured, or < 0 on error.
  */
-static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
+static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs, bool reset)
 {
 	struct hwrm_func_vf_resource_cfg_input req = {0};
 	struct bnxt_hw_resc *hw_resc = &bp->hw_resc;
@@ -545,6 +578,9 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int num_vfs)
 
 	mutex_lock(&bp->hwrm_cmd_lock);
 	for (i = 0; i < num_vfs; i++) {
+		if (reset)
+			__bnxt_set_vf_params(bp, i);
+
 		req.vf_id = cpu_to_le16(pf->first_vf_id + i);
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
@@ -659,15 +695,15 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
 	return rc;
 }
 
-static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
+static int bnxt_func_cfg(struct bnxt *bp, int num_vfs, bool reset)
 {
 	if (BNXT_NEW_RM(bp))
-		return bnxt_hwrm_func_vf_resc_cfg(bp, num_vfs);
+		return bnxt_hwrm_func_vf_resc_cfg(bp, num_vfs, reset);
 	else
 		return bnxt_hwrm_func_cfg(bp, num_vfs);
 }
 
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset)
 {
 	int rc;
 
@@ -677,7 +713,7 @@ int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
 		return rc;
 
 	/* Reserve resources for VFs */
-	rc = bnxt_func_cfg(bp, *num_vfs);
+	rc = bnxt_func_cfg(bp, *num_vfs, reset);
 	if (rc != *num_vfs) {
 		if (rc <= 0) {
 			netdev_warn(bp->dev, "Unable to reserve resources for SRIOV.\n");
@@ -758,7 +794,7 @@ static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 	if (rc)
 		goto err_out1;
 
-	rc = bnxt_cfg_hw_sriov(bp, num_vfs);
+	rc = bnxt_cfg_hw_sriov(bp, num_vfs, false);
 	if (rc)
 		goto err_out2;
 
@@ -1144,7 +1180,7 @@ int bnxt_approve_mac(struct bnxt *bp, u8 *mac, bool strict)
 }
 #else
 
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset)
 {
 	if (*num_vfs)
 		return -EOPNOTSUPP;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
index 0abf18e..629641b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.h
@@ -36,7 +36,7 @@ int bnxt_set_vf_link_state(struct net_device *, int, int);
 int bnxt_set_vf_spoofchk(struct net_device *, int, bool);
 int bnxt_set_vf_trust(struct net_device *dev, int vf_id, bool trust);
 int bnxt_sriov_configure(struct pci_dev *pdev, int num_vfs);
-int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs);
+int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs, bool reset);
 void bnxt_sriov_disable(struct bnxt *);
 void bnxt_hwrm_exec_fwd_req(struct bnxt *);
 void bnxt_update_vf_mac(struct bnxt *);
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 19/22] bnxt_en: Do not send firmware messages if firmware is in error state.
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

Add a flag to mark that the firmware has encountered fatal condition.
The driver will not send any more firmware messages and will return
error to the caller.  Fix up some clean up functions to continue
and not abort when the firmware message function returns error.

This is preparation work to fully handle firmware error recovery
under fatal conditions.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 +++++++++++-------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ff911d6..f584a6b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4173,6 +4173,9 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 	u32 bar_offset = BNXT_GRCPF_REG_CHIMP_COMM;
 	u16 dst = BNXT_HWRM_CHNL_CHIMP;
 
+	if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return -EBUSY;
+
 	if (msg_len > BNXT_HWRM_MAX_REQ_LEN) {
 		if (msg_len > bp->hwrm_max_ext_req_len ||
 		    !bp->hwrm_short_cmd_req_addr)
@@ -5042,8 +5045,6 @@ static int bnxt_hwrm_vnic_free_one(struct bnxt *bp, u16 vnic_id)
 			cpu_to_le32(bp->vnic_info[vnic_id].fw_vnic_id);
 
 		rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
-		if (rc)
-			return rc;
 		bp->vnic_info[vnic_id].fw_vnic_id = INVALID_HW_RING_ID;
 	}
 	return rc;
@@ -5183,8 +5184,6 @@ static int bnxt_hwrm_ring_grp_free(struct bnxt *bp)
 
 		rc = _hwrm_send_message(bp, &req, sizeof(req),
 					HWRM_CMD_TIMEOUT);
-		if (rc)
-			break;
 		bp->grp_info[i].fw_grp_id = INVALID_HW_RING_ID;
 	}
 	mutex_unlock(&bp->hwrm_cmd_lock);
@@ -5503,6 +5502,9 @@ static int hwrm_ring_free_send_msg(struct bnxt *bp,
 	struct hwrm_ring_free_output *resp = bp->hwrm_cmd_resp_addr;
 	u16 error_code;
 
+	if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return 0;
+
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_RING_FREE, cmpl_ring_id, -1);
 	req.ring_type = ring_type;
 	req.ring_id = cpu_to_le16(ring->fw_ring_id);
@@ -6300,8 +6302,6 @@ static int bnxt_hwrm_stat_ctx_free(struct bnxt *bp)
 
 			rc = _hwrm_send_message(bp, &req, sizeof(req),
 						HWRM_CMD_TIMEOUT);
-			if (rc)
-				break;
 
 			cpr->hw_stats_ctx_id = INVALID_STATS_CTX_ID;
 		}
@@ -7415,6 +7415,8 @@ static int bnxt_set_tpa(struct bnxt *bp, bool set_tpa)
 
 	if (set_tpa)
 		tpa_flags = bp->flags & BNXT_FLAG_TPA;
+	else if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+		return 0;
 	for (i = 0; i < bp->nr_vnics; i++) {
 		rc = bnxt_hwrm_vnic_set_tpa(bp, i, tpa_flags);
 		if (rc) {
@@ -10009,7 +10011,8 @@ void bnxt_fw_reset(struct bnxt *bp)
 	if (test_bit(BNXT_STATE_OPEN, &bp->state) &&
 	    !test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
 		set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
-		if (BNXT_PF(bp) && bp->pf.active_vfs) {
+		if (BNXT_PF(bp) && bp->pf.active_vfs &&
+		    !test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) {
 			rc = bnxt_hwrm_func_qcfg(bp);
 			if (rc) {
 				netdev_err(bp->dev, "Firmware reset aborted, first func_qcfg cmd failed, rc = %d\n",
@@ -10439,6 +10442,7 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
 		return;
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
 			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
 			goto fw_reset_abort;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index d65625f..f3a6aad 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1617,6 +1617,7 @@ struct bnxt {
 #define BNXT_STATE_FW_RESET_DET 3
 #define BNXT_STATE_IN_FW_RESET	4
 #define BNXT_STATE_ABORT_ERR	5
+#define BNXT_STATE_FW_FATAL_COND	6
 
 	struct bnxt_irq	*irq_tbl;
 	int			total_irqs;
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 20/22] bnxt_en: Add RESET_FW state logic to bnxt_fw_reset_task().
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

This state handles driver initiated chip reset during error recovery.
Only the master function will perform this step during error recovery.
The next patch will add code to initiate this reset from the master
function.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 64 +++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f584a6b..51cf679 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10402,6 +10402,62 @@ static int bnxt_fw_init_one(struct bnxt *bp)
 	return 0;
 }
 
+static void bnxt_fw_reset_writel(struct bnxt *bp, int reg_idx)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 reg = fw_health->fw_reset_seq_regs[reg_idx];
+	u32 val = fw_health->fw_reset_seq_vals[reg_idx];
+	u32 reg_type, reg_off, delay_msecs;
+
+	delay_msecs = fw_health->fw_reset_seq_delay_msec[reg_idx];
+	reg_type = BNXT_FW_HEALTH_REG_TYPE(reg);
+	reg_off = BNXT_FW_HEALTH_REG_OFF(reg);
+	switch (reg_type) {
+	case BNXT_FW_HEALTH_REG_TYPE_CFG:
+		pci_write_config_dword(bp->pdev, reg_off, val);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_GRC:
+		writel(reg_off & BNXT_GRC_BASE_MASK,
+		       bp->bar0 + BNXT_GRCPF_REG_WINDOW_BASE_OUT + 4);
+		reg_off = (reg_off & BNXT_GRC_OFFSET_MASK) + 0x2000;
+		/* fall through */
+	case BNXT_FW_HEALTH_REG_TYPE_BAR0:
+		writel(val, bp->bar0 + reg_off);
+		break;
+	case BNXT_FW_HEALTH_REG_TYPE_BAR1:
+		writel(val, bp->bar1 + reg_off);
+		break;
+	}
+	if (delay_msecs) {
+		pci_read_config_dword(bp->pdev, 0, &val);
+		msleep(delay_msecs);
+	}
+}
+
+static void bnxt_reset_all(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	int i;
+
+	if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_HOST) {
+		for (i = 0; i < fw_health->fw_reset_seq_cnt; i++)
+			bnxt_fw_reset_writel(bp, i);
+	} else if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU) {
+		struct hwrm_fw_reset_input req = {0};
+		int rc;
+
+		bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FW_RESET, -1, -1);
+		req.resp_addr = cpu_to_le64(bp->hwrm_cmd_kong_resp_dma_addr);
+		req.embedded_proc_type = FW_RESET_REQ_EMBEDDED_PROC_TYPE_CHIP;
+		req.selfrst_status = FW_RESET_REQ_SELFRST_STATUS_SELFRSTASAP;
+		req.flags = FW_RESET_REQ_FLAGS_RESET_GRACEFUL;
+		rc = hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+		if (rc)
+			netdev_warn(bp->dev, "Unable to reset FW rc=%d\n", rc);
+	}
+	bp->fw_reset_timestamp = jiffies;
+}
+
 static void bnxt_fw_reset_task(struct work_struct *work)
 {
 	struct bnxt *bp = container_of(work, struct bnxt, fw_reset_task.work);
@@ -10441,6 +10497,14 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		rtnl_unlock();
 		bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
 		return;
+	case BNXT_FW_RESET_STATE_RESET_FW: {
+		u32 wait_dsecs = bp->fw_health->post_reset_wait_dsecs;
+
+		bnxt_reset_all(bp);
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+		bnxt_queue_fw_reset_work(bp, wait_dsecs * HZ / 10);
+		return;
+	}
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
 		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 21/22] bnxt_en: Add bnxt_fw_exception() to handle fatal firmware errors.
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

This call will handle fatal firmware errors by forcing a reset on the
firmware.  The master function driver will carry out the forced reset.
The sequence will go through the same bnxt_fw_reset_task() workqueue.
This fatal reset differs from the non-fatal reset at the beginning
stages.  From the BNXT_FW_RESET_STATE_ENABLE_DEV state onwards where
the firmware is coming out of reset, it is practically identical to the
non-fatal reset.

The next patch will add the periodic heartbeat check and the devlink
reporter to report the fatal event and to initiate the bnxt_fw_exception()
call.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 44 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 45 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 51cf679..5c7379e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10003,6 +10003,40 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
 	bp->ctx = NULL;
 }
 
+/* rtnl_lock is acquired before calling this function */
+static void bnxt_force_fw_reset(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 wait_dsecs;
+
+	if (!test_bit(BNXT_STATE_OPEN, &bp->state) ||
+	    test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return;
+
+	set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+	bnxt_fw_reset_close(bp);
+	wait_dsecs = fw_health->master_func_wait_dsecs;
+	if (fw_health->master) {
+		if (fw_health->flags & ERROR_RECOVERY_QCFG_RESP_FLAGS_CO_CPU)
+			wait_dsecs = 0;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_RESET_FW;
+	} else {
+		bp->fw_reset_timestamp = jiffies + wait_dsecs * HZ / 10;
+		wait_dsecs = fw_health->normal_func_wait_dsecs;
+		bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
+	}
+	bp->fw_reset_max_dsecs = fw_health->post_reset_max_wait_dsecs;
+	bnxt_queue_fw_reset_work(bp, wait_dsecs * HZ / 10);
+}
+
+void bnxt_fw_exception(struct bnxt *bp)
+{
+	set_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
+	bnxt_rtnl_lock_sp(bp);
+	bnxt_force_fw_reset(bp);
+	bnxt_rtnl_unlock_sp(bp);
+}
+
 void bnxt_fw_reset(struct bnxt *bp)
 {
 	int rc;
@@ -10506,6 +10540,16 @@ static void bnxt_fw_reset_task(struct work_struct *work)
 		return;
 	}
 	case BNXT_FW_RESET_STATE_ENABLE_DEV:
+		if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state) &&
+		    bp->fw_health) {
+			u32 val;
+
+			val = bnxt_fw_health_readl(bp,
+						   BNXT_FW_RESET_INPROG_REG);
+			if (val)
+				netdev_warn(bp->dev, "FW reset inprog %x after min wait time.\n",
+					    val);
+		}
 		clear_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
 		if (pci_enable_device(bp->pdev)) {
 			netdev_err(bp->dev, "Cannot re-enable PCI device\n");
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index f3a6aad..3459b2a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1982,6 +1982,7 @@ int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_half_open_nic(struct bnxt *bp);
 void bnxt_half_close_nic(struct bnxt *bp);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+void bnxt_fw_exception(struct bnxt *bp);
 void bnxt_fw_reset(struct bnxt *bp);
 int bnxt_check_rings(struct bnxt *bp, int tx, int rx, bool sh, int tcs,
 		     int tx_xdp);
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next v2 22/22] bnxt_en: Add FW fatal devlink_health_reporter.
From: Michael Chan @ 2019-08-30  3:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, vasundhara-v.volam, ray.jui, Jiri Pirko
In-Reply-To: <1567137305-5853-1-git-send-email-michael.chan@broadcom.com>

From: Vasundhara Volam <vasundhara-v.volam@broadcom.com>

Health show command example and output:

$ devlink health show pci/0000:af:00.0 reporter fw_fatal

pci/0000:af:00.0:
  name fw_fatal
    state healthy error 1 recover 1 grace_period 0 auto_recover true

Fatal events from firmware or missing periodic heartbeats will
be reported and recovery will be handled.

We also turn on the support flags when we register with the firmware to
enable this health and recovery feature in the firmware.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 80 ++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         |  7 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 56 ++++++++++++++++
 3 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5c7379e..f8a834f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1988,7 +1988,9 @@ static int bnxt_async_event_process(struct bnxt *bp,
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
 		break;
-	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY:
+	case ASYNC_EVENT_CMPL_EVENT_ID_RESET_NOTIFY: {
+		u32 data1 = le32_to_cpu(cmpl->event_data1);
+
 		bp->fw_reset_timestamp = jiffies;
 		bp->fw_reset_min_dsecs = cmpl->timestamp_lo;
 		if (!bp->fw_reset_min_dsecs)
@@ -1996,8 +1998,16 @@ static int bnxt_async_event_process(struct bnxt *bp,
 		bp->fw_reset_max_dsecs = le16_to_cpu(cmpl->timestamp_hi);
 		if (!bp->fw_reset_max_dsecs)
 			bp->fw_reset_max_dsecs = BNXT_DFLT_FW_RST_MAX_DSECS;
+		if (EVENT_DATA1_RESET_NOTIFY_FATAL(data1)) {
+			netdev_warn(bp->dev, "Firmware fatal reset event received\n");
+			set_bit(BNXT_STATE_FW_FATAL_COND, &bp->state);
+		} else {
+			netdev_warn(bp->dev, "Firmware non-fatal reset event received, max wait time %d msec\n",
+				    bp->fw_reset_max_dsecs * 100);
+		}
 		set_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event);
 		break;
+	}
 	case ASYNC_EVENT_CMPL_EVENT_ID_ERROR_RECOVERY: {
 		struct bnxt_fw_health *fw_health = bp->fw_health;
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
@@ -4414,6 +4424,7 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 {
 	struct hwrm_func_drv_rgtr_output *resp = bp->hwrm_cmd_resp_addr;
 	struct hwrm_func_drv_rgtr_input req = {0};
+	u32 flags;
 	int rc;
 
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_DRV_RGTR, -1, -1);
@@ -4423,7 +4434,11 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 			    FUNC_DRV_RGTR_REQ_ENABLES_VER);
 
 	req.os_type = cpu_to_le16(FUNC_DRV_RGTR_REQ_OS_TYPE_LINUX);
-	req.flags = cpu_to_le32(FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE);
+	flags = FUNC_DRV_RGTR_REQ_FLAGS_16BIT_VER_MODE |
+		FUNC_DRV_RGTR_REQ_FLAGS_HOT_RESET_SUPPORT;
+	if (bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY)
+		flags |= FUNC_DRV_RGTR_REQ_FLAGS_ERROR_RECOVERY_SUPPORT;
+	req.flags = cpu_to_le32(flags);
 	req.ver_maj_8b = DRV_VER_MAJ;
 	req.ver_min_8b = DRV_VER_MIN;
 	req.ver_upd_8b = DRV_VER_UPD;
@@ -9926,6 +9941,38 @@ static void bnxt_tx_timeout(struct net_device *dev)
 	bnxt_queue_sp_work(bp);
 }
 
+static void bnxt_fw_health_check(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	u32 val;
+
+	if (!fw_health || !fw_health->enabled ||
+	    test_bit(BNXT_STATE_IN_FW_RESET, &bp->state))
+		return;
+
+	if (fw_health->tmr_counter) {
+		fw_health->tmr_counter--;
+		return;
+	}
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+	if (val == fw_health->last_fw_heartbeat)
+		goto fw_reset;
+
+	fw_health->last_fw_heartbeat = val;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	if (val != fw_health->last_fw_reset_cnt)
+		goto fw_reset;
+
+	fw_health->tmr_counter = fw_health->tmr_multiplier;
+	return;
+
+fw_reset:
+	set_bit(BNXT_FW_EXCEPTION_SP_EVENT, &bp->sp_event);
+	bnxt_queue_sp_work(bp);
+}
+
 static void bnxt_timer(struct timer_list *t)
 {
 	struct bnxt *bp = from_timer(bp, t, timer);
@@ -9937,6 +9984,9 @@ static void bnxt_timer(struct timer_list *t)
 	if (atomic_read(&bp->intr_sem) != 0)
 		goto bnxt_restart_timer;
 
+	if (bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY)
+		bnxt_fw_health_check(bp);
+
 	if (bp->link_info.link_up && (bp->flags & BNXT_FLAG_PORT_STATS) &&
 	    bp->stats_coal_ticks) {
 		set_bit(BNXT_PERIODIC_STATS_SP_EVENT, &bp->sp_event);
@@ -10003,6 +10053,26 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
 	bp->ctx = NULL;
 }
 
+static bool is_bnxt_fw_ok(struct bnxt *bp)
+{
+	struct bnxt_fw_health *fw_health = bp->fw_health;
+	bool no_heartbeat = false, has_reset = false;
+	u32 val;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_HEARTBEAT_REG);
+	if (val == fw_health->last_fw_heartbeat)
+		no_heartbeat = true;
+
+	val = bnxt_fw_health_readl(bp, BNXT_FW_RESET_CNT_REG);
+	if (val != fw_health->last_fw_reset_cnt)
+		has_reset = true;
+
+	if (!no_heartbeat && has_reset)
+		return true;
+
+	return false;
+}
+
 /* rtnl_lock is acquired before calling this function */
 static void bnxt_force_fw_reset(struct bnxt *bp)
 {
@@ -10207,6 +10277,12 @@ static void bnxt_sp_task(struct work_struct *work)
 	if (test_and_clear_bit(BNXT_FW_RESET_NOTIFY_SP_EVENT, &bp->sp_event))
 		bnxt_devlink_health_report(bp, BNXT_FW_RESET_NOTIFY_SP_EVENT);
 
+	if (test_and_clear_bit(BNXT_FW_EXCEPTION_SP_EVENT, &bp->sp_event)) {
+		if (!is_bnxt_fw_ok(bp))
+			bnxt_devlink_health_report(bp,
+						   BNXT_FW_EXCEPTION_SP_EVENT);
+	}
+
 	smp_mb__before_atomic();
 	clear_bit(BNXT_STATE_IN_SP_TASK, &bp->state);
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3459b2a..333b0a8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -472,6 +472,11 @@ struct rx_tpa_end_cmp_ext {
 	((le32_to_cpu((rx_tpa_end_ext)->rx_tpa_end_cmp_dup_acks) &	\
 	 RX_TPA_END_CMP_AGG_BUFS_P5) >> RX_TPA_END_CMP_AGG_BUFS_SHIFT_P5)
 
+#define EVENT_DATA1_RESET_NOTIFY_FATAL(data1)				\
+	(((data1) &							\
+	  ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_MASK) ==\
+	 ASYNC_EVENT_CMPL_RESET_NOTIFY_EVENT_DATA1_REASON_CODE_FW_EXCEPTION_FATAL)
+
 #define EVENT_DATA1_RECOVERY_MASTER_FUNC(data1)				\
 	!!((data1) &							\
 	   ASYNC_EVENT_CMPL_ERROR_RECOVERY_EVENT_DATA1_FLAGS_MASTER_FUNC)
@@ -1372,6 +1377,7 @@ struct bnxt_fw_health {
 	u32 fw_reset_seq_delay_msec[16];
 	struct devlink_health_reporter	*fw_reporter;
 	struct devlink_health_reporter *fw_reset_reporter;
+	struct devlink_health_reporter *fw_fatal_reporter;
 };
 
 struct bnxt_fw_reporter_ctx {
@@ -1728,6 +1734,7 @@ struct bnxt {
 #define BNXT_UPDATE_PHY_SP_EVENT	16
 #define BNXT_RING_COAL_NOW_SP_EVENT	17
 #define BNXT_FW_RESET_NOTIFY_SP_EVENT	18
+#define BNXT_FW_EXCEPTION_SP_EVENT	19
 
 	struct delayed_work	fw_reset_task;
 	int			fw_reset_state;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 8512467..e664392 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -83,6 +83,31 @@ struct devlink_health_reporter_ops bnxt_dl_fw_reset_reporter_ops = {
 	.recover = bnxt_fw_reset_recover,
 };
 
+static int bnxt_fw_fatal_recover(struct devlink_health_reporter *reporter,
+				 void *priv_ctx)
+{
+	struct bnxt *bp = devlink_health_reporter_priv(reporter);
+	struct bnxt_fw_reporter_ctx *fw_reporter_ctx = priv_ctx;
+	unsigned long event;
+
+	if (!priv_ctx)
+		return -EOPNOTSUPP;
+
+	event = fw_reporter_ctx->sp_event;
+	if (event == BNXT_FW_RESET_NOTIFY_SP_EVENT)
+		bnxt_fw_reset(bp);
+	else if (event == BNXT_FW_EXCEPTION_SP_EVENT)
+		bnxt_fw_exception(bp);
+
+	return 0;
+}
+
+static const
+struct devlink_health_reporter_ops bnxt_dl_fw_fatal_reporter_ops = {
+	.name = "fw_fatal",
+	.recover = bnxt_fw_fatal_recover,
+};
+
 static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 {
 	struct bnxt_fw_health *health = bp->fw_health;
@@ -108,6 +133,16 @@ static void bnxt_dl_fw_reporters_create(struct bnxt *bp)
 			    PTR_ERR(health->fw_reset_reporter));
 		health->fw_reset_reporter = NULL;
 	}
+
+	health->fw_fatal_reporter =
+		devlink_health_reporter_create(bp->dl,
+					       &bnxt_dl_fw_fatal_reporter_ops,
+					       0, true, bp);
+	if (IS_ERR(health->fw_fatal_reporter)) {
+		netdev_warn(bp->dev, "Failed to create FW fatal health reporter, rc = %ld\n",
+			    PTR_ERR(health->fw_fatal_reporter));
+		health->fw_fatal_reporter = NULL;
+	}
 }
 
 static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
@@ -122,6 +157,9 @@ static void bnxt_dl_fw_reporters_destroy(struct bnxt *bp)
 
 	if (health->fw_reset_reporter)
 		devlink_health_reporter_destroy(health->fw_reset_reporter);
+
+	if (health->fw_fatal_reporter)
+		devlink_health_reporter_destroy(health->fw_fatal_reporter);
 }
 
 void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
@@ -135,6 +173,15 @@ void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
 	fw_reporter_ctx.sp_event = event;
 	switch (event) {
 	case BNXT_FW_RESET_NOTIFY_SP_EVENT:
+		if (test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) {
+			if (!fw_health->fw_fatal_reporter)
+				return;
+
+			devlink_health_report(fw_health->fw_fatal_reporter,
+					      "FW fatal async event received",
+					      &fw_reporter_ctx);
+			return;
+		}
 		if (!fw_health->fw_reset_reporter)
 			return;
 
@@ -142,6 +189,15 @@ void bnxt_devlink_health_report(struct bnxt *bp, unsigned long event)
 				      "FW non-fatal reset event received",
 				      &fw_reporter_ctx);
 		return;
+
+	case BNXT_FW_EXCEPTION_SP_EVENT:
+		if (!fw_health->fw_fatal_reporter)
+			return;
+
+		devlink_health_report(fw_health->fw_fatal_reporter,
+				      "FW fatal error reported",
+				      &fw_reporter_ctx);
+		return;
 	}
 }
 
-- 
2.5.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox