Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: David Miller @ 2019-07-26  0:26 UTC (permalink / raw)
  To: himadrispandya
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, linux-hyperv, netdev,
	linux-kernel, himadri18.07
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>

From: Himadri Pandya <himadrispandya@gmail.com>
Date: Thu, 25 Jul 2019 05:11:25 +0000

> Older windows hosts require the hv_sock ring buffer to be defined
> using 4K pages. This was achieved by using the symbol PAGE_SIZE_4K
> defined specifically for this purpose. But now we have a new symbol
> HV_HYP_PAGE_SIZE defined in hyperv-tlfs which can be used for this.
> 
> This patch removes the definition of symbol PAGE_SIZE_4K and replaces
> its usage with the symbol HV_HYP_PAGE_SIZE. This patch also aligns
> sndbuf and rcvbuf to hyper-v specific page size using HV_HYP_PAGE_SIZE
> instead of the guest page size(PAGE_SIZE) as hyper-v expects the page
> size to be 4K and it might not be the case on ARM64 architecture.
> 
> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>

This doesn't compile:

  CC [M]  net/vmw_vsock/hyperv_transport.o
net/vmw_vsock/hyperv_transport.c:58:28: error: ‘HV_HYP_PAGE_SIZE’ undeclared here (not in a function); did you mean ‘HV_MESSAGE_SIZE’?
 #define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
                            ^~~~~~~~~~~~~~~~

^ permalink raw reply

* Re: [PATCH net-next] net: mvneta: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-07-26  0:28 UTC (permalink / raw)
  To: Jisheng.Zhang; +Cc: thomas.petazzoni, netdev, linux-kernel
In-Reply-To: <20190725153741.095dca99@xhacker.debian>

From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Date: Thu, 25 Jul 2019 07:48:04 +0000

> devm_platform_ioremap_resource() wraps platform_get_resource() and
> devm_ioremap_resource() in a single helper, let's use that helper to
> simplify the code.
> 
> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>

Applied.

^ permalink raw reply

* Re: pull-request: bpf 2019-07-25
From: David Miller @ 2019-07-26  0:35 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, bpf, kernel-team
In-Reply-To: <20190725173541.2413580-1-ast@kernel.org>

From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 25 Jul 2019 10:35:41 -0700

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) fix segfault in libbpf, from Andrii.
> 
> 2) fix gso_segs access, from Eric.
> 
> 3) tls/sockmap fixes, from Jakub and John.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Alexei.

I will push back out after build testing.

^ permalink raw reply

* Re: [PATCH bpf-next v10 0/2] bpf: Allow bpf_skb_event_output for more prog types
From: Alexei Starovoitov @ 2019-07-26  0:59 UTC (permalink / raw)
  To: Allan Zhang
  Cc: Network Development, bpf, Song Liu, Daniel Borkmann,
	Andrii Nakryiko, Alexei Starovoitov
In-Reply-To: <20190724000725.15634-1-allanzhang@google.com>

On Tue, Jul 23, 2019 at 5:07 PM Allan Zhang <allanzhang@google.com> wrote:
>
> Software event output is only enabled by a few prog types right now (TC,
> LWT out, XDP, sockops). Many other skb based prog types need
> bpf_skb_event_output to produce software event.
>
> More prog types are enabled to access bpf_skb_event_output in this
> patch.

Applied. Thanks

^ permalink raw reply

* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Willem de Bruijn @ 2019-07-26  1:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Brian Vazquez, Song Liu, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller, Stanislav Fomichev,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190725235432.lkptx3fafegnm2et@ast-mbp>

On Thu, Jul 25, 2019 at 7:54 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > again. This is pretty weird.
> > > >
> > > > Yes, I know. But note that the current scenario happens even for the
> > > > old interface (imagine you are walking a map from userspace and you
> > > > tried get_next_key the prev_key was removed, you will start again from
> > > > the beginning without noticing it).
> > > > I tried to sent a patch in the past but I was missing some context:
> > > > before NULL was used to get the very first_key the interface relied in
> > > > a random (non existent) key to retrieve the first_key in the map, and
> > > > I was told what we still have to support that scenario.
> > >
> > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > don't have to support legacy scenarios.
> > >
> > > Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
> > > to look up previous keys. Would something down this direction work?
> >
> > I've been thinking about it and I think first we need a way to detect
> > that since key was not present we got the first_key instead:
> >
> > - One solution I had in mind was to explicitly asked for the first key
> > with map_get_next_key(map, NULL, first_key) and while walking the map
> > check that map_get_next_key(map, prev_key, key) doesn't return the
> > same key. This could be done using memcmp.
> > - Discussing with Stan, he mentioned that another option is to support
> > a flag in map_get_next_key to let it know that we want an error
> > instead of the first_key.
> >
> > After detecting the problem we also need to define what we want to do,
> > here some options:
> >
> > a) Return the error to the caller
> > b) Try with previous keys if any (which be limited to the keys that we
> > have traversed so far in this dump call)
> > c) continue with next entries in the map. array is easy just get the
> > next valid key (starting on i+1), but hmap might be difficult since
> > starting on the next bucket could potentially skip some keys that were
> > concurrently added to the same bucket where key used to be, and
> > starting on the same bucket could lead us to return repeated elements.
> >
> > Or maybe we could support those 3 cases via flags and let the caller
> > decide which one to use?
>
> this type of indecision is the reason why I wasn't excited about
> batch dumping in the first place and gave 'soft yes' when Stan
> mentioned it during lsf/mm/bpf uconf.
> We probably shouldn't do it.
> It feels this map_dump makes api more complex and doesn't really
> give much benefit to the user other than large map dump becomes faster.
> I think we gotta solve this problem differently.

Multiple variants with flags indeed makes the API complex. I think the
kernel should expose only the simplest, most obvious behavior that
allows the application to recover. In this case, that sounds like
option (a) and restart.

In practice, the common use case is to allocate enough user memory to
read an entire table in one go, in which case the entire issue is
moot.

The cycle savings of dump are significant for large tables. I'm not
sure how we achieve that differently and even simpler? We originally
looked at shared memory, but that is obviously much more complex.

^ permalink raw reply

* Re: [PATCH bpf-next v3 0/7] bpf/flow_dissector: support input flags
From: Alexei Starovoitov @ 2019-07-26  1:06 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Song Liu, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190725225231.195090-1-sdf@google.com>

On Thu, Jul 25, 2019 at 3:52 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> C flow dissector supports input flags that tell it to customize parsing
> by either stopping early or trying to parse as deep as possible.
> BPF flow dissector always parses as deep as possible which is sub-optimal.
> Pass input flags to the BPF flow dissector as well so it can make the same
> decisions.
>
> Series outline:
> * remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
> * export FLOW_DISSECTOR_F_XXX flags as uapi and pass them to BPF
>   flow dissector
> * add documentation for the export flags
> * support input flags in BPF_PROG_TEST_RUN via ctx_{in,out}
> * sync uapi to tools
> * support FLOW_DISSECTOR_F_PARSE_1ST_FRAG in selftest
> * support FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in kernel and selftest
> * support FLOW_DISSECTOR_F_STOP_AT_ENCAP in selftest
>
> Pros:
> * makes BPF flow dissector faster by avoiding burning extra cycles
> * existing BPF progs continue to work by ignoring the flags and always
>   parsing as deep as possible
>
> Cons:
> * new UAPI which we need to support (OTOH, if we need to deprecate some
>   flags, we can just stop setting them upon calling BPF programs)
>
> Some numbers (with .repeat = 4000000 in test_flow_dissector):
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>
> The improvement is around 10%, but it's in a tight cache-hot
> BPF_PROG_TEST_RUN loop.

Applied. Thanks

^ permalink raw reply

* Re: [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-26  1:24 UTC (permalink / raw)
  To: Bob Liu, Andrew Morton
  Cc: Alexander Viro, Anna Schumaker, David S . Miller,
	Dominique Martinet, Eric Van Hensbergen, Jason Gunthorpe,
	Jason Wang, Jens Axboe, Latchesar Ionkov, Michael S . Tsirkin,
	Miklos Szeredi, Trond Myklebust, Christoph Hellwig,
	Matthew Wilcox, linux-mm, LKML, ceph-devel, kvm, linux-block,
	linux-cifs, linux-fsdevel, linux-nfs, linux-rdma, netdev,
	samba-technical, v9fs-developer, virtualization
In-Reply-To: <8621066c-e242-c449-eb04-4f2ce6867140@oracle.com>

On 7/24/19 5:41 PM, Bob Liu wrote:
> On 7/24/19 12:25 PM, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> Hi,
>>
>> This is mostly Jerome's work, converting the block/bio and related areas
>> to call put_user_page*() instead of put_page(). Because I've changed
>> Jerome's patches, in some cases significantly, I'd like to get his
>> feedback before we actually leave him listed as the author (he might
>> want to disown some or all of these).
>>
> 
> Could you add some background to the commit log for people don't have the context..
> Why this converting? What's the main differences?
> 

Hi Bob,

1. Many of the patches have a blurb like this:

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

...and if you look at that commit, you'll find several pages of
information in its commit description, which should address your point.

2. This whole series has to be re-worked, as per the other feedback thread.
So I'll keep your comment in mind when I post a new series.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Brian Vazquez @ 2019-07-26  1:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Song Liu, Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190725235432.lkptx3fafegnm2et@ast-mbp>

On Thu, Jul 25, 2019 at 4:54 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > again. This is pretty weird.
> > > >
> > > > Yes, I know. But note that the current scenario happens even for the
> > > > old interface (imagine you are walking a map from userspace and you
> > > > tried get_next_key the prev_key was removed, you will start again from
> > > > the beginning without noticing it).
> > > > I tried to sent a patch in the past but I was missing some context:
> > > > before NULL was used to get the very first_key the interface relied in
> > > > a random (non existent) key to retrieve the first_key in the map, and
> > > > I was told what we still have to support that scenario.
> > >
> > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > don't have to support legacy scenarios.
> > >
> > > Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
> > > to look up previous keys. Would something down this direction work?
> >
> > I've been thinking about it and I think first we need a way to detect
> > that since key was not present we got the first_key instead:
> >
> > - One solution I had in mind was to explicitly asked for the first key
> > with map_get_next_key(map, NULL, first_key) and while walking the map
> > check that map_get_next_key(map, prev_key, key) doesn't return the
> > same key. This could be done using memcmp.
> > - Discussing with Stan, he mentioned that another option is to support
> > a flag in map_get_next_key to let it know that we want an error
> > instead of the first_key.
> >
> > After detecting the problem we also need to define what we want to do,
> > here some options:
> >
> > a) Return the error to the caller
> > b) Try with previous keys if any (which be limited to the keys that we
> > have traversed so far in this dump call)
> > c) continue with next entries in the map. array is easy just get the
> > next valid key (starting on i+1), but hmap might be difficult since
> > starting on the next bucket could potentially skip some keys that were
> > concurrently added to the same bucket where key used to be, and
> > starting on the same bucket could lead us to return repeated elements.
> >
> > Or maybe we could support those 3 cases via flags and let the caller
> > decide which one to use?
>
> this type of indecision is the reason why I wasn't excited about
> batch dumping in the first place and gave 'soft yes' when Stan
> mentioned it during lsf/mm/bpf uconf.
> We probably shouldn't do it.
> It feels this map_dump makes api more complex and doesn't really
> give much benefit to the user other than large map dump becomes faster.
> I think we gotta solve this problem differently.

Some users are working around the dumping problems with the existing
api by creating a bpf_map_get_next_key_and_delete userspace function
(see https://www.bouncybouncy.net/blog/bpf_map_get_next_key-pitfalls/)
which in my opinion is actually a good idea. The only problem with
that is that calling bpf_map_get_next_key(fd, key, next_key) and then
bpf_map_delete_elem(fd, key) from userspace is racing with kernel code
and it might lose some information when deleting.
We could then do map_dump_and_delete using that idea but in the kernel
where we could better handle the racing condition. In that scenario
even if we retrieve the same key it will contain different info ( the
delta between old and new value). Would that work?

^ permalink raw reply

* RE: [PATCH 0/8] can: flexcan: add CAN FD support for NXP Flexcan
From: Joakim Zhang @ 2019-07-26  1:25 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, dl-linux-imx, netdev@vger.kernel.org
In-Reply-To: <24eb5c67-4692-1002-2468-4ae2e1a6b68b@pengutronix.de>


> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: 2019年7月25日 18:37
> To: Joakim Zhang <qiangqing.zhang@nxp.com>; linux-can@vger.kernel.org
> Cc: wg@grandegger.com; dl-linux-imx <linux-imx@nxp.com>;
> netdev@vger.kernel.org
> Subject: Re: [PATCH 0/8] can: flexcan: add CAN FD support for NXP Flexcan
> 
> On 7/25/19 9:53 AM, Marc Kleine-Budde wrote:
> > On 7/25/19 9:38 AM, Joakim Zhang wrote:
> >> Kindly pinging...
> >>
> >> After you git pull request for linux-can-next-for-5.4-20190724, some patches
> are missing from linux-can-next/testing.
> >> can: flexcan: flexcan_mailbox_read() make use of flexcan_write64() to
> >> mark the mailbox as read
> >> can: flexcan: flexcan_irq(): add support for TX mailbox in iflag1
> >> can: flexcan: flexcan_read_reg_iflag_rx(): optimize reading
> >> can: flexcan: introduce struct flexcan_priv::tx_mask and make use of
> >> it
> >> can: flexcan: convert struct flexcan_priv::rx_mask{1,2} to rx_mask
> >> can: flexcan: remove TX mailbox bit from struct
> >> flexcan_priv::rx_mask{1,2}
> >> can: flexcan: rename struct flexcan_priv::reg_imask{1,2}_default to
> >> rx_mask{1,2}
> >> can: flexcan: flexcan_irq(): rename variable reg_iflag ->
> >> reg_iflag_rx
> >> can: flexcan: rename macro FLEXCAN_IFLAG_MB() ->
> FLEXCAN_IFLAG2_MB()
> >>
> >> You can refer to below link for the reason of adding above patches:
> >> https://www.spinics.net/lists/linux-can/msg00777.html
> >> https://www.spinics.net/lists/linux-can/msg01150.html
> >>
> >> Are you prepared to add back these patches as they are necessary for
> >> Flexcan CAN FD? And this Flexcan CAN FD patch set is based on these
> >> patches.
> >
> > Yes, these patches will be added back.
> 
> I've cleaned up the first patch a bit, and pushed everything to the testing
> branch. Can you give it a test.

Hi Marc,

Both Classic CAN and CAN FD can work fine on my side test, thank you for your kindly review.

Best Regards,
Joakim Zhang
> regards,
> Marc
> 
> --
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


^ permalink raw reply

* Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: Jakub Kicinski @ 2019-07-26  1:28 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: tanhuazhong@huawei.com, davem@davemloft.net,
	liuyonglong@huawei.com, lipeng321@huawei.com,
	yisen.zhuang@huawei.com, salil.mehta@huawei.com,
	linuxarm@huawei.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <75a02bbe5b3b0f2755cd901a8830d4a3026f9383.camel@mellanox.com>

On Thu, 25 Jul 2019 21:59:08 +0000, Saeed Mahameed wrote:
> I couldn't find any rules regarding what to put in kernel log, Maybe
> someone can share ?. but i vaguely remember that the recommendation
> for device drivers is to put nothing, only error/warning messages.

FWIW my understanding is also that only error/warning messages should
be printed. IOW things which should "never happen".

There are some historical exceptions. Probe logs for instance may be
useful, because its not trivial to get to the device if probe fails.

Another one is ethtool flashing, if it takes time we used to print into
logs some message like "please wait patiently". But since Jiri added
the progress messages in devlink that's no longer necessary.

For the messages which are basically printing the name of the function
or name of the function and their args - we have ftrace.

That's my $0.02 :)

^ permalink raw reply

* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Alexei Starovoitov @ 2019-07-26  1:47 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Song Liu, Brian Vazquez, Daniel Borkmann, David S . Miller,
	Stanislav Fomichev, Willem de Bruijn, Petar Penkov, Networking,
	bpf, Yonghong Song
In-Reply-To: <CABCgpaXE=dkBcJVqs95NZQTFuznA-q64kYPEcbvmYvAJ4wSp1A@mail.gmail.com>

On Thu, Jul 25, 2019 at 6:24 PM Brian Vazquez <brianvv.kernel@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 4:54 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > > again. This is pretty weird.
> > > > >
> > > > > Yes, I know. But note that the current scenario happens even for the
> > > > > old interface (imagine you are walking a map from userspace and you
> > > > > tried get_next_key the prev_key was removed, you will start again from
> > > > > the beginning without noticing it).
> > > > > I tried to sent a patch in the past but I was missing some context:
> > > > > before NULL was used to get the very first_key the interface relied in
> > > > > a random (non existent) key to retrieve the first_key in the map, and
> > > > > I was told what we still have to support that scenario.
> > > >
> > > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > > don't have to support legacy scenarios.
> > > >
> > > > Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
> > > > to look up previous keys. Would something down this direction work?
> > >
> > > I've been thinking about it and I think first we need a way to detect
> > > that since key was not present we got the first_key instead:
> > >
> > > - One solution I had in mind was to explicitly asked for the first key
> > > with map_get_next_key(map, NULL, first_key) and while walking the map
> > > check that map_get_next_key(map, prev_key, key) doesn't return the
> > > same key. This could be done using memcmp.
> > > - Discussing with Stan, he mentioned that another option is to support
> > > a flag in map_get_next_key to let it know that we want an error
> > > instead of the first_key.
> > >
> > > After detecting the problem we also need to define what we want to do,
> > > here some options:
> > >
> > > a) Return the error to the caller
> > > b) Try with previous keys if any (which be limited to the keys that we
> > > have traversed so far in this dump call)
> > > c) continue with next entries in the map. array is easy just get the
> > > next valid key (starting on i+1), but hmap might be difficult since
> > > starting on the next bucket could potentially skip some keys that were
> > > concurrently added to the same bucket where key used to be, and
> > > starting on the same bucket could lead us to return repeated elements.
> > >
> > > Or maybe we could support those 3 cases via flags and let the caller
> > > decide which one to use?
> >
> > this type of indecision is the reason why I wasn't excited about
> > batch dumping in the first place and gave 'soft yes' when Stan
> > mentioned it during lsf/mm/bpf uconf.
> > We probably shouldn't do it.
> > It feels this map_dump makes api more complex and doesn't really
> > give much benefit to the user other than large map dump becomes faster.
> > I think we gotta solve this problem differently.
>
> Some users are working around the dumping problems with the existing
> api by creating a bpf_map_get_next_key_and_delete userspace function
> (see https://www.bouncybouncy.net/blog/bpf_map_get_next_key-pitfalls/)
> which in my opinion is actually a good idea. The only problem with
> that is that calling bpf_map_get_next_key(fd, key, next_key) and then
> bpf_map_delete_elem(fd, key) from userspace is racing with kernel code
> and it might lose some information when deleting.
> We could then do map_dump_and_delete using that idea but in the kernel
> where we could better handle the racing condition. In that scenario
> even if we retrieve the same key it will contain different info ( the
> delta between old and new value). Would that work?

you mean get_next+lookup+delete at once?
Sounds useful.
Yonghong has been thinking about batching api as well.

I think if we cannot figure out how to make a batch of two commands
get_next + lookup to work correctly then we need to identify/invent one
command and make batching more generic.
Like make one jumbo/compound/atomic command to be get_next+lookup+delete.
Define the semantics of this single compound command.
And then let batching to be a multiplier of such command.
In a sense that multiplier 1 or N should be have the same way.
No extra flags to alter the batching.
The high level description of the batch would be:
pls execute get_next,lookup,delete and repeat it N times.
or
pls execute get_next,lookup and repeat N times.
where each command action is defined to be composable.

Just a rough idea.

^ permalink raw reply

* Re: [PATCH net-next 06/11] net: hns3: modify firmware version display format
From: tanhuazhong @ 2019-07-26  1:50 UTC (permalink / raw)
  To: Saeed Mahameed, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	moyufeng@huawei.com
In-Reply-To: <d6a32434af7e9c883f104ae66e62b7b376abb39c.camel@mellanox.com>



On 2019/7/26 5:32, Saeed Mahameed wrote:
> On Thu, 2019-07-25 at 10:34 +0800, tanhuazhong wrote:
>>
>> On 2019/7/25 2:34, Saeed Mahameed wrote:
>>> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>>>> From: Yufeng Mo <moyufeng@huawei.com>
>>>>
>>>> This patch modifies firmware version display format in
>>>> hclge(vf)_cmd_init() and hns3_get_drvinfo(). Also, adds
>>>> some optimizations for firmware version display format.
>>>>
>>>> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
>>>> Signed-off-by: Peng Li <lipeng321@huawei.com>
>>>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>>>> ---
>>>>    drivers/net/ethernet/hisilicon/hns3/hnae3.h              |  9
>>>> +++++++++
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c       | 15
>>>> +++++++++++++--
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c   | 10
>>>> +++++++++-
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 11
>>>> +++++++++--
>>>>    4 files changed, 40 insertions(+), 5 deletions(-)
>>>>
>>>>
> 
> [...]
> 
>>>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>>>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>>>> @@ -419,7 +419,15 @@ int hclge_cmd_init(struct hclge_dev *hdev)
>>>>    	}
>>>>    	hdev->fw_version = version;
>>>>    
>>>> -	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n",
>>>> version);
>>>> +	pr_info_once("The firmware version is %lu.%lu.%lu.%lu\n",
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE3_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE3_SHIFT),
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE2_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE2_SHIFT),
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE1_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE1_SHIFT),
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE0_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE0_SHIFT));
>>>>    
>>>
>>> Device name/string will not be printed now, what happens if i have
>>> multiple devices ? at least print the device name as it was before
>>>
>> Since on each board we only have one firmware, the firmware
>> version is same per device, and will not change when running.
>> So pr_info_once() looks good for this case.
>>
> 
> boards change too often to have such static assumption.

Ok, I will use dev_info instead of pr_info here.

> 
>> BTW, maybe we should change below print in the end of
>> hclge_init_ae_dev(), use dev_info() instead of pr_info(),
>> then we can know that which device has already initialized.
>> I will send other patch to do that, is it acceptable for you?
>>
>> "pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME);"
>>
> 
> I would avoid using pr_info when i can ! if you have the option to
> print with dev information as it was before that is preferable.
> 
> Thanks,
> Saeed.
> 

Thanks,
Huazhong.


^ permalink raw reply

* Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: liuyonglong @ 2019-07-26  2:21 UTC (permalink / raw)
  To: Saeed Mahameed, tanhuazhong@huawei.com, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <75a02bbe5b3b0f2755cd901a8830d4a3026f9383.camel@mellanox.com>

We will change all of them to netif_msg_drv() which is default off
Thanks for your reply!

On 2019/7/26 5:59, Saeed Mahameed wrote:
> On Thu, 2019-07-25 at 20:28 +0800, liuyonglong wrote:
>>
>> On 2019/7/25 3:12, Saeed Mahameed wrote:
>>> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>>>> From: Yonglong Liu <liuyonglong@huawei.com>
>>>>
>>>> Some times just see the eth interface have been down/up via
>>>> dmesg, but can not know why the eth down. So adds some debug
>>>> messages to identify the cause for this.
>>>>
>>>
>>> I really don't like this. your default msg lvl has NETIF_MSG_IFDOWN
>>> turned on .. dumping every single operation that happens on your
>>> device
>>> by default to kernel log is too much ! 
>>>
>>> We should really consider using trace buffers with well defined
>>> structures for vendor specific events. so we can use bpf filters
>>> and
>>> state of the art tools for netdev debugging.
>>>
>>
>> We do this because we can just see a link down message in dmesg, and
>> had
>> take a long time to found the cause of link down, just because
>> another
>> user changed the settings.
>>
>> We can change the net_open/net_stop/dcbnl_ops to msg_drv (not default
>> turned on),  and want to keep the others default print to kernel log,
>> is it acceptable?
>>
> 
> acceptable as long as debug information are kept off by default and
> your driver doens't spam the kernel log.
> 
> you should use dynamic debug [1] and/or "off by default" msg lvls for
> debugging information..
> 
> I couldn't find any rules regarding what to put in kernel log, Maybe
> someone can share ?. but i vaguely remember that the recommendation
> for device drivers is to put nothing, only error/warning messages.
> 
> [1] 
> https://www.kernel.org/doc/html/v4.15/admin-guide/dynamic-debug-howto.html
> 
>>>> @@ -1593,6 +1603,11 @@ static int hns3_ndo_set_vf_vlan(struct
>>>> net_device *netdev, int vf, u16 vlan,
>>>>  	struct hnae3_handle *h = hns3_get_handle(netdev);
>>>>  	int ret = -EIO;
>>>>  
>>>> +	if (netif_msg_ifdown(h))
>>>
>>> why msg_ifdown ? looks like netif_msg_drv is more appropriate, for
>>> many
>>> of the cases in this patch.
>>>
>>
>> This operation may cause link down, so we use msg_ifdown.
>>
> 
> ifdown isn't link down.. 
> 
> to be honest, I couldn't find any documentation explaining how/when to
> use msg lvls, (i didn't look too deep though), by looking at other
> drivers, my interpretations is:
> 
> ifdup (open/boot up flow)
> ifdwon (close/teardown flow)
> drv (driver based or dynamic flows) 
> etc .. 
> 
> -Saeed.
> 


^ permalink raw reply

* [PATCH 2/2] net: ipv4: Fix a possible null-pointer dereference in fib4_rule_suppress()
From: Jia-Ju Bai @ 2019-07-26  2:25 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel, Jia-Ju Bai

In fib4_rule_suppress(), there is an if statement on line 145 to check
whether result->fi is NULL:
    if (result->fi)

When result->fi is NULL, it is used on line 167:
    fib_info_put(result->fi);

In fib_info_put(), the argument fi is used:
    if (refcount_dec_and_test(&fi->fib_clntref))

Thus, a possible null-pointer dereference may occur.

To fix this bug, result->fi is checked before calling fib_info_put().

This bug is found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 net/ipv4/fib_rules.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index b43a7ba5c6a4..daedce293aab 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -163,7 +163,7 @@ static bool fib4_rule_suppress(struct fib_rule *rule, struct fib_lookup_arg *arg
 	return false;
 
 suppress_route:
-	if (!(arg->flags & FIB_LOOKUP_NOREF))
+	if (!(arg->flags & FIB_LOOKUP_NOREF) && result->fi)
 		fib_info_put(result->fi);
 	return true;
 }
-- 
2.17.0


^ permalink raw reply related

* [PATCH 1/2] net: ipv4: Fix a possible null-pointer dereference in inet_csk_rebuild_route()
From: Jia-Ju Bai @ 2019-07-26  2:25 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel, Jia-Ju Bai

In inet_csk_rebuild_route(), rt is assigned to NULL on line 1071.
On line 1076, rt is used:
    return &rt->dst;
Thus, a possible null-pointer dereference may occur.

To fix this bug, rt is checked before being used.

This bug is found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f5c163d4771b..27d9d80f3401 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1073,7 +1073,10 @@ static struct dst_entry *inet_csk_rebuild_route(struct sock *sk, struct flowi *f
 		sk_setup_caps(sk, &rt->dst);
 	rcu_read_unlock();
 
-	return &rt->dst;
+	if (rt)
+		return &rt->dst;
+	else
+		return NULL;
 }
 
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu)
-- 
2.17.0


^ permalink raw reply related

* Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: liuyonglong @ 2019-07-26  2:33 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: tanhuazhong@huawei.com, davem@davemloft.net, lipeng321@huawei.com,
	yisen.zhuang@huawei.com, salil.mehta@huawei.com,
	linuxarm@huawei.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190725182846.253ae93f@cakuba.netronome.com>

As Saeed said, we will use netif_msg_drv() which is default off, this
can be easily open with ethtool.
Thanks for your reply!

On 2019/7/26 9:28, Jakub Kicinski wrote:
> On Thu, 25 Jul 2019 21:59:08 +0000, Saeed Mahameed wrote:
>> I couldn't find any rules regarding what to put in kernel log, Maybe
>> someone can share ?. but i vaguely remember that the recommendation
>> for device drivers is to put nothing, only error/warning messages.
> 
> FWIW my understanding is also that only error/warning messages should
> be printed. IOW things which should "never happen".
> 
> There are some historical exceptions. Probe logs for instance may be
> useful, because its not trivial to get to the device if probe fails.
> 
> Another one is ethtool flashing, if it takes time we used to print into
> logs some message like "please wait patiently". But since Jiri added
> the progress messages in devlink that's no longer necessary.
> 
> For the messages which are basically printing the name of the function
> or name of the function and their args - we have ftrace.
> 
> That's my $0.02 :)
> 
> .
> 


^ permalink raw reply

* Re: memory leak in dma_buf_ioctl
From: syzbot @ 2019-07-26  2:34 UTC (permalink / raw)
  To: bsingharora, coreteam, davem, dri-devel, duwe, dvyukov, kaber,
	kadlec, linaro-mm-sig, linux-kernel, linux-media, mingo, mpe,
	netdev, netfilter-devel, pablo, rostedt, sumit.semwal,
	syzkaller-bugs
In-Reply-To: <000000000000b68e04058e6a3421@google.com>

syzbot has bisected this bug to:

commit 04cf31a759ef575f750a63777cee95500e410994
Author: Michael Ellerman <mpe@ellerman.id.au>
Date:   Thu Mar 24 11:04:01 2016 +0000

     ftrace: Make ftrace_location_range() global

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154293f4600000
start commit:   abdfd52a Merge tag 'armsoc-defconfig' of git://git.kernel...
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=174293f4600000
console output: https://syzkaller.appspot.com/x/log.txt?x=134293f4600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d31de3d88059b7fa
dashboard link: https://syzkaller.appspot.com/bug?extid=b2098bc44728a4efb3e9
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12526e58600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=161784f0600000

Reported-by: syzbot+b2098bc44728a4efb3e9@syzkaller.appspotmail.com
Fixes: 04cf31a759ef ("ftrace: Make ftrace_location_range() global")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: memory leak in dma_buf_ioctl
From: Steven Rostedt @ 2019-07-26  2:46 UTC (permalink / raw)
  To: syzbot
  Cc: bsingharora, coreteam, davem, dri-devel, duwe, dvyukov, kaber,
	kadlec, linaro-mm-sig, linux-kernel, linux-media, mingo, mpe,
	netdev, netfilter-devel, pablo, sumit.semwal, syzkaller-bugs
In-Reply-To: <00000000000005dbbc058e8c608d@google.com>

On Thu, 25 Jul 2019 19:34:01 -0700
syzbot <syzbot+b2098bc44728a4efb3e9@syzkaller.appspotmail.com> wrote:

> syzbot has bisected this bug to:
> 
> commit 04cf31a759ef575f750a63777cee95500e410994
> Author: Michael Ellerman <mpe@ellerman.id.au>
> Date:   Thu Mar 24 11:04:01 2016 +0000
> 
>      ftrace: Make ftrace_location_range() global

It's sad that I have yet to find a single syzbot bisect useful. Really?
setting a function from static to global will cause a memory leak in a
completely unrelated area of the kernel?

I'm about to set these to my /dev/null folder.

-- Steve


> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=154293f4600000
> start commit:   abdfd52a Merge tag 'armsoc-defconfig' of git://git.kernel...
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=174293f4600000
> console output: https://syzkaller.appspot.com/x/log.txt?x=134293f4600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d31de3d88059b7fa
> dashboard link: https://syzkaller.appspot.com/bug?extid=b2098bc44728a4efb3e9
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12526e58600000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=161784f0600000
> 
> Reported-by: syzbot+b2098bc44728a4efb3e9@syzkaller.appspotmail.com
> Fixes: 04cf31a759ef ("ftrace: Make ftrace_location_range() global")
> 
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection


^ permalink raw reply

* [PATCH V2 net-next 04/11] net: hns3: fix mis-counting IRQ vector numbers issue
From: Huazhong Tan @ 2019-07-26  3:24 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Yonglong Liu, Peng Li, Huazhong Tan
In-Reply-To: <1564111502-15504-1-git-send-email-tanhuazhong@huawei.com>

From: Yonglong Liu <liuyonglong@huawei.com>

The num_msi_left means the vector numbers of NIC, but if the
PF supported RoCE, it contains the vector numbers of NIC and
RoCE(Not expected).

This may cause interrupts lost in some case, because of the
NIC module used the vector resources which belongs to RoCE.

This patch corrects the value of num_msi_left to be equals to
the vector numbers of NIC, and adjust the default tqp numbers
according to the value of num_msi_left. Also, adds a little
cleanup about checking whether the device is supporting RoCE
before allocating IRQ vector.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   |  2 ++
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 12 ++++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 3c64d70..9329aab 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1476,6 +1476,8 @@ static int hclge_vport_setup(struct hclge_vport *vport, u16 num_tqps)
 	nic->ae_algo = &ae_algo;
 	nic->numa_node_mask = hdev->numa_node_mask;
 
+	num_tqps = min_t(u16, hdev->roce_base_msix_offset - 1, num_tqps);
+
 	ret = hclge_knic_setup(vport, num_tqps,
 			       hdev->num_tx_desc, hdev->num_rx_desc);
 	if (ret)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index a13a0e1..db84782 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -287,6 +287,14 @@ static int hclgevf_get_queue_info(struct hclgevf_dev *hdev)
 	memcpy(&hdev->rss_size_max, &resp_msg[2], sizeof(u16));
 	memcpy(&hdev->rx_buf_len, &resp_msg[4], sizeof(u16));
 
+	/* if irq is not enough, let tqps have the same value of irqs,
+	 * to make sure one irq just bind to one tqp, this can improve
+	 * the performance
+	 */
+	hdev->num_tqps = min_t(u16, hdev->roce_base_msix_offset - 1,
+			       hdev->num_tqps);
+	hdev->rss_size_max = min_t(u16, hdev->rss_size_max, hdev->num_tqps);
+
 	return 0;
 }
 
@@ -2208,7 +2216,7 @@ static int hclgevf_init_msi(struct hclgevf_dev *hdev)
 	int vectors;
 	int i;
 
-	if (hnae3_get_bit(hdev->ae_dev->flag, HNAE3_DEV_SUPPORT_ROCE_B))
+	if (hnae3_dev_roce_supported(hdev))
 		vectors = pci_alloc_irq_vectors(pdev,
 						hdev->roce_base_msix_offset + 1,
 						hdev->num_msi,
@@ -2495,7 +2503,7 @@ static int hclgevf_query_vf_resource(struct hclgevf_dev *hdev)
 
 	req = (struct hclgevf_query_res_cmd *)desc.data;
 
-	if (hnae3_get_bit(hdev->ae_dev->flag, HNAE3_DEV_SUPPORT_ROCE_B)) {
+	if (hnae3_dev_roce_supported(hdev)) {
 		hdev->roce_base_msix_offset =
 		hnae3_get_field(__le16_to_cpu(req->msixcap_localid_ba_rocee),
 				HCLGEVF_MSIX_OFT_ROCEE_M,
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: Huazhong Tan @ 2019-07-26  3:24 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Yonglong Liu, Peng Li, Huazhong Tan
In-Reply-To: <1564111502-15504-1-git-send-email-tanhuazhong@huawei.com>

From: Yonglong Liu <liuyonglong@huawei.com>

Some times just see the eth interface have been down/up via
dmesg, but can not know why the eth down. So adds some debug
messages to identify the cause for this.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c    | 24 ++++++++++++++++++++
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 26 ++++++++++++++++++++++
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 14 ++++++++++++
 3 files changed, 64 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 4d58c53..2e30cfa 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -459,6 +459,10 @@ static int hns3_nic_net_open(struct net_device *netdev)
 		h->ae_algo->ops->set_timer_task(priv->ae_handle, true);
 
 	hns3_config_xps(priv);
+
+	if (netif_msg_drv(h))
+		netdev_info(netdev, "net open\n");
+
 	return 0;
 }
 
@@ -519,6 +523,9 @@ static int hns3_nic_net_stop(struct net_device *netdev)
 	if (test_and_set_bit(HNS3_NIC_STATE_DOWN, &priv->state))
 		return 0;
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev, "net stop\n");
+
 	if (h->ae_algo->ops->set_timer_task)
 		h->ae_algo->ops->set_timer_task(priv->ae_handle, false);
 
@@ -1550,6 +1557,9 @@ static int hns3_setup_tc(struct net_device *netdev, void *type_data)
 	h = hns3_get_handle(netdev);
 	kinfo = &h->kinfo;
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev, "setup tc: num_tc=%d\n", tc);
+
 	return (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
 		kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
 }
@@ -1593,6 +1603,11 @@ static int hns3_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan,
 	struct hnae3_handle *h = hns3_get_handle(netdev);
 	int ret = -EIO;
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev,
+			    "set vf vlan: vf=%d, vlan=%d, qos=%d, vlan_proto=%d\n",
+			    vf, vlan, qos, vlan_proto);
+
 	if (h->ae_algo->ops->set_vf_vlan_filter)
 		ret = h->ae_algo->ops->set_vf_vlan_filter(h, vf, vlan,
 							  qos, vlan_proto);
@@ -1611,6 +1626,10 @@ static int hns3_nic_change_mtu(struct net_device *netdev, int new_mtu)
 	if (!h->ae_algo->ops->set_mtu)
 		return -EOPNOTSUPP;
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev, "change mtu from %d to %d\n",
+			    netdev->mtu, new_mtu);
+
 	ret = h->ae_algo->ops->set_mtu(h, new_mtu);
 	if (ret)
 		netdev_err(netdev, "failed to change MTU in hardware %d\n",
@@ -4395,6 +4414,11 @@ int hns3_set_channels(struct net_device *netdev,
 	if (kinfo->rss_size == new_tqp_num)
 		return 0;
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev,
+			    "set channels: tqp_num=%d, rxfh=%d\n",
+			    new_tqp_num, rxfh_configured);
+
 	ret = hns3_reset_notify(h, HNAE3_DOWN_CLIENT);
 	if (ret)
 		return ret;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index e71c92b..08334d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -311,6 +311,9 @@ static void hns3_self_test(struct net_device *ndev,
 	if (eth_test->flags != ETH_TEST_FL_OFFLINE)
 		return;
 
+	if (netif_msg_drv(h))
+		netdev_info(ndev, "self test start\n");
+
 	st_param[HNAE3_LOOP_APP][0] = HNAE3_LOOP_APP;
 	st_param[HNAE3_LOOP_APP][1] =
 			h->flags & HNAE3_SUPPORT_APP_LOOPBACK;
@@ -374,6 +377,9 @@ static void hns3_self_test(struct net_device *ndev,
 
 	if (if_running)
 		ndev->netdev_ops->ndo_open(ndev);
+
+	if (netif_msg_drv(h))
+		netdev_info(ndev, "self test end\n");
 }
 
 static int hns3_get_sset_count(struct net_device *netdev, int stringset)
@@ -604,6 +610,11 @@ static int hns3_set_pauseparam(struct net_device *netdev,
 {
 	struct hnae3_handle *h = hns3_get_handle(netdev);
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev,
+			    "set pauseparam: autoneg=%d, rx:%d, tx:%d\n",
+			    param->autoneg, param->rx_pause, param->tx_pause);
+
 	if (h->ae_algo->ops->set_pauseparam)
 		return h->ae_algo->ops->set_pauseparam(h, param->autoneg,
 						       param->rx_pause,
@@ -743,6 +754,13 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 	if (cmd->base.speed == SPEED_1000 && cmd->base.duplex == DUPLEX_HALF)
 		return -EINVAL;
 
+	if (netif_msg_drv(handle))
+		netdev_info(netdev,
+			    "set link(%s): autoneg=%d, speed=%d, duplex=%d\n",
+			    netdev->phydev ? "phy" : "mac",
+			    cmd->base.autoneg, cmd->base.speed,
+			    cmd->base.duplex);
+
 	/* Only support ksettings_set for netdev with phy attached for now */
 	if (netdev->phydev)
 		return phy_ethtool_ksettings_set(netdev->phydev, cmd);
@@ -984,6 +1002,10 @@ static int hns3_nway_reset(struct net_device *netdev)
 		return -EINVAL;
 	}
 
+	if (netif_msg_drv(handle))
+		netdev_info(netdev, "nway reset (using %s)\n",
+			    phy ? "phy" : "mac");
+
 	if (phy)
 		return genphy_restart_aneg(phy);
 
@@ -1308,6 +1330,10 @@ static int hns3_set_fecparam(struct net_device *netdev,
 	if (!ops->set_fec)
 		return -EOPNOTSUPP;
 	fec_mode = eth_to_loc_fec(fec->fec);
+
+	if (netif_msg_drv(handle))
+		netdev_info(netdev, "set fecparam: mode=%d\n", fec_mode);
+
 	return ops->set_fec(handle, fec_mode);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index bac4ce1..07471ba 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -201,6 +201,7 @@ static int hclge_client_setup_tc(struct hclge_dev *hdev)
 static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets)
 {
 	struct hclge_vport *vport = hclge_get_vport(h);
+	struct net_device *netdev = h->kinfo.netdev;
 	struct hclge_dev *hdev = vport->back;
 	bool map_changed = false;
 	u8 num_tc = 0;
@@ -215,6 +216,9 @@ static int hclge_ieee_setets(struct hnae3_handle *h, struct ieee_ets *ets)
 		return ret;
 
 	if (map_changed) {
+		if (netif_msg_drv(h))
+			netdev_info(netdev, "set ets\n");
+
 		ret = hclge_notify_client(hdev, HNAE3_DOWN_CLIENT);
 		if (ret)
 			return ret;
@@ -300,6 +304,7 @@ static int hclge_ieee_getpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 {
 	struct hclge_vport *vport = hclge_get_vport(h);
+	struct net_device *netdev = h->kinfo.netdev;
 	struct hclge_dev *hdev = vport->back;
 	u8 i, j, pfc_map, *prio_tc;
 
@@ -325,6 +330,11 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct ieee_pfc *pfc)
 	hdev->tm_info.hw_pfc_map = pfc_map;
 	hdev->tm_info.pfc_en = pfc->pfc_en;
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev,
+			    "set pfc: pfc_en=%d, pfc_map=%d, num_tc=%d\n",
+			    pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
+
 	hclge_tm_pfc_info_update(hdev);
 
 	return hclge_pause_setup_hw(hdev, false);
@@ -345,8 +355,12 @@ static u8 hclge_getdcbx(struct hnae3_handle *h)
 static u8 hclge_setdcbx(struct hnae3_handle *h, u8 mode)
 {
 	struct hclge_vport *vport = hclge_get_vport(h);
+	struct net_device *netdev = h->kinfo.netdev;
 	struct hclge_dev *hdev = vport->back;
 
+	if (netif_msg_drv(h))
+		netdev_info(netdev, "set dcbx: mode=%d\n", mode);
+
 	/* No support for LLD_MANAGED modes or CEE */
 	if ((mode & DCB_CAP_DCBX_LLD_MANAGED) ||
 	    (mode & DCB_CAP_DCBX_VER_CEE) ||
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 06/11] net: hns3: modify firmware version display format
From: Huazhong Tan @ 2019-07-26  3:24 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Yufeng Mo, Peng Li, Huazhong Tan
In-Reply-To: <1564111502-15504-1-git-send-email-tanhuazhong@huawei.com>

From: Yufeng Mo <moyufeng@huawei.com>

This patch modifies firmware version display format in
hclge(vf)_cmd_init() and hns3_get_drvinfo(). Also, adds
some optimizations for firmware version display format.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h              |  9 +++++++++
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c       | 15 +++++++++++++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c   | 10 +++++++++-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 10 +++++++++-
 4 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 48c7b70..a4624db 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -179,6 +179,15 @@ struct hnae3_vector_info {
 #define HNAE3_RING_GL_RX 0
 #define HNAE3_RING_GL_TX 1
 
+#define HNAE3_FW_VERSION_BYTE3_SHIFT	24
+#define HNAE3_FW_VERSION_BYTE3_MASK	GENMASK(31, 24)
+#define HNAE3_FW_VERSION_BYTE2_SHIFT	16
+#define HNAE3_FW_VERSION_BYTE2_MASK	GENMASK(23, 16)
+#define HNAE3_FW_VERSION_BYTE1_SHIFT	8
+#define HNAE3_FW_VERSION_BYTE1_MASK	GENMASK(15, 8)
+#define HNAE3_FW_VERSION_BYTE0_SHIFT	0
+#define HNAE3_FW_VERSION_BYTE0_MASK	GENMASK(7, 0)
+
 struct hnae3_ring_chain_node {
 	struct hnae3_ring_chain_node *next;
 	u32 tqp_index;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 5bff98a..e71c92b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -527,6 +527,7 @@ static void hns3_get_drvinfo(struct net_device *netdev,
 {
 	struct hns3_nic_priv *priv = netdev_priv(netdev);
 	struct hnae3_handle *h = priv->ae_handle;
+	u32 fw_version;
 
 	if (!h->ae_algo->ops->get_fw_version) {
 		netdev_err(netdev, "could not get fw version!\n");
@@ -545,8 +546,18 @@ static void hns3_get_drvinfo(struct net_device *netdev,
 		sizeof(drvinfo->bus_info));
 	drvinfo->bus_info[ETHTOOL_BUSINFO_LEN - 1] = '\0';
 
-	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), "0x%08x",
-		 priv->ae_handle->ae_algo->ops->get_fw_version(h));
+	fw_version = priv->ae_handle->ae_algo->ops->get_fw_version(h);
+
+	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
+		 "%lu.%lu.%lu.%lu",
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE3_MASK,
+				 HNAE3_FW_VERSION_BYTE3_SHIFT),
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE2_MASK,
+				 HNAE3_FW_VERSION_BYTE2_SHIFT),
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE1_MASK,
+				 HNAE3_FW_VERSION_BYTE1_SHIFT),
+		 hnae3_get_field(fw_version, HNAE3_FW_VERSION_BYTE0_MASK,
+				 HNAE3_FW_VERSION_BYTE0_SHIFT));
 }
 
 static u32 hns3_get_link(struct net_device *netdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 22f6acd..d9858f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -419,7 +419,15 @@ int hclge_cmd_init(struct hclge_dev *hdev)
 	}
 	hdev->fw_version = version;
 
-	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
+	dev_info(&hdev->pdev->dev, "The firmware version is %lu.%lu.%lu.%lu\n",
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE3_MASK,
+				 HNAE3_FW_VERSION_BYTE3_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE2_MASK,
+				 HNAE3_FW_VERSION_BYTE2_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE1_MASK,
+				 HNAE3_FW_VERSION_BYTE1_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE0_MASK,
+				 HNAE3_FW_VERSION_BYTE0_SHIFT));
 
 	return 0;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
index 652b796..8f21eb3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c
@@ -405,7 +405,15 @@ int hclgevf_cmd_init(struct hclgevf_dev *hdev)
 	}
 	hdev->fw_version = version;
 
-	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n", version);
+	dev_info(&hdev->pdev->dev, "The firmware version is %lu.%lu.%lu.%lu\n",
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE3_MASK,
+				 HNAE3_FW_VERSION_BYTE3_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE2_MASK,
+				 HNAE3_FW_VERSION_BYTE2_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE1_MASK,
+				 HNAE3_FW_VERSION_BYTE1_SHIFT),
+		 hnae3_get_field(version, HNAE3_FW_VERSION_BYTE0_MASK,
+				 HNAE3_FW_VERSION_BYTE0_SHIFT));
 
 	return 0;
 
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 10/11] net: hns3: Add support for using order 1 pages with a 4K buffer
From: Huazhong Tan @ 2019-07-26  3:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Yunsheng Lin, Huazhong Tan
In-Reply-To: <1564111502-15504-1-git-send-email-tanhuazhong@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

Hardware supports 0.5K, 1K, 2K, 4K RX buffer size, the
RX buffer can not be reused because the hns3_page_order
return 0 when page size and RX buffer size are both 4096.

So this patch changes the hns3_page_order to return 1 when
RX buffer is greater than half of the page size and page size
is less the 8192, and dev_alloc_pages has already been used
to allocate the compound page for RX buffer.

This patch also changes hnae3_* to hns3_* for page order
and RX buffer size calculation because they are used in
hns3 module.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 +++++-----
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 15 ++++++++++++---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 2e30cfa..3a93bef 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2086,7 +2086,7 @@ static void hns3_set_default_feature(struct net_device *netdev)
 static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
 			     struct hns3_desc_cb *cb)
 {
-	unsigned int order = hnae3_page_order(ring);
+	unsigned int order = hns3_page_order(ring);
 	struct page *p;
 
 	p = dev_alloc_pages(order);
@@ -2097,7 +2097,7 @@ static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
 	cb->page_offset = 0;
 	cb->reuse_flag = 0;
 	cb->buf  = page_address(p);
-	cb->length = hnae3_page_size(ring);
+	cb->length = hns3_page_size(ring);
 	cb->type = DESC_TYPE_PAGE;
 
 	return 0;
@@ -2400,7 +2400,7 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
 {
 	struct hns3_desc *desc = &ring->desc[ring->next_to_clean];
 	int size = le16_to_cpu(desc->rx.size);
-	u32 truesize = hnae3_buf_size(ring);
+	u32 truesize = hns3_buf_size(ring);
 
 	skb_add_rx_frag(skb, i, desc_cb->priv, desc_cb->page_offset + pull_len,
 			size - pull_len, truesize);
@@ -2415,7 +2415,7 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
 	/* Move offset up to the next cache line */
 	desc_cb->page_offset += truesize;
 
-	if (desc_cb->page_offset + truesize <= hnae3_page_size(ring)) {
+	if (desc_cb->page_offset + truesize <= hns3_page_size(ring)) {
 		desc_cb->reuse_flag = 1;
 		/* Bump ref count on page before it is given */
 		get_page(desc_cb->priv);
@@ -2697,7 +2697,7 @@ static int hns3_add_frag(struct hns3_enet_ring *ring, struct hns3_desc *desc,
 		}
 
 		if (ring->tail_skb) {
-			head_skb->truesize += hnae3_buf_size(ring);
+			head_skb->truesize += hns3_buf_size(ring);
 			head_skb->data_len += le16_to_cpu(desc->rx.size);
 			head_skb->len += le16_to_cpu(desc->rx.size);
 			skb = ring->tail_skb;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 848b866..1a17856 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -608,9 +608,18 @@ static inline bool hns3_nic_resetting(struct net_device *netdev)
 
 #define tx_ring_data(priv, idx) ((priv)->ring_data[idx])
 
-#define hnae3_buf_size(_ring) ((_ring)->buf_size)
-#define hnae3_page_order(_ring) (get_order(hnae3_buf_size(_ring)))
-#define hnae3_page_size(_ring) (PAGE_SIZE << (u32)hnae3_page_order(_ring))
+#define hns3_buf_size(_ring) ((_ring)->buf_size)
+
+static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring)
+{
+#if (PAGE_SIZE < 8192)
+	if (ring->buf_size > (PAGE_SIZE / 2))
+		return 1;
+#endif
+	return 0;
+}
+
+#define hns3_page_size(_ring) (PAGE_SIZE << hns3_page_order(_ring))
 
 /* iterator for handling rings in ring group */
 #define hns3_for_each_ring(pos, head) \
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 09/11] net: hns3: add interrupt affinity support for misc interrupt
From: Huazhong Tan @ 2019-07-26  3:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Yunsheng Lin, Peng Li, Huazhong Tan
In-Reply-To: <1564111502-15504-1-git-send-email-tanhuazhong@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

The misc interrupt is used to schedule the reset and mailbox
subtask, and a 1 sec timer is used to schedule the service
subtask, which does periodic work.

This patch sets the above three subtask's affinity using the
misc interrupt' affinity.

Also this patch setups a affinity notify for misc interrupt to
allow user to change the above three subtask's affinity.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 53 ++++++++++++++++++++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  4 ++
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index e804a19..3e43dff 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1270,6 +1270,12 @@ static int hclge_configure(struct hclge_dev *hdev)
 
 	hclge_init_kdump_kernel_config(hdev);
 
+	/* Set the init affinity based on pci func number */
+	i = cpumask_weight(cpumask_of_node(dev_to_node(&hdev->pdev->dev)));
+	i = i ? PCI_FUNC(hdev->pdev->devfn) % i : 0;
+	cpumask_set_cpu(cpumask_local_spread(i, dev_to_node(&hdev->pdev->dev)),
+			&hdev->affinity_mask);
+
 	return ret;
 }
 
@@ -2501,14 +2507,16 @@ static void hclge_mbx_task_schedule(struct hclge_dev *hdev)
 {
 	if (!test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state) &&
 	    !test_and_set_bit(HCLGE_STATE_MBX_SERVICE_SCHED, &hdev->state))
-		schedule_work(&hdev->mbx_service_task);
+		queue_work_on(cpumask_first(&hdev->affinity_mask), system_wq,
+			      &hdev->mbx_service_task);
 }
 
 static void hclge_reset_task_schedule(struct hclge_dev *hdev)
 {
 	if (!test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
 	    !test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
-		schedule_work(&hdev->rst_service_task);
+		queue_work_on(cpumask_first(&hdev->affinity_mask), system_wq,
+			      &hdev->rst_service_task);
 }
 
 static void hclge_task_schedule(struct hclge_dev *hdev)
@@ -2518,8 +2526,9 @@ static void hclge_task_schedule(struct hclge_dev *hdev)
 	    !test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state)) {
 		hdev->hw_stats.stats_timer++;
 		hdev->fd_arfs_expire_timer++;
-		mod_delayed_work(system_wq, &hdev->service_task,
-				 round_jiffies_relative(HZ));
+		mod_delayed_work_on(cpumask_first(&hdev->affinity_mask),
+				    system_wq, &hdev->service_task,
+				    round_jiffies_relative(HZ));
 	}
 }
 
@@ -2905,6 +2914,36 @@ static void hclge_get_misc_vector(struct hclge_dev *hdev)
 	hdev->num_msi_used += 1;
 }
 
+static void hclge_irq_affinity_notify(struct irq_affinity_notify *notify,
+				      const cpumask_t *mask)
+{
+	struct hclge_dev *hdev = container_of(notify, struct hclge_dev,
+					      affinity_notify);
+
+	cpumask_copy(&hdev->affinity_mask, mask);
+}
+
+static void hclge_irq_affinity_release(struct kref *ref)
+{
+}
+
+static void hclge_misc_affinity_setup(struct hclge_dev *hdev)
+{
+	irq_set_affinity_hint(hdev->misc_vector.vector_irq,
+			      &hdev->affinity_mask);
+
+	hdev->affinity_notify.notify = hclge_irq_affinity_notify;
+	hdev->affinity_notify.release = hclge_irq_affinity_release;
+	irq_set_affinity_notifier(hdev->misc_vector.vector_irq,
+				  &hdev->affinity_notify);
+}
+
+static void hclge_misc_affinity_teardown(struct hclge_dev *hdev)
+{
+	irq_set_affinity_notifier(hdev->misc_vector.vector_irq, NULL);
+	irq_set_affinity_hint(hdev->misc_vector.vector_irq, NULL);
+}
+
 static int hclge_misc_irq_init(struct hclge_dev *hdev)
 {
 	int ret;
@@ -8796,6 +8835,11 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	INIT_WORK(&hdev->rst_service_task, hclge_reset_service_task);
 	INIT_WORK(&hdev->mbx_service_task, hclge_mailbox_service_task);
 
+	/* Setup affinity after service timer setup because add_timer_on
+	 * is called in affinity notify.
+	 */
+	hclge_misc_affinity_setup(hdev);
+
 	hclge_clear_all_event_cause(hdev);
 	hclge_clear_resetting_state(hdev);
 
@@ -8957,6 +9001,7 @@ static void hclge_uninit_ae_dev(struct hnae3_ae_dev *ae_dev)
 	struct hclge_dev *hdev = ae_dev->priv;
 	struct hclge_mac *mac = &hdev->hw.mac;
 
+	hclge_misc_affinity_teardown(hdev);
 	hclge_state_uninit(hdev);
 
 	if (mac->phydev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index dde8f22..688e425 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -863,6 +863,10 @@ struct hclge_dev {
 
 	DECLARE_KFIFO(mac_tnl_log, struct hclge_mac_tnl_stats,
 		      HCLGE_MAC_TNL_LOG_SIZE);
+
+	/* affinity mask and notify for misc interrupt */
+	cpumask_t affinity_mask;
+	struct irq_affinity_notify affinity_notify;
 };
 
 /* VPort level vlan tag configuration for TX direction */
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 08/11] net: hns3: make hclge_service use delayed workqueue
From: Huazhong Tan @ 2019-07-26  3:24 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Yunsheng Lin, Huazhong Tan
In-Reply-To: <1564111502-15504-1-git-send-email-tanhuazhong@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

Use delayed work instead of using timers to trigger the
hclge_serive.

Simplify the code with one less middle function and in order
to support misc irq affinity.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 52 +++++++++-------------
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  3 +-
 2 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index faf60b4..e804a19 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2515,8 +2515,12 @@ static void hclge_task_schedule(struct hclge_dev *hdev)
 {
 	if (!test_bit(HCLGE_STATE_DOWN, &hdev->state) &&
 	    !test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
-	    !test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state))
-		(void)schedule_work(&hdev->service_task);
+	    !test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state)) {
+		hdev->hw_stats.stats_timer++;
+		hdev->fd_arfs_expire_timer++;
+		mod_delayed_work(system_wq, &hdev->service_task,
+				 round_jiffies_relative(HZ));
+	}
 }
 
 static int hclge_get_mac_link_status(struct hclge_dev *hdev)
@@ -2731,25 +2735,6 @@ static int hclge_get_status(struct hnae3_handle *handle)
 	return hdev->hw.mac.link;
 }
 
-static void hclge_service_timer(struct timer_list *t)
-{
-	struct hclge_dev *hdev = from_timer(hdev, t, service_timer);
-
-	mod_timer(&hdev->service_timer, jiffies + HZ);
-	hdev->hw_stats.stats_timer++;
-	hdev->fd_arfs_expire_timer++;
-	hclge_task_schedule(hdev);
-}
-
-static void hclge_service_complete(struct hclge_dev *hdev)
-{
-	WARN_ON(!test_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state));
-
-	/* Flush memory before next watchdog */
-	smp_mb__before_atomic();
-	clear_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state);
-}
-
 static u32 hclge_check_event_cause(struct hclge_dev *hdev, u32 *clearval)
 {
 	u32 rst_src_reg, cmdq_src_reg, msix_src_reg;
@@ -3596,7 +3581,9 @@ static void hclge_update_vport_alive(struct hclge_dev *hdev)
 static void hclge_service_task(struct work_struct *work)
 {
 	struct hclge_dev *hdev =
-		container_of(work, struct hclge_dev, service_task);
+		container_of(work, struct hclge_dev, service_task.work);
+
+	clear_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state);
 
 	if (hdev->hw_stats.stats_timer >= HCLGE_STATS_TIMER_INTERVAL) {
 		hclge_update_stats_for_all(hdev);
@@ -3611,7 +3598,8 @@ static void hclge_service_task(struct work_struct *work)
 		hclge_rfs_filter_expire(hdev);
 		hdev->fd_arfs_expire_timer = 0;
 	}
-	hclge_service_complete(hdev);
+
+	hclge_task_schedule(hdev);
 }
 
 struct hclge_vport *hclge_get_vport(struct hnae3_handle *handle)
@@ -6150,10 +6138,13 @@ static void hclge_set_timer_task(struct hnae3_handle *handle, bool enable)
 	struct hclge_dev *hdev = vport->back;
 
 	if (enable) {
-		mod_timer(&hdev->service_timer, jiffies + HZ);
+		hclge_task_schedule(hdev);
 	} else {
-		del_timer_sync(&hdev->service_timer);
-		cancel_work_sync(&hdev->service_task);
+		/* Set the DOWN flag here to disable the service to be
+		 * scheduled again
+		 */
+		set_bit(HCLGE_STATE_DOWN, &hdev->state);
+		cancel_delayed_work_sync(&hdev->service_task);
 		clear_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state);
 	}
 }
@@ -8592,12 +8583,10 @@ static void hclge_state_uninit(struct hclge_dev *hdev)
 	set_bit(HCLGE_STATE_DOWN, &hdev->state);
 	set_bit(HCLGE_STATE_REMOVING, &hdev->state);
 
-	if (hdev->service_timer.function)
-		del_timer_sync(&hdev->service_timer);
 	if (hdev->reset_timer.function)
 		del_timer_sync(&hdev->reset_timer);
-	if (hdev->service_task.func)
-		cancel_work_sync(&hdev->service_task);
+	if (hdev->service_task.work.func)
+		cancel_delayed_work_sync(&hdev->service_task);
 	if (hdev->rst_service_task.func)
 		cancel_work_sync(&hdev->rst_service_task);
 	if (hdev->mbx_service_task.func)
@@ -8802,9 +8791,8 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 
 	hclge_dcb_ops_set(hdev);
 
-	timer_setup(&hdev->service_timer, hclge_service_timer, 0);
 	timer_setup(&hdev->reset_timer, hclge_reset_timer, 0);
-	INIT_WORK(&hdev->service_task, hclge_service_task);
+	INIT_DELAYED_WORK(&hdev->service_task, hclge_service_task);
 	INIT_WORK(&hdev->rst_service_task, hclge_reset_service_task);
 	INIT_WORK(&hdev->mbx_service_task, hclge_mailbox_service_task);
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 6a12285..dde8f22 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -806,9 +806,8 @@ struct hclge_dev {
 	u16 adminq_work_limit; /* Num of admin receive queue desc to process */
 	unsigned long service_timer_period;
 	unsigned long service_timer_previous;
-	struct timer_list service_timer;
 	struct timer_list reset_timer;
-	struct work_struct service_task;
+	struct delayed_work service_task;
 	struct work_struct rst_service_task;
 	struct work_struct mbx_service_task;
 
-- 
2.7.4


^ permalink raw reply related

* [PATCH V2 net-next 11/11] net: hns3: use dev_info() instead of pr_info()
From: Huazhong Tan @ 2019-07-26  3:25 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-kernel, salil.mehta, yisen.zhuang, linuxarm,
	Huazhong Tan
In-Reply-To: <1564111502-15504-1-git-send-email-tanhuazhong@huawei.com>

dev_info() is more appropriate for printing messages when driver
initialization done, so switch to dev_info().

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c   | 4 +++-
 drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 3e43dff..588fb42 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -8864,7 +8864,9 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	hclge_state_init(hdev);
 	hdev->last_reset_time = jiffies;
 
-	pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME);
+	dev_info(&hdev->pdev->dev, "%s driver initialization finished.\n",
+		 HCLGE_DRIVER_NAME);
+
 	return 0;
 
 err_mdiobus_unreg:
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index db84782..218fd5d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -2703,7 +2703,8 @@ static int hclgevf_init_hdev(struct hclgevf_dev *hdev)
 	}
 
 	hdev->last_reset_time = jiffies;
-	pr_info("finished initializing %s driver\n", HCLGEVF_DRIVER_NAME);
+	dev_info(&hdev->pdev->dev, "finished initializing %s driver\n",
+		 HCLGEVF_DRIVER_NAME);
 
 	return 0;
 
-- 
2.7.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox