Netdev List
* [PATCH 1/2] net: ipv4: Fix a possible null-pointer dereference in inet_csk_rebuild_route()
From: Jia-Ju Bai @ 2019-07-26  2:25 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel, Jia-Ju Bai

In inet_csk_rebuild_route(), rt is set to NULL on line 1071.
On line 1076, rt is then used unconditionally:
    return &rt->dst;
Thus, a null-pointer dereference may occur.

To fix this bug, check rt before using it.

This bug was found by STCheck, a static analysis tool written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 net/ipv4/inet_connection_sock.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f5c163d4771b..27d9d80f3401 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1073,7 +1073,10 @@ static struct dst_entry *inet_csk_rebuild_route(struct sock *sk, struct flowi *f
 		sk_setup_caps(sk, &rt->dst);
 	rcu_read_unlock();
 
-	return &rt->dst;
+	if (rt)
+		return &rt->dst;
+	else
+		return NULL;
 }
 
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu)
-- 
2.17.0
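
The guard added by this patch can be sketched in user space as follows. This is only an illustration: struct dst_entry_sketch, struct rtable_sketch and route_to_dst() are hypothetical names, and the struct layout here is not the kernel's.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct dst_entry and struct rtable. */
struct dst_entry_sketch {
	int mtu;
};

struct rtable_sketch {
	struct dst_entry_sketch dst;
	int flags;
};

/*
 * Return the embedded dst of a route, or NULL when there is no route.
 * The caller can then test the result instead of receiving a pointer
 * computed from a NULL rt.
 */
static struct dst_entry_sketch *route_to_dst(struct rtable_sketch *rt)
{
	if (!rt)
		return NULL;
	return &rt->dst;
}
```

A caller would treat a NULL return as "no route" and take its own error path.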



* [PATCH 2/2] net: ipv4: Fix a possible null-pointer dereference in fib4_rule_suppress()
From: Jia-Ju Bai @ 2019-07-26  2:25 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel, Jia-Ju Bai

In fib4_rule_suppress(), the if statement on line 145 checks
whether result->fi is NULL:
    if (result->fi)

When result->fi is NULL, it is still passed to fib_info_put() on line 167:
    fib_info_put(result->fi);

fib_info_put() dereferences its argument fi:
    if (refcount_dec_and_test(&fi->fib_clntref))

Thus, a null-pointer dereference may occur.

To fix this bug, check result->fi before calling fib_info_put().

This bug was found by STCheck, a static analysis tool written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 net/ipv4/fib_rules.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index b43a7ba5c6a4..daedce293aab 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -163,7 +163,7 @@ static bool fib4_rule_suppress(struct fib_rule *rule, struct fib_lookup_arg *arg
 	return false;
 
 suppress_route:
-	if (!(arg->flags & FIB_LOOKUP_NOREF))
+	if (!(arg->flags & FIB_LOOKUP_NOREF) && result->fi)
 		fib_info_put(result->fi);
 	return true;
 }
-- 
2.17.0
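
The combined condition in this patch can be sketched in user space as follows. All names are hypothetical stand-ins (LOOKUP_NOREF_SKETCH for FIB_LOOKUP_NOREF, info_put() for fib_info_put()), and the plain integer counter only models the reference count.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOOKUP_NOREF_SKETCH 0x1u   /* hypothetical stand-in for FIB_LOOKUP_NOREF */

struct info_sketch {
	int refcnt;
};

/* Like the fib_info_put() line quoted above: dereferences fi unconditionally. */
static void info_put(struct info_sketch *fi)
{
	fi->refcnt--;
}

/* Drop the reference only if one was taken and there is an object to drop it on. */
static bool suppress_route(unsigned int flags, struct info_sketch *fi)
{
	if (!(flags & LOOKUP_NOREF_SKETCH) && fi)
		info_put(fi);
	return true;
}
```

With the extra "&& fi" test, calling suppress_route(0, NULL) returns without touching memory.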



* Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: liuyonglong @ 2019-07-26  2:21 UTC (permalink / raw)
  To: Saeed Mahameed, tanhuazhong@huawei.com, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <75a02bbe5b3b0f2755cd901a8830d4a3026f9383.camel@mellanox.com>

We will change all of them to netif_msg_drv(), which is off by default.
Thanks for your reply!

On 2019/7/26 5:59, Saeed Mahameed wrote:
> On Thu, 2019-07-25 at 20:28 +0800, liuyonglong wrote:
>>
>> On 2019/7/25 3:12, Saeed Mahameed wrote:
>>> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>>>> From: Yonglong Liu <liuyonglong@huawei.com>
>>>>
>>>> Sometimes dmesg shows only that the eth interface went down/up,
>>>> but not why it went down. So add some debug messages to
>>>> identify the cause.
>>>>
>>>
>>> I really don't like this. your default msg lvl has NETIF_MSG_IFDOWN
>>> turned on .. dumping every single operation that happens on your
>>> device
>>> by default to kernel log is too much ! 
>>>
>>> We should really consider using trace buffers with well defined
>>> structures for vendor specific events. so we can use bpf filters
>>> and
>>> state of the art tools for netdev debugging.
>>>
>>
>> We do this because we could only see a link down message in dmesg,
>> and it took a long time to find the cause of the link down, which was
>> simply that another user had changed the settings.
>>
>> We can change the net_open/net_stop/dcbnl_ops to msg_drv (not default
>> turned on),  and want to keep the others default print to kernel log,
>> is it acceptable?
>>
> 
> acceptable as long as debug information is kept off by default and
> your driver doesn't spam the kernel log.
> 
> you should use dynamic debug [1] and/or "off by default" msg lvls for
> debugging information..
> 
> I couldn't find any rules regarding what to put in kernel log, Maybe
> someone can share ?. but i vaguely remember that the recommendation
> for device drivers is to put nothing, only error/warning messages.
> 
> [1] 
> https://www.kernel.org/doc/html/v4.15/admin-guide/dynamic-debug-howto.html
> 
>>>> @@ -1593,6 +1603,11 @@ static int hns3_ndo_set_vf_vlan(struct
>>>> net_device *netdev, int vf, u16 vlan,
>>>>  	struct hnae3_handle *h = hns3_get_handle(netdev);
>>>>  	int ret = -EIO;
>>>>  
>>>> +	if (netif_msg_ifdown(h))
>>>
>>> why msg_ifdown ? looks like netif_msg_drv is more appropriate, for
>>> many
>>> of the cases in this patch.
>>>
>>
>> This operation may cause link down, so we use msg_ifdown.
>>
> 
> ifdown isn't link down.. 
> 
> to be honest, I couldn't find any documentation explaining how/when to
> use msg lvls, (i didn't look too deep though), by looking at other
> drivers, my interpretation is:
> 
> ifup (open/boot up flow)
> ifdown (close/teardown flow)
> drv (driver based or dynamic flows) 
> etc .. 
> 
> -Saeed.
> 
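
The "off by default" message levels discussed above can be modelled in user space as a bitmask test. This is a sketch under assumptions: the MSG_* values are modelled on the kernel's NETIF_MSG_* flags and should be treated as illustrative, and struct handle_sketch, DEFAULT_MSG_ENABLE and msg_enabled() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values modelled on the kernel's NETIF_MSG_* flags; illustrative only. */
enum {
	MSG_DRV    = 0x0001,
	MSG_IFDOWN = 0x0010,
	MSG_IFUP   = 0x0020,
};

/* Hypothetical default: no debug-style level enabled until the admin opts in. */
#define DEFAULT_MSG_ENABLE 0u

struct handle_sketch {
	uint32_t msg_enable;
};

/* A message is printed only when its level bit is set in msg_enable. */
static bool msg_enabled(const struct handle_sketch *h, uint32_t level)
{
	return (h->msg_enable & level) != 0;
}
```

With such a default nothing is logged until a level bit is set, which on a real driver would typically be done through ethtool's msglvl setting.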



* Re: [PATCH net-next 06/11] net: hns3: modify firmware version display format
From: tanhuazhong @ 2019-07-26  1:50 UTC (permalink / raw)
  To: Saeed Mahameed, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	moyufeng@huawei.com
In-Reply-To: <d6a32434af7e9c883f104ae66e62b7b376abb39c.camel@mellanox.com>



On 2019/7/26 5:32, Saeed Mahameed wrote:
> On Thu, 2019-07-25 at 10:34 +0800, tanhuazhong wrote:
>>
>> On 2019/7/25 2:34, Saeed Mahameed wrote:
>>> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>>>> From: Yufeng Mo <moyufeng@huawei.com>
>>>>
>>>> This patch modifies firmware version display format in
>>>> hclge(vf)_cmd_init() and hns3_get_drvinfo(). Also, adds
>>>> some optimizations for firmware version display format.
>>>>
>>>> Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
>>>> Signed-off-by: Peng Li <lipeng321@huawei.com>
>>>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>>>> ---
>>>>    drivers/net/ethernet/hisilicon/hns3/hnae3.h              |  9
>>>> +++++++++
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c       | 15
>>>> +++++++++++++--
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c   | 10
>>>> +++++++++-
>>>>    drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_cmd.c | 11
>>>> +++++++++--
>>>>    4 files changed, 40 insertions(+), 5 deletions(-)
>>>>
>>>>
> 
> [...]
> 
>>>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>>>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
>>>> @@ -419,7 +419,15 @@ int hclge_cmd_init(struct hclge_dev *hdev)
>>>>    	}
>>>>    	hdev->fw_version = version;
>>>>    
>>>> -	dev_info(&hdev->pdev->dev, "The firmware version is %08x\n",
>>>> version);
>>>> +	pr_info_once("The firmware version is %lu.%lu.%lu.%lu\n",
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE3_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE3_SHIFT),
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE2_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE2_SHIFT),
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE1_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE1_SHIFT),
>>>> +		     hnae3_get_field(version,
>>>> HNAE3_FW_VERSION_BYTE0_MASK,
>>>> +				     HNAE3_FW_VERSION_BYTE0_SHIFT));
>>>>    
>>>
>>> Device name/string will not be printed now, what happens if i have
>>> multiple devices ? at least print the device name as it was before
>>>
>> Since on each board we only have one firmware, the firmware
>> version is same per device, and will not change when running.
>> So pr_info_once() looks good for this case.
>>
> 
> boards change too often to have such static assumption.

Ok, I will use dev_info instead of pr_info here.

> 
>> BTW, maybe we should change below print in the end of
>> hclge_init_ae_dev(), use dev_info() instead of pr_info(),
>> then we can know that which device has already initialized.
>> I will send other patch to do that, is it acceptable for you?
>>
>> "pr_info("%s driver initialization finished.\n", HCLGE_DRIVER_NAME);"
>>
> 
> I would avoid using pr_info when i can ! if you have the option to
> print with dev information as it was before that is preferable.
> 
> Thanks,
> Saeed.
> 

Thanks,
Huazhong.
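
The dotted version format from the quoted hunk can be sketched in user space as follows. The helper names are hypothetical, and the masks and shifts assume four 8-bit fields packed into a 32-bit word; the HNAE3_FW_VERSION_BYTE*_MASK definitions themselves are not shown in the quoted diff.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper in the mask-and-shift style of hnae3_get_field(). */
static uint32_t get_field(uint32_t value, uint32_t mask, unsigned int shift)
{
	return (value & mask) >> shift;
}

/* Format a packed 32-bit version as "b3.b2.b1.b0"; returns snprintf()'s result. */
static int format_fw_version(uint32_t version, char *buf, size_t len)
{
	return snprintf(buf, len, "%u.%u.%u.%u",
			(unsigned int)get_field(version, 0xff000000u, 24),
			(unsigned int)get_field(version, 0x00ff0000u, 16),
			(unsigned int)get_field(version, 0x0000ff00u, 8),
			(unsigned int)get_field(version, 0x000000ffu, 0));
}
```

In the driver, the formatted fields would go through dev_info() so that the device name appears in the log line, as agreed above.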



* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Alexei Starovoitov @ 2019-07-26  1:47 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Song Liu, Brian Vazquez, Daniel Borkmann, David S . Miller,
	Stanislav Fomichev, Willem de Bruijn, Petar Penkov, Networking,
	bpf, Yonghong Song
In-Reply-To: <CABCgpaXE=dkBcJVqs95NZQTFuznA-q64kYPEcbvmYvAJ4wSp1A@mail.gmail.com>

On Thu, Jul 25, 2019 at 6:24 PM Brian Vazquez <brianvv.kernel@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 4:54 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > > again. This is pretty weird.
> > > > >
> > > > > Yes, I know. But note that the current scenario happens even for the
> > > > > old interface (imagine you are walking a map from userspace and you
> > > > > tried get_next_key the prev_key was removed, you will start again from
> > > > > the beginning without noticing it).
> > > > > I tried to sent a patch in the past but I was missing some context:
> > > > > before NULL was used to get the very first_key the interface relied in
> > > > > a random (non existent) key to retrieve the first_key in the map, and
> > > > > I was told what we still have to support that scenario.
> > > >
> > > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > > don't have to support legacy scenarios.
> > > >
> > > > Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
> > > > to look up previous keys. Would something down this direction work?
> > >
> > > I've been thinking about it and I think first we need a way to detect
> > > that since key was not present we got the first_key instead:
> > >
> > > - One solution I had in mind was to explicitly asked for the first key
> > > with map_get_next_key(map, NULL, first_key) and while walking the map
> > > check that map_get_next_key(map, prev_key, key) doesn't return the
> > > same key. This could be done using memcmp.
> > > - Discussing with Stan, he mentioned that another option is to support
> > > a flag in map_get_next_key to let it know that we want an error
> > > instead of the first_key.
> > >
> > > After detecting the problem we also need to define what we want to do,
> > > here some options:
> > >
> > > a) Return the error to the caller
> > > b) Try with previous keys if any (which be limited to the keys that we
> > > have traversed so far in this dump call)
> > > c) continue with next entries in the map. array is easy just get the
> > > next valid key (starting on i+1), but hmap might be difficult since
> > > starting on the next bucket could potentially skip some keys that were
> > > concurrently added to the same bucket where key used to be, and
> > > starting on the same bucket could lead us to return repeated elements.
> > >
> > > Or maybe we could support those 3 cases via flags and let the caller
> > > decide which one to use?
> >
> > this type of indecision is the reason why I wasn't excited about
> > batch dumping in the first place and gave 'soft yes' when Stan
> > mentioned it during lsf/mm/bpf uconf.
> > We probably shouldn't do it.
> > It feels this map_dump makes api more complex and doesn't really
> > give much benefit to the user other than large map dump becomes faster.
> > I think we gotta solve this problem differently.
>
> Some users are working around the dumping problems with the existing
> api by creating a bpf_map_get_next_key_and_delete userspace function
> (see https://www.bouncybouncy.net/blog/bpf_map_get_next_key-pitfalls/)
> which in my opinion is actually a good idea. The only problem with
> that is that calling bpf_map_get_next_key(fd, key, next_key) and then
> bpf_map_delete_elem(fd, key) from userspace is racing with kernel code
> and it might lose some information when deleting.
> We could then do map_dump_and_delete using that idea but in the kernel
> where we could better handle the racing condition. In that scenario
> even if we retrieve the same key it will contain different info ( the
> delta between old and new value). Would that work?

you mean get_next+lookup+delete at once?
Sounds useful.
Yonghong has been thinking about batching api as well.

I think if we cannot figure out how to make a batch of two commands
get_next + lookup work correctly then we need to identify/invent one
command and make batching more generic.
Like make one jumbo/compound/atomic command that is get_next+lookup+delete.
Define the semantics of this single compound command.
And then let batching be a multiplier of such a command.
In the sense that a multiplier of 1 or N should behave the same way.
No extra flags to alter the batching.
The high level description of the batch would be:
pls execute get_next,lookup,delete and repeat it N times.
or
pls execute get_next,lookup and repeat N times.
where each command action is defined to be composable.

Just a rough idea.
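
The "compound command times N" idea can be sketched with a toy user-space map. Everything here is hypothetical (struct toy_map, toy_pop_next(), toy_pop_batch()); it is not the BPF API, and being single-threaded it says nothing about how atomicity would be achieved in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAP_SLOTS 8

struct toy_entry {
	bool used;
	int key;
	int value;
};

struct toy_map {
	struct toy_entry slots[TOY_MAP_SLOTS];
};

/*
 * One compound step: find the next live entry, copy it out, delete it.
 * Returns false when the map is empty.
 */
static bool toy_pop_next(struct toy_map *map, int *key, int *value)
{
	for (size_t i = 0; i < TOY_MAP_SLOTS; i++) {
		struct toy_entry *e = &map->slots[i];

		if (!e->used)
			continue;
		*key = e->key;
		*value = e->value;
		e->used = false;
		return true;
	}
	return false;
}

/* Batching is "repeat the compound step up to n times", with no extra flags. */
static size_t toy_pop_batch(struct toy_map *map, size_t n, int *keys, int *values)
{
	size_t done = 0;

	while (done < n && toy_pop_next(map, &keys[done], &values[done]))
		done++;
	return done;
}
```

A batch of 1 and a batch of N go through the same step function, so the semantics only need to be defined once.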


* Re: [PATCH net-next 07/11] net: hns3: adds debug messages to identify eth down cause
From: Jakub Kicinski @ 2019-07-26  1:28 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: tanhuazhong@huawei.com, davem@davemloft.net,
	liuyonglong@huawei.com, lipeng321@huawei.com,
	yisen.zhuang@huawei.com, salil.mehta@huawei.com,
	linuxarm@huawei.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <75a02bbe5b3b0f2755cd901a8830d4a3026f9383.camel@mellanox.com>

On Thu, 25 Jul 2019 21:59:08 +0000, Saeed Mahameed wrote:
> I couldn't find any rules regarding what to put in kernel log, Maybe
> someone can share ?. but i vaguely remember that the recommendation
> for device drivers is to put nothing, only error/warning messages.

FWIW my understanding is also that only error/warning messages should
be printed. IOW things which should "never happen".

There are some historical exceptions. Probe logs for instance may be
useful, because it's not trivial to get to the device if probe fails.

Another one is ethtool flashing, if it takes time we used to print into
logs some message like "please wait patiently". But since Jiri added
the progress messages in devlink that's no longer necessary.

For the messages which are basically printing the name of the function
or name of the function and their args - we have ftrace.

That's my $0.02 :)


* RE: [PATCH 0/8] can: flexcan: add CAN FD support for NXP Flexcan
From: Joakim Zhang @ 2019-07-26  1:25 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can@vger.kernel.org
  Cc: wg@grandegger.com, dl-linux-imx, netdev@vger.kernel.org
In-Reply-To: <24eb5c67-4692-1002-2468-4ae2e1a6b68b@pengutronix.de>


> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: July 25, 2019 18:37
> To: Joakim Zhang <qiangqing.zhang@nxp.com>; linux-can@vger.kernel.org
> Cc: wg@grandegger.com; dl-linux-imx <linux-imx@nxp.com>;
> netdev@vger.kernel.org
> Subject: Re: [PATCH 0/8] can: flexcan: add CAN FD support for NXP Flexcan
> 
> On 7/25/19 9:53 AM, Marc Kleine-Budde wrote:
> > On 7/25/19 9:38 AM, Joakim Zhang wrote:
> >> Kindly pinging...
> >>
> >> After you git pull request for linux-can-next-for-5.4-20190724, some patches
> are missing from linux-can-next/testing.
> >> can: flexcan: flexcan_mailbox_read() make use of flexcan_write64() to
> >> mark the mailbox as read
> >> can: flexcan: flexcan_irq(): add support for TX mailbox in iflag1
> >> can: flexcan: flexcan_read_reg_iflag_rx(): optimize reading
> >> can: flexcan: introduce struct flexcan_priv::tx_mask and make use of
> >> it
> >> can: flexcan: convert struct flexcan_priv::rx_mask{1,2} to rx_mask
> >> can: flexcan: remove TX mailbox bit from struct
> >> flexcan_priv::rx_mask{1,2}
> >> can: flexcan: rename struct flexcan_priv::reg_imask{1,2}_default to
> >> rx_mask{1,2}
> >> can: flexcan: flexcan_irq(): rename variable reg_iflag ->
> >> reg_iflag_rx
> >> can: flexcan: rename macro FLEXCAN_IFLAG_MB() ->
> FLEXCAN_IFLAG2_MB()
> >>
> >> You can refer to below link for the reason of adding above patches:
> >> https://www.spinics.net/lists/linux-can/msg00777.html
> >> https://www.spinics.net/lists/linux-can/msg01150.html
> >>
> >> Are you prepared to add back these patches as they are necessary for
> >> Flexcan CAN FD? And this Flexcan CAN FD patch set is based on these
> >> patches.
> >
> > Yes, these patches will be added back.
> 
> I've cleaned up the first patch a bit, and pushed everything to the testing
> branch. Can you give it a test.

Hi Marc,

Both Classic CAN and CAN FD work fine in my testing. Thank you for your kind review.

Best Regards,
Joakim Zhang
> regards,
> Marc
> 
> --
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Brian Vazquez @ 2019-07-26  1:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Song Liu, Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190725235432.lkptx3fafegnm2et@ast-mbp>

On Thu, Jul 25, 2019 at 4:54 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > again. This is pretty weird.
> > > >
> > > > Yes, I know. But note that the current scenario happens even for the
> > > > old interface (imagine you are walking a map from userspace and you
> > > > tried get_next_key the prev_key was removed, you will start again from
> > > > the beginning without noticing it).
> > > > I tried to sent a patch in the past but I was missing some context:
> > > > before NULL was used to get the very first_key the interface relied in
> > > > a random (non existent) key to retrieve the first_key in the map, and
> > > > I was told what we still have to support that scenario.
> > >
> > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > don't have to support legacy scenarios.
> > >
> > > Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
> > > to look up previous keys. Would something down this direction work?
> >
> > I've been thinking about it and I think first we need a way to detect
> > that since key was not present we got the first_key instead:
> >
> > - One solution I had in mind was to explicitly asked for the first key
> > with map_get_next_key(map, NULL, first_key) and while walking the map
> > check that map_get_next_key(map, prev_key, key) doesn't return the
> > same key. This could be done using memcmp.
> > - Discussing with Stan, he mentioned that another option is to support
> > a flag in map_get_next_key to let it know that we want an error
> > instead of the first_key.
> >
> > After detecting the problem we also need to define what we want to do,
> > here some options:
> >
> > a) Return the error to the caller
> > b) Try with previous keys if any (which be limited to the keys that we
> > have traversed so far in this dump call)
> > c) continue with next entries in the map. array is easy just get the
> > next valid key (starting on i+1), but hmap might be difficult since
> > starting on the next bucket could potentially skip some keys that were
> > concurrently added to the same bucket where key used to be, and
> > starting on the same bucket could lead us to return repeated elements.
> >
> > Or maybe we could support those 3 cases via flags and let the caller
> > decide which one to use?
>
> this type of indecision is the reason why I wasn't excited about
> batch dumping in the first place and gave 'soft yes' when Stan
> mentioned it during lsf/mm/bpf uconf.
> We probably shouldn't do it.
> It feels this map_dump makes api more complex and doesn't really
> give much benefit to the user other than large map dump becomes faster.
> I think we gotta solve this problem differently.

Some users are working around the dumping problems with the existing
api by creating a bpf_map_get_next_key_and_delete userspace function
(see https://www.bouncybouncy.net/blog/bpf_map_get_next_key-pitfalls/)
which in my opinion is actually a good idea. The only problem is
that calling bpf_map_get_next_key(fd, key, next_key) and then
bpf_map_delete_elem(fd, key) from userspace races with kernel code
and might lose some information when deleting.
We could then do map_dump_and_delete using that idea but in the kernel,
where we could better handle the race condition. In that scenario,
even if we retrieve the same key it will contain different info (the
delta between old and new value). Would that work?
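
The first detection idea quoted above (ask for the first key explicitly, then compare with memcmp) can be sketched with a toy key set. The names toy_keyset, toy_get_next_key() and toy_walk_restarted() are hypothetical; the toy only models the described behaviour that a NULL or missing prev_key yields the first key.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_KEYS_MAX 8

struct toy_keyset {
	int keys[TOY_KEYS_MAX];
	size_t count;
};

/*
 * Toy model of the get_next_key behaviour described in the thread:
 * a NULL or unknown prev yields the first key.
 * Returns false when there is no further key.
 */
static bool toy_get_next_key(const struct toy_keyset *set, const int *prev, int *next)
{
	size_t i;

	if (set->count == 0)
		return false;
	if (prev) {
		for (i = 0; i < set->count; i++) {
			if (set->keys[i] != *prev)
				continue;
			if (i + 1 == set->count)
				return false;   /* prev was the last key */
			*next = set->keys[i + 1];
			return true;
		}
	}
	*next = set->keys[0];               /* NULL or missing prev: restart */
	return true;
}

/* True when a walk step returned the first key, i.e. the walk restarted. */
static bool toy_walk_restarted(const struct toy_keyset *set, const int *next)
{
	int first;

	return toy_get_next_key(set, NULL, &first) &&
	       memcmp(&first, next, sizeof(first)) == 0;
}
```

The check only detects the restart; what to do afterwards (options a, b or c in the quoted text) is a separate decision.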


* Re: [PATCH 00/12] block/bio, fs: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-26  1:24 UTC (permalink / raw)
  To: Bob Liu, Andrew Morton
  Cc: Alexander Viro, Anna Schumaker, David S . Miller,
	Dominique Martinet, Eric Van Hensbergen, Jason Gunthorpe,
	Jason Wang, Jens Axboe, Latchesar Ionkov, Michael S . Tsirkin,
	Miklos Szeredi, Trond Myklebust, Christoph Hellwig,
	Matthew Wilcox, linux-mm, LKML, ceph-devel, kvm, linux-block,
	linux-cifs, linux-fsdevel, linux-nfs, linux-rdma, netdev,
	samba-technical, v9fs-developer, virtualization
In-Reply-To: <8621066c-e242-c449-eb04-4f2ce6867140@oracle.com>

On 7/24/19 5:41 PM, Bob Liu wrote:
> On 7/24/19 12:25 PM, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> Hi,
>>
>> This is mostly Jerome's work, converting the block/bio and related areas
>> to call put_user_page*() instead of put_page(). Because I've changed
>> Jerome's patches, in some cases significantly, I'd like to get his
>> feedback before we actually leave him listed as the author (he might
>> want to disown some or all of these).
>>
> 
> Could you add some background to the commit log for people who don't have the context?
> Why this conversion? What are the main differences?
> 

Hi Bob,

1. Many of the patches have a blurb like this:

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page().

This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

...and if you look at that commit, you'll find several pages of
information in its commit description, which should address your point.

2. This whole series has to be re-worked, as per the other feedback thread.
So I'll keep your comment in mind when I post a new series.

thanks,
-- 
John Hubbard
NVIDIA


* Re: [PATCH bpf-next v3 0/7] bpf/flow_dissector: support input flags
From: Alexei Starovoitov @ 2019-07-26  1:06 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Song Liu, Willem de Bruijn, Petar Penkov
In-Reply-To: <20190725225231.195090-1-sdf@google.com>

On Thu, Jul 25, 2019 at 3:52 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> C flow dissector supports input flags that tell it to customize parsing
> by either stopping early or trying to parse as deep as possible.
> BPF flow dissector always parses as deep as possible which is sub-optimal.
> Pass input flags to the BPF flow dissector as well so it can make the same
> decisions.
>
> Series outline:
> * remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag
> * export FLOW_DISSECTOR_F_XXX flags as uapi and pass them to BPF
>   flow dissector
> * add documentation for the export flags
> * support input flags in BPF_PROG_TEST_RUN via ctx_{in,out}
> * sync uapi to tools
> * support FLOW_DISSECTOR_F_PARSE_1ST_FRAG in selftest
> * support FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL in kernel and selftest
> * support FLOW_DISSECTOR_F_STOP_AT_ENCAP in selftest
>
> Pros:
> * makes BPF flow dissector faster by avoiding burning extra cycles
> * existing BPF progs continue to work by ignoring the flags and always
>   parsing as deep as possible
>
> Cons:
> * new UAPI which we need to support (OTOH, if we need to deprecate some
>   flags, we can just stop setting them upon calling BPF programs)
>
> Some numbers (with .repeat = 4000000 in test_flow_dissector):
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-frag 35 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>         test_flow_dissector:PASS:ipv4-no-frag 32 nsec
>
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-frag 39 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>         test_flow_dissector:PASS:ipv6-no-frag 36 nsec
>
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-flow-label 36 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>         test_flow_dissector:PASS:ipv6-no-flow-label 33 nsec
>
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-encap 38 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>         test_flow_dissector:PASS:ipip-no-encap 32 nsec
>
> The improvement is around 10%, but it's in a tight cache-hot
> BPF_PROG_TEST_RUN loop.

Applied. Thanks
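
How input flags let a dissector stop early can be sketched with a toy model. All names and flag values below are hypothetical; the real flags are defined in the uapi header added by the series, and a real dissector parses packet bytes rather than three booleans.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch; the real ones live in the uapi header. */
#define F_PARSE_1ST_FRAG      (1u << 0)
#define F_STOP_AT_FLOW_LABEL  (1u << 1)
#define F_STOP_AT_ENCAP       (1u << 2)

struct toy_pkt {
	bool is_first_frag;
	bool has_flow_label;
	bool is_encap;
};

/* How far the toy dissector went: outer L3 only, L4 ports, or inner headers. */
enum toy_depth { DEPTH_L3, DEPTH_L4, DEPTH_INNER };

static enum toy_depth toy_dissect(const struct toy_pkt *pkt, unsigned int flags)
{
	/* First fragments are parsed past L3 only when the caller asks for it. */
	if (pkt->is_first_frag && !(flags & F_PARSE_1ST_FRAG))
		return DEPTH_L3;
	/* A flow label is enough for the caller, so stop there if requested. */
	if (pkt->has_flow_label && (flags & F_STOP_AT_FLOW_LABEL))
		return DEPTH_L3;
	if (pkt->is_encap)
		return (flags & F_STOP_AT_ENCAP) ? DEPTH_L3 : DEPTH_INNER;
	return DEPTH_L4;
}
```

A program that ignores the flags behaves like the flags == 0 case for encapsulated packets, which matches the cover letter's point that existing programs keep parsing as deep as possible.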


* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Willem de Bruijn @ 2019-07-26  1:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Brian Vazquez, Song Liu, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller, Stanislav Fomichev,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190725235432.lkptx3fafegnm2et@ast-mbp>

On Thu, Jul 25, 2019 at 7:54 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > > again. This is pretty weird.
> > > >
> > > > Yes, I know. But note that the current scenario happens even for the
> > > > old interface (imagine you are walking a map from userspace and you
> > > > tried get_next_key the prev_key was removed, you will start again from
> > > > the beginning without noticing it).
> > > > I tried to sent a patch in the past but I was missing some context:
> > > > before NULL was used to get the very first_key the interface relied in
> > > > a random (non existent) key to retrieve the first_key in the map, and
> > > > I was told what we still have to support that scenario.
> > >
> > > BPF_MAP_DUMP is slightly different, as you may return the first key
> > > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > > don't have to support legacy scenarios.
> > >
> > > Since BPF_MAP_DUMP keeps a list of elements. It is possible to try
> > > to look up previous keys. Would something down this direction work?
> >
> > I've been thinking about it and I think first we need a way to detect
> > that since key was not present we got the first_key instead:
> >
> > - One solution I had in mind was to explicitly asked for the first key
> > with map_get_next_key(map, NULL, first_key) and while walking the map
> > check that map_get_next_key(map, prev_key, key) doesn't return the
> > same key. This could be done using memcmp.
> > - Discussing with Stan, he mentioned that another option is to support
> > a flag in map_get_next_key to let it know that we want an error
> > instead of the first_key.
> >
> > After detecting the problem we also need to define what we want to do,
> > here some options:
> >
> > a) Return the error to the caller
> > b) Try with previous keys if any (which be limited to the keys that we
> > have traversed so far in this dump call)
> > c) continue with next entries in the map. array is easy just get the
> > next valid key (starting on i+1), but hmap might be difficult since
> > starting on the next bucket could potentially skip some keys that were
> > concurrently added to the same bucket where key used to be, and
> > starting on the same bucket could lead us to return repeated elements.
> >
> > Or maybe we could support those 3 cases via flags and let the caller
> > decide which one to use?
>
> this type of indecision is the reason why I wasn't excited about
> batch dumping in the first place and gave 'soft yes' when Stan
> mentioned it during lsf/mm/bpf uconf.
> We probably shouldn't do it.
> It feels this map_dump makes api more complex and doesn't really
> give much benefit to the user other than large map dump becomes faster.
> I think we gotta solve this problem differently.

Multiple variants with flags indeed make the API complex. I think the
kernel should expose only the simplest, most obvious behavior that
allows the application to recover. In this case, that sounds like
option (a) and restart.

In practice, the common use case is to allocate enough user memory to
read an entire table in one go, in which case the entire issue is
moot.

The cycle savings of dump are significant for large tables. I'm not
sure how we achieve that differently and even simpler? We originally
looked at shared memory, but that is obviously much more complex.


* Re: [PATCH bpf-next v10 0/2] bpf: Allow bpf_skb_event_output for more prog types
From: Alexei Starovoitov @ 2019-07-26  0:59 UTC (permalink / raw)
  To: Allan Zhang
  Cc: Network Development, bpf, Song Liu, Daniel Borkmann,
	Andrii Nakryiko, Alexei Starovoitov
In-Reply-To: <20190724000725.15634-1-allanzhang@google.com>

On Tue, Jul 23, 2019 at 5:07 PM Allan Zhang <allanzhang@google.com> wrote:
>
> Software event output is only enabled by a few prog types right now (TC,
> LWT out, XDP, sockops). Many other skb based prog types need
> bpf_skb_event_output to produce software event.
>
> More prog types are enabled to access bpf_skb_event_output in this
> patch.

Applied. Thanks


* Re: pull-request: bpf 2019-07-25
From: David Miller @ 2019-07-26  0:35 UTC (permalink / raw)
  To: ast; +Cc: daniel, netdev, bpf, kernel-team
In-Reply-To: <20190725173541.2413580-1-ast@kernel.org>

From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 25 Jul 2019 10:35:41 -0700

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) fix segfault in libbpf, from Andrii.
> 
> 2) fix gso_segs access, from Eric.
> 
> 3) tls/sockmap fixes, from Jakub and John.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks Alexei.

I will push back out after build testing.


* Re: [PATCH net-next] net: mvneta: use devm_platform_ioremap_resource() to simplify code
From: David Miller @ 2019-07-26  0:28 UTC (permalink / raw)
  To: Jisheng.Zhang; +Cc: thomas.petazzoni, netdev, linux-kernel
In-Reply-To: <20190725153741.095dca99@xhacker.debian>

From: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Date: Thu, 25 Jul 2019 07:48:04 +0000

> devm_platform_ioremap_resource() wraps platform_get_resource() and
> devm_ioremap_resource() in a single helper, let's use that helper to
> simplify the code.
> 
> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>

Applied.

^ permalink raw reply

* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: David Miller @ 2019-07-26  0:26 UTC (permalink / raw)
  To: himadrispandya
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, linux-hyperv, netdev,
	linux-kernel, himadri18.07
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>

From: Himadri Pandya <himadrispandya@gmail.com>
Date: Thu, 25 Jul 2019 05:11:25 +0000

> Older windows hosts require the hv_sock ring buffer to be defined
> using 4K pages. This was achieved by using the symbol PAGE_SIZE_4K
> defined specifically for this purpose. But now we have a new symbol
> HV_HYP_PAGE_SIZE defined in hyperv-tlfs which can be used for this.
> 
> This patch removes the definition of symbol PAGE_SIZE_4K and replaces
> its usage with the symbol HV_HYP_PAGE_SIZE. This patch also aligns
> sndbuf and rcvbuf to hyper-v specific page size using HV_HYP_PAGE_SIZE
> instead of the guest page size(PAGE_SIZE) as hyper-v expects the page
> size to be 4K and it might not be the case on ARM64 architecture.
> 
> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>

This doesn't compile:

  CC [M]  net/vmw_vsock/hyperv_transport.o
net/vmw_vsock/hyperv_transport.c:58:28: error: ‘HV_HYP_PAGE_SIZE’ undeclared here (not in a function); did you mean ‘HV_MESSAGE_SIZE’?
 #define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
                            ^~~~~~~~~~~~~~~~

^ permalink raw reply

* Re: [PATCH net] selftests/net: add missing gitignores (ipv6_flowlabel)
From: David Miller @ 2019-07-26  0:26 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: netdev, oss-drivers, willemb, quentin.monnet
In-Reply-To: <20190725000714.10200-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Wed, 24 Jul 2019 17:07:14 -0700

> ipv6_flowlabel and ipv6_flowlabel_mgr are missing from
> gitignore.  Quentin points out that the original
> commit 3fb321fde22d ("selftests/net: ipv6 flowlabel")
> did add ignore entries, they are just missing the "ipv6_"
> prefix.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Applied.

^ permalink raw reply

* Re: [PATCH] ipip: validate header length in ipip_tunnel_xmit
From: David Miller @ 2019-07-26  0:26 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, netdev, linux-kernel
In-Reply-To: <1564024076-13764-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Thu, 25 Jul 2019 11:07:55 +0800

> We need the same checks introduced by commit cb9f1b783850
> ("ip: validate header length on virtual device xmit") for
> ipip tunnel.
> 
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] net: mscc: ocelot: null check devm_kcalloc
From: David Miller @ 2019-07-26  0:20 UTC (permalink / raw)
  To: navid.emamdoost
  Cc: emamd001, kjlu, smccaman, secalert, alexandre.belloni,
	UNGLinuxDriver, netdev, linux-kernel
In-Reply-To: <20190725015609.24389-1-navid.emamdoost@gmail.com>

From: Navid Emamdoost <navid.emamdoost@gmail.com>
Date: Wed, 24 Jul 2019 20:56:09 -0500

> devm_kcalloc may fail and return NULL. Added the null check.
> 
> Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
> ---
>  drivers/net/ethernet/mscc/ocelot_board.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mscc/ocelot_board.c b/drivers/net/ethernet/mscc/ocelot_board.c
> index 58bde1a9eacb..52377cfdc31a 100644
> --- a/drivers/net/ethernet/mscc/ocelot_board.c
> +++ b/drivers/net/ethernet/mscc/ocelot_board.c
> @@ -257,6 +257,8 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
>  
>  	ocelot->ports = devm_kcalloc(&pdev->dev, ocelot->num_phys_ports,
>  				     sizeof(struct ocelot_port *), GFP_KERNEL);
> +	if (!ocelot->ports)
> +		return -ENOMEM;
>  

At the very least this leaks a reference to 'ports'.  I didn't check what other
resources obtained by this function are leaked as well by this change, please
audit before resubmitting.

^ permalink raw reply

* Re: [PATCH v4 net-next 18/19] ionic: Add coalesce and other features
From: Saeed Mahameed @ 2019-07-26  0:13 UTC (permalink / raw)
  To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-19-snelson@pensando.io>

On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> Interrupt coalescing, tunable copybreak value, and
> tx timeout.
> 
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  drivers/net/ethernet/pensando/ionic/ionic.h   |   2 +-
>  .../ethernet/pensando/ionic/ionic_ethtool.c   | 105
> ++++++++++++++++++
>  .../net/ethernet/pensando/ionic/ionic_lif.c   |  13 ++-
>  .../net/ethernet/pensando/ionic/ionic_lif.h   |   1 +
>  4 files changed, 119 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic.h
> b/drivers/net/ethernet/pensando/ionic/ionic.h
> index 9b720187b549..cd08166f73a9 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic.h
> @@ -11,7 +11,7 @@ struct lif;
>  
>  #define DRV_NAME		"ionic"
>  #define DRV_DESCRIPTION		"Pensando Ethernet NIC Driver"
> -#define DRV_VERSION		"0.11.0-k"
> +#define DRV_VERSION		"0.11.0-44-k"
>  
>  #define PCI_VENDOR_ID_PENSANDO			0x1dd8
>  
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
> b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
> index 742d7d47f4d8..e6b579a40b70 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
> @@ -377,6 +377,75 @@ static int ionic_get_coalesce(struct net_device
> *netdev,
>  	return 0;
>  }
>  
> +static int ionic_set_coalesce(struct net_device *netdev,
> +			      struct ethtool_coalesce *coalesce)
> +{
> +	struct lif *lif = netdev_priv(netdev);
> +	struct identity *ident = &lif->ionic->ident;
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	u32 tx_coal, rx_coal;
> +	struct qcq *qcq;
> +	unsigned int i;
> +
> +	if (coalesce->rx_max_coalesced_frames ||
> +	    coalesce->rx_coalesce_usecs_irq ||
> +	    coalesce->rx_max_coalesced_frames_irq ||
> +	    coalesce->tx_max_coalesced_frames ||
> +	    coalesce->tx_coalesce_usecs_irq ||
> +	    coalesce->tx_max_coalesced_frames_irq ||
> +	    coalesce->stats_block_coalesce_usecs ||
> +	    coalesce->use_adaptive_rx_coalesce ||
> +	    coalesce->use_adaptive_tx_coalesce ||
> +	    coalesce->pkt_rate_low ||
> +	    coalesce->rx_coalesce_usecs_low ||
> +	    coalesce->rx_max_coalesced_frames_low ||
> +	    coalesce->tx_coalesce_usecs_low ||
> +	    coalesce->tx_max_coalesced_frames_low ||
> +	    coalesce->pkt_rate_high ||
> +	    coalesce->rx_coalesce_usecs_high ||
> +	    coalesce->rx_max_coalesced_frames_high ||
> +	    coalesce->tx_coalesce_usecs_high ||
> +	    coalesce->tx_max_coalesced_frames_high ||
> +	    coalesce->rate_sample_interval)
> +		return -EINVAL;
> +
> +	if (ident->dev.intr_coal_div == 0)
> +		return -EIO;
> +
> +	/* Convert from usecs to device units */
> +	tx_coal = coalesce->tx_coalesce_usecs *
> +		  le32_to_cpu(ident->dev.intr_coal_mult) /
> +		  le32_to_cpu(ident->dev.intr_coal_div);
> +	rx_coal = coalesce->rx_coalesce_usecs *
> +		  le32_to_cpu(ident->dev.intr_coal_mult) /
> +		  le32_to_cpu(ident->dev.intr_coal_div);
> +
> +	if (tx_coal > INTR_CTRL_COAL_MAX || rx_coal >
> INTR_CTRL_COAL_MAX)
> +		return -ERANGE;
> +
> +	if (coalesce->tx_coalesce_usecs != lif->tx_coalesce_usecs) {
> +		for (i = 0; i < lif->nxqs; i++) {
> +			qcq = lif->txqcqs[i].qcq;
> +			ionic_intr_coal_init(idev->intr_ctrl,
> +					     qcq->intr.index,
> +					     tx_coal);
> +		}
> +		lif->tx_coalesce_usecs = coalesce->tx_coalesce_usecs;
> +	}
> +
> +	if (coalesce->rx_coalesce_usecs != lif->rx_coalesce_usecs) {
> +		for (i = 0; i < lif->nxqs; i++) {
> +			qcq = lif->rxqcqs[i].qcq;
> +			ionic_intr_coal_init(idev->intr_ctrl,
> +					     qcq->intr.index,
> +					     rx_coal);
> +		}
> +		lif->rx_coalesce_usecs = coalesce->rx_coalesce_usecs;
> +	}
> +
> +	return 0;
> +}
> +
>  static void ionic_get_ringparam(struct net_device *netdev,
>  				struct ethtool_ringparam *ring)
>  {
> @@ -562,6 +631,39 @@ static int ionic_set_priv_flags(struct
> net_device *netdev, u32 priv_flags)
>  	return 0;
>  }
>  
> +static int ionic_set_tunable(struct net_device *dev,
> +			     const struct ethtool_tunable *tuna,
> +			     const void *data)
> +{
> +	struct lif *lif = netdev_priv(dev);
> +
> +	switch (tuna->id) {
> +	case ETHTOOL_RX_COPYBREAK:
> +		lif->rx_copybreak = *(u32 *)data;
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
> +static int ionic_get_tunable(struct net_device *netdev,
> +			     const struct ethtool_tunable *tuna, void
> *data)
> +{
> +	struct lif *lif = netdev_priv(netdev);
> +
> +	switch (tuna->id) {
> +	case ETHTOOL_RX_COPYBREAK:
> +		*(u32 *)data = lif->rx_copybreak;
> +		break;
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
>  static int ionic_get_module_info(struct net_device *netdev,
>  				 struct ethtool_modinfo *modinfo)
>  
> @@ -641,6 +743,7 @@ static const struct ethtool_ops ionic_ethtool_ops
> = {
>  	.get_link		= ethtool_op_get_link,
>  	.get_link_ksettings	= ionic_get_link_ksettings,
>  	.get_coalesce		= ionic_get_coalesce,
> +	.set_coalesce		= ionic_set_coalesce,
>  	.get_ringparam		= ionic_get_ringparam,
>  	.set_ringparam		= ionic_set_ringparam,
>  	.get_channels		= ionic_get_channels,
> @@ -655,6 +758,8 @@ static const struct ethtool_ops ionic_ethtool_ops
> = {
>  	.set_rxfh		= ionic_set_rxfh,
>  	.get_priv_flags		= ionic_get_priv_flags,
>  	.set_priv_flags		= ionic_set_priv_flags,
> +	.get_tunable		= ionic_get_tunable,
> +	.set_tunable		= ionic_set_tunable,
>  	.get_module_info	= ionic_get_module_info,
>  	.get_module_eeprom	= ionic_get_module_eeprom,
>  	.get_pauseparam		= ionic_get_pauseparam,
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> index 68a9975e34c6..8473b065763b 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> @@ -744,9 +744,19 @@ static int ionic_change_mtu(struct net_device
> *netdev, int new_mtu)
>  	return err;
>  }
>  
> +static void ionic_tx_timeout_work(struct work_struct *ws)
> +{
> +	struct lif *lif = container_of(ws, struct lif,
> tx_timeout_work);
> +
> +	netdev_info(lif->netdev, "Tx Timeout recovery\n");
> +	ionic_reset_queues(lif);

missing rtnl_lock ?

> +}
> +
>  static void ionic_tx_timeout(struct net_device *netdev)
>  {
> -	netdev_info(netdev, "%s: stubbed\n", __func__);
> +	struct lif *lif = netdev_priv(netdev);
> +
> +	schedule_work(&lif->tx_timeout_work);
>  }

missing cancel work ? be careful when combined with the rtnl_lock though ..

>  
>  static int ionic_vlan_rx_add_vid(struct net_device *netdev, __be16
> proto,
> @@ -2009,6 +2019,7 @@ static int ionic_lif_init(struct lif *lif)
>  
>  	ionic_link_status_check(lif);
>  
> +	INIT_WORK(&lif->tx_timeout_work, ionic_tx_timeout_work);
>  	return 0;
>  
>  err_out_notifyq_deinit:
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> index 0e6908f959f2..76cc519acd5a 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> @@ -180,6 +180,7 @@ struct lif {
>  	unsigned int dbid_count;
>  	struct dentry *dentry;
>  	u32 flags;
> +	struct work_struct tx_timeout_work;
>  };
>  
>  #define lif_to_txqcq(lif, i)	((lif)->txqcqs[i].qcq)

^ permalink raw reply

* Re: [PATCH net 1/1] bnx2x: Disable multi-cos feature.
From: David Miller @ 2019-07-26  0:10 UTC (permalink / raw)
  To: skalluru; +Cc: netdev, manishc, mkalderon
In-Reply-To: <20190724023241.24794-1-skalluru@marvell.com>

From: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Date: Tue, 23 Jul 2019 19:32:41 -0700

> Commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") which enabled multi-cos
> feature after prolonged time in driver added some regression causing
> numerous issues (sudden reboots, tx timeout etc.) reported by customers.
> We plan to backout this commit and submit proper fix once we have root
> cause of issues reported with this feature enabled.
> 
> Fixes: 3968d38917eb ("bnx2x: Fix Multi-Cos.")
> Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
> Signed-off-by: Manish Chopra <manishc@marvell.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH v4 net-next 15/19] ionic: Add netdev-event handling
From: Saeed Mahameed @ 2019-07-25 23:55 UTC (permalink / raw)
  To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-16-snelson@pensando.io>

On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> When the netdev gets a new name from userland, pass that name
> down to the NIC for internal tracking.
> 

Just out of curiosity, why does your NIC's internal device/firmware need
to keep track of the netdev name?



^ permalink raw reply

* Re: [PATCH bpf-next 2/6] bpf: add BPF_MAP_DUMP command to dump more than one entry per call
From: Alexei Starovoitov @ 2019-07-25 23:54 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Song Liu, Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <CABCgpaV7mj5DhFqh44rUNVj5XMAyP+n79LrMobW_=DfvEaS4BQ@mail.gmail.com>

On Thu, Jul 25, 2019 at 04:25:53PM -0700, Brian Vazquez wrote:
> > > > If prev_key is deleted before map_get_next_key(), we get the first key
> > > > again. This is pretty weird.
> > >
> > > Yes, I know. But note that the current scenario happens even for the
> > > old interface (imagine you are walking a map from userspace and, when
> > > you try get_next_key, the prev_key has been removed: you will start
> > > again from the beginning without noticing it).
> > > I tried to send a patch in the past but I was missing some context:
> > > before NULL was used to get the very first_key, the interface relied on
> > > a random (non-existent) key to retrieve the first_key in the map, and
> > > I was told that we still have to support that scenario.
> >
> > BPF_MAP_DUMP is slightly different, as you may return the first key
> > multiple times in the same call. Also, BPF_MAP_DUMP is new, so we
> > don't have to support legacy scenarios.
> >
> > Since BPF_MAP_DUMP keeps a list of elements, it is possible to try
> > to look up previous keys. Would something in this direction work?
> 
> I've been thinking about it and I think first we need a way to detect
> that since key was not present we got the first_key instead:
> 
> - One solution I had in mind was to explicitly ask for the first key
> with map_get_next_key(map, NULL, first_key) and while walking the map
> check that map_get_next_key(map, prev_key, key) doesn't return the
> same key. This could be done using memcmp.
> - Discussing with Stan, he mentioned that another option is to support
> a flag in map_get_next_key to let it know that we want an error
> instead of the first_key.
> 
> After detecting the problem we also need to define what we want to do,
> here are some options:
> 
> a) Return the error to the caller
> b) Try with previous keys if any (which would be limited to the keys
> that we have traversed so far in this dump call)
> c) continue with next entries in the map. array is easy just get the
> next valid key (starting on i+1), but hmap might be difficult since
> starting on the next bucket could potentially skip some keys that were
> concurrently added to the same bucket where key used to be, and
> starting on the same bucket could lead us to return repeated elements.
> 
> Or maybe we could support those 3 cases via flags and let the caller
> decide which one to use?
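
The memcmp-based detection described above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct toy_map,
toy_get_next_key() and toy_next_or_restart() are hypothetical names, the
map is a fixed array rather than the kernel hash map, and nothing here
handles concurrent updates between the two lookups. It only mimics the
legacy contract that an unknown prev_key yields the first key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy map: a fixed array of slots, not the kernel hash map. */
#define TOY_SLOTS 8

struct toy_map {
	unsigned int keys[TOY_SLOTS];
	bool used[TOY_SLOTS];
};

/*
 * Mimics the legacy get_next_key contract: a NULL or unknown prev key
 * yields the first key. Returns 0 on success, -1 when no key follows.
 */
static int toy_get_next_key(const struct toy_map *m,
			    const unsigned int *prev, unsigned int *next)
{
	size_t start = 0;
	size_t i;

	if (prev) {
		for (i = 0; i < TOY_SLOTS; i++) {
			if (m->used[i] && m->keys[i] == *prev) {
				start = i + 1;
				break;
			}
		}
		/* prev not found: start stays 0, so the walk restarts */
	}

	for (i = start; i < TOY_SLOTS; i++) {
		if (m->used[i]) {
			*next = m->keys[i];
			return 0;
		}
	}
	return -1;
}

/*
 * Returns 0 for a normal step, 1 when the walk silently restarted from
 * the first key (prev was deleted), and -1 at the end of the map.
 */
static int toy_next_or_restart(const struct toy_map *m,
			       const unsigned int *prev, unsigned int *next)
{
	unsigned int first;

	if (toy_get_next_key(m, prev, next))
		return -1;
	if (!prev)
		return 0;	/* the caller asked for the first key */
	if (toy_get_next_key(m, NULL, &first))
		return -1;

	return memcmp(next, &first, sizeof(first)) == 0 ? 1 : 0;
}
```

With keys 10, 20, 30 stored in order, stepping from 20 returns 30 as a
normal step; once 20 is deleted, the same call returns 10 and reports
the restart, which is the point where options a), b) or c) would apply.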

this type of indecision is the reason why I wasn't excited about
batch dumping in the first place and gave 'soft yes' when Stan
mentioned it during lsf/mm/bpf uconf.
We probably shouldn't do it.
It feels this map_dump makes api more complex and doesn't really
give much benefit to the user other than large map dump becomes faster.
I think we gotta solve this problem differently.


^ permalink raw reply

* Re: [PATCH v4 net-next 14/19] ionic: Add Tx and Rx handling
From: Saeed Mahameed @ 2019-07-25 23:48 UTC (permalink / raw)
  To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <20190722214023.9513-15-snelson@pensando.io>

On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> Add both the Tx and Rx queue setup and handling.  The related
> stats display comes later.  Instead of using the generic napi
> routines used by the slow-path commands, the Tx and Rx paths
> are simplified and inlined in one file in order to get better
> compiler optimizations.
> 
> Signed-off-by: Shannon Nelson <snelson@pensando.io>
> ---
>  drivers/net/ethernet/pensando/ionic/Makefile  |   2 +-
>  .../net/ethernet/pensando/ionic/ionic_lif.c   | 387 ++++++++
>  .../net/ethernet/pensando/ionic/ionic_lif.h   |  52 ++
>  .../net/ethernet/pensando/ionic/ionic_txrx.c  | 879
> ++++++++++++++++++
>  .../net/ethernet/pensando/ionic/ionic_txrx.h  |  15 +
>  5 files changed, 1334 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_txrx.c
>  create mode 100644 drivers/net/ethernet/pensando/ionic/ionic_txrx.h
> 
> diff --git a/drivers/net/ethernet/pensando/ionic/Makefile
> b/drivers/net/ethernet/pensando/ionic/Makefile
> index 9b19bf57a489..0e2dc53f08d4 100644
> --- a/drivers/net/ethernet/pensando/ionic/Makefile
> +++ b/drivers/net/ethernet/pensando/ionic/Makefile
> @@ -4,4 +4,4 @@
>  obj-$(CONFIG_IONIC) := ionic.o
>  
>  ionic-y := ionic_main.o ionic_bus_pci.o ionic_dev.o ionic_ethtool.o
> \
> -	   ionic_lif.o ionic_rx_filter.o ionic_debugfs.o
> +	   ionic_lif.o ionic_rx_filter.o ionic_txrx.o ionic_debugfs.o
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> index 2bd8ce61c4a0..40d3b1cb362a 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
> @@ -10,6 +10,7 @@
>  #include "ionic.h"
>  #include "ionic_bus.h"
>  #include "ionic_lif.h"
> +#include "ionic_txrx.h"
>  #include "ionic_ethtool.h"
>  #include "ionic_debugfs.h"
>  
> @@ -18,6 +19,13 @@ static int ionic_lif_addr_add(struct lif *lif,
> const u8 *addr);
>  static int ionic_lif_addr_del(struct lif *lif, const u8 *addr);
>  static void ionic_link_status_check(struct lif *lif);
>  
> +static int ionic_lif_stop(struct lif *lif);
> +static int ionic_txrx_alloc(struct lif *lif);
> +static int ionic_txrx_init(struct lif *lif);
> +static void ionic_qcq_free(struct lif *lif, struct qcq *qcq);
> +static int ionic_lif_txqs_init(struct lif *lif);
> +static int ionic_lif_rxqs_init(struct lif *lif);
> +static void ionic_lif_qcq_deinit(struct lif *lif, struct qcq *qcq);
>  static int ionic_set_nic_features(struct lif *lif, netdev_features_t
> features);
>  static int ionic_notifyq_clean(struct lif *lif, int budget);
>  
> @@ -66,12 +74,96 @@ static void ionic_lif_deferred_enqueue(struct
> ionic_deferred *def,
>  	schedule_work(&def->work);
>  }

Bottom up or top down? Your current design is very mixed and I had to
scroll down and up too often just to understand what a function is
doing. I strongly suggest picking and using one approach.

[1] 
https://en.wikipedia.org/wiki/Top-down_and_bottom-up_design#Programming


>  
> +static int ionic_qcq_enable(struct qcq *qcq)
> +{
> +	struct queue *q = &qcq->q;
> +	struct lif *lif = q->lif;
> +	struct device *dev = lif->ionic->dev;
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.q_control = {
> +			.opcode = CMD_OPCODE_Q_CONTROL,
> +			.lif_index = cpu_to_le16(lif->index),
> +			.type = q->type,
> +			.index = cpu_to_le32(q->index),
> +			.oper = IONIC_Q_ENABLE,
> +		},
> +	};
> +
> +	dev_dbg(dev, "q_enable.index %d q_enable.qtype %d\n",
> +		ctx.cmd.q_control.index, ctx.cmd.q_control.type);
> +
> +	if (qcq->flags & QCQ_F_INTR) {
> +		irq_set_affinity_hint(qcq->intr.vector,
> +				      &qcq->intr.affinity_mask);
> +		napi_enable(&qcq->napi);
> +		ionic_intr_clean(idev->intr_ctrl, qcq->intr.index);
> +		ionic_intr_mask(idev->intr_ctrl, qcq->intr.index,
> +				IONIC_INTR_MASK_CLEAR);
> +	}
> +
> +	return ionic_adminq_post_wait(lif, &ctx);
> +}
> +
> +static int ionic_qcq_disable(struct qcq *qcq)
> +{
> +	struct queue *q = &qcq->q;
> +	struct lif *lif = q->lif;
> +	struct device *dev = lif->ionic->dev;
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.q_control = {
> +			.opcode = CMD_OPCODE_Q_CONTROL,
> +			.lif_index = cpu_to_le16(lif->index),
> +			.type = q->type,
> +			.index = cpu_to_le32(q->index),
> +			.oper = IONIC_Q_DISABLE,
> +		},
> +	};
> +
> +	dev_dbg(dev, "q_disable.index %d q_disable.qtype %d\n",
> +		ctx.cmd.q_control.index, ctx.cmd.q_control.type);
> +
> +	if (qcq->flags & QCQ_F_INTR) {
> +		ionic_intr_mask(idev->intr_ctrl, qcq->intr.index,
> +				IONIC_INTR_MASK_SET);
> +		synchronize_irq(qcq->intr.vector);
> +		irq_set_affinity_hint(qcq->intr.vector, NULL);
> +		napi_disable(&qcq->napi);
> +	}
> +
> +	return ionic_adminq_post_wait(lif, &ctx);
> +}
> +
>  int ionic_open(struct net_device *netdev)
>  {
>  	struct lif *lif = netdev_priv(netdev);
> +	unsigned int i;
> +	int err;
>  
>  	netif_carrier_off(netdev);
>  
> +	err = ionic_txrx_alloc(lif);
> +	if (err)
> +		return err;
> +
> +	err = ionic_txrx_init(lif);
> +	if (err)
> +		goto err_out;
> +
> +	for (i = 0; i < lif->nxqs; i++) {
> +		err = ionic_qcq_enable(lif->txqcqs[i].qcq);
> +		if (err)
> +			goto err_out;
> +
> +		ionic_rx_fill(&lif->rxqcqs[i].qcq->q);
> +		err = ionic_qcq_enable(lif->rxqcqs[i].qcq);
> +		if (err)
> +			goto err_out;
> +	}
> +
>  	set_bit(LIF_UP, lif->state);
>  
>  	ionic_link_status_check(lif);
> @@ -79,11 +171,16 @@ int ionic_open(struct net_device *netdev)
>  		netif_tx_wake_all_queues(netdev);
>  
>  	return 0;
> +
> +err_out:
> +	ionic_lif_stop(lif);

This is dangerous. It may be ok now to stop a partially open driver,
but future development might break this assumption. To avoid this I
strongly recommend a symmetrical cleanup flow, or documentation that
says ionic_lif_stop might be called with partially open resources and
that proper checks must be done.
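
A symmetrical cleanup flow of the kind suggested here could look roughly
like the userspace sketch below. All names (struct toy_dev, toy_open(),
toy_close() and the stage helpers) are hypothetical and are not the ionic
driver's functions; the fail_at field exists only so each error path can
be exercised. Each stage has a matching undo function, and toy_open()
unwinds exactly the stages that already succeeded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver's per-interface state. */
struct toy_dev {
	void *queues;	/* owned between toy_alloc_queues() and toy_free_queues() */
	int inited;
	int enabled;
	int fail_at;	/* test hook: 1 = alloc, 2 = init, 3 = enable, 0 = never */
};

static int toy_alloc_queues(struct toy_dev *d)
{
	if (d->fail_at == 1)
		return -1;
	d->queues = malloc(64);
	return d->queues ? 0 : -1;
}

static void toy_free_queues(struct toy_dev *d)
{
	free(d->queues);
	d->queues = NULL;
}

static int toy_init_queues(struct toy_dev *d)
{
	if (d->fail_at == 2)
		return -1;
	d->inited = 1;
	return 0;
}

static void toy_deinit_queues(struct toy_dev *d)
{
	d->inited = 0;
}

static int toy_enable_queues(struct toy_dev *d)
{
	if (d->fail_at == 3)
		return -1;
	d->enabled = 1;
	return 0;
}

static void toy_disable_queues(struct toy_dev *d)
{
	d->enabled = 0;
}

/* Each failure unwinds only the stages that completed, in reverse order. */
static int toy_open(struct toy_dev *d)
{
	int err;

	err = toy_alloc_queues(d);
	if (err)
		return err;

	err = toy_init_queues(d);
	if (err)
		goto err_free;

	err = toy_enable_queues(d);
	if (err)
		goto err_deinit;

	return 0;

err_deinit:
	toy_deinit_queues(d);
err_free:
	toy_free_queues(d);
	return err;
}

/* Mirror image of toy_open(); only valid on a fully opened device. */
static void toy_close(struct toy_dev *d)
{
	toy_disable_queues(d);
	toy_deinit_queues(d);
	toy_free_queues(d);
}
```

The design choice is that toy_close() never has to guess how far open
got: a failed toy_open() leaves the device in its pre-open state.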

> +	return err;
>  }
>  
>  static int ionic_lif_stop(struct lif *lif)
>  {
>  	struct net_device *ndev = lif->netdev;
> +	unsigned int i;
>  	int err = 0;
>  
>  	if (!test_bit(LIF_UP, lif->state)) {
> @@ -100,6 +197,21 @@ static int ionic_lif_stop(struct lif *lif)
>  	netif_tx_disable(ndev);
>  	synchronize_rcu();
>  
> +	for (i = 0; i < lif->nxqs; i++) {
> +		(void)ionic_qcq_disable(lif->txqcqs[i].qcq);
> +		ionic_tx_flush(&lif->txqcqs[i].qcq->cq);
> +		ionic_lif_qcq_deinit(lif, lif->txqcqs[i].qcq);
> +		ionic_qcq_free(lif, lif->txqcqs[i].qcq);
> +		lif->txqcqs[i].qcq = NULL;
> +
> +		(void)ionic_qcq_disable(lif->rxqcqs[i].qcq);
> +		ionic_rx_flush(&lif->rxqcqs[i].qcq->cq);
> +		ionic_lif_qcq_deinit(lif, lif->rxqcqs[i].qcq);
> +		ionic_rx_empty(&lif->rxqcqs[i].qcq->q);
> +		ionic_qcq_free(lif, lif->rxqcqs[i].qcq);
> +		lif->rxqcqs[i].qcq = NULL;
> +	}
> +

Why don't you break this down into stages/functions (disable/uninit/free)
as you did on open (alloc/init/enable)? It will be easier to review and
to spot inconsistencies.
 
>  	return err;
>  }
>  
> @@ -694,6 +806,7 @@ static int ionic_vlan_rx_kill_vid(struct
> net_device *netdev, __be16 proto,
>  static const struct net_device_ops ionic_netdev_ops = {
>  	.ndo_open               = ionic_open,
>  	.ndo_stop               = ionic_stop,
> +	.ndo_start_xmit		= ionic_start_xmit,
>  	.ndo_get_stats64	= ionic_get_stats64,
>  	.ndo_set_rx_mode	= ionic_set_rx_mode,
>  	.ndo_set_features	= ionic_set_features,
> @@ -909,10 +1022,83 @@ static void ionic_qcq_free(struct lif *lif,
> struct qcq *qcq)
>  	devm_kfree(dev, qcq);
>  }
>  
> +static int ionic_txrx_alloc(struct lif *lif)
> +{
> +	unsigned int flags;
> +	unsigned int i;
> +	int err = 0;
> +
> +	flags = QCQ_F_TX_STATS | QCQ_F_SG;
> +	for (i = 0; i < lif->nxqs; i++) {
> +		err = ionic_qcq_alloc(lif, IONIC_QTYPE_TXQ, i, "tx",
> flags,
> +				      lif->ntxq_descs,
> +				      sizeof(struct txq_desc),
> +				      sizeof(struct txq_comp),
> +				      sizeof(struct txq_sg_desc),
> +				      lif->kern_pid, &lif-
> >txqcqs[i].qcq);
> +		if (err)
> +			goto err_out_free_txqcqs;
> +
> +		lif->txqcqs[i].qcq->stats = lif->txqcqs[i].stats;
> +	}
> +
> +	flags = QCQ_F_RX_STATS | QCQ_F_INTR;
> +	for (i = 0; i < lif->nxqs; i++) {
> +		err = ionic_qcq_alloc(lif, IONIC_QTYPE_RXQ, i, "rx",
> flags,
> +				      lif->nrxq_descs,
> +				      sizeof(struct rxq_desc),
> +				      sizeof(struct rxq_comp),
> +				      0, lif->kern_pid, &lif-
> >rxqcqs[i].qcq);
> +		if (err)
> +			goto err_out_free_rxqcqs;
> +
> +		lif->rxqcqs[i].qcq->stats = lif->rxqcqs[i].stats;
> +
> +		ionic_link_qcq_interrupts(lif->rxqcqs[i].qcq,
> +					  lif->txqcqs[i].qcq);
> +	}
> +
> +	return 0;
> +
> +err_out_free_rxqcqs:
> +	for (i = 0; i < lif->nxqs; i++)
> +		ionic_qcq_free(lif, lif->rxqcqs[i].qcq);
> +err_out_free_txqcqs:
> +	for (i = 0; i < lif->nxqs; i++)
> +		ionic_qcq_free(lif, lif->txqcqs[i].qcq);
> +
> +	return err;
> +}
> +
> +static int ionic_txrx_init(struct lif *lif)
> +{
> +	int err;
> +
> +	err = ionic_lif_txqs_init(lif);
> +	if (err)
> +		return err;
> +
> +	err = ionic_lif_rxqs_init(lif);
> +	if (err)
> +		goto err_out;
> +
> +	ionic_set_rx_mode(lif->netdev);
> +
> +	return 0;
> +
> +err_out:
> +	ionic_stop(lif->netdev);

I would move error handling to the caller, to keep the code flow and
error path symmetrical with the good path.

> +
> +	return err;
> +}
> +
>  static int ionic_qcqs_alloc(struct lif *lif)
>  {
> +	struct device *dev = lif->ionic->dev;
> +	unsigned int q_list_size;
>  	unsigned int flags;
>  	int err;
> +	int i;
>  
>  	flags = QCQ_F_INTR;
>  	err = ionic_qcq_alloc(lif, IONIC_QTYPE_ADMINQ, 0, "admin",
> flags,
> @@ -937,8 +1123,47 @@ static int ionic_qcqs_alloc(struct lif *lif)
>  		ionic_link_qcq_interrupts(lif->adminqcq, lif-
> >notifyqcq);
>  	}
>  
> +	q_list_size = sizeof(*lif->txqcqs) * lif->nxqs;
> +	err = -ENOMEM;
> +	lif->txqcqs = devm_kzalloc(dev, q_list_size, GFP_KERNEL);
> +	if (!lif->txqcqs)
> +		goto err_out_free_notifyqcq;
> +	for (i = 0; i < lif->nxqs; i++) {
> +		lif->txqcqs[i].stats = devm_kzalloc(dev, sizeof(struct
> q_stats),
> +						    GFP_KERNEL);

why not inline stats into the txqcq struct and avoid this kzalloc?
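
As a rough illustration of the inlining idea, the userspace sketch below
embeds the stats in the per-queue slot so that one zeroing allocation
covers the array and the stats together. The type and function names
(toy_q_stats, toy_qcq, toy_qcqst, toy_qcqst_alloc, toy_qcq_attach) are
hypothetical, and calloc() stands in for devm_kzalloc(); the real
structures in the patch may need a different layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-queue counters. */
struct toy_q_stats {
	unsigned long pkts;
	unsigned long bytes;
};

/* Hypothetical queue object that only keeps a pointer to its stats. */
struct toy_qcq {
	struct toy_q_stats *stats;
};

/* Per-queue slot with the stats embedded instead of separately allocated. */
struct toy_qcqst {
	struct toy_qcq *qcq;
	struct toy_q_stats stats;
};

/* One zeroing allocation for all slots; no per-slot stats allocation. */
static struct toy_qcqst *toy_qcqst_alloc(size_t nxqs)
{
	return calloc(nxqs, sizeof(struct toy_qcqst));
}

/* Point the queue at the stats that live inside its slot. */
static void toy_qcq_attach(struct toy_qcqst *slot, struct toy_qcq *qcq)
{
	slot->qcq = qcq;
	qcq->stats = &slot->stats;
}
```

With this layout the error path has one allocation to undo instead of a
loop over per-queue stats, and the stats survive queue teardown as long
as the slot array does.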

> +		if (!lif->txqcqs[i].stats)
> +			goto err_out_free_tx_stats;
> +	}
> +
> +	lif->rxqcqs = devm_kzalloc(dev, q_list_size, GFP_KERNEL);
> +	if (!lif->rxqcqs)
> +		goto err_out_free_tx_stats;
> +	for (i = 0; i < lif->nxqs; i++) {
> +		lif->rxqcqs[i].stats = devm_kzalloc(dev, sizeof(struct
> q_stats),
> +						    GFP_KERNEL);
> +		if (!lif->rxqcqs[i].stats)
> +			goto err_out_free_rx_stats;

same 

> +	}
> +
>  	return 0;
>  
> +err_out_free_rx_stats:
> +	for (i = 0; i < lif->nxqs; i++)
> +		if (lif->rxqcqs[i].stats)
> +			devm_kfree(dev, lif->rxqcqs[i].stats);
> +	devm_kfree(dev, lif->rxqcqs);
> +	lif->rxqcqs = NULL;
> +err_out_free_tx_stats:
> +	for (i = 0; i < lif->nxqs; i++)
> +		if (lif->txqcqs[i].stats)
> +			devm_kfree(dev, lif->txqcqs[i].stats);
> +	devm_kfree(dev, lif->txqcqs);
> +	lif->txqcqs = NULL;
> +err_out_free_notifyqcq:
> +	if (lif->notifyqcq) {
> +		ionic_qcq_free(lif, lif->notifyqcq);
> +		lif->notifyqcq = NULL;
> +	}
>  err_out_free_adminqcq:
>  	ionic_qcq_free(lif, lif->adminqcq);
>  	lif->adminqcq = NULL;
> @@ -948,6 +1173,9 @@ static int ionic_qcqs_alloc(struct lif *lif)
>  
>  static void ionic_qcqs_free(struct lif *lif)
>  {
> +	struct device *dev = lif->ionic->dev;
> +	unsigned int i;
> +
>  	if (lif->notifyqcq) {
>  		ionic_qcq_free(lif, lif->notifyqcq);
>  		lif->notifyqcq = NULL;
> @@ -957,6 +1185,20 @@ static void ionic_qcqs_free(struct lif *lif)
>  		ionic_qcq_free(lif, lif->adminqcq);
>  		lif->adminqcq = NULL;
>  	}
> +
> +	for (i = 0; i < lif->nxqs; i++)
> +		if (lif->rxqcqs[i].stats)
> +			devm_kfree(dev, lif->rxqcqs[i].stats);
> +
> +	devm_kfree(dev, lif->rxqcqs);
> +	lif->rxqcqs = NULL;
> +
> +	for (i = 0; i < lif->nxqs; i++)
> +		if (lif->txqcqs[i].stats)
> +			devm_kfree(dev, lif->txqcqs[i].stats);
> +
> +	devm_kfree(dev, lif->txqcqs);
> +	lif->txqcqs = NULL;
>  }
>  
>  static struct lif *ionic_lif_alloc(struct ionic *ionic, unsigned int
> index)
> @@ -992,6 +1234,8 @@ static struct lif *ionic_lif_alloc(struct ionic
> *ionic, unsigned int index)
>  
>  	lif->ionic = ionic;
>  	lif->index = index;
> +	lif->ntxq_descs = IONIC_DEF_TXRX_DESC;
> +	lif->nrxq_descs = IONIC_DEF_TXRX_DESC;
>  
>  	snprintf(lif->name, sizeof(lif->name), "lif%u", index);
>  
> @@ -1432,6 +1676,147 @@ static int ionic_init_nic_features(struct lif
> *lif)
>  	return 0;
>  }
>  
> +static int ionic_lif_txq_init(struct lif *lif, struct qcq *qcq)
> +{
> +	struct device *dev = lif->ionic->dev;
> +	struct queue *q = &qcq->q;
> +	struct cq *cq = &qcq->cq;
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.q_init = {
> +			.opcode = CMD_OPCODE_Q_INIT,
> +			.lif_index = cpu_to_le16(lif->index),
> +			.type = q->type,
> +			.index = cpu_to_le32(q->index),
> +			.flags = cpu_to_le16(IONIC_QINIT_F_IRQ |
> +					     IONIC_QINIT_F_SG),
> +			.intr_index = cpu_to_le16(lif->rxqcqs[q-
> >index].qcq->intr.index),
> +			.pid = cpu_to_le16(q->pid),
> +			.ring_size = ilog2(q->num_descs),
> +			.ring_base = cpu_to_le64(q->base_pa),
> +			.cq_ring_base = cpu_to_le64(cq->base_pa),
> +			.sg_ring_base = cpu_to_le64(q->sg_base_pa),
> +		},
> +	};
> +	int err;
> +
> +	dev_dbg(dev, "txq_init.pid %d\n", ctx.cmd.q_init.pid);
> +	dev_dbg(dev, "txq_init.index %d\n", ctx.cmd.q_init.index);
> +	dev_dbg(dev, "txq_init.ring_base 0x%llx\n",
> ctx.cmd.q_init.ring_base);
> +	dev_dbg(dev, "txq_init.ring_size %d\n",
> ctx.cmd.q_init.ring_size);
> +
> +	err = ionic_adminq_post_wait(lif, &ctx);
> +	if (err)
> +		return err;
> +
> +	q->hw_type = ctx.comp.q_init.hw_type;
> +	q->hw_index = le32_to_cpu(ctx.comp.q_init.hw_index);
> +	q->dbval = IONIC_DBELL_QID(q->hw_index);
> +
> +	dev_dbg(dev, "txq->hw_type %d\n", q->hw_type);
> +	dev_dbg(dev, "txq->hw_index %d\n", q->hw_index);
> +
> +	qcq->flags |= QCQ_F_INITED;
> +
> +	ionic_debugfs_add_qcq(lif, qcq);
> +
> +	return 0;
> +}
> +
> +static int ionic_lif_txqs_init(struct lif *lif)
> +{
> +	unsigned int i;
> +	int err;
> +
> +	for (i = 0; i < lif->nxqs; i++) {
> +		err = ionic_lif_txq_init(lif, lif->txqcqs[i].qcq);
> +		if (err)
> +			goto err_out;
> +	}
> +
> +	return 0;
> +
> +err_out:
> +	for (; i > 0; i--)
> +		ionic_lif_qcq_deinit(lif, lif->txqcqs[i-1].qcq);
> +
> +	return err;
> +}
> +
> +static int ionic_lif_rxq_init(struct lif *lif, struct qcq *qcq)
> +{
> +	struct device *dev = lif->ionic->dev;
> +	struct queue *q = &qcq->q;
> +	struct cq *cq = &qcq->cq;
> +	struct ionic_admin_ctx ctx = {
> +		.work = COMPLETION_INITIALIZER_ONSTACK(ctx.work),
> +		.cmd.q_init = {
> +			.opcode = CMD_OPCODE_Q_INIT,
> +			.lif_index = cpu_to_le16(lif->index),
> +			.type = q->type,
> +			.index = cpu_to_le32(q->index),
> +			.flags = cpu_to_le16(IONIC_QINIT_F_IRQ),
> +			.intr_index = cpu_to_le16(cq->bound_intr->index),
> +			.pid = cpu_to_le16(q->pid),
> +			.ring_size = ilog2(q->num_descs),
> +			.ring_base = cpu_to_le64(q->base_pa),
> +			.cq_ring_base = cpu_to_le64(cq->base_pa),
> +		},
> +	};
> +	int err;
> +
> +	dev_dbg(dev, "rxq_init.pid %d\n", ctx.cmd.q_init.pid);
> +	dev_dbg(dev, "rxq_init.index %d\n", ctx.cmd.q_init.index);
> +	dev_dbg(dev, "rxq_init.ring_base 0x%llx\n",
> ctx.cmd.q_init.ring_base);
> +	dev_dbg(dev, "rxq_init.ring_size %d\n",
> ctx.cmd.q_init.ring_size);
> +
> +	err = ionic_adminq_post_wait(lif, &ctx);
> +	if (err)
> +		return err;
> +
> +	q->hw_type = ctx.comp.q_init.hw_type;
> +	q->hw_index = le32_to_cpu(ctx.comp.q_init.hw_index);
> +	q->dbval = IONIC_DBELL_QID(q->hw_index);
> +
> +	dev_dbg(dev, "rxq->hw_type %d\n", q->hw_type);
> +	dev_dbg(dev, "rxq->hw_index %d\n", q->hw_index);
> +
> +	netif_napi_add(lif->netdev, &qcq->napi, ionic_rx_napi,
> +		       NAPI_POLL_WEIGHT);
> +
> +	err = ionic_request_irq(lif, qcq);

Again, your init/deinit code paths for these objects are very asymmetrical.
I was looking for where you call free_irq, and eventually found it in
ionic_lif_qcq_deinit, which is shared between rx/tx queues, but it is
very hard to follow who sets the QCQ_F_INTR flag so that
ionic_lif_qcq_deinit would free the irq, so I gave up!

I strongly suggest that you make the code as symmetrical as possible
by following two simple rules:
1) each function "do_foo" must have a reverse function "undo_foo".
2) each function "do_foo" should handle its own error path rollback,
and NOT by blindly calling "undo_foo".

This is a very well defined code structure and useful technique to help
reviewers and developers keep track of what is happening, and most
importantly a very good practice that reduces the number of current
bugs and future bugs by an order of magnitude!

example:

open()
 alloc_txrx_queues()
 init_txrx_queues()
 request_irqs()
 enable_txrx_queues()

close()
 disable_txrx_queues()
 free_irqs()
 deinit_txrx_queues()
 dealloc_txrxqs()

This will remove the need for some flags and unnecessary state tracking
of whether a resource is open/closed/inited: if each function is
symmetrical and handles its own rollback, then at any point in the code
you know what to do without any extra if-statement branches.
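
To make the two rules concrete, here is a minimal userspace sketch of the
open()/close() outline above. It is only an illustration under
assumptions: struct toy_dev, the fail_step knob and the toy_open()/
toy_close() entry points are made up for this example and are not ionic
code; the step names loosely mirror the outline.

```c
#include <stdbool.h>

/* Toy device state; every field here is hypothetical. */
struct toy_dev {
	bool allocated, inited, irqs, enabled;
	int fail_step;	/* 1..4 makes that setup step fail, 0 means none */
};

/* Each do_foo has exactly one undo_foo. */
static int alloc_txrx_queues(struct toy_dev *d)
{
	if (d->fail_step == 1)
		return -1;
	d->allocated = true;
	return 0;
}
static void dealloc_txrx_queues(struct toy_dev *d) { d->allocated = false; }

static int init_txrx_queues(struct toy_dev *d)
{
	if (d->fail_step == 2)
		return -1;
	d->inited = true;
	return 0;
}
static void deinit_txrx_queues(struct toy_dev *d) { d->inited = false; }

static int request_irqs(struct toy_dev *d)
{
	if (d->fail_step == 3)
		return -1;
	d->irqs = true;
	return 0;
}
static void free_irqs(struct toy_dev *d) { d->irqs = false; }

static int enable_txrx_queues(struct toy_dev *d)
{
	if (d->fail_step == 4)
		return -1;
	d->enabled = true;
	return 0;
}
static void disable_txrx_queues(struct toy_dev *d) { d->enabled = false; }

/* open() rolls back only the steps that already succeeded. */
int toy_open(struct toy_dev *d)
{
	int err;

	err = alloc_txrx_queues(d);
	if (err)
		return err;
	err = init_txrx_queues(d);
	if (err)
		goto err_dealloc;
	err = request_irqs(d);
	if (err)
		goto err_deinit;
	err = enable_txrx_queues(d);
	if (err)
		goto err_free_irqs;
	return 0;

err_free_irqs:
	free_irqs(d);
err_deinit:
	deinit_txrx_queues(d);
err_dealloc:
	dealloc_txrx_queues(d);
	return err;
}

/* close() is open() in reverse; no flags are needed to decide what to undo. */
void toy_close(struct toy_dev *d)
{
	disable_txrx_queues(d);
	free_irqs(d);
	deinit_txrx_queues(d);
	dealloc_txrx_queues(d);
}
```

Note that the error labels in toy_open() never test any state: the label
you jump to already encodes how far setup got.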

> +	if (err) {
> +		netif_napi_del(&qcq->napi);
> +		return err;
> +	}
> +
> +	qcq->flags |= QCQ_F_INITED;
> +
> +	ionic_debugfs_add_qcq(lif, qcq);
> +
> +	return 0;
> +}
> +
> +static int ionic_lif_rxqs_init(struct lif *lif)
> +{
> +	unsigned int i;
> +	int err;
> +
> +	for (i = 0; i < lif->nxqs; i++) {
> +		err = ionic_lif_rxq_init(lif, lif->rxqcqs[i].qcq);
> +		if (err)
> +			goto err_out;
> +	}
> +
> +	return 0;
> +
> +err_out:
> +	for (; i > 0; i--)
> +		ionic_lif_qcq_deinit(lif, lif->rxqcqs[i-1].qcq);
> +
> +	return err;
> +}
> +
>  static int ionic_station_set(struct lif *lif)
>  {
>  	struct net_device *netdev = lif->netdev;
> @@ -1531,6 +1916,8 @@ static int ionic_lif_init(struct lif *lif)
>  	if (err)
>  		goto err_out_notifyq_deinit;
>  
> +	lif->rx_copybreak = IONIC_RX_COPYBREAK_DEFAULT;
> +
>  	set_bit(LIF_INITED, lif->state);
>  
>  	ionic_link_status_check(lif);
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> index d8589a306aa5..88d5bf8f58a1 100644
> --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.h
> @@ -14,20 +14,38 @@
>  #define MAX_NUM_NAPI_CNTR	(NAPI_POLL_WEIGHT + 1)
>  #define GET_SG_CNTR_IDX(num_sg_elems)	(num_sg_elems)
>  #define MAX_NUM_SG_CNTR		(IONIC_TX_MAX_SG_ELEMS + 1)
> +#define IONIC_RX_COPYBREAK_DEFAULT		256
>  
>  struct tx_stats {
> +	u64 dma_map_err;
>  	u64 pkts;
>  	u64 bytes;
> +	u64 clean;
> +	u64 linearize;
> +	u64 no_csum;
> +	u64 csum;
> +	u64 crc32_csum;
> +	u64 tso;
> +	u64 frags;
> +	u64 sg_cntr[MAX_NUM_SG_CNTR];
>  };
>  
>  struct rx_stats {
> +	u64 dma_map_err;
> +	u64 alloc_err;
>  	u64 pkts;
>  	u64 bytes;
> +	u64 csum_none;
> +	u64 csum_complete;
> +	u64 csum_error;
> +	u64 buffers_posted;
>  };
>  
>  #define QCQ_F_INITED		BIT(0)
>  #define QCQ_F_SG		BIT(1)
>  #define QCQ_F_INTR		BIT(2)
> +#define QCQ_F_TX_STATS		BIT(3)
> +#define QCQ_F_RX_STATS		BIT(4)
>  #define QCQ_F_NOTIFYQ		BIT(5)
>  
>  struct napi_stats {
> @@ -56,7 +74,14 @@ struct qcq {
>  	struct dentry *dentry;
>  };
>  
> +struct qcqst {
> +	struct qcq *qcq;
> +	struct q_stats *stats;
> +};
> +
>  #define q_to_qcq(q)		container_of(q, struct qcq, q)
> +#define q_to_tx_stats(q)	(&q_to_qcq(q)->stats->tx)
> +#define q_to_rx_stats(q)	(&q_to_qcq(q)->stats->rx)
>  #define napi_to_qcq(napi)	container_of(napi, struct qcq, napi)
>  #define napi_to_cq(napi)	(&napi_to_qcq(napi)->cq)
>  
> @@ -108,11 +133,14 @@ struct lif {
>  	spinlock_t adminq_lock;		/* lock for AdminQ operations
> */
>  	struct qcq *adminqcq;
>  	struct qcq *notifyqcq;
> +	struct qcqst *txqcqs;
> +	struct qcqst *rxqcqs;
>  	u64 last_eid;
>  	unsigned int neqs;
>  	unsigned int nxqs;
>  	unsigned int ntxq_descs;
>  	unsigned int nrxq_descs;
> +	u32 rx_copybreak;
>  	unsigned int rx_mode;
>  	u64 hw_features;
>  	bool mc_overflow;
> @@ -134,6 +162,11 @@ struct lif {
>  	u32 flags;
>  };
>  
> +#define lif_to_txqcq(lif, i)	((lif)->txqcqs[i].qcq)
> +#define lif_to_rxqcq(lif, i)	((lif)->rxqcqs[i].qcq)
> +#define lif_to_txq(lif, i)	(&lif_to_txqcq((lif), i)->q)
> +#define lif_to_rxq(lif, i)	(&lif_to_txqcq((lif), i)->q)
> +
>  static inline bool ionic_is_pf(struct ionic *ionic)
>  {
>  	return ionic->pdev &&
> @@ -173,6 +206,22 @@ int ionic_open(struct net_device *netdev);
>  int ionic_stop(struct net_device *netdev);
>  int ionic_reset_queues(struct lif *lif);
>  
> +static inline void debug_stats_txq_post(struct qcq *qcq,
> +					struct txq_desc *desc, bool
> dbell)
> +{
> +	u8 num_sg_elems = ((le64_to_cpu(desc->cmd) >>
> IONIC_TXQ_DESC_NSGE_SHIFT)
> +						&
> IONIC_TXQ_DESC_NSGE_MASK);
> +	u8 sg_cntr_idx;
> +
> +	qcq->q.dbell_count += dbell;
> +
> +	sg_cntr_idx = GET_SG_CNTR_IDX(num_sg_elems);
> +	if (sg_cntr_idx > (MAX_NUM_SG_CNTR - 1))
> +		sg_cntr_idx = MAX_NUM_SG_CNTR - 1;
> +
> +	qcq->stats->tx.sg_cntr[sg_cntr_idx]++;
> +}
> +
>  static inline void debug_stats_napi_poll(struct qcq *qcq,
>  					 unsigned int work_done)
>  {
> @@ -188,7 +237,10 @@ static inline void debug_stats_napi_poll(struct
> qcq *qcq,
>  }
>  
>  #define DEBUG_STATS_CQE_CNT(cq)		((cq)->compl_count++)
> +#define DEBUG_STATS_RX_BUFF_CNT(qcq)	((qcq)->stats->rx.buffers_posted++)
>  #define DEBUG_STATS_INTR_REARM(intr)	((intr)->rearm_count++)
> +#define DEBUG_STATS_TXQ_POST(qcq, txdesc, dbell) \
> +	debug_stats_txq_post(qcq, txdesc, dbell)
>  #define DEBUG_STATS_NAPI_POLL(qcq, work_done) \
>  	debug_stats_napi_poll(qcq, work_done)
>  
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> new file mode 100644
> index 000000000000..7787e7dd5504
> --- /dev/null
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
> @@ -0,0 +1,879 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright(c) 2017 - 2019 Pensando Systems, Inc */
> +
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include <linux/if_vlan.h>
> +#include <net/ip6_checksum.h>
> +
> +#include "ionic.h"
> +#include "ionic_lif.h"
> +#include "ionic_txrx.h"
> +
> +static void ionic_tx_clean(struct queue *q, struct desc_info
> *desc_info,
> +			   struct cq_info *cq_info, void *cb_arg);
> +static void ionic_rx_clean(struct queue *q, struct desc_info
> *desc_info,
> +			   struct cq_info *cq_info, void *cb_arg);
> +
> +static inline void ionic_txq_post(struct queue *q, bool ring_dbell,
> +				  desc_cb cb_func, void *cb_arg)
> +{
> +	DEBUG_STATS_TXQ_POST(q_to_qcq(q), q->head->desc, ring_dbell);
> +
> +	ionic_q_post(q, ring_dbell, cb_func, cb_arg);
> +}
> +
> +static inline void ionic_rxq_post(struct queue *q, bool ring_dbell,
> +				  desc_cb cb_func, void *cb_arg)
> +{
> +	ionic_q_post(q, ring_dbell, cb_func, cb_arg);
> +
> +	DEBUG_STATS_RX_BUFF_CNT(q_to_qcq(q));
> +}
> +
> +static void ionic_rx_recycle(struct queue *q, struct desc_info
> *desc_info,
> +			     struct sk_buff *skb)
> +{
> +	struct rxq_desc *old = desc_info->desc;
> +	struct rxq_desc *new = q->head->desc;
> +
> +	new->addr = old->addr;
> +	new->len = old->len;
> +
> +	ionic_rxq_post(q, true, ionic_rx_clean, skb);
> +}
> +
> +static bool ionic_rx_copybreak(struct queue *q, struct desc_info
> *desc_info,
> +			       struct cq_info *cq_info, struct sk_buff
> **skb)
> +{
> +	struct net_device *netdev = q->lif->netdev;
> +	struct device *dev = q->lif->ionic->dev;
> +	struct rxq_desc *desc = desc_info->desc;
> +	struct rxq_comp *comp = cq_info->cq_desc;
> +	struct sk_buff *new_skb;
> +	u16 clen, dlen;
> +
> +	clen = le16_to_cpu(comp->len);
> +	dlen = le16_to_cpu(desc->len);
> +	if (clen > q->lif->rx_copybreak) {
> +		dma_unmap_single(dev, (dma_addr_t)le64_to_cpu(desc->addr),
> +				 dlen, DMA_FROM_DEVICE);
> +		return false;
> +	}
> +
> +	new_skb = netdev_alloc_skb_ip_align(netdev, clen);
> +	if (!new_skb) {
> +		dma_unmap_single(dev, (dma_addr_t)le64_to_cpu(desc->addr),
> +				 dlen, DMA_FROM_DEVICE);
> +		return false;
> +	}
> +
> +	dma_sync_single_for_cpu(dev, (dma_addr_t)le64_to_cpu(desc->addr),
> +				clen, DMA_FROM_DEVICE);
> +
> +	memcpy(new_skb->data, (*skb)->data, clen);
> +
> +	ionic_rx_recycle(q, desc_info, *skb);
> +	*skb = new_skb;
> +
> +	return true;
> +}
> +
> +static void ionic_rx_clean(struct queue *q, struct desc_info
> *desc_info,
> +			   struct cq_info *cq_info, void *cb_arg)
> +{
> +	struct rxq_comp *comp = cq_info->cq_desc;
> +	struct sk_buff *skb = cb_arg;
> +	struct qcq *qcq = q_to_qcq(q);
> +	struct net_device *netdev;
> +	struct rx_stats *stats;
> +
> +	stats = q_to_rx_stats(q);
> +	netdev = q->lif->netdev;
> +
> +	if (comp->status) {
> +		ionic_rx_recycle(q, desc_info, skb);
> +		return;
> +	}
> +
> +	if (unlikely(test_bit(LIF_QUEUE_RESET, q->lif->state))) {
> +		/* no packet processing while resetting */
> +		ionic_rx_recycle(q, desc_info, skb);
> +		return;
> +	}
> +
> +	stats->pkts++;
> +	stats->bytes += le16_to_cpu(comp->len);
> +
> +	ionic_rx_copybreak(q, desc_info, cq_info, &skb);
> +
> +	skb_put(skb, le16_to_cpu(comp->len));
> +	skb->protocol = eth_type_trans(skb, netdev);
> +
> +	skb_record_rx_queue(skb, q->index);
> +
> +	if (netdev->features & NETIF_F_RXHASH) {
> +		switch (comp->pkt_type_color &
> IONIC_RXQ_COMP_PKT_TYPE_MASK) {
> +		case PKT_TYPE_IPV4:
> +		case PKT_TYPE_IPV6:
> +			skb_set_hash(skb, le32_to_cpu(comp->rss_hash),
> +				     PKT_HASH_TYPE_L3);
> +			break;
> +		case PKT_TYPE_IPV4_TCP:
> +		case PKT_TYPE_IPV6_TCP:
> +		case PKT_TYPE_IPV4_UDP:
> +		case PKT_TYPE_IPV6_UDP:
> +			skb_set_hash(skb, le32_to_cpu(comp->rss_hash),
> +				     PKT_HASH_TYPE_L4);
> +			break;
> +		}
> +	}
> +
> +	if (netdev->features & NETIF_F_RXCSUM) {
> +		if (comp->csum_flags & IONIC_RXQ_COMP_CSUM_F_CALC) {
> +			skb->ip_summed = CHECKSUM_COMPLETE;
> +			skb->csum = (__wsum)le16_to_cpu(comp->csum);
> +			stats->csum_complete++;
> +		}
> +	} else {
> +		stats->csum_none++;
> +	}
> +
> +	if ((comp->csum_flags & IONIC_RXQ_COMP_CSUM_F_TCP_BAD) ||
> +	    (comp->csum_flags & IONIC_RXQ_COMP_CSUM_F_UDP_BAD) ||
> +	    (comp->csum_flags & IONIC_RXQ_COMP_CSUM_F_IP_BAD))
> +		stats->csum_error++;
> +
> +	if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX) {
> +		if (comp->csum_flags & IONIC_RXQ_COMP_CSUM_F_VLAN)
> +			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
> +					       le16_to_cpu(comp->vlan_tci));
> +	}
> +
> +	napi_gro_receive(&qcq->napi, skb);
> +}
> +
> +static bool ionic_rx_service(struct cq *cq, struct cq_info *cq_info)
> +{
> +	struct rxq_comp *comp = cq_info->cq_desc;
> +	struct queue *q = cq->bound_q;
> +	struct desc_info *desc_info;
> +
> +	if (!color_match(comp->pkt_type_color, cq->done_color))
> +		return false;
> +
> +	/* check for empty queue */
> +	if (q->tail->index == q->head->index)
> +		return false;
> +
> +	desc_info = q->tail;
> +	if (desc_info->index != le16_to_cpu(comp->comp_index))
> +		return false;
> +
> +	q->tail = desc_info->next;
> +
> +	/* clean the related q entry, only one per qc completion */
> +	ionic_rx_clean(q, desc_info, cq_info, desc_info->cb_arg);
> +
> +	desc_info->cb = NULL;
> +	desc_info->cb_arg = NULL;
> +
> +	return true;
> +}
> +
> +static u32 ionic_rx_walk_cq(struct cq *rxcq, u32 limit)
> +{
> +	u32 work_done = 0;
> +
> +	while (ionic_rx_service(rxcq, rxcq->tail)) {
> +		if (rxcq->tail->last)
> +			rxcq->done_color = !rxcq->done_color;
> +		rxcq->tail = rxcq->tail->next;
> +		DEBUG_STATS_CQE_CNT(rxcq);
> +
> +		if (++work_done >= limit)
> +			break;
> +	}
> +
> +	return work_done;
> +}
> +
> +void ionic_rx_flush(struct cq *cq)
> +{
> +	struct ionic_dev *idev = &cq->lif->ionic->idev;
> +	u32 work_done;
> +
> +	work_done = ionic_rx_walk_cq(cq, cq->num_descs);
> +
> +	if (work_done)
> +		ionic_intr_credits(idev->intr_ctrl, cq->bound_intr->index,
> +				   work_done,
> IONIC_INTR_CRED_RESET_COALESCE);
> +}

Mixing unrelated rx and tx flows here makes this very hard to review.
 
> +
> +void ionic_tx_flush(struct cq *cq)
> +{
> +	struct ionic_dev *idev = &cq->lif->ionic->idev;
> +	struct txq_comp *comp = cq->tail->cq_desc;
> +	struct queue *q = cq->bound_q;
> +	struct desc_info *desc_info;
> +	unsigned int work_done = 0;
> +
> +	/* walk the completed cq entries */
> +	while (work_done < cq->num_descs &&
> +	       color_match(comp->color, cq->done_color)) {
> +
> +		/* clean the related q entries, there could be
> +		 * several q entries completed for each cq completion
> +		 */
> +		do {
> +			desc_info = q->tail;
> +			q->tail = desc_info->next;
> +			ionic_tx_clean(q, desc_info, cq->tail,
> +				       desc_info->cb_arg);
> +			desc_info->cb = NULL;
> +			desc_info->cb_arg = NULL;
> +		} while (desc_info->index != le16_to_cpu(comp->comp_index));
> +
> +		if (cq->tail->last)
> +			cq->done_color = !cq->done_color;
> +
> +		cq->tail = cq->tail->next;
> +		comp = cq->tail->cq_desc;
> +		DEBUG_STATS_CQE_CNT(cq);
> +
> +		work_done++;
> +	}
> +
> +	if (work_done)
> +		ionic_intr_credits(idev->intr_ctrl, cq->bound_intr->index,
> +				   work_done, 0);
> +}
> +
> +static struct sk_buff *ionic_rx_skb_alloc(struct queue *q, unsigned
> int len,
> +					  dma_addr_t *dma_addr)
> +{
> +	struct lif *lif = q->lif;
> +	struct net_device *netdev = lif->netdev;
> +	struct device *dev = lif->ionic->dev;
> +	struct rx_stats *stats;
> +	struct sk_buff *skb;
> +
> +	stats = q_to_rx_stats(q);
> +	skb = netdev_alloc_skb_ip_align(netdev, len);
> +	if (!skb) {
> +		net_warn_ratelimited("%s: SKB alloc failed on %s!\n",
> +				     netdev->name, q->name);
> +		stats->alloc_err++;
> +		return NULL;
> +	}
> +
> +	*dma_addr = dma_map_single(dev, skb->data, len,
> DMA_FROM_DEVICE);
> +	if (dma_mapping_error(dev, *dma_addr)) {
> +		dev_kfree_skb(skb);
> +		net_warn_ratelimited("%s: DMA single map failed on
> %s!\n",
> +				     netdev->name, q->name);
> +		stats->dma_map_err++;
> +		return NULL;
> +	}
> +
> +	return skb;
> +}
> +
> +static void ionic_rx_skb_free(struct queue *q, struct sk_buff *skb,
> +			      unsigned int len, dma_addr_t dma_addr)
> +{
> +	struct device *dev = q->lif->ionic->dev;
> +
> +	dma_unmap_single(dev, dma_addr, len, DMA_FROM_DEVICE);
> +	dev_kfree_skb(skb);
> +}
> +
> +#define RX_RING_DOORBELL_STRIDE		((1 << 2) - 1)
> +
> +void ionic_rx_fill(struct queue *q)
> +{
> +	struct net_device *netdev = q->lif->netdev;
> +	struct rxq_desc *desc;
> +	struct sk_buff *skb;
> +	dma_addr_t dma_addr;
> +	bool ring_doorbell;
> +	unsigned int len;
> +	unsigned int i;
> +
> +	len = netdev->mtu + ETH_HLEN;
> +
> +	for (i = ionic_q_space_avail(q); i; i--) {
> +		skb = ionic_rx_skb_alloc(q, len, &dma_addr);
> +		if (!skb)
> +			return;
> +
> +		desc = q->head->desc;
> +		desc->addr = cpu_to_le64(dma_addr);
> +		desc->len = cpu_to_le16(len);
> +		desc->opcode = RXQ_DESC_OPCODE_SIMPLE;
> +
> +		ring_doorbell = ((q->head->index + 1) &
> +				RX_RING_DOORBELL_STRIDE) == 0;
> +
> +		ionic_rxq_post(q, ring_doorbell, ionic_rx_clean, skb);
> +	}
> +}
> +
> +static void ionic_rx_fill_cb(void *arg)
> +{
> +	ionic_rx_fill(arg);
> +}
> +
> +void ionic_rx_empty(struct queue *q)
> +{
> +	struct desc_info *cur = q->tail;
> +	struct rxq_desc *desc;
> +
> +	while (cur != q->head) {
> +		desc = cur->desc;
> +
> +		ionic_rx_skb_free(q, cur->cb_arg, le16_to_cpu(desc->len),
> +				  le64_to_cpu(desc->addr));
> +		cur->cb_arg = NULL;
> +
> +		cur = cur->next;
> +	}
> +}
> +
> +int ionic_rx_napi(struct napi_struct *napi, int budget)
> +{
> +	struct qcq *qcq = napi_to_qcq(napi);
> +	struct cq *rxcq = napi_to_cq(napi);
> +	unsigned int qi = rxcq->bound_q->index;
> +	struct lif *lif = rxcq->bound_q->lif;
> +	struct ionic_dev *idev = &lif->ionic->idev;
> +	struct cq *txcq = &lif->txqcqs[qi].qcq->cq;
> +	u32 work_done = 0;
> +	u32 flags = 0;
> +
> +	ionic_tx_flush(txcq);
> +
> +	work_done = ionic_rx_walk_cq(rxcq, budget);
> +
> +	if (work_done)
> +		ionic_rx_fill_cb(rxcq->bound_q);
> +
> +	if (work_done < budget && napi_complete_done(napi, work_done))
> {
> +		flags |= IONIC_INTR_CRED_UNMASK;
> +		DEBUG_STATS_INTR_REARM(rxcq->bound_intr);
> +	}
> +
> +	if (work_done || flags) {
> +		flags |= IONIC_INTR_CRED_RESET_COALESCE;
> +		ionic_intr_credits(idev->intr_ctrl, rxcq->bound_intr->index,
> +				   work_done, flags);
> +	}
> +
> +	DEBUG_STATS_NAPI_POLL(qcq, work_done);
> +
> +	return work_done;
> +}
> +
> +static dma_addr_t ionic_tx_map_single(struct queue *q, void *data,
> size_t len)
> +{
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	struct device *dev = q->lif->ionic->dev;
> +	dma_addr_t dma_addr;
> +
> +	dma_addr = dma_map_single(dev, data, len, DMA_TO_DEVICE);
> +	if (dma_mapping_error(dev, dma_addr)) {
> +		net_warn_ratelimited("%s: DMA single map failed on
> %s!\n",
> +				     q->lif->netdev->name, q->name);
> +		stats->dma_map_err++;
> +		return 0;
> +	}
> +	return dma_addr;
> +}
> +
> +static dma_addr_t ionic_tx_map_frag(struct queue *q, const
> skb_frag_t *frag,
> +				    size_t offset, size_t len)
> +{
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	struct device *dev = q->lif->ionic->dev;
> +	dma_addr_t dma_addr;
> +
> +	dma_addr = skb_frag_dma_map(dev, frag, offset, len,
> DMA_TO_DEVICE);
> +	if (dma_mapping_error(dev, dma_addr)) {
> +		net_warn_ratelimited("%s: DMA frag map failed on
> %s!\n",
> +				     q->lif->netdev->name, q->name);
> +		stats->dma_map_err++;
> +		return 0;
> +	}
> +	return dma_addr;
> +}
> +
> +static void ionic_tx_clean(struct queue *q, struct desc_info
> *desc_info,
> +			   struct cq_info *cq_info, void *cb_arg)
> +{
> +	struct txq_sg_desc *sg_desc = desc_info->sg_desc;
> +	struct txq_sg_elem *elem = sg_desc->elems;
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	struct txq_desc *desc = desc_info->desc;
> +	struct device *dev = q->lif->ionic->dev;
> +	struct sk_buff *skb = cb_arg;
> +	u8 opcode, flags, nsge;
> +	u16 queue_index;
> +	unsigned int i;
> +	u64 addr;
> +
> +	decode_txq_desc_cmd(le64_to_cpu(desc->cmd),
> +			    &opcode, &flags, &nsge, &addr);
> +
> +	dma_unmap_page(dev, (dma_addr_t)addr,
> +		       le16_to_cpu(desc->len), DMA_TO_DEVICE);
> +	for (i = 0; i < nsge; i++, elem++)
> +		dma_unmap_page(dev, (dma_addr_t)le64_to_cpu(elem->addr),
> +			       le16_to_cpu(elem->len), DMA_TO_DEVICE);
> +
> +	if (skb) {
When skb is NULL and the queue is stopped, what do you do?
Also, you should do this at the end of the napi cycle, not per packet.
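
A rough userspace sketch of checking the stopped state once per poll
cycle instead of once per cleaned packet. Everything in it is an
assumption made for illustration: struct toy_ring, toy_tx_poll() and the
wake_thresh parameter are hypothetical and only model the control flow,
not the ionic queues or the netdev API.

```c
#include <stdbool.h>

/* Toy TX ring; all fields are hypothetical. */
struct toy_ring {
	unsigned int pending;		/* completions waiting to be cleaned */
	unsigned int free_slots;	/* descriptors available to xmit */
	bool stopped;			/* models a stopped subqueue */
	unsigned int wake_checks;	/* how often the stopped state was tested */
};

/* Clean up to budget completions, then test the stopped state once. */
unsigned int toy_tx_poll(struct toy_ring *r, unsigned int budget,
			 unsigned int wake_thresh)
{
	unsigned int done = 0;

	/* Per-packet work: only reclaim the descriptor. */
	while (r->pending && done < budget) {
		r->pending--;
		r->free_slots++;
		done++;
	}

	/* Per-cycle work: one wake decision, independent of any skb. */
	if (done) {
		r->wake_checks++;
		if (r->stopped && r->free_slots >= wake_thresh)
			r->stopped = false;
	}

	return done;
}
```

Because the wake decision does not hang off an skb pointer, a
descriptor cleaned with a NULL skb still counts towards waking the
queue.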

> +		queue_index = skb_get_queue_mapping(skb);
> +		if (unlikely(__netif_subqueue_stopped(q->lif->netdev,
> +						      queue_index))) {
> +			netif_wake_subqueue(q->lif->netdev,
> queue_index);
> +			q->wake++;
> +		}
> +		dev_kfree_skb_any(skb);
> +		stats->clean++;
> +	}
> +}
> +
> +static void ionic_tx_tcp_inner_pseudo_csum(struct sk_buff *skb)
> +{
> +	skb_cow_head(skb, 0);
> +
> +	if (skb->protocol == cpu_to_be16(ETH_P_IP)) {
> +		inner_ip_hdr(skb)->check = 0;
> +		inner_tcp_hdr(skb)->check =
> +			~csum_tcpudp_magic(inner_ip_hdr(skb)->saddr,
> +					   inner_ip_hdr(skb)->daddr,
> +					   0, IPPROTO_TCP, 0);
> +	} else if (skb->protocol == cpu_to_be16(ETH_P_IPV6)) {
> +		inner_tcp_hdr(skb)->check =
> +			~csum_ipv6_magic(&inner_ipv6_hdr(skb)->saddr,
> +					 &inner_ipv6_hdr(skb)->daddr,
> +					 0, IPPROTO_TCP, 0);
> +	}
> +}
> +
> +static void ionic_tx_tcp_pseudo_csum(struct sk_buff *skb)
> +{
> +	skb_cow_head(skb, 0);
> +
> +	if (skb->protocol == cpu_to_be16(ETH_P_IP)) {
> +		ip_hdr(skb)->check = 0;
> +		tcp_hdr(skb)->check =
> +			~csum_tcpudp_magic(ip_hdr(skb)->saddr,
> +					   ip_hdr(skb)->daddr,
> +					   0, IPPROTO_TCP, 0);
> +	} else if (skb->protocol == cpu_to_be16(ETH_P_IPV6)) {
> +		tcp_hdr(skb)->check =
> +			~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
> +					 &ipv6_hdr(skb)->daddr,
> +					 0, IPPROTO_TCP, 0);
> +	}
> +}
> +
> +static void ionic_tx_tso_post(struct queue *q, struct txq_desc
> *desc,
> +			      struct sk_buff *skb,
> +			      dma_addr_t addr, u8 nsge, u16 len,
> +			      unsigned int hdrlen, unsigned int mss,
> +			      bool outer_csum,
> +			      u16 vlan_tci, bool has_vlan,
> +			      bool start, bool done)
> +{
> +	u8 flags = 0;
> +	u64 cmd;
> +
> +	flags |= has_vlan ? IONIC_TXQ_DESC_FLAG_VLAN : 0;
> +	flags |= outer_csum ? IONIC_TXQ_DESC_FLAG_ENCAP : 0;
> +	flags |= start ? IONIC_TXQ_DESC_FLAG_TSO_SOT : 0;
> +	flags |= done ? IONIC_TXQ_DESC_FLAG_TSO_EOT : 0;
> +
> +	cmd = encode_txq_desc_cmd(IONIC_TXQ_DESC_OPCODE_TSO, flags,
> nsge, addr);
> +	desc->cmd = cpu_to_le64(cmd);
> +	desc->len = cpu_to_le16(len);
> +	desc->vlan_tci = cpu_to_le16(vlan_tci);
> +	desc->hdr_len = cpu_to_le16(hdrlen);
> +	desc->mss = cpu_to_le16(mss);
> +
> +	if (done) {
> +		skb_tx_timestamp(skb);
> +		ionic_txq_post(q, !netdev_xmit_more(), ionic_tx_clean,
> skb);
> +	} else {
> +		ionic_txq_post(q, false, ionic_tx_clean, NULL);
> +	}
> +}
> +
> +static struct txq_desc *ionic_tx_tso_next(struct queue *q,
> +					  struct txq_sg_elem **elem)
> +{
> +	struct txq_sg_desc *sg_desc = q->head->sg_desc;
> +	struct txq_desc *desc = q->head->desc;
> +
> +	*elem = sg_desc->elems;
> +	return desc;
> +}
> +
> +static int ionic_tx_tso(struct queue *q, struct sk_buff *skb)
> +{
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	struct desc_info *abort = q->head;
> +	struct desc_info *rewind = abort;
> +	unsigned int frag_left = 0;
> +	struct txq_sg_elem *elem;
> +	unsigned int offset = 0;
> +	unsigned int len_left;
> +	struct txq_desc *desc;
> +	dma_addr_t desc_addr;
> +	unsigned int hdrlen;
> +	unsigned int nfrags;
> +	unsigned int seglen;
> +	u64 total_bytes = 0;
> +	u64 total_pkts = 0;
> +	unsigned int left;
> +	unsigned int len;
> +	unsigned int mss;
> +	skb_frag_t *frag;
> +	bool start, done;
> +	bool outer_csum;
> +	bool has_vlan;
> +	u16 desc_len;
> +	u8 desc_nsge;
> +	u16 vlan_tci;
> +	bool encap;
> +
> +	mss = skb_shinfo(skb)->gso_size;
> +	nfrags = skb_shinfo(skb)->nr_frags;
> +	len_left = skb->len - skb_headlen(skb);
> +	outer_csum = (skb_shinfo(skb)->gso_type & SKB_GSO_GRE_CSUM) ||
> +		     (skb_shinfo(skb)->gso_type &
> SKB_GSO_UDP_TUNNEL_CSUM);
> +	has_vlan = !!skb_vlan_tag_present(skb);
> +	vlan_tci = skb_vlan_tag_get(skb);
> +	encap = skb->encapsulation;
> +
> +	/* Preload inner-most TCP csum field with IP pseudo hdr
> +	 * calculated with IP length set to zero.  HW will later
> +	 * add in length to each TCP segment resulting from the TSO.
> +	 */
> +
> +	if (encap)
> +		ionic_tx_tcp_inner_pseudo_csum(skb);
> +	else
> +		ionic_tx_tcp_pseudo_csum(skb);
> +
> +	if (encap)
> +		hdrlen = skb_inner_transport_header(skb) - skb->data +
> +			 inner_tcp_hdrlen(skb);
> +	else
> +		hdrlen = skb_transport_offset(skb) + tcp_hdrlen(skb);
> +
> +	seglen = hdrlen + mss;
> +	left = skb_headlen(skb);
> +
> +	desc = ionic_tx_tso_next(q, &elem);
> +	start = true;
> +
> +	/* Chop skb->data up into desc segments */
> +
> +	while (left > 0) {
> +		len = min(seglen, left);
> +		frag_left = seglen - len;
> +		desc_addr = ionic_tx_map_single(q, skb->data + offset,
> len);
> +		if (!desc_addr)
> +			goto err_out_abort;
> +		desc_len = len;
> +		desc_nsge = 0;
> +		left -= len;
> +		offset += len;
> +		if (nfrags > 0 && frag_left > 0)
> +			continue;
> +		done = (nfrags == 0 && left == 0);
> +		ionic_tx_tso_post(q, desc, skb,
> +				  desc_addr, desc_nsge, desc_len,
> +				  hdrlen, mss,
> +				  outer_csum,
> +				  vlan_tci, has_vlan,
> +				  start, done);
> +		total_pkts++;
> +		total_bytes += start ? len : len + hdrlen;
> +		desc = ionic_tx_tso_next(q, &elem);
> +		start = false;
> +		seglen = mss;
> +	}
> +
> +	/* Chop skb frags into desc segments */
> +
> +	for (frag = skb_shinfo(skb)->frags; len_left; frag++) {
> +		offset = 0;
> +		left = skb_frag_size(frag);
> +		len_left -= left;
> +		nfrags--;
> +		stats->frags++;
> +
> +		while (left > 0) {
> +			if (frag_left > 0) {
> +				len = min(frag_left, left);
> +				frag_left -= len;
> +				elem->addr =
> +				    cpu_to_le64(ionic_tx_map_frag(q, frag,
> +								  offset, len));
> +				if (!elem->addr)
> +					goto err_out_abort;
> +				elem->len = cpu_to_le16(len);
> +				elem++;
> +				desc_nsge++;
> +				left -= len;
> +				offset += len;
> +				if (nfrags > 0 && frag_left > 0)
> +					continue;
> +				done = (nfrags == 0 && left == 0);
> +				ionic_tx_tso_post(q, desc, skb,
> desc_addr,
> +						  desc_nsge, desc_len,
> +						  hdrlen, mss,
> outer_csum,
> +						  vlan_tci, has_vlan,
> +						  start, done);
> +				total_pkts++;
> +				total_bytes += start ? len : len +
> hdrlen;
> +				desc = ionic_tx_tso_next(q, &elem);
> +				start = false;
> +			} else {
> +				len = min(mss, left);
> +				frag_left = mss - len;
> +				desc_addr = ionic_tx_map_frag(q, frag,
> +							      offset,
> len);
> +				if (!desc_addr)
> +					goto err_out_abort;
> +				desc_len = len;
> +				desc_nsge = 0;
> +				left -= len;
> +				offset += len;
> +				if (nfrags > 0 && frag_left > 0)
> +					continue;
> +				done = (nfrags == 0 && left == 0);
> +				ionic_tx_tso_post(q, desc, skb,
> desc_addr,
> +						  desc_nsge, desc_len,
> +						  hdrlen, mss,
> outer_csum,
> +						  vlan_tci, has_vlan,
> +						  start, done);
> +				total_pkts++;
> +				total_bytes += start ? len : len +
> hdrlen;
> +				desc = ionic_tx_tso_next(q, &elem);
> +				start = false;
> +			}
> +		}
> +	}
> +
> +	stats->pkts += total_pkts;
> +	stats->bytes += total_bytes;
> +	stats->tso++;
> +
> +	return 0;
> +
> +err_out_abort:
> +	while (rewind->desc != q->head->desc) {
> +		ionic_tx_clean(q, rewind, NULL, NULL);
> +		rewind = rewind->next;
> +	}
> +	q->head = abort;
> +
> +	return -ENOMEM;
> +}
> +
> +static int ionic_tx_calc_csum(struct queue *q, struct sk_buff *skb)
> +{
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	struct txq_desc *desc = q->head->desc;
> +	dma_addr_t addr;
> +	bool has_vlan;
> +	u8 flags = 0;
> +	bool encap;
> +	u64 cmd;
> +
> +	has_vlan = !!skb_vlan_tag_present(skb);
> +	encap = skb->encapsulation;
> +
> +	addr = ionic_tx_map_single(q, skb->data, skb_headlen(skb));
> +	if (!addr)
> +		return -ENOMEM;
> +
> +	flags |= has_vlan ? IONIC_TXQ_DESC_FLAG_VLAN : 0;
> +	flags |= encap ? IONIC_TXQ_DESC_FLAG_ENCAP : 0;
> +
> +	cmd = encode_txq_desc_cmd(IONIC_TXQ_DESC_OPCODE_CSUM_PARTIAL,
> +				  flags, skb_shinfo(skb)->nr_frags,
> addr);
> +	desc->cmd = cpu_to_le64(cmd);
> +	desc->len = cpu_to_le16(skb_headlen(skb));
> +	desc->vlan_tci = cpu_to_le16(skb_vlan_tag_get(skb));
> +	desc->csum_start = cpu_to_le16(skb_checksum_start_offset(skb));
> +	desc->csum_offset = cpu_to_le16(skb->csum_offset);
> +
> +	if (skb->csum_not_inet)
> +		stats->crc32_csum++;
> +	else
> +		stats->csum++;
> +
> +	return 0;
> +}
> +
> +static int ionic_tx_calc_no_csum(struct queue *q, struct sk_buff
> *skb)
> +{
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	struct txq_desc *desc = q->head->desc;
> +	dma_addr_t addr;
> +	bool has_vlan;
> +	u8 flags = 0;
> +	bool encap;
> +	u64 cmd;
> +
> +	has_vlan = !!skb_vlan_tag_present(skb);
> +	encap = skb->encapsulation;
> +
> +	addr = ionic_tx_map_single(q, skb->data, skb_headlen(skb));
> +	if (!addr)
> +		return -ENOMEM;
> +
> +	flags |= has_vlan ? IONIC_TXQ_DESC_FLAG_VLAN : 0;
> +	flags |= encap ? IONIC_TXQ_DESC_FLAG_ENCAP : 0;
> +
> +	cmd = encode_txq_desc_cmd(IONIC_TXQ_DESC_OPCODE_CSUM_NONE,
> +				  flags, skb_shinfo(skb)->nr_frags,
> addr);
> +	desc->cmd = cpu_to_le64(cmd);
> +	desc->len = cpu_to_le16(skb_headlen(skb));
> +	desc->vlan_tci = cpu_to_le16(skb_vlan_tag_get(skb));
> +
> +	stats->no_csum++;
> +
> +	return 0;
> +}
> +
> +static int ionic_tx_skb_frags(struct queue *q, struct sk_buff *skb)
> +{
> +	unsigned int len_left = skb->len - skb_headlen(skb);
> +	struct txq_sg_desc *sg_desc = q->head->sg_desc;
> +	struct txq_sg_elem *elem = sg_desc->elems;
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	dma_addr_t dma_addr;
> +	skb_frag_t *frag;
> +	u16 len;
> +
> +	for (frag = skb_shinfo(skb)->frags; len_left; frag++, elem++) {
> +		len = skb_frag_size(frag);
> +		elem->len = cpu_to_le16(len);
> +		dma_addr = ionic_tx_map_frag(q, frag, 0, len);
> +		if (!dma_addr)
> +			return -ENOMEM;
> +		elem->addr = cpu_to_le64(dma_addr);
> +		len_left -= len;
> +		stats->frags++;
> +	}
> +
> +	return 0;
> +}
> +
> +static int ionic_tx(struct queue *q, struct sk_buff *skb)
> +{
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	int err;
> +
> +	if (skb->ip_summed == CHECKSUM_PARTIAL)
> +		err = ionic_tx_calc_csum(q, skb);
> +	else
> +		err = ionic_tx_calc_no_csum(q, skb);
> +	if (err)
> +		return err;
> +
> +	err = ionic_tx_skb_frags(q, skb);
> +	if (err)
> +		return err;
> +
> +	skb_tx_timestamp(skb);
> +	stats->pkts++;
> +	stats->bytes += skb->len;
> +
> +	ionic_txq_post(q, !netdev_xmit_more(), ionic_tx_clean, skb);
> +
> +	return 0;
> +}
> +
> +static int ionic_tx_descs_needed(struct queue *q, struct sk_buff
> *skb)
> +{
> +	struct tx_stats *stats = q_to_tx_stats(q);
> +	int err;
> +
> +	/* If TSO, need roundup(skb->len/mss) descs */
> +	if (skb_is_gso(skb))
> +		return (skb->len / skb_shinfo(skb)->gso_size) + 1;
> +
> +	/* If non-TSO, just need 1 desc and nr_frags sg elems */
> +	if (skb_shinfo(skb)->nr_frags <= IONIC_TX_MAX_SG_ELEMS)
> +		return 1;
> +
> +	/* Too many frags, so linearize */
> +	err = skb_linearize(skb);
> +	if (err)
> +		return err;
> +
> +	stats->linearize++;
> +
> +	/* Need 1 desc and zero sg elems */
> +	return 1;
> +}
> +
> +netdev_tx_t ionic_start_xmit(struct sk_buff *skb, struct net_device
> *netdev)
> +{
> +	u16 queue_index = skb_get_queue_mapping(skb);
> +	struct lif *lif = netdev_priv(netdev);
> +	struct queue *q;
> +	int ndescs;
> +	int err;
> +
> +	if (unlikely(!test_bit(LIF_UP, lif->state))) {
> +		dev_kfree_skb(skb);
> +		return NETDEV_TX_OK;
> +	}
> +
> +	if (likely(lif_to_txqcq(lif, queue_index)))
> +		q = lif_to_txq(lif, queue_index);
> +	else
> +		q = lif_to_txq(lif, 0);
> +
> +	ndescs = ionic_tx_descs_needed(q, skb);
> +	if (ndescs < 0)
> +		goto err_out_drop;
> +
> +	if (!ionic_q_has_space(q, ndescs)) {
> +		netif_stop_subqueue(netdev, queue_index);
> +		q->stop++;
> +
> +		/* Might race with ionic_tx_clean, check again */
> +		smp_rmb();
> +		if (ionic_q_has_space(q, ndescs)) {
> +			netif_wake_subqueue(netdev, queue_index);
> +			q->wake++;
> +		} else {
> +			return NETDEV_TX_BUSY;
> +		}
> +	}
> +

You are missing netdev_tx_sent_queue() here
and corresponding netdev_tx_completed_queue() on completion.
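
For reference, a toy userspace model of the byte accounting that this
pair of calls exists for: bytes reported at xmit time must be matched by
bytes reported at completion time. The names struct toy_txq,
toy_tx_sent() and toy_tx_completed() are hypothetical, and the fixed
limit is a simplifying assumption; the kernel tunes its limit
dynamically.

```c
#include <stdbool.h>

/* Toy model of byte-based TX queue accounting; not the kernel structures. */
struct toy_txq {
	unsigned int limit;	/* max bytes allowed in flight (assumed fixed) */
	unsigned int queued;	/* total bytes reported at xmit time */
	unsigned int completed;	/* total bytes reported at completion time */
	bool stopped;
};

static unsigned int toy_inflight(const struct toy_txq *q)
{
	return q->queued - q->completed;
}

/* Xmit side: account the bytes just posted, stop if over the limit. */
void toy_tx_sent(struct toy_txq *q, unsigned int bytes)
{
	q->queued += bytes;
	if (toy_inflight(q) > q->limit)
		q->stopped = true;
}

/* Completion side: account the bytes cleaned in one pass, then resume. */
void toy_tx_completed(struct toy_txq *q, unsigned int bytes)
{
	q->completed += bytes;
	if (q->stopped && toy_inflight(q) <= q->limit)
		q->stopped = false;
}
```

If the completion side is never called, the in-flight count only grows
and the queue stays stopped, which is why the two calls have to be added
together.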

> +	if (skb_is_gso(skb))
> +		err = ionic_tx_tso(q, skb);
> +	else
> +		err = ionic_tx(q, skb);
> +
> +	if (err)
> +		goto err_out_drop;
> +
> +	return NETDEV_TX_OK;
> +
> +err_out_drop:
> +	netif_stop_subqueue(netdev, queue_index);
> +	q->stop++;
> +	q->drop++;
> +	dev_kfree_skb(skb);
> +	return NETDEV_TX_OK;
> +}
> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_txrx.h
> b/drivers/net/ethernet/pensando/ionic/ionic_txrx.h
> new file mode 100644
> index 000000000000..2391a0eec65a
> --- /dev/null
> +++ b/drivers/net/ethernet/pensando/ionic/ionic_txrx.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2017 - 2019 Pensando Systems, Inc */
> +
> +#ifndef _IONIC_TXRX_H_
> +#define _IONIC_TXRX_H_
> +
> +void ionic_rx_flush(struct cq *cq);
> +void ionic_tx_flush(struct cq *cq);
> +
> +void ionic_rx_fill(struct queue *q);
> +void ionic_rx_empty(struct queue *q);
> +int ionic_rx_napi(struct napi_struct *napi, int budget);
> +netdev_tx_t ionic_start_xmit(struct sk_buff *skb, struct net_device *netdev);
> +
> +#endif /* _IONIC_TXRX_H_ */

* Slowness forming TIPC cluster with explicit node addresses
From: Chris Packham @ 2019-07-25 23:37 UTC (permalink / raw)
  To: tipc-discussion@lists.sourceforge.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org

Hi,

I'm having problems forming a TIPC cluster between 2 nodes.

These are the basic steps I'm going through on each node:

modprobe tipc
ip link set eth2 up
tipc node set addr 1.1.5 # or 1.1.6
tipc bearer enable media eth dev eth2

Then, to confirm whether the cluster has formed, I use tipc link list:

[root@node-5 ~]# tipc link list
broadcast-link: up
...

Looking at tcpdump, the two nodes are sending packets:

22:30:05.782320 TIPC v2.0 1.1.5 > 0.0.0, headerlength 60 bytes,
MessageSize 76 bytes, Neighbor Detection Protocol internal, messageType
Link request
22:30:05.863555 TIPC v2.0 1.1.6 > 0.0.0, headerlength 60 bytes,
MessageSize 76 bytes, Neighbor Detection Protocol internal, messageType
Link request

Eventually (after a few minutes) the link does come up:

[root@node-6 ~]# tipc link list
broadcast-link: up
1001006:eth2-1001005:eth2: up

[root@node-5 ~]# tipc link list
broadcast-link: up
1001005:eth2-1001006:eth2: up

When I remove the "tipc node set addr" step, things seem to kick into
life straight away:

[root@node-5 ~]# tipc link list
broadcast-link: up
0050b61bd2aa:eth2-0050b61e6dfa:eth2: up

So there appears to be some difference in behaviour between having an
explicit node address and using the default. Unfortunately our
application relies on setting the node addresses.

[root@node-5 ~]# uname -a
Linux linuxbox 5.2.0-at1+ #8 SMP Thu Jul 25 23:22:41 UTC 2019 ppc GNU/Linux

Any thoughts on the problem?

^ permalink raw reply

* Re: [PATCH bpf-next 3/6] bpf: keep bpf.h in sync with tools/
From: Brian Vazquez @ 2019-07-25 23:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <CAEf4BzaCUBA40DKUYm6rSa0v-jQMK7aPu867oYkZhfZGB4wiSA@mail.gmail.com>

On Wed, Jul 24, 2019 at 4:10 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
> >
> > Adds bpf_attr.dump structure to libbpf.
> >
> > Suggested-by: Stanislav Fomichev <sdf@google.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > ---
> >  tools/include/uapi/linux/bpf.h | 9 +++++++++
> >  tools/lib/bpf/libbpf.map       | 2 ++
> >  2 files changed, 11 insertions(+)
> >
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 4e455018da65f..e127f16e4e932 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -106,6 +106,7 @@ enum bpf_cmd {
> >         BPF_TASK_FD_QUERY,
> >         BPF_MAP_LOOKUP_AND_DELETE_ELEM,
> >         BPF_MAP_FREEZE,
> > +       BPF_MAP_DUMP,
> >  };
> >
> >  enum bpf_map_type {
> > @@ -388,6 +389,14 @@ union bpf_attr {
> >                 __u64           flags;
> >         };
> >
> > +       struct { /* struct used by BPF_MAP_DUMP command */
> > +               __aligned_u64   prev_key;
> > +               __aligned_u64   buf;
> > +               __aligned_u64   buf_len; /* input/output: len of buf */
> > +               __u64           flags;
> > +               __u32           map_fd;
> > +       } dump;
> > +
> >         struct { /* anonymous struct used by BPF_PROG_LOAD command */
> >                 __u32           prog_type;      /* one of enum bpf_prog_type */
> >                 __u32           insn_cnt;
> > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > index f9d316e873d8d..cac3723d5c45c 100644
> > --- a/tools/lib/bpf/libbpf.map
> > +++ b/tools/lib/bpf/libbpf.map
> > @@ -183,4 +183,6 @@ LIBBPF_0.0.4 {
>
> LIBBPF_0.0.4 is closed, this needs to go into LIBBPF_0.0.5.

Sorry, my bad. I didn't look closely at the rebase, so I got this wrong.

>
> >                 perf_buffer__new;
> >                 perf_buffer__new_raw;
> >                 perf_buffer__poll;
> > +               bpf_map_dump;
> > +               bpf_map_dump_flags;
>
> As a general rule, please keep those lists of functions in alphabetical order.

Right.

I will fix it in the next version. Thanks for reviewing it!

