Netdev List


* Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Rahul Lakkireddy @ 2018-04-18 15:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Young, netdev@vger.kernel.org, kexec@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Indranil Choudhury, Nirranjan Kirubaharan,
	stephen@networkplumber.org, Ganesh GR, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, davem@davemloft.net,
	viro@zeniv.linux.org.uk
In-Reply-To: <871sfcy4ge.fsf@xmission.com>

On Wednesday, April 04/18/18, 2018 at 19:58:01 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> 
> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> >> Hi Rahul,
> >> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> >> > On production servers running a variety of workloads over time, a kernel
> >> > panic can happen sporadically after days or even months. It is
> >> > important to collect as many debug logs as possible to root-cause
> >> > and fix the problem, which may not be easy to reproduce. A snapshot of
> >> > the underlying hardware/firmware state (like register dump, firmware
> >> > logs, adapter memory, etc.) at the time of the kernel panic will be very
> >> > helpful while debugging the culprit device driver.
> >> > 
> >> > This series of patches adds a new generic framework that enables device
> >> > drivers to collect a device-specific snapshot of the hardware/firmware
> >> > state of the underlying device in the crash recovery kernel. In the crash
> >> > recovery kernel, the collected logs are added as elf notes to
> >> > /proc/vmcore, which is copied by user space scripts for post-analysis.
> >> > 
> >> > The sequence of actions done by device drivers to append their
> >> > device-specific hardware/firmware logs to /proc/vmcore is as follows:
> >> > 
> >> > 1. During probe (before hardware is initialized), device drivers
> >> > register with the vmcore module (via vmcore_add_device_dump()), passing
> >> > a callback function along with the buffer size and log name needed for
> >> > firmware/hardware log collection.
> >> 
> >> I assumed the elf notes info should be prepared during the kexec_[file_]load
> >> phase. But I did not read the old comments, so I am not sure whether this
> >> has been discussed.
> >> 
> >
> > We must not collect dumps in the crashing kernel. Adding more things in
> > the crash dump path risks not collecting vmcore at all. Eric had
> > discussed this in more detail at:
> >
> > https://lkml.org/lkml/2018/3/24/319
> >
> > We are safe to collect dumps in the second kernel. Each device dump
> > will be exported as an elf note in /proc/vmcore.
> 
> It just occurred to me there is one variation that is worth
> considering.
> 
> Is the area you are looking at dumping part of a huge mmio area?
> I think someone said 2GB?
> 
> If that is the case it could be worth it to simply add the needed
> addresses to the range of memory we need to dump, and simply have an
> elf note saying that is what happened.
> 

We are _not_ dumping an mmio area. However, one part of the dump
collection involves reading 2 GB of on-chip memory via PIO access;
the data is then compressed and stored.

Thanks,
Rahul

^ permalink raw reply

* Re: [PATCH net-next 2/2] openvswitch: Support conntrack zone limit
From: Gregory Rose @ 2018-04-18 15:05 UTC (permalink / raw)
  To: Yi-Hung Wei; +Cc: netdev
In-Reply-To: <CAG1aQhJxgwmEGPpO61rpGo1ve9Rdr+fV7r-EF95x0=1SqZgX+A@mail.gmail.com>

On 4/17/2018 5:30 PM, Yi-Hung Wei wrote:
>> s/to commit/from committing/
>> s/entry/entries/
> Thanks, will fix that in both patches in v2.
>
>
>> I think this is a great idea but I suggest porting to the iproute2 package
>> so everyone can use it.  Then get rid of the OVS specific prefixes.
>> Presuming of course that the conntrack connection
>> limit backend works there as well I guess.  If it doesn't, then I'd suggest
>> extending
>> it.  This is a nice feature for all users in my opinion and then OVS
>> can take advantage of it as well.
> Thanks for the comment.  And yes, I think currently, iptables'
> connlimit extension does support limiting the # of connections.  Users
> need to configure the zone properly, and the iptables connlimit
> extension is using netfilter's nf_conncount backend already.
>
> The main goal for this patch is to utilize netfilter backend
> (nf_conncount) to count and limit the number of connections. OVS needs
> the proposed OVS_CT_LIMIT netlink API and the corresponding bookkeeping
> data structure because the current nf_conncount backend only counts
> the # of connections, but it does not keep track of the connection
> limit in nf_conncount.
>
> Thanks,
>
> -Yi-Hung

Thanks Yi-Hung, I figured I was just missing something there.  I
appreciate the explanation.

- Greg

^ permalink raw reply

* Re: [PATCH net-next] team: account for oper state
From: Jiri Pirko @ 2018-04-18 14:58 UTC (permalink / raw)
  To: George Wilkie; +Cc: netdev
In-Reply-To: <20180418133549.qd5uqp3km45vw3ar@debian9.gwilkie>

Wed, Apr 18, 2018 at 03:35:49PM CEST, gwilkie@vyatta.att-mail.com wrote:
>On Wed, Apr 18, 2018 at 02:56:44PM +0200, Jiri Pirko wrote:
>> Wed, Apr 18, 2018 at 12:29:50PM CEST, gwilkie@vyatta.att-mail.com wrote:
>> >Account for operational state when determining port linkup state,
>> >as per Documentation/networking/operstates.txt.
>> 
>> Could you please point me to the exact place in the document where this
>> is suggested?
>> 
>
>Various places cover it I think.
>
>In 1. Introduction:
>"interface is not usable just because the admin enabled it"
>"userspace must be granted the possibility to
>influence operational state"
>
>In 4. Setting from userspace:
>"the userspace application can set IFLA_OPERSTATE
>to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
>netif_carrier_off() or netif_dormant_on()"
>
>We have a use case where we want to set the oper state of the team ports based
>on whether they are actually usable or not (as opposed to just admin up).

Are you running a supplicant there or what is the use-case?

How is this handled in other drivers like bond, openvswitch, bridge, etc?

>
>Cheers.
>
>> 
>> >
>> >Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
>> >---
>> > drivers/net/team/team.c | 3 ++-
>> > 1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> >index a6c6ce19eeee..231264a05e55 100644
>> >--- a/drivers/net/team/team.c
>> >+++ b/drivers/net/team/team.c
>> >@@ -2918,7 +2918,8 @@ static int team_device_event(struct notifier_block *unused,
>> > 	case NETDEV_CHANGE:
>> > 		if (netif_running(port->dev))
>> > 			team_port_change_check(port,
>> >-					       !!netif_carrier_ok(port->dev));
>> >+					       !!(netif_carrier_ok(port->dev) &&
>> >+						  netif_oper_up(port->dev)));
>> > 		break;
>> > 	case NETDEV_UNREGISTER:
>> > 		team_del_slave(port->team->dev, dev);
>> >-- 
>> >2.11.0
>> >

^ permalink raw reply

* Re: [PATCH] net: qmi_wwan: add Wistron Neweb D19Q1
From: Bjørn Mork @ 2018-04-18 14:39 UTC (permalink / raw)
  To: Pawel Dembicki; +Cc: netdev, linux-usb, linux-kernel
In-Reply-To: <1524060204-7814-1-git-send-email-paweldembicki@gmail.com>

Pawel Dembicki <paweldembicki@gmail.com> writes:

> This modem is embedded on dlink dwr-960 router.
> The oem configuration states:
>
> T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
> D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
> P: Vendor=1435 ProdID=d191 Rev=ff.ff
> S: Manufacturer=Android
> S: Product=Android
> S: SerialNumber=0123456789ABCDEF
> C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
> I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
> E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
> E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
> E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
> E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
> E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
> E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
> E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
> E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
> E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
>
> Tested on openwrt distribution
>
> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>

Acked-by: Bjørn Mork <bjorn@mork.no>

^ permalink raw reply

* Re: [PATCH 6/6] rhashtable: add rhashtable_walk_prev()
From: Herbert Xu @ 2018-04-18 14:35 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152403402206.16895.14563720960374849428.stgit2@noble>

On Wed, Apr 18, 2018 at 04:47:02PM +1000, NeilBrown wrote:
> rhashtable_walk_prev() returns the object returned by
> the previous rhashtable_walk_next(), providing it is still in the
> table (or was during this grace period).
> This works even if rhashtable_walk_stop() and rhashtable_walk_start()
> have been called since the last rhashtable_walk_next().
> 
> If there have been no calls to rhashtable_walk_next(), or if the
> object is gone from the table, then NULL is returned.
> 
> This can usefully be used in a seq_file ->start() function.
> If the pos is the same as was returned by the last ->next() call,
> then rhashtable_walk_prev() can be used to re-establish the
> current location in the table.  If it returns NULL, then
> rhashtable_walk_next() should be used.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

Can you explain the need for this function and its difference
from the existing rhashtable_walk_peek?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH] net: don't use kvzalloc for DMA memory
From: Mikulas Patocka @ 2018-04-18 14:34 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet
  Cc: Joby Poriyath, Ben Hutchings, netdev, linux-kernel

Commit 74d332c13b21 changed alloc_netdev_mqs to use vzalloc if kzalloc
fails (later patches changed it to kvzalloc).

The problem with this is that if the vzalloc function is actually used,
virtio_net doesn't work (because it expects the extra memory to be
accessible with the DMA API, and memory allocated with vzalloc isn't).

This patch changes it back to kzalloc and adds a warning if the allocated
size is too large (the allocation is unreliable in this case).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 74d332c13b21 ("net: extend net_device allocation to vmalloc()")

---
 net/core/dev.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/net/core/dev.c
===================================================================
--- linux-2.6.orig/net/core/dev.c	2018-04-16 21:08:36.000000000 +0200
+++ linux-2.6/net/core/dev.c	2018-04-18 16:24:43.000000000 +0200
@@ -8366,7 +8366,8 @@ struct net_device *alloc_netdev_mqs(int
 	/* ensure 32-byte alignment of whole construct */
 	alloc_size += NETDEV_ALIGN - 1;

-	p = kvzalloc(alloc_size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+	WARN_ON(alloc_size > PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER);
+	p = kzalloc(alloc_size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!p)
 		return NULL;

^ permalink raw reply
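
To put the WARN_ON threshold from the patch above in concrete terms, here
is a minimal userspace sketch. It assumes 4 KiB pages and
PAGE_ALLOC_COSTLY_ORDER equal to 3 (its mainline value); the SKETCH_*
macros and both function names are made up for illustration and are not
kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, costly order 3. */
#define SKETCH_PAGE_SIZE		4096UL
#define SKETCH_PAGE_ALLOC_COSTLY_ORDER	3

/* The sketch relies on the threshold being eight 4 KiB pages. */
_Static_assert((SKETCH_PAGE_SIZE << SKETCH_PAGE_ALLOC_COSTLY_ORDER) == 32768UL,
	       "threshold should be 32 KiB under the stated assumptions");

/* Largest allocation size that does not trip the WARN_ON in the patch. */
size_t costly_alloc_threshold(void)
{
	return SKETCH_PAGE_SIZE << SKETCH_PAGE_ALLOC_COSTLY_ORDER;
}

/* Mirrors the WARN_ON condition: nonzero when the request would warn. */
int alloc_would_warn(size_t alloc_size)
{
	return alloc_size > costly_alloc_threshold();
}
```

Under these assumptions the warning would fire for any net_device
allocation larger than 32768 bytes, that is, anything needing more than
eight physically contiguous pages.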

* Re: [PATCH 1/6] rhashtable: remove outdated comments about grow_decision etc
From: Herbert Xu @ 2018-04-18 14:29 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152403402187.16895.84802790561768231.stgit2@noble>

On Wed, Apr 18, 2018 at 04:47:01PM +1000, NeilBrown wrote:
> grow_decision and shrink_decision no longer exist, so remove
> the remaining references to them.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Willem de Bruijn @ 2018-04-18 14:28 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Samudrala, Sridhar, Network Development, Willem de Bruijn
In-Reply-To: <CAF=yD-+iT55h_QbQNR6RWa0R41N=3GCr+71+qr32GW=1oEc0Hg@mail.gmail.com>

On Wed, Apr 18, 2018 at 9:59 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>> One thing that was not clear to me about the API: shouldn't UDP_SEGMENT
>> just be automatically determined in the stack from the pmtu? What's
>> the motivation for the socket option for this? Also, AIUI this can be
>> either a per-socket or a per-packet option?

I forgot to respond to the last point: yes, it is set either as a setsockopt or
passed as a cmsg for a given send call.

Especially when using unconnected sockets to communicate with many
clients, it is likely that this value will vary per call.

^ permalink raw reply
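
The two ways of passing the segment size described above can be sketched
from userspace as follows. This is only an illustration under stated
assumptions: the option number (103), the int argument to setsockopt and
the 16-bit cmsg payload follow the interface as it later appeared in
mainline, and the RFC series under discussion may differ. The function
names udp_set_segment_size() and udp_attach_segment_cmsg() are
hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

/* Fallbacks for older userspace headers; values as in the Linux uapi. */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif

/* Per-socket: later sends on fd are split into gso_size-byte payloads. */
int udp_set_segment_size(int fd, int gso_size)
{
	return setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size));
}

/*
 * Per-call: attach a UDP_SEGMENT control message to msg.
 * ctrl must be suitably aligned for struct cmsghdr and at least
 * CMSG_SPACE(sizeof(uint16_t)) bytes long. Returns 0 on success, -1 if
 * the buffer is too small.
 */
int udp_attach_segment_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			    uint16_t gso_size)
{
	struct cmsghdr *cm;

	assert(msg != NULL && ctrl != NULL);
	if (ctrl_len < CMSG_SPACE(sizeof(gso_size)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(gso_size));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_UDP;
	cm->cmsg_type = UDP_SEGMENT;
	cm->cmsg_len = CMSG_LEN(sizeof(gso_size));
	memcpy(CMSG_DATA(cm), &gso_size, sizeof(gso_size));
	return 0;
}
```

A sender talking to many clients over one unconnected socket would
presumably prefer the cmsg form, since the segment size can then change
on every sendmsg() call without an extra system call.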

* Re: [PATCH 2/6] rhashtable: remove incorrect comment on r{hl, hash}table_walk_enter()
From: Herbert Xu @ 2018-04-18 14:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: Thomas Graf, netdev, linux-kernel
In-Reply-To: <152403402192.16895.9740762152906281009.stgit2@noble>

On Wed, Apr 18, 2018 at 04:47:01PM +1000, NeilBrown wrote:
> Neither rhashtable_walk_enter() nor rhltable_walk_enter() sleeps, so
> remove the comments which suggest that they do.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  include/linux/rhashtable.h |    3 ---
>  lib/rhashtable.c           |    3 ---
>  2 files changed, 6 deletions(-)
> 
> diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
> index 87d443a5b11d..b01d88e196c2 100644
> --- a/include/linux/rhashtable.h
> +++ b/include/linux/rhashtable.h
> @@ -1268,9 +1268,6 @@ static inline int rhashtable_walk_init(struct rhashtable *ht,
>   * For a completely stable walk you should construct your own data
>   * structure outside the hash table.
>   *
> - * This function may sleep so you must not call it from interrupt
> - * context or with spin locks held.

It does a naked spin lock so even though we removed the memory
allocation you still mustn't call it from interrupt context.

Why do you need to do that anyway?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Eric W. Biederman @ 2018-04-18 14:28 UTC (permalink / raw)
  To: Rahul Lakkireddy
  Cc: Dave Young, netdev@vger.kernel.org, kexec@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Indranil Choudhury, Nirranjan Kirubaharan,
	stephen@networkplumber.org, Ganesh GR, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, davem@davemloft.net,
	viro@zeniv.linux.org.uk
In-Reply-To: <20180418123114.GA19159@chelsio.com>

Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:

> On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
>> Hi Rahul,
>> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
>> > On production servers running a variety of workloads over time, a kernel
>> > panic can happen sporadically after days or even months. It is
>> > important to collect as many debug logs as possible to root-cause
>> > and fix the problem, which may not be easy to reproduce. A snapshot of
>> > the underlying hardware/firmware state (like register dump, firmware
>> > logs, adapter memory, etc.) at the time of the kernel panic will be very
>> > helpful while debugging the culprit device driver.
>> > 
>> > This series of patches adds a new generic framework that enables device
>> > drivers to collect a device-specific snapshot of the hardware/firmware
>> > state of the underlying device in the crash recovery kernel. In the crash
>> > recovery kernel, the collected logs are added as elf notes to
>> > /proc/vmcore, which is copied by user space scripts for post-analysis.
>> > 
>> > The sequence of actions done by device drivers to append their
>> > device-specific hardware/firmware logs to /proc/vmcore is as follows:
>> > 
>> > 1. During probe (before hardware is initialized), device drivers
>> > register with the vmcore module (via vmcore_add_device_dump()), passing
>> > a callback function along with the buffer size and log name needed for
>> > firmware/hardware log collection.
>> 
>> I assumed the elf notes info should be prepared during the kexec_[file_]load
>> phase. But I did not read the old comments, so I am not sure whether this
>> has been discussed.
>> 
>
> We must not collect dumps in the crashing kernel. Adding more things in
> the crash dump path risks not collecting vmcore at all. Eric had
> discussed this in more detail at:
>
> https://lkml.org/lkml/2018/3/24/319
>
> We are safe to collect dumps in the second kernel. Each device dump
> will be exported as an elf note in /proc/vmcore.

It just occurred to me there is one variation that is worth
considering.

Is the area you are looking at dumping part of a huge mmio area?
I think someone said 2GB?

If that is the case it could be worth it to simply add the needed
addresses to the range of memory we need to dump, and simply have an
elf note saying that is what happened.

>> If we do this in the 2nd kernel, one question is that the driver can be loaded later than vmcore init.
>
> Yes, drivers will add their device dumps after vmcore init.
>
>> How to guarantee the function works if vmcore reading happens before
>> the driver is loaded?
>> 
>> Also it is possible that the kdump initramfs does not contain the driver
>> module.
>> 
>> Am I missing something?
>> 
>
> Yes, the driver must be in the initramfs if it wants to collect and add a
> device dump to /proc/vmcore in the second kernel.

Eric

^ permalink raw reply

* Re: [PATCH bpf-next v3 8/8] bpf: add documentation for eBPF helpers (58-64)
From: Quentin Monnet @ 2018-04-18 14:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man,
	John Fastabend
In-Reply-To: <20180418153448.574c6814@redhat.com>

2018-04-18 15:34 UTC+0200 ~ Jesper Dangaard Brouer <brouer@redhat.com>
> On Tue, 17 Apr 2018 15:34:38 +0100
> Quentin Monnet <quentin.monnet@netronome.com> wrote:
> 
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 350459c583de..3d329538498f 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1276,6 +1276,50 @@ union bpf_attr {
>>   * 	Return
>>   * 		0 on success, or a negative error in case of failure.
>>   *
>> + * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
>> + * 	Description
>> + * 		Redirect the packet to the endpoint referenced by *map* at
>> + * 		index *key*. Depending on its type, his *map* can contain
>                                                     ^^^
> 
> "his" -> "this"

Thanks!

>> + * 		references to net devices (for forwarding packets through other
>> + * 		ports), or to CPUs (for redirecting XDP frames to another CPU;
>> + * 		but this is only implemented for native XDP (with driver
>> + * 		support) as of this writing).
>> + *
>> + * 		All values for *flags* are reserved for future usage, and must
>> + * 		be left at zero.
>> + * 	Return
>> + * 		**XDP_REDIRECT** on success, or **XDP_ABORT** on error.
>> + *
> 
> "XDP_ABORT" -> "XDP_ABORTED"

Ouch. And I did the same for bpf_redirect(). Thanks for the catch.

> 
> I don't know if it's worth mentioning in the doc/man-page; that for XDP
> using bpf_redirect_map() is a HUGE performance advantage, compared to
> the bpf_redirect() call ?

It seems worth it to me. How would you briefly explain the reason for this
difference?

Quentin

^ permalink raw reply

* Re: [PATCH net-next 0/5] virtio-net: Add SCTP checksum offload support
From: Michael S. Tsirkin @ 2018-04-18 14:06 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Marcelo Ricardo Leitner, Vladislav Yasevich, netdev, linux-sctp,
	virtualization, jasowang, nhorman
In-Reply-To: <6bc762f6-d6fb-5471-2893-a888cce199f9@redhat.com>

On Tue, Apr 17, 2018 at 04:35:18PM -0400, Vlad Yasevich wrote:
> On 04/02/2018 10:47 AM, Marcelo Ricardo Leitner wrote:
> > On Mon, Apr 02, 2018 at 09:40:01AM -0400, Vladislav Yasevich wrote:
> >> Now that we have SCTP offload capabilities in the kernel, we can add
> >> them to virtio as well.  First step is SCTP checksum.
> > 
> > Thanks.
> > 
> >> As for GSO, the way sctp GSO is currently implemented buys us nothing
> >> in added support to virtio.  Adding true GSO would require a lot of
> >> re-work inside of SCTP and would require extensions to the virtio
> >> net header to carry extra sctp data.
> > 
> > Can you please elaborate more on this? Is this because SCTP GSO relies
> > on the gso skb format for knowing how to segment it instead of having
> > a list of sizes?
> > 
> 
> It's mainly because all the true segmentation, placing data into chunks,
> has already happened.  All that GSO does is allow for a higher bundling
> rate between VMs. If that is all SCTP GSO is ever going to do, that's fine,
> but the goal is to do real GSO eventually and potentially reduce the
> amount of memory copying we are doing.
> If we do that, any current attempt at GSO in virtio would have to be
> deprecated and we'd need GSO2 or something like that.

Batching helps virtualization *a lot* though.
Are there actual plans for GSO2? Is it just for SCTP?

> 
> This is why, after doing the GSO support, I decided not to include it.
> 
> -vlad
> >   Marcelo
> > 

^ permalink raw reply

* [PATCH] net: qmi_wwan: add Wistron Neweb D19Q1
From: Pawel Dembicki @ 2018-04-18 14:03 UTC (permalink / raw)
  Cc: Pawel Dembicki, Bjørn Mork, netdev, linux-usb, linux-kernel

This modem is embedded on dlink dwr-960 router.
The oem configuration states:

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1435 ProdID=d191 Rev=ff.ff
S: Manufacturer=Android
S: Product=Android
S: SerialNumber=0123456789ABCDEF
C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us

Tested on openwrt distribution

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index ca066b7..c853e74 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1107,6 +1107,7 @@ static const struct usb_device_id products[] = {
 	{QMI_FIXED_INTF(0x1435, 0xd181, 3)},	/* Wistron NeWeb D18Q1 */
 	{QMI_FIXED_INTF(0x1435, 0xd181, 4)},	/* Wistron NeWeb D18Q1 */
 	{QMI_FIXED_INTF(0x1435, 0xd181, 5)},	/* Wistron NeWeb D18Q1 */
+	{QMI_FIXED_INTF(0x1435, 0xd191, 4)},	/* Wistron NeWeb D19Q1 */
 	{QMI_FIXED_INTF(0x16d8, 0x6003, 0)},	/* CMOTech 6003 */
 	{QMI_FIXED_INTF(0x16d8, 0x6007, 0)},	/* CMOTech CHE-628S */
 	{QMI_FIXED_INTF(0x16d8, 0x6008, 0)},	/* CMOTech CMU-301 */
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH RFC net-next 00/11] udp gso
From: Willem de Bruijn @ 2018-04-18 13:59 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Samudrala, Sridhar, Network Development, Willem de Bruijn
In-Reply-To: <20180418123103.GC19633@oracle.com>

> One thing that was not clear to me about the API: shouldn't UDP_SEGMENT
> just be automatically determined in the stack from the pmtu? What's
> the motivation for the socket option for this? Also, AIUI this can be
> either a per-socket or a per-packet option?

I decided to let the application explicitly set segment size, to avoid
bugs from the application assuming a different MTU from the one
used in the kernel for segmentation.

With path MTU, it is too easy for a process to incorrectly assume
link MTU or a stale path MTU. With the current interface, if a process
tries to assemble segments larger than the relevant path MTU, the
send call will fail.

A process may also explicitly want to send a chain of packets
smaller than MTU.

^ permalink raw reply

* Re: [PATCH net-next] team: account for oper state
From: George Wilkie @ 2018-04-18 13:35 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev
In-Reply-To: <20180418125644.GD1989@nanopsycho>

On Wed, Apr 18, 2018 at 02:56:44PM +0200, Jiri Pirko wrote:
> Wed, Apr 18, 2018 at 12:29:50PM CEST, gwilkie@vyatta.att-mail.com wrote:
> >Account for operational state when determining port linkup state,
> >as per Documentation/networking/operstates.txt.
> 
> Could you please point me to the exact place in the document where this
> is suggested?
> 

Various places cover it I think.

In 1. Introduction:
"interface is not usable just because the admin enabled it"
"userspace must be granted the possibility to
influence operational state"

In 4. Setting from userspace:
"the userspace application can set IFLA_OPERSTATE
to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
netif_carrier_off() or netif_dormant_on()"

We have a use case where we want to set the oper state of the team ports based
on whether they are actually usable or not (as opposed to just admin up).

Cheers.

> 
> >
> >Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
> >---
> > drivers/net/team/team.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> >index a6c6ce19eeee..231264a05e55 100644
> >--- a/drivers/net/team/team.c
> >+++ b/drivers/net/team/team.c
> >@@ -2918,7 +2918,8 @@ static int team_device_event(struct notifier_block *unused,
> > 	case NETDEV_CHANGE:
> > 		if (netif_running(port->dev))
> > 			team_port_change_check(port,
> >-					       !!netif_carrier_ok(port->dev));
> >+					       !!(netif_carrier_ok(port->dev) &&
> >+						  netif_oper_up(port->dev)));
> > 		break;
> > 	case NETDEV_UNREGISTER:
> > 		team_del_slave(port->team->dev, dev);
> >-- 
> >2.11.0
> >

^ permalink raw reply
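
The behavioural change in the patch above can be modelled in plain C. The
sketch below is a userspace stand-in, not kernel code: struct sketch_port
and every sketch_* or linkup_* name are hypothetical, while the UP-or-
UNKNOWN test mirrors what netif_oper_up() checks in the kernel and the
enum follows the RFC 2863 numbering used by IF_OPER_*.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* RFC 2863 operational states, numbered as in the kernel's IF_OPER_* enum. */
enum sketch_operstate {
	SKETCH_OPER_UNKNOWN = 0,
	SKETCH_OPER_NOTPRESENT,
	SKETCH_OPER_DOWN,
	SKETCH_OPER_LOWERLAYERDOWN,
	SKETCH_OPER_TESTING,
	SKETCH_OPER_DORMANT,
	SKETCH_OPER_UP,
};

/* Hypothetical stand-in for the bits of struct net_device that matter here. */
struct sketch_port {
	bool carrier_ok;
	enum sketch_operstate operstate;
};

/* Same test as netif_oper_up(): UP or UNKNOWN counts as operationally up. */
bool sketch_oper_up(const struct sketch_port *p)
{
	assert(p != NULL);
	return p->operstate == SKETCH_OPER_UP ||
	       p->operstate == SKETCH_OPER_UNKNOWN;
}

/* Link state passed to team_port_change_check() before the patch. */
bool linkup_before(const struct sketch_port *p)
{
	assert(p != NULL);
	return p->carrier_ok;
}

/* Link state after the patch: carrier and operational state must agree. */
bool linkup_after(const struct sketch_port *p)
{
	assert(p != NULL);
	return p->carrier_ok && sketch_oper_up(p);
}
```

In this model a port with carrier whose operstate userspace has set to
dormant counts as up before the patch and as down after it, which appears
to be the case the use case above is concerned with.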

* Re: [PATCH RFC net-next 00/11] udp gso
From: Willem de Bruijn @ 2018-04-18 13:51 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: Eric Dumazet, Samudrala, Sridhar, Network Development,
	Willem de Bruijn
In-Reply-To: <20180418134706.GD19633@oracle.com>

On Wed, Apr 18, 2018 at 9:47 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (04/18/18 06:35), Eric Dumazet wrote:
>>
>> There is no change at all.
>>
>> This will only be used as a mechanism to send X packets of same size.
>>
>> So instead of X system calls , one system call.
>>
>> One traversal of some expensive part of the host stack.
>>
>> The content on the wire should be the same.
>
> I'm sorry that's not how I interpret Willem's email below
> (and maybe I misunderstood)
>
> the following taken from https://www.spinics.net/lists/netdev/msg496150.html
>
> Sowmini> If yes, how will the recvmsg differentiate between the case
> Sowmini> (2000 byte message followed by 512 byte message) and
> Sowmini> (1472 byte message, 526 byte message, then 512 byte message),
> Sowmini> in other words, how are UDP message boundary semantics preserved?
>
> Willem> They aren't. This is purely an optimization to amortize the cost of
> Willem> repeated tx stack traversal. Unlike UFO, which would preserve the
> Willem> boundaries of the original larger than MTU datagram.
>
> As I understand Willem's explanation, if I do a sendmsg of 2000 bytes,
> - classic UDP will send 2 IP fragments, the first one with a full UDP
>   header, and the IP header indicating that this is the first frag for
>   that ipid, with more frags to follow. The second frag will have the
>   rest with the same ipid, it will not have a udp header,
>   and it will indicate that it is the last frag (no more frags).
>
>   The receiver can thus use the ipid, "more-frags" bit, frag offset etc
>   to stitch the 2000 byte udp message together and pass it up on the udp
>   socket.
>
> - in the "GSO" proposal my 2000  bytes of data are sent as *two*
>   udp packets, each of them with a unique udp header, and uh_len set
>   to 1476 (for first) and 526 (for second). The receiver has no clue
>   that they are both part of the same UDP datagram, so the wire format
>   is not the same; am I mistaken?

Eric is correct. If the application sets a segment size with UDP_SEGMENT
this is an instruction to the kernel to split the payload along that border into
separate discrete datagrams.

It does not matter what the behavior is without setting this option. If a
process wants to send a larger than MTU datagram and rely on the
kernel to fragment, then it should not set the option.

^ permalink raw reply
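
The split described above can be worked through numerically. The sketch
below assumes a 1500-byte MTU and IPv4 without options, so that a segment
size of 1472 payload bytes fills one packet; struct gso_split and
udp_gso_split() are hypothetical names used only for this illustration,
and the arithmetic is a model of the described behaviour rather than the
kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_UDP_HDR_LEN 8	/* fixed UDP header size in bytes */

struct gso_split {
	size_t nsegs;		/* number of datagrams put on the wire */
	size_t last_payload;	/* payload bytes in the final datagram */
	size_t last_udp_len;	/* UDP length field of the final datagram */
};

/* Split payload_len bytes into datagrams of at most gso_size payload bytes. */
struct gso_split udp_gso_split(size_t payload_len, size_t gso_size)
{
	struct gso_split s = { 0, 0, 0 };

	assert(gso_size > 0);
	if (payload_len == 0)
		return s;

	/* Every datagram but the last carries exactly gso_size bytes. */
	s.nsegs = (payload_len + gso_size - 1) / gso_size;
	s.last_payload = payload_len - (s.nsegs - 1) * gso_size;
	s.last_udp_len = s.last_payload + SKETCH_UDP_HDR_LEN;
	return s;
}
```

With these assumptions, udp_gso_split(2000, 1472) yields two datagrams:
one with 1472 payload bytes and one with 528, each carrying its own UDP
header. The receiver therefore sees two separate messages, whereas a
2000-byte send without the option is fragmented at the IP layer and
reassembled into a single datagram.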

* Re: [PATCH bpf-next 2/9] bpf: add bpf_get_stack helper
From: kbuild test robot @ 2018-04-18 13:49 UTC (permalink / raw)
  To: Yonghong Song; +Cc: kbuild-all, ast, daniel, netdev, kernel-team
In-Reply-To: <20180417174642.3342753-3-yhs@fb.com>

[-- Attachment #1: Type: text/plain, Size: 1796 bytes --]

Hi Yonghong,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Yonghong-Song/bpf-add-bpf_get_stack-helper/20180418-210810
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   kernel/bpf/core.c: In function 'bpf_prog_free_deferred':
>> kernel/bpf/core.c:1714:3: error: implicit declaration of function 'put_callchain_buffers' [-Werror=implicit-function-declaration]
      put_callchain_buffers();
      ^~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/put_callchain_buffers +1714 kernel/bpf/core.c

  1704	
  1705	static void bpf_prog_free_deferred(struct work_struct *work)
  1706	{
  1707		struct bpf_prog_aux *aux;
  1708		int i;
  1709	
  1710		aux = container_of(work, struct bpf_prog_aux, work);
  1711		if (bpf_prog_is_dev_bound(aux))
  1712			bpf_prog_offload_destroy(aux->prog);
  1713		if (aux->prog->need_callchain_buf)
> 1714			put_callchain_buffers();
  1715		for (i = 0; i < aux->func_cnt; i++)
  1716			bpf_jit_free(aux->func[i]);
  1717		if (aux->func_cnt) {
  1718			kfree(aux->func);
  1719			bpf_prog_unlock_free(aux->prog);
  1720		} else {
  1721			bpf_jit_free(aux->prog);
  1722		}
  1723	}
  1724	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 49847 bytes --]

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Willem de Bruijn @ 2018-04-18 13:49 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Network Development, Willem de Bruijn
In-Reply-To: <1524050274.2599.21.camel@redhat.com>

On Wed, Apr 18, 2018 at 7:17 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2018-04-17 at 16:00 -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Segmentation offload reduces cycles/byte for large packets by
>> amortizing the cost of protocol stack traversal.
>>
>> This patchset implements GSO for UDP. A process can concatenate and
>> submit multiple datagrams to the same destination in one send call
>> by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
>> or passing an analogous cmsg at send time.
>>
>> The stack will send the entire large (up to network layer max size)
>> datagram through the protocol layer. At the GSO layer, it is broken
>> up in individual segments. All receive the same network layer header
>> and UDP src and dst port. All but the last segment have the same UDP
>> header, but the last may differ in length and checksum.
>
> This is interesting, thanks for sharing!
>
> I have some local patches somewhere implementing UDP GRO, but I never
> tried to upstream them, since I lacked the associated GSO and I thought
> that the use-case was not too relevant.
>
> Given that your use-case is a connected socket - no per packet route
> lookup - how does GSO perform compared to plain sendmmsg()? Have you
> considered using and/or improving the latter?
>
> When testing with Spectre/Meltdown mitigations in place, I expect that
> the most relevant part of the gain is due to the single syscall per
> burst.

The main benefit is actually not route lookup avoidance. Somewhat to
my surprise. The benchmark can be run both in connected and
unconnected ('-u') mode. Both saturate the cpu cycles, so only showing
throughput:

[connected]     udp tx:    825 MB/s   588336 calls/s  14008 msg/s
[unconnected] udp tx:    711 MB/s   506646 calls/s  12063 msg/s

This corresponds to the roughly 15% difference previously seen with
other applications.

When looking at a perf report, there is no clear hot spot, which
indicates that the savings accrue across the protocol stack traversal.

I just hacked up a sendmmsg extension to the benchmark to verify.
Indeed that does not have nearly the same benefit as GSO:

udp tx:    976 MB/s   695394 calls/s  16557 msg/s

This matches the numbers seen from TCP without TSO and GSO.
That also has few system calls, but observes per MTU stack traversal.

I pushed the branch to my github at

  https://github.com/wdebruij/linux/tree/udpgso-20180418

and also the version I sent for RFC yesterday at

  https://github.com/wdebruij/linux/tree/udpgso-rfc-v1

^ permalink raw reply

* Re:Re: [PATCH v3] net: davicom: dm9000: Avoid spinlock recursion during dm9000_timeout routine
From: liuxiang @ 2018-04-18 13:48 UTC (permalink / raw)
  To: David Miller; +Cc: liu.xiang6, netdev, linux-kernel
In-Reply-To: <20180416.110501.92472500114183248.davem@davemloft.net>

Hi,
Because the timeout task takes the main spinlock and disables the current CPU's IRQs,
no other task can run on the same CPU, and tasks on the other CPUs cannot
enter dm9000_timeout() again. So during the whole dm9000_timeout() routine,
db->timeout_cpu cannot be changed by other tasks. Although smp_processor_id() may change
after preempt_enable(), those other tasks always get a false result when they call
dm9000_current_in_timeout(). Only the timeout task gets a true result. And if there is no
timeout, all tasks that want to do an asynchronous PHY operation get a false result.
So I think this avoids the race.

At 2018-04-16 23:05:01, "David Miller" <davem@davemloft.net> wrote:
>From: Liu Xiang <liu.xiang6@zte.com.cn>
>Date: Sat, 14 Apr 2018 16:50:34 +0800
>
>> +static bool dm9000_current_in_timeout(struct board_info *db)
>> +{
>> +	bool ret = false;
>> +
>> +	preempt_disable();
>> +	ret = (db->timeout_cpu == smp_processor_id());
>> +	preempt_enable();
>
>This doesn't work.
>
>As soon as you do preempt_enable(), smp_processor_id() can change.

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Sowmini Varadhan @ 2018-04-18 13:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Willem de Bruijn, Samudrala, Sridhar, Network Development,
	Willem de Bruijn
In-Reply-To: <66ce1fb6-120f-ae49-704a-69915b317c6b@gmail.com>

On (04/18/18 06:35), Eric Dumazet wrote:
> 
> There is no change at all.
> 
> This will only be used as a mechanism to send X packets of same size.
> 
> So instead of X system calls , one system call.
> 
> One traversal of some expensive part of the host stack.
> 
> The content on the wire should be the same.

I'm sorry that's not how I interpret Willem's email below
(and maybe I misunderstood)

the following taken from https://www.spinics.net/lists/netdev/msg496150.html

Sowmini> If yes, how will the recvmsg differentiate between the case
Sowmini> (2000 byte message followed by 512 byte message) and
Sowmini> (1472 byte message, 526 byte message, then 512 byte message),
Sowmini> in other words, how are UDP message boundary semantics preserved?

Willem> They aren't. This is purely an optimization to amortize the cost of
Willem> repeated tx stack traversal. Unlike UFO, which would preserve the
Willem> boundaries of the original larger than MTU datagram.

As I understand Willem's explanation, if I do a sendmsg of 2000 bytes,
- classic UDP will send 2 IP fragments, the first one with a full UDP
  header, and the IP header indicating that this is the first frag for
  that ipid, with more frags to follow. The second frag will have the
  rest with the same ipid, it will not have a udp header,
  and it will indicate that it is the last frag (no more frags).

  The receiver can thus use the ipid, "more-frags" bit, frag offset etc
  to stitch the 2000 byte udp message together and pass it up on the udp
  socket.

- in the "GSO" proposal my 2000 bytes of data are sent as *two*
  udp packets, each of them with a unique udp header, and uh_len set
  to 1476 (for first) and 526 (for second). The receiver has no clue
  that they are both part of the same UDP datagram. So the wire format
  is not the same, or am I mistaken?
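
The two cases above can be put side by side in a small sketch. It assumes IPv4 with a 20-byte header and no options, an 8-byte UDP header, a 1500-byte MTU and a segment size of 1472; under those assumptions the byte counts come out slightly different from the rounded figures in the list. The struct and function names are invented for the example.

```c
#include <stddef.h>

#define IPV4_HDR ((size_t)20)	/* assumption: no IP options */
#define UDP_HDR  ((size_t)8)

struct wire_unit {
	size_t ip_payload;	/* bytes following the IPv4 header */
	int has_udp_hdr;	/* 1 if this unit starts with a UDP header */
	size_t data;		/* application bytes carried */
};

/* Classic path: one UDP datagram cut into IPv4 fragments. */
static size_t ip_fragment(size_t data_len, size_t mtu,
			  struct wire_unit *out, size_t max)
{
	size_t per_frag, left, n = 0;

	if (mtu <= IPV4_HDR + UDP_HDR)
		return 0;
	/* Fragment offsets are counted in 8-byte units. */
	per_frag = ((mtu - IPV4_HDR) / 8) * 8;
	left = data_len + UDP_HDR;

	while (left > 0 && n < max) {
		size_t take = left < per_frag ? left : per_frag;

		out[n].ip_payload = take;
		out[n].has_udp_hdr = (n == 0);	/* only the first fragment */
		out[n].data = take - (n == 0 ? UDP_HDR : 0);
		left -= take;
		n++;
	}
	return n;
}

/* GSO path: independent datagrams, each with its own UDP header. */
static size_t udp_gso_split(size_t data_len, size_t gso_size,
			    struct wire_unit *out, size_t max)
{
	size_t n = 0;

	if (gso_size == 0)
		return 0;
	while (data_len > 0 && n < max) {
		size_t take = data_len < gso_size ? data_len : gso_size;

		out[n].ip_payload = take + UDP_HDR;	/* equals uh_len */
		out[n].has_udp_hdr = 1;
		out[n].data = take;
		data_len -= take;
		n++;
	}
	return n;
}
```

For 2000 bytes of data, ip_fragment() yields a first unit with a UDP header and a second without one, while udp_gso_split() yields two units that each carry a UDP header and so look like separate datagrams to the receiver.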

--Sowmini

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Eric Dumazet @ 2018-04-18 13:35 UTC (permalink / raw)
  To: Sowmini Varadhan, Willem de Bruijn
  Cc: Samudrala, Sridhar, Network Development, Willem de Bruijn
In-Reply-To: <20180418123103.GC19633@oracle.com>



On 04/18/2018 05:31 AM, Sowmini Varadhan wrote:
> 
> I went through the patch set and the code looks fine- it extends existing
> infra for TCP/GSO to UDP.
> 
> One thing that was not clear to me about the API: shouldn't UDP_SEGMENT
> just be automatically determined in the stack from the pmtu? What's
> the motivation for the socket option for this? also AIUI this can be
> either a per-socket or a per-packet option?
> 
> However, I share Sridhar's concerns about the very fundamental change
> to UDP message boundary semantics here.  

There is no change at all.

This will only be used as a mechanism to send X packets of same size.

So instead of X system calls , one system call.

One traversal of some expensive part of the host stack.

The content on the wire should be the same.

^ permalink raw reply

* Re: [PATCH bpf-next v3 8/8] bpf: add documentation for eBPF helpers (58-64)
From: Jesper Dangaard Brouer @ 2018-04-18 13:34 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: brouer, daniel, ast, netdev, oss-drivers, linux-doc, linux-man,
	John Fastabend
In-Reply-To: <20180417143438.7018-9-quentin.monnet@netronome.com>

On Tue, 17 Apr 2018 15:34:38 +0100
Quentin Monnet <quentin.monnet@netronome.com> wrote:

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 350459c583de..3d329538498f 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1276,6 +1276,50 @@ union bpf_attr {
>   * 	Return
>   * 		0 on success, or a negative error in case of failure.
>   *
> + * int bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
> + * 	Description
> + * 		Redirect the packet to the endpoint referenced by *map* at
> + * 		index *key*. Depending on its type, his *map* can contain
                                                    ^^^

"his" -> "this"

> + * 		references to net devices (for forwarding packets through other
> + * 		ports), or to CPUs (for redirecting XDP frames to another CPU;
> + * 		but this is only implemented for native XDP (with driver
> + * 		support) as of this writing).
> + *
> + * 		All values for *flags* are reserved for future usage, and must
> + * 		be left at zero.
> + * 	Return
> + * 		**XDP_REDIRECT** on success, or **XDP_ABORT** on error.
> + *

"XDP_ABORT" -> "XDP_ABORTED"

I don't know if it's worth mentioning in the doc/man-page that, for XDP,
using bpf_redirect_map() is a HUGE performance advantage compared to
the bpf_redirect() call?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCHv2 1/1] net/mlx4_core: avoid resetting HCA when accessing an  offline device
From: Zhu Yanjun @ 2018-04-18 13:31 UTC (permalink / raw)
  To: tariqt, netdev, linux-rdma

When a faulty cable is used or an HCA firmware error occurs, the HCA
device goes offline. When the driver accesses this offline device, the
following call trace appears.

"
...
  [<ffffffff816e4842>] dump_stack+0x63/0x81
  [<ffffffff816e459e>] panic+0xcc/0x21b
  [<ffffffffa03e5f8a>] mlx4_enter_error_state+0xba/0xf0 [mlx4_core]
  [<ffffffffa03e7298>] mlx4_cmd_reset_flow+0x38/0x60 [mlx4_core]
  [<ffffffffa03e7381>] mlx4_cmd_poll+0xc1/0x2e0 [mlx4_core]
  [<ffffffffa03e9f00>] __mlx4_cmd+0xb0/0x160 [mlx4_core]
  [<ffffffffa0406934>] mlx4_SENSE_PORT+0x54/0xd0 [mlx4_core]
  [<ffffffffa03f5f54>] mlx4_dev_cap+0x4a4/0xb50 [mlx4_core]
...
"
In the above call trace, the function mlx4_cmd_poll calls the function
mlx4_cmd_post to access the HCA while HCA is offline. Then mlx4_cmd_post
returns an error -EIO. Per -EIO, the function mlx4_cmd_poll calls
mlx4_cmd_reset_flow to reset HCA. And the above call trace pops out.

This is not reasonable. Since HCA device is offline when it is being
accessed, it should not be reset again.

In this patch, since HCA is offline, the function mlx4_cmd_post returns
an error -EINVAL. Per -EINVAL, the function mlx4_cmd_poll directly returns
instead of resetting HCA.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Suggested-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
V1->V2: Follow Tariq's advice and avoid interference from other returned errors.
The function mlx4_cmd_post can return -EIO or -EINVAL. On -EIO, the HCA
device should be reset. -EINVAL means that mlx4_cmd_post is accessing an
offline device, so it is not necessary to reset the HCA;
go to label out directly.
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 6a9086d..df735b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -451,6 +451,8 @@ static int mlx4_cmd_post(struct mlx4_dev *dev, u64 in_param, u64 out_param,
 		 * Device is going through error recovery
 		 * and cannot accept commands.
 		 */
+		mlx4_err(dev, "%s : Device is in error recovery.\n", __func__);
+		ret = -EINVAL;
 		goto out;
 	}

@@ -610,8 +612,11 @@ static int mlx4_cmd_poll(struct mlx4_dev *dev, u64 in_param, u64 *out_param,

 	err = mlx4_cmd_post(dev, in_param, out_param ? *out_param : 0,
 			    in_modifier, op_modifier, op, CMD_POLL_TOKEN, 0);
-	if (err)
+	if (err) {
+		if (err == -EINVAL)
+			goto out;
 		goto out_reset;
+	}

 	end = msecs_to_jiffies(timeout) + jiffies;
 	while (cmd_pending(dev) && time_before(jiffies, end)) {
@@ -710,8 +715,11 @@ static int mlx4_cmd_wait(struct mlx4_dev *dev, u64 in_param, u64 *out_param,

 	err = mlx4_cmd_post(dev, in_param, out_param ? *out_param : 0,
 			    in_modifier, op_modifier, op, context->token, 1);
-	if (err)
+	if (err) {
+		if (err == -EINVAL)
+			goto out;
 		goto out_reset;
+	}

 	if (op == MLX4_CMD_SENSE_PORT) {
 		ret_wait =
-- 
2.7.4
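
The error handling introduced by the patch can be sketched in user space as follows. This is a simplified model, not the driver code: struct fake_dev, the state enum and both functions are hypothetical stand-ins, and the reset is reduced to a counter.

```c
#include <errno.h>

enum dev_state { DEV_OK, DEV_POST_FAILED, DEV_IN_ERROR_RECOVERY };

/* Hypothetical stand-in for struct mlx4_dev. */
struct fake_dev {
	enum dev_state state;
	int resets;		/* how often the reset flow was entered */
};

/* Stand-in for mlx4_cmd_post(): distinct codes for the two failures. */
static int fake_cmd_post(const struct fake_dev *dev)
{
	switch (dev->state) {
	case DEV_IN_ERROR_RECOVERY:
		return -EINVAL;	/* device is already offline */
	case DEV_POST_FAILED:
		return -EIO;	/* posting failed on a live device */
	default:
		return 0;
	}
}

/* Stand-in for the error path of mlx4_cmd_poll() after the patch. */
static int fake_cmd_poll(struct fake_dev *dev)
{
	int err = fake_cmd_post(dev);

	if (err) {
		if (err == -EINVAL)
			goto out;	/* offline device: skip the reset */
		goto out_reset;
	}
	/* Polling for command completion is elided in this sketch. */
	goto out;

out_reset:
	dev->resets++;		/* stands in for mlx4_cmd_reset_flow() */
out:
	return err;
}
```

Only the -EIO case reaches out_reset here; a device that is already in error recovery returns -EINVAL to the caller without triggering another reset.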

^ permalink raw reply related

* Re: [PATCH bpf-next v2 02/11] bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail
From: Nikita V. Shirokov @ 2018-04-18  4:48 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Alexei Starovoitov, Daniel Borkmann, David S. Miller , netdev
In-Reply-To: <20180418144818.0bcba1f9@redhat.com>

On Wed, Apr 18, 2018 at 02:48:18PM +0200, Jesper Dangaard Brouer wrote:
> On Tue, 17 Apr 2018 21:29:42 -0700
> "Nikita V. Shirokov" <tehnerd@tehnerd.com> wrote:
> 
> > w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
> > well (only "decrease" of pointer's location is going to be supported).
> > changing of this pointer will change packet's size.
> > for generic XDP we need to reflect this packet's length change by
> > adjusting skb's tail pointer
> > 
> > Acked-by: Alexei Starovoitov <ast@kernel.org>
> 
> You are missing your own Signed-off-by: line on all of the patches.
> 
yeah, somehow lost it between v1 and v2 :) thanks !
> BTW, thank you for working on this! It has been on my todo-list for a
> while now!
> 
> _After_ this patchset, I would like to see adding support for
> "increasing" the data_end location to create a larger packet.  For that
> we should likely add a data_hard_end pointer.  This, would also be
> helpful in cpu_map_build_skb() to know the data_hard_end, to determine
> the frame size (as some drivers don't use PAGE_SIZE frames, e.g. ixgbe).
> 
yeah, increasing the size would be nice to have, but will require more
thinking / rework on the drivers' side (as you pointed out, it's not as easy
as "every driver has at least PAGE_SIZE of data available for xdp").
will add to my TODO
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [RFC PATCH] net: bridge: multicast querier per VLAN support
From: Joachim Nilsson @ 2018-04-18 13:25 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, Stephen Hemminger, roopa
In-Reply-To: <da36ee2f-d39b-d6c0-15b2-50bde81482ab@cumulusnetworks.com>

On Wed, Apr 18, 2018 at 04:14:26PM +0300, Nikolay Aleksandrov wrote:
> We want to avoid sysfs in general, all of networking config and stats
> are moving to netlink. It is better controlled and structured for such
> changes, also provides nice interfaces for automatic  type checks etc.

Aha, didn't know that. Thanks! :)

> Also (but a minor reason) there is no tree/entity in sysfs for the vlans
> where to add this. It will either have to be a file which does some
> format string hack (like us currently) or will need to add new tree for
> them which I'd really like to avoid for the bridge.

Yup, I did some ugly sysfs patches to read queriers per VLAN like that, just
for some basic feedback.  Really awful, although easy to debug because of it
being a simple file ... (I guess I'll have to make friends with Netlink.)

> [..]
> Also after my vlan rhashtable change, we have per-vlan context even today
> (e.g. per-vlan stats use it) so we'll just extend that.

Interesting, this I'll have to look at in more detail!

^ permalink raw reply
