Netdev List
* Re: BUG: corrupted list in p9_read_work
From: Dmitry Vyukov @ 2018-10-11 13:27 UTC
  To: Dominique Martinet
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <20181011131045.GA32030@nautica>

On Thu, Oct 11, 2018 at 3:10 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Dmitry Vyukov wrote on Thu, Oct 11, 2018:
>> > That's still the tricky part, I'm afraid... Making a separate server
>> > would have been easy because I could have reused some of my junk for the
>> > actual connection handling (some rdma helper library I wrote ages
>> > ago[1]), but if you're going to just embed C code you'll probably want
>> > something lower level? I've never seen syzkaller use any library call
>> > but I'm not even sure I would know how to create a qp without
>> > libibverbs, would standard stuff be OK ?
>>
>> Raw syscalls preferably.
>> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?
>
> modprobe rdma_rxe (and a bunch of other rdma modules before that) then
> writes the interface name in /sys/module/rdma_rxe/parameters/add
> apparently; then checks it worked.
> this part could be done directly in C without too much trouble, as
> long as the proper kernel configuration/modules are available.

Now we are talking!
We generally assume that all modules are simply compiled into kernel.
At least that's what we have on syzbot. If somebody can't compile them in,
we can suggest adding modprobe to init.
So this boils down to just writing to /sys/module/rdma_rxe/parameters/add.
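A minimal C sketch of that single step, assuming rdma_rxe is built in (or already loaded) so the sysfs file exists. The helper name rxe_add_iface() is hypothetical, and "ens3" is just the interface name from the rxe_cfg example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write an interface name into a sysfs parameter
 * file such as /sys/module/rdma_rxe/parameters/add.
 * Returns 0 on success or a negative errno value, so a caller can treat
 * -ENOENT (rdma_rxe not available) as "not supported" and move on. */
static int rxe_add_iface(const char *param_path, const char *ifname)
{
	int fd = open(param_path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -errno;

	size_t len = strlen(ifname);
	ssize_t n = write(fd, ifname, len);
	/* Capture errno before close() can overwrite it. */
	int err = n < 0 ? -errno : ((size_t)n == len ? 0 : -EIO);

	close(fd);
	return err;
}
```

It would be called once at startup, e.g. rxe_add_iface("/sys/module/rdma_rxe/parameters/add", "ens3"); like rxe_cfg, a real caller would then check that the device actually appeared, which is not shown here.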



>> Any libraries and utilities are a hell of a pain in the linux world. Will it work
>> in Android userspace? gVisor? Who will explain all syzkaller users
>> where they get this for their who-knows-what distro, which is 10 years
>> old because of corp policies, and debug how their version of the
>> library has a slightly incompatible version?
>> For example, after figuring out that rxe_cfg actually comes from
>> rdma-core (which is a separate delight on linux), my debian
>> distribution failed to install it because of some conflicts around
>> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
>> such package. And we've just started :)
>
> The rdma ecosystem is a pain, I'll easily agree with that...
>
>> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
>> ok, that's it. If it works, great, we can use it.
>
> I'll have to look into it a bit more; libibverbs abstracts a lot of
> stuff into per-nic userspace drivers (the files I cited in a previous
> mail) and basically with the mellanox cards I'm familiar with the whole
> user session looks like this:
>  * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
> /dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
> version, what user driver to load etc)
>  * it and the userspace driver issue "commands" over these two files' fd
> to setup the connection ; some commands are standard but some are
> specific to the interface and defined in the driver.

But we will use some kind of virtual/stub driver, right? We don't have
real hardware. So all these commands should be fixed and known for the
virtual/stub driver.

> There are many facets to a connection in RDMA: a protection domain used
> to register memory with the nic, a queue pair that is the actual tx/rx
> connection, optionally a completion channel that will be another fd to
> listen on for events that tell you something happened and finally some
> memory regions to directly communicate with the nic from userspace
> depending on the specific driver.
>  * then there's the actual usage, more commands through the uverbs0 char
> device to register the memory you'll use, and once that's done it's
> entirely up to the driver - for example the mellanox lib can do
> everything in userspace playing with the memory regions it registered,
> but I'd wager the rxe driver does more calls through the uverbs0 fd...
>
> Honestly I'm not keen on reimplementing all of this; the interface
> itself pretty much depends on your version of the kernel (there is a
> common ABI defined, but as far as specific nics are concerned if your
> kernel module doesn't match the user library version you can get some
> nasty surprises), and it's far from the black or white of a good ol'
> ENOSUPP error.
>
>
> I'll look if I can figure out if there is a common subset of verbs
> commands that are standard and sufficient to setup a listening
> connection and exchange data that should be supported for all devices
> and would let us reimplement just that, but while I hear your point
> about android and ten years in the future I think it's more likely that
> ten years in the future the verb abi will have changed but libibverbs
> will just have the new version implemented and hide the change :P

But again we don't need to support all of the available hardware.
For example, we are testing net stack from external side using tun.
tun is a very simple, virtual abstraction of a network card. It allows
us to test all of generic net stack starting from L2 without messing
with any real drivers and their differences entirely. I had the impression
that we are talking about something similar here too. Or not?

Also, I am missing a bit of context about the rdma<->9p interface. Do we need
to set up all these ring buffers to satisfy the parts that 9p needs? Does
9p actually read data directly from these ring buffers? Or is
there some higher-level rdma interface that 9p uses?

* Re: [PATCH] qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface
From: David Miller @ 2018-10-11  5:57 UTC
  To: gciofono; +Cc: netdev
In-Reply-To: <20181010180553.3986-1-gciofono@gmail.com>

From: Giacinto Cifelli <gciofono@gmail.com>
Date: Wed, 10 Oct 2018 20:05:53 +0200

> Added support for Gemalto's Cinterion ALASxx WWAN interfaces
> by adding QMI_FIXED_INTF with Cinterion's VID and PID.
> 
> Signed-off-by: Giacinto Cifelli <gciofono@gmail.com>

Applied and queued up for -stable.

* Re: [net 1/1] tipc: queue socket protocol error messages into socket receive buffer
From: David Miller @ 2018-10-11  5:57 UTC
  To: jon.maloy
  Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
	ying.xue, tipc-discussion
In-Reply-To: <1539186623-344-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 10 Oct 2018 17:50:23 +0200

> From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
> 
> In tipc_sk_filter_rcv(), when we detect protocol messages with error we
> call tipc_sk_conn_proto_rcv() and let it reset the connection and notify
> the socket by calling sk->sk_state_change().
> 
> However, tipc_sk_filter_rcv() may have been called from the function
> tipc_backlog_rcv(), in which case the socket lock is held and the socket
> already awake. This means that the sk_state_change() call is ignored and
> the error notification lost. Now the receive queue will remain empty and
> the socket sleeps forever.
> 
> In this commit, we convert the protocol message into a connection abort
> message and enqueue it into the socket's receive queue. By this addition
> to the above state change we cover all conditions.
> 
> Acked-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied.

* Re: [net 1/1] tipc: set link tolerance correctly in broadcast link
From: David Miller @ 2018-10-11  5:56 UTC
  To: jon.maloy
  Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
	ying.xue, tipc-discussion
In-Reply-To: <1539185641-32711-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 10 Oct 2018 17:34:01 +0200

> In the patch referred to below we added link tolerance as an additional
> criteria for declaring broadcast transmission "stale" and resetting the
> affected links.
> 
> However, the 'tolerance' field of the broadcast link is never set, and
> remains at zero. This renders the whole commit without the intended
> improving effect, but luckily also with no negative effect.
> 
> In this commit we add the missing initialization.
> 
> Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet
> retransmission")
> 
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied.

* Re: [PATCH net 0/2] net: dsa: bcm_sf2: Couple of fixes
From: David Miller @ 2018-10-11  5:53 UTC
  To: f.fainelli; +Cc: netdev, andrew, vivien.didelot
In-Reply-To: <20181009234858.23920-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue,  9 Oct 2018 16:48:56 -0700

> Here are two fixes for the bcm_sf2 driver that were found during
> testing unbind and analysing another issue during system
> suspend/resume.

Series applied and queued up for -stable, thanks.

* Re: [PATCH net-next] net: sched: avoid writing on noop_qdisc
From: David Miller @ 2018-10-11  5:49 UTC
  To: edumazet; +Cc: netdev, john.fastabend, eric.dumazet
In-Reply-To: <20181009222050.175348-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Tue,  9 Oct 2018 15:20:50 -0700

> While noop_qdisc.gso_skb and noop_qdisc.skb_bad_txq are not used
> in other places, it seems not correct to overwrite their fields
> in dev_init_scheduler_queue().
> 
> noop_qdisc is essentially a shared and read-only object, even if
> it is not marked as const because of some implementation detail.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.

* Re: [PATCH net-next] net/ipv6: Add knob to skip DELROUTE message on device down
From: David Miller @ 2018-10-11  5:48 UTC
  To: dsahern; +Cc: netdev, roopa, dsahern
In-Reply-To: <20181009212751.17695-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Tue,  9 Oct 2018 14:27:51 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
> notifications when a device is taken down (admin down) or deleted. IPv4
> does not generate a message for routes evicted by the down or delete;
> IPv6 does. A NOS at scale really needs to avoid these messages and have
> IPv4 and IPv6 behave similarly, relying on userspace to handle link
> notifications and evict the routes.
> 
> At this point existing user behavior needs to be preserved. Since
> notifications are a global action (not per app) the only way to preserve
> existing behavior and allow the messages to be skipped is to add a new
> sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
> disable the notifications.
> 
> IPv6 route code already supports the option to skip the message (it is
> used for multipath routes for example). Besides the new sysctl we need
> to pass the skip_notify setting through the generic fib6_clean and
> fib6_walk functions to fib6_clean_node and to set skip_notify on calls
> to __ip_del_rt for the addrconf_ifdown path.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

This doesn't apply cleanly, and you seem to be saying that the anycast
and addrconf bits should be removed anyways.

* Re: [PATCH net-next] net/mpls: Implement handler for strict data checking on dumps
From: David Miller @ 2018-10-11  5:46 UTC
  To: dsahern; +Cc: netdev, arnd, dsahern
In-Reply-To: <20181009181043.25350-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Tue,  9 Oct 2018 11:10:43 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Without CONFIG_INET enabled compiles fail with:
> 
> net/mpls/af_mpls.o: In function `mpls_dump_routes':
> af_mpls.c:(.text+0xed0): undefined reference to `ip_valid_fib_dump_req'
> 
> The preference is for MPLS to use the same handler as ipv4 and ipv6
> to allow consistency when doing a dump for AF_UNSPEC which walks
> all address families invoking the route dump handler. If INET is
> disabled then fall back to an MPLS version which can be tighter on
> the data checks.
> 
> Fixes: e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking")
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

* Re: [PATCH net v2 0/2] net: ipv4: fixes for PMTU when link MTU changes
From: David Miller @ 2018-10-11  5:45 UTC
  To: sd; +Cc: netdev, dsahern, sbrivio
In-Reply-To: <cover.1539073548.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue,  9 Oct 2018 17:48:13 +0200

> The first patch adapts the changes that commit e9fa1495d738 ("ipv6:
> Reflect MTU changes on PMTU of exceptions for MTU-less routes") did in
> IPv6 to IPv4: lower PMTU when the first hop's MTU drops below it, and
> raise PMTU when the first hop was limiting PMTU discovery and its MTU
> is increased.
> 
> The second patch fixes bugs introduced in commit d52e5a7e7ca4 ("ipv4:
> lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") that
> only appear once the first patch is applied.
> 
> Selftests for these cases were introduced in net-next commit
> e44e428f59e4 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests")
> 
> v2: add cover letter, and fix a few small things in patch 1

Series applied and queued up for -stable.
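The IPv4 behaviour described for the first patch can be modelled as a small pure function. This is a simplified sketch of the rule as worded in the cover letter, not the actual route code; the function name is hypothetical and the locked-MTU case handled by the second patch is deliberately left out:

```c
#include <assert.h>

/* Hypothetical model: how a cached path MTU reacts when the first hop's
 * (link) MTU changes from old_link_mtu to new_link_mtu. */
static unsigned int pmtu_on_link_mtu_change(unsigned int pmtu,
					    unsigned int old_link_mtu,
					    unsigned int new_link_mtu)
{
	/* The first hop can no longer carry packets as large as the PMTU. */
	if (new_link_mtu < pmtu)
		return new_link_mtu;

	/* The first hop was what limited PMTU discovery and it grew:
	 * follow it upwards. */
	if (pmtu == old_link_mtu && new_link_mtu > old_link_mtu)
		return new_link_mtu;

	/* Otherwise a hop further along the path is the bottleneck. */
	return pmtu;
}
```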

* Re: BUG: corrupted list in p9_read_work
From: Dominique Martinet @ 2018-10-11 13:10 UTC
  To: Dmitry Vyukov
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <CACT4Y+bJvxXeJxbAKCPTm0X608t302Kcq+F7Ccot3_-wx7sdog@mail.gmail.com>

Dmitry Vyukov wrote on Thu, Oct 11, 2018:
> > That's still the tricky part, I'm afraid... Making a separate server
> > would have been easy because I could have reused some of my junk for the
> > actual connection handling (some rdma helper library I wrote ages
> > ago[1]), but if you're going to just embed C code you'll probably want
> > something lower level? I've never seen syzkaller use any library call
> > but I'm not even sure I would know how to create a qp without
> > libibverbs, would standard stuff be OK ?
> 
> Raw syscalls preferably.
> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?

modprobe rdma_rxe (and a bunch of other rdma modules before that) then
writes the interface name in /sys/module/rdma_rxe/parameters/add
apparently; then checks it worked.
this part could be done directly in C without too much trouble, as
long as the proper kernel configuration/modules are available.

> Any libraries and utilities are a hell of a pain in the linux world. Will it work
> in Android userspace? gVisor? Who will explain all syzkaller users
> where they get this for their who-knows-what distro, which is 10 years
> old because of corp policies, and debug how their version of the
> library has a slightly incompatible version?
> For example, after figuring out that rxe_cfg actually comes from
> rdma-core (which is a separate delight on linux), my debian
> distribution failed to install it because of some conflicts around
> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
> such package. And we've just started :)

The rdma ecosystem is a pain, I'll easily agree with that...

> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
> ok, that's it. If it works, great, we can use it.

I'll have to look into it a bit more; libibverbs abstracts a lot of
stuff into per-nic userspace drivers (the files I cited in a previous
mail) and basically with the mellanox cards I'm familiar with the whole
user session looks like this:
 * common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
/dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
version, what user driver to load etc) 
 * it and the userspace driver issue "commands" over these two files' fd
to setup the connection ; some commands are standard but some are
specific to the interface and defined in the driver.
There are many facets to a connection in RDMA: a protection domain used
to register memory with the nic, a queue pair that is the actual tx/rx
connection, optionally a completion channel that will be another fd to
listen on for events that tell you something happened and finally some
memory regions to directly communicate with the nic from userspace
depending on the specific driver.
 * then there's the actual usage, more commands through the uverbs0 char
device to register the memory you'll use, and once that's done it's
entirely up to the driver - for example the mellanox lib can do
everything in userspace playing with the memory regions it registered,
but I'd wager the rxe driver does more calls through the uverbs0 fd...
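As an illustration of the first step only (the verbs command protocol itself is not attempted), a syscall-level probe for the two character devices mentioned above could look like the sketch below. The helper names are hypothetical, and treating any open() failure as "RDMA not available" is an assumption in the spirit of the ENOSUPP remark:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical probe: try to open an RDMA character device.
 * Returns an open fd (>= 0) on success, or a negative errno value. */
static int rdma_probe_dev(const char *path)
{
	int fd = open(path, O_RDWR | O_CLOEXEC);
	return fd >= 0 ? fd : -errno;
}

/* Returns 1 if both devices could be opened, 0 otherwise.
 * Both fds are closed again; this only checks availability. */
static int rdma_devices_present(const char *cm_path, const char *uverbs_path)
{
	int cm = rdma_probe_dev(cm_path);
	if (cm < 0)
		return 0;

	int uv = rdma_probe_dev(uverbs_path);
	close(cm);
	if (uv < 0)
		return 0;

	close(uv);
	return 1;
}
```

A caller would pass "/dev/infiniband/rdma_cm" and "/dev/infiniband/uverbs0" and skip the RDMA transport entirely when the result is 0.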

Honestly I'm not keen on reimplementing all of this; the interface
itself pretty much depends on your version of the kernel (there is a
common ABI defined, but as far as specific nics are concerned if your
kernel module doesn't match the user library version you can get some
nasty surprises), and it's far from the black or white of a good ol'
ENOSUPP error.


I'll look if I can figure out if there is a common subset of verbs
commands that are standard and sufficient to setup a listening
connection and exchange data that should be supported for all devices
and would let us reimplement just that, but while I hear your point
about android and ten years in the future I think it's more likely that
ten years in the future the verb abi will have changed but libibverbs
will just have the new version implemented and hide the change :P

-- 
Dominique

* Re: [PATCH V1 net-next 00/12] Improving performance and reducing latencies, by using latest capabilities exposed in ENA device
From: David Miller @ 2018-10-11  5:41 UTC
  To: akiyano
  Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, gtzalik,
	netanel, alisaidi
In-Reply-To: <1539110709-31954-1-git-send-email-akiyano@amazon.com>

From: <akiyano@amazon.com>
Date: Tue, 9 Oct 2018 21:44:57 +0300

> This patchset introduces the following:
> 1. A new placement policy of Tx headers and descriptors, which takes
> advantage of an option to place headers + descriptors in device memory
> space. This is sometimes referred to as LLQ - low latency queue.
> The patch set defines the admin capability, maps the device memory as
> write-combined, and adds a mode in transmit datapath to do header +
> descriptor placement on the device.
> 2. Support for RX checksum offloading
> 3. Miscellaneous small improvements and code cleanups

This doesn't apply cleanly to net-next, please respin.

* Re: [PATCH] virtio_net: enable tx after resuming from suspend
From: Jason Wang @ 2018-10-11 13:06 UTC
  To: ake
  Cc: netdev, virtualization, David S. Miller, linux-kernel,
	Michael S. Tsirkin
In-Reply-To: <7e87b140-79ae-c79e-40ed-dc76b38eeae4@igel.co.jp>



On 2018年10月11日 18:22, ake wrote:
>
> On 2018年10月11日 18:44, Jason Wang wrote:
>>
>> On 2018年10月11日 15:51, Ake Koomsin wrote:
>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>> disabled the virtio tx before going to suspend to avoid a use after free.
>>> However, after resuming, it causes the virtio_net device to lose its
>>> network connectivity.
>>>
>>> To solve the issue, we need to enable tx after resuming.
>>>
>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>> reset")
>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>> ---
>>>    drivers/net/virtio_net.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index dab504ec5e50..3453d80f5f81 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>> virtio_device *vdev)
>>>        }
>>>          netif_device_attach(vi->dev);
>>> +    netif_start_queue(vi->dev);
>> I believe this is duplicated with netif_tx_wake_all_queues() in
>> netif_device_attach() above?
> Thank you for your review.
>
> If both netif_tx_wake_all_queues() and netif_start_queue() result in
> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
> conditions in netif_device_attach() are not satisfied?

Yes, maybe. One case I can see now is when the device is down; in that
case netif_device_attach() won't try to wake up the queue.

>   Without
> netif_start_queue(), the virtio_net device does not resume properly
> after waking up.

How do you trigger the issue? Just do suspend/resume?

>
> Is it better to report this as a bug first?

Nope, you're very welcome to post a patch directly.

> If I am to do more
> investigation, what areas should I look into?

As you've figured out, you can start with why netif_tx_wake_all_queues()
was not executed.

(Btw, does the issue disappear if you move netif_tx_disable() under the 
check of netif_running() in virtnet_freeze_down()?)
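To make the suspected sequence concrete, here is a toy userspace model of the TX queue state. It assumes, as discussed above, that the freeze path disables TX unconditionally and that netif_device_attach() only wakes the queues when the interface is running; the struct and functions are hypothetical and are not the netdev API. The check_running flag selects the variant suggested in the question above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of net_device state that matter. */
struct toy_netdev {
	bool running;	/* interface administratively up */
	bool tx_xoff;	/* stands in for __QUEUE_STATE_DRV_XOFF */
};

/* Freeze path: stop TX, either always or only when running. */
static void toy_freeze(struct toy_netdev *d, bool check_running)
{
	if (!check_running || d->running)
		d->tx_xoff = true;
}

/* Restore path modelled on netif_device_attach(): the queues are woken
 * only if the interface is running. */
static void toy_restore(struct toy_netdev *d)
{
	if (d->running)
		d->tx_xoff = false;
}
```

In this model only the first combination (interface down, unconditional disable) leaves the queue stopped after resume; whether the real driver follows the same path is what the suggested experiment would show.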

Thanks

>
> Best Regards
> Ake Koomsin
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

* [PATCH] net: cdc_ncm: use tasklet_init() for tasklet_struct init
From: Ben Dooks @ 2018-10-11 13:03 UTC
  To: avem, oliver; +Cc: linux-usb, netdev, linux-kernel, linux-kernel, Ben Dooks

The tasklet initialisation would be better done by tasklet_init()
instead of assuming all the fields are in an ok state by default.

This does not fix any actual known bug.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 drivers/net/usb/cdc_ncm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 0d722b326e1b..863f3548a439 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -784,8 +784,7 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf, u8 data_
 
 	hrtimer_init(&ctx->tx_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	ctx->tx_timer.function = &cdc_ncm_tx_timer_cb;
-	ctx->bh.data = (unsigned long)dev;
-	ctx->bh.func = cdc_ncm_txpath_bh;
+	tasklet_init(&ctx->bh, cdc_ncm_txpath_bh, (unsigned long)dev);
 	atomic_set(&ctx->stop, 0);
 	spin_lock_init(&ctx->mtx);
 
-- 
2.19.1

* RE: [PATCH net-next v5] net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command
From: Justin.Lee1 @ 2018-10-11  5:37 UTC
  To: sam, joel; +Cc: linux-aspeed, netdev, openbmc, amithash, christian, vijaykhemka
In-Reply-To: <e26c7aa664a2dad4f5bcf4efaa1b3eb655548b01.camel@mendozajonas.com>


> On Wed, 2018-10-10 at 18:11 +0000, Justin.Lee1@Dell.com wrote:
> <snip>
> > +
> > +	len = nla_len(info->attrs[NCSI_ATTR_DATA]);
> > +	if (len < sizeof(struct ncsi_pkt_hdr)) {
> > +		netdev_info(ndp->ndev.dev, "NCSI: no command to send %u\n",
> > +			    package_id);
> > +		ret = -EINVAL;
> > +		goto out_netlink;
> > +	} else {
> > +		data = (unsigned char *)nla_data(info->attrs[NCSI_ATTR_DATA]);
> > +	}
> 
> I only just noticed this, the call to nla_len() can cause a null-dereference if
> the NCSI_ATTR_DATA attribute isn't present; we need to make sure it exists
> before accessing it in info->attrs.
> 
> eg:
> 
> root@ozrom2-bmc:~# ./ncsi-netlink -l 2 -p 0 -c 0 --cmd
> [   81.399837] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [   81.409092] pgd = ddaa9fa6
> [   81.413084] [00000000] *pgd=9702c831, *pte=00000000, *ppte=00000000
> [   81.420729] Internal error: Oops: 17 [#1] ARM
> [   81.426447] CPU: 0 PID: 1028 Comm: ncsi-netlink Not tainted 4.18.8-sammj-00144-gbc129f31bfa5 #12
> ...
> [   81.874434] Kernel panic - not syncing: Fatal exception
> 
> Cheers,
> Sam

Good catch! I will address this and generate the new patch.

Thanks,
Justin
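The guard Sam asks for amounts to checking that the attribute pointer is non-NULL (in the kernel, info->attrs[NCSI_ATTR_DATA]) before calling nla_len() on it. A userspace model of that ordering, using a hypothetical stand-in for the attribute table rather than the real netlink types:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed netlink attribute. */
struct fake_attr {
	size_t len;
	const unsigned char *data;
};

enum { ATTR_DATA, ATTR_MAX };

/* Placeholder value standing in for sizeof(struct ncsi_pkt_hdr). */
#define PKT_HDR_LEN 16

/* Returns 0 and sets *out on success, or -EINVAL if the data attribute is
 * absent or shorter than a packet header. The NULL check comes first, so
 * a missing attribute is never dereferenced. */
static int get_cmd_data(const struct fake_attr *attrs[ATTR_MAX],
			const unsigned char **out)
{
	const struct fake_attr *a = attrs[ATTR_DATA];

	if (!a)
		return -EINVAL;
	if (a->len < PKT_HDR_LEN)
		return -EINVAL;
	*out = a->data;
	return 0;
}
```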

* Re: [PATCH net-next v2] net: tun: remove useless codes of tun_automq_select_queue
From: David Miller @ 2018-10-11  5:35 UTC
  To: wangli39; +Cc: netdev
In-Reply-To: <20181009023204.62836-1-wangli39@baidu.com>

From: Wang Li <wangli39@baidu.com>
Date: Tue,  9 Oct 2018 10:32:04 +0800

> Because the function __skb_get_hash_symmetric always returns non-zero.
> 
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> Signed-off-by: Wang Li <wangli39@baidu.com>

Applied.

* Re: [PATCH net-next 0/3] nfp: flower: speed up stats update loop
From: David Miller @ 2018-10-11  5:33 UTC
  To: jakub.kicinski; +Cc: netdev, oss-drivers
In-Reply-To: <20181009015736.30268-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Mon,  8 Oct 2018 18:57:33 -0700

> This set from Pieter improves performance of processing FW stats
> update notifications.  The FW seems to send those at relatively
> high rate (roughly ten per second per flow), therefore if we want
> to approach the million flows mark we have to be very careful
> about our data structures.
> 
> We tried rhashtable for stat updates, but according to our experiments
> rhashtable lookup on a u32 takes roughly 60ns on a Xeon E5-2670 v3,
> which translates to a hard limit of 16M lookups per second on this CPU;
> according to perf record, jhash and memcmp account for 60% of CPU
> usage on the core handling the updates.
> 
> Given that our statistic IDs are already array indices, and considering
> each statistic is only 24B in size, we decided to forego the use
> of hashtables and use a directly indexed array.  The CPU savings are
> considerable.
> 
> With the recent improvements in TC core and with our own bottlenecks
> out of the way Pieter removes the artificial limit of 128 flows, and
> allows the driver to install as many flows as FW supports.

Series applied.
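For scale: at roughly 60 ns per lookup a single core tops out at 1 s / 60 ns, about 16.7 million lookups per second, which is where the 16M figure comes from. The alternative described above, indexing statistics directly by their ID, can be sketched as follows; the struct layout (three 64-bit fields, matching the 24-byte size mentioned) and all names are assumptions, not the nfp driver's code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-flow statistics entry: 3 x 8 bytes = 24 bytes. */
struct flow_stats {
	uint64_t pkts;
	uint64_t bytes;
	uint64_t used;	/* time of last update */
};

struct stats_table {
	struct flow_stats *entries;
	uint32_t size;
};

static int stats_table_init(struct stats_table *t, uint32_t size)
{
	t->entries = calloc(size, sizeof(*t->entries));
	t->size = t->entries ? size : 0;
	return t->entries ? 0 : -1;
}

/* The stats ID is already an array index, so an update is a bounds
 * check plus a direct store: no hashing and no key comparison. */
static int stats_update(struct stats_table *t, uint32_t id,
			uint64_t pkts, uint64_t bytes, uint64_t now)
{
	if (id >= t->size)
		return -1;
	t->entries[id].pkts += pkts;
	t->entries[id].bytes += bytes;
	t->entries[id].used = now;
	return 0;
}

static void stats_table_free(struct stats_table *t)
{
	free(t->entries);
	t->entries = NULL;
	t->size = 0;
}
```

The trade-off is memory for speed: the array is sized for the largest stats ID up front, but each update costs one bounds check and a direct store instead of a hash and a key compare.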

* Re: [PATCH net-next] tcp: refactor DCTCP ECN ACK handling
From: David Miller @ 2018-10-11  5:26 UTC
  To: ycheng; +Cc: netdev, edumazet, ncardwell, ysseung
In-Reply-To: <20181008223220.93230-1-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Mon,  8 Oct 2018 15:32:20 -0700

> DCTCP has two parts - a new ECN signalling mechanism and the response
> function to it. The first part can be used by other congestion
> control for DCTCP-ECN deployed networks. This patch moves that part
> into a separate tcp_dctcp.h to be used by other congestion control
> modules (like how Yeah uses Vegas algorithms). For example, BBR is
> currently experimenting with such an ECN signal:
> https://tinyurl.com/ietf-102-iccrg-bbr2
> 
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Yousuk Seung <ysseung@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

* Re: [PATCH net-next] net/ipv6: Make ipv6_route_table_template static
From: David Miller @ 2018-10-11  5:26 UTC
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20181008210634.19645-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Mon,  8 Oct 2018 14:06:34 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> ipv6_route_table_template is exported but there are no users outside
> of route.c. Make it static.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

* Re: [PATCH net-next] rtnetlink: Update comment in rtnl_stats_dump regarding strict data checking
From: David Miller @ 2018-10-11  5:26 UTC
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20181008205807.18727-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Mon,  8 Oct 2018 13:58:07 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> The NLM_F_DUMP_PROPER_HDR netlink flag was replaced by a setsockopt.
> Update the comment in rtnl_stats_dump.
> 
> Fixes: 841891ec0c65 ("rtnetlink: Update rtnl_stats_dump for strict data checking")
> Reported-by: Christian Brauner <christian@brauner.io>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

* Re: [PATCH net-next] rtnetlink: Move ifm in valid_fdb_dump_legacy to closer to use
From: David Miller @ 2018-10-11  5:26 UTC
  To: dsahern; +Cc: netdev, dsahern
In-Reply-To: <20181008205724.13030-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Mon,  8 Oct 2018 13:57:24 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Move setting of local variable ifm to after the message parsing in
> valid_fdb_dump_legacy. Avoid potential future use of unchecked variable.
> 
> Fixes: 8dfbda19a21b ("rtnetlink: Move input checking for rtnl_fdb_dump to helper")
> Reported-by: Christian Brauner <christian@brauner.io>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

* Re: [PATCH net-next 0/3] mlxsw: selftests: Few small updates
From: David Miller @ 2018-10-11  5:23 UTC
  To: idosch; +Cc: netdev, jiri, petrm, nird
In-Reply-To: <20181008184945.788-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Mon, 8 Oct 2018 18:50:38 +0000

> First patch fixes a typo in mlxsw.
> 
> Second patch fixes a race in a recent test.
> 
> Third patch makes a recent test executable.

Series applied, thanks Ido.

* Re: [PATCH net] rds: RDS (tcp) hangs on sendto() to unresponding address
From: David Miller @ 2018-10-11  5:21 UTC
  To: ka-cheong.poon; +Cc: netdev, santosh.shilimkar, rds-devel
In-Reply-To: <1539015431-26974-1-git-send-email-ka-cheong.poon@oracle.com>

From: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Date: Mon,  8 Oct 2018 09:17:11 -0700

> In rds_send_mprds_hash(), if the calculated hash value is non-zero and
> the MPRDS connections are not yet up, it will wait.  But it should not
> wait if the send is non-blocking.  In this case, it should just use the
> base c_path for sending the message.
> 
> Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>

Applied.
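The rule in the patch reduces to a small decision function. This is a hedged model of the described behaviour with hypothetical names, not the RDS code itself; what happens after a blocking sender wakes up is not modelled:

```c
#include <assert.h>
#include <stdbool.h>

enum path_action {
	USE_HASHED_PATH,	/* send on the path selected by the hash */
	USE_BASE_PATH,		/* use the base c_path (index 0) */
	WAIT_FOR_PATHS,		/* block until the MPRDS paths come up */
};

/* hash: computed path hash; paths_up: whether the additional MPRDS
 * connections are established; nonblock: non-blocking send. */
static enum path_action pick_path(unsigned int hash, bool paths_up,
				  bool nonblock)
{
	if (hash == 0)
		return USE_BASE_PATH;
	if (paths_up)
		return USE_HASHED_PATH;

	/* Non-zero hash but paths not up yet: a non-blocking sender must
	 * not sleep, so it falls back to the base path. */
	return nonblock ? USE_BASE_PATH : WAIT_FOR_PATHS;
}
```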

* Re: BUG: corrupted list in p9_read_work
From: Dmitry Vyukov @ 2018-10-11 12:33 UTC
  To: Dominique Martinet
  Cc: Leon Romanovsky, syzbot, David Miller, Eric Van Hensbergen, LKML,
	Latchesar Ionkov, netdev, Ron Minnich, syzkaller-bugs,
	v9fs-developer
In-Reply-To: <20181010155814.GC20918@nautica>

On Wed, Oct 10, 2018 at 5:58 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Dmitry Vyukov wrote on Wed, Oct 10, 2018:
>> > The problem is that you can't just give the client a file like trans fd;
>> > you'd need to open an ""rdma socket"" (simplifying wording a bit), and
>> > afaik there is no standard tool for it ; or rather, the problem is that
>> > RDMA is packet based so even if there were you can't just write stuff
>> > in a fd and hope it'll work, so you need a server.
>> >
>> > If you're interested, 9p is trivial enough that I could provide you with
>> > a trivial server that works like your file (just need to reimplement
>> > something that parses header to packetize it properly; so you could
>> > write to its stdin for example) ; that'd require some setup in the VM
>> > (configure rxe and install that tool), but it would definitely be
>> > possible.
>> > What do you think ?
>>
>> I would like to hear more details.
>> Opening a socket is not a problem. Why do we need a tool for this?
>
> Sorry, that's my head thinking unixy and piping things :)
>
>> I don't understand the problem with "packet-based" and what does it
>> mean to have a separate server? And why?
>
> Packet-based means you can't just read/write in a fd and expect the
> other side to know where to cut the packets to send it to the client,
> but if we do it internally there's no problem. We know where to cut.

Ah, OK. This should not be a problem, because the descriptions at
https://github.com/google/syzkaller/blob/master/sys/linux/9p.txt#L70
know what a packet is, so we always pass a single packet to write().
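For a plain byte stream (for example the stdin of the small server Dominique offered), knowing "where to cut" only requires the 9P header: every message starts with a little-endian size[4] that counts the whole message, followed by type[1] and tag[2]. A minimal sketch of that framing in C; the function names `p9_next_msg_len()` and `p9_split()` are hypothetical and not taken from syzkaller or the v9fs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 9P message starts with size[4] type[1] tag[2]; size is little-endian
 * and counts the whole message, including the size field itself. */
#define P9_HDR_LEN 7

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Length of the first complete message in buf[0..avail), 0 if more bytes
 * are needed, or -1 if the size field is smaller than a 9P header. */
int64_t p9_next_msg_len(const uint8_t *buf, size_t avail)
{
    if (avail < 4)
        return 0;
    uint32_t size = get_le32(buf);
    if (size < P9_HDR_LEN)
        return -1;
    if (avail < size)
        return 0;
    return (int64_t)size;
}

/* Count the complete messages at the front of a byte stream and report
 * how many bytes they cover; stops at a partial or malformed message. */
size_t p9_split(const uint8_t *buf, size_t avail, size_t *consumed)
{
    size_t off = 0, n = 0;
    assert(consumed != NULL);
    while (avail - off >= 4) {
        int64_t len = p9_next_msg_len(buf + off, avail - off);
        if (len <= 0)
            break;
        off += (size_t)len;
        n++;
    }
    *consumed = off;
    return n;
}
```

Each complete message found this way could then be handed to the transport as one packet; the RDMA send side is not shown here.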


>> We definitely don't want to involve a separate third-party server,
>> that's very problematic for multiple reasons. But we can have a chunk
>> of custom C code inside of syzkaller.
>> What exact setup do we need?
>
> The setup itself isn't that bad, it's actually pretty much trivial - on
> a fedora VM I just had to run 'rxe_cfg start ens3' (virtio interface
> name) and then the infiniband tools are happy e.g. ibv_devinfo should
> list an interface if you have the userspace library that should have
> come with rxe_cfg.
> (specifically, my VM uses /etc/libibverbs.d/rxe.driver to point to the
> lib, and /usr/lib64/libibverbs/librxe-rdmav16.so the lib itself)
>
> Once tools like ibv_devinfo list the interface, it means syzkaller can
> use it, and very probably means the kernel can as well; that's it.
>
>
>> I guess it will make things simpler if you provide some kind of "hello
>> world" C program that mounts 9p/rdma. I don't need exact messages
>> (they will be same as with pipe transport, right?) nor actual server
>> implementation, but just the place where to inject these packets.
>
> That's still the tricky part, I'm afraid... Making a separate server
> would have been easy because I could have reused some of my junk for the
> actual connection handling (some rdma helper library I wrote ages
> ago[1]), but if you're going to just embed C code you'll probably want
> something lower level? I've never seen syzkaller use any library call
> but I'm not even sure I would know how to create a qp without
> libibverbs, would standard stuff be OK ?

Raw syscalls preferably.
What does 'rxe_cfg start ens3' do on syscall level? Some netlink?

Any libraries and utilities are a hell of a pain in the Linux world. Will it work
in Android userspace? gVisor? Who will explain to all syzkaller users
where to get this for their who-knows-what distro, which is 10 years
old because of corp policies, and debug why their version of the
library is slightly incompatible?
For example, after I figured out that rxe_cfg actually comes from
rdma-core (which is a separate delight on Linux), my Debian
distribution failed to install it because of some conflicts around
/etc/modprobe.d/mlx4.conf, and my Ubuntu distro does not know about
such a package. And we've just started :)

Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
ok, that's it. If it works, great, we can use it.
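Per another message in this thread, `rxe_cfg start ens3` boils down to loading rdma_rxe and writing the interface name to /sys/module/rdma_rxe/parameters/add. With the modules built into the kernel, that is a single write using raw syscalls only. A minimal sketch, with a hypothetical helper name `sysfs_write_str()`; if rdma_rxe is absent, the sysfs file should not exist and the open presumably fails with ENOENT, which the caller can treat as "not supported, skip".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write val to a sysfs attribute (e.g. an rdma_rxe module parameter)
 * using only open/write/close. Returns 0 on success or -errno. */
int sysfs_write_str(const char *path, const char *val)
{
    assert(path != NULL && val != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;              /* e.g. -ENOENT if the module is missing */
    size_t len = strlen(val);
    ssize_t n = write(fd, val, len);
    int err = (n < 0) ? errno : 0;
    if (close(fd) < 0 && err == 0)
        err = errno;
    if (err)
        return -err;
    return (size_t)n == len ? 0 : -EIO;  /* treat a short write as failure */
}
```

Usage would be along the lines of `sysfs_write_str("/sys/module/rdma_rxe/parameters/add", "ens3")`, with the interface name depending on the VM.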

* Re: Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-10-11 12:12 UTC
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, linux-kernel, netdev, virtio-dev,
	wexu, jfreimann
In-Reply-To: <20181010103335-mutt-send-email-mst@kernel.org>

On Wed, Oct 10, 2018 at 10:36:26AM -0400, Michael S. Tsirkin wrote:
> On Thu, Sep 13, 2018 at 05:47:29PM +0800, Jason Wang wrote:
> > On 2018-09-13 16:59, Tiwei Bie wrote:
> > > > If what you say is true then we should take a careful look
> > > > and not supporting these generic things with packed layout.
> > > > Once we do support them it will be too late and we won't
> > > > be able to get performance back.
> > > I think it's a good point that we don't need to support
> > > everything in packed ring (especially these which would
> > > hurt the performance), as the packed ring aims at high
> > > performance. I'm also wondering about the features. Is
> > > there any possibility that we won't support the out of
> > > order processing (at least not by default) in packed ring?
> > > If I didn't miss anything, the need to support out of order
> > > processing in packed ring will make the data structure
> > > inside the driver not cache friendly which is similar to
> > > the case of the descriptor table in the split ring (the
> > > difference is that, it only happens in driver now).
> > 
> > Out of order is not the only user, DMA is another one. We don't have used
> > ring(len), so we need to maintain buffer length somewhere even for in order
> > device.
> 
> For a bunch of systems dma unmap is a nop so we do not really
> need to maintain it. It's a question of an API to detect that
> and optimize for it. I posted a proposed patch for that -
> want to try using that?

Yeah, definitely!
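To illustrate the DMA point: in the packed layout the device overwrites ring slots with used descriptors that carry the buffer id and the number of bytes written, so a driver that needs the original DMA address and length for unmapping has to keep them in its own per-id state. Per MST, that bookkeeping can be skipped when unmap is a no-op. A sketch under those assumptions: `struct pvirtq_desc` and the AVAIL/USED bit numbers follow the virtio 1.1 spec (fields kept in host byte order here for simplicity), while `struct desc_state`, `struct driver_ring` and the helpers are hypothetical, not the Linux virtio_ring code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packed-ring descriptor as in the virtio 1.1 spec (little-endian on the wire). */
struct pvirtq_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};
#define VIRTQ_DESC_F_AVAIL 7   /* bit numbers within flags */
#define VIRTQ_DESC_F_USED  15

/* Hypothetical driver-side state, indexed by buffer id. */
struct desc_state {
    uint64_t dma_addr;
    uint32_t dma_len;
};

struct driver_ring {
    struct desc_state *state; /* one slot per buffer id */
    uint16_t num;
    bool need_unmap;          /* false when DMA unmap is a no-op */
};

/* A descriptor is used when AVAIL == USED == the driver's used wrap counter. */
static bool desc_is_used(const struct pvirtq_desc *d, bool used_wrap_counter)
{
    bool avail = (d->flags >> VIRTQ_DESC_F_AVAIL) & 1;
    bool used = (d->flags >> VIRTQ_DESC_F_USED) & 1;
    return avail == used && used == used_wrap_counter;
}

static void record_buffer(struct driver_ring *r, uint16_t id,
                          uint64_t addr, uint32_t len)
{
    assert(id < r->num);
    if (!r->need_unmap)
        return;               /* nothing to remember */
    r->state[id].dma_addr = addr;
    r->state[id].dma_len = len;
}

/* Returns the mapped length to unmap for a used descriptor, or 0 if no unmap. */
static uint32_t complete_buffer(struct driver_ring *r, const struct pvirtq_desc *used)
{
    assert(used->id < r->num);
    if (!r->need_unmap)
        return 0;
    return r->state[used->id].dma_len; /* not used->len, which is bytes written */
}
```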

> 
> > But if it's not too late, I second for a OUT_OF_ORDER feature.
> > Starting from in order can have much simpler code in driver.
> > 
> > Thanks
> 
> It's tricky to change the flag polarity because of compatibility
> with legacy interfaces. Why is this such a big deal?
> 
> Let's teach drivers about IN_ORDER, then if devices
> are in order it will get enabled by default.

Yeah, makes sense.
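The reason a positive IN_ORDER bit avoids the compatibility problem is that a virtio feature is in effect only when the device offers it and the driver accepts it. Devices and drivers that predate the bit keep the default behavior, and a driver that knows IN_ORDER gets it automatically from in-order devices. A minimal sketch of that negotiation: the bit numbers (RING_PACKED = 34, IN_ORDER = 35) are from the virtio 1.1 spec, while `negotiate()` and `has_feature()` are illustrative helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the virtio 1.1 spec. */
#define VIRTIO_F_RING_PACKED 34
#define VIRTIO_F_IN_ORDER    35

#define BIT_ULL(n) (1ULL << (n))

/* A feature is in effect only if the device offers it and the driver accepts it. */
static uint64_t negotiate(uint64_t device_features, uint64_t driver_features)
{
    return device_features & driver_features;
}

static int has_feature(uint64_t negotiated, unsigned int bit)
{
    assert(bit < 64);
    return (int)((negotiated >> bit) & 1);
}
```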

Besides, I have done some further profiling and debugging
in both the kernel driver and DPDK vhost. Previously I was misled
by a bug in the vhost code. I will send a patch to fix that bug.
With that bug fixed, the performance of the packed ring in the
test between the kernel driver and DPDK vhost is better now.
I will send a new series soon. Thanks!

> 
> -- 
> MST

* Re: net/tipc: recursive locking in tipc_link_reset
From: Dmitry Vyukov @ 2018-10-11 12:11 UTC
  To: Ying Xue; +Cc: Jon Maloy, David Miller, netdev, tipc-discussion, LKML
In-Reply-To: <97d8c196-cd11-8859-fc14-ed12191f52af@windriver.com>

On Thu, Oct 11, 2018 at 2:03 PM, Ying Xue <ying.xue@windriver.com> wrote:
>>> Hi,
>>>
>>> I am getting the following error while booting the latest kernel on
>>> bb2d8f2f61047cbde08b78ec03e4ebdb01ee5434 (Oct 10). Config is attached.
>>>
>>> Since this happens during boot, this makes LOCKDEP completely
>>> unusable, does not allow to discover any other locking issues and
>>> masks all new bugs being introduced into kernel.
>>> Please fix asap.
>>> Thanks
>> -parthasarathy.bhuvaragan address as it gives me bounces
>> but this is highly likely due to:
>>
>> commit 3f32d0be6c16b902b687453c962d17eea5b8ea19
>> Author: Parthasarathy Bhuvaragan
>> Date:   Tue Sep 25 22:09:10 2018 +0200
>>
>>     tipc: lock wakeup & inputq at tipc_link_reset()
>>
>>
>
> Dmitry, I agree with you. The complaint should be caused by the commit
> above. Please try to verify the patch:
> https://patchwork.ozlabs.org/patch/982447.


I trust you for testing ;)

Thanks for the quick fix!
