Netdev List
* Re: [PATCH net-next] net/ncsi: Refactor MAC, VLAN filters
From: David Miller @ 2018-04-09  2:40 UTC (permalink / raw)
  To: sam; +Cc: netdev, linux-kernel, openbmc
In-Reply-To: <20180409021128.10055-1-sam@mendozajonas.com>


The net-next tree is closed at this time, please resend this when the
merge window is over and the net-next tree opens back up.

Thank you.

* Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
From: Michael S. Tsirkin @ 2018-04-09  2:42 UTC (permalink / raw)
  To: haibinzhang(张海斌)
  Cc: Jason Wang, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	lidongchen(陈立东),
	yunfangtai(台运方)
In-Reply-To: <88D661ADF6AFBF42B2AB88D8E7682B0901FC47D3@EXMBX-SZMAIL011.tencent.com>

On Fri, Apr 06, 2018 at 08:22:37AM +0000, haibinzhang(张海斌) wrote:
> handle_tx will delay rx for tens or even hundreds of milliseconds when tx is
> busy polling UDP packets of small length (e.g. 1-byte UDP payload), because
> VHOST_NET_WEIGHT takes into account only the bytes sent, not the length of
> individual packets.
> 
> Ping-Latencies shown below were tested between two Virtual Machines using
> netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> 
> Packet-Weight      Ping-Latencies(millisecond)
>                    min      avg       max
> Origin           3.319   18.489    57.303
> 64               1.643    2.021     2.552
> 128              1.825    2.600     3.224
> 256              1.997    2.710     4.295
> 512              1.860    3.171     4.631
> 1024             2.002    4.173     9.056
> 2048             2.257    5.650     9.688
> 4096             2.093    8.508    15.943

And this is with Q size 256 right?

> Ring size is a hint from device about a burst size it can tolerate. Based on
> benchmarks, set the weight to 2 * vq size.
> 
> To evaluate this change, further tests were done using netperf (RR, TX) between
> two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
> tweaked through qemu. The results shown below do not show obvious changes.

What I asked for is ping latency with different VQ sizes;
the streaming results below do not show anything.

> vq size=256 TCP_RR                vq size=512 TCP_RR
> size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
>    1/       1/  -7%/        -2%      1/       1/   0%/        -2%
>    1/       4/  +1%/         0%      1/       4/  +1%/         0%
>    1/       8/  +1%/        -2%      1/       8/   0%/        +1%
>   64/       1/  -6%/         0%     64/       1/  +7%/        +3%
>   64/       4/   0%/        +2%     64/       4/  -1%/        +1%
>   64/       8/   0%/         0%     64/       8/  -1%/        -2%
>  256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
>  256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
>  256/       8/  +2%/         0%    256/       8/  +1%/        -1%
> 
> vq size=256 UDP_RR                vq size=512 UDP_RR
> size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
>    1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
>    1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
>    1/       8/  -1%/        -1%      1/       8/  -1%/         0%
>   64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
>   64/       4/  -5%/        -1%     64/       4/  +2%/         0%
>   64/       8/   0%/        -1%     64/       8/  -2%/        +1%
>  256/       1/  +7%/        +1%    256/       1/  -7%/         0%
>  256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
>  256/       8/  +2%/        +2%    256/       8/  +1%/        +1%
> 
> vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
>   64/       1/   0%/        -3%     64/       1/   0%/         0%
>   64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
>   64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
>  256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
>  256/       4/  -1%/        -1%    256/       4/  -3%/         0%
>  256/       8/  +7%/        +5%    256/       8/  -3%/         0%
>  512/       1/  +1%/         0%    512/       1/  -1%/        -1%
>  512/       4/  +1%/        -1%    512/       4/   0%/         0%
>  512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
> 1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
> 1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
> 1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
> 2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
> 2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
> 2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
> 4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
> 4096/       4/  +2%/         0%   4096/       4/   0%/         0%
> 4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%
> 
> Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
> Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

Code is fine but I'd like to see validation of the heuristic
2*vq->num with another vq size.



> ---
>  drivers/vhost/net.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 8139bc70ad7d..3563a305cc0a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
>   * Using this limit prevents one virtqueue from starving others. */
>  #define VHOST_NET_WEIGHT 0x80000
>  
> +/* Max number of packets transferred before requeueing the job.
> + * Using this limit prevents one virtqueue from starving rx. */
> +#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
> +
>  /* MAX number of TX used buffers for outstanding zerocopy */
>  #define VHOST_MAX_PEND 128
>  #define VHOST_GOODCOPY_LEN 256
> @@ -473,6 +477,7 @@ static void handle_tx(struct vhost_net *net)
>  	struct socket *sock;
>  	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
>  	bool zcopy, zcopy_used;
> +	int sent_pkts = 0;
>  
>  	mutex_lock(&vq->mutex);
>  	sock = vq->private_data;
> @@ -580,7 +585,8 @@ static void handle_tx(struct vhost_net *net)
>  		else
>  			vhost_zerocopy_signal_used(net, vq);
>  		vhost_net_tx_packet(net);
> -		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> +		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
> +		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
>  			vhost_poll_queue(&vq->poll);
>  			break;
>  		}
> -- 
> 2.12.3
> 
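
The exit condition added to handle_tx() above can be modelled in userspace
as follows. This is only a sketch under stated assumptions: tx_burst(), its
parameters and BYTE_WEIGHT are hypothetical names, every packet has the same
length, and headers are ignored. It shows how the packet budget of 2 * vq
size ends a burst of tiny packets long before the byte budget would.

```c
#include <stddef.h>

/* Mirrors the value of VHOST_NET_WEIGHT quoted in the patch. */
#define BYTE_WEIGHT 0x80000u

/*
 * Hypothetical model: returns how many packets of pkt_len bytes are sent
 * in one run before the worker would requeue itself. A return value equal
 * to `pending` means the queue drained without hitting either budget.
 */
static unsigned int tx_burst(unsigned int vq_num, size_t pkt_len,
			     unsigned int pending)
{
	/* Same formula as VHOST_NET_PKT_WEIGHT(vq) in the patch. */
	const unsigned int pkt_weight = vq_num * 2;
	size_t total_len = 0;
	unsigned int sent_pkts = 0;

	while (sent_pkts < pending) {
		total_len += pkt_len;
		sent_pkts++;
		/* Stop on whichever budget is exhausted first. */
		if (total_len >= BYTE_WEIGHT || sent_pkts >= pkt_weight)
			break;
	}
	return sent_pkts;
}
```

In this model a ring of 256 entries yields after 512 one-byte packets, while
64 KiB packets still stop at the byte budget after 8 packets.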

* Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)
From: Michael S. Tsirkin @ 2018-04-09  2:44 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: syzbot, Jason Wang, kvm, linux-kernel, netdev, syzkaller-bugs,
	Linux Virtualization
In-Reply-To: <CAJSP0QVHk7NSQcjfLWq13=1Vzm=_vtmKZqp4dDjuh8ETLO5g_g@mail.gmail.com>

On Mon, Apr 09, 2018 at 10:37:45AM +0800, Stefan Hajnoczi wrote:
> On Sat, Apr 7, 2018 at 3:02 AM, syzbot
> <syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com> wrote:
> > syzbot hit the following crash on upstream commit
> > 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +0000)
> > Merge tag 'armsoc-drivers' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > syzbot dashboard link:
> > https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
> 
> To prevent duplicated work: I am working on this one.
> 
> Stefan

Do you want to try this patchset:
https://lkml.org/lkml/2018/4/5/665

?

-- 
MST

* [GIT] Networking
From: David Miller @ 2018-04-09  2:50 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) The sockmap code has to free socket memory on close if
   there is corked data, from John Fastabend.

2) Tunnel names coming from userspace need to be length
   validated.  From Eric Dumazet.

3) arp_filter() has to take VRFs properly into account, from
   Miguel Fadon Perlines.

4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.

5) Missing idr_remove() in u32_delete_key(), from Cong Wang.

6) More syzbot stuff.  Several use-of-uninitialized-value fixes all
   over, from Eric Dumazet.

7) Do not leak kernel memory to userspace in sctp, also from Eric
   Dumazet.

8) Discard frames from unused ports in DSA, from Andrew Lunn.

9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
   Falcon.

10) Do not access dp83640 PHY registers prematurely after reset, from
    Esben Haabendal.
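
For item 2, a userspace sketch of the kind of check involved is shown below.
It is loosely modelled on the rules of dev_valid_name() and is not the code
from the patches themselves; tunnel_name_ok() and TUNNEL_NAME_SIZE are
hypothetical names, and the size of 16 is assumed to match IFNAMSIZ on Linux.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match IFNAMSIZ (16 bytes including the trailing NUL). */
#define TUNNEL_NAME_SIZE 16

/* Hypothetical helper: true if `name` is usable as a device name. */
static bool tunnel_name_ok(const char *name)
{
	size_t len = 0;

	/* Bounded scan: never read past TUNNEL_NAME_SIZE bytes. */
	while (len < TUNNEL_NAME_SIZE && name[len] != '\0')
		len++;

	/* Empty, or no NUL within the buffer size: reject. */
	if (len == 0 || len == TUNNEL_NAME_SIZE)
		return false;
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return false;

	for (size_t i = 0; i < len; i++) {
		if (name[i] == '/' || name[i] == ':' ||
		    isspace((unsigned char)name[i]))
			return false;
	}
	return true;
}
```

The point of the bounded scan is that the length is checked before the name
is copied anywhere, so an over-long string from userspace is rejected rather
than truncated or overrun.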

Please pull, thanks a lot!

The following changes since commit 06dd3dfeea60e2a6457a6aedf97afc8e6d2ba497:

  Merge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc (2018-04-04 20:07:20 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 76327a35caabd1a932e83d6a42b967aa08584e5d:

  dp83640: Ensure against premature access to PHY registers after reset (2018-04-08 19:58:52 -0400)

----------------------------------------------------------------
Anders Roxell (1):
      kernel/bpf/syscall: fix warning defined but not used

Andrew Lunn (1):
      net: dsa: Discard frames from unused ports

Anirudh Venkataramanan (1):
      ice: Bug fixes in ethtool code

Cong Wang (2):
      net_sched: fix a missing idr_remove() in u32_delete_key()
      tipc: use the right skb in tipc_sk_fill_sock_diag()

David S. Miller (7):
      Merge branch 'net-tunnel-name-validate'
      Merge branch 'hv_netvsc-Fix-shutdown-issues-on-older-Windows-hosts'
      Merge branch '100GbE' of git://git.kernel.org/.../jkirsher/net-queue
      Merge branch 'net-fix-uninit-values-in-networking-stack'
      Merge branch 'ibmvnic-Fix-driver-reset-and-DMA-bugs'
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge git://git.kernel.org/.../bpf/bpf

Davide Caratti (1):
      net/sched: fix NULL dereference in the error path of tcf_bpf_init()

Eric Dumazet (16):
      net: fool proof dev_valid_name()
      ip_tunnel: better validate user provided tunnel names
      ipv6: sit: better validate user provided tunnel names
      ip6_gre: better validate user provided tunnel names
      ip6_tunnel: better validate user provided tunnel names
      vti6: better validate user provided tunnel names
      crypto: af_alg - fix possible uninit-value in alg_bind()
      netlink: fix uninit-value in netlink_sendmsg
      net: fix rtnh_ok()
      net: initialize skb->peeked when cloning
      net: fix uninit-value in __hw_addr_add_ex()
      dccp: initialize ireq->ir_mark
      ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
      soreuseport: initialise timewait reuseport field
      sctp: do not leak kernel memory to user space
      sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6

Esben Haabendal (4):
      net: phy: marvell: Enable interrupt function on LED2 pin
      net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
      ARM: dts: ls1021a: Specify TBIPA register address
      dp83640: Ensure against premature access to PHY registers after reset

Jeff Barnhill (1):
      net/ipv6: Increment OUTxxx counters after netfilter hook

Jiri Pirko (1):
      devlink: convert occ_get op to separate registration

John Fastabend (2):
      bpf: sockmap, free memory on sock close with cork data
      bpf: sockmap, duplicates release calls may NULL sk_prot

Maxime Chevallier (1):
      net: mvpp2: Fix parser entry init boundary check

Miguel Fadon Perlines (1):
      arp: fix arp_filter on l3slave devices

Mohammed Gamal (4):
      hv_netvsc: Use Windows version instead of NVSP version on GPAD teardown
      hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
      hv_netvsc: Ensure correct teardown message sequence order
      hv_netvsc: Pass net_device parameter to revoke and teardown functions

Nathan Fontenot (1):
      ibmvnic: Do not reset CRQ for Mobility driver resets

Szymon Janc (1):
      Bluetooth: Fix connection if directed advertising and privacy is used

Thomas Falcon (4):
      ibmvnic: Fix DMA mapping mistakes
      ibmvnic: Zero used TX descriptor counter on reset
      ibmvnic: Fix reset scheduler error handling
      ibmvnic: Fix failover case for non-redundant configuration

Wei Yongjun (1):
      ice: Fix error return code in ice_init_hw()

 Documentation/devicetree/bindings/net/fsl-tsec-phy.txt |   6 +++-
 arch/arm/boot/dts/ls1021a.dtsi                         |   3 +-
 crypto/af_alg.c                                        |   8 ++---
 drivers/net/ethernet/freescale/fsl_pq_mdio.c           |  50 ++++++++++++++++++++----------
 drivers/net/ethernet/ibm/ibmvnic.c                     | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------
 drivers/net/ethernet/ibm/ibmvnic.h                     |   1 +
 drivers/net/ethernet/intel/ice/ice_common.c            |   4 ++-
 drivers/net/ethernet/intel/ice/ice_ethtool.c           |   4 +--
 drivers/net/ethernet/marvell/mvpp2.c                   |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c         |  24 +++------------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h         |   1 -
 drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c    |  67 +++++++++++++++++++++++-----------------
 drivers/net/hyperv/netvsc.c                            |  60 ++++++++++++++++++++++++++----------
 drivers/net/netdevsim/devlink.c                        |  65 +++++++++++++++++++--------------------
 drivers/net/phy/dp83640.c                              |  18 +++++++++++
 drivers/net/phy/marvell.c                              |  20 ++++++++++--
 include/net/bluetooth/hci_core.h                       |   2 +-
 include/net/devlink.h                                  |  40 +++++++++++++++---------
 include/net/inet_timewait_sock.h                       |   1 +
 include/net/nexthop.h                                  |   2 +-
 kernel/bpf/sockmap.c                                   |  12 ++++++--
 kernel/bpf/syscall.c                                   |  24 +++++++--------
 net/bluetooth/hci_conn.c                               |  29 +++++++++++++-----
 net/bluetooth/hci_event.c                              |  15 ++++++---
 net/bluetooth/l2cap_core.c                             |   2 +-
 net/core/dev.c                                         |   2 +-
 net/core/dev_addr_lists.c                              |   4 +--
 net/core/devlink.c                                     |  74 ++++++++++++++++++++++++++++++++++++++------
 net/core/skbuff.c                                      |   1 +
 net/dccp/ipv4.c                                        |   1 +
 net/dccp/ipv6.c                                        |   1 +
 net/dsa/dsa_priv.h                                     |   8 ++++-
 net/ipv4/arp.c                                         |   2 +-
 net/ipv4/inet_timewait_sock.c                          |   1 +
 net/ipv4/ip_tunnel.c                                   |  11 ++++---
 net/ipv4/route.c                                       |  11 ++++---
 net/ipv6/ip6_gre.c                                     |   8 +++--
 net/ipv6/ip6_output.c                                  |   7 +++--
 net/ipv6/ip6_tunnel.c                                  |  11 ++++---
 net/ipv6/ip6_vti.c                                     |   7 +++--
 net/ipv6/sit.c                                         |   8 +++--
 net/netlink/af_netlink.c                               |   2 ++
 net/sched/act_bpf.c                                    |  12 +++++---
 net/sched/cls_u32.c                                    |   1 +
 net/sctp/ipv6.c                                        |   4 ++-
 net/sctp/socket.c                                      |  13 +++++---
 net/tipc/diag.c                                        |   2 +-
 net/tipc/socket.c                                      |   6 ++--
 net/tipc/socket.h                                      |   4 +--
 49 files changed, 534 insertions(+), 273 deletions(-)

* Re: DPAA TX Issues
From: Jacob S. Moroni @ 2018-04-09  3:20 UTC (permalink / raw)
  To: madalin.bucur; +Cc: netdev
In-Reply-To: <1523231216.3843066.1330879848.1F0E863A@webmail.messagingengine.com>

On Sun, Apr 8, 2018, at 7:46 PM, Jacob S. Moroni wrote:
> Hello Madalin,
> 
> I've been experiencing some issues with the DPAA Ethernet driver,
> specifically related to frame transmission. Hopefully you can point
> me in the right direction.
> 
> TLDR: Attempting to transmit faster than a few frames per second causes
> the TX FQ CGR to enter into the congested state and remain there forever,
> even after transmission stops.
> 
> The hardware is a T2080RDB, running from the tip of net-next, using
> the standard t2080rdb device tree and corenet64_smp_defconfig kernel
> config. No changes were made to any of the files. The issue occurs
> with 4.16.1 stable as well. In fact, the only time I've been able
> to achieve reliable frame transmission was with the SDK 4.1 kernel.
> 
> For my tests, I'm running iperf3 both with and without the -R
> option (send/receive). When using a USB Ethernet adapter, there
> are no issues.
> 
> The issue is that it seems like the TX frame queues are getting
> "stuck" when attempting to transmit at rates greater than a few frames
> per second. Ping works fine, but it seems like anything that could
> potentially cause multiple TX frames to be enqueued causes issues.
> 
> If I run iperf3 in reverse mode (with the T2080RDB receiving), then
> I can achieve ~940 Mbps, but this is also somewhat unreliable.
> 
> If I run it with the T2080RDB transmitting, the test will never
> complete. Sometimes it starts transmitting for a few seconds then stops,
> and other times it never even starts. This also seems to force the
> interface into a bad state.
> 
> The ethtool stats show that the interface has entered
> congestion a few times, and that it's currently congested. The fact
> that it's currently congested even after stopping transmission
> indicates that the FQ somehow stopped being drained. I've also
> noticed that whenever this issue occurs, the TX confirmation
> counters are always less than the TX packet counters.
> 
> When it gets into this state, I can see that the memory usage is
> climbing, up until about the point of where the CGR threshold
> is (about 100 MB).
> 
> Any idea what could prevent the TX FQ from being drained? My first
> guess was flow control, but it's completely disabled.
> 
> I tried messing with the egress congestion threshold, workqueue
> assignments, etc., but nothing seemed to have any effect.
> 
> If you need any more information or want me to run any tests,
> please let me know.
> 
> Thanks,
> -- 
>   Jacob S. Moroni
>   mail@jakemoroni.com

It turns out that irqbalance was causing all of the issues. After
disabling it and rebooting, the interfaces worked perfectly.

Perhaps there's an issue with how the qman/bman portals are defined
as per-cpu variables.

During the portal's probe, the CPUs are assigned one-by-one and
subsequently passed into request_irq as the argument.
However, it seems like if the IRQ affinity changes, then the ISR could be
passed a reference to a per-cpu variable belonging to another CPU.
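
The suspected mismatch can be pictured with a small userspace model. This is
only an illustration of the hypothesis above, not the qman/bman code: the
struct, the arrays and both functions are hypothetical, and a plain array
stands in for a per-cpu variable.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for a per-cpu portal. */
struct portal {
	int owner_cpu;
};

static struct portal portals[NR_CPUS];      /* models the per-cpu variable */
static struct portal *irq_dev_id[NR_CPUS];  /* cookie fixed at registration */

/* Models probe: one IRQ per CPU, registered with that CPU's portal. */
static void probe_portals(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		portals[cpu].owner_cpu = cpu;
		irq_dev_id[cpu] = &portals[cpu];
	}
}

/*
 * Models the ISR for `irq` running on `running_cpu`. The cookie never
 * changes after registration, so if the IRQ's affinity is moved the
 * handler receives a portal that belongs to a different CPU.
 */
static bool isr_sees_local_portal(int irq, int running_cpu)
{
	return irq_dev_id[irq]->owner_cpu == running_cpu;
}
```

With the IRQ left on its original CPU the handler sees its own portal; once
something like irqbalance moves it, the check in this model fails.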

At least I know where to look now.

- Jake

* Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)
From: Stefan Hajnoczi @ 2018-04-09  3:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: syzbot, Jason Wang, kvm, linux-kernel, netdev, syzkaller-bugs,
	Linux Virtualization
In-Reply-To: <20180409054316-mutt-send-email-mst@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 961 bytes --]

On Mon, Apr 09, 2018 at 05:44:36AM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 09, 2018 at 10:37:45AM +0800, Stefan Hajnoczi wrote:
> > On Sat, Apr 7, 2018 at 3:02 AM, syzbot
> > <syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com> wrote:
> > > syzbot hit the following crash on upstream commit
> > > 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +0000)
> > > Merge tag 'armsoc-drivers' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > > syzbot dashboard link:
> > > https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
> > 
> > To prevent duplicated work: I am working on this one.
> > 
> > Stefan
> 
> Do you want to try this patchset:
> https://lkml.org/lkml/2018/4/5/665
> 
> ?

Thanks, I'll give it a shot.

I also noticed a regression in commit
d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log when
IOTLB is enabled") and am currently testing a fix.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
From: Alexei Starovoitov @ 2018-04-09  3:34 UTC (permalink / raw)
  To: Yonghong Song, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180406214846.916265-3-yhs@fb.com>

On 4/6/18 2:48 PM, Yonghong Song wrote:
> Currently, stackmap and bpf_get_stackid helper are provided
> for bpf program to get the stack trace. This approach has
> a limitation though. If two stack traces have the same hash,
> only one will get stored in the stackmap table,
> so some stack traces are missing from user perspective.
>
> This patch implements a new helper, bpf_get_stack, which will
> send stack traces directly to the bpf program. The bpf program
> is able to see all stack traces, and then can do in-kernel
> processing or send stack traces to user space through
> shared map or bpf_perf_event_output.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h      |  1 +
>  include/linux/filter.h   |  3 ++-
>  include/uapi/linux/bpf.h | 17 +++++++++++++--
>  kernel/bpf/stackmap.c    | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c     | 12 ++++++++++-
>  kernel/bpf/verifier.c    |  3 +++
>  kernel/trace/bpf_trace.c | 50 +++++++++++++++++++++++++++++++++++++++++-
>  7 files changed, 137 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 95a7abd..72ccb9a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -676,6 +676,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
>  extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
>  extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
>  extern const struct bpf_func_proto bpf_get_stackid_proto;
> +extern const struct bpf_func_proto bpf_get_stack_proto;
>  extern const struct bpf_func_proto bpf_sock_map_update_proto;
>
>  /* Shared helpers among cBPF and eBPF. */
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index fc4e8f9..9b64f63 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -467,7 +467,8 @@ struct bpf_prog {
>  				dst_needed:1,	/* Do we need dst entry? */
>  				blinded:1,	/* Was blinded */
>  				is_func:1,	/* program is a bpf function */
> -				kprobe_override:1; /* Do we override a kprobe? */
> +				kprobe_override:1, /* Do we override a kprobe? */
> +				need_callchain_buf:1; /* Needs callchain buffer? */
>  	enum bpf_prog_type	type;		/* Type of BPF program */
>  	enum bpf_attach_type	expected_attach_type; /* For some prog types */
>  	u32			len;		/* Number of filter blocks */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec897..a4ff5b7 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -517,6 +517,17 @@ union bpf_attr {
>   *             other bits - reserved
>   *     Return: >= 0 stackid on success or negative error
>   *
> + * int bpf_get_stack(ctx, buf, size, flags)
> + *     walk user or kernel stack and store the ips in buf
> + *     @ctx: struct pt_regs*
> + *     @buf: user buffer to fill stack
> + *     @size: the buf size
> + *     @flags: bits 0-7 - number of stack frames to skip
> + *             bit 8 - collect user stack instead of kernel
> + *             bit 11 - get build-id as well if user stack
> + *             other bits - reserved
> + *     Return: >= 0 size copied on success or negative error
> + *
>   * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
>   *     calculate csum diff
>   *     @from: raw from buffer
> @@ -821,7 +832,8 @@ union bpf_attr {
>  	FN(msg_apply_bytes),		\
>  	FN(msg_cork_bytes),		\
>  	FN(msg_pull_data),		\
> -	FN(bind),
> +	FN(bind),			\
> +	FN(get_stack),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -855,11 +867,12 @@ enum bpf_func_id {
>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
>  #define BPF_F_TUNINFO_IPV6		(1ULL << 0)
>
> -/* BPF_FUNC_get_stackid flags. */
> +/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
>  #define BPF_F_SKIP_FIELD_MASK		0xffULL
>  #define BPF_F_USER_STACK		(1ULL << 8)
>  #define BPF_F_FAST_STACK_CMP		(1ULL << 9)
>  #define BPF_F_REUSE_STACKID		(1ULL << 10)
> +#define BPF_F_USER_BUILD_ID		(1ULL << 11)

The comment above is not quite correct.
This new flag is only available for the new helper.

>
>  /* BPF_FUNC_skb_set_tunnel_key flags. */
>  #define BPF_F_ZERO_CSUM_TX		(1ULL << 1)
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 04f6ec1..371c72e 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
>  	.arg3_type	= ARG_ANYTHING,
>  };
>
> +BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
> +	   u64, flags)
> +{
> +	u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
> +	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> +	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +	bool user = flags & BPF_F_USER_STACK;
> +	struct perf_callchain_entry *trace;
> +	bool kernel = !user;
> +	u64 *ips;
> +
> +	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> +			       BPF_F_USER_BUILD_ID)))
> +		return -EINVAL;
> +
> +	elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
> +					    : sizeof(u64);
> +	if (unlikely(size % elem_size))
> +		return -EINVAL;
> +
> +	num_elem = size / elem_size;
> +	if (sysctl_perf_event_max_stack < num_elem)
> +		init_nr = 0;

prog's buffer should be zero padded in this case since it
points to uninit_mem.

> +	else
> +		init_nr = sysctl_perf_event_max_stack - num_elem;
> +	trace = get_perf_callchain(regs, init_nr, kernel, user,
> +				   sysctl_perf_event_max_stack, false, false);
> +	if (unlikely(!trace))
> +		return -EFAULT;
> +
> +	trace_nr = trace->nr - init_nr;
> +	if (trace_nr <= skip)
> +		return -EFAULT;
> +
> +	trace_nr -= skip;
> +	trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
> +	copy_len = trace_nr * elem_size;
> +	ips = trace->ip + skip + init_nr;
> +	if (user && user_build_id)

The combination of kernel stack + user_build_id should probably be rejected
earlier with -EINVAL.

> +		stack_map_get_build_id_offset(buf, ips, trace_nr, user);
> +	else
> +		memcpy(buf, ips, copy_len);
> +
> +	return copy_len;
> +}
> +
> +const struct bpf_func_proto bpf_get_stack_proto = {
> +	.func		= bpf_get_stack,
> +	.gpl_only	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,

why allow zero size?
I'm not sure the helper will work correctly when size=0

> +	.arg4_type	= ARG_ANYTHING,
> +};
> +
>  /* Called from eBPF program */
>  static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
>  {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 0244973..2aa3a65 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -984,10 +984,13 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
>  static void __bpf_prog_put_rcu(struct rcu_head *rcu)
>  {
>  	struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> +	bool need_callchain_buf = aux->prog->need_callchain_buf;
>
>  	free_used_maps(aux);
>  	bpf_prog_uncharge_memlock(aux->prog);
>  	security_bpf_prog_free(aux);
> +	if (need_callchain_buf)
> +		put_callchain_buffers();
>  	bpf_prog_free(aux->prog);
>  }
>
> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
>  			bpf_prog_kallsyms_del(prog->aux->func[i]);
>  		bpf_prog_kallsyms_del(prog);
>
> -		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> +		synchronize_rcu();
> +		__bpf_prog_put_rcu(&prog->aux->rcu);

There should have been a lockdep splat.
We cannot call synchronize_rcu() here, since we cannot sleep
in some cases.
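
Three of the review points above (reject kernel stack + build-id early, zero
pad the unused part of the buffer, and the zero-size question) can be
sketched in userspace as follows. This is a hedged model, not the helper:
check_flags() and copy_ips_padded() are hypothetical names, the F_* macros
copy the bit positions quoted in the patch, only the plain-ip path is
modelled, and rejecting size 0 is just one possible answer to the question.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Bit positions as quoted in the patch. */
#define F_SKIP_FIELD_MASK 0xffULL
#define F_USER_STACK      (1ULL << 8)
#define F_USER_BUILD_ID   (1ULL << 11)

/* Returns 0 if the flag combination is acceptable, -EINVAL otherwise. */
static int check_flags(uint64_t flags)
{
	if (flags & ~(F_SKIP_FIELD_MASK | F_USER_STACK | F_USER_BUILD_ID))
		return -EINVAL;
	/* Build-id only makes sense for a user stack: reject up front. */
	if ((flags & F_USER_BUILD_ID) && !(flags & F_USER_STACK))
		return -EINVAL;
	return 0;
}

/*
 * Copies ips[skip..] into buf and zero fills whatever is left, so the
 * caller's uninitialised buffer is fully written on every path.
 * Returns the number of bytes of real data, or a negative error.
 */
static long copy_ips_padded(void *buf, uint32_t size, const uint64_t *ips,
			    uint32_t nr_ips, uint32_t skip)
{
	uint32_t num_elem = size / sizeof(uint64_t);
	uint32_t copy_nr, copy_len;
	long err = -EINVAL;

	if (size == 0 || size % sizeof(uint64_t))
		goto clear;
	err = -EFAULT;
	if (nr_ips <= skip)
		goto clear;

	copy_nr = nr_ips - skip;
	if (copy_nr > num_elem)
		copy_nr = num_elem;
	copy_len = copy_nr * sizeof(uint64_t);

	memcpy(buf, ips + skip, copy_len);
	memset((char *)buf + copy_len, 0, size - copy_len); /* zero pad tail */
	return copy_len;

clear:
	memset(buf, 0, size);
	return err;
}
```

Zeroing the buffer on the error paths as well goes slightly beyond the
comment above and is an assumption of this sketch.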

* Re: [PATCH net] arp: fix arp_filter on l3slave devices
From: Sasha Levin @ 2018-04-09  3:36 UTC (permalink / raw)
  To: Sasha Levin, Miguel Fadon Perlines, netdev@vger.kernel.org
  Cc: David Ahern, stable@vger.kernel.org
In-Reply-To: <1522916738-192046-1-git-send-email-mfadon@teldat.com>

Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 33.5930)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

Please let us know if you'd like to have this patch included in a stable tree.

* Re: [PATCH net 3/6] ipv6: sit: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09  3:37 UTC (permalink / raw)
  To: Sasha Levin, Eric Dumazet, David S . Miller
  Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-4-edumazet@google.com>

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 1da177e4c3f4 Linux-2.6.12-rc2.

The bot has also determined it's probably a bug fixing patch. (score: 53.2877)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

* Re: [PATCH net 5/6] ip6_tunnel: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09  3:37 UTC (permalink / raw)
  To: Sasha Levin, Eric Dumazet, David S . Miller
  Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-6-edumazet@google.com>

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 1da177e4c3f4 Linux-2.6.12-rc2.

The bot has also determined it's probably a bug fixing patch. (score: 24.0820)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

* Re: [PATCH net 4/6] ip6_gre: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09  3:37 UTC (permalink / raw)
  To: Sasha Levin, Eric Dumazet, David S . Miller
  Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-5-edumazet@google.com>

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: c12b395a4664 gre: Support GRE over IPv6.

The bot has also determined it's probably a bug fixing patch. (score: 52.9896)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

* Re: [PATCH net 6/6] vti6: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09  3:37 UTC (permalink / raw)
  To: Sasha Levin, Eric Dumazet, David S . Miller
  Cc: netdev, Steffen Klassert, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-7-edumazet@google.com>

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: ed1efb2aefbb ipv6: Add support for IPsec virtual tunnel interfaces.

The bot has also determined it's probably a bug fixing patch. (score: 65.4654)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

^ permalink raw reply

* Re: [PATCH] net: phy: marvell: Enable interrupt function on LED2 pin
From: Sasha Levin @ 2018-04-09  3:37 UTC (permalink / raw)
  To: Sasha Levin, Esben Haabendal, Esben Haabendal,
	netdev@vger.kernel.org
  Cc: Esben Haabendal, Rasmus Villemoes, stable@vger.kernel.org
In-Reply-To: <20180405133504.12257-1-esben.haabendal@gmail.com>

Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 7.3040)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build failed! Errors:
    drivers/net/phy/marvell.c:472:9: error: implicit declaration of function ‘phy_modify’; did you mean ‘pmd_modify’? [-Werror=implicit-function-declaration]

v4.14.32: Build failed! Errors:
    drivers/net/phy/marvell.c:472:9: error: implicit declaration of function ‘phy_modify’; did you mean ‘pmd_modify’? [-Werror=implicit-function-declaration]

v4.9.92: Failed to apply! Possible dependencies:
    864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay configuration")

v4.4.126: Failed to apply! Possible dependencies:
    864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay configuration")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH net 0/6] net: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09  3:37 UTC (permalink / raw)
  To: Sasha Levin, Eric Dumazet, David S . Miller
  Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: ed1efb2aefbb ipv6: Add support for IPsec virtual tunnel interfaces.

The bot has also determined it's probably a bug fixing patch. (score: 53.6463)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

^ permalink raw reply

* Re: [PATCH net 2/6] ip_tunnel: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09  3:37 UTC (permalink / raw)
  To: Sasha Levin, Eric Dumazet, David S . Miller
  Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-3-edumazet@google.com>

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: c54419321455 GRE: Refactor GRE tunneling code..

The bot has also determined it's probably a bug fixing patch. (score: 46.6256)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Alexei Starovoitov @ 2018-04-09  3:48 UTC (permalink / raw)
  To: Quentin Monnet; +Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180406111122.11038-1-quentin.monnet@netronome.com>

On Fri, Apr 06, 2018 at 12:11:22PM +0100, Quentin Monnet wrote:
> eBPF helper functions can be called from within eBPF programs to perform
> a variety of tasks that would be otherwise hard or impossible to do with
> eBPF itself. There is a growing number of such helper functions in the
> kernel, but documentation is scarce. The main user space header file
> does contain a short commented description of most helpers, but it is
> somewhat outdated and not complete. It is more a "cheat sheet" than a
> real documentation accessible to new eBPF developers.
> 
> This commit attempts to improve the situation by replacing the existing
> overview for the helpers with a more developed description. Furthermore,
> a Python script is added to generate a manual page for eBPF helpers. The
> workflow is the following, and requires the rst2man utility:
> 
>     $ ./scripts/bpf_helpers_doc.py \
>             --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>     $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>     $ man /tmp/bpf-helpers.7
> 
> The objective is to keep all documentation related to the helpers in a
> single place, and to be able to generate from here a manual page that
> could be packaged in the man-pages repository and shipped with most
> distributions [1].
> 
> Additionally, parsing the prototypes of the helper functions could
> hopefully be reused, with a different Printer object, to generate
> header files needed in some eBPF-related projects.
> 
> Regarding the description of each helper, it comprises several items:
> 
> - The function prototype.
> - A description of the function and of its arguments (except for a
>   couple of cases, when there are no arguments and the return value
>   makes the function usage really obvious).
> - A description of return values (if not void).
> - A listing of eBPF program types (if relevant, map types) compatible
>   with the helper.
> - Information about the helper being restricted to GPL programs, or not.
> - The kernel version in which the helper was introduced.
> - The commit that introduced the helper (this is mostly to have it in
>   the source of the man page, as it can be used to track changes and
>   update the page).
> 
> For several helpers, descriptions are inspired (at times, nearly copied)
> from the commit logs introducing them in the kernel--Many thanks to
> their respective authors! They were completed as much as possible, the
> objective being to have something easily accessible even for people just
> starting with eBPF. There is probably a bit more work to do in this
> direction for some helpers.
> 
> Some RST formatting is used in the descriptions (not in function
> prototypes, to keep them readable, but the Python script provided in
> order to generate the RST for the manual page does add formatting to
> prototypes, to produce something pretty) to get "bold" and "italics" in
> manual pages. Hopefully, the descriptions in bpf.h file remains
> perfectly readable. Note that the few trailing white spaces are
> intentional, removing them would break paragraphs for rst2man.
> 
> The descriptions should ideally be updated each time someone adds a new
> helper, or updates the behaviour (compatibility extended to new program
> types, new socket option supported...) or the interface (new flags
> available, ...) of existing ones.
> 
> [1] I have not contacted people from the man-pages project prior to
>     sending this RFC, so I can offer no guaranty at this time that they
>     would accept to take the generated man page.
> 
> Cc: linux-doc@vger.kernel.org
> Cc: linux-man@vger.kernel.org
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
>  include/uapi/linux/bpf.h   | 2237 ++++++++++++++++++++++++++++++++++++--------
>  scripts/bpf_helpers_doc.py |  568 +++++++++++
>  2 files changed, 2429 insertions(+), 376 deletions(-)
>  create mode 100755 scripts/bpf_helpers_doc.py
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..f47aeddbbe0a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -367,394 +367,1879 @@ union bpf_attr {
>  
>  /* BPF helper function descriptions:
>   *
> - * void *bpf_map_lookup_elem(&map, &key)
> - *     Return: Map value or NULL
> - *
> - * int bpf_map_update_elem(&map, &key, &value, flags)
> - *     Return: 0 on success or negative error
> - *
> - * int bpf_map_delete_elem(&map, &key)
> - *     Return: 0 on success or negative error
> - *
> - * int bpf_probe_read(void *dst, int size, void *src)
> - *     Return: 0 on success or negative error
> + * void *bpf_map_lookup_elem(struct bpf_map *map, void *key)
> + * 	Description
> + * 		Perform a lookup in *map* for an entry associated to *key*.
> + * 	Return
> + * 		Map value associated to *key*, or **NULL** if no entry was
> + * 		found.
> + * 	For
> + * 		All types of programs. Limited to maps of types
> + * 		**BPF_MAP_TYPE_HASH**,
> + * 		**BPF_MAP_TYPE_ARRAY**,
> + * 		**BPF_MAP_TYPE_PERCPU_HASH**,
> + * 		**BPF_MAP_TYPE_PERCPU_ARRAY**,
> + * 		**BPF_MAP_TYPE_LRU_HASH**,
> + * 		**BPF_MAP_TYPE_LRU_PERCPU_HASH**,
> + * 		**BPF_MAP_TYPE_LPM_TRIE**,
> + * 		**BPF_MAP_TYPE_ARRAY_OF_MAPS**,
> + * 		and **BPF_MAP_TYPE_HASH_OF_MAPS**.
> + * 	GPL only
> + * 		No
> + * 	Since
> + * 		Linux 3.19

I think we should give it a try. There is a risk that it will become stale
and to reduce that I'd propose to remove 'For', 'GPL only' and 'Since',
since for new helpers 'Since' is kinda hard to fill in before it lands
all the way, and 'For' keeps changing as new types are added.

'Description' is the most useful and it needs separate thorough review
for every helper.

^ permalink raw reply

* Re: KASAN: slab-out-of-bounds Read in pfkey_add
From: Eric Biggers @ 2018-04-09  4:04 UTC (permalink / raw)
  To: Kevin Easton
  Cc: davem, herbert, linux-kernel, netdev, steffen.klassert,
	syzkaller-bugs, syzbot
In-Reply-To: <001a114292fadd3e2505607060a8@google.com>

On Fri, Dec 15, 2017 at 11:51:01PM -0800, syzbot wrote:
> Hello,
> 
> syzkaller hit the following crash on
> 50c4c4e268a2d7a3e58ebb698ac74da0de40ae36
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> C reproducer is attached
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> audit: type=1400 audit(1513021744.055:7): avc:  denied  { map } for
> pid=3149 comm="syzkaller428285" path="/root/syzkaller428285483" dev="sda1"
> ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:341 [inline]
> BUG: KASAN: slab-out-of-bounds in pfkey_msg2xfrm_state net/key/af_key.c:1212
> [inline]
> BUG: KASAN: slab-out-of-bounds in pfkey_add+0x1634/0x3270
> net/key/af_key.c:1491
> Read of size 8192 at addr ffff8801c5197318 by task syzkaller428285/3149
> 
> CPU: 0 PID: 3149 Comm: syzkaller428285 Not tainted 4.15.0-rc3+ #127
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x257 lib/dump_stack.c:53
>  print_address_description+0x73/0x250 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x25b/0x340 mm/kasan/report.c:409
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
>  memcpy+0x23/0x50 mm/kasan/kasan.c:302
>  memcpy include/linux/string.h:341 [inline]
>  pfkey_msg2xfrm_state net/key/af_key.c:1212 [inline]
>  pfkey_add+0x1634/0x3270 net/key/af_key.c:1491
>  pfkey_process+0x60b/0x720 net/key/af_key.c:2809
>  pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3648
>  sock_sendmsg_nosec net/socket.c:636 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:646
>  ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2026
>  __sys_sendmsg+0xe5/0x210 net/socket.c:2060
>  C_SYSC_sendmsg net/compat.c:739 [inline]
>  compat_SyS_sendmsg+0x2a/0x40 net/compat.c:737
>  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
>  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
>  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
> RIP: 0023:0xf7fd4c79
> RSP: 002b:00000000ff9d7c1c EFLAGS: 00000203 ORIG_RAX: 0000000000000172
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000205f5000
> RDX: 0000000000000000 RSI: 0000000000000167 RDI: 000000000000000f
> RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> 

Looks like this is going to be fixed by
https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of
provided sadb_key"), but it's not applied to the ipsec tree yet.  Kevin, for
future reference, for syzbot bugs it would be helpful to reply to the original
bug report and say that a patch was sent out, or even better send the patch as a
reply to the bug report email, e.g.

	git format-patch --in-reply-to="<001a114292fadd3e2505607060a8@google.com>"

for this one (and the Message ID can be found in the syzkaller-bugs archive even
if the email isn't in your inbox).  Otherwise people may not know that a patch
was sent out and do redundant work.  Thanks!

I also simplified the reproducer for this, so here it is just in case someone
wants it anyway:

        #include <sys/socket.h>
        #include <unistd.h>
    
        int main()
        {
                int fd = socket(AF_KEY, SOCK_RAW, 2);
                char msg[96] =
                "\x02\x03\x00\x02\x0c\x00\x00\x00\x00\x00\x00\x01\x02\x00\x00\x00"
                "\x03\x00\x05\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00"
                "\x00\x00\x00\x00\x00\x00\x00\x00"
                "\x03\x00\x06\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00"
                "\x00\x00\x00\x00\x00\x00\x00\x00"
                "\x02\x00\x01\x00\x00\x00\x00\x00\x00\x00\xfb\x00\x00\x00\x00\x00"
                "\x02\x00\x08\x00\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
    
                write(fd, msg, sizeof(msg));
        }

It causes an 8192-byte out-of-bounds read.
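
The 8192 figure can be checked by hand from the last 16 bytes of the
message above. A minimal user-space sketch, assuming the key size is
derived by rounding sadb_key_bits up to whole bytes and that fields are in
little-endian (x86) order; the names key_ext, parse_key_ext and key_len_ok
are made up for illustration, and the check shown is not the patch itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header of a PF_KEY key extension (field layout of struct sadb_key, RFC 2367). */
struct key_ext {
	uint16_t len;      /* total extension length, in 8-byte units */
	uint16_t exttype;  /* 8 is SADB_EXT_KEY_AUTH */
	uint16_t bits;     /* claimed key length, in bits */
	uint16_t reserved;
};

/* Decode the header from raw bytes, assuming little-endian (x86) order. */
static struct key_ext parse_key_ext(const unsigned char *p)
{
	struct key_ext k;

	k.len      = (uint16_t)(p[0] | (p[1] << 8));
	k.exttype  = (uint16_t)(p[2] | (p[3] << 8));
	k.bits     = (uint16_t)(p[4] | (p[5] << 8));
	k.reserved = (uint16_t)(p[6] | (p[7] << 8));
	return k;
}

/* Bytes of key material the extension claims: bits rounded up to bytes. */
static size_t key_bytes_claimed(const struct key_ext *k)
{
	return ((size_t)k->bits + 7) / 8;
}

/* Bytes of key material actually present after the 8-byte header. */
static size_t key_bytes_present(const struct key_ext *k)
{
	assert(k->len >= 1);	/* extension must at least hold its header */
	return (size_t)k->len * 8 - 8;
}

/* Hypothetical validity check: the claimed key must fit in the extension. */
static int key_len_ok(const struct key_ext *k)
{
	return k->len >= 1 && key_bytes_claimed(k) <= key_bytes_present(k);
}
```

For the reproducer's last extension (len=2, bits=0xffff) this gives 8192
claimed bytes against 8 bytes present, which matches the read size in the
KASAN report.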

Eric

^ permalink raw reply

* Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
From: haibinzhang(张海斌) @ 2018-04-09  4:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	lidongchen(陈立东),
	yunfangtai(台运方)


> On Fri, Apr 06, 2018 at 08:22:37AM +0000, haibinzhang(张海斌) wrote:
> > handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
> > polling udp packets with small length(e.g. 1byte udp payload), because setting
> > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
> > 
> > Ping-Latencies shown below were tested between two Virtual Machines using
> > netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> > 
> > Packet-Weight      Ping-Latencies(millisecond)
> >                    min      avg       max
> > Origin           3.319   18.489    57.303
> > 64               1.643    2.021     2.552
> > 128              1.825    2.600     3.224
> > 256              1.997    2.710     4.295
> > 512              1.860    3.171     4.631
> > 1024             2.002    4.173     9.056
> > 2048             2.257    5.650     9.688
> > 4096             2.093    8.508    15.943
>
> And this is with Q size 256 right?

Yes. Ping latencies with a VQ size of 512 are shown below.

Packet-Weight      Ping-Latencies(millisecond)
                   min      avg       max
Origin           6.357   29.177    66.245
64               2.798    3.614     4.403
128              2.861    3.820     4.775
256              3.008    4.018     4.807
512              3.254    4.523     5.824
1024             3.079    5.335     7.747
2048             3.944    8.201    12.762
4096             4.158   11.057    19.985

We will submit again. Is there anything else?

>
> > Ring size is a hint from device about a burst size it can tolerate. Based on
> > benchmarks, set the weight to 2 * vq size.
> > 
> > To evaluate this change, another tests were done using netperf(RR, TX) between
> > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
> > tweaked through qemu. Results shown below does not show obvious changes.
>
> What I asked for is ping-latency with different VQ sizes,
> streaming below does not show anything.
>
> > vq size=256 TCP_RR                vq size=512 TCP_RR
> > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> >    1/       1/  -7%/        -2%      1/       1/   0%/        -2%
> >    1/       4/  +1%/         0%      1/       4/  +1%/         0%
> >    1/       8/  +1%/        -2%      1/       8/   0%/        +1%
> >   64/       1/  -6%/         0%     64/       1/  +7%/        +3%
> >   64/       4/   0%/        +2%     64/       4/  -1%/        +1%
> >   64/       8/   0%/         0%     64/       8/  -1%/        -2%
> >  256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
> >  256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
> >  256/       8/  +2%/         0%    256/       8/  +1%/        -1%
> > 
> > vq size=256 UDP_RR                vq size=512 UDP_RR
> > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> >    1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
> >    1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
> >    1/       8/  -1%/        -1%      1/       8/  -1%/         0%
> >   64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
> >   64/       4/  -5%/        -1%     64/       4/  +2%/         0%
> >   64/       8/   0%/        -1%     64/       8/  -2%/        +1%
> >  256/       1/  +7%/        +1%    256/       1/  -7%/         0%
> >  256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
> >  256/       8/  +2%/        +2%    256/       8/  +1%/        +1%
> > 
> > vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> >   64/       1/   0%/        -3%     64/       1/   0%/         0%
> >   64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
> >   64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
> >  256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
> >  256/       4/  -1%/        -1%    256/       4/  -3%/         0%
> >  256/       8/  +7%/        +5%    256/       8/  -3%/         0%
> >  512/       1/  +1%/         0%    512/       1/  -1%/        -1%
> >  512/       4/  +1%/        -1%    512/       4/   0%/         0%
> >  512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
> > 1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
> > 1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
> > 1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
> > 2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
> > 2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
> > 2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
> > 4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
> > 4096/       4/  +2%/         0%   4096/       4/   0%/         0%
> > 4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%
> > 
> > Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
> > Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
> > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> Code is fine but I'd like to see validation of the heuristic
> 2*vq->num with another vq size.
>
>
>
> > ---
> >  drivers/vhost/net.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 8139bc70ad7d..3563a305cc0a 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
> >   * Using this limit prevents one virtqueue from starving others. */
> >  #define VHOST_NET_WEIGHT 0x80000
> >  
> > +/* Max number of packets transferred before requeueing the job.
> > + * Using this limit prevents one virtqueue from starving rx. */
> > +#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
> > +
> >  /* MAX number of TX used buffers for outstanding zerocopy */
> >  #define VHOST_MAX_PEND 128
> >  #define VHOST_GOODCOPY_LEN 256
> > @@ -473,6 +477,7 @@ static void handle_tx(struct vhost_net *net)
> >  	struct socket *sock;
> >  	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
> >  	bool zcopy, zcopy_used;
> > +	int sent_pkts = 0;
> >  
> >  	mutex_lock(&vq->mutex);
> >  	sock = vq->private_data;
> > @@ -580,7 +585,8 @@ static void handle_tx(struct vhost_net *net)
> >  		else
> >  			vhost_zerocopy_signal_used(net, vq);
> >  		vhost_net_tx_packet(net);
> > -		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> > +		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
> > +		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
> >  			vhost_poll_queue(&vq->poll);
> >  			break;
> >  		}
> > -- 
> > 2.12.3
> > 


^ permalink raw reply

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
From: Yonghong Song @ 2018-04-09  4:53 UTC (permalink / raw)
  To: Alexei Starovoitov, daniel, netdev; +Cc: kernel-team
In-Reply-To: <e532530c-d5ea-a889-6467-1d8dd2b4f098@fb.com>



On 4/8/18 8:34 PM, Alexei Starovoitov wrote:
> On 4/6/18 2:48 PM, Yonghong Song wrote:
>> Currently, stackmap and bpf_get_stackid helper are provided
>> for bpf program to get the stack trace. This approach has
>> a limitation though. If two stack traces have the same hash,
>> only one will get stored in the stackmap table,
>> so some stack traces are missing from user perspective.
>>
>> This patch implements a new helper, bpf_get_stack, will
>> send stack traces directly to bpf program. The bpf program
>> is able to see all stack traces, and then can do in-kernel
>> processing or send stack traces to user space through
>> shared map or bpf_perf_event_output.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>  include/linux/bpf.h      |  1 +
>>  include/linux/filter.h   |  3 ++-
>>  include/uapi/linux/bpf.h | 17 +++++++++++++--
>>  kernel/bpf/stackmap.c    | 56 
>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/syscall.c     | 12 ++++++++++-
>>  kernel/bpf/verifier.c    |  3 +++
>>  kernel/trace/bpf_trace.c | 50 +++++++++++++++++++++++++++++++++++++++++-
>>  7 files changed, 137 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 95a7abd..72ccb9a 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -676,6 +676,7 @@ extern const struct bpf_func_proto 
>> bpf_get_current_comm_proto;
>>  extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
>>  extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
>>  extern const struct bpf_func_proto bpf_get_stackid_proto;
>> +extern const struct bpf_func_proto bpf_get_stack_proto;
>>  extern const struct bpf_func_proto bpf_sock_map_update_proto;
>>
>>  /* Shared helpers among cBPF and eBPF. */
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index fc4e8f9..9b64f63 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -467,7 +467,8 @@ struct bpf_prog {
>>                  dst_needed:1,    /* Do we need dst entry? */
>>                  blinded:1,    /* Was blinded */
>>                  is_func:1,    /* program is a bpf function */
>> -                kprobe_override:1; /* Do we override a kprobe? */
>> +                kprobe_override:1, /* Do we override a kprobe? */
>> +                need_callchain_buf:1; /* Needs callchain buffer? */
>>      enum bpf_prog_type    type;        /* Type of BPF program */
>>      enum bpf_attach_type    expected_attach_type; /* For some prog 
>> types */
>>      u32            len;        /* Number of filter blocks */
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index c5ec897..a4ff5b7 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -517,6 +517,17 @@ union bpf_attr {
>>   *             other bits - reserved
>>   *     Return: >= 0 stackid on success or negative error
>>   *
>> + * int bpf_get_stack(ctx, buf, size, flags)
>> + *     walk user or kernel stack and store the ips in buf
>> + *     @ctx: struct pt_regs*
>> + *     @buf: user buffer to fill stack
>> + *     @size: the buf size
>> + *     @flags: bits 0-7 - numer of stack frames to skip
>> + *             bit 8 - collect user stack instead of kernel
>> + *             bit 11 - get build-id as well if user stack
>> + *             other bits - reserved
>> + *     Return: >= 0 size copied on success or negative error
>> + *
>>   * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
>>   *     calculate csum diff
>>   *     @from: raw from buffer
>> @@ -821,7 +832,8 @@ union bpf_attr {
>>      FN(msg_apply_bytes),        \
>>      FN(msg_cork_bytes),        \
>>      FN(msg_pull_data),        \
>> -    FN(bind),
>> +    FN(bind),            \
>> +    FN(get_stack),
>>
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which 
>> helper
>>   * function eBPF program intends to call
>> @@ -855,11 +867,12 @@ enum bpf_func_id {
>>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
>>  #define BPF_F_TUNINFO_IPV6        (1ULL << 0)
>>
>> -/* BPF_FUNC_get_stackid flags. */
>> +/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
>>  #define BPF_F_SKIP_FIELD_MASK        0xffULL
>>  #define BPF_F_USER_STACK        (1ULL << 8)
>>  #define BPF_F_FAST_STACK_CMP        (1ULL << 9)
>>  #define BPF_F_REUSE_STACKID        (1ULL << 10)
>> +#define BPF_F_USER_BUILD_ID        (1ULL << 11)
> 
> the comment above is not quite correct.
> This new flag is only available for new helper.

Right, some flags are used for both helpers and some are only used for 
one of them. Will make it clear in the next revision.

> 
>>
>>  /* BPF_FUNC_skb_set_tunnel_key flags. */
>>  #define BPF_F_ZERO_CSUM_TX        (1ULL << 1)
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 04f6ec1..371c72e 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto 
>> = {
>>      .arg3_type    = ARG_ANYTHING,
>>  };
>>
>> +BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, 
>> size,
>> +       u64, flags)
>> +{
>> +    u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
>> +    bool user_build_id = flags & BPF_F_USER_BUILD_ID;
>> +    u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> +    bool user = flags & BPF_F_USER_STACK;
>> +    struct perf_callchain_entry *trace;
>> +    bool kernel = !user;
>> +    u64 *ips;
>> +
>> +    if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
>> +                   BPF_F_USER_BUILD_ID)))
>> +        return -EINVAL;
>> +
>> +    elem_size = (user && user_build_id) ? sizeof(struct 
>> bpf_stack_build_id)
>> +                        : sizeof(u64);
>> +    if (unlikely(size % elem_size))
>> +        return -EINVAL;
>> +
>> +    num_elem = size / elem_size;
>> +    if (sysctl_perf_event_max_stack < num_elem)
>> +        init_nr = 0;
> 
> prog's buffer should be zero padded in this case since it
> points to uninit_mem.

Will make the change in the next revision.

> 
>> +    else
>> +        init_nr = sysctl_perf_event_max_stack - num_elem;
>> +    trace = get_perf_callchain(regs, init_nr, kernel, user,
>> +                   sysctl_perf_event_max_stack, false, false);
>> +    if (unlikely(!trace))
>> +        return -EFAULT;
>> +
>> +    trace_nr = trace->nr - init_nr;
>> +    if (trace_nr <= skip)
>> +        return -EFAULT;
>> +
>> +    trace_nr -= skip;
>> +    trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
>> +    copy_len = trace_nr * elem_size;
>> +    ips = trace->ip + skip + init_nr;
>> +    if (user && user_build_id)
> 
> the combination of kern + user_build_id should probably be rejected
> earlier with einval.

Right, I missed this case. It should be rejected.
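
A rough user-space sketch of what the early rejection could look like,
using the flag values from the patch. check_get_stack_flags is a made-up
name for illustration and this is not the code of the next revision:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as defined in the quoted patch. */
#define BPF_F_SKIP_FIELD_MASK	0xffULL
#define BPF_F_USER_STACK	(1ULL << 8)
#define BPF_F_USER_BUILD_ID	(1ULL << 11)

/* Hypothetical helper: returns 0 if flags are acceptable, -EINVAL otherwise. */
static int check_get_stack_flags(uint64_t flags)
{
	/* Reject any bit outside the set the helper understands. */
	if (flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
		      BPF_F_USER_BUILD_ID))
		return -EINVAL;

	/* build-id is only meaningful for user stacks: reject kernel + build-id. */
	if ((flags & BPF_F_USER_BUILD_ID) && !(flags & BPF_F_USER_STACK))
		return -EINVAL;

	return 0;
}

/* Usage: the combinations discussed above. */
static void check_get_stack_flags_examples(void)
{
	assert(check_get_stack_flags(0) == 0);			/* kernel stack */
	assert(check_get_stack_flags(BPF_F_USER_STACK) == 0);	/* user stack */
	assert(check_get_stack_flags(BPF_F_USER_STACK |
				     BPF_F_USER_BUILD_ID) == 0);
	assert(check_get_stack_flags(BPF_F_USER_BUILD_ID) == -EINVAL);
}
```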

> 
>> +        stack_map_get_build_id_offset(buf, ips, trace_nr, user);
>> +    else
>> +        memcpy(buf, ips, copy_len);
>> +
>> +    return copy_len;
>> +}
>> +
>> +const struct bpf_func_proto bpf_get_stack_proto = {
>> +    .func        = bpf_get_stack,
>> +    .gpl_only    = true,
>> +    .ret_type    = RET_INTEGER,
>> +    .arg1_type    = ARG_PTR_TO_CTX,
>> +    .arg2_type    = ARG_PTR_TO_UNINIT_MEM,
>> +    .arg3_type    = ARG_CONST_SIZE_OR_ZERO,
> 
> why allow zero size?
> I'm not sure the helper will work correctly when size=0

The only reason was to make bpf programs simpler, so that
they do not need to test for zero size, similar to
bpf_probe_read/bpf_perf_event_output.

I have double checked my implementation and it should
handle zero size properly.

Let me double check whether disallowing zero still
lets reasonably written bpf programs pass the
verifier.

> 
>> +    .arg4_type    = ARG_ANYTHING,
>> +};
>> +
>>  /* Called from eBPF program */
>>  static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
>>  {
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 0244973..2aa3a65 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -984,10 +984,13 @@ void bpf_prog_free_id(struct bpf_prog *prog, 
>> bool do_idr_lock)
>>  static void __bpf_prog_put_rcu(struct rcu_head *rcu)
>>  {
>>      struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, 
>> rcu);
>> +    bool need_callchain_buf = aux->prog->need_callchain_buf;
>>
>>      free_used_maps(aux);
>>      bpf_prog_uncharge_memlock(aux->prog);
>>      security_bpf_prog_free(aux);
>> +    if (need_callchain_buf)
>> +        put_callchain_buffers();
>>      bpf_prog_free(aux->prog);
>>  }
>>
>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog 
>> *prog, bool do_idr_lock)
>>              bpf_prog_kallsyms_del(prog->aux->func[i]);
>>          bpf_prog_kallsyms_del(prog);
>>
>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>> +        synchronize_rcu();
>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
> 
> there should have been lockdep splat.
> We cannot call synchronize_rcu here, since we cannot sleep
> in some cases.

Let me double check this. The following is the reason
why I am using synchronize_rcu().

With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu),
__bpf_prog_put_rcu calls put_callchain_buffers(), which
calls mutex_lock(). A kernel built with CONFIG_DEBUG_ATOMIC_SLEEP=y
then complains, since potentially sleeping inside an RCU callback
is not allowed.

^ permalink raw reply

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
From: Alexei Starovoitov @ 2018-04-09  5:02 UTC (permalink / raw)
  To: Yonghong Song, daniel, netdev; +Cc: kernel-team
In-Reply-To: <625de1bb-7cb2-5f9e-01c3-a863cd27b0e6@fb.com>

On 4/8/18 9:53 PM, Yonghong Song wrote:
>>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog
>>> *prog, bool do_idr_lock)
>>>              bpf_prog_kallsyms_del(prog->aux->func[i]);
>>>          bpf_prog_kallsyms_del(prog);
>>>
>>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>>> +        synchronize_rcu();
>>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
>>
>> there should have been lockdep splat.
>> We cannot call synchronize_rcu here, since we cannot sleep
>> in some cases.
>
> Let me double check this. The following is the reason
> why I am using synchronize_rcu().
>
> With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu) and
> _bpf_prog_put_rcu calls put_callchain_buffers() which
> calls mutex_lock(), the runtime with CONFIG_DEBUG_ATOMIC_SLEEP=y
> will complains since potential sleep inside the call_rcu is not
> allowed.

I see. Indeed. We cannot call put_callchain_buffers() from an RCU callback,
but doing synchronize_rcu() here is also not possible.
How about moving the put_callchain_buffers() call into bpf_prog_free_deferred()?
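
A minimal userspace sketch of the idea above, for illustration only. It assumes
that bpf_prog_free_deferred() runs from a workqueue in process context, where
sleeping is allowed. Every toy_* name below is hypothetical and only models the
kernel behaviour; none of it is kernel code or a kernel API.

```c
#include <stdbool.h>

/* Toy model: true while we pretend to be in atomic (RCU callback) context. */
bool in_atomic_ctx;
/* Counts how often a possibly-sleeping function ran in atomic context. */
int sleep_violations;

/* Hypothetical stand-in for the kernel's might_sleep() debug check. */
void toy_might_sleep(void)
{
    if (in_atomic_ctx)
        sleep_violations++;
}

struct toy_prog {
    bool buffers_released;
    bool work_queued;
};

/* Stand-in for put_callchain_buffers(): takes a mutex, so it may sleep. */
void toy_put_buffers(struct toy_prog *p)
{
    toy_might_sleep();
    p->buffers_released = true;
}

/* Problematic variant: the RCU callback calls the sleeping function itself. */
void toy_rcu_callback_buggy(struct toy_prog *p)
{
    in_atomic_ctx = true;
    toy_put_buffers(p);
    in_atomic_ctx = false;
}

/* Suggested variant: the RCU callback only queues deferred work. */
void toy_rcu_callback(struct toy_prog *p)
{
    in_atomic_ctx = true;
    p->work_queued = true;
    in_atomic_ctx = false;
}

/* Stand-in for the deferred free: runs later, outside atomic context. */
void toy_run_worker(struct toy_prog *p)
{
    if (p->work_queued) {
        toy_put_buffers(p);
        p->work_queued = false;
    }
}
```

In this model the buggy callback records one violation, while the deferred
variant releases the buffers only once the worker runs and records none.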

^ permalink raw reply

* Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
From: Michael S. Tsirkin @ 2018-04-09  5:46 UTC (permalink / raw)
  To: haibinzhang(张海斌)
  Cc: kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	yunfangtai(台运方),
	lidongchen(陈立东)
In-Reply-To: <88D661ADF6AFBF42B2AB88D8E7682B0901FC49BE@EXMBX-SZMAIL011.tencent.com>

On Mon, Apr 09, 2018 at 04:09:20AM +0000, haibinzhang(张海斌) wrote:
> 
> > On Fri, Apr 06, 2018 at 08:22:37AM +0000, haibinzhang(张海斌) wrote:
> > > handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
> > > polling udp packets with small length(e.g. 1byte udp payload), because setting
> > > VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
> > > 
> > > Ping-Latencies shown below were tested between two Virtual Machines using
> > > netperf (UDP_STREAM, len=1), and then another machine pinged the client:
> > > 
> > > Packet-Weight      Ping-Latencies(millisecond)
> > >                    min      avg       max
> > > Origin           3.319   18.489    57.303
> > > 64               1.643    2.021     2.552
> > > 128              1.825    2.600     3.224
> > > 256              1.997    2.710     4.295
> > > 512              1.860    3.171     4.631
> > > 1024             2.002    4.173     9.056
> > > 2048             2.257    5.650     9.688
> > > 4096             2.093    8.508    15.943
> >
> > And this is with Q size 256 right?
> 
> Yes. Ping latencies with a VQ size of 512 are shown below.
> 
> Packet-Weight      Ping-Latencies(millisecond)
>                    min      avg       max
> Origin           6.357   29.177    66.245
> 64               2.798    3.614     4.403
> 128              2.861    3.820     4.775
> 256              3.008    4.018     4.807
> 512              3.254    4.523     5.824
> 1024             3.079    5.335     7.747
> 2048             3.944    8.201    12.762
> 4096             4.158   11.057    19.985
> 
> We will submit again. Is there anything else?

Seems pretty consistent, with a small dip at a packet weight of 2 * VQ size.


Acked-by: Michael S. Tsirkin <mst@redhat.com>

> >
> > > Ring size is a hint from device about a burst size it can tolerate. Based on
> > > benchmarks, set the weight to 2 * vq size.
> > > 
> > > To evaluate this change, another tests were done using netperf(RR, TX) between
> > > two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
> > > tweaked through qemu. Results shown below does not show obvious changes.
> >
> > What I asked for is ping-latency with different VQ sizes,
> > streaming below does not show anything.
> >
> > > vq size=256 TCP_RR                vq size=512 TCP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/       1/  -7%/        -2%      1/       1/   0%/        -2%
> > >    1/       4/  +1%/         0%      1/       4/  +1%/         0%
> > >    1/       8/  +1%/        -2%      1/       8/   0%/        +1%
> > >   64/       1/  -6%/         0%     64/       1/  +7%/        +3%
> > >   64/       4/   0%/        +2%     64/       4/  -1%/        +1%
> > >   64/       8/   0%/         0%     64/       8/  -1%/        -2%
> > >  256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
> > >  256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
> > >  256/       8/  +2%/         0%    256/       8/  +1%/        -1%
> > > 
> > > vq size=256 UDP_RR                vq size=512 UDP_RR
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >    1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
> > >    1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
> > >    1/       8/  -1%/        -1%      1/       8/  -1%/         0%
> > >   64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
> > >   64/       4/  -5%/        -1%     64/       4/  +2%/         0%
> > >   64/       8/   0%/        -1%     64/       8/  -2%/        +1%
> > >  256/       1/  +7%/        +1%    256/       1/  -7%/         0%
> > >  256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
> > >  256/       8/  +2%/        +2%    256/       8/  +1%/        +1%
> > > 
> > > vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> > > size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> > >   64/       1/   0%/        -3%     64/       1/   0%/         0%
> > >   64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
> > >   64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
> > >  256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
> > >  256/       4/  -1%/        -1%    256/       4/  -3%/         0%
> > >  256/       8/  +7%/        +5%    256/       8/  -3%/         0%
> > >  512/       1/  +1%/         0%    512/       1/  -1%/        -1%
> > >  512/       4/  +1%/        -1%    512/       4/   0%/         0%
> > >  512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
> > > 1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
> > > 1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
> > > 1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
> > > 2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
> > > 2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
> > > 2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
> > > 4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
> > > 4096/       4/  +2%/         0%   4096/       4/   0%/         0%
> > > 4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%
> > > 
> > > Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
> > > Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
> > > Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> >
> > Code is fine but I'd like to see validation of the heuristic
> > 2*vq->num with another vq size.
> >
> >
> >
> > > ---
> > >  drivers/vhost/net.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 8139bc70ad7d..3563a305cc0a 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
> > >   * Using this limit prevents one virtqueue from starving others. */
> > >  #define VHOST_NET_WEIGHT 0x80000
> > >  
> > > +/* Max number of packets transferred before requeueing the job.
> > > + * Using this limit prevents one virtqueue from starving rx. */
> > > +#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
> > > +
> > >  /* MAX number of TX used buffers for outstanding zerocopy */
> > >  #define VHOST_MAX_PEND 128
> > >  #define VHOST_GOODCOPY_LEN 256
> > > @@ -473,6 +477,7 @@ static void handle_tx(struct vhost_net *net)
> > >  	struct socket *sock;
> > >  	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
> > >  	bool zcopy, zcopy_used;
> > > +	int sent_pkts = 0;
> > >  
> > >  	mutex_lock(&vq->mutex);
> > >  	sock = vq->private_data;
> > > @@ -580,7 +585,8 @@ static void handle_tx(struct vhost_net *net)
> > >  		else
> > >  			vhost_zerocopy_signal_used(net, vq);
> > >  		vhost_net_tx_packet(net);
> > > -		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> > > +		if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
> > > +		    unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
> > >  			vhost_poll_queue(&vq->poll);
> > >  			break;
> > >  		}
> > > -- 
> > > 2.12.3
> > > 
> 
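
To make the interaction of the two limits concrete, here is a small userspace
model of the loop exit condition in the patch quoted above. It is only a
sketch: tx_burst() and its parameters are hypothetical names, the loop ignores
everything handle_tx() does apart from the two weight checks, and pending
stands in for the number of packets the guest has queued.

```c
#include <stddef.h>

/* Byte limit taken from the quoted patch context. */
#define VHOST_NET_WEIGHT 0x80000

/*
 * Hypothetical model of one handle_tx() run: returns how many packets
 * are sent before the loop stops, either because the queue is drained
 * or because the byte limit or the packet limit (2 * vq size) is hit.
 */
size_t tx_burst(size_t pkt_len, size_t vq_num, size_t pending)
{
    size_t total_len = 0;
    size_t sent_pkts = 0;

    while (sent_pkts < pending) {
        total_len += pkt_len;
        sent_pkts++;
        /* Same two conditions as the patch: bytes first, then packets. */
        if (total_len >= VHOST_NET_WEIGHT || sent_pkts >= vq_num * 2)
            break;
    }
    return sent_pkts;
}
```

In this model, with 1-byte packets and a vq size of 256, the byte limit alone
would allow 0x80000 packets per run, whereas the packet limit stops the run
after 512. With 64 KiB packets the byte limit still triggers first, after 8
packets.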
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: KASAN: slab-out-of-bounds Read in pfkey_add
From: Kevin Easton @ 2018-04-09  5:56 UTC (permalink / raw)
  To: Eric Biggers
  Cc: davem, herbert, linux-kernel, netdev, steffen.klassert,
	syzkaller-bugs, syzbot
In-Reply-To: <20180409040433.GJ685@sol.localdomain>

On Sun, Apr 08, 2018 at 09:04:33PM -0700, Eric Biggers wrote:
...
> 
> Looks like this is going to be fixed by
> https://patchwork.kernel.org/patch/10327883/ ("af_key: Always verify length of
> provided sadb_key"), but it's not applied yet to the ipsec tree yet.  Kevin, for
> future reference, for syzbot bugs it would be helpful to reply to the original
> bug report and say that a patch was sent out, or even better send the patch as a
> reply to the bug report email, e.g.
> 
> 	git format-patch --in-reply-to="<001a114292fadd3e2505607060a8@google.com>"
> 
> for this one (and the Message ID can be found in the syzkaller-bugs archive even
> if the email isn't in your inbox).

Sure, I can do that.

    - Kevin

^ permalink raw reply

* WARNING in ip_rt_bug
From: syzbot @ 2018-04-09  5:59 UTC (permalink / raw)
  To: davem, kuznet, linux-kernel, netdev, syzkaller-bugs, yoshfuji

Hello,

syzbot hit the following crash on net-next commit
8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +0000)
Merge tag 'mlx5-updates-2018-03-30' of  
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5991727739437056
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=3327544840960562528
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b09ac67a2af842b12eab@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.
If you forward the report, please keep this part and the footer.

netlink: 'syz-executor6': attribute type 3 has an invalid length.
WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20  
net/ipv4/route.c:1212
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x24d lib/dump_stack.c:53
  panic+0x1e4/0x41c kernel/panic.c:183
  __warn+0x1dc/0x200 kernel/panic.c:547
  report_bug+0x1f4/0x2b0 lib/bug.c:186
  fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
  fixup_bug arch/x86/kernel/traps.c:247 [inline]
  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212
RSP: 0018:ffff8801db007290 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: ffff8801d8dda3c0 RCX: ffffffff856c31ca
RDX: 0000000000000100 RSI: ffffffff8858c300 RDI: 0000000000000282
RBP: ffff8801db007298 R08: 1ffff1003b600de1 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d8dda3c0
R13: ffff88019bdb2200 R14: ffff88019bdeed80 R15: ffff8801d8dda418
  dst_output include/net/dst.h:444 [inline]
  ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
  ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414
  ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434
  icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394
  icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741
  ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200
  dst_link_failure include/net/dst.h:427 [inline]
  arp_error_report+0xae/0x180 net/ipv4/arp.c:297
  neigh_invalidate+0x225/0x530 net/core/neighbour.c:883
  neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969
  call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
  expire_timers kernel/time/timer.c:1363 [inline]
  __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1cc/0x200 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:541 [inline]
  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778  
[inline]
RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923
RSP: 0018:ffff880197b3f980 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12
RAX: dffffc0000000000 RBX: ffff8801d225e400 RCX: 0000000000000000
RDX: 1ffffffff10a24e5 RSI: 00000000b98b8227 RDI: 0000000000000282
RBP: ffff880197b3fa78 R08: 1ffff10032f67e93 R09: 0000000000000004
R10: ffff880197b3f960 R11: 0000000000000003 R12: 1ffff10032f67f36
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
  down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84
  __bprm_mm_init fs/exec.c:297 [inline]
  bprm_mm_init fs/exec.c:414 [inline]
  do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771
  do_execve+0x31/0x40 fs/exec.c:1847
  call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Dumping ftrace buffer:
    (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.
Note: all commands must start from beginning of the line in the email body.

^ permalink raw reply

* Re: WARNING in ip_rt_bug
From: Dmitry Vyukov @ 2018-04-09  6:06 UTC (permalink / raw)
  To: syzbot
  Cc: David Miller, Alexey Kuznetsov, LKML, netdev, syzkaller-bugs,
	Hideaki YOSHIFUJI, Eric Dumazet
In-Reply-To: <001a11408d9c43ff4a0569641a3f@google.com>

On Mon, Apr 9, 2018 at 7:59 AM, syzbot
<syzbot+b09ac67a2af842b12eab@syzkaller.appspotmail.com> wrote:
> Hello,
>
> syzbot hit the following crash on net-next commit
> 8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +0000)
> Merge tag 'mlx5-updates-2018-03-30' of
> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab
>
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5991727739437056
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=3327544840960562528
> compiler: gcc (GCC) 7.1.1 20170620
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+b09ac67a2af842b12eab@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.


+Eric said that perhaps we just need to revert:

commit c378a9c019cf5e017d1ed24954b54fae7bebd2bc
Date:   Sat May 21 07:16:42 2011 +0000
    ipv4: Give backtrace in ip_rt_bug().


> netlink: 'syz-executor6': attribute type 3 has an invalid length.
> WARNING: CPU: 0 PID: 11678 at net/ipv4/route.c:1213 ip_rt_bug+0x15/0x20
> net/ipv4/route.c:1212
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 0 PID: 11678 Comm: kworker/u4:7 Not tainted 4.16.0-rc6+ #289
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __dump_stack lib/dump_stack.c:17 [inline]
>  dump_stack+0x194/0x24d lib/dump_stack.c:53
>  panic+0x1e4/0x41c kernel/panic.c:183
>  __warn+0x1dc/0x200 kernel/panic.c:547
>  report_bug+0x1f4/0x2b0 lib/bug.c:186
>  fixup_bug.part.10+0x37/0x80 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
> RIP: 0010:ip_rt_bug+0x15/0x20 net/ipv4/route.c:1212
> RSP: 0018:ffff8801db007290 EFLAGS: 00010282
> RAX: dffffc0000000000 RBX: ffff8801d8dda3c0 RCX: ffffffff856c31ca
> RDX: 0000000000000100 RSI: ffffffff8858c300 RDI: 0000000000000282
> RBP: ffff8801db007298 R08: 1ffff1003b600de1 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d8dda3c0
> R13: ffff88019bdb2200 R14: ffff88019bdeed80 R15: ffff8801d8dda418
>  dst_output include/net/dst.h:444 [inline]
>  ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
>  ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1414
>  ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1434
>  icmp_push_reply+0x395/0x4f0 net/ipv4/icmp.c:394
>  icmp_send+0x1136/0x19b0 net/ipv4/icmp.c:741
>  ipv4_link_failure+0x2a/0x1b0 net/ipv4/route.c:1200
>  dst_link_failure include/net/dst.h:427 [inline]
>  arp_error_report+0xae/0x180 net/ipv4/arp.c:297
>  neigh_invalidate+0x225/0x530 net/core/neighbour.c:883
>  neigh_timer_handler+0x897/0xd60 net/core/neighbour.c:969
>  call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
>  expire_timers kernel/time/timer.c:1363 [inline]
>  __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
>  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>  __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
>  invoke_softirq kernel/softirq.c:365 [inline]
>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:541 [inline]
>  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
>  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:857
>  </IRQ>
> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:778
> [inline]
> RIP: 0010:lock_acquire+0x256/0x580 kernel/locking/lockdep.c:3923
> RSP: 0018:ffff880197b3f980 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12
> RAX: dffffc0000000000 RBX: ffff8801d225e400 RCX: 0000000000000000
> RDX: 1ffffffff10a24e5 RSI: 00000000b98b8227 RDI: 0000000000000282
> RBP: ffff880197b3fa78 R08: 1ffff10032f67e93 R09: 0000000000000004
> R10: ffff880197b3f960 R11: 0000000000000003 R12: 1ffff10032f67f36
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
>  down_write_killable+0x8a/0x140 kernel/locking/rwsem.c:84
>  __bprm_mm_init fs/exec.c:297 [inline]
>  bprm_mm_init fs/exec.c:414 [inline]
>  do_execveat_common.isra.30+0xc8e/0x23c0 fs/exec.c:1771
>  do_execve+0x31/0x40 fs/exec.c:1847
>  call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>

^ permalink raw reply

